Rob's Blog

August 31, 2019

Jeff got enough money for his new company to pay me 1/4 of what I was making at JCI, at least through the end of the year (by which time we need more money, preferably by having gotten something to market). *shrug* It keeps the lights on, and if we succeed it's changing the world for the better. He sent a wire transfer for the first 2 months. Ok then, I'm back onboard.

Still working on toybox while he gets everything else spun back up, but I need to get to a good stopping point. (Jeff is paying me. Android is not paying me. I do what I can, but it's $DAYJOB time again.)

August 29, 2019

Here's an updated version of the June 7 digression (in response to the new android status update).

I'm trying to break circular dependencies at the root of system builds. The minimal native system that can rebuild itself under itself, where you don't just have circular dependencies but _ALL_ the dependencies form one big circle, is conceptually 4 packages: kernel, command line, compiler toolchain, and libc. Anything needed to build any of those, and package the result into a bootable system, has to be in toybox. (There's more "convenience" stuff like top and vi, but those are non-negotiable.)

Long term what I really want to do is get linux, toybox, a vastly upgraded tinycc, and bionic to build standalone. (This means I need to write a makefile or similar for bionic; I have yet to get it to build outside of AOSP as a standalone host thing.)

Then as part of "bootstrapping up to more complexity" I want to use the LLVM C backend to produce a C version of LLVM that tinycc can build but which produces optimized output, then rebuild it with itself so it runs fast. I realize that new cfront won't run in the tiny system, but in theory the source it produces is architecture target agnostic, so I should be able to convert it once each release and import it into the tiny system. (Modulo tinycc not being able to build a lot of the code I tried to throw at it, and the people who've been "maintaining" it since making zero advances I'm aware of. Plus it supports maybe 3 architectures... There's work to do.)

One of the reasons to do this is countering trusting trust, but also because if you can't reproduce something from first principles (in isolation from external uncontrolled variables) what you're doing isn't science, it's an art like alchemy. (Computer security is definitely still defense against the dark arts.)

August 28, 2019

Ugh, it's one of those days where everything I touch explodes into tangents. Elliott's feedback on the DIRTREE_STATLESS change breaking find. A pull request of the "something vague broke, work out a test case by reverse engineering my patch" form. (I love bug reports. Patches that don't clearly say how to reproduce the problem they purport to fix, less so.) And I tried to run a test under the new "make root run_root" infrastructure and init SEGFAULTED. Which I think is a glibc problem (or glibc/qemu mismatch) but... seriously, segfault? What the...

Now I just got a patch to patch to drop support for filenames with tabs in them because someone, somewhere, is still using mercurial.

I just wanted to poke at toysh some more while I still have the chance. (Jeff says funding has closed and now they're waiting for it to deposit! I told two different recruiters that I'm busy for the next 3 months. Fuzzy's scheduled to get back tuesday, I may be flying out to Tokyo then. Dunno yet.)

August 27, 2019

Fuzzy flew out with Adverb, up to Minnesota to see Fade's new apartment (which is across the hall from her old one, but now she only has one roommate instead of 4).

Adverb threw up on my bed at 6am (probably sympathetic nerves from Fuzzy, who was completely melting down over the trip).

August 26, 2019

Took a recovery day after flying, and now I'm banging on... the ls command, oddly enough. The top level android directory is sprayed down with selinux rules to prevent normal users from listing its contents (in a way that doesn't actually work, but ok). And the result is you can readdir() the contents but calling stat() on any of it fails, which drives ls bananas.

Ubuntu's ls doesn't handle this well either, but I finally managed to reproduce it with a chmod 444 directory.
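A minimal sketch of that reproduction, assuming a POSIX shell and a non-root user (root bypasses the permission check, so stat() succeeds there). The function name statless_demo is made up for illustration. A directory with read but no search permission lets the glob enumerate names via readdir(), while anything that has to stat() an entry fails:

```shell
# Hypothetical helper: build a directory you can readdir() but not stat() into.
statless_demo() {
  dir=$(mktemp -d) || return 1
  touch "$dir/one" "$dir/two"
  chmod 444 "$dir"               # read permission only, no search (x) bit

  # The glob only needs readdir(), so the names still show up.
  for path in "$dir"/*; do
    name=${path##*/}
    # [ -e ] has to stat() the entry, which needs search permission.
    if [ -e "$path" ]; then
      echo "$name: stat ok"
    else
      echo "$name: stat failed"
    fi
  done

  chmod 755 "$dir"               # restore permissions so cleanup works
  rm -rf "$dir"
}

statless_demo
```

As a regular user each name should report "stat failed"; as root both report "stat ok", which is why this doesn't reproduce under sudo.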

August 24, 2019

Wrote a thing from the airport (which translates a bit weird from email to a blog entry when it comes to URLs, but eh):

From: Rob Landley
Subject: Can I submit a talk for _next_ year yet?

I'm in the airport to fly back from ELC and I really want to do a talk on "Why the Linux Foundation is bad at its job." I had fairly extensive hallway conversations with people about it yesterday, and am having a twitter thread about it now, and really this should be a talk.

It would include bits of this and this and this but mostly use the framework of this to explain why this happened.


P.S. This year ELC was colocated with FIVE other events. There were 19 panel tracks running simultaneously. My talk was 35 minutes long and scheduled against 18 other talks. Microsoft and Facebook were everywhere. I should probably chart the Linux Foundation's destruction of a rather nice conference I was keynote speaker at in 2006: they took it over in 2010 and I have attended several times since, commenting on its status each time, for example...

P.P.S. I have kind of a weird insider view of the Linux Foundation because I applied for a Documentation contract with OSDL right before they merged with FSG to become the Linux Foundation, and I wound up reporting to the Linux Foundation's CEO for a bit (having checkin phone calls with him while he changed planes in airports).

It was a surreal experience and I watched them figure out who they were and what they wanted to do in realtime, and I was kind of unhappy with the result but not in a position to do anything about it. Oh well.

OLF's CFP closed on the 17th, and when I saw that I sent 'em email. They're the only venue I can think of that's NOT run by the Linux Foundation (yet).

I bumped into one of the Linux Foundation guys in the Starbucks at ELC (he recognized me, I didn't recognize him) and he said they still use the "accretion disk" phrase I used to describe them once, as an in-joke around the office. This led to a long talk with the guy I'd been talking to (Joe from IBM, I should have asked him what his last name was) about the three waves stuff and how the linux foundation's effectively driven all the hobbyists out of Linux and why that's a bad thing. Kinda burned out my voice from talking too much.

Sigh, I should just record a podcast about this, but I'm terrible at doing things without an externally imposed deadline. I have too much to do that people are _already_ waiting for...

August 23, 2019

I gave my talk! The presenter after me was VERY nice and let me go 20 minutes over my time. (He didn't have anything scheduled after him, so he could run late, but it was still hugely nice of him.)

It went ok. Coulda been better, I'm closing tabs now and going "ah, I should have mentioned that" on all sorts of stuff, but... it was a decent talk.

I meant to upload my talk outline to my patreon but didn't get it ready soon enough, and then forgot to mention it at all during my talk. Oh well.

August 22, 2019

I saw Tim Bird yesterday (he passed in the hall, said hi and apologized he couldn't stop and talk, total interaction about 10 seconds), and I saw a second person I recognize in the line at starbucks this morning. So far, that's it from the "familiar faces".

Up bright and early poking at my talk outline. The problem with writing documentation is the same problem as teaching: when you explain it to somebody else you realize how you SHOULD have done it all along and stop and rip apart the code to fix stuff.

Some other things I go try to clean up and bump into a comment I left myself about WHY it is that way. (.singlemake isn't in generated because "make clean" would delete it otherwise, and "make clean test_sed" should work. Hmmm, maybe I can add a build dependency to recreate it? Hmmm...)

Ah, avoidance productivity...

August 21, 2019

Left for the airport at 6:30 am, flew to San Diego on Frontier, which is now out of the "south building" of Austin's airport, which is _adorably_ tiny. It's 3 gates. You walk out to the plane on the runway and climb up stairs, like we used to do on Kwajalein.

Got to San Diego, the temperature was nice and google maps said the hotel was only about two and a half miles from the airport, so I walked along the riverfront. There was NO SHADE (solid concrete, no trees) and I got sunburned.

Arrived at the hotel and... I don't recognize anyone here? The hobbyists have abandoned ELC. I am sad. (My badge says "hobbyist" as my affiliation, and it's the only one. Every single other badge is a corporate employer.)

Not that I can really blame them. The Linux Foundation has "colocated" ELC with at least one other event for something like 5 years now, and this time it's colocated with something like five other events. (SO many emails trying to get me to buy tickets to more than one event, for more $$$ of course.) There are 19 talk rooms, and they're giving us 35 minute talk slots.

So my talk tomorrow is against 18 other talks, and a half hour long. I flew to San Diego for this? Oh well...

August 20, 2019

Working on my conference talk. I have about 2 hours of material and 35 minutes of scheduled talk time (so realistically 30 minutes).

Honestly, this slot is for a lightning talk. The Linux Foundation has "colocated" ELC with so many other events this year that I am literally scheduled against EIGHTEEN other talks in the same time slot. That's in ADDITION to cutting the time in half. This is probably the last year I speak at ELC, because seriously, this is insane.

August 19, 2019

Trying to clean up for a toybox release.

Oh hey, you can spam dmesg as a regular user:

[367266.808983] EXT4-fs warning (device sda1): ext4_ioctl:483: Setting inode version is not supported with metadata_csum enabled.

That's probably not a good idea, but it's a kernel issue.

August 18, 2019

An email reply I wrote that might be of more general interest:

On 8/17/19 11:36 PM, XXXXX wrote:
> Hey, someone linked me to:
> which has a historical origin story for the /usr split.

That was written off the top of my head one day without checking references. It goes mildly viral every couple years, and one of the times it did a magazine asked if they could publish it and I sent them an updated version with a couple corrections and more sources.

> However, this is not the history I learned for it back in the 80s. I
> went looking for what I remembered, and found:
> which matches what I recall.

Yes, a thing I wrote extemporaneously to a mailing list 9 years ago got a detail wrong. Well spotted. Did the corrected version from 7 years ago fix it?

> Specifically:
>> In particular, in our own version of the system, there is a
>> directory "/usr" which contains all user's directories, and which is
>> stored on a relatively large, but slow moving head disk, while the
>> other files are on the fast but small fixed-head disk.

Originally /usr stored the user's directories, then later the OS leaked into /usr (duplicating /bin and /lib and such), and when they got a third disk they mounted it on /home and moved the user directories there so the OS could eat the whole of the second disk (still on /usr).

Is this not what I said?

> I haven't been able to find any references to the "two identical disks"
> explanation earlier than your post, and I'm wondering if you happen to
> remember where you ran into it.

See the corrected version above. Their initial 2 disks were a small fast one (0.5 mb) mounted on / and a big slow one (2.5 mb RK05 external disk, sort of like a USB hard drive) mounted on /usr. I had the total size right, but off the top of my head didn't remember how they were divided. They later got a second 2.5 meg RK05 disk pack (and thus _had_ two identical disks, but that wasn't the initial setup), mounted the new one on /home, and moved the user directories there.

As for more references, replace "notes" in your URL above with "hist" and there's another page of reminiscence from Ritchie. There's also an excellent book called "A Quarter Century of Unix" by Peter Salus, written by the head of Usenix for Unix's 25th anniversary in 1994. It might be out of print, but there's a PDF online. (He wrote a sequel called "The Daemon, the Gnu, and the Penguin" which I have a copy of somewhere but never finished reading: kindle on my phone is always with me and physical books mostly aren't. It started out serialized on groklaw, I should just pull up that copy...) I probably also have relevant stuff in my history mirror but haven't looked at it in a while.

> I note also, when I worked on one of the Berkeleys, I don't think
> we had to worry about /lib matching /usr/bin, because shared libraries
> didn't go in /lib;

Given that dynamic linking in unix wasn't a thing for its first decade, I'd be surprised if you did. This says Sun's first shared library test case was xlib, which can't have been before 1987.

Sun hired away the first BSD maintainer in 1982 and he stayed in charge of their OS dev until AT&T paid them to rebase from BSD to System V (hence sunos->solaris (see the long strange trip to java and also the book "Under the Radar" by Robert Young, and one of the things in that mirror page above is Larry McVoy's sourceware operating system proposal)), so if BSD was doing it Sun would already have been doing it. Here's a paper on a shared library prototype from 1991, by which point Unix was over 20 years old. (Here's some good stuff on early Berkeley history circa 1979. And this was nice too, written by the guy who took over from Bill Joy.)

> root was statically linked so it could be used
> for recovery purposes, and shared libraries went in /usr/lib.

Everything was statically linked on the Bell Labs version of Unix. The System V guys were a different division within AT&T. Bell Labs continued putting out releases through version 10 but version 7 was the last publicly released one (because AT&T wanted to commercialize Unix, and the Apple vs Franklin decision in 1983 allowed them to do so, so they suppressed the internal competitor), and then the Bell Labs guys started over with Plan 9, and Dennis Ritchie let himself be promoted to management...

> As you note, a lot of this no longer makes any *sense*, but I'm
> confused by the divergent accounts of what things were like back in the
> 70s and 80s.

There's plenty of primary references if you dig.

Bell Labs withdrew from the Multics project in 1969 (which was doomed: it had 36,000 pages of specification document written before the hardware arrived, and had O(N^2) algorithms all over the place so it ran ok with one user logged in, could sort of stumble along with 2, and froze as soon as the third simultaneous user logged on). The bell labs guys knew it was doomed and wound up spending all their time playing an astronomical flight simulator called "sky" (a "test load" for the system; basically an early text version of lunar lander or kerbal space program). Management blamed them for killing multics not because it was doomed but because clearly they'd played sky too much, so decided to punish them.

The multics contract was still funded through the end of the fiscal year and management refused to give them a new contract until that funding ended, AND transferred their offices to a storage attic (as punishment). In the move the ex-multics team lost access to the GE 645 mainframe they'd been running code on (they no longer had a terminal connected to it, but could still send a paper tape via inter-office mail and get a printout or results tape mailed back), so they scrounged up a PDP-7 out of the storage closets (which had been set up to do graphics research with a vector display attached, but then the guy who'd done that had transferred to the Holmdel site and left it behind so it was theirs to play with, and was the ONLY computer they had exclusive 24/7 access to). Ken Thompson wrote his own assembler that could run on it, and used that to port "sky" to the PDP-7 now with _graphics_, but he and Dennis Ritchie also had filesystem ideas left over from multics they hadn't been able to do, so they wrote up a ramdisk based version on their PDP-7, which evolved into an OS they called "unix" which was either one of whatever multics was many of, or a castrated version of multics, depending on who was asking. All this was largely out of boredom waiting out their punishment. (There was a lovely presentation about this at Atlanta Linux Showcase in 1999 from one of the guys who was there. Apparently the roof of the attic they were exiled to leaked.)

When the Multics contract finally expired and they could bid on new contracts again, which came with funding for new hardware, they proposed buying a PDP-11 to create a typesetting system for the Patent and Licensing department (because those guys had money). The proposal was in 1970 and they got most of it working in 1971. (The timeline is in Quarter century of Unix.)

On the original PDP-11 system the Bell Labs guys ordered to service the patent and licensing contract for a typesetting system, their internal disk was 0.5 megs. Their RK05 disk pack was 2.5 megs but much slower, and was mounted on /usr. The OS leaked into /usr when they ran out of space on /, hence /usr/bin. They got a second RK05 later, mounted it on /home, and moved the user account home directories to the new disk and let the OS consume the rest of the old one.

All this happened before the 1973 ACM paper (and 1974 presentation) that announced Unix to the world and got people requesting source licenses (which AT&T had to grant because of the 1957 antitrust consent decree, something the patent and licensing department knew very well), and it was before Ken Thompson took a year sabbatical from Bell Labs in 1975 to teach at his alma mater (the University of California at Berkeley; I _think_ it was fall 75 and spring 76 but would have to check the Quarter Century of Unix book to confirm), which resulted in the Berkeley Software Distribution which was awarded a Darpa contract in 1979 to connect VAX systems running Unix to the internet to replace the original (then ten year old) Honeywell IMP routers (see the book "Where Wizards Stay Up Late" by Katie Hafner for the BB&N stuff, and the Andrew Lenard link above for the 1979 replacement), and thus every internet connection DARPA sponsored for the next few years came with a VAX running BSD unix, which is why the Stanford University Network administrators decided to commercialize their m68k unix workstations under the SUN brand in 1982.

(Note: I haven't really checked references for this either, but I linked you to a few above if you'd like to.)

August 17, 2019

Grumble grumble bugs marshalling state between contexts. Writing the new shell I want to do as much as I can with local variables in each C function, but the function either has to call a "give me more data" function (which makes it deeply awkward to feed multiple different types of input into it), or it has to return to its caller with a return code requesting more input, and save parsing state into a structure that's maintained between calls and gets fed back in for the continue.

I've chosen the latter approach, and I use a structure passed in as an argument (well, a pointer to it anyway) rather than globals because I'm not sure we won't be calling this from multiple overlapping contexts. (Functions, trap handlers, here documents, files sourced with the "." command, (subshells) and backgrounding& and $(commands) run in various <(contexts) that are parsed as late as possible...)

Parsing was working nicely until I tried "if true then date fi" with a newline between each word and that _should_ work, but the "if" set done=1 thus signalling that the next iteration of the loop should break the pipeline segment, then it checks it for HERE documents and sets pl = 0 which will allocate a new pipeline segment when we get another word of input (without which we don't _need_ another segment; we're done and I try to avoid empty segments so flow control doesn't have to deal with them)... but in this case parse_word() returns "there is no next word on this line" so we return to request more input, and when we get called again with the next word on a new line, the local variable state from last time is gone so the setup no longer knows we need a new segment. (It would have done it next iteration of the loop, but we returned rather than looping.)

Except the only time we're CONTINUING an existing segment is when we had unfinished quoting? Otherwise newline means the same as semicolon: end the current pipeline segment. (A pipeline segment is a command or a flow control statement like "if", "do", or "(". There's probably a better name for this but as with the OSI model the layers they've named and what we're actually _doing_ when you implement it don't match up.)

So yeah, I think the fix is that when we're re-parsing the initial context (coming back in with a line continuation), set pl = 0 unless we were continuing a quote or dealing with a HERE document.
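A sketch of what a regression test for this case might look like, assuming the shell under test reads its script from stdin. The function name newline_if_test is hypothetical, and "echo ok" stands in for "date" so the output is predictable:

```shell
# Hypothetical test helper: feed "if / true / then / echo ok / fi" with a
# newline between each piece to the given shell (default: sh) on stdin.
newline_if_test() {
  printf 'if\ntrue\nthen\necho ok\nfi\n' | "${1:-sh}"
}

# A shell that handles the continuation correctly prints "ok".
newline_if_test sh
```

Because every word arrives on its own line, the parser has to return for more input after each one, which is exactly the path where state kept only in local variables gets lost.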

August 16, 2019

Bash has too many error messages for minor variants of the same thing.

$ break 1 37
bash: break: only meaningful in a `for', `while', or `until' loop
$ for i in 1; do break walrus; done
bash: break: walrus: numeric argument required
$ for i in 1; do break 37 walrus; done
bash: break: too many arguments
$ for i in 1; do break 0; done
bash: break: 0: loop count out of range

Also, why is & supported after break and continue?

$ while true; do break & done
[1911]   Done                    break
[1912]   Done                    break
[1913]   Done                    break

That's an endless loop, and in theory with continue it's a tiny forkbomb. Is this what posix says to do? What? I can see why "test && break || walrus" is supported...

Sigh. To make _any_ of that work, I need to pass the block stack as an argument to run_command(). (Or I need to stick it in GLOBALS but I'm trying not to do that because function calls and a single global context seem like they would combine poorly even before (subshell) and background& interacts with it.)

Hmmm, should I try to be _interoperable_ with bash? Specifically:

$ potato() { ls -l; }
$ export -f potato
$ env | grep potato
BASH_FUNC_potato%%=() {  ls --color=auto -l

That's not the export format I was using, but it _could_ be. (They did alias expansion in export? Why...)
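For reference, a small sketch of the round trip that interoperability would have to preserve. It assumes bash is installed (export -f is a bash extension, not POSIX), and the wrapper name export_roundtrip is made up for illustration:

```shell
# Hypothetical wrapper: export a function from one bash and call it from a
# child bash that was started with a fresh exec.
export_roundtrip() {
  bash -c '
    potato() { echo spud; }
    export -f potato
    # The child finds the exported function in its environment and
    # defines potato() before running the -c script.
    bash -c potato
  '
}

export_roundtrip
```

A shell that wanted to interoperate would need to both emit and accept the same environment encoding for the inner call to work when the two shells are different implementations.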

August 15, 2019

Since toysh supports nommu and toybox doesn't do unnecessary multiple paths, I'm trying to make backgrounding and subshells work with vfork() instead of fork (if I implement them with fork() first they'll never get sufficient testing), and it's a SERIOUS PAIN.

The thing about vfork() is it doesn't duplicate the forked process, instead it suspends the parent process until the child exits or calls exec, and during that time the child is basically a thread not a process, and all its memory is shared with the parent. In fact, traditionally the child didn't even get a new _stack_ so changes to local variables and such show up in the parent too. Basically from the parent's point of view vfork() works like setjmp(): the actual fork() is deferred until the exec() happens and until then it's the parent doing everything, then at exec() time the parent longjmp()s back to the vfork() point and resumes what it was doing.

Except the child has its own PID and its own file handle table (so stuff you open/close after vfork() persists in the child after the exec(), and the parent's filehandles are unchanged when it resumes). So it's not a perfect analogy, but pretty close. And one thing it DOES help you keep straight is DON'T RETURN FROM THE FUNCTION THAT CALLED VFORK() because when the longjmp() happens your stack will be trashed and your next return will segfault.

ANYWAY, point of all this is vfork() has to be followed by an exec() or an _exit() and the parent's suspended until then. You only get ANOTHER process running at the same time once you exec. So implementing & or ( ) with vfork means toysh has to exec itself.

Now being ABLE to exec yourself is another fun little detail, because /proc/self/exe is only there when /proc is mounted (chroot and initramfs don't start that way), and otherwise you can sometimes sort of figure it out from argv[0] and $PATH (assuming you do so before either is modified and that they were set correctly: execve has "file to exec" and "argument list including argv[0]" as two separate arguments and it's only by convention that they overlap), and there's no guarantee that your binary is still available in the current filesystem anyway (switch_root deletes all the initramfs files but it still exists as long as it's open, pivot_root and umount -l are similar)... A process keeps the file open but it's not in the filehandle _table_: I argued with the kernel guys that execve(NULL, argv, envp) should re-exec the current process, but they kept boggling at why I'd want to do that and it never went anywhere.

Hmmm... you know, the container suspend/resume plumbing has to have solved that one, and they merged a lot of patches to allow exporting data that wasn't otherwise available to recreate a process's full state. I should look up how they do it... but they probably use and require /proc to be available. :P

Anyway, back up to the topic: the child needs to know what the local variables and functions defined in the parent are, plus variable annotations like "integer" and "read only". AND it may need to be told what the child process is running.
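To make the marshalling problem concrete, here is a small sketch using an existing shell. The variable name TOYSH_DEMO and function name marshal_demo are hypothetical. A forked subshell is a copy of the parent and sees unexported variables for free, while a freshly exec'd shell only gets what was explicitly handed to it:

```shell
marshal_demo() {
  unset TOYSH_DEMO
  TOYSH_DEMO=parent_value          # shell-local: deliberately NOT exported

  # fork()-style subshell: a copy of the parent, so the variable is there.
  ( echo "subshell: [$TOYSH_DEMO]" )

  # exec-style child: starts from scratch, only the environment survives.
  sh -c 'echo "re-exec: [$TOYSH_DEMO]"'

  # Handing the state over explicitly (here via the environment of that one
  # command) is the marshalling step a vfork()+exec shell has to do itself.
  TOYSH_DEMO=$TOYSH_DEMO sh -c 'echo "marshalled: [$TOYSH_DEMO]"'
}

marshal_demo
```

The middle line comes back empty, which is the gap a vfork() based implementation has to fill for every local variable, function, and attribute.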

Except the bash "shellshock" nonsense was "set an environment variable and bash runs it on launch". That's what I _need_ here, but I don't want anything _except_ this to be doing it. Hmmm... I can do stuff like use the empty variable name that other stuff usually can't set, but you can set anything with env so that's out. I can include the PID and the PPID in the name, since those are known by the child process before vfork() and are trivial to query with getpid() and getppid() even in a no /proc environment, but are those two numbers (which usually have a range of 2-32768 unless you've written to /proc/sys/kernel/pid_max) enough security to prevent brute force attacks ala "this web page launches a shell script with an environment variable I control, imma hammer on it til I get it to run arbitrary code"?
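A back-of-the-envelope sketch of the search space, assuming (optimistically) that the PID and PPID are independent and uniformly spread over 2..pid_max. The function name pid_search_space is made up for illustration:

```shell
# Hypothetical helper: upper bound on the number of (pid, ppid) pairs an
# attacker would have to try, given a pid_max (default 32768).
pid_search_space() {
  max=${1:-32768}
  count=$((max - 1))          # usable values are 2..max
  echo $((count * count))
}

pid_search_space              # default pid_max
```

With the default that is 1073676289 pairs, roughly 2^30. In practice a child's PID tends to be close to its parent's, so the real search space is much smaller than this bound.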

Hmmm... ok, the child inherits filehandles from the parent, so if I open a specific filehandle number (derived from both pid and ppid) with specific flags, and then do the flag check ioctl on it and get the right flags back (which also means it's open), that's a second check that's cheapish to do and hardish to set from exploit contexts.

Hang on, I shouldn't do that. I should just set -c "thingtorun" on the child's command line. Yeah it's externally visible in ps but that's way less silly than allowing anything in environment variable space to execute. Modulo functions still need to get marshalled between contexts, but they don't auto-run. For what that's worth:

$ function echo () { blat; }
$ echo
bash: blat: command not found

Functions even override bash builtins...

August 14, 2019

Ok: changes to parse_word(): $( (anywhere) and (( (at the start of a word) start a quoting context ended by ), within which we count parentheses and the quoting context only ENDS when the parentheses count drops to zero. If (( has an internal parentheses count of zero but gets a single ) it retroactively becomes two individual parentheses.

Dear bash: if I end a line with a #comment but there's a continuation, and I then cursor up in the history? Don't glue the later lines after the #comment so they get eaten.

$ burble() #comment
> { hello;};burble
$ burble() #comment { hello;};burble

The two are not equivalent. (You can do multiline history! Really! Busybox ash does!)

August 13, 2019

One more week until my plane flight to San Diego. The shell is not ready to demonstrate yet. Still finding and adding parsing corner cases:

$ ((1<3))
sh: 3: No such file or directory

That's wrong, bash doesn't do that. Same for $((1<3)), it's a variant of quoting. We go into a different context (arithmetic context) within which < and & and ; aren't redirects or pipe segment terminators.

Revisiting quoting (again), bash treats [ and $[ differently for line continuation purposes:

$ [ one two
bash: [: missing `]'
$ $[ one two

Since [ is an alias for "test", it's not even treated as a flow control statement, while $[ ] is an environment variable expansion handled by the quote logic in the tokenizer. (Quotes must have matching end quotes or we need more input.)

I'm trying to work out how to treat $(( vs (( which both behave the same way: at the start of a line by themselves, if you then hit enter, it prompts for a line continuation. I _think_ one is being treated as a flow control statement and the other as a quote variant: if I go "true ((" it complains about an unexpected (. BUT both of those disable redirection within them. The variable/quote would do so automatically: it would parse as a single word starting with $(( and thus not be recognized as a redirection by the thing looking for redirections. (Environment variable expansion happens _after_ redirections are identified.) I have to handle (( before doing redirections, which means the "run a command out of this pipeline segment" logic needs to recognize (( before doing the normal command line processing. We still need to CALL the pipeline processing logic because:

$ ((3<2)) || echo hello

But it looks like bash is doing a similar workaround, because:

$ X=3 ((3<2)) || echo hello
bash: syntax error near unexpected token `('

The local variable assignments don't get called. Even though just like with $((x+2)) the math logic will expand x into a variable (recursively!) until it gets a number or blank space... hang on, what happens if:

$ X=X; echo $((X+2))
bash: X: expression recursion level exceeded (error token is "X")

Ha! Ok, added to the test pile. Where was I. Quoting logic. Hmmm...

$ cat <(echo "$(((1<2)) && echo "hello")")

The $(( there is actually $( (( which we can tell because $(( (1<2)) " puts us in negative parenthetical territory without it being a )) together. So the token parsing here also needs similar break-and-downgrade logic. Can I make them the same?

However it goes, I have to redo the parse_word() logic. And looking at it, I'm not sure it ever handled echo "$("${'THINGY'}")" properly...

$ echo "$("${'THINGY'}")"
bash: ${'THINGY'}: bad substitution

Of course neither does bash. But the parsing doesn't syntax error on it, and if you remove the single quotes:

$ THINGY=whoami; echo "$("${THINGY}")"

Hmmm... For $(( I think treating it as $( followed by a ( is right from a parsing perspective, the variable expansion logic has to re-parse the block again anyway and bash already tracks parentheses nesting depth specially here:

$ echo $({hello)
bash: {hello: command not found
$ echo $((hello)

August 12, 2019

Dear bash: why does

)); do echo hello; done

produce no output, but:

for(( ; ; )); do echo hello; done


August 11, 2019

Ok, (( is a token when there is a matching )) rather than a matching ). Keeping in mind you can have ( ) inside (( )), but the balance can't go negative or the (( degrades to two ( (. I _think_ that's the rule, anyway. And of course in ((2>(1+3))) those last three characters don't parse as )) ) they parse as ) )). So the _tokenizer_ has to be tracking the parentheses count.
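A rough sketch of that rule as stated here (it is a guess at the behavior, not bash's actual implementation). The function name classify_double_paren is hypothetical, it takes the text that follows an opening "((", and it ignores quoting entirely:

```shell
# Prints "arithmetic" if the (( is closed by a matching )), "two-parens" if
# a lone ) arrives while the inner balance is zero, or "need-more-input" if
# the text runs out first.
classify_double_paren() {
  rest=$1
  depth=0
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}     # first character
    rest=${rest#?}            # everything after it
    case $c in
      '(') depth=$((depth + 1)) ;;
      ')')
        if [ "$depth" -gt 0 ]; then
          depth=$((depth - 1))   # closes an inner ( )
        else
          case $rest in
            ')'*) echo arithmetic ;;   # balance is zero and we got ))
            *) echo two-parens ;;      # balance would go negative
          esac
          return 0
        fi
        ;;
    esac
  done
  echo need-more-input
}

classify_double_paren '2>(1+3)))'              # arithmetic
classify_double_paren 'echo a) | sed s/a/b/)'  # two-parens
```

Tracking the inner count is what makes the last three characters of ((2>(1+3))) come out as ) )) instead of )) ).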

Posix says that shell environment variables MUST allow names with letters, digits (not starting with a digit), and underscore, and it _can_ accept more names. And that environment variable names are anything except =.

Bash is a lot pickier, it won't allow any punctuation except underscore, and rejects unicode characters:

$ せ=42
bash: せ=42: command not found

I'm pretty sure I want to accept unicode characters in variable names. Hmmm.

Bash accepts "function abc=def () { echo hello; }" but abc=def is a variable assignment, so I'm not sure how you'd call it... Ah:

$ "abc=def"

Strangely, it does NOT allow function "abc=def", says it's an illegal function name. (Function names are not dequoted?)

August 10, 2019

All this shell stuff is forcing me to learn corners of shell programming I didn't know. (Some I bumped into briefly years ago and totally forgot.) For example, in bash "for((i=1;i<5;i++));do echo $i;done" prints 1 through 4. No, you can't have spaces between the parentheses, (( and )) are tokens, which call for arithmetic evaluation of their contents. Busybox ash does not implement this syntax, it throws a syntax error parsing it.

Meanwhile in bash, "for true;do echo hello;done" parses, but prints nothing. But "for echo hello;do echo hello;done" is a syntax error, unexpected token 'hello'.

Speaking of which, in bash:

$ for ((i=3
> i<5
> i++))
> do echo $i; done
bash: syntax error: arithmetic expression required
bash: syntax error: `((i=3

This is the third thing (other than # and HERE documents) that cares about newline vs ; and it's annoying. Grrr.

Meanwhile, trying all the corner cases:

$ X=ii;for $X in a b c; do echo $ii; done
bash: `$X': not a valid identifier

In theory I should add negative stuff like that to the test list, in practice it's a pain to test negatives. Hmmm. I should probably do it anyway. I don't want to match exact error messages, but I can see that no stdout output was produced and it gave an error return code...

Darn it, I can't have (( be a regular token because:

$ ((echo a) | sed s/a/b/)

So what IS the rule for when (( is special and when it isn't? Hmmm...

$ ((1+
> 1<2))
$ echo $?

August 9, 2019

Finally caught up and applied the toybox patches Elliott sent during The Dark Time. (Except for the 2 I want to do a different way.)

Poking at my ELC talk, I have way way _way_ too much material I'd want to cover for the timeslot. When I first spoke at ELC in 2006, I had an hour. Last time I was there, a slot was 45 minutes. Now it's 35 minutes, scheduled against _18_ other talks, and ELC is "colocated" with TWENTY other events! (At least the Linux Foundation spam asking me to pay to register so I could attend some of those colocated events said so, and they were proud they'd just added more!)

Guys: a normal convention already has more stuff in any timeslot than I can go to. Fade goes to 4th street, a writer's conference that has ONE panel track that everybody's at. At Penguicon and Linucon we had 5 tracks going in parallel, and that was _already_ more than anybody could go to. (We were a combination Linux expo and science fiction convention: 2 SF tracks, 2 Linux tracks, and a "crossover" track.) It gave people choice, but there were also inevitable conflicts, which is why we posted panel recordings for the ones you missed. (And at Linucon, we had a half hour between panels for recovery time and mingling, and scheduled food events in the con suite.)

This pathological "colocating" the linux foundation is doing is not an advantage, it's a schedule conflict. It means they're BAD at this. But they don't think so because their goal is to get money. These are for-profit conventions done as fundraising events for a trade association which is the same kind of legal entity as the Tobacco Institute and Microsoft's old "Don't Copy That Floppy" sock puppet. Microsoft has now JOINED the Linux Foundation.

Sigh. Anyway, back to my talk. Writing up all the different things I could talk about, and then hopefully there's a clear subset that I can cover coherently in the allotted time.

August 8, 2019

Laptop back together with new hardware! Woo! Much reinstall.

While buying an ssd, since I was at best buy anyway, I got a USB bluetooth adapter so I can use my wireless headphones with my laptop as well as my phone. Plugged it in and... nothing. Ok, google a bit, devuan didn't install bluetooth support by default so there are some packages to install, and now I've got a gui thing that's... really clunky. But ok, switch on headphones, click scan, it sees it, and it paired but couldn't associate? What? (These are separate steps?) It wants me to choose between "handsfree" and "audio sink" which should NOT be my problem. (At a guess one's with microphone and the other isn't, but that's JUST a guess and terribly phrased if so...)

Connection Failed: blueman.bluez.errors.DBusFailedError: Protocol not available.

Googled, found a page which didn't fix it. Maybe I need to reboot, but instead I yanked the usb thing out and threw it back in the box and dug out my wired headphones. Linux on the desktop! (My android phone Just Worked with these headphones. Of course android ripped out the "standard" Linux bluetooth daemon and wrote a new bluetooth daemon from scratch a few years back, which is presumably _why_ it just works.)

Doing more or less the April 16 setup, except I copied the home directory verbatim (new enough install I wasn't worried about version skew in the file formats of the .hidden rc directories) so half the setup is already done (because the xfce config was retained, so it already knows it has 8 desktops and so on). But I still have to reinstall a bunch of packages and uninstall xscreensaver and undo the "vimrc.tiny" sabotage...

Why on earth does the user account debian/devuan creates NOT automatically belong to sudoers? (Linux on the desktop!) And adding the group to the user account requires a reboot for it to take effect (because all the loaded command line tabs haven't got it, and new xterms fork from a logged in user process that hasn't got it), but adding the USER to the sudoers file (which you're not supposed to do) takes effect immediately... *shrug* Big hammer, move on. (Except while I'm at it, yank mail_badpass out of there because it's not 1992 anymore. Honestly.)

August 7, 2019

Remember that dodgy laptop hard drive? Yeah, it died. Not exactly a surprise, but a disappointment. I had backups, didn't lose anything worth mentioning. Still a productivity hit from needing to reinstall and set the distro up again.

Went to Best Buy (because they were open 2 hours later than Fry's and not on the other side of town), held a replacement drive for like ten minutes while staring at the SSDs, and broke down and bought an SSD instead. (3 year manufacturer's warranty? Eh, ok, give it a try. I survived the last drive failing, so my discomfort with "this storage thing will wear out from use" isn't gonna be _worse_ than the previous disk giving out after a month or two. My big worry is I tend to drive my machines into swap thrashing and on an SSD I may not _notice_ and burn out the disk fast, but... 3 year warranty.)

Back up early, back up often.

August 6, 2019

Jeff sent me a link to the github project that was extending llvm to support sh2 (until I think the guy doing it graduated and got hired by Apple).

Kinda up to my neck in todo items right now, but I added it to the list.

August 5, 2019

Going back and editing old blog entries so I can post them, and I'm at the end of February where I was leaving JCI due to burnout.

The thing is, I was out of actual productivity, but I could still debug stuff. I could probably debug stuff while on fire. That's not creative, merely clever. There IS an answer, bisect your way to it with an axe, figure out how to stick a flashlight into the right cracks to illuminate the problem, making new cracks with said axe as necessary. That's powered by anger. Frustration only makes my debugging STRONGER.

But JCI's new environment was Yocto: all their code was freshly written or supported by Wind River, so they had relevant people to debug it and didn't really need to call me in to reverse engineer 20 year old code nobody available was already familiar with. And in _order_ to debug it I'd have had to become a lot more familiar with yocto, and all my "there's your problem" instincts were swamped with "well there's your problem: it's running yocto with systemd"...

Sigh. It's easy to look back on it from a fairly recovered state and go "I could have done X, Y, and Z", but at the time I had trouble getting out of bed. I wist that I should have paid down the mortgage, but life also has to be worth living.

August 4, 2019

Environment variable names can have any character but NUL and =, so "env '()=123' env | grep 123" shows ()=123 as a variable that was indeed set. But () isn't a valid _shell_ variable name, so if you launch a shell with that in the environment it's filtered out from what "export" shows.

I have yet to find the variable name rules in the bash man page to implement this filtering with. (I can also read the posix spec, but I'm implementing the bash man page as my first pass. Plus 8 gazillion tests like this where I think of a thing, try it, see what the behavior is, and then try to track it down in the documentation.)

I _think_ the rule is that variable assignments don't allow quotes or escaping in the name part, so any control punctuation like " ' ` ; or & can't be part of the name... except the existence of _any_ punctuation before the = seems to throw it off. (You can't have an @ or % in there, which are fine unescaped in command names because they don't mean anything special to the shell.)

And if you can't assign it, you can't export it with the "export" command (which _does_ allow escaping because it's an argument to a command, albeit a builtin).

Oh, the _other_ fun "not a variable" is zero length name. (I have an "env =3 env | grep '^='" test in env.tests because yes, it works. No, the shell doesn't allow it. But you can use env to feed any of this to programs...)
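One way the import filter could look, as a sketch only: it applies just the POSIX minimum (ASCII letters, digits, underscore, no leading digit) and rejects the zero length name. The function name importable() is made up for illustration, it is not from toybox or bash, and it deliberately leaves the unicode question open:

```c
#include <assert.h>

/* Hypothetical filter, not toybox code: return 1 if an environ[] style
 * "name=value" string has a name the shell could import as a variable,
 * using only the POSIX minimum (ASCII letters, digits, underscore, not
 * starting with a digit). */
int importable(const char *entry)
{
  const char *s = entry;

  assert(entry);
  if (*s >= '0' && *s <= '9') return 0;
  for (; *s && *s != '='; s++) {
    if (*s == '_' || (*s >= '0' && *s <= '9')) continue;
    if ((*s >= 'a' && *s <= 'z') || (*s >= 'A' && *s <= 'Z')) continue;
    return 0;
  }

  // Need a non-empty name that is actually followed by '='
  return s != entry && *s == '=';
}
```

Under this rule ()=123, a%b=def, and the empty name in =abc would all stay in environ[] for child processes but never become shell variables.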

Hmmm, how to test it though? Devuan still has the Defective Annoying Shell hooked up to /bin/sh by default, so:

$ env -i =abc a%b=def ' '=xxx sh -c export
export PWD='/home/landley/toybox/toybox/lib'

So that's all filtered out and the only exported variable is PWD (which is created by the shell and updated every time you cd). But when I explicitly say bash:

$ env -i =abc a%b=def ' '=xxx bash --noprofile --norc -c export
declare -x OLDPWD
declare -x PWD="/home/landley/toybox/toybox/lib"
declare -x SHLVL="1"

Slightly more complicated. And uses " instead of ' because gratuitous inconsistency. Hmmm...

Ha. I dug up an old aboriginal linux image to try this out on bash 2.05b (well over 10 years old) and busybox "env -i =abc env" does _not_ set the variable with no name. (Hasn't been through the desert, I suppose.) Ah, that image hasn't got bash on it, it has busybox hush (which doesn't filter out the a%b and " " names I _can_ set with busybox env). But that doesn't mean anything here... darn it, I moved bash to the toolchain build, didn't I? Ok, ./ instead of ./ and yes, now bash is in the $PATH, and...

$ env -i a%b=abc " "=def bash --noprofile --norc -c export
declare -x  ="def"
declare -x OLDPWD
declare -x PWD="/home"
declare -x SHLVL="1"
declare -x a%b="abc"

Well that's nice. So way back when it didn't filter this out, BUT it leaked the same crap into the environment as default exported stuff. (And why is OLDPWD defined with no actual contents, by default? Expanding $UNDEFINED produces the same empty string? This smells like a bash bug.)

August 3, 2019

Heh. There's a subtlety in the redirection stack I'm not sure how to document.

My doubly linked list logic has three interesting functions: dlist_add(), dlist_pop(), and dlist_lpop(). (This is glossing over the difference between dlist_add() and dlist_add_nomalloc() which is about allocation granularity, I need to clean that up at some point.)

The dlist_add(list, entry) function adds an entry to the end of the doubly linked list, and when it returns the new entry is "list->prev", I.E. it's at the end of the list. The function dlist_pop(list) removes the first entry from the list and returns it. The function dlist_lpop(list) removes and returns the _last_ entry from the list. So pop() works like a queue and lpop() works like a stack.

Except the "list" passed to all three functions isn't a struct dlist *, it's a struct dlist **. Pointer to pointer. The reason is if you add to an empty list, dlist_add() receives a NULL argument and changes it to point to the newly added entry. If you dlist_lpop() the last entry, the list pointer becomes NULL. And each time you dlist_pop(), list becomes list->next. So all three functions have circumstances under which they need to change the value of the list pointer passed in to them.
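Here is a self-contained sketch of why the pointer to pointer matters. It is not the toybox lib code: the struct and the dl_add()/dl_pop()/dl_lpop() names are stand-ins, the payload is just an int, and nothing is malloc()ed. The list is circular, so the head's ->prev is the last entry:

```c
#include <assert.h>
#include <stddef.h>

struct dl { struct dl *next, *prev; int val; };   // stand-in list node

// Append to the end of the circular list. Only an empty list changes *list.
void dl_add(struct dl **list, struct dl *entry)
{
  assert(list && entry);
  if (*list) {
    entry->next = *list;
    entry->prev = (*list)->prev;
    (*list)->prev->next = entry;
    (*list)->prev = entry;
  } else *list = entry->next = entry->prev = entry;
}

// Remove and return the first entry (queue order). Always changes *list.
struct dl *dl_pop(struct dl **list)
{
  struct dl *d = *list;

  if (!d) return NULL;
  if (d->next == d) *list = NULL;
  else {
    d->next->prev = d->prev;
    d->prev->next = d->next;
    *list = d->next;
  }

  return d;
}

// Remove and return the last entry (stack order). Changes *list only
// when it removes the final remaining entry.
struct dl *dl_lpop(struct dl **list)
{
  struct dl *d = *list;

  if (!d) return NULL;
  if (d->prev == d) {
    *list = NULL;
    return d;
  }
  d = d->prev;
  d->prev->next = d->next;
  d->next->prev = d->prev;

  return d;
}
```

Each function has at least one path that writes through list, which is the whole reason a plain struct dl * argument wouldn't be enough.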

The _next_ problem is that the types of these pointers are struct double_list, but most of the users aren't. A doubly linked list starts with two entries: a next pointer and a prev pointer, in order. The rest of the list's contents are irrelevant to the list plumbing, so you can define any structure with those first two entries and typecast it. Except remembering that it's a pointer to a pointer instead of just a pointer is then the caller's responsibility, with a segfault if you get it wrong. And you have to typecast both the list pointer and the entry pointer (either the second argument to dlist_add() or the return value), which is just awkward.

This ties into the dlist_add_nomalloc() stuff: I have ideas on how to clean it up: right now struct dlist has 3 entries, the third of which is a char * because the first thing I implemented with this plumbing was the patch command, and that's what that needed. dlist_add() mallocs a new struct dlist and sets its data pointer to point to the second argument you passed in, which is easy but guarantees your allocation is in two parts. But dlist_add_nomalloc() just adds the structure you gave it to the list without wrapping it. This determines whether you can just dlist_traverse(&list, free); or need to while (list) {dl = dlist_pop(&list); free(dl->data); free(dl);}

Anyway, a _better_ way to do this is probably to have struct dlist {struct dlist *next, *prev;} with no third member, and then create structs that _start_ with a struct dlist member. Because you can always typecast a structure pointer to its first member, C99 guarantees there's no leading padding and the alignment requirements are correct. Gotta touch a lot of files to do that, though. It's on the todo list.
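As a rough illustration of that layout (the struct names here are invented, this is not what's in toybox today): the list header holds only the two pointers, and each user structure embeds it as its first member so a pointer to one is also a valid pointer to the other.

```c
#include <assert.h>
#include <stddef.h>

struct dlink { struct dlink *next, *prev; };   // list plumbing only, no payload

// Hypothetical user of the list: any struct that starts with a struct dlink
struct redir {
  struct dlink link;
  int from, to;
};

// A pointer to the first member converts back to the containing struct
struct redir *redir_of(struct dlink *link)
{
  assert(offsetof(struct redir, link) == 0);
  return (struct redir *)link;
}
```

The list functions would then take struct dlink pointers everywhere, and callers convert with &thing->link on the way in and a single cast on the way out.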

But that's not the current subtlety. The current subtlety is that my redirection logic is _extending_ linked lists, but never actually modifying the existing members that are passed in. And they clean the new members they've added back off before they return. So I don't need to pass around pointers to pointers in _these_ functions, I can just pass pointers and take the address of my local copy when messing with the lists. Extending and truncating the lists messes with the ->prev and ->next pointers of the list members, but it puts 'em back how they were at the end, and even if we had a signal handler longjmp() to a recovery function it's a valid linked list at all times anyway that can be handled normally from the "root" pointer. (If the list was empty, passing in 0 means the local copy of the list initializes properly anyway, although _that_ wouldn't clean up from a longjmp() context that couldn't see new members if it had passed in an empty list. It would leak memory then.)

The longjmp() stuff is for implementing the "trap" command, of course. Not quite there yet, but gotta keep it in mind...

August 2, 2019

I ran out of my blood pressure pills (refill request has already gone from 24 to 48 to 72 hours to fill; yeah I need to lose 80 pounds, the curse of the desk job), and for some reason being off the medication is making me tired and irritable. I thought that's what being ON it was supposed to do? I meant to post a patreon update yesterday, but spent a surprising amount of the day sleeping instead.

I'm working on my ELC talk, and already have three times as much material as space. (This talk is only 35 minutes. And it's against something like eighteen other talks.) I should do podcasts.

I got to a good stopping point on the filehandle redirection code, got the result to build, tried to run it, and after I got through segfault city I realized the if/then/else/fi code I _was_ working on before diverging into filehandles wasn't quite finished, let alone tested. And the dangling bit is that the pipeline segment list is still grouped before flow control parsing. (The pipeline list and the expect stack are different lists built at different times, with unfortunately different granularity: the pipeline list is what the shell script text parses into, and stays constant no matter how many times you call it. The expect stack is what's happening THIS run through it.) So I need to break the expect stack up differently.

And _that_ got me shuffling code around where I tripped over the lifetime rules on the different kind of here documents (<< vs <<<) being different, which is a problem because they otherwise get dumped into the same storage.

I need to get this to the point where I can feed lots of data through it and regression test against my 8 gazillion test cases. But that's a LONG way from now. A lot of shell has to work before you can run real shell scripts, let alone like the test suite. (Although you can #!/run the test suite in one shell and test a second shell.)

August 1, 2019

Fade flew back to Minneapolis yesterday. We've still got the dog for another couple weeks (until she gets back from a friend's wedding in Ireland, which she doesn't leave for until after her placement exams). So now I have ANOTHER reason I can't work at home; clingy dog. Clingy energy vampire dog that's making me irritable even now that I'm out elsewhere with my laptop.

The main reason I can't work at home is still cats climbing me and standing on my keyboard whenever I sit down to program. I grew up with cats, I love cats, and I don't want any more cats. I am tired of cats, and I am waiting for our current cats to die of old age so I can be cat-free. (Peejee and George are both 15, Zabina's much younger but we can probably find somebody to take her after that). I was thinking someday I'd graduate from cats to kids, but it didn't happen. (I had measles at 22, yes I got vaccinated as a kid, but didn't get the booster and my brother got it, and gave it to me.) And pets aren't a substitute for children.

The Kobayashi Dragon Maid music came on in my programming music rotation, it's the bounciest thing ever, but it's kinda melancholy right now because the animation studio that did it had _just_ started work on season 2... and then was firebombed in the worst domestic terrorist attack Japan's seen in decades, killing multiple dozens of people including the series' director and putting more in the hospital for a long time. Most of the victims are young women in their 20's or early 30's, because it was a progressive studio that actually hired women.

As I said, kinda melancholy. And I have the long versions of both the intro and outro themes (the anime used only about half of each composition) in my programming playlist. Bit of a mood whiplash.

July 31, 2019

Googling for something else I found a patent using a file from my old aboriginal linux toolchain as an example. I boggle.

July 28, 2019

Trying to redo the loopfiles_lines() plumbing so the "flush" call at the end of a file has the filename in *pline and instead signals it's done with len = -1. (A zero length line is valid if you're stripping the terminators.) That way we can report errors on empty files (the way sha1sum -c wants). Unfortunately, the filename isn't passed in to do_lines, and if loopfiles_lines() makes the function() call itself there's a duplicate (0, 0) call before it? Hmmm, needs design work to shuffle stuff around so it collapses down cleanly.

Meanwhile, toysh is now just over 1000 lines and I've implemented maybe 1/3 of it. I suppose 3000 lines isn't outrageous for a shell but it would be far and away the largest toybox command. (Not counting bc, which I need to find a month to clean up sometime. No real reason for that to be much larger than sed, that I know of...)

July 27, 2019

I've got the majority of the redirect plumbing implemented, and I'm rereading the bash man page to figure out what all the corner cases I need to implement are.

Making the & background operator work on nommu (with vfork) is kind of brain melting. How does backgrounding a HERE document work?

$ while true;do sleep 1;read a;echo $a;[ -z "$a" ]&&break;done << EOF &
> one
> two
> three

I just tried in bash and it works fine, but... the child process is reading from a normal filehandle, so the shell process is writing the data to it. I guess it starts waiting for it to exit when the HERE document hits EOF? If the parent shell isn't going to block if it fills up the pipeline buffer bash has to fork _two_ processes, and the writer process does not exec(), so on nommu I gotta do the /proc/self/exe dance and marshall data through to the child. Possibly with -c. (Marshalling the local variables and function definitions is likely to be fun. I'm guessing "magic environment variable", and the obvious candidate is the one with the zero length name, see env.tests and why does "env =abc" work... then the body would be some sort of encoding of the data. That's gonna be a pain to secure...)

Sigh, maybe it's trusting the shell pipeline to hold the data and otherwise blocking the parent? That would be easier. How big a HERE document do I need to saturate a modern kernel's pipe buffer to _test_ that?
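One way to answer the "how big" question empirically, sketched in C: fill a pipe with non-blocking writes until the kernel refuses more, and count the bytes. The function name is made up, and the result is whatever the running kernel's default is, so a test HERE document would need to be somewhat bigger than what this reports.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical probe: how many bytes fit in a fresh pipe before a writer
// would block? Returns -1 if the pipe can't be created.
long pipe_capacity(void)
{
  char buf[512] = {0};   // 512 is the smallest PIPE_BUF posix allows
  long total = 0;
  ssize_t len;
  int fds[2], flags;

  if (pipe(fds)) return -1;
  flags = fcntl(fds[1], F_GETFL);
  assert(flags != -1);
  fcntl(fds[1], F_SETFL, flags | O_NONBLOCK);

  // Nothing reads the other end, so this stops when the buffer is full
  while ((len = write(fds[1], buf, sizeof(buf))) > 0) total += len;
  close(fds[0]);
  close(fds[1]);

  return total;
}
```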

And of course just plain (subshell) and backgrounded | pipelines& is the same problem on nommu: vfork() without exec, so you exec _yourself_ and pass the child process any data it needs through a pipe or something. Sigh, I should revisit my old exec(NULL) request, without which nommu requires /proc/self/exe to do this sort of thing, which means it requires /proc and you could be in a chroot...

July 26, 2019

Fade found a new coffee shop (cherry wood, around the corner from Fiesta), and I swung by to see how things were. It was quite nice. About 4x the walk as the Wendys or HEB food court though. Still, exercise...

July 25, 2019

Twitter migrated my web ui to New Coke yesterday, so I've stopped using it (except occasionally through my phone). Getting a bit more done. But it does mean I can't tweet boggles like:

$ {walrus}<&-
bash: walrus: ambiguous redirect
$ walrus=potato
$ {walrus}<&-
$ echo $walrus

Which seems to be that atoi("potato") returns 0 and attempts to close 0-2 (stdin, stdout, stderr) are silently ignored. No message, no return code.

According to the bash man page, {var}<&- or {var}>&- indicates a filehandle to close by the contents of $var, but it doesn't say what the error handling should be if var is blank or has a non-integer value. I tried "strace bash -c 'var=blah; {var}<&-'" and it's... spawning a child process? I threw a lot of -F at it to follow forks and it's just doing a bunch of rt_sigaction and friends? All the open() and close() calls are either /dev nodes or endless locale nonsense.

I think it's a NOP? There's just no error message? Does that mean atoi() said 0 and it won't close stdin? Yes, it looks like "0<&-" is ignored. So are 1 and 2. And:

$ potato=-1; {potato}<&-
bash: potato: ambiguous redirect

Is the same message as unset.
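Since the man page doesn't specify the error handling, here's one possible stricter policy as a sketch. It is explicitly not what bash does (bash silently accepts "potato"), and parse_fd() is a hypothetical name. It uses strtol() instead of atoi() so that blank, negative, and non-numeric values can all be told apart from a legitimate 0:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical strict parser for the contents of $var in {var}<&-.
 * Returns the descriptor number, or -1 if the string is blank, negative,
 * has trailing junk, or doesn't fit in an int. */
int parse_fd(const char *s)
{
  char *end;
  long val;

  assert(s);
  if (*s < '0' || *s > '9') return -1;   // also rejects "", "-1", " 3"
  errno = 0;
  val = strtol(s, &end, 10);
  if (errno || *end || val > INT_MAX) return -1;

  return (int)val;
}
```

A caller could then report the same "ambiguous redirect" error for every -1 return instead of quietly treating junk as descriptor 0.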

Sigh. I implemented <&- and then read in the man page that <&$BLAH can expand _to_ "-" and it counts... Lots of circular dependences in this stuff, hard to find good stopping points.

July 24, 2019

I'm grinding away on this shell stuff as fast as I can, but I'm not sure what I'll have ready by my ELC talk in a month. There's a lot of groping around. There are a lot of dark corners of shell programming I didn't previously know (because I'd never had to before).

I'm making a sh tests file I cut and paste each test I run against bash into. If I had to ask this question to work out what the correct behavior was, I want my shell to get it right. This is another manifestation of my todo list getting longer as I work.

I still hit weird corner cases I have to think through in basic stuff: is -1<<2 still -4? Because 0xffffffff becomes 0xfffffffc which is decrementing it by 3, which in twos complement takes -1 down to -4, so yes, it works the same way! Twos complement is clever. Which means int*4 can become int<<2 whether it's positive or negative.
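A small C check of that arithmetic, as a sketch. The times4() name is made up. It does the shift on the unsigned representation, because left-shifting a negative signed value is one of the things the C standard leaves undefined, and it assumes the conversion back to int32_t wraps the way it does on every twos complement machine:

```c
#include <assert.h>
#include <stdint.h>

// Multiply by 4 with a shift. The cast back to int32_t is implementation
// defined for values above INT32_MAX, but wraps on twos complement hardware.
int32_t times4(int32_t x)
{
  int32_t r = (int32_t)((uint32_t)x << 2);

  // Whenever x*4 can't overflow, the shift gives the same answer
  assert(x > INT32_MAX/4 || x < INT32_MIN/4 || r == x*4);

  return r;
}
```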

(Yes, I _could_ trust the optimizer to do that for me, but the optimizer is written by C++ people these days and you only get sane behavior by -fstoppit-ing the optimizer to swiss cheese with command line arguments. Otherwise everything is "undefined behavior". Makes me reluctant to trust the optimizer to do anything, really.)

July 23, 2019

I'm confused by bash doing:

$ echo one two three four five' ' | while read -d ' ' a
> do echo 1$a; cat <(read -d ' ' a; echo 2$a); done

(Without that trailing quoted space on the first echo I only got the first 4, which is a separate boggle. The boggle _here_ is the interleaving.)

The reason is, stdio reads use a buffer, and <() is a sub-process that in _theory_ has its own FILE * for stdin and thus its own buffer, and I don't see how the two are sharing the input cleanly? I did a get_line() function that WOULD get this right... by using inefficient single-byte reads to never overshoot the input data. Elliott is in the process of ripping that back out because it's slow to do single byte reads, but how else do you share lines of input between processes?
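For reference, the single-byte approach looks roughly like this. This is a sketch of the technique, not the actual toybox get_line(), and slow_get_line() is an invented name. Because it never reads past the newline, whatever follows is still sitting in the file descriptor for the next process to read:

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

// Read one line from fd a byte at a time so nothing past the newline is
// consumed. Returns a malloc()ed string without the newline, or NULL at
// EOF (or on allocation failure). Does not retry on EINTR.
char *slow_get_line(int fd)
{
  char c, *line = NULL, *tmp;
  size_t len = 0;

  assert(fd >= 0);
  while (read(fd, &c, 1) == 1) {
    // Room for this byte plus a terminator
    if (!(tmp = realloc(line, len+2))) {
      free(line);
      return NULL;
    }
    line = tmp;
    if (c == '\n') {
      line[len] = 0;
      return line;
    }
    line[len++] = c;
  }
  if (line) line[len] = 0;   // last line had no trailing newline

  return line;
}
```

The cost is one system call per byte, which is exactly the slowness being complained about.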

Anyway, the hard bit I'm banging on right now is redirects, because making:

$ echo hello | while read a; do echo $a; done <<< boing

Work without leaking filehandles _and_ reasonably supporting nommu is fun. I want to have one well-tested codepath, which means I want my shell to use vfork() internally all the time, which means I want to do as little work in the pre-exec child process as possible. Working out "as possible" is nonobvious.

Do I want to set up the filehandle redirects in the parent process, maybe dup2() the old stdin/stdout/stderr up to a high filehandle and then unwind it after the vfork()? Or set them up as high filehandles (>10 is reserved for shell internal use, at least according to the bash man page), and then have the child dup() them back down and close the old ones? (There doesn't seem to be a "move filehandle" option, just dup2()/close(). Oh well.)
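The first option (redirect in the parent, stash the old descriptor up high, unwind afterwards) could be sketched like this. The function names are invented and the floor of 10 is an assumption taken from the man page comment above. fcntl(F_DUPFD) does the "lowest free descriptor at or above N" search, and the backup is marked close-on-exec so the child never sees it:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

// Make descriptor "to" refer to whatever "from" does, and return a
// high-numbered backup of the old "to" (or -1 on failure, nothing changed).
int save_redirect(int from, int to)
{
  int saved;

  assert(from >= 0 && to >= 0);
  saved = fcntl(to, F_DUPFD, 10);
  if (saved == -1) return -1;
  fcntl(saved, F_SETFD, FD_CLOEXEC);
  if (dup2(from, to) == -1) {
    close(saved);
    return -1;
  }

  return saved;
}

// Undo save_redirect(): put the backup back and release it.
int unredirect(int saved, int to)
{
  int rc = dup2(saved, to);

  close(saved);

  return rc == -1 ? -1 : 0;
}
```

In this arrangement the vfork() child has nothing to do but exec, and all the error handling stays in the parent.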

I _can_ have the child call open(), it's a system call and a vfork() child has its own file descriptor table. But it should error out and not run the child process if the open didn't work, which means I prefer it to happen before the fork because error handling becomes much more awkward afterwards. And if you add in supporting the weird tcp/ip stuff (< /dev/tcp/host/port) I _really_ want that to happen before vfork()... But then you combine signals (the "trap" shell builtin) and unwinding interrupted setup without leaking anything...

July 22, 2019

This sort of thing is part of why I maintained a tinycc fork for 3 years. I really want a small simple C compiler with known C99 behavior that we can port to various platforms and NOT do crazy optimizations generating endless "undefined" behavior.

These days we'd need a new cfront converting C++ to C in order to use such a compiler to bootstrap gcc or llvm on a new architecture (and thus have an optimizing compiler, not merely a correct one). _BUT_ if we teach LLVM to produce C output as one of its targets, suddenly we have a modern cfront.

If I had all the money I'd ever need, I'd hire somebody to work on this... Alas, while I keep encouraging people to steal my ideas they never do.

I also note that the entire article on detecting and defeating insane compiler "optimizations" boils down to "this is why volatile was invented", so of course the C++ loons have proposed its removal. (As long as they don't break _C_ they can do whatever they like to C++. But never let C++ developers take over development of a C compiler. They will break it. It's what they _do_.)

Ooh, Jeff sent me a link to exactly that, an llvm backend that produces C. Have LLVM run itself through this, teach tinycc to compile the result, and you don't need cfront anymore! You can bootstrap up to a modern optimizing compiler from a tiny auditable system!

Ok, I need to dig up qcc once I get toybox to 1.0.

July 21, 2019

Here's a fun one:

$ chicken() { echo hello; chicken() { echo also ;}; chicken;}
$ chicken
$ chicken

The lifetime rules here are awkward. The original definition of chicken() is still running when it's overwritten, so the memory can't be freed until the function exits. And given that it could be a _recursive_ function... I think I need a reference count? (Or a function to clone a pipeline?)
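The reference count version might look something like this sketch. The struct and function names are hypothetical, and the "body" is just a string standing in for a parsed pipeline. The function table holds one reference, each running invocation takes another, and the memory goes away when the last one is dropped:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sh_function {
  int refs;
  char *body;   // stand-in for the parsed pipeline
};

// Create a function definition holding one reference (the function table's)
struct sh_function *func_new(const char *body)
{
  struct sh_function *f = malloc(sizeof(*f));

  if (!f) return NULL;
  if (!(f->body = malloc(strlen(body)+1))) {
    free(f);
    return NULL;
  }
  strcpy(f->body, body);
  f->refs = 1;

  return f;
}

// Take a reference while the function is running
struct sh_function *func_get(struct sh_function *f)
{
  f->refs++;

  return f;
}

// Drop a reference, returns 1 if that freed the function
int func_put(struct sh_function *f)
{
  assert(f->refs > 0);
  if (--f->refs) return 0;
  free(f->body);
  free(f);

  return 1;
}
```

Redefining chicken() from inside itself then becomes a func_put() on the table's reference, which frees nothing until the running copy returns and drops its own. Recursion just bumps the count higher.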

Do I need this for variables too? I don't think so... They're used atomically, you don't run _in_ a variable you just resolve it. Even ${abc:${abc:${abc}}} is resolving the innermost one first and working its way back out. You can modify a variable in $((x=y)) (although you can only set it to a number: "fred=rutabega; rutabega=123; echo $((x=fred))" outputs 123) or assign default values to unset variables with ${name:=value}, but these happen in the context of one variable resolution and then the result is used for the next: ${abc:$X:$Y} doesn't care what the value of abc is until it has the values of $X and $Y, and the whole thing becomes a string which is then parsed by ${}.

July 19, 2019

Had one of Fade's vitamin pills on an empty stomach this morning. Last time my stomach got _that_ unhappy from a vitamin pill on an empty stomach was ...20 years ago now, driving to work at Trilogy.

July 18, 2019

Sorry if this shell blogging has been terse bordering on unintelligible. I've been doing a lot of design work and when I try to blog my thought process it gets REALLY LONG. Here, lemme do today's just to demonstrate.

I'm still arguing with the flow control logic. Right now each flow control terminating statement is its own pipeline segment, but the flow control _starting_ statements are bunched together, because "if if if true; then false; fi; then false; fi; then false; fi" turns out to be a valid line.

The pipeline data structure is more or less a linked list of struct sh_arg { int c; char *v[];} structures. (Plus HERE document information, but that's irrelevant here, and it's doubly linked to make it easier to create in order rather than reversed.) Which means each pipeline segment is a command and its arguments, and each argument list ends with either NULL or a "control operator" like ; or | or && indicating how it interacts with the next one.

Flow control complicates this, because "(echo hello)|cat" is actually "(" is a starting flow control statement meaning "you fork and do all this in a subshell", "echo" and "hello" are a command you fork and exec, ")" is the corresponding end to the flow control statement, and then "|" applies to everything inside the parentheses (so the hello goes through it), and then "cat" is another command you fork and exec. So you actually wind up with:

( echo hello
) |

And that's how I parsed it when I implemented the line continuation and syntax checking, which does the right set of ">" prompts until it has a complete thought, including ending quotes, flow control statements, and HERE documents.

This breaks the input down into pipelines, figuring out where each stage ends due to newlines or ; | || etc, with the special case of ")" being both a control operator _and_ a flow control statement (so instead of appending to the previous argument list as "how this ended", whenever it's not already at the start of a new statement it ends the previous statement with a NULL and starts a new one that it's the first word of). All the OTHER flow control terminators already have to be at the start of a pipeline segment (which is why "if true; then false; fi" needs the semicolons before then and fi, same with "{ echo;}" needing that semicolon so } isn't an argument to echo. Yes, ) ends both a word and a statement, but } is basically just a glorified command.) My initial trivial run logic just tried to run each pipeline segment in sequence (the control operators were saved but not used), so in the above "(echo)|cat" example "(" would be an unknown command not in the $PATH, so would ")", and then "cat" would hang awaiting input from the console.
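A toy version of that splitting rule, working on words that have already been tokenized. This is a sketch with invented names, not the toysh parser, and it ignores newlines, quoting, and HERE documents. It only records where each segment starts:

```c
#include <assert.h>
#include <string.h>

// Control operators that get appended to a segment as "how this ended"
static int is_control(const char *word)
{
  static const char *ops[] = {";", "|", "||", "&", "&&", 0};
  int i;

  for (i = 0; ops[i]; i++) if (!strcmp(word, ops[i])) return 1;

  return 0;
}

// Store the index of the first word of each segment in start[] (which
// needs room for count entries) and return the number of segments.
int split_segments(char **words, int count, int *start)
{
  int i, segs = 0, fresh = 1;   // fresh: the next word begins a segment

  assert(words && start);
  for (i = 0; i < count; i++) {
    // ")" always begins a segment, ending the previous one if necessary
    if (!strcmp(words[i], ")")) fresh = 1;
    if (fresh) start[segs++] = i;
    fresh = is_control(words[i]);
  }

  return segs;
}
```

Fed the six words of "(echo hello)|cat" this produces three segments starting at "(", ")", and "cat", matching the "( echo hello" then ") |" breakdown.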

Now I'm trying to actually implement the flow control, and there's several parts to getting it right. Each flow control statement has a corresponding end, and any pipes or redirects done to the end apply to the whole statement, as in "if echo hello; then echo next; fi | tee blah.txt" writes both hello and next to blah.txt. So before I can run _any_ of the statements, I need to apply the flow control redirects to the entire context. Meaning I need to find them up front, so I needed logic to search forward for the matching end _without_ losing my place at the beginning. My syntax checking logic had a stack that popped itself, meaning performing the search modified the stack contents...

The stack is because flow control _nests_, so unless I want a recursive function I need a stack of active flow control statements, which is why I did the "expect" stack back in syntax checking. (It's called that because it tracks what statement am I expecting next: while/do/done, if/then/fi, etc.)
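
A minimal sketch of an expect stack, assuming the caller feeds it only the first word of each statement. The names struct expect and expect_word() are hypothetical, it only knows if/then/fi, while/do/done, ( ) and { }, and it leaves out else/elif and the "a statement must follow" tracking.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 64

// Stack of the keywords we still expect to see, innermost on top.
struct expect {
  const char *want[MAX_DEPTH];
  int depth;
};

// Feed the first word of one statement. Returns 0 if ok, 1 on syntax
// error. When depth is back to 0 the input is balanced.
int expect_word(struct expect *ex, const char *word)
{
  // Each word on the left makes us expect the word on the right.
  static const char *pairs[][2] = {
    {"if", "then"}, {"then", "fi"}, {"while", "do"}, {"do", "done"},
    {"(", ")"}, {"{", "}"}
  };
  int i, popped = 0;

  // Is this the word on top of the stack? Then that expectation is met.
  if (ex->depth && !strcmp(ex->want[ex->depth-1], word)) {
    ex->depth--;
    popped = 1;
  }

  for (i = 0; i < 6; i++) {
    if (strcmp(word, pairs[i][0])) continue;
    // "then" and "do" are only legal when they were expected.
    if ((i == 1 || i == 3) && !popped) return 1;
    assert(ex->depth < MAX_DEPTH);
    ex->want[ex->depth++] = pairs[i][1];

    return 0;
  }

  // A terminator nobody asked for is a syntax error, anything else is
  // an ordinary command.
  if (!popped)
    for (i = 0; i < 6; i++) if (!strcmp(word, pairs[i][1])) return 1;

  return 0;
}
```

Feeding it the statement starts of "if if true; then echo; fi; true; then echo; fi" leaves depth at 0, while stopping anywhere before the final "fi" leaves a nonzero depth, which is the "need more lines" case.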

I extended my syntax checking function to run in 3 modes, which answer 3 questions. The original mode answers the questions "is the expect stack empty" (if not we need more lines to balance it out), and "was there a syntax error" like "(echo&&)|cat" which needs a statement after the &&. (Which means the flow control needs to know when to expect a non-empty statement, and it adds NULL entries to the expect stack to track those.)

The second mode is "where do I start running this line, and in what context": this returns an offset into argv at which there's something to fork/exec, which equals arg->c (I.E. eats the whole line) if the whole line is flow control statements, ala:

  echo hello

The pipeline stage parsing logic doesn't glue lines together, just ends them early. (The quoting logic glues them together for unfinished quotes or trailing \ but that's a previous layer.) And since you can wind up on a separate line (and thus in a separate pipeline stage) when you actually have something to execute, the inciting statement (like "if" or "then") gets saved at the top of the expect stack, for the caller to pop.

The third mode is that "search ahead" mode, which doesn't pop the stack but instead returns whether or not the caller _should_ pop the stack (meaning we have an end of flow control statement). I'm oversimplifying a bit because if/then/elif/elif/elif/fi can go on for a while, but the flow_control() function already knows that when we expect "fi" we should also check for "else" and "elif". The point is it won't pop back past the "if", and only "fi" has the special redirect and pipe stuff that applies to the whole block, any pipes or redirects on "then" or "else" apply to the command that goes there.

Unfortunately, actually trying to use this sucks, because pipeline segments contain BOTH flow control AND command to execute. And not just ONE flow control statement either, you can have "if if if true" and we're back to needing a recursive function or another stack to deal with that.

I tried adding a second piece of data (segment and offset _into_ segment), but tracking this turned into a mess and I was snapshotting the expect stack to do the search forward for the end meaning I had a stack of copies of the stack...

So back up, redesign. If you sometimes wind up with flow control and command in a separate pipeline stage, and have to handle it, then make that ALWAYS the case and thus the only case to handle. What I need to do is break up the pipeline further, so each _starting_ flow control statement is also its own pipeline segment (line ending ones are now), so "if while true; do break; done; then echo hello; fi" becomes a list of 9 pipeline segments:

if NULL
while NULL
true ;
do NULL
break ;
done ;
then NULL
echo hello ;
fi NULL

Then the code to track where we are at runtime just needs a pointer to the pipeline segment to deal with next, and then I can implement a flow_control_end() function that takes a starting pipeline location and returns a pointer to corresponding end segment (just start a new expect stack, and when it's empty again we're done)...
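
A minimal sketch of that search, assuming every flow control word already sits alone at the start of its own segment. To keep it short it takes a flat array of each segment's first word (a hypothetical stand-in for walking the linked list), and it uses a depth counter instead of a full expect stack, so it finds the matching end but does no syntax checking.

```c
#include <assert.h>
#include <string.h>

// Returns nonzero if word appears in the NULL terminated list.
static int in_list(const char *word, const char **list)
{
  for (; *list; list++) if (!strcmp(word, *list)) return 1;

  return 0;
}

// seg[] holds the first word of each pipeline segment. Given the index
// of a block start, return the index of its matching end, or -1 if the
// block never closes (I.E. we need more input).
int flow_control_end(const char **seg, int count, int start)
{
  static const char *starts[] = {"if", "while", "(", "{", 0};
  static const char *ends[] = {"fi", "done", ")", "}", 0};
  int depth = 0, i;

  assert(start >= 0 && start < count && in_list(seg[start], starts));
  for (i = start; i < count; i++) {
    if (in_list(seg[i], starts)) depth++;
    else if (in_list(seg[i], ends) && !--depth) return i;
  }

  return -1;
}
```

For the 9 segment example, starting at the "if" returns the index of "fi" (8) and starting at the "while" returns the index of "done" (5). The caller keeps its own position, so searching ahead doesn't disturb it.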

Hang on, better: each pipeline stage isn't _really_ just a struct sh_arg, that's just one of the members. I can add another member to annotate each statement with "start of flow control", "flow control change", and "end of flow control" on the first pass. (The "change" ones are needed because "if one; two; three; then" can have multiple statements belonging to the "test" part before you get to the "perform conditionally" part, let alone else and elif and such.) Which means I don't have to re-call the flow control function, the information goes into the pipeline list. And I can ask a simpler question in the flow_control() function: is this WORD a flow control statement or not, yes/no, and then have the pipeline assembly use the result by noticing the stack got deeper (or have the function return the 3 categories in the return code).
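
A minimal sketch of the "3 categories in the return code" version of that question. The enum and the function name seg_classify() are hypothetical, and it treats the words as context free, which real shell parsing can't do (a word like "else" is only special inside an "if").

```c
#include <assert.h>
#include <string.h>

// Annotation that could be stored alongside each pipeline segment.
enum seg_type { SEG_COMMAND, SEG_START, SEG_CHANGE, SEG_END };

// Is this WORD a flow control statement, and if so which kind?
enum seg_type seg_classify(const char *word)
{
  // Row index matches the enum value, row 0 (commands) is empty.
  static const char *table[][5] = {
    {0},
    {"if", "while", "(", "{", 0},
    {"then", "else", "elif", "do", 0},
    {"fi", "done", ")", "}", 0}
  };
  int type, i;

  assert(word);
  for (type = SEG_START; type <= SEG_END; type++)
    for (i = 0; table[type][i]; i++)
      if (!strcmp(word, table[type][i])) return (enum seg_type)type;

  return SEG_COMMAND;
}
```

With the annotation stored on the first pass, later passes can count SEG_START and SEG_END tags instead of comparing strings again.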

Except I can't. And the reason it doesn't work is functions: "function walrus() {" is either 4 or 5 words depending on whether you say "function", and it can even have a newline between the ")" and the "{", but _can't_ have a newline before either of the parentheses. I have special case logic in the flow control to handle this now, but the function name, (, and ) get parsed into 3 separate words before this (kinda necessary: any or all of them could have spaces before or after them, or neither, they count as "control characters" and thus become their own words). To be a function, they MUST come in sequence on the same line (except the { can optionally be a later line), but no _other_ words can come between them. Violating this is a syntax error, and yes "thing();{" is allowed but "thing()&{" is a syntax error, remind me to add a test for that. (The difference between "newline" and "semicolon" is pretty darn fuzzy at times.)

I was fudging the contents of the expect stack to always have the special "function" keyword on it (even when you didn't use the function keyword in the input), but I'm _not_ currently modifying the pipeline input. (I'm splitting it into chunks, but not changing what any of it says.) I suppose I could assemble an artificial "function NAME ( ) {" pipeline segment after syntax checking, but the problem is in order for it _to_ pass syntax checking, I need to either pass in more than one word at a time, or abuse the expect stack a lot. Hmmm... I need at least one word of readahead in order to recognize functions in the "posix" case: without the function keyword, the second word of a statement being "(" means it's a function. (Or a syntax error if the third word isn't ")" and the next word after that isn't "{".)

And _this_ means I can't do the flow_control() stuff until I've assembled a pipe segment terminated by ; or newline or such, at least for recognizing functions. Which makes splitting the pipeline segments a LOT more awkward, because I'm undoing work I already did instead of looking at words before they get grouped. (Possibly I need to move recognizing shell functions outside of flow_control() and have the caller do it?)

The existing flow_control() function appends to and pops the expect stack on its own (in the original "syntax checking" mode). The new mode 3 tells the caller when _they_ should pop the stack because it's found a block end. What I probably want is to go back to a single mode, have it report when it found a terminator via the return code...

Darn it, the SAD part is that you can have a command and a function declaration in the same segment: "blah(){echo hello; }" parses to _two_ segments... hang on, no it doesn't. Because I added the special case handling for ) meaning it's a terminator! So the segments are 1) "blah" "(", 2) ")" "{", 3) "echo" "hello" ";" (well, "echo" "hello" and then ";" instead of NULL at the argv[argc] position), and 4) "}". Even given the possibility for segment 2 to have a newline and split the ")" and "{" into its own segments, ")" is always at the start of a segment!

There's still sort of the 3 magic things on the same line for "function NAME (" with function being optional (parsing that had a goto at one point, now it has an if (thingy || (!strcmp(argv[i], "function") && i++)) which is less awkward than a goto but not by much...). Ok, the _rest_ of it should be doable, this particular knot I need to stare at more.

For one thing, if you type "name (" into bash and hit enter, it's a syntax error. It needs the other ) to be on the same line. And the parsing is losing that distinction. For bash, ) only starts a new line when it does NOT immediately follow a ( as the previous token. So that's wrong, although... does _accepting_ input that the other one rejects count as an error? (Valid input runs, error checking is less rigorous to the point where more things run? Hmmm... Ideally I'd like the behavior to be the same, the question is how much work I'm willing to put into it.)

Let's see, a function starts with either the word "function", or an otherwise unrecognized word followed by (. Then you must have (on the same line): word ( ). And then the next word must be {. So if I lift function recognition _out_ of flow_control(), it handles individual words, manages the expect stack itself, and returns "start of block", "gearshift", "end of block", "command", or "syntax error"...
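
A minimal sketch of lifting that check out into the caller, assuming it gets handed all the words of one line at once. The name function_header() is hypothetical, and it follows the rule exactly as stated here, so it insists on the parentheses even after the "function" keyword (bash itself also accepts "function NAME {" with no parentheses, which this ignores). Checking that the next word is "{" is left to the caller since that can be on a later line.

```c
#include <assert.h>
#include <string.h>

// Returns the index of the word after ")" if words[] starts a function
// declaration, 0 if it isn't one, or -1 on syntax error.
int function_header(char **words, int count)
{
  int i = 0;

  assert(count >= 0);

  // Optional leading "function" keyword.
  if (count && !strcmp(words[0], "function")) i++;

  // Without the keyword, only "(" as the second word marks a function.
  if (!i && (count < 2 || strcmp(words[1], "("))) return 0;

  // From here on we're committed: NAME ( ) must all be on this line.
  if (count < i+3 || strcmp(words[i+1], "(") || strcmp(words[i+2], ")"))
    return -1;

  return i+3;
}
```

So "walrus ( ) {" returns 3, "function walrus ( ) {" returns 4, "name (" alone on a line returns -1 to match the bash behavior described above, and "echo hello" returns 0.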

Anyway, dinnertime. Didn't get any new code written. All design rumination, today in a "musing out loud" fashion rather than my usual "long walks staring vaguely into the distance".

July 17, 2019

Today's toysh question is about changing lifetime rules for the expanded command arguments. (Translation: moving the expand_args() call, which mallocs one or more new args[] for each arg (wildcards like * can expand into a hundred arguments) into run_command() along with the corresponding responsibility to free it. Which also means I don't need storage space in sh_process to marshall the data into the other function, so it's a cleanup?)

Object lifetimes are always one of the big design thingies. Get the object lifetimes, data representation, and data flow right, and the rest of the code tends to be the easy part.

July 16, 2019

I really liked the Good Omens miniseries Neil Gaiman did at Terry Pratchett's dying request, and Neil tweeted a link to the Good Omens BBC radio play from a few years back, so I gave the first episode a listen and...

I want to pat them on the head for trying. Having seen this done EXACTLY RIGHT, listening to this audio play is just painful. So many things the TV series did so smoothly in passing I didn't even notice them, get set up here with pliers and a block and tackle. It's nice that Neil and Terry had a cameo in the first episode, but they served no plot purpose. And Aziraphale and Crowley... I'm sorry, having seen the TV versions, I just can't take the radio versions seriously. They're... not right.

July 15, 2019

I promised Panagiotis (the magazine guy I've been talking to) a column and a head shot today. I need to do that, but I'm hip deep in shell coding, and _that_ needs to be demonstrable by August 22 so I don't have to yet again say that toysh is unusable in my "Toybox vs Busybox" talk. That's the _main_ thing preventing it from being usable in systems on its own. (The other, at least for mkroot, is a lack of a good "route" command. But I can open that can of worms after getting toysh basically working.)

I'm still trying to fit HERE documents elegantly into the design. I want to parse HERE documents as I finish each arg: struct sh_arg *arg is a single argv[argc] ended by | && ; and so on. (Or newline.) I either need to add more members, brutally abuse the ones I have, or wrap it in an enclosing struct...

July 12, 2019

Got the continuation logic checked in, now working on flow control. (Walked to UT to bang on stuff at that little table. Yay exercise.)

One problem is that flow control skips commands (sort of the point), which means HERE documents are parsed out of order, so my "assemble a linked list of them" approach isn't good enough. (Or at least awkward when you've got loops inside loops.) I need to attach each HERE document to the corresponding struct sh_arg holding the argc/argv[] pair that made us read the extra lines.

July 11, 2019

The reason you can declare functions inside other functions is that declaring a function is basically an "export" into a different array, so:

$ abc() { echo hello; def() { echo ghi; }; def; }
$ def
bash: def: command not found
$ abc
$ def

And THAT's how they nest. Function declaration is basically a form of variable assignment, with similar lifetime rules. Yes, I added a test.

July 10, 2019

Ok, back to flow control. Or specifically, back to running commands and trying to get redirection to work. Today's revelatory test is:

$ if cat <<< one; then cat <<< two; fi <<< three

Which means you open the outermost redirections first, and the inner ones discard them. So I have to parse to the end of the flow control block and work back in, on a stack. And you can have "if if true; then echo; fi; true; then echo; fi" nesting arbitrarily deep, so you need a stack. (Or to be able to parse backwards from the end, but I already had to parse from the front to get the line continuations right.)

If I break flow control into some kind of tree structure, it's gotta be linked lists because if/elif/elif/elif/elif/fi can go on arbitrarily long. Hmmm, but the _only_ place that redirections (or pipes) apply to the entire grouping is the group/block terminations, because there's a statement after each of the other flow control words, and the redirect applies to that statement. Only a redirect after a group terminator applies to the whole group (because there's no statement allowed there).

So I don't need a structure, I need a way to find the end of this block. Which _is_ a minor variant of the same parse-forward logic, I just need a test to know when I'm done. Hmmm...

July 6, 2019

I've been arguing with ( ) in shells for a bit, and finally worked out what cul-de-sac I went down:

( ( ( echo ) echo ) )

does not become

( ( ( echo )
echo )

It becomes:

( ( ( echo
) echo

And in THAT framework, "( echo ) >> blah" is the same logic as "if true; then echo; fi >> blah". I.E. flow control terminators can have trailing redirects but not trailing arguments.

July 5, 2019

Once upon a time Intel's x86 processors were optimized for price/performance ratio. They took over the desktop, but were very energy inefficient so (at least up until around 2004) you couldn't use them without wall current and a giant heat sink with a fan on it. Arm chips were optimized for energy efficiency (power/performance), which is why phones are all arm. Over the years, x86 went from "the processor" to kind of a sideline overshadowed by raspberry pi.

Something similar is happening with battery technology: lithium is the x86 of battery tech, light and highly reactive but also scarce and hard to work with. Lithium batteries evolved in a specific niche, and people are applying lithium to other niches because it's there, not because it's a good fit.

Lithium is the lightest metal, and lithium batteries are optimized for power/weight ratio. Size and weight are important in phones and laptops and electric cars, but a building storing solar power from the rooftop overnight really doesn't care how much the battery weighs. Since about 1970 battery technology has been driven by portable electric devices, which funneled a bunch of R&D into lithium and gave it a big headstart. But just like x86, the technology that dominated the old niches is kind of a bad fit for the new ones.

Lithium is _so_ chemically reactive its pure form explodes on contact with water, which makes it tricky to work with. Lithium batteries break down (wear out) easily by reacting in ways they're not supposed to, and making them last a long time involves sticking them in a high-tech refrigerator that can eat 15% of the power they store. And even today, lithium battery designers have to be very careful to avoid fires and explosions.

While Lithium isn't as scarce as many elements, it's not exactly abundant either. The places it's easy to mine tend to have political and logistical problems, and those concentrated deposits are finite. And batteries are made from _groups_ of chemicals: you need a cathode, anode, membrane, and electrolyte. So far the battery chemistries that best work around Lithium's chemical issues involve cobalt, which has even worse political and logistical problems. In most places cobalt is so dilute it's only produced as a side effect of mining something else (copper in Congo, nickel in Australia). You get tons of the cheap metal, and a tiny fraction of cobalt as basically a side effect. (Similarly, while there's lithium in seawater, you have to process ten million liters of seawater, I.E. ten thousand metric tons, to get one liter of lithium. You'd only really do that as a side effect of water desalination efforts going after the water itself, and there are still cheaper ways to get water in most places.)

Meanwhile, if you don't care about weight, you can make batteries from other things, like sodium, zinc, and nickel. They haven't had a trillion dollars of R&D pumped into them over the past 50 years, but they're all _much_ easier to mine, and generally easier to work with. Sodium's half of table salt (seawater has 100,000 times as much sodium as lithium), and zinc is so cheap we switched to making pennies out of it when copper became too expensive. Annual production of Lithium is 35,000 tons (I.E. 0.035 million tons). Nickel is 2.25 million tons. Zinc is 11.9 million tons. Sodium's 225 million tons. That's about 65 times as much nickel, 340 times as much zinc, and 6500 times as much sodium produced each year as lithium.

The old nonrechargeable batteries from the 1980's ("D cells" in flashlights and such) were Zinc/Carbon chemistry. Basically made from pennies and charcoal. The Nickel-Iron battery chemistry is over 100 years old, and it NEVER WEARS OUT. Many of the nickel iron batteries manufactured 100 years ago by Thomas Edison's company are still in use (hence the nickname "Edison Battery", although as with everything with Edison's name on it the actual inventor was somebody else and he just commercialized it). The classic nickel/iron chemistry uses water as the electrolyte, and if you overcharge the battery it electrolyses the water to produce hydrogen and oxygen (which both consumes electricity and produces explosive byproducts, one of which also eats the ozone layer). So in practical terms you need to keep these batteries somewhere ventilated and top them off with water every few weeks. People mix and match chemistries all the time, here's a zinc anode and nickel cathode, and a colorado company called iron edison is making lithium/iron batteries.

And there's other chemistries like aluminum air using aluminum as the anode and oxygen as the cathode: the reason aluminum recycling pays isn't because aluminum ore is rare: Aluminum is the third most common element in earth's crust (47% oxygen, 28% silicon, 8% aluminum. It's light so when the planet was molten it floated to the surface.) But aluminum metal isn't found in nature, it oxidizes too easily. (The metal forms a thin layer of oxide that prevents the rest from oxidizing, but over time in nature it all turns into oxide.) Aluminum metal is basically stored electricity: it's created from ore by electrolysis, and oxidizing it gives off electricity. It's easy to get the electricity back out by dipping it in something that dissolves the oxide (like sodium hydroxide) so the air can get at it.

Aluminum's well-known for being light, and air batteries are great because they have twice the power/weight ratio of things that need to carry their cathode with them. This is also why burning gasoline has so much power for the weight, because half the reaction mass isn't carried with you but taken from the environment. What's not so easy is to create a _rechargeable_ aluminum air battery, because if you dissolve the oxide off and all the aluminum turns into oxide you've melted the aluminum into solution. You have to condense it back out when you recharge it, which is tricky. (This is also why lead/acid batteries try not to discharge too much: if you melt the lead into solution it condenses out in the wrong places when you try to get it back.) And if you just condense it out of solution onto one of the electrodes, you have a smooth surface with comparatively little surface area and thus low voltage. You generally want complex topology to increase surface area and thus rate of reaction, which you lose when your material dissolves and re-condenses. For things like self-driving fleet vehicles "slot in a new aluminum cartridge" isn't necessarily a big ask, but battery walls care about lifetime, generally as measured in charge cycles before capacity drops to 80%, and so far that's sidelined aluminum.

But they're working on it, and a thousand other possible battery chemistries. Heck, the guy from the "shipping container sized batteries" ted talk years ago is still at it, for some reason using molten calcium anodes and antimony cathodes now. (Antimony is about as common as thallium, not anybody else's first choice when it comes to "scaling up".)

July 4, 2019

Yup, the bad hard drive sector always happens when I "right click->inspect element->delete node" in chrome. At which point chrome crashes and I get the error in dmesg. The number of the sector in dmesg changes, but the sequence to trigger it doesn't.

The SCSI layer's remapping insanity continues to get weirder, but I think it's a single flaw that probably happened when my laptop got dropped? I should still get a new hard drive and keep it backed up, but it doesn't appear to be getting worse.

A guy named @tomjchicago on twitter is making a really good case that the Resident has frontotemporal dementia. Reagan had Alzheimer's in office (to the point Nancy took all his meetings his last year), now this. What the GOP really _likes_ turns out to be "senility".

July 3, 2019

So, continuing with shell sequencing:

$ if cat << EOF ; then
> one two three
> echo hello
> fi
one two three

Which confirms I need to peel off and satisfy HERE documents _before_ parsing other flow control. First the line gets broken down into ; & && | || statements, then peel off redirects (but don't interpret the non-here ones yet), then flow control. Except yesterday's test says the << EOF still has to BE there when flow control is parsed in order to start statements. (In case somebody has an executable named "else".) So they must be parsed but not removed, and a later pass performs the actual I/O redirection.

Another parsing problem with the code I have: things like "else" are only contextually special, so if you try to run a command "elif blah" outside any other flow control, my code so far will happily try. I might need to add gratuitous error checking just because people expect it.

So, the sequencing should be to check for flow control, but peel off redirects right after that, and act on the HERE document's request for more lines before the flow control's request for more lines.

Luckily flow control works on statement granularity (an argv[] ending with ; and friends), and so do redirects, and I've GOT a good data structure representing those and have already parsed the tokens into that. So even if it modifies... except the redirect parsing shouldn't remove them from the argv[], it's the expansion stage that should skip over those. Hmmm...

My "sh.tests" file is only 240 lines long. It should probably be longer.

Hmmm, { } queues up statements to execute later, because | applies to the lot of them. This is basically what functions are doing as well, and ( ) is similar but runs it in a subshell. And ) ends a line even when not the first statement, ala "(echo)|cat". Whereas { echo;}|cat requires the space and the semicolon.

Sigh. I made a design decision during token parsing about how input lines are represented that's really inconvenient for HERE documents. When it needs more data to finish a line, it returns the pointer to the character to resume at, and the line reader chops that bit off and uses it as the start of the next line. But what I want for HERE documents is new lines as they are, and I don't currently have a way to represent that. So now I need to go reread the token parser and see what it actually needs and if _it_ can glue the lines together, or if the line handling function should do it, or...

Ok, the problem is I haven't got a place to store "fragment of previous line". I'm not sticking much in TT.globals during parsing because that gets brittle fast: with "source filename" and such I may need to call it from multiple places, semi-recursively, and "echo else hi >; if true; then .; fi" says the else is an error, meaning sourcing more files is NOT transparent to flow control. So yes, this code needs to be reentrant.

The argument that _is_ passed in to parse_line() is a doubly linked list of argc/argv[] pairs I'm pretty sure I could abuse (off the top of my head, append a pointer to the continuation point as the last argv[] and make argc negative to indicate continuation in progress, although really an extra argument isn't too bad and I should just do that). BUT the next question is object lifetime rules. I'm strndup()ing each string into the argv so the argument can go away as soon as the function returns, and the current return value is saying what _portion_ of the line can go away. In theory I need to retain these lines for input history (cursor up/down to see previous lines, which I haven't implemented yet), but that's only for interactive mode. Still, if the caller saves the entire _set_ of lines until parsing's done and passes it in as another doubly linked list (easy way to make a stack, additions are already in order and top is just head->prev)... I still need the "where I left off" argument because it's position _within_ the line.

Other issue: sh -c arguments can have a HERE document in them! In which case they stop at embedded newlines: bash -c "$(echo -e "cat << EOF\nhello\nEOF")". So whatever I do already has to cope with that, meaning the definition of "lines" is _not_ what getline() is returning. The tokenizer currently treats \n as just whitespace (except for \ at end of line, which cannot end token parsing).

So: parse_line() returns fragment of current line left to parse, caller currently glues the next line to it and passes it back, which loses data HERE documents need. What does a HERE document need? There isn't an unfinished part of THIS line when we're doing HERE documents because we don't get to that point until we're done with lines, that's only tokenizing. Flow control (and redirects) deal with parsed pipeline info.

What I need is some place to store the HERE document status so all these functions can act on it.
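
A minimal sketch of what such a place might look like, assuming one pending HERE document at a time and lines arriving without their trailing newline. The struct here_doc and here_feed() names are hypothetical. The point it illustrates is that while a delimiter is pending, each line gets swallowed verbatim (leading whitespace and all) before the tokenizer or flow control ever see it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical HERE document state carried between input lines.
struct here_doc {
  const char *eof;  // delimiter we're waiting for, NULL if none pending
  char *body;       // lines collected so far, verbatim
  size_t len;       // bytes used in body, not counting the terminator
};

// Offer one input line. Returns 1 if the HERE document consumed it (so
// the tokenizer must not see it), 0 if the line is ordinary input.
int here_feed(struct here_doc *hd, const char *line)
{
  size_t n;

  if (!hd->eof) return 0;

  // The delimiter line ends the document and is not part of the body.
  if (!strcmp(line, hd->eof)) {
    hd->eof = 0;

    return 1;
  }

  // Append the line plus a newline, keeping body null terminated.
  n = strlen(line);
  hd->body = realloc(hd->body, hd->len+n+2);
  assert(hd->body);
  memcpy(hd->body+hd->len, line, n);
  hd->len += n;
  hd->body[hd->len++] = '\n';
  hd->body[hd->len] = 0;

  return 1;
}
```

With eof set to "EOF", feeding "echo hello" returns 1 and stores the text rather than running it, and a "fi" arriving after the "EOF" line returns 0 and falls through to normal parsing. Hanging one of these off each pipeline segment instead of using a global would keep the parser reentrant.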

July 2, 2019

Learning is contextual for me, I have to attach new information to what I know or it doesn't make sense, and I'm terrible at retaining things that don't make sense. That's why instead of reading the posix spec over and over, I'm implementing stuff and running tests to see how bash handles varying corner cases. Then when I'm done, I read the spec to see what I got wrong.

At the moment, I'm wrestling with HERE documents. The following is not exactly legal:

$ << EOF if true; then cat; fi
> hello
bash: syntax error near unexpected token `then'

But it's revealing because the here document DOES get parsed. In fact:

$ << EOF
> boing

A here document is parsed! And then the data discarded. See also:

$ <<EOF if true
bash: if: command not found

Flow control can't go after redirects. Which means the flow control shouldn't parse words after redirects, they fall through as unknown symbols and start a command. (Except, for some reason, "(", which is "unexpected token", but not reported until after the HERE document concludes. *jazzhands*)

July 1, 2019

New month, I should say hi to patreon.

Interesting. Bash and dash do the same here:

$ cat << one << two
> echo hello
> two
> one
> and
> two

And mksh on my phone says "/system/bin/sh: can't create temporary file /data/local/shxijx9k.tmp: Permission denied", which is too broken to contribute usefully to this analysis.

Android's traded away a lot of usability for "security" in scare quotes. My new phone gives me a full-screen exception with java stack dump popup (which goes away after 3 seconds so it's hard to write down, RealInterceptorChain.proceed() calling StreamAllocation.findHealthyConnection() and so on) when the podcast app that worked fine under M tries to check half the feeds. After a week or so I figured out any rss feed that's still http instead of https (about half of them) is an "unknown service", which is silly if it's an rss feed of mp3 files. I expect they want me to bug the RSS feed providers, but instead I gave up on being able to listen to those feeds from my phone. I still have the old phone with a USB battery duct taped to the back of it which I can load podcasts on via wifi. Or plug my headphones into my laptop.

Yesterday the self serve soda machine at Wendy's was out of service all day "downloading updates" (the little clock said it had been doing so for 12 hours). Why? What's the point? Why give an avenue to hack soda machines (giving sugared drinks to diabetics and aspartame to phenylketonurics)?

I'm not really a fan of "the internet of things"...

June 30, 2019

Darn it. Because of HERE documents, we have to retain flow control state while parsing. I was flushing the state stack and re-parsing it when we'd added another line (to minimize the state passed between functions), but I can't because when you're in a HERE document you grab the next line verbatim without tokenizing (whitespace is significant), and flow control happens after tokenizing.

Gotta frog a bunch of functions and redo them. Again.

(No, I haven't bought a new hard drive yet. I _think_ the failure mode here is "process crashes", not "garbled data winds up pushed to github". That's why it _has_ the checksums. But I shouldn't leave it too long. And I'm to the point where "rsync to usb disk" isn't necessarily an ideal backup option? If it goes "I can't read this file", does it leave the backup copy alone? (I _think_ so, but should confirm?))

June 29, 2019

Biked downtown to Cuvee (a coffee shop on 6th) to meet Grant, the guy behind the upduino. I picked up two of the 2.0 boards from him, and we spoke for over an hour. Cool guy. He says he's got about 500 more of the 2.0 (and 300 of the 1.0) in stock, so we can order as many as we like.

This is the ice40 board that Jeff did the j2 bringup on a few months back, Jon-Tobias in Germany is porting the TRON os to j-core as a project to present at a conference in January, and I got 2 so I can send him one. (Alas, I need to install a horrible lattice binary-only toolchain with license key nonsense to get it to work...)

Grant _also_ says he's got a bitstream that emulates an FPGA (which is kinda meta), and can prove it only uses technologies whose patents have expired, and he'd like to make a chip from it. I poked Jeff and he said designing an FPGA is reasonably easy and well-understood: each cell is a lookup table with a small RAM using the input wires as bits to look up what the output should be, which lets each one work as any combination of and/or/nand/not/etc gates. The xilinx ones use 6 input wires (1<<6 is 64 bits of ram to find the output bit for every possible on/off combination of the inputs), the lattice ones use 4 bit inputs (so 16 possible input values, meaning 16 bits of ram to look up whether the output is on or off for each input).
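
A minimal software model of one 4-input lookup table, just to make the "inputs are an index into a tiny RAM" idea concrete. The type and function names (lut4, lut4_eval, lut4_program) are made up for this sketch and don't correspond to any vendor's tools.

```c
#include <assert.h>
#include <stdint.h>

// A 4-input LUT: 16 bits of configuration RAM, one output bit for each
// possible on/off combination of the input wires.
typedef struct { uint16_t ram; } lut4;

// Look up the output for the given inputs (bit 0 is wire 0, and so on).
int lut4_eval(lut4 lut, unsigned inputs)
{
  assert(inputs < 16);

  return (lut.ram >> inputs) & 1;
}

// Fill in the RAM by asking a C function what the output should be for
// each of the 16 input combinations. This is the "bitstream" for one cell.
lut4 lut4_program(int (*gate)(unsigned inputs))
{
  lut4 lut = {0};
  unsigned i;

  for (i = 0; i < 16; i++) if (gate(i)) lut.ram |= 1u << i;

  return lut;
}

// Two example gates: AND of all four wires, and XOR of wires 0 and 1
// (ignoring wires 2 and 3).
int and4(unsigned in) { return in == 15; }
int xor2(unsigned in) { return (in ^ (in >> 1)) & 1; }
```

Programming the cell with and4 sets only the top bit of the RAM (0x8000), so the output is on only when all four inputs are. The same hardware programmed with xor2 behaves as a completely different gate, which is the whole trick.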

The rest of the FPGA is just busses connecting the LUTs together, and then software doing the placement and routing is where all the actual brains are. But the busses are where the hard part is: the FPGA fabric is a lot of loops, and it's up to the programming in the LUTs (I.E. the software making the bitstream) to prevent these from acting like short circuits. All the ASIC mask generation tools flag these loops and go "possible short circuit! Error!" So you have to disable the ASIC tools' sanity checks to get it to output an FPGA mask, at which point the FAB refuses to guarantee the result works and wants a large insurance policy before they proceed.

Turning verilog or vhdl into actual circuitry means running it through a compiler, and each ASIC fab has its own compiler backend they wrote themselves by hand. Remember the days where every different hardware platform had its own proprietary compiler/linker toolchain with its own bugs and you had to test each port and work around everything? Fabbing is like that, only moreso. So you make a GIANT test suite to check every weird little bit of your circuit will work in simulation after the fab's tools are done with it (they spit out a "verilog netlist" which is kind of like decompiling assembly back to C; it's horribly unreadable but the important thing is the circuit simulation tools can consume it).

So designing an FPGA circuit is easy. Getting a modern FAB to manufacture it is a political nightmare, because their toolchain is guaranteed to barf on the routing fabric and they don't wanna get sued charging you tens of thousands if not millions of dollars for masks that they can't prove will actually work.

Fun stuff. Still don't understand how the LUTs connect together (how the whole routing mesh part is controlled), but that seems to be the fiddly bit.

Biked from the coffee shop to the Other Other Bank (not my credit union, not the bank our checking account's at, but the one our mortgage was sold to a few years ago) to take a couple months expenses out of the home equity loan. Fade's home for the summer and all the things recruiters have waved at me involve packing up and going to another city for months. And I'm making (slow and frustrating but measurable) progress on toysh. Don't wanna stop now...

June 28, 2019

Well that's not good.

My laptop fell off the arm of the couch onto a hard floor a week or so back, and landed on its side, and it was frozen with completely garbled graphics afterwards. But a power cycle _seemed_ to fix it.

A couple days ago, I noticed a /dev/sda disk sector error in dmesg. (And immediately re-backed-up the disk to an external drive.) But it was read only, and always the same number (which kept coming up because Linux kept trying to read it over and over for some reason), so I was thinking maybe the write glitched when the disk fell over?

It's reporting it again today... with a different sector number. Still the same one over and over, but that's "time to buy a new hard drive", which means reinstalling the OS. Great, that's gonna cost more than a day, I expect. (Sigh, should I run memtest86 overnight too? Did something get unseated? Devuan's crotchety and evil enough it's not always easy to figure out what's a hardware bug and what's a repeatable software pattern like "I requeried for available wifi too soon after enabling my phone's hotspot, now devuan will _never_ see it until I disable and re-enable wireless"...)

I mean, chrome crashes now semi-regularly. But every crash has been when I right click->inspect element->delete node. Maybe websites have found a bug to exploit to discourage that? (Not gonna stop, for the same reason I still block every single promoted tweet in my twitter feed, even though twitter now seems to have an endless stream of them. Don't care: still blocking every single one.)

June 26, 2019

Still writing shell stuff, and needing to support:

$ if blah () { echo bang; true; }; blah; then echo hello; fi; blah

Which is just cheating.

June 25, 2019

Did my first checkin of the redone toysh infrastructure, which is parsing input to request more data, and then executing it without any flow control or variable expansion or redirection or... Working on it.

Meanwhile, Elliott is going over the toybox test suite and getting stuff to pass on Android. Which is quite nice. He's hitting todo items I've been meaning to do forever...

June 21, 2019

Why did I say 10kw of solar panels yesterday? Because ~8 hours of sunlight at 10kw is 80kwh, more than even the larger 75kwh battery could hold, which is _plenty_ for an average household. Remember 39kwh was the highest state average, and then we rounded up the battery size another 20%, and this doesn't even count electricity used during the day coming straight from the solar panels without even needing the battery; your battery should hardly ever be anywhere near fully drained, even at 6am. Plus you get more than 8 hours of sunlight anywhere south of canada. You might be fine with a 6kw kit, and even a 4.5kw kit (currently $5k at Home Depot) produces 36 kilowatt hours on a good day, vs the american average of 28 kilowatt hours. (7 extra kwh/day would take a week to fill a 50kwh battery, but as long as you're net positive you'd get there.)
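
The arithmetic there is easy to rerun with different kit sizes. A rough sketch, assuming a flat 8 hours of full output per day and the 10.4 megawatt hours/year national average from the June 20 entry (the function names are made up for illustration):

```python
HOURS_OF_SUN = 8                        # assumed hours of full output per day
AVERAGE_DAILY_USE_KWH = 10400 / 365     # 10.4 MWh/year, about 28.5 kWh/day


def daily_output_kwh(kit_kw, hours=HOURS_OF_SUN):
    """Energy a kit of the given size produces on a good day."""
    return kit_kw * hours


def days_to_fill(battery_kwh, kit_kw, daily_use_kwh=AVERAGE_DAILY_USE_KWH):
    """Days of surplus needed to fill an empty battery, or None if never."""
    surplus = daily_output_kwh(kit_kw) - daily_use_kwh
    if surplus <= 0:
        return None
    return battery_kwh / surplus


for kit_kw in (4.5, 6, 10):
    print(kit_kw, "kw kit:", daily_output_kwh(kit_kw), "kwh/day,",
          days_to_fill(50, kit_kw), "days to fill 50kwh")
```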

Sites that say 6kw is enough for half of american homes are talking about selling enough electricity back to the grid to wind up with a net $0 bill. (In which case how does the power company stay in business when half their cost is maintaining the transmission lines?) Utilities generally pay less for electricity than they charge for it, so you need to sell _more_ back to them to break even. (And _everybody_ can't do this if they still need cash from somewhere, and lots of it.) With batteries, "enough" is much lower, and 6kw*8 hours is still 48kwh: one day away from home could almost completely fill the battery from fully drained.

I'm being conservative here because a bad enough thunderstorm can cut solar output in half while it lasts, and sometimes those go on for days, so extra's nice. Clouds and rain won't entirely stop solar production (there's still light and solar panels aren't just picking up visible frequencies), but a thick enough layer of snow can, and up north they run the panels in reverse for a bit to heat them up and melt the snow. If you don't let it accumulate, it's not hard to deal with. (When it's too cold for that to melt it, it's also too cold to snow.) Still, this is why I'm looking at an extra-large solar array. (People who assume they'll be charging and driving an electric car 100km every day might even want a 15kw system, but transportation as a service makes that highly unlikely to happen outside of rural areas. Then again, sufficiently rural people still use dial-up internet because the future has passed them by. Short of another New Deal style Tennessee Valley Authority program to dig them sanitary outhouses, this will too.)

June 20, 2019

I think electric utility companies might go the way of milkmen and paperboys. The numbers don't look good.

These days batteries are the limiting factor on solar power, and have been since solar panels became the cheapest way to generate electricity in 2016. It's cheaper to build new solar plants than keep running existing fossil fuel plants, but solar only produces power half the time (leading to the "duck curve"). Utilities have been installing gas turbines (which can go from "off" to "full power" in under a minute) to handle the rapid generation ramp-up as the sun goes down just as everybody gets home from work, but utilities are increasingly reluctant to buy more gas turbines because they're already "stranded assets" that won't be in service long enough to pay off their construction costs.

Utilities know that battery prices are declining rapidly, going down 50% over a recent 3 year period. Utilities want to install batteries to store extra power generated during the day, which they've got buckets of. California stopped installing new solar for a bit because at peak output, what they've got already supplies more power than they can use, leading to "curtailment" where they unplug the panels and waste the electricity that has nowhere to go. (For a while california law _required_ it to be used and they wound up paying other states to take it, but they got that fixed and now they just waste it.) They can install plenty more solar panels, it's cheap and easy, but they need batteries to make it work.

So how much battery power _does_ a household need? About as much as a low-end electric car has. The average american household uses 10.4 megawatt hours of electricity each year, but the highest average household electricity consumption is in louisiana at 14.2 megawatt hours/year, so let's use that. This means each louisiana household is using about 39 kilowatt hours per day, on average.

The Tesla Model 3 battery pack comes in 50 kilowatt hour and 75 kilowatt hour versions. This means even the small Model 3 battery pack can store more electricity than a household uses in 24 hours.
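
As a sanity check on those per-day numbers, here's the division spelled out (a quick sketch; the only inputs are the annual figures and pack sizes quoted above):

```python
DAYS_PER_YEAR = 365


def daily_kwh(annual_mwh):
    """Convert annual household use in megawatt hours to kilowatt hours per day."""
    return annual_mwh * 1000 / DAYS_PER_YEAR


us_average = daily_kwh(10.4)
louisiana = daily_kwh(14.2)
print("average:", round(us_average, 1), "louisiana:", round(louisiana, 1))

# How many days of louisiana-level use each Model 3 pack size would cover.
for pack_kwh in (50, 75):
    print(pack_kwh, "kwh pack:", round(pack_kwh / louisiana, 2), "days")
```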

That article estimates the 50kwh battery pack should cost around $7500 to make, but that tesla charged $9000 to upgrade from a 50 kwh battery pack to the 75kwh battery (battery production is still their limiting factor making vehicles so they're trying to discourage the big battery, it means they sell more cars). At $9000 per 25kwh, they're charging $18000 for 50kwh of batteries, which means even at that inflated retail price batteries are less than half the cost of a Tesla model 3.

Here's a site selling replacement model 3 battery modules, which each cost $1350, hold 5.2kwh (10 to assemble the "small" 50kwh battery costing about $14000 before volume discounts), and have a max discharge rate of 30kw (or 5kw "continuous"), which means _each_ of them can run an inefficient (old) electric stove, and the full 50kwh battery could handle ten of them. (For reference the new tesla superchargers work at 250kw, so if anything these numbers are conservative.) 6000 watts from a 240 volt dryer outlet requires a 25 amp circuit breaker. Most homes have a total electrical service of 100 to 200 amps and 250 amps would be 60kw... once again the "small" model 3 battery neatly fits a house.
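
Spelling out the module math (a sketch that assumes ten modules simply add up in capacity and continuous output, and that household circuits are 240 volts; the helper names are made up):

```python
MODULE_PRICE_USD = 1350
MODULE_KWH = 5.2
MODULE_CONTINUOUS_KW = 5
MODULES_PER_PACK = 10
MAINS_VOLTS = 240

pack_kwh = MODULES_PER_PACK * MODULE_KWH
pack_price_usd = MODULES_PER_PACK * MODULE_PRICE_USD
pack_continuous_kw = MODULES_PER_PACK * MODULE_CONTINUOUS_KW


def amps_for(watts, volts=MAINS_VOLTS):
    """Current drawn by a load of the given wattage."""
    return watts / volts


def kw_for(amps, volts=MAINS_VOLTS):
    """Power available from a service of the given amperage."""
    return amps * volts / 1000


print(pack_kwh, "kwh for $", pack_price_usd, "at", pack_continuous_kw, "kw continuous")
print("dryer outlet:", amps_for(6000), "amps")
print("200 amp service:", kw_for(200), "kw; 250 amps:", kw_for(250), "kw")
```

So a 200 amp service tops out a bit under the pack's assumed continuous rating, which is the "neatly fits a house" part.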

So a modern electric car battery is sized to power a house... if you wanted to unplug that house from the grid _entirely_ and generate your own power. Why would anyone want to do that?

For one thing, if you sell power back to the grid, when the grid power goes out _your_ power is legally required to go out. (So you don't electrocute the workers trying to fix the downed lines.) If you unplug from grid power, your power doesn't have to go out when the lines go down.

For another, the utility companies need money, and lots of it. If everybody had a net $0 bill, they'd go bankrupt. The cost of maintaining the electricity distribution network is about $750/year per customer, or $62.50/month, which is more than half the average monthly electricity bill of $112. So even if electricity generation became _free_ for the utilities, the bill can't really go down that much. When _your_ bill goes down, they consider you a free rider, and start to scheme ways to get money out of you to pay for the distribution infrastructure you're only using as an insurance policy.

Today a 10kw solar kit (with not just solar panels but the inverter and such to use them) costs $11k, which can fill our 50kwh battery (costing ~$14k) in 5 hours of maximum production. This means a solar system that can entirely replace the grid currently costs less than a second car, and the prices are going down fast.

If battery prices drop another 50% over the next 3 years, and 50% more 3 years after that, that $14000 would become $7000 and then $3500 by around 2025. Meanwhile solar panel prices are getting 50% cheaper every 5 years (Swanson's Law). So in 2025, you'll probably pay less than $4k for a day of batteries and as much again for solar panels, or about $8k for the entire system. And these are _retail_ prices, not wholesale. You can probably get a solar system powerful enough not to _need_ a grid tie installed for under $10k. And if you're worried about running out of power, you can spend $4k on an extra day of batteries. (72 hours of batteries is enough to get your roof replaced without losing power.) And it only gets cheaper from there, just like computers did for 50 years.
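
The halving arithmetic, as a sketch you can poke at. It assumes prices fall smoothly with a fixed halving period (3 years for batteries, 5 for solar, per the paragraph above), which is an extrapolation rather than a forecast; `projected_price` is a made-up helper name.

```python
def projected_price(price_now, years_ahead, halving_period_years):
    """Price after years_ahead, if it halves every halving_period_years."""
    return price_now * 0.5 ** (years_ahead / halving_period_years)


# $14000 of batteries today, halving every 3 years.
for years in (0, 3, 6):
    print(2019 + years, "batteries:", projected_price(14000, years, 3))

# $11000 solar kit today, halving every 5 years.
print(2025, "solar kit:", round(projected_price(11000, 6, 5)))
```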

At which point... what do the utilities do? Not everybody switches, but enough people to seriously screw up their business model, which only makes switching more attractive to the rest...

This is what Tony Seba meant when (starting back in 2014) he talked about "God Parity" coming after "Grid Parity". That $62.50/month infrastructure cost for a grid tie has to come _way_ down for a grid tie to still be a thing in 10 years, and as with gasoline cars going electric you only need about a quarter of the existing customers to defect before the economics of the old way go into a death spiral.

June 19, 2019

I reached the point where I have a toysh parsing pass that compiles! And immediately segfaults. And still has large unfinished chunks in the parsing. But hey, _progress_.

There's a lot of sequencing issues. In theory << EOF is the same sort of command as < FILE, but it switches parsing modes so you request additional lines, and those lines are stored verbatim (not parsed) so the spaces within are retained. ("cat << EOF" demonstrates that.) I worried about that for a bit, but the test:

$ cat << EOF ; echo hello; EOF

Shows that the plumbing I've been writing already has this sequence right. :) (Also, the EOF it checks for is a full exact line, leading or trailing space will break it, as will \ or quotes.)
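
The "store lines verbatim until an exact match" rule is simple enough to sketch in a few lines of Python. This is an illustration of the behavior described above, not the toysh code (which is C); `read_heredoc` is a made-up name, and it ignores quoted delimiters and the <<- variant entirely.

```python
def read_heredoc(lines, delimiter):
    """Consume lines until one is exactly the delimiter.

    Returns (body_lines, found). Body lines are kept verbatim, so leading
    and trailing whitespace survives, and " EOF" does not end the document.
    """
    body = []
    for line in lines:
        if line.rstrip("\n") == delimiter:
            return body, True
        body.append(line)
    return body, False      # ran out of input: caller needs a continuation


source = iter(["  indented text\n", " EOF\n", "EOF\n", "echo after\n"])
body, found = read_heredoc(source, "EOF")
print(body, found)
print("next command:", next(source))
```

Because it pulls from an iterator, whatever follows the terminator is still there for the normal command parser to pick up afterwards.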

Yeah, I'm sure a lot of this is covered in the posix doc, but I read the thing, implement, and then read the thing again. And while I'm implementing and hit a question, testing against bash is better than looking it up in posix not just because it's faster, but because if the two diverge I'm going with bash.

June 18, 2019

I had tickets for Weird Al's "with strings attached" concert at Bass Concert Hall (just north of the giant football stadium) and headed to UT a few hours early with my laptop to get some programming in near the venue. At concert time I went over, stood in line, and was told that my backpack was against their TSA-style "clear bags only" policy (which I'd never heard of until I showed up), and that even though the doors we were lining up in front of said "bag check line" in large black letters, no they didn't check bags. If I'd brought cash I could pay to leave my laptop out in a locker in the texas sun in the parking lot, but I hadn't, and no I couldn't see the concert I'd paid for.

So I walked home (took most of an hour, round trip to go back afterwards would have been over an hour into the concert). On the walk home I called Bass Concert Hall's ticket line to see if I could get my money back (concert hadn't started yet), and could not get a human. Their phone tree didn't recognize my phone number, and wanted an "order number" that did not appear to be any of the numbers printed on my ticket.

I was thinking of disputing the charge through my credit card, but I don't mind Weird Al getting the money. Just Bass Concert Hall. And I don't blame Weird Al for this: last two times I saw him was at an open air venue (which I was wearing the t-shirt from), and the Drafthouse. Both were great. It's Bass Concert Hall that sucks, never go there for anything.

Stopped at HEB on the way home and they'd clearanced a bunch of lamb chops. Fuzzy made those for dinner, and Fade had one when her flight finally got in (late, it was delayed an hour or so by airline trouble du jour). The dog was ecstatic to see her, of course.

June 17, 2019

I've rebuilt the sh parsing stuff 3 or 4 times as I hit corner cases the old way doesn't handle right. It's mostly "ah, the data structures I'm parsing into have to look like _this_", and now I've hit another one because the parentheses stuff is funky.

This is the first pass parsing that tries to figure out when we've got a complete thought and don't need to read another line of input ($PS1 vs $PS2 prompts). I know bash does this before executing anything because I keep getting continuation prompts for:

$ echo hello; if
> true
> then
> echo boing
> fi

Parsing a line into words, I need a continuation if it ends with unbalanced quotes, including infinitely nested $( and ${ and yes "$(" means you need ")" in sequence to end it. I've already got a function for that which seems right-ish, returning a pointer to the next unconsumed character of the line. It returns 0 if it needs a continuation, and returns the pointer that was passed into it when it's done and there's nothing more to do. (This is presumably a pointer to the unquoted, unescaped NUL terminator at the end of a line, but that's not what I'm testing.) I was initially calling that to get an argc/argv[] list of unexpanded words (via xstrndup(start, end-start)) with all the quotes and environment variables and such still in them, and then looping through that list to handle flow control statements, which can _also_ request continuations. (See blockquote above.) However, flow control statements are only valid at the _start_ of a pipeline... wait, no they're not.

echo hello | if true; then read i; echo i=$i; fi

Ok, flow control statements have 2 parts, a test and a body, and the test... can also contain flow control statements.

$ if echo hello | if true; then read i; echo i=$i; fi; then echo hello; fi

Which obviously I knew because "if true && false; then echo hi; fi" has a pipeline in true && false. But "if true && then echo hi; fi" is a syntax error, same if you replace the && with |, which is why I was thinking it had to be after ; or newline. And that's still sort of right, it's just in between it can nest arbitrarily deep.

I keep sitting down to explain how my current understanding works and finding the loose thread where "here's the test that shows why that's wrong". Eh, keep this writeup, it shows what I was thinking. (Yes, I read the posix spec and the man page, and retained none of it because I mostly learn by doing.)

Ok, the POINT is, I made a tree structure which is a linked list of pipelines containing a linked list of argc+argv[] structures, thinking I could iterate through the list of pipelines to check for flow control statements at the start.

And where I got derailed is things like "( echo ) | cat" where the echo isn't part of this pipeline, it pops out into its own pipeline that slots back into this one as a single argv (which you can redirect output of or background as a group). I'm not trying to RUN it yet, I'm just trying to PARSE it into a discrete series of argv[] which I could exec() as their own process.

And it is "start of thought", but they nest. End of thought, then new flow control statement. Flow control statements similarly pop out, ala "while true ; do echo hello; done | tee blah" and to handle this I need a data structure that can arbitrarily nest "if (true); then echo hello; fi".

You know, except for the part about ) ending a pipeline, ( is basically a flow control statement. As far as I can tell the only reason ( is funkier than if or while is so it can match ) without ) having to be at the start of a command. Alright, what's the data structure for this. It's a tree with branch nodes and argument leaves. There's a "how did this command end" annotation recording | vs &&, but ; and implicit ; via newline are special, they're flow control _within_ the tree. Hmmm... Ok, | && || glue statements together, so do being in ( ) or { } or any flow control bracketing (if/else/fi, while/do/done, etc). Great.
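
Here's one way a "branch nodes and argument leaves" tree could look, sketched in Python rather than C. All the names (`Command`, `Block`, `ended_with`, `leaves`) are hypothetical, and splitting the loop into a test plus a nested "do" block is just one guess at the layout, not what toysh ended up with.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Command:
    """Leaf: one argv[] that could be exec()ed as its own process."""
    argv: List[str]
    ended_with: str = ";"       # how this command ended: ";", "|", "&&", "||", "&"


@dataclass
class Block:
    """Branch: ( ), { }, if, while... anything that groups statements."""
    kind: str
    children: List[Union["Block", "Command"]] = field(default_factory=list)
    ended_with: str = ";"       # a whole block can feed a pipe or be backgrounded


def leaves(node) -> Iterator[List[str]]:
    """Walk the tree depth first, yielding each argv in source order."""
    if isinstance(node, Command):
        yield node.argv
    else:
        for child in node.children:
            yield from leaves(child)


# while true ; do echo hello; done | tee blah
script = Block("script", [
    Block("while", [
        Command(["true"]),
        Block("do", [Command(["echo", "hello"])]),
    ], ended_with="|"),
    Command(["tee", "blah"]),
])
print(list(leaves(script)))
```

The point of putting `ended_with` on the branch as well as the leaf is that the whole while loop is what gets piped into tee, not its last command.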

So getting back to parsing: ( and ) are their own words wherever they occur, and ) starts a new line but ( does not, although it increments a counter so we do continuation until we get the next ), but so does { and that's not special to the parser it's just a flow control statement.

So I think ) always ends a line...

$ boom(){ echo hello; }; boom

Yup. But the ) is not on a NEW line, because:

$ boom (
bash: syntax error near unexpected token `newline`

So it doesn't work like "}", it's special and magic. Great. Sigh, it's a little bit like | that stacks. It's not a pipeline transition character because each statement has _one_ and you can't have two of those in a row. But it's not a flow control statement either because it's not logically at the start of a line, it's sort of an attribute of the previous line? Where does this slot into parsing? I can detect it, what do I do to RECORD it? It's really the function definition case that's nasty... Hmmm.

June 16, 2019

I've been having long email conversations with a magazine publisher who wants me to start podcasting for his network. I'm interested, but don't have the time.

On the other hand, I seldom get much done without externally imposed deadlines. I wrote a regular column for 3 years, I organized 2 conventions, I should _make_ time... which is why I'm talking to him I guess? But there's a certain amount of "launching a new thing" which means it isn't real yet, and toysh -> ELC talk and then I need to find a new $DAYJOB...

June 14, 2019

Elliott hit a weird bug in find, which is frustrating because I can't reproduce his test case. I don't see _how_ the change I made could have user-visible effects, with one exception that _shouldn't_...

June 12, 2019

I was scheduled to meet with Amazon today for an all-day interview, and when I sat down to sign the NDA at 8am before heading out, I got a simultaneous headache and stomach ache, and decided to listen to myself. Contacted them to call it off. (There were like 8 other applicants at the bar on monday, they'll be fine.)

They wouldn't tell me what they're building. There were hints dropped at the monday meet-n-greet (consumer product, in people's homes, runs apps, has an app controlling it, video is involved somehow). But they explicitly said I wouldn't learn what I was actually working on until I showed up for my first day of work, and YET they wanted an NDA for the interview.

Amazon treats its warehouse workers terribly. The idea of working for a _different_ part of a company that places no obvious value on human life doesn't exactly fill me with warm fuzzies. They see people as "customers", and the only thing that excited the amazon guys wasn't technology, it was the possibility of profit. This could sell well and make money.

It was probably this tweet that set off my subconscious. Making the richest man in the world richer isn't what I want to do with my life. His wife got $36 billion in a divorce (what, 1/10 of his fortune?) and she's giving it away. That's what normal people do. Hoarding more cash than you can physically spend (by carrying suitcases full of $100 bills from the bank to a purchase place) while people performatively beg for their life on gofundme is sick.

June 11, 2019

Fade flew off to Minneapolis for a week, attending 4th street. I'm grinding away at shell script parsing.

Some things bash does I don't currently feel obligated to reproduce:

$ time echo hello | sleep 3
real	0m3.003s
$ echo hello | time sleep 3
bash: time: command not found

And then there's:

$ echo abc)dev
bash: syntax error near unexpected token `)'

Sigh. It's because switch/case, I know. Still kinda silly.

June 10, 2019

Off to meet with amazon at a happy hour. (They're sort of offering me a job-ish. Well, an interview. I'm torn about whether I _want_ to work for them. I'm willing to work with Google and my household is paying the amazon prime danegeld and buying stuff from them. Is taking money from them a higher bar than giving money to them? Hmmm...)

Back from the happy hour. Talked 3 waves with one of the amazon guys, he explained about the company having multiple "pillars". Google had multiple pillars before it did Alphabet to turn them into properly separated business units. Sounds like it's going to get ugly.

Headed home at 9 to spend the evening with Fade, she flies out in the morning for 4th street (annual writing conference) back in Minneapolis.

June 9, 2019

Ok, if you go:

{ ls & sleep 5; echo hello; } | tee blah &

Then you background two jobs, but everything in the curly brackets gets redirected through tee. (The ; at the end of hello is so } isn't an argument to echo, it's command logic not quoting logic.)

I still need to work out what this means:

[[ X == && ]]
bash: unexpected argument `&&' to conditional binary operator
bash: syntax error near `&'

This is such an enormous can of worms. Oh well, dinking away at it...

Went through the bash "help" with no arguments output and got a list of builtins to implement in the first pass. Rereading the posix shell spec "chapter 2" page for comparison...


$ X=echo
$ $X hello

Gotta expand the first argument before checking for commands, EXCEPT:

$ X='['
$ $X -gt 3 ]
bash: [: -gt: unary operator expected
$ X='{'
$ $X ls ; }
bash: syntax error near unexpected token `}'

Environment variables are expanded _after_ the flow control statements, and "{" is parsed like "if".

The sh.tests file is going to need its own subdirectory of shell snippets, isn't it?

$ BLAH="ls
> echo hello"
$ sh -c "$BLAH"

The -c context parses like shell script; newline _there_ is a statement break. Ok, I can probably work with that.

June 8, 2019

Bash is terribly inconsistent:

$ (( 1 +
> 2 ))
$ [ thing
bash: [: missing `]'
$ [[ thing
bash: unexpected token `newline', conditional binary operator expected
bash: syntax error near `thing'

June 7, 2019

The promised digression.

I want toybox to provide the commands used by the base OS build (compiler, libc, kernel, and toybox itself) so a minimal build system can rebuild itself under itself from source, and then bootstrap up to arbitrary complexity natively under the result (solving the "circular dependency" problem with OS builds _and_ providing a good base for Android to build itself under itself someday).

This starts with the Posix and LSB commands that still matter today (ignoring crap like sccs or remove_initd), adding commands scripts/record-commands finds in package builds that would otherwise be a circular dependency, and commands used by simple init scripts like mkroot's to boot a minimal build environment to a usable state. If you can't rebuild the system under itself without a command (including being able to boot and run the init script and set up the network and wget more source to build), then toybox needs to provide that command.

I also want to provide conveniences like top and lsusb you really miss when trying to debug stuff on a build system (or initramfs), but those are negotiable. They could be external packages as long as they build natively under the base system.

"A package build needs it" doesn't mean it needs to be in toybox if the package providing it can be built under the simple toybox+compiler+libc+kernel system. Toybox _can_ provide it, but by definition it's a convenience then.

This is why "we need three types of decompressor (zcat, bzcat, and xzcat), but only one type of compressor (gzip)" makes sense to me: you can build bzip2 and xz packages natively under your minimal system if you need that output format, but a source tarball you wget may _come_ compressed in a lot of formats, and it's nice to have some compression available when producing output (the deflate algorithm is the 80/20 rule of compressors).

My 1.0 goal is building Linux From Scratch under toybox because that's easy for me to test (especially cross compiling to a new target where host binaries won't run and natively building LFS there) and is demonstrably "enough" to then build all the Beyond Linux From Scratch packages (I.E. bootstrap up to arbitrary complexity in old world Linux).

I want to do the same with AOSP next, but that's a huge can of worms which involves writing enough of a read-only git client to drive repo and clone source repositories, since aosp doesn't provide wgettable package tarballs. Plus tackling the whole "how to bootstrap ninja" issue. (Posix has make, and I can teach it to handle kbuild. Ninja is such a moving target AOSP wouldn't build under debian's ninja last I checked, it needed its own...)

(Shipping prebuilt binaries is convenient, but unless you have the ability to rebuild EVERYTHING then A) you can't counter trusting trust, B) bootstrapping a new target architecture is black art, C) there's magic in your build that will collect bugs and bit rot over the years, the way gentoo forgot how to do a "stage 1" build and we had to work it out again.)

Building AOSP is going to involve building a lot of packages under toybox, which aren't in toybox. If the getprop/setprop package has to be early in the build because later AOSP packages depend on it, that's fine. As long as the kernel, compiler, libc, and toybox builds don't, and then it _can_ build without further circular dependencies.

I should start working "what's left" into release notes, but I _just_ cracked open the toysh can of worms again, haven't quite merged mkroot into toybox as a "make root" target, and I'm also trying to stick modern toybox into my old aboriginal linux build to regression test it against the ancient Linux From Scratch build there. (Since I had that working and automated, it's still a good regression test even if the package versions are old. I can migrate the build over to mkroot and update to current in a controlled manner once I've reestablished a working baseline...)

June 5, 2019

So then there's this nonsense:

$ while true
> do echo hello; break; done
$ while true; do echo hello; break; done

In bash if I enter a continued line and then cursor up to see the command in command history, it's been stitched together with a semicolon in that case. But if I echo "hello[enter]world" that still has the newline. Why?

Out of curiosity I checked busybox ash's command history (at least the version installed in debian) and it's not doing that, it gives the last line entered and doesn't collate them. (Not that I'm working on fancy command editing and history right now, I'm trying to get a shell that can run the mkroot init script at the moment.)

June 4, 2019

Shell argument parsing: "echo > blah" is obvious, but "echo ab>blah" and "echo ab> blah" also work. The spaces around > are optional which says that the "ab", ">", and "blah" need to parse as 3 arguments and then the second pass consumes those arguments rather than adding them to the new process context.

And of course >> and << and && and || are all a thing. So is ;; in case statements. And there's 2>&1 and the <<< single line here document. So it's _runs_ of these characters.
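
A minimal sketch of the "runs of these characters" idea, in Python for brevity. It deliberately ignores quoting, parentheses, and the fact that the 2 in 2>&1 really belongs to the redirect; `split_words` and `OPERATOR_CHARS` are names invented for the example.

```python
OPERATOR_CHARS = "<>&|;"


def split_words(line):
    """Split on whitespace, and also wherever a run of operator chars starts or ends."""
    words, current, kind = [], "", None
    for ch in line:
        if ch in " \t\n":
            new_kind = None         # whitespace separates but is never kept
        elif ch in OPERATOR_CHARS:
            new_kind = "op"
        else:
            new_kind = "word"
        if new_kind != kind and current:
            words.append(current)   # the previous run just ended
            current = ""
        if new_kind:
            current += ch
        kind = new_kind
    if current:
        words.append(current)
    return words


for test in ("echo > blah", "echo ab>blah", "echo ab> blah", "true && false || ls"):
    print(split_words(test))
```

All three echo variants come out as the same four words, which is what lets a second pass treat ">" plus the word after it as a redirect and drop both from the new process's argv.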

Then there's this nonsense:

$ echo $((3<7))
$ [ 3<7 ] && echo true
$ [[ 3<7 ]] && echo true
bash: unexpected token 284 in conditional command
bash: syntax error near `3<'

The middle [ test ] is either a true inequality or a nonzero string (which also resolves to true) so I don't understand why it's returning false, and [[ test ]] is a different parsing context I don't understand at all yet.

And how do I represent "$("ls")". Even with recursion it's unhappy because the $( needs to end with ) but the " needs to end with " so I need a quote stack, and if I need a quote stack I don't need to do recursion. (In the parsing, anyway.)
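
Here's roughly what a quote stack looks like, as a Python sketch rather than the C in toysh. `needs_continuation` is a made-up name, it returns a plain boolean instead of a pointer, and it only knows about quotes, backslashes, $( and ${ (no case patterns, no plain subshell parentheses, no backticks).

```python
def needs_continuation(line):
    """Return True if line ends inside an unfinished quote, $( or ${ group."""
    stack = []          # closing characters we are still waiting for
    i = 0
    while i < len(line):
        ch = line[i]
        top = stack[-1] if stack else None
        if top == "'":
            # Inside single quotes everything is literal until the closing '.
            if ch == "'":
                stack.pop()
        elif ch == "\\":
            if i + 1 >= len(line):
                return True         # trailing backslash asks for another line
            i += 1                  # skip the escaped character
        elif line.startswith("$(", i):
            stack.append(")")
            i += 1
        elif line.startswith("${", i):
            stack.append("}")
            i += 1
        elif ch == top:
            stack.pop()             # found the closer we were waiting for
        elif ch == '"':
            stack.append('"')
        elif ch == "'" and top != '"':
            stack.append("'")
        i += 1
    return bool(stack)


print(needs_continuation('echo "$("ls")"'))    # balanced
print(needs_continuation('ls $('))             # wants more input
```

Since only the top of the stack decides what closes the current context, the inner " in "$("ls")" opens a new quote instead of ending the outer one, with no recursion involved.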

Note: PS1= and unset PS1 are not the same thing. I should add that to the test suite...

June 2, 2019

Shell design!

I need to parse a command line into an intermediate format, and save that intermediate format for loops and functions and so on. The intermediate format does _not_ have environment variables resolved, but needs to parse them to know where ls dir/$(echo hello; echo walrus)/{a,b}* ends.

This is the big design blocker I stopped on last time, because there's inherent duplication and I want one codepath to handle both, a bit like find has. But it's fiddly to work out. In this first pass, things like "$@" and *.{a,b,c} don't expand into multiple arguments, but in the second pass they do.

Hmmm, earlier I was doing a lot of linked lists (which I thought were more nommu friendly), but now I'm doing more realloc() arrays because Linux on nommu has per-process heaps so small memory allocations aren't as terribly fragmenty as they would be on bare metal. I think struct pipeline makes more sense as an array, except it would need a "number of entries" count (unless I null terminate it?)

The other design hiccup was that I wanted to avoid gratuitous copying of strings (use the environment string passed to -c directly, mmap the file and use it directly, if all else fails the memory returned from getline() can be used directly...), except then I'm mixing allocation contexts where "echo a $BLAH c" had some arguments you don't free() and some you do when you're done, and tracking them was annoying.

Get it working first, _then_ optimize... (I've been researching this on and off so long there's been years of optimization ideas mixed into the design, which is premature optimization.)

Parsing a pipeline into intermediate format needs to understand continuations, which means if/while, {, and (, plus $(( and strangely ${ although I'm not sure what you could legitimately put in there that would resolve? Fiddle fiddle... Ah, ${BLAH: 1 : 3 } works. But you can't have a space (or newline) before that first colon. This is such an ad-hoc undocumented mess. Right. At a guess number parsing eats leading and trailing whitespace. (And while blah; do thingy needs the newline/semicolon there because do would be an argument to while otherwise.)

If I go BLAH=12345 and echo $BLAHa it doesn't show 12345a: which environment variables are set has no influence on parsing, it does the lookup after determining the bounds of the variable name. (Hence the ${blah} syntax.) A name takes letters, digits, and the _ character, but not the rest of the punctuation. There's probably a posix stanza on this buried in the mess, and might also be something in the bash man page. I gotta re-read both of those, it's been ages...
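The same experiment as a script, for reference. The variable names plain, braced and punct are just labels for this sketch.

```shell
BLAH=12345
unset BLAHa                  # make sure the longer name really is unset

plain="$BLAHa"               # name is BLAHa (letters, digits, _): empty
braced="${BLAH}a"            # braces end the name early: 12345a
punct="$BLAH-a"              # - cannot be part of a name: 12345-a

printf '[%s] [%s] [%s]\n' "$plain" "$braced" "$punct"
```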

And of course the problem with line continuations is the design right now is a getline() loop that calls a function on each line, so when I have a "I need more data" continuation point, it has to _return_ that and then the outer loop needs to prompt differently or something. Of course I have global variables for the prompt state. Hmmm...

$ bash -c 'ls $('
bash: -c: line 0: unexpected EOF while looking for matching `)'
bash: -c: line 1: syntax error: unexpected end of file

Not quite the error message I expected, but yeah, that's the logicalish thing to do there. (Quotes are another thing that needs line continuation. And of course here documents. I'm gonna have to whack-a-mole this adding one more thing at a time...)

But the fiddly part is still that parsing has to understand if; then; fi _and_ command execution has to understand it too, and I really don't want that in two separated places that can get out of sync. Hmmm...

Why is ; on its own line an error? I can hit enter. It only ends a nonzero command? Two trailing ; are an error? (They aren't in C...) Why is bash complaining about this? Huh, the Defective Annoying SHell is doing it too, is this some posix corner case?

"ls one;cat" even mid-argument the semicolon ends the line when not escaped, got it. Same for "(" for some reason (when would that be valid mid-word?) And of course:

$ cat blah<(echo hello)thing
cat: blah/dev/fd/63thing: No such file or directory

Bash's parsing is a bit self-defeating at times. Why would you recognize that THERE? What's the POINT? When could it serve a purpose?

Another fun thing is I run a lot of tests from the command line, like:

$ echo one\
> two
$ echo one\;two

Which are non-obvious how to turn into tests/sh.test entries. I just did:

$ sh -c "$(echo -n 'echo one\\\ntwo')"
$ bash -c "$(echo -n 'echo one\\\ntwo')"

And the Defective Annoying SHell (which is what devuan points /bin/sh to) behaved differently from bash, but bash is NOT behaving like it just did from the command line! (readlink /proc/$$/exe says /bin/bash because my account's shell in /etc/passwd is /bin/bash; this is the compromise ubuntu made to avoid admitting they made a mistake redirecting /bin/sh to point to dash in 2006, change everybody's login shell so you're never actually using dash at the command line, and every #! in the world changes to explicitly say "bash" and we all avoid dash THAT way. Idiots.)

Let's try again...

$ bash -c "$(echo 'echo -e one\\\ntwo')"
$ dash -c "$(echo 'echo -e one\\\ntwo')"
-e one

Of course. And notice the newline IN THE SECOND ONE. Grrr...

It is really hard to implement shell unit tests _in_ shell scripts. Anyway, can of worms for another day, add comments with the command lines I ran and work out how to turn that into regression tests later...
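One possible way around the echo -n / echo -e differences, as a sketch: build the script text with printf, whose escape handling is specified by POSIX, and hand it to whichever shell is under test. The helper name run_case is made up for this example, and the second argument is used as a printf format, so a literal % in a test script would need doubling.

```shell
# run_case is a hypothetical helper: expand backslash escapes in SCRIPT with
# printf, then run the result under the named shell.
run_case() {
  "$1" -c "$(printf "$2")"
}

run_case sh 'echo one\\\ntwo'   # backslash-newline continuation: onetwo
run_case sh 'echo one\\;two'    # escaped semicolon: one;two
```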

June 1, 2019

Discovered that if you pull down TWICE from the top of the phone screen (I.E. try to open the pulldown menu when it's already open), the pulldown menu expands and THEN you have a brightness slider and the little gear that lets you edit what's in the original pulldown.

Only took me a week of owning the phone to figure out how to do really basic things with it that were obvious last version.

May 31, 2019

Posted about the ls -l / android issue to the list, waiting to hear back about what they're trying to accomplish here.

Upgrading tar to extract the old tarball (to unblock the aboriginal linux linux from scratch build control image experiment). It's not very well documented and I have one example. Maybe I should dig up more old tarballs to test against, but this has "rathole" written all over it...

May 30, 2019

Installed adb to push stuff to my phone, and "adb shell" can ls /, but it complains about trying to stat a lot of directories.

The current dirtree() infrastructure is opening directories without O_PATH, and won't recurse if stat() fails (because it can't tell it's a directory). Hmmm.

I did a fresh aboriginal linux checkout and I'm trying to plug current toybox into it, building the old versions of all the other packages (including the kernel). If I can get it building, I want to plug in the old Linux From Scratch 6.8 build control image and try to build that.

This is a regression test I haven't done in forever, which I stopped doing because I couldn't update the old toolchain (for license reasons) and then updating the kernel got progressively more painful with the old toolchain, and I rebased to mkroot but still haven't reproduced all the infrastructure the old build had to do an automated LFS build, so... let's try to plug current toybox into the old thing and see what happens.

One problem is I haven't supported uclibc in forever, might need to swap in musl too but let's see what uclibc does first...

Heh, there's an old toybox patch to work around a uclibc bug I never merged (because that nonsense doesn't belong in the toybox repo, but I need it here). The old patch doesn't remotely apply to current grep, had to rewrite it.

Toybox tar can't extract the genext2fs tarball, because it's _so_ old it doesn't say "ustar" in the header. Huh.
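A quick way to check for this on an uncompressed archive, as a sketch: in the ustar layout the magic string sits at byte offset 257 of the 512 byte header block, after the name, mode, uid, gid, size, mtime, checksum, typeflag and linkname fields. The helper name has_ustar_magic and the fake archives below are invented for illustration.

```shell
# has_ustar_magic is a hypothetical helper: succeed if the first header
# block of FILE carries the "ustar" magic at byte offset 257.
has_ustar_magic() {
  magic=$(dd if="$1" bs=1 skip=257 count=5 2>/dev/null | tr -d '\000')
  [ "$magic" = ustar ]
}

# Two fake one-block "archives": one with the magic, one all zeroes.
tmp=$(mktemp -d)
{ dd if=/dev/zero bs=257 count=1; printf 'ustar'; dd if=/dev/zero bs=250 count=1; } 2>/dev/null > "$tmp/new.tar"
dd if=/dev/zero bs=512 count=1 2>/dev/null > "$tmp/old.tar"

has_ustar_magic "$tmp/new.tar" && echo "new.tar: ustar"
has_ustar_magic "$tmp/old.tar" || echo "old.tar: no ustar magic"
```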

May 29, 2019

Finally got the 0.8.1 release out. I forgot to update the date of the release notes from when I started writing them to when I actually got the release out (mkroot wouldn't build because AOSP wants tar to call "bzip2 -d" and toybox hasn't _got_ bzip2 compression side just bzcat, so I had to make it try both and fall back).

I want to fix "ls -l /" on Android Pie, which is basically the same bug as 527045debecb. I installed terminal, ran ls -l as my first command, and got "permission denied" on "." with no other output.

Elliott fixed xabspath() but I'd like to fix dirtree(). The current dirtree() infrastructure is opening directories without O_PATH, and won't recurse if stat() fails (because it can't tell it's a directory). Hmmm, needs some design work.

The real question is what's Android trying to _prevent_ here?

May 28, 2019

Fade's back!

No blogging. Fade's back.

May 27, 2019

New phone doesn't have wifi hotspot in the pulldown list of stuff.

The pulldown has a wifi icon, and if I hold it down it goes to a wifi menu, but it's not the HOTSPOT menu, it's too far down the selection tree. The rest are bluetooth (don't care), do not disturb, flashlight, display (android M had the brightness slider in the pulldown, this is less good), and battery. If there's any way to change the list of what's in the pulldown, I haven't found it yet.

So what I have to do to enable the wifi hotspot is exit to the desktop, pull up, gear icon, network and internet, hotspot & tethering, wifi hotspot, on. Lateral progress. Android M had a brightness slider in the pulldown menu, this makes me go into a menu (so turning the brightness back _up_ if I left it down and wind up in sunlight where I can't see the display is basically impossible with this phone now, instead of blind fumbling that takes three or four tries like with the old one).

May 26, 2019

Coming up to the end of the 3 months I said I'd take off.

My ELC talk that got approved was "toybox vs busybox" and I don't want to say "busybox has a usable shell, toybox doesn't" as one of the big differences _still_, so trying to get the shell at least basically usable before I have to go disappear into the land of $DAYJOB for who knows how long because none of the android ecosystem wants to pay me to work on this stuff.

May 25, 2019

New phone. Went in to try to enable it yesterday and worked out I'd have to buy a new sim card for $30, went home to try to fish the old sim card out of my old phone and got talked out of it by Jeff D. who explained that the encryption algorithms in the sim card get quietly updated regularly so switching to a new sim card periodically is a good thing.

So today I went back and gave T-mobile its gratuitous $30 profit. (If I'd bought the phone through them they'd throw in a sim card for free, but I wanted one guaranteed to be unlocked.)

I wound up using the default AOSP image it came with rather than trying to image it because I need a working phone. As with the last 3 phones...

May 21, 2019

Taking some notes for the "toybox vs busybox" talk I volunteered to do in August. I was maintainer of busybox for a while, and had written about 1/3 of its code when I handed it off, and I can at least explain what I was trying to accomplish.

I also created and maintain toybox, I can certainly explain what I'm trying to do here and why I couldn't do it in a busybox context. And there was also a period between leaving busybox and refocusing toybox on android where I maintained it outside of busybox for my own reasons, largely "better design"...

So I don't have a shortage of material. But ELC shortened its talks from an hour to 45 minutes a few years ago, and I should probably leave 15 minutes for questions...

May 20, 2019

Oh good grief, no it is not called that and only ever WAS by Richard Fontana, who is weird about it to this day. Stop deadnaming my license!

Richard Fontana made a mistake, refused to admit the mistake, tried to get SPDX to replicate his mistake so he didn't look so weird standing out like that, defended his mistake for _years_ after losing that battle with a shifting array of reasons (his conclusion never changed but his justification for it constantly did), and when he finally got outvoted at OSI has done his best to memorialize his mistake everywhere he has control over (such as the OSI page on the license).

Long before I got that confusion cleared up, people were using and recommending 0BSD out in the wild and they NEVER used Fontana's name for the thing. This has his fingerprints all over it.

I don't modify wikipedia[citation needed] in part because I'd never have time to do anything else, and these days they block the entire IPv6 address range (so I can't use phone tethering) and every McDonalds and similar open wifi, so I can't even leave a nasty note on the talk page. But seriously, dude. This is misinformation. I started calling it zero clause BSD long before Fontana ever heard of it. I got permission from Marshall McKusick to call it a BSD license in 2013, _years_ before Fontana ever heard of it.

May 19, 2019

Did Norman Borlaug's work make China's one child policy look stupid, or was it always stupid?

Norman Borlaug is possibly the single most important person in the 20th century. He's why India and China can feed themselves. The man literally quadrupled global food production with "semidwarf" varieties of rice and wheat. He started his breeding programs to improve disease resistance, but when nitrogen fertilizer became cheap and ubiquitous (the Haber-Bosch process predates World War I, but took a while to scale up and branch out), plants were limited by how much fertilizer you could give them before stalks fell over under the weight of the grain they were growing. Borlaug's solution was to make plants shorter, both so they put less energy into growing stalk and because shorter plants were sturdier and can hold more grain before collapsing, so you could nitrogen the HELL out of them.

But the real gains came when he applied the same trick to rice. These days most of the world's population lives in the circle of rice (sung, obviously, to the lion king), and they're all growing dwarf rice. This provides enough food for ~4 billion people in and around India and China.

Meanwhile, China had a revolution kicking out its royalty a century back, and just like the French revolution they killed all their scientists and wound up with a tin pot dictator in charge who may not have been a net positive _despite_ how bad the royalty they replaced was. Napoleon got millions of his countrymen killed by declaring war on the entire world (twice), but Chairman Mao mostly just starved his subjects to death. He had Aristotle's problem of never looking at the world around him and instead making stuff up divorced from reality, then enforcing that vision upon the world. During the "great leap forward" he said the country wasn't producing enough iron and demanded they "make iron" without procuring more iron _ore_. Rather than explain to him where iron comes from, his subjects melted down all their farming tools into neatly stacked iron ingots for inspection by party officials, and wound up starving. (Humoring the Great Leader is one of the classic blunders, "Potemkin village" is from 1700's Russia but Mao got plenty of such tours.) Other parts of the great leap forward included exiling all the schoolteachers and college students to rural farms (where they starved, farming is a skill they didn't have). Mao ordered everyone to kill birds because he thought they ate crops, and then when the insect population exploded without predators his solution was to spray bulk insecticides that drove pollinators extinct so large swaths of China pollinate by hand. (More modern China has tried to bury the history of all these failures, just like they've buried Tiananmen Square. They insist that their "lack of bees" is due to the shenanigans Bayer's been pulling recently, not due to Mao's edicts three generations back. *shrug* US public schools don't exactly open history class with smallpox blankets and the trail of tears, or the way we staged a coup to get Hawaii. The War of 1812 was approximately as stupid; we lost to _Canada_ and they set the white house on fire.)

The one child policy was another of Mao's ideas, which led not only to their "bare branches" problem (millions of surplus unwed men because of sex selection in the one and only child parents were allowed to have), it also means China's baby boom problem is somehow even worse than the USA's. China has two generations of only children, I.E. a generation of only children whose parents were themselves a generation of only children, meaning each child has 4 grandparents with no other descendants. In a society where "retirement" meant having enough kids to take care of the parents in old age, this is an _issue_.

So Norman Borlaug's work increasing the food supply means Mao's one child policy was outright pointless. Add in the fact that educating women means they have more options than just being barefoot and pregnant their entire lives, and birth rates among the young need government support (maternity leave, free daycare, etc) to get _up_ to replacement rates. (Around the world: Europe, the USA, China, you name it. This is apparently a side effect of late stage capitalism viewing productivity exclusively as various forms of manufacturing while completely ignoring and refusing to fund caretaking work, but China's gone all in on capitalism lately so has acquired this problem too...)

And only the young have the _option_ to replace themselves, the overhang of old people that can't have kids anymore can't. I have _no_ idea what China plans to do about any of this, but am glad to be very far away.

May 18, 2019

Debugging the sparse tar compression side, which means I have run "diff -u <(tar cv --owner root --group root --mtime @1234567890 --sparse fweep | hd) <(./tar cv --owner root --group root --mtime @1234567890 --sparse fweep | hd) | less" with malice of forethought. (Well, actually I ran "TAR='tar cv --owner root --group root --mtime @1234567890 --sparse fweep | hd' diff -u <($TAR) <(./$TAR) | less" because I'm me.)

May 17, 2019

Finally went to the store to order a new phone and they're out. Ordered from the website instead. They estimate delivery on the 23rd.

May 16, 2019

Got the grep bug sorted out, it was a missing else and an inverted test that was hidden by the missing else. (So it _seemed_ to work but what it was actually doing was ignoring the -x.)

And now of course people are trying to use it, there's another grep bug after that...

May 14, 2019

Here's an email I wrote but didn't send in response to this, because it went to a dark place (which is nevertheless true):

> Technically yes, because the first initrd could find the second by some
> predefined means, extract it to a temporary directory and do a
> pivot_root() and then the second would do some stuff, find the real
> root and do a pivot_root() again.

You can't pivot_root off of initramfs, you have to switch_root. (You _used_ to be able to, which moved initramfs from / and allowed you to unmount it, at which point the kernel locked hard endlessly traversing the mount list. I know because I hit that bug in 2005 and they fixed it.)

No, I'm saying that if /init is in the static initrd and you _also_ specify an external initrd the kernel _also_ extracts the external one into initramfs, _after_ having extracted the built-in one. (Both archives are extracted, one after the other, into the same ramfs/tmpfs mount.)

If the semantics are O_EXCL and it skips files it can't extract properly, then the external one couldn't replace files in the static one. You just have to make sure that it extracts both before trying to exec /init (which it looks like it currently does but I haven't tested it). And such an init could do anything and end with "mv newinit /init; exec /init".

(And while we're there it's _embarrassing_ that you have to enable CONFIG_BLK_DEV_RAM to get the external image unpacked, which means you have to enable CONFIG_BLK_DEV which depends on CONFIG_BLOCK meaning you have to enable the block layer when running entirely from initramfs? That's one of the things I pointed out years ago but nobody ever did anything about it, and I tend not to send many patches here these days because dealing with linux-kernel is the opposite of fun. You literally have a bureaucratic process with a 26 step checklist for submitting patches now, which you're supposed to read _after_ the 884 line submitting-patches document which I guess comes after the 8 numbered process documents. And then get dinged by and it's just... no fun. You've gone _way_ out of your way to drive the hobbyists out. Congratulations, you succeeded, it's all middle aged cubicle dwellers arguing about how to help John Deere prevent farmers from modifying their tractors. The development-process.rst file is aimed at developers "and their managers" because the linux-kernel committee can no longer comprehend developers without managers. Nobody's doing it for fun anymore because it stopped being fun a long time ago.)

And now, I mute the _rest_ of the thread. Do what you like, I think teaching the kernel to do magic in-band signaling here is a terrible technical idea _and_ unnecessary but it's obviously not my call. I'm aware I'm about 7 years too late for that sort of concern to matter to the bureaucracy linux-kernel has become (since at least, and I'm only replying because I was cc'd.

Sigh. I should do the patch to make external initramfs loading work if you've disabled the block layer. And resend the patch making DEVTMPFS_MOUNT apply to initramfs. And there's like 5 others on the todo list...

May 13, 2019

I've misplaced my phone. The downside of the "no sim card" state is if you lose track of your phone, you can't call it to have it ring. Black phone on black background.

Hey, one of my talk proposals was accepted to ELC in August. It's the "toybox vs busybox" one, which personally I think is the _least_ interesting of the topics, but eh, that's what they want to hear...

May 11, 2019

My phone is dying. It keeps saying "no sim card" randomly (requiring a reboot to see it again), and randomly switches itself off as if the battery's died, but when I charge it the battery says it's at 80% or some such.

It's been like that ever since I got caught in a thunderstorm on wednesday with the phone in my pocket. It dried out and started working again after a few hours, but not reliably...

Looking at Pixel 3a xl. Should I buy from t-mobile or from google? I'd like to do the whole AOSP install route if I can...

May 9, 2019

BSD development predated the web, or even widespread internet availability by about 5 years. (It was, in fact, responsible for much of it.) This means it had the problem of privileged communication channels dividing its community into "first class" and "second class" citizens.

BSD started off with a single development office with its devs physically located together in Berkeley, the same problem which prevented mozilla and openoffice from becoming real open source projects for many years after they _tried_ to open up. When almost all your devs are right down the hall from each other, any devs _not_ participating in those privileged channels of communication (face to face due to physical proximity) are sidelined so your development "in group" erodes the "out group" into irrelevance.

Remember that the original 1997 Usenix paper "The Cathedral and the Bazaar" wasn't about proprietary software, it was about 2 different types of open source development. The paper was written by the EMACS Lisp extension library maintainer, and was a comparison of the Free Software Foundation's members-only "cathedral" (with physical copyright assignments on paper) with the Linux "bazaar" taking patches from anybody and everybody on an open mailing list.

This is why toybox's "privileged" communication channel is a mailing list anybody with email can join, and even _then_ I deal (grudgingly) with github pull requests and bug reports and such (even though I MEANT to use that as just a distribution channel), because the younger generation of devs prefers that to email and I don't want to exclude them. (Go where the people are.)

Sigh. I gave a talk about this, but alas it was at Flourish in Chicago and their recordings were screwed up both times I went there. I should do podcasts, but I suck without externally imposed deadlines and feedback. The great thing about programming is the box tells me what's wrong every time I try to compile and run anything.

May 8, 2019

Here's a portion of an email I _didn't_ send to scsijon on the toybox list. (It was off topic.)

I would have expected glibc rather than gcc to be the one to break that, it's not the compiler's BUSINESS to care about this. But ever since the C++ developers took over the C compiler they've been expanding "undefined" behavior as much as possible, presumably because C _not_ being a giant pile of irrational edge cases that make no sense so you just have to memorize them was a big advantage C had over C++, and they can't have that.

*shrug* I consider gcc kind of end-of-life at this point. LLVM doesn't act nearly as insecure about C's continued existence, and not being able to compile existing code with gcc 9 would be a bug in gcc 9 as far as I'm concerned.

Presumably there's a -fstop-being-stupid flag for this too if it did turn out to be relevant, or it would be trivial to work around gcc 9's bug if we did wind up hitting it, but this is 100% a gcc bug in cutting edge gcc. (Are they building with -Werror or something? Looks like -Werror=no-format-overflow is what would turn it off, which sounds like the "may be used uninitialized, but isn't" inability to turn off the broken warnings without turning off unrelated non-broken warnings all over again...)


P.S. C++'s entire marketing strategy, going back to 1986, is "C++ contains the whole of C and is thus just as good a language, the same way a mud pie contains an entire glass of water and is thus just as good a beverage". C is a simple language with minimal abstraction between the programmer and the hardware. The _programs_ are complex but the language is not. C++ adds a lot of language complexity and abstractions that unavoidably leak implementation details so you can't debug them without knowing every detail of how they were implemented. C is a portable assembly language, as close to the hardware as it can get without a port from x86 to arm being a complete rewrite, and even then hardware details like endianness and alignment and register size peek through. I'm all for replacing C++ with go/swift/rust/etc. I object to a drowning C++ climbing on top of C and dragging it down with it.

May 7, 2019

Elliott continues to make AOSP (the Android Open Source Project, the base build for all android devices) do a "hermetic" build, which is their name for an airlock step. (They're shipping prebuilt binaries instead of building an airlock locally, but fine. Either way it means android is building under the tools android is deploying, which is halfway to native build support.)

Which also means they hit problems in the toybox tools, which I have to drop everything and fix. Which is why today I'm working on tar sparse support: they have tarballs generated by the host tar which they want to extract in the airlock, and they've got sparse files in them toybox tar can't currently understand.

And of course if I'm adding it to extract, I'm adding it to create too. The options are "not doing it" and "doing it right", the middle ground is called toys/pending.

May 6, 2019

Today Elliott pointed me at a fix to his sed performance issue, which is to use REG_STARTEND. This tells regexec() to use the contents of the regmatch_t on _input_ to say where the end of the string is, which means A) no strlen() on the input each call to regexec() (which is really slow when you're replacing lots of small matches on a very long string, hence the performance issue), and B) I can implement regexec0 to include null terminators without a hacky for loop over the data.

REG_STARTEND seems to have started life on freebsd over a decade ago, and is now supported by _everything_ except musl libc. It was picked up by glibc in 2004, it's on macos and freebsd and bionic, and had even made it into uClibc before that project died, but it's not in musl. And the reason is that Rich declined to support it when the issue came up, saying his users were wrong for wanting to support those use cases (which he called "hideous hacks"). (There's a lot of that going around in musl; the users are wrong for wanting to do what the users want to do, musl is only for people who think like Rich.)

It's also not documented in the regex man page, so I poked Michael Kerrisk to fix the man pages, complained at Rich, and checked in the fix and a test with a 5 second timeout.

It was actually a multi-stage fix because I had to edit the string in place and avoid gratuitous realloc() because libc does _not_ short circuit same-size realloc, that's the caller's job. I'd have xrealloc() do it but that doesn't know how big the old one was...

May 5, 2019

Banging on the board I took a paid sidequest to work on (making the WF111 work on the SAMA5D3), and its BSP bit-rotted. The company that made it got acquired a few years back, and the 6 year old youtube videos on how to do stuff with this board point to websites that redirect to the new corporation's main page. Great.

I feel guilty charging them for the time it takes to learn how this stuff works, but the guy they had working on it retired.

May 3, 2019

The grep --line-buffered thing has been pending for a while, but the _input_ is also line buffered. I need to rewrite do_lines() to read large blocks of data (or even mmap it, dunno where the "it's a win" size is for that though, need to benchmark).

I'd like to avoid gratuitous copying, which means read a large buffer and pass in a pointer/len within the buffer, except for three problems: 1) where/when do I null terminate? (Inserting a NUL modifies the buffer, and if I keep the \n it has to stomp the next character _after_ the terminator, which may be off the end of the allocation.) 2) lines wrap off the end of the buffer and I have to either memmove() or realloc(), possibly both, 3) some of the users want to keep the buffer, at which point they strdup.

Basically I have to audit all callers to come up with a design, which is hard to do with a dirty tree.

May 1, 2019

Finally got around to updating my resume. I'm not looking for work yet but a recruiter wanted to know and I presumably have to do it eventually.

What I'd _really_ like to do is grow my patreon to the point I can do open source full time, but I don't expect that to happen before I run out of savings again. Or alternately get some of the big companies using toybox to buy "support contracts" so again, I can do this full time...

April 29, 2019

Broke down and saw Ant man Endgame, primarily because Fade saw and enjoyed it and would want to talk about it. (I'd happily see Carol Danvers II but I'd already been told she wouldn't even have half the screen time of Rocket Raccoon.)

I only wanted to walk out of the theatre in disgust once this time, when the same trap that fridged the lead female character of guardians of the galaxy fridged the lead female character of the original MCU avengers lineup, and put to rest the calls for a "Black Widow Movie". We got a single female-fronted MCU movie, the topic is done forevermore! (Other than that, lots of plot points were predictable by going "which actors want out of their contract", and the shakycam was so bad I lost track of the plot a bunch of times. I think I followed how they got 4 of the 6 McGuffins? No idea what happened to the one Loki stole, for example...)

And unfortunately, the movie put me in an irritable mood to review the second man.c patch. I had to back out my second round of changes (more than a day's worth of work) to apply it, and now I'm looking at the various changes that messed up code I'd carefully cleaned up the first time and it's triggering my "can I ignore this command forever" reaction, to which the answer is "no, the android guys will use the broken code out of pending and then it's even more work to clean up because I keep breaking their use cases behind the great google proprietary firewall"...

I've put a lot of skill points into programming. I'm not really that good at managing the work of others, but I can do it. But when the two overlap and other people are messing up my code I want them to GO AWAY and let me get on with it, which is not how open source is supposed to work and I know it. (I program by debugging an empty screen. Things moving behind my back while I'm debugging is BAD DEBUGGING and the way to fix it is to MAKE IT STOP. I can do pair programming just fine, but "I was working on a redo of this code and you sent me a patch for the old version..." I pretty much back out and discard all my work and start over.)

I said I was irritable.

April 28, 2019

Went to the farmer's market today. Learned it takes 4 lamb hearts to make a pound, and Fuzzy got a raspberry mead she's quite happy with. Plus duck eggs. (Woo-oo.)

April 27, 2019

I'm not hugely interested in seeing them kill off the _other_ half of the Marvel universe, so I bought a ticket for Shazam. I've already been spoiled on it, but eh. Sounds like a good movie anyway.

I got an automated email that my old [PATCH v3] Make initramfs honor CONFIG_DEVTMPFS_MOUNT stopped applying, so I'm doing The Lazy Way to see what happened:

1) patch -p1 -i blah.patch
2) git log init/main.c
3) git checkout -f $LASTHASH^1 # the ^1 means commit before that

And repeat until you find the last commit it applied to, and then the one you just ^1'd is the one that made it stop applying.

I'm tempted to automate this (git log $FILE | head -n 1 | awk '{print $2}') and I could do patch --dry-run even, and I should probably do --no-merges on the log but I _have_ hit cases where a merge commit is what broke. (And those are "make puppy eyes at upstream" or "dig into the code and try to figure out what the fsck is going on".)

April 26, 2019

A few tangents I edited out of emails today:

It's a crying shame there isn't yet a chromebook shell you slide google phone du jour into and the usbc gives you keyboard, touchpad, display, battery/charger, and a better heat sink/fan for the phone... Yes I'm aware google wants everybody's data in the cloud so you can't work when the net's down. I'm weird.

Java was my primary programming language for ~3 years, about as long as C++ was. I went commodore basic to C to C++ to C to Java to C to Python to C. I'm the guy who told Sun's Mark English that the java 1.0 spec didn't have a way to truncate a file (just missed the 1.1 cutoff but they added it to 1.2).

Java _stopped_ being my primary programming language when they replaced the lightweight AWT with "Swing" and all that model/view/controller nonsense. Plus "no JDK for Linux" was the #1 bug on for 11 months with no official response and _then_ sun screwed over blackdown on the Linux JDK stuff hugely, then they bloated the language so much they had to start doing javaEE subsets, refused to open source it forever and then turned into a patent troll when they did (I'm aware defending themselves against "Microsoft J" was a useful legal battle, but the antitrust breakup should have handled that if we didn't have an infestation of republicans.)

April 25, 2019

The nice clean keyboard of my new laptop is getting slowly grunged up by a human body hunching over it for hours. Sad but inevitable. (I've been trying to keep my fingernails trimmed to slow the rate at which the letters wear off the keys, but it's been like a week and the slow slide away from New Laptop Smell is inevitable. Which is weird because google says this model is from somewhere around 2012. I guess it was in a box or something? Anyway, big step up from what I _was_ using, even if I haven't tried to reflash the screen firmware to get it to Stop Trying To Help With The Brightness. It only does it when I switch windows, so it's not as annoying as it could be.)

April 24, 2019

Benchmarked grep and found my version's way slower than devuan's version. (Which Elliott's been complaining about, and I confirmed he's right. Well, first I wrote a giant email I didn't send arguing about it, and then I benched and went "ok".)

Thought of a new approach where do_lines() chops text out of a buffer without copying it, which should be much faster, which brings up lifetime rules and requires changing the callers. _BUT_ it would allow me to get rid of the old get_lines(fd) API, which I've meant to do forever.

Ideally I'd want to mmap() that buffer when possible, but how long is the file? I'd need to llength() the file (except really I just want the simple length, the whole llength() mess was to get the size of noncompliant block devices that didn't properly report their size, and since the cdrom went away that's probably not a thing anymore?)

Anyway, giant files can be bigger than the available address space (certainly on 32 bit), so we'd want to map chunks of them. And if we're reading instead of mapping we definitely need a finite size because a read() is into an allocated buffer that doesn't discard clean pages it can read back in from a file. Which raises the problem of what to do when a line crosses the boundary. With mmap we can munmap(), lseek() and mmap() again, with a larger size if necessary. (And the data's probably still in the page cache afterwards.) With read() we can copy the data down, fill out the buffer to the end, and realloc() as necessary. (There's always the possibility of a pathologically large line that's bigger than any finite buffer we've allocated. Although the question of what to DO about such lines remains: we don't want "tr '\0' X < /dev/zero | grep" to trigger the OOM killer.)
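A rough sketch of the read() variant described above, to pin down the buffer handling. This is not the toybox do_lines() code: chop_lines(), line_fn, struct line_stats and count_line() are all names made up for illustration. Lines get handed to a callback as pointers into the shared buffer (no copying, so they're only valid until the callback returns), the trailing partial line is copied down to the start, and the buffer doubles when a single line fills it. It does nothing about the pathological-line problem: there's no upper limit on growth.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Hypothetical callback type: line points into the shared buffer and is
// only valid until the callback returns. len does not include the newline.
typedef void (*line_fn)(char *line, size_t len, void *arg);

// Example callback: count lines and bytes without keeping any pointers.
struct line_stats { size_t lines, bytes; };

void count_line(char *line, size_t len, void *arg)
{
  struct line_stats *st = arg;

  (void)line;
  st->lines++;
  st->bytes += len;
}

// Read fd to EOF, calling fn() once per line. Returns 0 on success,
// -1 on read or allocation failure.
int chop_lines(int fd, size_t bufsize, line_fn fn, void *arg)
{
  size_t size = bufsize ? bufsize : 4096, used = 0, start;
  char *buf = malloc(size), *nl;
  ssize_t got;

  if (!buf) return -1;
  for (;;) {
    // Buffer is full and holds no newline: the line is longer than the
    // buffer, so grow it (no cap here, which a real version would want).
    if (used == size) {
      char *bigger = realloc(buf, size*2);

      if (!bigger) { free(buf); return -1; }
      buf = bigger;
      size *= 2;
    }
    got = read(fd, buf+used, size-used);
    if (got < 0) { free(buf); return -1; }
    if (!got) break;
    used += got;

    // Hand out each complete line in place.
    start = 0;
    while ((nl = memchr(buf+start, '\n', used-start))) {
      fn(buf+start, nl-(buf+start), arg);
      start = nl-buf+1;
    }

    // Copy the trailing partial line down to the start of the buffer.
    used -= start;
    memmove(buf, buf+start, used);
  }

  // Last line had no trailing newline.
  if (used) fn(buf, used, arg);
  free(buf);

  return 0;
}
```

Calling it looks like "struct line_stats st = {0, 0}; chop_lines(0, 0, count_line, &st);" to count lines on stdin. The lifetime rule is the part that forces changes in the callers: anything that wants to keep a line past the callback has to copy it.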

Anyway, I'm too tired to implement this right now. Which is odd because it doesn't seem like I've done anything today? But I wrote the giant email I didn't send, which was a lot of work, so I guess I have? We all have our process...

April 23, 2019

Hired dudes took down 3 sections of fence in the back yard to get at the poison sumac _tree_ growing between our fence and the neighbor's fence. They estimate it's been there for 15 years, but didn't try to take over the entire yard until we took out the bamboo that was crowding it out.

I gave them more money. I blogged about my weeks of misery and being afraid to touch the cats, and it's just been _looming_ ever since. (Not so much when I was in Milwaukee and Minneapolis and Tokyo and such, but if you're wondering why travel seemed like a great idea...)

April 22, 2019

Hired Dudes (as @kbspangler likes to say) are removing the poison ivy from the front and back yard (which turns out to be poison sumac, not poison ivy). Identifying it all, chopping it up, hauling it away, and painting poison on the stumps. It's so nice to finally find people willing to do that, and I've given them more money than they asked for to do it because YES, THANK YOU!

April 21, 2019

The "Wicd" wifi network chooser thingy doesn't work on the UT campus. I'm guessing there are too many networks here and it's overloading and saying no networks found. How they could be stupid enough to hardwire in a limitation like that... Eh, it's a Linux GUI tool. Of course they would.

(And every time I tell it my phone's password, it doesn't save it. I looked under "preferences" to see if maybe I needed to put it there, but I can't find anything there? How would I get it to forget a network that isn't currently present? There's no list of historical associations like in ubuntu. This thing was not well thought through.)

Anyway, I got Tar switched over to the new environment variable plumbing. I'm not sure the --to-command stuff works reliably (a short write will error_exit() out of tar entirely, even when writing to one of many short-lived child processes here), but this isn't a _new_ problem and in my "tar xf linux-4.20.tar.gz --to-command sha1sum | head" tests the data for the whole file tends to go into the pipe before the consumer responds to it so each sha1sum instance complains it got a short write and then tar says it exited with error code 1, but it neither exits nor gets out of sync with the tarball. Pretty sure if I gave it a tarball with a big enough file in it the toybox one would exit, the question is what the debian one would do?

And now that mkroot isn't using busybox gzip (and thus needs gzip/bzip2/xz built in to busybox or tar doesn't know what they are and doesn't have the -z and -j command line options), I can enable gzip! Which I haven't quite finished cleaning up and promoting yet because I couldn't use it yet...

April 20, 2019

It's the evening before a holiday that HEB's 24 hour location actually closes for, so they're clearancing the bakery again. (Well, putting the 50% discount stickers on anything that expires tomorrow.) We have a chest freezer. Camped the spawn and spent $60 on many bags of discount baked goods.

Got new lib/env.c infrastructure checked in yesterday, so now I make tar use it.

The reason for going down this tangent isn't just that the shell needs it soon, it's that tar should use vfork(), but you can't independently modify your environment variables after vfork() because it's a common heap. But if I set and reset them in the _host_ before the vfork() and do the normal "leak the variable contents" thing setting the environment before the vfork(), and it sets a dozen-ish variables each file, for an unlimited number of files (how long's the tarball?) it can do bad things to memory. (I could putenv() and track them manually but if I need infrastructure to do that and the shell needs it eventually... So tangent.)

Switched my email to the new laptop today. Thunderbird's file selector can't select hidden directories (you can't type a path in, and the chooser doesn't show hidden directories) so I had to sed the config files by hand to change the path where the new copy of the "Local Folders" live, but luckily they're text files. (Yeah yeah, Linux: Smell the Usability. I think we've all given up on "linux on the desktop" ever happening at this point. I'd just like to avoid Android being as stupid as Firefox.)

(For some reason there was a 2 gigabyte sqlite file lying around with a last updated date of 2017. I'm guessing version skew? Yay freeing up disk space I suppose. The new laptop has a 2 terabyte disk in it, but I'm sure that'll be insufficient at some point.)

April 19, 2019

Spent a chunk of today arguing with Dell's firmware. Might know how to fix it, but haven't convinced myself a display annoyance is worth possible bricking yet. (How do you get the specific right firmware update for an aftermarket laptop? Apparently this thing was the height of technology at the end of 2012, Moore's Law is stone dead at this point.)

April 18, 2019

The behavior of debian's "env" command is... the same naive one I just noticed and was about to fix in toybox:

$ env =blah | grep blah
$ env =blah env | grep blah
$ env =blah /bin/sh
  $ env | grep blah

Bash sanitizes out an environment variable with a blank name, but env doesn't.
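For illustration, the kind of check that would catch this. It's only a sketch, assuming the fix is to stop treating an argument with an empty name as an assignment; is_env_assignment() is a made-up name, not anything in toybox, and whether "=blah" should then be skipped or treated as the command to run is a separate decision.

```c
#include <string.h>

// Return 1 if arg looks like NAME=value with a non-empty NAME, else 0.
// An empty value ("X=") still counts as an assignment.
int is_env_assignment(const char *arg)
{
  const char *eq = strchr(arg, '=');

  return eq && eq != arg;
}
```

With this, "X=blah" and "X=" are assignments, while "=blah" and "ls" are not.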

Sigh, I should modify env to test the new lib/env.c infrastructure. It doesn't _need_ it (it's not persistent, it's fire and forget, nobody cares if it leaks a little memory before printing output or calling exec(), it's limited by the command line and setenv(argv[i]) directly is less memory than strdup anyway). BUT I want the env infrastructure to get a workout.

April 17, 2019

Brought new laptop out to the nice UT courtyard, tried to build mkroot, and... the kernel build failed because it hasn't got flex. Ok, try to phone tether... no networks found in the network gui thing. (It's not using networkmangler, which is great, but it also means I'm less familiar with this one's knobs.)

So, check from the command line, ifconfig says wlan0 is there, maybe I've hit the RF kill switch? Where is it on this laptop... the right side. Accidentally turned it off, turn it back on again and... stack dump in dmesg.

[30254.731639] iwlist: page allocation failure: order:4, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK)
[30254.731651] CPU: 3 PID: 27021 Comm: iwlist Not tainted 4.9.0-6-amd64 #1 Debian 4.9.88-1+deb9u1
[30254.731653] Hardware name: Dell Inc. Latitude E6230/0YW5N5, BIOS A19 02/21/2018
[30254.731738]  [] ? get_page_from_freelist+0x8f0/0xb20
[30254.731742]  [] ? ioctl_standard_iw_point+0x20b/0x3d0
[30254.731779]  [] ? cfg80211_wext_siwscan+0x480/0x480 [cfg80211]
[30254.731785]  [] ? ioctl_standard_call+0x81/0xd0
[30254.731789]  [] ? wext_handle_ioctl+0x75/0xd0
[30254.731793]  [] ? dev_ioctl+0x2a3/0x5b0
[30254.731798]  [] ? sock_ioctl+0x120/0x290
[30254.731802]  [] ? do_vfs_ioctl+0xa2/0x620
[30254.731806]  [] ? SyS_ioctl+0x74/0x80
[30254.731810]  [] ? do_syscall_64+0x8d/0xf0
[30254.731814]  [] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

[30254.731825] active_anon:78161 inactive_anon:78636 isolated_anon:0
                active_file:907684 inactive_file:151303 isolated_file:0
                unevictable:4 dirty:1471 writeback:0 unstable:0
                slab_reclaimable:2798373 slab_unreclaimable:8816
                mapped:16671 shmem:9549 pagetables:4208 bounce:0
                free:38992 free_pcp:23 free_cma:0

Why is it trying to do an order 4 allocation? That's 16 pages (64k) of contiguous memory on an active system that's doing an rsync from a usb drive to the main system? (Second pass of file copying from backups.)

So I have to stop the rsync, sync && echo 3 > /proc/sys/vm/drop_caches, and _then_ toggle the RF kill switch? That's kinda pathetic... ok, and now it's back to finding no networks.

[30900.379840] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[30900.380069] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[30900.380159] iwlwifi 0000:02:00.0: Radio type=0x1-0x2-0x0
[30900.618546] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[30900.618776] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[30900.618866] iwlwifi 0000:02:00.0: Radio type=0x1-0x2-0x0

What does disabled mean here? Is this because the apt-get upgrade yesterday updated the iwlwifi firmware? (I dunno why it did it, the one it installed with from the dvd worked fine. Would it work after a reboot if I'd never suspended?)

Ok, I clicked "disable wifi" in the gui thing, waited 10 seconds, "enable wifi", and then something like 30 seconds later (it toggled the rf kill bit again according to dmesg) hit "scan" and NOW it can see my phone... darn it, AND THEN IT WENT AWAY AGAIN.

This is amazingly brittle and I don't know what the magic incantation is, but it's CLEARLY software being broken here. Ugh, in addition to the driver needing an order 4 allocation and being unable to get it if the system isn't COMPLETELY IDLE, the gui tool is horked: "iwlist wlan0 scanning" shows me a bunch of networks. Lemme see if I can remember how to associate by hand... wow it's congested here, my phone is cell 24 in this list. (Lots of instances of "utexas" and "eduroam", that's a university for you.)

Ok, "iwconfig wlan0 essid Arkleseizure key s:password" ... is not it because it doesn't support wpa passphrase. Which EVERYTHING uses now. Right, let's see, that's the wpa_passphrase command which takes the ssid and the password... is that the same as the essid? Let's try it... I got a 64 byte hex string, which is longer than "key" will accept as an argument. And the iwconfig man page's section on the "key" and "enc" options (what's the difference between them?) talks about registering multiple keys and referring to them by number...? What is this nonsense.

Alright, let's try USB tethering. In dmesg I get:

[32786.261845] usbcore: registered new interface driver cdc_ether
[32786.268344] rndis_host 3-2:1.0 usb0: register 'rndis_host' at usb-0000:00:14.0-2, RNDIS device, de:d3:48:09:5c:0e
[32786.268382] usbcore: registered new interface driver rndis_host

And cdc_ether is presumably the ethernet thing going "hey, right here" and there's no second ethernet interface in ifconfig. Still just eth0, wlan0, and lo. The gui thing is apparently ONLY for wireless, doesn't have any wired control options. Do I need to insmod something myself? Why is this not working? Ok, dig down into /lib/modules/*/kernel/drivers/net/usb and it looks like I need to "modprobe cdc_ether"... which seems to have been a no-op? Ah, it's already there in /proc/modules. And that's the rndis_host stuff I guess? What does /sys/class/net say, that shows a usb0...

AH! My bad. ifconfig -a shows it, usb0 is down and ifconfig only shows up interfaces by default, -a shows all of them. So the only problem is the net app isn't responding to it, so run "dhclient usb0" manually to do dhcp on it and...

Ha! I have net!

And rain and thunder and lightning 20 feet away from my table. Quite the storm, might be here for a while. Of course when I go walking there's a storm. Yes I brought an umbrella, but downpour and lightning seem a bit much for it. Been going for a while, though, starts and stops a lot, maybe I can head out in a gap and make it to another overhang to wait out the next outburst?

Hmmm. Devuan's "Power Manager Plugin" has critical power level set to suspend at 10% but it never triggered, and at 1% (!) I noticed because the power light went solid orange, and I closed the laptop and plugged it back in for a bit (despite the lightning) to have enough power to get home.

Devuan has different bugs than Ubuntu 14.04 did, but the whole "Linux: smell the usability" thing is out in full force. 28 years of Linux and we still SUCK at this.

April 16, 2019

So, new laptop! Installed Devuan Ascii with xfce, and this time the setup is (still derivative of last time):

Install devuan, selecting xfce and leaving most of the options stuff unclicked. Boot into new system.

Fiddle with GUI stuff at the top a lot. I have cpu graph, disk performance monitor, network monitor, workspace switcher (8 desktops in 2 rows of 4), free space checker, DateTime doing "%a, %b %d, %r" in 14 point font, the network "notification area", power manager plugin, pulseaudio plugin, and clipman. Plus I went into settings->panel from the applications menu and told the bottom panel to hide itself "intelligently".

Then click on the battery icon, power manager settings, and suspend when lid is closed, system sleep mode "suspend" for both, on critical battery power suspend, disable "lock screen" checkbox, display power management blank after 10 minutes, sleep 11, switch off 12, reduce brightness after "never".

Then since that didn't stop the stupid screen lock on suspend (physical access to my laptop is game over, don't pretend it isn't until you've fixed badusb and friends), "apt-get remove xscreensaver".

Next up apt-get install aptitude chromium mercurial subversion git-core pdftk ncurses-dev xmlto libsdl-dev g++ make flex bison bc strace diffstat

The hard drive is annoyingly clicking. hdparm -B 254 /dev/sda fixed it, added that to /etc/rc.local. (Wow the /etc directory has a lot of crap in it in devuan 2.0.) Googled for a bit to see if the hard drive parking itself every 2 seconds was worse for its longevity than the vibration of hitting keys and jostling it around the table when the heads aren't parked, but nobody seems to have studies. (Presumably it still has the impact accelerometer that does the emergency park.)

aptitude install -R thunderbird (to get it _not_ to install the "lightning" calendar extension because this isn't outlook).

Set the terminal background color to _actually_ black, not just a dark grey, and make the text color fully white. I'm in enough bad lighting situations as it is, I don't need grey-on-grey in terminal windows. (I also switched from monospace to monospace bold, but I'm not sure it's an improvement? Hmmm... no, don't think so. Switched it back.)

apt-get remove libgnome-keyring0 (which I don't use and causes stupid chromium popups)... and that didn't stop it. Nor did telling chrome never to store passwords, or switching off lots of password things in chrome://flags. And I dunno how to change the xfce pulldown menu to start it with --password-store=basic (what I _want_ is --password-store=none). When I need a darn password I'll enter the darn password, stop trying to "help" here. I NEVER tell my browser to save passwords, it defeats the purpose of passwords. Save a key cookie if you want to do that...

April 15, 2019

Dug up the 2 terabyte hard drive I bought in Milwaukee and walked back to the discount electronics place to try to pay them to install it into the new laptop (exercise!), and they more or less declined. (They'll install stuff I buy from them for free, but trying to hire them to install _my_ stuff is more expensive than buying the part from them.) Oh well, I can do it myself, just didn't want to.

Huh, this one is MUCH less painful than swapping out hardware in my netbook was, that required popping out the keyboard and digging _down_ into the machine, this one the bottom panel comes off and the memory and hard drive are right there. Convenient! (And totally made in china; Dell had nothing to do with the design of this hardware.)

Installing Devuan on it. Unlike the new oversized system76 laptop (which I still have on a shelf) it did not ask for strange binary firmware that's not on the USB stick! Woo! (The easy way to get something to work with Linux is to pick hardware that's several years old.)

Dug up a 2 terabyte hard drive for it.

April 14, 2019

Bought a new laptop! Walking back from Tax place again (this time carrying an umbrella as a parasol) because I had to drop off the check for the bank routing info, and on the way back I stopped in to the "discount electronics" place on Andersen near Lamar, and they had a Dell Latitude E6230 for cheap that's a nice form factor, reasonable processor (core i5), and can hold 16 gigs of ram.

I'm not asking for _that_ much in a laptop. It's a several year old model (2015 I think?), but Moore's Law dying means that matters a lot less than it used to. (Technically the S-curve started bending down around 2001 and the exponent gradually decreased until it's asymptotically approaching 1, meaning the advances these days are linear rather than exponential. The technology's still advancing, but not in a world changing way.)

And unlike anything I've ever seen from System76, THIS one is reasonably sized. (It doesn't QUITE fit into the netbook bag because the extended battery sticks out too much, but would if it didn't so points for trying.)

April 13, 2019

Back in Austin, went to my tax appointment and got a bad sunburn walking back. (Even though I was in the shade of I-35 for at least half of it.) Gotta go back tomorrow to drop off a check. Then I went to natural gardener with Fuzzy. They're out of African Basil.

Rideshare is expensive (between one way to taxes and both ways to natural gardener spent over $50 on it today), but my car is dead and self-driving is coming so I don't really want to replace it if I don't have to. Waymo's Guy In Charge Of That estimates they'd like to charge $1k/year ($85/month) for a flat rate subscription in a municipal metro area, as in your phone's Google Maps app grows a "take me there" button next to "directions" that when pressed turns into a countdown of seconds until your vehicle arrives. They're already prototyping this in Arizona, the tech is ready it's just regulation catching up to allow them _not_ to have someone sitting in the driver's seat "just in case". (Because nothing says paying full attention like somebody _not_ driving. "Driver assist" is an accident waiting to happen, either the human is driving or the human is NOT driving, halfway states are called a "distracted driver".)

To clarify: Google's technology is ready, but they've been working on it for over 15 years now. Tesla's keeps killing people because they suck, and are trying to play catch-up, but keep in mind Musk didn't found Tesla, Martin Eberhard did. Musk acquired it in a hostile takeover with the money he made from Paypal in the dot-com boom. His technology only advances when he buys other companies (like SolarCity and Maxwell) or when he hires people away from them who are already doing things (which doesn't work so well for them and tends to turn into lawsuits).

All the others are still playing catch-up, but everybody's working on it because it's a game of musical chairs. Lots of people are doing parts of the business model the way lots of people were doing parts of smartphones (apple newton, palmpilot, the motorola razr ran apps written in java) before iPhone and Android shipped in 2007.

The thing is, one car per person was always a terribly inefficient model. Individually owned cars are only driven about 4% of the time (parked 96%), even human-driven taxis are driven about 40% of the time (the humans still sleep). Assuming self-driving cars are on the road 10x as much (which is a conservative estimate) you'd need 1/10th as many cars to serve a given metro area (yes even at rush hour, which is about 3 hours each way meaning multiple round trips even without carpooling).

Then add in the fact that an electric car lasts a million miles each before you have to service anything (other than replacing the tires every 30k miles: no air filters, no oil to change, no transmission, the batteries have active liquid cooling so they last a long time...)

So if you're a car company seeing the rise of cheap electric self-driving vehicle fleets, you're playing a game of musical chairs: your industry's manufacturing volume is about to drop by an order of magnitude and there's only enough market to support 1/10 as many car companies as we have today. They're all racing to switch over before their competitors do.

People immediately go "but what if somebody barfs in the car" (then you can report the car soiled in the app and request another, and they know which phone was riding in the car so they can place blame appropriately and prevent a recurrence), and this is why they're doing trials and limited rollouts and building service centers and so on.

The estimates are that the gasoline distribution network will collapse around 2025 when volume falls below fixed costs and the whole mining/shipping/refining/delivery/sale network we have now becomes unprofitable. At that point gas stations stop selling gas and become convenience stores, and gasoline becomes something those who still need it order delivery of (like liquid nitrogen from airgas), and keep their own tank on site. Sufficiently rural areas will be "stuck on dialup" for a couple extra decades, but cities will get rid of parking lots fast: that land's way too valuable when an app-summonable vehicle can pick you up and drop you off from the curb and never need to be parked anywhere but the fleet maintenance depot.

That's why I dowanna get a new car if this is only a couple years away. It's like installing a land line once digital cell phones have arrived, but not yet having a cell tower in range of my house. Or using dialup when cable modems are available, or still having cable TV when you have streaming services. I don't want to own a car, I want the app. I just need coverage to reach where I live.

April 8, 2019

I haven't been posting here as much because I've been posting to the mailing list, today's issue is reestablishing the setenv lifetime rules again so I can reopen the toysh can of worms.

April 7, 2019

Went to see Captain Marvel again, this time with my sister and the niecephews. Shortly before the big fight on the spaceship (the one set to I'm Just a Girl) a guy in the back row was discovered unconscious, and the next 20 minutes the theatre had the lights on while the movie played and everybody was looking at the back of the theatre instead of the screen as the theatre staff asked him loudly if he was diabetic until the EMTs showed up and carried him out.

They didn't pause the movie, but they did give us free passes to see another movie some other time as we left (and told us that he'd had an epileptic seizure but was otherwise fine).

This is why people wear bracelets for this sort of thing. On the one hand I feel bad for the guy, on the other he cost the theatre the revenue from a packed house and my niecephews missed the climax of the movie. (Not the punching spaceships part, but the entire facing down Annette Bening part and the montage the internet will inevitably set to "I get knocked down but I get up again". That's this movie's version of the camera circling the avengers while the theme plays, the punching spaceships bit is denouement.)

April 2, 2019

Submitted 5 ELC talk proposals (I think 3 of them were to "whatever conference they're hiding ELC behind this year", this pairing thing is terrible). Of course it was at the last minute (which due to pacific time was 2 hours later than I thought). I should memorialize them for posterity, but didn't.

Trying to finish and promote tar today so it can go in Android Q, which is mostly filling out the test suite so everything's tested (and fixing what that finds), and SKIP_HOST isn't granular enough.

What I want to say is "some non-toybox versions of this are expected to fail, but it's still a good regression test for us", such as the fact that the gnu/dammit tar can't handle "cat tarball.tgz | tar tv" and mine can. (I can autodetect type on a nonseekable stdin. It was a pain, but I refused to let it _not_ do it.)

But if you extract toybox source onto a mkroot system where the host is toybox and want to run the tests on the host toybox? That should be fine.

April 1, 2019

I hate april fool's day. Trying to stay off twitter.

The gnu/dammit tar has a --full-time but doesn't have --no-full-time, which is annoying because I'm printing --full-time by default. Sigh. I can add the other thing for compatibility, but ow? (If ls does --full-time, why doesn't tar -tv?) Sigh. Ok, implement --full-time just so the test suite can pass TEST_HOST...

Huh, I added a TARHD variable I can set where this test passes the output through "hd >&2" so a hexdump of the created tarballs goes to stderr. That way if they differ, I can figure out what differs. But what I _really_ want is to catch the failure and run the host and target versions through hd _then_, which means I need to be able to register an error handler function. Hmmm... It's a pity bash hasn't got an "is this function defined" check. No wait, that's under the shell builtins... "type -t name". Returns "function" if it's a shell function. Ok...

March 31, 2019

The L records the gnu/dammit tar outputs for long filenames have the permissions and user/group names filled out. They're not needed (they're in the next header and those are the ones that get used), but they're filled out. Meanwhile fields like timestamp are zeroed. There's no obvious pattern to it, I think it's an implementation detail (sequence packets are initialized?) leaking through into the file format.

No, it's worse. The owner/group is always "root" and the permissions are 644. So the field could be zeroed but it's instead nonsense. As with the " " after the checksum, just gotta match the nonsense to get binary equivalent tarballs.

March 30, 2019

I'm writing tar tests, trying to do a proper thorough job of testing tar (which the previous tests didn't really), and I did "tar c --mtime @0 /dev/null | tar xv", which should more or less be ls -l on a char device, but:

--- expected
+++ actual
@@ -1 +1 @@
-crw-rw-rw- root/root 1,3 1970-01-01 00:00 dev/null
+crw-rw-rw- root/root 0 1970-01-01 00:00:00 dev/null

It's showing size, not major, minor. (This is the gnu/dammit one.) I want TEST_HOST to pass, but they're showing useless info here. "Be compatible" is fighting with "do it right". Hmmm...

What does posix say? Hmmm. The last posix spec for tar was 1997, before they removed it (just like cpio, the basis for rpm and initramfs; Posix went off the rails and I'm waiting for Jorg Schilling to die before trying to correct anything). And that says:

The filename may be followed by additional information, such as the size of the file in the archive or file system, in an unspecified format. When used with the t function letter, v writes to standard output more information about the archive entries than just the name.

Great, EXPLICITLY unspecified. Thanks Posix! You're a _special_ kind of useless.

March 28, 2019

Ok, the Embedded Linux Conference and Open Source Summit are colocated in San Diego in August (the Linux Foundation does this to dilute the importance of conferences, it's about like how Marvel had endless crossovers to force you to buy more issues back in the 90's right before they went bankrupt). The CFP closes April 2. I should submit a thing.

Topics. Ummm. I could do an updated 3 waves thing (lots of good links for that, credentials, AO3 is fan run and thus better at what it does, more on that, credentials vs accomplishment, and so on.) I could do a talk on 0BSD, on mkroot, on toybox closing in on 1.0...

March 27, 2019

So, tar paths...

$ tar c tartest/../tartest/hello | hd
tar: Removing leading `tartest/../' from member names
00000000  74 61 72 74 65 73 74 2f  68 65 6c 6c 6f 00 00 00  |tartest/hello...|

It's matching .. sections (the code I'm replacing was just looking at _leading_ ../ which isn't good enough).

$ tar c tartest/../../toy3/tartest/hello | hd
tar: Removing leading `tartest/../../' from member names
00000000  74 6f 79 33 2f 74 61 72  74 65 73 74 2f 68 65 6c  |toy3/tartest/hel|
00000010  6c 6f 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |lo..............|

And the gnu/dammit code is stupid.

$ tar c tartest/sub/../hello | hd
tar: Removing leading `tartest/sub/../' from member names
00000000  68 65 6c 6c 6f 00 00 00  00 00 00 00 00 00 00 00  |hello...........|

_really_ stupid.

Of course figuring out what/how to canonicalize is weird too, because I don't have abspath code that stops when it matches a directory, and there's no guarantee it would anyway rather than jumping right over it. I want the _relative_ path to be right.

Sigh. Compatibility, do what the existing one's doing...

March 25, 2019

Got a heads up from Elliott that auto-merges of external projects into the Android Q branch end on April 3, feature freeze in run up to the release. So if I want to get tar promoted and in, I've got until then.

March 24, 2019

Once again trying to work out if old = getenv("X"); setenv("X", "blah", 1); setenv("X", old, 1); is allowed. Because old is a pointer into the environment space, and setenv replaces that environment variable. Under what circumstances do I need a strdup() in there?

I dug into this way back in 2006 but don't remember the details...

March 18, 2019

Tar cleanup corner case: the gnu/dammit tar fills out the checksum field weird, I kinda dowanna do that but the resulting tarballs won't be binary identical if I _don't_...

Backstory: tar header fields are fixed length records with left-justified ascii contents, padded with NUL bytes. The numerical ones are octal strings (because PDP-7 used a 6 bit byte, we say the machine Ken and Dennis wrote Unix on had 18k of ram but that was 8192 18-bit words of memory).

The "checksum" field is just the sum of all the bytes in the header, and is calculated as if the checksum field itself is memset with space characters. (Then you write the checksum into the field after you've calculated it.) The checksum has 7 digits reserved (plus a NUL) but due to all the NUL bytes in the header, the checksum is almost always 6 digits. So it _should_ have 2 NUL bytes after it... but it doesn't. It has a NUL and a space, ala:

00000090  31 34 31 00 30 31 32 32  36 36 00 20 30 00 00 00  |141.012266. 0...|

The _reason_ for this is historical implementations would memset the field, iterate over the values, and then sprintf() into the field which would add a NUL terminator but not overwrite the last space in the field. And the gnu/dammit tar is either _doing_ that, or emulating it.

I'm not memsetting spaces into the cksum field, I'm starting with 8*' ' and skipping those 8 bytes... but the result is I'm printing out two NUL bytes at the end instead of NUL space. And if you check for binary identical files...

It's _almost_ certain no tar program out there is going to care about this, but if I don't and I use canned tarballs in my tests, TEST_HOST would always fail with the gnu/dammit implementation. (Or possibly busybox, I haven't looked at what that's doing yet.)

March 16, 2019

Oh FSM. I feel I should do a response to LWN's motivations and pitfalls for new "open source" licenses article, but you could just watch my 3 minute rant on there being no such thing as "the GPL" anymore, copyleft fragmentation inevitably increasing as a result, and the need for a reclaimed public domain via public domain equivalent licenses that don't have the "stuttering problem".

Of course there's no mention of 0BSD or similar, they haven't noticed it yet. A lot of people haven't worked this sea change through to a logical conclusion yet, they're still trying to make a better buggy whip because their old one stopped serving their needs. Fighting the last war...

March 15, 2019

That side gig is hanging over me. I want to do the thing for them, it's not hard, but I'm huddling under an "out of service" sign.

March 13, 2019

At Fade's. Well, currently at the McDonald's down the street from Fade's.

Tar has an interesting corner case in autodetecting file type: if it's a seekable file you can read the first tar header block (512 bytes) and if it doesn't start with "ustar" (unix standard tar, posix-2001 and up so an 18 year old format we can pretty much assume at this point, albeit with extensions) then check for compression signatures for gzip and bzip...

At which point, if it's _seekable_ you seek back to the beginning, fork, and pass the filehandle off to gzip or similar. I just redid xpopen() so it can inherit a filehandle from the host namespace as its stdin/stdout. (It can still do the pipe thing: feed it -1 and it'll create a pipe, but feed it an existing filehandle and it'll move it to stdin/stdout of the new process; I should probably have it close it in the parent space too but haven't yet because when you pass along stdin/stdout _those_ shouldn't get closed and is that the only case?)

But if it's _not_ seekable, I have 512 bytes of data I need to feed into the new process, and there's no elegant way to do that. I kind of have to fork another instance of tar with the appropriate -zjJ flag and then have _this_ one spin in a loop forwarding it data through a pipe(2).

Which is awkward, but doable...

March 12, 2019

Packed out of apartment, onna bus to Fade's.

Hey, ubuntu found a new way to fail. Doesn't suspend because kworker/u16 (Workqueue: kmemstick memstick_check [memstick]) failed to suspend for more than 120 seconds, and so the suspend was aborted _after_ I'd closed the lid and put it in my laptop bag, so instead it got VERY HOT.

Bravo, ubuntu. Yes of course the correct thing to do if the memory stick probe hangs for 2 minutes is to MELT THE HARDWARE. Linux: smell the usability!

March 11, 2019

First day where I would be working if I hadn't quit the job. Sitting in the apartment poking at computer stuff. I had a long todo list I haven't done any of yet. Luckily, over the years I've learned that "not doing stuff" is an important part of the process. I need cycle time. Rest, recovery, sleep, staring out windows. I gave up a lot of money to be able to afford _not_ to do stuff today, and am enjoying it.

That said, I should at the very least drop off the "moving out of the apartment" form, and maybe take my bike back to the bike co-op I got it from and go "here, free bike". (It's a vintage Schwinn, it's lovely. Someone will want it as much as I did. Alas, can't easily take it out of state with me.)

Somebody tried to sign up to the mailing list, and I forwarded them to mkroot, but as I told them in the email... "I mostly talk about it on the toybox mailing list. And patreon. And my twitter. And my blog..." (It had a mailing list but I stopped using it after a thing happened. I have a vague roadmap to merge it into toybox and stop doing it as a standalone project, but need to implement route and toysh in toybox first.)

March 10, 2019

And thunderbird filled up all memory, wasn't watching, didn't kill it fast enough, and it locked the machine hard. Had to power cycle. Wheee.

Lost 8 desktops full of open windows, most of which had many tabs. Rebuilding much state. The most irreproducible loss is, of course, all the thunderbird windows where I clicked "reply" to pop open a window to deal with later. Thunderbird keeps no record of that whatsoever. (Kmail would reopen them all when restarted, but alas that was bundled into a giant desktop suite and went down with the ship it was tied to the mast of. Pity, it was a much better mail client than thunderbird. Oh well.)

Once upon a time, Linux had an OOM killer that would kill misbehaving processes if the system was in danger of running out of memory and locking up. People complained that their process might get killed. So the kernel devs neutered the OOM killer so it doesn't work remotely reliably and now the whole system locks up as often as it's saved by the OOM killer, because killing _every_ process is clearly an improvement to killing _a_ process.

Sigh. Lateral progress.

March 9, 2019

Thunderbird's sluggish again so I tried to clean out the linux-kernel folder. Since this is the big machine with 16 gigs of ram and 8 gigs swap, I told it to move 96k messages instead of the usual 20k at a time. It moved all the messages, and then did its Gratuitous Memory Hog thing it always does at the end (because Thunderbird is programmed strangely). It ate all 16 gigs DRAM, worked its way through all 8 gigs swap, and then I called killall thunderbird from the ctrl-alt-F1 console before the machine could hard lock (because the OOM killer doesn't work anymore, no idea why).

And of course when I started it back up, none of the messages it had spent hours copying to the new folder had been deleted from the old one.

Could somebody not crazy write an email client? This doesn't seem hard. Far and away the _most_ annoying thing about thunderbird is when it pops up a pulldown menu or hovertext, and then freezes for 6 minutes doing something where the CPU or disk is pegged, and the darn pop-up follows me when I switch desktops, blocking whatever's behind it.

So now I tried right click delete... and it's moving 96k messages to a trash folder. Sigh. NO, DELETE THEM! NOT MOVE TO TRASH! NOW WHEN THIS CRASHES I'M GOING TO WIND UP WITH _THREE_ COPIES!

It's a good thing this machine has gigabytes of free disk space because this email client is written by idiots. And once you start one of these operations that's going to take 4 hours (and then maybe try to crash the OS again afterwards if you're not babysitting it), there's no way to interrupt it short of kill -9 which would leave the files in who knows what state...

March 8, 2019

Last day at JCI. Stress level: curled into a ball, whimpering.

Sigh. I'd really like to move the Android guys to a more conventional build approach, where the Android NDK toolchain is not just a generic-ish toolchain but is the one used by AOSP, so that 1) you can export CROSS_COMPILE=/path/to/toolchain/prefix- and if your build is cross compile aware it just works, 2) Android isn't shipping 2 slightly different toolchains that do the same thing.

They are reluctant to do this because A) windows, B) they see me trying to apply conventional embedded-ish development to android as weird. (Everybody except them is an app developer. This isn't how you build apps!)

Sigh. I keep going "this reduces to this, just implement the general case and it should work in a lot more situations" and getting "but that's not how we've ever thought of it, you'll confuse people". I get different variants of it from the linux kernel guys, the distro maintainers, embedded developers, the android guys, compiler developers... everybody's in their own niche.

March 7, 2019

I've been doing a review pass of pending/tar.c and adding a bunch of "TODO: reading random stack before start of array" and so on, and I've come to the conclusion I need to change the xpopen_both() api. Because if the child process needs its stdin or stdout hooked up to an existing filehandle, there's no current way to do that.

The way it works now is you pass in an int[2] array and it hooks up a pipe to each one that's zero, and writes the other end of the pipe into that slot (int[0] going to the stdin of the process and int[1] coming from the stdout of the process). But what I _want_ is if I feed an existing filehandle to the process, THAT filehandle should become the stdin or stdout of the process. (So gzip can read from or write to a tarball.)

Also, once upon a time I had strlcpy() which was like strncpy but would reliably add a null terminator and didn't do the stupid (memset the rest of the range after we copied). It was just something like "int i; if (!len--) return; i = strlen(src)+1; memcpy(dst, src, i>len ? len : i); dst[len] = 0;" and it worked fine. But unfortunately BSD had the same idea, and added it to libc in a conflicting way (const const const str const *const) and I think uClibc picked that up, so I switched to xstrncpy() which will error_exit() if the string doesn't fit. Which 99% of the time is what you want: don't silently corrupt data. BUT with tar and the user and group name fields...

Hmmm, except if they don't fit what _do_ we want? Truncating could (theoretically) collide with another name, and if the lookup by name fails we've already got UID/GID. (I did bufgetpwuid but didn't implement a negative dentry mechanism for optimizing _failed_ username lookups...)

Ah, it's using snprintf(), close enough. (I keep confusing that with strncpy, which is stupid and will memset the rest of the space with zeroes for no apparent reason. But snprintf() will just _stop_writing_ at the appropriate spot, leaving a null terminator and not gratuitously molesting the rest of the field.)

March 6, 2019

Last week at work. Totally listless. Paralyzed, basically. I'm stress eating and stress tweeting.

Also, SEI has resurfaced with Probably Money (not yet the same as Actual Money but you never know), and I've mentioned my recruiter found me a side gig (telecommuting getting a medical sensor board upgraded to new driver versions), and I'm kind of annoyed that I quit my $DAYJOB (which paid quite well) so I would have TIME, and that time is already filling up with other less-well-paying work.

I'm totally aware this is a self-inflicted problem, but... dude. I should be better at saying no.

March 4, 2019

Dreamhost has been poking me about renewal for Got the check in the mail today.

(I know way too much about how the sausage is made to be comfortable doing financial transactions online. I'm aware it's silly, and yet...)

March 3, 2019

Poking at toys/pending/tar.c and of course the first thing I do (after a quick scan and some "this sprintf is actually a memset" style cleanups) is build it, make an empty subdirectory, and "tar tvzf ~/linux-4.20.tar.gz". And I get a screen full of "tar: chown 0:0 'linux-4.20/arch/mips/loongson64/common/serial.c': Operation not permitted".

Sigh. This is unlikely to be a small task.

March 2, 2019

Fighting bad Linux userspace decisions.

So top -H is showing the right CPU usage for child threads, but the main thread of a process has the cumulative CPU usage. I _think_ this is because /proc/$PID/stat and /proc/$PID/task/$PID/stat have different data (I.E. the kernel is collating when you read through one API but not reading the same data through another API).

I have a test program that spawns 4 child threads and has them spin 4 billion times in a for(;;) loop, and I just poked it to dprintf(1, "pid=%d") the PID and TID values (to a filehandle so I don't have to worry about stdio flushing for FILE *), and I hit my first problem: glibc refused to wrap the gettid() system call? (What the... the man page bitches about thread IDs being an "opaque cookie" and I'm going "this is legacy crap from back when pthreads was an abomination, before NPTL, isn't it?") Sigh, so use syscall() to call gettid so I have the number I can look in /proc under.

Second problem: the process doesn't _end_ until the threads finish spinning and exit, which means the output doesn't close, so my little pipeline doing:

./a.out | sed -n s/pid=//p | (read i; cat /proc/$i{,/task/$i}/stat)

Is sitting there blocked in read until a.out exits, at which point the cat says the /proc entries don't exist anymore. This is DESPITE the fact that if you chop it at the second | you get the value followed by a newline immediately! It's just that bash's read is blocking trying to get more data AFTER the newline, for reasons I don't understand? (Even read(4096) should return a _short_read_. And yes the "read i;" needs a sleep 1 after it to accumulate enough data to see the divergence reliably, but this bug hits first and that confuses debugging right now.)

This totally needs to be a test case for toysh. My "bash replacement" should get this RIGHT, even if ubuntu's bash doesn't. (I was even desperate enough to check /bin/dash, which also got it wrong in the same way. Well, ok dash didn't understand the curly bracket syntax, but it waited out ./a.out's runtime _before_ getting that wrong.)

March 1, 2019

Two different coworkers basically need the toybox version of a command to fix a problem they're having. One is that busybox's ar can't extract an ipk file, another is a busybox tar bug where if you tar -xC into a subdir that results in broken symlinks (in this case a root filesystem install from initramfs into a mount point where /etc/localtime points to a timezone file that's there in the subdir but the symlink points to the absolute path of where it would be on the final system), busybox tar does NOT chown the symlink. So the symlink belongs to root:root instead of whoever it's supposed to belong to, even though the tar file has the right info.

Alas, I haven't implemented toybox tar and ar because I've been too busy with $DAYJOB. I'm not sure if this is ironic or merely unfortunate. I'd ask Alanis Morissette, but I'm told she had problems with that too.

February 28, 2019

It's the last day of the month and I kept meaning to check if any conference call for papers were expiring, but I just couldn't bring myself to care.

I told my boss at $DAYJOB on monday I'm too burned out to accomplish anything else, but they still haven't let me know when my last day is. They keep saying they're _not_ unhappy with my performance on the morning call, but _I_ am unhappy with my performance.

One of the big differences between my mental health in my 20's and now is I know when I need to bow out for self care. (I often miss when I _should_, but am reasonable about working out when I _need_ to.)

February 27, 2019

I'm doing board bringup for that side gig, and they just emailed me a large explanation of the hardware they need working. I unboxed the new board yesterday and confirmed the bits connect together, but haven't actually powered up the result yet.

My first goalpost on any new board is "boot to an initramfs shell prompt on serial console", at least when I'm trying to understand everything and rebuild it properly from source. Getting that working means:

1) Our compiler toolchain is generating the right output for the board, both in kernel mode and userspace/libc.

2) We know how to install code onto the board and run it. (Whether it's tftp into memory or flash it to spi or jtag or what.)

3) The bootloader is working, running itself and doing setup (DRAM init, etc), then loading and starting our kernel.

4) If we get kernel boot messages then the kernel we built is packaged correctly, has a usable physical ram mapping, and is correctly writing to the serial port.

5) If we can run our first program (usually rdinit=/bin/sh) then the kernel is enabling interrupts properly (the early_printk stuff above is simple spin-and-write-bytes with interrupts disabled, that's why printing fewer early boot messages can speed up the board booting), finding a clock to drive the scheduler, and this is where we verify the libc and executable packaging parts of the toolchain work right (because we're finally using them; often I do a statically linked rdinit=/bin/helloworld first if it's giving me trouble.)

When we're done "I built and ran a userspace program that produced output" means I should be able to build arbitrary other ones, and a toybox shell is the generic universal "do lots of stuff with the board" one, where you can mount /proc and /sys and fiddle with them, load modules, etc. That's basically where you get traction with the board.

When an existing BSP gives you a working Linux reference implementation, most of these steps are probably just isolating and copying what it's doing, but I like to step through and move all that stuff into the "I know what it's doing, or at least where to look it up if it breaks" category on any new board I have to support in a nontrivial way.

Then the next thing is usually digesting the kernel .config into a miniconfig and seeing what's there, coming up with the minimal set of options to do the shell prompt thing and cataloging the rest of them.

February 26, 2019

I'm trying to figure out if my normal response to spam callers is "punching down". I always try to hit the buttons to get through to a human, then say "You spam people for a living. That's sad." and then hang up.

The problem is, I'm doing this to the minimum wage drones in some poverty-stricken rural area who are... doing it for a living. Not the people benefitting from it and collecting 90% of the money from whatever scam it is. But alas, this is the only way I know to push back. (It's not like our current government will do anything about it, not until the GOP finishes imploding, which won't happen until the Boomers die and the fossil fuel companies lose their position as 1/6 of the planet's economy.)

February 25, 2019

Told my boss I'd like to wrap up at work. The money is _lovely_ and this is work I could do in my sleep _if_ I could do it. Unfortunately I've got a variant of writer's block, which is a bit like having a big term paper due and being unable to start because you're so stressed out.

I've been spinning my wheels here so long that I've exhausted my coping mechanisms.

February 22, 2019

How is this page's bit on toybox wrong, let me count the ways:

The Toybox license, referred to by the Open Source Initiative as the Zero Clause BSD license,[7] removes all conditions from the ISC license, leaving only an unconditional grant of rights and a warranty disclaimer.[8] It is also listed by the Software Package Data Exchange as the Zero Clause BSD license, with the identifier "0BSD."[9]

It's not important that it's from toybox, other projects use it too. It was the OpenBSD suggested template license and I got Kirk McKusick's permission to call it zero clause BSD. It doesn't remove _all_ conditions, it removes half a sentence. And SPDX approval came long before OSI, so a better phrasing would be:

The Zero Clause BSD license [7] (SPDX identifier "0BSD"[9]) removes half a sentence from the OpenBSD suggested template license [], leaving only an unconditional grant of rights and a warranty disclaimer.[8]

Anybody want to edit wikipedia[citation needed] to fix this?

February 21, 2019

Still deeply burned out.

VirtualBox's .vdi files provide "sparse" block devices that grow as you use more space in them (up to the maximum size specified at creation time). The ext4 filesystem assumes any block device it's on might be flash under the covers, and attempts to wear level them via round-robin allocation.

Guess how these two interact! Go on, guess!

I set up a new VM, and because my previous one ran out of space I was generous about provisioning it, thinking it would only use the space when it actually needed it. After deleting two other VMs and a DVD iso and trying to figure out why a VM using 60 gigs in the virtual Linux system was consuming 160 gigs on the host...

I had a BAD DAY. And now I need to redo the VM from scratch because even if I could shrink the ext4 partition (the resize tool can grow them while mounted, but not shrink them), I dunno how to tell the emulator to give back the space it would stop using...

Darn it, I was excited about this, but no. The person who pointed me at it said it was a bash test suite that might help me with toysh being a bash replacement. But the readme didn't say what to _do_ to run the bash tests. I figured out that bin/bats was the thing to run, but its output with no arguments was useless and --help didn't really help either. I eventually figured out "bin/bats test" but then it only ran 44 tests and they tested the test suite, not the shell?

At which point I figured out that it's not a shell test, it's test plumbing written _in_ bash. That's useless, I've written and _published_ 2 sets of test infrastructure in bash myself already (one in toybox, one in busybox). That's uninteresting, what's interesting is the _tests_, and this has none. And it's doing the "#!/usr/bin/env bash" thing which is INSANE: why do you trust /usr/bin/env to be there at an absolute path? Posix doesn't require that. Android (until recently) didn't even have a /bin directory. It's /bin/bash even on weird systems like MacOS X. The ONLY place that installs it but puts it somewhere else is FreeBSD, and that's FreeBSD-specific breakage. It's a fixable open source system: drop a symlink and move on. (Just like we all fix /bin/sh pointing to the defective annoying shell on debian.)

February 20, 2019

Upgrades to the su command came up recently, and it's been on my todo list forever: if you want to run a command as an arbitrary UID/GID, it's kinda awkward to do so with sudo or su because both conventionally want to look up a name out of /etc/passwd, and will error out on a uid with no passwd entry even for root. But these days with things like containers, there's lots of interesting UIDs and GIDs that aren't active in /etc/passwd. (And then there's the whole android thing of not having an /etc/passwd and using their version of the Windows Registry instead, because keeping system information in human readable text files is too unixy or something....)

So anyway, I want su -u UID and su -g GID[,gid,gid...] to work, at least for root. And I want to be able to run an arbitrary command line without necessarily having to wash it through a random command shell. And _implementing_ this is fairly straightforward. No, the hard part is writing the help text to explain it, especially if I've kept compatibility with the original su behavior.

A word on the legacy su behavior: way back when setting a user's shell in /etc/passwd to /bin/false or /dev/null was a way of preventing anybody from running commands as that user. Then su grew -s to override which shell you were running as, so this stopped working from a security standpoint. (Besides, if you were running as root you could whip up a trivial C program to do it anyway, but the point was _su_ no longer enforced it.) And it let you specify -c to pass a command line to that shell so su could "run a command as a user" instead of being interactive, so this ability is already _there_ for most users, just awkward to use.

But su has an awkward syntax where it runs a shell and unrecognized options are passed through as options _to_the_shell_. (So the -c thing was kind of accidental at first.) So using su as sudo isn't just "su user ls -l", it's su user -s $(which ls) -l if you don't want to invoke a gratuitous shell in between. And defining new su options means they _don't_ get passed through to the shell.

What would have made sense was a syntax like xargs, where the first command that's not an option stops option parsing for the wrapper. But that's not what they did back circa 1972...

February 19, 2019

Burnout. So much burnout.

When I came to this job a year ago, I was interested in the technology. I was helping get realtime Linux running on an sh4 board. (The larger context was they shipped a Windows CE product back in the 90's, and Windows CE was being end of lifed by Microsoft. So this Microsoft shop was switching to Linux, which I'm all for and happy to help with. As for the sh4 boards, they had a bunch of this hardware installed at customer sites, and a large stock of part inventory to make more boxes with at the factory, so getting Linux running on those was useful to them.)

Coming _back_ in January was because the money was good, it was easy to just keep going, I didn't have another job lined up, and we've still got about half the home equity loan to pay off from riding down SEI.

But this time... they've already built up a reasonable Linux team (including people I know like Rich Pennington of ellcc and Julianne Haugh of the shadow password suite), all the new work is on standard x86 and arm boxes with gigahertz and gigabytes, they're using wind river's fork of yocto's fork of openembedded with systemd ru(n|i)ning everything, the application is still dot net code running on mono talking to a windows GUI app...

And I'm not entirely sure what I'm doing. Not "I don't know how to do this", I mean what am I trying to accomplish? What is this activity _for_?

I'm part of an enormous team where we have over a dozen people in a room for an hour twice a week going over excel spreadsheets reacting to comments on the design of things like "background file transfer" (strangely not rsync) which is somehow a 12 week project for over a dozen people, told "this is what you're doing this week" more or less via the waterfall method. There's an API document, an implementation of this API via gratuitous translation layer with a management daemon using dbus to talk to systemd, and then functions you plug in for a given architecture that the guy who wrote the daemon could have done in a couple hours.

I think this has turned into a "bullshit job". And I am unhappy. The money remains excellent, but... that's pretty much it.

February 18, 2019

If I titled blog posts, this one would be "Tabsplosion is a symptom of overload".

When I say "that's on the todo list", I'm fudging a bit. The toybox todo list does indeed have a todo.txt. And a todo2.txt, todo3.txt, todo/*.txt, and various commandname.txt files with notes on individual commands.

My toybox work directory (for a couple years now) is ~/toybox/toy3, following my convention of doing a git checkout in a directory with the name of the project, so various debris that doesn't get checked into git has someplace to live. This _starts_ as ~/toybox/toybox and there's a ~/toybox/clean for testing that I've committed sane chunks and it builds properly. Eventually so much half-finished cruft builds up in my work directory I clone a clean one and do some "needs to happen fast" project in there, and keep the old one around in hopes of salvaging the old work. (Which, as with viewing bookmarked pages again, never happens. This is why I have so many open tabs, there's a _chance_ I'll get back to the todo item it represents.)

This is how I wound up with toy3. (And in fact a toy4 and toy5 that didn't stick.) Those other directories have their own todo files in them. (Much of which overlaps, but not all.)

And then there's ~/toybox/pending which is full of things like a checkout of Android's minijail, libtommath, jeff's fixed point library from the GPS stuff we did, my old dvpn code (from like 2001), the rubber docker containers tutorial I attended at, a CC0 reference implementation of sha3, snapshots of this and this in case the pages go down, and so on. The todo item is implicit in a lot of those. :)

I also have old tweet threads and blog entries and such that I should collate at some point. A lot of my todo items point to those.

As for the topic sentence, my todo list grows fastest when I don't have time to follow the tangents right now. So I make a note of it and move on.

February 17, 2019

The bus back from Minneapolis left at 9:25pm, and was supposed to get in at 3:30 am but got in at 4am.

I'm still using the giant System76 laptop from 2008, which is 6 years old but has 16 gigs of ram and 8x processor and a terabyte hard drive and is fairly reasonable now that I've gotten a new battery for it, except for 2 things. It's still fairly ginormous, and the hard drive is rotating media so I'm nervous using it in a high-vibration environment. Such as on my lap on a bus for 6 hours, even when there is a working outlet.

A coworker at Johnson Controls (Julianne Haugh, the long-ago author of the Shadow password suite) has a "laptop" that's a tablet with a case and keyboard. Except it's a mac. I want an Android device that does that (and in theory I can add a 128 gig sd card to however much built in storage the sucker has so I should be able to get something reasonable), but every time I actually buy something it's a cheap clearance device like the annual Amazon Fire tablet sales during "prime day", and they're so locked down that it's just not worth the effort to crack them. This is a structural problem: what I'm trying to do with toybox is turn android into a usable general-purpose computing environment you can actually use as a development workstation more or less out of the box, but they're terrified of the "evil butler" problem. (Which isn't _just_ a tablet problem, EFI TPM nonsense does this for PCs, there are periodic LWN articles on that.) You should be able to aftermarket blank 'em, but how do you distinguish that from "an organized crime organization like the NSA or GOP sent a dude into your room for 30 minutes while you're at dinner and now your device serves them not you until they decide to assassinate you"?

Sadly, I haven't installed devuan on the other System76 oversized monstrosity because firmware nonsense and too busy to care. I got email from System76 that they've introduced a laptop to their lineup that _isn't_ visible from space, but I don't trust them. If buying System76 _doesn't_ mean I can just slap an arbitrary Linux distro on it because it's full of magic firmware that never went upstream, what's the _point_? If I have to install a magic distro-specific Linux distro fork, I might as well get a GPD Pocket or something.

February 16, 2019

Hanging out with Fade in Minneapolis. I have deployed heart-shaped butterfingers at her. (It's her favorite candy bar, and there was a sale.)

Yay, the github pull request adding 0BSD to the license chooser got merged!

This means I have developed just enough social skills to disagree with someone about how to help without pissing them off to the point they no longer want to help! (Although it's still a close thing, I wouldn't say I'm _good_ at this. I'm still far too easily irritated and have to really _push_ to compromise. In this case that would mean swallowing my principles and editing a wikipedia page directly.)

February 15, 2019

There are over 100 toybox forks on github. I did not expect that. Hmmm... The most forked of which just added a logo and half an "rdate" command, back in 2016...

The downside of 0BSD licensing is when you find a nice patch in an external repo that wasn't submitted upstream (or if it was, I missed it), I'm nervous about merging it because forks of toybox are not actually required to be under the same license.

In this case the repo it's checked into still has the same LICENSE file and no notes to the contrary, and I can probably rely on that, but I'm still nervous and like to ask. Submissions to the list mean they want it in, which means it has to be under the right license to go in. The submission _is_ the permission grant, the specific wording is secondary.

The intent of the GPL was to force you to police code re-use: if you accidentally sucked GPL code into your project, you had to GPL your project. (In reality you just as often had to remove it again and delete the offending version, as Linux did with the old-time unix allocation function Intel contributed to the Itanic architecture directory back during the SCO trial. Solving infringement via a product recall and pulping a print run has plenty of precedent.)

Then GPLv3 happened and "the GPL" split into incompatible versions, and suddenly you had to police your contributions just as hard, your GPLv2 or later project couldn't accept code from GPLv3 or GPLv2-only sources, and the easy thing to do was break GPLv2-only. These days there's no such thing as "The GPL" anymore, thanks to the FSF. "The GPL" fragmented into three main incompatible GPL camps (GPLv2 and GPLv3 can't take code from each other, and the dual license of "GPLv2 or later" can't take code from either one), and then there's endless forks like Affero GPL complicating it further. This means there is no longer a "universal receiver" license covering a united pool of all copyleft code into a single common community of reusability, which is why copyleft use has slowly declined ever since GPLv3 came out. These days with GPL code you have to police in both directions, incoming _and_ outgoing code.

0BSD goes the other way from the glory days of "The GPL": you have to be careful about accepting contributions (and I'm more paranoid than most about that, having been involved in more copyright enforcement suits than any sane person would want). But what that buys you is the freedom for anyone wanting to reuse your code elsewhere to just do it, whenever and wherever however they like. No forms to fill out, no signs to post, have fun. They don't even have to tell me if they did it. (The internet is very good at detecting plagiarism, I'm not worried about that.)

A fully permissive license holding nothing back is the modern equivalent of placing the code into the public domain. The Berne convention grants a copyright on all newly created works whether you want it to or not (the notice is just for tracking purposes of _who_ has the copyright, so you're not in the "the original netcat was written by 'hobbit', how do I get in touch with 'hobbit' or their estate?" situation), but there's no enabling legislation for disposing of a copyright. You can't STOP owning a copyright, except by transferring it to someone else.

And thus the need for public domain equivalent licensing. You can't free(copyright) but you can work out a solution.

February 14, 2019

Date is funky. The gnu/dammit date didn't implement posix, and busybox gets it wrong. Time zones changing names because of daylight savings time.

Testing day of the week. Found a hack. Coded it up. Went to test it.

$ ./date -D %j -d 1
Sun Jan  0 00:00:00 CST 1900
landley@halfbrick:~/toybox/toy3$ busybox date -D %j -d 1
Thu Feb 14 00:00:00 CST 2019


The C API for this is kinda screwed up too, although we need a new one that handles nanoseconds anyway.

February 13, 2019

The biggest sign that "const" is useless in C is that string constants have been rodata forever, but their _type_ isn't because that would be far too intrusive.

Putting "const" on local variables or function arguments doesn't affect code generation (which has liveness analysis). It can move globals from the "data" segment to the "rodata" segment, which is nice and the compiler doesn't get without whole-tree LTO because the use crosses .o boundaries, but everywhere else it just creates endless busywork propagating a useless annotation down through multiple function calls without ever affecting the generated code.

I periodically recheck on new generations of compiler to see if it's _started_ to make a difference, but I don't see how it can because liveness analysis already has to happen for register allocation/saving/restoring, and that covers it better than manual annotation can? In this respect "const" seems like "register" or non-static "inline", ala "Ask not for whom ma bell tolls: let the machine get it".

Sadly, even though I do add "const" to various toybox arrays to move them into rodata, the actual toy_list[] isn't const because sticking "const" on it wants to propagate down into every user through every function argument (otherwise it's warning city and in fact errors out about invalid application of sizeof() to incomplete types when all I did was add "const" in two places).

February 12, 2019

Phone interview with the side gig, I'd get to poke at a new architecture (we are the knights who say nios) which qemu has a thing for! But no musl support for it, and Linux support is out of tree? Really? (A whole unmerged architecture that people are still using?) It's frustrating there's no easy way to get qemu-system-blah to tell you what it provisions a board emulation with. (How much memory, I/O controllers, disks, network, USB...)

It would be nice if "qemu-system-nios -M fruitbasket --whatisit" could say these things. The board has to _know_ them, somehow. Maybe through the device tree infrastructure? I might try to teach it, but all my previous qemu patches languished unmerged for years. Not worth the effort.

February 8, 2019

Very very tired. Went off caffeine monday but it's 4 days later and still tired. Burned out, half days yesterday and today.

I turned down a job in Minnesota a recruiter offered me. 20% less money isn't a deal breaker, but... they're not on the green or blue lines? It's an hour and a half each way to Fade's via public transit (green line, bus, then walk) so I'd need to get an apartment near the work site to avoid a longish commute from the university (and Fade), and they're in some sort of suburban industrial park where there are family houses but no efficiency apartments? And this employer moves to seattle in june anyway.

Contracting company at the recruiter I got the JCI job through wants me to skype with somebody for evening and weekend jobs. It would pay off the home equity loan faster...

February 6, 2019

I'm trying to build Yocto in a fresh debootstrap. You'd think this would be documented, but it's a bit like the "distros only build under earlier versions of itself" problem, because Yocto is a corporate pointy-haired project and Red Hat is Pointy Hair Linux.

As a first pass I want to run a yocto instance under qemu, but when I downloaded it yocto wanted me to install a bunch of packages like "makeinfo" that I don't want on my host system. Hence debootstrap chroot.

So install debootstrap (I used apt-get on ubuntu), then the wiki instructions say the setup is:

debootstrap stable "$PWD/dirname"

Where "stable" is the release name, next argument is the directory to populate, and the third is the repository URL to fetch all the packages and manifest data from.

So clone yocto (git clone git://, checkout the right branch (current stable appears to be "thud"), and then "source oe-init-build-env" and...

# As root inside the chroot. (The mount line is shorthand: mount each of these.)
mount /proc /sys /run
apt-get install locales &&
echo en_US.UTF-8 UTF-8 >> /etc/locale.gen &&
locale-gen &&
update-locale LANG=en_US.UTF-8
# Then as the build user, in the shell su starts:
su - user
cd /home/poky &&
source oe-init-build-env &&
LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 bitbake core-image-minimal

What on earth is a uninative binary shim? All I can find is this and it's at best "related". It's downloading a binary it has to run on the system, and can't build from source. So much for building yocto on powerpc or sh4 or something. Thanks yocto!

Python 3 refuses to work right if you haven't got a UTF8 locale enabled, and yocto's bitbake scripts explicitly check for this and fail... but don't say how to fix it. So I read the python docs and downloaded the python 3 source code. Python's getfilesystemencoding() is calling locale.nl_langinfo(CODESET) (at least on unix systems), which comes from langinfo_constants[] in _localemodule.c in the Python3 source...

Right, you have to install the "locales" package, then run locale-gen, but the online examples showing how you can feed it a locale on the command line are wrong (including the one in the "Setting up your chroot with debootstrap" section of the ubuntu wiki), it ignores the command line, you have to edit the locale.gen file to add the locale you want, then you need to update-locale to get it to use it, and THEN you can set the LC_ALL environment variable.

Darn, yocto's parallelism ignores "taskset 1 cmdline...". It's building on an 8x SMP machine so it's trying to do 8 parallel package downloads through phone tethering, and the downloads keep timing out and aborting. Hmmm... Google, google... It's bitbake controlling this, I can set the environment variable BB_NUMBER_THREADS to the number of parallel tasks.

Ok, core-image-minimal is currently building gnome-desktop-testing and libxml2. I object to 2 of the 3 words in this target name. I'll give them "image". Yeah, I accept that this is probably an image. But gnome-desktop-testing is neither core, nor minimal.

February 5, 2019

Doing release cleanup on sntp.c I hit the fact that android NDK doesn't have adjtime(). Grrr. I dowanna add a compile-time probe for this, and unfortunately while I have USE_TOYBOX_ON_ANDROID() macros to chop out "a" from the optstr, I never did the SKIP_TOYBOX_ON_ANDROID() macros (only include contents if this is NOT set) because I haven't needed them before now.

Sigh, I can just #define adjtime to 0 in lib/portability.h. It's a hack, but android isn't using this anyway (they presumably set time from the phone baseband stuff via the cell tower clocks, not via NTP). It doesn't make the whole code stanza drop out like making FLAG(a) be zero would (then the if(0) triggers dead code elimination), but... I wanna get a release out already, it was supposed to happen on the 31st.

February 4, 2019

Ok, toybox release seriousness. What do I need to finish up to cut a release...

SNTP is the main new command and I've already used the "Time is an illusion, lunchtime doubly so" Hitchhiker's Guide quote. Oh well.

I've got an outstanding todo item from the Google guys about netcat, but it's a bug I found so I haven't quite been prioritizing it. (As in nobody else reported this bug to me, so it's not holding anybody else up.) Still, I got the ping (once they know about it, they wanted it fixed)...

February 3, 2019

Greyhound topped itself on the bus ride back to Milwaukee. Of course it left most of an hour late, when we got on it hadn't been cleaned (my seat's drink holder had an empty coke bottle in it), for the first time in my experience they checked photo IDs (and the woman behind me couldn't get on the bus because she hadn't brought hers, bus left without her), somehow 2 stops later every single seat was full even though they'd left a through-to-chicago passenger behind at the first stop, the outlets didn't work for the first 2 hours, and the heat was stuck on full the entire time and was somewhere over 80 degrees. (Eventually they opened the emergency exits on top of the bus and left them open so we wouldn't die, but it was never comfortable.) Around 6pm the bus tracker web page decided that before the next stop our bus would travel back in time to 1pm and continue on to retroactively reach chicago around 3:30 pm (going something like 200 miles per hour along that part of the route), and we were kind of looking forward to it by that point but alas, we were disappointed. Then they switched drivers in Madison, and the new driver started heading south straight to Chicago and had to BACK UP to go to Milwaukee when enough people checking Google Maps noticed and yelled at them. Over the intercom the driver claimed to have "missed an exit", and threatened to pull over and let anybody who complained out on the side of the road (we were in Janesville at that point, 40 miles south along I-90), and then drove back north (reconnecting with I-94 at Johnson's Creek) instead of taking I-43 diagonally to our destination. According to phone speedometer apps, on the trip north (along non-interstate roads) the bus sometimes got up to 55 miles per hour, but averaged less than that.

Still, I arrived in Milwaukee only 2 hours late. Not my worst greyhound trip, but still memorable. (Beats the trip _to_ minneapolis where the driver intentionally triggered feedback on the intercom six times and said "wakey wakey" between each one as we got in around 1:30 am.) I'm told Greyhound was an oil company ploy to discredit travel by bus and encourage individual driving instead. Given that the "buy up the busses and destroy them to promote freeways" plot in "Who Framed Roger Rabbit" is the part of the movie based on real events (in our world, they did it and won)...

There's also a significant element of "punishing people for being poor" going on here. I'm taking the bus not just because it's cheaper, but because the shortage of direct flights from milwaukee to minneapolis gives me a lot fewer departure options, and even with a direct flight the "arrive 2 hours early at an airport many miles south of town" plus the minneapolis airport requiring multiple transfers to get to Fade's apartment via public transportation (meanwhile greyhound is right on the Green Line, which lets off about 500 feet from Fade's apartment)... end result is the bus gets me there about as fast as flying, and if I'm lucky I can work the whole way. The bus terminal's a 15 minute walk from work without having to opt out of the Porno-Scanners for the Freedom Grope.

But there's very very strong signaling "this is for the Poors, you shouldn't be here if you have any other choice, we punish you now"... ("We" being "republicans", which is a "we" I personally am very much NOT a part of even when I'm not hanging out with the tired poor huddled masses yearning to breathe free that they despise so much.)

February 1, 2019

Our story so far: I got the record-commands plumbing checked into toybox and hooked up to mkroot, and along the way I found and fixed a sed bug that was preventing commands from building standalone with toybox in the $PATH. (The regex to figure out which toys/*/*.c file this command lives in was returning empty, because -r wasn't triggering.)

So I fixed that, got the record-commands wrapper hooked up, built everything, and... all the targets built? Except I just fixed _sed_ and I knew the kernel build break was a _grep_ bug because replacing the airlock's grep symlink with a link to the host's grep made the build work! (I often do "what commands changed recently" guesses like that before trying to narrow it down systematically...)

Sigh. I pulled linux-git to a newer version so I'm not quite testing the same kernel source, or was it 4.19 or 4.20 I was testing? I hate when things start working again when I DIDN'T FIX THEM, it just means I lost a test case and whatever loose flakiness it revealed is still there but has gone back into hiding. It's possible switching grep versions changed something that got fed into sed, but that's still a bug: the output should be the same.

Darn it, now I've got to waste time figuring out how to break it again the right way.

January 31, 2019

Bus to Minneapolis so I can spend my birthday tomorrow with Fade.

I emailed Linus about arch-sh not booting, he pointed me at a pending fix that hadn't quite made it into mainline yet, and I confirmed it fixed it for me, but oddly has both my emails but not Linus's in between?

Yesterday's toybox build break wasn't a grep bug, it was a sed bug, which broke toybox building anything with toybox in the $PATH. (The regex to figure out which toys/*/*.c file this command lives in was returning empty, because -r wasn't triggering.) Apparently I haven't got a tests/sed.test that checks "does -r do anything".

January 30, 2019

It's -20F out. The expected high is -7. I got permission to work from home today. (Mostly poking at yocto and going "huh".)

There's some sort of bug in grep that's breaking the kernel build, but I haven't reduced it to a test case yet, and what I used to use for this sort of thing in aboriginal linux was my old command line logging wrapper. So I spent most of a day getting the command line wrapper logging merged into toybox and integrated into mkroot, and... the toybox build is broken by the same grep bug, which means the logging wrapper install won't work in the context of the airlock (I.E. I can't build toybox with toybox in the $PATH, due to the bug I'm trying to _diagnose_).

Going back to bed.

January 28, 2019

It's too cold. And we have 8 inches of snow. My normal 20 minute walk to work (12 if I hurry) took 35 minutes today, including helping push a stuck car out of an intersection (along with a snowplow driver who got out to push on the other side).

When I got in only two coworkers I recognized were here. I'd go home early, but I'm already here and outside is the problem.

January 26, 2019

Busy week at work, wasn't sleeping well. Meant to spend today working on toybox release, but spent it recovering instead.

The big overdue thing at work is "timesync", which is where the SNTP stuff comes in. Back in late October we tried to figure out how the box keeps its clock up to date: it was close enough to just doing standard NTP that people had glossed it over as NTP... but not quite.

First of all, it's using SNTP ("Simple Network Time Protocol"), which is a subset of the NTP protocol (same 48 byte UDP packets with fields in the same place) that oddly enough has its own set of RFCs, and then in NTPv4 it all got bundled into one big SNTP+NTP RFC that's more or less illegible. So I went back to the earlier ones and am pretty much just implementing the old stuff and asking wikipedia[citation needed] whether it's safe to ignore whatever they changed.

An SNTP client can read data from an NTP server (it just doesn't care about several of the fields), but an NTP client can't read from an SNTP server (the fields SNTP doesn't care about are zeroed), and windows "NTP servers" tend to be SNTP. So if you use the Linux NTP client with a windows server, it doesn't work. (That took a while to figure out, and started us down this whole tangent.)
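For reference, here's a rough sketch of what "same 48 byte UDP packets with fields in the same place" means from the client side. This is not the toybox sntp code: the function names are made up for illustration, and the constants (client request byte 0x23 for version 4 mode 3, transmit timestamp at byte 40, 2208988800 seconds between the 1900 NTP epoch and the 1970 unix epoch) are my reading of the RFCs, so check them before relying on this.

```c
#include <stdint.h>
#include <string.h>

#define NTP_PACKET_LEN 48
// Seconds between the NTP epoch (1900-01-01) and the unix epoch (1970-01-01).
#define NTP_UNIX_DELTA 2208988800LL
// Byte offset of the 64 bit transmit timestamp within the packet.
#define NTP_XMIT_OFFSET 40

// Fill a 48 byte buffer with a minimal client request:
// leap indicator 0, version 4, mode 3 (client), everything else zeroed.
void sntp_request(unsigned char *buf)
{
  memset(buf, 0, NTP_PACKET_LEN);
  buf[0] = (0<<6)|(4<<3)|3;
}

// Read a big endian 32 bit value regardless of host byte order.
static uint32_t peek_be32(const unsigned char *p)
{
  return ((uint32_t)p[0]<<24)|((uint32_t)p[1]<<16)|((uint32_t)p[2]<<8)|p[3];
}

// Convert the 64 bit timestamp at byte offset "off" (32 bits of seconds
// since 1900, then 32 bits of binary fraction) into unix milliseconds.
long long sntp_to_unix_ms(const unsigned char *buf, int off)
{
  long long secs = peek_be32(buf+off);
  uint64_t frac = peek_be32(buf+off+4);

  return (secs-NTP_UNIX_DELTA)*1000+(long long)((frac*1000)>>32);
}
```

An SNTP client only has to send that one nonzero byte and read back a timestamp or two, which is why it doesn't care when a server leaves the other fields zeroed.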

The box needs to be able to act as an sntp client (sntp not ntp because some existing installs use the windows server), and it needs to be able to act as an ntp server (possibly sntp would be good enough because the downstream boxes are also running our software, but nobody seems to have _written_ an sntp server for Linux, because full NTP server works for SNTP client). And then it's got multicast.

Multicast? Yeah, there's a multicast variant in the sntp RFC, and JCI implemented it in old stuff (back in the 90's), but it's not working for some reason and it's .NET code which is a language I don't know (which isn't entirely a blocker but does slow me down) and which I haven't got a build environment for (which is the real blocker). And the ISC reference implementation in C doesn't appear to do multicast (because it's not 1996 anymore).

Note: Napster pretty much killed off Multicast starting around 1999. No podcasts use multicast. Youtube, Netflix, Hulu, and Amazon Prime do not use multicast. The original use case for multicast was "all that" and when it arrived it didn't, which means there isn't really a use case out there for it. The Mbone shut down years ago. Wikipedia[citation needed] says it's still used inside some LANs to do hotel televisions and stuff, but it's not routed through the wider internet anymore, and there really isn't a modern userbase for it, just the occasional LAN-local legacy install.

Instead we got MP3 and MP4 compression which shrinks data to 1/10 of its original size but means a single dropped packet is fatal. (As you can see with HDTV broadcasts "smearing" when the signal is marginal; and that's with a lot of effort put into implementing recovery!)

But JCI wants multicast because the old one they're replacing did multicast and they want to sell the Linux image as a strict upgrade to the WinCE image on the same hardware, without a single dropped feature. And long long ago their salesbeings pushed multicast as a Cool Thing We Can Do. So I wound up reading the RFC and writing a new one in C.

P.S. Although there isn't a Linux SNTP server, there _is_ a Linux SNTP client. It's one of the binaries the ISC source tarball _can_ build, but generally doesn't. I'm trying to convince buildroot to enable it. I suspect this was last tested by an actual human a decade ago, but we'll see...

January 23, 2019

Added multicast support to the sntp stuff. Should probably not name the multicast enabling function leeloo_dallas() but I've had enough sleep deprivation lately that's the sort of name I'm using. (Look, my brain takes the word "multicast", sticks a fifth elephant reference on the front and sings the whole thing to camptown races (doo dah, doo dah). When I'm tired enough this sort of thing leaks out into the outside world.)

All the config is on the command line: if you "sntp" it queries the server, prints the time, and how off the current clock is. Adding -s sets it, -a sets it via adjtime().

I initially had it so you could list as many servers as you liked on the command line and it would iterate through them, but if it switches between ipv4 and ipv6 I'd have to reopen the socket and I dowanna.

January 20, 2019

Ok, I need record-commands from Aboriginal Linux (which is built around wrappy.c), and rather than just dumping them into scripts/ I want to break that up into make/ and tests/harness...

Except that directory also has bloatcheck and showasm (halfway between build and testing), and which generates documentation (is that build?) and I have a todo item to split up into a script that generates the headers and a script that builds the .c files. I think all the second half of is using from the first half is the do_loudly() function (which turns a command's output into a single dot unless V=1 is set)...

January 19, 2019

Working on sntp, and FreeBSD build/testing.

January 18, 2019

Darn it, poking at mkroot and I updated toybox to current git and swapped in "test" with the newly promoted toybox version, and the Linux kernel build is breaking on all architectures. And it's a funky one too, even on a -j1 build it goes:

  LD      vmlinux
  SORTEX  vmlinux
make: *** [vmlinux] Error 2

That provides no information about what went WRONG! Thank you make.

Which means I need to dig up my old command line wrapper from Aboriginal Linux; I should probably stick it in the toybox scripts/ directory, except that's getting pretty crowded with build and test infrastructure. (I provide make wrappers as a gui and "make help" lists the options but DEPENDING on make is uncomfortable, it would be nice if running stuff directly was easy to not just do, but figure out at a glance...)

I should split scripts/ up somehow. I can move the make stuff into a make/ subdirectory, but then scripts/ isn't all the scripts so shouldn't be called that. The problem is "tests" is a bunch of *.test files, one per command, and I'd like to keep that accessible and clean. It's already got a tests/files directory under it that's a bit awkward, but manageable. I could put tests/harness under there with the infrastructure part, but then running it would be tests/harness/ which is awkward. I could put "harness" at the top level but then it's much less obvious what the name means. Hmmm... tests/commands/sed.test? A top level tests directory with _three_ things under it?

Maybe I should add symlinks to the top level, ./ and ./ pointing into the appropriate subdirectory where the infrastructure lives...

Sigh. Naming things, cache invalidation, and off by one errors remain the two biggest problems in computer science.

January 17, 2019

Human reaction time is measured in milliseconds, plural. A 60fps frame rate is a frame every 17 milliseconds. Computer reaction times are measured in nanoseconds. A 1ghz processor is advancing its clock once per nanosecond.

Those are pretty much the reason to use those two time resolutions: nanoseconds is overkill for humans, and even in computers jitter dominates at that level. (DDR4 CAS latency's like 15 nanoseconds, an sh4 syscall has an ~8k instruction round trip last I checked, even small interrupts can flush cache lines...) Meanwhile milliseconds aren't enough for "make" to reliably distinguish which of two files is newer when you call "touch" twice in a row on initramfs with modern hardware.

64 bits worth of milliseconds is 584 million years, so a signed 64 bit time_t in milliseconds "just works" for over 250 million years. Rich Felker complained that multiplying or dividing by 1000 is an expensive operation (doesn't boil down to a binary power of 2 shift), but you've already got to divide by 60, 60, and 24 to get minutes, hours, and days...

Using nanoseconds for everything is not a good idea. A 32 bit number only holds 4.2 seconds of nanoseconds (or + or - 2.1 seconds if signed), so switching time_t to a 64 bit number of nanoseconds would only about quadruple the range of a 32 bit count of seconds. (1<<31 seconds is just over 68 years, 1970+68 = 2038 when signed 32 bit time_t overflows. January 19 at 3:14 am, and 7 seconds.)
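The range arithmetic is easy to check. Here's a small sketch (the helper names are mine, not anything in toybox) that works out how long a counter of a given width lasts at a given tick rate, assuming a 365.25 day year. It uses doubles because the exact last digit doesn't matter here.

```c
// Julian year: 365.25 days of 86400 seconds each.
#define SECONDS_PER_YEAR 31557600.0

// Seconds an unsigned counter with this many bits can count before it
// wraps, at ticks_per_second (1 for seconds, 1000 for ms, 1e9 for ns).
// For a signed counter pass one bit less to get the range on each side.
double range_seconds(int bits, double ticks_per_second)
{
  double ticks = 1;

  while (bits-- > 0) ticks *= 2;

  return ticks/ticks_per_second;
}

// Same thing expressed in years.
double range_years(int bits, double ticks_per_second)
{
  return range_seconds(bits, ticks_per_second)/SECONDS_PER_YEAR;
}
```

With that, range_years(64, 1000) comes out around 584 million, range_seconds(32, 1e9) is about 4.29, range_years(31, 1) is just over 68, and range_years(63, 1e9) is roughly 292.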

Splitting time_t into a structure with separate "seconds" and "nanoseconds" fields is fiddly on two levels: keeping two fields in sync (check nanoseconds, then check seconds, then check nanoseconds again to see if it overflowed between the two and you're off by a second), _and_ the fact that you still need 64 bits to store seconds but nanoseconds never even uses the top 2 bits of a 32 bit field, but having the seconds and nanoseconds fields be two different types is really ugly, but guaranteed wasting of 4 bytes that _can't_ be used is silly, but if you don't a 12 byte structure's probably going to be padded anyway...

And computers can't accurately measure nanoseconds: A clock crystal that only lost a second every 5 years would be off by an average of over 6 nanoseconds per second, and that's _insanely_ accurate. Crystal oscillator accuracy is typically measured in parts per million, each of which is a thousand nanoseconds. A cheap 20ppm crystal is off by around a minute per month, which is fine for driving electronics. (The skew is less noticeable when the clock is 37khz, and does indeed produce that many pulses per second, and that's the common case: most crystals don't naturally physically vibrate millions of times per second, let alone billions. So to get the fast rates you multiply the clock up (double it and double it again), which means the 37000.4 clock pulses per second becomes multiple wrong clock pulses at the higher rate.)
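Those drift numbers are just ratios, so here's a quick sketch for checking them. The function names are made up for this example, and the 1ghz target in the usage note is an assumed figure for illustration (the text above doesn't name a target rate).

```c
// Julian year: 365.25 days of 86400 seconds each.
#define SECONDS_PER_YEAR 31557600.0

// Average error in nanoseconds per second for a clock that gains or
// loses seconds_off over a span of span_seconds.
double drift_ns_per_second(double seconds_off, double span_seconds)
{
  return seconds_off/span_seconds*1e9;
}

// Seconds of accumulated error after this many days for a crystal
// that is off by ppm parts per million.
double ppm_drift_seconds(double ppm, double days)
{
  return ppm/1e6*days*86400;
}

// Wrong ticks per second once a clock that should run at nominal_hz
// (but really runs at actual_hz) has been multiplied up to target_hz.
// The fractional error stays the same, there are just more ticks.
double multiplied_error_ticks(double nominal_hz, double actual_hz,
  double target_hz)
{
  return (actual_hz-nominal_hz)/nominal_hz*target_hz;
}
```

So drift_ns_per_second(1, 5*SECONDS_PER_YEAR) is a bit over 6.3, ppm_drift_seconds(20, 30) is just under 52 seconds, and multiplied_error_ticks(37000, 37000.4, 1e9) says a 37khz crystal producing 37000.4 pulses per second is off by somewhere around ten thousand ticks every second once it's multiplied up to 1ghz.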

The easy way to double a clock signal is with a phase locked loop, a circuit with a capacitor and a transistor in a feedback loop that switches from "charging" to "discharging" and back when the charge goes over/under a threshold, so it naturally swings back and forth periodically (which is trivial to convert to a square wave of high/low output as it switches between charging and discharging modes). The speed it cycles at is naturally adjustable: more input current makes it cycle faster because the capacitor's charging faster, less current makes it cycle slower. If you feed in a reference input (add an existing wave to the input current charging the capacitor so it gets slightly stronger/weaker), it'll still switch back and forth more or less constantly, but the loop's output gradually syncs up with the input as long as it's in range, which smooths out a jittery input clock and gives it nice sharp edges.

Or the extra input signal to the PLL can just be quick pulses, to give the swing a periodic push, and it'll sync up its upswing with that too. So to double a clock signal, make an edge detector circuit that generates a pulse on _both_ the rising and falling edges of the input signal, and feed that into a phase locked loop. The result is a signal switching twice as fast, because it's got a rising edge on _each_ edge of the old input signal, and then a falling edge halfway in between each of those. Chain a few doublers in sequence and you can get it as fast as your transistors can switch. (And then divide it back down with "count 3 edges then pulse" adder-style logic.)

But this also magnifies timing errors. Your 37khz clock that's actually producing 37000.4 edges per second becomes multiple wrong nanosecond clock ticks per second. (You're still only off by the same fraction of a percent, but it's a fraction of a percent of a lot more clock pulses.) Clock skew is ubiquitous: no two clocks EVER agree, it's just a question of how much they differ by, and they basically have _tides_. You're ok if everything's driven by the same clock, but crossing "clock domains" (areas where a different clock's driving stuff) they slide past each other and produce moire patterns and such.

Eventually, you'll sample the same bit twice or miss one. This is why every I/O device has clock skew detection and correction (generally by detecting the rising/falling edge of signals and measuring where to expect the next one from those edges. Of course you have to sample the signal much faster than you expect transitions in order to find the transitions, but as long as the signal transitions often enough it lets you keep in sync. And yes this is why everything has "framing" so you're never sending an endless stream of zeroes and lose track of how MANY zeroes have gone by, you are periodically _guaranteed_ a transition.).

Clock drift isn't even constant: when we were working to get nanosecond accurate timestamps for our synchrophasors at SEI, our boards' thermally stabilized reference clock (a part we special-ordered from germany, with the crystal in a metal box sitting on top of a little electric heater, to which we'd added half an inch of styrofoam insulation to keep the temperature as constant as possible and then put THAT in a case) would skew over 2 nanoseconds per second (for a couple minutes) if somebody across the room opened the door and generated an _imperceptible_ breeze. (We had a phase-locked loop constantly calculating the drift from GPS time and correcting. And GPS time is stable because the atomic clocks in the satellites are regularly updated from more accurate atomic clocks on the ground. In the past few years miniature atomic clocks have made it to market (based on laser cooling, first demonstrated in 2001), but they're $1500 each, 17 cubic centimeters, and use 125 milliwatts of power (thousands of times the power draw of the CMOS clock in a PC; not something you run off a coin cell battery for 5 years).)

Sigh. Working on this timing SNTP stuff, I really miss working on the GPS timing stuff. SNTP should have just been milliseconds, it's good enough for what it tries to do. In toybox I have a millitime() function and use it for most times. (Yes another one of my sleep deprivation names. "It's millitime()". And struct reg* shoe; in grep.c is a discworld reference. I renamed struct fields *strawberry in ps.c already though.)

Rich Felker objected that storing everything in milliseconds would mean a division by 1000 to get seconds, and that's expensive. In 2019, that's considered expensive. Right...
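For scale, here's what the "expensive" operation looks like, as a Python stand-in rather than the actual toybox C code. I'm assuming a monotonic clock as the time source, and split_ms() is a name made up for this example:

```python
import time

def millitime():
    """Monotonic clock reading in whole milliseconds
    (a Python stand-in, not toybox's C function)."""
    return time.monotonic_ns() // 1_000_000

def split_ms(ms):
    """Break a millisecond count into (seconds, leftover milliseconds)."""
    return divmod(ms, 1000)

start = millitime()
time.sleep(0.05)
seconds, millis = split_ms(millitime() - start)
print(f"slept for {seconds}.{millis:03d} seconds")
```

One integer division per conversion, and only when something actually needs seconds.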

January 16, 2019

Sigh. No Rich, that's not how my relationship with Android works. I cannot "badger Android until they fix this nonsense".

I have limited traction and finite political capital. Leading them with a trail of breadcrumbs works best, which means I do work they might find useful and wait (often years) for them to start using it. And I can explain _why_ I want to go in a certain direction, and what I hope to achieve, and make as compelling an argument for that vision as I can.

But often, they've already made historical technical decisions that then become load-bearing for third party code, and you can't move the rug because somebody's standing on it. And their response is more or less "that might have been a nice way to go way back when, but we're over here now".

I'm trying to clean out the rest of the BSD code so that they're solidly using toybox, and making it so they can use as much of "defconfig" as possible. If the delta between android's deployment and toybox defconfig is minimized, then adding stuff to defconfig is most likely to add it to android. (This maximizes my traction/leverage. But it's _always_ gonna be finite, because they're way bigger than me.)

This means work on grep (--color), mkfs.vfat, and build stuff. The macos (and now FreeBSD) build genericization helps, as does the android hermetic build stuff. (Getting them closer to being able to use my build infrastructure, although they haven't got make and don't like arbitrary code running in their build.)

It's a bit like domesticating a feral cat. Offer food. Then offer food in the utility room. Except instead of a feral cat, one of the biggest companies in the world has a large team of full-time employees that's been doing this for 20 years now (The "Android One" came out in what, 2007?) which is constantly engaging with multiple large teams of phone vendor developers, collectively representing a many-multi-billion dollar industry that's on such a vastly different scale they can't even _see_ me.

I can't even afford to work full time on this stuff. I'm doing what I can. You wanna post your concerns on the toybox list, go for it.

January 15, 2019

Sigh, $DAYJOB needs sntp, so let's do that for toybox...

Reading RFC 4330 (well a half-dozen RFCs, this has had a lot of versions and the new ones have added useless crap that's more complexity than help). Oh great, this protocol doesn't have a Y2038 problem, it has a Y2036 problem. They have a 64 bit timestamp: the bottom 32 bits of which are the fraction of a second (meaning they devote 2 bits to recording FRACTIONS OF A NANOSECOND), leaving them 32 bits for seconds... starting from January 1 1900. For a protocol designed in the 1980's. So they ate 2/3 of the space before the protocol was _designed_. That's just stupid.

Anyway, the common workaround is if the high bit's _not_ set then it wrapped, which buys another 60 years or so. Still utterly insane to design the protocol that way.
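A minimal sketch of that workaround in Python, assuming the timestamp arrives as 8 big-endian bytes (32 bits of seconds since 1900, then 32 bits of fraction). The function and constant names are mine, not from any RFC or from toybox:

```python
import struct

NTP_TO_UNIX = 2_208_988_800   # seconds between 1900-01-01 and 1970-01-01

def ntp_to_unix(timestamp):
    """Convert an 8-byte big-endian NTP timestamp to Unix seconds (float)."""
    seconds, fraction = struct.unpack("!II", timestamp)
    if not seconds & 0x80000000:   # high bit clear: assume it wrapped in 2036
        seconds += 1 << 32
    # Each unit of the fraction field is 1/2**32 of a second (~0.23 ns).
    return seconds - NTP_TO_UNIX + fraction / 2**32

# Midnight UTC on January 15, 2019, plus half a second.
stamp = struct.pack("!II", 1_547_510_400 + NTP_TO_UNIX, 0x80000000)
print(ntp_to_unix(stamp))
```

The price of the trick is that anything with the high bit clear can no longer mean a date before early 1968, which nobody asking "what time is it now" cares about.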

January 14, 2019

Exhausted. Not sure I slept at all last night, just lay awake in bed. Is it possible to get jetlag without changing time zones?

Back at work: spent most of the day going through a month of missed email. They assigned a number of issues to me.

Back in my apartment, the manager was happy to see me and had a desk and a bed in storage, and says he'll replace the gas stove with electric (yay!). They should really put some solar panels on this building. (They don't just go on the roof, you can put them down the sides of tall buildings too, you don't even have to worry about sweeping the snow off of those.)

Poking at patch.c because I got reminded of todo items. Trying to add fuzz factor, which was easy enough (and my design for it's better) but... there's no tests/patch.test, and I don't seem to have patches that _require_ fuzz factor lying around.
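For anybody who hasn't met fuzz factor: the generic idea is that when a hunk's context doesn't match exactly, you retry while ignoring a line or two of context at each end. Here's a small Python sketch of that generic idea. It is NOT the toybox design (or anybody's real implementation): the function name and arguments are invented, and it scans the whole file from the top instead of starting near the expected line number:

```python
def find_hunk(file_lines, old_lines, leading, trailing, max_fuzz=2):
    """Locate a hunk's old-side lines in file_lines.

    leading/trailing say how many of old_lines are pure context at the
    start and end of the hunk. Each fuzz level ignores one more context
    line at both ends. Returns (index where the full hunk would start,
    fuzz used), or None if nothing matches.
    """
    for fuzz in range(max_fuzz + 1):
        skip_head = min(fuzz, leading)
        skip_tail = min(fuzz, trailing)
        core = old_lines[skip_head:len(old_lines) - skip_tail]
        if not core:
            break   # nothing left to match against
        for pos in range(len(file_lines) - len(core) + 1):
            if file_lines[pos:pos + len(core)] == core:
                return pos - skip_head, fuzz
    return None

target = ["one", "two", "three", "four"]
# First context line was edited since the patch was made ("ONE" vs "one").
print(find_hunk(target, ["ONE", "two", "three"], leading=1, trailing=1))
```

A patch that only applies at fuzz 1 or higher is exactly the kind of test case I don't have lying around.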

I _used_ to just throw new commands through Aboriginal Linux and the LFS build, which was applying lots of patches. I suppose I could dig through the repo there and find where I adjusted them to eliminate fuzz factor. (Because even though I ported toybox patch to busybox over a decade ago, they still haven't added fuzz support to it.) There's a lotta that going around, where things I was planning to do ages ago still aren't done in various projects, and it ranges from crickets to insistence that status quo is perfect and we've always been at war with eastasia. (People declared busybox "done" at the 1.0 release, which was before the majority of my contributions and long before you could use it in a build environment.) "Thing didn't happen therefore shouldn't happen" is a failure of imagination. As Howard Aiken said long ago, you don't need to worry about people stealing your ideas. Heck, I've been trying to get people to steal my ideas for a very long time, in a Tom Sawyer "paint the fence" way so I don't have to do it myself.

January 13, 2019

Flight back to Milwaukee. Sigh. Conflicted, but... this is the path of least resistance, and I know I can do it. (Neither Google nor the phone vendors will pay me to do Toybox or the android self-hosting stuff, nobody's interested in mkroot (hardly anybody was interested in aboriginal even after I got it building LFS), and I can't afford to just do open source all the time.) Gotta pay the mortgage. (I should really try to at least pay off that home equity loan this time.)

Got a hotel. It's $130/night, that's more per week than my old efficiency apartment here cost in a month. I should try to get that back in the morning. (They hadn't rented it out last I heard, and it's paid through the end of the month since I have to keep paying for it until they rent it out or 60 days goes by.)

I wrote up a thing about how patches work, because somebody on the list asked. I should collect and index those somehow, I suppose...

January 12, 2019

I committed a fix:

> Which is the "mode" of the symlink, except that mode says the filetype _is_ a
> symlink and you can't O_CREAT one of them so it's gonna get _really_ confused...
> Try now? (I added a test.)

Except that's inelegant (race condition between dirtree population and this stat: the filesystem can change out from under us) and we're _supposed_ to feed dirtree the right flags so the initial stat() is following or not following the symlink appropriately. Why is it not doing that in this case... Hmmm...
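The following/not following distinction, shown with Python's wrappers around the same system calls. This is a sketch with made-up file names in a throwaway directory, and it has nothing to do with how dirtree is actually written:

```python
import os
import stat
import tempfile

def link_modes(directory):
    """Create a file plus a symlink to it in directory, then return
    (lstat mode, stat mode) for the symlink."""
    target = os.path.join(directory, "target")
    link = os.path.join(directory, "link")
    with open(target, "w"):
        pass
    os.symlink("target", link)
    return os.lstat(link).st_mode, os.stat(link).st_mode

with tempfile.TemporaryDirectory() as tmp:
    not_followed, followed = link_modes(tmp)
    # lstat() reports the link itself, stat() reports what it points to.
    print(stat.S_ISLNK(not_followed), stat.S_ISREG(followed))
```

Get the mode from the wrong one of those two and you're holding a file type that says "symlink" while trying to treat it like a regular file.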

January 11, 2019

Broke down and told chrome _not_ to restore state, just let it forget all those todo items. So now I have one window with only a dozen or so open tabs, which can restart itself without wasting half an hour fighting with it every time I open my laptop. I give it a week.

I should really pack my suitcase...

January 10, 2019

The battery on my laptop no longer holds ANY charge. Unplug it and it switches off instantly. Serious crimp in my "wander out somewhere and program for a bit at a quiet table" workflow. Even when I go somewhere with an outlet (which I now feel guilty about because I'm costing the place money, even if it's only a few cents), it loses all context going there and going back. Complete reboot each time.

And convincing chrome NOT to reload 8 windows with 200 tabs each in them (maintain the todo item links but leave the tabs in "failed to load" state rather than trying to allocate 30 gigabytes of RAM and max out my phone tether for 2 hours) is a huge pain. Doing "pkill -f renderer" USED to work but now SOMETIMES works, sometimes causes tabs to hang (still display fine but I can't scroll DOWN and it won't load new contents in that tab, but I can cut and paste the URL to a new tab that WILL load it so the URL is retained which is all I really wanted), and sometimes randomly crashes the whole browser process. Even pointing /etc/resolv.conf at while chrome starts up to force the resolve to fail no longer prevents the reloads, these days it just _delays_ its load; it tries to reload periodically and once it can reloads everything.

They keep "upgrading" chrome to make it a worse fit for my needs, and of course I can't stick with old versions because "security". (You can sing "cloud rot" to the tune of Love Shack.)

January 9, 2019

Looming return to milwaukee, starting to get paralyzed. Fade flies out tomorrow, although essentially it's tonight so early in the morning (she and Adverb are visiting family in California before heading back to minneapolis for the spring semester, both her sisters live there and I think more of her family is flying in for a reunion?)

I should get a plane ticket, but the TSA and air traffic controllers miss their first paycheck on Friday. Bit reluctant to fly with air traffic controllers considered "nonessential"... (Bit reluctant to _eat_ with FDA inspection considered nonessential.)

January 8, 2019

Visited the eye doctor for my 6 month follow-up. Not obviously going blind! Yay!

Eyes dilated, not a lot of programming today.

January 7, 2019

Wandering back to an open tab in which I have:

$ truncate -s $((512*68)) test.img && mkfs.vfat test.img && dd if=/dev/zero of=test.img seek=$((0x3e)) bs=1 count=448 && hexdump -C test.img

Which at the _time_ was the smallest filesystem mkdosfs would create. (The dd blanks some stuff that varies gratuitously between runs so I can diff two of them and see what changed when I resize the filesystem.)

But now I'm running a newer dosfstools version and it's saying that 512*100 is the smallest viable filesystem. And THAT is clearly arbitrary. Sigh, I should look up the kernel code for this and see what the actual driver says.

January 6, 2019

Rebuilt mkroot with linux-4.20 (after rebuilding the musl-cross-make toolchains with current musl). The s390x kernel wants sha256sum now.

Sigh. Throw another binary in the PENDING list of the airlock install in toybox/scripts/ (It's in the roadmap.)

January 5, 2019

Attempting to install devuan on the giant new laptop, because the ubuntu they stuck on it has systemd and it's possible I'd use a BSD first. Devuan is basically a debian fork retaining the original init system and with a really stupid over-engineered nigh-unmaintainable mirror overlay system written in python. (I have no idea why they did that last part, and hope it's merely a transitional problem.)

The System76 bios is "black screen with no output" until their ubuntu boots, which is kinda annoying. I guessed "reboot several times and hit escape and alt-f2 and so on a lot during said blackness" and eventually got a bios screen that let me boot from a USB stick.

Devuan's installer is really _sad_ compared to Ubuntu. What Ubuntu did was boot to a live CD, then run a gui app. That's basically copying the cutting edge knoppix technology from 2003 (which is 15 years ago now), and they've been doing it since... 2004 I think?

Devuan started with a menu of multiple install options (I have no clue here and cannot make an informed decision, STOP ASKING ME FOR INFORMATION I DO NOT HAVE YET), but all of them seem to go to a fullscreen installer with a font that's way too small for comfort, and no way to change it. Ok, soldiering on: it's freaking out that I used unetbootin to create the USB boot stick, promising a plague of locusts and possibly frogs if I continue. But it doesn't say how I SHOULD have created it, and it seems to be working fine, so I ignored it and continued.

It's refusing to provide binary firmware for the wireless card (iwlwifi-8265) because Freedom Freedom Blue Facepaint Mel Gibson. If a manufacturer was too cheap to put a ROM in their hardware and they expect the driver to load the equivalent data into SRAM, debian sits down in the mud and sulks. Great.

I think I've found where to get the firmware from debian, but "devuan ascii" isn't clearly mirroring any specific debian distro? (The previous ones were, the newest one... isn't.) The instructions say to put it in a "/firmware" directory on the USB stick, which seems separate from _booting_ from the USB stick... All the devuan ascii docs say that all necessary firmware is bundled. Hmmm...

Ok, downloading the 4+ gigabyte "DVD" version of the devuan installer (for a complete offline install) to make a new USB stick from, and I should try to fish the firmware files out of the system76 ubuntu install before wiping it. (There's a certain amount of "should I use the 2 gb hard drive or the 1gb flash drive" for this install, I left the flash disk in because it's already there and I don't ever intend to use systemd ubuntu.)

This has already eaten all the time I allocated to poke at this.

January 3, 2019

Three days of rain and I've gotten nothing done. Barely left the house. I'm not recovered enough from seasonal affective disorder yet for the gloom outside not to put me in hibernation mode.

I was ok moving up to milwaukee in January from Austin, that was a discontiguous break and my internal clock did not adjust. But staying in milwaukee for 3 months while the days got shorter, _that_ screwed me up.

Partly it's that the sun coming up reliably knocks me out, because college. The last couple years at Rutgers were primarily night courses due to governor Witless destroying the comp-sci program with stupid budget cuts so they lost _all_ their full-time faculty (including the head of the department; if you're denied tenure you _can't_stay_ past 5 years and they blanket denied tenure to everybody, and comp-sci had only peeled off of the physics department to become its own thing 4 years before the budget cuts...). This was the #2 most popular major on campus after "undecided" and everything had to be taught by adjuncts after their day jobs, and now you _couldn't_ complete it without lots of night classes. So I'd get home long after sunset and do more programming, then the sun would come up and I'd go "oh, didn't realize the time" and go to bed. (Which was fine if I didn't have to catch a bus to go back to class until 3pm or so.)

Now the sun coming up knocks me out. Being awake at night is fine... until the sun comes up. When my alarm's set at 6:30 am and the sun comes up over an hour later, getting up in the morning is a _problem_. And that sort of anchors the rest of it...

January 2, 2019

Did a little research for the multicast doc in the ipv4 cleanup stuff.

Multicast failed to take off because improved compression schemes (like mp3 and mp4) greatly restricted storage and bandwidth requirements of media while rendering partial delivery of data useless, and due to the widespread deployment of broadband internet via cable modem and DSL. The decline of multicast started in 1999 when Napster provided a proof of concept that distributing MP3 files via unicast could scale. RealAudio quickly lost market share to unicast media delivery solutions. These days Youtube, Netflix, Hulu, and Amazon Prime all use unicast distribution.

The decline started 20 years ago and the multicast mbone (which this address range was reserved for) essentially ceased operations about 15 years ago. The last signs of life I can find are from about 2003.

Multicast was never widely used, the range was allocated for growth that did not occur, and remaining users are treating it as a LAN protocol which could use any other LAN-local address range their routers were programmed to accept. Note also that LAN-local multicast was conserving bandwidth on 10baseT local area networks, and we have widely deployed cheap gigabit ethernet now (with 10gigE available for those who want to spend money).

Reserving 268 million IPv4 addresses for multicast, in 2019, is obviously a complete waste. We can put them back in the main pool.
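The 268 million figure is easy to check, assuming the range in question is the usual class D multicast block, 224.0.0.0/4:

```python
import ipaddress

multicast = ipaddress.ip_network("224.0.0.0/4")

# A /4 leaves 28 host bits, so 2**28 addresses.
print(multicast.num_addresses)
print(multicast[0], "through", multicast[-1])
```

That's one sixteenth of the entire IPv4 address space.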
