Rob's Blog


June 2, 2019

Shell design!

I need to parse a command line into an intermediate format, and save that intermediate format for loops and functions and so on. The intermediate format does _not_ have environment variables resolved, but needs to parse them to know where ls dir/$(echo hello; echo walrus)/{a,b}* ends.

This is the big design blocker I stopped on last time, because there's inherent duplication and I want one codepath to handle both, a bit like find has. But it's fiddly to work out. In this first pass, things like "$@" and *.{a,b,c} don't expand into multiple arguments, but in the second pass they do.
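To make the "where does this word end" problem concrete, here's a rough sketch in POSIX sh (not toybox code, and first_word is a made-up name). It assumes the only things that matter are backslashes, quotes, and $( ) nesting, and it ignores ${ }, backticks, and here documents entirely. Nothing gets expanded, it just finds the bounds:

```sh
# first_word LINE: print the first shell word of LINE without expanding it.
# Sketch only: understands backslash, quotes and $( ... ) nesting, nothing else.
first_word() (
  rest=$1 depth=0 quote= prev=
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}              # first character of what's left
    if [ "$c" = "\\" ] && [ "$quote" != "'" ]; then
      rest=${rest#?}                   # drop the backslash, the loop bottom
      [ -n "$rest" ] || break          # then swallows the escaped character
    elif [ -n "$quote" ]; then
      if [ "$c" = "$quote" ]; then quote=; fi
    else
      case $c in
        "'"|'"') quote=$c ;;
        '(') if [ "$prev" = '$' ] || [ "$depth" -gt 0 ]; then
               depth=$((depth + 1))    # $( or a nested ( inside it
             else
               break                   # a bare ( ends the word
             fi ;;
        ')') if [ "$depth" -gt 0 ]; then depth=$((depth - 1)); else break; fi ;;
        ' '|';'|'|'|'&') if [ "$depth" -eq 0 ]; then break; fi ;;
      esac
    fi
    prev=$c
    rest=${rest#?}
  done
  printf '%s\n' "${1%"$rest"}"         # everything before the unconsumed tail
)
```

Feeding it 'dir/$(echo hello; echo walrus)/{a,b}* next' gives back everything up to the space before "next", with the $( ) and the braces still unexpanded, which is the shape the intermediate format wants.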

Hmmm, earlier I was doing a lot of linked lists (which I thought were more nommu friendly), but now I'm doing more realloc() arrays because Linux on nommu has per-process heaps so small memory allocations aren't as terribly fragmenty as they would be on bare metal. I think struct pipeline makes more sense as an array, except it would need a "number of entries" count (unless I null terminate it?)

The other design hiccup was that I wanted to avoid gratuitous copying of strings (use the environment string passed to -c directly, mmap the file and use it directly, if all else fails the memory returned from getline() can be used directly...), except then I'm mixing allocation contexts where "echo a $BLAH c" had some arguments you don't free() and some you do when you're done, and tracking them was annoying.

Get it working first, _then_ optimize... (I've been researching this on and off so long there's been years of optimization ideas mixed into the design, which is premature optimization.)

Parsing a pipeline into intermediate format needs to understand continuations, which means if/while, {, and (, plus $(( and strangely ${ although I'm not sure what you could legitimately put in there that would resolve? Fiddle fiddle... Ah, ${BLAH: 1 : 3 } works. But you can't have a space (or newline) before that first colon. This is such an ad-hoc undocumented mess. Right. At a guess number parsing eats leading and trailing whitespace. (And while blah; do thingy needs the newline/semicolon there because do would be an argument to while otherwise.)

If I go BLAH=12345 and echo $BLAHa it doesn't show 12345a: which environment variables are set has no influence on parsing, it does the lookup after determining the bounds of the variable name. (Hence the ${blah} syntax.) A name takes letters, digits, and the _ character, but not the rest of the punctuation. There's probably a posix stanza on this buried in the mess, and might also be something in the bash man page. I gotta re-read both of those, it's been ages...
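A quick way to see the name bounds being decided before the lookup (plain POSIX sh, the variable names are just the ones from the example above):

```sh
BLAH=12345
unset BLAHa
plain="$BLAHa"      # name is BLAHa (unset), so this is empty
braced="${BLAH}a"   # braces end the name early: 12345a
punct="$BLAH.a"     # '.' can't be part of a name: 12345.a
printf '[%s] [%s] [%s]\n' "$plain" "$braced" "$punct"
```

The first one stays empty even though BLAH is set, because the parser already decided the name was BLAHa.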

And of course the problem with line continuations is the design right now is a getline() loop that calls a function on each line, so when I have a "I need more data" continuation point, it has to _return_ that and then the outer loop needs to prompt differently or something. Of course I have global variables for the prompt state. Hmmm...
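The control flow I mean looks something like this sketch, written in sh rather than C to keep it short. The read_command name is invented, and it only knows about the trailing backslash case, so a line ending in an escaped backslash would fool it. The inner step reports "need more data" by not finishing, and the loop switches prompts:

```sh
# read_command: read one logical command from stdin, joining lines that end
# in a backslash. Prompts go to stderr so stdout is just the joined command.
read_command() {
  cmd= prompt='$ '
  while printf '%s' "$prompt" >&2 && IFS= read -r line; do
    case $line in
      *\\) cmd=$cmd${line%?}           # continuation: drop the backslash,
           prompt='> ' ;;              # keep reading with the other prompt
      *)   cmd=$cmd$line
           printf '%s\n' "$cmd"
           return 0 ;;
    esac
  done
  return 1                             # EOF in the middle of a command
}
```

Quotes, $( and here documents would each need their own "not done yet" test in that case statement, which is the whack-a-mole part.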

$ bash -c 'ls $('
bash: -c: line 0: unexpected EOF while looking for matching `)'
bash: -c: line 1: syntax error: unexpected end of file

Not quite the error message I expected, but yeah, that's the logicalish thing to do there. (Quotes are another thing that needs line continuation. And of course here documents. I'm gonna have to whack-a-mole this adding one more thing at a time...)

But the fiddly part is still that parsing has to understand if; then; fi _and_ command execution has to understand it too, and I really don't want that in two separated places that can get out of sync. Hmmm...

Why is ; on its own line an error? I can hit enter. It only ends a nonzero command? Two trailing ; are an error? (They aren't in C...) Why is bash complaining about this? Huh, the Defective Annoying SHell is doing it too, is this some posix corner case?

"ls one;cat" even mid-argument the semicolon ends the line when not escaped, got it. Same for "(" for some reason (when would that be valid mid-word?) And of course:

$ cat blah<(echo hello)thing
cat: blah/dev/fd/63thing: No such file or directory

Bash's parsing is a bit self-defeating at times. Why would you recognize that THERE? What's the POINT? When could it serve a purpose?

Another fun thing is I run a lot of tests from the command line, like:

$ echo one\
> two
$ echo one\;two

Which are non-obvious how to turn into tests/sh.test entries. I just did:

$ sh -c "$(echo -n 'echo one\\\ntwo')"
$ bash -c "$(echo -n 'echo one\\\ntwo')"

And the Defective Annoying SHell (which is what devuan points /bin/sh to) behaved differently from bash, but bash is NOT behaving like it just did from the command line! (readlink /proc/$$/exe says /bin/bash because my account's shell in /etc/passwd is /bin/bash; this is the compromise ubuntu made to avoid admitting they made a mistake redirecting /bin/sh to point to dash in 2006, change everybody's login shell so you're never actually using dash at the command line, and every #! in the world changes to explicitly say "bash" and we all avoid dash THAT way. Idiots.)

Let's try again...

$ bash -c "$(echo 'echo -e one\\\ntwo')"
$ dash -c "$(echo 'echo -e one\\\ntwo')"
-e one

Of course. And notice the newline IN THE SECOND ONE. Grrr...

It is really hard to implement shell unit tests _in_ shell scripts. Anyway, can of worms for another day, add comments with the command lines I ran and work out how to turn that into regression tests later...
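One possible way out, as far as I can tell, is to build the script text with printf instead of echo, since the escapes in a printf format string are specified by posix and don't depend on which shell's echo builtin you got. A minimal sketch of the two command line tests above:

```sh
# Build "echo one\<newline>two" and "echo one\;two" without relying on echo -e/-n.
script=$(printf 'echo one\\\ntwo')
out=$(sh -c "$script")
script2=$(printf 'echo one\\;two')
out2=$(sh -c "$script2")
printf '%s\n%s\n' "$out" "$out2"
```

The first should come back as onetwo (the backslash-newline is a continuation) and the second as one;two, whichever shell sh happens to be.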

June 1, 2019

Discovered that if you pull down TWICE from the top of the phone screen (I.E. try to open the pulldown menu when it's already open), the pulldown menu expands and THEN you have a brightness slider and the little gear that lets you edit what's in the original pulldown.

Only took me a week of owning the phone to figure out how to do really basic things with it that were obvious last version.

May 31, 2019

Posted about the ls -l / android issue to the list, waiting to hear back about what they're trying to accomplish here.

Upgrading tar to extract the old tarball (to unblock the aboriginal linux Linux From Scratch build control image experiment). The old format isn't very well documented and I have one example. Maybe I should dig up more old tarballs to test against, but this has "rathole" written all over it...

May 30, 2019

Installed adb to push stuff to my phone, and "adb shell" can ls /, but it complains about trying to stat a lot of directories.

The current dirtree() infrastructure is opening directories without O_PATH, and won't recurse if stat() fails (because it can't tell it's a directory). Hmmm.

I did a fresh aboriginal linux checkout and I'm trying to plug current toybox into it, building the old versions of all the other packages (including the kernel). If I can get it building, I want to plug in the old Linux From Scratch 6.8 build control image and try to build that.

This is a regression test I haven't done in forever, which I stopped doing because I couldn't update the old toolchain (for license reasons) and then updating the kernel got progressively more painful with the old toolchain, and I rebased to mkroot but still haven't reproduced all the infrastructure the old build had to do an automated LFS build, so... let's try to plug current toybox into the old thing and see what happens.

One problem is I haven't supported uclibc in forever, might need to swap in musl too but let's see what uclibc does first...

Heh, there's an old toybox patch to work around a uclibc bug I never merged (because that nonsense doesn't belong in the toybox repo, but I need it here). The old patch doesn't remotely apply to current grep, had to rewrite it.

Toybox tar can't extract the genext2fs tarball, because it's _so_ old it doesn't say "ustar" in the header. Huh.

May 29, 2019

Finally got the 0.8.1 release out. I forgot to update the date of the release notes from when I started writing them to when I actually got the release out (mkroot wouldn't build because AOSP wants tar to call "bzip2 -d" and toybox hasn't _got_ bzip2 compression side just bzcat, so I had to make it try both and fall back).

I want to fix "ls -l /" on Android Pie, which is basically the same bug as 527045debecb. I installed terminal, ran ls -l as my first command, and got "permission denied" on "." with no other output.

Elliott fixed xabspath() but I'd like to fix dirtree(). The current dirtree() infrastructure is opening directories without O_PATH, and won't recurse if stat() fails (because it can't tell it's a directory). Hmmm, needs some design work.

The real question is what's Android trying to _prevent_ here?

May 28, 2019

Fade's back!

No blogging. Fade's back.

May 27, 2019

New phone doesn't have wifi hotspot in the pulldown list of stuff.

The pulldown has a wifi icon, and if I hold it down it goes to a wifi menu, but it's not the HOTSPOT menu, it's too far down the selection tree. The rest are bluetooth (don't care), do not disturb, flashlight, display (android M had the brightness slider in the pulldown, this is less good), and battery. If there's any way to change the list of what's in the pulldown, I haven't found it yet.

So what I have to do to enable the wifi hotspot is exit to the desktop, pull up, gear icon, network and internet, hotspot & tethering, wifi hotspot, on. Lateral progress. Android M had a brightness slider in the pulldown menu, this makes me go into a menu (so turning the brightness back _up_ if I left it down and wind up in sunlight where I can't see the display is basically impossible with this phone now, instead of blind fumbling that takes three or four tries like with the old one).

May 26, 2019

Coming up to the end of the 3 months I said I'd take off.

My ELC talk that got approved was "toybox vs busybox" and I don't want to say "busybox has a usable shell, toybox doesn't" as one of the big differences _still_, so trying to get the shell at least basically usable before I have to go disappear into the land of $DAYJOB for who knows how long because none of the android ecosystem wants to pay me to work on this stuff.

May 25, 2019

New phone. Went in to try to enable it yesterday and worked out I'd have to buy a new sim card for $30, went home to try to fish the old sim card out of my old phone and got talked out of it by Jeff D. who explained that the encryption algorithms in the sim card get quietly updated regularly so switching to a new sim card periodically is a good thing.

So today I went back and gave T-mobile its gratuitous $30 profit. (If I'd bought the phone through them they'd throw in a sim card for free, but I wanted one guaranteed to be unlocked.)

I wound up using the default AOSP image it came with rather than trying to image it because I need a working phone. As with the last 3 phones...

May 21, 2019

Taking some notes for the "toybox vs busybox" talk I volunteered to do in August. I was maintainer of busybox for a while, and had written about 1/3 of its code when I handed it off, and I can at least explain what I was trying to accomplish.

I also created and maintain toybox, I can certainly explain what I'm trying to do here and why I couldn't do it in a busybox context. And there was also a period between leaving busybox and refocusing toybox on android where I maintained it outside of busybox for my own reasons, largely "better design"...

So I don't have a shortage of material. But ELC shortened its talks from an hour to 45 minutes a few years ago, and I should probably leave 15 minutes for questions...

May 20, 2019

Oh good grief, no it is not called that and only ever WAS by Richard Fontana, who is weird about it to this day. Stop deadnaming my license!

Richard Fontana made a mistake, refused to admit the mistake, tried to get SPDX to replicate his mistake so he didn't look so weird standing out like that, defended his mistake for _years_ after losing that battle with a shifting array of reasons (his conclusion never changed but his justification for it constantly did), and when he finally got outvoted at OSI has done his best to memorialize his mistake everywhere he has control over (such as the OSI page on the license).

Long before I got that confusion cleared up, people were using and recommending 0BSD out in the wild and they NEVER used Fontana's name for the thing. This has his fingerprints all over it.

I don't modify wikipedia[citation needed] in part because I'd never have time to do anything else, and these days they block the entire IPv6 address range (so I can't use phone tethering) and every McDonalds and similar open wifi, so I can't even leave a nasty note on the talk page. But seriously, dude. This is misinformation. I started calling it zero clause BSD long before Fontana ever heard of it. I got permission from Marshall McKusick to call it a BSD license in 2013, _years_ before Fontana ever heard of it.

May 19, 2019

Did Norman Borlaug's work make China's one child policy look stupid, or was it always stupid?

Norman Borlaug is possibly the single most important person in the 20th century. He's why India and China can feed themselves. The man literally quadrupled global food production with "semidwarf" varieties of rice and wheat. He started his breeding programs to improve disease resistance, but when nitrogen fertilizer became cheap and ubiquitous (the Haber-Bosch process predates World War I, but took a while to scale up and branch out), plants were limited by how much fertilizer you could give them before stalks fell over under the weight of the grain they were growing. Borlaug's solution was to make plants shorter, both so they put less energy into growing stalk and because shorter plants were sturdier and can hold more grain before collapsing, so you could nitrogen the HELL out of them.

But the real gains came when he applied the same trick to rice. These days most of the world's population lives in the circle of rice (sung, obviously, to the lion king), and they're all growing dwarf rice. This provides enough food for ~4 billion people in and around India and China.

Meanwhile, China had a revolution kicking out its royalty a century back, and just like the French revolution they killed all their scientists and wound up with a tin pot dictator in charge who may not have been a net positive _despite_ how bad the royalty they replaced was. Napoleon got millions of his countrymen killed by declaring war on the entire world (twice), but Chairman Mao mostly just starved his subjects to death. He had Aristotle's problem of never looking at the world around him and instead making stuff up divorced from reality, then enforcing that vision upon the world. During the "great leap forward" he said the country wasn't producing enough iron and demanded they "make iron" without procuring more iron _ore_. Rather than explain to him where iron comes from, his subjects melted down all their farming tools into neatly stacked iron ingots for inspection by party officials, and wound up starving. (Humoring the Great Leader is one of the classic blunders, "Potemkin village" is from 1700's Russia but Mao got plenty of such tours.) Other parts of the great leap forward included exiling all the schoolteachers and college students to rural farms (where they starved, farming is a skill they didn't have). Mao ordered everyone to kill birds because he thought they ate crops, and then when the insect population exploded without predators his solution was to spray bulk insecticides that drove pollinators extinct so large swaths of China pollinate by hand. (More modern China has tried to bury the history of all these failures, just like they've buried Tiananmen Square. They insist that their "lack of bees" is due to the shenanigans Bayer's been pulling recently, not due to Mao's edicts three generations back. *shrug* US public schools don't exactly open history class with smallpox blankets and the trail of tears, or the way we staged a coup to get Hawaii. The War of 1812 was approximately as stupid; we lost to _Canada_ and they set the White House on fire.)

The one child policy was another of Mao's ideas, which led not only to their "bare branches" problem (millions of surplus unwed men because of sex selection in the one and only child parents were allowed to have), it also means China's baby boom problem is somehow even worse than the USA's. China has two generations of only children, I.E. a generation of only children whose parents were themselves a generation of only children, meaning each child has 4 grandparents with no other descendants. In a society where "retirement" meant having enough kids to take care of the parents in old age, this is an _issue_.

So Norman Borlaug's work increasing the food supply means Mao's one child policy was outright pointless. Add in the fact that educating women means they have more options than just being barefoot and pregnant their entire lives, and birth rates among the young need government support (maternity leave, free daycare, etc) to get _up_ to replacement rates. (Around the world: Europe, the USA, China, you name it. This is apparently a side effect of late stage capitalism viewing productivity exclusively as various forms of manufacturing while completely ignoring and refusing to fund caretaking work, but China's gone all in on capitalism lately so has acquired this problem too...)

And only the young have the _option_ to replace themselves, the overhang of old people that can't have kids anymore can't. I have _no_ idea what China plans to do about any of this, but am glad to be very far away.

May 18, 2019

Debugging the sparse tar compression side, which means I have run "diff -u <(tar cv --owner root --group root --mtime @1234567890 --sparse fweep | hd) <(./tar cv --owner root --group root --mtime @1234567890 --sparse fweep | hd) | less" with malice of forethought. (Well, actually I ran "TAR='tar cv --owner root --group root --mtime @1234567890 --sparse fweep | hd' diff -u <($TAR) <(./$TAR) | less" because I'm me.)

May 17, 2019

Finally went to the store to order a new phone and they're out. Ordered from the website instead. They estimate delivery on the 23rd.

May 16, 2019

Got the grep bug sorted out, it was a missing else and an inverted test that was hidden by the missing else. (So it _seemed_ to work but what it was actually doing was ignoring the -x.)

And now of course people are trying to use it, there's another grep bug after that...

May 14, 2019

Here's an email I wrote but didn't send in response to this, because it went to a dark place (which is nevertheless true):

> Technically yes, because the first initrd could find the second by some
> predefined means, extract it to a temporary directory and do a
> pivot_root() and then the second would do some stuff, find the real
> root and do a pivot_root() again.

You can't pivot_root off of initramfs, you have to switch_root. (You _used_ to be able to, which moved initramfs from / and allowed you to unmount it, at which point the kernel locked hard endlessly traversing the mount list. I know because I hit that bug in 2005 and they fixed it.)

No, I'm saying that if /init is in the static initrd and you _also_ specify an external initrd the kernel _also_ extracts the external one into initramfs, _after_ having extracted the built-in one. (Both archives are extracted, one after the other, into the same ramfs/tmpfs mount.)

If the semantics are O_EXCL and it skips files it can't extract properly, then the external one couldn't replace files in the static one. You just have to make sure that it extracts both before trying to exec /init (which it looks like it currently does but I haven't tested it). And such an init could do anything and end with "mv newinit /init; exec /init".

(And while we're there it's _embarrassing_ that you have to enable CONFIG_BLK_DEV_RAM to get the external image unpacked, which means you have to enable CONFIG_BLK_DEV which depends on CONFIG_BLOCK meaning you have to enable the block layer when running entirely from initramfs? That's one of the things I pointed out years ago but nobody ever did anything about it, and I tend not to send many patches here these days because dealing with linux-kernel is the opposite of fun. You literally have a bureaucratic process with a 26 step checklist for submitting patches now, which you're supposed to read _after_ the 884 line submitting-patches document which I guess comes after the 8 numbered process documents. And then get dinged by and it's just... no fun. You've gone _way_ out of your way to drive the hobbyists out. Congratulations, you succeeded, it's all middle aged cubicle dwellers arguing about how to help John Deere prevent farmers from modifying their tractors. The development-process.rst file is aimed at developers "and their managers" because the linux-kernel committee can no longer comprehend developers without managers. Nobody's doing it for fun anymore because it stopped being fun a long time ago.)

And now, I mute the _rest_ of the thread. Do what you like, I think teaching the kernel to do magic in-band signaling here is a terrible technical idea _and_ unnecessary but it's obviously not my call. I'm aware I'm about 7 years too late for that sort of concern to matter to the bureaucracy linux-kernel has become (since at least, and I'm only replying because I was cc'd.

Sigh. I should do the patch to make external initramfs loading work if you've disabled the block layer. And resend the patch making DEVTMPFS_MOUNT apply to initramfs. And there's like 5 others on the todo list...

May 13, 2019

I've misplaced my phone. The downside of the "no sim card" state is if you lose track of your phone, you can't call it to have it ring. Black phone on black background.

Hey, one of my talk proposals was accepted to ELC in August. It's the "toybox vs busybox" one, which personally I think is the _least_ interesting of the topics, but eh, that's what they want to hear...

May 11, 2019

My phone is dying. It keeps saying "no sim card" randomly (requiring a reboot to see it again), and randomly switches itself off as if the battery's died, but when I charge it the battery says it's at 80% or some such.

It's been like that ever since I got caught in a thunderstorm on wednesday with the phone in my pocket. It dried out and started working again after a few hours, but not reliably...

Looking at Pixel 3a xl. Should I buy from t-mobile or from google? I'd like to do the whole AOSP install route if I can...

May 9, 2019

BSD development predated the web, or even widespread internet availability by about 5 years. (It was, in fact, responsible for much of it.) This means it had the problem of privileged communication channels dividing its community into "first class" and "second class" citizens.

BSD started off with a single development office with its devs physically located together in Berkeley, the same problem which prevented mozilla and openoffice from becoming real open source projects for many years after they _tried_ to open up. When almost all your devs are right down the hall from each other, any devs _not_ participating in those privileged channels of communication (face to face due to physical proximity) are sidelined so your development "in group" erodes the "out group" into irrelevance.

Remember that the original 1997 Usenix paper "The Cathedral and the Bazaar" wasn't about proprietary software, it was about 2 different types of open source development. The paper was written by the EMACS Lisp extension library maintainer, and was a comparison of the Free Software Foundation's members-only "cathedral" (with physical copyright assignments on paper) with the Linux "bazaar" taking patches from anybody and everybody on an open mailing list.

This is why toybox's "privileged" communication channel is a mailing list anybody with email can join, and even _then_ I deal (grudgingly) with github pull requests and bug reports and such (even though I MEANT to use that as just a distribution channel), because the younger generation of devs prefers that to email and I don't want to exclude them. (Go where the people are.)

Sigh. I gave a talk about this, but alas it was at Flourish in Chicago and their recordings were screwed up both times I went there. I should do podcasts, but I suck without externally imposed deadlines and feedback. The great thing about programming is the box tells me what's wrong every time I try to compile and run anything.

May 8, 2019

Here's a portion of an email I _didn't_ send to scsijon on the toybox list. (It was off topic.)

I would have expected glibc rather than gcc to be the one to break that, it's not the compiler's BUSINESS to care about this. But ever since the C++ developers took over the C compiler they've been expanding "undefined" behavior as much as possible, presumably because C _not_ being a giant pile of irrational edge cases that make no sense so you just have to memorize them was a big advantage C had over C++, and they can't have that.

*shrug* I consider gcc kind of end-of-life at this point. LLVM doesn't act nearly as insecure about C's continued existence, and not being able to compile existing code with gcc 9 would be a bug in gcc 9 as far as I'm concerned.

Presumably there's a -fstop-being-stupid flag for this too if it did turn out to be relevant, or it would be trivial to work around gcc 9's bug if we did wind up hitting it, but this is 100% a gcc bug in cutting edge gcc. (Are they building with -Werror or something? Looks like -Wno-format-overflow is what would turn it off, which sounds like the "may be used uninitialized, but isn't" inability to turn off the broken warnings without turning off unrelated non-broken warnings all over again...)


P.S. C++'s entire marketing strategy, going back to 1986, is "C++ contains the whole of C and is thus just as good a language, the same way a mud pie contains an entire glass of water and is thus just as good a beverage". C is a simple language with minimal abstraction between the programmer and the hardware. The _programs_ are complex but the language is not. C++ adds a lot of language complexity and abstractions that unavoidably leak implementation details so you can't debug them without knowing every detail of how they were implemented. C is a portable assembly language, as close to the hardware as it can get without a port from x86 to arm being a complete rewrite, and even then hardware details like endianness and alignment and register size peek through. I'm all for replacing C++ with go/swift/rust/etc. I object to a drowning C++ climbing on top of C and dragging it down with it.

May 7, 2019

Elliott continues to make AOSP (the Android Open Source Project, the base build for all android devices) do a "hermetic" build, which is their name for an airlock step. (They're shipping prebuilt binaries instead of building an airlock locally, but fine. Either way it means android is building under the tools android is deploying, which is halfway to native build support.)

Which also means they hit problems in the toybox tools, which I have to drop everything and fix. Which is why today I'm working on tar sparse support: they have tarballs generated by the host tar which they want to extract in the airlock, and they've got sparse files in them toybox tar can't currently understand.

And of course if I'm adding it to extract, I'm adding it to create too. The options are "not doing it" and "doing it right", the middle ground is called toys/pending.

May 6, 2019

Today Elliott pointed me at a fix to his sed performance issue, which is to use REG_STARTEND. This tells regexec() to use the contents of the regmatch_t on _input_ to say where the end of the string is, which means A) no strlen() on the input each call to regexec() (which is really slow when you're replacing lots of small matches on a very long string, hence the performance issue), and B) I can implement regexec0 to include null terminators without a hacky for loop over the data.

REG_STARTEND seems to have started life on freebsd over a decade ago, and is now supported by _everything_ except musl libc. It was picked up by glibc in 2004, it's on macos and freebsd and bionic, and had even made it into uClibc before that project died, but it's not in musl. And the reason is that Rich declined to support it when the issue came up, calling the use cases his users wanted it for "hideous hacks". (There's a lot of that going around in musl; the users are wrong for wanting to do what the users want to do, musl is only for people who think like Rich.)

It's also not documented in the regex man page, so I poked Michael Kerrisk to fix the man pages, complained at Rich, and checked in the fix and a test with a 5 second timeout.

It was actually a multi-stage fix because I had to edit the string in place and avoid gratuitous realloc() because libc does _not_ short circuit same-size realloc, that's the caller's job. I'd have xrealloc() do it but that doesn't know how big the old one was...

May 5, 2019

Banging on the board I took a paid sidequest to work on (making the WF111 work on the SAMA5D3), and its BSP bit-rotted. The company that made it got acquired a few years back, and the 6 year old youtube videos on how to do stuff with this board point to websites that redirect to the new corporation's main page. Great.

I feel guilty charging them for the time it takes to learn how this stuff works, but the guy they had working on it retired.

May 3, 2019

The grep --line-buffered thing has been pending for a while, but the _input_ is also line buffered. I need to rewrite do_lines() to read large blocks of data (or even mmap it, dunno where the "it's a win" size is for that though, need to benchmark).

I'd like to avoid gratuitous copying, which means read a large buffer and pass in a pointer/len within the buffer, except for three problems: 1) where/when do I null terminate? (Inserting a NUL modifies the buffer, and if I keep the \n it has to stomp the next character _after_ the terminator, which may be off the end of the allocation.) 2) lines wrap off the end of the buffer and I have to either memmove or realloc(), possibly both, 3) some of the users want to keep the buffer, at which point they strdup.

Basically I have to audit all callers to come up with a design, which is hard to do with a dirty tree.

May 1, 2019

Finally got around to updating my resume. I'm not looking for work yet but a recruiter wanted to know and I presumably have to do it eventually.

What I'd _really_ like to do is grow my patreon to the point I can do open source full time, but I don't expect that to happen before I run out of savings again. Or alternately get some of the big companies using toybox to buy "support contracts" so again, I can do this full time...

April 29, 2019

Broke down and saw Ant man Endgame, primarily because Fade saw and enjoyed it and would want to talk about it. (I'd happily see Carol Danvers II but I'd already been told she wouldn't even have half the screen time of Rocket Raccoon.)

I only wanted to walk out of the theatre in disgust once this time, when the same trap that fridged the lead female character of guardians of the galaxy fridged the lead female character of the original MCU avengers lineup, and put to rest the calls for a "Black Widow Movie". We got a single female-fronted MCU movie, the topic is done forevermore! (Other than that, lots of plot points were predictable by going "which actors want out of their contract", and the shakycam was so bad I lost track of the plot a bunch of times. I think I followed how they got 4 of the 6 McGuffins? No idea what happened to the one Loki stole, for example...)

And unfortunately, the movie put me in an irritable mood to review the second man.c patch. I had to back out my second round of changes (more than a day's worth of work) to apply it, and now I'm looking at the various changes that messed up code I'd carefully cleaned up the first time and it's triggering my "can I ignore this command forever" reaction, to which the answer is "no, the android guys will use the broken code out of pending and then it's even more work to clean up because I keep breaking their use cases behind the great google proprietary firewall"...

I've put a lot of skill points into programming. I'm not really that good at managing the work of others, but I can do it. But when the two overlap and other people are messing up my code I want them to GO AWAY and let me get on with it, which is not how open source is supposed to work and I know it. (I program by debugging an empty screen. Things moving behind my back while I'm debugging is BAD DEBUGGING and the way to fix it is to MAKE IT STOP. I can do pair programming just fine, but "I was working on a redo of this code and you sent me a patch for the old version..." I pretty much back out and discard all my work and start over.)

I said I was irritable.

April 28, 2019

Went to the farmer's market today. Learned it takes 4 lamb hearts to make a pound, and Fuzzy got a raspberry mead she's quite happy with. Plus duck eggs. (Woo-oo.)

April 27, 2019

I'm not hugely interested in seeing them kill off the _other_ half of the Marvel universe, so I bought a ticket for Shazam. I've already been spoiled on it, but eh. Sounds like a good movie anyway.

I got an automated email that my old [PATCH v3] Make initramfs honor CONFIG_DEVTMPFS_MOUNT stopped applying, so I'm doing The Lazy Way to see what happened:

1) patch -p1 -i blah.patch
2) git log init/main.c
3) git checkout -f $LASTHASH^1 # the ^1 means commit before that

And repeat until you find the last commit it applied to, and then the one you just ^1'd is the one that made it stop applying.

I'm tempted to automate this (git log $FILE | head -n 1 | awk '{print $2}') and I could do patch --dry-run even, and I should probably do --no-merges on the log but I _have_ hit cases where a merge commit is what broke. (And those are "make puppy eyes at upstream" or "dig into the code and try to figure out what the fsck is going on".)
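A rough sketch of what that automation might look like, assuming GNU patch (for --dry-run) and a patch with -p1 paths. The function names are made up for illustration, and it only ever steps to the first parent, so a merge that broke things still needs a look by hand:

```sh
#!/bin/sh
# Hypothetical helpers, not from any existing tool.

# Succeed if the patch would apply cleanly to the current tree.
patch_applies() {
  patch -p1 --dry-run -s -f -i "$1" >/dev/null 2>&1
}

# Print the hash of the newest commit (reachable from HEAD) touching a file.
last_commit_for() {
  git log -n 1 --format=%H -- "$1"
}

# Step back through the history of FILE until PATCHFILE applies, then
# print the commit we stepped over last (the one that broke the patch).
# Prints an empty line if the patch already applies where we started.
find_breaking_commit() {
  patchfile=$1 file=$2 breaker=
  while ! patch_applies "$patchfile"; do
    hash=$(last_commit_for "$file") || return 1
    [ -n "$hash" ] || return 1
    breaker=$hash
    git checkout -q -f "$hash^1" || return 1
  done
  echo "$breaker"
}
```

Usage would be something like "find_breaking_commit /tmp/blah.patch init/main.c" from the top of the checkout. It leaves the tree on a detached HEAD at the last commit the patch applied to, and the patch file needs to live somewhere the checkout won't clobber.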

April 26, 2019

A few tangents I edited out of emails today:

It's a crying shame there isn't yet a chromebook shell you slide google phone du jour into and the usbc gives you keyboard, touchpad, display, battery/charger, and a better heat sink/fan for the phone... Yes I'm aware google wants everybody's data in the cloud so you can't work when the net's down. I'm weird.

Java was my primary programming language for ~3 years, about as long as C++ was. I went commodore basic to C to C++ to C to Java to C to Python to C. I'm the guy who told Sun's Mark English that the java 1.0 spec didn't have a way to truncate a file (just missed the 1.1 cutoff but they added it to 1.2).

Java _stopped_ being my primary programming language when they replaced the lightweight AWT with "Swing" and all that model/view/controller nonsense. Plus "no JDK for Linux" was the #1 bug on for 11 months with no official response and _then_ sun screwed over blackdown on the Linux JDK stuff hugely, then they bloated the language so much they had to start doing javaEE subsets, refused to open source it forever and then turned into a patent troll when they did (I'm aware defending themselves against "Microsoft J" was a useful legal battle, but the antitrust breakup should have handled that if we didn't have an infestation of republicans.)

April 25, 2019

The nice clean keyboard of my new laptop is getting slowly grunged up by a human body hunching over it for hours. Sad but inevitable. (I've been trying to keep my fingernails trimmed to slow the rate at which the letters wear off the keys, but it's been like a week and the slow slide away from New Laptop Smell is inevitable. Which is weird because google says this model is from somewhere around 2012. I guess it was in a box or something? Anyway, big step up from what I _was_ using, even if I haven't tried to reflash the screen firmware to get it to Stop Trying To Help With The Brightness. It only does it when I switch windows, so it's not as annoying as it could be.)

April 24, 2019

Benchmarked grep and found my version's way slower than devuan's version. (Which Elliott's been complaining about, and I confirmed he's right. Well, first I wrote a giant email I didn't send arguing about it, and then I benched and went "ok".)

Thought of a new approach where do_lines() chops text out of a buffer without copying it, which should be much faster, which brings up lifetime rules and requires changing the callers. _BUT_ it would allow me to get rid of the old get_lines(fd) API, which I've meant to do forever.

Ideally I'd want to mmap() that buffer when possible, but how long is the file? I'd need to llength() the file (except really I just want the simple length, the whole llength() mess was to get the size of noncompliant block devices that didn't properly report their size, and since the cdrom went away that's probably not a thing anymore?)

Anyway, giant files can be bigger than the available address space (certainly on 32 bit), so we'd want to map chunks of them. And if we're reading instead of mapping we definitely need a finite size because a read() is into an allocated buffer that doesn't discard clean pages it can read back in from a file. Which raises the problem of what to do when a line crosses the boundary. With mmap we can munmap(), lseek() and mmap() again, with a larger size if necessary. (And the data's probably still in the page cache afterwards.) With read() we can copy the data down, fill out the buffer to the end, and realloc() as necessary. (There's always the possibility of a pathologically large line that's bigger than any finite buffer we've allocated. Although the question of what to DO about such lines remains: we don't want "tr '\0' X < /dev/zero | grep" to trigger the OOM killer.)
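For poking at the giant-line case without feeding grep an endless stream, a bounded version of that input is handy. A minimal sketch, assuming a head that supports -c (GNU, busybox and toybox all do); long_line is a made-up name:

```sh
# Emit exactly N bytes of 'X' with no newline anywhere: one unterminated
# "line" of a known size instead of the infinite /dev/zero stream.
long_line() {
  head -c "$1" /dev/zero | tr '\0' X
}

# Example (not run here): a 100 megabyte line under a memory cap, so a
# runaway realloc() fails inside the subshell instead of waking the OOM
# killer. ulimit -v is a bash/dash extension measured in kilobytes.
#   (ulimit -v 200000; long_line 100000000 | grep -c Y)
```

Turning the size up until something falls over shows where the buffer strategy stops coping.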

Anyway, I'm too tired to implement this right now. Which is odd because it doesn't seem like I've done anything today? But I wrote the giant email I didn't send, which was a lot of work, so I guess I have? We all have our process...

April 23, 2019

Hired dudes took down 3 sections of fence in the back yard to get at the poison sumac _tree_ growing between our fence and the neighbor's fence. They estimate it's been there for 15 years, but didn't try to take over the entire yard until we took out the bamboo that was crowding it out.

I gave them more money. I blogged about my weeks of misery and being afraid to touch the cats, and it's just been _looming_ ever since. (Not so much when I was in Milwaukee and Minneapolis and Tokyo and such, but if you're wondering why travel seemed like a great idea...)

April 22, 2019

Hired Dudes (as @kbspangler likes to say) are removing the poison ivy from the front and back yard (which turns out to be poison sumac, not poison ivy). Identifying it all, chopping it up, hauling it away, and painting poison on the stumps. It's so nice to finally find people willing to do that, and I've given them more money than they asked for to do it because YES, THANK YOU!

April 21, 2019

The "Wicd" wifi network chooser thingy doesn't work on the UT campus. I'm guessing there are too many networks here and it's overloading and saying no networks found. How they could be stupid enough to hardwire in a limitation like that... Eh, it's a Linux GUI tool. Of course they would.

(And every time I tell it my phone's password, it doesn't save it. I looked under "preferences" to see if maybe I needed to put it there, but I can't find anything there? How would I get it to forget a network that isn't currently present? There's no list of historical associations like in ubuntu. This thing was not well thought through.)

Anyway, I got Tar switched over to the new environment variable plumbing. I'm not sure the --to-command stuff works reliably (a short write will error_exit() out of tar entirely, even when writing to one of many short-lived child processes here), but this isn't a _new_ problem and in my "tar xf linux-4.20.tar.gz --to-command sha1sum | head" tests the data for the whole file tends to go into the pipe before the consumer responds to it so each sha1sum instance complains it got a short write and then tar says it exited with error code 1, but it neither exits nor gets out of sync with the tarball. Pretty sure if I gave it a tarball with a big enough file in it the toybox one would exit, the question is what the debian one would do?

And now that mkroot isn't using busybox gzip (and thus needs gzip/bzip2/xz built in to busybox or tar doesn't know what they are and doesn't have the -z and -j command line options), I can enable gzip! Which I haven't quite finished cleaning up and promoting yet because I couldn't use it yet...

April 20, 2019

It's the evening before a holiday that HEB's 24 hour location actually closes for, so they're clearancing the bakery again. (Well, putting the 50% discount stickers on anything that expires tomorrow.) We have a chest freezer. Camped the spawn and spent $60 on many bags of discount baked goods.

Got new lib/env.c infrastructure checked in yesterday, so now I make tar use it.

The reason for going down this tangent isn't just that the shell needs it soon, it's that tar should use vfork(), but you can't independently modify your environment variables after vfork() because it's a common heap. But if I set and reset them in the _host_ before the vfork() and do the normal "leak the variable contents" thing setting the environment before the vfork(), and it sets a dozen-ish variables each file, for an unlimited number of files (how long's the tarball?) it can do bad things to memory. (I could putenv() and track them manually but if I need infrastructure to do that and the shell needs it eventually... So tangent.)

Switched my email to the new laptop today. Thunderbird's file selector can't select hidden directories (you can't type a path in, and the chooser doesn't show hidden directories) so I had to sed the config files by hand to change the path where the new copy of the "Local Folders" live, but luckily they're text files. (Yeah yeah, Linux: Smell the Usability. I think we've all given up on "linux on the desktop" ever happening at this point. I'd just like to avoid Android being as stupid as Firefox.)

(For some reason there was a 2 gigabyte sqlite file lying around with a last updated date of 2017. I'm guessing version skew? Yay freeing up disk space I suppose. The new laptop has a 2 terabyte disk in it, but I'm sure that'll be insufficient at some point.)

April 19, 2019

Spent a chunk of today arguing with Dell's firmware. Might know how to fix it, but haven't convinced myself a display annoyance is worth possible bricking yet. (How do you get the specific right firmware update for an aftermarket laptop? Apparently this thing was the height of technology at the end of 2012, Moore's Law is stone dead at this point.)

April 18, 2019

The behavior of debian's "env" command is... the same naive one I just noticed and was about to fix in toybox:

$ env =blah | grep blah
$ env =blah env | grep blah
$ env =blah /bin/sh
  $ env | grep blah

Bash sanitizes out an environment variable with a blank name, but env doesn't.

Sigh, I should modify env to test the new lib/env.c infrastructure. It doesn't _need_ it (it's not persistent, it's fire and forget, nobody cares if it leaks a little memory before printing output or calling exec(), it's limited by the command line and setenv(argv[i]) directly is less memory than strdup anyway). BUT I want the env infrastructure to get a workout.

April 17, 2019

Brought new laptop out to the nice UT courtyard, tried to build mkroot, and... the kernel build failed because it hasn't got flex. Ok, try to phone tether... no networks found in the network gui thing. (It's not using networkmangler, which is great, but it also means I'm less familiar with this one's knobs.)

So, check from the command line, ifconfig says wlan0 is there, maybe I've hit the RF kill switch? Where is it on this laptop... the right side. Accidentally turned it off, turn it back on again and... stack dump in dmesg.

[30254.731639] iwlist: page allocation failure: order:4, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK)
[30254.731651] CPU: 3 PID: 27021 Comm: iwlist Not tainted 4.9.0-6-amd64 #1 Debian 4.9.88-1+deb9u1
[30254.731653] Hardware name: Dell Inc. Latitude E6230/0YW5N5, BIOS A19 02/21/2018
[30254.731738]  [] ? get_page_from_freelist+0x8f0/0xb20
[30254.731742]  [] ? ioctl_standard_iw_point+0x20b/0x3d0
[30254.731779]  [] ? cfg80211_wext_siwscan+0x480/0x480 [cfg80211]
[30254.731785]  [] ? ioctl_standard_call+0x81/0xd0
[30254.731789]  [] ? wext_handle_ioctl+0x75/0xd0
[30254.731793]  [] ? dev_ioctl+0x2a3/0x5b0
[30254.731798]  [] ? sock_ioctl+0x120/0x290
[30254.731802]  [] ? do_vfs_ioctl+0xa2/0x620
[30254.731806]  [] ? SyS_ioctl+0x74/0x80
[30254.731810]  [] ? do_syscall_64+0x8d/0xf0
[30254.731814]  [] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

[30254.731825] active_anon:78161 inactive_anon:78636 isolated_anon:0
                active_file:907684 inactive_file:151303 isolated_file:0
                unevictable:4 dirty:1471 writeback:0 unstable:0
                slab_reclaimable:2798373 slab_unreclaimable:8816
                mapped:16671 shmem:9549 pagetables:4208 bounce:0
                free:38992 free_pcp:23 free_cma:0

Why is it trying to do an order 4 allocation? That's 16 pages (64k) of contiguous memory on an active system that's doing an rsync from a usb drive to the main system? (Second pass of file copying from backups.)
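For reference, an order-N request asks the page allocator for 2^N physically contiguous pages, and /proc/buddyinfo lists how many free blocks of each order (0 through 10) each zone currently has. A small sketch for checking whether anything that big is free; the helper names are invented here, and it assumes the usual "Node 0, zone Normal ..." column layout:

```sh
# Number of pages in an order-N allocation.
order_pages() {
  echo $((1 << $1))
}

# Read buddyinfo-format lines on stdin and print, per zone, how many free
# blocks exist at the given order or larger. Fields 1-4 are
# "Node", "0,", "zone", NAME; field 5 is the order 0 count.
free_blocks_at_least() {
  awk -v order="$1" '{
    n = 0
    for (i = 5 + order; i <= NF; i++) n += $i
    print $4, n
  }'
}
```

So "free_blocks_at_least 4 < /proc/buddyinfo" printing zeroes would line up with a failure like the one in that stack dump.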

So I have to stop the rsync, sync && echo 3 > /proc/sys/vm/drop_caches, and _then_ toggle the RF kill switch? That's kinda pathetic... ok, and now it's back to finding no networks.

[30900.379840] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[30900.380069] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[30900.380159] iwlwifi 0000:02:00.0: Radio type=0x1-0x2-0x0
[30900.618546] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[30900.618776] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[30900.618866] iwlwifi 0000:02:00.0: Radio type=0x1-0x2-0x0

What does disabled mean here? Is this because the apt-get upgrade yesterday updated the iwlwifi firmware? (I dunno why it did it, the one it installed with from the dvd worked fine. Would it work after a reboot if I'd never suspended?)

Ok, I clicked "disable wifi" in the gui thing, waited 10 seconds, "enable wifi", and then something like 30 seconds later (it toggled the rf kill bit again according to dmesg) hit "scan" and NOW it can see my phone... darn it, AND THEN IT WENT AWAY AGAIN.

This is amazingly brittle and I don't know what the magic incantation is, but it's CLEARLY software being broken here. Ugh, in addition to the driver needing an order 4 allocation and being unable to get it if the system isn't COMPLETELY IDLE, the gui tool is horked: "iwlist wlan0 scanning" shows me a bunch of networks. Lemme see if I can remember how to associate by hand... wow it's congested here, my phone is cell 24 in this list. (Lots of instances of "utexas" and "eduroam", that's a university for you.)

Ok, "iwconfig wlan0 essid Arkleseizure key s:password" ... is not it because it doesn't support wpa passphrase. Which EVERYTHING uses now. Right, let's see, that's the wpa_passphrase command which takes the ssid and the password... is that the same as the essid? Let's try it... I got a 64 byte hex string, which is longer than "key" will accept as an argument. And the iwconfig man page's section on the "key" and "enc" options (what's the difference between them?) talks about registering multiple keys and referring to them by number...? What is this nonsense.

Alright, let's try USB tethering. In dmesg I get:

[32786.261845] usbcore: registered new interface driver cdc_ether
[32786.268344] rndis_host 3-2:1.0 usb0: register 'rndis_host' at usb-0000:00:14.0-2, RNDIS device, de:d3:48:09:5c:0e
[32786.268382] usbcore: registered new interface driver rndis_host

And cdc_ether is presumably the ethernet thing going "hey, right here" and there's no second ethernet interface in ifconfig. Still just eth0, wlan0, and lo. The gui thing is apparently ONLY for wireless, doesn't have any wired control options. Do I need to insmod something myself? Why is this not working? Ok, dig down into /lib/modules/*/kernel/drivers/net/usb and it looks like I need to "modprobe cdc_ether"... which seems to have been a no-op? Ah, it's already there in /proc/modules. And that's the rndis_host stuff I guess? What does /sys/class/net say, that shows a usb0...

AH! My bad. ifconfig -a shows it, usb0 is down and ifconfig only shows up interfaces by default, -a shows all of them. So the only problem is the net app isn't responding to it, so run "dhclient usb0" manually to do dhcp on it and...

Ha! I have net!
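For next time, a sketch of listing every interface the kernel knows about whether or not it's up, by reading sysfs instead of trusting ifconfig's default output. list_ifaces is a made-up name, and the directory argument only exists so it can be pointed at a test directory:

```sh
# Print "name state" for each interface under a sysfs-style directory
# (defaults to /sys/class/net). State is the contents of operstate,
# typically up, down, or unknown.
list_ifaces() {
  dir=${1:-/sys/class/net}
  for i in "$dir"/*; do
    [ -e "$i/operstate" ] || continue
    printf '%s %s\n' "${i##*/}" "$(cat "$i/operstate")"
  done
}

# Then, as root, for whichever one is down (usb0 in this case):
#   dhclient usb0
```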

And rain and thunder and lightning 20 feet away from my table. Quite the storm, might be here for a while. Of course when I go walking there's a storm. Yes I brought an umbrella, but downpour and lightning seem a bit much for it. Been going for a while, though, starts and stops a lot, maybe I can head out in a gap and make it to another overhang to wait out the next outburst?

Hmmm. Devuan's "Power Manager Plugin" has critical power level set to suspend at 10% but it never triggered, and at 1% (!) I noticed because the power light went solid orange, and I closed the laptop and plugged it back in for a bit (despite the lightning) to have enough power to get home.

Devuan has different bugs than Ubuntu 14.04 did, but the whole "Linux: smell the usability" thing is out in full force. 28 years of Linux and we still SUCK at this.

April 16, 2019

So, new laptop! Installed Devuan Ascii with xfce, and this time the setup is (still derivative of last time):

Install devuan, selecting xfce and leaving most of the options stuff unclicked. Boot into new system.

Fiddle with GUI stuff at the top a lot. I have cpu graph, disk performance monitor, network monitor, workspace switcher (8 desktops in 2 rows of 4), free space checker, DateTime doing "%a, %b %d, %r" in 14 point font, the network "notification area", power manager plugin, pulseaudio plugin, and clipman. Plus I went into settings->panel from the applications menu and told the bottom panel to hide itself "intelligently".

Then click on the battery icon, power manager settings, and suspend when lid is closed, system sleep mode "suspend" for both, on critical battery power suspend, disable "lock screen" checkbox, display power management blank after 10 minutes, sleep 11, switch off 12, reduce brightness after "never".

Then since that didn't stop the stupid screen lock on suspend (physical access to my laptop is game over, don't pretend it isn't until you've fixed badusb and friends), "apt-get remove xscreensaver".

Next up apt-get install aptitude chromium mercurial subversion git-core pdftk ncurses-dev xmlto libsdl-dev g++ make flex bison bc strace diffstat

The hard drive is annoyingly clicking. hdparm -B 254 /dev/sda fixed it, added that to /etc/rc.local. (Wow the /etc directory has a lot of crap in it in devuan 2.0.) Googled for a bit to see if the hard drive parking itself every 2 seconds was worse for its longevity than the vibration of hitting keys and jostling it around the table when the heads aren't parked, but nobody seems to have studies. (Presumably it still has the impact accelerometer that does the emergency park.)

aptitude install -R thunderbird (to get it _not_ to install the "lightning" calendar extension because this isn't outlook).

Set the terminal background color to _actually_ black, not just a dark grey, and make the text color fully white. I'm in enough bad lighting situations as it is, I don't need grey-on-grey in terminal windows. (I also switched from monospace to monospace bold, but I'm not sure it's an improvement? Hmmm... no, don't think so. Switched it back.)

apt-get remove libgnome-keyring0 (which I don't use and causes stupid chromium popups)... and that didn't stop it. Nor did telling chrome never to store passwords, or switching off lots of password things in chrome://flags. And I dunno how to change the xfce pulldown menu to start it with --password-store=basic (what I _want_ is --password-store=none). When I need a darn password I'll enter the darn password, stop trying to "help" here. I NEVER tell my browser to save passwords, it defeats the purpose of passwords. Save a key cookie if you want to do that...

April 15, 2019

Dug up the 2 terabyte hard drive I bought in Milwaukee and walked back to the discount electronics place to try to pay them to install it into the new laptop (exercise!), and they more or less declined. (They'll install stuff I buy from them for free, but trying to hire them to install _my_ stuff is more expensive than buying the part from them.) Oh well, I can do it myself, just didn't want to.

Huh, this one is MUCH less painful than swapping out hardware in my netbook was, that required popping out the keyboard and digging _down_ into the machine, this one the bottom panel comes off and the memory and hard drive are right there. Convenient! (And totally made in china; Dell had nothing to do with the design of this hardware.)

Installing Devuan on it. Unlike the new oversized system76 laptop (which I still have on a shelf) it did not ask for strange binary firmware that's not on the USB stick! Woo! (The easy way to get something to work with Linux is to pick hardware that's several years old.)

Dug up a 2 terabyte hard drive for it.

April 14, 2019

Bought a new laptop! Walking back from Tax place again (this time carrying an umbrella as a parasol) because I had to drop off the check for the bank routing info, and on the way back I stopped in to the "discount electronics" place on Andersen near Lamar, and they had a Dell Latitude E6230 for cheap that's a nice form factor, reasonable processor (core i5), and can hold 16 gigs of ram.

I'm not asking for _that_ much in a laptop. It's a several year old model (2015 I think?), but Moore's Law dying means that matters a lot less than it used to. (Technically the S-curve started bending down around 2001 and the exponent gradually decreased until it's asymptotically approaching 1, meaning the advances these days are linear rather than exponential. The technology's still advancing, but not in a world changing way.)

And unlike anything I've ever seen from System76, THIS one is reasonably sized. (It doesn't QUITE fit into the netbook bag because the extended battery sticks out too much, but would if it didn't so points for trying.)

April 13, 2019

Back in Austin, went to my tax appointment and got a bad sunburn walking back. (Even though I was in the shade of I-35 for at least half of it.) Gotta go back tomorrow to drop off a check. Then I went to natural gardener with Fuzzy. They're out of African Basil.

Rideshare is expensive (between one way to taxes and both ways to natural gardener spent over $50 on it today), but my car is dead and self-driving is coming so I don't really want to replace it if I don't have to. Waymo's Guy In Charge Of That estimates they'd like to charge $1k/year ($85/month) for a flat rate subscription in a municipal metro area, as in your phone's Google Maps app grows a "take me there" button next to "directions" that when pressed turns into a countdown of seconds until your vehicle arrives. They're already prototyping this in Arizona, the tech is ready it's just regulation catching up to allow them _not_ to have someone sitting in the driver's seat "just in case". (Because nothing says paying full attention like somebody _not_ driving. "Driver assist" is an accident waiting to happen, either the human is driving or the human is NOT driving, halfway states are called a "distracted driver".)

To clarify: Google's technology is ready, but they've been working on it for over 15 years now. Tesla's keeps killing people because they suck, and are trying to play catch-up, but keep in mind Musk didn't found Tesla, Martin Eberhard did. Musk acquired it in a hostile takeover with the money he made from Paypal in the dot-com boom. His technology only advances when he buys other companies (like SolarCity and Maxwell) or when he hires people away from them who are already doing things (which doesn't work so well for them and tends to turn into lawsuits).

All the others are still playing catch-up, but everybody's working on it because it's a game of musical chairs. Lots of people are doing parts of the business model the way lots of people were doing parts of smartphones (apple newton, palmpilot, the motorola razr ran apps written in java) before iPhone and Android shipped in 2007.

The thing is, one car per person was always a terribly inefficient model. Individually owned cars are only driven about 4% of the time (parked 96%), even human-driven taxis are driven about 40% of the time (the humans still sleep). Assuming self-driving cars are on the road 10x as much (which is a conservative estimate) you'd need 1/10th as many cars to serve a given metro area (yes even at rush hour, which is about 3 hours each way meaning multiple round trips even without carpooling).

Then add in the fact that an electric car lasts a million miles each before you have to service anything (other than replacing the tires every 30k miles: no air filters, no oil to change, no transmission, the batteries have active liquid cooling so they last a long time...)

So if you're a car company seeing the rise of cheap electric self-driving vehicle fleets, you're playing a game of musical chairs: your industry's manufacturing volume is about to drop by an order of magnitude and there's only enough market to support 1/10 as many car companies as we have today. They're all racing to switch over before their competitors do.

People immediately go "but what if somebody barfs in the car" (then you can report the car soiled in the app and request another, and they know which phone was riding in the car so they can place blame appropriately and prevent a recurrence), and this is why they're doing trials and limited rollouts and building service centers and so on.

The estimates are that the gasoline distribution network will collapse around 2025 when volume falls below fixed costs and the whole mining/shipping/refining/delivery/sale network we have now becomes unprofitable. At that point gas stations stop selling gas and become convenience stores, and gasoline becomes something those who still need it order delivery of (like liquid nitrogen from airgas), and keep their own tank on site. Sufficiently rural areas will be "stuck on dialup" for a couple extra decades, but cities will get rid of parking lots fast: that land's way too valuable when an app-summonable vehicle can pick you up and drop you off from the curb and never need to be parked anywhere but the fleet maintenance depot.

That's why I dowanna get a new car if this is only a couple years away. It's like installing a land line once digital cell phones have arrived, but not yet having a cell tower in range of my house. Or using dialup when cable modems are available, or still having cable TV when you have streaming services. I don't want to own a car, I want the app. I just need coverage to reach where I live.

April 8, 2019

I haven't been posting here as much because I've been posting to the mailing list; today's issue is reestablishing the setenv lifetime rules again so I can reopen the toysh can of worms.

April 7, 2019

Went to see Captain Marvel again, this time with my sister and the niecephews. Shortly before the big fight on the spaceship (the one set to I'm Just a Girl) a guy in the back row was discovered unconscious, and the next 20 minutes the theatre had the lights on while the movie played and everybody was looking at the back of the theatre instead of the screen as the theatre staff asked him loudly if he was diabetic until the EMTs showed up and carried him out.

They didn't pause the movie, but they did give us free passes to see another movie some other time as we left (and told us that he'd had an epileptic seizure but was otherwise fine).

This is why people wear bracelets for this sort of thing. On the one hand I feel bad for the guy, on the other he cost the theatre the revenue from a packed house and my niecephews missed the climax of the movie. (Not the punching spaceships part, but the entire facing down Annette Bening part and the montage the internet will inevitably set to "I get knocked down but I get up again". That's this movie's version of the camera circling the avengers while the theme plays, the punching spaceships bit is denouement.)

April 2, 2019

Submitted 5 ELC talk proposals (I think 3 of them were to "whatever conference they're hiding ELC behind this year", this pairing thing is terrible). Of course it was at the last minute (which due to pacific time was 2 hours later than I thought). I should memorialize them for posterity, but didn't.

Trying to finish and promote tar today so it can go in Android Q, which is mostly filling out the test suite so everything's tested (and fixing what that finds), and SKIP_HOST isn't granular enough.

What I want to say is "some non-toybox versions of this are expected to fail, but it's still a good regression test for us", such as the fact that the gnu/dammit tar can't handle "cat tarball.tgz | tar tv" and mine can. (I can autodetect type on a nonseekable stdin. It was a pain, but I refused to let it _not_ do it.)

But if you extract toybox source onto a mkroot system where the host is toybox and want to run the tests on the host toybox? That should be fine.

April 1, 2019

I hate april fool's day. Trying to stay off twitter.

The gnu/dammit tar has a --full-time but doesn't have --no-full-time, which is annoying because I'm printing --full-time by default. Sigh. I can add the other thing for compatibility, but ow? (If ls does --full-time, why doesn't tar -tv?) Sigh. Ok, implement --full-time just so the test suite can pass TEST_HOST...

Huh, I added a TARHD variable I can set where this test passes the output through "hd >&2" so a hexdump of the created tarballs goes to stderr. That way if they differ, I can figure out what differs. But what I _really_ want is to catch the failure and run the host and target versions through hd _then_, which means I need to be able to register an error handler function. Hmmm... It's a pity bash hasn't got an "is this function defined" check. No wait, that's under the shell builtins... "type -t name". Returns "function" if it's a shell function. Ok...
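A minimal sketch of how an optional error handler could hang off that check. This is bash only (type -t isn't in posix sh), and run_hook and on_fail are hypothetical names for illustration, not anything in the toybox test scripts:

```bash
# Call the named hook with any extra arguments, but only if it is
# currently defined as a shell function. Silently do nothing otherwise.
run_hook() {
  if [ "$(type -t "$1")" = function ]; then
    "$@"
  fi
}

# A test file would opt in by defining the hook, for example:
#   on_fail() { hd "$1" >&2; }
# and the failure path of the test harness would then do:
#   run_hook on_fail "$tarball"
```

Tests that never define the hook keep working unchanged, which is most of the appeal.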

March 31, 2019

The L records the gnu/dammit tar outputs for long filenames have the permissions and user/group names filled out. They're not needed (they're in the next header and those are the ones that get used), but they're filled out. Meanwhile fields like timestamp are zeroed. There's no obvious pattern to it, I think it's an implementation detail (sequence packets are initialized?) leaking through into the file format.

No, it's worse. The owner/group is always "root" and the permissions are 644. So the field could be zeroed but it's instead nonsense. As with the " " after the checksum, just gotta match the nonsense to get binary equivalent tarballs.

March 30, 2019

I'm writing tar tests, trying to do a proper thorough job of testing tar (which the previous tests didn't really), and I did "tar c --mtime @0 /dev/null | tar tv", which should more or less be ls -l on a char device, but:

--- expected
+++ actual
@@ -1 +1 @@
-crw-rw-rw- root/root 1,3 1970-01-01 00:00 dev/null
+crw-rw-rw- root/root 0 1970-01-01 00:00:00 dev/null

It's showing size, not major, minor. (This is the gnu/dammit one.) I want TEST_HOST to pass, but they're showing useless info here. "Be compatible" is fighting with "do it right". Hmmm...

What does posix say? Hmmm. The last posix spec for tar was 1997, before they removed it (just like cpio, the basis for rpm and initramfs; Posix went off the rails and I'm waiting for Jorg Schilling to die before trying to correct anything). And that says:

The filename may be followed by additional information, such as the size of the file in the archive or file system, in an unspecified format. When used with the t function letter, v writes to standard output more information about the archive entries than just the name.

Great, EXPLICITLY unspecified. Thanks Posix! You're a _special_ kind of useless.

March 28, 2019

Ok, the Embedded Linux Conference and Open Source Summit are colocated in San Diego in August (the Linux Foundation does this to dilute the importance of conferences, it's about like how Marvel had endless crossovers to force you to buy more issues back in the 90's right before they went bankrupt). The CFP closes April 2. I should submit a thing.

Topics. Ummm. I could do an updated 3 waves thing (lots of good links for that, credentials, AO3 is fan run and thus better at what it does, more on that, credentials vs accomplishment, and so on.) I could do a talk on 0BSD, on mkroot, on toybox closing in on 1.0...

March 27, 2019

So, tar paths...

$ tar c tartest/../tartest/hello | hd
tar: Removing leading `tartest/../' from member names
00000000  74 61 72 74 65 73 74 2f  68 65 6c 6c 6f 00 00 00  |tartest/hello...|

It's matching .. sections (the code I'm replacing was just looking at _leading_ ../ which isn't good enough).

$ tar c tartest/../../toy3/tartest/hello | hd
tar: Removing leading `tartest/../../' from member names
00000000  74 6f 79 33 2f 74 61 72  74 65 73 74 2f 68 65 6c  |toy3/tartest/hel|
00000010  6c 6f 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |lo..............|

And the gnu/dammit code is stupid.

$ tar c tartest/sub/../hello | hd
tar: Removing leading `tartest/sub/../' from member names
00000000  68 65 6c 6c 6f 00 00 00  00 00 00 00 00 00 00 00  |hello...........|

_really_ stupid.

Of course figuring out what/how to canonicalize is weird too, because I don't have abspath code that stops when it matches a directory, and there's no guarantee it would anyway rather than jumping right over it. I want the _relative_ path to be right.

Sigh. Compatibility, do what the existing one's doing...
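
A minimal sketch of the behavior the three transcripts above show, assuming the rule is simply "drop everything up to and including the last .. component". The helper name strip_dotdot() is made up for illustration; this is neither the gnu code nor what toybox does, and it ignores other cases such as a leading /.

```c
#include <string.h>

// Return a pointer into path just past the last ".." component, mimicking
// the "Removing leading ..." output in the transcripts above.
// Hypothetical helper: not the toybox or gnu implementation.
const char *strip_dotdot(const char *path)
{
  const char *keep = path, *s = path;

  while (*s) {
    // Are we at the start of a path component?
    int at_start = (s == path) || (s[-1] == '/');

    // A ".." only counts when it is a whole component.
    if (at_start && s[0] == '.' && s[1] == '.' && (s[2] == '/' || !s[2])) {
      s += 2;
      while (*s == '/') s++;
      keep = s;
    } else s++;
  }

  return keep;
}
```

Fed the three paths above, this returns tartest/hello, toy3/tartest/hello, and hello respectively, which is what shows up in the hexdumps (including the third case, where the archived name no longer says which directory the file came from).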

March 25, 2019

Got a heads up from Elliott that auto-merges of external projects into the Android Q branch end on April 3, feature freeze in run up to the release. So if I want to get tar promoted and in, I've got until then.

March 24, 2019

Once again trying to work out if old = getenv("X"); setenv("X", "blah", 1); setenv("X", old, 1); is allowed. Because old is a pointer into the environment space, and setenv replaces that environment variable. Under what circumstances do I need a strdup() in there?
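
For reference, POSIX says the string getenv() returns may be invalidated or overwritten by a later setenv(), so the conservative answer is "always copy first". A minimal sketch of that, with made-up helper names (env_swap() and env_restore() are not from any library) and with a failed strdup() simply treated like an unset variable:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

// Set name=value and return a malloc()ed copy of the previous value
// (NULL if it was unset). Copying first avoids depending on whether
// setenv() reuses or frees the storage getenv() pointed at.
char *env_swap(const char *name, const char *value)
{
  char *old = getenv(name);

  if (old) old = strdup(old);
  setenv(name, value, 1);

  return old;
}

// Put back what env_swap() saved, then free the copy.
void env_restore(const char *name, char *old)
{
  if (old) setenv(name, old, 1);
  else unsetenv(name);
  free(old);
}
```

Whether a particular libc lets you skip the copy is an implementation detail; this version doesn't need to know.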

I dug into this way back in 2006 but don't remember the details...

March 18, 2019

Tar cleanup corner case: the gnu/dammit tar fills out the checksum field weird, I kinda dowanna do that but the resulting tarballs won't be binary identical if I _don't_...

Backstory: tar header fields are fixed length records with left-justified ascii contents, padded with NUL bytes. The numerical ones are octal strings (because PDP-7 used a 6 bit byte, we say the machine Ken and Dennis wrote Unix on had 18k of ram but that was 1024 18-bit words of memory).

The "checksum" field is just the sum of all the bytes in the header, and is calculated as if the checksum field itself is memset with space characters. (Then you write the checksum into the field after you've calculated it.) The checksum has 7 digits reserved (plus a NUL) but due to all the NUL bytes in the header, the checksum is almost always 6 digits. So it _should_ have 2 NUL bytes after it... but it doesn't. It has a NUL and a space, ala:

00000090  31 34 31 00 30 31 32 32  36 36 00 20 30 00 00 00  |141.012266. 0...|

The _reason_ for this is historical implementations would memset the field, iterate over the values, and then sprintf() into the field which would add a NUL terminator but not overwrite the last space in the field. And the gnu/dammit tar is either _doing_ that, or emulating it.

I'm not memsetting spaces into the cksum field, I'm starting with 8*' ' and skipping those 8 bytes... but the result is I'm printing out two NUL bytes at the end instead of NUL space. And if you check for binary identical files...
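
A sketch of the "historical" sequence described above, assuming a 512 byte header with the 8 byte checksum field at offset 148 (which is where the hexdump line above shows it, at 0x94). The function name is invented for illustration and this is not the toybox code:

```c
#include <stdio.h>
#include <string.h>

#define CKSUM_OFF 148   // offset of the 8 byte checksum field in the header
#define CKSUM_LEN 8

// Sum every byte of the 512 byte header, counting the checksum field
// itself as 8 spaces, then store the result gnu-style: 6 octal digits,
// a NUL, and a trailing space.
unsigned tar_set_checksum(unsigned char *hdr)
{
  unsigned sum = 0;
  char digits[16];
  int i;

  memset(hdr+CKSUM_OFF, ' ', CKSUM_LEN);
  for (i = 0; i < 512; i++) sum += hdr[i];

  // The field still holds spaces: overwrite 7 bytes (digits plus NUL)
  // and leave the last space alone, like the historical sprintf() did.
  snprintf(digits, sizeof(digits), "%06o", sum);
  memcpy(hdr+CKSUM_OFF, digits, 7);

  return sum;
}
```

The largest possible sum is 512*255, which is 0377000 in octal, so six digits always fit and the seventh byte is always the NUL.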

It's _almost_ certain no tar program out there is going to care about this, but if I don't match it and I use canned tarballs in my tests, CHECK_HOST would always fail with the gnu/dammit implementation. (Or possibly busybox, I haven't looked at what that's doing yet.)

March 16, 2019

Oh FSM. I feel I should do a response to LWN's motivations and pitfalls for new "open source" licenses article, but you could just watch my 3 minute rant on there being no such thing as "the GPL" anymore, copyleft fragmentation inevitably increasing as a result, and the need for a reclaimed public domain via public domain equivalent licenses that don't have the "stuttering problem".

Of course there's no mention of 0BSD or similar, they haven't noticed it yet. A lot of people haven't worked this sea change through to a logical conclusion yet, they're still trying to make a better buggy whip because their old one stopped serving their needs. Fighting the last war...

March 15, 2019

That side gig is hanging over me. I want to do the thing for them, it's not hard, but I'm huddling under an "out of service" sign.

March 13, 2019

At Fade's. Well, currently at the McDonald's down the street from Fade's.

Tar has an interesting corner case in autodetecting file type: if it's a seekable file you can read the first tar header block (512 bytes) and if it doesn't contain the "ustar" magic at offset 257 (unix standard tar, posix-2001 and up so an 18 year old format we can pretty much assume at this point, albeit with extensions) then check for compression signatures for gzip and bzip...

At which point, if it's _seekable_ you seek back to the beginning, fork, and pass the filehandle off to gzip or similar. I just redid xpopen() so it can inherit a filehandle from the host namespace as its stdin/stdout. (It can still do the pipe thing: feed it -1 and it'll create a pipe, but feed it an existing filehandle and it'll move it to stdin/stdout of the new process; I should probably have it close it in the parent space too but haven't yet because when you pass along stdin/stdout _those_ shouldn't get closed and is that the only case?)

But if it's _not_ seekable, I have 512 bytes of data I need to feed into the new process, and there's no elegant way to do that. I kind of have to fork another instance of tar with the appropriate -zjJ flag and then have _this_ one spin in a loop forwarding it data through a pipe(2).

Which is awkward, but doable...
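
The signature check itself is the easy part. A minimal sketch, assuming the usual magic numbers (the "ustar" string at offset 257 of a tar header, 0x1f 0x8b for gzip, "BZh" for bzip2); the enum and function names are made up for illustration:

```c
#include <string.h>

enum archive_type { ARCHIVE_UNKNOWN, ARCHIVE_TAR, ARCHIVE_GZIP, ARCHIVE_BZIP2 };

// Classify the first block read from an archive. len is how many bytes
// of buf are valid (512 for a full tar header block).
enum archive_type sniff_archive(const unsigned char *buf, size_t len)
{
  // ustar headers carry the magic "ustar" at byte offset 257.
  if (len >= 262 && !memcmp(buf+257, "ustar", 5)) return ARCHIVE_TAR;
  // gzip streams start with the bytes 0x1f 0x8b.
  if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b) return ARCHIVE_GZIP;
  // bzip2 streams start with the ASCII characters "BZh".
  if (len >= 3 && !memcmp(buf, "BZh", 3)) return ARCHIVE_BZIP2;

  return ARCHIVE_UNKNOWN;
}
```

What to do with the 512 bytes already consumed from a non-seekable input is the awkward part, and this sketch doesn't touch it.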

March 12, 2019

Packed out of apartment, onna bus to Fade's.

Hey, ubuntu found a new way to fail. Doesn't suspend because kworker/u16 (Workqueue: kmemstick memstick_check [memstick]) failed to suspend for more than 120 seconds, and so the suspend was aborted _after_ I'd closed the lid and put it in my laptop bag, so instead it got VERY HOT.

Bravo, ubuntu. Yes of course the correct thing to do if the memory stick probe hangs for 2 minutes is to MELT THE HARDWARE. Linux: smell the usability!

March 11, 2019

First day where I would be working if I hadn't quit the job. Sitting in the apartment poking at computer stuff. I had a long todo list I haven't done any of yet. Luckily, over the years I've learned that "not doing stuff" is an important part of the process. I need cycle time. Rest, recovery, sleep, staring out windows. I gave up a lot of money to be able to afford _not_ to do stuff today, and am enjoying it.

That said, I should at the very least drop off the "moving out of the apartment" form, and maybe take my bike back to the bike co-op I got it from and go "here, free bike". (It's a vintage Schwinn, it's lovely. Someone will want it as much as I did. Alas, can't easily take it out of state with me.)

Somebody tried to sign up to the mailing list, and I forwarded them to mkroot, but as I told them in the email... "I mostly talk about it on the toybox mailing list. And patreon. And my twitter. And my blog..." (It had a mailing list but I stopped using it after a thing happened. I have a vague roadmap to merge it into toybox and stop doing it as a standalone project, but need to implement route and toysh in toybox first.)

March 10, 2019

And thunderbird filled up all memory, wasn't watching, didn't kill it fast enough, and it locked the machine hard. Had to power cycle. Wheee.

Lost 8 desktops full of open windows, most of which had many tabs. Rebuilding much state. The most irreproducible loss is, of course, all the thunderbird windows where I clicked "reply" to pop open a window to deal with later. Thunderbird keeps no record of that whatsoever. (Kmail would reopen them all when restarted, but alas that was bundled into a giant desktop suite and went down with the ship it was tied to the mast of. Pity, it was a much better mail client than thunderbird. Oh well.)

Once upon a time, Linux had an OOM killer that would kill misbehaving processes if the system was in danger of running out of memory and locking up. People complained that their process might get killed. So the kernel devs neutered the OOM killer so it doesn't work remotely reliably and now the whole system locks up as often as it's saved by the OOM killer, because killing _every_ process is clearly an improvement to killing _a_ process.

Sigh. Lateral progress.

March 9, 2019

Thunderbird's sluggish again so I tried to clean out the linux-kernel folder. Since this is the big machine with 16 gigs of ram and 8 gigs swap, I told it to move 96k messages instead of the usual 20k at a time. It moved all the messages, and then did its Gratuitous Memory Hog thing it always does at the end (because Thunderbird is programmed strangely). It ate all 16 gigs DRAM, worked its way through all 8 gigs swap, and then I called killall thunderbird from the ctrl-alt-F1 console before the machine could hard lock (because the OOM killer doesn't work anymore, no idea why).

And of course when I started it back up, none of the messages it had spent hours copying to the new folder had been deleted from the old one.

Could somebody not crazy write an email client? This doesn't seem hard. Far and away the _most_ annoying thing about thunderbird is when it pops up a pulldown menu or hovertext, and then freezes for 6 minutes doing something where the CPU or disk is pegged, and the darn pop-up follows me when I switch desktops, blocking whatever's behind it.

So now I tried right click delete... and it's moving 96k messages to a trash folder. Sigh. NO, DELETE THEM! NOT MOVE TO TRASH! NOW WHEN THIS CRASHES I'M GOING TO WIND UP WITH _THREE_ COPIES!

It's a good thing this machine has gigabytes of free disk space because this email client is written by idiots. And once you start one of these operations that's going to take 4 hours (and then maybe try to crash the OS again afterwards if you're not babysitting it), there's no way to interrupt it short of kill -9 which would leave the files in who knows what state...

March 8, 2019

Last day at JCI. Stress level: curled into a ball, whimpering.

Sigh. I'd really like to move the Android guys to a more conventional build approach, where the Android NDK toolchain is not just a generic-ish toolchain but is the one used by AOSP, so that 1) you can export CROSS_COMPILE=/path/to/toolchain/prefix- and if your build is cross compile aware it just works, 2) Android isn't shipping 2 slightly different toolchains that do the same thing.

They are reluctant to do this because A) windows, B) they see me trying to apply conventional embedded-ish development to android as weird. (Everybody except them is an app developer. This isn't how you build apps!)

Sigh. I keep going "this reduces to this, just implement the general case and it should work in a lot more situations" and getting "but that's not how we've ever thought of it, you'll confuse people". I get different variants of it from the linux kernel guys, the distro maintainers, embedded developers, the android guys, compiler developers... everybody's in their own niche.

March 7, 2019

I've been doing a review pass of pending/tar.c and adding a bunch of "TODO: reading random stack before start of array" and so on, and I've come to the conclusion I need to change the xpopen_both() api. Because if the child process needs its stdin or stdout hooked up to an existing filehandle, there's no current way to do that.

The way it works now is you pass in an int[2] array and it hooks up a pipe to each one that's zero, and writes the other end of the pipe into that slot (int[0] going to the stdin of the process and int[1] coming from the stdout of the process). But what I _want_ is if I feed an existing filehandle to the process, THAT filehandle should become the stdin or stdout of the process. (So gzip can read from or write to a tarball.)
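
A rough sketch of the behavior wanted here, using -1 rather than zero as the "make me a pipe" marker so that filehandle 0 can be passed through. This is a hypothetical stand-in called spawn_with_fds(), not toybox's xpopen_both(), and it skips corner cases (such as a new pipe landing on filehandle 0 or 1 because the parent had those closed):

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

// fds[0] feeds the child's stdin and fds[1] receives its stdout. A slot
// holding -1 gets a new pipe (and is overwritten with the parent's end);
// any other value is an existing filehandle that becomes the child's
// stdin or stdout directly. Returns the child's pid, or -1 on failure.
pid_t spawn_with_fds(char **argv, int fds[2])
{
  int in[2] = {-1, -1}, out[2] = {-1, -1};
  pid_t pid;

  if (fds[0] == -1 && pipe(in)) return -1;
  if (fds[1] == -1 && pipe(out)) {
    if (in[0] != -1) close(in[0]), close(in[1]);
    return -1;
  }

  pid = fork();
  if (!pid) {
    // Child: pick the read side for stdin and the write side for stdout.
    int from = (fds[0] == -1) ? in[0] : fds[0];
    int to = (fds[1] == -1) ? out[1] : fds[1];

    if (from != 0) dup2(from, 0);
    if (to != 1) dup2(to, 1);
    // Drop every pipe end we created so EOF propagates properly.
    if (in[0] != -1) close(in[0]), close(in[1]);
    if (out[0] != -1) close(out[0]), close(out[1]);
    execvp(argv[0], argv);
    _exit(127);
  }

  // Parent: close the child's ends, keep ours (unless fork() failed).
  if (in[0] != -1) {
    close(in[0]);
    if (pid > 0) fds[0] = in[1]; else close(in[1]);
  }
  if (out[0] != -1) {
    close(out[1]);
    if (pid > 0) fds[1] = out[0]; else close(out[0]);
  }

  return pid;
}
```

With this shape, "gzip reading from a tarball" would be an open() of the tarball stored in fds[0] and -1 in fds[1]. The sketch leaves caller-supplied filehandles open in the parent.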

Also, once upon a time I had strlcpy() which was like strncpy but would reliably add a null terminator and didn't do the stupid (memset the rest of the range after we copied). It was just something like "int i; if (!len--) return; i = strlen(src)+1; memcpy(dst, src, i>len ? len : i); dst[len] = 0;" and it worked fine. But unfortunately BSD had the same idea, and added it to libc in a conflicting way (const const const str const *const) and I think uClibc picked that up, so I switched to xstrncpy() which will error_exit() if the string doesn't fit. Which 99% of the time is what you want: don't silently corrupt data. BUT with tar and the user and group name fields...

Hmmm, except if they don't fit what _do_ we want? Truncating could (theoretically) collide with another name, and if the lookup by name fails we've already got UID/GID. (I did bufgetpwuid but didn't implement a negative dentry mechanism for optimizing _failed_ username lookups...)

Ah, it's using snprintf(), close enough. (I keep confusing that with strncpy, which is stupid and will memset the rest of the space with zeroes for no apparent reason. But snprintf() will just _stop_writing_ at the appropriate spot, leaving a null terminator and not gratuitously molesting the rest of the field.)
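
For comparison, the old truncating copy quoted above, written out as a function. The name copy_trunc() is made up here to avoid colliding with the BSD strlcpy(), and the types are size_t rather than int:

```c
#include <string.h>

// Copy src into a dst buffer of size len, truncating if needed and always
// leaving a NUL terminator. Unlike strncpy() it does not zero-fill the
// unused space between the copied string and the final byte.
void copy_trunc(char *dst, const char *src, size_t len)
{
  size_t i;

  if (!len--) return;         // zero length buffer: nothing we can do
  i = strlen(src)+1;          // bytes to copy, including the NUL
  memcpy(dst, src, i>len ? len : i);
  dst[len] = 0;               // last byte of the buffer is always NUL
}
```

Copying "hello world" into an 8 byte buffer leaves "hello w", and copying "hi" into a buffer full of 'x' leaves the bytes between the string's NUL and the final byte untouched.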

March 6, 2019

Last week at work. Totally listless. Paralyzed, basically. I'm stress eating and stress tweeting.

Also, SEI has resurfaced with Probably Money (not yet the same as Actual Money but you never know), and I've mentioned my recruiter found me a side gig (telecommuting getting a medical sensor board upgraded to new driver versions), and I'm kind of annoyed that I quit my $DAYJOB (which paid quite well) so I would have TIME, and that time is already filling up with other less-well-paying work.

I'm totally aware this is a self-inflicted problem, but... dude. I should be better at saying no.

March 4, 2019

Dreamhost has been poking me about renewal. Got the check in the mail today.

(I know way too much about how the sausage is made to be comfortable doing financial transactions online. I'm aware it's silly, and yet...)

March 3, 2019

Poking at toys/pending/tar.c and of course the first thing I do (after a quick scan and some "this sprintf is actually a memset" style cleanups) is build it, make an empty subdirectory, and "tar tvzf ~/linux-4.20.tar.gz". And I get a screen full of "tar: chown 0:0 'linux-4.20/arch/mips/loongson64/common/serial.c': Operation not permitted".

Sigh. This is unlikely to be a small task.

March 2, 2019

Fighting bad Linux userspace decisions.

So top -H is showing the right CPU usage for child threads, but the main thread of a process has the cumulative CPU usage. I _think_ this is because /proc/$PID/stat and /proc/$PID/task/$PID/stat have different data (I.E. the kernel is collating when you read through one API but not reading the same data through another API).

I have a test program that spawns 4 child threads and has them spin 4 billion times in a for(;;) loop, and I just poked it to dprintf(1, "pid=%d") the PID and TID values (to a filehandle so I don't have to worry about stdio flushing for FILE *), and I hit my first problem: glibc refused to wrap the gettid() system call? (What the... the man page bitches about thread IDs being an "opaque cookie" and I'm going "this is legacy crap from back when pthreads was an abomination, before NPTL, isn't it?") Sigh, so use syscall() to call gettid so I have the number I can look in /proc under.
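
The syscall() workaround looks something like this. A Linux-only sketch: my_gettid() and report_ids() are names invented for illustration, and the thread spawning from the real test program is left out:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Linux-only: ask the kernel for the calling thread's ID directly,
// for C libraries that don't provide a gettid() wrapper.
pid_t my_gettid(void)
{
  return (pid_t)syscall(SYS_gettid);
}

// Write "pid=N tid=M\n" straight to a file descriptor, so there is no
// stdio buffering to flush. Returns what dprintf() returns.
int report_ids(int fd)
{
  return dprintf(fd, "pid=%d tid=%d\n", (int)getpid(), (int)my_gettid());
}
```

In the main thread the two numbers match, which is why /proc/$PID/task/$PID exists at all; each child thread gets its own TID under /proc/$PID/task.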

Second problem: the process doesn't _end_ until the threads finish spinning and exit, which means the output doesn't close, so my little pipeline doing:

./a.out | sed -n s/pid=//p | (read i; cat /proc/$i{,/task/$i}/stat)

Is sitting there blocked in read until a.out exits, at which point the cat says the /proc entries don't exist anymore. This is DESPITE the fact that if you chop it at the second | you get the value followed by a newline immediately! It's just that bash's read is blocking trying to get more data AFTER the newline, for reasons I don't understand? (Even read(4096) should return a _short_read_. And yes the "read i;" needs a sleep 1 after it to accumulate enough data to see the divergence reliably, but this bug hits first and that confuses debugging right now.)

This totally needs to be a test case for toysh. My "bash replacement" should get this RIGHT, even if ubuntu's bash doesn't. (I was even desperate enough to check /bin/dash, which also got it wrong in the same way. Well, ok dash didn't understand the curly bracket syntax, but it waited out ./a.out's runtime _before_ getting that wrong.)

March 1, 2019

Two different coworkers basically need the toybox version of a command to fix a problem they're having. One is that busybox's ar can't extract an ipk file, another is a busybox tar bug where if you tar -xC into a subdir that results in broken symlinks (in this case a root filesystem install from initramfs into a mount point where /etc/localtime points to a timezone file that's there in the subdir but the symlink points to the absolute path of where it would on the final system), busybox tar does NOT chown the symlink. So the symlink belongs to root:root instead of whoever it's supposed to belong to, even though the tar file has the right info.

Alas, I haven't implemented toybox tar and ar because I've been too busy with $DAYJOB. I'm not sure if this is ironic or merely unfortunate. I'd ask Alanis Morissette, but I'm told she had problems with that too.

February 28, 2019

It's the last day of the month and I kept meaning to check if any conference call for papers were expiring, but I just couldn't bring myself to care.

I told my boss at $DAYJOB on monday I'm too burned out to accomplish anything else, but they still haven't let me know when my last day is. They keep saying they're _not_ unhappy with my performance on the morning call, but _I_ am unhappy with my performance.

One of the big differences between my mental health in my 20's and now is I know when I need to bow out for self care. (I often miss when I _should_, but am reasonable about working out when I _need_ to.)

February 27, 2019

I'm doing board bringup for that side gig, and they just emailed me a large explanation of the hardware they need working. I unboxed the new board yesterday and confirmed the bits connect together, but haven't actually powered up the result yet.

My first goalpost on any new board is "boot to an initramfs shell prompt on serial console", at least when I'm trying to understand everything and rebuild it properly from source. Getting that working means:

1) Our compiler toolchain is generating the right output for the board, both in kernel mode and userspace/libc.

2) We know how to install code onto the board and run it. (Whether it's tftp into memory or flash it to spi or jtag or what.)

3) The bootloader is working, running itself and doing setup (DRAM init, etc), then loading and starting our kernel.

4) If we get kernel boot messages then the kernel we built is packaged correctly, has a usable physical ram mapping, and is correctly writing to the serial port.

5) If we can run our first program (usually rdinit=/bin/sh) then the kernel is enabling interrupts properly (the early_printk stuff above is simple spin-and-write-bytes with interrupts disabled, that's why printing fewer early boot messages can speed up the board booting), finding a clock to drive the scheduler, and this is where we verify the libc and executable packaging parts of the toolchain work right (because we're finally using them; often I do a statically linked rdinit=/bin/helloworld first if it's giving me trouble.)

When we're done "I built and ran a userspace program that produced output" means I should be able to build arbitrary other ones, and a toybox shell is the generic universal "do lots of stuff with the board" one, where you can mount /proc and /sys and fiddle with them, load modules, etc. That's basically where you get traction with the board.

When an existing BSP gives you a working Linux reference implementation, most of these steps are probably just isolating and copying what it's doing, but I like to step through and move all that stuff into the "I know what it's doing, or at least where to look it up if it breaks" category on any new board I have to support in a nontrivial way.

Then the next thing is usually digesting the kernel .config into a miniconfig and seeing what's there, coming up with the minimal set of options to do the shell prompt thing and cataloging the rest of them.

February 26, 2019

I'm trying to figure out if my normal response to spam callers is "punching down". I always try to hit the buttons to get through to a human, then say "You spam people for a living. That's sad." and then hang up.

The problem is, I'm doing this to the minimum wage drones in some poverty-stricken rural area who are... doing it for a living. Not the people benefitting from it and collecting 90% of the money from whatever scam it is. But alas, this is the only way I know to push back. (It's not like our current government will do anything about it, not until the GOP finishes imploding, which won't happen until the Boomers die and the fossil fuel companies lose their position as 1/6 of the planet's economy.)

February 25, 2019

Told my boss I'd like to wrap up at work. The money is _lovely_ and this is work I could do in my sleep _if_ I could do it. Unfortunately I've got a variant of writer's block, which is a bit like having a big term paper due and being unable to start because you're so stressed out.

I've been spinning my wheels here so long that I've exhausted my coping mechanisms.

February 22, 2019

How is this page's bit on toybox wrong, let me count the ways:

The Toybox license, referred to by the Open Source Initiative as the Zero Clause BSD license,[7] removes all conditions from the ISC license, leaving only an unconditional grant of rights and a warranty disclaimer.[8] It is also listed by the Software Package Data Exchange as the Zero Clause BSD license, with the identifier "0BSD."[9]

It's not important that it's from toybox, other projects use it too. It was the OpenBSD suggested template license and I got Kirk McKusick's permission to call it zero clause BSD. It doesn't remove _all_ conditions, it removes half a sentence. And SPDX approval came long before OSI, so a better phrasing would be:

The Zero Clause BSD license [7] (SPDX identifier "0BSD"[9]) removes half a sentence from the OpenBSD suggested template license [], leaving only an unconditional grant of rights and a warranty disclaimer.[8]

Anybody want to edit wikipedia[citation needed] to fix this?

February 21, 2019

Still deeply burned out.

VirtualBox's .vdi files provide "sparse" block devices that grow as you use more space in them (up to the maximum size specified at creation time). The ext4 filesystem assumes any block device it's on might be flash under the covers, and attempts to wear level them via round-robin allocation.

Guess how these two interact! Go on, guess!

I set up a new VM, and because my previous one ran out of space I was generous about provisioning it, thinking it would only use the space when it actually needed it. After deleting two other VMs and a DVD iso and trying to figure out why a VM using 60 gigs in the virtual Linux system was consuming 160 gigs on the host...

I had a BAD DAY. And now I need to redo the VM from scratch because even if I could shrink the ext4 partition (the resize tool can grow them while mounted, but not shrink them), I dunno how to tell the emulator to give back the space it would stop using...

Darn it, I was excited about this, but no. The person who pointed me at it said it was a bash test suite that might help me with toysh being a bash replacement. But the readme didn't say what to _do_ to run the bash tests. I figured out that bin/bats was the thing to run, but its output with no arguments was useless and --help didn't really help either. I eventually figured out "bin/bats test" but then it only ran 44 tests and they tested the test suite, not the shell?

At which point I figured out that it's not a shell test, it's test plumbing written _in_ bash. That's useless, I've written and _published_ 2 sets of test infrastructure in bash myself already (one in toybox, one in busybox). That's uninteresting, what's interesting is the _tests_, and this has none. And it's doing the "#!/usr/bin/env bash" thing which is INSANE: why do you trust /usr/bin/env to be there at an absolute path? Posix doesn't require that. Android (until recently) didn't even have a /bin directory. It's /bin/bash even on weird systems like MacOS X. The ONLY place that installs it but puts it somewhere else is FreeBSD, and that's FreeBSD-specific breakage. It's a fixable open source system: drop a symlink and move on. (Just like we all fix /bin/sh pointing to the defective annoying shell on debian.)

February 20, 2019

Upgrades to the su command came up recently, and it's been on my todo list forever: if you want to run a command as an arbitrary UID/GID, it's kinda awkward to do so with sudo or su because both conventionally want to look up a name out of /etc/passwd, and will error out on a uid with no passwd entry even for root. But these days with things like containers, there's lots of interesting UIDs and GIDs that aren't active in /etc/passwd. (And then there's the whole android thing of not having an /etc/passwd and using their version of the Windows Registry instead, because keeping system information in human readable text files is too unixy or something....)

So anyway, I want su -u UID and su -g GID[,gid,gid...] to work, at least for root. And I want to be able to run an arbitrary command line without necessarily having to wash it through a random command shell. And _implementing_ this is fairly straightforward. No, the hard part is writing the help text to explain it, especially if I've kept compatibility with the original su behavior.
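
To show roughly how small the implementation side is, here is a sketch of the two pieces: parsing the GID[,gid,gid...] argument and switching IDs in a safe order. Everything here is hypothetical (the -u/-g options are the wished-for ones above, and parse_gid_list() and become() are invented names); it handles numeric IDs only and does no /etc/passwd lookup at all:

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

// Parse "GID[,gid,gid...]" into gids[], returning the count or -1 on a
// malformed list or too many entries. Numeric IDs only: no name lookup.
int parse_gid_list(const char *str, gid_t *gids, int max)
{
  int count = 0;

  for (;;) {
    char *end;
    unsigned long val;

    if (count == max) return -1;
    // Insist on a digit so strtoul() can't accept whitespace or a sign.
    if (*str < '0' || *str > '9') return -1;
    errno = 0;
    val = strtoul(str, &end, 10);
    if (errno) return -1;
    gids[count++] = (gid_t)val;
    if (!*end) return count;
    if (*end != ',') return -1;
    str = end+1;
  }
}

// Switch to the given IDs. Order matters: supplementary groups and the
// primary group must change while we still have root's privileges, and
// the UID goes last. gids[0] is used as the primary group.
int become(uid_t uid, const gid_t *gids, int count)
{
  if (count > 0 && (setgroups(count, gids) || setgid(gids[0]))) return -1;
  return setuid(uid);
}
```

After a successful become() the caller would exec the requested command directly, with no shell in between.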

A word on the legacy su behavior: way back when setting a user's shell in /etc/passwd to /bin/false or /dev/null was a way of preventing anybody from running commands as that user. Then su grew -s to override which shell you were running as, so this stopped working from a security standpoint. (Besides, if you were running as root you could whip up a trivial C program to do it anyway, but the point was _su_ no longer enforced it.) And it let you specify -c to pass a command line to that shell so su could "run a command as a user" instead of being interactive, so this ability is already _there_ for most users, just awkward to use.

But su has an awkward syntax where it runs a shell and unrecognized options are passed through as options _to_the_shell_. (So the -c thing was kind of accidental at first.) So using su as sudo isn't just "su user ls -l", it's su user -s $(which ls) -l if you don't want to invoke a gratuitous shell in between. And defining new su options means they _don't_ get passed through to the shell.

What would have made sense was a syntax like xargs, where the first command that's not an option stops option parsing for the wrapper. But that's not what they did back circa 1972...

February 19, 2019

Burnout. So much burnout.

When I came to this job a year ago, I was interested in the technology. I was helping get realtime Linux running on an sh4 board. (The larger context was they shipped a Windows CE product back in the 90's, and Windows CE was being end of lifed by Microsoft. So this Microsoft shop was switching to Linux, which I'm all for and happy to help with. As for the sh4 boards, they had a bunch of this hardware installed at customer sites, and a large stock of part inventory to make more boxes with at the factory, so getting Linux running on those was useful to them.)

Coming _back_ in January was because the money was good, it was easy to just keep going, I didn't have another job lined up, and we've still got about half the home equity loan to pay off from riding down SEI.

But this time... they've already built up a reasonable Linux team (including people I know like Rich Pennington of ellcc and Julianne Haugh of the shadow password suite), all the new work is on standard x86 and arm boxes with gigahertz and gigabytes, they're using wind river's fork of yocto's fork of openembedded with systemd ru(n|i)ning everything, the application is still dot net code running on mono talking to a windows GUI app...

And I'm not entirely sure what I'm doing. Not "I don't know how to do this", I mean what am I trying to accomplish? What is this activity _for_?

I'm part of an enormous team where we have over a dozen people in a room for an hour twice a week going over excel spreadsheets reacting to comments on the design of things like "background file transfer" (strangely not rsync) which is somehow a 12 week project for over a dozen people, told "this is what you're doing this week" more or less via the waterfall method. There's an API document, an implementation of this API via gratuitous translation layer with a management daemon using dbus to talk to systemd, and then functions you plug in for a given architecture that the guy who wrote the daemon could have done in a couple hours.

I think this has turned into a "bullshit job". And I am unhappy. The money remains excellent, but... that's pretty much it.

February 18, 2019

If I titled blog posts, this one would be "Tabsplosion is a symptom of overload".

When I say "that's on the todo list", I'm fudging a bit. The toybox todo list does indeed have a todo.txt. And a todo2.txt, todo3.txt, todo/*.txt, and various commandname.txt files with notes on individual commands.

My toybox work directory (for a couple years now) is ~/toybox/toy3, following my convention of doing a git checkout in a directory with the name of the project, so various debris that doesn't get checked into git has someplace to live. This _starts_ as ~/toybox/toybox and there's a ~/toybox/clean for testing that I've committed sane chunks and it builds properly. Eventually so much half-finished cruft builds up in my work directory I clone a clean one and do some "needs to happen fast" project in there, and keep the old one around in hopes of salvaging the old work. (Which, as with viewing bookmarked pages again, never happens. This is why I have so many open tabs, there's a _chance_ I'll get back to the todo item it represents.)

This is how I wound up with toy3. (And in fact a toy4 and toy5 that didn't stick.) Those other directories have their own todo files in them. (Much of which overlaps, but not all.)

And then there's ~/toybox/pending which is full of things like a checkout of Android's minijail, libtommath, jeff's fixed point library from the GPS stuff we did, my old dvpn code (from like 2001), the rubber docker containers tutorial I attended at, a CC0 reference implementation of sha3, snapshots of this and this in case the pages go down, and so on. The todo item is implicit in a lot of those. :)

I also have old tweet threads and blog entries and such that I should collate at some point. A lot of my todo items point to those.

As for the topic sentence, my todo list grows fastest when I don't have time to follow the tangents right now. So I make a note of it and move on.

February 17, 2019

The bus back from Minneapolis left at 9:25pm, and was supposed to get in at 3:30 am but got in at 4am.

I'm still using the giant System76 laptop from 2013, which is 6 years old but has 16 gigs of ram and 8x processor and a terabyte hard drive and is fairly reasonable now that I've gotten a new battery for it, except for 2 things. It's still fairly ginormous, and the hard drive is rotating media so I'm nervous using it in a high-vibration environment. Such as on my lap on a bus for 6 hours, even when there is a working outlet.

A coworker at Johnson Controls (Julianne Haugh, the long-ago author of the Shadow password suite) has a "laptop" that's a tablet with a case and keyboard. Except it's a mac. I want an Android device that does that (and in theory I can add a 128 gig sd card to however much built in storage the sucker has so I should be able to get something reasonable), but every time I actually buy something it's a cheap clearance device like the annual Amazon Fire tablet sales during "prime day", and they're so locked down that it's just not worth the effort to crack them. This is a structural problem: what I'm trying to do with toybox is turn android into a usable general-purpose computing environment you can actually use as a development workstation more or less out of the box, but they're terrified of the "evil butler" problem. (Which isn't _just_ a tablet problem, EFI TPM nonsense does this for PCs, there are periodic LWN articles on that.) You should be able to aftermarket blank 'em, but how do you distinguish that from "an organized crime organization like the NSA or GOP sent a dude into your room for 30 minutes while you're at dinner and now your device serves them not you until they decide to assassinate you"?

Sadly, I haven't installed devuan on the other System76 oversized monstrosity because firmware nonsense and too busy to care. I got email from System76 that they've introduced a laptop to their lineup that _isn't_ visible from space, but I don't trust them. If buying System76 _doesn't_ mean I can just slap an arbitrary Linux distro on it because it's full of magic firmware that never went upstream, what's the _point_? If I have to install a magic distro-specific Linux distro fork, I might as well get a GPD Pocket or something.

February 16, 2019

Hanging out with Fade in Minneapolis. I have deployed heart-shaped butterfingers at her. (It's her favorite candy bar, and there was a sale.)

Yay, the github pull request adding 0BSD to the license chooser got merged!

This means I have developed just enough social skills to disagree with someone about how to help without pissing them off to the point they no longer want to help! (Although it's still a close thing, I wouldn't say I'm _good_ at this. I'm still far too easily irritated and have to really _push_ to compromise. (In this case that would mean swallowing my principles and editing a wikipedia page directly.))

February 15, 2019

There are over 100 toybox forks on github. I did not expect that. Hmmm... The most forked of which just added a logo and half an "rdate" command, back in 2016...

The downside of 0BSD licensing is when you find a nice patch in an external repo that wasn't submitted upstream (or if it was, I missed it), I'm nervous about merging it because forks of toybox are not actually required to be under the same license.

In this case the repo it's checked into still has the same LICENSE file and no notes to the contrary, and I can probably rely on that, but I'm still nervous and like to ask. Submissions to the list mean they want it in, which means it has to be under the right license to go in. The submission _is_ the permission grant, the specific wording is secondary.

The intent of the GPL was to force you to police code re-use: if you accidentally sucked GPL code into your project, you had to GPL your project. (In reality you just as often had to remove it again and delete the offending version, as Linux did with the old-time unix allocation function Intel contributed to the Itanic architecture directory back during the SCO trial. Solving infringement via a product recall and pulping a print run has plenty of precedent.)

Then GPLv3 happened and "the GPL" split into incompatible versions, and suddenly you had to police your contributions just as hard, your GPLv2 or later project couldn't accept code from GPLv3 or GPLv2-only sources, and the easy thing to do was break GPLv2-only. These days there's no such thing as "The GPL" anymore, thanks to the FSF. "The GPL" fragmented into three main incompatible GPL camps (GPLv2 and GPLv3 can't take code from each other, and the dual license of "GPLv2 or later" can't take code from either one), and then there's endless forks like Affero GPL complicating it further. This means there is no longer a "universal receiver" license covering a united pool of all copyleft code into a single common community of reusability, which is why copyleft use has slowly declined ever since GPLv3 came out. These days with GPL code you have to police in both directions, incoming _and_ outgoing code.

0BSD goes the other way from the glory days of "The GPL": you have to be careful about accepting contributions (and I'm more paranoid than most about that, having been involved in more copyright enforcement suits than any sane person would want). But what that buys you is the freedom for anyone wanting to reuse your code elsewhere to just do it, whenever and wherever and however they like. No forms to fill out, no signs to post, have fun. They don't even have to tell me if they did it. (The internet is very good at detecting plagiarism, I'm not worried about that.)

A fully permissive license holding nothing back is the modern equivalent of placing the code into the public domain. The Berne convention grants a copyright on all newly created works whether you want it to or not (the notice is just for tracking purposes of _who_ has the copyright, so you're not in the "the original netcat was written by 'hobbit', how do I get in touch with 'hobbit' or their estate?" situation), but there's no enabling legislation for disposing of a copyright. You can't STOP owning a copyright, except by transferring it to someone else.

And thus the need for public domain equivalent licensing. You can't free(copyright) but you can work out a solution.

February 14, 2019

Date is funky. The gnu/dammit date didn't implement posix, and busybox gets it wrong. Time zones changing names because of daylight savings time.

Testing day of the week. Found a hack. Coded it up. Went to test it.

$ ./date -D %j -d 1
Sun Jan  0 00:00:00 CST 1900
landley@halfbrick:~/toybox/toy3$ busybox date -D %j -d 1
Thu Feb 14 00:00:00 CST 2019


The C API for this is kinda screwed up too, although we need a new one that handles nanoseconds anyway.

February 13, 2019

The biggest sign that "const" is useless in C is that string constants have been rodata forever, but their _type_ isn't because that would be far too intrusive.

Putting "const" on local variables or function arguments doesn't affect code generation (which has liveness analysis). It can move globals from the "data" segment to the "rodata" segment, which is nice and is something the compiler can't work out for itself without whole-tree LTO because the use crosses .o boundaries, but everywhere else it just creates endless busywork propagating a useless annotation down through multiple function calls without ever affecting the generated code.

I periodically recheck on new generations of compiler to see if it's _started_ to make a difference, but I don't see how it can because liveness analysis already has to happen for register allocation/saving/restoring, and that covers it better than manual annotation can? In this respect "const" seems like "register" or non-static "inline", ala "Ask not for whom ma bell tolls: let the machine get it".

Sadly, even though I do add "const" to various toybox arrays to move them into rodata, the actual toy_list[] isn't const because sticking "const" on it wants to propagate down into every user through every function argument (otherwise it's warning city and in fact errors out about invalid application of sizeof() to incomplete types when all I did was add "const" in two places).
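Here's a minimal sketch of the one case where the annotation earns its keep, plus the string literal oddity. The table and function names are made up for the example (they're not from toybox), and where the linker actually puts things depends on the toolchain and flags.

```c
#include <string.h>

// Hypothetical lookup table. With both "const"s the array of pointers is
// eligible for a read-only section; drop the second one and it's ordinary
// writable data.
static const char *const day_names[] = {
  "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
};

// Returns the index of name in day_names[], or -1 if it isn't there.
int day_index(const char *name)
{
  int i;

  for (i = 0; i < (int)(sizeof(day_names)/sizeof(*day_names)); i++)
    if (!strcmp(name, day_names[i])) return i;

  return -1;
}

// A string literal has type char[] (not const char[]) in C, so this compiles
// without a cast even though the bytes are usually stored read-only.
char *plain_pointer_to_literal(void)
{
  return "rodata";
}
```

Running "size" or "objdump -h" on the resulting .o with and without the second const is one way to see whether the table moved.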

February 12, 2019

Phone interview with the side gig, I'd get to poke at a new architecture (we are the knights who say nios) which qemu has a thing for! But no musl support for it, and Linux support is out of tree? Really? (A whole unmerged architecture that people are still using?) It's frustrating there's no easy way to get qemu-system-blah to tell you what it provisions a board emulation with. (How much memory, I/O controllers, disks, network, USB...)

It would be nice if "qemu-system-nios -M fruitbasket --whatisit" could say these things. The board has to _know_ them, somehow. Maybe through the device tree infrastructure? I might try to teach it, but all my previous qemu patches languished unmerged for years. Not worth the effort.

February 8, 2019

Very very tired. Went off caffeine monday but it's 4 days later and still tired. Burned out, half days yesterday and today.

I turned down a job in Minnesota a recruiter offered me. 20% less money isn't a deal breaker, but... they're not on the green or blue lines? It's an hour and a half each way to Fade's via public transit (green line, bus, then walk) so I'd need to get an apartment near the work site to avoid a longish commute from the university (and Fade), and they're in some sort of suburban industrial park where there are family houses but no efficiency apartments? And this employer moves to seattle in june anyway.

Contracting company at the recruiter I got the JCI job through wants me to skype with somebody for evening and weekend jobs. It would pay off the home equity loan faster...

February 6, 2019

I'm trying to build Yocto in a fresh debootstrap. You'd think this would be documented, but it's a bit like the "distros only build under earlier versions of itself" problem, because Yocto is a corporate pointy-haired project and Red Hat is Pointy Hair Linux.

As a first pass I want to run a yocto instance under qemu, but when I downloaded it yocto wanted me to install a bunch of packages like "makeinfo" that I don't want on my host system. Hence debootstrap chroot.

So install debootstrap (I used apt-get on ubuntu), then the wiki instructions say the setup is:

debootstrap stable "$PWD/dirname"

Where "stable" is the release name, next argument is the directory to populate, and the optional third (left off above) is the repository URL to fetch all the packages and manifest data from.

So clone yocto (git clone git://, checkout the right branch (current stable appears to be "thud"), and then "source oe-init-build-env" and...

# (mount /proc, /sys, and /run inside the chroot first)
apt-get install locales &&
echo en_US.UTF-8 UTF-8 >> /etc/locale.gen &&
locale-gen &&
update-locale LANG=en_US.UTF-8 &&
su - user
# then as that user:
cd /home/poky &&
source oe-init-build-env &&
LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 bitbake core-image-minimal

What on earth is a uninative binary shim? All I can find is this and it's at best "related". It's downloading a binary it has to run on the system, and can't build from source. So much for building yocto on powerpc or sh4 or something. Thanks yocto!

Python 3 refuses to work right if you haven't got a UTF8 locale enabled, and yocto's bitbake scripts explicitly check for this and fail... but don't say how to fix it. So I read the python docs and downloaded the python 3 source code. Python's getfilesystemencoding() is calling locale.nl_langinfo(CODESET) (at least on unix systems), which comes from langinfo_constants[] in _localemodule.c in the Python3 source...

Right, you have to install the "locales" package, then run locale-gen, but the online examples showing how you can feed it a locale on the command line are wrong (including the one in the "Setting up your chroot with debootstrap" section of the ubuntu wiki), it ignores the command line, you have to edit the locale.gen file to add the locale you want, then you need to update-locale to get it to use it, and THEN you can set the LC_ALL environment variable.

Darn, yocto's parallelism ignores "taskset 1 cmdline...". It's building on an 8x SMP machine so it's trying to do 8 parallel package downloads through phone tethering, and the downloads keep timing out and aborting. Hmmm... Google, google... It's bitbake controlling this, I can set the environment variable BB_NUMBER_THREADS to the number of parallel tasks.

Ok, core-image-minimal is currently building gnome-desktop-testing and libxml2. I object to 2 of the 3 words in this target name. I'll give them "image". Yeah, I accept that this is probably an image. But gnome-desktop-testing is neither core, nor minimal.

February 5, 2019

Doing release cleanup on sntp.c I hit the fact that android NDK doesn't have adjtime(). Grrr. I dowanna add a compile-time probe for this, and unfortunately while I have USE_TOYBOX_ON_ANDROID() macros to chop out "a" from the optstr, I never did the SKIP_TOYBOX_ON_ANDROID() macros (only include contents if this is NOT set) because I haven't needed them before now.

Sigh, I can just #define adjtime to 0 in lib/portability.h. It's a hack, but android isn't using this anyway (they presumably set time from the phone baseband stuff via the cell tower clocks, not via NTP). It doesn't make the whole code stanza drop out like making FLAG(a) be zero would (then the if(0) triggers dead code elimination), but... I wanna get a release out already, it was supposed to happen on the 31st.
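For the record, a rough sketch of what that kind of stub looks like. This is not the actual contents of the toybox header: NO_ADJTIME and slew_clock() are names invented for the example, and I'm assuming a function-like macro, since a bare "#define adjtime 0" would turn the call into 0(...) and not compile.

```c
#include <stddef.h>
#include <sys/time.h>

// Made-up switch for this sketch: 1 means "pretend libc lacks adjtime()".
#define NO_ADJTIME 1

#if NO_ADJTIME
// Every call site collapses to the constant 0. The arguments are cast to
// void so the compiler still counts them as used.
#define adjtime(delta, old) ((void)(delta), (void)(old), 0)
#endif

// Ask the kernel to slew the clock by usec microseconds. With the stub above
// this always reports success without doing anything.
int slew_clock(long long usec)
{
  struct timeval tv;

  tv.tv_sec = usec/1000000;
  tv.tv_usec = usec%1000000;

  return adjtime(&tv, NULL);
}
```

The code around the call still gets compiled in (the struct setup doesn't vanish), which is the difference from zeroing out the flag and letting if(0) eliminate the whole stanza.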

February 4, 2019

Ok, toybox release seriousness. What do I need to finish up to cut a release...

SNTP is the main new command and I've already used the "Time is an illusion, lunchtime doubly so" Hitchhiker's Guide quote. Oh well.

I've got an outstanding todo item from the Google guys about netcat, but it's a bug I found so I haven't quite been prioritizing it. (As in nobody else reported this bug to me, so it's not holding anybody else up.) Still, I got the ping (once they know about it, they wanted it fixed)...

February 3, 2019

Greyhound topped itself on the bus ride back to Milwaukee. Of course it left most of an hour late, when we got on it hadn't been cleaned (my seat's drink holder had an empty coke bottle in it), for the first time in my experience they checked photo IDs (and the woman behind me couldn't get on the bus because she hadn't brought hers, bus left without her), somehow 2 stops later every single seat was full even though they'd left a through-to-chicago passenger behind at the first stop, the outlets didn't work for the first 2 hours, and the heat was stuck on full the entire time and was somewhere over 80 degrees. (Eventually they opened the emergency exits on top of the bus and left them open so we wouldn't die, but it was never comfortable.) Around 6pm the bus tracker web page decided that before the next stop our bus would travel back in time to 1pm and continue on to retroactively reach chicago around 3:30 pm (going something like 200 miles per hour along that part of the route), and we were kind of looking forward to it by that point but alas, we were disappointed. Then they switched drivers in Madison, and the new driver started heading south straight to Chicago and had to BACK UP to go to Milwaukee when enough people checking Google Maps noticed and yelled at them. Over the intercom the driver claimed to have "missed an exit", and threatened to pull over and let anybody who complained out on the side of the road (we were in Janesville at that point, 40 miles south along I-90), and then drove back north (reconnecting with I-94 at Johnson's Creek) instead of taking I-43 diagonally to our destination. According to phone speedometer apps, on the trip north (along non-interstate roads) the bus sometimes got up to 55 miles per hour, but averaged less than that.

Still, I arrived in Milwaukee only 2 hours late. Not my worst greyhound trip, but still memorable. (Beats the trip _to_ minneapolis where the driver intentionally triggered feedback on the intercom six times and said "wakey wakey" between each one as we got in around 1:30 am.) I'm told Greyhound was an oil company ploy to discredit travel by bus and encourage individual driving instead. Given that the "buy up the busses and destroy them to promote freeways" plot in "Who Framed Roger Rabbit" is the part of the movie based on real events (in our world, they did it and won)...

There's also a significant element of "punishing people for being poor" going on here. I'm taking the bus not just because it's cheaper, but because the shortage of direct flights from milwaukee to minneapolis gives me a lot fewer departure options, and even with a direct flight the "arrive 2 hours early at an airport many miles south of town" plus the minneapolis airport requiring multiple transfers to get to Fade's apartment via public transportation (meanwhile greyhound is right on the Green Line, which lets off about 500 feet from Fade's apartment)... end result is the bus gets me there about as fast as flying, and if I'm lucky I can work the whole way. The bus terminal's a 15 minute walk from work without having to opt out of the Porno-Scanners for the Freedom Grope.

But there's very very strong signaling "this is for the Poors, you shouldn't be here if you have any other choice, we punish you now"... ("We" being "republicans", which is a "we" I personally am very much NOT a part of even when I'm not hanging out with the tired poor huddled masses yearning to breathe free that they despise so much.)

February 1, 2019

Our story so far: I got the record-commands plumbing checked into toybox and hooked up to mkroot, and along the way I found and fixed a sed bug that was preventing commands from building standalone with toybox in the $PATH. (The regex to figure out which toys/*/*.c file this command lives in was returning empty, because -r wasn't triggering.)

So I fixed that, got the record-commands wrapper hooked up, built everything, and... all the targets built? Except I just fixed _sed_ and I knew the kernel build break was a _grep_ bug because replacing the airlock's grep symlink with a link to the host's grep made the build work! (I often do "what commands changed recently" guesses like that before trying to narrow it down systematically...)

Sigh. I pulled linux-git to a newer version so I'm not quite testing the same kernel source, or was it 4.19 or 4.20 I was testing? I hate when things start working again when I DIDN'T FIX THEM, it just means I lost a test case and whatever loose flakiness it revealed is still there but has gone back into hiding. It's possible switching grep versions changed something that got fed into sed, but that's still a bug: the output should be the same.

Darn it, now I've got to waste time figuring out how to break it again the right way.

January 31, 2019

Bus to Minneapolis so I can spend my birthday tomorrow with Fade.

I emailed Linus about arch-sh not booting, he pointed me at a pending fix that hadn't quite made it into mainline yet, and I confirmed it fixed it for me, but oddly has both my emails but not Linus's in between?

Yesterday's toybox build break wasn't a grep bug, it was a sed bug, which broke toybox building anything with toybox in the $PATH. (The regex to figure out which toys/*/*.c file this command lives in was returning empty, because -r wasn't triggering.) Apparently I haven't got a tests/sed.test that checks "does -r do anything".

January 30, 2019

It's -20F out. The expected high is -7. I got permission to work from home today. (Mostly poking at yocto and going "huh".)

There's some sort of bug in grep that's breaking the kernel build, but I haven't reduced it to a test case yet, and what I used to use for this sort of thing in aboriginal linux was my old command line logging wrapper. So I spent most of a day getting the command line wrapper logging merged into toybox and integrated into mkroot, and... the toybox build is broken by the same grep bug, which means the logging wrapper install won't work in the context of the airlock (I.E. I can't build toybox with toybox in the $PATH, due to the bug I'm trying to _diagnose_).

Going back to bed.

January 28, 2019

It's too cold. And we have 8 inches of snow. My normal 20 minute walk to work (12 if I hurry) took 35 minutes today, including helping push a stuck car out of an intersection (along with a snowplow driver who got out to push on the other side).

When I got in only two coworkers I recognized were here. I'd go home early, but I'm already here and outside is the problem.

January 26, 2019

Busy week at work, wasn't sleeping well. Meant to spend today working on toybox release, but spent it recovering instead.

The big overdue thing at work is "timesync", which is where the SNTP stuff comes in. Back in late October we tried to figure out how the box keeps its clock up to date: it was close enough to just doing standard NTP that people had glossed it over as NTP... but not quite.

First of all, it's using SNTP ("Simple Network Time Protocol"), which is a subset of the NTP protocol (same 48 byte UDP packets with fields in the same place) that oddly enough has its own set of RFCs, and then in NTPv4 it all got bundled into one big SNTP+NTP RFC that's more or less illegible. So I went back to the earlier ones and am pretty much just implementing the old stuff and asking wikipedia[citation needed] whether it's safe to ignore whatever they changed.

An SNTP client can read data from an NTP server (it just doesn't care about several of the fields), but an NTP client can't read from an SNTP server (the fields SNTP doesn't care about are zeroed), and windows "NTP servers" tend to be SNTP. So if you use the Linux NTP client with a windows server, it doesn't work. (That took a while to figure out, and started us down this whole tangent.)
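To make the "same 48 byte packet" part concrete, here's a minimal sketch of building a client request and pulling the server's transmit time back out. This is an illustration of the packet layout as I understand it from the RFCs, not the toybox sntp.c code, and the function names are made up.

```c
#include <stdint.h>
#include <string.h>

#define NTP_PACKET_LEN 48
// Seconds between the NTP epoch (1900) and the unix epoch (1970).
#define NTP_TO_UNIX 2208988800LL

// Fill in a client request: leap=0, version=4, mode=3 (client), rest zero.
void sntp_request(unsigned char *pkt)
{
  memset(pkt, 0, NTP_PACKET_LEN);
  pkt[0] = (0<<6)|(4<<3)|3;
}

// Read a big endian 32 bit field.
static uint32_t be32(const unsigned char *p)
{
  return ((uint32_t)p[0]<<24)|((uint32_t)p[1]<<16)|((uint32_t)p[2]<<8)|p[3];
}

// Convert the transmit timestamp (bytes 40-47: 32 bits of seconds since 1900,
// then 32 bits of binary fraction) to unix time in milliseconds.
long long sntp_transmit_millis(const unsigned char *pkt)
{
  long long sec = be32(pkt+40);
  unsigned long long frac = be32(pkt+44);

  return (sec-NTP_TO_UNIX)*1000 + (long long)((frac*1000)>>32);
}
```

A client that only looks at the timestamp fields like this doesn't care what the server put in the rest of the packet, which is why it can talk to either kind of server.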

The box needs to be able to act as an sntp client (sntp not ntp because some existing installs use the windows server), and it needs to be able to act as an ntp server (possibly sntp would be good enough because the downstream boxes are also running our software, but nobody seems to have _written_ an sntp server for Linux, because full NTP server works for SNTP client). And then it's got multicast.

Multicast? Yeah, there's a multicast variant in the sntp RFC, and JCI implemented it in old stuff (back in the 90's), but it's not working for some reason and it's .NET code which is a language I don't know (which isn't entirely a blocker but does slow me down) and which I haven't got a build environment for (which is the real blocker). And the ISC reference implementation in C doesn't appear to do multicast (because it's not 1996 anymore).

Note: Napster pretty much killed off Multicast starting around 1999. No podcasts use multicast. Youtube, Netflix, Hulu, and Amazon Prime do not use multicast. The original use case for multicast was "all that" and when it arrived it didn't, which means there isn't really a use case out there for it. The Mbone shut down years ago. Wikipedia[citation needed] says it's still used inside some LANs to do hotel televisions and stuff, but it's not routed through the wider internet anymore, and there really isn't a modern userbase for it, just the occasional LAN-local legacy install.

Instead we got MP3 and MP4 compression which shrinks data to 1/10 of its original size but means a single dropped packet is fatal. (As you can see with HDTV broadcasts "smearing" when the signal is marginal; and that's with a lot of effort put into implementing recovery!)

But JCI wants multicast because the old one they're replacing did multicast and they want to sell the Linux image as a strict upgrade to the WinCE image on the same hardware, without a single dropped feature. And long long ago their salesbeings pushed multicast as a Cool Thing We Can Do. So I wound up reading the RFC and writing a new one in C.

P.S. Although there isn't a Linux SNTP server, there _is_ a Linux SNTP client. It's one of the binaries the ISC source tarball _can_ build, but generally doesn't. I'm trying to convince buildroot to enable it. I suspect this was last tested by an actual human a decade ago, but we'll see...

January 23, 2019

Added multicast support to the sntp stuff. Should probably not name the multicast enabling function leeloo_dallas() but I've had enough sleep deprivation lately that's the sort of name I'm using. (Look, my brain takes the word "multicast", sticks a fifth elephant reference on the front and sings the whole thing to camptown races (doo dah, doo dah). When I'm tired enough this sort of thing leaks out into the outside world.)

All the config is on the command line: if you "sntp" it queries the server, prints the time, and how off the current clock is. Adding -s sets it, -a sets it via adjtime().

I initially had it so you could list as many servers as you liked on the command line and it would iterate through them, but if it switches between ipv4 and ipv6 I'd have to reopen the socket and I dowanna.

January 20, 2019

Ok, I need record-commands from Aboriginal Linux (which is built around wrappy.c), and rather than just dumping them into scripts/ I want to break that up into make/ and tests/harness...

Except that directory also has bloatcheck and showasm (halfway between build and testing), and which generates documentation (is that build?) and I have a todo item to split up into a script that generates the headers and a script that builds the .c files. I think all the second half of is using from the first half is the do_loudly() function (which turns a command's output into a single dot unless V=1 is set)...

January 19, 2019

Working on sntp, and FreeBSD build/testing.

January 18, 2019

Darn it, poking at mkroot and I updated toybox to current git and swapped in "test" with the newly promoted toybox version, and the Linux kernel build is breaking on all architectures. And it's a funky one too, even on a -j1 build it goes:

  LD      vmlinux
  SORTEX  vmlinux
make: *** [vmlinux] Error 2

That provides no information about what went WRONG! Thank you make.

Which means I need to dig up my old command line wrapper from Aboriginal Linux; I should probably stick it in the toybox scripts/ directory, except that's getting pretty crowded with build and test infrastructure. (I provide make wrappers as a gui and "make help" lists the options but DEPENDING on make is uncomfortable, it would be nice if running stuff directly was easy to not just do, but figure out at a glance...)

I should split scripts/ up somehow. I can move the make stuff into a make/ subdirectory, but then scripts/ isn't all the scripts so shouldn't be called that. The problem is "tests" is a bunch of *.test files, one per command, and I'd like to keep that accessible and clean. It's already got a tests/files directory under it that's a bit awkward, but manageable. I could put tests/harness under there with the infrastructure part, but then running it would be tests/harness/ which is awkward. I could put "harness" at the top level but then it's much less obvious what the name means. Hmmm... tests/commands/sed.test? A top level tests directory with _three_ things under it?

Maybe I should add symlinks to the top level, ./ and ./ pointing into the appropriate subdirectory where the infrastructure lives...

Sigh. Naming things, cache invalidation, and off by one errors remain the two biggest problems in computer science.

January 17, 2019

Human reaction time is measured in milliseconds, plural. A 60fps frame rate is a frame every 17 milliseconds. Computer reaction times are measured in nanoseconds. A 1ghz processor is advancing its clock once per nanosecond.

Those are pretty much the reason to use those two time resolutions: nanoseconds is overkill for humans, and even in computers jitter dominates at that level. (DDR4 CAS latency's like 15 nanoseconds, an sh4 syscall has an ~8k instruction round trip last I checked, even small interrupts can flush cache lines...) Meanwhile milliseconds aren't enough for "make" to reliably distinguish which of two files is newer when you call "touch" twice in a row on initramfs with modern hardware.

64 bits worth of milliseconds is 584 million years, so a signed 64 bit time_t in milliseconds "just works" for over 250 million years. Rich Felker complained that multiplying or dividing by 1000 is an expensive operation (doesn't boil down to a binary power of 2 shift), but you've already got to divide by 60, 60, and 24 to get minutes, hours, and days...
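The divide chain in question looks something like this sketch (struct and function names invented for the example, and it assumes a non-negative count): one extra divide by 1000 on top of the 60, 60, 24 you were doing anyway.

```c
// Broken-out pieces of a millisecond count.
struct ms_parts {
  long long days;
  int hours, minutes, seconds, millis;
};

// Peel off each unit with a remainder, then divide down to the next one.
struct ms_parts split_millis(long long ms)
{
  struct ms_parts p;

  p.millis = ms%1000;
  ms /= 1000;
  p.seconds = ms%60;
  ms /= 60;
  p.minutes = ms%60;
  ms /= 60;
  p.hours = ms%24;
  ms /= 24;
  p.days = ms;

  return p;
}
```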

Using nanoseconds for everything is not a good idea. A 32 bit number only holds 4.2 seconds of nanoseconds (or + or - 2.1 seconds if signed), so switching time_t to a 64 bit number of nanoseconds would only about quadruple its range. (1<<31 seconds is just over 68 years, 1970+68 = 2038 when signed 32 bit time_t overflows. January 19 at 3:14 am, and 7 seconds.)
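The range numbers are easy to sanity check. A small sketch, with made-up function names and a 365.25 day year as the assumption:

```c
#define SECONDS_PER_YEAR (365.25*24*60*60)

// Years covered by an unsigned 64 bit count of milliseconds.
double ms64_years(void)
{
  return 18446744073709551616.0/1000/SECONDS_PER_YEAR;
}

// Years covered by an unsigned 64 bit count of nanoseconds.
double ns64_years(void)
{
  return 18446744073709551616.0/1e9/SECONDS_PER_YEAR;
}

// Seconds covered by an unsigned 32 bit count of nanoseconds.
double ns32_seconds(void)
{
  return 4294967296.0/1e9;
}

// Years a signed 32 bit count of seconds can run forward from its epoch.
double sec31_years(void)
{
  return 2147483648.0/SECONDS_PER_YEAR;
}
```

So milliseconds in 64 bits gets you hundreds of millions of years, while nanoseconds in 64 bits gets you a few centuries.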

Splitting time_t into a structure with separate "seconds" and "nanoseconds" fields is fiddly on two levels: keeping two fields in sync (check nanoseconds, then check seconds, then check nanoseconds again to see if it overflowed between the two and you're off by a second), _and_ the fact that you still need 64 bits to store seconds but nanoseconds never even uses the top 2 bits of a 32 bit field, but having the seconds and nanoseconds fields be two different types is really ugly, but guaranteed wasting of 4 bytes that _can't_ be used is silly, but if you don't a 12 byte structure's probably going to be padded anyway...
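A sketch of the bookkeeping the two-field version drags in, next to the one-liner that collapses it to milliseconds. It uses the standard struct timespec; the helper names are made up for this example and aren't the toybox functions.

```c
#include <time.h>

// Collapse a seconds+nanoseconds pair into one signed millisecond count.
long long timespec_to_millis(struct timespec ts)
{
  return ts.tv_sec*1000LL + ts.tv_nsec/1000000;
}

// Add (possibly negative) nanoseconds, carrying into the seconds field so
// tv_nsec always ends up in the range 0 to 999999999.
struct timespec timespec_add_nanos(struct timespec ts, long nanos)
{
  ts.tv_sec += nanos/1000000000;
  ts.tv_nsec += nanos%1000000000;
  if (ts.tv_nsec >= 1000000000) {
    ts.tv_sec++;
    ts.tv_nsec -= 1000000000;
  } else if (ts.tv_nsec < 0) {
    ts.tv_sec--;
    ts.tv_nsec += 1000000000;
  }

  return ts;
}
```

Every bit of arithmetic on the split form needs that carry step; with a single 64 bit count it's just addition.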

And computers can't accurately measure nanoseconds: A clock crystal that only lost a second every 5 years would be off by an average of over 6 nanoseconds per second, and that's _insanely_ accurate. Crystal oscillator accuracy is typically measured in parts per million, each of which is a thousand nanoseconds. A cheap 20ppm crystal is off by around a minute per month, which is fine for driving electronics. (The skew is less noticeable when the clock is 37khz, and does indeed produce that many pulses per second, and that's the common case: most crystals don't naturally physically vibrate millions of times per second, let alone billions. So to get the fast rates you multiply the clock up (double it and double it again), which means the 37000.4 clock pulses per second becomes multiple wrong clock pulses at the higher rate.)
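The drift figures work out like so. A quick sketch with invented helper names, assuming a 365.25 day year and a 30 day month:

```c
// Nanoseconds of error per second for a clock that gains or loses
// seconds_off seconds over the given number of years.
double drift_ns_per_second(double seconds_off, double years)
{
  return seconds_off/(years*365.25*24*60*60)*1e9;
}

// Seconds of error a crystal rated at ppm parts per million accumulates
// over the given number of days.
double ppm_drift_seconds(double ppm, double days)
{
  return ppm/1e6*days*24*60*60;
}
```

One second in five years is drift_ns_per_second(1, 5), a bit over 6, and ppm_drift_seconds(20, 30) comes to a little under a minute.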

The easy way to double a clock signal is with a phase locked loop, a circuit with a capacitor and a transistor in a feedback loop that switches from "charging" to "discharging" and back when the charge goes over/under a threshold, so it naturally swings back and forth periodically (which is trivial to convert to a square wave of high/low output as it switches between charging and discharging modes). The speed it cycles at is naturally adjustable: more input current makes it cycle faster because the capacitor's charging faster, less current makes it cycle slower. If you feed in a reference input (add an existing wave to the input current charging the capacitor so it gets slightly stronger/weaker), it'll still switch back and forth more or less constantly, but the loop's output gradually syncs up with the input as long as it's in range, which smooths out a jittery input clock and gives it nice sharp edges.

Or the extra input signal to the PLL can just be quick pulses, to give the swing a periodic push, and it'll sync up its upswing with that too. So to double a clock signal, make an edge detector circuit that generates a pulse on _both_ the rising and falling edges of the input signal, and feed that into a phase locked loop. The result is a signal switching twice as fast, because it's got a rising edge on _each_ edge of the old input signal, and then a falling edge halfway in between each of those. Chain a few doublers in sequence and you can get it as fast as your transistors can switch. (And then divide it back down with "count 3 edges then pulse" adder-style logic.)

But this also magnifies timing errors. Your 37khz clock that's actually producing 37000.4 edges per second becomes multiple wrong nanosecond clock ticks per second. (You're still only off by the same fraction of a percent, but it's a fraction of a percent of a lot more clock pulses.) Clock skew is ubiquitous: no two clocks EVER agree, it's just a question of how much they differ by, and they basically have _tides_. You're ok if everything's driven by the same clock, but crossing "clock domains" (area where a different clock's driving stuff) they slide past each other and produce moire patterns and such.
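To put a number on the magnification, here's a sketch of the arithmetic. The function names are made up, and the 15 doublings in the usage note is my own assumption for getting a 37khz crystal up past a gigahertz, not a figure from any particular board.

```c
// Clock rate after running hz through a chain of clock doublers.
double multiplied_hz(double hz, int doublings)
{
  while (doublings-- > 0) hz *= 2;

  return hz;
}

// Extra (or missing) ticks per second once both the nominal and the actual
// crystal rate have been multiplied up by the same chain.
double tick_error_per_second(double nominal_hz, double actual_hz,
  int doublings)
{
  return multiplied_hz(actual_hz, doublings)
    - multiplied_hz(nominal_hz, doublings);
}
```

With 15 doublings a nominal 37000 becomes about 1.2 billion ticks per second, and tick_error_per_second(37000, 37000.4, 15) says the 0.4 extra pulses have turned into roughly 13 thousand extra ticks every second. Same fraction of a percent, a lot more ticks.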

Eventually, you'll sample the same bit twice or miss one. This is why every I/O device has clock skew detection and correction (generally by detecting the rising/falling edge of signals and measuring where to expect the next one from those edges). Of course you have to sample the signal much faster than you expect transitions in order to find the transitions, but as long as the signal transitions often enough it lets you keep in sync. And yes this is why everything has "framing" so you're never sending an endless stream of zeroes and lose track of how MANY zeroes have gone by: you are periodically _guaranteed_ a transition.

Clock drift isn't even constant: when we were working to get nanosecond accurate timestamps for our synchrophasors at SEI, our boards' thermally stabilized reference clock (a part we special-ordered from Germany, with the crystal in a metal box sitting on top of a little electric heater, to which we'd added half an inch of styrofoam insulation to keep the temperature as constant as possible and then put THAT in a case) would skew over 2 nanoseconds per second (for a couple minutes) if somebody across the room opened the door and generated an _imperceptible_ breeze. (We had a phase-locked loop constantly calculating the drift from GPS time and correcting. And GPS time is stable because the atomic clocks in the satellites are regularly updated from more accurate atomic clocks on the ground.) In the past few years miniature atomic clocks have made it to market (based on laser cooling, first demonstrated in 2001), but they're $1500 each, 17 cubic centimeters, and use 125 milliwatts of power (thousands of times the power draw of the CMOS clock in a PC; not something you run off a coin cell battery for 5 years).

Sigh. Working on this timing SNTP stuff, I really miss working on the GPS timing stuff. SNTP should have just been milliseconds, it's good enough for what it tries to do. In toybox I have a millitime() function and use it for most times. (Yes another one of my sleep deprivation names. "It's millitime()". And struct reg* shoe; in grep.c is a discworld reference. I renamed struct fields *strawberry in ps.c already though.)

Rich Felker objected that storing everything in milliseconds would mean a division by 1000 to get seconds, and that's expensive. In 2019, that's considered expensive. Right...
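For scale, here's the whole cost being argued about, sketched in Python. This millitime() is only a stand-in with the same name as the toybox C function, not its implementation, and split_ms() is a made-up helper:

```python
import time


def millitime():
    """Monotonic clock reading in whole milliseconds (stand-in sketch)."""
    return time.monotonic_ns() // 1_000_000


def split_ms(ms):
    """Split a millisecond count into (seconds, leftover milliseconds)."""
    return divmod(ms, 1000)


start = millitime()
seconds, millis = split_ms(1_547_510_400_123)
print(seconds, millis)
```

One integer divide per conversion, and only at the point where something actually wants seconds.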

January 16, 2019

Sigh. No Rich, that's not how my relationship with Android works. I cannot "badger Android until they fix this nonsense".

I have limited traction and finite political capital. Leading them with a trail of breadcrumbs works best, which means I do work they might find useful and wait (often years) for them to start using it. And I can explain _why_ I want to go in a certain direction, and what I hope to achieve, and make as compelling an argument for that vision as I can.

But often, they've already made historical technical decisions that then become load-bearing for third party code, and you can't move the rug because somebody's standing on it. And their response is more or less "that might have been a nice way to go way back when, but we're over here now".

I'm trying to clean out the rest of the BSD code so that they're solidly using toybox, and making it so they can use as much of "defconfig" as possible. If the delta between android's deployment and toybox defconfig is minimized, then adding stuff to defconfig is most likely to add it to android. (This maximizes my traction/leverage. But it's _always_ gonna be finite, because they're way bigger than me.)

This means work on grep (--color), mkfs.vfat, and build stuff. The macos (and now FreeBSD) build genericization helps, as does the android hermetic build stuff. (Getting them closer to being able to use my build infrastructure, although they haven't got make and don't like arbitrary code running in their build.)

It's a bit like domesticating a feral cat. Offer food. Then offer food in the utility room. Except instead of a feral cat, one of the biggest companies in the world has a large team of full-time employees that's been doing this for 20 years now (The "Android One" came out in what, 2007?) which is constantly engaging with multiple large teams of phone vendor developers, collectively representing a many-multi-billion dollar industry that's on such a vastly different scale they can't even _see_ me.

I can't even afford to work full time on this stuff. I'm doing what I can. You wanna post your concerns on the toybox list, go for it.

January 15, 2019

Sigh, $DAYJOB needs sntp, so let's do that for toybox...

Reading RFC 4330 (well a half-dozen RFCs, this has had a lot of versions and the new ones have added useless crap that's more complexity than help). Oh great, this protocol doesn't have a Y2038 problem, it has a Y2036 problem. They have a 64 bit timestamp: the bottom 32 bits of which is fraction of a second (meaning they devote 2 bits to recording FRACTIONS OF A NANOSECOND), leaving them 32 bits for seconds... starting from January 1 1900. For a protocol designed in the 1980's. So they ate 2/3 of the space before the protocol was _designed_. That's just stupid.

Anyway, the common workaround is if the high bit's _not_ set then it wrapped, which buys another 60 years or so. Still utterly insane to design the protocol that way.
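A minimal sketch of that workaround in Python, assuming the two 32-bit halves have already been pulled out of the packet as unsigned integers. The function and constant names are mine; the 2208988800 offset is the number of seconds between January 1 1900 and January 1 1970:

```python
from datetime import datetime, timezone

NTP_TO_UNIX = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01
ERA = 1 << 32                # the 32-bit seconds field wraps after this many
FRACTION_NS = 1e9 / ERA      # nanoseconds per unit of the fraction field


def ntp_to_unix(seconds, fraction):
    """Convert the two 32-bit halves of an NTP timestamp to Unix seconds.

    Applies the workaround described above: a clear high bit in the
    seconds field is taken to mean the counter already wrapped in 2036.
    """
    if not seconds & 0x80000000:
        seconds += ERA
    return seconds - NTP_TO_UNIX + fraction / ERA


wrap = datetime.fromtimestamp(ntp_to_unix(0, 0), timezone.utc)
print("seconds field wraps at", wrap.isoformat())
print(f"fraction resolution: {FRACTION_NS:.4f} ns")
```

With this reading the usable window runs from January 1968 (high bit set, lowest value) to early 2104 (high bit clear, highest value), at the cost of not being able to represent anything before 1968.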

January 14, 2019

Exhausted. Not sure I slept at all last night, just lay awake in bed. Is it possible to get jetlag without changing time zones?

Back at work: spent most of the day going through a month of missed email. They assigned a number of issues to me.

Back in my apartment, the manager was happy to see me and had a desk and a bed in storage, and says he'll replace the gas stove with electric (yay!). They should really put some solar panels on this building. (They don't just go on the roof, you can put them down the sides of tall buildings too, you don't even have to worry about sweeping the snow off of those.)

Poking at patch.c because I got reminded of todo items. Trying to add fuzz factor, which was easy enough (and my design for it's better) but... there's no tests/patch.test, and I don't seem to have patches that _require_ fuzz factor lying around.

I _used_ to just throw new commands through Aboriginal Linux and the LFS build, which was applying lots of patches. I suppose I could dig through the repo there and find where I adjusted them to eliminate fuzz factor. (Because even though I ported toybox patch to busybox over a decade ago, they still haven't added fuzz support to it.) There's a lotta that going around, where things I was planning to do ages ago still aren't done in various projects, and it ranges from crickets to insistence that status quo is perfect and we've always been at war with eastasia. (People declared busybox "done" at the 1.0 release, which was before the majority of my contributions and long before you could use it in a build environment.) "Thing didn't happen therefore shouldn't happen" is a failure of imagination. As Howard Aiken said long ago, you don't need to worry about people stealing your ideas. Heck, I've been trying to get people to steal my ideas for a very long time, in a Tom Sawyer "paint the fence" way so I don't have to do it myself.

January 13, 2019

Flight back to Milwaukee. Sigh. Conflicted, but... this is the path of least resistance, and I know I can do it. (Neither Google nor the phone vendors will pay me to do Toybox or the android self-hosting stuff, nobody's interested in mkroot (hardly anybody was interested in aboriginal even after I got it building LFS), and I can't afford to just do open source all the time. Gotta pay the mortgage.) (I should really try to at least pay off that home equity loan this time.)

Got a hotel. It's $130/night, that's more per week than my old efficiency apartment here cost in a month. I should try to get that back in the morning. (They hadn't rented it out last I heard, and it's paid through the end of the month since I have to keep paying for it until they rent it out or 60 days goes by.)

I wrote up a thing about how patches work, because somebody on the list asked. I should collect and index those somehow, I suppose...

January 12, 2019

I committed a fix:

> Which is the "mode" of the symlink, except that mode says the filetype _is_ a
> symlink and you can't O_CREAT one of them so it's gonna get _really_ confused...
> Try now? (I added a test.)

Except that's inelegant (race condition between dirtree population and this stat, filesystem can change out from under us?) and we're _supposed_ to feed dirtree the right flags so the initial stat() is following or not following the symlink appropriately. Why is it not doing that in this case... Hmmm...

January 11, 2019

Broke down and told chrome _not_ to restore state, just let it forget all those todo items. So now I have one window with only a dozen or so open tabs, which can restart itself without wasting half an hour fighting with it every time I open my laptop. I give it a week.

I should really pack my suitcase...

January 10, 2019

The battery on my laptop no longer holds ANY charge. Unplug it and it switches off instantly. Serious crimp in my "wander out somewhere and program for a bit at a quiet table" workflow. Even when I go somewhere with an outlet (which I now feel guilty about because I'm costing the place money, even if it's only a few cents), it loses all context going there and going back. Complete reboot each time.

And convincing chrome NOT to reload 8 windows with 200 tabs each in them (maintain the todo item links but leave the tabs in "failed to load" state rather than trying to allocate 30 gigabytes of RAM and max out my phone tether for 2 hours) is a huge pain. Doing "pkill -f renderer" USED to work but now SOMETIMES works, sometimes causes tabs to hang (still display fine but I can't scroll DOWN and it won't load new contents in that tab, but I can cut and paste the URL to a new tab that WILL load it so the URL is retained which is all I really wanted), and sometimes randomly crashes the whole browser process. Even pointing /etc/resolv.conf at while chrome starts up to force the resolve to fail no longer prevents the reloads, these days it just _delays_ its load; it tries to reload periodically and once it can reloads everything.

They keep "upgrading" chrome to make it a worse fit for my needs, and of course I can't stick with old versions because "security". (You can sing "cloud rot" to the tune of Love Shack.)

January 9, 2019

Looming return to Milwaukee, starting to get paralyzed. Fade flies out tomorrow, although essentially it's tonight so early in the morning (she and Adverb are visiting family in California before heading back to Minneapolis for the spring semester, both her sisters live there and I think more of her family is flying in for a reunion?)

I should get a plane ticket, but the TSA and air traffic controllers miss their first paycheck on Friday. Bit reluctant to fly with air traffic controllers considered "nonessential"... (Bit reluctant to _eat_ with FDA inspection considered nonessential.)

January 8, 2019

Visited the eye doctor for my 6 month follow-up. Not obviously going blind! Yay!

Eyes dilated, not a lot of programming today.

January 7, 2019

Wandering back to an open tab in which I have:

$ truncate -s $((512*68)) test.img && mkfs.vfat test.img && dd if=/dev/zero of=test.img seek=$((0x3e)) bs=1 count=448 conv=notrunc && hexdump -C test.img

Which at the _time_ was the smallest filesystem mkdosfs would create. (The dd blanks some stuff that varies gratuitously between runs so I can diff two of them and see what changed when I resize the filesystem.)
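The blank-then-diff idea, sketched in Python on in-memory images instead of files. The offset and count are the ones from the dd invocation; the helper names are made up for illustration, and this says nothing about what mkfs.vfat actually puts in that region:

```python
BLANK_START = 0x3e   # seek=$((0x3e)) in the command above
BLANK_COUNT = 448    # count=448


def blank_region(image, start=BLANK_START, count=BLANK_COUNT):
    """Return a copy of `image` with `count` bytes from `start` zeroed."""
    out = bytearray(image)
    out[start:start + count] = bytes(count)
    return bytes(out)


def differing_offsets(a, b):
    """Offsets at which two equal-length images differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Two fake 68-sector images that differ inside and outside the blanked area.
first = bytes([0xff]) * (512 * 68)
second = bytearray(first)
second[0x40] = 0x12    # inside the blanked range: should vanish from the diff
second[0x200] = 0x34   # outside it: should still show up
print(differing_offsets(blank_region(first), blank_region(bytes(second))))
```

The zeroed range ends at byte 509, so bytes 510 and 511 of the first sector are left alone along with everything after them.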

But now I'm running a newer dosfstools version and it's saying that 512*100 is the smallest viable filesystem. And THAT is clearly arbitrary. Sigh, I should look up the kernel code for this and see what the actual driver says.

January 6, 2019

Rebuilt mkroot with linux-4.20 (after rebuilding the musl-cross-make toolchains with current musl). The s390x kernel wants sha256sum now.

Sigh. Throw another binary in the PENDING list of the airlock install in toybox/scripts/ (It's in the roadmap.)

January 5, 2019

Attempting to install devuan on the giant new laptop, because the ubuntu they stuck on it has systemd and it's possible I'd use a BSD first. Devuan is basically a debian fork retaining the original init system and with a really stupid over-engineered nigh-unmaintainable mirror overlay system written in python. (I have no idea why they did that last part, and hope it's merely a transitional problem.)

The System76 bios is "black screen with no output" until their ubuntu boots, which is kinda annoying. I guessed "reboot several times and hit escape and alt-f2 and so on a lot during said blackness" and eventually got a bios screen that let me boot from a USB stick.

Devuan's installer is really _sad_ compared to Ubuntu. What Ubuntu did was boot to a live CD, then run a gui app. That's basically copying the cutting edge knoppix technology from 2003 (which is 15 years ago now), and they've been doing it since... 2004 I think?

Devuan started with a menu of multiple install options (I have no clue here and cannot make an informed decision, STOP ASKING ME FOR INFORMATION I DO NOT HAVE YET), but all of them seem to go to a fullscreen installer with a font that's way too small for comfort, and no way to change it. Ok, soldiering on: it's freaking out that I used unetbootin to create the USB boot stick, promising a plague of locusts and possibly frogs if I continue. But it doesn't say how I SHOULD have created it, and it seems to be working fine, so I ignored it and continued.

It's refusing to provide binary firmware for the wireless card (iwlwifi-8265) because Freedom Freedom Blue Facepaint Mel Gibson. If a manufacturer was too cheap to put a ROM in their hardware and they expect the driver to load the equivalent data into SRAM, debian sits down in the mud and sulks. Great.

I think I've found where to get the firmware from debian, but "devuan ascii" isn't clearly mirroring any specific debian distro? (The previous ones were, the newest one... isn't.) The instructions say to put it in a "/firmware" directory on the USB stick, which seems separate from _booting_ from the USB stick... All the devuan ascii docs say that all necessary firmware is bundled. Hmmm...

Ok, downloading the 4+ gigabyte "DVD" version of the devuan installer (for a complete offline install) to make a new USB stick from, and I should try to fish the firmware files out of the system76 ubuntu install before wiping it. (There's a certain amount of "should I use the 2 gb hard drive or the 1gb flash drive" for this install, I left the flash disk in because it's already there and I don't ever intend to use systemd ubuntu.)

This has already eaten all the time I allocated to poke at this.

January 3, 2019

Three days of rain and I've gotten nothing done. Barely left the house. I'm not recovered enough from seasonal affective disorder yet for the gloom outside not to put me in hibernation mode.

I was ok moving up to Milwaukee in January from Austin, that was a discontiguous break and my internal clock did not adjust. But staying in Milwaukee for 3 months while the days got shorter, _that_ screwed me up.

Partly it's that the sun coming up reliably knocks me out, because college. The last couple years at Rutgers were primarily night courses due to governor Witless destroying the comp-sci program with stupid budget cuts so they lost _all_ their full-time faculty (including the head of the department; if you're denied tenure you _can't_stay_ past 5 years and they blanket denied tenure to everybody, and comp-sci had only peeled off of the physics department to become its own thing 4 years before the budget cuts...). This was the #2 most popular major on campus after "undecided" and everything had to be taught by adjuncts after their day jobs, and now you _couldn't_ complete it without lots of night classes. So I'd get home long after sunset and do more programming, then the sun would come up and I'd go "oh, didn't realize the time" and go to bed. (Which was fine if I didn't have to catch a bus to go back to class until 3pm or so.)

Now the sun coming up knocks me out. Being awake at night is fine... until the sun comes up. When my alarm's set at 6:30 am and the sun comes up over an hour later, getting up in the morning is a _problem_. And that sort of anchors the rest of it...

January 2, 2019

Did a little research for the multicast doc in the ipv4 cleanup stuff.

Multicast failed to take off because improved compression schemes (like mp3 and mp4) greatly reduced storage and bandwidth requirements of media while rendering partial delivery of data useless, and due to the widespread deployment of broadband internet via cable modem and DSL. The decline of multicast started in 1999 when Napster provided a proof of concept that distributing MP3 files via unicast could scale. RealAudio quickly lost market share to unicast media delivery solutions. These days Youtube, Netflix, Hulu, and Amazon Prime all use unicast distribution.

The decline started 20 years ago and the multicast mbone (which this address range was reserved for) essentially ceased operations about 15 years ago. The last signs of life I can find are from about 2003.

Multicast was never widely used, the range was allocated for growth that did not occur, and remaining users are treating it as a LAN protocol which could use any other LAN-local address range their routers were programmed to accept. Note also that LAN-local multicast was conserving bandwidth on 10baseT local area networks, and we have widely deployed cheap gigabit ethernet now (with 10gigE available for those who want to spend money).

Reserving 268 million IPv4 addresses for multicast, in 2019, is obviously a complete waste. We can put them back in the main pool.
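For anyone checking the 268 million figure: the IPv4 multicast block is 224.0.0.0/4, and a /4 leaves 28 host bits. A quick sketch using Python's standard ipaddress module (the variable names are just for illustration):

```python
import ipaddress

multicast = ipaddress.ip_network("224.0.0.0/4")
whole_space = ipaddress.ip_network("0.0.0.0/0")

count = multicast.num_addresses
share = count / whole_space.num_addresses

print(f"{count:,} addresses, {share:.2%} of the IPv4 space")
print("range:", multicast[0], "to", multicast[-1])
```

That's one sixteenth of every possible IPv4 address.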

Back to 2018