Rob's Blog



March 31, 2019

The L records the gnu/dammit tar outputs for long filenames have the permissions and user/group names filled out. They're not needed (they're in the next header and those are the ones that get used), but they're filled out. Meanwhile fields like timestamp are zeroed. There's no obvious pattern to it, I think it's an implementation detail (sequence packets are initialized?) leaking through into the file format.

No, it's worse. The owner/group is always "root" and the permissions are 644. So the field could be zeroed but it's instead nonsense. As with the " " after the checksum, just gotta match the nonsense to get binary equivalent tarballs.


March 30, 2019

I'm writing tar tests, trying to do a proper thorough job of testing tar (which the previous tests didn't really), and I did "tar c --mtime @0 /dev/null | tar xv", which should more or less be ls -l on a char device, but:

--- expected
+++ actual
@@ -1 +1 @@
-crw-rw-rw- root/root 1,3 1970-01-01 00:00 dev/null
+crw-rw-rw- root/root 0 1970-01-01 00:00:00 dev/null

It's showing size, not major, minor. (This is the gnu/dammit one.) I want TEST_HOST to pass, but they're showing useless info here. "Be compatible" is fighting with "do it right". Hmmm...
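For reference, the "do it right" side of that argument is small. Here's a minimal sketch of choosing what goes in that column, assuming the listing code already has the header's typeflag, devmajor/devminor and size parsed into integers. The name `format_size_field` is made up for illustration, it isn't from toybox or any other tar.

```c
#include <stdio.h>

// Hypothetical helper: fill buf with the size column of a "tar tv" line.
// In a ustar header, typeflag '3' is a character device and '4' is a block
// device. Those get "major,minor" like ls -l, everything else gets the size.
int format_size_field(char *buf, size_t len, char typeflag,
                      unsigned devmajor, unsigned devminor, long long size)
{
  if (typeflag == '3' || typeflag == '4')
    return snprintf(buf, len, "%u,%u", devmajor, devminor);

  return snprintf(buf, len, "%lld", size);
}
```

With typeflag '3' and device numbers 1 and 3 this produces the "1,3" from the expected line above. Matching TEST_HOST would mean always taking the second branch.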

What does posix say? Hmmm. The last posix spec for tar was 1997, before they removed it (just like cpio, the basis for rpm and initramfs; Posix went off the rails and I'm waiting for Jorg Schilling to die before trying to correct anything). And that says:

The filename may be followed by additional information, such as the size of the file in the archive or file system, in an unspecified format. When used with the t function letter, v writes to standard output more information about the archive entries than just the name.

Great, EXPLICITLY unspecified. Thanks Posix! You're a _special_ kind of useless.


March 28, 2019

Ok, the Embedded Linux Conference and Open Source Summit are colocated in San Diego in August (the Linux Foundation does this to dilute the importance of conferences, it's about like how Marvel had endless crossovers to force you to buy more issues back in the 90's right before they went bankrupt). The CFP closes April 2. I should submit a thing.

Topics. Ummm. I could do an updated 3 waves thing (lots of good links for that, credentials, AO3 is fan run and thus better at what it does, more on that, credentials vs accomplishment, and so on.) I could do a talk on 0BSD, on mkroot, on toybox closing in on 1.0...


March 27, 2019

So, tar paths...

$ tar c tartest/../tartest/hello | hd
tar: Removing leading `tartest/../' from member names
00000000  74 61 72 74 65 73 74 2f  68 65 6c 6c 6f 00 00 00  |tartest/hello...|

It's matching .. sections (the code I'm replacing was just looking at _leading_ ../ which isn't good enough).

$ tar c tartest/../../toy3/tartest/hello | hd
tar: Removing leading `tartest/../../' from member names
00000000  74 6f 79 33 2f 74 61 72  74 65 73 74 2f 68 65 6c  |toy3/tartest/hel|
00000010  6c 6f 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |lo..............|

And the gnu/dammit code is stupid.

$ tar c tartest/sub/../hello | hd
tar: Removing leading `tartest/sub/../' from member names
00000000  68 65 6c 6c 6f 00 00 00  00 00 00 00 00 00 00 00  |hello...........|

_really_ stupid.

Of course figuring out what/how to canonicalize is weird too, because I don't have abspath code that stops when it matches a directory, and there's no guarantee it would anyway rather than jumping right over it. I want the _relative_ path to be right.

Sigh. Compatibility, do what the existing one's doing...
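Going by the three outputs above, "do what the existing one's doing" seems to reduce to: drop everything up to and including the last ".." component. A minimal sketch of that, with a made-up function name (`strip_dotdot_prefix`). It's inferred from those outputs, not taken from the gnu code, and also dropping leading slashes is an extra assumption:

```c
#include <string.h>

// Hypothetical helper: return a pointer into path just past the last ".."
// component, then past any leading slashes. No canonicalization is done,
// which is how "tartest/sub/../hello" collapses to plain "hello".
const char *strip_dotdot_prefix(const char *path)
{
  const char *keep = path, *s = path;

  while (*s) {
    // Find the end of the current path component.
    const char *end = s + strcspn(s, "/");
    // Step over the slash(es) to the start of the next component.
    const char *next = end + strspn(end, "/");

    // Only a whole component equal to ".." counts, "..b" does not.
    if (end - s == 2 && s[0] == '.' && s[1] == '.') keep = next;
    s = next;
  }

  // Archive member names are relative, so drop leading slashes too.
  return keep + strspn(keep, "/");
}
```

Fed the three paths from the examples it returns "tartest/hello", "toy3/tartest/hello" and "hello" respectively.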


March 25, 2019

Got a heads up from Elliott that auto-merges of external projects into the Android Q branch end on April 3, feature freeze in run up to the release. So if I want to get tar promoted and in, I've got until then.


March 24, 2019

Once again trying to work out if old = getenv("X"); setenv("X", "blah", 1); setenv("X", old, 1); is allowed. Because old is a pointer into the environment space, and setenv replaces that environment variable. Under what circumstances do I need a strdup() in there?

I dug into this way back in 2006 but don't remember the details...


March 18, 2019

Tar cleanup corner case: the gnu/dammit tar fills out the checksum field weird, I kinda dowanna do that but the resulting tarballs won't be binary identical if I _don't_...

Backstory: tar header fields are fixed length records with left-justified ascii contents, padded with NUL bytes. The numerical ones are octal strings (because PDP-7 used a 6 bit byte, we say the machine Ken and Dennis wrote Unix on had 18k of ram but that was 1024 18-bit words of memory).

The "checksum" field is just the sum of all the bytes in the header, and is calculated as if the checksum field itself is memset with space characters. (Then you write the checksum into the field after you've calculated it.) The checksum has 7 digits reserved (plus a NUL) but due to all the NUL bytes in the header, the checksum is almost always 6 digits. So it _should_ have 2 NUL bytes after it... but it doesn't. It has a NUL and a space, ala:

00000090  31 34 31 00 30 31 32 32  36 36 00 20 30 00 00 00  |141.012266. 0...|

The _reason_ for this is historical implementations would memset the field, iterate over the values, and then sprintf() into the field which would add a NUL terminator but not overwrite the last space in the field. And the gnu/dammit tar is either _doing_ that, or emulating it.

I'm not memsetting spaces into the cksum field, I'm starting with 8*' ' and skipping those 8 bytes... but the result is I'm printing out two NUL bytes at the end instead of NUL space. And if you check for binary identical files...
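Here's a minimal sketch of the historical approach, which produces the NUL-space ending as a side effect. It assumes the standard ustar layout (512 byte header, 8 byte checksum field at offset 148). The function name `tar_checksum` is made up, and this isn't the toybox code.

```c
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK 512
#define CKSUM_OFF 148  // offset of the checksum field in a ustar header
#define CKSUM_LEN 8

// Hypothetical helper: compute the header checksum and write it into the
// field the old-fashioned way. Returns the checksum.
unsigned tar_checksum(unsigned char *hdr)
{
  unsigned sum = 0;
  int i;

  // The sum is taken as if the checksum field held 8 spaces.
  memset(hdr + CKSUM_OFF, ' ', CKSUM_LEN);
  for (i = 0; i < TAR_BLOCK; i++) sum += hdr[i];

  // 6 octal digits plus the terminating NUL is 7 bytes, so the 8th byte
  // keeps its space. The largest possible sum is 512*255 = 0377000, which
  // always fits in 6 octal digits.
  snprintf((char *)hdr + CKSUM_OFF, 7, "%06o", sum);

  return sum;
}
```

An all-zero header sums to 8*32 = 256, so the field comes out as "000400", a NUL, and a space, the same shape as the hexdump above.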

It's _almost_ certain no tar program out there is going to care about this, but if I don't and I use canned tarballs in my tests, TEST_HOST would always fail with the gnu/dammit implementation. (Or possibly busybox, I haven't looked at what that's doing yet.)


March 16, 2019

Oh FSM. I feel I should do a response to LWN's motivations and pitfalls for new "open source" licenses article, but you could just watch my 3 minute rant on there being no such thing as "the GPL" anymore, copyleft fragmentation inevitably increasing as a result, and the need for a reclaimed public domain via public domain equivalent licenses that don't have the "stuttering problem".

Of course there's no mention of 0BSD or similar, they haven't noticed it yet. A lot of people haven't worked this sea change through to a logical conclusion yet, they're still trying to make a better buggy whip because their old one stopped serving their needs. Fighting the last war...


March 15, 2019

That side gig is hanging over me. I want to do the thing for them, it's not hard, but I'm huddling under an "out of service" sign.


March 13, 2019

At Fade's. Well, currently at the McDonald's down the street from Fade's.

Tar has an interesting corner case in autodetecting file type: if it's a seekable file you can read the first tar header block (512 bytes) and if it doesn't start with "ustar" (unix standard tar, posix-2001 and up so an 18 year old format we can pretty much assume at this point, albeit with extensions) then check for compression signatures for gzip and bzip...
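A minimal sketch of that sniffing step, with made-up names (`sniff_block`, `enum archive_kind`). One detail: in the header layout the "ustar" magic sits at offset 257 rather than at byte 0, so that's where this looks. The gzip (1f 8b) and bzip2 ("BZh") signatures are at the very start of the data.

```c
#include <string.h>

enum archive_kind { KIND_TAR, KIND_GZIP, KIND_BZIP2, KIND_UNKNOWN };

// Hypothetical helper: classify the first block read from an archive.
enum archive_kind sniff_block(const unsigned char *blk, size_t len)
{
  // ustar magic lives at offset 257 of the 512 byte header.
  if (len >= 262 && !memcmp(blk + 257, "ustar", 5)) return KIND_TAR;

  // gzip streams start with the two bytes 0x1f 0x8b.
  if (len >= 2 && blk[0] == 0x1f && blk[1] == 0x8b) return KIND_GZIP;

  // bzip2 streams start with "BZh".
  if (len >= 3 && !memcmp(blk, "BZh", 3)) return KIND_BZIP2;

  return KIND_UNKNOWN;
}
```

Other compression formats (xz and friends) would slot in the same way, they're just left out of the sketch.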

At which point, if it's _seekable_ you seek back to the beginning, fork, and pass the filehandle off to gzip or similar. I just redid xpopen() so it can inherit a filehandle from the host namespace as its stdin/stdout. (It can still do the pipe thing: feed it -1 and it'll create a pipe, but feed it an existing filehandle and it'll move it to stdin/stdout of the new process; I should probably have it close it in the parent space too but haven't yet because when you pass along stdin/stdout _those_ shouldn't get closed and is that the only case?)

But if it's _not_ seekable, I have 512 bytes of data I need to feed into the new process, and there's no elegant way to do that. I kind of have to fork another instance of tar with the appropriate -zjJ flag and then have _this_ one spin in a loop forwarding it data through a pipe(2).

Which is awkward, but doable...


March 12, 2019

Packed out of apartment, onna bus to Fade's.

Hey, ubuntu found a new way to fail. Doesn't suspend because kworker/u16 (Workqueue: kmemstick memstick_check [memstick]) failed to suspend for more than 120 seconds, and so the suspend was aborted _after_ I'd closed the lid and put it in my laptop bag, so instead it got VERY HOT.

Bravo, ubuntu. Yes of course the correct thing to do if the memory stick probe hangs for 2 minutes is to MELT THE HARDWARE. Linux: smell the usability!


March 11, 2019

First day where I would be working if I hadn't quit the job. Sitting in the apartment poking at computer stuff. I had a long todo list I haven't done any of yet. Luckily, over the years I've learned that "not doing stuff" is an important part of the process. I need cycle time. Rest, recovery, sleep, staring out windows. I gave up a lot of money to be able to afford _not_ to do stuff today, and am enjoying it.

That said, I should at the very least drop off the "moving out of the apartment" form, and maybe take my bike back to the bike co-op I got it from and go "here, free bike". (It's a vintage Schwinn, it's lovely. Someone will want it as much as I did. Alas, can't easily take it out of state with me.)

Somebody tried to sign up to the https://landley.net/aboriginal/about.html mailing list, and I forwarded them to mkroot, but as I told them in the email... "I mostly talk about it on the toybox mailing list. And patreon. And my twitter. And my blog..." (It had a mailing list but I stopped using it after a thing happened. I have a vague roadmap to merge it into toybox and stop doing it as a standalone project, but need to implement route and toysh in toybox first.)


March 10, 2019

And thunderbird filled up all memory, wasn't watching, didn't kill it fast enough, and it locked the machine hard. Had to power cycle. Wheee.

Lost 8 desktops full of open windows, most of which had many tabs. Rebuilding much state. The most irreproducible loss is, of course, all the thunderbird windows where I clicked "reply" to pop open a window to deal with later. Thunderbird keeps no record of that whatsoever. (Kmail would reopen them all when restarted, but alas that was bundled into a giant desktop suite and went down with the ship it was tied to the mast of. Pity, it was a much better mail client than thunderbird. Oh well.)

Once upon a time, Linux had an OOM killer that would kill misbehaving processes if the system was in danger of running out of memory and locking up. People complained that their process might get killed. So the kernel devs neutered the OOM killer so it doesn't work remotely reliably and now the whole system locks up as often as it's saved by the OOM killer, because killing _every_ process is clearly an improvement to killing _a_ process.

Sigh. Lateral progress.


March 9, 2019

Thunderbird's sluggish again so I tried to clean out the linux-kernel folder. Since this is the big machine with 16 gigs of ram and 8 gigs swap, I told it to move 96k messages instead of the usual 20k at a time. It moved all the messages, and then did its Gratuitous Memory Hog thing it always does at the end (because Thunderbird is programmed strangely). It ate all 16 gigs DRAM, worked its way through all 8 gigs swap, and then I called killall thunderbird from the ctrl-alt-F1 console before the machine could hard lock (because the OOM killer doesn't work anymore, no idea why).

And of course when I started it back up, none of the messages it had spent hours copying to the new folder had been deleted from the old one.

Could somebody not crazy write an email client? This doesn't seem hard. Far and away the _most_ annoying thing about thunderbird is when it pops up a pulldown menu or hovertext, and then freezes for 6 minutes doing something where the CPU or disk is pegged, and the darn pop-up follows me when I switch desktops, blocking whatever's behind it.

So now I tried right click delete... and it's moving 96k messages to a trash folder. Sigh. NO, DELETE THEM! NOT MOVE TO TRASH! NOW WHEN THIS CRASHES I'M GOING TO WIND UP WITH _THREE_ COPIES!

It's a good thing this machine has gigabytes of free disk space because this email client is written by idiots. And once you start one of these operations that's going to take 4 hours (and then maybe try to crash the OS again afterwards if you're not babysitting it), there's no way to interrupt it short of kill -9 which would leave the files in who knows what state...


March 8, 2019

Last day at JCI. Stress level: curled into a ball, whimpering.

Sigh. I'd really like to move the Android guys to a more conventional build approach, where the Android NDK toolchain is not just a generic-ish toolchain but is the one used by AOSP, so that 1) you can export CROSS_COMPILE=/path/to/toolchain/prefix- and if your build is cross compile aware it just works, 2) Android isn't shipping 2 slightly different toolchains that do the same thing.

They are reluctant to do this because A) windows, B) they see me trying to apply conventional embedded-ish development to android as weird. (Everybody except them is an app developer. This isn't how you build apps!)

Sigh. I keep going "this reduces to this, just implement the general case and it should work in a lot more situations" and getting "but that's not how we've ever thought of it, you'll confuse people". I get different variants of it from the linux kernel guys, the distro maintainers, embedded developers, the android guys, compiler developers... everybody's in their own niche.


March 7, 2019

I've been doing a review pass of pending/tar.c and adding a bunch of "TODO: reading random stack before start of array" and so on, and I've come to the conclusion I need to change the xpopen_both() api. Because if the child process needs its stdin or stdout hooked up to an existing filehandle, there's no current way to do that.

The way it works now is you pass in an int[2] array and it hooks up a pipe to each one that's zero, and writes the other end of the pipe into that slot (int[0] going to the stdin of the process and int[1] coming from the stdout of the process). But what I _want_ is if I feed an existing filehandle to the process, THAT filehandle should become the stdin or stdout of the process. (So gzip can read from or write to a tarball.)

Also, once upon a time I had strlcpy() which was like strncpy but would reliably add a null terminator and didn't do the stupid (memset the rest of the range after we copied). It was just something like "int i; if (!len--) return; i = strlen(src)+1; memcpy(dst, src, i>len ? len : i); dst[len] = 0;" and it worked fine. But unfortunately BSD had the same idea, and added it to libc in a conflicting way (const const const str const *const) and I think uClibc picked that up, so I switched to xstrncpy() which will error_exit() if the string doesn't fit. Which 99% of the time is what you want: don't silently corrupt data. BUT with tar and the user and group name fields...

Hmmm, except if they don't fit what _do_ we want? Truncating could (theoretically) collide with another name, and if the lookup by name fails we've already got UID/GID. (I did bufgetpwuid but didn't implement a negative dentry mechanism for optimizing _failed_ username lookups...)

Ah, it's using snprintf(), close enough. (I keep confusing that with strncpy, which is stupid and will memset the rest of the space with zeroes for no apparent reason. But snprintf() will just _stop_writing_ at the appropriate spot, leaving a null terminator and not gratuitously molesting the rest of the field.)
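A minimal sketch of the difference for a fixed-size field. `copy_field` is a made-up name, not the toybox function. That snprintf() leaves the bytes after the terminator alone is the behavior described above and what common libcs do in practice, so treat it as an assumption of the sketch.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: copy src into a len byte field, truncating if it
// doesn't fit, always NUL terminated. Returns nonzero if it truncated.
int copy_field(char *dst, size_t len, const char *src)
{
  // snprintf() returns the length it WANTED to write, so a result >= len
  // means the string got cut off.
  int n = snprintf(dst, len, "%s", src);

  return n < 0 || (size_t)n >= len;
}
```

Copying "ab" into an 8 byte field full of 'x' leaves "ab", a NUL, and five untouched 'x' bytes, where strncpy() would have zeroed all six bytes after the "ab". The return value lets the caller decide whether a truncated user or group name is worth keeping.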


March 6, 2019

Last week at work. Totally listless. Paralyzed, basically. I'm stress eating and stress tweeting.

Also, SEI has resurfaced with Probably Money (not yet the same as Actual Money but you never know), and I've mentioned my recruiter found me a side gig (telecommuting getting a medical sensor board upgraded to new driver versions), and I'm kind of annoyed that I quit my $DAYJOB (which paid quite well) so I would have TIME, and that time is already filling up with other less-well-paying work.

I'm totally aware this is a self-inflicted problem, but... dude. I should be better at saying no.


March 4, 2019

Dreamhost has been poking me about renewal for landley.net. Got the check in the mail today.

(I know way too much about how the sausage is made to be comfortable doing financial transactions online. I'm aware it's silly, and yet...)


March 3, 2019

Poking at toys/pending/tar.c and of course the first thing I do (after a quick scan and some "this sprintf is actually a memset" style cleanups) is build it, make an empty subdirectory, and "tar tvzf ~/linux-4.20.tar.gz". And I get a screen full of "tar: chown 0:0 'linux-4.20/arch/mips/loongson64/common/serial.c': Operation not permitted".

Sigh. This is unlikely to be a small task.


March 2, 2019

Fighting bad Linux userspace decisions.

So top -H is showing the right CPU usage for child threads, but the main thread of a process has the cumulative CPU usage. I _think_ this is because /proc/$PID/stat and /proc/$PID/task/$PID/stat have different data (I.E. the kernel is collating when you read through one API but not reading the same data through another API).

I have a test program that spawns 4 child threads and has them spin 4 billion times in a for(;;) loop, and I just poked it to dprintf(1, "pid=%d") the PID and TID values (to a filehandle so I don't have to worry about stdio flushing for FILE *), and I hit my first problem: glibc refused to wrap the gettid() system call? (What the... the man page bitches about thread IDs being an "opaque cookie" and I'm going "this is legacy crap from back when pthreads was an abomination, before NPTL, isn't it?") Sigh, so use syscall() to call gettid so I have the number I can look in /proc under.
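The syscall() workaround looks something like the sketch below (Linux only). The names `my_gettid` and `task_stat_path` are made up for illustration, and the wrapper deliberately isn't called gettid() in case the libc grows one of its own.

```c
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical wrapper: ask the kernel for the calling thread's ID directly.
pid_t my_gettid(void)
{
  return (pid_t)syscall(SYS_gettid);
}

// Hypothetical helper: build the per-thread stat path to compare against
// the process-wide /proc/PID/stat.
int task_stat_path(char *buf, size_t len, pid_t pid, pid_t tid)
{
  return snprintf(buf, len, "/proc/%d/task/%d/stat", (int)pid, (int)tid);
}
```

In the main thread the TID equals the PID, which is why /proc/$PID/stat and /proc/$PID/task/$PID/stat are the two files to diff.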

Second problem: the process doesn't _end_ until the threads finish spinning and exit, which means the output doesn't close, so my little pipeline doing:

./a.out | sed -n s/pid=//p | (read i; cat /proc/$i{,/task/$i}/stat)

Is sitting there blocked in read until a.out exits, at which point the cat says the /proc entries don't exist anymore. This is DESPITE the fact that if you chop it at the second | you get the value followed by a newline immediately! It's just that bash's read is blocking trying to get more data AFTER the newline, for reasons I don't understand? (Even read(4096) should return a _short_read_. And yes the "read i;" needs a sleep 1 after it to accumulate enough data to see the divergence reliably, but this bug hits first and that confuses debugging right now.)

This totally needs to be a test case for toysh. My "bash replacement" should get this RIGHT, even if ubuntu's bash doesn't. (I was even desperate enough to check /bin/dash, which also got it wrong in the same way. Well, ok dash didn't understand the curly bracket syntax, but it waited out ./a.out's runtime _before_ getting that wrong.)


March 1, 2019

Two different coworkers basically need the toybox version of a command to fix a problem they're having. One is that busybox's ar can't extract an ipk file, another is a busybox tar bug where if you tar -xC into a subdir that results in broken symlinks (in this case a root filesystem install from initramfs into a mount point where /etc/localtime points to a timezone file that's there in the subdir but the symlink points to the absolute path of where it would be on the final system), busybox tar does NOT chown the symlink. So the symlink belongs to root:root instead of whoever it's supposed to belong to, even though the tar file has the right info.

Alas, I haven't implemented toybox tar and ar because I've been too busy with $DAYJOB. I'm not sure if this is ironic or merely unfortunate. I'd ask Alanis Morissette, but I'm told she had problems with that too.


February 28, 2019

It's the last day of the month and I kept meaning to check if any conference calls for papers were expiring, but I just couldn't bring myself to care.

I told my boss at $DAYJOB on monday I'm too burned out to accomplish anything else, but they still haven't let me know when my last day is. They keep saying they're _not_ unhappy with my performance on the morning call, but _I_ am unhappy with my performance.

One of the big differences between my mental health in my 20's and now is I know when I need to bow out for self care. (I often miss when I _should_, but am reasonable about working out when I _need_ to.)


February 27, 2019

I'm doing board bringup for that side gig, and they just emailed me a large explanation of the hardware they need working. I unboxed the new board yesterday and confirmed the bits connect together, but haven't actually powered up the result yet.

My first goalpost on any new board is "boot to an initramfs shell prompt on serial console", at least when I'm trying to understand everything and rebuild it properly from source. Getting that working means:

1) Our compiler toolchain is generating the right output for the board, both in kernel mode and userspace/libc.

2) We know how to install code onto the board and run it. (Whether it's tftp into memory or flash it to spi or jtag or what.)

3) The bootloader is working, running itself and doing setup (DRAM init, etc), then loading and starting our kernel.

4) If we get kernel boot messages then the kernel we built is packaged correctly, has a usable physical ram mapping, and is correctly writing to the serial port.

5) If we can run our first program (usually rdinit=/bin/sh) then the kernel is enabling interrupts properly (the early_printk stuff above is simple spin-and-write-bytes with interrupts disabled, that's why printing fewer early boot messages can speed up the board booting), finding a clock to drive the scheduler, and this is where we verify the libc and executable packaging parts of the toolchain work right (because we're finally using them; often I do a statically linked rdinit=/bin/helloworld first if it's giving me trouble.)

When we're done "I built and ran a userspace program that produced output" means I should be able to build arbitrary other ones, and a toybox shell is the generic universal "do lots of stuff with the board" one, where you can mount /proc and /sys and fiddle with them, load modules, etc. That's basically where you get traction with the board.

When an existing BSP gives you a working Linux reference implementation, most of these steps are probably just isolating and copying what it's doing, but I like to step through and move all that stuff into the "I know what it's doing, or at least where to look it up if it breaks" category on any new board I have to support in a nontrivial way.

Then the next thing is usually digesting the kernel .config into a miniconfig and seeing what's there, coming up with the minimal set of options to do the shell prompt thing and cataloging the rest of them.


February 26, 2019

I'm trying to figure out if my normal response to spam callers is "punching down". I always try to hit the buttons to get through to a human, then say "You spam people for a living. That's sad." and then hang up.

The problem is, I'm doing this to the minimum wage drones in some poverty-stricken rural area who are... doing it for a living. Not the people benefitting from it and collecting 90% of the money from whatever scam it is. But alas, this is the only way I know to push back. (It's not like our current government will do anything about it, not until the GOP finishes imploding, which won't happen until the Boomers die and the fossil fuel companies lose their position as 1/6 of the planet's economy.)


February 25, 2019

Told my boss I'd like to wrap up at work. The money is _lovely_ and this is work I could do in my sleep _if_ I could do it. Unfortunately I've got a variant of writer's block, which is a bit like having a big term paper due and being unable to start because you're so stressed out.

I've been spinning my wheels here so long that I've exhausted my coping mechanisms.


February 22, 2019

How is this page's bit on toybox wrong, let me count the ways:

The Toybox license, referred to by the Open Source Initiative as the Zero Clause BSD license,[7] removes all conditions from the ISC license, leaving only an unconditional grant of rights and a warranty disclaimer.[8] It is also listed by the Software Package Data Exchange as the Zero Clause BSD license, with the identifier "0BSD."[9]

It's not important that it's from toybox, other projects use it too. It was the OpenBSD suggested template license and I got Kirk McKusick's permission to call it zero clause BSD. It doesn't remove _all_ conditions, it removes half a sentence. And SPDX approval came long before OSI, so a better phrasing would be:

The Zero Clause BSD license [7] (SPDX identifier "0BSD"[9]) removes half a sentence from the OpenBSD suggested template license [https://www.openbsd.org/policy.html], leaving only an unconditional grant of rights and a warranty disclaimer.[8]

Anybody want to edit wikipedia[citation needed] to fix this?


February 21, 2019

Still deeply burned out.

VirtualBox's .vdi files provide "sparse" block devices that grow as you use more space in them (up to the maximum size specified at creation time). The ext4 filesystem assumes any block device it's on might be flash under the covers, and attempts to wear level them via round-robin allocation.

Guess how these two interact! Go on, guess!

I set up a new VM, and because my previous one ran out of space I was generous about provisioning it, thinking it would only use the space when it actually needed it. After deleting two other VMs and a DVD iso and trying to figure out why a VM using 60 gigs in the virtual Linux system was consuming 160 gigs on the host...

I had a BAD DAY. And now I need to redo the VM from scratch because even if I could shrink the ext4 partition (the resize tool can grow them while mounted, but not shrink them), I dunno how to tell the emulator to give back the space it would stop using...

Darn it, I was excited about this, but no. The person who pointed me at it said it was a bash test suite that might help me with toysh being a bash replacement. But the readme didn't say what to _do_ to run the bash tests. I figured out that bin/bats was the thing to run, but its output with no arguments was useless and --help didn't really help either. I eventually figured out "bin/bats test" but then it only ran 44 tests and they tested the test suite, not the shell?

At which point I figured out that it's not a shell test, it's test plumbing written _in_ bash. That's useless, I've written and _published_ 2 sets of test infrastructure in bash myself already (one in toybox, one in busybox). That's uninteresting, what's interesting is the _tests_, and this has none. And it's doing the "#!/usr/bin/env bash" thing which is INSANE: why do you trust /usr/bin/env to be there at an absolute path? Posix doesn't require that. Android (until recently) didn't even have a /bin directory. It's /bin/bash even on weird systems like MacOS X. The ONLY place that installs it but puts it somewhere else is FreeBSD, and that's FreeBSD-specific breakage. It's a fixable open source system: drop a symlink and move on. (Just like we all fix /bin/sh pointing to the defective annoying shell on debian.)


February 20, 2019

Upgrades to the su command came up recently, and it's been on my todo list forever: if you want to run a command as an arbitrary UID/GID, it's kinda awkward to do so with sudo or su because both conventionally want to look up a name out of /etc/passwd, and will error out on a uid with no passwd entry even for root. But these days with things like containers, there's lots of interesting UIDs and GIDs that aren't active in /etc/passwd. (And then there's the whole android thing of not having an /etc/passwd and using their version of the Windows Registry instead, because keeping system information in human readable text files is too unixy or something....)

So anyway, I want su -u UID and su -g GID[,gid,gid...] to work, at least for root. And I want to be able to run an arbitrary command line without necessarily having to wash it through a random command shell. And _implementing_ this is fairly straightforward. No, the hard part is writing the help text to explain it, especially if I've kept compatibility with the original su behavior.
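To back up the "fairly straightforward" part, here's a minimal sketch of the numeric side: parse the -g list, then switch groups before switching UID. The names `parse_gid_list` and `become` are made up, and treating the first GID in the list as the primary group is an assumption of the sketch, not a decision about what su should actually do.

```c
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: parse "GID[,gid,gid...]" into out[]. Returns how many
// were parsed, or -1 for an empty, malformed, or too long list.
int parse_gid_list(const char *s, gid_t *out, int max)
{
  int n = 0;

  while (*s) {
    char *end;
    unsigned long val;

    // Each entry must start with a digit (no signs, spaces, or names).
    if (*s < '0' || *s > '9' || n == max) return -1;
    val = strtoul(s, &end, 10);
    out[n++] = (gid_t)val;

    if (*end == ',' && end[1]) s = end + 1;  // another entry follows
    else if (!*end) s = end;                 // clean end of list
    else return -1;                          // trailing comma or junk
  }

  return n ? n : -1;
}

// Hypothetical helper: take on the given identity. Order matters: groups
// first, UID last, because after setuid() to a non-root user the process
// no longer has permission to change its groups.
int become(uid_t uid, const gid_t *gids, int n)
{
  if (setgroups(n, gids)) return -1;
  if (setgid(gids[0])) return -1;

  return setuid(uid);
}
```

Nothing here touches /etc/passwd, which is the point: a UID or GID with no passwd entry works the same as one with.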

A word on the legacy su behavior: way back when setting a user's shell in /etc/passwd to /bin/false or /dev/null was a way of preventing anybody from running commands as that user. Then su grew -s to override which shell you were running as, so this stopped working from a security standpoint. (Besides, if you were running as root you could whip up a trivial C program to do it anyway, but the point was _su_ no longer enforced it.) And it let you specify -c to pass a command line to that shell so su could "run a command as a user" instead of being interactive, so this ability is already _there_ for most users, just awkward to use.

But su has an awkward syntax where it runs a shell and unrecognized options are passed through as options _to_the_shell_. (So the -c thing was kind of accidental at first.) So using su as sudo isn't just "su user ls -l", it's su user -s $(which ls) -l if you don't want to invoke a gratuitous shell in between. And defining new su options means they _don't_ get passed through to the shell.

What would have made sense was a syntax like xargs, where the first command that's not an option stops option parsing for the wrapper. But that's not what they did back circa 1972...


February 19, 2019

Burnout. So much burnout.

When I came to this job a year ago, I was interested in the technology. I was helping get realtime Linux running on an sh4 board. (The larger context was they shipped a Windows CE product back in the 90's, and Windows CE was being end of lifed by Microsoft. So this Microsoft shop was switching to Linux, which I'm all for and happy to help with. As for the sh4 boards, they had a bunch of this hardware installed at customer sites, and a large stock of part inventory to make more boxes with at the factory, so getting Linux running on those was useful to them.)

Coming _back_ in January was because the money was good, it was easy to just keep going, I didn't have another job lined up, and we've still got about half the home equity loan to pay off from riding down SEI.

But this time... they've already built up a reasonable Linux team (including people I know like Rich Pennington of ellcc and Julianne Haugh of the shadow password suite), all the new work is on standard x86 and arm boxes with gigahertz and gigabytes, they're using wind river's fork of yocto's fork of openembedded with systemd ru(n|i)ning everything, the application is still dot net code running on mono talking to a windows GUI app...

And I'm not entirely sure what I'm doing. Not "I don't know how to do this", I mean what am I trying to accomplish? What is this activity _for_?

I'm part of an enormous team where we have over a dozen people in a room for an hour twice a week going over excel spreadsheets reacting to comments on the design of things like "background file transfer" (strangely not rsync) which is somehow a 12 week project for over a dozen people, told "this is what you're doing this week" more or less via the waterfall method. There's an API document, an implementation of this API via gratuitous translation layer with a management daemon using dbus to talk to systemd, and then functions you plug in for a given architecture that the guy who wrote the daemon could have done in a couple hours.

I think this has turned into a "bullshit job". And I am unhappy. The money remains excellent, but... that's pretty much it.


February 18, 2019

If I titled blog posts, this one would be "Tabsplosion is a symptom of overload".

When I say "that's on the todo list", I'm fudging a bit. The toybox todo list does indeed have a todo.txt. And a todo.today. And a todo2.txt, todo3.txt, todo/*.txt, and various commandname.txt files with notes on individual commands.

My toybox work directory (for a couple years now) is ~/toybox/toy3, following my convention of doing a git checkout in a directory with the name of the project, so various debris that doesn't get checked into git has someplace to live. This _starts_ as ~/toybox/toybox and there's a ~/toybox/clean for testing that I've committed sane chunks and it builds properly. Eventually so much half-finished cruft builds up in my work directory I clone a clean one and do some "needs to happen fast" project in there, and keep the old one around in hopes of salvaging the old work. (Which, as with viewing bookmarked pages again, never happens. This is why I have so many open tabs, there's a _chance_ I'll get back to the todo item it represents.)

This is how I wound up with toy3. (And in fact a toy4 and toy5 that didn't stick.) Those other directories have their own todo files in them. (Much of which overlaps, but not all.)

And then there's ~/toybox/pending which is full of things like a checkout of Android's minijail, libtommath, jeff's fixed point library from the GPS stuff we did, my old dvpn code (from like 2001), the rubber docker containers tutorial I attended at linuxconf.au, a CC0 reference implementation of sha3, snapshots of this and this in case the pages go down, and so on. The todo item is implicit in a lot of those. :)

I also have old tweet threads and blog entries and such that I should collate at some point. A lot of my todo items point to those.

As for the topic sentence, my todo list grows fastest when I don't have time to follow the tangents right now. So I make a note of it and move on.


February 17, 2019

The bus back from Minneapolis left at 9:25pm, and was supposed to get in at 3:30 am but got in at 4am.

I'm still using the giant System76 laptop from 2008, which is 6 years old but has 16 gigs of ram and 8x processor and a terabyte hard drive and is fairly reasonable now that I've gotten a new battery for it, except for 2 things. It's still fairly ginormous, and the hard drive is rotating media so I'm nervous using it in a high-vibration environment. Such as on my lap on a bus for 6 hours, even when there is a working outlet.

A coworker at Johnson Controls (Julianne Haugh, the long-ago author of the Shadow password suite) has a "laptop" that's a tablet with a case and keyboard. Except it's a mac. I want an Android device that does that (and in theory I can add a 128 gig sd card to however much built in storage the sucker has so I should be able to get something reasonable), but every time I actually buy something it's a cheap clearance device like the annual Amazon Fire tablet sales during "prime day", and they're so locked down that it's just not worth the effort to crack them. This is a structural problem: what I'm trying to do with toybox is turn android into a usable general-purpose computing environment you can actually use as a development workstation more or less out of the box, but they're terrified of the "evil butler" problem. (Which isn't _just_ a tablet problem, EFI TPM nonsense does this for PCs, there are periodic LWN articles on that.) You should be able to aftermarket blank 'em, but how do you distinguish that from "an organized crime organization like the NSA or GOP sent a dude into your room for 30 minutes while you're at dinner and now your device serves them not you until they decide to assassinate you"?

Sadly, I haven't installed devuan on the other System76 oversized monstrosity because firmware nonsense and too busy to care. I got email from System76 that they've introduced a laptop to their lineup that _isn't_ visible from space, but I don't trust them. If buying System76 _doesn't_ mean I can just slap an arbitrary Linux distro on it because it's full of magic firmware that never went upstream, what's the _point_? If I have to install a magic distro-specific Linux distro fork, I might as well get a GPD Pocket or something.


February 16, 2019

Hanging out with Fade in Minneapolis. I have deployed heart-shaped butterfingers at her. (It's her favorite candy bar, and there was a sale.)

Yay, the github pull request adding 0BSD to the license chooser got merged!

This means I have developed just enough social skills to disagree with someone about how to help without pissing them off to the point they no longer want to help! (Although it's still a close thing, I wouldn't say I'm _good_ at this. I'm still far too easily irritated and have to really _push_ to compromise. In this case that would mean swallowing my principles and editing a wikipedia page directly.)


February 15, 2019

There are over 100 toybox forks on github. I did not expect that. Hmmm... The most forked of which just added a logo and half an "rdate" command, back in 2016...

The downside of 0BSD licensing is when you find a nice patch in an external repo that wasn't submitted upstream (or if it was, I missed it), I'm nervous about merging it because forks of toybox are not actually required to be under the same license.

In this case the repo it's checked into still has the same LICENSE file and no notes to the contrary, and I can probably rely on that, but I'm still nervous and like to ask. Submissions to the list mean they want it in, which means it has to be under the right license to go in. The submission _is_ the permission grant, the specific wording is secondary.

The intent of the GPL was to force you to police code re-use: if you accidentally sucked GPL code into your project, you had to GPL your project. (In reality you just as often had to remove it again and delete the offending version, as Linux did with the old-time unix allocation function Intel contributed to the Itanic architecture directory back during the SCO trial. Solving infringement via a product recall and pulping a print run has plenty of precedent.)

Then GPLv3 happened and "the GPL" split into incompatible versions, and suddenly you had to police your contributions just as hard, your GPLv2 or later project couldn't accept code from GPLv3 or GPLv2-only sources, and the easy thing to do was break GPLv2-only. These days there's no such thing as "The GPL" anymore, thanks to the FSF. "The GPL" fragmented into three main incompatible GPL camps (GPLv2 and GPLv3 can't take code from each other, and the dual license of "GPLv2 or later" can't take code from either one), and then there's endless forks like Affero GPL complicating it further. This means there is no longer a "universal receiver" license covering a united pool of all copyleft code into a single common community of reusability, which is why copyleft use has slowly declined ever since GPLv3 came out. These days with GPL code you have to police in both directions, incoming _and_ outgoing code.

0BSD goes the other way from the glory days of "The GPL": you have to be careful about accepting contributions (and I'm more paranoid than most about that, having been involved in more copyright enforcement suits than any sane person would want). But what that buys you is the freedom for anyone wanting to reuse your code elsewhere to just do it, whenever, wherever, and however they like. No forms to fill out, no signs to post, have fun. They don't even have to tell me if they did it. (The internet is very good at detecting plagiarism, I'm not worried about that.)

A fully permissive license holding nothing back is the modern equivalent of placing the code into the public domain. The Berne convention grants a copyright on all newly created works whether you want it to or not (the notice is just for tracking purposes of _who_ has the copyright, so you're not in the "the original netcat was written by 'hobbit', how do I get in touch with 'hobbit' or their estate?" situation), but there's no enabling legislation for disposing of a copyright. You can't STOP owning a copyright, except by transferring it to someone else.

And thus the need for public domain equivalent licensing. You can't free(copyright) but you can work out a solution.


February 14, 2019

Date is funky. The gnu/dammit date didn't implement posix, and busybox gets it wrong. Time zones changing names because of daylight savings time.

Testing day of the week. Found a hack. Coded it up. Went to test it.

$ ./date -D %j -d 1
Sun Jan  0 00:00:00 CST 1900
landley@halfbrick:~/toybox/toy3$ busybox date -D %j -d 1
Thu Feb 14 00:00:00 CST 2019

Sigh.

The C API for this is kinda screwed up too, although we need a new one that handles nanoseconds anyway.


February 13, 2019

The biggest sign that "const" is useless in C is that string constants have been rodata forever, but their _type_ isn't because that would be far too intrusive.

Putting "const" on local variables or function arguments doesn't affect code generation (which has liveness analysis). It can move globals from the "data" segment to the "rodata" segment, which is nice, and which the compiler can't work out on its own without whole-tree LTO because the use crosses .o boundaries, but everywhere else it just creates endless busywork propagating a useless annotation down through multiple function calls without ever affecting the generated code.

I periodically recheck on new generations of compiler to see if it's _started_ to make a difference, but I don't see how it can because liveness analysis already has to happen for register allocation/saving/restoring, and that covers it better than manual annotation can? In this respect "const" seems like "register" or non-static "inline", ala "Ask not for whom ma bell tolls: let the machine get it".

Sadly, even though I do add "const" to various toybox arrays to move them into rodata, the actual toy_list[] isn't const because sticking "const" on it wants to propagate down into every user through every function argument (otherwise it's warning city and in fact errors out about invalid application of sizeof() to incomplete types when all I did was add "const" in two places).
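A minimal sketch of the one case where "const" does change the output, plus the propagation it drags along. The struct and the names (demo_cmd, demo_list, demo_find) are made up for illustration; this is not toybox's real toy_list[] or its lookup code.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical command table, not toybox's real toy_list[].
struct demo_cmd {
  const char *name;
  int flags;
};

// "const" on an object with static storage lets the toolchain place the
// array in .rodata instead of .data (check with "size" or "objdump -h").
static const struct demo_cmd demo_list[] = {
  {"date", 1}, {"sed", 2}, {"tar", 3},
};

// The annotation then has to propagate: anything that hands out a
// pointer into demo_list must say const too, or the compiler complains
// about discarding the qualifier.
const struct demo_cmd *demo_find(const char *name)
{
  size_t i;

  for (i = 0; i < sizeof(demo_list)/sizeof(*demo_list); i++)
    if (!strcmp(demo_list[i].name, name)) return demo_list+i;

  return NULL;
}
```

Every caller of demo_find() now needs a const pointer to hold the result, and so does every function that pointer gets passed to, which is the busywork being complained about.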


February 12, 2019

Phone interview with the side gig, I'd get to poke at a new architecture (we are the knights who say nios) which qemu has a thing for! But no musl support for it, and Linux support is out of tree? Really? (A whole unmerged architecture that people are still using?) It's frustrating there's no easy way to get qemu-system-blah to tell you what it provisions a board emulation with. (How much memory, I/O controllers, disks, network, USB...)

It would be nice if "qemu-system-nios -M fruitbasket --whatisit" could say these things. The board has to _know_ them, somehow. Maybe through the device tree infrastructure? I might try to teach it, but all my previous qemu patches languished unmerged for years. Not worth the effort.


February 8, 2019

Very very tired. Went off caffeine monday but it's 4 days later and still tired. Burned out, half days yesterday and today.

I turned down a job in Minnesota a recruiter offered me. 20% less money isn't a deal breaker, but... they're not on the green or blue lines? It's an hour and a half each way to Fade's via public transit (green line, bus, then walk) so I'd need to get an apartment near the work site to avoid a longish commute from the university (and Fade), and they're in some sort of suburban industrial park where there are family houses but no efficiency apartments? And this employer moves to seattle in june anyway.

Contracting company at the recruiter I got the JCI job through wants me to skype with somebody for evening and weekend jobs. It would pay off the home equity loan faster...


February 6, 2019

I'm trying to build Yocto in a fresh debootstrap. You'd think this would be documented, but it's a bit like the "distros only build under earlier versions of itself" problem, because Yocto is a corporate pointy-haired project and Red Hat is Pointy Hair Linux.

As a first pass I want to run a yocto instance under qemu, but when I downloaded it yocto wanted me to install a bunch of packages like "makeinfo" that I don't want on my host system. Hence debootstrap chroot.

So install debootstrap (I used apt-get on ubuntu), then the wiki instructions say the setup is:

debootstrap stable "$PWD/dirname" http://deb.debian.org/debian/

Where "stable" is the release name, next argument is the directory to populate, and the third is the repository URL to fetch all the packages and manifest data from.

So clone yocto (git clone git://git.yoctoproject.org/poky), checkout the right branch (current stable appears to be "thud"), and then "source oe-init-build-env" and...

# as root inside the chroot, with /proc, /sys, and /run mounted:
apt-get install locales &&
echo en_US.UTF-8 UTF-8 >> /etc/locale.gen &&
locale-gen &&
update-locale LANG=en_US.UTF-8 &&
su - user
# then, in the shell su starts for that user:
cd /home/poky &&
source oe-init-build-env &&
LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 bitbake core-image-minimal

What on earth is a uninative binary shim? All I can find is this and it's at best "related". It's downloading a binary it has to run on the system, and can't build from source. So much for building yocto on powerpc or sh4 or something. Thanks yocto!

Python 3 refuses to work right if you haven't got a UTF8 locale enabled, and yocto's bitbake scripts explicitly check for this and fail... but don't say how to fix it. So I read the python docs and downloaded the python 3 source code. Python's getfilesystemencoding() is calling locale.nl_langinfo(CODESET) (at least on unix systems), which comes from langinfo_constants[] in _localemodule.c in the Python3 source...

Right, you have to install the "locales" package, then run locale-gen, but the online examples showing how you can feed it a locale on the command line are wrong (including the one in the "Setting up your chroot with debootstrap" section of the ubuntu wiki), it ignores the command line, you have to edit the locale.gen file to add the locale you want, then you need to update-locale to get it to use it, and THEN you can set the LC_ALL environment variable.
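The same nl_langinfo(CODESET) query is easy to poke at directly from C, which is a quick way to see whether a locale actually got generated. A minimal sketch; demo_codeset() is a made-up helper name, and the only library calls are the standard setlocale() and nl_langinfo().

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

// Hypothetical helper: return the codeset that a locale name resolves
// to, or NULL if the C library can't load that locale (for example
// because locale-gen never built it). The result lives in a static
// buffer, and the process locale is reset to "C" afterwards.
const char *demo_codeset(const char *locale_name)
{
  static char buf[64];

  if (!setlocale(LC_ALL, locale_name)) return NULL;
  snprintf(buf, sizeof(buf), "%s", nl_langinfo(CODESET));
  setlocale(LC_ALL, "C");

  return buf;
}
```

Calling demo_codeset("en_US.UTF-8") in the chroot before and after the locale.gen edit should show the difference: NULL while the locale is missing, a codeset name once it exists.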

Darn, yocto's parallelism ignores "taskset 1 cmdline...". It's building on an 8x SMP machine so it's trying to do 8 parallel package downloads through phone tethering, and the downloads keep timing out and aborting. Hmmm... Google, google... It's bitbake controlling this, I can set the environment variable BB_NUMBER_THREADS to the number of parallel tasks.

Ok, core-image-minimal is currently building gnome-desktop-testing and libxml2. I object to 2 of the 3 words in this target name. I'll give them "image". Yeah, I accept that this is probably an image. But gnome-desktop-testing is neither core, nor minimal.


February 5, 2019

Doing release cleanup on sntp.c I hit the fact that android NDK doesn't have adjtime(). Grrr. I dowanna add a compile-time probe for this, and unfortunately while I have USE_TOYBOX_ON_ANDROID() macros to chop out "a" from the optstr, I never did the SKIP_TOYBOX_ON_ANDROID() macros (only include contents if this is NOT set) because I haven't needed them before now.

Sigh, I can just #define adjtime to 0 in lib/portability.h. It's a hack, but android isn't using this anyway (they presumably set time from the phone baseband stuff via the cell tower clocks, not via NTP). It doesn't make the whole code stanza drop out like making FLAG(a) be zero would (then the if(0) triggers dead code elimination), but... I wanna get a release out already, it was supposed to happen on the 31st.
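For reference, a sketch of what a USE/SKIP macro pair looks like and why a compile-time zero makes the stanza vanish. Everything here (CFG_DEMO_ON_ANDROID, the DEMO macros, demo_adjust) is a hypothetical stand-in, not toybox's generated config or its actual macros.

```c
// Hypothetical config switch; a real build would generate this rather
// than hardwire it.
#define CFG_DEMO_ON_ANDROID 1

#if CFG_DEMO_ON_ANDROID
#define USE_DEMO_ON_ANDROID(...) __VA_ARGS__
#define SKIP_DEMO_ON_ANDROID(...)
#else
#define USE_DEMO_ON_ANDROID(...)
#define SKIP_DEMO_ON_ANDROID(...) __VA_ARGS__
#endif

// Adjacent string literals concatenate, so the "a" option only exists
// when the SKIP macro lets its argument through.
static const char demo_optstr[] = "s" SKIP_DEMO_ON_ANDROID("a") "m";

const char *demo_get_optstr(void)
{
  return demo_optstr;
}

// Stand-in for the adjtime() call so the sketch builds everywhere.
static int demo_adjust(void)
{
  return 1;
}

int demo_set_time(int want_adjust)
{
  // CFG_DEMO_ON_ANDROID is a compile-time constant, so when it is 1
  // this condition is if (0) and the optimizer can drop the branch,
  // including the reference to demo_adjust().
  if (!CFG_DEMO_ON_ANDROID && want_adjust) return demo_adjust();

  return 0;
}
```

Defining the function itself to 0 only neuters the call; gating on the config constant removes the option from the option string and the code behind it in one go.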


February 4, 2019

Ok, toybox release seriousness. What do I need to finish up to cut a release...

SNTP is the main new command and I've already used the "Time is an illusion, lunchtime doubly so" Hitchhiker's Guide quote. Oh well.

I've got an outstanding todo item from the Google guys about netcat, but it's a bug I found so I haven't quite been prioritizing it. (As in nobody else reported this bug to me, so it's not holding anybody else up.) Still, I got the ping (once they know about it, they wanted it fixed)...


February 3, 2019

Greyhound topped itself on the bus ride back to Milwaukee. Of course it left most of an hour late, when we got on it hadn't been cleaned (my seat's drink holder had an empty coke bottle in it), for the first time in my experience they checked photo IDs (and the woman behind me couldn't get on the bus because she hadn't brought hers, bus left without her), somehow 2 stops later every single seat was full even though they'd left a through-to-chicago passenger behind at the first stop, the outlets didn't work for the first 2 hours, and the heat was stuck on full the entire time and was somewhere over 80 degrees. (Eventually they opened the emergency exits on top of the bus and left them open so we wouldn't die, but it was never comfortable.) Around 6pm the bus tracker web page decided that before the next stop our bus would travel back in time to 1pm and continue on to retroactively reach chicago around 3:30 pm (going something like 200 miles per hour along that part of the route), and we were kind of looking forward to it by that point but alas, we were disappointed. Then they switched drivers in Madison, and the new driver started heading south straight to Chicago and had to BACK UP to go to Milwaukee when enough people checking Google Maps noticed and yelled at them. Over the intercom the driver claimed to have "missed an exit", and threatened to pull over and let anybody who complained out on the side of the road (we were in Janesville at that point, 40 miles south along I-90), and then drove back north (reconnecting with I-94 at Johnson's Creek) instead of taking I-43 diagonally to our destination. According to phone speedometer apps, on the trip north (along non-interstate roads) the bus sometimes got up to 55 miles per hour, but averaged less than that.

Still, I arrived in Milwaukee only 2 hours late. Not my worst greyhound trip, but still memorable. (Beats the trip _to_ minneapolis where the driver intentionally triggered feedback on the intercom six times and said "wakey wakey" between each one as we got in around 1:30 am.) I'm told Greyhound was an oil company ploy to discredit travel by bus and encourage individual driving instead. Given that the "buy up the busses and destroy them to promote freeways" plot in "Who Framed Roger Rabbit" is the part of the movie based on real events (in our world, they did it and won)...

There's also a significant element of "punishing people for being poor" going on here. I'm taking the bus not just because it's cheaper, but because the shortage of direct flights from milwaukee to minneapolis gives me a lot fewer departure options, and even with a direct flight the "arrive 2 hours early at an airport many miles south of town" plus the minneapolis airport requiring multiple transfers to get to Fade's apartment via public transportation (meanwhile greyhound is right on the Green Line, which lets off about 500 feet from Fade's apartment)... end result is the bus gets me there about as fast as flying, and if I'm lucky I can work the whole way. The bus terminal's a 15 minute walk from work without having to opt out of the Porno-Scanners for the Freedom Grope.

But there's very very strong signaling "this is for the Poors, you shouldn't be here if you have any other choice, we punish you now"... ("We" being "republicans", which is a "we" I personally am very much NOT a part of even when I'm not hanging out with the tired poor huddled masses yearning to breathe free that they despise so much.)


February 1, 2019

Our story so far: I got the record-commands plumbing checked into toybox and hooked up to mkroot, and along the way I found and fixed a sed bug that was preventing commands from building standalone with toybox in the $PATH. (The regex to figure out which toys/*/*.c file this command lives in was returning empty, because -r wasn't triggering.)

So I fixed that, got the record-commands wrapper hooked up, built everything, and... all the targets built? Except I just fixed _sed_ and I knew the kernel build break was a _grep_ bug because replacing the airlock's grep symlink with a link to the host's grep made the build work! (I often do "what commands changed recently" guesses like that before trying to narrow it down systematically...)

Sigh. I pulled linux-git to a newer version so I'm not quite testing the same kernel source, or was it 4.19 or 4.20 I was testing? I hate when things start working again when I DIDN'T FIX THEM, it just means I lost a test case and whatever loose flakiness it revealed is still there but has gone back into hiding. It's possible switching grep versions changed something that got fed into sed, but that's still a bug: the output should be the same.

Darn it, now I've got to waste time figuring out how to break it again the right way.


January 31, 2019

Bus to Minneapolis so I can spend my birthday tomorrow with Fade.

I emailed Linus about arch-sh not booting, he pointed me at a pending fix that hadn't quite made it into mainline yet, and I confirmed it fixed it for me, but oddly lkml.iu.edu has both my emails but not Linus's in between?

Yesterday's toybox build break wasn't a grep bug, it was a sed bug, which broke toybox building anything with toybox in the $PATH. (The regex to figure out which toys/*/*.c file this command lives in was returning empty, because -r wasn't triggering.) Apparently I haven't got a tests/sed.test that checks "does -r do anything".
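To see why a regex quietly matches nothing when the extended-syntax switch doesn't take effect, here's a small sketch using the POSIX regcomp() API. The pattern and the demo_match() helper are invented for illustration; this isn't the actual build script regex or toybox's sed code.

```c
#include <regex.h>
#include <stddef.h>

// Hypothetical helper: returns 1 if pattern matches text, 0 if not,
// -1 if the pattern fails to compile. "extended" selects extended
// regex syntax (REG_EXTENDED), the library-level equivalent of what
// sed -r asks for.
int demo_match(const char *pattern, const char *text, int extended)
{
  regex_t re;
  int rc;

  if (regcomp(&re, pattern, REG_NOSUB | (extended ? REG_EXTENDED : 0)))
    return -1;
  rc = !regexec(&re, text, 0, NULL, 0);
  regfree(&re);

  return rc;
}
```

With extended set, "toys/(posix|lsb)/sed" treats the parentheses and the bar as grouping and alternation, so it matches "toys/posix/sed". Without it they're ordinary characters, the pattern only matches that literal string, and a script relying on the match gets an empty result instead of an error.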


January 30, 2019

It's -20F out. The expected high is -7. I got permission to work from home today. (Mostly poking at yocto and going "huh".)

There's some sort of bug in grep that's breaking the kernel build, but I haven't reduced it to a test case yet, and what I used to use for this sort of thing in aboriginal linux was my old command line logging wrapper. So I spent most of a day getting the command line wrapper logging merged into toybox and integrated into mkroot, and... the toybox build is broken by the same grep bug, which means the logging wrapper install won't work in the context of the airlock (I.E. I can't build toybox with toybox in the $PATH, due to the bug I'm trying to _diagnose_).

Going back to bed.


January 28, 2019

It's too cold. And we have 8 inches of snow. My normal 20 minute walk to work (12 if I hurry) took 35 minutes today, including helping push a stuck car out of an intersection (along with a snowplow driver who got out to push on the other side).

When I got in only two coworkers I recognized were here. I'd go home early, but I'm already here and outside is the problem.


January 26, 2019

Busy week at work, wasn't sleeping well. Meant to spend today working on toybox release, but spent it recovering instead.

The big overdue thing at work is "timesync", which is where the SNTP stuff comes in. Back in late October we tried to figure out how the box keeps its clock up to date: it was close enough to just doing standard NTP that people had glossed it over as NTP... but not quite.

First of all, it's using SNTP ("Simple Network Time Protocol"), which is a subset of the NTP protocol (same 48 byte UDP packets with fields in the same place) that oddly enough has its own set of RFCs, and then in NTPv4 it all got bundled into one big SNTP+NTP RFC that's more or less illegible. So I went back to the earlier ones and am pretty much just implementing the old stuff and asking wikipedia[citation needed] whether it's safe to ignore whatever they changed.

An SNTP client can read data from an NTP server (it just doesn't care about several of the fields), but an NTP client can't read from an SNTP server (the fields SNTP doesn't care about are zeroed), and windows "NTP servers" tend to be SNTP. So if you use the Linux NTP client with a windows server, it doesn't work. (That took a while to figure out, and started us down this whole tangent.)
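A minimal sketch of the packet side of this, assuming the standard NTP layout: 48 bytes, first byte packing leap indicator, version, and mode, and the transmit timestamp's seconds field at bytes 40-43 counting from 1900. The function names are made up, and this is not the code that went into toybox.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_NTP_PACKET_LEN 48

static unsigned char demo_pkt[DEMO_NTP_PACKET_LEN];

// Build a client request in a static buffer: all zero except the first
// byte, which packs leap indicator 0, the version number, and mode 3
// (client). The zeroed fields are the ones a full NTP client cares
// about and an SNTP client doesn't.
const unsigned char *demo_sntp_request(int version)
{
  memset(demo_pkt, 0, sizeof(demo_pkt));
  demo_pkt[0] = (unsigned char)(((version & 7) << 3) | 3);

  return demo_pkt;
}

// Read a 32 bit big endian field, such as the seconds half of the
// transmit timestamp at offset 40.
uint32_t demo_peek_be32(const unsigned char *pkt, int offset)
{
  return ((uint32_t)pkt[offset] << 24) | ((uint32_t)pkt[offset+1] << 16)
    | ((uint32_t)pkt[offset+2] << 8) | pkt[offset+3];
}

// NTP counts seconds from 1900, unix time from 1970: 2208988800 seconds
// apart. This ignores the NTP era rollover in 2036.
int64_t demo_ntp_to_unix(uint32_t ntp_seconds)
{
  return (int64_t)ntp_seconds - 2208988800LL;
}
```

A real client would send the request over UDP port 123 and feed demo_peek_be32(reply, 40) through demo_ntp_to_unix() to get a time it can compare against the local clock.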

The box needs to be able to act as an sntp client (sntp not ntp because some existing installs use the windows server), and it needs to be able to act as an ntp server (possibly sntp would be good enough because the downstream boxes are also running our software, but nobody seems to have _written_ an sntp server for Linux, because full NTP server works for SNTP client). And then it's got multicast.

Multicast? Yeah, there's a multicast variant in the sntp RFC, and JCI implemented it in old stuff (back in the 90's), but it's not working for some reason and it's .NET code which is a language I don't know (which isn't entirely a blocker but does slow me down) and which I haven't got a build environment for (which is the real blocker). And the ISC reference implementation in C doesn't appear to do multicast (because it's not 1996 anymore).

Note: Napster pretty much killed off Multicast starting around 1999. No podcasts use multicast. Youtube, Netflix, Hulu, and Amazon Prime do not use multicast. The original use case for multicast was "all that" and when it arrived it didn't, which means there isn't really a use case out there for it. The Mbone shut down years ago. Wikipedia[citation needed] says it's still used inside some LANs to do hotel televisions and stuff, but it's not routed through the wider internet anymore, and there really isn't a modern userbase for it, just the occasional LAN-local legacy install.

Instead we got MP3 and MP4 compression which shrinks data to 1/10 of its original size but means a single dropped packet is fatal. (As you can see with HDTV broadcasts "smearing" when the signal is marginal; and that's with a lot of effort put into implementing recovery!)

But JCI wants multicast because the old one they're replacing did multicast and they want to sell the Linux image as a strict upgrade to the WinCE image on the same hardware, without a single dropped feature. And long long ago their salesbeings pushed multicast as a Cool Thing We Can Do. So I wound up reading the RFC and writing a new one in C.

P.S. Although there isn't a Linux SNTP server, there _is_ a Linux SNTP client. It's one of the binaries the ISC source tarball _can_ build, but generally doesn't. I'm trying to convince buildroot to enable it. I suspect this was last tested by an actual human a decade ago, but we'll see...


January 23, 2019

Added multicast support to the sntp stuff. Should probably not name the multicast enabling function leeloo_dallas() but I've had enough sleep deprivation lately that's the sort of name I'm using. (Look, my brain takes the word "multicast", sticks a fifth elephant reference on the front and sings the whole thing to camptown races (doo dah, doo dah). When I'm tired enough this sort of thing leaks out into the outside world.)

All the config is on the command line: if you "sntp 1.2.3.4" it queries the server, prints the time, and how off the current clock is. Adding -s sets it, -a sets it via adjtime().

I initially had it so you could list as many servers as you liked on the command line and it would iterate through them, but if it switches between ipv4 and ipv6 I'd have to reopen the socket and I dowanna.


January 20, 2019

Ok, I need record-commands from Aboriginal Linux (which is built around wrappy.c), and rather than just dumping them into scripts/ I want to break that up into make/ and tests/harness...

Except that directory also has bloatcheck and showasm (halfway between build and testing), and mkstatus.py which generates documentation (is that build?) and I have a todo item to split up make.sh into a script that generates the headers and a script that builds the .c files. I think all the second half of make.sh is using from the first half is the do_loudly() function (which turns a command's output into a single dot unless V=1 is set)...


January 19, 2019

Working on sntp, and FreeBSD build/testing.


January 18, 2019

Darn it, poking at mkroot and I updated toybox to current git and swapped in "test" with the newly promoted toybox version, and the Linux kernel build is breaking on all architectures. And it's a funky one too, even on a -j1 build it goes:

  LD      vmlinux
  SORTEX  vmlinux
  SYSMAP  System.map
make: *** [vmlinux] Error 2

That provides no information about what went WRONG! Thank you make.

Which means I need to dig up my old command line wrapper from Aboriginal Linux; I should probably stick it in the toybox scripts/ directory, except that's getting pretty crowded with build and test infrastructure. (I provide make wrappers as a gui and "make help" lists the options but DEPENDING on make is uncomfortable, it would be nice if running stuff directly was easy to not just do, but figure out at a glance...)

I should split scripts/ up somehow. I can move the make stuff into a make/ subdirectory, but then scripts/ isn't all the scripts so shouldn't be called that. The problem is "tests" is a bunch of *.test files, one per command, and I'd like to keep that accessible and clean. It's already got a tests/files directory under it that's a bit awkward, but manageable. I could put tests/harness under there with the infrastructure part, but then running it would be tests/harness/runtest.sh which is awkward. I could put "harness" at the top level but then it's much less obvious what the name means. Hmmm... tests/commands/sed.test? A top level tests directory with _three_ things under it?

Maybe I should add symlinks to the top level, ./make.sh and ./test.sh pointing into the appropriate subdirectory where the infrastructure lives...

Sigh. Naming things, cache invalidation, and off by one errors remain the two biggest problems in computer science.


January 17, 2019

Human reaction time is measured in milliseconds, plural. A 60fps frame rate is a frame every 17 milliseconds. Computer reaction times are measured in nanoseconds. A 1ghz processor is advancing its clock once per nanosecond.

Those are pretty much the reason to use those two time resolutions: nanoseconds is overkill for humans, and even in computers jitter dominates at that level. (DDR4 CAS latency's like 15 nanoseconds, an sh4 syscall has an ~8k instruction round trip last I checked, even small interrupts can flush cache lines...) Meanwhile milliseconds aren't enough for "make" to reliably distinguish which of two files is newer when you call "touch" twice in a row on initramfs with modern hardware.

64 bits worth of milliseconds is 584 million years, so a signed 64 bit time_t in milliseconds "just works" for over 250 million years. Rich Felker complained that multiplying or dividing by 1000 is an expensive operation (doesn't boil down to a binary power of 2 shift), but you've already got to divide by 60, 60, and 24 to get minutes, hours, and days...

Using nanoseconds for everything is not a good idea. A 32 bit number only holds about 4.3 seconds of nanoseconds (or + or - 2.1 seconds if signed), so switching time_t to a 64 bit number of nanoseconds would only about quadruple its range. (1<<31 seconds is just over 68 years, 1970+68 = 2038 when signed 32 bit time_t overflows. January 19 at 3:14 am, and 7 seconds.)
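The range arithmetic is easy to check mechanically. A small sketch, assuming an average Gregorian year of 365.2425 days; demo_range_years() is a name made up for this example.

```c
#include <stdint.h>

// Average Gregorian year: 365.2425 days of 86400 seconds.
#define DEMO_SECONDS_PER_YEAR 31556952ULL

// How many whole years fit in "bits" worth of ticks when there are
// ticks_per_second ticks in a second. bits must be 1 through 64 and
// ticks_per_second must be nonzero.
uint64_t demo_range_years(int bits, uint64_t ticks_per_second)
{
  uint64_t max = (bits >= 64) ? UINT64_MAX : (((uint64_t)1 << bits) - 1);

  return max / ticks_per_second / DEMO_SECONDS_PER_YEAR;
}
```

demo_range_years(64, 1000) is the 584 million year figure for milliseconds, demo_range_years(31, 1) is the 68 years that gets a signed 32 bit count of seconds from 1970 to 2038, and demo_range_years(64, 1000000000) shows 64 bits of nanoseconds only covering 584 years total.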

Splitting time_t into a structure with separate "seconds" and "nanoseconds" fields is fiddly on two levels: keeping two fields in sync (check nanoseconds, then check seconds, then check nanoseconds again to see if it overflowed between the two and you're off by a second), _and_ the fact that you still need 64 bits to store seconds but nanoseconds never even uses the top 2 bits of a 32 bit field, but having the seconds and nanoseconds fields be two different types is really ugly, but guaranteed wasting of 4 bytes that _can't_ be used is silly, but if you don't a 12 byte structure's probably going to be padded anyway...
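A sketch of the two-field bookkeeping, using a made-up struct demo_time with a 64 bit seconds field and a 32 bit nanoseconds field. It's meant to show the carry handling and the padding question, not to mirror any particular libc's struct timespec.

```c
#include <stdint.h>

// Hypothetical split timestamp: 64 bits of seconds plus a 32 bit
// nanosecond field. On common 64 bit ABIs the compiler pads this out
// to 16 bytes even though only 12 hold data.
struct demo_time {
  int64_t sec;
  int32_t nsec;  // 0 through 999999999, so the top 2 bits stay clear
};

// Add a (possibly negative) nanosecond delta, carrying into seconds so
// nsec stays in range.
struct demo_time demo_add_ns(struct demo_time t, int64_t delta)
{
  int64_t ns = t.nsec + delta % 1000000000;

  t.sec += delta / 1000000000;
  if (ns >= 1000000000) {
    ns -= 1000000000;
    t.sec++;
  } else if (ns < 0) {
    ns += 1000000000;
    t.sec--;
  }
  t.nsec = (int32_t)ns;

  return t;
}
```

Every operation on the pair needs this kind of carry check, which a single 64 bit millisecond counter gets for free from ordinary integer arithmetic.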

And computers can't accurately measure nanoseconds: A clock crystal that only lost a second every 5 years would be off by an average of over 6 nanoseconds per second, and that's _insanely_ accurate. Crystal oscillator accuracy is typically measured in parts per million, each of which is a thousand nanoseconds. A cheap 20ppm crystal is off by around a minute per month, which is fine for driving electronics. (The skew is less noticeable when the clock is 37khz, and does indeed produce that many pulses per second, and that's the common case: most crystals don't naturally physically vibrate millions of times per second, let alone billions. So to get the fast rates you multiply the clock up (double it and double it again), which means the 37000.4 clock pulses per second becomes multiple wrong clock pulses at the higher rate.)
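The parts-per-million arithmetic, as a sketch. The helper names are invented, and the 1ghz target rate in the usage note is just an illustrative assumption.

```c
// Seconds gained or lost over "seconds" of wall time by a clock that is
// off by "ppm" parts per million.
double demo_drift_seconds(double ppm, double seconds)
{
  return ppm * seconds / 1e6;
}

// The reverse: how many ppm off is a clock that drifts "drift" seconds
// over a period of "seconds"?
double demo_ppm_from_drift(double drift, double seconds)
{
  return drift / seconds * 1e6;
}

// Frequency error of a measured clock relative to its nominal rate,
// in parts per million.
double demo_error_ppm(double nominal_hz, double measured_hz)
{
  return (measured_hz - nominal_hz) / nominal_hz * 1e6;
}

// Extra (or missing) ticks per second once a clock with that error has
// been multiplied up to target_hz. The fraction stays the same, the
// tick count it applies to gets bigger.
double demo_wrong_ticks(double ppm, double target_hz)
{
  return ppm * target_hz / 1e6;
}
```

demo_drift_seconds(20, 30*86400.0) comes out a bit under 52 seconds, which is the "around a minute per month" figure. A 37000.4 pulse per second crystal nominally running at 37000 is about 10.8 ppm fast, and multiplied up to 1ghz that same 10.8 ppm is on the order of ten thousand wrong ticks every second.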

The easy way to double a clock signal is with a phase locked loop, a circuit with a capacitor and a transistor in a feedback loop that switches from "charging" to "discharging" and back when the charge goes over/under a threshold, so it naturally swings back and forth periodically (which is trivial to convert to a square wave of high/low output as it switches between charging and discharging modes). The speed it cycles at is naturally adjustable: more input current makes it cycle faster because the capacitor's charging faster, less current makes it cycle slower. If you feed in a reference input (add an existing wave to the input current charging the capacitor so it gets slightly stronger/weaker), it'll still switch back and forth more or less constantly, but the loop's output gradually syncs up with the input as long as it's in range, which smooths out a jittery input clock and gives it nice sharp edges.

Or the extra input signal to the PLL can just be quick pulses, to give the swing a periodic push, and it'll sync up its upswing with that too. So to double a clock signal, make an edge detector circuit that generates a pulse on _both_ the rising and falling edges of the input signal, and feed that into a phase locked loop. The result is a signal switching twice as fast, because it's got a rising edge on _each_ edge of the old input signal, and then a falling edge halfway in between each of those. Chain a few doublers in sequence and you can get it as fast as your transistors can switch. (And then divide it back down with "count 3 edges then pulse" adder-style logic.)

But this also magnifies timing errors. Your 37khz clock that's actually producing 37000.4 edges per second becomes multiple wrong nanosecond clock ticks per second. (You're still only off by the same fraction of a percent, but it's a fraction of a percent of a lot more clock pulses.) Clock skew is ubiquitous: no two clocks EVER agree, it's just a question of how much they differ by, and they basically have _tides_. You're ok if everything's driven by the same clock, but crossing "clock domains" (areas where a different clock's driving stuff) they slide past each other and produce moire patterns and such.
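To put a number on "a lot more clock pulses", here's a sketch that runs the 37000.4 example through a chain of ideal doublers. The count of 15 doublings is an arbitrary assumption picked to land a bit above a gigahertz, and the function names are made up:

```python
def relative_error(nominal_hz, actual_hz):
    """Fractional error; an ideal doubler chain leaves this unchanged."""
    return (actual_hz - nominal_hz) / nominal_hz

def extra_edges_per_second(nominal_hz, actual_hz, doublings):
    """Surplus edges per second after multiplying the clock up."""
    return (actual_hz - nominal_hz) * 2 ** doublings

NOMINAL = 37000.0
ACTUAL = 37000.4
DOUBLINGS = 15  # assumption: 37000 * 2**15 is about 1.2 GHz

fast_nominal = NOMINAL * 2 ** DOUBLINGS
fast_actual = ACTUAL * 2 ** DOUBLINGS
surplus = extra_edges_per_second(NOMINAL, ACTUAL, DOUBLINGS)

print(f"error at 37khz: {relative_error(NOMINAL, ACTUAL) * 1e6:.1f} ppm")
print(f"error after doubling: {relative_error(fast_nominal, fast_actual) * 1e6:.1f} ppm")
print(f"surplus edges per second at the fast rate: {surplus:.0f}")
```

Same parts per million at both ends, but the absolute count of wrong ticks grows by the multiplication factor.
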

Eventually, you'll sample the same bit twice or miss one. This is why every I/O device has clock skew detection and correction (generally by detecting the rising/falling edge of signals and measuring where to expect the next one from those edges). Of course you have to sample the signal much faster than you expect transitions in order to find the transitions, but as long as the signal transitions often enough it lets you keep in sync. And yes, this is why everything has "framing": so you're never sending an endless stream of zeroes and losing track of how MANY zeroes have gone by. You are periodically _guaranteed_ a transition.
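One familiar instance of framing is the 8N1 format used on async serial lines: a 0 start bit, eight data bits sent least significant bit first, then a 1 stop bit. That's a swapped-in example rather than anything specific from the hardware described here, and the function names are hypothetical, but it shows the guarantee: no matter what the payload is, the line never sits at one level for more than nine bit times inside a stream of frames.

```python
def frame_8n1(data):
    """Return the line-level bit sequence for bytes sent as 8N1 frames."""
    bits = []
    for byte in data:
        bits.append(0)                                  # start bit
        bits.extend((byte >> i) & 1 for i in range(8))  # data, LSB first
        bits.append(1)                                  # stop bit
    return bits

def longest_run(bits):
    """Length of the longest stretch with no transition."""
    longest = run = 0
    previous = None
    for bit in bits:
        run = run + 1 if bit == previous else 1
        previous = bit
        longest = max(longest, run)
    return longest

all_zero = frame_8n1(bytes(100))       # payload is nothing but zeroes
all_ones = frame_8n1(b"\xff" * 100)    # payload is nothing but ones

print(longest_run(all_zero), longest_run(all_ones))
```
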

Clock drift isn't even constant: when we were working to get nanosecond accurate timestamps for our synchrophasors at SEI, our boards' thermally stabilized reference clock (a part we special-ordered from Germany, with the crystal in a metal box sitting on top of a little electric heater, to which we'd added half an inch of styrofoam insulation to keep the temperature as constant as possible and then put THAT in a case) would skew over 2 nanoseconds per second (for a couple minutes) if somebody across the room opened the door and generated an _imperceptible_ breeze. (We had a phase-locked loop constantly calculating the drift from GPS time and correcting. And GPS time is stable because the atomic clocks in the satellites are regularly updated from more accurate atomic clocks on the ground.) In the past few years miniature atomic clocks have made it to market (based on laser cooling, first demonstrated in 2001), but they're $1500 each, 17 cubic centimeters, and use 125 milliwatts of power (thousands of times the power draw of the CMOS clock in a PC; not something you run off a coin cell battery for 5 years).

Sigh. Working on this timing SNTP stuff, I really miss working on the GPS timing stuff. SNTP should have just been milliseconds, it's good enough for what it tries to do. In toybox I have a millitime() function and use it for most times. (Yes another one of my sleep deprivation names. "It's millitime()". And struct reg* shoe; in grep.c is a discworld reference. I renamed struct fields *strawberry in ps.c already though.)
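For the general idea, here's a hypothetical Python analogue of a millisecond clock helper. It is NOT the toybox C code, just a sketch of "one integer count of milliseconds" built on Python's monotonic clock (an assumption about which clock you'd want):

```python
import time

def millitime():
    """Milliseconds from a monotonic clock, as one plain integer.

    Sketch only: the starting point of the count is arbitrary, so the
    value is only good for measuring intervals.
    """
    return time.monotonic_ns() // 1_000_000

start = millitime()
time.sleep(0.05)
elapsed = millitime() - start
print(f"slept about {elapsed} ms")
```

One integer means intervals are a single subtraction, with no second field to keep in sync.
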

Rich Felker objected that storing everything in milliseconds would mean a division by 1000 to get seconds, and that's expensive. In 2019, that's considered expensive. Right...


January 16, 2019

Sigh. No Rich, that's not how my relationship with Android works. I cannot "badger Android until they fix this nonsense".

I have limited traction and finite political capital. Leading them with a trail of breadcrumbs works best, which means I do work they might find useful and wait (often years) for them to start using it. And I can explain _why_ I want to go in a certain direction, and what I hope to achieve, and make as compelling an argument for that vision as I can.

But often, they've already made historical technical decisions that then become load-bearing for third party code, and you can't move the rug because somebody's standing on it. And their response is more or less "that might have been a nice way to go way back when, but we're over here now".

I'm trying to clean out the rest of the BSD code so that they're solidly using toybox, and making it so they can use as much of "defconfig" as possible. If the delta between android's deployment and toybox defconfig is minimized, then adding stuff to defconfig is most likely to add it to android. (This maximizes my traction/leverage. But it's _always_ gonna be finite, because they're way bigger than me.)

This means work on grep (--color), mkfs.vfat, and build stuff. The macos (and now FreeBSD) build genericization helps, as does the android hermetic build stuff. (Getting them closer to being able to use my build infrastructure, although they haven't got make and don't like arbitrary code running in their build.)

It's a bit like domesticating a feral cat. Offer food. Then offer food in the utility room. Except instead of a feral cat, one of the biggest companies in the world has a large team of full-time employees that's been doing this for 20 years now (The "Android One" came out in what, 2007?) which is constantly engaging with multiple large teams of phone vendor developers, collectively representing a many-multi-billion dollar industry that's on such a vastly different scale they can't even _see_ me.

I can't even afford to work full time on this stuff. I'm doing what I can. You wanna post your concerns on the toybox list, go for it.


January 15, 2019

Sigh, $DAYJOB needs sntp, so let's do that for toybox...

Reading RFC 4330 (well, a half-dozen RFCs, this has had a lot of versions and the new ones have added useless crap that's more complexity than help). Oh great, this protocol doesn't have a Y2038 problem, it has a Y2036 problem. They have a 64 bit timestamp: the bottom 32 bits of which is the fraction of a second (meaning they devote 2 bits to recording FRACTIONS OF A NANOSECOND), leaving them 32 bits for seconds... starting from January 1 1900. For a protocol designed in the 1980's. So they ate 2/3 of the space before the protocol was _designed_. That's just stupid.

Anyway, the common workaround is if the high bit's _not_ set then it wrapped, which buys another 60 years or so. Still utterly insane to design the protocol that way.
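A sketch of that workaround, converting the two 32 bit halves of an SNTP timestamp into unix milliseconds. The function name is hypothetical and this isn't the toybox implementation; the only constant is the 2208988800 seconds between 1900 and 1970.

```python
from datetime import datetime, timezone

NTP_TO_UNIX = 2208988800  # seconds from 1900-01-01 to 1970-01-01
ERA = 2 ** 32             # the seconds field wraps after this many seconds

def ntp_to_unix_ms(seconds, fraction):
    """Convert NTP (seconds since 1900, 32 bit binary fraction) to unix ms.

    Workaround: a clear high bit is taken to mean the seconds field
    already wrapped in 2036, so add one era back on.
    """
    if not seconds & 0x80000000:
        seconds += ERA
    # The fraction is in units of 1/2**32 of a second, scale it to ms.
    return (seconds - NTP_TO_UNIX) * 1000 + ((fraction * 1000) >> 32)

# Seconds field of zero with the workaround applied: the moment of the wrap.
wrap_ms = ntp_to_unix_ms(0, 0)
wrap = datetime.fromtimestamp(wrap_ms / 1000, tz=timezone.utc)
print("seconds field wraps at", wrap.isoformat())
```

The cost is that anything before 1968 (high bit clear in the original era) can't be represented anymore, which doesn't matter for setting a clock.
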


January 14, 2019

Exhausted. Not sure I slept at all last night, just lay awake in bed. Is it possible to get jetlag without changing time zones?

Back at work: spent most of the day going through a month of missed email. They assigned a number of issues to me.

Back in my apartment, the manager was happy to see me and had a desk and a bed in storage, and says he'll replace the gas stove with electric (yay!). They should really put some solar panels on this building. (They don't just go on the roof, you can put them down the sides of tall buildings too, you don't even have to worry about sweeping the snow off of those.)

Poking at patch.c because I got reminded of todo items. Trying to add fuzz factor, which was easy enough (and my design for it's better) but... there's no tests/patch.test, and I don't seem to have patches that _require_ fuzz factor lying around.

I _used_ to just throw new commands through Aboriginal Linux and the LFS build, which was applying lots of patches. I suppose I could dig through the repo there and find where I adjusted them to eliminate fuzz factor. (Because even though I ported toybox patch to busybox over a decade ago, they still haven't added fuzz support to it.) There's a lotta that going around, where things I was planning to do ages ago still aren't done in various projects, and it ranges from crickets to insistence that status quo is perfect and we've always been at war with eastasia. (People declared busybox "done" at the 1.0 release, which was before the majority of my contributions and long before you could use it in a build environment.) Thing didn't happen therefore shouldn't happen is a failure of imagination. As Howard Aiken said long ago, you don't need to worry about people stealing your ideas. Heck, I've been trying to get people to steal my ideas for a very long time, in a Tom Sawyer "paint the fence" way so I don't have to do it myself.


January 13, 2019

Flight back to Milwaukee. Sigh. Conflicted, but... this is the path of least resistance, and I know I can do it. Neither Google nor the phone vendors will pay me to do Toybox or the android self-hosting stuff, nobody's interested in mkroot (hardly anybody was interested in aboriginal even after I got it building LFS), and I can't afford to just do open source all the time. Gotta pay the mortgage. (I should really try to at least pay off that home equity loan this time.)

Got a hotel. It's $130/night, that's more per week than my old efficiency apartment here cost in a month. I should try to get that back in the morning. (They hadn't rented it out last I heard, and it's paid through the end of the month since I have to keep paying for it until they rent it out or 60 days goes by.)

I wrote up a thing about how patches work, because somebody on the list asked. I should collect and index those somehow, I suppose...


January 12, 2019

I committed a fix:

> Which is the "mode" of the symlink, except that mode says the filetype _is_ a
> symlink and you can't O_CREAT one of them so it's gonna get _really_ confused...
>
> Try now? (I added a test.)

Except that's inelegant (race condition between dirtree population and this stat, filesystem can change out from under us?) and we're _supposed_ to feed dirtree the right flags so the initial stat() is following or not following the symlink appropriately. Why is it not doing that in this case... Hmmm...
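The follow/don't follow distinction, as a generic sketch. This isn't the toybox dirtree code, and it assumes a platform where an unprivileged process can create symlinks; the function name is made up:

```python
import os
import stat
import tempfile

def symlink_modes():
    """Return (lstat mode, stat mode) for a symlink pointing at a file."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        open(target, "w").close()
        os.symlink(target, link)
        not_followed = os.lstat(link).st_mode  # describes the link itself
        followed = os.stat(link).st_mode       # describes what it points to
        return not_followed, followed

not_followed, followed = symlink_modes()
print("lstat says symlink:", stat.S_ISLNK(not_followed))
print("stat says regular file:", stat.S_ISREG(followed))
```

Hand the not-followed mode to code that expects a regular file's mode and the file type bits say "symlink", which is the confusion described above.
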


January 11, 2019

Broke down and told chrome _not_ to restore state, just let it forget all those todo items. So now I have one window with only a dozen or so open tabs, which can restart itself without wasting half an hour fighting with it every time I open my laptop. I give it a week.

I should really pack my suitcase...


January 10, 2019

The battery on my laptop no longer holds ANY charge. Unplug it and it switches off instantly. Serious crimp in my "wander out somewhere and program for a bit at a quiet table" workflow. Even when I go somewhere with an outlet (which I now feel guilty about because I'm costing the place money, even if it's only a few cents), it loses all context going there and going back. Complete reboot each time.

And convincing chrome NOT to reload 8 windows with 200 tabs each in them (maintain the todo item links but leave the tabs in "failed to load" state rather than trying to allocate 30 gigabytes of RAM and max out my phone tether for 2 hours) is a huge pain. Doing "pkill -f renderer" USED to work but now SOMETIMES works, sometimes causes tabs to hang (still display fine but I can't scroll DOWN and it won't load new contents in that tab, but I can cut and paste the URL to a new tab that WILL load it so the URL is retained which is all I really wanted), and sometimes randomly crashes the whole browser process. Even pointing /etc/resolv.conf at 127.0.0.1 while chrome starts up to force the resolve to fail no longer prevents the reloads, these days it just _delays_ its load; it tries to reload periodically and once it can, it reloads everything.

They keep "upgrading" chrome to make it a worse fit for my needs, and of course I can't stick with old versions because "security". (You can sing "cloud rot" to the tune of Love Shack.)


January 9, 2019

Looming return to milwaukee, starting to get paralyzed. Fade flies out tomorrow, although essentially it's tonight so early in the morning (she and Adverb are visiting family in California before heading back to minneapolis for the spring semester, both her sisters live there and I think more of her family is flying in for a reunion?)

I should get a plane ticket, but the TSA and air traffic controllers miss their first paycheck on Friday. Bit reluctant to fly with air traffic controllers considered "nonessential"... (Bit reluctant to _eat_ with FDA inspection considered nonessential.)


January 8, 2019

Visited the eye doctor for my 6 month follow-up. Not obviously going blind! Yay!

Eyes dilated, not a lot of programming today.


January 7, 2019

Wandering back to an open tab in which I have:

$ truncate -s $((512*68)) test.img && mkfs.vfat test.img && dd if=/dev/zero of=test.img seek=$((0x3e)) bs=1 count=448 && hexdump -C test.img

Which at the _time_ was the smallest filesystem mkdosfs would create. (The dd blanks some stuff that varies gratuitously between runs so I can diff two of them and see what changed when I resize the filesystem.)

But now I'm running a newer dosfstools version and it's saying that 512*100 is the smallest viable filesystem. And THAT is clearly arbitrary. Sigh, I should look up the kernel code for this and see what the actual driver says.


January 6, 2019

Rebuilt mkroot with linux-4.20 (after rebuilding the musl-cross-make toolchains with current musl). The s390x kernel wants sha256sum now.

Sigh. Throw another binary in the PENDING list of the airlock install in toybox/scripts/install.sh. (It's in the roadmap.)


January 5, 2019

Attempting to install devuan on the giant new laptop, because the ubuntu they stuck on it has systemd and it's possible I'd use a BSD first. Devuan is basically a debian fork retaining the original init system and with a really stupid over-engineered nigh-unmaintainable mirror overlay system written in python. (I have no idea why they did that last part, and hope it's merely a transitional problem.)

The System76 bios is "black screen with no output" until their ubuntu boots, which is kinda annoying. I guessed "reboot several times and hit escape and alt-f2 and so on a lot during said blackness" and eventually got a bios screen that let me boot from a USB stick.

Devuan's installer is really _sad_ compared to Ubuntu. What Ubuntu did was boot to a live CD, then run a gui app. That's basically copying the cutting edge knoppix technology from 2003 (which is 15 years ago now), and they've been doing it since... 2004 I think?

Devuan started with a menu of multiple install options (I have no clue here and cannot make an informed decision, STOP ASKING ME FOR INFORMATION I DO NOT HAVE YET), but all of them seem to go to a fullscreen installer with a font that's way too small for comfort, and no way to change it. Ok, soldiering on: it's freaking out that I used unetbootin to create the USB boot stick, promising a plague of locusts and possibly frogs if I continue. But it doesn't say how I SHOULD have created it, and it seems to be working fine, so I ignored it and continued.

It's refusing to provide binary firmware for the wireless card (iwlwifi-8265) because Freedom Freedom Blue Facepaint Mel Gibson. If a manufacturer was too cheap to put a ROM in their hardware and they expect the driver to load the equivalent data into SRAM, debian sits down in the mud and sulks. Great.

I think I've found where to get the firmware from debian, but "devuan ascii" isn't clearly mirroring any specific debian distro? (The previous ones were, the newest one... isn't.) The instructions say to put it in a "/firmware" directory on the USB stick, which seems separate from _booting_ from the USB stick... All the devuan ascii docs say that all necessary firmware is bundled. Hmmm...

Ok, downloading the 4+ gigabyte "DVD" version of the devuan installer (for a complete offline install) to make a new USB stick from, and I should try to fish the firmware files out of the system76 ubuntu install before wiping it. (There's a certain amount of "should I use the 2 gb hard drive or the 1gb flash drive" for this install, I left the flash disk in because it's already there and I don't ever intend to use systemd ubuntu.)

This has already eaten all the time I allocated to poke at this.


January 3, 2019

Three days of rain and I've gotten nothing done. Barely left the house. I'm not recovered enough from seasonal affective disorder yet for the gloom outside not to put me in hibernation mode.

I was ok moving up to milwaukee in January from Austin, that was a discontiguous break and my internal clock did not adjust. But staying in milwaukee for 3 months while the days got shorter, _that_ screwed me up.

Partly it's that the sun coming up reliably knocks me out, because college. The last couple years at Rutgers were primarily night courses due to governor Witless destroying the comp-sci program with stupid budget cuts so they lost _all_ their full-time faculty (including the head of the department; if you're denied tenure you _can't_stay_ past 5 years and they blanket denied tenure to everybody, and comp-sci had only peeled off of the physics department to become its own thing 4 years before the budget cuts...). This was the #2 most popular major on campus after "undecided" and everything had to be taught by adjuncts after their day jobs, and now you _couldn't_ complete it without lots of night classes. So I'd get home long after sunset and do more programming, then the sun would come up and I'd go "oh, didn't realize the time" and go to bed. (Which was fine if I didn't have to catch a bus to go back to class until 3pm or so.)

Now the sun coming up knocks me out. Being awake at night is fine... until the sun comes up. When my alarm's set at 6:30 am and the sun comes up over an hour later, getting up in the morning is a _problem_. And that sort of anchors the rest of it...


January 2, 2019

Did a little research for the multicast doc in the ipv4 cleanup stuff.

Multicast failed to take off because improved compression schemes (like mp3 and mp4) greatly reduced storage and bandwidth requirements of media while rendering partial delivery of data useless, and due to the widespread deployment of broadband internet via cable modem and DSL. The decline of multicast started in 1999 when Napster provided a proof of concept that distributing MP3 files via unicast could scale. RealAudio quickly lost market share to unicast media delivery solutions. These days Youtube, Netflix, Hulu, and Amazon Prime all use unicast distribution.

The decline started 20 years ago and the multicast mbone (which this address range was reserved for) essentially ceased operations about 15 years ago. The last signs of life I can find are from about 2003.

Multicast was never widely used, the range was allocated for growth that did not occur, and remaining users are treating it as a LAN protocol which could use any other LAN-local address range their routers were programmed to accept. Note also that LAN-local multicast was conserving bandwidth on 10baseT local area networks, and we have widely deployed cheap gigabit ethernet now (with 10gigE available for those who want to spend money).

Reserving 268 million IPv4 addresses for multicast, in 2019, is obviously a complete waste. We can put them back in the main pool.
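For reference, the 268 million figure is the size of the IPv4 multicast block, 224.0.0.0/4 (the old "class D" range; that's the standard allocation, not something spelled out above). A quick check with Python's ipaddress module:

```python
import ipaddress

# The IPv4 multicast block: a /4 leaves 28 bits of address.
multicast = ipaddress.ip_network("224.0.0.0/4")
everything = ipaddress.ip_network("0.0.0.0/0")

count = multicast.num_addresses
share = count / everything.num_addresses

print(f"{count} addresses, {share:.2%} of all IPv4 space")
print("first:", multicast[0], "last:", multicast[-1])
```
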


Back to 2018