Rob's Blog (rss feed) (mastodon)


June 7, 2024

Stuff's a bit chopped up since I'm straddling two laptops. Still blogging from the old one, and the old one has the reasonable battery (I should order another battery), so I can't take the new one out to a random coffee shop yet; I can only use it plugged in at the desk. So I'm back on the old machine, blogging about what I did on the new machine, based on a notes.txt file I scp'd over to the old machine.

Package dependencies remain out of control: for some reason "apt-get install git" wanted "libperl-error" which is just sad. I'm vaguely annoyed that build-essential installed fakeroot and three *-perl packages and so on, but that's the cost of using a meta-package somebody else curates. (Saying "the following additional packages will be installed" and then "the following NEW packages will be installed" with the only difference being the second list includes the package I requested... that seems non-optimal, especially when the list is 37 packages long).

The new debian toolchain is hallucinating a warning when I build toybox with it, toys/posix/grep.c:211:24: warning: 'regexec0' accessing 8 bytes in a region of size 4 [-Wstringop-overflow=] and a further note: referencing argument 5 of type 'regmatch_t[0]'. This warning is wrong in multiple ways.

First this is not NEW code. That part's been run under ASAN a lot without complaint, and no other toolchain produces this warning: not llvm, not gcc, and musl-cross-make has been building the same gcc 12.0 version which does NOT produce the warning. Something debian locally patched into its "gcc 12.0-14" is producing a warning that vanilla gcc does not produce. That makes me a bit suspicious to begin with.

I inspected the code anyway, and argument 5 of the call to regexec0() in do_grep() is an 8 byte pointer to a 16 byte structure. There's no "region of size 4" to be found. The argument &shoe->m is a pointer to an entry of type regmatch_t (yes, Reg Shoe is a Discworld reference), and that struct contains two entries of regoff_t which is ssize_t which is long, thus 16 bytes on a 64-bit system. Even on a 32-bit system, the two of them would still add up to 8 bytes. The structure is allocated to its full size. There's nothing wrong with the code that I've been able to spot.

I _think_ what might be happening is shoe->m lives in "shoe" which is most recently assigned to in the enclosing for() loop via shoe = (void *)TT.reg; and TT.reg in the GLOBALS() block is declared as struct double_list *reg; because at that level we only care that it's a doubly linked list, not what members each list entry has in the command-local "struct reg". Except even THAT theory is funky because double_list has three pointers: next, prev, and data, each of which is 8 bytes on a 64-bit system: where is it getting size 4? If it was comparing sizeof(*TT.reg) with sizeof(*shoe) then shoe->m starts off the end of the smaller struct. If the compiler can't keep the types straight then it's not a size 4 issue, it would be an out of bounds access.

The type of the "shoe" pointer is "struct reg", which has 5 members. The argument it's complaining about is a pointer to the 5th member, which is indeed a regmatch_t. (And the error is SAYING it's a regmatch_t, which is neither 4 nor 8 bytes long, it's 16. Neither the pointer, nor the struct, nor any member OF that struct, match the constraint it's insisting was violated.)

The only place there's a member of size 4 is "int rc", the third member of struct reg. And struct double_list only HAS 3 members, and "m" is the last member of struct reg, so maybe somehow the compiler is confusing (struct reg *)shoe->m with (struct reg *)shoe->rc because (struct double_list *)TT.reg only has 3 members? The last member of struct reg is the 5th member, the last member of struct double_list is the 3rd member, and the 3rd member of reg is 4 bytes. (Of course the typecast multiple lines previously saying "this is not actually a pointer to that other type, they have nothing to do with each other" means it would have to bounce off an irrelevant historical type AND specially care about "last member" to get it wrong in this specific way.)

Dunno. It really seems like a broken warning. I dunno how to squelch it. There is no region of size 4 involved in any way with the 5th argument. In fact shoe->rc isn't used as an argument, the return value is assigned to it, no pointers involved there, it's an integer assignment so the type autoconverts. Maybe if I change the prototype of regexec0() in lib/lib.h so its 5th argument says regmatch_t *pmatch instead of regmatch_t pmatch[] it'll shut up? (It's the same thing! Magic tweak to avoid triggering someone else's bug, and that's IF it works. I'm on the wrong laptop to check...)
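For what it's worth, here's a small sketch of both halves of that: C adjusts an array parameter to a pointer, so "regmatch_t pmatch[]" and "regmatch_t *pmatch" declare the same type (the compiler accepts the mismatched declaration and definition below), and GCC can locally silence one named warning with "#pragma GCC diagnostic". The wrapper match_first() is a hypothetical stand-in, not toybox's actual regexec0() signature. Since -Wstringop-overflow is emitted late (often after inlining), a pragma around the caller isn't guaranteed to catch it; you may have to wrap whichever code the warning's location points at. The size printer is there because the sizes involved depend on the libc, so measuring beats assuming.

```c
#include <regex.h>
#include <stdio.h>

// Hypothetical stand-in for regexec0(), declared with array syntax...
int match_first(regex_t *re, const char *s, int nmatch, regmatch_t pmatch[]);

// ...and defined with pointer syntax. An array parameter is adjusted to a
// pointer, so these are the same function type and this compiles cleanly.
int match_first(regex_t *re, const char *s, int nmatch, regmatch_t *pmatch)
{
  return regexec(re, s, nmatch, pmatch, 0);
}

// Silence one GCC warning for just this function. Clang doesn't know
// -Wstringop-overflow and would warn about the unknown option, so guard it.
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wstringop-overflow"
#endif
static int first_match_start(regex_t *re, const char *s)
{
  regmatch_t m;

  return match_first(re, s, 1, &m) ? -1 : (int)m.rm_so;
}
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic pop
#endif

// Print the sizes the warning is arguing about on this libc/target.
void print_regmatch_sizes(void)
{
  printf("regoff_t %zu, regmatch_t %zu, regmatch_t * %zu\n",
         sizeof(regoff_t), sizeof(regmatch_t), sizeof(regmatch_t *));
}
```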

The new debian toolchain also broke gcc/glibc ASAN, complaining (at runtime) "ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD." Which is that library ordering nonsense back to rear its ugly head again and I refuse to humor these INSANE ASSHOLES. If LLVM/bionic works without this, then it's NOT REQUIRED, they're just really bad at it. Notice how the error message doesn't even say which library to LD_PRELOAD if I _did_ want to fix it, it just refuses to work where the previous version worked. A clear regression. Which I'm late enough in reporting it's a fait accompli, and I'm in the wrong for not noticing their fuck-up in a timely manner. Far too late to start making a fuss about it now. (I think I have the hitchhiker's quote for next release.)

Is a required library not installed? I used "build-essential" instead of manually installing gcc and make precisely so it would scoop up that kind of nonsense... And it's complaining about library ORDERING, which is not supposed to be a thing when dynamic linking.

And THEN, of course, the 6.10-rc2 kernel broke my libelf removal patch (really it's a patch to allow x86-64 to use the frame pointer unwinder like EVERY OTHER ARCHITECTURE CAN). But now kconfig is saying there's a circular dependency with HAVE_OBJTOOL being selected. I do not understand what it's complaining about, but this is an error not a warning so the build refuses to build.

Red queen's race. Running to stay in place...

June 6, 2024

Found a cat 5 cable to run between the two laptops and rsync over a zillion files. It's going slowly, but can happen in the background while I type.

Several days behind on email, but nothing time critical has shown up in the toybox list archive (I saw the vi thing and mean to reply to it, but pending can wait) so I didn't feel pressed for time. It would be nice if I could similarly check microsoft github but going to and clicking the notifications bell hits a login paywall, and my phone does not get any credentials I would be annoyed to lose. (It's got mastodon, and the household slack discord where we send each other cat memes and dinner requests and lunch reminders and add things to the grocery list. But even the few youtube videos I posted were NOT on the youtube account the phone auto-generated to track my likes and watch history because no.)

The house STILL hasn't quite closed. Truthiness bank is taking its time to get the payoff figures to the escrow people. (Who are apparently near Lancre, so the mail coach takes a while.)

June 5, 2024

I should collect various random things I write and do proper documentation with an index of them. I have a bunch of emails I've sent people privately with similar rants, but have to trawl through my outbox to find them and I just don't bother.

Once upon a time I was thinking that sort of thing would be youtube videos, but I'm too much of a perfectionist, prudetube is too much of a pain, peertube is its own checklist of todo items...

Spent far too long getting a new laptop set up. Step 1, dug up my "good spare" laptop, which requires some explanation. When the Dell e6230 was cheap at end of life I bought multiples, and in THEORY still have 2 spares (not counting the one that died 6 months ago, its stripped corpse is in the storage cube in texas). I'm typing on the one with Xeno's tab closure ongoing and the Devuan install that went out of support on Friday. Waiting for THIS laptop to be available for power cycling (let alone reinstall) has already delayed getting a new Devuan Dementia system up (I'm skipping Devuan Cholera) until after Devuan Bronchitis got end of lifed at the start of the month. So: set up the new laptop in parallel, migrate over as I close stuff on the old one, meanwhile do as much as I can on the new one.

Of the 2 spares wrapped in bubble wrap on the bookshelf, one has Fade's stickers all over it because it was her primary (windows) machine for a while between working macs, and the other I bought AS a spare and have never used for anything. It turns out the TSA cut both of those open at some point (presumably when I was forced to check my carry-on earlier this year), and put a BIG SCRATCH along the case of the "designated spare", and chipped a chunk out of the plastic battery case. Which may relate to that battery refusing to charge (rapid orange blink). I stole the battery out of Fade's old laptop to get it working, but that's the 3 cell battery instead of 6 cell, and the bios diagnostics say it's end of life anyway (42% capacity at full charge), but it gets something booted. (I'm actually reluctant to buy MORE of these laptops due to the x86-64-v2 architecture generation shenanigans IP money is imposing on Linux via IBM Hat and friends. They work great for me, but the "oh noes, hardware that is out of patent cannot capitalism profitably" assholes are forcing everybody except Debian to drop support for it, which is just sad. I should probably get another battery if I can, though.)

The 16 gig memory sticks worked fine, and although the Devuan Diptheria USB stick couldn't run its x86-64 memory test for some reason, the bios built-in memory test happily chewed on them for half an hour and gave it the thumbs up. (Relieved: they WERE in my backpack for a couple months, although in a folded over anti-static bag tucked secure-ishly in a pocket.)

The new hard drive did NOT want to work, for HOURS, until I eventually figured out that yes, that little bridge connecty piece on the end of the old hard drive comes off and needs to be transferred to the new hard drive so the pins plug "down" instead of "out". I took apart and reassembled the bottom of the laptop like 8 times before I figured that out, but still had all the screws afterward. (And an extra. I don't think it went to this laptop. I think it might have held the GPS hat on the Turtle Board?)

Got Devuan installed! Got it talking to the guest wifi to download updates (through an unencrypted connection but eh). The Devuan Dropsy installer complained that its permissions were wrong (but ran anyway), and then apt-get update complained that the name of some of the repositories had changed from "testing" to "production" or some such, and I'm PRETTY sure I flashed the current installer ISO image. (If the devuan website had the wrong version up for download last week, that's on them. I'm not using the Ex-lax release with the /usr merge, I'm using the one before that. And I _think_ "D" has the same support horizon as "E"? Sigh, I don't want to have to care about this part...)

Seems like there's less to tweak in xfce this time. By default "vi" is happy cursoring around in insert mode, and even has syntax highlighting enabled. Tentatively optimistic?

Installed git and build-essential, cloned toybox and musl-cross-make, set grinding away forever.

Built toybox: there's a spurious new warning (in grep.c) that isn't there in a vanilla build of the same gcc version (from musl-cross-make), looks like debian patched it in to their version...? Or it only shows up with glibc...? Odd. Doesn't seem to be hurting anything, but the issue it's complaining about also doesn't seem to be POSSIBLE...

Cloned qemu. It wanted mostly the same stuff as last time, except this time it built without pixman. Also building forever.

June 4, 2024

Finally made it out to best buy to purchase a new 2 gig laptop SSD. Even they don't _really_ carry 2.5" sata drives anymore (none on display on the shop floor), but they had a few left in a lockable closet the salesbeing showed me to when I asked. Only one left of the type I needed, but I bought it. (One problem is actually the disk _thickness_: the Dell e6230 has a weird little mounting bracket thing that assumes the drive is 2.5mm thick, and some disks are thicker than that. They had a 4 gig one if I could use a 2.8mm thick disk, which I _might_ be able to use but couldn't confirm while I was there.)

This involved taking the green line (train) to the A line (bus) to wander around in the Rosedale Mall (which is enormous and expensive, but alive) except the best buy isn't _in_ the mall, it's across a street from the mall. If you are unfamiliar with the area, "in which direction" can be KIND OF A THING. (Across WHICH street?) My phone map showing me and the destination is less useful when I dunno which direction I'm pointing in and it's showing a giant parking lot without intelligible building outlines.

The entire process took 6 hours and a certain amount of sunburn. And 18k steps according to my phone's step counter. And of course I got on the relevant bus or train going in the wrong direction three of the four times. And I overshot a transfer and decided walking back 10 blocks was faster than waiting for a bus going in the other direction. (Probably wrong.) But it beats trying to order a storage product from scamazon.

Over the years I've bought 4 of these laptops (they were cheap and clearanced when Dell stopped making them), and I had 2 spares left on the shelf wrapped in bubble wrap (which on closer inspection say they've been cut open and "inspected by TSA", but might still work) so I decided that I should set up the new OS on the new hard drive (with the big ram) in a new laptop, and migrate over to it as feasible. Having two laptops in play is manageable if I can rsync the "old" one's home directory into /home/old/2024 on the new laptop (to go with /home/old/201{1,2,3,5,7,8} already there, yeah the pandemic kinda left me underclocked). The important thing is only one machine at a time having a role like "downloads new email", and each directory having a canonical source that other copies are just read-only mirrors of.

And that means I get the new test environment up early, so I can tackle that unshare issue I need the new OS release to reproduce.

June 3, 2024

Running a shell when stdin/stdout/stderr aren't provided (and thus the filehandles start closed) is awkward: everything I open starts there and needs to get bumped up.

I fixed the script filehandle itself (which was a todo item anyway: fd3 was leaking into child processes and needed to be CLOEXEC, bumping it to a "high" filehandle is just hygiene). It's a little like the way ELF executables get opened (and stay open) but don't populate a filehandle in the process's fd space. Which is annoying because I _want_ that fd so I can re-exec self after vfork(), but don't have an obvious way to GET it. (It SHOULD be getauxval(AT_EXECFD) but, in my testing, isn't. I don't know if this is the dynamic linker closing it when it's done with it, or if it's never passed to static binaries at all, or if it's not in the kernel auxiliary vector and gets populated by the dynamic linker, meaning it's not there when you're statically linked...)
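Bumping a filehandle "high" with close-on-exec set can be done in one call with fcntl(F_DUPFD_CLOEXEC), which picks the lowest free descriptor at or above a floor. A minimal sketch, where move_fd_high() and the floor of 10 are assumptions for illustration rather than toysh's actual code:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: move fd to the lowest free descriptor >= floor with
// close-on-exec set and close the original, so a script that landed on fd 0
// (because stdin started closed) stops squatting on stdin and isn't
// inherited by child processes. Returns the new fd, or -1 on error.
int move_fd_high(int fd, int floor)
{
  int hi;

  // Already high enough: just mark it close-on-exec.
  if (fd >= floor) return fcntl(fd, F_SETFD, FD_CLOEXEC) ? -1 : fd;

  if ((hi = fcntl(fd, F_DUPFD_CLOEXEC, floor)) == -1) return -1;
  close(fd);
  return hi;
}
```

Usage would be something like fd = move_fd_high(open(script, O_RDONLY), 10) right after opening the script, before any redirects can collide with it.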

How do you use the VDSO from static binaries, anyway? "man 7 vdso" doesn't include the word "static". (There's static pie, you can have minor linking plumbing in static binaries when necessary.)
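A sketch for poking at the auxiliary vector directly, assuming Linux and a libc with getauxval() (glibc 2.16+, and musl). getauxval() returns 0 for a missing entry; glibc 2.19+ also sets errno to ENOENT, which is what aux_lookup() (a hypothetical helper) checks. As far as I can tell from the kernel's fs/binfmt_elf.c, AT_SYSINFO_EHDR (the vDSO's ELF header address) is handed to static binaries too, which is how a static libc finds the vDSO, while AT_EXECFD is only added when binfmt_misc opens the binary on the program's behalf (its "O" flag), not on a normal exec. Treat that as something to verify with this program rather than gospel.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/auxv.h>

// Hypothetical helper: 0 and *val set if the auxv entry exists, -1 if not.
int aux_lookup(unsigned long type, unsigned long *val)
{
  errno = 0;
  *val = getauxval(type);
  return (*val == 0 && errno == ENOENT) ? -1 : 0;
}

// Report the two entries discussed above for the current process.
void show_aux(void)
{
  unsigned long v;

  // vDSO base address: present for static and dynamic binaries alike.
  if (!aux_lookup(AT_SYSINFO_EHDR, &v)) printf("vDSO ELF header at %#lx\n", v);
  else printf("no AT_SYSINFO_EHDR entry\n");

  // Executable fd: expected to be absent unless binfmt_misc opened the binary.
  if (!aux_lookup(AT_EXECFD, &v)) printf("AT_EXECFD = %lu\n", v);
  else printf("no AT_EXECFD entry\n");
}
```

Building this both with and without -static and comparing the output would answer the "is it the dynamic linker?" question empirically.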

June 2, 2024

Got a feature request which seems MOSTLY straightforward, modulo some aesthetic display decisions and graceful handling of small window sizes (and doing the SIGWINCH nonsense to check if the display size changed).

Ok, backing up: the copyright on count.c in toybox says 2002 because that's approximately when I submitted it to busybox, which immediately resulted in the suggestion of a progress bar, and then the list started bikeshedding (which moved to IRC if I recall), and a month or so later the result was called pipe_progress.

I'd been using "count" locally for a couple years already by that point: I initially needed it when duplicating hard drives on a machine with 4 IDE controllers, which could thus handle 8 disks (each controller had a master/slave slot), so one Linux boot disk, one data disk, and 6 blank disks it copied with "cat /dev/hdb | tee /dev/hd{c,d,e,f,g} > /dev/hdh" and the problem was that hung for an hour and then suddenly exited, so I whipped up a quick and dirty progress indicator I could insert in the pipeline between cat and tee. I know I did that at boxxtech which my resume says was in 2000. I _think_ I'd done similar things previously and had the trick lying around from before, although I rewrote the command each time I needed it rather than keeping one around because it's hello world with a for loop, thus my oldest copy being 2002.

Anyway, since then "pv" has shown up, and over the years pv has grown a bunch of bells and whistles I just triaged and threw out 2/3 of in that github thread. And then what's left had some collisions because count -l is already used for long output, but -r to ratelimit to SIZE/second, -s to expect SIZE for a progress indicator, -S to stop at a provided SIZE, -L to count lines instead of bytes, -0 to count NULs instead of bytes, and -q to suppress output (for use with -r and/or -S) might be of some use?

So how do you do completion time estimation for non-constant transfer speeds? I have a current speed estimate (which is based on trailing speed over the past 16 seconds or so), and can easily do a "time elapsed since start divided by total transferred so far", and then I... what, average them?
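One possible answer, sketched with a hypothetical eta_seconds() helper: compute an ETA from each rate estimate and take a weighted average of the two *times*. Averaging times corresponds to a harmonic mean of the rates, which leans toward the slower estimate, so it errs pessimistic compared to averaging the rates themselves. The recent_weight knob is an assumption, not something count currently has.

```c
#include <stdint.h>

// Hypothetical ETA for a "count -s SIZE"-style progress display.
// recent_bps: trailing-window rate (the ~16 second average).
// overall_bps: total transferred / time since start.
// recent_weight: 0..1, how much to trust the recent rate.
// Returns seconds remaining, 0 if done, or -1 if there's no estimate yet.
double eta_seconds(uint64_t done, uint64_t total, double recent_bps,
                   double overall_bps, double recent_weight)
{
  double left, t_recent, t_overall;

  if (done >= total) return 0;
  if (recent_bps <= 0 || overall_bps <= 0) return -1;
  left = total - done;
  t_recent = left / recent_bps;
  t_overall = left / overall_bps;
  return recent_weight * t_recent + (1 - recent_weight) * t_overall;
}
```

For example, with 900 bytes left, a recent rate of 100 bytes/second (9 seconds) and an overall rate of 300 bytes/second (3 seconds), a 50/50 weighting gives 6 seconds, where averaging the rates would have said 4.5.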

There's two ways to do ratelimiting: smoothing and capping. Do you never go over in a given second, or do you let it "catch up" a bit? The second is easy for me to do with the trailing average I'm already calculating. The first means if there are regular-ish dropouts, say wifi packet loss, your average is going to be low. (Yes, this is conceptually adjacent to bufferbloat.) Sigh, maybe I need -r and -R to offer both but the hard part would be tersely yet intelligibly describing the difference in the help text.
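The standard technique that covers both behaviors with one knob is a token bucket: tokens accrue at the rate limit, sending spends them, and the bucket size decides how much "catch up" is allowed after a stall. A bucket holding one second's worth approximates capping; a deeper bucket approximates smoothing. A minimal sketch; struct bucket and bucket_take() are hypothetical names, and this is not what count currently does:

```c
#include <stdint.h>

// Hypothetical token bucket: rate in bytes/second, burst = bucket size in
// bytes, tokens = bytes currently allowed, last = time of last refill (s).
struct bucket {
  double rate, burst, tokens, last;
};

// Refill for the time elapsed since the last call, clamp to the bucket size,
// then return how many of "want" bytes may be sent now (0 means wait).
uint64_t bucket_take(struct bucket *b, double now, uint64_t want)
{
  uint64_t n;

  b->tokens += (now - b->last) * b->rate;
  if (b->tokens > b->burst) b->tokens = b->burst;
  b->last = now;
  n = b->tokens < want ? (uint64_t)b->tokens : want;
  b->tokens -= n;
  return n;
}
```

After a 5 second stall at 100 bytes/second, a 100 byte bucket releases 100 bytes (capping) while a 1000 byte bucket releases 500 (catching up). If that holds up, a burst size might be easier to explain in help text than separate -r and -R modes.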

June 1, 2024

The turtle board init "no output" problem had multiple layers: The test code I added to PID 1 didn't work because the mknod("/blah", 0777, dev_makedev(5, 1)) I did to open a filehandle for the test dprintf()s needs S_IFCHR|0777 instead if I want a char device. (Oops, I knew that.) That confirms the kernel is launching init and the toybox shell is running and parsing it.
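For reference, the corrected call looks roughly like this, using libc's makedev() from sys/sysmacros.h where the original used toybox's dev_makedev(). Per the Linux mknod(2) man page, a zero file type is treated as S_IFREG and the device number is ignored, which is why the plain 0777 version quietly made an empty regular file instead of a console. console_dev() and make_console_node() are hypothetical helpers:

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

// /dev/console is character device major 5, minor 1.
dev_t console_dev(void)
{
  return makedev(5, 1);
}

// Create a console node at path (needs CAP_MKNOD, fine for PID 1).
// S_IFCHR must be in the mode; without a file type, mknod() makes a
// regular file and ignores the device number.
int make_console_node(const char *path)
{
  return mknod(path, S_IFCHR | 0777, console_dev());
}
```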

The reason my "mount -t devtmpfs /mnt; echo hello > /mnt/console" test added to the shell script didn't work is the first couple filehandles the shell is opening are inappropriately becoming stdin and stdout (using the first available fd numbers), and my code attempting to deal with those collisions isn't doing it right. (Sigh. I thought I had all this working at one point, but I stopped regularly regression testing the shell with stdin/stdout/stderr starting uninitialized when I added my devtmpfs automount patch, and have since switched away from oneit, which was redoing it in C to switch consoles...)

The first problem is the shell script _itself_ is opened as the first available file descriptor (fd 0, I.E. stdin, points to the file "/init" opened for reading), and if a redirect closes that out from under it the script ends immediately. (Or hangs trying to read commands from the new fd0 with no prompt.) So the script wasn't running to the end when it tried to redirect stdin. I need to bump the script to a "high" filehandle and make it CLOEXEC. (This also fixes a "child processes are inheriting an inappropriate fd #3" problem I was seeing, but NOT when run interactively. Because of course.)

The second problem was /mnt/console was being opened as fd #1, and the "redirect fd1 to fd1" code is returning immediately as a NOP (I had a test for that), and then the caller is closing the file descriptor it opened (fd1) because it expects the redirect to have already dup'd it. More to the point, it expects the redirect plumbing to have dup()'d the OLD filehandle at that location (if any) to a high filehandle and made a note to move it back down when untangling the redirects later. (All this happens in the parent process to be nommu friendly, a vfork()ed child needs to exec() or _exit() pronto without doing elaborate operations that might fail and not be able to REPORT their failure anywhere sanely, or at least without a really awkward pipe. So redirects move filehandles out of the way and then move them BACK a lot, the special "exec" redirect behavior just discards the undo list. :)

So redirecting something to itself should NOT be a NOP, but the lifetime rules are tricksy here: "echo hello > /mnt/console" needs the deferred close (it shouldn't permanently make /mnt/console stdout for all future commands just because we didn't previously have one) but "echo hello 1>&1" still does NOT close because otherwise there's no stdout afterwards when there was one before. I'm not sure if the lifetime rules for save_redirect() were wrong or if I have to detect this in the CALLER...
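One way to encode those lifetime rules is to have the caller tell the redirect code whether the source fd was freshly opened for this redirect. An open() can only land on the target number if the target was closed beforehand, so "opened and src == target" means "undo by closing", while "1>&1" means "nothing to undo, and don't close it". A minimal sketch; redirect(), undo_redirect(), and struct undo are hypothetical names, not toysh's save_redirect():

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// What to do when the command finishes.
enum { UNDO_NOTHING, UNDO_CLOSE, UNDO_RESTORE };
struct undo { int action, target, saved; };

// Point fd "target" at what "src" refers to. "opened" means src came from an
// open() done for this redirect (like "> /mnt/console"), not an existing fd
// (like "1>&1"). Returns 0, or -1 with errno set.
int redirect(int src, int target, int opened, struct undo *u)
{
  u->target = target;
  u->saved = -1;
  if (src == target) {
    // A fresh open() only gets target's number if target was closed, so the
    // undo is to close it again. "1>&1" changes nothing.
    u->action = opened ? UNDO_CLOSE : UNDO_NOTHING;
    return 0;
  }
  // Park target's current occupant (if any) at a high close-on-exec fd.
  u->saved = fcntl(target, F_DUPFD_CLOEXEC, 10);
  if (u->saved == -1 && errno != EBADF) return -1;
  u->action = (u->saved == -1) ? UNDO_CLOSE : UNDO_RESTORE;
  if (dup2(src, target) == -1) {
    int err = errno;

    if (u->saved != -1) close(u->saved);
    errno = err;
    return -1;
  }
  if (opened) close(src);  // target now holds its own reference
  return 0;
}

// Run in the parent after the (v)forked child has been launched.
void undo_redirect(struct undo *u)
{
  if (u->action == UNDO_CLOSE) close(u->target);
  else if (u->action == UNDO_RESTORE) {
    dup2(u->saved, u->target);
    close(u->saved);
  }
}
```

The design choice is to push the "was this opened for me?" knowledge to the caller, since by the time the redirect code sees two equal fd numbers that information is gone.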

This probably worked without the devtmpfs_mount kernel patch because it used to run a C program at the end of the script which redid the console redirects, so "no output" wasn't a problem as long as script ran to the end. This commit replaced the C program with shell redirects, and I need to fix up my shell to make this work when I'm having filehandle collisions all over the place. The easy thing for Glaubitz to do in the short term is use my CONFIG_DEVTMPFS_MOUNT kernel patch (which he doesn't want to). The slightly less easy thing to do is use an older toybox/mkroot userspace version until I can fix up the shell design to juggle the collisions properly...

May 30, 2024

End of the month, zillion todo item deadlines. I STILL haven't gotten a haircut, to give a hint about the level of "huddled into a ball" I'm doing.

Still need to move my email from gmail to dreamhost before they drop POP3 support. Still need to close the rest of the tabs so I can update my laptop and reinstall the 16 gigs of ram. Still need to buy a new hard drive to install on to. Still need to get the orange pi 3b boards set up with a vanilla kernel. Still need to add the RiscV kernel config to mkroot. Still need to track down the 401k that Vanguard handed off to some random other company.

I had not yet contacted Wells Fargo about the bank account type changing, but Fade did and it turns out to be benign. (The new account type does NOT charge us a monthly fee, they retired the old account type and we meet the minimums for not getting nickel and dimed for existing in the new type.)

The house sale closes next week. A Notary Public is coming to meet us on sunday and watch us sign paperwork.

May 29, 2024

Sigh, that chmod invocation doesn't work because it wasn't _tested_. There's a "a=r+w+x" test, and an "a-w,a+x" test, but not "u+x-w".

In THEORY it should work. The code in string_to_mode() in lib/lib.c does:

    char *whos = "ogua", *hows = "=+-", *whats = "xwrstX", *whys = "ogu",
    // Repeated "hows" are allowed; something like "a=r+w+s" is valid.
    for (;;) {
      if (-1 == stridx(hows, dohow = *str)) goto barf;
      while (*++str && (s = strchr(whats, *str))) dowhat |= 1<<(s-whats);

And the loop after that comment is so you can circle back around to multiple r+w+s, above. (The outer loop after "gaze into the bin of permission" is returned to after a comma. That comment is from 19 minutes into this (and again at 23:20) which is probably a deeply obscure reference these days...) Sticking a call in hello.c, the first example returns 0, the second returns 644. Smells like a missing flush? Ah, yes: needs a dowhat = 0; at the start of that loop, seems easy enough.
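To make the bug concrete, here's a deliberately simplified symbolic-mode parser (hypothetical apply_symbolic(), not toybox's string_to_mode(): it only handles [ugoa], [=+-], and [rwx], and ignores umask and s/t/X). The key line is the per-group reset of "what": without it, "u+x-w" on 0644 would also strip the x it just added, giving 0444 instead of 0544.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>

// Apply comma-separated [ugoa]*([=+-][rwx]*)+ clauses to mode.
// Returns the new mode, or -1 on a syntax error.
long apply_symbolic(const char *str, mode_t mode)
{
  for (;;) {
    int who = 0;

    // Who bits: u=0700 g=0070 o=0007 a=all. None given means all (real
    // chmod would apply the umask here; this sketch doesn't).
    for (; *str && strchr("ugoa", *str); str++) {
      if (*str == 'u') who |= 0700;
      else if (*str == 'g') who |= 0070;
      else if (*str == 'o') who |= 0007;
      else who |= 0777;
    }
    if (!who) who = 0777;

    // One or more op+permission groups, like "+x-w" or "=r+w+x".
    do {
      char how = *str;
      int what = 0;  // reset for EACH group: this is the missing flush

      if (!how || !strchr("=+-", how)) return -1;
      while (*++str && strchr("rwx", *str)) {
        if (*str == 'r') what |= 0444;
        else if (*str == 'w') what |= 0222;
        else what |= 0111;
      }
      what &= who;
      if (how == '=') mode = (mode & ~who) | what;
      else if (how == '+') mode |= what;
      else mode &= ~what;
    } while (*str && *str != ',');

    if (!*str) return mode;
    str++;  // skip the comma, parse the next clause
  }
}
```

That also suggests the missing test: apply "u+x-w" to 0644 and expect 0544.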

My recent attempt at adding the "trap" builtin to toysh was biting off too much in one go. There's actually two passes I should separate: adding the trap logic and making do_source() non-recursive.

Instead of do_source() recursively calling run_lines(), it should add a transparent function call context to TT.ff and return, letting the caller fall back to run_lines(). Either run_lines() or the read loop in sh_main() (calling get_next_line(), parse_line(), and run_lines()) should pop function contexts as appropriate. This means the FILE * needs to be stored in the sh_func struct so end_fcall() can close it. It ALSO means that run_lines() should pop any function context with a null FILE *, and return at the end of any function context with a set FILE *.
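A minimal sketch of that shape, with hypothetical names standing in for TT.ff, sh_fcall, and end_fcall(): "sourcing" a string pushes a frame that owns both a strdup()ed copy of the text and the fmemopen() FILE * reading it, and a single flat loop pulls lines from the innermost frame, popping (fclose() then free()) as each one runs dry. Nothing recurses, and resetting a trap mid-run can't free the text out from under the reader.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical source frame: owns its text and the stream reading it.
struct src_frame {
  struct src_frame *next;
  char *text;
  FILE *fp;
};

static struct src_frame *src_top;

// Push a frame and return instead of recursing. Empty text pushes nothing
// (fmemopen() of a zero-length buffer isn't portable).
int push_source(const char *str)
{
  struct src_frame *f;

  if (!*str) return 0;
  if (!(f = malloc(sizeof(*f)))) return -1;
  if (!(f->text = strdup(str))) {
    free(f);
    return -1;
  }
  if (!(f->fp = fmemopen(f->text, strlen(f->text), "r"))) {
    free(f->text);
    free(f);
    return -1;
  }
  f->next = src_top;
  src_top = f;
  return 0;
}

// Pop the innermost frame: close the stream before freeing its buffer.
static void pop_source(void)
{
  struct src_frame *f = src_top;

  src_top = f->next;
  fclose(f->fp);
  free(f->text);
  free(f);
}

// Input for the one flat run loop: next line from the innermost frame,
// popping exhausted frames. Caller frees the line; NULL means all done.
char *next_line(void)
{
  while (src_top) {
    char *line = 0;
    size_t len = 0;

    if (getline(&line, &len, src_top->fp) != -1) return line;
    free(line);
    pop_source();
  }
  return 0;
}
```

In this model a trap firing mid-script is just push_source(trap_text) followed by continuing the same loop: the trap's lines come out first, then the script resumes where it left off.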

The trap_main() logic sets generic_signal() for the intercepted signals, which sets toys.signal and returns so run_lines() can check at the start of the loop and do_source(0, fmemopen(traps[trapno])) thus avoiding asynchronous locking issues. The question is whether it should be toys.signal or toys.signalfd: the first can drop signals if multiple ones come in quickly, the second adds a spurious syscall each time through the loop (whether select() or a nonblocking read(), either makes strace noisy whether or not there's a significant performance impact). Yeah I can split the difference and whip up my own signal handler that writes to a small array (ring buffer?) but that just gives us a collision BUDGET before we drop more unprocessed signals. Is that a good fix?
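The "collision budget" version is small enough to sketch, with hypothetical names (this is not lib/lib.c's generic_signal()). The handler only appends to a ring of volatile sig_atomic_t slots and the main loop only removes, so there's one producer and one consumer on the same thread; installing the handler with every signal masked keeps two handler invocations from interleaving. A ring of PENDING_MAX slots holds PENDING_MAX-1 pending signals; past that, new ones are dropped but counted, so at least the loss is visible.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

#define PENDING_MAX 8

static volatile sig_atomic_t pending[PENDING_MAX];
static volatile sig_atomic_t head, tail, dropped;

// Signal handler: queue the signal number, or count it as dropped if full.
static void queue_signal(int sig)
{
  int next = (head + 1) % PENDING_MAX;

  if (next == tail) dropped++;
  else {
    pending[head] = sig;
    head = next;
  }
}

// Main loop side: next queued signal, or 0 if none are pending.
int next_signal(void)
{
  int sig;

  if (tail == head) return 0;
  sig = pending[tail];
  tail = (tail + 1) % PENDING_MAX;
  return sig;
}

// Install the handler with all signals blocked while it runs.
int trap_signal(int sig)
{
  struct sigaction sa;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = queue_signal;
  sigfillset(&sa.sa_mask);
  return sigaction(sig, &sa, 0);
}
```

The run loop would then call next_signal() at the top of each iteration and push the matching trap string for each one it gets. Whether a budget of 7 is better than 1 is the open question, but it's a fixed-size, lock-free change from a single flag.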

Also, how SHOULD you handle a screaming interrupt? Naturally dropping overlapping signals is poor man's rate limiting, and the existing recursion counter probably still applies (just now to the TT.ff stack length). Or I could have the function return re-enable the trap, but there's no obvious place to record that? (Um, check top of stack to see if the pointer to the string it's running is the one you'd add? No, because we have to strdup() it because lifetime, not that resetting trap handlers from a trap handler is a sane thing to do but they COULD do it. Now I want reference counted strings, except I don't want to go there.)

Changing the do_source() plumbing to be non-recursive should probably happen first, because that _should_ be a NOP to the outside world, resulting in the same externally observable behavior. If we passed the existing test suite, that would be easier to confirm.

May 28, 2024

So shell traps basically insert a function call into the control flow. We can't execute arbitrary code from signal handler context, but we've got a function call stack and the main execution loop in run_lines() starts by looking at TT.ff->pl meaning if we insert a new function call it should just run it. Of course the signal handler itself shouldn't mess with that because all sorts of other code is reaching out and touching that and a mismatch between TT.ff->pl and TT.ff->blk or similar would be bad. There's no locking or attempts to order/localize the data access, so don't futz with it asynchronously.

But in lib/lib.c we've got generic_signal() which sets toys.signal (or writes to toys.signalfd), so in theory what I need to do is interrupt the current command that's running (easy if it's a child process, trickier if it's a shell builtin but I do have siglongjmp there) and then have the command loop notice there was a signal.

When I say "basically" a function call, it's not a parsed list of pipeline structures, it's actually a source string snippet, basically do_source(0, fmemopen(trapstring, strlen(trapstring), "r")). Which again, shouldn't _recurse_.

Two things here: 1) I need to strdup() that trapstring because it could otherwise get freed out from under us if somebody resets the trap handler before we return (no I am not reference counting it), which means it needs to get freed. Luckily, sh_fcall already has a "delete" member for when the function call context gets popped. 2) I need to fclose(fp) on that fmemopen(), and THAT I don't currently have plumbing for. (It's handled by the recursive do_source() call that shouldn't recurse.)


May 27, 2024

This is a thing I needed to know today:

$ trap 'echo hello; return 3; echo after' SIGINT
$ ^Chello
bash: return: can only `return' from a function or sourced script
$ echo $?

A trap handler does NOT count as a function context. So what happens if we're already IN a function context?

$ x() { read i; echo then; }; x
$ echo $?

Which is weird because the "then" didn't get printed. No, I am not asking Chet. (Maybe after updating my laptop so the bash version I'm running isn't quite so stale...)

Ah, duh:

$ x() { read i; echo then; }; x; echo potato

The return jumps to the end of the _function_. Of course it does.

May 26, 2024

The job market is being weird.

I don't strictly NEED a new contract right now (selling the house should make over half our monthly expenses go away), and Fade just graduated and SHE is job hunting for something that will use her shiny new doctorate, which in theory could pay the bills without me. (Heck, health insurance through an employer would make rather a lot of the remaining monthly bills go away.) But I should really save for retirement and keep my hand in and so on. There's a big difference between "comfortable for now" and "ready to retire".

Plus an externally imposed schedule is really nice. At the moment I'm kind of being a house husband for Fade, doing the dishes and cooking and such, but until she's got a new job with a commute to an office (she LIKES working in an office, although it's generally been teaching in classrooms, faculty office hours, and dissertation writing) that's not really providing ME with an externally imposed schedule. (How do I do avoidance productivity without something to avoid? "All the work while crying" requires a deadline.)

So I've been telling the usual cloud of recruiters that periodically call me "yes I'm interested" (which isn't the same as actively looking), and... it's weird. It's like the dot-com bust or the mortgage crisis out there, which it WASN'T either during or immediately after the pandemic. It set in sometime after Google laid off those 12k people last year, programming's having an employment crash that has to work its way through once enough idiotic managers have gone bankrupt and been buried. And given how long companies with large ablative piles of cash can continue to push Itanium, Windows Vista, MetaVerse... that can take a while.

Capitalism is naturally unstable, oscillating through boom and bust cycles. For 50 years the USA didn't have to deal with this because during the Great Depression FDR implemented a bunch of New Deal regulations that stabilized things for two generations, until Ronald Reagan did the Fish Filter Fallacy and tore it down so he could fire the air traffic controllers, drop the top tax rate from 70% to 28%, run up the national debt, create a new oligarch class, etc. Since then we've had the Savings and Loan Crisis (1991), the dot-com crash (2001) the mortgage crisis (2008), the pandemic (2019), and now wave after wave of 5 figure layoffs from the Tech Titans.

The fish filter fallacy is also known as Chesterton's Fence, although to me that one has a different emphasis. It's not "don't rip stuff out", it's "research how it wound up like that before breaking the seals", which is part of the reason I spend so much time digging into computer history. Of course you can't understand how we got here and still be a libertarian, so there's some "roll to disbelieve" self-selection going on here. You can't win the lottery if you don't play, and you don't play if you know the odds, so lottery winners tend to have at _best_ wilful blinders on. And have daddy buy them lots and lots of tickets with family money. And of course wilful blindness to "externalities", which is economist-speak for smashing a thousand dollar window to steal a hundred dollar stereo.

This time the flavor of the month of blindfolded libertarian smashy-smashy isn't union busting or outsourcing to India or China, it's even dumber than that. The Dunning-Kruger nature of Large Language Models makes the answer to any question you ask the Mansplaining Engine about sound plausible when you don't know anything about the topic, but the answer to any question you DO already know something about is chock full of obvious bullshit. Thus to the managerial class, clearly their OWN skills are unique and unable to be replicated, but they can lay off everyone ELSE and replace them with LLMs because this thing's output sounds just like what their reports tell them. (Clay Shirky once noted he was "getting paid to save management from the distasteful act of listening to their own employees", because to a certain class of parasite skilled people are fungible and you don't want to get fungibility on you by "getting your hands dirty".)

Which means the job market is currently digesting tens of thousands of laid off engineers per month from people emulating Musk's twitter and tesla tantrums (in the mold of Neutron Jack, who trashed GE), plus the NFT grifters who pivoted to LLMs and convinced silicon valley's money people that was the Hot New Thing, plus the usual Carly Fiorina/Mitt Romney pump-and-dump financial shenanigans where your bank account looks GREAT when you haven't paid the rent or electricity bill in three months. It's all combined into yet another mushroom cloud of capitalism that will presumably have a name five years from now the way hurricanes do, unless it's lost in the noise like Enron was.

Some ex-engineers are semi-retiring and/or transitioning to other industries (becoming an apartment manager or barista qualifies as both), and there's the usual chunk going the way of the Minesweeper Certified Solitaire Experts of yore finding other fungible white collar paychecks when the Hot New Thing stops being hot, new, or a thing. I have no idea how all this is impacting what students decide to major in, but "individuals can no longer put an app on the Google Play store under their own power" cuts off new hobbyists at the knees. (In the long run shrinking the labor pool like that probably means anybody wanting to hire a programmer 5-10 years from now won't enjoy it, but nobody in the Fortune 500 looks beyond Q4 anymore.)

I've always done programming because I find it fun, and I admit I haven't really been finding it fun recently. There's a certain amount of Sam Vimes "do the job that's in front of you", but I have trouble navigating through the fog when I've lost faith in other people's visions of the future to use as islands to build upon.

May 24, 2024

I finally figured out how to reproduce the turtle boot failure under qemu, it's just:

mkroot/ BUILTIN=1 CROSS=sh4eb LINUX=~/linux

Point LINUX= at a clean 6.8.0 tree WITHOUT my patches and tell the sh4 target to statically link the root filesystem, and the problem reproduces. You don't even have to force the nommu codepaths (which would be PENDING=TOYBOX_FORCE_NOMMU added to the above command line, by the way).

This launches PID 1 with stdin/stdout/stderr closed, which is what's triggering the issue. (The external initrd=initramfs.cpio.gz provided to the bootloader as a separate file and linked together at runtime instead of compile time does NOT manifest this issue, because the kernel runs this crap before loading the external initramfs and MAGICALLY CHANGES THE BEHAVIOR between static and dynamic link codepaths.)

The above mkroot invocation does a clean rebuild each time (because that build script is simple and dependency tracking is not), which makes the compile/edit/test cycle kinda slow, so you can copy the .config file to re-run the incremental kernel build yourself, ala:

cd ~/linux
cp ~/toybox/root/sh4eb/docs/linux-fullconfig .config
make ARCH=sh CROSS_COMPILE=~/path/to/sh4eb-linux-musl- -j $(nproc)

And then run it with:

qemu-system-sh4eb -M r2d -serial null -serial mon:stdio -m 256 -nographic -no-reboot -kernel arch/sh/boot/zImage -append 'HOST=sh4eb console=ttySC1 noiotrap'

That .config file should have a CONFIG_INITRAMFS_SOURCE entry with the full path to the "fs" directory populated by the mkroot build (before packaging). So if you go back to the ~/toybox directory and run mkroot/ CROSS=sh4eb it should rebuild the fs directory, and then in the ~/linux directory you can re-run the kernel make line above, which should detect that the "fs" directory is newer and repackage it without redoing the rest of the build. (If you don't specify LINUX= mkroot won't rebuild the kernel, which also means we don't need the BUILTIN=1 argument to add the CONFIG_INITRAMFS_SOURCE entry to the kernel .config file. The CONFIG_INITRAMFS_SOURCE entry mkroot generates has an absolute path to the fs directory it used, so it can find "fs" under ~/toybox from ~/linux.)

I have yet to track down _why_ the resulting silent init is unhappy: "no output occurs" is still a bit frustrating to debug. Dunno whether it's a kernel, toybox, or mkroot regression since the last time this worked without my patch to auto-mount devtmpfs in initramfs. It USED to work. (And is almost certainly something small and stupid...)

May 23, 2024

Finally got my turtle board working again: the one and only USB-micro-B cable I currently have (the rest are in storage or abandoned in texas, I've got USB-C cables for days though) is fiddly and needs to be pressed in and jiggled at the small end after I've picked the board up to do things like "swap the sd card". It powers the board either way, but doesn't pass serial data unless it's got a good connection. Which is part of the reason I thought it was bricked for so long. (I had to flash it with a working FPGA image, but when testing whether an image worked or not "no serial output" had multiple causes.)

But Glaubitz (the superh maintainer for both debian and linux-kernel) wants to test current j-core kernels on his board, and he said my mkroot image didn't work for him, except I tested it in February and it worked fine, so something is clearly weird.

First problem was that he was using the old toolchain, and we fixed the musl setjmp register bug months ago. (Which is what I was specifically testing on real hardware back in February, to make sure that fix worked there.)

The SECOND problem is that what I tested boots and runs fine... but there are two failures in codepaths I hadn't tested. Something changed in the kernel config between 6.8.0 and 6.9.0 so the j-core serial console doesn't produce any output. (Should be easy enough to bisect, just tedious to "build kernel, remove the sd card from turtle board, insert it into laptop, copy the kernel over, remove from laptop, insert into turtle board, plug the USB back in, re-run microcom" dozens of times, even WITHOUT having to confirm that the lack of serial output isn't because of a dodgy USB cable.) The sd card is TINY and fiddly, I can only touch the caseless board by its edges, sudo keeps wanting the password typed in again if I took too long since last time... This is why Japan had a proper desk with the green mat and wrist strap.

The other problem is that if you don't apply my kernel patch to make CONFIG_DEVTMPFS_MOUNT work in initramfs, the shell script codepath to set it back up seems to have bit rotted. Which is OBNOXIOUS to test because THERE IS NO OUTPUT when it's not working. Sigh, I should hack the shell to manually { mknod("/potato", S_IFCHR|0600, dev_makedev(5, 1)); int fd = open("/potato", O_RDWR); dup2(fd, 254); close(fd); } and then spray it down with dprintf(254, "debug statement\n"); or similar... Again, debugging this involves applying fingernails to a microsd card dozens of times in succession without killing a piece of bare electronics with static electricity or breaking a fragile connector. (I need to buy another case to put the board in, I broke the previous one trying to transfer it to a different board. 3D-printed plastic cases especially go together ONCE, and then when you try to dismantle them after the plastic's had time to age, bits break off.)
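A minimal sketch of that debug hack as a standalone helper, assuming Linux and using the standard makedev() from sys/sysmacros.h rather than toybox's dev_makedev(). The names debug_fd_open() and DEBUG_FD are made up for illustration. Note the S_IFCHR in the mode: without a file type bit, mknod() creates a regular file instead of a device node.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <unistd.h>

/* Hypothetical: a high fd number the shell is unlikely to use itself. */
#define DEBUG_FD 254

/* Make a character device node for maj:min at path (5:1 is
 * /dev/console), open it, and park it on DEBUG_FD. An existing node at
 * path gets reused. Returns DEBUG_FD on success, -1 on failure. */
int debug_fd_open(const char *path, unsigned maj, unsigned min)
{
  int fd;

  /* S_IFCHR matters: a mode with no file type bits makes a regular file. */
  if (mknod(path, S_IFCHR|0600, makedev(maj, min)) && errno != EEXIST)
    return -1;
  if ((fd = open(path, O_RDWR|O_NOCTTY)) < 0) return -1;
  if (fd != DEBUG_FD) {
    int rc = dup2(fd, DEBUG_FD);

    close(fd);
    if (rc < 0) return -1;
  }

  return DEBUG_FD;
}
```

Then something like if (debug_fd_open("/potato", 5, 1) == DEBUG_FD) dprintf(DEBUG_FD, "got here\n"); gives you output even when stdin/stdout/stderr were never opened.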

This is why I really really really want a qemu environment I can test this sort of thing in. Regressions accumulated in a codepath that is tedious and fiddly to test. The stuff I DID regularly regression test works JUST FINE.

May 21, 2024

I added "netcat -o" hex dump mode, initially thinking it collated the output into the minimum number of lines, since TCP/IP input defaults to using Nagle's algorithm so gaps in data transmission are not reliably preserved unless you're in UDP mode: your data getting broken into ethernet packets which are delivered individually isn't necessarily useful information. But apparently the busybox version breaks lines after each read (commemorating what ethernet packet granularity the OS broke the TCP/IP stream into), so I made -o do that (which I CANNOT RELIABLY TEST) and added capital -O mode to do the testable thing. (I'd look for an "upstream" version to compare behavior with, but Hobbit anonymously released an implementation ~30 years ago and changes since all seem to be forks with no agreement, and -o existing at all is a recent addition. Maybe busybox came up with it?)

As always, the behavior is the easy part. Elliott objects to the help text: to him "collate" means "sort" instead of "group", and he wants to call it "buffer" instead. I never would have guessed this behavior from the word "buffer": if "buffering" changes the data something is wrong (this is about when we should and shouldn't have newlines, that's not a "buffering" question). TCP/IP is a streaming protocol not a "buffered" protocol. UDP copies data into and out of buffers. There is a "packet buffer".

But once you're into the weeds bikeshedding word choice... both vi and microsoft word have "join", which isn't very helpful here because that's undoing breaks in the input data, and this is about ignoring vs annotating input granularity. The toybox command "fold" does the opposite here, I vaguely thought there was a command that would rewrap input lines but apparently not? The debian host path has a "join" which seems like it's trying to do database stuff.

Sigh, naming things is not my area of strength, I just don't want it to be obviously WRONG...

May 20, 2024

Finally got toysh "return" tested and checked in. The problem wasn't "source" failing to clean up the function call stack after itself (it already was cleaning up, it's just not calling the regular function to do it, and probably should be). The problem was I only had one level of setjmp buffer for intercepting xexit() calls from shell builtins, and builtins can nest. Specifically "eval" or "source" can call back into do_source() and loop back around to call another builtin (like "return"), and when that inner builtin returned, the cleanup reset the setjmp pointer to NULL, which xexit() interprets as "actually call _exit() instead of longjmp()" when you hit the end of the "eval".

Which meant "source <(echo return 37); echo $?" wouldn't run the second echo, because when "source" exited the shell exited because return had cleared the "we are in a builtin" handler.

The solution is to save the old value of the pointer to a local variable on the stack and restore it instead of setting NULL at the end. It's still really annoying that this recursion path is consuming unlimited stack (again, nommu systems), but I haven't got an immediate fix.
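The save-and-restore fix, sketched with hypothetical names (my_xexit(), run_builtin(), and a global jmp_buf pointer standing in for toybox's real plumbing). Each builtin invocation remembers the previous handler in a local variable and puts it back afterwards, so a nested builtin can't leave the outer one without a landing pad.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical stand-ins for toybox's globals: where my_xexit() should
 * longjmp() to while a builtin runs, and the pending exit code. */
static jmp_buf *rebound;
static int exitval;

/* Unwind to the innermost running builtin if any, else really exit. */
void my_xexit(int code)
{
  exitval = code;
  if (rebound) longjmp(*rebound, 1);
  exit(code);
}

/* Run a builtin, catching my_xexit(). The old handler is saved in a
 * local and restored afterwards (instead of resetting to NULL), so when
 * a nested builtin finishes, the outer one still has a landing pad. */
int run_builtin(void (*fn)(void *), void *arg)
{
  jmp_buf here, *old = rebound;

  if (!setjmp(here)) {
    rebound = &here;
    exitval = 0;
    fn(arg);
  }
  rebound = old;

  return exitval;
}

/* Toy "return N". */
static void fake_return(void *arg)
{
  my_xexit(*(int *)arg);
}

/* Toy "source": runs a nested builtin, then ends via my_xexit(), which
 * must land in OUR caller rather than killing the process. */
static void fake_source(void *arg)
{
  int rc = run_builtin(fake_return, arg);

  my_xexit(rc);
}
```

With the old "set it to NULL" cleanup, the my_xexit() at the end of fake_source() would have called exit() instead, which is the "source <(echo return 37); echo $?" bug in miniature.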

That said, I _could_ use something like this to fix xexec() recursion. Right now we measure stack usage (by subtracting pointers, which we typecast to long to make the compiler STOP "HELPING") and if it's past an arbitrary limit call a command out of the $PATH instead of an internal recursive function call. But what xexec() COULD do instead is longjmp() back to main() and then re-dispatch through toybox_main() based on the new command line. This frees the stack space but doesn't free anything ELSE we've allocated (malloc, mmap, file descriptors)... but that's already the case, and even execve() won't close filehandles we didn't annotate with CLOEXEC.
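A sketch of that stack-depth check, assuming stack_base gets set from a local in main() and using a made-up 64k budget (both hypothetical). As in the post, the pointers get cast to long before subtracting, which assumes long can hold a pointer: true on Linux, not on 64-bit Windows.

```c
/* Hypothetical: set from the address of a local in main() at startup. */
char *stack_base;

/* Made-up budget before falling back to exec()ing out of $PATH. */
#define STACK_LIMIT (64*1024)

/* Bytes of stack used since main(). Casting to long first avoids
 * subtracting pointers into different objects (undefined behavior),
 * and taking the absolute value copes with either growth direction. */
long stack_used(void)
{
  char here;
  long diff = (long)stack_base - (long)&here;

  return diff < 0 ? -diff : diff;
}

/* Recurse like nested xexec() calls until the budget runs out, returning
 * the depth at which we'd stop recursing and exec() a real binary. */
int nest_until_limit(int depth)
{
  volatile char frame[256];   /* pretend each level uses 256 bytes */
  int result;

  frame[0] = 0;
  if (stack_used() > STACK_LIMIT) return depth;
  result = nest_until_limit(depth + 1);

  return result + frame[0];   /* touching frame afterwards prevents a tail call */
}
```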

May 19, 2024

On the list we're arguing about FILE * buffering again. Elliott said he'd handle the endless whack-a-mole cleanup but an instance of it wound up back in my lap, and I added TOYFLAG_NOBUF to MAKE IT STOP.

Really, that should be the default, and commands should say TOYFLAG_BUF or TOYFLAG_LINEBUF if they want stdout to be buffered. But I need to do a review pass on every command to determine the buffering type it needs. Throw it on the todo heap...
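A sketch of what "unbuffered unless the command asks" could look like. FLAG_BUF and FLAG_LINEBUF are hypothetical stand-ins, not toybox's actual TOYFLAG_* bits; setvbuf() is standard C and has to run before the first write to stdout.

```c
#include <stdio.h>

/* Hypothetical flag bits: unbuffered is the default, commands opt in. */
#define FLAG_BUF     (1<<0)
#define FLAG_LINEBUF (1<<1)

/* Map a command's flags to a setvbuf() mode. */
int stdout_mode(unsigned flags)
{
  if (flags & FLAG_BUF) return _IOFBF;
  if (flags & FLAG_LINEBUF) return _IOLBF;

  return _IONBF;
}

/* Must run before anything is written to stdout. */
int setup_stdout(unsigned flags)
{
  return setvbuf(stdout, NULL, stdout_mode(flags), 0);
}
```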

May 18, 2024

I have received a request to add oksh to toybox, because "The default sh of toybox is too limited." Sigh. I mean, they're not wrong...

For once this request is feasible: it does appear to be a public domain shell implementation. On the other hand it's 25k lines. It's easy enough to install NEXT to toybox, but would be a lot of work to integrate.

Maybe I should have a mkroot/packages/oksh?

May 17, 2024

Trying to implement toysh return, which needs to return from shell function calls and from "source" imports, needs to ignore (skip past) "eval" but NOT return from the (various) $(types) <(of) >(subshells), and needs to error out BUT NOT ACTUALLY RETURN when it hasn't got a parent function or source context. Which is more levels of distinction than toysh's data structures currently annotate, and extending them requires some design cleanup.

The first problem is there are _two_ function structures, with some existing name confusion: a function CALL is not a function DEFINITION. struct sh_function holds a callable function, and struct sh_fcall is a call stack entry. They're adjacent, but serve different purposes, and right now the functions that deal with BOTH have "function" in the name (ala end_function), when half of them should be fcall instead. (And to be proper object oriented C they should be fcall_end(struct sh_fcall *fc, ...) with a consistent prefix and the pointer to the structure they operate on consistently as the first argument.)

Alas renaming a lot of stuff is churn, and renaming WHILE modifying the logic is... well there's a reason this has taken so long to do and I keep restarting trying to come up with a clean correct change that isn't outright sprawling.

TT.functions is an array of struct sh_function holding the currently defined callable functions, which is dynamically updated and the entries are reference counted because functions aren't added to that when you PARSE the code, they're added when you EXECUTE it. Meaning you can have a function definition inside an if statement or for loop, or even inside another function. A function can even replace itself while it's running, and the function will keep executing to the end but then get freed when you return from it, I.E:

$ bash -c 'x() { echo one; x() { echo two; }; echo three; }; x; x'

That shouldn't leak memory, hence the reference counting. (Calling a function increments the reference count, returning from it decrements the count and frees if it hits zero.) But it's worse than that: the parsed chunks of shell script that got executed to register that function definition into TT.functions have their own lifetimes. Incoming shell script is read a line at a time until we have a complete thought (the line continuation logic prompting with > for more in interactive mode does basically the same thing behind the scenes when we're NOT in interactive mode, resolving "HERE" documents gets worked in there too, and this can go on arbitrarily long with if/then/else/fi blocks and || gluing the next line to the previous one and so on), then the parsed pipeline list is executed, and when execution finishes (usually by reaching the end) they get freed. And if we free a chunk of pipeline context that an sh_function is still using, that's bad. So we have to TRANSPLANT the function body from the parsed pipeline list into a reference counted sh_function structure, DURING PARSING, and have the function definition statement act as a reference to the structure. (We just create it with reference count 1 while transplanting.) Meaning a function definition needs its own pipeline type (because the body of the statement isn't a pointer to type sh_arg, it's a pointer to type sh_function), and there are actually TWO statement types: while it's being parsed it's type 'f', and then we go back and repot it when it's complete to turn it into type capital 'F', which means if a syntax error happens during parsing and we tear down a half-parsed pipeline list, we free the right stuff.
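Here's the reference counting pattern boiled down, with a hypothetical struct func standing in for struct sh_function and a single table slot standing in for TT.functions. It's a sketch under simplifying assumptions: it skips the parse-time reference held by the definition statement and only tracks the table's reference plus one per active call.

```c
#include <stdlib.h>

/* Much-simplified, hypothetical stand-in for struct sh_function. */
struct func {
  int refcount;
  int id;              /* stands in for the transplanted parsed body */
};

int live_funcs;        /* for demonstration: how many are allocated */

/* Drop one reference, freeing on the last one. */
void func_put(struct func *f)
{
  if (f && !--f->refcount) {
    free(f);
    live_funcs--;
  }
}

/* Executing a definition: the table slot owns one reference, so replacing
 * an entry only drops the table's reference. A caller still running the
 * old body holds its own reference and keeps it alive. */
struct func *func_define(struct func **slot, int id)
{
  struct func *f = malloc(sizeof(*f));

  if (!f) return NULL;
  f->refcount = 1;
  f->id = id;
  live_funcs++;
  func_put(*slot);
  *slot = f;

  return f;
}

/* Calling takes a reference; returning gives it back via func_put(). */
struct func *func_call(struct func *f)
{
  f->refcount++;

  return f;
}
```

Walking through the bash example above: define x, call x, x redefines itself while running (so the table now points at the new body), and the old body only gets freed when that outer call returns.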

I did all that ages ago, but periodically need to read through it again to remind myself how it works and reassure myself that it DOES work. Anyway, I really hope I don't need to fiddle with that right now, although I may have some outstanding bugs about off by one errors or similar in there (which boil down to "I do not have enough tests in sh.test yet to exercise all the corner cases", and also I need to run this crap under valgrind. Screw ASAN: I need to find LEAKS. I have written my own garbage collection all over the place, I need to go over that with hot irons. Again, using valgrind has the problem that toysh is intentionally NOT freeing things that don't need to be freed on exit, because iterating through the environment variable list to free() all the ones I redefined (it's a malloc!) is silly in normal use, but without doing that valgrind is too noisy to mean anything... I'm not looking forward to tackling that.)

People offering to help with the shell who do not have this context... I need to sit them down in front of a whiteboard for MULTIPLE DAYS just to explain the issues the current design is incompletely trying to address. I would LOVE help, but drive-by ain't gonna cut it, this is "pair programming" territory...

Meanwhile sh_fcall is an entry in the call stack, which isn't JUST functions. Any time you need a new variable context, or to reset $LINENO, or set different shell command line options for "$@" and "shift" to operate on, or need to jump somewhere else and return to where you came in a way you can't work out from the structure (ala "break jumps to the end of the current do/done block", but "eval $POTATO could have _anything_ in $POTATO, something different each time")... Using prefix assignments ala ABC=def GHI=jkl env creates a new function context. When you run "eval" it creates a new function context. It's a fairly general purpose container structure providing execution context.

The fcall stack is a circular doubly linked list pointed to by TT.ff, where the current entry is the current function context (so TT.ff->pl is the statement being executed now, which I sometimes call the "pipeline cursor"). Traversing the list forward gives you _previous_ contexts, and you traverse it forward when looking for global variables occluded by local variables (or whiteouts, ala $ bash -c 'abc=def; x() { local abc; unset abc; echo abc=$abc;}; x; echo abc2=$abc' produces abc= abc2=def). But because it's circular and doubly linked, the root/global function context (the one you start with at the first shell prompt, containing the global variables) is TT.ff->prev, the last entry in the list. (Which comes up: an assignment that ISN'T declared local goes in the global context by default, so if you search for a variable and don't find it to update it, you add it to TT.ff->prev->vars which is the same as TT.ff->vars when there are no other function contexts.)
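A simplified model of that circular list, assuming a hypothetical struct fcall that only holds a variable array (with a NULL value meaning whiteout). Lookup walks ->next from the current context until it wraps around, so the global context is both ff->prev and the last thing searched.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variable entry: a NULL value is a whiteout, which is what
 * "local abc; unset abc" leaves behind to hide outer definitions. */
struct var {
  const char *name, *value;
};

/* Hypothetical, stripped-down call stack entry. */
struct fcall {
  struct fcall *next, *prev;   /* circular: the global context is ff->prev */
  struct var *vars;
  int varslen;
};

/* Search from the current context outward until we wrap around. Returns
 * the value, or NULL if unset or whited out. */
const char *var_lookup(struct fcall *ff, const char *name)
{
  struct fcall *f = ff;
  int i;

  do {
    for (i = 0; i < f->varslen; i++)
      if (!strcmp(f->vars[i].name, name)) return f->vars[i].value;
    f = f->next;
  } while (f != ff);

  return NULL;
}

/* Push a new current context in front of ff, keeping the ring intact. */
struct fcall *fcall_push(struct fcall *ff, struct fcall *new)
{
  new->next = ff;
  new->prev = ff->prev;
  ff->prev->next = new;
  ff->prev = new;

  return new;
}

/* Unlink the current context; returns the one it was hiding. */
struct fcall *fcall_pop(struct fcall *ff)
{
  ff->prev->next = ff->next;
  ff->next->prev = ff->prev;

  return ff->next;
}
```

With only the global context left, it points at itself in both directions, so TT.ff->prev->vars and TT.ff->vars really are the same thing.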

(Again, "export" is not the same as "global": you can export a local variable. And "unset" of a local causes whiteouts; returning undoes the "local", so any previous variable by that name becomes visible again when you pop the function context even though you _locally_ unset it. There does not appear to be an "unlocal" other than returning. And yes, technically "echo | cat" is a "pipeline" which is a single command in this context, EXCEPT "while true; do echo hello; done | cat" is also a pipeline containing complex statements, so TT.ff->pl is an instance of struct sh_pipeline meaning "single command with before/after glue and associated type, so it might actually be a flow control statement rather than a command"... the vocabulary is somewhat ad-hoc here. I parse incoming lines into a list of struct sh_pipeline and then do stuff to that. This is likely to cause problems with job control the same way failing to distinguish function definitions from the function call stack did, but that's not TODAY'S problem.)

I've been referring to function call contexts that DON'T affect return as "transparent" contexts, you return right through them. So return doesn't affect "the" pipeline cursor, it may in fact iterate through multiple sh_fcall instances to find the one it needs to return to, and move the cursor of more than one of them to the end.

But bash's "return" will also error out immediately (without moving any pipeline cursors) when you don't have a function or source call to return from, ala $ bash -c 'echo one; return 2>/dev/null || echo two' printing both one and two. When return errors out it acts like a normal command with return code 1 and you continue executing the current "command || command ; command" pipeline. If your stack doesn't have any non-transparent contexts, return needs to error out IMMEDIATELY, and leave the stack's chain of return pointers (pipeline cursors) alone so execution can continue normally.

The way to get "return" to error out immediately without mucking about with the pipeline cursor in the error case is to traverse the call stack twice: once to detect "nowhere to return to" errors, then a second time to update all the pipeline pointers to the end of each block once it's determined it's safe to do so.
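The two-pass approach, sketched over a hypothetical NULL-terminated context list (the real one is circular) with explicit TRANSPARENT/FUNCTION/STOP kinds, which are exactly the annotations sh_fcall doesn't reliably have yet. Integer cursor/end fields stand in for the pipeline pointers.

```c
#include <stddef.h>

/* Hypothetical context kinds: TRANSPARENT ones (eval, prefix
 * assignments) get returned through, FUNCTION covers functions and
 * "source", STOP marks a subshell boundary or the global context. */
enum ctx_kind { CTX_TRANSPARENT, CTX_FUNCTION, CTX_STOP };

struct ctx {
  struct ctx *next;          /* toward older contexts, NULL at the end */
  enum ctx_kind kind;
  int cursor, end;           /* stand-ins for pipeline cursor and block end */
};

/* Returns 0 on success, or 1 (the builtin's exit code) when there's
 * nothing to return from, in which case no cursor has been touched. */
int do_return(struct ctx *ff)
{
  struct ctx *c;

  /* Pass 1: look through transparent contexts for a function/source. */
  for (c = ff; c && c->kind == CTX_TRANSPARENT; c = c->next);
  if (!c || c->kind != CTX_FUNCTION) return 1;

  /* Pass 2: now that it's safe, move each cursor up to and including the
   * target's to the end, so every level unwinds when control gets back. */
  for (c = ff; ; c = c->next) {
    c->cursor = c->end;
    if (c->kind == CTX_FUNCTION) break;
  }

  return 0;
}
```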

Note that return can't pop sh_fcall entries from the stack, the caller does that. The return logic is a bit like "break", it can pop all but the last block in the block stack (for if/else entries and similar: you start with one that's needed for things like the "run" variable that lets you know whether you're skipping the current statement due to previous if/else or && or similar) and then it needs to set the pipeline pointer to that block's "end" value, so the execution loop we return to goes "nothing else to run in this pipeline list" and pops the fcall and returns to its caller. The shell logic returning and the C functions returning interleave a bit here: I can avoid consuming C stack when you merely call shell functions, but some kinds of <(subshells) fire off child processes within argument parsing, and nested calls to eval and source are implemented as a sort of unrolled xexec() into toybox builtin commands, which then bends back around to call do_source() and run_lines() again, which is GOING to eat C stack and then have to return whence it came at some point. (I might be able to longjmp() my way around the xexec() ones, albeit with GREAT EFFORT, but the command line >(argument) parsing has to fire off a child process and continue on from there, and that's hard to do without recursion.)

Meanwhile, blockstacks are generally resolved locally within a single call to run_lines(), and that part's designed to handle arbitrary pop_block() gracefully when you get back to the start of the loop. That's why return can pop the block stack but not the fcall stack.

So "return" checks its command line argument (if any) to set toys.exitval (and otherwise leaves it alone because "false; return" leaves the return code at 1), checks for "can't return" error (or the argument is not a number error) and xexit()s with toys.exitval=1 if so, else it adjusts pipeline cursors at possibly multiple levels of the fcall stack (reaching through transparent contexts if any, and popping blocks within each adjusted fcall as appropriate), and then exits the builtin function and lets the calling context resume to free the fcalls and unwind the C stack through any do_source() calls.
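The argument handling from that summary as a standalone hypothetical helper, return_code(). The error case returning 1 follows the description above; masking to 8 bits is an assumption based on how bash handles return 300 (leaving 44 in $?).

```c
#include <stdlib.h>

/* Work out the exit code "return [n]" should leave behind. No argument
 * keeps the current code ("false; return" returns 1). A non-numeric
 * argument sets *err and yields 1. Numbers get masked to 8 bits. */
int return_code(const char *arg, int current, int *err)
{
  char *end;
  long n;

  *err = 0;
  if (!arg) return current;
  n = strtol(arg, &end, 10);
  if (!*arg || *end) {
    *err = 1;
    return 1;
  }

  return n & 255;
}
```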

Which SEEMS simple enough, the problem is the existing sh_fcall annotations don't reliably distinguish transparent, function/source, and stop contexts. There's an unused "fcall" field that should point to the sh_function we're running (so trace and syntax errors and similar can say where we were when stuff happened) but isn't currently initialized, and there's an old rule that fcalls with variables cause run_lines() to break out of the loop when we exit them, and when TT.ff->vars is NULL we just pop it and continue on... which is NOT the same as a transparent context because "abc=def eval 'echo $abc'" is a transparent context despite having variables, return does not stop there.

Alas, redoing these rules is another big intrusive change of the "hard to finish, test, and check in" variety. And combining that with renaming functions... But if I try to do this WITHOUT the cleanup, it's a mess. But trying to do the cleanup FIRST without the code WORKING means I'm creating infrastructure in search of a user that may not actually work when I try to implement it, so it's hard to do incrementally except "come up with the one big one, then break it into small chunks for checkin after I've tested it", which is even more work than doing it in one gulp. (Re-testing at each step for regressions!)

And then I bump into stuff like "nothing is popping the function context the 'source' builtin adds?" which is fine at the moment because while it's an empty (no variables) context run_lines() will eat it and continue on, but when I change the annotations around it to become a stop context and execution doesn't continue after "source" because it's acting like a $(subshell)... So I have to FIX REGRESSIONS and then... reintroduce them for checking? Or something?

Meanwhile, Bash has some corners I'm NOT likely to replicate:

$ bash -c 'return potato; echo $?'
bash: line 0: return: potato: numeric argument required
bash: line 0: return: can only `return' from a function or sourced script
1
$ bash -c 'echo $(one; return; echo two); x() { echo $(echo three; return; echo four); echo five;}; x'
bash: one: command not found
bash: line 0: return: can only `return' from a function or sourced script

Why two error messages at the start? No idea. And why does $(return) produce an error message outside a function but not inside one? In either case it should NOT continue execution within the enclosing context, for the same reason a (subshell) needs to STOP AT THE END and not continue into the inherited parent context: the child can see those statements just fine because fork() duplicates the entire process, but they're the parent PID's turf, not the child's, and if the child ran them they'd get run twice. But the state that $(subshell) inherits KNOWS whether or not the parent context is within a function, and thus produces different error messages. For no obvious reason.

For me to replicate that in nommu mode, I'd have to marshall unnecessary data across the pipe to the exec'd child process, because vfork() has to exec the child process (creating a brand new execution state and yielding back the parent's memory mappings) before the parent can unblock, so no the child does NOT just "naturally know" whether the parent is in a function or not, the child doesn't inherit the parent's function call stack. (That said, it may need to inherit the parent's function call DEFINITIONS in order for the child to be able to call those functions. Which means I need to marshall that data across the pipe to the child.)

So child processes that DO inherit the parent's full call stack from fork() need to either discard most of it (who cares if it leaks, dirtying the pages to free them would break copy-on-write and wind up allocating more memory anyway, the child is usually short lived and exit() covers a multitude of cleanup... yeah, see valgrind above, WHAT QUALIFIES AS A LEAK EXACTLY...) or else the child can annotate an sh_fcall entry as a stop point, where return errors out when it hits it.

This is a lot of work for "break with a bigger hammer and a return code". And THEN I have to implement "trap"...

May 16, 2024

Android's phone web browser "updated" itself to remove the blank lines between paragraphs in my blog. They've been punishing conventional HTML for years now. Conventional as in no css, just paragraph tags with the occasional link, bold, horizontal rule, blockquote, and pre block. And I've used the strike tag a few times for sarcasm purposes. But chrome keeps managing to make it WORSE. And it keeps prompting me for "simplified view" when IT HAS THE MOST BASIC POSSIBLE FORMATTING ALREADY. How do you SIMPLIFY that? What does "simplify" mean in this context? I'm already doing break tags in the block quotes by hand half the time, just because chrome is SO brain dead that it shrinks the font in pre blocks. (I miss konqueror.)

Installed vivaldi on my phone. It shows the blank lines between paragraphs again, but when I zoom to 300% it squeezes all the text to the left edge for no obvious reason, just like chrome's been doing. (I'd think maybe it's the pre blocks, but I scrolled down to the first one and it's going off the right edge so I have to scroll the view to see the end of it. So it's failing in BOTH directions at once, bravo. Presumably using Android's shared web rendering infrastructure.)

May 15, 2024

Made it to Target! They do not sell electronics anymore, instead they have a "Tech" sign, with phones, switch games, bluetooth and USB devices under it. (When I sent Fuzzy a picture she replied "I'd like to buy one Tech please.") Specifically, this means I can't buy a replacement hard drive for my laptop from them. (Flash drives wear out and it's been a few years, reinstall seems a good time to swap in a fresh ssd, but 2.5 inch sata disks are a bit long in the tooth these days, and I refuse to buy storage from amazon. Too many horror stories.) The "tech" section had 256 gig microsd cards (to stick in your switch) on sale for $36 though, so I picked up one of those along with various beverages and multivitamins and so on.

My goal for the sd card (as with the previous one) is to turn one of the Orange Pi 3b boards into a home server. Which involves getting a vanilla kernel I built myself to boot on it, which doesn't SEEM hard except "I have a device tree, what kernel config symbols switch on all the drivers this device tree needs" does not seem to be a question ANYBODY HAS EVER ASKED BEFORE. Or at least nobody's written a tool to do it. Which is sad. I keep hoping Debian aarch64 will just grow support for orange pi 3b some release, but so far it's not even explicitly supported in Linus's kernel yet. (Orange Pi 5 is, and orange pi zero is, and there's mention of orange pi 3, but the out-of-tree fork has "rk3566-orangepi-3b.dts" and vanilla does not.) (I _could_ try to do board bringup myself, but I don't trust cheap chinese hardware not to eat itself if the thermal sensors aren't monitored and so on. The first THREE devices listed in that dts file are voltage regulators, which apparently require programming. Nope, I am not comfortable experimenting with that without up-front assurances from an electrical engineer about what is and isn't safe to do.)

On the other hand, the device tree people have screwed themselves over SO PROFOUNDLY BADLY by gpl-ing all the device tree files so BSD and Windows and such will never touch them (as I complained about a decade ago now, although back then the Intel system management mode repository du jour was ACPI) that Intel's EFI is becoming the standard on Android, to the point the only way to boot a vanilla kernel on Raspberry pi (rather than the weird proprietary forked kernel the Pi guys publish) is to use ARM EFI firmware. So congratulations copyleft guys, you have reduced another promising technology to a historical footnote with unpleasant licensing locking it away from potential users. You have become what you fought.

May 14, 2024

Currently toysh is setting SIGPIPE to SIG_IGN early in sh_main(), and now that I'm implementing the "trap" builtin (which also reopens the can of worms that is job control), that's kind of awkward. (I switched mkroot's init script to manually set up the child shell instead of using oneit, and the missing part of the THEORY here is having PID 1 SIG_IGN on SIGCHLD so reparent_to_init() doesn't accumulate zombies.) Which means I need the trap builtin so I can set signal handlers in the shell.

But trap - SIGPIPE needs to set that handler _back_ to "default" and if the local default is SIG_IGN I need a special case for that. (I also need to special case SIGINT because that one also gets intercepted by default: Ctrl-C should not kill your shell. And Ctrl-Z should not freeze your shell either, so SIGTSTP works in here somewhere. Those last two are gated on -i "interactive" mode, though, and might logically be part of the line editing plumbing. Or at least Job Control.)
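
The PID 1 half of that plan, as a sketch (sigchld_ign_autoreaps() is a hypothetical name): with SIGCHLD's disposition set to SIG_IGN, which is what trap '' CHLD asks for in POSIX shell syntax, the kernel reaps exited children itself, so orphans reparented to init don't pile up as zombies. The demo shows it from the other side: once the child is gone, waitpid() has nothing left to collect and fails with ECHILD.

```c
#include <errno.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns ECHILD if the child was auto-reaped, 0 if waitpid() found a
 * zombie after all, or another errno if something else failed. */
static int sigchld_ign_autoreaps(void)
{
  void (*old)(int) = signal(SIGCHLD, SIG_IGN);
  pid_t pid = fork();
  int err = 0;

  if (pid < 0) err = errno;
  else if (!pid) _exit(0);                          // child: exit immediately
  else if (waitpid(pid, NULL, 0) < 0) err = errno;  // waits, then ECHILD
  signal(SIGCHLD, old);                  // restore the caller's disposition
  return err;
}
```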

Way back when toybox's main.c was setting SIGPIPE to SIG_IGN itself, because we check our own output errors rather than just get randomly killed. So every toybox command ignored SIGPIPE and instead had xprintf() and friends check for errors. But these days we don't do that, because a 2015 commit accidentally removed the SIG_IGN setting, and when it got put back in 2017 it was only for android which set it to SIG_DFL (which man 7 signal says has the default behavior of "term", killing the process).
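
A minimal sketch of that original design, assuming a hypothetical write_all() standing in for xprintf() and friends (this is not toybox's code): with SIGPIPE ignored, writing to a pipe whose reader is gone fails with EPIPE instead of killing the process, so the caller can notice and exit on its own terms.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Returns 0 on success, or the errno that stopped the write. */
static int write_all(int fd, const char *buf, size_t len)
{
  while (len) {
    ssize_t n = write(fd, buf, len);

    if (n < 0) {
      if (errno == EINTR) continue;
      return errno;
    }
    buf += n;
    len -= (size_t)n;
  }
  return 0;
}

/* Demo: ignore SIGPIPE, then write into a pipe whose read end is closed. */
static int broken_pipe_errno(void)
{
  int fds[2], err;

  signal(SIGPIPE, SIG_IGN);
  if (pipe(fds)) return errno;
  close(fds[0]);                          // no reader left
  err = write_all(fds[1], "hello\n", 6);  // EPIPE instead of death
  close(fds[1]);
  return err;
}
```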

The stated reason (at least according to a comment added in 2020) was that it did so because bionic was installing its own handler that (noisily!) killed the process, and we wanted it to stop being noisy. But I thought I'd set it back to SIG_IGN, and instead I set it to SIG_DFL, which means all the xprintf() checks for error only kick in when I run commands from toysh? (Or is bash also absorbing this as part of its job control in pipes?)

I'm not quite sure what success looks like here. The original design from way back when was all toybox commands would SIG_IGN and let xprintf() exit. Right now, anything launched from toysh is inheriting SIG_IGN on sigpipe because we don't set that one _back_ (we reset SIGINT to SIG_DFL, but not SIGPIPE), restoring the pre-2015 toybox behavior... but not for Android. So what's happening now is INCONSISTENT.
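
Why the inheritance matters: an ignored signal stays ignored across both fork() and execve() (only caught handlers get reset to SIG_DFL by exec), so every command a SIG_IGN shell launches starts out ignoring SIGPIPE unless the shell resets it in the child, the way toysh already does for SIGINT. The hypothetical child_dies_of_sigpipe() below shows both outcomes from a parent that ignores SIGPIPE.

```c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child was killed by SIGPIPE, 0 if it survived to see
 * the EPIPE error, -1 if setup failed. */
static int child_dies_of_sigpipe(int reset_to_default)
{
  int fds[2], status;
  pid_t pid;

  signal(SIGPIPE, SIG_IGN);            // the parent/shell's own setting
  if (pipe(fds)) return -1;
  close(fds[0]);                       // nobody will ever read

  pid = fork();
  if (pid < 0) return -1;
  if (!pid) {
    if (reset_to_default) signal(SIGPIPE, SIG_DFL);
    // an execve() at this point would keep whichever disposition is set now
    if (write(fds[1], "x", 1) < 0) _exit(1);   // EPIPE: survived
    _exit(0);
  }
  close(fds[1]);
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFSIGNALED(status) && WTERMSIG(status) == SIGPIPE;
}
```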

The comment in main.c says the SIG_DFL hack for android expires (should be removed due to 7 year time horizon) in September of this year anyway. Replacing it with a global SIG_IGN for sigpipe is... what I _thought_ it was currently doing?

Sigh, why is writing to stdout so much of a pain? Honestly. (I mean at least this one isn't more FILE * buffering changes...)

May 13, 2024

Sigh, is anybody (other than me) going to care about this bash corner case?

$ x() { return $1;}; X=9999999999; for i in {1..10}; do Y=${X:10-$i}; x $Y; echo $Y=$?; done

The "return" argument doesn't seem to error on insanely large values, but it does get capped. Except it wraps first (multiple times), THEN gets capped. And it's not related to 32 bit integer values, either: the next to last entry in that list is less than 2 billion, the last is over 4 billion, no change.

I don't really want to emulate that?

May 12, 2024

Didn't make it to Target yesterday, because I didn't finish closing tabs. There's quite a few of them. Plus the usual du jour incoming pokes like OIN wanting to send out a physical copy of the latest system definition and seeing if they have my current address. (They do not, nor do they have my current _project_ either, I signed up Aboriginal Linux back in the day and that got end of lifed in 2017... OIN is "Open Invention Network", basically a patent pool for open source projects. I haven't got any patents, but they were fighting the good fight back when I signed up in... I want to say 2010? And I haven't heard about them going septic since.)

I should really add the todo item at the end of this post (find -maxdepth not being "global" in toybox and thus not needing a warning) to my todo.txt but I just haven't got the heart for it. It's SO deep in the weeds that "not doing the crazy thing" (and being unable to parse at first glance what busybox IS doing, and not wanting to dig further into busybox because license and Bradley being active again)... Maybe I should document this as a "deviation from posix" in find.c? Except I'd want a test case to CONFIRM it's a deviation from posix, and that -maxdepth applies to entries being recursively considered below the threshold rather than just disabling the recursion to prevent their consideration...

I suppose part of the strength drain is realizing how far busybox has fallen down a complexity rathole. Or maybe it always was and I've become sensitized to it by being away for so long? I used to send Denys random ifdefectomies but he could never wrap his head around the concept for some reason. And the INIT_G() thing is another mess toybox doesn't NEED because the this.command union that TT points to is A) an uninitialized global and thus guaranteed zero by the ELF standard, B) rezeroed by toy_init() when you recurse to xexec() a new command WITHOUT needing...

Hang on, I could have xexec() do a longjmp() back to main() and thus free the stack space, and then toybox commands could recurse infinitely without limit because they're not consuming stack space. And then the nommu test would be a null setjmp pointer instead of having to measure stack space. (It's still leaking filehandles and mallocs and so on, but that was already the case. Infinite recursion in non-nofork commands isn't _free_. Of course unless I CLOEXEC the filehandle, those leak already. Doesn't come up much because deep exec stacks aren't really how Unix usually works: you call processes and they _end_, the process tree branches out from a smallish number of long-running processes spawning shorter lived ones.)
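
A rough sketch of that trampoline, with hypothetical names throughout: xexec_sketch(), run_trampolined() and countdown() are illustrations, not toybox functions, and a function pointer plus a long stand in for the toy_list entry and argv. Instead of calling the next command from inside the current one, xexec records what to run and longjmp()s back to the dispatcher, so every command starts at the same stack depth. With no jump buffer armed it falls back to plain recursion, roughly the nommu case.

```c
#include <setjmp.h>

struct next_cmd {
  void (*fn)(long arg);
  long arg;
};

static jmp_buf rerun;            // landing point inside the dispatcher
static int rerun_valid;          // 0 = no trampoline (nommu-style fallback)
static struct next_cmd pending;  // what to run once the stack has unwound
static long commands_run;

/* Record the next command and jump back, discarding the caller's frames. */
static void xexec_sketch(void (*fn)(long), long arg)
{
  pending.fn = fn;
  pending.arg = arg;
  if (rerun_valid) longjmp(rerun, 1);
  fn(arg);                       // no trampoline: plain recursion
}

/* Demo "command" that re-execs itself until arg reaches 0. */
static void countdown(long arg)
{
  volatile char scratch[256];    // some per-command stack usage

  scratch[0] = (char)arg;
  commands_run++;
  if (arg > 0) xexec_sketch(countdown, arg - 1);
}

/* Dispatcher: returns how many commands ran. */
static long run_trampolined(void (*fn)(long), long arg)
{
  commands_run = 0;
  pending.fn = fn;
  pending.arg = arg;
  rerun_valid = 1;
  setjmp(rerun);                 // every xexec_sketch() resumes here
  pending.fn(pending.arg);       // returns normally only when the chain ends
  rerun_valid = 0;
  return commands_run;
}
```

run_trampolined(countdown, 1000000) runs a million chained "execs" without the stack growing, since each longjmp() throws away the previous command's frame. Filehandles and mallocs still leak exactly as described above; only the stack is reclaimed.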

Another todo item I someday need to find the strength for is going back through all my old blog entries and mailing list entries and fishing out todo items that fell through the cracks. That's post-1.0 work though.

May 11, 2024

I miss Google. Searching for " three doctors" should NOT pop up a "see results closer to you, enable geotargeted advertising!" pop-up. The search query told it the website to look within. How do you even CONCEPTUALLY attach a location to that? (Did someone edit the scene I'm looking for from the Doctor Who 10th anniversary from 1973 at a physically closer location to where my laptop is now? It was FILMED in ENGLAND, I am in MINNESOTA...)

I wanted to link to the clip of the second doctor saying "it's my fault, and I'm sorry" for the "should I update to Devuan Daedalus or Excalibur" footnote of a reply, after determining that Excalibur is experimental but pushed early since the big thing there is the /usr merge, which I seem to have convinced people to do. (As in they linked to my busybox post when doing it, thus the "my fault" part. I mean, I wasn't _wrong_. I was, in fact, convincing enough that it became the new norm. Oops.) Anyway, I want stable, so that means the 5.0 release, and I wound up editing out all mention of the issue before hitting send anyway because it was irrelevant to the recipient: brief is better.

The issue the post was about can't be tested without upgrading to a new distro release using a new kernel (well, I could limp along testing it in kvm but then the "never switched back to the 16 gig ram sticks" issue hits me), so I'm closing tabs. So many tabs. Not browser tabs, which chromium remembers for me between reboots (although not across upgrades, so maybe I should... Hmmm, probably installing Vivaldi instead of Chrome next time which means losing all the tabs so I should triage THOSE too, great...) but for now I'm just trying to add stuff to todo.txt as I close command line windows where the backscroll says I was in the middle of doing a thing.

Then I head out to... I guess Target? Fresh install on an old SSD seems dubious, I should replace the component known to wear out as long as I'm reinstalling anyway. But I refuse to order storage hardware blindly from the internet (half of amazon is scam entries now, and their predatory monopolization has taken most non-amazon online ordering down). I want to buy a physical box from someone who is actually present, to whom I can personally hand it back if it demonstrably does not work. And there isn't a circuit city around here, and Target is "wal-mart but not owned by obvious and outspoken right wing loons". I dunno if Target is great (it's capitalism: money corrupts, for-profit corporations corrupt absolutely), but I haven't seen regular headlines about them being terrible to their workers consistently for multiple decades now.

May 10, 2024

Wells Fargo emailed us to say our Complete Advantage Checking is converting to Everyday Checking with a $10 monthly service fee for existing, and a phishing link to click for details. I have not done so, will not do so, and need to find a physical bank location around here to talk to a human, and quite possibly find a Minneapolis credit union.

Yesterday, a Wells Fargo Investment Advisor called us (well, asked for Fade using my number and I handed the phone over) wanting to talk to her about her IRA. (She and I have basically the same IRA accounts, but this was specifically for her. Ok...)

And I mentioned Truthiness trying to upsell me on a new mortgage to replace the old mortgage and home equity loan the sale should pay off. The fact we're selling a house is public, so people speculating that six figures of money might pass through our account soon are crouched waiting to steal it.

Meanwhile, the people who bought the house found a couple hundred dollars worth of exterior damage we hadn't known about, and are trying to knock another 5 figures off the price ON TOP of the $12.5k they already knocked off the price for... I guess no reason, it obviously wasn't because of that sort of thing. (In theory we have homeowner's insurance. Why doesn't that cover this?)

Strongly tempted to go "no, I don't want to sell to these people, let's wait for another offer", but Fade's pretty stressed about the process and the only way I'm taking it back over is if my vindictive streak kicks in and I become powered by spite. Which would not be good for getting anything at all done on toybox in the next month.

"Obviously we can't sell a house in that condition, sorry to have wasted your time. We'll take it off the market, check with our homeowner's insurance, get it fixed up, and put it back on the market once the $2000 worth of repairs have been finalized."

I am VERY close to my vindictive streak triggering, I need to go think about something else.

May 9, 2024

Accomplished nothing today, exhausted and sick in bed in a listless but NOT sleeping way. We ran out of blueberry caffeine cylinders so with breakfast I had one of the Red Bulls Fade occasionally brings home when they hand them out on campus (first one's free). And crashed VERY HARD an hour later and remembered why I don't drink those anymore. Well, one of the reasons. Another reason is no beverage should be bubble gum flavored, there's a whole cultural "do not swallow, this is not food" aspect to that particular flavor that they're blithely ignoring.

Also a nonzero chance I gave myself sunburn walking across campus with Fade to get Random House Thing notarized before coming back to meet with the guy I talked computer history with. Did not manage to get my hair cut while I was in the area, not enough time...

May 8, 2024

We wound up giving the realtor's pet homebuyers half what they asked for off the sale price. Still in the "does not affect the nominal sale price and thus broker's commission" way.

Met with a guy who was only in town today and blathered about computer history at him for 7 hours. A bit like a full weekend's convention speaking schedule, except somewhat extemporaneous, compressed into one day, and without personally having to travel. Had fun, but wound up exhausted afterwards and went to bed early.

Partway through the meeting Truthiness Truist (until recently BB&T Bank) called me to try to upsell us on a new mortgage, since THEY got notified the house was on the market and thus we'd be paying off the old mortgage and home equity loan. The financial industry is desperate to take money from everybody in contact with it, and I am so tired of late stage capitalism. And sick of sales calls where "no, don't transfer me to an agent, I can call you back in future if I develop an interest" has to be repeated at least 3 times and then I wind up hanging up on them anyway.

May 7, 2024

Potential house buyer. The house has been on the market for either 2 or 3 days at this point depending on how you want to count it, and the broker is going SELL SELL SELL GRAB THIS ONE DON'T WAIT DON'T THINK ABOUT IT TIME PRESSURE ALL GOOD SCAMS HAVE TIME PRESSURE ACT NOW SUPPLIES RUNNING OUT DO WHAT I SAY because these people want us to knock $25k off the asking price in a way that doesn't reduce the broker's commission but gives us less money. (So same nominal sale price but we provide them with... I dunno, kickbacks? The excuse is they'll need to change stuff when they move in. Um, yes, that's part of buying a house.)

Yeah bird in the hand, yeah extricating ourselves from the situation. But when I asked about waiting a week or so to see if anyone else bit, the realtor instead called the other people who went to the "grand opening" or whatever it was and confirmed they hadn't wanted to put in an offer. Which wasn't what I asked. (My vague theory she's selling it cheap to somebody she knows has not yet been contradicted.)

And the realtor's saying we should DEFINITELY NOT WAIT for somebody at least meeting our asking price because when houses go on the market they depreciate like a sliced open avocado turning brown. Which is news to me, but... Wife's sister's recommendation. I'm not steering here...

Oh, the logic of avoiding the house being on the market longer is that people are more likely to offer less than the asking price. As opposed to taking someone offering less than the asking price right off the bat, after the asking price was already six figures lower than the tax assessment...

May 6, 2024

You know what the biggest derailer of toybox work is? Finding where I left off in the morning. I think of a thing I want to work on next (I'm in the mood to tackle THIS), go fire up my laptop, and then try to find which tab, in which window, in which desktop I left off on that work in...

Usually a workspace is at least two or three adjacent tabs, such as the riscv kernel build with a vi instance editing the mini.config file (trying to slim it down and track down what each currently enabled symbol actually DOES and why the architecture's defconfig thought it needed it, so there's a "make menuconfig" tab I can forward slash search for symbols in and then go navigate to where they are to look at their help text, yes that's inefficient but I didn't write that menuconfig). There's the kernel build tab with "cursor up and hit enter" command line history to configure and build the kernel with the long tortuous cross compiler path and ARCH= and so on. That one's two history entries I run separately, two lines both because the default make target is non-obvious (something like __all I think, the vmlinux target doesn't build the arch/$ARCH/boot files), and because configure and build on the same command line wasn't SMP safe last I checked and you _really_ want -j $(nproc) for the kernel build part. Then there's the tab with the qemu invocation to test launch it against a prebuilt filesystem.cpio.gz, and if I need to rebuild the filesystem I cursor up an extra time in that tab.

In theory I can just vi the mini.config from wherever I am, see the PID in the vi error message that another editor instance has this file open, kill that instance, reload it, write it out to a different filename and diff them if it's still complaining (meaning I had unsaved changes in the open editor instance that might be worth saving), and then figure out where I left off in that file and also recreate the rest of the other tabs' context in a succession of new tabs. Which means that back in the original location I didn't track down, I am accumulating debris. Which probably includes various TESTS I was running in yet more windows/tabs near there, and the testing I remember to do off the top of my head in the next context may not provide full coverage of the tests I WAS running when I left off.

But if I go digging for where it was, I stumble across a dozen OTHER half-finished work items, and get distracted trying to IDENTIFY them, which means mentally recreating the context and going "oh yeah, that was a problem I need to address". And the surviving tabs are all evolutionarily selected to be sticky problems, because the ones I could finish and check already got closed.

Having 3 or 4 plates spinning at once isn't a big deal, I have 8 selectable desktops configured in the widget (each of which has a nominal purpose, toybox development is desktop 2 and mkroot is desktop 6 just under it in the 2 levels of 4 in the selector widget), but a bunch winds up in the email desktop (7) because popping open a quick terminal to fiddle with stuff in response to an email is a constant temptation, and quick things don't stay quick. (In xfce I can left-click the top left corner of the window and "move to desktop" from the pulldown menu, but that moves ALL tabs of a window. XFCE also lets me drag tabs between terminal windows so I can collate them during cleanup, but... that's cleanup work. Takes time and brain and I have to explicitly do it.) Some stuff winds up happening/accumulating on desktop 1 (originally the "default desktop" but basically web browsing and blogging), desktop 8 used to be for j-core stuff and these days is where I wrestle with real hardware like those Orange Pi 3b boards. In THEORY desktop 3 is documentation/presentation staging and desktop 4 is documentation/presentation recording or giving live talks. In practice I grab all 3 when there's too much clutter and I just need a clean workspace. (Desktop 5 is for NOT PROGRAMMING STUFF, things like role playing games and mame and so on. Yes I DO own a copy of several things like the "mappy" rom, I have various "8 games in one" joystick with NTSC output devices over the years, mostly christmas and birthday presents. I may not have a TV with NTSC input anymore, but I have first sale doctrine on that ROM image!)

Anyway, the swap thrashing has multiple causes, just so you know. (Yeah, it's mostly insufficiently medicated ADHD.)

May 5, 2024

Sigh, I should find another USB player app. The android built-in "file" app restarts the playlist from the beginning every time I get a phone call. (Because it has to stop to play the ringtone, and then forgets where it was so starts over from the beginning. And randomly stops playing when whatever "don't sleep this background app despite the screen being off" thing doesn't renew fast enough. Sigh, if the OS developers can't navigate this stuff, how are third parties supposed to manage?)

Alas, the pray store seems to have been completely consumed by late stage capitalism. It doesn't help that the first hit googling for "google play developer account" is an ad talking about "monetize with ease", but that's not the real problem. If I wanted to write my own mp3 player and stick it in the play store: I literally can't.

You can't just have open source apps uploaded by hobbyists anymore, now to upload an app you need an account with a bunch of constraints: must be 18 (because minors aren't people, they're property with no rights), must pay a $25 registration fee (to weed out the poor, oh you MUST have a credit card, prepaid cards are explicitly not accepted, and cash check or money order? Hahahaha, we're eliminating those options from society, money has been privatized.) And that's before you select the account type, and THEN you need to verify your identity, and if you're a plebeian personal account you need to find 20 people to vouch for you before we deign to allow your app onto our precious platform.

Imagine if Linux did that? Heck, WINDOWS doesn't do that. A site like github that tried to pull that nonsense would lose its entire userbase immediately, the only reason Google can get away with it is monopoly leverage. How this ISN'T an antitrust flaming red flag, I couldn't tell you.

Sigh. I don't WANT an iPhone...

Oh well, at least there's still sideloading. Which is the "install Linux" of Android that only 2% of the population will ever do, but... Anyway, that's why there's no good app for playing local MP3s in the play store anymore. Remember: the iPhone was an upgraded iPod, and we've now enshittified away the ability to ipod. (Which apple didn't invent, the Diamond Rio was already successful, a big company with deep pockets muscled in on an established niche.)

Innovative new apps are less likely to wind up on Android. If small developers have to jump through enormous hoops to publish them many just won't bother. I wonder where they'll wind up instead? (I wonder what the Steam Deck's policies are? If somebody wanted to stick a spreadsheet program or similar into that distribution network, and hook up a keyboard and mouse and television to that...)

Of course when providing a fig leaf for their monopoly leverage they'll say "security". People who want to control other people always say "security", because if you don't live in a gated community or way off on a ranch with a rifle somewhere, what if something HAPPENS? (How can you dare to drive, a car in the other lane could swerve into yours and crash into you AT ANY TIME! You're trusting EVERY PASSING DRIVER not to kill you!) Meanwhile your phone _carrier_ is tracking you 24/7 no matter what the phone is doing internally, and that tower-based location information is available from the carrier, and the carriers leak like sieves in this regard. How is anything that happens ON the phone less safe than that from an abusive spouse or mad coworker who got your phone number from an old resume and uses it to corner you in a dark alley?

If they really cared about security, you could have small physical LEDs that light up when the microphone, camera, or GPS are powered up, and are dark when they aren't, in a way that was not under software control. If bits of your phone glow when it should be off, you know bad things are happening. Failure to do that is because they want control, and they want to sell your data, and they don't actually care about YOU. (In Late Stage Capitalism, the customer is ALSO the product.)

Or they say "think of the children", because latchkey kids in the 80's and unsupervised children going to/from school today clearly never happened, that's just not POSSIBLE. If you instill learned helplessness from an early age it's more likely to stick. Especially if you engineer failures like Amish Rumspringa, or college binge drinking or the way Frederick Douglass described slave holidays: suddenly unsupervised, encouraged to overindulge, like Donald Duck forcing his nephews to smoke a box of cigars all at once only self-administered. Look what happens when you stop obeying the rules! The Mormons organize "missions" and Jehovah's Witnesses have their young adults knock on strangers' doors to create a carefully curated experience of the outside world rejecting you, briefly challenging your beliefs with no follow-up to encourage the "backfire effect" (what doesn't kill it makes it stronger, challenge overcome when the cognitive dissonance kicks in). Or the way Dick Cheney expanded his power tremendously after 9/11 (justifying the TSA, ICE, warrantless wiretaps...) Failing to protect, allowing it to become a crisis, thus justifying draconian measures.

Except things like Linux have never needed that, despite running most internet servers for 30 years. Japan still has a self-policing culture keeping children safe. The USA discarded a culture of ACTUAL safety in exchange for fearmongering. Children today suffer endless school shootings, metal detectors, active shooter drills, completely useless armed security guards (as demonstrated at Uvalde), and of course the school-to-prison pipeline. In "Lassie", Timmy kept theatrically falling down the well because running around unsupervised was the norm in the "leave it to beaver" days and neighbors babysat for each other all the time. Now running a daycare requires levels of certification preventing anyone from doing it. Children are too precious to exist. That's not an improvement, and the difference is corrosive fearmongering from people who want ever-greater control.

*shrug* It's self-limiting, but historically the correction involves a collapse that's unlikely to be fun.

May 4, 2024

Closing tabs is hard when each one is basically an unanswered question. I'm supposed to copy them to a todo file rather than ponder them now, but I have to work out what the issue was, and how to phrase it in said todo file because I can't just cut and paste the last command run when context includes the full screen, the bash command line history in that tab (and context in adjacent tabs), what directory was I in and what does "git diff" show in that directory, and then trying to think back to what I was doing at the time given all that context which is hard enough to remember NOW, when a group of tabs can have a test and a build, another showing a section of an old commit, and then two open man pages, and a blog entry from a couple years ago with a section highlighted (both vi's "v" highlighting and mouseover highlighting survive switching tabs)... A dozen word note to myself often provides LESS context to remember from in future...

When I run exit 1 2 at an interactive bash prompt, it says "exit" on its own line, then complains "bash: exit: too many arguments" on a second line. But when I run (exit) or bash -c exit it does NOT say "exit".

I am not asking Chet. That way lies madness. (I'm trying to figure out what it's doing, not CHANGE it so there's version skew.)

$ dash
$ exit
$ bash
$ exit
exit
$ mksh
$ exit
$ bash
$ trap 'echo potato' exit
$ exit
exit
potato

It's just bash. Why is it doing that? And it's not the "exit" command, it prints it for ctrl-d as well. And it's doing so _before_ calling the exit trap, AND doing so before checking that the "exit" command has too many arguments...

PROBABLY I don't mind if I don't accurately duplicate this one.

May 3, 2024

Closing tabs towards eventual reboot of the laptop is like shelving books without reading them.

Various interesting todo items implied by the getauxval(3) man page (which was open in a tab because of that). If you really needed to, you could probably figure out if you're dynamically linked or not (static linking should have AT_BASE null or similar?) and thus whether it's safe to call dlopen(), if you can clean up sufficiently. What _is_ getauxval(AT_EXECFD) for? Because execveat(fd, "", argv, envp, AT_EMPTY_PATH) will exec the file pointed to by fd if you can just get a filehandle to the currently running executable, and that looks like EXACTLY what I want (I've poked the kernel guys for a way to get this repeatedly; you can't reliably open this yourself within various types of container because your process may not live inside your current chroot, it could even be on a lazily unmounted filesystem) but it returns 0 in both a dynamic and static linked test program, and is thus useless.
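
A minimal sketch of the AT_BASE guess, plus a cross-check against the program headers (probably_dynamic() and has_pt_interp() are hypothetical names). AT_BASE holds the load address of the program interpreter (ld.so); a statically linked binary has no PT_INTERP header and thus no interpreter, so AT_BASE should come back 0. Treat it as a heuristic, not a promise that dlopen() will work.

```c
#include <elf.h>
#include <link.h>        // ElfW()
#include <sys/auxv.h>

/* Nonzero AT_BASE means the kernel loaded an interpreter for us. */
static int probably_dynamic(void)
{
  return getauxval(AT_BASE) != 0;
}

/* Cross-check: scan our own program headers for PT_INTERP. */
static int has_pt_interp(void)
{
  const ElfW(Phdr) *ph = (const ElfW(Phdr) *)getauxval(AT_PHDR);
  unsigned long i, n = getauxval(AT_PHNUM);

  for (i = 0; ph && i < n; i++) if (ph[i].p_type == PT_INTERP) return 1;
  return 0;
}
```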

Meanwhile, AT_EXECFN looks potentially useful for re-exec-self (in the "is it possible" case, at least avoiding dependency on /proc even when argv[0] lies) but I wonder if it has a length limit the way /proc/self/comm does? Either way, names_to_pid() in lib/lib.c might care about this. (I'd say "but what about bsd" except we're already fishing in /proc there. Except... no, names_to_pid doesn't care because getauxval() is about this process and that function is fetching data for other processes, nevermind.)

$ ln -s . circle
$ $(echo $PWD $(yes circle | head -n 100) a.out | tr ' ' /)
bash: too many levels of symbolic links

Seriously? It was not a recursive traversal! Oh, honestly. However, head -n 40 worked (darn arbitrary limit) and the result produced a 313 byte path, which was not truncated. So that's useful. Possibly the nommu codepath should open(CLOEXEC) the fd in main (before we can chdir() away from that path) and dup2() it up to the highest available filehandle? Some variant of fd = open((char *)getauxval(AT_EXECFN), O_RDONLY|O_CLOEXEC); struct rlimit rr; getrlimit(RLIMIT_NOFILE, &rr); dup2(fd, rr.rlim_cur-1); close(fd); with a lot more error checking. Except the problem is the child may have cd'd away from where it got run, and if it does an exec it ALSO needs this info. Really I want a syscall or something that can get me a filehandle to my running executable, and right now Linux just has /proc/self/exe which might not be mounted.

Is AT_PLATFORM the same as uname -m? Let's stick it in toybox's "hello" command, switch that to default y, rebuild the mkroot targets (if you don't specify LINUX= it'll leave the existing kernels there and just rebuild the userspace including the cpio.gz archives qemu loads to populate initramfs), and see what we get...

Dear gcc: according to C a void * is automatically typecast to any other pointer type as necessary so printf("%s\n", (void *)getauxval(AT_PLATFORM)) should NOT WARN with -Wformat= because THIS IS NOT C++. Honestly, warning: format '%s' expects argument of type 'char *', but argument 2 has type 'void *' [-Wformat=] is an abomination unto Dennis Ritchie by the apostate Stroustrup. Stoppit. (And llvm is doing it too. Sheesh. I should just leave it returning unsigned long. It WORKS FINE.)

Hmmm... it's _sometimes_ the same (aarch64) and sometimes different: armv5l is producing "v5l" for AT_PLATFORM, but uname -m says "armv5tejl". Both are correct, but one is providing a lot more detail. And neither is the "armv5l" that $HOST is set to by the boot. I was wondering if I could simplify mkroot to not have to pass through the build architecture on the kernel command line, but if I want targets to know when they're x32 or coldfire I might still have to. (I'm not having ANOTHER lookup table to convert one to the other. The gcc tuple vs kernel ARCH= vs uname -m vs AT_PLATFORM vs whatever llvm uses... Just no.)
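
A sketch of that "hello" experiment (show_platform() is a hypothetical name, not what toybox's hello does): read AT_PLATFORM and uname()'s machine field side by side. Casting getauxval()'s unsigned long result straight to char * rather than void * keeps -Wformat quiet, and AT_PLATFORM can be absent, in which case getauxval() returns 0.

```c
#include <stdio.h>
#include <sys/auxv.h>
#include <sys/utsname.h>

/* Formats both strings into buf; returns snprintf()'s result or -1. */
static int show_platform(char *buf, size_t size)
{
  const char *plat = (const char *)getauxval(AT_PLATFORM);
  struct utsname u;

  if (uname(&u)) return -1;
  return snprintf(buf, size, "AT_PLATFORM=%s uname -m=%s\n",
                  plat ? plat : "(none)", u.machine);
}
```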

AT_RANDOM is just weird: is this data already used for something else? (Such as the Age/Sex/Location segment relocation stuff, ASMR, whatever it's called. In which case using it AGAIN would leak it and give attackers a leg up.) And is this info harvested on demand (i.e. "lazy" randomness fetched when you make the call)? If not, is launching processes depleting the entropy pool? Or is that blocking no longer done on modern kernels...

Ooh, AT_SECURE is nice. Ah, no it isn't. It LOOKS like it indicates "we were called via suid/sgid" (doesn't say if ACTUALLY being root sets it), but it also says it could be set by Linux Security Modules rendering it pointless. Just check uid vs euid like I've been doing, I guess.
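
Both checks side by side, as a sketch (at_secure() and ids_differ() are hypothetical names). The uid/euid plus gid/egid comparison is the narrower test; AT_SECURE answers the broader "should this process be careful" question, which the man page says an LSM can also set. One behavioral difference: AT_SECURE is recorded at exec time, so a setuid program that has since dropped privileges passes the id comparison but still reports AT_SECURE.

```c
#include <sys/auxv.h>
#include <unistd.h>

/* What the kernel told us at exec time. */
static int at_secure(void)
{
  return getauxval(AT_SECURE) != 0;
}

/* The check toybox already does: are real and effective ids different now? */
static int ids_differ(void)
{
  return getuid() != geteuid() || getgid() != getegid();
}
```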

May 2, 2024

Well that didn't take long. Oliver noticed he'd been unsubscribed (and also noticed that I hadn't even blocked him from _resubscribing_, and thus immediately did), and emailed me a long screed titled "Dude." because I apparently overstepped in doing so...? I replied to this one, at length, trying to explain why he is NOT HELPING THE PROJECT. Wasted the entire morning and into the afternoon doing it, too, which was about average for engaging with Oliver's emails. (Which is why I'd stopped.)

Other people have pointed out similarities to the xz situation in private email, but I don't think it's intentional: I think he's probably 13. Either way: he could have forked the project from day 1 if he just wanted there to BE an improved version by his own personal metrics, and if I did stop working on it the keys pretty much go to Elliott by default. There is no situation where Oliver comes out of nowhere to maintain or co-maintain this codebase in-situ, even if the XZ thing hadn't happened, and I'm pretty sure state actors could A) figure that out, B) do a better job with the negging. (I already lived through far worse than Oliver could ever manage, and if you look at the couple days after that, it's literally where toybox came from...)

Oliver replied to my reply (and said he'd been composing his reply since he "got home from school": called it), and there was some actual self reflection. Hopefully this is a learning experience for him and he becomes a better participant in community development efforts, and is helpful to other projects someday.

May 1, 2024

I thought Oliver had been quiet for a few days and MAYBE had finally quiesced, and perhaps I should go through the giant heap of Oliver's unrequited posts. (At least to find any actual bug reports. I don't WANT to, but it's like getting a vaccination. Short term pain, then feeling terrible afterwards, to avoid... something worse down the line. I guess.)

But then I checked the web archive and... no, he posted on the 29th. Judging by the title, a patch adding a -w option to "strings"? (Quick check: nope, busybox strings does not have a -w option. Huh, for that matter neither does "man strings".) So he has _not_ stopped/noticed, and the to-do pile of work he's trying to create for me is ever-growing.

Alright, now I'm curious, what IS... Ah, it was a ruse, it's actually some sort of xz-style "the maintainer should step down and appoint ME instead, how dare he" rant. (Has he been doing this a lot? I haven't been reading his posts until he calmed down and took a break from posting any. Which never happened...)

You know, it's not fair to subject the other list posters to that. I'm unsubscribing him. He can email me directly, but the project's list is not his megaphone to broadcast abuse with. There's hands off, and then there's abandoning responsibility.

April 30, 2024

Fuzzy told me Austin's mayor is mandating all new cars in Austin come with AI-powered emergency brakes so you get rear ended when leaves blow across the street, and I recoiled in horror. Partly because you can't half-ass self driving (either you're paying full attention or you're not paying attention, having the vehicle make SOME of the decisions for you seems like a recipe for disaster), and partly because this doesn't seem like an area where incrementalism gets you to the goal. "We're going to make everybody healthy by slowly adding small amounts of antibiotics to the water supply, and gradually ramp it up until sickness is eliminated. What do you mean everything everywhere is now antibiotic resistant?" Austin's move to eliminate parking minimums and actually install light rail seems far more useful here. NOT adding more demand-inducing lanes to I-35 would be good too, but that would mean turning down federal highway funds...

I "have no dog in this race" as they say: I've moved out of Austin, and I didn't replace my car in 2018. At first because I was out of town (working in Milwaukee), then because I was thinking of spending a few years in Japan while Fade finished her doctorate (I had a 5 year residency permit and everything), and then there was a pandemic. Plus there was a multi-year gap before then where I hadn't renewed my driver's license when it expired because I kept meaning to contest a ticket in whatever suburb of Houston that was, and just never bothered (carried around my passport as ID, finally dealt with it when Nick needed help moving). I didn't drive for years, and didn't really miss it. A car is convenient, but driving is stressful and expensive at the best of times. Even if the car itself is paid off, when you add up gas and insurance and repairs you can take quite a number of lÿft rides each month before breaking even. And that was before "catalytic converter theft" became a thing...

But I remember my experience with a shiny new hybrid loaner car while mine was in the shop, and its automated lane keeping actively fighting me in a construction zone on the I-35 frontage road where the lines on the street were overlapping and wrong. I figured out pretty quickly how to turn all the "driver assistance" features off because they seemed far more LIKELY to make me crash. Level 3 self-driving where you have to pay full attention to a vehicle you're not controlling sounds like torture.

It seems 90% likely an AI powered emergency brake mandate in Texas is so police can have a little clicker that stops any car that tries to drive away from them. No more chases ever, we can stop you at any time for any reason because it's not really your car. And of course Ford patented self-driving repossessions. Self-driving primarily makes sense as a cheap taxi service, not for individually owned vehicles. (It's mine but I'm not legally liable for its decisions?)

I'm all for progress: we automated away elevator operators and phone operators, the printing press eliminated "scribe" as a job, "computer" was the name of a job before it was the name of a machine. They're having trouble making that leap here, but incrementalism seems more likely to trigger an allergic reaction than boil this particular frog. The new system is GOING to break, expensively.

*shrug* Oh well. Not my call...

April 29, 2024

It would be nice if android wouldn't vibrate when it receives notifications WHILE TETHERED AND PROVIDING USB NETWORK. Something I posted on mastodon's getting retweeted and replied to, and it's disconnected itself three times already this morning. (I put it in "focus mode" to stop it.)

Is coreutils adding python as a hard build requirement? So Linux From Scratch will have to build python before it can build coreutils? That's gonna suck. (Especially combined with python's "your mouse has moved, you must reboot windows for this change to take effect" rapid aging problem requiring constant version updates of the one and only implementation of the runtime. Works best in internet explorer. Use only genuine microsoft excel.)

Oh NOW why is the github test thingy giving me a red X... macos linker doesn't understand --start-group. I tested it on llvm's linker in the NDK, but of course mac doesn't USE llvm's linker because it's creating mach-o binaries. (It COMPILES with clang, but doesn't LINK with it. right.)

That's a little awkward to wrap in scripts/, which is parsed before library probing, but assigning x="stuff $VAR stuff" expands $VAR right then, producing a string constant. I'm trying to think of a syntax where A="stuff $VAR stuff"; VAR="potato"; B="$A" winds up with potato in B, without washing it through "eval" which is always fraught...

Ok, ssh into the mac system, run homebrew, build and run macos_defconfig and... yup, same error. And generated/ has a half-dozen libraries in it because it's detecting their existence whether or not the build NEEDS them. (I don't track that, I just throw everything at the toolchain and let --as-needed sort it out, because that works fine with both binutils and lld. Heck, it SHOULD work fine with tinycc linking it...)

If I blank LIBRARIES="" the mac build dies with iconv missing. And there's -lm and -lutil in the probe list. None of which have dangling references last I checked, but the problem is I don't KNOW which ones do, nor am I trying to track the dependency chains of external libraries changing over time. I'm happy saying "mac breaks on external libraries", but libm and libutil are essentially part of libc.

Alright, check if LDFLAGS has "-static" in it, and only add the wrapper if so. That way dynamic linking doesn't require it to be a NOP, meaning only dynamic linking works on macos but I think that was already the case anyway? (Checking uname for Darwin _definitely_ belongs in but the library probes happen after that's included and I'd have to conditionally define before/after variables and it's ugly no matter what I do...)
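The real logic lives in shell in the build script, so this is only a C model of the decision, and link_libraries() is a hypothetical name. It wraps the probed library list in --start-group/--end-group only when the list is non-empty and LDFLAGS asks for a static link. A plain substring match on "-static" also catches --static, but it would equally catch something like -static-libgcc. That trade-off is a judgment call this sketch doesn't resolve.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Build the library portion of the link line. Dynamic links, and static
// links with no optional libraries, get the list back unchanged.
char *link_libraries(const char *ldflags, const char *libs, char *out, size_t len)
{
  if (*libs && strstr(ldflags, "-static"))
    snprintf(out, len, "-Wl,--start-group %s -Wl,--end-group", libs);
  else snprintf(out, len, "%s", libs);

  return out;
}
```

Emitting the matching --end-group keeps linkers that insist on a closing tag from complaining.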

April 28, 2024

Finished the hwclock.c fixes to work around the glibc and musl breakage. (The trick was realizing that asm/unistd.h is what's getting included under the covers, and if we __has_include it and #include it before any other header, the existing header guards against double inclusion should make it just work.)
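The post doesn't show hwclock's actual workaround, so this sketch only illustrates the include-ordering trick. It uses getpid as a harmless stand-in syscall, and raw_getpid() is a hypothetical name. The kernel header goes first, so when <sys/syscall.h> pulls in <asm/unistd.h> again later, the include guard turns that second inclusion into a no-op. The nested #if is the usual idiom for compilers that lack __has_include: they skip the inner test without evaluating it.

```c
#define _GNU_SOURCE  // so glibc's <unistd.h> declares syscall()

#if defined(__has_include)
#if __has_include(<asm/unistd.h>)
#include <asm/unistd.h>  // kernel syscall numbers, before any libc header
#endif
#endif

#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

// Make the raw syscall by number when the kernel header defined one,
// otherwise fall back to the libc wrapper.
long raw_getpid(void)
{
#ifdef __NR_getpid
  return syscall(__NR_getpid);
#else
  return getpid();
#endif
}
```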

I should probably move some of that into lib/portability.c, but this is its only current user, and "musl breakage" has a history of being sprayed around the tree because Rich really puts EFFORT into breaking stuff to punish people writing software he doesn't approve of.

I need to figure out an automated way to test watch 'while true; do echo -n .; sleep .1; done' because it's easy to check manually, but painful to automatically. For one thing, there's no way to tell watch "run this twice then stop". I suppose I could -e and "exit 1" but the debian one goes "press any key to exit" when that happens, which is EXTRA useless. And of course if you run the above command on debian's watch it produces no output, just hangs there waiting. (Presumably if I left it running long enough it would eventually fill up some buffer, but I gave it a full minute and nothing happened.)

So yes, here's a command that is LITERALLY USELESS for scripting, it can ONLY be used interactively as far as I can tell... and it doesn't produce progressive output because stdio buffer. Bra fscking vo, procps maintainers. You bought into the gnu/stupid.

Honestly, we need nagle on stdio. They did it for net but won't do it for stdout and I dunno why. Make write() a vdso call marshalling data into a single page (4k) vdso ring buffer mapped write-only, which flushes when full or on a timer 1/10th of a second after the last write to it (tasklet driven by the kernel timer wheel). This avoids syscall overhead for the "small writes to stdout" common case without all this NONSENSE around manually flushing. Which the gnu loons have been arguing about on the coreutils list for weeks, inventing whole new APIs that read another magic environment variable to change default behavior, oh yeah that's not gonna have security implications somewhere. A denial of service attack because something in a pipeline never flushed and hung instead...
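To make the coalescing rule concrete, here's a userspace model of the logic only. It has no vdso and no kernel timer. struct nagle, nagle_write() and nagle_tick() are hypothetical names. Time is passed in as milliseconds so the behavior stays deterministic, and a caller stands in for the timer wheel by calling nagle_tick() periodically.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define NAGLE_SIZE 4096     // one page, as proposed
#define NAGLE_DELAY_MS 100  // flush 1/10th second after the last write

struct nagle {
  int fd;
  size_t used;
  long last_ms;
  char buf[NAGLE_SIZE];
};

// Push everything buffered out in as few write() calls as possible.
static int nagle_flush(struct nagle *n)
{
  size_t off = 0;

  while (off < n->used) {
    ssize_t w = write(n->fd, n->buf + off, n->used - off);

    if (w < 0) return -1;
    off += w;
  }
  n->used = 0;

  return 0;
}

// Small writes accumulate; a full page goes out immediately.
int nagle_write(struct nagle *n, const void *data, size_t len, long now_ms)
{
  const char *p = data;

  while (len) {
    size_t chunk = NAGLE_SIZE - n->used;

    if (chunk > len) chunk = len;
    memcpy(n->buf + n->used, p, chunk);
    n->used += chunk;
    p += chunk;
    len -= chunk;
    if (n->used == NAGLE_SIZE && nagle_flush(n)) return -1;
  }
  n->last_ms = now_ms;

  return 0;
}

// Timer hook: flush data that has sat idle for NAGLE_DELAY_MS or longer.
int nagle_tick(struct nagle *n, long now_ms)
{
  if (n->used && now_ms - n->last_ms >= NAGLE_DELAY_MS) return nagle_flush(n);

  return 0;
}
```

Two one-byte writes 50ms apart sit in the buffer until 100ms after the second one, then leave in a single write().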

And yes, I'd special case PID 1 here. Unix pipelines are a thing. Put nagle on writes specifically to stdout, that way you don't need lots of 4k buffers to handle byte at a time writes to the kernel without syscall overhead.

April 27, 2024

Elliott replied to Oliver (with a "no, because..." on something to do with readelf) and now I feel guilty for leaving Elliott to clean up the mess. My lack of sufficient "no, because..." should not leave him having to do it.

On the one hand, if I read Oliver's Mt. Email accumulation and reply to them I will literally do nothing else on the project because he drains my energy and DOES NOT STOP. On the other, letting him run rampant and unsupervised... He is referring to toybox as "our code", and will be calling it "my code" (meaning his) soon enough.

I totally admit this is me failing as a maintainer. Someone comes in well-meaning and energetic and I am not making proper use of their enthusiasm. I should stop coding and become a full-time mentor of other people. I can't do both.

Bug reports are useful. I'm all for _suggestions_. But "right about the problem, wrong about the solution" still applies, and people who won't take "no" for an answer are a time sink. "That's not how I want to fix it" isn't final, people can argue against my point, but reiterating the exact same thing more emphatically without adding new information isn't it, and "you are a bad person for saying that" (shooting the messenger) is exhausting. Plus sprinkling in words like "defective" and "obviously" in your "don't ask questions post errors" posts... sigh.

Right now github has two related threads: in one somebody's arguing that they'd like a different aesthetic user interface to trigger something they can already do. Meanwhile, in another thread, static linking with the optional external libraries (zlib/libssl/libselinux and so on) had an order dependency that parallel probing broke, because dynamic linking automatically remembers symbols seen in previous libraries and static linking does not. Each of the 2 github threads has a "wrong fix". One wants me to add a static linking checkbox to kconfig (you can already LDFLAGS=-static but busybox had a _checkbox_ to add -static to LDFLAGS for you), the other wants me to maintain magic library order. And that's not how I want to solve either problem.

Let's start with the second one: yes I COULD create a software contraption to maintain the library order: turn the library list into a bash array, have each probe use/return an array index, and then output the enabled array indexes in array order. But that's ugly and brittle and complicated and not how I want to fix it. It can still break on library combinations I haven't personally tested, and it isn't immediately clear WHY it's doing that (because dynamic linking doesn't need it).

Instead I want to tell the linker to use --start-group, which is a flag to tell the linker to just do the right thing. It turns out the linker CAN do this already (they just don't because "performance", which again is a C++ problem not a C problem, and probably last came up in the 1990s but hasn't been re-evaluated, and again it's already how it works for dynamic linking because it WILL tell you at compile time (not runtime) about unresolved symbols that weren't mentioned in any previous dynamic library). But adding -Wl,--start-group to the default LDFLAGS in scripts/ makes some linker versions complain if there's no corresponding --end-group (and then do the right thing, but first they need to noisily announce their unhappiness, which is very gnu). Another reason I didn't check it in immediately is because I needed to test that it IS a NOP on dynamic linking, and specifically that it didn't break --gc-sections (in both gcc and llvm linkers), but my default build doesn't have any optional libraries in it, and at the moment neither "defconfig" nor "android_defconfig" build under the android NDK (the first because it assumes crypt() is available but I haven't finished and checked in the lib/ version yet, the second because the NDK hasn't got selinux.h but the shipped android build enables it because AOSP's toolchain still isn't quite the same as the NDK toolchain). So I needed to come up with test build configs/environments (and try it on mac and bsd with their silly --dead-strip thing), and make it add --end-group as appropriate.

But by NOT immediately checking it in, the submitter seemed to think I meant everyone doing LDFLAGS=-static should remember to also manually add -Wl,--start-group to their LDFLAGS, which would be a sharp edge no matter how I documented it: people who Do The Obvious Thing without needing to be told would still hit breakage because they didn't read the docs thoroughly before building, and then dismiss toybox as broken rather THAN read the docs. (I myself would definitely move on to something else if that was my early impression of the project.)

And the guy in the SECOND thread then posted to the FIRST thread advocating that the magic kconfig checkbox should add the magic extra "static link properly" flags. Which is STILL WRONG, it's just more deeply wrong.

The "just add a checkbox" solution to the first one is wrong because static linking is already fraught in numerous ways unrelated to this, in part because glibc is terrible. One result of these threads is "maybe I should collect the various faq.html mentions of static linking into a dedicated static linking faq entry". There's some in "how do I cross compile toybox" and some in "what architectures does toybox support" (in all three parts) and some in "What part of Linux/Android does toybox provide" and then there's MORE material about mkroot/packages/dynamic that's just in the blog and/or mailing list not the faq and none of that actually addresses link order. So a faq entry collecting together information about static linking (how to do it and why it's fraught) could be good.

Another todo item resulting from this is trying to make static linking LESS fraught, which a kconfig entry for static linking WOULD NOT FIX. I don't want to have multiple ways to do things: you can already LDFLAGS=--static and that's the obvious way to do it to a lot of people (and on a lot of other projects). Requiring people to add -Wl,--start-group to --static in LDFLAGS is a land mine, and having a kconfig entry that performs extra magic but leaves LDFLAGS people facing nonobvious breakage is NOT GOOD. I miss when "there should be one obvious way to do it" was python's motto (back before 3.0 broke everything).

I don't want to add a kconfig entry for static linking for several reasons. I'm not setting CROSS_COMPILER through there, or setting binary type (fdpic or static pie): the only reason there's a TOYBOX_FORCE_NOMMU option is it used to be called TOYBOX_MUSL_NOMMU_IS_BROKEN, in a proper toolchain you can autodetect this but Rich refuses to have a __MUSL__ symbol you can check for and ships a broken fork() that fails at runtime to defeat conventional compile time probes for mmu support.

The existing kconfig entries are all things the code needs to make a decision about but can't probe for. When you link in zlib or openssl it calls different functions which provide different behavior. And it's not the same as just having a library and headers installed on the host: we don't pull in random crap just because it's available. Should we use this or that implementation is a DECISION, I can probe for availability but not intent.

So adding a kconfig entry, and making it do increasingly magic things, would add ever-increasing amounts of magic but never make it reliable. For example, it's easy to have dynamic libraries but not static libraries installed, which came up in the NDK and is also a Fedora problem. I tried to get an selinux test environment setup, which means Fedora, but they don't install ANY static libraries by default (because that's where Ulrich Drepper railed in German against the unclean ways needing to be purged for many years before leaving to work for Goldman Sachs during some financial crisis or other), and the online instructions I found to "install static libraries on fedora" only installed static libc but not static versions of the other libraries from other packages. Which means you can have the headers but not the (right kind of) library, meaning even _has_include() doesn't help.

What I want is to make it "just work" for as many people as I can, while NOT getting in the way of existing experts who want to handle the difficult cases (or provide answers to people who ask them). The solution I came up with was to have scripts/ probe $LIBRARIES and then if it's not empty, LIBRARIES="-Wl,--start-group $LIBRARIES -Wl,--end-group". So it's only added if it has something to do, and there's an end tag to stop the gnu linker's silly warning spam. Yes it does it for dynamic linking, which is why I had to test it was a NOP, and was supported in all the build environments I want. (I first used this flag doing hexagon bringup in 2011 and it wasn't brand new then either.)

Unfortunately, Oliver piped up in the first thread before I got to fixing stuff and turned the situation into an outright flamewar. Somebody (not the original issue submitter, just a drive-by rando) got mad I tyrannically wouldn't add the aesthetic checkbox despite the Will Of The People or some such, and Oliver managed to fan the flames, and I wound up actually looking up how to block somebody on github for the first time. (After just deleting something inflammatory I didn't want to reply to and getting a HOW DARE I indignant response that confirmed I never want to hear from that person again.) And no, it wasn't Oliver, but it may have been collateral damage from Oliver trying to act in an administrative capacity for the project. (Not dealing with Oliver is having side effects. I'm 99% sure he MEANS well, and he's trying very hard to contribute positively to the project, unlike the guy I blocked. But I never had to block anyone before Oliver acted as self-appointed moderator.)

I want to get things done. I want to clean UP messes and REMOVE unnecessary complexity. And I'm not always immediately sure how best to do that in any given situation, but it's not about voting or who is the loudest, it's about working out the right thing to do. Half the time it's a question of keeping up with the flood and finding time/energy to properly think it through. There's always more corner cases. I just made a note that lib/portability.h has a glibc-only prototype for crypt() that needs to go when the new crypt() replacement in lib/ gets finished. I'd like a mechanism to annotate and expire old workarounds that lets me run a scan as part of my release.txt checklist, but right now portability.h has #ifndef AT_FDCWD with the note Kernel commit 5590ff0d5528 2006 and that's old enough (18 years, 2.5 times the 7 year horizon) that I've probably looked at it before and kept it for some reason? But what is the reason and when can it go away? Do I need to test on mac and freebsd? The bash "wait -n" thing was centos having a 10 year horizon: has THAT expired yet? (And then MacOS needed it because last GPLv2 release of bash doesn't understand -n, so... no. It gets an exception to the 7 year rule.) Doing that by hand is tedious and error prone, I'd like some automated way to check.
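One possible shape for the automated check, assuming a convention where each workaround's comment ends with the year it became unnecessary upstream, as in the AT_FDCWD note. note_year() and workaround_expired() are hypothetical names, and this isn't an existing toybox mechanism. The year scan ignores digit runs glued to letters, so a commit hash like 5590ff0d5528 doesn't register as a date.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

// Return the last standalone 4-digit year (1970-2099) in a note, or 0.
int note_year(const char *s)
{
  int year = 0;
  const char *p;

  for (p = s; *p; p++) {
    int i, y = 0;

    if (p != s && isalnum((unsigned char)p[-1])) continue;  // mid-token
    for (i = 0; i < 4 && isdigit((unsigned char)p[i]); i++) y = y*10 + p[i]-'0';
    if (i == 4 && !isalnum((unsigned char)p[4]) && y >= 1970 && y < 2100) year = y;
  }

  return year;
}

// Older than the support horizon (7 years normally, 10 for the centos case)?
int workaround_expired(const char *note, int this_year, int horizon)
{
  int year = note_year(note);

  return year && this_year - year > horizon;
}
```

A release checklist step could run this over every annotated line and list the ones due for a second look.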

But that is SO far down the todo list...

April 26, 2024

Ok, got compare_numsign() rewritten and now I'm trying to write new find tests (there weren't any for -link -size -inum or -*time let alone checking rounding and corner cases) and as always getting TEST_HOST to pass is the hard part. It turns out the debian one is crappier than I remembered: "-atime 1s" isn't recognized because the time suffixes are apparently something I added? (Which I guess is why they never had to wrestle with "-atime 1kh" multiplying the units.)
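For reference while writing those tests, here's a sketch of the POSIX "+N / -N / N" comparison for the truncating tests like -mtime. Per POSIX, the age in seconds is divided by 86400 with the remainder discarded before comparing. This is not toybox's compare_numsign(), and numsign_match() is a hypothetical name. It omits the unit suffixes ("1s", "1kh"). It also doesn't cover -size, which POSIX rounds up to 512-byte blocks rather than truncating.

```c
#include <assert.h>
#include <stdlib.h>

// Returns 1 on match, 0 on no match, -1 on unparseable spec.
int numsign_match(const char *spec, long long val, long long units)
{
  char sign = 0, *end;
  long long n;

  if (*spec == '+' || *spec == '-') sign = *spec++;
  n = strtoll(spec, &end, 10);
  if (end == spec || *end || units <= 0) return -1;
  val /= units;  // discard the fractional part, so "1" covers [1, 2) units

  if (sign == '+') return val > n;
  if (sign == '-') return val < n;

  return val == n;
}
```

So "-mtime 1" matches an age of 24 hours plus a few seconds, but not one of exactly 48 hours.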

Another question is which find -filters implicitly add "-type f" so "find dir -blah" doesn't include "dir" itself. I've noticed "-size" is one such, but -mtime is not.

April 25, 2024

Yay, at 9am a Dreamhost employee got in and put my website back up. That's a relief. (It was sort of understandable... except for the part that not one file they've been concerned about so far has changed in the past 10 years. As in they did a deeper scan of the whole mess for other files that might retroactively justify their concern, and the list literally did not include a single file that hasn't been there IN THAT DIRECTORY, unchanged, since 2014 or earlier. How can they be INFECTED if they're UNCHANGED FOR A DECADE?)

Under the weather today. Minor sore throat's been building for a few days, probably got a thing. Trying to squint at the find compare_numsign() weirdness but I'm low on focus.

Good to know I'm not alone at being annoyed at the crunchyroll censorship and han shot first trend in modern society. Downside of digital media: if you don't own your own copy, fascists just LOVE to quietly rewrite all the textbooks each year and claim they're unchanged no it was always like that you're remembering wrong. How can you know history if you can't preserve it? Outsourcing stuff to or a streaming service doesn't cut it, and the Comstock Act was never actually repealed, it just got overruled by various court judgements rendering it unenforceable... which the maga-packed supreme court is reinstating. (Yes maga-packed: six of the current members are in that category. Five were appointed by presidents who LOST the popular vote: Barrett, Kavanaugh and Gorsuch by Trump, Alito and Roberts by Dubyah, and of course daddy Bush appointed Clarence "uncle" Thomas, whose confirmation where Anita Hill accused him of sexual harassment was chaired by Joe Biden, no really. Politically Bush Sr. had to pick a black person to replace Thurgood Marshall, so the guy behind the Willie Horton ads found a black man who hates black people, and that's before he and his wife's personal corruption in office.)

April 24, 2024

Oh bravo Dreamhost. Chef's kiss. They took my website down today. Calloo callay. Twardling dreamhost. (I used to have a button that said "The mome rath isn't born that can outgrabe me." But I am, currently, frumious at the whiffling tulgey manxome burblers.)

Yes, I know that malware authors have been using my old toolchains to build their malware since something like 2013, and yes gnu crap used to leak the host path the libraries were built at into the resulting binaries until the debian guys did their "reproducible build" work last decade and came up with patches to stop some of the stupid (and yes, I'd been yelling at people about this in public for years before... ahem). And some bug bounty people were very bad at googling back when google could still find stuff (I shipped a general purpose compiler, yes you can build bad stuff with it, I have no say in this), and now Dreamhost has identified THE ORIGINAL COMPILER SOURCE TARBALL as containing those same strings and thus CLEARLY INFECTED. (It GOES the other WAY. Causality... ARGH.)

So I need to explain to a human that they're putting Descartes before the horse here. Luckily Dreamhost _does_ have actual humans on staff (unlike gmail), there's just a bit of a turnaround time getting their attention. (They strive for nine fives of uptime, and mostly achieve it.)

Meanwhile, I've got work to do...

Implementing lsns requires some options, and -p behaves non-obviously because every process has every namespace, but namespaces "belong" to the first process that has it. So when I lsns -p my chromium task (with two local namespaces), it shows the first bash process as the owner of all but 2 of the namespaces. (So lsns -p 3457 shows 2 lines belonging to that and 5 lines belonging to pid 581.) Except when I ran this at txlf it reported pid 459 owning those namespaces, which has exited since. It's NOT claiming that PID 1 or similar owns this, because ls -l /proc/1/ns is permission denied. So it's attributing it to the first one it FINDS, which when run as a non-root user is somewhat potluck.

This seems easy to implement because "ls -1f /proc" shows PIDs in numerical order, so I don't need to do any special sorting. EXCEPT that pids wrap, so a lower numbered PID can be the parent of a higher numbered PID. What does the util-linux implementation of lsns do? Not a clue! What's the CORRECT behavior to implement here? Dunno.
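One way to make the answer independent of readdir() order is to report the lowest visible PID that shares the namespace. That's a hypothetical policy, not what util-linux does (which the post doesn't establish either), and it sidesteps rather than solves the PID wraparound question. ns_link() and ns_owner() are made-up names. Processes that can't be inspected (permission denied, already exited) are skipped, so as a non-root user the answer still depends on what you're allowed to see.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Read a namespace link like "net:[4026531840]"; pid 0 means "self".
static int ns_link(long pid, const char *ns, char *buf, size_t len)
{
  char path[64];
  ssize_t n;

  if (pid) snprintf(path, sizeof(path), "/proc/%ld/ns/%s", pid, ns);
  else snprintf(path, sizeof(path), "/proc/self/ns/%s", ns);
  if ((n = readlink(path, buf, len-1)) < 0) return -1;
  buf[n] = 0;

  return 0;
}

// Lowest PID in /proc sharing our namespace of this type, or -1 on failure.
long ns_owner(const char *ns)
{
  char want[64], have[64];
  struct dirent *de;
  long best = -1;
  DIR *dir;

  if (ns_link(0, ns, want, sizeof(want)) || !(dir = opendir("/proc"))) return -1;
  while ((de = readdir(dir))) {
    char *end;
    long pid = strtol(de->d_name, &end, 10);

    if (*end || pid <= 0) continue;  // skip ".", "self", "sys" and friends
    if (ns_link(pid, ns, have, sizeof(have))) continue;  // EACCES or exited
    if (!strcmp(want, have) && (best < 0 || pid < best)) best = pid;
  }
  closedir(dir);

  return best;
}
```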

I want to ask on the list if anybody really needs octal (since two people have complained about it), and just have atolx skip leading zeroes followed by a digit, but Oliver would reply five times and drown out any other conversation. (The mailing list is still up, including the archive. For once being a separate server I don't/can't administer was a net positive, at least in context.)
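For what the no-octal variant might look like: strtoll() with base 10 already skips leading zeroes, so "010" parses as ten. Only base 0 turns a leading zero into octal. This sketch keeps hex behind an explicit 0x/0X prefix and parses everything else as decimal. parse_num() is a hypothetical name, not toybox's atolx(), and atolx's unit suffixes are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

long long parse_num(const char *s, char **end)
{
  const char *p = s;

  // Peek past whitespace and sign the same way strtoll() will.
  while (isspace((unsigned char)*p)) p++;
  if (*p == '+' || *p == '-') p++;

  // strtoll() accepts an optional 0x/0X prefix when base is 16.
  if (p[0] == '0' && tolower((unsigned char)p[1]) == 'x')
    return strtoll(s, end, 16);

  return strtoll(s, end, 10);
}
```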

April 23, 2024

Darn it, got an email notification that Google is disabling pop/imap access to gmail in September (unless I want to login on blockchain). I need to migrate my email to Dreamhost before then...

Went through my inbox and laboriously restored the unread messages, although somewhere in double digits from Oliver I stopped marking his. He's been replying as the Representative Of The Project on github too, holding threads where he solemnly comes to a decision with the bug reporter, and then presumably sends me a patch. I haven't read those threads, just skimmed to see what the actual bug report is.

Oh hey, Oliver finally noticed that I haven't been reading his stuff for weeks. (I assume that's what the body of the message is, I've just seen the title.) I'm tempted to reply with that Neil Gaiman quote, but... do I want to reply at all?

If Oliver had noticed I wasn't replying and rolled to a stop, and then poked me after some silence, I would feel obligated to re-engage and shovel through the backlog. But he's never stopped. He's never paused. He's INCREASED his output, including speaking on behalf of the project on github. Oliver does not care that he's making work for me. He does not care that reading and replying to his messages takes time and energy on my part. Even when I'm mostly saying "no" to him, it still eats time and energy, and when he objects to the "no" and I have to give a more detailed explanation and then he KEEPS objecting to the "no" because he's sure he's smarter than me and I just didn't understand the point he was making...

I find the signal to noise ratio to be poor here. Being spammed with low-quality review that results in a string of "no, because... no, because... no, because..." does not help the project. Oliver is absorbing engineering time to educate himself at the EXPENSE of the project. He's not listening, he's telling. He's not asking questions about the years of old mailing list posts or blog entries where we discussed stuff. He's seldom asking questions at all, he's making assertions. Questions are fine, if it's written up somewhere I can point him at it, and if it isn't then once I HAVE written it up maybe it should go in the FAQ or code.html or design.html or something. That way if I do a writeup the work contributes towards an end beyond just answering one person's questions. But Oliver seems to believe I owe him ENGAGEMENT, and that I am a bad person for not prioritizing him more, and I am SO TIRED.

And the longer I wait, the larger the accumulated pile of demands becomes because Oliver keeps talking to an empty room, piling up more and more posts he 100% expects me to shovel through, and any time I spend on that is time I'm not spending shoveling through my own backlog of todo items and other people's pokes. (Which at least have novelty and often shortest-job-first scheduling. Those MAY be a quick fix, or that person MAY just need unblocking rather than hand-holding and spoon feeding. Often I do get a patch and apply it. Sometimes it's "good question, wrong answer" and I can fix it or add it to the todo list.)

It's the difference between random interrupts and a screaming interrupt. One source constantly providing low-quality interrupts gets squelched. I really don't want to make it formal, but I am not scheduling a tasklet for this RIGHT NOW, and the longer the unanswered queue gets the more likely I am to just dump it. I'm losing faith that dealing with Oliver's backlog would help the project. I'm losing faith that I'm capable of helping Oliver mature into a developer that would help other projects in future. I expect he eventually will, but I personally do not have the social skills to expedite this process for a time/energy expenditure I have budget for. Yes, this is a failing on my part, I know. Failure acknowledged, I suck. Moving on to what I _can_ do...

April 22, 2024

Bit of a ping-pong day. Swap thrashing between various tasks, none of which are low hanging fruit collectable without a heavy lift. Keep rattling bars to see if any are loose...

I've done the start of a konfig.c to replace kconfig, but there's design questions kind of looming. I'm currently writing a standalone C program the build compiles with $HOSTCC and runs... Which means I'm reimplementing xzalloc() and strstart() and friends, which is a bit awkward. I mean I COULD have it pull in lib.c, but that smells like extending the scripts/prereq/ plumbing I recently did and that is intentionally as simple as I could figure out how to make it at the time. I'd kind of LIKE to do this in bash so you don't compile anything, but this much string processing in bash is awkward. (It's awkward in C too, but I'm used to it there.) And I kind of want to have this replace scripts/config2help.c while I'm there, which would be WAY more work to try to do in bash...
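For scale, the duplicated helpers can stay tiny in a standalone $HOSTCC tool. These are stand-ins written from scratch here and only named after toybox's lib/ helpers, so the exact semantics (and toybox's real signatures) are assumptions. xzalloc() returns zeroed memory or exits. strstart() advances a string pointer past a matching prefix.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Allocate zeroed memory or die trying (size 0 still gets a real pointer).
void *xzalloc(size_t size)
{
  void *ret = calloc(1, size ? size : 1);

  if (!ret) {
    fprintf(stderr, "out of memory\n");
    exit(1);
  }

  return ret;
}

// If *str begins with prefix, advance *str past it and return 1, else 0.
int strstart(char **str, const char *prefix)
{
  size_t len = strlen(prefix);

  if (strncmp(*str, prefix, len)) return 0;
  *str += len;

  return 1;
}
```

A Config.in parser can then chain checks like strstart(&line, "config ") without copying strings around.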

Since I recently fiddled with the record-commands plumbing, I ran my Linux From Scratch build "" script (from October) twice in a row under "taskset 1" to see what differences show up in two presumably identical single processor builds run consecutively in the same directory. (So I can start replacing commands in the $PATH and see if the output has any detectable differences: that's one of my big near-term consumers of record-commands output.) There are 3 build artifacts from that: log.txt with the record-commands output, out.txt with the |& tee output, and an "lfs" directory with the new chroot. If I move each of those to a save directory and run the build again in the original location, any absolute paths written out into the build are the same, so the only noise should be actual differences...

The diffstat of the captured stdout/stderr has 16 insertions/deletions, which is 4 different lines: for some reason the bash build does "ls -l bash" on the file it just built, which of course has a varying timestamp in it. There's 3 instances of "configure: autobuild timestamp... 20240422T005241Z", 2 instances of "Configuring NCURSES 6.4 ABI 6 (Sun Apr 21 19:53:21 CDT 2024)", and the rest are "-/path/to/lib/gcc/x86_64-x-linux/12.2.0/../../../../x86_64-x-linux/bin/ld: total time in link: 0.043413" with the amount of MICROSECONDS THE LINK TOOK varying between builds. (Because we needed to know!)

I can filter most of that through sed easily enough without worrying TOO much about false positives getting yanked: sed -E 's/(autobuild timestamp...|total time in link:) [0-9].*//;s/^-rwx.* bash$//'; but the "Configuring NCURSES" line is less obvious how best to trim. (I want to narrowly identify stuff to remove, not encode knowledge about stuff to _keep_, that way lies version skew.) Hmmm... I suppose if I match the parentheses at the end and just yank from those... s/^(Configuring NCURSES .* )[(].*[)]$/\1/ seems to work.

(x() { sed -E 's/(autobuild timestamp...|total time in link:) [0-9].*//;s/^-rwx.* bash$//;s/^(Configuring NCURSES .* )[(].*[)]$/\1/';};diff -u <(x<out.txt) <(x<out1.txt))

Of course I left off work on this LFS build script with pending design issues. One of them is the record-commands setup requires a toybox binary that's not part of the toybox multiplexer, which is a bit of a sharp edge about where best to get it from. The problem is logpath does argv[0] shenanigans that are incompatible with the toybox multiplexer's argv[0] shenanigans, and rather than special case the command in toybox_main() I made it only work as a standalone binary with a #warning if you compile it as a builtin. Both approaches suck, pick your poison...

The annoying part is I'd like record-commands to work both from a host build or within mkroot: the obvious way to do it in each context is very different, and I don't want to do both with if/else context detection. I just updated record-commands so you can ~/toybox/mkroot/record-commands blah blah from wherever you are and it should run the command line with the hijacked $PATH writing everything into log.txt in the current directory, and then clean itself up on the way out. But I haven't got the toybox source in mkroot, and don't want to add a dependency on that to the LFS build. Which means I'd need to build and install the "logwrap" binary into the $PATH and have the script "which logpath" and do its own setup. EXCEPT I can't trust that to be there on the host, and when it IS there maybe it's running under the first record-commands invocation and the path is already wrapped.

In theory I can just have mkroot/packages/lfs build logwrap for the target AND copy the mkroot/record-commands script from the toybox source into the new root filesystem, and run it myself to wrap the runner at the appropriate point. If logwrap is in the $PATH it won't rebuild it, but just do the setup, so it can still be used as a wrapper. Except this build sets up a chroot environment and then runs a second script in the chroot, and if the contents of THAT are to be logged...

What I was in the process of writing when I left off on the LFS work last time was a logwrap_reset() function that can run inside the chroot to _update_ the log wrapper path when a command just installed new commands, and I want to put them at the start of the $PATH but record when they get run. That can assume (or detect) that we already have a wrapper set up, and just tweak the existing setup.

Proving that toybox provides enough of a command line to set up the chroot build is one thing. Proving that toybox provides enough of a command line to run the builds that happen WITHIN the chroot is a second thing. I can do them in stages, but it's hard to sit on my hands and not attack the second part during the first part. The goal is to eventually have something vaguely alpine-shaped where the base system is toybox but any other packages you need to build under that build fine, using toybox.

I should track down who the riscv guy was at txlf and ping him, but looking at buildroot, the bios it built is an ELF file passed to QEMU via -bios, and I've done various elf build shenanigans for the "hello world" kernel stuff moving the link address around, and all I really care about in the FIRST pass is that it stop complaining about a conflict and try to actually run the vmlinux kernel I gave it. I refuse to pull in an external package dependency, but ${CROSS_COMPILE}cc -nostartfiles -nostdlib -Wl,-Ttext-segment=0xdeadbeef -x c - <<<"void x(void){;}" -o notbios seems feasible? (The -x c is needed because the compiler can't guess the language of source arriving on stdin.)

Except since I never added the partial riscv config I'd worked out to (because it didn't _work_), I dunno where it is. I know I built a riscv vmlinux that didn't work, but am not immediately in a position to repeat it. (Other than "defconfig with that one extra symbol switched on", which takes FOREVER to build. Sigh, ok, find an electrical outlet...)

Ok, I did a "git pull" in buildroot and rebuilt the qemu_riscv32_virt_defconfig target, and readelf -a on the "fw_jump.elf" in that says the .text segment starts at 0x80000000. And when I yank that argument... it still boots. Huh.

Oh. Aha! It's not booting the vmlinux, it's booting the arch/riscv/boot/Image file. The build also creates an Image.gz file in the same directory, which doesn't boot under qemu, but the MIDDLE of the three files (vmlinux->Image->Image.gz) is the one that works with qemu. And doesn't complain about conflicting mapping ranges.

April 21, 2024

Right clicked on the "Inbox" folder and thunderbird popped up the menu and immediately dismissed it, apparently selecting "mark folder as read" with no undo option. Thank you thunderbird. I had like 50 unread messages in there since the start of the month. (Admittedly half of them from Oliver.)

Android gave me the "79 files (your mp3 collection on this phone) should be deleted!" pop-up WHILE I was using the File app to play one of them. There is no "permanently fuck off" option, it will do it again over and over as long as I have this phone.

Ok, I need to add the "return" builtin to toysh, which means popping function contexts. I think I've done this analysis before, but it's been a while so let's re-do it: function contexts are created by call_function() which doesn't actually call a function, lemme rename that new_fcall(). It's called from run_subshell(), run_command(), sh_main(), eval_main(), and source_main().

The three main()s are relatively straightforward: sh_main() creates the initial function context and ->next being NULL means you can't return. The function context in eval_main() is there so I have a pipeline cursor (TT.ff->pl) that I can return to the calling code from, and to snapshot LINENO:

$ X=$'echo one $LINENO\necho two $LINENO\necho three $LINENO'; eval "$X"; echo here $LINENO
one 1
two 2
three 3
here 1

Sigh, in this old devuan bash -c 'echo $LINENO' is saying zero, but I think one of the conversations with Chet pointed that out to him and he changed it. I should wait until after the version upgrade to add tests, or maybe run tests in an LFS chroot? Hmmm...

Anyway, the transparent function context from eval should basically be ignored:

$ echo $(return)
bash: return: can only `return' from a function or sourced script

But there's a "stop context", preventing child processes from running parent commands. And return is looking PAST that sometimes:

$ x() { echo $(return); }; x

Sigh. I want to ask Chet why that DOESN'T error, but there's a significant chance that would introduce more version skew.
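A minimal sketch of the frame walk described above, in C. The struct, enum, and return_target() are hypothetical illustrations, not toysh's actual new_fcall() plumbing. In this sketch eval frames are skipped as transparent, the sh_main() frame (next == NULL) means there's nothing to return from, and a subshell frame is treated as a hard stop. That last rule produces the $(return) error, but per the x() example it does NOT match what bash does when the $() sits inside a function, so treat it as one possible policy rather than bash's.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical frame kinds; toysh's real function-context struct differs.
enum frame_kind { FRAME_TOP, FRAME_FUNCTION, FRAME_SOURCE, FRAME_EVAL,
                  FRAME_SUBSHELL };

struct frame {
  enum frame_kind kind;
  struct frame *next;  // caller's frame; NULL only for the sh_main() frame
};

// Find the frame "return" should unwind to, or NULL for the
// "can only `return' from a function or sourced script" error.
struct frame *return_target(struct frame *ff)
{
  for (; ff && ff->next; ff = ff->next) {
    // A function call or "source" is a real return point.
    if (ff->kind == FRAME_FUNCTION || ff->kind == FRAME_SOURCE) return ff;
    // A subshell is a stop context: don't look past it.
    if (ff->kind == FRAME_SUBSHELL) return NULL;
    // FRAME_EVAL is transparent, so keep walking toward the caller.
  }

  return NULL;
}
```

With this policy, an eval inside a function returns from the function, while an eval or $() at top level errors out.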

April 20, 2024

Trying to fix a bug report that the submitter closed once the issue was diagnosed and they could work around it. Nope, that's not the same as FIXING it, so I've added more comments that probably nobody will ever see in future because "closed issue". (Not a fan of Microsoft Github.) Two of those comments document my wrestling with alpine:

I tried to set up an alpine test environment (my last one was a chroot years ago), but it doesn't seem like they ship a livecd? Or at least the "extended" x86-64 image on their "downloads" page isn't one.

I downloaded their CD, kvm -m 2048 -cdrom blah.iso and got a login prompt instead of a desktop, the only account I could guess was "root", then I couldn't "git clone https://toybox" because it didn't have "git" installed. I googled and did an "apk add git" but it said it didn't know the package, "apk update" and "apk upgrade" didn't help...

This is not really a livecd.

I may have been a bit spoiled by knoppix and devuan's livecds, which set up a union mount reading the iso and writing changes into an overlaid tmpfs, with apt-get set up to install arbitrary additional packages. (Ok, you need to boot a recent enough livecd that not doing an "apt-get update/upgrade" that would fill up the tmpfs with noise doesn't complain that the package versions it's trying to find aren't available or compatible with the existing install, but that's just bog standard cloud rot trying to talk to servers that aren't local. I made puppy eyes at the devuan guys and they packaged up pool1.iso for me, with the whole repo on a big DVD image so VM bring-up doesn't require talking to servers that may not be there anymore when regression testing against an older image, and sometimes I even bother to set that up and use it properly. I have the incantations written down somewhere...)

Anyway, the saga continued:

Used the setup program to install it to a virtual disk, booted that, logged in, installed git, logged in as the non-root user I'd created, cloned the repo, there was no make... and no sudo. And "apk add sudo" didn't work. Right... Ok, installed make, there was no gcc, installed that, and now it says ctype.h not found. I have to install an additional package to get standard posix headers supplied by musl, installing the compiler does not get me headers.

This is not the friendliest distro I've encountered. Also, what's the difference between the "extended" image and the "minimal" image?

Installed musl-dev. Installed bash. And now the build is complaining linux/rfkill.h isn't installed...

Which is the point where I gave up and just installed a local busybox airlock dir to stick at the start of the $PATH for testing. I don't actually care about alpine specifically (until someone complains specifically), the question here is do the busybox commands work here, and the answer was "no" but not a deep no. The airlock setup failed because -type a,b isn't implemented in busybox find (actually the wrapper directory setup failed, which is odd because it came AFTER the airlock setup...?) which failed back to the host $PATH which meant busybox commands were doing all sorts of things and going "I don't understand this option to this command!" But fixing the airlock to use the toybox commands made the build work, which, you know, is why it's there...

April 19, 2024

The problem with cleanup and promotion of stty is I dunno what half this crap DOES, and the stty man page doesn't really explain it either.

There's a bunch of legacy nonsense leftover from 1970's tty devices that connected a physical printer (with ink on paper) with keyboard via serial cable. (Back in the day special purpose video monitors were too expensive for mere mortals, and using mass produced televisions as displays had a half-dozen different problems: heavy, expensive, hot, NTSC resolution was poor, generating the input signal yourself had regulatory issues... Technology advanced to normalize video monitors in the 1980s but Unix was 15 years old by then.) This is why the Linux tty layer is a nightmare for maintainers. Or so I'm told...

Setting serial speed makes sense (for serial devices), although independent ispeed and ospeed was last relevant when Hayes/USR/Telebit and V.32bis modems were fighting it out in the market in 1992. (The proprietary encodings all lost, the Navy bought a zillion of one of them, USR I think, as they were end of lifed but nobody else cared. That was the "fast one direction, slow the other direction" encoding that didn't have echo cancellation so didn't care about satellite transmission delays, but these days the satellite transmissions start out digital. V.32 sent basically the same data in both directions and cancelled out the echo of what it knew it had sent, which meant there was a maximum delay before the ring buffer cycled and it couldn't recognize the echo to cancel it, which never got exceeded in domestic calls but happened routing through satellites.)

Yesterday I poked at setting cols and rows without the xterm noticing the change. "min" sets minimum characters per -icanon read and I have no clue why you'd want to do that. "time" sets a read timeout but doesn't say what the UNITS are (seconds? Milliseconds?) and isn't that what poll/select are for anyway?
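A small sketch of what "min" and "time" map to, assuming glibc/Linux <termios.h>. As far as I know from termios(3), VTIME counts tenths of a second, and with VMIN > 0 it's an inter-byte timer that starts after the first byte arrives, so a plain blocking read() can batch up bursts without a poll() loop. set_noncanon_read() is a hypothetical helper that only edits the struct; applying it is a separate tcsetattr().

```c
#include <assert.h>
#include <termios.h>

// Switch a termios struct to non-canonical reads. Per termios(3):
//   min > 0, tenths > 0: read() returns after min bytes, or when the gap
//     between bytes (timer starts at the first byte) exceeds tenths/10 sec.
//   min == 0, tenths > 0: read() returns as soon as one byte arrives, or
//     returns 0 after tenths/10 seconds with nothing.
void set_noncanon_read(struct termios *t, cc_t min, cc_t tenths)
{
  t->c_lflag &= ~ICANON;   // leave echo and the other flags alone
  t->c_cc[VMIN] = min;
  t->c_cc[VTIME] = tenths;
}
```

Usage would be tcgetattr(fd, &t), set_noncanon_read(&t, 0, 5), tcsetattr(fd, TCSANOW, &t), after which a read() should wait at most half a second.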

"Line discipline" is not documented: the number selects which tty driver Linux loads to handle a serial port, there's a list of numbers in bits/ioctl-types.h (0 is N_TTY) and the kernel has MODULE_ALIAS_LDISC() lines that tag drivers as handling a specific line discipline number, but of the 16 in the 6.8 kernel only 3 might matter (other than 0, which means NOT loading a driver): N_PPP, N_SLIP, and N_MOUSE. And you don't set any of those via stty.
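Querying the current line discipline doesn't need stty at all. A sketch assuming Linux, where <sys/ioctl.h> provides the TIOCGETD ioctl along with the N_* numbers from bits/ioctl-types.h; get_ldisc() is a hypothetical helper name.

```c
#include <assert.h>
#include <sys/ioctl.h>

// Return the line discipline number attached to a tty fd (0 is N_TTY),
// or -1 if the ioctl fails (bad fd, not a tty).
int get_ldisc(int fd)
{
  int ldisc;

  if (ioctl(fd, TIOCGETD, &ldisc)) return -1;

  return ldisc;
}
```

On a normal terminal get_ldisc(0) should report 0 (N_TTY). The matching setter is TIOCSETD, which is how pppd and slattach attach N_PPP and N_SLIP, rather than anything going through stty.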

The Linux Test Project makes me sad (and mostly tests kernel anyway). The posix conformance tests (which I've never seen and last I heard were very expensive) also make me sad. Coming up with the tests the code needs to pass is WELL over half the work of most commands. And other projects' test suites either don't test anything of interest, are full of tests I don't mind NOT passing, or I never bothered to work out how to get it to run on anything but its built-in command. (They never did a TEST_HOST that I could find.)

April 18, 2024

I haven't checked yesterday's stty fix in yet because... how do you test this? I don't have physical serial hardware currently set up, and the hardware I have at hand that could do that is currently set up to use it as serial consoles, which means changing them is kinda awkward (if something goes wrong I probably have to reboot the board to get it back). I mean I should set up ssh _and_ console in parallel, which also means setting up at the desk where all the boards are instead of "laptop out at coffee shop away from endlessly barking dog"...

I'm wondering if some sort of tty master/slave thing can let me regression test this? Or strace? (The problem with "stty write, stty read and display" is if it's the SAME stty so if it's got something wrong it's likely to get it bidirectionally wrong.) But I suppose in the short term I can use debian's stty to test that MY stty set the right stuff. Yes, I am changing the speed of ptys. (It records them!)
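One way to do the master/slave idea, as a sketch assuming Linux ptys and posix_openpt() and friends: set the speed through one fd on the slave, then read it back through a second, independently opened fd. pty_speed_roundtrip() is a hypothetical harness, and it still has the same-implementation problem unless one half is swapped for a different stty (say, running debian's stty -F on the slave's path).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <termios.h>
#include <unistd.h>

// Open a pty, set the slave's speed through one fd, then read it back
// through a second fd opened separately on the same slave. Returns the
// speed_t read back, or -1 if no pty could be set up.
long pty_speed_roundtrip(speed_t speed)
{
  struct termios t;
  int master, a = -1, b = -1;
  long got = -1;
  char *name;

  if ((master = posix_openpt(O_RDWR|O_NOCTTY)) < 0) return -1;
  if (grantpt(master) || unlockpt(master) || !(name = ptsname(master)))
    goto out;
  if ((a = open(name, O_RDWR|O_NOCTTY)) < 0) goto out;
  if ((b = open(name, O_RDWR|O_NOCTTY)) < 0) goto out;

  // "Set" half: what the stty under test would do to the device.
  if (tcgetattr(a, &t) || cfsetispeed(&t, speed) || cfsetospeed(&t, speed)
      || tcsetattr(a, TCSANOW, &t)) goto out;

  // "Check" half: a fresh tcgetattr() through the other fd.
  if (!tcgetattr(b, &t)) got = cfgetospeed(&t);

out:
  if (b >= 0) close(b);
  if (a >= 0) close(a);
  close(master);

  return got;
}
```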

Another just WEIRD thing stty can do is set columns and rows for the current terminal, but xfce's "Terminal" program does NOT resize itself when you do this, so when you "stty cols 37 rows 15" bash then wordwraps VERY strangely until you grab the edge of the window and resize it (which resets the pty's cols and rows to the xterm's size). I tried "kill -SIGWINCH $PPID" but that didn't help. I thought I'd strace the "resize" command to see what that's doing, but:

$ resize 37 15
resize: Can't set window size under VT100 emulation
$ TERM=linux resize 37 15
resize: Can't set window size under VT100 emulation
$ resize -s 15 37

Oh wow, that made bash VERY unhappy. And "reset" doesn't fix it! Hmmmm. Weeeird... that will make the terminal _bigger_, but not smaller. Ooh, and the grab-and-resize is out of sync now! It thinks a window that is 20 rows tall (I counted) is 80x2 and won't let me shrink it vertically any farther. I should email the xfce guys about this... Ok, "stty rows 25 cols 80; resize -s 25 80" seems to have gotten the terminal back into something controllable. And I can shrink it to... 22x3. Which counting characters agrees with. Yay. And resizing that BACK up has remembered what the first half of the screen had, but bash has 8 lines of garbage at the bottom ala "landley@dlandley@dlandley@d..."

Does nobody else actually TEST CORNER CASES? Sigh...

So yeah, "man 4 console_codes" probably has some resize magic I could dig into (and toybox's reset.c may need a bigger hammer), but that doesn't help with stty.
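A sketch of the two halves involved, assuming Linux <sys/ioctl.h>. TIOCSWINSZ only updates the kernel's idea of the size (which is what stty rows/cols changes, and the kernel sends SIGWINCH to the tty's foreground process group), while the terminal window only changes if the emulator honors a request like xterm's CSI 8 ; rows ; cols t. That sequence comes from xterm's control sequence documentation rather than console_codes(4), and whether xfce's Terminal obeys it is an untested assumption. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

// Format xterm's "resize the text area" request: ESC [ 8 ; rows ; cols t
int format_resize(char *buf, size_t len, unsigned rows, unsigned cols)
{
  return snprintf(buf, len, "\033[8;%u;%ut", rows, cols);
}

// Tell both parties: the kernel's winsize (what "stty rows R cols C" sets)
// and the terminal emulator (which ignores the kernel's copy, so it only
// resizes if it understands the escape sequence).
int resize_both(int fd, unsigned short rows, unsigned short cols)
{
  struct winsize ws = {.ws_row = rows, .ws_col = cols};
  char seq[32];
  int len = format_resize(seq, sizeof(seq), rows, cols);

  if (ioctl(fd, TIOCSWINSZ, &ws)) return -1;

  return write(fd, seq, len) == len ? 0 : -1;
}
```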

April 17, 2024

Poking at stty, promoting which is the last thing left in an old todo file I'd like to delete and it's only 460 lines so presumably reasonably low-hanging fruit? The problem is, it's basically impossible to TEST in an automated fashion. (Or at least I haven't got a clue how, except for setting values and having it spit them back? For what that demonstrates?)

The list of speeds is duplicated in the command: I've got it in lib/lib.c but... xsetspeed() just calls the ioctl(), it doesn't have a way to convert a baud rate to/from the magic BOTHER values the ioctl eats, which we need to display the values. Ok, break out the array into a static, add new to/from functions and make the existing function call the converter... Sigh, the conversion is evil magic, what's it doing... Ok, the magic extension bit for "we ran out of speeds, let's glue another 0-15 range on" is bit 13 (4096), and +1 because I skipped B0 in my table (why save zero in the table when you can't set the hardware to rate zero), and then BOTHER isn't actually a usable value (it's defined as a mask, but the first VALUE they made a macro for is 010001 for NO APPARENT REASON, they just wasted another entry), so there's two magic +1 in there depending where you are in the range, and then you have to subtract the first range when setting the second (except it's not -16, it's -14 because we skipped B0 and then we skipped BOTHER)...

And previously I rolled all that up into a test adding a constant, which I commented insufficiently, the commit comment did not explain, and looking at it I don't trust it. Great. Ok, cp toys/example/{skeleton,bang}.c and then edit bang.c into a test function with the size array and the #defined constant array (all the B50, B75, B110 and so on), and make sure that all the "from" and "to" conversions produce what the constants SAY they should produce... No I am not checking bang.c in, I confirmed it but that really doesn't seem to be the kind of thing we need to regression test? (Unless the values are different on BSD and such, in which case... I'm not sure I CARE if it works there?)
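The bang.c check above, reconstructed as a sketch. It assumes the Linux asm-generic encoding: B50 through B38400 are 01 through 017, and B57600 through B4000000 are CBAUDEX|1 through CBAUDEX|017, with CBAUDEX|0 being BOTHER. Some other architectures and the BSDs use different values. The function names are hypothetical, not toybox's lib/lib.c API.

```c
#include <assert.h>

// Linux asm-generic c_cflag speed encoding; CBAUDEX (and BOTHER) is 010000.
#define SPEED_EXT 010000

static const unsigned speeds[] = {50, 75, 110, 134, 150, 200, 300, 600, 1200,
  1800, 2400, 4800, 9600, 19200, 38400, 57600, 115200, 230400, 460800, 500000,
  576000, 921600, 1000000, 1152000, 1500000, 2000000, 2500000, 3000000,
  3500000, 4000000};
#define NSPEEDS ((int)(sizeof(speeds)/sizeof(*speeds)))

// Table index to encoding: +1 because B0 isn't in the table, and -14 for the
// extended range (index 15 is 57600, which is CBAUDEX|1 since CBAUDEX|0 is
// BOTHER rather than a speed).
int index_to_bits(int i)
{
  assert(i >= 0 && i < NSPEEDS);

  return i < 15 ? i+1 : (i-14)|SPEED_EXT;
}

// Baud rate to encoding, or -1 if it isn't one of the standard rates.
int baud_to_bits(unsigned baud)
{
  int i;

  for (i = 0; i < NSPEEDS; i++) if (speeds[i] == baud) return index_to_bits(i);

  return -1;
}

// Encoding to baud rate, or 0 for B0, BOTHER, or bits we don't recognize.
unsigned bits_to_baud(int bits)
{
  int low = bits & 017;

  if ((bits & ~(SPEED_EXT|017)) || !low) return 0;

  return speeds[(bits & SPEED_EXT) ? low+14 : low-1];
}
```

Round-tripping every table entry and spot-checking the range boundaries (B38400, B57600, B4000000) covers both magic +1s and the -14.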

You'd think this would just be "set an arbitrary speed" by feeding it an integer and having the driver work out what clock divisor to set the hardware to, but alas half the drivers out there don't do that because modems and physical tty devices didn't do that (they had standard speeds), and those were dominant users of serial ports forever. So there is some way to set an arbitrary one, but the last couple drivers I looked at ignored what you tried to set through that and only used the B38400 style values. And you can set it to 4 million bits/second through that, which is pretty much the limit of what serial hardware's going to do with a cable longer than a few inches anyway: if you need to go faster than half a megabyte per second, you might wanna twist the wires and have a packet protocol for error correction and retransmission. I mean yeah you can layer ppp etc in userspace, and people do... The point is 500 kilobytes/sec hasn't been limiting enough for people to put much effort into fixing it because if you push that hardware much further things get weird anyway because of how the cables and signaling work.

The fancier protocols like USB send complementary data across two wires twisted together with encoding that breaks up runs of zeroes and ones and makes sure there are roughly equal numbers of each to avoid radio interference weirdness, and they care about things like "pin capacitance" that just didn't come up much with slow serial data... In the USB turtle hat we just grabbed an off the shelf USB 2.0 PHY ("physical transceiver") chip that sent/received the wire signals for us and gave us 4 bit parallel data running at 50MHz, so we could send/receive a byte every 2 clocks at a rate our FPGA could run at. (Going that fast over millimeters of wire is a lot less fraught than going that fast over even a few inches of wire. Presumably signals work better in metric.) For the turtle's builtin USB ports we were talking USB 1.1 to a hub chip that downshifted for us, so it was an order of magnitude slower. You could still plug USB 2.0 into the other end of the hub (on the 4 exterior ports the board exposed to the outside world) and the hub chip would forward packets to the USB 1.1 "host" connection inside the board, and it presumably all worked because the USB protocol is a call-and-response thing where the "device" end mostly just replies to packets sent by the "host" end asking it for data. So it would go slow but work... if we'd ever made a bitstream that actually IMPLEMENTED a USB host controller. (The stuff for turtle board was the other end, USB gadget side. Which is simpler because it can advertise a single protocol and doesn't care what other devices are plugged in, while the host has to support lots of different protocols and track the state of all the attached devices.)

April 16, 2024

Sigh, I hadn't replied to Oliver since the 8th but I fell off the wagon. I knew better. (Ok, technically I replied to Jarno, but...)

And in reply, Oliver says I can just wait to read his replies so he can speak for the project to everybody on the mailing list I maintain, without me having to care what he says. Yup, that'll solve everything... Oh well, as long as I have his permission to ignore him (clearly something I needed to have). I wonder how long it'll take him to notice?

Rather than try to deal with magic "/usr/bin/env" path or making sure I "bash" everywhere instead of just running it, I want to merge scripts/ into scripts/ The reason it's separate is the config plumbing needs to call it: anything sourcing is going to try to import generated/ and generated/Config.probed. That might be another vote for bumping "rewrite kconfig" up the list, although a drop-in replacement for the old kernel kconfig would still have the same sequencing issue.

There are only 2 probed symbols left: TOYBOX_ON_ANDROID and TOYBOX_FORK. In theory both of them could just check #defines, the first __ANDROID__ and the second __FDPIC__. But configuration dependency resolution needs config symbols, the C only gets compiled (and can check #ifdefs) after the .config file is written out and processed. That's the real sequencing issue. Is there an easy design way to have a config symbol "depends on" a #define? The current upstream kernel kconfig is turing complete and can do all sorts of things (including rm -rf on your home directory), but I'm unaware of a specific existing syntax for this sort of check. I also dunno what's gotten migrated into busybox, buildroot, u-boot, or whatever other packages are using kconfig forks these days. "depends on #ifdef __FDPIC__" is easy to implement but "a subset" and "a fork" are different things from an "other people learning this stuff" standpoint. Forks diverge further over time, once I start ADDING stuff there's no obvious bright line way to say "no" (or regression test against another implementation)...

The other thing this sort of implies is "depends on #ifdef __SELINUX__" except that requires an #include before the test because the symbol is defined in a header rather than built in to the compiler. The android guys patched their compiler to say __ANDROID__ without #including any of the bionic headers. (I don't know WHY they did that, but it's what the NDK is doing and you work with the toolchain you have, not the one you'd like. The compiler also says __linux__ but that's the ELF ABI it's generating when it writes out a .o file.)

Hmmm, I do NOT want the plumbing automatically sucking in dependencies "because they're there", but dependencies that don't show up in the config when not available ALSO means they'd magically vanish when not available, which means the build DOESN'T break if you told it to build against zlib and zlib wasn't there in your build environment. The config symbol would instead silently switch itself off again because dependencies, and silently working with a slower fallback isn't what they ASKED FOR. Breaking at build time (the current behavior) seems like the right thing there. Hmmm...

Tricksy. It would be nice if the kernel, uclibc, busybox, buildroot, and u-boot had already gotten together and SOLVED this for me, but it doesn't look like they were even asking questions along these lines.

I suppose I can pipe the cc -dM output through sed to produce config symbols in one pass (even with some __has_include() nonsense at the start) which means I can do it CHEAPLY. Something like :|${CROSS_COMPILE}cc -dM -E -|sed -En 's/^#define __(FDPIC|ANDROID)__ .*/CONFIG_\1\n\tbool\n\tdefault y/p' . That still needs to happen at config time instead of make time, but maybe it ONLY has to happen at config time? I think scripts/ doesn't read, it just reads .config. Still a question of WHERE to put "FDPIC" and "ANDROID" though, the LOGICAL place is in the top level file. There just isn't a syntax for it.

Alright, what did the kernel guys add for this? Documentation/kbuild/kconfig-language.rst says depends on $(cc-option,-fstack-protector) on line 538 (long after it's done explaining what "depends on" is, this is not documentation it's a wiki page of notes.) Which is not what I want, a #define and a command line --compiler-option are two different things. The other syntax it mentions is def_bool $(success,$(srctree)/scripts/ $(CC)) which is the outright turing complete "go ahead and run rm -rf ~ when pulling in an external module, why not" stuff that made me nope out when they added it in 2018. I mean make can already do that, but CONFIGURE doing it is new.

I want "preprocess this source snippet, then set this list of symbols based on output strings being found or not being found in the result". I'm not spotting it in the existing kconfig kernel documentation. I can make a shell script that does it, but... I've GOT that already, and would like to avoid having to call it from 2 places so I don't have the freebsd guys bugging me about what shell to call it WITH just because they made a bad call years ago and are stuck with it now.

I can just take the call to scripts/ out of scripts/ and just have the Makefile call "bash scripts/", which would make the BSD guys happy. That also means yanking the "Config.probed changed" warning...

Ah, the other problem is that config2help parses, which means pulling in generated/ That's why needed to call it.

April 15, 2024

Called the tax lady and got through, confirming that she filed an extension. Yay.

So many messages from Oliver, speaking for the project to other people on github, dictating ultimate truth instead of making suggestions or asking questions. I am so tired. It's increasingly hard to edit my replies to be polite. (And of course every time I DO object, I'm being unreasonable because he IS the only arbiter of absolute truth in the universe...)

I should be an adult. I should not be bothered by this. It just... adds up.

April 14, 2024

Night on airport floor. Cold, loud, and the alarms keep going off. (Pretty sure the alarms are intentional to punish people doing what I'm doing. The cold is probably to bank up air conditioning so when the sun comes up and crowds arrive the climate control has a headstart, arbitraging cheap overnight electricity.)

Once again trying to charge my phone from the laptop, since that's the only thing I could plug into the wall. Did not get a full charge this time either.

It's weird to consider that you do not need to show a boarding pass to go through security theatre. They don't care whether you're getting on a plane, you can go through to meet people at the gate. What the TSA is even theoretically securing _against_ remains an open question.

Yesterday's "evolution of computers" rant reminded me of the theory that living cells evolved from zeolite deposits near undersea volcanic vents. Zeolite is a mineral which naturally develops a bunch of roundish little empty niches on its surface in certain chemical environments, which then naturally develop an electric charge near active volcanic vents, and the wide range of energetic organic compounds that constantly flow out of the vents (even today) can often form an organic film somewhere between soap scum and the inner cell membranes around various organelles inside the cell. This electric charge can then discharge itself to ratchet all sorts of other chemical reactions "upwind" against entropy, and today we call this a cell's "resting membrane potential" and the main job of molecules like ATP and NADH and so on is to recharge the membrane potential, which is the cell's actual chemical synthesis worktable. The theory is this process developed interesting molecules that spread from indentation to indentation in some patch of zeolite, and then contaminated other patches of zeolite near other vents (in which case viruses may have predated free-floating cells), and one thing that made molecules more "interesting" (or at least more likely to reproduce and spread) was building/improving membranes to collect higher concentrations of interesting molecules (collect the components, maintain a better electrical charge across the membrane, catalyze reactions likely to turn components into more complicated molecules using the membrane charge). After a long enough time, some cells' "better membrane" process didn't just extend membranes across holes faster (both to fix damage and to colonize new surfaces) but extended out protrusions that closed themselves off, turning the membrane into a free-floating sphere, inventing free-floating cells. And then those cells could bud off another one when they'd collected enough chemicals (so yeast budding predated full cell division)...

I miss studying biology.

Got home. Collapsed. The usual.

April 13, 2024

TXLF day two

Signing (docusign, there's no WAY that has any legal weight) the actual "put the house on the market when the realtor is ready" paperwork. She's listing it for only $125k less than the tax assessment (Fade negotiated well), so the amount various contractors have invoiced to take out of the sale price has increased the sale price... approximately one to one. Ok then. And it looks like the realtor is taking 6% and then any buyer's realtor would take 3% on TOP of that? So 9% commission total? Sigh, Fade read this closely, I leave it to her.

Our usual handyman Mike was very insistent that he could do a lot of the prep work cheap and get paid at closing, and "a lot" became EVERYTHING ALL OF IT GIVE ME THE WORK, and he underbid the other contractors and bit off waaaaay more than he could chew, and is now the one holding up the listing. (Or so the realtor told me on the phone yesterday, I haven't spoken to him since leaving Austin.) The realtor said she's going to change the locks and have her team finish the last of the work. Fine. Good luck. I'm still letting Fade handle all this because I have not recovered sufficient emotional resilience in this area to have coherent opinions. We are in the process of washing our hands of it, and just need to navigate the extrication.

Back to the Palmer Center for TXLF: Spent fifteen minutes in the talk room getting laptop hdmi displaying on the projector. Yay. (The trick was 1024x768 and using the mirror checkbox in the main xfce "display" widget, ignoring the destination selector pop-up because clicking on that does NOT mirror the displays.)

The riscv guy said he'd be in the dealer's room at 9am, but the dealer's room isn't open. I'd email him, but I do NOT remember his name. (I brought reading glasses this trip, so I have to tilt them and squint to read people's badges. My see stuff far away glasses are on the desk in my bedroom in Minneapolis.) He already knew my name and I forgot to ask his: I almost certainly know who he is, he implied we've exchanged email before, the question is WHICH person. Email does not help attach a name to a face. I'm not sure how to check the schedule for people running booths in the dealer's room, and the signs only say which company it is, not who's running the booth... Eh, likely to bump into him later.

Sitting in a talk called "what I wish I'd known about containers", which so far I could have given except for the "terminology" part: a container "image" like the "RHEL Universal Basic Income Image", a container "engine" (podman, docker) so basically the launcher, a container "orchestrator" (kubernetes, swarm) which I think is doing cluster management at a level I have never personally had to care about. (I remember back in the beowulf days when there was a multi-ssh tool that connected to multiple systems and mirrored what you typed at all the sessions. We've come a ways since then, but not THAT far.)

He brought up an "unshare, cgroups, seccomp, selinux" slide near the start, and now he's explaining the unshare command. I'm curious if there's anything I should add to the unshare command I wrote for toybox. He's using all --longopts for his unshare --user --pid --map-root-user --mount-proc --fork bash example. (I got to ask a question: if --mount-proc used any special flags or anything to distinguish it from simply "mount -t proc /proc /proc" inside the container. He didn't know. Eh, I can strace it.)
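For my own reference, a rough C sketch of what that unshare command line boils down to. This is neither toybox's unshare nor util-linux's code, and the --mount-proc part is an assumption until I strace it: I'm guessing it makes / recursively private and then mounts a fresh proc with nosuid,noexec,nodev from inside the new PID namespace. run_in_container(), write_proc() and format_root_map() are made-up names, and it uses the raw SYS_unshare syscall (toybox-style) rather than the libc wrapper.

```c
#include <fcntl.h>
#include <linux/sched.h>   /* CLONE_NEWUSER, CLONE_NEWPID, CLONE_NEWNS */
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/syscall.h>   /* SYS_unshare */
#include <sys/wait.h>
#include <unistd.h>

/* Format one id_map line mapping id 0 inside the namespace to "outer". */
static int format_root_map(char *buf, size_t len, unsigned outer)
{
  return snprintf(buf, len, "0 %u 1\n", outer);
}

/* Write a string to a /proc file, returning 0 on success. */
static int write_proc(const char *path, const char *str)
{
  int fd = open(path, O_WRONLY), ok;

  if (fd < 0) return -1;
  ok = write(fd, str, strlen(str)) == (ssize_t)strlen(str);
  close(fd);

  return ok ? 0 : -1;
}

/* Hypothetical helper: roughly --user --pid --map-root-user --mount-proc
 * --fork. Returns the child's exit status, or -1 on setup failure. */
static int run_in_container(char *argv[])
{
  char map[64];
  unsigned uid = getuid(), gid = getgid();
  pid_t pid;
  int status;

  if (syscall(SYS_unshare, CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNS)) return -1;

  /* --map-root-user: an unprivileged process must deny setgroups
   * before it is allowed to write gid_map. */
  format_root_map(map, sizeof(map), uid);
  if (write_proc("/proc/self/uid_map", map)) return -1;
  if (write_proc("/proc/self/setgroups", "deny")) return -1;
  format_root_map(map, sizeof(map), gid);
  if (write_proc("/proc/self/gid_map", map)) return -1;

  /* Keep mounts made in here from propagating back out to the host. */
  if (mount("none", "/", NULL, MS_REC|MS_PRIVATE, NULL)) return -1;

  /* --fork: only children land in the new PID namespace (as PID 1). */
  if ((pid = fork()) < 0) return -1;
  if (!pid) {
    /* --mount-proc (assumed flags): proc as seen from the new PID ns. */
    if (mount("proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL))
      _exit(127);
    execvp(argv[0], argv);
    _exit(127);
  }
  if (waitpid(pid, &status, 0) < 0) return -1;

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with something like char *args[] = {"bash", NULL} should give a shell where "echo $$" says 1, assuming the kernel allows unprivileged user namespaces.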

His selinux explanation was just a slide saying "", and now he's brought up that page which is a plea and a link to somebody's video. Nope. (Debian hasn't got selinux even installed by default, it's one of the things I like about it.)

Sigh, and now it's all podman ephemera. I should go dig into "bocker", or the "implement containers in 100 lines of C" link, or the rubber-docker repository...

Ooh, he just ran an "lsns" command in passing that looks interesting. And "man unshare" has stuff about /proc/pid/thingies used to export shared namespaces or something? Ok, add those to the todo heap. I have learned something this talk! Time well spent.

He also mentioned that "runc" and "crun" are both container runtimes, in a "fun facts" sort of way. I note that "runtime" was not in his image/engine/orchestrator terminology slide. Is this the container's PID 1 that talks to the outside world through inherited pipes, maybe? I've seen _previous_ container plumbing talks, I just mostly haven't gone on a deep dive into this because too many plates spinning...

Good point about persistent vs ephemeral data. (I was aware of the topic but he highlighted it as a thing administrators spend brain on setting up containers for people.) For "persistent" he says bind mounts and "volumes" are the main options, but did not explain what volumes ARE. (So, like, qcow? I note that bocker assumes you have a btrfs mount and uses the weird magic snapshot stuff in that. The last time I heard anything described as a "volume" was IBM S360 DASD volumes from the 1990s, and since IBM peed all over KVM until it smelled like them it's no surprise to see the term show up here, but what do they MEAN by it in this context? Loopback or NBD mounted disk image, maybe? The raid management plumbing?)

I gave my mkroot talk! Hopefully, someday, there may be a video posted. Argued with the projector a bit _again_ but got there early enough to have time for it. Turns out you have to select "mirror" from the output type selection pop-up AND click the unrelated "mirror displays" checkbox. Can't blame the venue, that's XFCE user interface being... I can't say "disappointing" because my expectations weren't really violated here. Open source cannot do user interfaces, XFCE is _less_bad_ than most.

I got through about half the material I'd prepared, and of course not in the order I wrote down. My "simplest possible linux system" talk from 1927 (I mean 2017) started with a rant about circular dependencies because that's the big problem here: everything needs something else _first_, both to run it and to explain it. So the urge to stop in the middle of an explanation and TANGENT into the thing you need to understand first is very strong, and I'm always weak to that. (ADHD! Weave the tangents into a basket!)

The fundamental problem with system bringup dependencies is the last evolutionary ancestor that could actually light a fire by rubbing sticks together went extinct. In the microcomputer world, the last piece of hardware that could boot up without using a program saved out by some other computer was the MITS Altair, which could toggle a boot program into memory using the front panel switches and buttons. (Select address, select value, press "write". Eventually you flip the "cpu is stopped" switch to "run" and let it go from the known address it resets to when power cycled.)
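To make "toggle it in" concrete, here's a toy model of a front panel: an address latch, a tiny RAM, and the examine/deposit/deposit-next operations. The struct and function names are made up for illustration, and the real Altair panel had more switches (examine next, single step, protect, and so on) than this models.

```c
#include <stdint.h>

/* Toy front panel: address latch plus a little RAM. */
struct panel {
  uint16_t addr;          /* address latched by "examine" */
  uint8_t mem[256];       /* tiny memory, indexed by the low address byte */
};

/* EXAMINE: latch the address set on the switches. */
static void examine(struct panel *p, uint16_t switches)
{
  p->addr = switches;
}

/* DEPOSIT: store the data switches at the latched address. */
static void deposit(struct panel *p, uint8_t data)
{
  p->mem[p->addr & 0xff] = data;
}

/* DEPOSIT NEXT: advance the address, then store, so a program can be
 * entered byte after byte without re-keying each address. */
static void deposit_next(struct panel *p, uint8_t data)
{
  p->addr++;
  deposit(p, data);
}

/* Toggle in a whole program starting at address 0, where an 8080
 * begins executing after reset. Returns bytes entered. */
static int toggle_in(struct panel *p, const uint8_t *prog, int len)
{
  int i;

  if (len <= 0) return 0;
  examine(p, 0);
  deposit(p, prog[0]);
  for (i = 1; i < len; i++) deposit_next(p, prog[i]);

  return len;
}
```

For example toggling in C3 00 00 (8080 "JMP 0000h", the classic infinite loop) is three deposits, and then you flip to run.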

In the minicomputer world DEC's PDP minicomputers could boot from the tty serial peripheral devices (dunno if that was a small ROM or a physical circuit that held the processor in reset until it finished a read/write loop or what, it's probably in the PDP-8 FAQ or something). The ASR-33 teletype and similar (big clackety third party printer+keyboard I/O peripheral) included a paper tape reader/writer on the side, and not only were there mechanical punching keyboards that could punch paper tapes as you pressed the keys via basically clockwork (or presumably an ASR-33 could do it running standalone), but you could work out the bit patterns and punch appropriate holes in a blank tape by hand with a push pin if you really had to. This is how the PDP-7 bootstrapped unix for the first time, by loading a paper tape. Haven't got a bootloader program? Work one out by hand with the processor documentation and graph paper, punch a tape by hand, then turn the machine on and feed in the tape you made. You can't brick that without damaging the hardware.
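Working out which holes to punch is just reading off each byte's bits, one row of tape per byte. A sketch that renders one row as text, with the layout as an assumption: bit 7 on the left and the smaller feed (sprocket) hole between bit 3 and bit 2, which is my understanding of 8-level tape but check the teletype docs before you go at anything with a push pin. tape_row() is a made-up name.

```c
#include <stdint.h>

/* Render one row of 8-level paper tape for byte "b" into out[10]:
 * 'o' = punched hole (1 bit), '.' = no hole (0 bit), '*' = feed hole.
 * Assumed layout: bit 7 on the left, feed hole between bits 3 and 2. */
static void tape_row(uint8_t b, char out[10])
{
  int bit, i = 0;

  for (bit = 7; bit >= 0; bit--) {
    out[i++] = (b >> bit) & 1 ? 'o' : '.';
    if (bit == 3) out[i++] = '*';
  }
  out[i] = 0;
}
```

So 0x41 (ASCII "A") comes out as .o...*..o under this layout.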

But modern computers can only read code written by another computer program. Lots of programs take human input, but it's a program writing it out in machine-readable format. A blank computer with no program can't do ANYTHING without lighting its fire from another computer. The olympic torch relay from the sacred fire distribution point is mandatory, even matches are obsolete outside of the embedded space.

Saw Elizabeth Joseph's talk on mainframeness and s390x. (She was at least the third presenter in this room who couldn't get the HDMI to work in the first 5 minutes.) She says I should join the "linux distributions working group" and apply to IBM LinuxOne to get an s390 login, a bit like the mac login Zach van Rijn gave me. I mean there's no obvious reason I _couldn't_ cross-compile all the toolchains from s390x. Other than nobody else having done so and thus they're unlikely to work. (Let's make a cross compiler from s390x to superh! That's clearly going to be a well-tested codepath...)

Went to the dealer's room, the sci-five guy did not get qemu working last night. I gave him a card and he said he'd email me. Forgot to get _his_ contact info again, but presumably he'll email me?

Bus to the airport from palmer center is a direct shot, good to know. I had the second pipeline punch while giving my talk, but I still had the "rio" flavor monster can left over at the airport and of course security theatre wouldn't let it through. It's kind of nasty and I wound up pouring most of it out. Oh well, learning experience. (Never been to Rio, for all I know that's what the city tastes like. Not in the habit of licking architecture. Pipeline Punch is guava flavored, Khaos is tangerine, Ripper was pineapple, this was not a recognizable fruit. Maybe it's Durian. I wonder what Durian tastes like?)

April 12, 2024

TXLF day one.

Walked to the Leander Light Tactical Rail station this morning: it's about 4 miles, which is about "there and back" to the UT geology building's picnic tables from my old house. Left well before the sun came up, so it wasn't too bad. Bought a "3 for $7" deal on Monster on the walk. Two pipeline punch and a new "rio" flavor, green can with a lady dressed as a butterfly on the can. Had one of the pipelines on the walk, and breakfast was about 1/3 of the tupperware container of strawberry lemon bars fuzzy gave me. Bit more sugar than I'd like, but hey: walking the calories off.

Rode the rail to the end (a downtown dropoff point) and walked to palmer center from there, across the 1st street bridge. All of this early enough that the sun wasn't doing much yet, and it was still reasonably cool, because many years ago I gave myself heatstroke by walking to an earlier Texas Linuxfest in 110 degree midday austin heat and rehydrating with the Rockstar "hydrating" tea flavored abomination: when the caffeine wore off I thought I was having a heart attack, had to lie prone for most of an hour, and I suspect that's what damaged my left eye. (Blind spot in that one's three times the size of the blind spot in my right eye. It's in the right "optic nerve plugs in here" place but should not be that big, and I first noticed it the next day.) I've been very careful NOT to push stuff like that again, and yes I was going "drinking the pipeline on the long walk is not the smartest thing" but I hydrated a LOT before heading out and the sun wasn't up yet, and there are (terrible) beverages at the venue. (And spoilers: I had lunch at a nearby burger place, with ISO standard diet coke.) I'm generally fine while walking, I can "walk it off" for a surprising number of issues. It's when I STOP that it catches up with me.

Of course traveling to the venue so early in the morning means the tax lady wouldn't have been there yet when I went past on the light rail, meaning I basically did not manage to make it to the tax office this trip. (It's a half-hour walk each way from the house and at least twice as far from Palmer Center, so without a car or bike "just drop by" is an hour time investment or a Lyft fee, and their voicemail message basically said they're not taking visitors right now, and yesterday I'd have gotten there around 5:30 so they might have left already anyway). I emailed her to request she file an extension. I should follow up on monday, but I'm not entirely sure how if they're not answering their phone and don't reply to my email...? (If I really have to, I can probably file my own extension. Or have another tax person do it. But... not today.)

Checked in to TXLF, got a bag with a t-shirt proclaiming a date from this decade. Yay! That's been a bit of a problem with my stash of t-shirts, I'm embarrassed to wear something from a conference in 2014 because that's a decade ago now. Yeah I'm old, but I prefer not to broadcast it quite THAT much, and I think my last in-person conference was pre-pandemic? (The TXLF guys say this is their first in-person conference SINCE the pandemic, they went virtual for a while.)

Eliminating talks given by a CEO or about Kubernetes, the first thing I wanted to see was a 4:30pm talk about bash (which I eventually walked out of after 15 minutes into a 1 hour talk, because the guy was still going on about how to clone his github to set up his testing framework and had yet to actually say anything about bash except how to check the version number). Hung out in the dealer's room a lot before then. 2/3 of the booths are pointy hair nonsense too, but there's still more interesting people running booths than giving talks.

Bothered the python booth people to see if maybe there's a ph7/mruby variant for python? Which seems unlikely due to the 3.7 expiration being quite so rigidly policed: not only can There Be Only One Implementation, but there can be only one active VERSION of that implementation. Three different forks of python are _going_ to vary more than python 3.6 vs 3.7; if people using slightly old versions is THAT much of a problem for them, this is way too brittle to have a compatible ecosystem. Add in the general tendency for embedded versions NOT to stay cutting edge all the time and constantly replace themselves... The embedded world is still installing 2.6 kernels half the time: we're BIG into "stable", and when we do implement new stuff we try to 80/20 a compatible subset cutting as many corners as we can get away with. Python's Progeria Policing would be quite a headwind for an embedded version.

Anyway, the python guys suggested two projects, micropython and circuit python, which turns out to be a fork of micropython. Google for "tiny python" also finds "tinypy", "tiny python", and "snek". And has a wiki with links to a bunch of implementations: python written in python, lithp, php, one written in haskell... The google summary for the link shows "rustpython", which I haven't scrolled down to yet but I'm pretty sure that's not in the first half of the page. (Google seems to have a bit of a bias here. Then again maybe that's the most recent change to the page, I dunno how much of the previous stuff here dates back to Python 2.0 before they started aggressively purging their ranks. Logically... probably most of it.)

Anyway, I'm interested in maybe adding ph7 and mruby and whatever the python equivalent is to mkroot as packages. You want this language on the target? Sure, here's how to build it. (Although for me rust goes in the "riscv" bucket: wake me in 5 years if it's still a thing, after I've done enough others that "yeah, if I'm adding or1k I suppose riscv isn't _less_ important"...)

Speaking of, I bothered the guy at the Sci Five booth about my inability to get qemu-system-riscv to boot a vmlinux built from vanilla source without external dependency packages, which is the hack buildroot used. This architecture still has NO BOARD DEFCONFIGS, just the "use the default y/n for each symbol and thus build hundreds of modules" defconfig. He identified what buildroot was using that firmware for: riscv needs some sort of hypervisor layer so the kernel can call into a vendor-supplied equivalent of Intel's system management mode and run code behind your back, or something? (Perhaps it's more like Sony's playstation Hardware Abstraction Layer they did their PS3 Linux port on top of? Because that ended well.) The point is, there IS a "CONFIG_RISCV_SBI_V01" symbol in the vanilla kernel I can enable to build one into the vmlinux, and the help text for that symbol says "This will be deprecated in future once legacy M-mode software are no longer in use". So his workaround is something they've promised to remove. How nice. And then of course when I did build that, I was back to the "qemu wants to map a rom in the same place as the vmlinux so refuses to load" problem, which I showed him and he went "huh" and promised to take a look when he had time.

Staying at my house tonight turned out to be fraught: I pinged the realtor to be sure that A) it's not currently on the market (it is not), B) no contractors are doing work on it tonight (they're not), but rather than answer my text she voice called me and wouldn't get off the phone for 20 minutes trying to find me a hotel. (I didn't ask her to do this, that's not what I wanted, it's still my house, stop it. Her REASONS for saying I couldn't stay at my own house back when my talk was approved DO NOT APPLY yet. She has not come up with a DIFFERENT reason, she's just squicked by me being in HER house.)

Once it became clear I wasn't taking no for an answer without some sort of actual REASON, me spending the night in what's still technically my house then became HER PROJECT where she had to drop off an air mattress and towels and so on, and... I didn't ask for that? I couldn't STOP her (I tried), and then she texted me another FOUR TIMES about it at various points during the day until I blew up at her. Look: I just dowanna check an unknown hotel for bedbugs, potentially oversleep, and then work out transit from wherever it is to the venue in the morning. The #10 bus picks up from "within sight of the house's driveway" and drops off within sight of palmer center. This is a known place that I technically still own and is not being used. It's a stretch of floor, behind a lockable door in a climate controlled space, with a shower and electrical outlets for charging stuff (which we're still paying the monthly electric bills for). I have spent the night in worse. It's NOT A BIG DEAL, and I am not HER GUEST. My flight out on sunday takes off at 5:30 am so I'm planning on spending tomorrow night on the floor of the airport (otherwise I'd have to leave at 3 am anyway), which I have also done rather a large number of times before (usually without warning), which has neither a lock nor a shower. I don't plan to leave trash in the house or anything, and I intend to be out before sunrise. I shouldn't have to spend more than half an hour trying to GET PERMISSION to do this.

This is my relationship with the realtor in a nutshell: what I want to do, and what I consider obvious to do, is completely irrelevant to her. It simply does not fit into her head. She will force me to do everything exactly her way unless I make a scene, and then it's a big production that's my fault when all I wanted was for her to just not. Can we NOT replace the (completely undamaged) floors? No, that was not an option. And now the floors have been replaced wrong, the new not hugely waterproof flooring in both bathrooms up to the edge of the shower (because Mike apparently stopped listening to her at some point too). Apparently I should feel guilty about "the thing I said we shouldn't do at all" being done wrong over my objections, because we didn't use HER contractor to do it.

Sigh. I have a finite capacity for politeness processing, which I've been sadly overbudget on the past couple months. I can smile and stay silent on most things, or walk away and let them get on with it without me, but diplomatic negotiating to "let the other person have my way" is something I've been handing off to Fade where possible. I am so tired.

Dinner at the HEB. I bought all their remaining cans of checkerboard tea, so I have something other than energy drinks to drink at the conference tomorrow.

I should have brought a USB outlet charger. I thought I had one in the backpack, but apparently not. My phone is "charging slowly" from my laptop, which has to stay on for it to do so. It has not been at 100% this entire trip, but has brought up its "dying in an hour" warning more than once. Overnight last night at Stu's place got it to 85%. (It's also possible I'm just not getting enough sleep...)

April 11, 2024

Flying to Texas LinuxFest today.

Called the tax lady, but voicemail says they're full and not listening to voicemail. Huh. I knew I can't get an appointment now, but I need to file an extension (which in previous years took them like 30 seconds), and would like to hand them a pile of paperwork while I'm in town to stick in a folder until they DO have time to look at it. (Taking pictures with my phone to email to them violates my "the phone does not handle money" rule, which covers giving it my identity theft number as well. Kinda all over the tax info...)

Airport, airplane to Austin (no luggage, I can fit a couple changes of clothes in my backpack), bus to the house (because that's the busses I know, didn't fish the key out of the lock box but peeked in through the windows and grabbed the mail; all spam, forwarding should have kicked in by now).

Alas, by the time I arrived at the house the half hour walk to the tax place would have put me there well after 5pm. Showing up unannounced after hours while they're slammed seems impolite, maybe I can do this tomorrow morning. Instead I had dinner at the HEB (where I bought several cans of checkerboard tea; they're fully stocked because I haven't been buying it).

Then I took the light tactical rail to visit Fuzzy and Stu, and Fade got me a lyft from Leander station to Stu's house. Fuzzy is stressed. Peejee has lost weight. Stu was mostly asleep.

April 10, 2024

Speaking of languages with multiple implementations (I.E. _real_ programming languages), there's an embedded "mruby" implementation of Ruby, and I got asked if that works with mkroot. (Or at least I'm choosing to interpret the question that way, there was some confusion.)

The mruby downloads page provides a microsoft github link to dynamically generate a git snapshot of a tag from a project. Meaning the release archives go away when microsoft github does. A hard dependency on a microsoft cloud service is... "not ideal". But I guess it's not THAT much worse than sourceforge links persisting in 2024? (Except when sourceforge went evil in 2016 it changed hands again and the new owners have worked to rebuild trust. So there isn't the same "inevitable decline" aura around it the big boys have squeezing blood from every stone...)

You can adjust the .zip extension to .tar.gz to get a known archive format (the github URL parser microsoft inherited is flexible that way), but the archive name is still just "3.3.0.extension" with no project name on the front of it, and my download function in mkroot/packages/plumbing doesn't know what to do with that. (Maybe I need a fourth argument? Hmmm...)
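One shape the "fourth argument" could take, sketched in C rather than the shell mkroot actually uses: if the last path component of the URL doesn't already start with the package name, glue the name on the front. archive_name() is hypothetical, and the example URLs below are placeholders, not real download locations.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: pick a local archive filename for a download URL. If the
 * URL's last path component doesn't already start with "name-", prefix it,
 * so ".../tags/3.3.0.tar.gz" saves as "mruby-3.3.0.tar.gz".
 * Returns 0 on success, -1 if the result doesn't fit in buf. */
static int archive_name(char *buf, size_t len, const char *name,
  const char *url)
{
  const char *base = strrchr(url, '/');
  size_t nlen = strlen(name);
  int n;

  base = base ? base+1 : url;
  if (!strncmp(base, name, nlen) && base[nlen] == '-')
    n = snprintf(buf, len, "%s", base);
  else n = snprintf(buf, len, "%s-%s", name, base);

  return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Packages whose archives already carry the project name pass through untouched, so the extra argument would only matter for the github-generated ones.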

The next problem is that ruby has its own magic build utility, "rake", which is IMPLEMENTED IN RUBY. So, circular dependency there. (And yet another build tool to add to the pile of wannabe make replacements.)

I tried running the rake build under record-commands to see if maybe I could create a canned build script, but there's a lot of varying -D define arguments, and a large section where it's building a bunch of small C programs and then running them to produce output it then assembles. (Some sort of self-bootstrapping JIT code maybe?) And creating a rake replacement in C doesn't get around it: the build dependencies are written in Ruby. The language needs itself installed to build itself. There does not appear to be a "microperl" build option here that can create a tiny portable mruby just big enough to run "rake". Hmmm...

April 9, 2024

Python 3.7 came out in 2018 and had a dot-release in 2023, but QEMU stopped building with it a year ago because it's "too old" (not "there was a bug because it used a new feature", but it had an EXPLICIT VERSION CHECK and REFUSED). The kernel b4 utility just broke the same way and it's apparently explicit policy, amongst all USERS of python. Projects like ph7 or tinycc can implement fairly stale forks of the language and still get widely used, but python POLICES and SANITIZES its userbase. You Are Not Welcome Here with that old stuff. (That's still in a debian LTS release that's still supported.) They go out of their way to break it, over and over.

Python's progeria would drive me away from the language even if the transition from 2.0 hadn't pretty much done it for me. "How dare you continue to run existing code! For shame!" Seriously, they BURNED OUT GUIDO. When your "benevolent dictator for life" steps down because the flamewars got too bad, something is wrong with the community.

Meanwhile, I only moved toybox from C99 to C11 in 2022. Partly because I'd already broken it without regression testing and didn't want to clean up the (struct blah){.x=y} inline constants I'd started using (which turned out to be a C11 feature), partly because C11 offered a convenient bug workaround for LLVM, and partly because I'd been envying the __has_include() feature for a while so there was an actual obvious benefit to moving (turning configure probes into #ifdefs in the code, simplifying the build plumbing).
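For anyone who hasn't run into either construct, a minimal sketch of both. The struct and the probed header (sys/random.h) are arbitrary examples, not toybox code. __has_include was a gcc/clang extension long before C23 standardized it, so the probe is guarded with #ifdef to keep compilers that lack it happy.

```c
#include <stddef.h>

/* A configure-time probe turned into a compile-time one: only ask
 * __has_include if this preprocessor knows about it. */
#ifdef __has_include
#  if __has_include(<sys/random.h>)
#    define HAVE_SYS_RANDOM 1
#  endif
#endif
#ifndef HAVE_SYS_RANDOM
#  define HAVE_SYS_RANDOM 0
#endif

struct span { size_t start, len; };

/* Compound literal: build and return a struct value inline, no named
 * temporary needed. */
static struct span make_span(size_t start, size_t len)
{
  return (struct span){ .start = start, .len = len };
}

static int have_sys_random(void)
{
  return HAVE_SYS_RANDOM;
}
```

The code that wants the header then does #if HAVE_SYS_RANDOM instead of the build script compiling a test program and writing out a config header.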

If I had an 8 year old car that stopped being able to fill up at current gas pumps or drive on current roads, and had to move to a lease model going forward because ownership is no longer allowed, I would object. But the Python guys seem to have no problem with this. "Subscribe or die." You own nothing, you must rent.

April 8, 2024

I should just stop replying to Oliver, which eats all my energy and accomplishes nothing. I'm trying to get a release out, and have instead wasted multiple entire work sessions replying to Oliver.

One of the harder parts of cutting toybox releases is remembering a Hitchhiker's Guide quote I haven't already used. I wanted to go with "For a moment, nothing happened. Then, after a second or so, nothing continued to happen" since it's been WAY TOO LONG since the last release, but it turns out I already used that one in 2012. The "Eddies in the space time continuum, and this is his sofa is it?" line got used last year. I wanted a little more context to the "spending a year dead for tax purposes" line but google is unhelpful and I put my actual physical copies of the books in boxes and then a storage cube last month. (I tend to go with the book phrasing rather than the BBC miniseries phrasing, especially since half the clever lines are only in the book description and weren't actually dialogue or narration.)

After 1.0 I might switch over to Terry Pratchett quotes. Who knows. Insert disclaimer about forward looking statements and so on.

I outright PANICKED when I checked my email and saw a $10k invoice from some random stranger against the middleman, but it wasn't _approved_. (Anybody with an account can submit an invoice.) I logged in and rejected it, then submitted my own invoice for Q1 (which I was waiting until after I got a release out to do, because last year the middleman made a stink about invoicing for work I hadn't done yet; they put _conditions_ on passing along the Google money). Then their website went "something is wrong" at the end of the submission process, and gave a full screen error when I went back to the main page.

And I'm going "oh yeah, I had to borrow Fade's macbook to approve my invoice last quarter" (it _submitted_ fine, but then the site went nuts), because even though debian applies security fixes to this ancient chromium build (where "ancient" = 2020), the VERSION it claims to be is old and various websites reject it. Plus devuan balderdash is probably actually end of life now? No, it says it's still maintained as "oldoldstable", and I fetched security updates last night and there was one. Possibly through June?

I should update after Texas Linuxfest anyway. (And buy a new hard drive at the best buy there, I dunno where to go to get those in person here in Minneapolis and I'm always reluctant to order stuff like that online. I like to _see_ it before buying. Yes I bought stuff through Computer Shopper back in high school, and bought the Orange Pi 3b boards online and had them mailed to me, but for storage specifically there's way too much chinese fake stuff online these days. Amazon is completely useless.)

April 7, 2024

I held my nose and honestly tried to get a riscv qemu target booting, but arch/riscv/configs/defconfig is gigantic (it's not a config, it's the "default y/n" entries from Kconfig, and the result has little to do with the architecture and is full of =m modules), but arch/riscv/configs doesn't offer a lot of obvious alternatives, nor does make ARCH=riscv help. My next guess, make CROSS_COMPILE=riscv32-linux-musl- ARCH=riscv nommu_virt_defconfig which at least claims to be for qemu's "virt" board produces a kernel that qemu-system-riscv32 -M virt -nographic -kernel vmlinux complains has "overlapping ROM regions", because "mrom.reset" lives at 0x1000-0x1028 and the kernel is trying to load itself at address zero.
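The "overlapping ROM regions" complaint is plain interval arithmetic: the reset ROM sits at 0x1000-0x1028, so anything loaded at address zero that's bigger than 0x1000 bytes runs into it. A sketch of the check using half-open ranges, which is my assumption about how qemu models it (I haven't read its loader code); the names are made up.

```c
#include <stdint.h>

/* Half-open address range [start, end). */
struct region { uint64_t start, end; };

/* Two half-open ranges overlap iff each one starts before the other ends. */
static int regions_overlap(struct region a, struct region b)
{
  return a.start < b.end && b.start < a.end;
}
```

A kernel linked to load somewhere above the ROM (I believe qemu's virt board puts RAM at 0x80000000) wouldn't trip this; one linked for address zero always will.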

Buildroot's qemu_riscv32_virt_defconfig is building firmware blobs from a separate source package and feeding -bios fw_jump.elf to qemu's command line. I do NOT want external dependency packages, that's why I have an x86-64 patch to remove the ELF library dependency (and allow it to use the frame pointer unwinder every other architecture can use).

So qemu has a -kernel loader for riscv, but it doesn't work. A brand new architecture needs a spyware blob running "system management mode" over the kernel. Bra fscking vo.

I tried the defconfig build with all the modules just to be sure (that has EIGHT console drivers enabled: vt, hw, serial_{8250,sh_sci,sifive}, virtio, dummy, and framebuffer: no idea what the qemu board's default is for -nographic, and don't ask me what device console= should be set to for any of those), but it had the same ROM/kernel conflict. And the problem isn't qemu board selection either: every -M board type had the same conflict except "none", which instead complains it doesn't support "-kernel".

Eh, revisit this after upgrading devuan, since I can't build current qemu with python 3.7. That's unlikely to fix it, but if I'm building current I can ask questions on the qemu mailing list...

April 6, 2024

I spend SO MUCH TIME writing and rewriting responses to Oliver's messages. Here's my first reply to "utf8towc(), stop being defective on null bytes" (yes, that's his title) which I did NOT send, but instead copied here and then wasted hours trying to make it sound "professional" instead of honest.

On 4/6/24 17:48, Oliver Webb via Toybox wrote:
> Heya, looking more at the utf8 code in toybox. The first thing I spotted
> is that utf8towc() and wctoutf8() are both in lib.c instead of utf8.c,
> why haven't they been moved yet, is it easier to track code that way?

Love the accusatory tone. "Yet." Why haven't I moved xstrtol() from lib.c to xwrap.c "yet".

> Also, the documentation (header comment) should probably mention that
> they store stuff as unicode codepoints, I spent a while scratching my
> head at the fact wide characters are 4 byte int's when the maximum
> utf8 single character length is 6 bytes.
> Another thing I noticed is that if you pass a null byte into utf8towc(),
> it will assign, but will not "return bytes read" like it's supposed to,
> instead it will return 0 when it reads 1 byte.

And strlen() doesn't include the null terminator in the length "like it's supposed to". That can't possibly be intentional...

> Suppose you have a function that turns a character string into a array
> of "wide characters", this is easily done by a while loop keeping a
> index for the old character string and the new wide character string.
> So you should just be able to "while (ai < len) ai += utf8towc(...",
> the problem?

Again with the "should". No point checking what existing commands using these functions do:

$ grep -l utf8towc toys/*/*.c | grep -v pending | wc -l


I'm aware of "don't ask questions, post errors" but being polite in response to Oliver is EXHAUSTING. And takes a ZILLION rewrites to scrub the sarcasm from, and even then my reply is not all smiles, but at least provided a lot of patient explanation.
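For the record, the behavior being argued about, as a standalone sketch. decode_utf8() is a hypothetical stand-in written for illustration, not toybox's utf8towc() (I'm not reproducing its exact signature or error handling here): it returns bytes consumed, 0 for a NUL byte the way strlen() stops at one, or -1 for invalid input. It also only decodes up to 4 bytes, since RFC 3629 caps UTF-8 at U+10FFFF, which is why a 4 byte int holds any code point. The caller decides what 0 means: a loop that blindly does ai += ... spins forever on NUL, a loop that treats 0 as end of string doesn't.

```c
#include <stddef.h>

/* Hypothetical decoder: bytes consumed, 0 for NUL, -1 for invalid or
 * truncated input. Rejects overlong forms and surrogates. */
static int decode_utf8(unsigned *wc, const char *str, size_t len)
{
  const unsigned char *s = (const unsigned char *)str;
  unsigned c, need, i, min;

  if (!len) return -1;
  c = *s;
  if (!c) {
    *wc = 0;
    return 0;
  }
  if (c < 0x80) {
    *wc = c;
    return 1;
  }
  if ((c & 0xe0) == 0xc0) need = 1, c &= 0x1f, min = 0x80;
  else if ((c & 0xf0) == 0xe0) need = 2, c &= 0x0f, min = 0x800;
  else if ((c & 0xf8) == 0xf0) need = 3, c &= 0x07, min = 0x10000;
  else return -1;
  if (len < need + 1) return -1;
  for (i = 1; i <= need; i++) {
    if ((s[i] & 0xc0) != 0x80) return -1;
    c = (c << 6) | (s[i] & 0x3f);
  }
  if (c < min || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff)) return -1;
  *wc = c;

  return need + 1;
}

/* Convert up to len bytes into code points, stopping at the first NUL.
 * Returns the number of code points stored, or -1 on invalid input. */
static long utf8_to_codepoints(unsigned *out, const char *str, size_t len)
{
  size_t ai = 0;
  long n = 0;
  int used;

  while (ai < len) {
    used = decode_utf8(out + n, str + ai, len - ai);
    if (used < 0) return -1;
    if (!used) break;  /* NUL terminator: stop instead of looping forever */
    ai += used;
    n++;
  }

  return n;
}
```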

April 3, 2024

Tried to run scripts/prereq/ on mac without first running "homebrew" and it spat SO many warnings and errors. The warnings I don't care about: they deprecated vfork() and syscall() and so on but they're still there; why would anybody EVER think adding an integer to a string constant would append to the string? That's a strange thing to warn about in C, which still is not C++. And shut up about "illegal character encoding in string literal" because it's NOT a unicode character...
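For context, the kind of code that warning fires on: adding an int to a string literal is pointer arithmetic, so "s"+(n==1) is either "s" or a pointer to its NUL terminator, i.e. "". Clang's -Wstring-plus-int flags it, and the workaround clang itself suggests is array indexing, which yields the same pointer. plural() is a made-up example, not a toybox function.

```c
/* "s" + 0 points at "s", "s" + 1 points at its NUL terminator (""),
 * so this picks the plural suffix without a branch. */
static const char *plural(long n)
{
  return "s" + (n == 1);
}

/* Same pointer, spelled the way clang suggests to silence the warning. */
static const char *plural_quiet(long n)
{
  return &"s"[n == 1];
}
```

Usage is the obvious printf("%ld file%s\n", n, plural(n)).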

But the part I don't understand is "toys/other/readlink.c:67:7: error: no member named 'realpath' in 'union global_union'" when grep realpath scripts/prereq/generated/globals.h finds it just fine. It's there! If you couldn't read the headers out of that directory we wouldn't have gotten that far. There are no #ifdefs in that file. You know what global_union _is_, so why isn't mac's /usr/bin/cc finding the member? This is clang:

$ /usr/bin/cc --version
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: arm64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

I've built this code with clang. Is there some flag I need to hit it with to tell it to stop being weird?

Huh. Hang on, it's also complaining that wc can't find FLAG_L which means it's reading old headers from somewhere. Lemme try a "make clean" and then...

$ grep -i error out2.txt
toys/other/taskset.c:52:17: error: use of undeclared identifier '__NR_sched_getaffinity'
toys/other/taskset.c:81:15: error: use of undeclared identifier '__NR_sched_setaffinity'
toys/other/taskset.c:119:29: error: use of undeclared identifier '__NR_sched_getaffinity'
3 warnings and 3 errors generated.

Ok, that's a lot more reasonable. (This compiler is searching the current directory before -I even though I had to -I . over on Linux or it WOULDN'T search the current directory for things under a path. PICK A SEMANTIC.)

Next problem: it wants nproc, which uses taskset. Splitting it out into its own function won't help because it's the same sched_getaffinity() plumbing being called to populate the CPU affinity mask and then count enabled processors. I dunno the "right" way to do that on a mac or BSD, I should ask somebody...
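For reference, one way to split this is a sketch like the following, which is not necessarily what toybox's nproc does. On Linux, sched_getaffinity() fills in the affinity mask and CPU_COUNT() counts its set bits. macOS has no sched_getaffinity(), so the fallback is sysconf(_SC_NPROCESSORS_ONLN). That call is widely supported but isn't in POSIX proper, and it ignores any affinity restriction. Whether the BSDs have something better is exactly the open question above:

```c
#define _GNU_SOURCE  // sched_getaffinity() and CPU_COUNT() are GNU extensions
#include <assert.h>
#include <unistd.h>
#ifdef __linux__
#include <sched.h>
#endif

/* Count the processors this process is allowed to run on. */
static long count_cpus(void)
{
  long n;

#ifdef __linux__
  cpu_set_t set;

  // Linux: count the bits in the affinity mask (what taskset reads/writes).
  if (!sched_getaffinity(0, sizeof(set), &set)) return CPU_COUNT(&set);
#endif
  // Elsewhere, or if that failed: online processors, ignoring affinity.
  n = sysconf(_SC_NPROCESSORS_ONLN);

  return n > 0 ? n : 1;
}
```

The fixed-size cpu_set_t tops out at 1024 CPUs. Bigger machines would need the CPU_ALLOC() variants, which seems like a problem for another day.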

April 2, 2024

Ok, I went through the commits up to now and made primordial release notes from them (which, like my blog, require a lot of rephrasing and gluing bits together and HTML massaging to be publishable).

Doing that meant writing down a lot of TODO items that a commit left unfinished, four of which I have already decided NOT to hold up this release for (finish leftover backslash newline shell stuff, promote csplit, redo tsort for new algorithm, file parsing JPEG EXIF data doesn't refill buffer) and five of which seem kind of important (test/fix passwd rewrite and re-promote passwd.c to toys/lsb/passwd.c; finish fixing up hwclock.c, which glibc and musl broke in DIFFERENT WAYS; the new mkroot init rewritten not to use oneit wants "trap SIGCHLD" but toysh hasn't got a trap builtin yet; toysh also hasn't got "return"; and I need to post the kernel patches I'm using to build 6.8 to the linux-rectocranial-inversion mailing list so they can be sneered at and then ignored again).

Possibly I should just punt on those fixes and try to get a follow-up release out soonish.

April 1, 2024

I am not putting out a release on April 1, so I have a little more time to poke at stuff.

Updating the roadmap, which has a "packages" section. In theory mapping commands to packages is basically (declare -A ray; for i in $(toybox); do which $i >/dev/null && ray[$(dpkg-query -S $(readlink -f $(which $i)) | toybox cut -DF 1)]+=" $i" || ray["none:"]+=" $i"; done; for i in ${!ray[@]}; do echo $i ${ray[$i]}; done;) In practice, that dumps a lot in "none" because the relevant package isn't installed on my laptop. (Although a lot less if you remember to add /sbin and /usr/sbin into the $PATH. Debian is insane, it thinks calling "ifconfig" to see the machine's IP address is something non-root should never do. Everyone else puts those directories in normal users' $PATH for a REASON.) Debian also breaks packages up at fairly stupid granularity: things like eject, passwd, pwgen, and login are each in their own package. Other things are in WEIRD places: cal is in bsdmainutils (which is NOT the same as bsdutils containing the two commands "logger" and "nice"), "which" is in debianutils, mkpasswd is in whois (what?), crc32 is in libarchive-zip-perl (really?)...

I'm not entirely convinced this is a useful exercise. The list I did before was mostly based on Linux From Scratch, which says what commands are installed by each source package it builds. I checked each package list and grouped stuff by hand, which was a lot of work. Updating that list based on an automated trawl of debian source control is EASY, but not necessarily USEFUL, because debian's package repository seems like a lower quality data source and I can't figure out how to query packages I don't currently have installed.

At the end of my local copy of the roadmap is a TODO section I've been meaning to check in, and one of the things on it is:

Ship a minimal generated/ with snapshot generated/ directory that builds _just_ the commands used by the toybox build, with no optional library dependencies, so minimal host compiler can build toybox prerequisites instead of requiring "gsed" and "gmake": mkroot/record-commands make clean defconfig toybox && toybox cut -DF 1 log.txt | sort -u | xargs

I want to build the toybox prerequisites without any optional libraries, so you can run a "scripts/" with a simple enough syntax even the shell built into u-boot can handle it (just substitute in $VARIABLES and run commands, no flow control or redirection or anything) and have it compile a toybox binary that provides what toybox needs out of the $PATH. Maybe even letting you build on mac without homebrew, and making native bootstrap on qnx and similar feasible-ish.

Hmmm... I've already got plumbing to collect the actual commands used by the build: mkroot/record-commands make clean defconfig toybox populates log.txt, and a config with JUST those symbols enabled would be... (should I throw in --help support while we're there?)

for i in toybox toybox_help toybox_help_dashdash $(toybox cut -DF 1 log.txt | sort -u | xargs); do grep -qi CONFIG_$i'[= ]' .config && echo CONFIG_$i=y; done | tr '[:lower:]' '[:upper:]'

Ok, grind grind grind... hmmm, I want to simplify the shipped headers and if I DON'T have --help I can basically stub out generated/help.h so let's NOT add --help support here. Need a script to regenerate all this automatically, of course...

Sigh, I put a test for MacOS in the simplified so I could feed the linker two different kinds of garbage collection (because -dead-strip on Linux's ld is interpreted as -d -e ad-strip replacing the entry point with the nonexistent symbol "ad-strip", which then gets replaced with a default value that results in a segfault when you try to run it, so I can't just feed both to the build all the time). Except with the Defective Annoying SHell, my test dash -c '[ "$(uname)" == Darwin ] && echo hello' says "[: Linux: unexpected operator" which makes me tired.

I figured it out (posix is = not == and dash goes out of its way to break on any non-posix syntax) but I wound up just blanking LINK="" anyway because it's simpler: the binary still builds and runs, there's no unreachable symbols getting pulled in without the dead code elimination here, it's just a bigger binary and I don't really care in this context. Smaller and simpler script wins out.

Got it working enough to check in.

March 31, 2024

Hammering away to get a toybox release out today, because I don't want to do an April 1 release but I _do_ want one in Q1 2024.

Sigh, didn't manage it. Got a lot done, but the tunnel didn't go all the way through. Got the mkroot kernel configs converted to use be2csv with curly bracket grouping, added a microblaze kernel config, documentation updates...

March 30, 2024

The xz exploit is all over mastodon. Hopefully my paranoia about not wanting to run Orange Pi's default image seems slightly less silly now.

Seeing so many rust worshippers going "rust wouldn't have stopped this, but it proves we need to switch everything to rust ANYWAY". They're attacking C for a MULTI-YEAR SOCIAL ENGINEERING EXPLOIT, which was found precisely because people were intimately familiar with what to expect from the C ecosystem, including 30 years of ELF linking in Linux and "objdump -d". If somebody did this exploit to rust stuff, nobody would ever find it. (Google for "objdump disassembly on rust output" vs "objdump disassembly on C output". One has tons of blogs and tutorials and such, the other doesn't seem to have a single relevant link newer than 2013 in the first page. That seems to me like a PROBLEM.)

How is this social engineering attack an argument FOR replacing 30 years of established field-tested software that was developed in public with public logs of the discussions and years of history and everybody attending in-person conferences and networking and giving recorded talks and so on... let's throw all that out for brand new software developed from scratch by unknowns now that state level actors have shown interest in targeting this area. Because it's written in The Anointed Language and coding in anything else is a sin, you vile unconverted heathen still clinging to the old ways.

Sigh, one of my concerns about self-driving cars was the skill of driving a car atrophying, so after a while nobody could improve the self-driving cars because nobody knew how to do the task anymore. Automating away a task can eliminate expertise in that task from the wider population, which isn't necessarily a bad thing but it's something to be AWARE of. C is a portable assembly language, however much the C++ developers hate anyone pointing out an advantage of C that clearly C++ does not have. You can map from C to assembly in your head, even with fairly extreme optimizer shenanigans a bit of paleontology will dig up the ancestral relationship. It is therefore POSSIBLE to dig down into "here is where what the machine is doing diverged from what I thought I told it to do", and this is a pretty standard debugging technique in C. It's not intro level, but usually by the time you've got a few years experience you've done it more than once. "This code resulted in this write or jump instead of that one, and here's the instruction that made the decision I didn't expect".

Of course right now the venture capitalists have pivoted from blockchain to large language models, so expressing concern about loss of human expertise is going to fall on deaf ears at least until they cash in their bailouts after the next crash. (Billionaires are not allowed to lose money, congress will print more and hand it to them every time they screw up bigly enough for as long as capitalism remains the state religion. Oh well...) And collapse is not the end: the industry got built from scratch over a century, our descendants can do it again I suppose. Not immediately useful for strategic decision making, though.

March 29, 2024

The xz exploit looks like somebody checked in a "test case" with an x86-64 binary blob, and the overcomplicated build system spliced that in to the build with some variant of LD_PRELOAD linker shenanigans overriding the correct symbol. (Which means it does not affect toybox, and my own systems use dropbear for ssh anyway, yes including my laptop).

This is similar to the problem I had with things like pam, and why I tend not to enable module support. You can start with a secure system, then add arbitrary binary blobs at runtime to change how it works. If nothing else that makes the system less AUDITABLE. I can't usefully examine blobs provided to me from on high, and a signing chain of custody is still GIGO. I have to trust that my upstream didn't get exploited, and when that upstream includes systemd they've already lost control of what is and isn't included in that system. (And then people inexplicably want stuff like ssh-agent talking through d-bus: just keeping up with the version skew on how it works when you apt-get update is more than I have bandwidth for. A .ssh/key file in a directory may not be "as secure" but I at least think I understand what's GOING ON.)

A more secure system is one that has LESS in it. Same logic as "watertight". I mean, you can argue about encapsulation and layers of privilege (yay containers), but the people who talk about that tend to think microkernels are a good idea. If I cut and paste an ssh key from one window to another, my clipboard has privileged information in it. My clipboard is not particularly secure. (Yes, there have been attacks on this.) And the threat model of keyloggers, screen scrapers, and processes listening to the laptop's microphone (from which you can apparently reconstruct what keys were typed on the keyboard!) doesn't require a kernel exploit if the information isn't being securely collected and distributed. If my laptop or phone camera had a physical LED that lit up when it was powered, at the HARDWARE level, there wouldn't be a band-aid and electrical tape over them, respectively. If I can't ask the kernel to enumerate all listeners to the microphone, what's the POINT? (Sure, output's got a mixer, but input probably shouldn't.)

Ahem. Tangent. A black box with a sign lighting up saying "all is well" is actually saying "trust me bro". You can sign it but I can't meaningfully examine it.

March 28, 2024

Finally got a sh4eb target with the fdpic loader running under qemu, which can run the sh2eb nommu root filesystem! Woo! It's not a 100% solution because it won't suffer from fragmentation like the other one does, and if the code DOES try to call mmap() with the wrong flags it'll work fine because the underlying kernel isn't a nommu kernel.

Still, it's an alternative to sneakernetting an sd card over to the turtle board every time I want to do ANY nommu smoketesting. Modulo I haven't got a build that's putting them together, instead I'm manually cpio.gz-ing the fs directory and editing to use "-kernel ../sh2eb/fs.cpio.gz". I should probably automate that somehow...

Meanwhile, I have a reasonably sized kernel patch adding FDPIC support to the with-mmu version of superh, which would go upstream if linux-kernel was still functioning. Sigh. Throw it in the list of 6.8 patches to post along with mkroot binaries, I guess? (I should post them to the list again for the usual mockery and derision. Don't really want to, but it's conceptually adjacent to spring cleaning. Big pain, big mess, probably healthy.)

March 27, 2024

I applied the commit as-is, but I wonder what a tests/inotifyd.test would look like? I mean, even under mkroot, there's some design work here...

March 26, 2024

Trying to add a bootable microblaze target to mkroot now that either musl-1.2.4 or musl-1.2.5 seems to have fixed whatever segfault was happening in the userspace code, or at least I ran some toybox commands with qemu-microblaze application emulation and they didn't die like they used to.

I built a kernel from linux's one and only microblaze config (arch/microblaze/configs/mmu_defconfig which nominally implies a nommu variant they didn't bother providing a defconfig for but let's worry about that later) and trying to boot it under qemu-system-microblaze died immediately complaining about unaligned access. And left the terminal in "raw" mode so nothing you type produces output until you run "reset" blind, definitely an arch with all the rough edges polished off.

Eventually I ran "file" on the vmlinux to see that the defconfig had built a little endian kernel, and the presence of qemu-system-microblazeel in the $PATH suggests qemu-system-microblaze is big endian. The root filesystem I built is also big endian, because telling the gcc tuple "microblaze-unknown-linux" with no further details produces a big endian toolchain with big endian libraries, which built a big endian toybox binary. But Linux's .config defaults to little endian unless I add an explicit CONFIG_CPU_BIG_ENDIAN=y config symbol that isn't in the defconfig.
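The byte order that "file" reports comes straight out of the ELF header. Byte 5 of the identification bytes is what <elf.h> calls EI_DATA, where ELFDATA2LSB = 1 means little endian and ELFDATA2MSB = 2 means big endian. Here is a minimal C sketch of that check, using a hypothetical helper name, elf_byte_order():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 'L' for a little endian ELF header, 'B' for big endian, or 0 if
   this doesn't look like ELF. hdr holds the first len bytes of the file. */
static int elf_byte_order(const unsigned char *hdr, size_t len)
{
  // Bytes 0-3 are the magic, byte 4 is 32/64 bit, byte 5 is byte order.
  if (len < 6 || memcmp(hdr, "\177ELF", 4)) return 0;
  if (hdr[5] == 1) return 'L';  // ELFDATA2LSB
  if (hdr[5] == 2) return 'B';  // ELFDATA2MSB

  return 0;
}
```

Reading the first 6 bytes of vmlinux with fread() is enough to feed it. Per the above, the defconfig kernel would come back 'L' while the toolchain's binaries come back 'B'.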

Switching endianness gave me a kernel that booted on qemu's default board (-M petalogix-s3adsp1800), and CONFIG_SERIAL_UARTLITE wants the serial device "ttyUL0" which gave me boot messages. (Tempted to do targets for both endiannesses since there's a qemu-system for the other one, but I already published new toolchains which did NOT include a little endian microblaze toolchain with little endian libraries... maybe next time.)

The external initramfs.cpio.gz loader works and I got a shell prompt! As with or1k I can't figure out how to get the kernel to halt in a way that causes qemu -no-reboot to exit, but it's better than nothing. (Worry about that once I'm running current qemu builds again, which requires a newer version of perl.)

Trying to harvest config symbols out of this defconfig, the next problem is it has the same kind of CPU feature micromanagement nonsense that or1k had:


Which is just LEVELS of sad. Isn't this a compiler -m flag rather than config nonsense? I already BUILT userspace and it didn't need to be micromanaged like this. Can you maybe trap on the missing instruction and emulate the way FPUs are handled (sure it's slow but it means I don't have to care), or some kind of cpu version feature bitfield with the runtime linking patch nonsense all the other architectures do? (Reserve space for the function call, turning it into instruction plus NOP when you don't need it.) I mean seriously, I don't have to do this on a real architecture.

But the annoying part for ME is how verbose the config is: I can either leave them all out so the already slow emulator is even slower because it's making function calls for instructions qemu is clearly emulating (it booted!) or else the microconfig version of the above is the outright tedious XILINX_MICROBLAZE0_USE_MSR_INSTR=1 XILINX_MICROBLAZE0_USE_PCMP_INSTR=1 XILINX_MICROBLAZE0_USE_BARREL=1 XILINX_MICROBLAZE0_USE_DIV=1 XILINX_MICROBLAZE0_USE_HW_MUL=2 XILINX_MICROBLAZE0_USE_FPU=2 which is BEGGING for bash's curly bracket expansion syntax. Which the bash man page calls "brace expansion". That would be XILINX_MICROBLAZE0_USE_{{MSR_INSTR,PCMP_INSTR,BARREL,DIV}=1,{HW_MUL,FPU}=2} which is almost reasonable. (I mean still CONCEPTUALLY broken in a "this is not a real processor" way, but not quite as horrible to include in One line vs three.)

The problem is brace expansion produces space separated output, and this is CSV (comma separated values). I can of course trivially be2csv() { echo "$@" | tr ' ' ,;} in a function, and calling that function would perform the brace expansion on its arguments, so using it would look like $(be2csv abc{def} blah blah) which I guess isn't that bad? Conceptually it's extra complication (now there's FOUR levels of config processing), but there's a bunch of other repetition in the existing microconfigs that could get cleaned up with brace expansion, and while I'm at it I could properly wordwrap the Very Long Lines that most configs are right now.
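What that shell function relies on the shell to do can be sketched in C. The hypothetical brace_csv() below is not part of mkroot or toybox. It expands {a,b} groups, nested ones included, and joins the results with commas. It assumes balanced braces and skips bash's corner cases: a group with no comma, like {def}, gets its braces stripped here, whereas bash leaves it alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Growable comma separated output buffer.
struct csv { char *buf; size_t len, cap, words; };

static void csv_add(struct csv *out, const char *word, size_t n)
{
  size_t need = out->len + n + 2;  // optional comma + word + NUL

  if (need > out->cap) {
    char *bigger = realloc(out->buf, out->cap = need * 2);

    if (!bigger) abort();
    out->buf = bigger;
  }
  if (out->words++) out->buf[out->len++] = ',';
  memcpy(out->buf + out->len, word, n);
  out->len += n;
  out->buf[out->len] = 0;
}

/* Expand the first {a,b,...} group in s, recursing on each alternative
   until no braces remain, so nested groups get expanded too. */
static void expand(const char *s, struct csv *out)
{
  const char *open = strchr(s, '{'), *close, *alt, *p;
  size_t pre, post;
  int depth = 1;

  if (!open) return csv_add(out, s, strlen(s));
  for (close = open + 1; *close; close++) {
    if (*close == '{') depth++;
    else if (*close == '}' && !--depth) break;
  }
  if (!*close) return csv_add(out, s, strlen(s));  // unbalanced: literal

  pre = open - s;
  post = strlen(close + 1);
  // Split the group on commas that aren't inside a nested group.
  for (alt = p = open + 1, depth = 0; p <= close; p++) {
    if (p == close || (*p == ',' && !depth)) {
      size_t n = p - alt;
      char *word = malloc(pre + n + post + 1);

      if (!word) abort();
      memcpy(word, s, pre);
      memcpy(word + pre, alt, n);
      memcpy(word + pre + n, close + 1, post + 1);  // suffix plus NUL
      expand(word, out);
      free(word);
      alt = p + 1;
    } else if (*p == '{') depth++;
    else if (*p == '}') depth--;
  }
}

// Returns a malloc()ed comma separated expansion of pattern.
static char *brace_csv(const char *pattern)
{
  struct csv out = {0};

  expand(pattern, &out);

  return out.buf;
}
```

Expansion order matches bash's left-to-right order. The microblaze line above comes out as the six XILINX_MICROBLAZE0_USE_* symbols in the same order they were listed.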

I note that this would increase the line count I brag about, but Goodhart's Law applies here: a metric that becomes a target stops measuring anything useful. More lines containing LESS DATA and being easier to read is a good thing. This is also why I've got a bunch of comment lines in the code (and yes they're in the line count).

The slightly embarrassing part is I have a mkroot talk back in Austin on the 12th, and I think I put the line count in the talk description. Oh well, I can explain. The Very Long Lines were always a cheat, anyway.

March 25, 2024

Updated musl-cross-make, got it to build more targets, and uploaded the resulting toolchains.

March 23, 2024

Remember my rant last month about crunchyroll censorship? A brief follow-up. You can't make the "cartoons are for kids" argument when you show that much gore (which is not new for this show), but of course everybody was wearing towels in the shower because THAT can't be shown while a single Boomer still draws breath.

Half of my problem here is "Han shot first". Spielberg came to publicly regret editing guns out of ET, and was quite eloquent about Not Doing That again in future.

I want to watch the original version that made this thing popular. Not some pearl-clutching geezer's edits showing me what THEY want me to see, even when the geezer editing it was once involved in the property's creation back before they ossified into a loon and were compelled to render unwatchable the work they did when they were younger.

But having a distribution channel do this en masse? Sets my teeth on edge. And every time I wonder if what's on screen is a choice the original made or a choice the distributor airbrushed over the original breaks my immersion and pulls me right out of the story. Fade to black, clever camera angles, non-transparent water, ALL FINE. But only if it's the original doing it and not changed "for your protection" by someone who knows better than me what I should be allowed to see. Distributors want the exclusive right to convey stuff they didn't create to an audience... and then only provide changed stuff that's NOT what gets shown in Japan. Makes me want to _speculatively_ buy DVDs to see if I MIGHT like things.

This is a separate issue from the original artist _disgracing_ the work so it's still available in its original form but seems tainted, like Dilbert, Harry Potter, Bill Cosby... Death of the Author vs Harvey Weinstein holding Dogma hostage. When Disney's attempts to bury Song of the South turn into photoshopping cigarettes out of pictures of its founder who died of lung cancer, and then its streaming service is riddled with changes... Disney is really big and keeps buying stuff it didn't create and has a history of editing those properties once it owns them. Like crunchyroll is doing.

March 21, 2024

There's no convenient place to set my laptop up in Fade's bedroom: it's full of stuff. There are at least 3 nice places to set my laptop up elsewhere in the apartment, but Adverb will scratch constantly at the bedroom door if I don't let him out and bark constantly at the front door out into the hallway if I do. I have my own bedroom I could close the door to, but again: constant scratching to be let out if someone else is in the apartment and he can't cling to them.

So once again, despite escaping the cat situation, I have a dog situation where I need to leave and go find workspace out in the wider world to take my laptop to. Luckily the apartment has a couple of shared workspaces, which haven't been _too_ busy so far...

March 20, 2024

9am phone call with the realtor, who wants to spend an additional $12k to (among other things) do a more extensive version of the floor replacement I keep trying to talk her out of. (It's entirely for aesthetic reasons, the floor isn't damaged, she just doesn't like it. Now she wants to rip out the toilets so new flooring can go under them in the bathrooms, which I explicitly said no to the last week I was packing up, but nothing she ever wants to do is settled until she gets her way, "no" just means it will be brought up again later.)

The City of Austin's tax assessment on the place was $700k. Speaking to her she thought it was worth $550k but could be brought up to $600k with about $20k of work. Now she wants to spend an extra $12k on top of that, and is saying it's worth $400-450k. The argument that money we spend fixing the place up will have twice that impact on the sale price isn't very convincing when the base number for the sale price was never in writing and seems subject to endless downward revision.

So to recap: we said we could probably afford about $6k-$8k of work, got talked up to $20k, and now she wants to increase it to $32k. And the result of the work done so far seems to have been to DECREASE the amount she wants to list it for.

I find this process stressful. She's also insisting that the city of Austin's tax evaluation is fraudulent, that the three biggest online house assessment sites are fraudulent (that part's plausible), and the two realtor email lists telling me how other houses in the area sold (one I've been on since I bought the place a decade ago, the other I got subscribed to by the mortgage guy I talked to when I tried to refinance back when rates were briefly under 3% during the pandemic) are also fraudulent. Everybody everywhere is giving bad numbers except her, and her numbers keep changing, always in the same direction.

But my wife agrees with the realtor her sister recommended, so fine. There's no equity in the house, meaning I have very little saved for retirement. Good to know. (I don't THINK all the realtor's aesthetic judgements are because she has a specific friend she wants to sell the house to cheap. She's somehow guessing what everyone everywhere would universally like. FINE. Not my area of expertise.)

I have moved beyond finding the process stressful to finding it exhausting.

Update: running the numbers again, we might get out the same amount of equity we put into it from selling the condo back in 2012, only having lost money to ten years of inflation. At this point, that seems like a best-case scenario.

March 18, 2024

Looking at orange pi 3b kernel building, the vanilla kernel still claims to have Orange Pi 3 support, but not 3b. I dunno what the difference is between them: it's an rk3566 chipset either way but bunches of stuff use that, apparently very differently.

Orange pi's github has a new "orange-pi-6.6-rk35xx" branch that looks promising. Of course it doesn't have actual linux git history in it, the entire branch history is just 3 commits, labeled "First Commit", "Init commit for linux6.6", and "Support Orange Pi 3B". So in order to read through a patch of what they added to vanilla linux, I need to come UP with such a patch via diff -ruN against a fresh vanilla v6.6 checkout.

The first difference from orange pi's "init" commit is that the first line of arch/alpha/boot/ (the SPDX-identifier line) is missing, and git annotate in 6.6 says that was added in commit b24413180f560 in 2017. So I dunno what this "init" commit is, but it's ANCIENT... the top level Makefile says 4.9.118. Why would you even... I mean what's the POINT?

Ok, let's try the SECOND commit, the one that says it's linux 6.6, and piping the diff into diffstat we get 1209 files changed, 12719 insertions(+), 11506 deletions(-) which is NOT a vanilla release. Maybe it's one of Greg KH's ME ME ME releases? Hmmm... Not obvious how to get those in a git repo. I can get incremental patches, even fetch them all via for i in $(seq 1 21); do wget$i-$((++i)).xz; done but there's no zero to one, it starts with 1-2, meaning I think I have to start with 6.6.1 instead of Linus's release version?

Except the first patch in that series (the 1-2 one) starts by adding a "dcc:" entry between the "dc:" and "sym:" entries of Documentation/ABI/testing/sysfs-driver-qat and the "init commit" for linux-6.6 does NOT have that change. Was it reverted by a later patch? Grep says the line only appears in the first patch, not in any later patch (reverting it would have a minus line removing it).

So the orange pi Chinese developers went from some variant of 4.9 to something that is not 6.6 nor one of the dot releases after... hang on. Check the Makefile... That says 6.6-rc5. Maybe it's an EARLIER version? (I just want to see where they forked off vanilla! I'm assuming any changes that actually made it into vanilla AREN'T spyware. Probably. Or at least multiple people other than me looked at them already to catch anything obvious.)

Ok, *cracks knuckles*: for i in $(git log v6.6-rc5..v6.6-rc6 | grep '^commit ' | awk '{print $2}'); do git checkout -q $i; echo -n ${i:0:12}; diff -ru . ../linux-orangepi | diffstat | tail -n 1; done

The point of divergence has to be newer than the one that changed the Makefile to say -rc5, but older than the commit that changed it to say -rc6. I could also look at individual diff lines and try to annotate them to a commit from -rc6, but this just runs in the background...

Sigh, the closest commit (6868b8505c80) still has 416 files changed, 9404 insertions(+), 4611 deletions(-). Whatever orange pi checked in as their "base", it is NOT a vanilla commit.
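That background loop is really scoring each commit between -rc5 and -rc6 by how large its diff against the orange pi tree is, and the closest candidate is the one with the lowest score. Here is a C sketch of scoring one line of the loop's output, using a hypothetical helper, diffstat_churn(). It assumes diffstat's usual "N files changed, N insertions(+), N deletions(-)" summary, where the insertions or deletions part can be missing:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Sum insertions and deletions in a diffstat summary line. Leading text
   (like the commit hash the loop echoes) is skipped. Returns -1 if the
   line doesn't look like a diffstat summary at all. */
static long diffstat_churn(const char *line)
{
  const char *p = line;
  long total = 0;
  int seen = 0;

  while (*p) {
    char *end;
    long n;

    if (!isdigit((unsigned char)*p)) {
      p++;
      continue;
    }
    // Read a number, then look at the word after it to see what it counts.
    n = strtol(p, &end, 10);
    while (*end == ' ') end++;
    if (!strncmp(end, "insertion", 9) || !strncmp(end, "deletion", 8)) {
      total += n;
      seen = 1;
    } else if (!strncmp(end, "file", 4)) seen = 1;
    p = end;
  }

  return seen ? total : -1;
}
```

Running every output line through it and keeping the lowest result picks out the nearest commit. By this measure, 6868b8505c80 still scores 9404+4611 = 14015 changed lines.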

March 17, 2024

I have a pending fix I'm staring at because I called the variable "edna" and I should change it to "mode" but I have recently been informed that my variable names aren't good enough even when I do cleanup passes to remove idiosyncratic naming.

I don't want to be reverse psychologied into making the codebase worse just because someone else threw a tantrum, but I've had an exhausting month and it's _really_ hard for me to get "in the zone", as it were.

Anyway, the technical issue is my install -d was creating the directory with permission 0777 and letting the default umask 022 drop out the group and other write bits, but for the _files_ the callback was using base permissions of 0755 to apply the string_to_mode() delta against, so of course I had to test (umask 0; install -d potato) and confirm that yes, the base permissions are 0755 for the directory too.

But THEN I did:

$ (umask 0; install -dm +w potato)
$ ls -o
total 4
d-w--w--w- 2 landley 4096 Mar 17 04:51 potato

Which says that when it DOES have a delta, the base permissions are ZERO which is just SAD. I mean, I can do that, but... ew?
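The arithmetic behind those two results, as a sketch: calling mkdir() with 0777 under the default umask 022 yields 0755. A who-less "+w" only adds the write bits the umask doesn't clear, which is the POSIX chmod rule for symbolic modes with no u/g/o given. The plus_w() function is a hypothetical one-case stand-in, not toybox's string_to_mode():

```c
#include <assert.h>
#include <sys/types.h>

// What mkdir(path, 0777) ends up with: the umask clears bits.
static mode_t default_dir_mode(mode_t mask)
{
  return 0777 & ~mask;
}

// Apply a bare "+w" to a starting mode: add write bits the umask allows.
static mode_t plus_w(mode_t base, mode_t mask)
{
  return base | (0222 & ~mask);
}
```

With umask 0, "+w" applied to a base of 0 gives 0222, which is the d-w--w--w- above. Starting from 0755 instead would give 0777.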

As always, doing it is easy, figuring out WHAT to do is hard...

March 14, 2024

Ok, Oliver has explicitly progressed to flamewar and there's no useful reply I can make to that.

What's my current todo list:

finish log/release notes
fix /etc/passwd, re-promote command, promote other commands
build with current kernel
toysh builtins "return" and "trap"
orange pi kernel and/or qemu arm64 debootstrap chroot
cut a release
LFS build to create chroot
LFS build part that runs under chroot
android's pending list
  diff expr tr brctl getfattr lsof modprobe more stty traceroute vi
blog catch up
close tabs, reboot laptop, reinstall 16 gig ram, devuan update

I should go work on some of that...

March 13, 2024

I am irritable. I don't WANT to be irritable, but line buffering is being stroppy in more or less the way I'd expected, and I'm being lectured by Oliver again.

Sigh, I'm pretty sure Oliver MEANS "your half-finished code could use more cleanup and comments" and not "I am the omniscient arbiter of taste, bow before my iron whim". But he's dictating to me how my own code MUST be organized because there's exactly one right way to do it and I was Clearly Wrong, and I just don't have the spoons to handle this gracefully right now. (That's why I've ignored it as long as I have, even when I don't pull my laptop out I tend to check the web archive on my phone to see if there's something new I should respond to. This was a "definitely should not respond to it JUST NOW", with the move and all.)

Busybox had lots of commands that I didn't maintain, but delegated and forwarded requests about. Awk most prominently comes to mind. I tried to let that happen in toybox a few times, which is how I wound up with bc.c being the longest file in the tree (longer than news.html, AND perched in a Cloud of Drama but I mostly try to ignore that). Sigh: it's hard to delegate _and_ maintain the code equivalent of bonsai.

I should book the flight back to Austin for my Texas LinuxFest talk. The realtor was very unhappy at the idea of me bringing a sleeping bag back to Austin and crashing on the floor of my own house for 2 nights. Oh well, I've flown to random cities and spent money on a hotel room before. I just... really don't want to.

March 12, 2024

I moved to Minneapolis. There were weeks of tetris-ing things in boxes and lifting heavy things into various piles. I did 6 consecutive nights on 4 hours or less of sleep per night, which I am no longer young enough to bounce back from the next day.

We moved my flight to Minneapolis back from the 5th to the 10th, and moved back the deadline to have the storage pod picked up TWICE, because SO MUCH TO PACK. Podzilla finally came for it Saturday morning, and a couple hours later I rented a U-haul (from the place a 10 minute walk away on I-35) so we could fill it up with Fuzzy's stuff and I could drive it to her Father's place in Leander. (I _tried_ to get some sleep there, but he played a podcast about the "pillowcase rapist" at full volume ten feet away; he's gotten far older in the past ~5 years than in the previous 15.)

Peejee is settling in well at Stu's. She has a familiar caretaker monkey, and her warm square, and slurry. There was rather a lot of hissing at their existing cat, Scoop, but she's lived with other cats before.

When we finally got the dead motorcycle and chest freezer and SO MANY BOXES out of the U-haul and swept it out and I drove it back, I returned to the house one last time to pack the final 3 suitcases to take on my 8pm flight to Minneapolis: everything else got thrown out (or donated if the realtor's up to elegantly disposing of stuff), including half my clothes that didn't fit in the suitcase. (I tried to get a nap first, but workmen were pressure washing the driveway: our handyman was willing to work on contingency, so the realtor got her $20k worth of work so she could sell the place for $150K less than the current tax assessment. Wheee.)

Headed to the airport, caught my flight to Minneapolis, collapsed at Fade's, and was informed the next day that Drama Had Occurred in my absence. (Pretty sure it's the guy who crossed the street from the apartment complex to ask me about the giant container with the storage company's billboard on the side of it in my driveway, but not much I can do about it from here and... strangely, only minor annoyance levels of harm done? When we first moved in, our game consoles were stolen, then nothing for 12 years, and moving out the realtor didn't get the air fryer because it was stolen.)

Heh, I forgot the 2012 break-in was why I stopped trying to get a account. (Went to a mandatory in-person keysigning, backup disk got stolen with that key on it, didn't bother to try again.)

March 11, 2024

Oh goddess, I just want to know what the RIGHT BEHAVIOR IS, so I can implement it.

Except what coreutils is doing/advocating is very clearly NOT the right behavior. And I'm a monolingual English speaker with a TINY SMATTERING of Japanese, so really not qualified to opine on this stuff. But watching silicon valley financially comfortable white males make decrees about it leaves an aftertaste, you know? Bit more humility please. You do not live in the "circle of rice" (which can be sung to that song from the Lion King), and are thus outvoted.

I note that the original circle of rice from reddit is probably correct. I don't trust the smaller one the guy in singapore redrew to exclude Japan because it depends on china's inflated estimates of its population. China's local governments get funding based on head count, so when the "one child policy" reduced population, inventing more people on paper and self-certifying their existence was a big temptation. One theory for why it's so hard to migrate within china is local governments trying to hide that sort of thing. (This was a chronic problem throughout history: the phrase "pass muster" in Europe originally meant inspecting a regiment of troops to confirm each listed soldier could be present at the same time, because officers would make up enlisted men so they could pocket the extra salaries. The inspection by the people paying the bills wasn't to make sure their boots were shined, it was making sure those boots actually had someone in them.)

That's why estimates of china's actual current population run as low as 800 million, but even China's own central government has been unable to actually _check_ because the local governments really really really don't want them to. Since covid, china relaxed its internal migration rules, in part because they can blame covid for any _specific_ missing people and the central government really doesn't want to do that so carefully doesn't look: one cover-up hides the other. But some fraction of the declining number of births might be because some portion of the young adults nominally capable of having them only ever existed on paper. There's so much fraud it's hard to tell, especially from here.

[Backdated entry: I didn't touch my laptop for several days during the height of the move, but this is when the email came in.]

March 7, 2024

Got a google alert (which I set on my last name over 10 years ago, and which hasn't been useful in forever and barely ever triggers anymore) telling me that my grandmother died.

Nothing I can do about it at this point. She lived to be 100, like her mother before her. More boxes to pack...

March 5, 2024

If you collect your mp3 files into a directory, the Android 12 ("snow cone") built-in file browser app can be convinced to play them in sequence, and will continue playing with the screen switched off. (Just go to "audio files" and it shows you folders you've created in random other places, for some reason?)

But as soon as focus returns to the app (which is what happens by default when you switch the screen back ON), the app's display immediately jumps back to the position it was at when you switched the screen off, and playback switches to that point in that song. Redrawing the app's GUI resets the playback position. Oh, and if you let it play long enough, it just suddenly stops. (And then jumps to the old position when you open it to see what's going on.) The user interface here is just *chef's kiss*.

March 4, 2024

We're tentatively having the storage pod picked up on friday, renting a u-haul to take Fuzzy's stuff to her father's place on saturday, including the 20 year old cat, and then I drive to the airport Sunday. Fingers crossed.

My proposed talk at Texas LinuxFest (explaining mkroot) got accepted! Except I plan to be in minneapolis after this week, and have to fly BACK for the talk. (And get a hotel room, because the realtor is highly dubious about me bringing a sleeping bag to crash on the floor of a house with a lockbox on the front. Yes, this is the same realtor that insists the place has to be listed for $150k less than the tax assessment. She's a friend of my wife's sister.) So I may have to get a hotel in order to speak at an Austin conference. Oh well, I've done that for a zillion other conferences...

In the netcat -o hexdump code, TT.ofd is unsigned because I'm lazy and wanted one "unsigned fd, inlen, outlen;" line in the GLOBALS() declaration instead of two lines (one int fd, one unsigned inlen, outlen), since xcreate() can't return -1 (it does a perror_exit() instead). I thought about adding a comment, but adding a comment line to explain I saved a line seems a bit silly.
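For the curious, here's roughly what that trade-off looks like as a standalone C sketch. It assumes a simplified anonymous struct in place of whatever toybox's GLOBALS() macro actually generates, and xcreate_sketch() is a hypothetical stand-in for xcreate() with the same "exit instead of returning -1" contract:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* close(), unlink() */

/* Hypothetical stand-in for toybox's xcreate(): on failure it prints an
 * error and exits, so callers never see -1. */
int xcreate_sketch(const char *path, int flags, int mode)
{
  int fd = open(path, flags | O_CREAT, mode);

  if (fd == -1) {
    perror(path);
    exit(1);
  }

  return fd;
}

/* Simplified stand-in for a GLOBALS() block: one struct of per-command
 * state. The fd can never be -1, so it shares the "unsigned" line with
 * the two byte counters. */
static struct {
  unsigned ofd, inlen, outlen;
} TT;
```

The one gotcha: if something that CAN return -1 ever gets assigned to that field, it silently becomes UINT_MAX instead of something a "< 0" check would catch.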

I found an old pair of glasses while packing (in a box with a prescription slip from 2012), which is kind of backwards from the pair I've been wearing in that the LEFT eye is more or less clearly corrected, but the RIGHT eye is fuzzy at any distance. I've refused to update my prescription for several years now with the excuse "they're reading glasses" ever since I figured out that the reason I'm nearsighted is my eyes adjust to whatever I've been looking at recently, and I read a lot. The day of the school eye test in second grade on Kwaj I'd been reading all morning and my eyes hadn't had time to adjust BACK, so they gave me glasses. Which my parents kept reminding me to wear. So I'd read with those, focusing up close, and 20 years of feedback loop later I finally figured out what's going on and STOPPED UPDATING. But I still spend most of my time staring at a laptop or phone or similar, so far away is fuzzy unless I've taken a couple days off. But it mostly stopped GETTING WORSE, as evidenced by glasses from 2012 not being worse than the current set, just... different.

For my last few sets of glasses I just went "can you copy the previous prescription", which they can do by sticking it in a machine that reads the lenses, but after a few "copy of a copy" iterations it went a little weird in a church glass sort of way. (Which my eyes mostly adjusted to!) But I've developed a dominant eye over the past couple years... and these old glasses are BACKWARDS. The dominant eye with these glasses is the LEFT one, and with just the right eye open it's hard to read text at my normal distance.

So I'm wearing that pair now, on the theory variety's probably good in terms of not screwing up my visual cortex so nerves atrophy or something, in a "that eye's input isn't relevant" sort of way. Honestly I should go outside and stare at distant things more often, but texas sunlight and temperatures are kind of unpleasant most of the year.

(I remember why I stopped wearing this pair. One of the nose pieces is sharp and poky.)

March 3, 2024

Gave up and admitted I'm not making the March 5 flight to minneapolis, and had Fade bump it back to the evening of the 10th (which is when I actually told the realtor I'd be out of here). I immediately got hit with ALL THE STRESS, because my subconscious knew the deadline on the 5th wasn't real but the one on the 10th is. (My brain is odd sometimes, but I've been living with it for a while now.)

Red queen's race continues: I hadn't checked in the hwclock rewrite motivated by glibc breakage (they screwed up the syscall wrapper so it doesn't actually pass the arguments through to the syscall). Meanwhile, musl-libc changed their settimeofday() to NOT ACTUALLY CALL THAT SYSCALL AT ALL, and that syscall is the only way to set the in-kernel timezone adjustment. So I rewrote hwclock to call the syscall directly, but before checking it in I wanted to test that it still works properly (i.e. reads and writes the hardware clock properly), and I'm not gonna do that on my development laptop so I needed to do a mkroot build to test under qemu.

Which is how I just found the musl commit that removed __NR_settimeofday, thus breaking my new version that calls the syscall directly. Rich both broke the wrapper AND went out of his way to make sure nobody calls the syscall directly, because users aren't allowed to do things he disapproves of. (For their own good, they must be CONSTRAINED.)
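A minimal sketch of calling the syscall directly, assuming Linux and a libc whose <sys/syscall.h> still defines SYS_settimeofday; the #ifdef fallback to ENOSYS is for headers that don't. raw_settimeofday() and set_kernel_tz() are hypothetical names, not the actual hwclock code:

```c
#define _DEFAULT_SOURCE   /* glibc: declare syscall() even under strict -std= */
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/time.h>     /* struct timezone */
#include <unistd.h>

/* Hypothetical helper: invoke the settimeofday syscall directly instead of
 * going through the libc wrapper. Returns 0, or -1 with errno set. */
long raw_settimeofday(const void *tv, const struct timezone *tz)
{
#ifdef SYS_settimeofday
  return syscall(SYS_settimeofday, tv, tz);
#else
  /* These headers don't expose the syscall number at all. */
  (void)tv;
  (void)tz;
  errno = ENOSYS;
  return -1;
#endif
}

/* Set only the in-kernel timezone (minutes WEST of UTC), leaving the clock
 * alone. Needs CAP_SYS_TIME. tv stays NULL because on 32-bit targets with a
 * 64-bit time_t, libc's struct timeval no longer matches the layout the
 * legacy syscall expects. */
int set_kernel_tz(int minuteswest)
{
  struct timezone tz = { .tz_minuteswest = minuteswest, .tz_dsttime = 0 };

  return raw_settimeofday(NULL, &tz) ? -1 : 0;
}
```

As far as I can tell from the kernel source (kernel/time/time.c), the first call after boot that sets a timezone without a time also "warps" the system clock by that offset, on the assumption the RTC was ticking local time. One more reason to test this under qemu rather than on a development laptop.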

March 2, 2024

I've had mv -x sitting in my tree for a couple days, but it came up on the coreutils mailing list (in a "don't ask questions, post errors" sort of way) so I'm checking it in.

In theory both renameat2() and RENAME_EXCHANGE went in back in 2014 (ten years ago now!), but glibc doesn't expose either the Linux syscall or the constant Linux added unless you #define STALLMAN_FOREVER_GNU_FTAGHN_IA_IA and I categorically refuse. Also, this should build on macos and freebsd, which probably don't have either? So I need a function in portability.[ch] wrapping the syscall myself inside an #ifdef.

Which is a pity, because renameat() seems like what "mv" really WANTS to be built around. Instead of making a bunch of "path from root" for the recursive case, the clean way to handle -r is to have openat() style directory filehandles in BOTH the "from" and "to" sides, and that's what renameat() does: olddirfd, oldname, newdirfd, newname.
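A minimal sketch of that portability wrapper, assuming the raw-syscall route rather than any libc renameat2() wrapper. rename_exchange() is a hypothetical name; RENAME_EXCHANGE is (1 << 1) in <linux/fs.h> and gets defined locally in case libc hides it; and where the headers have no SYS_renameat2 (macOS, FreeBSD) it just fails with ENOSYS. It takes the same olddirfd/oldname/newdirfd/newname arguments renameat() does:

```c
#define _DEFAULT_SOURCE   /* glibc: syscall() and AT_FDCWD under strict -std= */
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD for callers */
#include <sys/syscall.h>
#include <unistd.h>

/* Value from <linux/fs.h>, defined here in case libc doesn't expose it. */
#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1 << 1)
#endif

/* Hypothetical portability.c helper: atomically swap two directory entries,
 * each named relative to its own directory filehandle. */
int rename_exchange(int olddirfd, const char *oldname,
                    int newdirfd, const char *newname)
{
#ifdef SYS_renameat2
  return (int)syscall(SYS_renameat2, olddirfd, oldname, newdirfd, newname,
                      RENAME_EXCHANGE);
#else
  /* No renameat2 on this platform (or in these headers). */
  (void)olddirfd; (void)oldname; (void)newdirfd; (void)newname;
  errno = ENOSYS;
  return -1;
#endif
}
```

For mv -x that's rename_exchange(AT_FDCWD, "a", AT_FDCWD, "b"), and the recursive case would pass a pair of directory filehandles instead. Filesystems that can't do the exchange return EINVAL, so callers still need an error path.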

Although there's still the general dirtree scalability issue I have a design for but haven't properly coded yet: keeping one filehandle open per directory level leads to filehandle exhaustion if you recurse down far enough. I need to teach dirtree() to close parent filehandles and re-open them via open("..") as we return back up (then fstat() and compare the dev/ino and barf if it's not the same). (And even if I teach the dirtree() plumbing to do this, teaching _mv_ to do it would be separate because it's two parallel traversals happening at the same time.)

Without conserving filehandles you can't get infinite recursion depth, and you can trivially create an infinite depth via while true; do echo mkdir -p a b; echo mv a b/a; echo mv b a; done or similar, so at the very least "rm -r" mustn't be limited by PATH_MAX. And without the stat to check that reopening ".." gets us the parent node's same dev/ino back, rm -rf could wind up deleting the WRONG STUFF if an ill-timed directory move happened in a tree that was going away, which is important to prevent. So we need to check both that the parent filehandle is safe to close because we can open("..") to get it back (if not, we followed a symlink or something and should keep the filehandle open: if you cause filehandle exhaustion by recursing through symlinks to directories, that's pilot error if you ask me), AND that we got the right dev/ino back after reopening.
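The reopen-and-verify step might look something like this sketch, assuming the caller fstat()ed the parent directory and saved its st_dev/st_ino before closing the parent's original filehandle. reopen_parent() is a hypothetical helper, not existing dirtree() code, and returning ESTALE on a mismatch is an arbitrary placeholder: it reports the problem without deciding what to do about it.

```c
#define _DEFAULT_SOURCE   /* openat(), O_DIRECTORY, O_CLOEXEC under strict -std= */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: reopen the parent of the directory open on childfd
 * via "..", and only hand it back if it's the same directory (dev/ino) we
 * recorded before closing the parent's original fd. Returns a new fd or -1. */
int reopen_parent(int childfd, dev_t want_dev, ino_t want_ino)
{
  struct stat st;
  int fd = openat(childfd, "..", O_RDONLY | O_DIRECTORY | O_CLOEXEC);

  if (fd == -1) return -1;

  if (fstat(fd, &st)) {
    int err = errno;

    close(fd);
    errno = err;
    return -1;
  }

  /* Somebody moved things around mid-traversal: this isn't our parent. */
  if (st.st_dev != want_dev || st.st_ino != want_ino) {
    close(fd);
    errno = ESTALE;
    return -1;
  }

  return fd;
}
```

On the way back up the tree, a traversal would call reopen_parent(childfd, saved_dev, saved_ino) and only then close childfd, so a moved directory turns into an error return instead of silently operating on the wrong tree.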

But if we DO get a different dev/ino when eventually reopening "..", what's the error recovery? We can drill back down from the top and see how far we get, but do we error out or prune the branch or what? Doing "mv" or "rm" on a tree we're in the middle of processing is bad form, and if we're getting different results later somebody mucked with our tree mid-operation, but what's the right RESPONSE? At a design level, I mean.

Anyway, that's a TODO I haven't tackled yet.

March 1, 2024

The pod people's flatbed truck arrived today, and dropped off a storage container using what I can only describe as an "elaborate contraption". (According to Fade, their website calls it PODzilla, imagine a giant rectangular daddy longlegs spider with wheels, only it lifts cargo containers on and off a big flatbed tow truck.) There is now a large empty box with a metal garage door on one side in the driveway, which I have been carrying the backlog of cardboard boxes we packed and taped up into.

I'm very tired. Fuzzy's gone to the u-haul store to buy more boxes. We're like 20% done, tops.

I tried to get a toybox release out yesterday (using the "shoot the engineers and go into production" method of just SHIPPING WHAT I HAVE, with appropriate testing and documentation), but got distracted by a mailing list question about the "getopt" command in pending and wound up wasting the evening going through that instead. Although really the immediate blocker on the release is that I un-promoted the passwd command when I rewrote lib/password.c, until I can properly test that infrastructure (under mkroot, not on my development system!). That's a pain both because the tests are hard to set up properly (the test infrastructure doesn't run under toysh yet because I've refused to de-bash it; I'm trying to teach toysh all the bashisms it uses instead) and because there's a half-dozen other commands (groupadd, groupdel, useradd, userdel, sulogin, chsh) that are low hanging fruit to promote once that infrastructure's in, and what even ARE all the corner cases of this plumbing...

There are like 5 of these hairballs accumulated, each ALMOST ready, but that's the one that causes an actual regression if I don't finish it.

Wound up promoting getopt, so that's something I guess. Still not HAPPY with it, but it more or less does the thing. Given my stress levels accomplishing anything concrete is... an accomplishment.

February 29, 2024

The coreutils maintainer, Pádraig Brady, just suggested using LLMs to translate documentation. I keep thinking gnu can't possibly get any more so, but they manage to plumb new depths.

The University of Texas just started offering a master's degree program in "AI".

Linus Torvalds recently talked about welcoming LLM code into the kernel, in the name of encouraging the younguns to fleet their yeek or some such. (The same way he wants to have language domain crossings in ring zero by welcoming in Rust while the majority of the code is still C. Because nothing says "maintainable" like requiring a thorough knowledge of two programming languages' semantics and all possible interactions between them to trace the logic of a single system call. So far I've been able to build Linux without needing a BPF compiler. If at some point I can't build kernels without needing a Rust compiler, that's a "stay on the last GPLv2 release until finding a different project to migrate to" situation.)

The attraction of LLMs is literally Dunning-Kruger syndrome. Their output looks good to people who don't have domain expertise in the relevant area, so if you ask it to opine about economics it looks GREAT to people who have no understanding of economics. But if you ask it to output stuff you DO know about, well obviously it's crap. I.E. "It's great for everything else, but it'll never replace ME, so I can fire all my co-workers and just have LLMs replace them while I use my unique skills the LLMs do a bad job replicating".

Fundamentally, an LLM can't answer any question that hasn't got a known common answer already. It's morphing together the most common results out of a big web-scraped google cache, to produce the statistically most likely series of words from the input dataset to follow the context established by the prompt. The answer HAS to already be out there in a "let me Google that for you" sense, or an LLM can't provide it. The "morphing together" function can combine datasets ("answer this in the style of shakespeare" is a more advanced version of the old "jive" filter), but whether the result is RIGHT is entirely coincidental. Be careful what you wish for and caveat emptor are on full display.

I can't wait for license disputes to crop up. Remember the monkey who took a photo of itself and a court ruled the image wasn't copyrighted? LLMs were trained on copyrighted material, but their output is not itself copyrightable because human creativity wasn't involved. But it's not exactly public domain, either? Does modifying it and calling your derived work your own IP give you an enforceable copyright when 95% of it was "monkey taking a selfie?" and the other 5% is stolen goods?

Lovely comment on mastodon, "Why should I bother to read an LLM generated article when nobody could be bothered to write it?" Also people speculating that ChatGPT-4 is so much worse than ChatGPT-3 that it must have been intentionally sabotaged (with speculation about how this helps them cash out faster or something?) when all the LLM designers said months ago that sticking LLM output into an LLM training dataset was like sticking a microphone into a speaker, and the math goes RAPIDLY pear shaped with even small amounts of contamination poisoning the "vibe" or whatever's going on there. (Still way more an art than a science.) So scraping an internet that's got LLM-generated pages in it to try to come up with the NEXT round of LLM training data DOESN'T WORK RIGHT. The invasive species rapidly poisons its ecosystem, probably leading to desertification.

Capitalism polluting its own groundwater usually has a longer cycle time, but that's silicon valley for you. And white guys who confidently answer questions regardless of whether they actually know anything about the topic or not are, of course, highly impressed by LLMs doing the same. They made a mansplaining engine, they LOVE it.

"Was hamlet mad" was a 100-point essay question in my high school shakespeare class, where you could argue either side as long as you supported it. "Was hamlet mad" was a 2-point true/false question in my sophomore english class later the same month. Due to 4 visits to the Johns Hopkins CTY program I wound up taking both of those the same semester in high school, because they gave me the senior course form to fill out so I could take calculus as a sophomore, so I picked my other courses off there too and they didn't catch it until several months later by which point it was too late. I did not enjoy high school, but the blatant "person in authority has the power to define what is right, even when it's self-contradictory and patently ridiculous" experience did inoculate me against any desire to move to Silicon Valley and hang out with self-important techbros convinced everyone else is dumber than they are and there's nothing they don't already know. A culture where going bankrupt 4 times and getting immediate venture capital funding for a 5th go is ABSOLUTELY NORMAL. They're card sharps playing at a casino with other people's money, counting cards and confidently bluffing. The actual technology is a side issue. And now they've created a confident bluffing engine based on advanced card counting in a REALLY BIG deck, and I am SO TIRED.

February 28, 2024

Trying hard to get a leap day toybox release out, because the opportunity doesn't come along that often.

This is why Linux went to time based releases instead of "when it's ready" releases, because the longer it's BEEN since the last release the harder it is to get the next release out. Working on stabilization shakes todo items loose and DESTABILIZES the project.

February 27, 2024

When I tested Oliver's xz cleanup, which resulted in finding this bug, what I muttered to myself (out loud) is "It's gotta thing the thing. If it doesn't thing the thing it isn't thinging."

This is my clue to myself that it may be time to step away from the keyboard. (I didn't exhaust myself programming today, I exhausted myself boxing up the books on 4 bookshelves so somebody could pick the empty bookshelves up and move them to her daughter's bedroom. This leaves us with only 14 more bookshelves to get rid of.)

Remember how two people were working on fdpic toolchain support for riscv? Well now the open itanium crowd has decided to remove nommu support entirely. Oh well. (It's a good thing I can't _be_ disappointed by riscv...)

February 24, 2024

Sigh, started doing release notes with today's date at the top, and as usual, that was... a bit ambitious.

Editing old blog entries spins off todo items as I'm reminded of stuff I left unfinished. Going through old git commits to assemble release notes finds old todo items. Doing "git diff" on my dirty main dev tree finds old todo items... The question is what I feel ok skipping right now.

I'm too stressed by the move to make good decisions about that at the moment...

February 23, 2024

Sigh, the censorship on crunchyroll is getting outright distracting. Rewatching "kobayashi maid dragon" (_without_ subtitles this time: I've heard it so many times I kind of understand some of the japanese already, and I know the plot, so I'm trying to figure out which word means what given that I sort of know what they're saying), and in the first episode Tohru (the shapeshifted dragon) was shown from behind, from the waist up, with her shirt off. But you can no longer show a woman's bare back on crunchyroll (you could last year!), so they edited in another character suddenly teleporting behind her to block the view.

This is 1950's "Elvis Presley's Pelvis can't be shown on TV" levels of comstock act fuckery. (And IT IS A CARTOON. YOU CANNOT HAVE SEX WITH A DRAWING. There are so many LAYERS of wrong here...)

Imagine the biblical prohibitions on food had been what survived into the modern day instead of the weirdness about sex. The bible's FULL of dietary restrictions predating germ theory, the discovery of vitamins, or any understanding of allergens: can't mix milk and meat, no shellfish, no meat on fridays, give stuff up for lent, fasting, the magic crackers and wine becoming LITERALLY blood and human flesh that you are supposed to cannibalize but it's ok because it's _church_ magic... Imagine black censor bars over the screen every time somebody opens their mouth to eat or drink. Imagine digitally blurring out any foodstuff that isn't explicitly confirmed, in-universe, as kosher or halal. Imagine arguing that watching "the great british bake-off", a dirty foreign film only available to adults on pay-per-view in 'murica, was THE SIN OF GLUTTONY and would make you statistically more likely to get tapeworms because FOOD IS DANGEROUS.

Kind of distracting, isn't it? Whether or not you're particularly interested in whatever made anime character du jour shout "oiishiiii" yet again (it's a trope), OBVIOUSLY CENSORING IT is far, far, far more annoying than the trope itself could ever be. Just show it and keep going. Even if I wanted to (I don't) I can't eat a drawing of food through the screen... but why exactly would it be bad if I could? What's the actual PROBLEM?

I am VERY TIRED that right-wing loons' reversion to victorian "you can see her ankles!" prudishness is being humored by so many large corporations. These idiots should not have traction. Their religion is funny about sex EXACTLY the same way it's funny about food, with just as little scientific basis. These days even their closest adherents ignore the EXTENSIVE explicit biblical dietary prohibitions (Deuteronomy 14 is still in the bible, forbidding eel pie and unagi sushi although Paul insists that God changed his mind since then, but even the new testament forbids eating "blood" and "meat of strangled animals" in Acts 15:29 and the medieval church had dozens of "fast days" on top of that, plus other traditions like anorexia mirabilis), and we ignore all that because their god isn't real and we all AGREE the food prohibitions were nothing but superstition propagated from parent to child the same way santa claus and the tooth fairy are. Even the more RECENT stuff like "lent" (which gave us the McDonalds Fish sandwich because christianity was still culturally relevant as recently as the 1960s) is silly and quaint to anyone younger than Boomers.

But the SEX part persists (officiating marriage was too lucrative and provided too much control over the populace to give up), and is still causing enormous damage. Religious fasting is obsolete but shame-based abstinence is still taught in schools. Except most sexually transmitted diseases only still EXIST because of religious shame. Typhoid mary was stopped by science, because we published the information and tracked the problem down and didn't treat getting a disease as something shameful to be hidden and denied. Sunlight was the best disinfectant, we find OUT sources of contamination and track them down with the help of crowdsourcing. NOT with medieval "for shame, you got trichinosis/salmonella/listeria what a sinner, it's yahweh jehovah jesus's punishment upon you, stone them to death!" It's precisely BECAUSE we drove the religious nonsense out and replaced it with science and sane public policy that you can eat safely in just about any restaurant even on quite rural road trips. We have regular testing and inspections and have driven a bunch of diseases out of the population entirely, and when there IS an outbreak of Hepatitis A we don't BLAME THE VICTIMS, we track down the cause and get everybody TREATED.

I don't find cartoon drawings of women particularly arousing for the same reason I don't find cartoon drawings of food particularly appetizing... but so what if I did? So what if "delicious in dungeon" or "campfire cooking" anime made me hungry? Cartoon food on a screen is not real food in front of me for MULTIPLE REASONS, which also means I can't get fat from it, or catch foodborne pathogens, or react to allergens, or deprive someone else of their rightful share by eating too much, or steal the food on screen, or contaminate it so other people get sick. Even if I _did_ salivate at cartoon food... so what?

Even if I was attending a play with real actors eating real food up on the stage live in front of me, which I could literally SMELL, I still couldn't run up and eat it because that's not how staged entertainment works. But the Alamo Drafthouse is all about "dinner and a movie" as a single experience, and when I watched Sweeney Todd at the Alamo Drafthouse they had an extensive menu of meat pies (which is how I found out I'm allergic to parsnips), and it was NOT WRONG TO EAT WHILE WATCHING when the appropriate arrangements had been made to place reality in front of each individual attendee, EVEN THOUGH THAT MOVIE IS LITERALLY ABOUT CANNIBALISM. You can't make a "slippery slope" argument when the thing LITERALLY ACTUALLY HAPPENING would be fine. Oh wow, imagine if a summoned elf from another world climbed out of the TV and had sex with me right now! Um... ok? This is up there with wanting to fly and cast "healing" from watching a cartoon with magic in it. The same church also did a lot of witch burnings; it was wrong of them and we're over that now. Today, watching Bewitched or I Dream of Jeannie, I'm really not expecting to pick up spells because I'm not four years old, but if watching "The Tomorrow People" taught me to teleport... where's the downside? What do you think you're protecting anyone FROM?

These entertainments regularly show people being brutally, bloodily murdered, and THAT is just fine. Multiple clips of deadpool on youtube show the "one bullet through three heads in slow motion" scene unblurred, but the scenes showing consensual sex with the woman Wade Wilson lives with and proposes marriage to and spends half the movie trying to protect and/or get back to, THAT can't be shown on youtube. (And even the movie has some internalized misogyny, albeit in the form of overcompensating the other way and still missing "equality": in the scene where he collapses from the first sign of cancer, he's fully naked and she's wearing underwear, because male nudity isn't sexual while women in underwear or even tight clothing are always and without exception sexual and beyond the pale, and showing an orifice literally HALF THE POPULATION has is unthinkable even in an R rated movie.)

Sexual repression has always correlated strongly with fascism. The nazis' first book burning was of a sexual research institute's library. The victorian prudishness of the british coincided with the period they were conquering an empire with jamaican slave plantations and feeding opium to china and the East India company subjugating india and native american genocides (George "town killer" Washington) and so on.

It's currently the boomers doing it. As teenagers in the 1960s they pushed "sex drugs rock and roll" into the mainstream, and then once they were too old to have sex with teenagers they outlawed teenagers having sex with EACH OTHER or selling pictures they took of themselves (the supreme court's New York v. Ferber decision in 1982 invented the legal category of "child porn" because some teenage boys selling pictures they took of themselves masturbating made it all the way to the supreme court, which is why everybody used to have naked baby pictures before that and the 1978 movie "superman" showed full frontal nudity of a child when his spacecraft lands without anybody thinking it was sexual, but 4 years later the law changed so filming things like that is now SO TERRIBLE that you can't even TALK ABOUT IT without being branded as "one of them", which makes being a nudist a bit frustrating). And now the Boomers are so old even the viagra's stopped working, they're trying to expunge sex from the culture entirely.

Sigh. This too shall pass. But it's gonna get uglier every year until a critical mass of Boomers is underground. (In 2019 there were estimated to be about 72 million Boomers left, and 4 million of them died between the 2016 and 2020 elections, which was the main reason the result came out differently.)

In the meantime... crunchyroll. Last week I tried to start a new series called "I couldn't become a hero, so I reluctantly decided to get a job", and I'm tempted to try to buy the DVD of a series I may not even like because I CANNOT WATCH THIS. In the first FIVE MINUTES they'd clearly edited a half-dozen shots to be less porny. I'm not interested in trying to sexualize cartoon characters, but this is "han shot first" and the ET re-release digitally editing the guns into walkie-talkies levels of obvious and unconvincing bullshit. Even when I'm theoretically on their side (defund the police, ACAB, I'm very glad the NRA is imploding) the cops who showed up to separate Elliott from his alien friend HAD GUNS and STOPPIT WITH THE PHOTOSHOP. If I can tell on a FIRST WATCH that you're editing the program within an inch of its life... every time I'm pulled right out of my immersion again.

I dislike smoking, but Disney photoshopping cigarettes out of Walt Disney's photos is historical revisionism. If a show had a bunch of characters chain-smoke but they digitally edited them to have lollipops and candy canes in their mouths all the time instead, gesticulating with them... You're not fooling anyone. Imagine if they did that to Columbo. Columbo with his cigar digitally removed and every dialog mention of it clipped out. You can be anti-cigar and still be WAY CREEPED OUT BY THAT. Cutting the "cigarette? why yes it is" joke out of Police Squad does not make you the good guy.

Do not give these clowns power. The law is whatever doesn't get challenged.

February 22, 2024

Sat down to rebuild all the toolchains this morning for the upcoming release (so I can build mkroot against the new kernel), but the sh4 sigsetjmp() fix went in recently (a register other stuff used was getting overwritten) and Rich said it was just in time for the upcoming musl release, so I asked on IRC how that was doing, and also mentioned my struggle with nommu targets and the staleness of musl-cross-make, and there was a long quite productive discussion that resulted in Rich actually making a push to mcm updating musl to 1.2.4! Woo! And it looks like they're doing a lot of cool stuff that's been blocked for a bit.

As part of that discussion, somebody new (sorear is their handle on the #musl channel) is working on a different riscv fdpic attempt, and meowray is working on adding fdpic support to llvm-arm. Either could potentially result in a nommu qemu test environment, and I'm all for it.

February 21, 2024

One of my phone apps "updated" itself to spray advertising all over everything, after 2 years of not doing that. Showing one on startup I'd probably wince and let the frog boil, but having an animated thing ALWAYS on screen when it's running: nope. And Android of course does not let me downgrade to the previous version of anything because that would be giving up too much control.

It doesn't show ads if I kill the app, go into airplane mode, and relaunch it without network access. Then I get the old behavior. So I went into the app permissions, viewed all, and tried to revoke the "have full network access" permission. The app is an mp3 player reading files off of local storage, I switch to it from the google built-in one because Google's didn't understand the concept of NOT streaming but just "only play local files"...

But Android won't let me revoke individual app permissions. I can view "other app capabilities", but long-press on it does nothing, nor does swipe to the side, and tapping on it just brings up a description with "ok". No ability to REVOKE any. Because despite having purchased a phone, I am the product not the customer. Even having put the phone into debug mode with the "tap a zillion times in a random sub-menu" trick, I still don't get to control app permissions. (So what are the permissions FOR, exactly?)

Sigh, serves me right for running vanilla android instead of one of the forks that actually lets me have control over my phone. I suppose there's a thing I could do with adb(?), but keeping the sucker in airplane mode while listening is a workaround for now...

And no I don't feel guilty about "but what about all the effort the app developer put into it", I can play an mp3 I downloaded through the "files" widget: it's built into the OS. Which is fine for the copy of Rock Sugar's "Reinventinator" Fade bought me for christmas: the whole album is one big ogg file, I threw it on my web server and downloaded it, and it plays fine. But the File app doesn't advance to the next one without manual intervention. "Play this audio file" is probably a single line of java calling a function out of android's standard libraries. Going from an android "hello world" app tutorial to "display list of files, click on one to play and keep going in order, show progress indicator with next/forward and pause/play button, keep going when screen blanked with the lock screen widget"... In fact never mind that last bit, the "file" widget is doing the exact same lock screen widget playing that ogg file, so this is probably a standard gui widget out of android's libraries and you just instantiate it with flags and maybe some callbacks. (Sigh, it's Java, they're going to want you to subclass it and provide your own constructor and... Ahem.) Anyway, that's also built into the OS.

This is probably a weekend's work _learning_ how to do all that. Including installing android studio. And yes my $DAYJOB long ago was writing java GUI apps for Quest Multimedia and I taught semester-long for-credit Java courses at Austin Community College: I'm stale at this but not intimidated by it.

But I haven't wanted to open the app development can of worms because I'm BUSY, especially now you have to get a developer ID from Google by providing them government ID in order to have permission to create a thing you can sideload on your OWN PHONE.

Not going down that rathole right now. I am BUSY.

February 20, 2024

Hmmm, you know a mastodon feed of this blog doesn't have to be CURRENT, I could do audio versions of old entries, do notes/01-23-4567 dirs each with an index.html and mp3 file (alongside the existing one-big-text version), and post links to/from a (new, dedicated) mastodon account as each one goes up, which would allow people to actually comment on stuff, without my tendency to edit and upload weeks of backlog at a time. (Hmmm, but _which_ mastodon account? Does dreamhost do mastodon? Doesn't look like it. I don't entirely trust to still be around in 5 years, I mean PROBABLY? But it's outside of my control. How much of the legal nonsense of running your own server is related to letting OTHER people have accounts on it, and how much is just "the Boomers are leaving behind a dysfunctionally litigious society"? There was a lovely thread about mastodon legal setup tricks for individuals running their own server, things like notifying some government office (a sub-program of the library of congress I think?) to act as a DMCA takedown notice recipient "agent" on your behalf, but it was on twitter and went away when that user deleted their account. Mirror, don't just bookmark...)

Ahem: backstory.

This blog is a simple lightly html formatted text file I edit in vi, and I tend to type in the text extemporaneously and do most of the HTML formatting in a second pass, plus a bunch of editing to replace [LINK] annotations with the appropriate URL I didn't stop to grab at the time, and finish half-finished trail off thoughts not englished wordily because brain distract in

Anyway, the "start of new entry" lines are standardized, and as I go through editing I replace my little "feb 20" note with a cut and paste from the last entry I edited to the start of the new one, and change the date in the three places it occurs. Yes vi has cut and paste: "v [END] y [PAGEUP... cursor cursor...] p" and then "i" to go into insert mode and cursor over to the three places the entry's date shows up in the first line and type over it because I'm sure there's a "search and replace within current line" magic key but I've never bothered to learn it. It would be great to have the date in just ONE place, but I'm editing raw HTML and it's got an <a name="$DATE"> to provide jump anchors, an <hr> tag to provide a dividing line, <h2> start and end tags to bump the font up, an <a href="#$DATE"> tag to provide an easily copyable link to the entry (each entry links to itself), and then an expanded english date to provide the display name for the link. (And then on the next line, usually a <span id=programming> tag so SOMEDAY I can make multiple rss feed generators that show only specific categories, if you "view source" there's a commented out list of span tags at the top I've historically used and try to stick to.)
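Having the date typed in just one place could look something like this minimal sketch, which builds the whole start-of-entry line from one year/month/day triple. The function name entry_header(), the MM-DD-YYYY anchor format and the exact tag order are all assumptions for illustration, not the blog's actual template:

```c
#include <stdio.h>

// Hypothetical helper, not the blog's real tooling: build the
// start-of-entry line from ONE date. The MM-DD-YYYY anchor format and the
// exact tag order are assumptions based on the description above.
static const char *const month_names[12] = {
  "January", "February", "March", "April", "May", "June", "July",
  "August", "September", "October", "November", "December"
};

// Writes the header into buf and returns what snprintf() returns,
// or -1 if the month is out of range.
int entry_header(char *buf, size_t len, int year, int month, int day)
{
  char anchor[40];

  if (month < 1 || month > 12) return -1;
  // The same anchor string is used for the jump target and the self-link.
  snprintf(anchor, sizeof(anchor), "%02d-%02d-%04d", month, day, year);

  return snprintf(buf, len,
    "<a name=\"%s\"></a><hr><h2><a href=\"#%s\">%s %d, %d</a></h2>",
    anchor, anchor, month_names[month-1], day, year);
}
```

Calling entry_header(buf, sizeof(buf), 2024, 2, 20) fills in "02-20-2024" for both anchors and "February 20, 2024" as the link text, so the three copies of the date can't drift apart.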

The advantage of each new entry having a standardized line at the start is it's easy to search for and parse, and I have a python script a friend (Dr. What back at timesys) wrote ages ago to generate an rss feed for my blog, which I've rewritten a lot since then but it's still in python rather than sed out of historical inertia, and also me treating rss (actually "atom", I think?) as a magic undocumented format likely to shatter if touched. (It is python 2. It will not ever be python 3. If a debian upgrade takes away python 2, that's when the sed comes out. Posix has many failings, but "posix-2024" is not going to force you to rewrite "posix-2003" scripts that work, the same way modern gasoline still works in a 20 year old car.)

What this form of blogging does NOT provide is any way for readers to leave comments (other than emailing me or similar), which was the big thing I missed moving from livejournal back to blogging on my own site. And I am NOT doing that myself: even if I wanted to try to deal with some sort of CGI plumbing for recording data (I don't), user accounts and moderation and anti-spam and security and so on are way too much of a pain to go there. (I have met the founders of Slashdot. It ate their lives, and that was 20 years ago.)

But now that I'm on mastodon (as pretty much my only social network, other than some email lists and the very occasional youtube comment under an account not directly connected to anything else), using a mastodon account as an rss feed for the blog seems... doable? Ok, the entries don't have TITLES. Summaries would be a problem. (On posts have a 500 character limit, I guess I could just do start of entry. But they're not really organized with topic sentences, either.)
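"Just do start of entry" mostly means cutting the text at the character limit without chopping a word in half. A minimal sketch, assuming UTF-8 text and counting code points as characters (which approximates, but may not exactly match, how a given mastodon server counts); summary_len() is a hypothetical name:

```c
#include <stddef.h>

// Hypothetical helper: how many bytes of UTF-8 text fit in a post of at
// most max_chars characters? Counts code points (an approximation of how
// a server counts) and backs up to the last space so no word gets cut.
size_t summary_len(const char *s, size_t max_chars)
{
  size_t bytes = 0, chars = 0, last_space = 0;

  for (; s[bytes]; bytes++) {
    unsigned char c = s[bytes];

    // UTF-8 continuation bytes (10xxxxxx) don't start a new character.
    if ((c & 0xC0) == 0x80) continue;
    if (chars == max_chars) break;
    chars++;
    if (c == ' ') last_space = bytes;
  }

  // Everything fit, we stopped right at a word break, or there's no
  // earlier space to back up to: keep what we have.
  if (!s[bytes] || s[bytes] == ' ' || !last_space) return bytes;

  return last_space;
}
```

A caller would pass something smaller than 500 to leave room for the link back to the full entry.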

The real problem has been that I'm not posting promptly, and tend to do so in batches (because editing) which floods the feed. Possibly less of an issue with rss feeds, where you can get to it much later. (The feed readers I've seen had each data source basically in its own folder, not one mixed together stream like social media likes to do so stuff gets buried if you don't get to it immediately.)

There's also a lot of "chaff", since a blog has multiple topics and I might want to serialize just one (the id=programming stuff). I've (manually) put the tags in, but haven't USED them yet. Haven't even mechanically confirmed the open/close pairs match up, just been eyeballing it...
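Mechanically confirming the span pairs is only a few lines. Here's a rough sketch (span_balance() is a made-up name) that tracks nesting depth and skips HTML comments so a commented-out list of tags at the top doesn't get counted. It's naive about everything else, such as "<span" appearing inside an attribute:

```c
#include <string.h>

// Hypothetical checker for the id=programming style category tags: do
// the <span ...> and </span> tags pair up? Returns 0 if balanced, the
// number of spans left open, or -1 for a </span> with nothing open.
// HTML comments are skipped so a commented-out list of tags doesn't count.
int span_balance(const char *html)
{
  int depth = 0;
  const char *p = html;

  while ((p = strchr(p, '<'))) {
    if (!strncmp(p, "<!--", 4)) {
      // Jump to the end of the comment (or stop if it never ends).
      if (!(p = strstr(p+4, "-->"))) break;
    } else if (!strncmp(p, "</span>", 7)) {
      if (!depth) return -1;
      depth--;
    } else if (!strncmp(p, "<span", 5) && (p[5] == ' ' || p[5] == '>'))
      depth++;
    p++;
  }

  return depth;
}
```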

February 19, 2024

Watched the "building a busybox based debian" peertube video, which really should have been a 5 minute lightning talk. It boils down to "I use mmdebstrap instead of debootstrap, here's some command line options it has and how I used them to install debian's busybox package in a semi-empty root directory and got it to boot". It's not _really_ a busybox based debian, more hammering in a screw and filing the edges a bit.

First he established "debian's too big for embedded" by doing mmdebstrap --variant=minbase unstable new-dir-name and showing the size (not quite 200 megs), then he trimmed it with --dpkgopt='path-exclude=/usr/share/man/*' and again for (/usr/share/doc/* and /usr/share/locale/*) which was still over 100 megs.

Next he mentioned you can --include packagename (which takes a CSV argument) and introduced the --variant=custom option which only installs the packages you list with --include. And he talked about --setup-hook and --customize-hook which are just shell command lines that run before and after the package installs (in a context he didn't really explain: it looks like "$1" is the new chroot directory and the current directory already has some files in it from somewhere? Maybe it's in the mmdebstrap man page or something...)

Putting that together, his "busybox install" was:

mmdebstrap --variant=custom --include="$INCLUDE_PKGS" \
  --hook-dir=/usr/share/mmdebstrap/hooks/busybox \
  --setup-hook='sed -i -e "1 s/:x:/::/g" "$1/etc/passwd"' \
  --customize-hook='cp inittab "$1/etc/inittab"' \
  --customize-hook='mkdir "$1/etc/init.d"; cp rcS "$1/etc/init.d/rcS"' \
  unstable busybox-amd64

(Note, the "amd64" at the end was just naming the output directory, the plumbing autodetects the current architecture. There's probably a way to override that but he didn't go there.)

He also explained that mmdebstrap installs its own hooks for busybox in /usr/share/mmdebstrap/hooks/busybox and showed and out of there, neither of which seemed to be doing more than his other customize-hook lines so I dunno why he bothered, but that's what the --hook-dir line was for apparently. (So it doesn't do this itself, and it doesn't autodetect it's installing busybox and fix stuff up, but you can have it do BITS of this while you still do most of the rest manually? I think?)

In addition to the packages he explicitly told it to install, this sucked in the dependencies gcc-12-base:amd64 libacl1:amd64 libbz2-1.0:amd64 libc6:amd64 libdebconfclient0:amd64 libgcc-s1:amd64 liblzma5:amd64 libpcre2-8-0:amd64 libselinux1:amd64 mawk tar zlib1g:amd64 and that list has AWK and TAR in it (near the end) despite busybox having its own. I haz a confused. This was not explained. (Are they, like, meta-packages? I checked on my ancient "devuan botulism" install and awk claims to be a meta-package, but tar claims to be gnu/tar.)

Anyway, he showed the size of that (still huge but there's gcc in there) then did an install adding the nginx web server, which required a bunch more manual fiddling (creating user accounts and such, so he hasn't exactly got a happy debian base that "just works" for further packages, does he) and doing that added a bunch of packages and ~50 megs to the image size. (Plus naginiks's corporate maintainer went nuts recently and that project forked under a new name, but that was since this video.)

Finally he compared it against the alpine linux base install, which is still smaller than his "just busybox" version despite containing PERL for some reason. This is because of musl, which the above technique does not address AT ALL. (It's pulling packages from a conventionally populated repository. Nothing new got built from source.)

Takeaway: the actual debian base appears to be the packages dpkg, libc-bin, base-files, base-passwd, and debianutils. This does not provide a shell, command line utilities, or init task, but something like toybox can do all that. Of course after installing a debootstrap I generally have to fiddle with /etc/shadow, /etc/inittab, and set up an init ANYWAY. I even have the checklist steps in my old container setup docs somewhere...

February 18, 2024

The limiting factor on a kconfig rewrite has been recreating menuconfig, but I don't really need to redo the current GUI. I can just have an indented bullet point list that scrolls up and down with the cursor keys and highlight a field with reverse text. Space enables/disables the currently highlighted one, and H or ? shows its help text. Linux's kconfig does a lot with "visibility" that I don't care about (for this everything's always visible, maybe greyed if it needs TOYBOX_FLOAT or something that's off?). And Linux's kconfig goes into and out of menus because an arbitrarily indented bullet point list would go off the right edge for them: the kernel's config mess goes a dozen levels deep, but toybox's maximum depth is what, 4? Shouldn't be that hard...
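The display half of that really is small. A rough sketch, assuming the config is kept as one flat array in display order where each entry just remembers its indent depth; the struct and function names are hypothetical, and reverse video uses the standard ANSI escape ESC[7m (reset with ESC[0m):

```c
#include <stdio.h>

// Hypothetical representation: the whole config as one flat array in
// display order, each entry remembering how deeply it's indented.
struct cfg_entry {
  const char *name;
  int depth;
  int enabled;
};

// Render one row: two spaces of indent per level, [*] or [ ], and the
// cursor row drawn in reverse video (ANSI escape ESC[7m, reset by ESC[0m).
int render_row(char *buf, size_t len, const struct cfg_entry *e, int cursor)
{
  return snprintf(buf, len, "%s%*s[%c] %s%s", cursor ? "\033[7m" : "",
    2*e->depth, "", e->enabled ? '*' : ' ', e->name, cursor ? "\033[0m" : "");
}

// Keys act on the entry under the cursor: space toggles it. (Cursor
// movement and showing help for H or ? would also go here.)
void handle_key(struct cfg_entry *list, int cursor, int key)
{
  if (key == ' ') list[cursor].enabled = !list[cursor].enabled;
}
```

With only about 4 levels of nesting, 2 columns per level never gets near the right edge, which is the whole argument for skipping submenus.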

As for resolving "selects" and "depends", according to sed -n '/^config /,/^\*\//{s/^\*\///;p}' toys/*/*.c | egrep 'selects|depends' | sort -u there aren't currently any selects, and the existing depends use fairly simple logic: && and || and ! without even any parentheses, which is the level of logic already implemented in "find" and "test" and such (let alone sh). Shouldn't be too challenging. I should probably implement "selects" and parentheses just in case, though...
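For a sense of scale, a recursive descent evaluator for that grammar (with the "just in case" parentheses) fits on one screen. This is a sketch with hypothetical names; the symbol lookup is a callback, and demo_lookup() just pretends two symbols are enabled for trying it out:

```c
#include <ctype.h>
#include <string.h>

// Hypothetical evaluator for toybox-style "depends on" lines: symbols
// combined with !, && and ||, plus parentheses "just in case". Usual C
// precedence: ! binds tightest, then &&, then ||. No error reporting.
struct dep_parser {
  const char *s;                               // current position
  int (*lookup)(const char *name, int len);    // is this symbol enabled?
};

static int dep_or(struct dep_parser *p);

static void dep_skip(struct dep_parser *p)
{
  while (isspace((unsigned char)*p->s)) p->s++;
}

// factor := '!' factor | '(' or-expr ')' | SYMBOL
static int dep_factor(struct dep_parser *p)
{
  const char *start;
  int val;

  dep_skip(p);
  if (*p->s == '!') {
    p->s++;
    return !dep_factor(p);
  }
  if (*p->s == '(') {
    p->s++;
    val = dep_or(p);
    dep_skip(p);
    if (*p->s == ')') p->s++;
    return val;
  }
  start = p->s;
  while (isalnum((unsigned char)*p->s) || *p->s == '_') p->s++;

  return p->lookup(start, (int)(p->s - start));
}

// and-expr := factor ('&&' factor)*
static int dep_and(struct dep_parser *p)
{
  int val = dep_factor(p);

  for (;;) {
    dep_skip(p);
    if (strncmp(p->s, "&&", 2)) return val;
    p->s += 2;
    // Always parse the right side (no short circuit) to consume it.
    val = dep_factor(p) && val;
  }
}

// or-expr := and-expr ('||' and-expr)*
static int dep_or(struct dep_parser *p)
{
  int val = dep_and(p);

  for (;;) {
    dep_skip(p);
    if (strncmp(p->s, "||", 2)) return val;
    p->s += 2;
    val = dep_and(p) || val;
  }
}

int depends_ok(const char *expr, int (*lookup)(const char *, int))
{
  struct dep_parser p = { expr, lookup };

  return dep_or(&p);
}

// Example lookup for trying it out: pretend only these two are enabled.
static int demo_lookup(const char *name, int len)
{
  static const char *const on[] = { "TOYBOX_FLOAT", "TOYBOX_FORK" };

  for (int i = 0; i < 2; i++)
    if ((int)strlen(on[i]) == len && !strncmp(on[i], name, len)) return 1;

  return 0;
}
```

With demo_lookup(), "TOYBOX_ON_ANDROID || TOYBOX_FLOAT && !TOYBOX_FORK" comes out false, because && groups before ||.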

The cursor up and down with highlighting stuff I already did in "top" and "hexedit" and such, and I should really revisit that area to do shell command line editing/history...

February 17, 2024

The deprecation news of the week:

The last one is sad. FreeBSD is rendering itself irrelevant in the embedded world. Oh well, if they want to embrace being "MacOS Rawhide and nothing more", it's their project...

Ongoing sh4 saga: I might be able to get FDPIC working on qemu-system-sh4, but it turns out qemu-system-sh4 doesn't boot mkroot anymore, even in a clean tree using the known-working kernel from last release.

I bisected it to a specific commit but commenting out the setvbuf() in main didn't help. Tracked it down to sigsetjmp() failing to return. Note that this is SET, which should just be writing to the structure. Yes it's 8 byte aligned. This bug is jittery crap that heisenbugs away if my debug printfs() have too many %s in them (then it works again). Asked for help on the musl, linux-sh, and toybox lists.

And of course, I got private email in reply to my list posts. As always:

On 2/16/24 20:22, [person who declined to reply publicly] wrote:
> Shot into the blue:
> try with qemu-user; mksh also currently has a regression test
> failing on a qemu-user sh4 Debian buildd but with one of the
> libcs only (klibc, incidentally, not musl, but that was with
> 1.2.4)

Hmmm, that does reproduce it much more easily, and I get more info:

Unhandled trap: 0x180
pc=0x3fffe6b0 sr=0x00000001 pr=0x00427c40 fpscr=0x00080000
spc=0x00000000 ssr=0x00000000 gbr=0x004cd9e0 vbr=0x00000000
sgr=0x00000000 dbr=0x00000000 delayed_pc=0x00451644 fpul=0x00000000
r0=0x3fffe6b0 r1=0x00000000 r2=0x00000000 r3=0x000000af
r4=0x00000002 r5=0x00481afc r6=0x407fffd0 r7=0x00000008
r8=0x3fffe6b0 r9=0x00456bb0 r10=0x004cea74 r11=0x3fffe6b0
r12=0x3fffe510 r13=0x00000000 r14=0x00456fd0 r15=0x407ffe88
r16=0x00000000 r17=0x00000000 r18=0x00000000 r19=0x00000000
r20=0x00000000 r21=0x00000000 r22=0x00000000 r23=0x00000000

Might be able to line up the PC with the mapped function with enough digging to find the failing instruction...

What IS a trap 0x180? Searching the sh4 software manual for "trap" says there's something called an exception vector... except "exception" has over 700 hits in that PDF and "exception vector" has two, neither of which are useful.

Ok, in qemu the string "Unhandled trap" comes from linux-user/sh4/cpu_loop.c which is printing the return code from cpu_exec() which is in accel/tcg/cpu-exec.c which is a wrapper for cc->tcg_ops->cpu_exec_enter() which is only directly assigned to by ppc and i386 targets; I'm guessing the others use one of those curly bracket initializations? According to include/hw/core/tcg-cpu-ops.h the struct is TCGCPUOps... Sigh, going down that path could take a while.
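The "curly bracket initialization" guess would explain why grepping for assignments only turns up a couple of targets: a C99 designated initializer never contains "->field =", and any field it doesn't name is zero. A toy illustration with made-up names (this is NOT QEMU's actual TCGCPUOps layout); grepping for the bare field name with grep -w catches both styles:

```c
#include <stddef.h>

// Made-up stand-in for a per-target hook table; NOT QEMU's TCGCPUOps.
struct cpu_ops {
  void (*exec_enter)(int cpu);
  int (*exec)(int cpu);
};

static int sh4_exec(int cpu) { return cpu + 1; }
static int x86_exec(int cpu) { return cpu + 2; }
static void x86_enter(int cpu) { (void)cpu; }

// Style 1: designated initializer. There's no "exec_enter =" here for
// grep to find, and unnamed fields are zero, so this target has no hook.
static const struct cpu_ops sh4_ops = {
  .exec = sh4_exec,
};

// Style 2: assignment at runtime, the only style an "exec_enter =" grep finds.
static struct cpu_ops x86_ops;

void x86_init(void)
{
  x86_ops.exec_enter = x86_enter;
  x86_ops.exec = x86_exec;
}

// Generic wrapper: call the optional hook, then the mandatory one.
int run_cpu(const struct cpu_ops *ops, int cpu)
{
  if (ops->exec_enter) ops->exec_enter(cpu);
  return ops->exec(cpu);
}
```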

Alright, cheating EVEN HARDER:

$ grep -rw 0x180 | grep sh
hw/sh4/sh7750_regs.h:#define SH7750_EVT_ILLEGAL_INSTR 0x180 /* General Illegal Instruction */

What? I mean... WHAT? Really? (That macro is, of course, never used in the rest of the code.) But... how do you INTERMITTENTLY hit an illegal instruction? (What, branch to la-la land? The sigsetjmp() code doesn't branch!)

That email also said "It might just as well be just another qemu bug..." which... Maybe? It _smells_ like unaligned access, but I don't know _how_, and the structure IS aligned. I don't see how it's uninitialized anything since A) the sigsetjmp() function in musl writes into the structure without reading from it, B) adding a memset() beforehand doesn't change anything. If a previous line is corrupting memory... it's presumably not heap, because nothing here touches the heap. The "stack taking a fault to extend itself" theory was invalidated by confirming the failure case does not cross a page boundary. "Processor flags in a weird state so that an instruction traps when it otherwise wouldn't" is possible, but WEIRD. (How? What would put the processor flags in that state?)

Continuing the private email:

> There's also that whole mess with
> which affects {s,g}etcontext in glibc, maybe it applies
> somewhere within musl? (The part about what happens when
> a signal is delivered especially.)

Which is interesting, but musl's sigsetjmp.s doesn't have frchg or fschg instructions.

But what I _could_ try doing is building and testing old qemu versions, to see if that affects anything...

February 16, 2024

Broke down and added "riscv64::" to the architecture list, which built cross and native toolchains. (Because musl/arch only has riscv64, no 32 bit support.)

To add it to mkroot I need a kernel config and qemu invocation, and comparing qemu-system-riscv64 -M '?' to ls linux/arch/riscv/configs gives us... I don't know what any of these options are. In qemu there's shakti, sifive, spike, and virt boards. (It would be really nice if a "none" board could be populated with memory and devices and processors and such from the command line, but that's not how IBM-maintained QEMU thinks. There are "virt" boards that maybe sort of work like this with a device tree? But not command line options, despite regularly needing to add devices via command line options ANYWAY.) Over on the kernel side I dunno what a k210 is, rv32 has 32 in it with musl only supporting 64, and nommu_virt_defconfig is interesting but would have to be a static PIE toolchain because still no fdpic. (Maybe later, but I could just as easily static pie coldfire.)

(Aside: static pie on nommu means that running "make tests" is unlikely to complete because it launches and exits zillions of child processes, any of which can suddenly fail to run because memory is too fragmented to give a large enough contiguous block of ram. FDPIC both increases sharing (the text and rodata segments can be shared between instances, meaning there's only one of each which persist as toybox processes run and exit), and it splits the 4 main program segments apart so they can independently fit into smaller chunks of memory (the two writeable segments, three if you include stack, are small and can move independently into whatever contiguous chunks of free memory are available). So way less memory thrashing, thus less fragmentation, and way less load in general (since each instance of toybox doesn't have its own copy of the data and rodata segments) thus a more reliable system under shell script type load. This is why I'm largely not bothering with static pie nommu systems: I don't expect them to be able to run the test suite anyway.)
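A toy model makes the fragmentation point concrete. This sketch (a hypothetical first-fit allocator, nothing like the kernel's real nommu allocator) checks whether a list of segments, each needing its own contiguous block, fits into a set of free holes:

```c
#include <stddef.h>
#include <string.h>

#define MAX_CHUNKS 64

// Toy first-fit model of a fragmented nommu system: free_chunks[] lists
// the sizes of the free contiguous regions, and each entry in segs[]
// needs one contiguous region of its own. Returns 1 if all of them fit.
int fits_first_fit(const size_t *free_chunks, int nfree,
                   const size_t *segs, int nsegs)
{
  size_t left[MAX_CHUNKS];
  int i, j;

  if (nfree < 0 || nfree > MAX_CHUNKS) return 0;
  memcpy(left, free_chunks, nfree * sizeof(*left));

  for (i = 0; i < nsegs; i++) {
    // Carve this segment out of the first free region big enough.
    for (j = 0; j < nfree; j++) {
      if (left[j] >= segs[i]) {
        left[j] -= segs[i];
        break;
      }
    }
    if (j == nfree) return 0;
  }

  return 1;
}
```

With invented numbers: free holes of 96k, 64k and 64k add up to 224k, but a static PIE image needing 200k in one piece doesn't fit, while an FDPIC instance whose shared text is already resident and only needs separate 40k, 24k and 32k pieces does.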

This leaves us with linux's riscv "defconfig", which I built and set running, and it ran FOREVER and was full of modules and I really wasn't looking forward to stripping that down, so I went "does buildroot have a config for this?" And it does: qemu_riscv64_virt_defconfig with the corresponding qemu invocation from board/qemu/riscv64-virt/readme.txt being "qemu-system-riscv64 -M virt -bios fw_jump.elf -kernel Image -append "rootwait root=/dev/vda ro" -drive file=rootfs.ext2,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -netdev user,id=net0 -device virtio-net-device,netdev=net0 -nographic" which... needs a bios image? Really? WHY? You JUST INVENTED THIS ARCHITECTURE, don't make it rely on LEGACY FIRMWARE.

But maybe this is an easier kernel .config to start with (less to strip down anyway), so I tried building it and of course buildroot wants to compile its own toolchain, within which the glibc build went: checking for suffix of object files... configure: error: in `/home/landley/buildroot/buildroot/output/build/glibc-2.38-44-gd37c2b20a4787463d192b32041c3406c2bd91de0/build': configure: error: cannot compute suffix of object files: cannot compile

Right, silly me, it's a random git snapshot that's weeks old now, so I did a "git pull" and ran it again and... exact same failure. Nobody's built the 64 bit riscv qemu image in buildroot in multiple weeks, or they would have noticed the build failure.

Open source itanic. It's not a healthy smell.

(WHY is it building a random glibc git snapshot? What's wrong with the release versions? Buildroot can PATCH STUFF LOCALLY, overlaying patches on top of release versions was one of the core functions of buildroot back in 2005. Right, ok, back away slowly...)

February 15, 2024

Rich confirmed that he intentionally broke another syscall because he doesn't like it, and wants all his users to change their behavior because it offends him. So I wrapped the syscall.

But the problem with fixing up hwclock to use clock_settime() and only call settimeofday() for the timezone stuff (via the wrapped syscall, yes this is a race condition doing one time update with two syscalls) is now I need to TEST it, and it's one of those "can only be done as root and can leave your host machine in a very unhappy state". The clock jumping around (especially going backwards) makes various systemwide things unhappy, and doing it out from under a running xfce and thunderbird and chromium seem... contraindicated.

February 14, 2024

Emailed Maciej Rozycki to ask about the riscv fdpic effort from 2020 and got back "Sadly the project didn't go beyond the ABI design phase."

Since arm can (uniquely!) do fdpic _with_ mmu, I tried to tweak the sh4 config dependencies in fs/Kconfig.binfmt in the kernel to move superh out of the !MMU group and next to ARM, and the kernel build died with binfmt_elf_fdpic.c:(.text+0x1b44): undefined reference to `elf_fdpic_arch_lay_out_mm'.

Emailed the superh and musl mailing lists with a summary of my attempts to get musl-fdpic working on any target qemu-system can run. (Not including the or1k/coldfire/bamboo attempts that, it turns out, don't support fdpic at all.) Hopefully SOMEBODY knows how to make this work...

February 13, 2024

Emailed linux-kernel about sys_tz not being namespaced, cc-ing two developers from last year's commit making the CLONE_NEWTIME flag actually work with clone().

I don't expect a reply. As far as I can tell the kernel development community is already undergoing gravitational collapse into a pulsar, which emits periodic kernels but is otherwise a black hole as far as communication goes. Members-only.

The clone flag that didn't work with clone() was introduced back in 2019 and stayed broken for over 3 years. Linux's vaunted "with enough eyeballs all bugs are shallow" thing relied on hobbyists who weren't just focusing on the parts they were paid to work on. You don't get peer review from cubicle drones performing assigned tasks.

I am still trying to be a hobbyist _adjacent_ to the kernel, and it's like being on the wrong side of gentrification or something. The disdain is palpable.

February 12, 2024

So glibc recently broke settimeofday() so if you set time and timezone at the same time it returns -EALLHAILSTALLMAN.

But if you DON'T set them together, your clock has a race window where the time is hours off systemwide. And while "everything is UTC always" is de-facto Linux policy, dual boot systems have to deal with windows keeping system clock in local time unless you set an obscure registry entry which isn't universally honored. Yes this is still the case on current Windows releases.

Digging deeper into it, while a lot of userspace code uses the TZ environment variable these days, grep -rw sys_tz linux/* finds it still used in 36 kernel source files and exported in the vdso. The _only_ assignment to it is the one in kernel/time/time.c from settimeofday(), so you HAVE to use that syscall to set that field which the kernel still uses.
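Reading sys_tz back is a handy sanity check when testing this. A sketch, assuming a Linux target that still has the SYS_gettimeofday syscall: it calls the kernel directly because recent glibc may hand back a zeroed struct timezone from gettimeofday(), and it passes NULL for the time so the kernel only copies out the timezone. The function name read_sys_tz() is made up:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

// Ask the kernel directly for sys_tz, the field only settimeofday() sets.
// This bypasses libc (which may return a zeroed struct timezone) and
// passes NULL for the time so only the timezone gets copied out.
// Assumes a Linux target that still has SYS_gettimeofday (the newer
// 32-bit ports built around time64 syscalls may not).
int read_sys_tz(struct timezone *tz)
{
#ifdef SYS_gettimeofday
  return syscall(SYS_gettimeofday, NULL, tz) ? -1 : 0;
#else
  (void)tz;
  return -1;
#endif
}
```

On most Linux boxes tz_minuteswest comes back 0 unless something (a dual-boot setup script, say) called settimeofday() with a timezone.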

When musl switched settimeofday() to clock_settime() in 2019 it lost the ability to assign to sys_tz at all, which I think means it lost the ability to dual boot with most windows systems?

The other hiccup is sys_tz didn't get containerized when CLONE_NEWTIME was added in 2019 so it is a systemwide global property regardless of namespace. Then again they only made it work with clone() (rather than just unshare()) last year, so that namespace is still cooking.

The real problem is the actual time part of settimeofday() is 32 bit seconds on 32 bit targets, ala Y2038. That's why musl moved to the 64 bit clock_settime() api. The TZ environment variable assumes the hardware clock is returning utc. The point of sys_tz is to MAKE it return UTC when the hardware clock is set wrong because of windows dual booting.
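The Y2038 arithmetic in one place: the largest value a signed 32-bit seconds field holds is 2147483647, which gmtime_r() turns into 03:14:07 UTC on January 19, 2038. On the usual two's complement targets one more second wraps to a date in December 1901. A minimal sketch with a made-up helper name, assuming a 64-bit time_t on the machine doing the conversion:

```c
#include <stdint.h>
#include <time.h>

// The last second a signed 32-bit seconds-since-1970 field can hold.
#define LAST_32BIT_SECOND ((time_t)INT32_MAX)

// Break seconds-since-epoch into UTC calendar fields. With a 64-bit
// time_t this keeps working past 2038; a 32-bit seconds field can't
// represent anything after LAST_32BIT_SECOND.
int utc_fields(time_t t, struct tm *out)
{
  return gmtime_r(&t, out) ? 0 : -1;
}
```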

February 11, 2024

The paper Decision Quicksand: how Trivial Choices Suck Us In misses an important point: when the difference in outcome is large, it's easier to weigh your options. When the difference in outcome is small, it's harder to see/feel what the "right thing" is because the long-term effect of the decision is buried in noise. So more important questions can have a clearer outcome and be easier to decide, less important ones tend to get blown around by opinion. (Hence the old saying, "In academia the fighting is so vicious because the stakes are so small". See also my longstanding observation that open source development relies on empirical tests to establish consensus necessary for forward progress, subjective judgements from maintainers consume political capital.)

The classic starbucks menu decision paralysis is similar (there's no "right choice" but so many options to evaluate) but people usually talk about decision fatigue when they discuss that one (making decisions consumes executive function). These are adjacent and often conflated factors, but nevertheless distinct.

February 10, 2024

Sigh, shifting sands.

So gentoo broke curses. The gnu/dammit loons are making egrep spit pointless warnings and Oliver is not just trying to get me to care, but assuming I already do. Each new glibc release breaks something and this time it's settimeofday(), which broke hwclock.

And I'm cc'd on various interminable threads about shoving rust in the kernel just because once upon a time I wrote documentation about the C infrastructure they're undermining.

I can still build a kernel without bpf, because (like perl) it's not in anything vital to the basic operation of a Linux compute node. If the day comes I can't build a kernel without rust, then I stay on the last version before they broke it until I find a replacement, _exactly_ like with a package that switched to GPLv3. I have never had a rust advocate tell me a GOOD thing about Rust other than "we have ASAN too", their pitch is entirely "we hate C++ and confuse it with C so how dare you not use our stuff, we're as inevitable as Hillary Clinton was in 2016"; kind of a turn-off to be honest. They don't care what the code does, just that it's in the "right" language. This was not the case for go, swift, zig, oberon, or any of the others vying to replace C++. (Which still isn't C, and I'm not convinced there's anything wrong with C.)

All this is a distraction. I'm trying to build towards goals, but I keep having to waste cycles getting back to where I was because somebody broke stuff that previously worked.

February 9, 2024

Finally checked what x86-64 architecture generation my old laptop is, and it's v2. Presumably upgrading from my netbook to this thing got me that far (since the prebuilt binaries in AOSP started faulting "illegal instruction" on my old netbook circa 2018, and this was back when I was trying to convince Elliott the bionic _start code shouldn't abort() before main if stdin wasn't already open so I kinda needed to be able to test the newest stuff...)

Meaning the pointy haired corporate distros like Red Hat and Ubuntu switching to v3 does indeed mean this hardware can't run them. Not really a loss, the important thing is devuan/debian not abandoning v2. (Updating devuan from bronchitis->diptheria presumably buys me a few years of support even if elephantitis were to drop v2 support. I _can_ update to new hardware, just... why?)
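Checking the level doesn't need anything fancy: each x86-64 psABI level is a fixed list of CPU features, and Linux prints the corresponding names in the "flags" line of /proc/cpuinfo ("pni" is SSE3, and LZCNT shows up as "abm"). A rough sketch with made-up function names that takes that flags line as a string:

```c
#include <string.h>

// Hypothetical helper: given the "flags" line from /proc/cpuinfo, return
// the highest x86-64 psABI microarchitecture level (1-4) whose required
// features are all present. Doesn't verify the v1 baseline flags.
static int has_flag(const char *flags, const char *want)
{
  size_t len = strlen(want);
  const char *p = flags;

  while ((p = strstr(p, want))) {
    // Whole words only, so "fma" doesn't match AMD's "fma4".
    if ((p == flags || p[-1] == ' ')
        && (!p[len] || p[len] == ' ' || p[len] == '\n')) return 1;
    p += len;
  }

  return 0;
}

static int has_all(const char *flags, const char *const *list)
{
  for (; *list; list++) if (!has_flag(flags, *list)) return 0;

  return 1;
}

int x86_64_level(const char *flags)
{
  static const char *const v2[] = { "cx16", "lahf_lm", "popcnt", "pni",
    "ssse3", "sse4_1", "sse4_2", 0 };
  static const char *const v3[] = { "avx", "avx2", "bmi1", "bmi2", "f16c",
    "fma", "abm", "movbe", "xsave", 0 };
  static const char *const v4[] = { "avx512f", "avx512bw", "avx512cd",
    "avx512dq", "avx512vl", 0 };

  if (!has_all(flags, v2)) return 1;
  if (!has_all(flags, v3)) return 2;
  if (!has_all(flags, v4)) return 3;

  return 4;
}
```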

Went to catch up on the linux-sh mailing list (superh kernel development) and found that half the "LTP nommu maintainer" thread replies got sorted into that folder due to gmail shenanigans. (Remember how gmail refuses to send me all the copies of email I get cc'd on but also get through a mailing list, and it's potluck which copy I get _first_? Yeah, I missed half of another conversation. Thanks gmail!)

There's several interesting things Greg Ungerer and Geert Uytterhoeven said that I totally would have replied to back on January 23rd... but the conversation's been over a couple weeks now. Still, "you can implement regular fork() on nommu with this one simple trick" is an assertion I've heard made multiple times, but nobody ever seems to have _done_, which smells real fishy.

Arguing with globals.h generation again: sed's y/// is terribly designed because it doesn't support ranges so converting from lower to upper case (which seems like it would be the DEFINITION of "common case") is 56 chars long (y///+26+26), and hold space is terribly designed because "append" inserts an un-asked-for newline and the only way to combine pattern and hold space is via append. With s/// I can go \1 or & in the output, but there's no $SYNTAX to say "and insert hold space here" in what I'm replacing. You'd think there would be, but no. (More than one variable would also be nice, but down that path lies awk. And eventually perl. I can see drawing the line BEFORE there.)

But some of this is REALLY low hanging fruit. I don't blame the 1970s Unix guys who wrote the original PDP-11 unix in 24k total system ram (and clawed their way up to 128k on its successor the 11/45), but this is gnu/sed. They put in lots of extensions! Why didn't they bother to fix OBVIOUS ISSUES LIKE THAT? Honestly!

My first attempt produced 4 lines of output for each USE() block, which worked because C doesn't care, but looks terrible. Here's a variant that glues the line together properly: echo potato | sed -e 'h;y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/;H' -e 'g;s/\n/ /;s/\([^ ]*\) \(.*\)/USE_\2(struct \1_data \1;)/'

Which is mildly ridiculous because all it's using hold space for is somewhere to stash the lower case string because I can't tell y/// to work on PART of the current line: the /regex/{commands} syntax says which entire lines to trigger on, and s/// doesn't have a way to trigger y/// or similar on just the text it's matched and is replacing.

(And while I'm complaining about things sed SHOULD let you do, why can't I match the first or last line WITHIN a range? The 1,$p ranges don't _nest_, so in sed -n '/^config /,${/^ *help/,/^[^ ]/{1d;$d;p}}' toys/*/ls.c | less the 1d;$d is irrelevant because that's "whole file", not "current match range". I want a syntax to say "this range is relative to the current scope" which would be easy enough for me to implement in the sed I wrote, but wouldn't be PORTABLE if I did that. It's like the gnu/dammit devs who added all these extensions never tried to actually USE sed in a non-trivial way...)
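Here's a sketch of the semantics wished for above, in Python rather than sed, assuming "relative" means 1d;$d apply to each inner range instead of the whole file. ranges() and help_text() are illustrative names, not anything in toybox's sed:

```python
import re
from typing import Iterable, Iterator, List


def ranges(lines: Iterable[str], start: str, end: str) -> Iterator[List[str]]:
    """Yield each start..end range (inclusive), like sed's /start/,/end/.

    As in sed, the end regex is only tested on lines after the one that
    opened the range, and a range still open at EOF runs to the last line.
    """
    start_re, end_re = re.compile(start), re.compile(end)
    current = None
    for line in lines:
        if current is None:
            if start_re.search(line):
                current = [line]
        else:
            current.append(line)
            if end_re.search(line):
                yield current
                current = None
    if current is not None:
        yield current


def help_text(lines: Iterable[str]) -> Iterator[List[str]]:
    """Nested ranges, with 1d;$d applied relative to each inner range."""
    never = r"(?!)"  # regex that never matches: outer range runs to EOF, like ,$
    for outer in ranges(lines, r"^config ", never):
        for inner in ranges(outer, r"^ *help", r"^[^ ]"):
            yield inner[1:-1]  # drop the "help" line and the terminating line
```

Blank lines inside the help text survive because an empty line doesn't match `^[^ ]`, same as in the sed version.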

But eh, made it work. And it runs on toys/*/*.c in a single sed invocation (and then a second sed on the output of the first to generate the GLOBALS() block from the previous list of structure definitions) and is thus WAY faster than the "one sed call per input file" it was doing before. Fast enough I can just run it every time rather than doing a "find -newer" to see if I need to run it. (And, again, potentially parallelizable with other headers being generated.)

But that just cleaned up generation of the header with the wrong USE() macros, which still breaks the build. I need per-file USE() macros, or some such. Back up, design time. (Meaning "restate the problem from first principles and see where telling that story winds up".)

The GLOBALS() block is unique per-file, and shared by all commands using the same file. Previously the name of the block was the name of the file, but sed working on toys/*/*.c doesn't KNOW the name of the current file it's working on (ANOTHER thing the gnu clowns didn't extend!), so I'm using the last #define FOR_walrus macro before each GLOBALS() block (once again: sed "hold space", we get ONE VARIABLE to save a string into) as both the structure type name and the name of the instance of that struct in the union. So now instead of being the name of the file, it's the name of the first command in the file, which is fine. As long as it's unique and the various users can agree on it.
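The naming rule is easy to state outside sed. A Python sketch of the same single-pass scan, where one saved string stands in for hold space; globals_names() and the sample lines are illustrative, and it assumes each FOR_ define sits on a line of its own:

```python
import re
from typing import Iterable, List

FOR_RE = re.compile(r"^#define FOR_(\w+)")
GLOBALS_RE = re.compile(r"^GLOBALS\(")


def globals_names(lines: Iterable[str]) -> List[str]:
    """Name each GLOBALS() block after the last #define FOR_x seen before it.

    `last` is the single saved string, playing the part of sed's hold space.
    Lines from every input file can be concatenated; no filename is needed.
    """
    last = None
    names = []
    for line in lines:
        m = FOR_RE.match(line)
        if m:
            last = m.group(1)
        elif GLOBALS_RE.match(line) and last is not None:
            names.append(last)
    return names
```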

Which means the manual "#define TT.filename" overrides I was doing when the "#define FOR_command" didn't match can go away again. (And need to, they're build breaks.) So that's a cleanup from this...

But there's still the problem that the first command in the file can be switched off in menuconfig, but a later command in the same file can be enabled, so we're naming the struct after the first command, but a USE() macro with the name OF that command would be disabled and thus yank the structure out of the union, resulting in a build break.

The REASON I want to yank the structure out of the union is so the union's size is the ENABLED high water mark, not the "every possible command including the ones in pending" high water mark.
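Python's ctypes lays out Structure and Union the way the platform C compiler would, so it's a quick way to see the high water mark effect without a build. A sketch; ls_data and ip_data here are hypothetical stand-ins, not toybox's real GLOBALS() contents:

```python
import ctypes


# Hypothetical per-command GLOBALS() structs; the field layouts are made up.
class ls_data(ctypes.Structure):
    _fields_ = [("files", ctypes.c_void_p), ("width", ctypes.c_long)]


class ip_data(ctypes.Structure):
    _fields_ = [("buf", ctypes.c_char * 4096)]


def global_union(enabled):
    """Build a C-layout union containing only the enabled structs."""
    fields = [(s.__name__, s) for s in enabled]

    class GlobalUnion(ctypes.Union):
        _fields_ = fields

    return GlobalUnion


if __name__ == "__main__":
    # A union is as big as its largest member, so ip_data sets the mark
    # whenever it's included, even if ip is never enabled in the config.
    print(ctypes.sizeof(global_union([ls_data, ip_data])))
    print(ctypes.sizeof(global_union([ls_data])))
```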

Oh, but I'm generating the file each time now, which means I don't need the USE() macros. Instead I need to generate globals.h based on the toys/*/*.c files that are switched on by the current config, meaning the sed invocation takes $TOYFILES as its input file list instead of the wildcard path. There's an extra file (main.c) in $TOYFILES, but I don't care because it won't have a GLOBALS() block in it. Generating $TOYFILES already parsed .config earlier, so I don't even have to do anything special, just use data I already prepared.
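The selection logic is small. A Python sketch under stated assumptions: .config uses the usual CONFIG_FOO=y lines, and a symbol-to-file mapping (hypothetical here; toybox builds its own list in its build scripts) says which toys/*/*.c file defines each command. A file with several commands gets included if any one of them is on:

```python
from typing import Dict, List, Set


def enabled_symbols(config_text: str) -> Set[str]:
    """Collect symbols set to y in a kconfig-style .config file.

    "# CONFIG_FOO is not set" comment lines and other values are ignored.
    """
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and line.endswith("=y"):
            enabled.add(line[len("CONFIG_"):-len("=y")])
    return enabled


def toyfiles(command_files: Dict[str, str], enabled: Set[str]) -> List[str]:
    """List each source file with at least one enabled command, plus main.c.

    command_files maps a config symbol to the .c file defining it; that
    mapping is a hypothetical input, not how toybox actually derives it.
    """
    files = sorted({path for sym, path in command_files.items() if sym in enabled})
    return files + ["main.c"]
```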

February 8, 2024

So scripts/ writes generated/globals.h via a pile of sed invocations against toys/*/*.c and alas it can't do just ONE sed invocation but has to loop calling sed against individual files because it needs to know the current input filename, which slows it down tremendously _and_ doesn't parallelize well, but anyway... I just modified it to wrap a USE_FILENAME() macro around each "struct filename_struct filename;" line in union global_union {...} this; at the end of the file, in hopes of shrinking sizeof(this) down to only the largest _enabled_ GLOBALS() block in the current config. (So the continued existence of ip.c in pending doesn't set a permanent high water mark according to scripts/probes/GLOBALS.)

Unfortunately, while the current filename is used to name the structure and the union member, and TT gets defined to TT.filename even with multiple commands in the same file... there's no guarantee a config FILENAME entry actually exists, which means there's no guarantee the USE_FILENAME() macro I'm adding is #defined. This showed up in git.c, and then again in i2ctools.c: lots of commands, none of them with the same name as the file.

Need to circle back and redesign some stuff to make this work...

Ok, second attempt: use the #define FOR_blah macros instead of the filename, which _does_ allow a single sed invocation to work on toys/*/*.c in one go, although I have to do a lot of hold space shenanigans and use y/// with the entire alphabet listed twice instead of "tr a-z A-Z" to do the upper and lower case variants, but I made the header file I wanted to make! Which now doesn't work for a DIFFERENT reason: if the first command in the file isn't enabled, the USE_BLAH() thing removes the TT struct from the union, and the second command in the same file attempting to use the shared structure gets an undefined member error dereferencing TT.

Which... um, yeah. That's what would happen. I need a USE() macro that's X or Y or Z, which I haven't got logic for. I can add a new hidden symbol and do either selects or depends, but I kinda want to SIMPLIFY the kconfig logic instead of complicating it.
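One way to get an "X or Y or Z" USE() macro without touching kconfig is to do the OR in whatever generates the header, so the preprocessor only ever sees on or off. A Python sketch, assuming the USE_NAME(...) style where an enabled macro expands to __VA_ARGS__ and a disabled one to nothing; use_any_macro() and the per-file macro names are hypothetical:

```python
from typing import List, Set


def use_any_macro(name: str, commands: List[str], enabled: Set[str]) -> str:
    """Emit a USE_NAME(...) macro that is on if ANY listed command is on.

    The OR is evaluated by the generator, so the C preprocessor only sees a
    plain enabled or disabled definition.
    """
    macro = f"USE_{name.upper()}(...)"
    if any(cmd.upper() in enabled for cmd in commands):
        return f"#define {macro} __VA_ARGS__"
    return f"#define {macro}"
```

The cost is one more generated symbol per file, which is exactly the kind of kconfig-adjacent complexity the paragraph above is trying to avoid.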

Long ago when I was maintaining busybox, I proposed factoring out the Linux kernel's "kconfig" so other packages can use it, about the way "dtc" (the device tree compiler) eventually got factored out. This fell apart because I wanted to keep it in the kernel source but make it another thing the kernel build could install, and Roman Zippel or whoever it was wanted to remove it from the kernel and make a new package that was a build dependency of the linux kernel, which was such a horrible idea that NOT EVER RE-USING THIS CODE was better than adding a build dependency to the kernel, so the idea died. (I note that dtc is still in Linux, despite also being an external project. They didn't do the "make install_dtc" route from the linux source, but they didn't add the dependency either. Instead they maintain two projects in parallel forever, which is what the then-kconfig maintainer insisted was impossible. He's also the guy who rejected properly recognizing miniconfig as a thing unless I did major surgery on the kconfig.c files. I waited for him to go away. He did eventually, but I haven't bothered to resubmit. The perfect is the enemy of the good, and if my only option is the Master Race I'm ok siding with extinction. Kinda my approach to Linux development in a nutshell, these days.)

And since factoring out kconfig DIDN'T happen, and I've instead got an ancient snapshot of code under an unfortunate license that has nothing to do with modern linux kconfig (which became TURING COMPLETE and can now rm -rf your filesystem, bravo), I need to discard/rewrite it and want to reproduce as little as possible. The scripts/mkflags.c code was supposed to be the start of that, but that wound up using output digested by sed. And then the scripts/config2help.c code was going to be the start of a kconfig rewrite, but that stalled and started to back itself out again at the design level because a zillion sub-options is a bad thing. (Somebody once contributed the start of one written in awk. I still haven't got an awk.)

I haven't reopened this can of worms recently, but changing the config symbol design requirements is... fraught. What do I want this to DO...

February 7, 2024

Sigh, I needed a second email account and went "my phone demanded a google account to exist for Android, I'll use that one"... and was then waiting for the email to arrive for 2 weeks. Today they texted me about it and I investigated and "auto-sync" is turned off, so of course I'd never get a notification or see a new email in the list: I had to do the "pull down" gesture to load new emails. (I remember this! Same problem came up last time I tried to use this app some years back, when I still had a work gmail account on the phone for the weekly google hangouts calls that became google meet calls when hangouts joined the google graveyard and we were forced to migrate and I needed an updated link from an email...)

I went into the settings to turn auto-sync back on, along the way turning off two new "we're sending all your data to google to train our chatgpt-alike and sell to advertisers by calling it personalization" options it grew and auto-enabled since the last time I was there (because if you never had the chance to say no, it's not a lack of consent?), but turning on auto-sync has a pop-up:

Changes you make to all apps and accounts, not just Gmail, will be synchronized between the web, your other devices, and your phone. [Learn more]

And now I remember why it was turned OFF. (And why I usually create a new gmail account every time I get a new phone, discarding the old history.) You do not get to flush every photo I take of my cat to your cloud service as a condition of checking email. I don't care what the bribe is, that's microsoft-level creepy bundling and monopoly leverage and yes disabling it renders YOUR phone app unusable which is a YOU problem, that's why I wasn't using that email account for anything before now.

This round of gmail being creepy on my phone is separate from gmail being buggy recently on the account I use on my laptop via pop3 to fetch email sent to my domain. They're not the same account, and the only way google ever has to connect the two is intrusive data harvesting. Of a kind that occasionally makes it confuse me with my father, who saddled me with his name and a "junior" which is why I started getting AARP offers in my 30's. Which admittedly had some pretty good discounts in the brochure, but no, they had me confused with someone else over a thousand miles away.

(Ok, the AARP thing was because when I moved out of Austin as my mother was dying and didn't get a new place there for a year, I had my mail forwarded to my father's place in pennsylvania. And then had it forward from there to the new place in Austin when I moved back. And wound up getting more than half his mail because of similar names and disabled the forwarding fairly quickly (he'd just box up and mail me my accumulated junk mail every few weeks), but places like AARP had voraciously "updated" based on scraps of misinformation to TRACK ITS PREY... and wouldn't accept "no". This was years before "FAANG" started doing it, although I dunno why netflix is considered intrusive in that acronym? I keep forgetting I _have_ that, mostly it's Fade watching.)

So yeah, the gmail phone app's useless because they intentionally refused to offer an "automatically notice new email on the server" option that does NOT "constantly send every photo you take and random audio recordings to data harvesting servers even if you never open this email app again".

The reason I needed the second email account is the second room of Fade's apartment up in minneapolis has been empty since mid-pandemic (they were assigning her roommates until then, but her last one moved back in with her family to ride out the pandemic, and it's been empty for well over a year now), and we asked her front office and they made us a very good deal on a 6 month lease through August, when we might be moving anyway depending on where Fade gets a job. (Now that she's graduated, she got piecemeal teaching work for the spring semester but is also job-hunting for something more permanent.) Which is why I'm trying to sell the house and move up there. Fuzzy's moving back in with her father (who's old and in the hospital way too much and could use more looking after anyway, she's been visiting him on weekends already, he lives up in Leander about a five minute drive from the far end of Austin's Light Tactical Rail line), and she's taking the geriatric cat with her.

Fade's made it clear she's never moving back to a state that wants her to literally die of an ectopic pregnancy, so we were going to sell the house at some point anyway, and "timing the market" is another phrase for "reading the future", so now's as good as any. (Last year would have been way better. Next year could be anything.)

The second email account came in because I was the "guarantor" on her lease for the first account, since she was a student and obviously student housing involves a parent or similar co-signing, doesn't it? Except with my email already in the system _that_ way, me actually signing up to get a room there confused their computer deeply, so to apply to RENT there I had to create a new account, which required a new email address... (I can bypass "guarantor" by just paying multiple months in advance.)

I continue to break everything. (And just now trying to e-sign the lease document, I noticed the "download a PDF copy" link was on the first page but hitting the checkbox to accept electronic delivery advanced to the second page, and hitting the back button put me back in the email, and clicking on the link again said it had already been used and was thus expired... Eh, the usual. Fade's handling it.)

February 6, 2024

Alas, devuan doesn't seem to have qemu-debootstrap (anymore?), so trying to reverse engineer it to set up an arm64 VM image, the root filesystem part looks like:

$ dd if=/dev/zero of=arm64-chimaera.img bs=1M count=65536
$ /sbin/mkfs.ext4 arm64-chimaera.img
$ mkdir sub
$ sudo mount arm64-chimaera.img sub
$ sudo debootstrap --arch=arm64 --keyring=/usr/share/keyrings/devuan-archive-keyring.gpg --verbose --foreign chimaera sub
$ sudo umount sub

And then fishing a kernel out of the network installer and booting the result:

$ wget -O arm64-vmlinux
$ qemu-system-aarch64 -M virt -cpu cortex-a57 -m 2048 "$@" -nographic -no-reboot -kernel arm64-vmlinux -append "HOST=aarch64 console=ttyAMA0 root=/dev/sda init=/bin/sh" -drive format=raw,file=arm64-chimaera.img

Which died because the ext4 driver is not statically linked into that kernel image and thus can't mount the root=. In fact the list of drivers it tried was blank, it has NO drivers statically linked in. Which implies you have to insmod from initramfs in order to be able to mount any filesystem from a block device, which is just INSANE. Swapping in the kernel mkroot builds for the aarrcchh6644 target, and using root=/dev/vda instead (because different drivers and device tree), I got a shell prompt and could then run:

# mount -o remount,rw /
# /debootstrap/debootstrap --second-stage
# echo '/dev/vda / ext4 rw,relatime 0 1' > /etc/fstab
# ifconfig lo
# ifconfig eth0
# route add default gw
# apt-get install linux-image-arm64

Which successfully installed packages from the net into the VM, but I'm not sure that last install is actually helpful? It installed a kernel, but didn't install a bootloader. Can qemu boot if I just give it the -hda and not externally supply a -kernel?

$ qemu-system-aarch64 -M virt -cpu cortex-a57 -m 2048 "$@" -no-reboot -drive format=raw,file=arm64-chimaera.img

Nope, looks like it did not. Or doesn't know how to produce any output? It popped up a monitor window but not a display window, and didn't produce serial console output. And fishing that kernel out of the ext4 filesystem and passing it to -kernel in qemu means I'd also need to pass -initrd in as well (still assuming it does not have any static filesystem drivers), and then what is it trying to display to? Where exactly does it think it's getting its device tree from? (If it's statically linked into the kernel then I haven't got one to feed to qemu to try to PROVIDE those devices. And still no way to add console= to point at serial console...)
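The blank "tried:" list from the distro kernel follows from where that list comes from, assuming the kernel's init/do_mounts.c works as summarized here: with no rootfstype= on the command line, it tries every registered filesystem that isn't flagged "nodev", the same list /proc/filesystems shows. A Python sketch of that filter (block_filesystems() is a made-up name), fed the contents of /proc/filesystems from a kernel that does boot:

```python
from typing import List


def block_filesystems(proc_filesystems: str) -> List[str]:
    """Return filesystems from /proc/filesystems that can mount a block device.

    Each line is an optional "nodev" flag, a tab, then the filesystem name.
    Entries flagged nodev (proc, sysfs, tmpfs...) need no backing device, so
    only the unflagged ones are candidates for a root= block device.
    """
    names = []
    for line in proc_filesystems.splitlines():
        flag, _, name = line.partition("\t")
        if name and flag.strip() != "nodev":
            names.append(name.strip())
    return names
```

An empty result means that kernel can't mount any block device root until something like an initramfs loads a filesystem module.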

Eh, stick with the mkroot kernel for now I guess. This should let me build native arm hosted toolchains, both 32 and 64 bit, for next release. It would be way better to use one of the orange pi 3b actual hardware devices I can plug into the router via cat5 and leave on 24/7, that can do the qemu regression testing via cron job and everything. Plus my home fiber's faster than the wifi so stuff physically plugged into the router doesn't even count against the bandwidth we're actually using, it could act as a SERVER if they didn't go to such extreme lengths to make you pay extra for a static IP (four times the base cost of the service, for no reason except "they can").

But I don't trust the Orange Pi's chinese kernel not to have spyware in it (like... 30% chance?) and I haven't sat down to hammer a vanilla kernel into giving me serial output and a shell prompt on the hardware yet. Mostly because I can't power an orange pi from my laptop USB the way I can a turtle board, it wants a 2 amp supply and the laptop wants to give half an amp. I mostly think of working on it when I'm out-with-laptop...

February 5, 2024

I fell behind on email over the weekend (dragged the laptop along but didn't connect it to the net), and gmail errored out a "denied, you must web login!" pop-up during my first pop3 fetch to catch up.

So I went to the website and did a web login, and it went "we need need need NEED to send you an sms, trust us bud honest this will be the only one really please we just GOTTA"... I have never given gmail a phone number, and refuse to confirm or deny its guess.

So I clicked the "get help" option... which also wanted me to login. So I did and it said it needed to verify the account, and this time offered to contact my next-of-kin email (it's 2am, she's asleep).

So I decided to wait (and maybe vent on mastodon a bit, and look up what I need to do in dreamhost to switch my mx record to point at the "you are a paying customer" servers I get with my domain and website rather than the "you are the product" servers... yeah I'd lose the accumulated weekend of email but the main reason I _hadn't_ done it was screwing up and losing access to email for a bit would be annoying and here gmail has DONE IT FOR ME), and messed with some other windows for a bit, then out of habit switched desktops and clicked the "get messages" button in thunderbird...

And it's downloading email again just fine. (And did so for the 5 logins it took to grab a couple hundred messages at a time and clear the backlog: linux-kernel and qemu-devel and so on are high traffic lists and their pop3 implementation has some arbitrary transaction limit.) And it looks like a reasonable weekend's worth of email...? Nothing obviously wrong?

I haz a confused.

I don't _really_ want to move email providers at the same time I'm trying to sell a house and move, but... leaving this alone feels kind of like ignoring termite damage. Some things you descend upon with fire. Gmail is _telling_ me that it's unsafe.

I'm _pretty_ sure this is their out of control data harvesting trying to connect together pieces of their social graph to map every human being to a phone that has a legal name and social security number using it, and can be tracked via GPS coordinates 24/7. If there WAS any actual "security" reason behind it, it obviously didn't WORK. I got access back without ever providing more than the old login. I didn't get WEB access back, but that just means I can't fish stuff out of the spam filter. So... greedy or incompetent?

But why _now_? What triggered it...

February 4, 2024

I have a pending pull request adding port probing to netcat. It adds two flags: -z is a "zero I/O mode" flag where it connects and closes the connection immediately, which isn't really zero I/O because a bunch of TCP/IP packets go through setting up and tearing down the connection so the other side totally notices. Also a separate -v flag that just prints that we've connected successfully, which seems weird because we print a big error message and exit when we DON'T connect successfully, so saying that we did seems redundant.
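What -z boils down to, sketched in Python rather than toybox C: probe() is an illustrative name, and the 2 second timeout is an arbitrary assumption, not something from either netcat.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, like nc -z.

    Not actually zero I/O: the three-way handshake and the teardown still
    cross the wire, so the other side sees a connection come and go.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, name lookup failure
        return False
```

A caller scanning a range just loops over ports and checks the return value, which is the same job the exit status does for a shell loop around nc.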

The patch didn't invent these options, I checked and both are in busybox's "nc_bloaty" which seems to be a full copy of Netcat 1.10, because busybox has multiple different implementations of the same command all over the place in the name of being small and simple. In theory nc_bloaty.c is Hobbit's netcat from the dawn of time which Denys nailed to the side of busybox and painted the project's color in 2007, although maybe it's had stuff added to it since, I haven't checked.

(Sorry, old argument from my busybox days: making cartridges for an Atari 2600 and coin-op machines in a video arcade are different skillsets, and gluing a full sized arcade cabinet to the side of an atari 2600 is NOT the same as adding a cartridge to its available library. As maintainer I strongly preferred fresh implementations to ports because license issues aside, if it already existed and we couldn't do BETTER why bother? Hobbit's netcat is actually pretty clean and slim as external programs you could incorporate go, but Vodz used to swallow some whales.)

Anyway, that's not the part that kept me from merging the netcat patch from the pull request into toybox the first day I saw it. Nor is the fact I have the start of nommu -t support using login_tty() in my tree (another thing I need a nommu test environment for) and have to back it out to apply this.

No, the head scratcher is that the name on the email address of the patch I wget by adding ".patch" to the github URL is "कारतोफ्फेलस्क्रिप्ट™" which Google Translate says is Marathi for "Kartoffelscript" with a trademark symbol. Marathi is the 4th most widely spoken language in India (about 90 million speakers), and Kartoffel is german for Potato.

I mean, it's not some sort of ethnic slur or exploit or something (which is why I checked the Thing I Could Not Read), so... yay? I guess I could apply that as is, I'm just... confused.
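Checking a string you can't read for exploit-shaped content mostly means looking for characters that render as nothing or reorder the display (the bidi override trick). A Python sketch; treating Unicode categories Cf (format) and Cc (control) as "suspicious" is an assumption of this sketch, and suspicious_chars() is a made-up name:

```python
import unicodedata


def suspicious_chars(text: str):
    """List (char, name) pairs for invisible format or control characters.

    Category Cf covers zero-width joiners and the bidi embedding/override
    controls; Cc covers C0/C1 control codes. Letters, combining vowel signs
    and symbols like the trademark sign are left alone.
    """
    found = []
    for ch in text:
        if unicodedata.category(ch) in ("Cf", "Cc"):
            found.append((ch, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return found
```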

And I'm also looking at the OTHER available options in the bigger netcat's --help output and going "hex dump would be lovely". I don't need a "delay interval" because the sender of data can ration it easily enough, and each call to netcat does a single dialout so the caller can detect success/fail and delay in a loop if they're manually scanning a port range for some reason. (Look, nmap exists.) I'm reluctant to add -b "allow broadcasts" because... what's the use case here? I can do that one if somebody explicitly asks for it, which means they bring a use case.

February 3, 2024

Moving is exhausting, and so far I've barely packed up one bookcase.

Follow-up to yesterday's email, my correspondent is still looking into the IP status of older architectures, sending me a quote from a reuters article:

> "In 2017, under financial pressure itself, Imagination Technologies sold the
> MIPS processor business to a California-based investment company, Tallwood
> Venture Capital.[47] Tallwood in turn sold the business to Wave Computing in
> 2018,[48] both of these companies reportedly having their origins with, or
> ownership links to, a co-founder of Chips and Technologies and S3 Graphics.[49]
> Despite the regulatory obstacles that had forced Imagination to divest itself of
> the MIPS business prior to its own acquisition by Canyon Bridge, bankruptcy
> proceedings for Wave Computing indicated that the company had in 2018 and 2019
> transferred full licensing rights for the MIPS architecture for China, Hong Kong
> and Macau to CIP United, a Shanghai-based company.[50]"

As far as I can tell mips imploded because of the PR backlash from screwing over Lexra.

Mips used to be all over the place: Linksys routers were mips, Playstation 2 was mips, the SGI Irix workstations were mips... Then they turned evil and everybody backed away and switched to powerpc and arm and such.

China didn't back away from mips, maybe due to a stronger caveat emptor culture and maybe due to not caring about lawsuits that couldn't affect them. The Lexra chips that got sued out of existence here were still widely manufactured over there (where US IP law couldn't reach at the time; that's how I got involved, somebody was importing a chinese router and trying to update its kernel to a current version, and it needed an old toolchain that didn't generate the 4 patented instructions). China's Loongson architecture recently added to the linux kernel is a Mips fork dating back to around 2001.

Yes, "homegrown clone". Don't ask, I don't know. See also this and this for the arm equivalent of what china did to mips. Any technology sent to china gets copied and then they claim to have invented it.

February 2, 2024

I get emails. I reply to emails. And then I cut and paste some long replies here:

> Is there an expiration on ARM patents such as the ARM7TDMI and ARM9? With the
> SH-2 being developed in 1992, and expiring in 2015, I am curious if the ARM7
> would be synthesizable.

In theory?

Ten years ago there was a big push to do open hardware arm, and Arm Inc. put its foot down and said they didn't mind clones of anything _before_ the ARMv3 architecture (which was the first modern 32 bit processor and the oldest one Linux ran on) but if you tried to clone ARMv3 or newer they would sue.

That said, the point of patents is to expire. Science does not advance when patents are granted, it advances when they expire. Lots of product introductions simultaneously from multiple vendors, such as iphone and android launching within 18 months of each other, can be traced back to things like important touchscreen patents expiring.

The problem is, the big boys tend to have clouds of adjacent patents and patent-extension tricks, such as "submarine" patents where they file a patent application and then regularly amend it so it isn't granted promptly but instead remains an application for years, thus preventing its expiration clock from starting since it expires X years after being _granted_, not applied for. (But prior art is from before the _application_ for the patent.) Or the way drug companies patented a bunch of chemicals that were racemic mixtures, and then went back and patented just the active isomer of that chemical, and then sued anybody selling the old racemic mixtures because it _contains_ the isomer. (Which CAN'T be legal but they can make you spend 7 years in court paying millions annually to _prove_ it. The point of most Fortune 500 litigation isn't to prove you're right, it's to tie the other side up in court for years until you bankrupt them with legal fees, or enough elections go by for regulatory capture to Citizens United up some pet legislators who will replace the people enforcing the law against you.)

Big companies often refuse to say exactly what all their relevant patents ARE. You can search yourself to see what patents they've been granted, but did they have a shell company, or did they acquire another company, so they control a patent their name isn't on? And this is poker: they regularly threaten to sue even when they have nothing to sue with. Bluffing is rampant, and just because they're bluffing doesn't mean they won't file suit if they think you can't afford a protracted defense. (Even if they know they can't win, they can delay your product coming to market for three years and maybe scare away your customers with "legal uncertainty".)

You can use existing hardware that was for sale on known dates, and publications that would invalidate patents that hadn't yet been filed (there was some attempt to bring submarine patents under control over the past couple decades, but it's reformers fighting against unguillotined billionaires with infinitely deep pockets and they have entire think tanks and lawfirms on retainer constantly searching for new loopholes and exploits).

My understanding (after the fact and not hugely informed) was that a big contributor to J-core happening was going to Renesas with old hardware and documentation to confirm "anything implementing this instruction set has to have expired because this came out on this date and either the patent had already been granted or this is prior art invalidating patents granted later", and when Renesas still insisted on license agreements demanding per-chip royalties, refusing to sign and telling them to sue. Which they did not, either because they were bluffing or the cost/benefit analysis said it wasn't worth it. But standing up to threats and being willing to defend against a lawsuit for years if necessary was an important part of the process, because the fat cats never STOP trying to intimidate potential competitors.

The J-core guys could have chosen any processor from that era to do the same thing with: m68k, Alpha, etc. And in fact they initially started trying to use an existing Sparc clone but it didn't do what they needed. The sparc was memory inefficient and power hungry, which led to the research into instruction set density, which led to superh as the sweet spot. In fact superh development started when Motorola's lawyers screwed over Hitachi on m68k licensing, so their engineers designed a replacement. x86 is even more instruction dense due to the variable length instructions, but requires a HUGE amount of circuitry to decode that mess at all efficiently. Starting with the Pentium it has a hardware frontend that converts the x86 instructions into internal RISC instructions and then actually executes those. (That's why RISC didn't unseat x86 like everybody expected it would: they converted their plumbing to RISC internally with a translation layer in front of it for backwards compatibility. The explosion of sparc, alpha, mips, powerpc, and so on all jockeying to replace x86... didn't. They only survived at the far ends of the performance bell curve, the mainstream stayed within the network effect feedback loop of wintel's dominant market share. Until phones.)

Arm Thumb, and thus Cortex-m, was a derivative of superh. To the point it got way cheaper when the superh patents expired and arm didn't have to pay royalties to renesas anymore, which is why that suddenly became cheap and ubiquitous. But from a hardware cloning perspective, keep in mind "thumb" was not present in the original arm processors. Also, things like "arm 7" and "arm 9" are chips, not different instruction set architectures. (Pentium III and Pentium M were both "i686".) The instruction set generations have a "v" in them: armv1, armv2, armv3, up through armv8.

It goes like this:

Acorn Risc Machines started life as a UK company that won a contract with the BBC to produce the "BBC Micro" back in 1981 alongside an educational television program teaching kids how to compute. Their first machine was based on the MOS 6502 processor, same one in the Commodore 64 and Apple II and Atari 2600: that had 8-bit registers and 16 bit memory addressing, for 64k RAM total. (The story of MOS Technology is its own saga: the 6502 was to CPU design a bit like what Unix was to OS design, it showed people that 90% of what they'd been doing was unnecessary, and everybody went "oh".)

ARMv1 came from acorn's successor machine the Archimedes (released in 1987, circa the Amiga) which used a home-grown CPU that had 32 bit registers (but only 26 bit addressing, 64 megs max memory). ARMv2 added a hardware multiplier and a faster interrupt mode (which only saved half the registers), but still 26 bit addressing. Think of ARMv1 and ARMv2 as a bit like the 286 processor in intel-land: a transitional attempt that wound up as a learning experience, and fixing what was wrong with them means backwards compatibility doesn't go back that far.
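Those memory ceilings are plain powers of two: 16 address bits (the 6502) reach 64k, 26 bits (ARMv1/ARMv2) reach 64 megs, and ARMv3's flat 32 bits reach 4 gigs. A throwaway Python check of that arithmetic; address_space() is just an illustrative helper:

```python
def address_space(bits: int) -> str:
    """Human-readable size of a byte-addressed space with `bits` address lines."""
    size = 1 << bits
    for unit in ("bytes", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size} {unit}"
        size //= 1024


if __name__ == "__main__":
    for bits in (16, 26, 32):
        print(bits, "bits:", address_space(bits))
```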

The oldest one Linux runs on is ARMv3, which did a proper flat 32 bit address space, and is generally considered the first modern ARM architecture. ARMv4 introduced a bunch of speedups, and also a way of announcing instruction set extensions (like different FPUs and such) so you could probe at runtime what was available. These extensions were indicated by adding a letter to the architecture. The most important extension was the "thumb" instruction set, ARMv4T. (But there was also some horrible java accelerator, and so on.) ARMv5 had various optimizations and integrated thumb so it wasn't an extension anymore but always guaranteed to be there: recompiling for ARMv5 speeds code up about 25% vs running ARMv4 code on the same processor, I don't remember why. ARMv6 added SMP support which is mostly irrelevant outside the kernel so you generally don't see compilers targeting it because why would they? And then ARMv7 was the last modern 32 bit one, another big speedup to target it with a compiler, but otherwise backwards compatible ala i486/i586/i686. All this stuff could still run ARMv4T code if you tried, it was just slower (meaning less power efficient when running from battery, doing the "race to quiescence" thing).

Along the way Linux switched its ARM Application Binary Interface to incorporate Thumb 1 instructions in function call and system call plumbing; the old one retroactively became known as "OABI" and the new (extended) one is "EABI", for a definition of "new" that was a couple decades ago now and is basically ubiquitous. Support for OABI bit-rotted over the years similarly to a.out vs ELF binaries, so these days ARMv4T is pretty much the oldest version Linux can run without serious effort. (For example, musl-libc doesn't support OABI, just EABI.) In THEORY a properly configured Linux kernel and userspace could still run on ARMv3 or ARMv4 without the T, but when's the last time anybody regression tested it? But if ARMv3 was your clone target, digging that stuff up might make sense. Easier to skip ahead to ARMv4T, but A) lots more circuitry (a whole second instruction set to implement), B) probably more legal resistance from whoever owns ARM Inc. this week.

And then ARMv8 added 64 bit support, and kept pretending it's unrelated to historical arm (stuttering out aarrcchh6644 as a name with NO ARM IN IT), although it still had 32 bit mode and apparently even a couple new improvements in said 32 bit mode so you can compile a 32 bit program for "ARMv8" if you try and it won't run on ARMv7. Dunno why you WOULD though, it's a little like x32 on intel: doesn't come up much, people mostly just build 64 bit programs for a processor that can't NOT support them. Mostly this is a gotcha that when you tell gcc you want armv8-unknown-linux instead of aarrcchh6644-talklikeapirateday-linux you get a useless 32 bit toolchain instead of what you expected. Sadly linux accepts "arm64" but somehow the "gnu gnu gnu all hail stallman c compiler that pretends that one of the c's retroactively stands for collection even though pcc was the portable c compiler and icc was the intel c compiler and tcc was the tiny c compiler" does not. You have to say aarrcchh6644 in the autoconf tuple or it doesn't understand.

So what's Thumb: it's a whole second instruction set, with a mode bit in the processor's status register saying which kind it's executing at the moment. Conventional ARM instructions are 32 bits long, but Thumb instructions are 16 bits (just like SuperH; Thumb 2 later mixed in some 32 bit instructions too). This means you can fit twice as many instructions in the same amount of memory, and thus twice as many instructions in each L1 cache line, so each trip across the memory bus fetches twice as many instructions... Flipping that mode bit is a lot like Intel processors jumping between 8086 and 80386 mode, or between 32 and 64 bit mode in the newer ones.

Note that both Thumb and ARM instruction modes use 32 bit registers and 32 bit addresses; the difference is just how many bits long each _instruction_ is. The three sizes are unrelated: modern Java Virtual Machines have 8 bit instructions, 32 bit registers, and 64 bit memory addresses. Although you need an object lookup table to implement a memory size bigger than the register size, taking advantage of the fact a reference doesn't HAVE to be a pointer, it can be an index into an array of pointers and thus "4 billion objects living in 16 exabytes of address space". In hardware this is less popular: the last CPU that tried to do hardware-level object orientation was the Intel i432 (which was killed by the 286 outperforming it, and was basically the FIRST time Intel pulled an Itanium development cycle). And gluing two registers together to access memory went out with Intel's segment-offset addressing in the 8086 and 286, although accessing memory with HI/LO register pairs was also the trick the 6502 used years earlier (8 bit instructions, 8 bit registers, 16 bit addresses). These days everybody just uses a "flat" memory model for everything (SO much easier to program) which means memory size is capped by register size. But 64 bit registers can address 18 exabytes (the same 2^64 bytes as the "16 exabytes" above, just counted in powers of 10 instead of powers of 2), and since an exabyte is a triangular rubber coin million terabytes and the S-curve of Moore's Law has been bending down for several years now ("exponential growth" is ALWAYS an S-curve, you run out of customers or atoms eventually), this is unlikely to become a limiting factor any time soon.
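
A minimal C sketch of the object lookup table idea, assuming a made-up `struct object_table` with 32 bit handles (this isn't taken from any actual JVM): a reference is a small index, and the table holds the full-width pointer, so the handle size doesn't limit where in the address space the objects live.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical handle type: references are 32 bit indexes, not pointers. */
typedef uint32_t obj_handle;

/* Table mapping each handle to the object's full-width address. */
struct object_table {
  void **slots;      /* slots[h] is the address of object h */
  obj_handle count;  /* handles given out so far */
  obj_handle cap;    /* allocated length of slots[] */
};

/* Record obj in the table and return its handle, or UINT32_MAX if out of memory. */
static obj_handle table_add(struct object_table *t, void *obj)
{
  if (t->count == t->cap) {
    obj_handle newcap = t->cap ? t->cap * 2 : 16;
    void **p = realloc(t->slots, (size_t)newcap * sizeof(*p));

    if (!p) return UINT32_MAX;
    t->slots = p;
    t->cap = newcap;
  }
  t->slots[t->count] = obj;

  return t->count++;
}

/* Turn a handle back into a pointer: one extra load versus a raw pointer. */
static void *table_get(const struct object_table *t, obj_handle h)
{
  assert(h < t->count);

  return t->slots[h];
}
```

The price of the indirection is an extra memory load on every dereference.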

The first thumb instruction set (Thumb 1) was userspace-only, and didn't let you do a bunch of kernel stuff, so you couldn't write an OS _only_ in Thumb instructions, you still needed conventional ARM instructions to do setup and various administrative tasks. Thumb 2 finally let you compile a Linux kernel entirely in Thumb instructions. Thumb2 is what let processors like the Cortex-M discard backwards compatibility with the original 32-bit ARM instruction set. It's a tiny cheap processor that consumes very little power, and the trick is it's STUCK in thumb mode and can't understand the old 32 bit instruction set, so doesn't need that circuitry. Along the way, they also cut out the MMU, and I dunno how much of that was "this instruction set doesn't have TLB manipulation instructions and memory mapping it felt icky" or "as long as we were cutting out lots of circuitry to make a tiny low-power chip, this was the next biggest thing we could yank to get the transistor count down". Didn't really ask.

Thumb 2 was introduced in 2003. I don't know what actual patentable advances were in there given arm existed and they were licensing superh to add this to it, but I assume they came up with some kind of fig leaf. (People keep trying to patent breathing, it's a question what the overworked clerks in the patent office approve, and then what the insane and evil magic court that ONLY hears IP law cases on behalf of rich bastards gets overruled on as they perpetually overreach.) But it still came out 20 years ago: patents are going to start expiring soon.

The ARM chip design company the original Acorn RISC guys spun out decades ago was proudly british for many years... until the Tories took over and started selling the government, and then they did Brexit to avoid the EU's new financial reporting requirements (which were going to force billionaires doing money laundering through the City of London and the Isle of Man to list all their bank accounts and how much money was in each, Switzerland having already caved some years earlier so "swiss bank account" no longer meant you could launder stolen nazi gold for generations)... and the result was that a few weeks after the Brexit vote, the Tory government (the one that later gave us Prime Minister Worzel Gummidge Alexander "Boris" de Pfeffel Johnson; really, that's his name, look it up!) waved through the sale of ARM to Softbank, a Japanese company run by a billionaire who seemed absolutely BRILLIANT until he decided Cryptocoins were the future and funded WeWork. Oh, and apparently he also took $60 billion from Mister Bone Saw, or something?

So how much money ARM has to sue people these days, or who's gonna own the IP in five years, I dunno.

February 1, 2024

Happy birthday to me...

Closing tabs, I have a bunch open from my earlier trudge down nommu-in-qemu lane, which started by assuming or1k would be a nommu target, then trying to get bamboo to work, then coldfire...

A tab I had open was the miniconfig for the coldfire kernel that ran in qemu, and that's like half the work of adding it to mkroot... except that was built by the buildroot uclibc toolchain. So I'm trying to reproduce the buildroot coldfire toolchain with musl instead of uclibc, but there IS no tuple that provides the combination of things it wants in the order it wants them, and patching it is being stroppy. Alas gcc is as far from generic as it gets. This config plumbing is a collection of special cases with zero generic anything, and it's explicitly checking for "uclinux" in places and "-linux-musl" in others, and that leading dash means "-uclinux-musl" doesn't match, but "-linux-musl-uclinux" doesn't put data in the right variables (because some bits of the config think there are 4 slots with dedicated roles) plus some things have * on the start or the end and other things don't, so sometimes you can agglutinate multiple things into a single field and other times you can't, and it is NOT SYSTEMATIC.
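
To see why that leading dash bites, here's a small C sketch using POSIX fnmatch(), which applies the same glob rules as a shell case statement. The two patterns are illustrative stand-ins in the style of gcc's config plumbing, not lines copied out of it.

```c
#include <fnmatch.h>
#include <stdio.h>

/* Illustrative patterns in the style of gcc's config case statements. */
static const char *const patterns[] = { "*-linux-musl*", "*-uclinux*" };

/* Nonzero if the target tuple matches the shell glob pattern. */
static int tuple_matches(const char *pattern, const char *tuple)
{
  return fnmatch(pattern, tuple, 0) == 0;
}

/* Print which of the patterns above match a candidate tuple. */
static void show_matches(const char *tuple)
{
  for (size_t i = 0; i < sizeof(patterns) / sizeof(*patterns); i++)
    printf("%-26s %-14s %s\n", tuple, patterns[i],
           tuple_matches(patterns[i], tuple) ? "match" : "no match");
}
```

With these patterns, "m68k-uclinux-musl" only matches the second one (the character before "linux" is a "c", not a dash), while "m68k-linux-musl-uclinux" matches both.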

This isn't even fdpic yet! This is just trying to get the config to do what the other thing was doing with musl instead of uclibc. I can probably whack-a-mole my way down it, but if the patch is never going upstream... (Sigh. I should poke coreutils about cut -DF again.)

Now that Fade's graduated, we've decided to pull the trigger on selling the house. Fade's already done paperwork for me to move into the other room at her apartment for the next 6 months, and they start charging us rent on the extra room on the 15th I think? But if I fly back up there with an actual place to live, I don't really want to fly back here, and this place is EXPENSIVE. (I bought it thinking "room to raise kids", but that never happened.) So packing it out and getting it on the market... I should do that.

Fuzzy took the news better than I expected, although her father's been sick for a while now and moving back in to take care of him makes sense. She's keeping the 20 year old cat.

I bought 4 boxes at the U-haul place across I-35 and filled them with books. It didn't even empty one bookshelf. Um. Moving from the condo at 24th and Leon to here was moving into a BIGGER place, so we didn't have to cull stuff. And that was 11 years ago. Before that Fade and I moved a U-haul full of stuff up to Pittsburgh circa 2006... and then moved it all back again a year and change later. The third bedroom is basically box storage, we emptied our storage space out into that to stop paying for storage, and still haven't unpacked most of it. Reluctant to drag it up to Minneapolis (and from there on to wherever Fade gets a job with health insurance, it's the exchange until then). But I don't have the energy to sort through it either. I have many books I haven't read in years. (Yes I am aware of E-books. I'm also aware you don't really _own_ those, just rent them at a billionaire's whim.)

I'm reminded that packing out the efficiency apartment I had for a year in Milwaukee took multiple days (and that was on a deadline), even though I'd gone out of my way NOT to accumulate stuff while I was there because it was always temporary. And lugging it all to Fade's, I pulled a muscle carrying the "sleeping bag repurposed as a carry sack" I'd stuffed with everything that wouldn't fit into the suitcases, while switching from a bus to Minneapolis's Light Tactical Rail. This time Fade wants to do the "storage pod, which can be somewhat automatically moved for you" thing.

January 31, 2024

Parallelizing the header file generation is a bit awkward: it's trivial to launch most of the header generation in parallel (even all the library probes can happen in parallel, order doesn't matter and >> is O_APPEND meaning atomic writes won't interleave) and just stick in a "wait" at the two places that care about synchronization (creating wants to consume the output of optlibs.dat, and creating flags.h wants to consume config.h and newtoys.h).

The awkward part is A) reliable error detection if any of the background tasks fail ("wait" doesn't collect error return codes, creating a "generated/failed" file could fail due to inode exhaustion, DELETING a generated/success file could have a subprocess fail to launch due to PID exhaustion or get whacked by the OOM killer... I guess annotate the end of each file with a // SUCCESS line and grep | wc maybe?), B) ratelimiting so trying to run it on a wind-up-toy pi-alike board or a tiny VM doesn't launch too many parallel processes. I have a ratelimit bash function but explicitly calling it between each background & process is kinda awkward? (And it doesn't exit, it returns error, so each call would need to perform error checking.) It would be nice if there was a proper shell syntax for this, but "function that calls its command line" is a quoting nightmare when pipelines are involved. (There's a reason "time" is a builtin.) I suppose I could encapsulate each background header generation in its own shell function? But just having them inline with & at the end is otherwise a lot more readable. (I'm actually trying to REDUCE shell functions in this pass, and do the work inline so it reads as a simple/normal shell script instead of a choose-your-own-adventure book.)
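
For comparison, here's roughly what "launch in the background, cap how many run at once, collect every exit status" looks like in C with fork() and wait(). This is a sketch, not what the build does: `run_parallel()` and its argv-array interface are made up, and because it reaps with wait() it assumes the caller has no other children it cares about.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run each command (a NULL-terminated argv array) in the background, at most
   maxjobs at a time, and return how many failed. Failing to fork, exiting
   nonzero, and dying from a signal all count as failures. */
static int run_parallel(char **cmds[], int ncmds, int maxjobs)
{
  int next = 0, running = 0, failed = 0, status;

  if (maxjobs < 1) maxjobs = 1;
  while (next < ncmds || running) {
    /* Under the limit with work left: launch the next command. */
    if (next < ncmds && running < maxjobs) {
      pid_t pid = fork();

      if (!pid) {
        execvp(cmds[next][0], cmds[next]);
        _exit(127);           /* exec failed: same code the shell uses */
      }
      if (pid < 0) failed++;  /* e.g. PID exhaustion */
      else running++;
      next++;
      continue;
    }

    /* At the limit, or nothing left to launch: reap one child. */
    if (wait(&status) < 0) break;
    running--;
    if (!WIFEXITED(status) || WEXITSTATUS(status)) failed++;
  }

  return failed;
}
```

Unlike the shell's bare "wait", every child's status gets looked at, so a single failed header generator can't slip through unnoticed.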

While I'm going through it, the compflags() function in is its own brand of awkward. That function spits out nine lines of shell script at the start of, and although running generated/ directly is pretty rare (it's more or less a comment, "if you don't like my build script, this is how you compile it in the current configuration"), it's also used for dependency checking to see if the toolchain or config file changed since last build. When we rerun, it checks that lines 5-8 of a fresh compflags() match the existing file, and if not deletes the whole "generated" directory to force a rebuild because you did something like change what CROSS_COMPILE points to. That way I don't have to remember to "make clean" between musl, bionic, and glibc builds, or when switching between building standalone vs multiplexer commands (which have different common plumbing not detected by $TOYFILES collection). The KCONFIG_CONFIG value changes on line 8 when you do that: it's a comment, but not a CONSTANT comment.

The awkward part is needing to compare lines 5-8 of 9, which involves sed. That magic line range is just ugly. Line 1 is #!/bin/bash and lines 2 and 9 are blank, so comparing them too isn't actually a problem, but lines 3 and 4 are variable assignments that CAN change without requiring a rebuild. Line 3 is VERSION= which contains the git hash when you're building between releases; if we didn't exclude it, doing a pull or checkin would trigger a full rebuild. And line 4 is LIBRARIES= which is probed from the toolchain AFTER this dependency check, and thus A) should only change when the toolchain does, B) used to always be blank when we were checking if it had changed, thus triggering spurious rebuilds. (I switched it to write the list to a file, generated/optlibs.dat, and then fetch it from that file here, so we CAN let it through now. The comparison's meaningless, but not harmful: does the old data match the old data.)

Unfortunately, I can't reorganize to put those two at the end, because the BUILD= line includes "$VERSION" and LINK= includes "$LIBRARIES", so when written out as a shell script (or evaluated with 'eval') the assignments have to happen in that order.

Sigh, I guess I could just "grep -v ^VERSION=" both when comparing them? The OTHER problem is that later in the build it appends a "\$BUILD lib/*.c $TOYFILES \$LINK -o $OUTNAME" line to the end, which isn't going to match between runs either. Hmmm... I suppose if TOYFILES= and OUTNAME= were also variable assignments, then that last line could become another constant and we could have egrep -v filter out "^(VERSION|LIBRARIES|TOYFILES|OUTNAME)=" which is uncomfortably complicated but at least not MAGIC the way the line range was...
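
A sketch of that "filter out the assignments allowed to change, compare what's left" check, written in C rather than egrep. It assumes TOYFILES= and OUTNAME= have already become assignment lines as proposed above; `same_config()` and the prefix list are hypothetical.

```c
#include <string.h>

/* Assignments allowed to differ without forcing a rebuild. TOYFILES= and
   OUTNAME= assume those get turned into assignment lines too. */
static const char *const ignored[] = {
  "VERSION=", "LIBRARIES=", "TOYFILES=", "OUTNAME="
};

/* Nonzero if this line starts with one of the ignored prefixes. */
static int is_ignored(const char *line)
{
  for (size_t i = 0; i < sizeof(ignored) / sizeof(*ignored); i++)
    if (!strncmp(line, ignored[i], strlen(ignored[i]))) return 1;

  return 0;
}

/* Skip over ignored lines, returning the start of the next line to compare. */
static const char *skip_ignored(const char *s)
{
  while (*s && is_ignored(s)) {
    s += strcspn(s, "\n");  /* to the end of this line */
    if (*s) s++;            /* and past its newline */
  }

  return s;
}

/* Compare two generated scripts line by line, ignoring the lines above.
   Returns 1 if they match (no rebuild needed), 0 otherwise. */
static int same_config(const char *old, const char *fresh)
{
  for (;;) {
    old = skip_ignored(old);
    fresh = skip_ignored(fresh);

    size_t lo = strcspn(old, "\n"), lf = strcspn(fresh, "\n");

    if (lo != lf || memcmp(old, fresh, lo) || old[lo] != fresh[lf]) return 0;
    if (!old[lo]) return 1;  /* both ran out at the same time */
    old += lo + 1;
    fresh += lf + 1;
  }
}
```

The point of keying on prefixes instead of line numbers is that adding or reordering lines doesn't silently change which ones get compared.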

(The reason main.c lives in TOYFILES instead of being explicit on the last line is to avoid repetition. The for loop would also have to list main.c, and single point of truth... No, I'm not happy with it. Very minor rough edge, but it's not exactly elegant either...)

January 30, 2024

What does do... First some setup:

  • declares some functions
  • does a (safe) rm -rf generated/ if compiler options changed
  • check if options changed
    • function compflags, just check lines 5-8: $BUILD $LINK $PATH $KCONFIG_CONFIG
    • delete the whole "generated" dir if they don't match, forcing full rebuild
  • set $TOYFILES (grep toys/*/*.c for {OLD|NEW}TOY()s enabled in .config)
  • warns if "pending" is in there (in red)

And then header generation:

  • write optlibs.dat (shared library probe)
  • write (standalone build script, to reproduce this binary on targets that have a compiler but not much else, like make or proper sed)
  • Call which writes Config.probed,, and .singlemake (that last one at the top level instead of in generated, because "make clean" can't delete it or you wouldn't be able to "make clean; make sed").
  • Check if we should really run "make oldconfig" and warn if so.
  • write newtoys.h (sed toys/*/*.c)
  • write config.h (sed .config)
  • write flags.h (compile mkflags.c, sed config.h and run newtoys.h through gcc -E, pipe both into mkflags)
  • write globals.h (sed toys/*/*.c)
  • write tags.h (sed toys/*/*.c)
  • write help.h (compile config2help.c, reads .config and which includes dependencies ala generated/Config.)
  • write zhelp.h (compile install.c and run its --help through gzip | od | sed)

And that's the end of header generation, and it's on to compiling stuff (which is already parallelized).

It's awkward how scripts/ is a separate file, but "make menuconfig" needs those files because they're imported by at the top level, so that has to be able to build those files before running configure. Possibly I should split _all_ the header generation out into (replacing, and just have it not do the .config stuff if .config doesn't exist? (And then could check for the file early on and go "run defconfig" and exit if it's not there...)

Having .singlemake at the top level is uncomfortably magic (running "make defconfig" changes the available make targets!) but getting the makefile wrapper to provide the semantics I want is AWKWARD, and if it's in generated/ then "make clean" forgets how to do "make sed".

The reason the above warning about calling "make oldconfig" doesn't just call it itself is that would be a layering violation: scripts/*.c CANNOT call out to kconfig because of licensing. The .config file output by kconfig is read-only consumed by the rest of the build, meaning the kconfig subdirectory does not actually need to _exist_ when running "make toybox". Kconfig is there as a convenience: not only is no code from there included in our build, but no code from there is RUN after the configuration stage (and then only to produce the one text file). You COULD create a .config file by hand (and android basically does). Blame the SFLC for making "the GPL" toxic lawsuit fodder that needs to be handled at a distance with tongs. (I _asked_ them to stop in 2008. Eben stopped, Bradley refused to.)

Of the three scripts/*.c files built and run by the build, the only one I'm _comfortable_ with is install.c I.E. instlist, which spits out the list of commands and which I recently extended to spit out the --help text so I could make a compressed version of it. It's basically a stub version of main.c that only performs those two toybox multiplexer tasks, so I don't have to build a native toybox binary and run it (which gets into the problem of different library includes or available system calls between host and target libc when cross compiling, plus rebuilding *.c twice for no good reason). This is a ~60 line C file that #includes generated/help.h and generated/newtoys.h to populate toy_list[] and help_data[], and then writes the results to stdout.
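
Including one generated list to populate several tables is a variant of the "X macro" trick. A hypothetical sketch: `COMMAND_LIST` stands in for a generated header like newtoys.h, and the command names, option strings, and usage text are invented for illustration rather than toybox's actual format.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for a generated header: one X(name, optstr, usage) per command.
   These entries are invented for illustration. */
#define COMMAND_LIST(X) \
  X(cat, "u", "cat [-u] [FILE...]") \
  X(echo, "neE", "echo [-neE] [ARG...]") \
  X(wc, "Lcmwl", "wc [-Lcmwl] [FILE...]")

struct command {
  const char *name, *optstr, *help;
};

/* First expansion of the list: the table of commands. */
#define AS_ENTRY(name, opts, usage) { #name, opts, usage },
static const struct command command_list[] = { COMMAND_LIST(AS_ENTRY) };
#undef AS_ENTRY

/* Second expansion of the same list: an index for each command. */
#define AS_INDEX(name, opts, usage) CMD_##name,
enum { COMMAND_LIST(AS_INDEX) CMD_COUNT };
#undef AS_INDEX

/* What an install-list stub needs: every command name, one per line. */
static void print_names(FILE *out)
{
  for (int i = 0; i < CMD_COUNT; i++) fprintf(out, "%s\n", command_list[i].name);
}

/* Help text for a command, or NULL if there's no such command. */
static const char *find_help(const char *name)
{
  for (int i = 0; i < CMD_COUNT; i++)
    if (!strcmp(command_list[i].name, name)) return command_list[i].help;

  return NULL;
}
```

Because both tables come from the same list, they can't drift out of sync: single point of truth, enforced by the preprocessor.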

The whole mkflags.c mess is still uncomfortably magic, I should take a stab at rewriting it, especially if I can use (CONFIG_BLAH|FORCED_FLAG)<<shift to zero them out so the flags don't vary by config. I still need something to generate the #define OPTSTR_command strings, because my original approach of having USE() macros drop out made the flag values change, and I switched to annotating the entries so they get skipped but still count for the flag value numbering. Maybe some sort of macro that inserts \001 and \002 around string segments, and change lib/args.c to increment/decrement a skip counter? I don't really want to have a whole parallel ecology of HLP_sed("a:b:c") or similar in config.h, but can't think of a better way at the moment. (Yes it makes the strings slightly bigger, but maybe not enough to care? Hmmm... Actually, I could probably do something pretty close to the _current_ processing with sed...)
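
A sketch of the skip-counter idea, assuming \001 and \002 as the markers and treating only alphanumeric characters as option letters (real option strings have plenty of other punctuation this ignores). Disabled options still consume a bit position, so flag values don't shift when the config changes. `flag_for()` is hypothetical, and it numbers bits left to right purely for simplicity.

```c
#include <ctype.h>

/* Hypothetical markers wrapped around option letters disabled by config. */
#define SKIP_START '\001'
#define SKIP_END   '\002'

/* Return the flag value for option letter c. Every option letter gets the
   next bit (left to right here), including disabled ones, so numbering never
   depends on config. Returns 0 if c is inside a skipped segment and -1 if c
   isn't in optstr at all. Only alphanumerics count as option letters. */
static long flag_for(const char *optstr, char c)
{
  int skip = 0, bit = 0;

  for (const char *s = optstr; *s; s++) {
    if (*s == SKIP_START) skip++;
    else if (*s == SKIP_END) skip--;
    else if (isalnum((unsigned char)*s)) {
      if (*s == c) return skip ? 0 : 1L << bit;
      bit++;
    }
  }

  return -1;
}
```

With "ab:\001c:d\002e", 'e' gets 16 whether or not c and d are configured in, while 'c' and 'd' come back as 0.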

The config2help.c thing is a nightmare I've mentioned here before, and has an outstanding bug report about it occasionally going "boing", and I'd very much like to just rip that all out and replace it with sed, but there's design work leading to cleanup before I can do real design work here. (Dealing with the rest of the user-visible configurable command sub-options, for one thing. And regularizing the -Z support and similar so it's all happening with the same mechanism, and working out what properly splicing together the help text should look like...)

January 29, 2024

It's kind of amusing when spammers have their heads SO far up their asses that their pitch email is full of spammer jargon. The email subject "Get High DA/DR and TRAFFIC in 25-30 Days (New Year Discount!" made it through gmail's insane spam filter (despite half of linux-kernel traffic apparently NOT making it through and needing to be fished out), but the target audience seems to be other SEO firms. (No, it didn't have an ending parentheses.)

Wrestling with grep -w '' and friends, namely:

$ for i in '' '^' '$' '^$'; do echo pat="$i"; \
  echo -e '\na\n \na \n a\na a\na  a' | grep -nw "$i"; done
5: a
7:a  a
5: a

The initial bug report was that --color didn't work right, which was easy enough to diagnose, but FIXING it uncovered that I was never handling -w properly, and needed more tests. (Which the above rolls up into one big test.)
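
A sketch of the -w check in C using POSIX regexec(): take a match, make sure it's bracketed by non-word characters (or the ends of the line), and if not, retry one character further along with REG_NOTBOL. `match_word()` is a hypothetical helper, not toybox's grep. It doesn't retry shorter matches at the same position or special-case empty matches, which is exactly where the corner cases above live.

```c
#include <ctype.h>
#include <regex.h>
#include <string.h>

/* Word characters for -w: letters, digits and underscore. */
static int is_word(char c)
{
  return isalnum((unsigned char)c) || c == '_';
}

/* Nonzero if the compiled regex matches somewhere in line with a non-word
   character (or the start/end of the line) on both sides of the match. A
   rejected match is retried one character further along, with REG_NOTBOL so
   ^ can't match in the middle of the line. Ignores multibyte characters. */
static int match_word(const regex_t *re, const char *line)
{
  size_t len = strlen(line);

  for (size_t start = 0; start <= len; start++) {
    regmatch_t m;

    if (regexec(re, line + start, 1, &m, start ? REG_NOTBOL : 0)) return 0;

    size_t so = start + (size_t)m.rm_so, eo = start + (size_t)m.rm_eo;

    if ((!so || !is_word(line[so - 1])) && (eo == len || !is_word(line[eo])))
      return 1;
    start = so;  /* the loop's start++ resumes one past this match */
  }

  return 0;
}
```

The regex must be compiled without REG_NOSUB, since the match offsets are what the boundary check looks at.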

As usual, getting the test right was the hard part. Rewriting the code to pass the tests was merely annoying.

January 28, 2024

Managed to flush half a dozen pending tabs into actual commits I could push to the repo. Mostly a low-hanging-fruit purge of open terminal tabs, I have SO MANY MORE half-finished things I need to close down.

Heard back from Greg Ungerer confirming that m68k fdpic support went into the kernel but NOT into any toolchain. I'm somewhat unclear on what that MEANS, did they select which register each segment should associate with, or not? (Did that selection already have to be made for binflt and it just maps over? I'm unclear what the elf2flt strap-on package actually DOES to the toolchain, so I don't know where the register definitions would live. I was thinking I could read Rich's sh2 patches out of musl-cross-make but they vary WIDELY by version, and some of this seems to have gone upstream already? For a definition of "already" that was initially implemented 7 or 8 years ago now. It LOOKED like this was one patch to gcc and one to binutils in recent versions, but those mostly seem to be changing config plumbing, and grepping the ".orig" directory for gcc is finding what CLAIMS to be fdpic support for superh in the base version before the patches are applied? So... when did this go upstream, and at what granularity, and what would be LEFT to add support for a new architecture?)

People are trying to convince me that arm fdpic support was a heavy lift with lots of patches, but looking back on the superh fdpic support it doesn't seem THAT big a deal? Possibly the difference was "already supported binflt", except the hugely awkward bag on the end postprocessor (called elf2flt, it takes an ELF file and makes a FLT file from it) argues against that? But that doesn't mean they didn't hack up the toolchain extensively (pushing patches upstream even!) and THEN "hit the output with sed" as it were. You can have the worst of both worlds, it's the gnu/way.

I got a binflt toolchain working in aboriginal way back when. Maybe I should go back and look at what elf2flt actually DID, and how building the toolchain that used it was configured. (I honestly don't remember, it's been most of a decade and there was "I swore I'd never follow another startup down into bankruptcy but here we are" followed by the Rump administration followed by a pandemic. I remember THAT I did it, but the details are all a bit of a blur...)

But now is not the best time to open a new can of worms. (I mean there's seldom a GOOD time, but... lemme close more tabs.)

January 27, 2024

Sigh. I'm frustrated at the continuing deterioration of the linux-kernel development community. As they collapse they've been jettisoning stuff they no longer have the bandwidth or expertise to maintain, and 5 years back they purged a bunch of architectures.

Meanwhile, I'm trying to get a nommu fdpic test environment set up under qemu, and checking gcc 11.2.0 (the latest version musl-cross-make supports) for fdpic support, grep -irl fdpic gcc/config has hits in bfin, sh, arm, and frv. I'm familiar with sh, and bits of arm were missing last I checked (although maybe I can hack my way past it?) But the other two targets, blackfin and frv, were purged by linux-kernel.

I.E. the increasingly insular and geriatric kernel development community discarded half the architectures with actual gcc support for fdpic. Most of the architectures you CAN still select fdpic for don't seem to have (or to have ever had) a toolchain capable of producing it. That CAN'T be right...

Cloned git:// to see if any more fdpic targets spawned upstream: nope. Still only four targets supporting fdpic, two of which linux-kernel threw overboard to lighten the load as the Hindenburg descends gently into Greg's receivership. As the man who fell off a tall building said on his way down, "doing fine so far"...

Yes I still think driving hobbyists away from the platform was a bad move, but as with most corporate shenanigans where you can zero out the R&D budget and not notice for YEARS that your new product pipeline has nothing in it... the delay between cause and effect is long enough for plausible deniability. It "just happened", not as a result of anything anyone DID.

And which is worse: Carly Fiorina turning HP into one of those geriatric rock bands that keeps touring playing nothing but 40 year old "greatest hits" without a single new song (but ALL THE MONEY IN THE WORLD for lawyers to sue everybody as "dying business models explode into a cloud of IP litigation" once again)... or Red Hat spreading systemd? Zero new ideas, or TERRIBLE ideas force-fed to the industry by firms too big to fail?

Caught up on some blog editing, but haven't uploaded it yet. (Japanese has a tendency to omit saying "I", which has been a tendency in my own writing forever. "I" am not an interesting part of the sentence. That said, it technically counts as a bad habit in english, I think?) I made a mess of december trying to retcon some entries (I'd skipped days and then had too many topics for the day I did them and wanted to backfill _after_ I'd uploaded, which probably isn't kind to the rss feed), and I only recently untangled that and uploaded it, and I'm giving it a few days before replacing it with the first couple weeks of January.

My RSS feed generator parses the input html file (capping the output at something like the last 30 entries, so the rss file isn't ridiculously huge in the second half of the year), but that makes switching years awkward unless I cut and paste the last few entries from december after the first few entries of January. Which I've done for previous years, and then at least once forgotten to remove (which I noticed back when Google still worked by searching for a blog entry I knew I'd made and it found it in the wrong year's file). Trying to avoid that this year, but that means giving the end of december a few days to soak.

January 26, 2024

Hmmm... can I assume toybox (I.E. the multiplexer) is available in the $PATH of the test suite? Darn it, no I can't, not for single command tests. Makes it fiddly to fix up the water closet command's test suite...

So Elliott sent me a mega-patch of help text updates, mostly updating usage: lines that missed options that were in the command's long one-per-line list, tweaking option lists that weren't sorted right, and a couple minor cleanups like some missing FLAG() macro conversions that were still doing the explicit if (toys.optflags & FLAG_walrus) format without a good excuse. And since my tree is HUGELY DIRTY, it conflicted with well over a dozen files so applying it was darn awkward... and today he gave me a "ping" because I'd sat on it way too long (I think I said a week in the faq?) at which point my documented procedure is I back my changes out, apply his patch, and port my changes on top of it because I've already had PLENTY OF TIME to deal with it already.

And of course trying to put my changes back on top of his was fail-to-apply city (the reason I couldn't just easily apply it in the first place), so I went through and reapplied my changes by hand, some of which are JUST conflicting documentation changes (like patch.c) and others are fairly low hanging fruit I should just finish up.

Which gets us to wc, the water closet word count command, where I was adding wc -L because somebody asked for it and it apparently showed up in Debian sometime when I wasn't looking. (It's even in the ancient version I still haven't upgraded my laptop off of.) It shows maximum line length, which... fine. Ok. Easy enough to add. And then which order do the fields show up in (defaults haven't changed and the new fifth column went in at the end, which was the sane way to do it), so I add tests, and...

The problem is TEST_HOST make test_wc doesn't pass anymore, which is not related to THIS change. The first failure is a whitespace variation, which already had a comment about in the source and I can just hit it with NOSPACE=1 before that test (not fixing it to match, one tab between each works fine for me, I do not care here; poke me if posix ever notices and actually specifies any of this).

But the NEXT problem is that the test suite sets LC_ALL=c for consistent behavior (preventing case insensitive "sort" output and so on), and we're testing utf-8 support (wc -m) which works FINE in the toybox version regardless of environment variables, but the gnu/dammit version refuses to understand UTF-8 unless environment variables point to a UTF-8 language locale. (Which makes as much sense as being able to set an environment variable to get the gnu stuff to output ebcdic, THIS SHIP HAS SAILED. And yet, they have random gratuitous dependencies without which they refuse to work.)
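
For contrast, counting UTF-8 characters doesn't inherently need the locale at all. A minimal C sketch, assuming the input is UTF-8: count every byte that isn't a continuation byte (binary 10xxxxxx). This is one locale-independent approach, not a claim about how toybox's wc -m actually does it.

```c
#include <stddef.h>

/* Count characters in UTF-8 text without asking the locale: every byte that
   isn't a continuation byte (binary 10xxxxxx) starts a new character. This
   doesn't validate anything, so malformed input just gets counted loosely. */
static size_t utf8_chars(const char *buf, size_t len)
{
  size_t count = 0;

  for (size_t i = 0; i < len; i++)
    if (((unsigned char)buf[i] & 0xC0) != 0x80) count++;

  return count;
}
```

Code that instead goes through the C library's multibyte conversion functions inherits whatever LC_CTYPE says, which is how an LC_ALL override turns UTF-8 text back into a pile of bytes.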

On my Debian Stale host, the environment variables are set to "en_US.UTF-8", so the test works if run there, but doesn't work in the test suite where it's consistently overridden to LC_ALL=c. (In a test suite it's more important to be CONSISTENT than to be RIGHT.)

I could of course set it to something else in a specific test, but nothing guarantees that this is running on a system with the "en_US" locale installed. And fixing this is HORRIFIC: in toybox's main.c we call setlocale(LC_CTYPE, "") which reads the environment variables and loads whatever locale they point to (oddly enough this is not the default libc behavior, you have to explicitly REQUEST it), and then we check that locale to see if it has utf8 support by calling nl_langinfo(CODESET) which is laughable namespace pollution but FINE, and if that doesn't return the string "UTF-8" (case sensitive with a dash because locale nonsense), then we try loading C.UTF-8 and if that doesn't work en_US.UTF-8 because MacOS only has that last one. (So if you start out with a french utf8 locale we keep it, if not we try "generic but with UTF-8", which doesn't work on mac because it's the platform that only RECENTLY added mknodat() from posix-2008. As in it was added in MacOS 13, which came out October 2022. FOURTEEN YEARS later. Yes really. Steve Jobs is still dead.)
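
The fallback order described above, sketched in C with setlocale() and nl_langinfo(CODESET). `want_utf8()` is a made-up name, and this is only the fallback sequence, not a copy of toybox's main.c.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Try to get a UTF-8 LC_CTYPE: first whatever the environment asks for,
   then C.UTF-8, then en_US.UTF-8 (for systems lacking C.UTF-8). Returns the
   codeset name in effect afterwards, so callers can tell if it all failed. */
static const char *want_utf8(void)
{
  static const char *const fallback[] = { "C.UTF-8", "en_US.UTF-8" };

  /* Programs start in the "C" locale; "" means "use the environment". */
  setlocale(LC_CTYPE, "");
  for (size_t i = 0; i < sizeof(fallback) / sizeof(*fallback)
       && strcmp(nl_langinfo(CODESET), "UTF-8"); i++)
    setlocale(LC_CTYPE, fallback[i]);  /* failure leaves the locale unchanged */

  return nl_langinfo(CODESET);
}
```

If none of the candidates exist on the system, the return value is whatever the starting locale's codeset was, which is the "can't get utf8 awareness" case a test would need to detect.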

So ANYWAY, I have painfully hard-fought code in main.c that SHOULD deal with this nonsense, but what do I set it to in a shell script? There is a "locale" command which is incomprehensible:

$ locale --help | head -n 3
Usage: locale [OPTION...] NAME
  or:  locale [OPTION...] [-a|-m]
Get locale-specific information.
$ locale -a
$ locale C.UTF-8
locale: unknown name "C.UTF-8"
$ locale en_US.utf8
locale: unknown name "en_US.utf8"

Bravo. (What does that NAME argument _mean_ exactly?) So querying "do you have this locale installed" and "what does this locale do" is... less obvious than I'd like.

I was thinking maybe "toybox --locale" could spit out what UTF-8 aware locale it's actually using, but A) can't depend on it being there, B) ew, C) if it performed surgery on the current locale to ADD UTF-8 support with LC_CTYPE_MASK there's no "set the environment variable to this" output for that anyway.

Sigh. I could try to come up with a shell function that barfs if it can't get utf8 awareness, but... how do I test for utf8 awareness? Dig, dig, dig...

Dig dig dig...

Sigh, what a truly terrible man page and USELESS command --help output. Dig dig dig...

Ah: "locale charmap". for i in $(locale -a); do LC_ALL=$i locale charmap; done
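A rough sketch of that "barf if no UTF-8" shell function, built on the locale charmap trick. It tries C.UTF-8 and en_US.UTF-8 first (the same fallback order described for main.c), then whatever locale -a lists, and prints the first name whose charmap reports UTF-8. The name find_utf8_locale is made up for illustration, and this assumes a locale command that understands "charmap" the way glibc's does.

```bash
# find_utf8_locale: hypothetical helper, not part of toybox's test suite.
# Prints the first locale name whose character map is UTF-8, or returns 1.
find_utf8_locale() {
  local i
  # Try the generic and MacOS-friendly names first, then everything installed.
  for i in C.UTF-8 en_US.UTF-8 $(locale -a 2>/dev/null); do
    # On glibc an unknown name falls back to something like ANSI_X3.4-1968
    # (with a warning on stderr), so only a real UTF-8 locale passes.
    if [ "$(LC_ALL=$i locale charmap 2>/dev/null)" = UTF-8 ]; then
      echo "$i"
      return 0
    fi
  done
  return 1
}
```

Usage would be something like loc=$(find_utf8_locale) || { echo "no UTF-8 locale" >&2; exit 1; } and then LC_ALL=$loc wc -m file. Keep the assignment separate from any export, since "export LC_ALL=$(...)" reports export's exit status instead of the function's.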

What was the question again?

January 25, 2024

Running toybox file on the bamboo board's filesystem produced a false positive. It _said_ it had ELF FDPIC binaries, but the kernel config didn't have the fdpic loader enabled. And the dependencies for BINFMT_ELF_FDPIC in the kernel are depends on ARM || ((M68K || RISCV || SUPERH || XTENSA) && !MMU) so I only have 5 targets to try to get an fdpic nommu qemu system working on. (And need to read through the elf FDPIC loader to figure out how THAT is identifying an fdpic binary, it seems architecture dependent...)

I haven't poked at arm because musl-cross-make can't build a particularly new toolchain and hasn't been updated in years, but maybe the toolchain support went in before the kernel support did? I should come back to that one...

SuperH I'm already doing but only on real hardware (the j-core turtle board), and qemu-system-sh4 having "4" in the name is a hint WHY sh2 support hasn't gone in there yet. (Since qemu-sh4 application emulation can run it might be possible to build a kernel with the fdpic loader if I hack the above dependency to put superh next to ARM and outside of the !MMU list? Dunno what's involved but presumably arm did _some_ of that work already.)

M68K is coldfire, I ran buildroot's qemu_m68k_mcf5208_defconfig to get one of those which booted, but all the binaries are binflt. I grepped the patched gcc that mcm built to see how its configure enables fdpic support, but the patches vary greatly by version. Hmmm...

January 24, 2024

Sigh, I really need to add a "--shoehorn=0xa0000000,128m" option to qemu to tell it to just forcibly add DRAM to empty parts of a board's physical address range, and a kernel command line option for linux to use them...

My first attempt at fixing grep -w '' didn't work because it's not just "empty line goes through, non-empty line does not"... Turns out "a  a" with two spaces goes through also. Which means A) the '$' and '^' patterns, by themselves in combination with -w, suddenly become more interesting, B) my plumbing to handle this is in the wrong place, C) 'a*' in the regex codepath has to trigger on the same inputs as empty string because asterisk is ZERO or more so this extension to the -w detection logic still needs to be called from both the fixed and regex paths without too much code duplication, but how do I pass in all the necessary info to a shared function...

Marvin the Martian's "Devise, devise" is a good mantra for design work.

January 23, 2024

I want a qemu nommu target so I can regression test toybox on nommu without pulling out hardware and sneakernetting files onto it, and or1k's kernel config didn't have the FDPIC loader in it so I'm pretty sure that had an mmu.

Greg Ungerer said he tests ELF-fdpic on arm, and regression tests elf PIE nommu on arm, m68k, riscv, and xtensa. Which isn't really that helpful: I still don't care about riscv, arm requires a musl-cross-make update to get a new enough compiler for fdpic support, and xtensa is a longstanding musl-libc fork that's based off a very old version. (I could try forward porting it, but let's get back to that one...)

The three prominent nommu targets I recall from forever ago (other than j-core, which never got a qemu board) are m68k (I.E. coldfire), powerpc (where bamboo and e500 were two nommu forks from different vendors, each of which picked a slightly different subset of the instruction set), and of course arm (cortex-m, see toolchain upgrade needed above).

Buildroot's configs/ directory has "qemu_ppc_bamboo_defconfig" and board/qemu/ppc-bamboo/readme.txt says "qemu-system-ppc -nographic -M bamboo -kernel output/images/vmlinux -net nic,model=virtio-net-pci -net user" is how you launch it. Last time I tried it the build broke, but let's try again with a fresh pull...

Hey, and it built! And it boots under qemu! And hasn't got "file" or "readelf" so it's not immediately obvious it's fdpic (I mean, it's bamboo, I think it _has_ to be, but I'd like to confirm it's not binflt). And qemu doesn't exit (halt does the "it is now safe to turn off" thing, but eh, kill it from another window). And from the host I can "toybox file output/target/bin/busybox", which says it's fdpic.

Ok, the kernel build (with .config) is in output/build/linux-6.1.44 and... once again modern kernel configs are full of probed gcc values so if I run my without specifying CROSS_COMPILE (in addition to ARCH=powerpc) the blank line removal heuristic fails and it has to dig through thousands of lines of extra nonsense, let's see... it's in output/host/bin/powerpc-buildroot-linux-gnu- (and of COURSE it built a uclibc-necromancy toolchain, not musl) so... 245 lines after the script did its thing, and egrep -v "^CONFIG_($(grep -o 'BINFMT_ELF,[^ ]*' ~/toybox/mkroot/ | sed 's/,/|/g'))=y" mini.config says 229 lines aren't in the mkroot base config, with the usual noise (LOCALVERSION_AUTO and SYSVIPC and POSIX_MQUEUE and so on)... static initramfs again, does bamboo's kernel loader know how to specify an external initramfs or is static a requirement like on or1k?

Yet another "melting down this iceberg" session like with or1k (which I'd HOPED would get me a nommu test system), but the other big question here is does musl support bamboo? It supports powerpc, and the TOOLCHAIN supports bamboo, but is there glue missing somewhere? (Long ago I mailed Rich a check to add m68k support, but he had some downtime just then and gave me a "friend rate" on an architecture nobody else was going to pay to add support for probably ever, and I was working a well-paying contract at the time so had spare cash. If nothing else, there's been some inflation since then...)

January 22, 2024

So, unfinished design work: I want more parallelism and less dependency detection in setup work (mostly header generation).

It's not just generating FILES in parallel, I want to run the compile time probes from scripts/ in parallel, and probe the library link list (generated/optlib.dat) in parallel, and both of those have the problem of collecting the output from each command and stitching it together into a single block of data. Which bash really doesn't want to do: even a=b | c=d | e=f discards the assignments because each pipe segment is an implicit subshell to which assignments are local, yes even the last one. I can sort of do a single x=$(one& two& three&) to have the subshell do the parallelizing and collect the output, but A) each output has to be a single atomic write, B) they occur in completion order, which is essentially randomized.

The problem with A=$(one) B=$(two) C=$(three) automatically running in parallel is that variable assignments are sequenced left to right, so A=abc B=$A can depend on A already having been set. Which means my toysh command line resolver logic would need to grow DEPENDENCIES.

In theory I could do this, the obvious way (to me) is another variable type flag that says "assignment in progress" so the resolver could call a blocking fetch data function. Also, I'd only background simple standalone assignments, because something like A=$(one)xyz where the resolution was just _part_ of the variable would need to both store more data and resume processing partway through... Darn it, it's worse than that because variable resolution can assign ${abc:=def} and modify ala $((x++)) so trying to do them out of sequence isn't a SIMPLE dependency tree, you'd have to lookahead to see what else was impacted with a whole second "collect but don't DO" parser, and that is just not practical.

I can special case "multiple assignments on the same line that ONLY do simple assignment of a single subshell's output" run in parallel, but... toysh doing that and bash NOT doing that is silly. Grrr. Alright, can I extend the "env" command to do this? It's already running a child process with a modified environment, so env -p a="command" -p b="command" -p c="command" echo -e '"$a\n$b\n$c" could... resolve $a $b and $c in the host shell before running env, and if I put single quotes around them echo DOESN'T know how... Nope, this hasn't got the plumbing and once again my command would be diverging uncomfortably far from upstream and the gnu/dammit guys still haven't merged cut -DF.

The shell parallelism I have so far is a for loop near the end of scripts/ that writes each thing's output to a file, and then does a collation pass from the file data after the loop. Which I suppose is genericizeable, and I could make a shell function to do this. (I try to quote stuff properly so even if somebody did add a file called "; rm -rf ~;.c" to toys/pending it wouldn't try to do that, and maintaining that while passing arbitrary commands through to a parallelizer function would be a bit of a thing. But it's also not an attack vector I'm hugely worried about, either.)
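A minimal bash sketch of genericizing that loop: each argument is a command string run in a background subshell with its stdout captured to its own temp file, and the files are concatenated in argument order afterwards, so the output doesn't depend on which job finished first. The name collate is hypothetical. Since the arguments are eval'ed they are code, so quoting them safely stays the caller's problem (the same concern as the "; rm -rf ~;.c" filename).

```bash
# collate: hypothetical helper. Runs each argument (a command string) in
# parallel, then prints the outputs in argument order. Returns 1 if any failed.
collate() {
  local dir i rc=0 pids=()
  dir=$(mktemp -d) || return 1
  for ((i = 1; i <= $#; i++)); do
    # ${!i} is the i-th positional parameter; "&" runs eval in a subshell.
    eval "${!i}" > "$dir/$i" &
    pids[i]=$!
  done
  # Wait on each PID individually: "wait PID" reports that job's exit status.
  for ((i = 1; i <= $#; i++)); do
    wait "${pids[i]}" || rc=1
  done
  # Stitch the results together in a deterministic order.
  for ((i = 1; i <= $#; i++)); do
    cat "$dir/$i"
  done
  rm -rf "$dir"
  return $rc
}
```

Something like out=$(collate 'sleep 1; echo one' 'echo two') gets "one" before "two" even though the second command finishes first, and because x=$(cmd) takes cmd's exit status, a failure can still be caught with ||.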

January 21, 2024

Bash frustration du jour: why does the "wait" builtin always return 0? I want to fire off multiple background processes and then wait for them all to complete, and react if any of them failed. The return value of wait should be nonzero if any of the child processes that exited returned nonzero. But it doesn't do that, and there isn't a flag to MAKE it do that.

I'm trying to rewrite scripts/ to parallelize the header file generation, so builds go faster on SMP systems. (And also to just remove the "is this newer than that" checks and just ALWAYS rebuild them: the worst of the lot is a call to sed over a hundred or so smallish text files, it shouldn't take a significant amount of time even on the dinky little orange pi that's somehow slower than my 10 year old laptop.) And the OBVIOUS way to do it is to make a bunch of shell functions and then: "func1& func2& func3& func4& func5& wait || barf" except wait doesn't let me know if anything failed.

Dowanna poke chet. Couldn't use a new bash extension even if I did, and not just because of the 7 year time horizon, but because there are still people relying on the 10 year support horizon of Red IBM Hat to run builds under ancient bash versions that predate wait -n. And of course the last GPLv2 version of bash that MacOS stayed on doesn't have that either, and "homebrew" on the mac I've got access to also gives you bash 3.2.57 from 2007 which hasn't got wait -n. So a hacky "fire off 5 background processes and call wait -n 5 times" doesn't fix it either. (And is wrong because "information needs to live in 2 places": manually updated background process count. And "jobs" shows "active" jobs so using it to determine how many times I'd need to call wait -n to make sure everything succeeded doesn't work either.)

Meanwhile, wait -n returns 127 if there's no next background process, which is the same thing you get if you run "/does/not/exist" as a background job. So "failure to launch" and "no more processes" are indistinguishable if I just loop until I get that, meaning I'd miss a category of failure.

I made some shell function plumbing in scripts/ to handle running the gcc invocations in the background (which, as I've recently complained, is just a workaround for "make -j" being added instead of "cc -j" where it BELONGS). (HONESTLY! How is cc -j $(nproc) one.c two.c three.c... -o potato not the OBVIOUS SYNTAX?) Maybe I can genericize that plumbing into a background() function that can also handle the header generation...

That said, I think at least one of the headers depends on previous headers being generated, so there are dependencies. Sigh, in GENERAL I want a shell parallelism syntax where I can group "(a& b&) && c" because SMP is a thing now. I can already create functions with parentheses instead of curly brackets which subshell themselves (turns out a function body needs to be a block, but "potato() if true; then echo hello; fi" works just fine because THAT'S A BLOCK). I want some sort of function which doesn't return until all the subshells it forked exit, and then returns the highest exit code of the lot. It would be easy enough for me to add that to toysh as an extension, but defining my own thing that nobody else uses is not HELPFUL.
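One way to get "wait for everything, return the worst exit code" in bash versions without wait -n is to remember each $! and call wait on each PID individually, since "wait PID" returns that job's exit status instead of 0. A sketch, with the hypothetical names launch and waitall. Because every PID is waited on by name, a command that fails to launch (status 127) shows up as just another failure instead of being confused with "no more jobs".

```bash
# Hypothetical helpers: launch backgrounds a command and records its PID,
# waitall waits for every recorded PID and returns the highest exit code.
BG_PIDS=()

launch() {
  "$@" &
  BG_PIDS+=($!)
}

waitall() {
  local pid status rc=0
  for pid in "${BG_PIDS[@]}"; do
    status=0
    wait "$pid" || status=$?
    if [ "$status" -gt "$rc" ]; then
      rc=$status
    fi
  done
  BG_PIDS=()
  return "$rc"
}
```

The "(a& b&) && c" grouping then becomes launch a; launch b; waitall && c. This only works when launch runs in the same shell as waitall: called from inside $(...) or another subshell, the recorded PIDs never make it back.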

Meanwhile, cut -DF still aren't upstream in coreutils. Despite repeated lip service. Sigh, I should poke them again. And post my 6.7 patches to linux-kernel...

January 20, 2024

Oh dear:

unlike Android proper, which is no longer investigating bazel, the [android] kernel build fully switched to bazel, and doesn't use the upstream build at all. (but there's a whole team working on the kernel...

I had to step away from the keyboard for a bit, due to old scars.

On the one hand, "yay, multiple independent interoperable implementations just like the IETF has always demanded to call something a standard". That's GREAT. This means you're theoretically in a position to document what the linux-kernel build actually needs to DO now, having successfully reimplemented it.

On the other hand... oh no. Both "build system preserved in amber" and "straddling the xkcd standards cycle" are consulting bingo squares, like "magic build machine" or "yocto".

AOSP is actually pretty tame as fortune 500 examples of the Mongolian Hordes technique go: everything is published and ACTUALLY peer reviewed with at least some feedback incorporated upstream. Their build has to be downloadable and runnable on freshly installed new machines with a vanilla mainline Linux distro and retail-available hardware, and at least in theory can complete without network access, all of which gets regression tested regularly by third parties. And they have some long-term editors at the top who know where all the bodies are buried and shovel the mess into piles. (There's a reason DC comics didn't reboot its history with "Crisis on Infinite Earths" until Julius Schwartz retired. Then they rebooted again for Zero Hour, Infinite Crisis, 52, Flashpoint, the New 52, DC Rebirth, Infinite Frontier, Dawn of DC... I mean at this point it could be a heat problem, a driver issue, bad RAM, something with the power supply...)

This means AOSP does NOT have a magic build machine, let alone a distributed heterogeneous cluster of them. They don't have Jenkins launching Docker triggered by a git commit hook ported from perforce. Their build does not fail when run on a case sensitive filesystem, nor does it require access to a specific network filesystem tunneled through the firewall from another site that it both writes into and is full of files with 25 year old dates. Their build does not check generated files into an oracle database and back out again halfway through. They're not using Yocto.

(YES THOSE ARE ALL REAL EXAMPLES. Consulting is what happens when a company gives up trying to solve a problem internally and throws money at it. Politics and a time crunch are table stakes. It got that bad for a REASON, and the job unpicking the gordian knot is usually as much social skills, research, and documentation as programming, and often includes elements of scapegoat and laxative.)

January 19, 2024

Onna plane, back to Austin.

Did some git pulls in the airport to make sure I had updated stuff to play with: the most recent commit to musl-cross-make is dated April 15, 2022, updating to musl-1.2.3. (There was a 1.2.4 release since then, which musl-cross-make does not know about.) And musl itself was last updated November 16, 2023 (2 months ago). He's available on IRC, and says both projects do what they were intended to so updates aren't as high a priority. But the appearances worry me.

I am reminded of when I ran the website for Penguicon 1, and had a "heartbeat blog" I made sure to update multiple times per week, even if each update was something completely trivial about one of our guests or finding a good deal on con suite supplies or something, just to provide proof of life. "We're still here, we're still working, progress towards the event is occurring and if you need to contact us somebody will notice prompt-ish-ly and be able to reply".

Meanwhile, if a project hasn't had an update in 3 months, and I send in an email, will it take 3 more months for somebody to notice it in a dead inbox nobody's checking? If it's been 2 years, will anybody ever see it?

That kind of messaging is important. But I can't complain about volunteers that much when I'm not the one doing it, so... If it breaks, I get to keep the pieces.

January 18, 2024

If I _do_ start rebuilding all the toybox headers every time in scripts/ (parallelism is faster than dependency checking here, I'm writing a post for the list), do they really need to be separate files? Would a generated/toys.h make more sense? Except then how would I take advantage of SMP to generate them in parallel? (I suppose I could extend toysh so A=$(blah1) B=$(blah2) C=$(blah3) launched them in parallel background tasks, since they already wait for the pipe to close. Then bash would be slow but toysh would parallelize...)

I originally had just toys.h at the top level and lib/lib.h in the lib/ directory, and it would make sense to have generated/generated.h or similar as the one big header there. But over the years, lib grew a bunch of different things because scripts/install.c shouldn't need to instantiate toybuf to produce bin vs sbin prefixes, and lib/portability.h needed ostracism, and so on. Reality has complexity. I try to collate it, but there's such a thing as over-cleaning. Hmm...

January 16, 2024

Sat down to knock out execdir and... it's already there? I have one? And it's ALWAYS been there, or at least it was added in the same commit that added -exec ten years ago.

And the bug report is saying Alpine uses toybox find, which is news to me. (When they were launching Alpine, toybox wasn't ready yet. They needed some busybox, so they used all of busybox, which makes sense in a "using all the parts of the buffalo" sort of way.)

Sigh, I feel guilty about toybox development because a PROPER project takes three years and change. Linux took 3 years to get its 1.0 release out. Minix took 3 years from AT&T suing readers of the Lions book to Andrew Tanenbaum publishing his textbook with the new OS on a floppy in the back cover. The Mark Williams Company took 3 years to ship Coherent. Tinycc took three years to do tccboot building the linux kernel. There's a pretty consistent "this is how long it takes to become real".

Toybox... ain't that. I started working on it in 2006, I'm coming up on the TWENTIETH ANNIVERSARY of doing this thing. Admittedly I wasn't really taking it seriously at first and mothballed it for a bit (pushing things like my patch implementation, nbd-client, and even the general "all information about a new command is in a single file the build picks up by scanning for it" design (which I explained to Denys Vlasenko when we met in person at ELC 2010)). I didn't _restart_ toybox development until 2012 (well, November 2011) when Tim Bird poked me. But even so, my 2013 ELC "why is toybox" talk was a decade ago now.

I'm sort of at the "light at the end of the tunnel" stage, especially with the recent Google sponsorship... but also losing faith. The kernel is festering under me, and I just CAN'T tackle that right now. The toolchain stuff... I can't do qcc AND anything else, and nobody else has tried. (Both gcc and llvm are A) written in C++, B) eldritch tangles of interlocking package dependencies with magic build invocations, C) kind of structurally insane (getting cortex-m fdpic support into gcc took _how_ many years, and llvm still hasn't got superh output and asking how to do it is _not_ a weekend job)).

And musl-libc is somewhere between "sane" and "abandoned". Rich disappears for weeks at a time, musl-cross-make hasn't been updated since 2022. Rich seems to vary between "it doesn't need more work because it's done" and "it doesn't get more work because I'm not being paid", depending on mood. It's the best package for my needs, and I... SORT of trust it to stay load bearing? And then there's the kernel growing new build requirements as fast as I can patch them out (rust is coming as a hard requirement, I can smell it). I would like to reach a good 1.0 "does what it says on the tin" checkpoint on toybox and mkroot before any more floorboards rot out from under me.

Sigh, once I got a real development environment based on busybox actually working, projects like Alpine Linux sprang up with no connection to me. I'd LIKE to get "Android building under android" to a similar point where it's just normal, and everybody forgets about the years of work I put in making it happen because it's not something anybody DID just the way the world IS. I want phones to be real computers, not locked down read-only data consumption devices that receive blessings from the "special people who aren't you" who have the restricted ability to author new stuff.

And I would really, really, really like to not be the only person working toward this goal. I don't mind going straight from "toiling in obscurity" to "unnecessary and discarded/forgotten", but I DO mind being insufficiently load-bearing. Things not happening until I get them done is ANNOYING. Howard Aiken was right.

January 15, 2024

I saw somebody wanting execdir and I went "ooh, that seems simple enough", although git diff on the find.c in my main working tree has debris from xargs --show-limits changing lib/env.c to a new API, which is blocked on me tracing through the kernel to see what it's actually counting for the size limits. (Since the argv[] and envp[] arrays aren't contiguous with the strings like I thought they were, do they count against the limit? If not, can you blow the stack with exec command "" "" "" "" ""... taking a single byte of null terminator each time but adding 8 bytes of pointer to argv[] for each one, so I have to read through the kernel code and/or launch tests to see where it goes "boing"?)
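A crude bash version of "launch tests to see where it goes boing": build N empty-string arguments and exec an external command with them, doubling N until exec fails. This is a hypothetical probe (exec_accepts and bracket_empty_args are made-up names), not anything in toybox. The count it finds also depends on the stack size limit (ulimit -s) and on how big the current environment is, so compare runs instead of trusting one number. If the count comes out far below what one byte per argument would allow, that suggests the argv[] pointers are being charged too.

```bash
# exec_accepts N: hypothetical probe. Does execve() accept N empty-string
# arguments in the current environment? "env" is an external binary, so
# this forces a real exec (bash reports "Argument list too long" on E2BIG).
exec_accepts() {
  local args=()
  while [ ${#args[@]} -lt "$1" ]; do
    args+=("")
  done
  env true "${args[@]}" 2>/dev/null
}

# Double the count until exec fails, bracketing the limit.
bracket_empty_args() {
  local n=1
  while exec_accepts $((n * 2)); do
    n=$((n * 2))
  done
  echo "exec works with $n empty arguments but not $((n * 2))"
}
```

Running bracket_empty_args under a few different ulimit -s settings (in a subshell) and comparing against getconf ARG_MAX gives a rough picture; a bisection between the two bracketing values would narrow it further.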

Elliott's going "last time you looked at this you decided it changed too often to try to match", which was true... in 2017. When it had just changed. But as far as I can tell it hasn't changed again SINCE, and it's coming up on 7 years since then. (My documented time horizon for "forever ago".) So it seems worth a revisit. (And then if they break me again, I can complain. Which if Linus is still around might work, and if Greg "in triplicate" KH has kicked him out, there's probably a 7 year time horizon for replacing Linux with another project. (On mastodon people are looking at various BSD forks and even taking Illumos seriously, which I just can't for licensing reasons.))

January 14, 2024

Bash does not register <(command) >(line) $(subshells) with job control, and thus "echo hello | tee >(read i && echo 1&) | { read i; wait; echo $?; }" outputs a zero. This unfortunately makes certain kinds of handoffs kind of annoying, and I've had to artificially stick fifos in to get stuff like my shell "expect" implementation to work.

On an adjacent note, a shell primitive I've wanted forever is "loop" to connect the output of a pipeline to the input back at the start of the pipeline. Years and YEARS of wanting this. You can't quite implement it as a standalone command for the same reason "time cmd | cmd | cmd" needs to be a builtin in order to time an entire pipeline. (Well, you can have your command run a child shell, ala loop bash -c "thingy", a bit like "env", but it still has to be a command. You can't quite do it with redirection because you need to create a new pipe(2) pair to have corresponding write to and read from filehandles: writing to the same fd you read from doesn't work. Which is where the FIFO comes in...)
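The FIFO workaround for closing a pipeline into a loop looks roughly like this in bash: the first stage reads from a named pipe, the last stage writes into it, and each side's open() blocks until the other side shows up, so they rendezvous instead of deadlocking. count_to is a toy invented for illustration: it seeds the loop with 0 and feeds n+1 back around until it reaches its argument, reporting each value on fd 3 so the loop's own traffic stays inside the FIFO.

```bash
# count_to N: hypothetical demo of a pipeline whose output is fed back into
# its own input through a FIFO. Prints 0..N, one per line.
count_to() {
  local dir
  dir=$(mktemp -d) || return 1
  mkfifo "$dir/fifo" || { rm -rf "$dir"; return 1; }
  # Start of the pipeline: seed with 0, then copy whatever comes back.
  # End of the pipeline: report each number on fd 3 (the caller's stdout),
  # stop at N, otherwise send n+1 back around through the FIFO.
  { echo 0; cat; } < "$dir/fifo" |
    while read -r n; do
      echo "$n" >&3
      [ "$n" -ge "$1" ] && break
      echo $((n + 1))
    done 3>&1 > "$dir/fifo"
  rm -rf "$dir"
}
```

When the last stage breaks out, it closes the only write end of the FIFO, so cat at the front sees EOF and the whole pipeline winds down on its own. A real "loop" builtin would presumably create an anonymous pipe(2) pair instead of a named FIFO in a temp directory.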

January 13, 2024

Ubuntu and Red Hat are competing to see who can drop support for older hardware fastest, meaning my laptop with the core i5-3340M processor won't be able to run their crap anymore.

I guess I'm ok with that, as long as Debian doesn't pull the same stupidity. (I bought four of these suckers, and have broken one so far, in a way that MOST of it is still good for spare parts. I am BUSY WITH OTHER THINGS, don't force me to do unnecessary tool maintenance.)

January 11, 2024

A long thread I got cc'd on turned into a "Call for LTP NOMMU maintainer", which... I want Linux to properly support nommu, but don't really care about the Linux Test Project (which is an overcomplicated mess).

Linux should treat nommu/mmu the way it treats 32/64 bit, or UP vs SMP, as mostly NOT A BIG DEAL. Instead they forked the ELF loader and the FDPIC loader the way ext2 and ext3 got forked (two separate implementations, sharing no code), and although ext4 unified it again (allowing them to delete the ext2 and ext3 drivers because ext4 could mount them all), they never cleaned up the FDPIC loader to just be a couple of if statements in the ELF loader.

It's just ELF with a separate base register for each of the 4 main segments (text, data, rodata, and bss) instead of having them be contiguous following from one base register. Dynamic vs static linking is WAY more intrusive. PIC vs non-PIC is more intrusive. They handle all THAT in one go, but fdpic? Exile that and make it so you CANNOT BUILD the fdpic loader on x86, and can't build the elf loader on nommu targets, because kconfig and the #ifdefs won't let you.

And instead of that, when I try to explain to people "uclinux is to nommu what knoppix was to Linux Live CDs: the distro that pioneered a technique dying does NOT mean Linux stopped being able to do that thing, nor does it mean nobody wanted to do it anymore, it just means you no longer need a specific magic distro to do it"... Instead of support, I get grizzled old greybeards showing up to go "Nuh-uuuh, uclinux was never a distro, nobody ever thought uclinux was a DISTRO, the distro was uclinux-dist and there was never any confusion about that on anyone's part". With the obvious implication that "the fact it became a cobweb site and eventually went down must be because nommu in Linux IS obsolete and unsupported and it bit-rotted into oblivion because nobody cared anymore. Duh."

Not helping. Really not helping.

January 10, 2024

Got the gzipped help text checked in.

My method of doing merges on divergent branches involves checking it in to a clean-ish branch, extracting it again with "git format-patch -1", and then a lot of "git am 000*.patch" and "rm -rf .git/rebase-apply/" in my main branch repeatedly trying to hammer it into my tree, with "git diff filename >> todo2.patch; git checkout filename" in between, and then once I've evicted the dirty files editing the *.patch file with vi to fix up the old context and removed lines that got changed by other patches preventing this from applying, and then when it finally DOES apply and I pull it into a clean tree and testing throws warnings because I didn't marshal over all the (void *) to squelch the "const" on the changed data type, a few "git show | patch -p1 -R && git reset HEAD^1" (in both trees) and yet MORE editing the patch with vi and re-applying. And then once it's all happy, don't forget "patch -p1 < todo2.patch" to re-dirty those bits of the tree consistently with whatever other half-finished nonsense I've wandered away from midstream.

Meanwhile, the linux-kernel geezers have auto-posters bouncing patches because "this looks like it would apply to older trees but you didn't say which ones". (And "I've been posting variants of this patch since 2017, you could have applied any of those and CHOSE not to, how is this now my problem" is not allowed because Greg KH's previous claim to fame was managing the legacy trees, and personal fame is his reason for existing. Then again it does motivate him to do a lot of work, so I can only complain so much. Beats it not happening. But there are significant negative externalities, which Linus isn't mitigating nearly as much as he used to.)

January 9, 2024

I've been up at Fade's and not blogging much, but I should put together a "how to do a new mkroot architecture" explainer.

You need a toolchain (the limiting factor of which is generally musl-libc support), you need a linux kernel config (start from arch/$ARCH/defconfig if it has one), and you need a qemu-system-$ARCH that can load the kernel and give serial output and eventually run at least a statically linked "hello world" program out of userspace. (Which gets you into elf/binflt/fdpic territory sometimes.)

The quick way to do this is use an existing system builder that can target qemu, get something that works, and reverse engineer those settings. Once upon a time QEMU had a "free operating system zoo" (at which is long dead but maybe fishable out of which I examined a few images from, and debian's qemu-debootstrap is another interesting source (sadly glibc, not musl), but these days buildroot's configs/qemu_* files have a bunch (generally uclibc instead of musl though, and the qemu invocations are hidden under "boards" at paths that have no relation to the corresponding defconfig name; I usually find them by grepping for "qemu-system-thingy" to see what they've got for that target).

Once you've got something booted under qemu, you can try to shoehorn in a mkroot.cpio.gz image as its filesystem here to make sure it'll work, or worry about that later. If you don't specify LINUX= then mkroot doesn't need to know anything special about the target, it just needs the relevant cross compiler to produce binaries. (The target-specific information is all kernel config and qemu invocation, not filesystem generation.)

Adding another toolchain to is its own step, of course. Sometimes it's just "target::" but some of them need suffixes and arguments. Usually "gcc -v" will give you the ./configure line used to create it, and you can compare with the musl-cross-make one and pick it apart from there.
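For the "gcc -v" comparison, a small filter helps: gcc prints its ./configure invocation on a line starting with "Configured with:", and splitting that into one sorted option per line makes two toolchains diffable. configure_flags is a hypothetical name, and splitting on spaces will mangle any option whose value itself contains spaces.

```bash
# configure_flags: hypothetical filter. Reads "gcc -v" output on stdin and
# prints the --options from its "Configured with:" line, one per line, sorted.
configure_flags() {
  sed -n 's/^Configured with: //p' | tr ' ' '\n' | grep '^-' | sort
}
```

For example: diff <(or1k-linux-musl-gcc -v 2>&1 | configure_flags) <(powerpc-buildroot-linux-gnu-gcc -v 2>&1 | configure_flags). Note that gcc -v writes to stderr, hence the 2>&1.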

The tricksy bit of adding LINUX= target support is making a microconfig. I should probably copy my old out of aboriginal linux into toybox's mkroot directory. That makes a miniconfig, which laboriously discovers the minimal list of symbols you'd need to switch on to turn "allnoconfig" into the desired config. (Meaning every symbol in the list is relevant and meaningful, unlike normal kernel config where 95% of them are set by defaults or dependencies.)

Due to the way the script works you give it a starting config in a name OTHER than .config (which it repeatedly overwrites by running tests to see if removing each line changes the output: the result is the list of lines that were actually needed). You also need to specify ARCH= the same way you do when running make menuconfig.
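The "remove a line, see if the result changes" loop at the heart of that can be sketched generically. This is not the actual script: minimize and expand_toy are hypothetical, and for a real kernel the expand step would be something along the lines of "make allnoconfig KCONFIG_ALLCONFIG=candidate" followed by comparing the resulting .config against the original.

```bash
# minimize EXPAND FILE: hypothetical greedy minimizer. EXPAND reads a
# candidate list on stdin and prints the fully expanded result. Each line
# of FILE is deleted in turn and stays deleted if the expansion is unchanged.
minimize() {
  local expand=$1 file=$2 want i total
  want=$("$expand" < "$file")
  total=$(($(wc -l < "$file")))
  i=1
  while [ "$i" -le "$total" ]; do
    # Candidate: the current file with line $i deleted.
    sed "${i}d" "$file" > "$file.try"
    if [ "$("$expand" < "$file.try")" = "$want" ]; then
      mv "$file.try" "$file"   # line wasn't needed: keep it deleted
      total=$((total - 1))
    else
      i=$((i + 1))             # line matters: keep it, move on
    fi
  done
  rm -f "$file.try"
}

# Toy expand step for demonstration: symbol A selects B, like a kconfig select.
expand_toy() { sort -u | sed '/^A$/{p;s/A/B/;}' | sort -u; }
```

With a file containing A, B and C on separate lines, minimize expand_toy file leaves just A and C: B is implied by A, so it isn't needed, which is the same reason most of a full kernel .config drops out.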

The other obnoxious thing is that current kernels do a zillion toolchain probes and save the results in the .config file, and it runs the probes again each time providing different results (so WHY DOES IT WRITE THEM INTO THE CONFIG FILE?) meaning if you don't specify CROSS_COMPILE lots of spurious changes happen between your .config file and the tests it's doing. (Sadly, as its development community ages into senescence, the linux kernel gets more complicated and brittle every release, and people like me who try to clean up the accumulating mess get a chorus of "harumph!" from the comfortable geezers wallowing in it...)

Then the third thing you do once you've got the mini.config digested is remove the symbols that are already set by the mkroot base config, which I do with a funky grep -v invocation, so altogether that's something like:

$ mv .config walrus
$ CROSS_COMPILE=/path/to/or1k-linux-musl- ARCH=openrisc ~/aboriginal/aboriginal/more/ walrus
$ egrep -v "^CONFIG_($(grep -o 'BINFMT_ELF,[^ ]*' ~/toybox/mkroot/ | sed 's/,/|/g'))=y" mini.config | less

And THEN you pick through the resulting list of CONFIG_NAME= symbols to figure out which ones you need, often using menuconfig's forward slash search function to find the symbol and then navigating to it to read its help text. Almost always, you'll be throwing most of them away even from the digested miniconfig.

And THEN you turn the trimmed miniconfig into a microconfig by peeling off the CONFIG_ prefix and the =y from each line (but keep ="string" or =64 or similar), and combining the result on one line as a comma separated value list. And that's a microconfig.

And THEN you need to check that the kernel has the appropriate support: enough memory, virtual network, virtual block device, battery backed up clock, and it can halt/reboot so qemu exits.

January 6, 2024

The amount of effort the toys/pending dhcpd server is putting in is ridiculous for what it accomplishes. Easier to write a new one than trim this down to something sane.

Easier != easy, of course.

January 5, 2024

I had indeed left the 256 gig sd card at Fade's apartment, which is what I wanted to use in the "real server". (I had a half-dozen 32 gig cards lying around, but once the OS is installed that's not enough space to build both the 32 bit and 64 bit hosted versions of all the cross compilers, let alone everything else. I want to build qemu, both sets of toolchains for all targets, mkroot with kernel for all targets, and set up some variant of regression test cron build. So big sd card.)

The orange pi OS setup remains stroppy: once I got the serial adapter hooked up to the right pins, there's a u-boot running on built-in flash somewhere, as in boot messages go by without the sd card inserted. Not hugely surprising since the hardware needs a rom equivalent: it's gotta run something first to talk to the SD card. (And this one's got all the magic config to do DRAM init and so on, which it chats about to serial while doing it. At 1.5 megabit it doesn't slow things down much.) Which means I'm strongly inclined to NOT build another u-boot from source and just use that u-boot to boot a kernel from the sd card. (If it's going to do something to trojan the board, it already did. But that seems a bit low level for non-targeted spyware? My level of paranoia for that is down closer to not REALLY trusting Dell's firmware, dreamhost's servers, or devuan's preprepared images. A keylogger doing identity theft seems unlikely to live THERE...)

Besides, trying to replace it smells way too bricky.

I _should_ be able to build the kernel from vanilla source, but "I have a device tree for this board" does not tell me what config symbols need to be enabled to build the DRIVERS used by that device tree. Kind of a large missing data conversion tool that, which is not Orange Pi's fault...

So anyway, I've copied the same old chinese debian image I do not trust (which has systemd) to the board, and I want to build qemu and the cross compilers and mkroot with Linux for all the targets on the nice BIG partition, and record this setup in checklist format. (In theory I could also set up a virtual arm64 debian image again and run it under qemu to produce the arm toolchains, but I have physical hardware sitting RIGHT THERE...)

I _think_ the sudo apt-get install list for the qemu build prerequisites is python3-venv ninja-build pkg-config libglib2.0-dev libpixman-1-dev libslirp-dev but it's the kind of thing I want to confirm by trying it, and the dhcp server in pending is being stroppy. I got it to work before...

Sigh. It's HARDWIRED to hand out a specific address range if you don't configure it. It doesn't look at what the interface is set for, so it's happy to try to hand out addresses that WILL NOT ROUTE. That's just sad.

January 2, 2024

I fly back to Minneapolis for more medical stuff on wednesday (doing what I can while still on the good insurance), which means I REALLY need to shut my laptop down for the memory swap and reinstall before flying out.

So of course I'm weaning mkroot off oneit, since writing (most of) a FAQ entry about why toybox hasn't got busybox's "cttyhack" command convinced me it could probably be done in the shell: something like trap "" CHLD; setsid /bin/sh <>/dev/$(sed '$s@.* @@' /sys/class/tty/console/active) >&0 2>&1; reboot -f; sleep 5 presumably covers most of it.

I was testing mkroot to make sure reparent-to-init doesn't accumulate zombies and such. That's what the trap setting SIGCHLD to SIG_IGN is for: a zombie sticks around while its signal delivery is pending, presumably so the parent can attach to it and query more info, but if the parent doesn't know it's exited until the signal is delivered, and it goes away as soon as the signal IS delivered, I don't know how one would take advantage of that?

Anyway, I noticed that "ps" is not showing any processes, which is a thing I hit back on the turtle board, and it's because /proc/self/stat has 0 in the ttynr field, even though stdin got redirected. But stdout and stderr still point to /dev/console? Which means the kernel thinks we're not attached to a controlling tty, so of course it won't show processes attached to the current tty.

I vaguely remember looking at lash years ago (printed it out in a big binder and read it through on the bus before starting bbsh) and it was doing some magic fcntl or something to set controlling tty, but I'm in a systematic/deterministic bug squishing mood rather than "try that and see", so let's trace through the kernel code to work backwards to where this value comes from.

We start by looking at MY code to confirm I'm looking at the right thing. (It's worked fine on the host all along, but you never know if we just got lucky somehow.) So looking at my ps.c line 247, it says SLOT_ttynr is at array position 4 (it's the 5th entry in the enum but the numbering starts from zero), and function get_ps() is reading /proc/$PID/stat on line 749, skipping the first three oddball fields (the first one is the $PID we needed to put in the path to get here, the second is the (filename) and the third is a single character type field, everything after that is a space-separated decimal numeric field), and then line 764 is the loop that reads the rest into the array starting from SLOT_ppid which is entry 1 back in the enum on line 245. This means we started reading the 4th entry (if we started counting at 1) into array position 1 (which started counting at 0), so array position 4-1=3, and 4+3 is entry 7 out of the stat field table in the kernel documentation. (In theory we COULD overwrite this later in get_ps(), but it only recycles unused fields and this is one we care about.)

The kernel documentation has bit-rotted since I last checked it. They converted proc.txt to reStructuredText (to make the git log/annotate history harder to parse), and in the process the index up top still says "1.8 Miscellaneous kernel statistics in /proc/stat" but if you search for "1[.]8" you get "1.8 Ext4 file system parameters". Which should not be IN the proc docs anyway, that should be in some sort of ext4 file? (Proc is displaying it, but ext4 is providing it.)

I _think_ what I want is "Table 1-2: Contents of the status fields (as of 4.19)" (currently line 236), but right before that it shows /proc/self/status which _looks_ like a longer version of the same info one per line with human readable field descriptions added... except it's not. That list skips Ngid, and if you look at the current kernel output it's inserted "Umask" in second place. So "which label goes with which entry offset" is lost, they gratuitously made more work for everyone by being incompatible. That's modern linux-kernel for you, an elegant solution to making the kernel somewhat self-documenting is right there, and instead they step in gratuitous complexity because "oops, all bureaucrats" drove away every hobbyist who might point that out. Anyway, table 1-2 is once again NOT the right one (it hasn't even GOT a tty entry!), table 1-4 on line 328 is ("as of 2.6.30-rc7", which came out May 23, 2009 so that note is 15 years old, lovely), and the 7th entry in that is indeed tty_nr! So that's nice. (Seriously, when Greg finally pushes Linus out this project is just going to CRUMBLE TO DUST.)

Now to find where the "stat" entry is generated under fs/proc in the kernel source. Unfortunately, there's not just /proc/self/stat, there's /proc/stat and /proc/self/net/stat so grep '"stat"' fs/proc/*.c produces 5 hits (yes single quotes around the double quotes, I'm looking for the string constant), but it looks like the one we want is in base.c connecting to proc_tgid_stat (as opposed to the one below it connecting to proc_tid_stat, which is /proc/$PID/task/$TID/stat). Of course neither of those functions is in fs/proc/base.c, they're in fs/proc/array.c right next to each other where each calls do_task_stat() with the last argument being a 0 for the tid version and a 1 for the tgid version. The do_task_stat() function is in that same file, and THAT starts constructing the output line into its buffer on line 581. seq_put_decimal_ll(m, " ", tty_nr); is the NINTH output, not the seventh, but seq_puts(m, " ("); and seq_puts(m, ") "); just wrap the truncated executable name field, and subtracting those two makes tty_nr entry 7. So yes, we're looking at the right thing.

So where does tty_nr come from? It's a local set earlier in the function via tty_nr = new_encode_dev(tty_devnum(sig->tty)); (inside an if (sig->tty) right after struct signal_struct *sig = task->signal;), which is _probably_ two uninteresting wrapper functions. new_encode_dev() is an inline from include/linux/kdev_t.h that shuffles bits around because major:minor are no longer 8 bits each, and when they expanded, minor wound up straddling major so existing values that fit within the old ranges didn't change. And tty_devnum() is in drivers/tty/tty_io.c doing return MKDEV(tty->driver->major, tty->driver->minor_start) + tty->index; for whatever that's worth. But really, I think all we care about is that it's been set, meaning the pointer isn't NULL.

So: where does task->signal->tty get set? I did grep 'signal->tty = ' * -r because the * skips the hidden directories, so it doesn't waste a bunch of time grinding through gigabytes of .git/objects. There's no guarantee that's what the assignment looks like, but it's a reasonable first guess, and finds 4 hits: 1 in kernel/fork.c and three in drivers/tty/tty_jobctrl.c. The fork() one is just copying the parent process's status. The assignment in proc_clear_tty() sets it to NULL, which is getting warmer. A function called __proc_set_tty() looks promising, and the other assignment is tty_signal_session_leader() again setting it to NULL. (Some kind of error handling path?)

So __proc_set_tty() is the unlocked function, called from two places (both in this same file): tty_open_proc_set_tty() and by proc_set_tty() (a wrapper that just puts locking around it). The second is called from tiocsctty(), which is a static function called from tty_jobctrl_ioctl() in case TIOCSCTTY which means this (can be) set by an ioctl.

Grepping my code for TIOCSCTTY, it looks like that ioctl is getting called in openvt.c, getty.c, and init.c, the latter two of which are in pending.

The main reason I haven't cleaned up and promoted getty is I've never been entirely sure when/where I would need it. (My embedded systems have mostly gotten along fine without it.) And it's STILL doing too much: the codepath that calls the ioctl is also unavoidably opening a new fd to the tty, but I already opened the new console and dup()'d it to stdout and stderr in the shell script snippet. The openvt.c plumbing is just doing setsid(); ioctl(0, TIOCSCTTY, 0); which is a lot closer to what I need, except I already called setsid myself too. Ooh, the man page for that says there's a setsid -c option! Which didn't come up here because it's tcsetpgrp(), which in musl is a wrapper around ioctl(fd, TIOCSPGRP, &pgrp_int); Which in the kernel is back in drivers/tty/tty_jobctrl.c and tty_jobctrl_ioctl() dispatches it to tiocspgrp() which does if (!current->signal->tty) retval = -ENOTTY; so that would fail here. And it's setting a second field, which seems to depend on this field.

TWO fields. Um. Ok, a non-raw controlling tty does signal delivery, when you hit ctrl-C or ctrl-Z. Presumably, this is the process (group?) the signal gets delivered TO?

Ah, man 4 tty_ioctl. Settling in for more reading. (I studied this EXTENSIVELY right when I started writing my own shell... in 2006. And I didn't really get to the end of it, just... deep therein.)

My real question here is "what tool(s) should be doing what?" Is it appropriate for toysh to do this for login shells? Fix up setsid -c to do both ioctl() types? Do I need to promote getty as "the right way" to do this?

I don't like getty, it SMELLS obsolete: half of what it does is set serial port parameters, which there are separate tools for (stty, and why stty can't select a controlling tty for this process I dunno). Way back when you had to manually do IRQ assignments depending on how you'd set the jumpers on your ISA card, and there was a separate "setserial" command for that nonsense because it didn't belong in getty or stty. There's tune2fs, and hdparm, and various tools to mess with pieces of hardware below the usual "char or block device" abstractions.

But getty wants to know about baud rate and 8N1 and software flow control for historical reasons, and I'm going... This could be netconsole or frame buffer, and even if it ISN'T, the bootloader set it up already (or it's virtual, or a USB device that ACTS like a serial port but isn't really, or hardware like "uartlite" that's hardwired to a specific speed, so those knobs spin without doing anything) and you should LEAVE IT ALONE.
