Rob's Blog (rss feed) (mastodon)



March 5, 2024

If you collect your mp3 files into a directory, the Android 12 ("snow cone") built-in file browser app can be convinced to play them in sequence, and will continue playing with the screen switched off. (Just go to "audio files" and it shows you folders you've created in random other places, for some reason?)

But as soon as focus returns to the app (which is what happens by default when you switch the screen back ON), the playback immediately jumps back to the position it was at when you switched the screen off, in whichever song was playing at the time. Redrawing the app's GUI resets the playback position. Oh, and if you let it play long enough, it just suddenly stops. (And then jumps to the old position when you open it to see what's going on.) The user interface here is just *chef's kiss*.


March 4, 2024

We're tentatively having the storage pod picked up on Friday, renting a u-haul to take Fuzzy's stuff to her father's place on Saturday, including the 20 year old cat, and then I drive to the airport Sunday. Fingers crossed.

My proposed talk at Texas LinuxFest (explaining mkroot) got accepted! Except I plan to be in Minneapolis after this week, and have to fly BACK for the talk. (And get a hotel room, because the realtor is highly dubious about me bringing a sleeping bag to crash on the floor of a house with a lockbox on the front. Yes, this is the same realtor that insists the place has to be listed for $150k less than the tax assessment. She's a friend of my wife's sister.)

So I may have to get a hotel in order to speak at an Austin conference. Oh well, I've done that for a zillion other conferences...

In the netcat -o hexdump code, TT.ofd is unsigned because I'm lazy and wanted one "unsigned fd, inlen, outlen;" line in the GLOBALS() declaration instead of two lines (one int fd, one unsigned inlen, outlen), since xcreate() can't return -1 (it does a perror_exit() instead).

I thought about adding a comment, but adding a comment line to explain I saved a line seems a bit silly.

I found an old pair of glasses while packing (in a box with a prescription slip from 2012), which is kind of backwards from the pair I've been wearing in that the LEFT eye is more or less clearly corrected, but the RIGHT eye is fuzzy at any distance. I've refused to update my prescription for several years now with the excuse "they're reading glasses" ever since I figured out that the reason I'm nearsighted is my eyes adjust to whatever I've been looking at recently, and I read a lot. The day of the school eye test in second grade on Kwaj I'd been reading all morning and my eyes hadn't had time to adjust BACK, so they gave me glasses. Which my parents kept reminding me to wear. So I'd read with those, focusing up close, and 20 years of feedback loop later I finally figured out what's going on and STOPPED UPDATING. But I still spend most of my time staring at a laptop or phone or similar, so far away is fuzzy unless I've taken a couple days off. But it mostly stopped GETTING WORSE, as evidenced by glasses from 2012 not being worse than the current set, just... different.

My last few sets of glasses I just went "can you copy the previous prescription", which they can do by sticking it in a machine that reads the lenses, but after a few "copy of a copy" iterations it went a little weird in a church glass sort of way. (Which my eyes mostly adjusted to!) But I've developed a dominant eye over the past couple years... and these old glasses are BACKWARDS. The dominant eye with these glasses is the LEFT one, and the right is hard to read text at my normal length with just that one eye open.

So I'm wearing that pair now, on the theory variety's probably good in terms of not screwing up my visual cortex so nerves atrophy or something, in a "that eye's input isn't relevant" sort of way. Honestly I should go outside and stare at distant things more often, but Texas sunlight and temperatures are kind of unpleasant most of the year.

(I remember why I stopped wearing this pair. One of the nose pieces is sharp and poky.)


March 3, 2024

Gave up and admitted I'm not making the March 5 flight to Minneapolis, and had Fade bump it back to the evening of the 10th (which is when I actually told the realtor I'd be out of here). I immediately got hit with ALL THE STRESS, because my subconscious knew the deadline on the 5th wasn't real but the one on the 10th is. (My brain is odd sometimes, but I've been living with it for a while now.)

Red queen's race continues: I hadn't checked in the hwclock rewrite motivated by glibc breakage which screwed up the syscall wrapper to not actually pass the arguments to the syscall. Meanwhile, musl-libc changed their settimeofday() to NOT ACTUALLY CALL THAT SYSCALL AT ALL, which is the only way to set the in-kernel timezone adjustment. So I rewrote hwclock to call the syscall directly, but before checking it in I wanted to test that it still works properly (I.E. reads and writes the hardware clock properly), and I'm not gonna do that on my development laptop so I needed to do a mkroot build to test under qemu.

Which is how I just found the musl commit that removed __NR_settimeofday, thus breaking my new version that calls the syscall directly. Rich both broke the wrapper AND went out of his way to make sure nobody calls the syscall directly, because users aren't allowed to do things he disapproves of. (For their own good, they must be CONSTRAINED.)


March 2, 2024

I've had mv -x sitting in my tree for a couple days, but it came up on the coreutils mailing list (in a "don't ask questions, post errors" sort of way) so I'm checking it in.

In theory both renameat2() and RENAME_EXCHANGE went in back in 2014 (ten years ago now!), but glibc doesn't expose either the Linux syscall or the constant Linux added unless you #define STALLMAN_FOREVER_GNU_FTAGHN_IA_IA and I categorically refuse. Also, this should build on macos and freebsd, which probably don't have either? So I need a function in portability.[ch] wrapping the syscall myself inside an #ifdef.

Which is a pity, because renameat() seems like what "mv" really WANTS to be built around. Instead of making a bunch of "path from root" for the recursive case, the clean way to handle -r is to have openat() style directory filehandles in BOTH the "from" and "to" sides, and that's what renameat() does: olddirfd, oldname, newdirfd, newname.

Although there's still the general dirtree scalability issue I have a design for but haven't properly coded yet: keeping one filehandle open per directory level leads to filehandle exhaustion if you recurse down far enough. I need to teach dirtree() to close parent filehandles and re-open them via open("..") as we return back up (then fstat() and compare the dev/ino and barf if it's not the same). (And even if I teach the dirtree() plumbing to do this, teaching _mv_ to do it would be separate because it's two parallel traversals happening at the same time.)

Without conserving filehandles you can't get infinite recursion depth, and you can trivially create an infinite depth via while true; do echo mkdir -p a b; echo mv a b/a; echo mv b a; done or similar so at least "rm -r" can't be limited by PATH_MAX. And without the stat to see if that gets us the parent node's same dev/ino back rm -rf could wind up deleting the WRONG STUFF if an ill-timed directory move happened in a tree that was going away, which is important to prevent. So we both need to check that the parent filehandle is safe to close because we can open("..") to get it back (if not, we followed a symlink or something and should keep the filehandle open: if you cause filehandle exhaustion by recursing through symlinks to directories, that's pilot error if you ask me), AND we need to confirm we got the right dev/ino back after reopening.

But if we DO get a different dev/ino when eventually reopening "..", what's the error recovery? We can drill back down from the top and see how far we get, but do we error out or prune the branch or what? Doing "mv" or "rm" on a tree we're in the middle of processing is bad form, and if we're getting different results later somebody mucked with our tree mid-operation, but what's the right RESPONSE? At a design level, I mean.

Anyway, that's a TODO I haven't tackled yet.


March 1, 2024

The pod people's flatbed truck arrived today, and dropped off a storage container using what I can only describe as an "elaborate contraption". (According to Fade, their website calls it PODzilla, imagine a giant rectangular daddy longlegs spider with wheels, only it lifts cargo containers on and off a big flatbed tow truck.) There is now a large empty box with a metal garage door on one side in the driveway, which I have been carrying the backlog of cardboard boxes we packed and taped up into.

I'm very tired. Fuzzy's gone to the u-haul store to buy more boxes. We're like 20% done, tops.

I tried to get a toybox release out yesterday (using the "shoot the engineers and go into production" method of just SHIPPING WHAT I HAVE, with appropriate testing and documentation), but got distracted by a mailing list question about the "getopt" command in pending and wound up wasting the evening going through that instead. Although really the immediate blocker on the release is I un-promoted the passwd command when I rewrote lib/password.c until I can properly test that infrastructure (under mkroot, not on my development system!), and that's a pain both because it's hard to properly set up tests for (the test infrastructure doesn't run under toysh yet because I've refused to de-bash it, I'm trying to teach toysh all the bashisms it uses instead) and because there's a half-dozen other commands (groupadd, groupdel, useradd, userdel, sulogin, chsh) that are low hanging fruit to promote once that infrastructure's in, and what even ARE all the corner cases of this plumbing...

There are like 5 of these hairballs accumulated, each ALMOST ready, but that's the one that causes an actual regression if I don't finish it.

Wound up promoting getopt, so that's something I guess. Still not HAPPY with it, but it more or less does the thing. Given my stress levels accomplishing anything concrete is... an accomplishment.


February 29, 2024

The coreutils maintainer, Pádraig Brady, just suggested using LLMs to translate documentation. I keep thinking gnu can't possibly get any more so, but they manage to plumb new depths.

The University of Texas just started offering a master's degree program in "AI".

Linus Torvalds recently talked about welcoming LLM code into the kernel, in the name of encouraging the younguns to fleet their yeek or some such. (The same way he wants to have language domain crossings in ring zero by welcoming in Rust while the majority of the code is still C. Because nothing says "maintainable" like requiring a thorough knowledge of two programming languages' semantics and all possible interactions between them to trace the logic of a single system call. So far I've been able to build Linux without needing a BPF compiler. If at some point I can't build kernels without needing a Rust compiler, that's a "stay on the last GPLv2 release until finding a different project to migrate to" situation.)

The attraction of LLMs is literally Dunning-Kruger syndrome. Their output looks good to people who don't have domain expertise in the relevant area, so if you ask it to opine about economics it looks GREAT to people who have no understanding of economics. But if you ask it to output stuff you DO know about, well obviously it's crap. I.E. "It's great for everything else, but it'll never replace ME, so I can fire all my co-workers and just have LLMs replace them while I use my unique skills the LLMs do a bad job replicating".

Fundamentally, an LLM can't answer any question that hasn't got a known common answer already. It's morphing together the most common results out of a big web-scraped google cache, to produce the statistically most likely series of words from the input dataset to follow the context established by the prompt. The answer HAS to already be out there in a "let me Google that for you" sense, or an LLM can't provide it. The "morphing together" function can combine datasets ("answer this in the style of shakespeare" is a more advanced version of the old "jive" filter), but whether the result is RIGHT is entirely coincidental. Be careful what you wish for and caveat emptor are on full display.

I can't wait for license disputes to crop up. Remember the chimp who took a photo of itself and a court ruled the image wasn't copyrighted? LLM code was trained on copyrighted material, but the output is not itself copyrightable because human creativity wasn't involved. But it's not exactly public domain, either? Does modifying it and calling your derived work your own IP give you an enforceable copyright when 95% of it was "monkey taking a selfie?" and the other 5% is stolen goods?

Lovely comment on mastodon, "Why should I bother to read an LLM generated article when nobody could be bothered to write it?" Also people speculating that ChatGPT-4 is so much worse than ChatGPT-3 that it must have been intentionally sabotaged (with speculation about how this helps them cash out faster or something?) when all the LLM designers said months ago that sticking LLM output into an LLM training dataset was like sticking a microphone into a speaker, and the math goes RAPIDLY pear shaped with even small amounts of contamination poisoning the "vibe" or whatever's going on there. (Still way more an art than a science.) So scraping an internet that's got LLM-generated pages in it to try to come up with the NEXT round of LLM training data DOESN'T WORK RIGHT. The invasive species rapidly poisons its ecosystem, probably leading to desertification.

Capitalism polluting its own groundwater usually has a longer cycle time, but that's silicon valley for you. And white guys who confidently answer questions regardless of whether they actually know anything about the topic or not are, of course, highly impressed by LLMs doing the same. They made a mansplaining engine, they LOVE it.

"Was hamlet mad" was a 100 point essay question in my high school shakespeare class, where you could argue either side as long as you supported it. "Was hamlet mad" was a 2 point true/false question in my sophomore english class later the same month. Due to 4 visits to the Johns Hopkins CTY program I wound up taking both of those the same semester in high school, because they gave me the senior course form to fill out so I could take calculus as a sophomore, so I picked my other courses off there too and they didn't catch it until several months later by which point it was too late. I did not enjoy high school, but the blatant "person in authority has the power to define what is right, even when it's self-contradictory and patently ridiculous" experience did inoculate me against any desire to move to Silicon Valley and hang out with self-important techbros convinced everyone else is dumber than they are and there's nothing they don't already know. A culture where going bankrupt 4 times and getting immediate venture capital funding for a 5th go is ABSOLUTELY NORMAL. They're card sharps playing at a casino with other people's money, counting cards and confidently bluffing. The actual technology is a side issue. And now they've created a confident bluffing engine based on advanced card counting in a REALLY BIG deck, and I am SO TIRED.


February 28, 2024

Trying hard to get a leap day toybox release out, because the opportunity doesn't come along that often.

This is why Linux went to time based releases instead of "when it's ready" releases, because the longer it's BEEN since the last release the harder it is to get the next release out. Working on stabilization shakes todo items loose and DESTABILIZES the project.


February 27, 2024

When I tested Oliver's xz cleanup, which resulted in finding this bug, what I muttered to myself (out loud) is "It's gotta thing the thing. If it doesn't thing the thing it isn't thinging."

This is my clue to myself that it may be time to step away from the keyboard. (I didn't exhaust myself programming today, I exhausted myself boxing up the books on 4 bookshelves so somebody could pick the empty bookshelves up and move them to her daughter's bedroom. This leaves us with only 14 more bookshelves to get rid of.)

Remember how two people were working on fdpic toolchain support for riscv? Well now the open itanium crowd has decided to remove nommu support entirely. Oh well. (It's a good thing I can't _be_ disappointed by riscv...)


February 24, 2024

Sigh, started doing release notes with today's date at the top, and as usual, that was... a bit ambitious.

Editing old blog entries spins off todo items as I'm reminded of stuff I left unfinished. Going through old git commits to assemble release notes finds old todo items. Doing "git diff" on my dirty main dev tree finds old todo items... The question is what I feel ok skipping right now.

I'm too stressed by the move to make good decisions about that at the moment...


February 23, 2024

Sigh, the censorship on crunchyroll is getting outright distracting. Rewatching "Miss Kobayashi's Dragon Maid" (_without_ subtitles this time, I've heard it so many times I kind of understand some of the japanese already and I know the plot so am trying to figure which word means what given that I sort of know what they're saying), and in the first episode Tohru (the shapeshifted dragon) was shown from behind, from the waist up, with her shirt off. But you can no longer show a woman's bare back on crunchyroll (you could last year!), so they edited in another character suddenly teleporting behind her to block the view.

This is 1950's "Elvis Presley's Pelvis can't be shown on TV" levels of comstock act fuckery. (And IT IS A CARTOON. YOU CANNOT HAVE SEX WITH A DRAWING. There are so many LAYERS of wrong here...)

Imagine the biblical prohibitions on food had been what survived into the modern day instead of the weirdness about sex. The bible's FULL of dietary restrictions predating germ theory, the discovery of vitamins, or any understanding of allergens: can't mix milk and meat, no shellfish, no meat on fridays, give stuff up for lent, fasting, the magic crackers and wine becoming LITERALLY blood and human flesh that you are supposed to cannibalize but it's ok because it's _church_ magic... Imagine black censor bars over the screen every time somebody opens their mouth to eat or drink. Imagine digitally blurring out any foodstuff that isn't explicitly confirmed, in-universe, as kosher or halal. Imagine arguing that watching "the great british bake-off", a dirty foreign film only available to adults on pay-per-view in 'murica, was THE SIN OF GLUTTONY and would make you statistically more likely to get tapeworms because FOOD IS DANGEROUS.

Kind of distracting, isn't it? Whether or not you're particularly interested in whatever made anime character du jour shout "oiishiiii" yet again (it's a trope), OBVIOUSLY CENSORING IT is far, far, far more annoying than the trope itself could ever be. Just show it and keep going. Even if I wanted to (I don't) I can't eat a drawing of food through the screen... but why exactly would it be bad if I could? What's the actual PROBLEM?

I am VERY TIRED that right-wing loons' reversion to victorian "you can see her ankles!" prudishness is being humored by so many large corporations. These idiots should not have traction. Their religion is funny about sex EXACTLY the same way it's funny about food, with just as little scientific basis. These days even their closest adherents ignore the EXTENSIVE explicit biblical dietary prohibitions (Deuteronomy 14 is still in the bible, forbidding eel pie and unagi sushi although Paul insists that God changed his mind since then, but even the new testament forbids eating "blood" and "meat of strangled animals" in Acts 15:29 and the medieval church had dozens of "fast days" on top of that, plus other traditions like anorexia mirabilis, but these days we ignore all that because their god isn't real and we all AGREE the food prohibitions were nothing but superstition propagated from parent to child the same way santa claus and the tooth fairy are). Even the more RECENT stuff like "lent" (which gave us the McDonalds Fish sandwich because christianity was still culturally relevant as recently as the 1960s) is silly and quaint to anyone younger than Boomers.

But the SEX part persists (officiating marriage was too lucrative and provided too much control over the populace to give up), and is still causing enormous damage. Religious fasting is obsolete but shame-based abstinence is still taught in schools. Except most sexually transmitted diseases only still EXIST because of religious shame. Typhoid mary was stopped by science, because we published the information and tracked the problem down and didn't treat getting a disease as something shameful to be hidden and denied. Sunlight was the best disinfectant, we find OUT sources of contamination and track them down with the help of crowdsourcing. NOT with medieval "for shame, you got trichinosis/salmonella/listeria what a sinner, it's yahweh jehovah jesus's punishment upon you, stone them to death!" It's precisely BECAUSE we drove the religious nonsense out and replaced it with science and sane public policy that you can eat safely in just about any restaurant even on quite rural road trips. We have regular testing and inspections and have driven a bunch of diseases out of the population entirely, and when there IS an outbreak of Hepatitis A we don't BLAME THE VICTIMS, we track down the cause and get everybody TREATED.

I don't find cartoon drawings of women particularly arousing for the same reason I don't find cartoon drawings of food particularly appetizing... but so what if I did? So what if "delicious in dungeon" or "campfire cooking" anime made me hungry? Cartoon food on a screen is not real food in front of me for MULTIPLE REASONS, which also means I can't get fat from it, or catch foodborne pathogens, or allergens, or deprive someone else of their rightful share by eating too much, or steal the food on screen, or contaminate it so other people get sick. Even if I _did_ salivate at cartoon food... so what?

Even if I was attending a play with real actors eating real food up on the stage live in front of me, which I could literally SMELL, I still couldn't run up and eat it because that's not how staged entertainment works. But the Alamo Drafthouse is all about "dinner and a movie" as a single experience, and when I watched Sweeney Todd at the Alamo Drafthouse they had an extensive menu of meat pies (which is how I found out I'm allergic to parsnips), and it was NOT WRONG TO EAT WHILE WATCHING when the appropriate arrangements had been made to place reality in front of each individual attendee, EVEN THOUGH THAT MOVIE IS LITERALLY ABOUT CANNIBALISM. You can't make a "slippery slope" argument when the thing LITERALLY ACTUALLY HAPPENING would be fine. Oh wow, imagine if a summoned elf from another world climbed out of the TV and had sex with me right now! Um... ok? This is up there with wanting to fly and cast "healing" from watching a cartoon with magic in it. The same church also did a lot of witch burnings, it was wrong of them and we're over that now. Today, watching Bewitched or I Dream of Jeannie, I'm really not expecting to pick up spells because I'm not four years old, but if watching "The Tomorrow People" taught me to teleport... where's the downside? What do you think you're protecting anyone FROM?

These entertainments regularly show people being brutally, bloodily murdered, and THAT is just fine. Multiple clips of deadpool on youtube show the "one bullet through three heads in slow motion" scene unblurred, but the scenes showing consensual sex with the woman Wade Wilson lives with and proposes marriage to and spends half the movie trying to protect and/or get back to, THAT can't be shown on youtube. (And even the movie has some internalized misogyny, albeit in the form of overcompensating the other way and still missing "equality": in the scene where he collapses from the first sign of cancer, he's fully naked and she's wearing underwear, because male nudity isn't sexual while women in underwear or even tight clothing are always and without exception sexual and beyond the pale, and showing an orifice literally HALF THE POPULATION has is unthinkable even in an R rated movie.)

Sexual repression has always correlated strongly with fascism. The Nazis' first book burning was a sexual research institute. The victorian prudishness of the british was the period they were conquering an empire with jamaican slave plantations and feeding opium to china and the East India company subjugating india and native american genocides (George "town killer" Washington) and so on.

It's currently the boomers doing it. As teenagers in the 1960s they pushed "sex drugs rock and roll" into the mainstream, and then once they were too old to have sex with teenagers they outlawed teenagers having sex with EACH OTHER or selling pictures they took of themselves (the supreme court's Ferber decision in 1982 invented the legal category of "child porn" because some teenage boys selling pictures they took of themselves masturbating made it all the way to the supreme court, which is why everybody used to have naked baby pictures before that and the 1978 movie "superman" showed full frontal nudity of a child when his spacecraft lands without anybody thinking it was sexual, but 4 years later the law changed so filming things like that is now SO TERRIBLE that you can't even TALK ABOUT IT without being branded as "one of them", which makes being a nudist a bit frustrating). And now the Boomers are so old even the viagra's stopped working, they're trying to expunge sex from the culture entirely.

Sigh. This too shall pass. But it's gonna get uglier every year until a critical mass of Boomers is underground. (In 2019 there were estimated to be about 72 million Boomers left, and 4 million of them died between the 2016 and 2020 elections which was the main reason the result came out differently.)

In the meantime... crunchyroll. Last week I tried to start a new series called "I couldn't become a hero, so I reluctantly decided to get a job", and I'm tempted to try to buy the DVD of a series I may not even like because I CANNOT WATCH THIS. In the first FIVE MINUTES they'd clearly edited a half-dozen shots to be less porny. I'm not interested in trying to sexualize cartoon characters, but this is "han shot first" and the ET re-release digitally editing the guns into walkie-talkies levels of obvious and unconvincing bullshit. Even when I'm theoretically on their side (defund the police, ACAB, I'm very glad the NRA is imploding) the cops who showed up to separate Elliott from his alien friend HAD GUNS and STOPPIT WITH THE PHOTOSHOP. If I can tell on a FIRST WATCH that you're editing the program within an inch of its life... every time I'm pulled right out of my immersion again.

I dislike smoking, but Disney photoshopping cigarettes out of Walt Disney's photos is historical revisionism. If a show had a bunch of characters chain-smoke but they digitally edited them to have lollypops and candycanes in their mouths all the time instead, gesticulating with them... You're not fooling anyone. Imagine if they did that to Columbo. Columbo with his cigar digitally removed and every dialog mention of it clipped out. You can be anti-cigar and still be WAY CREEPED OUT BY THAT. Cutting the "cigarette? why yes it is" joke out of Police Squad does not make you the good guy.

Do not give these clowns power. The law is whatever doesn't get challenged.


February 22, 2024

Sat down to rebuild all the mcm-buildall.sh toolchains this morning for the upcoming release (so I can build mkroot against the new kernel), but the sh4 sigsetjmp() fix went in recently (a register other stuff used was getting overwritten) and Rich said it was just in time for the upcoming musl release, so I asked on IRC how that was doing, and also mentioned my struggle with nommu targets and the staleness of musl-cross-make, and there was a long quite productive discussion that resulted in Rich actually making a push to mcm updating musl to 1.2.4! Woo! And it looks like they're doing a lot of cool stuff that's been blocked for a bit.

As part of that discussion, somebody new (sorear is their handle on the #musl channel on libera.chat) is working on a different riscv fdpic attempt, and meowray is working on adding fdpic support to llvm-arm. Either could potentially result in a nommu qemu test environment, I'm all for it.


February 21, 2024

One of my phone apps "updated" itself to spray advertising all over everything, after 2 years of not doing that. Showing one on startup I'd probably wince and let the frog boil, but having an animated thing ALWAYS on screen when it's running: nope. And Android of course does not let me downgrade to the previous version of anything because that would be giving up too much control.

It doesn't show ads if I kill the app, go into airplane mode, and relaunch it without network access. Then I get the old behavior. So I went into the app permissions, viewed all, and tried to revoke the "have full network access" permission. The app is an mp3 player reading files off of local storage, I switched to it from the google built-in one because Google's didn't understand the concept of NOT streaming but just "only play local files"...

But Android won't let me revoke individual app permissions. I can view "other app capabilities", but long-press on it does nothing, nor does swipe to the side, and tapping on it just brings up a description with "ok". No ability to REVOKE any. Because despite having purchased a phone, I am the product not the customer. Even having put the phone into debug mode with the "tap a zillion times in a random sub-menu" trick, I still don't get to control app permissions. (So what are the permissions FOR, exactly?)

Sigh, serves me right for running vanilla android instead of one of the forks that actually lets me have control over my phone. I suppose there's a thing I could do with adb(?), but keeping the sucker in airplane mode while listening is a workaround for now...

And no I don't feel guilty about "but what about all the effort the app developer put into it", I can play an mp3 I downloaded through the "files" widget: it's built into the OS. Which is fine for the copy of Rock Sugar's "reinventinator" Fade bought me for christmas: whole album is one big ogg file, threw it on my web server and downloaded it, and it plays fine. But the File app doesn't advance to the next one without manual intervention. "Play this audio file" is probably a single line of java calling a function out of android's standard libraries. Going from an android "hello world" app tutorial to "display list of files, click on one to play and keep going in order, show progress indicator with next/forward and pause/play button, keep going when screen blanked with the lock screen widget"... In fact nevermind that last bit, the "file" widget is doing the exact same lock screen widget playing that ogg file, so this is probably a standard gui widget out of android's libraries and you just instantiate it with flags and maybe some callbacks. (Sigh, it's Java, they're going to want you to subclass it and provide your own constructor and... Ahem.) Anyway, that's also built into the OS.

This is probably a weekend's work _learning_ how to do all that. Including installing android studio. And yes my $DAYJOB long ago was writing java GUI apps for Quest Multimedia and I taught semester long for-credit Java courses at austin community college: I'm stale at this but not intimidated by it.

But I haven't wanted to open the app development can of worms because I'm BUSY, especially now you have to get a developer ID from Google by providing them government ID in order to have permission to create a thing you can sideload on your OWN PHONE.

Not going down that rathole right now. I am BUSY.


February 20, 2024

Hmmm, you know a mastodon feed of this blog doesn't have to be CURRENT, I could do audio versions of old entries, do notes/01-23-4567 dirs each with an index.html and mp3 file (alongside the existing one-big-text version), and post links to/from a (new, dedicated) mastodon account as each one goes up, which would allow people to actually comment on stuff, without my tendency to edit and upload weeks of backlog at a time. (Hmmm, but _which_ mastodon account? Does dreamhost do mastodon? Doesn't look like it. I don't entirely trust mstdn.jp to still be around in 5 years, I mean PROBABLY? But it's outside of my control. How much of the legal nonsense of running your own server is related to letting OTHER people have accounts on it, and how much is just "the Boomers are leaving behind a dysfunctionally litigious society"? There was a lovely thread about mastodon legal setup tricks for individuals running their own server, things like notifying some government office (a sub-program of the library of congress I think?) to act as a DMCA takedown notice recipient "agent" on your behalf, but it was on twitter and went away when that user deleted their account. Mirror, don't just bookmark...)

Ahem: backstory.

This blog is a simple lightly html formatted text file I edit in vi, and I tend to type in the text extemporaneously and do most of the HTML formatting in a second pass, plus a bunch of editing to replace [LINK] annotations with the appropriate URL I didn't stop to grab at the time, and finish half-finished trail off thoughts not englished wordily because brain distract in

Anyway, the "start of new entry" lines are standardized, and as I go through editing I replace my little "feb 20" note with a cut and paste from the last entry I edited to the start of the new one, and change the date in the three places it occurs. Yes vi has cut and paste: "v [END] y [PAGEUP... cursor cursor...] p" and then "i" to go into insert mode and cursor over to the three places the entry's date shows up in the first line and type over it because I'm sure there's a "search and replace within current line" magic key but I've never bothered to learn it. It would be great to have the date in just ONE place, but I'm editing raw HTML and it's got an <a name="$DATE"> to provide jump anchors, an <hr> tag to provide a dividing line, <h2> start and end tags to bump the font up, an <a href="#$DATE"> tag to provide an easily copyable link to the entry (each entry links to itself), and then an expanded english date to provide the display name for the link. (And then on the next line, usually a <span id=programming> tag so SOMEDAY I can make multiple rss feed generators that show only specific categories, if you "view source" there's a commented out list of span tags at the top I've historically used and try to stick to.)
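A rough sketch of generating that standardized first line from a single date, so the date only has to be typed once. The MM-DD-YYYY anchor format, the exact tag order, and the entry_header() name are assumptions made for illustration (the real line may arrange or close its tags differently):

```python
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

def entry_header(day):
    """Build one entry's header line: jump anchor, divider, heading, self link."""
    # Assumed anchor format: MM-DD-YYYY.
    anchor = f"{day.month:02d}-{day.day:02d}-{day.year:04d}"
    # Expanded english date used as the link's display text.
    display = f"{MONTHS[day.month - 1]} {day.day}, {day.year}"
    return (f'<a name="{anchor}"></a><hr>'
            f'<h2><a href="#{anchor}">{display}</a></h2>')

print(entry_header(date(2024, 2, 20)))
```

The month names are spelled out in a table rather than pulled from strftime() so the output does not change with the locale.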

The advantage of each new entry having a standardized line at the start is it's easy to search for and parse, and I have a python script a friend (Dr. What back at timesys) wrote ages ago to generate an rss feed for my blog, which I've rewritten a lot since then but it's still in python rather than sed out of historical inertia, and also me treating rss (actually "atom", I think?) as a magic undocumented format likely to shatter if touched. (It is python 2. It will not ever be python 3. If a debian upgrade takes away python 2, that's when the sed comes out. Posix has many failings, but "posix-2024" is not going to force you to rewrite "posix-2003" scripts that work, the same way modern gasoline still works in a 20 year old car.)

What this form of blogging does NOT provide is any way for readers to leave comments (other than emailing me or similar), which was the big thing I missed moving from livejournal back to blogging on my own site. And I am NOT doing that myself: even if I wanted to try to deal with some sort of CGI plumbing for recording data (I don't), user accounts and moderation and anti-spam and security and so on are way too much of a pain to go there. (I have met the founders of Slashdot. It ate their lives, and that was 20 years ago.)

But now that I'm on mastodon (as pretty much my only social network, other than some email lists and the very occasional youtube comment under an account not directly connected to anything else), using a mastodon account as an rss feed for the blog seems... doable? Ok, the entries don't have TITLES. Summaries would be a problem. (On mstdn.jp posts have a 500 character limit, I guess I could just do start of entry. But they're not really organized with topic sentences, either.)

The real problem has been that I'm not posting promptly, and tend to do so in batches (because editing) which floods the feed. Possibly less of an issue with rss feeds, where you can get to it much later. (The feed readers I've seen had each data source basically in its own folder, not one mixed together stream like social media likes to do so stuff gets buried if you don't get to it immediately.)

There's also a lot of "chaff", since a blog has multiple topics and I might want to serialize just one (the id=programming stuff). I've (manually) put the tags in, but haven't USED them yet. Haven't even mechanically confirmed the open/close pairs match up, just been eyeballing it...
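Mechanically confirming the open/close pairs could be as small as the sketch below. It only counts span tags and reports where the nesting goes wrong; the span_problems() name and the message wording are made up for this example, and it assumes tags are written as plain <span ...> and </span>:

```python
import re

# Matches both <span ...> and </span>; group 1 is "/" for a closing tag.
SPAN = re.compile(r"<(/?)span\b[^>]*>", re.IGNORECASE)

def span_problems(html):
    """Return a list of human readable complaints, empty if the tags pair up."""
    problems = []
    depth = 0
    for lineno, line in enumerate(html.splitlines(), 1):
        for match in SPAN.finditer(line):
            if match.group(1):
                if depth == 0:
                    problems.append(f"line {lineno}: </span> without open")
                else:
                    depth -= 1
            else:
                depth += 1
    if depth:
        problems.append(f"{depth} unclosed <span> at end of file")
    return problems

print(span_problems("<span id=programming>\nsome text\n</span>"))
```

An empty list means every close had a matching open and nothing was left dangling.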


February 19, 2024

Watched the building a busybox based debian peertube video, which really should have been a 5 minute lightning talk. It boils down to "I use mmdebstrap instead of debootstrap, here's some command line options that has and how I used them to install debian's busybox package in a semi-empty root directory and got it to boot". It's not _really_ a busybox based debian, more hammering in a screw and filing the edges a bit.

First he established "debian's too big for embedded" by doing mmdebstrap --variant=minbase unstable new-dir-name and showing the size (not quite 200 megs), then he trimmed it with --dpkgopt='path-exclude=/usr/share/man/*' and again for (/usr/share/doc/* and /usr/share/locale/*) which was still over 100 megs.

Next he mentioned you can --include packagename (which takes a CSV argument) and introduced the --variant=custom option which only installs the packages you list with --include. And he talked about --setup-hook and --customize-hook which are just shell command lines that run before and after the package installs (in a context he didn't really explain: it looks like "$1" is the new chroot directory and the current directory already has some files in it from somewhere? Maybe it's in the mmdebstrap man page or something...)

Putting that together, his "busybox install" was:


INCLUDE_PKGS=dpkg,busybox,libc-bin,base-files,base-passwd,debianutils
mmdebstrap --variant=custom --include=$INCLUDE_PKGS \
  --hook-dir=/usr/share/mmdebstrap/hooks/busybox \
  --setup-hook='sed -i -e "1 s/:x:/::/g" "$1/etc/passwd"' \
  --customize-hook='cp inittab $1/etc/inittab' \
  --customize-hook='mkdir $1/etc/init.d; cp rcS $1/etc/init.d/rcS' \
  unstable busybox-amd64

(Note, the "amd64" at the end was just naming the output directory, the plumbing autodetects the current architecture. There's probably a way to override that but he didn't go there.)

He also explained that mmdebstrap installs its own hooks for busybox in /usr/share/mmdebstrap/hooks/busybox and showed setup00.sh and extract00.sh out of there, neither of which seemed to be doing more than his other customize-hook lines so I dunno why he bothered, but that's what the --hook-dir line was for apparently. (So it doesn't do this itself, and it doesn't autodetect it's installing busybox and fix stuff up, but you can have it do BITS of this while you still do most of the rest manually? I think?)

In addition to the packages he explicitly told it to install, this sucked in the dependencies gcc-12-base:amd64 libacl1:amd64 libbz2-1.0:amd64 libc6:amd64 libdebconfclient0:amd64 libgcc-s1:amd64 liblzma5:amd64 libpcre2-8-0:amd64 libselinux1:amd64 mawk tar zlib1g:amd64 and that list has AWK and TAR in it (near the end) despite busybox having its own. I haz a confused. This was not explained. (Are they, like, meta-packages? I checked on my ancient "devuan botulism" install and awk claims to be a meta-package, but tar claims to be gnu/tar.)

Anyway, he showed the size of that (still huge but there's gcc in there) then did an install adding the nginx web server, which required a bunch more manual fiddling (creating user accounts and such, so he hasn't exactly got a happy debian base that "just works" for further packages, does he) and doing that added a bunch of packages and ~50 megs to the image size. (Plus naginiks's corporate maintainer went nuts recently and that project forked under a new name, but that was since this video.)

Finally he compared it against the alpine linux base install, which is still smaller than his "just busybox" version despite containing PERL for some reason. This is because musl, which the above technique does not address AT ALL. (It's pulling packages from a conventionally populated repository. Nothing new got built from source.)

Takeaway: the actual debian base appears to be the packages dpkg, libc-bin, base-files, base-passwd, and debianutils. This does not provide a shell, command line utilities, or init task, but something like toybox can do all that. Of course after installing a debootstrap I generally have to fiddle with /etc/shadow, /etc/inittab, and set up an init ANYWAY. I even have the checklist steps in my old container setup docs somewhere...


February 18, 2024

The limiting factor on a kconfig rewrite has been recreating menuconfig, but I don't really need to redo the current GUI. I can just have an indented bullet point list that scrolls up and down with the cursor keys and highlight a field with reverse text. Space enables/disables the currently highlighted one, and H or ? shows its help text. Linux's kconfig does a lot with "visibility" that I don't care about (for this everything's always visible, maybe greyed if it needs TOYBOX_FLOAT or something that's off?). And Linux's kconfig goes into and out of menus because an arbitrarily indented bullet point list would go off the right edge for them: the kernel's config mess goes a dozen levels deep, but toybox's maximum depth is what, 4? Shouldn't be that hard...

As for resolving "selects" and "depends", according to sed -n '/^config /,/^\*\//{s/^\*\///;p}' toys/*/*.c | egrep 'selects|depends' | sort -u there aren't currently any selects, and the existing depends use fairly simple logic: && and || and ! without even any parentheses, which is the level of logic already implemented in "find" and "test" and such (let alone sh). Shouldn't be too challenging. I should probably implement "selects" and parentheses just in case, though...
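To give a feel for how little logic that is, here is a minimal recursive descent sketch that evaluates a "depends on" expression with !, &&, || and parentheses against a set of enabled symbols. It is an illustration in Python rather than toybox code; the function names and every symbol other than TOYBOX_FLOAT are invented for the example, and the precedence (! binds tightest, then &&, then ||) is assumed to follow the usual C convention:

```python
import re

TOKEN = re.compile(r"\s*(&&|\|\||!|\(|\)|[A-Za-z0-9_]+)")

def tokenize(expr):
    """Split an expression into operators, parentheses and symbol names."""
    expr = expr.rstrip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def evaluate(expr, enabled):
    """Return True if expr holds given the set of enabled symbol names."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()  # always parse, so the tokens get consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_unary()
        while peek() == "&&":
            take()
            rhs = parse_unary()
            value = value and rhs
        return value

    def parse_unary():
        tok = peek()
        if tok == "!":
            take()
            return not parse_unary()
        if tok == "(":
            take()
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing )")
            take()
            return value
        if tok is None or tok in ("&&", "||", ")"):
            raise ValueError("expected a symbol")
        return take() in enabled

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return result

print(evaluate("TOYBOX_FLOAT && !(EXAMPLE_A || EXAMPLE_B)", {"TOYBOX_FLOAT"}))
```

Both sides of && and || are always parsed even when the answer is already known, which keeps the token position correct without a separate "skip" mode.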

The cursor up and down with highlighting stuff I already did in "top" and "hexedit" and such, and I should really revisit that area to do shell command line editing/history...


February 17, 2024

The deprecation news of the week:

The last one is sad. FreeBSD is rendering itself irrelevant in the embedded world. Oh well, if they want to embrace being "MacOS Rawhide and nothing more", it's their project...

Ongoing sh4 saga: I might be able to get FDPIC working on qemu-system-sh4, but it turns out qemu-system-sh4 doesn't boot mkroot anymore, even in a clean tree using the known-working kernel from last release.

I bisected it to a specific commit but commenting out the setvbuf() in main didn't help. Tracked it down to sigsetjmp() failing to return. Note that this is SET, which should just be writing to the structure. Yes it's 8 byte aligned. This bug is jittery crap that heisenbugs away if my debug printfs() have too many %s in them (then it works again). Asked for help on the musl, linux-sh, and toybox lists.

And of course, I got private email in reply to my list posts. As always:

On 2/16/24 20:22, [person who declined to reply publicly] wrote:
> Shot into the blue:
>
> try with qemu-user; mksh also currently has a regression test
> failing on a qemu-user sh4 Debian buildd but with one of the
> libcs only (klibc, incidentally, not musl, but that was with
> 1.2.4)

Hmmm, that does reproduce it much more easily, and I get more info:

Unhandled trap: 0x180
pc=0x3fffe6b0 sr=0x00000001 pr=0x00427c40 fpscr=0x00080000
spc=0x00000000 ssr=0x00000000 gbr=0x004cd9e0 vbr=0x00000000
sgr=0x00000000 dbr=0x00000000 delayed_pc=0x00451644 fpul=0x00000000
r0=0x3fffe6b0 r1=0x00000000 r2=0x00000000 r3=0x000000af
r4=0x00000002 r5=0x00481afc r6=0x407fffd0 r7=0x00000008
r8=0x3fffe6b0 r9=0x00456bb0 r10=0x004cea74 r11=0x3fffe6b0
r12=0x3fffe510 r13=0x00000000 r14=0x00456fd0 r15=0x407ffe88
r16=0x00000000 r17=0x00000000 r18=0x00000000 r19=0x00000000
r20=0x00000000 r21=0x00000000 r22=0x00000000 r23=0x00000000

Might be able to line up the PC with the mapped function with enough digging to find the failing instruction...

What IS a trap 0x180? Searching the sh4 software manual for "trap" says there's something called an exception vector... except "exception" has over 700 hits in that PDF and "exception vector" has two, neither of which are useful.

Ok, in qemu the string "Unhandled trap" comes from linux-user/sh4/cpu_loop.c which is printing the return code from cpu_exec() which is in accel/tcg/cpu-exec.c which is a wrapper for cc->tcg_ops->cpu_exec_enter() which is only directly assigned to by ppc and i386 targets, I'm guessing it uses one of those curly bracket initializations in the others? According to include/hw/core/tcg-cpu-ops.h the struct is TCGCPUOps... Sigh, going down that path could take a while.

Alright, cheating EVEN HARDER:

$ grep -rw 0x180 | grep sh
hw/sh4/sh7750_regs.h:#define SH7750_EVT_ILLEGAL_INSTR 0x180 /* General Illegal Instruction */

What? I mean... WHAT? Really? (That macro is, of course, never used in the rest of the code.) But... how do you INTERMITTENTLY hit an illegal instruction? (What, branch to la-la land? The sigsetjmp() code doesn't branch!)

That email also said "It might just as well be just another qemu bug..." which... Maybe? It _smells_ like unaligned access, but I don't know _how_, and the structure IS aligned. I don't see how it's uninitialized anything since A) the sigsetjmp() function in musl writes into the structure without reading from it, B) adding a memset() beforehand doesn't change anything. If a previous line is corrupting memory... it's presumably not heap, because nothing here touches the heap. The "stack taking a fault to extend itself" theory was invalidated by confirming the failure case does not cross a page boundary. "Processor flags in a weird state so that an instruction traps when it otherwise wouldn't" is possible, but WEIRD. (How? What would put the processor flags in that state?)

Continuing the private email:

> There's also that whole mess with
> https://sourceware.org/bugzilla/show_bug.cgi?id=27543
> which affects {s,g}etcontext in glibc, maybe it applies
> somewhere within musl? (The part about what happens when
> a signal is delivered especially.)

Which is interesting, but musl's sigsetjmp.s doesn't have frchg or fschg instructions.

But what I _could_ try doing is building and testing old qemu versions, to see if that affects anything...


February 16, 2024

Broke down and added "riscv64::" to the mcm-buildall.sh architecture list, which built cross and native toolchains. (Because musl/arch only has riscv64, no 32 bit support.)

To add it to mkroot I need a kernel config and qemu invocation, and comparing qemu-system-riscv64 -M '?' to ls linux/arch/riscv/configs gives us... I don't know what any of these options are. In qemu there's shakti, sifive, spike, and virt boards. (It would be really nice if a "none" board could be populated with memory and devices and processors and such from the command line, but that's not how IBM-maintained QEMU thinks. There are "virt" boards that maybe sort of work like this with a device tree? But not command line options, despite regularly needing to add devices via command line options ANYWAY.) Over on the kernel side I dunno what a k210 is, rv32 has 32 in it with musl only supporting 64, and nommu_virt_defconfig is interesting but would have to be a static PIE toolchain because still no fdpic. (Maybe later, but I could just as easily static pie coldfire.)

(Aside: static pie on nommu means that running "make tests" is unlikely to complete because it launches and exits zillions of child processes, any of which can suddenly fail to run because memory is too fragmented to give a large enough contiguous block of ram. FDPIC both increases sharing (the text and rodata segments can be shared between instances, meaning there's only one of each which persist as toybox processes run and exit), and it splits the 4 main program segments apart so they can independently fit into smaller chunks of memory (the two writeable segments, three if you include stack, are small and can move independently into whatever contiguous chunks of free memory are available). So way less memory thrashing, thus less fragmentation, and way less load in general (since each instance of toybox doesn't have its own copy of the data and rodata segments) thus a more reliable system under shell script type load. This is why I'm largely not bothering with static pie nommu systems: I don't expect them to be able to run the test suite anyway.)

This leaves us with linux's riscv "defconfig", which I built and set running and ran FOREVER and was full of modules and I really wasn't looking forward to stripping that down, so I went "does buildroot have a config for this?" And it does: qemu_riscv64_virt_defconfig with the corresponding qemu invocation from board/qemu/riscv64-virt/readme.txt being "qemu-system-riscv64 -M virt -bios fw_jump.elf -kernel Image -append "rootwait root=/dev/vda ro" -drive file=rootfs.ext2,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -netdev user,id=net0 -device virtio-net-device,netdev=net0 -nographic" which... needs a bios image? Really? WHY? You JUST INVENTED THIS ARCHITECTURE, don't make it rely on LEGACY FIRMWARE.

But maybe this is an easier kernel .config to start with (less to strip down anyway), so I tried building it and of course buildroot wants to compile its own toolchain, within which the binutils build went: checking for suffix of object files... configure: error: in `/home/landley/buildroot/buildroot/output/build/glibc-2.38-44-gd37c2b20a4787463d192b32041c3406c2bd91de0/build': configure: error: cannot compute suffix of object files: cannot compile

Right, silly me, it's a random git snapshot that's weeks old now, so I did a "git pull" and ran it again and... exact same failure. Nobody's built a 64 bit riscv qemu image in buildroot in multiple weeks, or they would have noticed the build failure.

Open source itanic. It's not a healthy smell.

(WHY is it building a random glibc git snapshot? What's wrong with the release versions? Buildroot can PATCH STUFF LOCALLY, overlaying patches on top of release versions was one of the core functions of buildroot back in 2005. Right, ok, back away slowly...)


February 15, 2024

Rich confirmed that he intentionally broke another syscall because he doesn't like it, and wants all his users to change their behavior because it offends him. So I wrapped the syscall.

But the problem with fixing up hwclock to use clock_settime() and only call settimeofday() for the timezone stuff (via the wrapped syscall, yes this is a race condition doing one time update with two syscalls) is now I need to TEST it, and it's one of those "can only be done as root and can leave your host machine in a very unhappy state". The clock jumping around (especially going backwards) makes various systemwide things unhappy, and doing it out from under a running xfce and thunderbird and chromium seems... contraindicated.


February 14, 2024

Emailed Maciej Rozycki to ask about the riscv fdpic effort from 2020 and got back "Sadly the project didn't go beyond the ABI design phase."

Since arm can (uniquely!) do fdpic _with_ mmu, I tried to tweak the sh4 config dependencies in fs/Kconfig.binfmt in the kernel to move superh out of the !MMU group and next to ARM, and the kernel build died with binfmt_elf_fdpic.c:(.text+0x1b44): undefined reference to `elf_fdpic_arch_lay_out_mm'.

Emailed the superh and musl mailing lists with a summary of my attempts to get musl-fdpic working on any target qemu-system can run. (Not including the or1k/coldfire/bamboo attempts that, it turns out, don't support fdpic at all.) Hopefully SOMEBODY knows how to make this work...


February 13, 2024

Emailed linux-kernel about sys_tz not being namespaced, cc-ing two developers from last year's commit making the CLONE_NEWTIME flag actually work with clone().

I don't expect a reply. As far as I can tell the kernel development community is already undergoing gravitational collapse into a pulsar, which emits periodic kernels but is otherwise a black hole as far as communication goes. Members-only.

The clone flag that didn't work with clone() was introduced back in 2019 and stayed broken for over 3 years. Linux's vaunted "with enough eyeballs all bugs are shallow" thing relied on hobbyists who weren't just focusing on the parts they were paid to work on. You don't get peer review from cubicle drones performing assigned tasks.

I am still trying to hobbyist _adjacent_ to the kernel, and it's like being on the wrong side of gentrification or something. The disdain is palpable.


February 12, 2024

So glibc recently broke settimeofday() so if you set time and timezone at the same time it returns -EALLHAILSTALLMAN.

But if you DON'T set them together, your clock has a race window where the time is hours off systemwide. And while "everything is UTC always" is de-facto Linux policy, dual boot systems have to deal with windows keeping system clock in local time unless you set an obscure registry entry which isn't universally honored. Yes this is still the case on current Windows releases.

Digging deeper into it, while a lot of userspace code uses the TZ environment variable these days, grep -rw sys_tz linux/* finds it still used in 36 kernel source files and exported in the vdso. The _only_ assignment to it is the one in kernel/time/time.c from settimeofday(), so you HAVE to use that syscall to set that field which the kernel still uses.

When musl switched settimeofday() to clock_settime() in 2019 it lost the ability to assign to sys_tz at all, which I think means it lost the ability to dual boot with most windows systems?

The other hiccup is sys_tz didn't get containerized when CLONE_NEWTIME was added in 2019 so it is a systemwide global property regardless of namespace. Then again they only made it work in clone rather than unshare last year so that namespace is still cooking.

The real problem is the actual time part of settimeofday() is 32 bit seconds, ala Y2038. That's why musl moved to the 64 bit clock_settime() api. The TZ environment variable assumes the hardware clock is returning utc. The point of sys_tz is to MAKE it return UTC when the hardware clock is set wrong because of windows dual booting.
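The two bits of arithmetic in play can be sketched like this. The only facts relied on are that the timezone half of settimeofday() carries a tz_minuteswest field (minutes west of Greenwich) and that a signed 32 bit seconds counter tops out at 2**31 - 1. The helper name and the sample zone are made up for illustration, and this shows the arithmetic only, not how the kernel applies it:

```python
from datetime import datetime, timezone

def rtc_local_to_utc(rtc_seconds, minutes_west):
    """Convert a clock reading kept in local time into UTC seconds.

    minutes_west follows the tz_minuteswest convention: positive values
    are west of Greenwich, so local time is behind UTC by that much.
    """
    return rtc_seconds + minutes_west * 60

# A hardware clock set to local noon in a zone six hours west of UTC,
# read naively as if it were UTC.
naive = int(datetime(2024, 2, 12, 12, 0, tzinfo=timezone.utc).timestamp())
fixed = rtc_local_to_utc(naive, 360)
print(datetime.fromtimestamp(fixed, timezone.utc))

# The last second a signed 32 bit time value can represent (Y2038).
LAST_32BIT_SECOND = 2**31 - 1
print(datetime.fromtimestamp(LAST_32BIT_SECOND, timezone.utc))
```

Skipping the first conversion is what leaves the system clock hours off on a dual boot machine, and the second number is why the 64 bit clock_settime() exists.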


February 11, 2024

The paper Decision Quicksand: how Trivial Choices Suck Us In misses an important point: when the difference in outcome is large, it's easier to weigh your options. When the difference in outcome is small, it's harder to see/feel what the "right thing" is because the long-term effect of the decision is buried in noise. So more important questions can have a clearer outcome and be easier to decide, less important ones tend to get blown around by opinion. (Hence the old saying, "In academia the fighting is so vicious because the stakes are so small". See also my longstanding observation that open source development relies on empirical tests to establish consensus necessary for forward progress, subjective judgements from maintainers consume political capital.)

The classic starbucks menu decision paralysis is similar (there's no "right choice" but so many options to evaluate) but people usually talk about decision fatigue when they discuss that one (making decisions consumes executive function). These are adjacent and often conflated factors, but nevertheless distinct.


February 10, 2024

Sigh, shifting sands.

So gentoo broke curses. The gnu/dammit loons are making egrep spit pointless warnings and Oliver is not just trying to get me to care, but assuming I already do. Each new glibc release breaks something and this time it's settimeofday(), which broke hwclock.

And I'm cc'd on various interminable threads about shoving rust in the kernel just because once upon a time I wrote documentation about the C infrastructure they're undermining.

I can still build a kernel without bpf, because (like perl) it's not in anything vital to the basic operation of a Linux compute node. If the day comes I can't build a kernel without rust, then I stay on the last version before they broke it until finding a replacement _exactly_ like a package that switched to GPLv3. I have never had a rust advocate tell me a GOOD thing about Rust other than "we have ASAN too", their pitch is entirely "we hate C++ and confuse it with C so how dare you not use our stuff, we're as inevitable as Hillary Clinton was in 2016"; kind of a turn-off to be honest. They don't care what the code does, just that it's in the "right" language. This was not the case for go, swift, zig, oberon, or any of the others vying to replace C++. (Which still isn't C, and I'm not convinced there's anything wrong with C.)

All this is a distraction. I'm trying to build towards goals, but I keep having to waste cycles getting back to where I was because somebody broke stuff that previously worked.


February 9, 2024

Finally checked what x86-64 architecture generation my old laptop is, and it's v2. Presumably upgrading from my netbook to this thing got me that far (since the prebuilt binaries in AOSP started faulting "illegal instruction" on my old netbook circa 2018, and this was back when I was trying to convince Elliott the bionic _start code shouldn't abort() before main if stdin wasn't already open so I kinda needed to be able to test the newest stuff...)
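One way to do that check is to compare the flags line in /proc/cpuinfo against the feature lists for each level. The sketch below does that; the flag spellings are the names Linux is believed to use for the features in the x86-64 psABI level definitions (for example "pni" for SSE3 and "abm" for LZCNT) and should be treated as assumptions to verify, as should the function names:

```python
# Extra features each level needs on top of the one before it, spelled the
# way /proc/cpuinfo is assumed to spell them. Level 1 is plain x86-64.
LEVELS = [
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe",
         "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]

def x86_64_level(flags):
    """Return the highest level whose features (and all lower ones) are present."""
    level = 1
    for number, needed in LEVELS:
        if not needed <= flags:
            break
        level = number
    return level

def cpuinfo_flags(path="/proc/cpuinfo"):
    """Read the first "flags" line from cpuinfo as a set of names."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

# On a Linux x86-64 box: print(x86_64_level(cpuinfo_flags()))
```

The loop stops at the first level with anything missing, so a CPU that has a few v3 features but not all of them still reports v2.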

Meaning the pointy haired corporate distros like Red Hat and Ubuntu switching to v3 does indeed mean this hardware can't run them. Not really a loss, the important thing is devuan/debian not abandoning v2. (Updating devuan from bronchitis->diptheria presumably buys me a few years of support even if elephantitis were to drop v2 support. I _can_ update to new hardware, just... why?)

Went to catch up on the linux-sh mailing list (superh kernel development) and found that half the "LTP nommu maintainer" thread replies got sorted into that folder due to gmail shenanigans. (Remember how gmail refuses to send me all the copies of email I get cc'd on but also get through a mailing list, and it's potluck which copy I get _first_? Yeah, I missed half of another conversation. Thanks gmail!)

There's several interesting things Greg Ungerer and Geert Uytterhoeven said that I totally would have replied to back on January 23rd... but the conversation's been over a couple weeks now. Still, "you can implement regular fork() on nommu with this one simple trick" is an assertion I've heard made multiple times, but nobody ever seems to have _done_, which smells real fishy.

Arguing with globals.h generation again: sed's y/// is terribly designed because it doesn't support ranges so converting from lower to upper case (which seems like it would be the DEFINITION of "common case") is 56 chars long (y///+26+26), and hold space is terribly designed because "append" inserts an un-asked-for newline and the only way to combine pattern and hold space is via append. With s/// I can go \1 or & in the output, but there's no $SYNTAX to say "and insert hold space here" in what I'm replacing. You'd think there would be, but no. (More than one variable would also be nice, but down that path lies awk. And eventually perl. I can see drawing the line BEFORE there.)

But some of this is REALLY low hanging fruit. I don't blame the 1970s Unix guys who wrote the original PDP-11 unix in 24k total system ram (and clawed their way up to 128k on its successor the 11/45), but this is gnu/sed. They put in lots of extensions! Why didn't they bother to fix OBVIOUS ISSUES LIKE THAT? Honestly!

My first attempt produced 4 lines of output for each USE() block, which worked because C doesn't care, but looks terrible. Here's a variant that glues the line together properly: echo potato | sed -e 'h;y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/;H' -e 'g;s/\n/ /;s/\([^ ]*\) \(.*\)/USE_\2(struct \1_data \1;)/'

Which is mildly ridiculous because all it's using hold space for is somewhere to stash the lower case string because I can't tell y/// to work on PART of the current line: the /regex/{commands} syntax says which entire lines to trigger on, and s/// doesn't have a way to trigger y/// or similar on just the text it's matched and is replacing.
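For comparison, this is the transformation that sed line performs on one name, written out in Python where upper casing part of a string is a method call. It is only meant to show what the hold space juggling computes; use_line() is a name invented for the example:

```python
def use_line(command):
    """Build the USE_ wrapped union member line for one command name."""
    # str.upper() stands in for the 56 character y/// expression. For the
    # lower case ASCII names involved here the two give the same result.
    return f"USE_{command.upper()}(struct {command}_data {command};)"

print(use_line("potato"))  # USE_POTATO(struct potato_data potato;)
```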

(And while I'm complaining about things sed SHOULD let you do, why can't I match the first or last line WITHIN a range? The 1,$p ranges don't _nest_, so in sed -n '/^config /,${/^ *help/,/^[^ ]/{1d;$d;p}}' toys/*/ls.c | less the 1d;$d is irrelevant because that's "whole file", not "current match range". I want a syntax to say "this range is relative to the current scope" which would be easy enough for me to implement in the sed I wrote, but wouldn't be PORTABLE if I did that. It's like the gnu/dammit devs who added all these extensions never tried to actually USE sed in a non-trivial way...)
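What that nested range is trying to express, sketched as a small state machine in Python: once a "config " line has been seen, print what sits between a help line and the next line that starts with a non-space character, leaving out both of those boundary lines. The help_bodies() name and the sample input are made up for illustration:

```python
import re

def help_bodies(lines):
    """Collect help text lines, dropping the first and last line of each range."""
    out = []
    in_config = in_help = False
    for line in lines:
        if not in_config:
            # Outer range: starts at the first "config " line, runs to EOF.
            in_config = line.startswith("config ")
        elif not in_help:
            # Inner range start: the "help" line itself is not kept.
            in_help = re.match(r" *help", line) is not None
        elif line[:1] not in ("", " "):
            # Inner range end: first line starting with a non-space
            # character, also not kept. Empty lines do not end the range.
            in_help = False
        else:
            out.append(line)
    return out

SAMPLE = [
    "/* example.c",
    "config EXAMPLE",
    '  bool "example"',
    "  help",
    "    usage: example",
    "",
    "    Does example things.",
    "*/",
    "int x;",
]
print(help_bodies(SAMPLE))
```

The two flags are the "relative to the current scope" bookkeeping that the sed range syntax has no way to spell.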

But eh, made it work. And it runs on toys/*/*.c in a single sed invocation (and then a second sed on the output of the first to generate the GLOBALS() block from the previous list of structure definitions) and is thus WAY faster than the "one sed call per input file" it was doing before. Fast enough I can just run it every time rather than doing a "find -newer" to see if I need to run it. (And, again, potentially parallelizable with other headers being generated.)

But that just cleaned up generation of the header with the wrong USE() macros, which still build breaks. I need per-file USE() macros, or some such. Back up, design time. (Meaning "restate the problem from first principles and see where telling that story winds up".)

The GLOBALS() block is unique per-file, and shared by all commands using the same file. Previously the name of the block was the name of the file, but sed working on toys/*/*.c doesn't KNOW the name of the current file it's working on (ANOTHER thing the gnu clowns didn't extend!) and thus I'm using the last #define FOR_walrus macro before each GLOBALS() block (once again: sed "hold space", we get ONE VARIABLE to save a string into) as the name of both the structure type name and the name of the instance of that struct in the union. So now instead of being the name of the file, it's the name of the first command in the file, which is fine. As long as it's unique and the various users can agree on it.

Which means the manual "#define TT.filename" overrides I was doing when the "#define FOR_command" didn't match can go away again. (And need to, they're build breaks.) So that's a cleanup from this...

But there's still the problem that the first command in the file can be switched off in menuconfig, but a later command in the same file can be enabled, so we're naming the struct after the first command, but a USE() macro with the name OF that command would be disabled and thus yank the structure out of the union, resulting in a build break.

The REASON I want to yank the structure out of the union is so the union's size is the ENABLED high water mark, not the "everything possible command including the ones in pending" high water mark.

Oh, but I'm generating the file each time now, which means I don't need the USE() macros. Instead I need to generate globals.h based on the toys/*/*.c files that are switched on by the current config, meaning the sed invocation takes $TOYFILES as its input file list instead of the wildcard path. There's an extra file (main.c) in $TOYFILES, but I don't care because it won't have a GLOBALS() block in it. Generating $TOYFILES already parsed .config earlier in make.sh so I don't even have to do anything special, just use data I already prepared.
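
(A rough sketch of the idea, assuming $TOYFILES is a space separated list of paths. The file names and contents below are invented in a temp directory for illustration, and the sed expression is just a stand-in for the real header generation:)

```shell
work=$(mktemp -d)

# Hypothetical source files: ip.c has a big GLOBALS() block but is switched off.
printf '%s\n' 'GLOBALS(' '  int a;' ')' > "$work/ls.c"
printf '%s\n' 'GLOBALS(' '  char big[4096];' ')' > "$work/ip.c"
printf '%s\n' 'int main(void);' > "$work/main.c"

# Only the enabled files (plus main.c, which has no GLOBALS() block).
TOYFILES="$work/ls.c $work/main.c"

# One sed invocation over the enabled list instead of the wildcard path.
# $TOYFILES is deliberately unquoted so it splits into separate file names.
out=$(sed -n '/^GLOBALS(/,/^)/p' $TOYFILES)

rm -rf "$work"
printf '%s\n' "$out"
```

The disabled file's block never reaches the output, so nothing downstream needs a USE() macro to filter it back out.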


February 8, 2024

So scripts/make.sh writes generated/globals.h via a pile of sed invocations against toys/*/*.c and alas it can't do just ONE sed invocation but has to loop calling sed against individual files because it needs to know the current input filename, which slows it down tremendously _and_ doesn't parallelize well, but anyway... I just modified it to wrap a USE_FILENAME() macro around each "struct filename_struct filename;" line in union global_union {...} this; at the end of the file, in hopes of shrinking sizeof(this) down to only the largest _enabled_ GLOBALS() block in the current config. (So the continued existence of ip.c in pending doesn't set a permanent high water mark according to scripts/probes/GLOBALS.)

Unfortunately, while the current filename is used to name the structure and the union member, and TT gets defined to TT.filename even with multiple commands in the same file... there's no guarantee a config FILENAME entry actually exists, which means there's no guarantee the USE_FILENAME() macro I'm adding is #defined. This showed up in git.c, and then again in i2ctools.c: lots of commands, none of them with the same name as the file.

Need to circle back and redesign some stuff to make this work...

Ok, second attempt: use the #define FOR_blah macros instead of the filename, which _does_ allow a single sed invocation to work on toys/*/*.c in one go, although I have to do a lot of hold space shenanigans and use y/// with the entire alphabet listed twice instead of "tr a-z A-Z" to do the upper and lower case variants, but I made the header file I wanted to make! Which now doesn't work for a DIFFERENT reason: if the first command in the file isn't enabled, the USE_BLAH() thing removes the TT struct from the union, and the second command in the same file attempting to use the shared structure gets an undefined member error dereferencing TT.
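
(A minimal sketch of the hold space plus y/// trick, not the actual make.sh code. The sample input and the USE_ prefix on the output are assumptions for illustration:)

```shell
# Made-up input resembling the top of a command source file.
sample() {
  printf '%s\n' '#define FOR_walrus' '#include "toys.h"' 'GLOBALS(' \
    '  int tusks;' ')'
}

# Save the most recent FOR_ name in the hold space (the one variable sed has),
# then at each GLOBALS( line fetch it back, print it as-is, and print an
# uppercased copy by listing the whole alphabet twice in y///.
name_blocks() {
  sed -n -e '/^#define FOR_/{s/^#define FOR_//;h;}' \
    -e '/^GLOBALS(/{g;p;y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/;s/^/USE_/;p;}'
}

sample | name_blocks
```

That prints the lower case name followed by the upper case macro name, one per line, for each GLOBALS() block in the input.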

Which... um, yeah. That's what would happen. I need a USE() macro that's X or Y or Z, which I haven't got logic for. I can add a new hidden symbol and do either selects or depends, but I kinda want to SIMPLIFY the kconfig logic instead of complicating it.

Long ago when I was maintaining busybox, I proposed factoring out the Linux kernel's "kconfig" so other packages can use it, about the way "dtc" (the device tree compiler) eventually got factored out. This fell apart because I wanted to keep it in the kernel source but make it another thing the kernel build could install, and Roman Zippel or whoever it was wanted to remove it from the kernel and make a new package that was a build dependency of the linux kernel, which was such a horrible idea that NOT EVER RE-USING THIS CODE was better than adding a build dependency to the kernel, so the idea died. (I note that dtc is still in Linux, despite also being an external project. They didn't do the "make install_dtc" route from the linux source, but they didn't add the dependency either. Instead they maintain two projects in parallel forever, which is what the then-kconfig maintainer insisted was impossible. He's also the guy who rejected properly recognizing miniconfig as a thing unless I did major surgery on the kconfig.c files. I waited for him to go away. He did eventually, but I haven't bothered to resubmit. The perfect is the enemy of the good, and if my only option is the Master Race I'm ok siding with extinction. Kinda my approach to Linux development in a nutshell, these days.)

And since factoring out kconfig DIDN'T happen, and I've instead got an ancient snapshot of code under an unfortunate license that has nothing to do with modern linux kconfig (which became TURING COMPLETE and can now rm -rf your filesystem, bravo), I need to discard/rewrite it and want to reproduce as little as possible. The scripts/mkflags.c code was supposed to be the start of that, but that wound up using output digested by sed. And then the scripts/config2help.c code was going to be the start of a kconfig rewrite, but that stalled and started to back itself out again at the design level because a zillion sub-options is a bad thing. (Somebody once contributed the start of one written in awk. I still haven't got an awk.)

I haven't reopened this can of worms recently, but changing the config symbol design requirements is... fraught. What do I want this to DO...


February 7, 2024

Sigh, I needed a second email account and went "my phone demanded a google account to exist for Android, I'll use that one"... and was then waiting for the email to arrive for 2 weeks. Today they texted me about it and I investigated and "auto-sync" is turned off, so of course I'd never get a notification or see a new email in the list: I had to do the "pull down" gesture to load new emails. (I remember this! Same problem came up last time I tried to use this app some years back, when I still had a work gmail account on the phone for the weekly google hangouts calls that became google meet calls when hangouts joined the google graveyard and we were forced to migrate and I needed an updated link from an email...)

I went into the settings to turn auto-sync back on, along the way turning off two new "we're sending all your data to google to train our chatgpt-alike and sell to advertisers by calling it personalization" options it grew and auto-enabled since the last time I was there (because if you never had the chance to say no, it's not a lack of consent?), but turning on auto-sync has a pop-up:

Changes you make to all apps and accounts, not just Gmail, will be synchronized between the web, your other devices, and your phone. [Learn more]

And now I remember why it was turned OFF. (And why I usually create a new gmail account every time I get a new phone, discarding the old history.) You do not get to flush every photo I take of my cat to your cloud service as a condition of checking email. I don't care what the bribe is, that's microsoft-level creepy bundling and monopoly leverage and yes disabling it renders YOUR phone app unusable which is a YOU problem, that's why I wasn't using that email account for anything before now.

This round of gmail being creepy on my phone is separate from gmail being buggy recently on the account I use on my laptop via pop3 to fetch email sent to my domain. They're not the same account, and the only way google ever has to connect the two is intrusive data harvesting. Of a kind that occasionally makes it confuse me with my father, who saddled me with his name and a "junior" which is why I started getting AARP offers in my 30's. Which admittedly had some pretty good discounts in the brochure, but no, they had me confused with someone else over a thousand miles away.

(Ok, the AARP thing was because when I moved out of Austin as my mother was dying and didn't get a new place there for a year, I had my mail forwarded to my father's place in pennsylvania. And then had it forward from there to the new place in Austin when I moved back. And wound up getting more than half his mail because of similar names and disabled the forwarding fairly quickly (he'd just box up and mail me my accumulated junk mail every few weeks), but places like AARP had voraciously "updated" based on scraps of misinformation to TRACK ITS PREY... and wouldn't accept "no". This was years before "FAANG" started doing it, although I dunno why netflix is considered intrusive in that acronym? I keep forgetting I _have_ that, mostly it's Fade watching.)

So yeah, the gmail phone app's useless because they intentionally refused to offer an "automatically notice new email on the server" option that does NOT "constantly send every photo you take and random audio recordings to data harvesting servers even if you never open this email app again".

The reason I needed the second email account is the second room of Fade's apartment up in minneapolis has been empty since mid-pandemic (they were assigning her roommates until then, but her last one moved back in with her family to ride out the pandemic, and it's been empty for well over a year now), and we asked her front office and they made us a very good deal on a 6 month lease through August, when we might be moving anyway depending on where Fade gets a job. (Now that she's graduated, she got piecemeal teaching work for the spring semester but is also job-hunting for something more permanent.) Which is why I'm trying to sell the house and move up there. Fuzzy's moving back in with her father (who's old and in the hospital way too much and could use more looking after anyway, she's been visiting him on weekends already, he lives up in Leander about a five minute drive from the far end of Austin's Light Tactical Rail line), and she's taking the geriatric cat with her.

Fade's made it clear she's never moving back to a state that wants her to literally die of an ectopic pregnancy, so we were going to sell the house at some point anyway, and "timing the market" is another phrase for "reading the future", so now's as good as any. (Last year would have been way better. Next year could be anything.)

The second email account came in because I was the "guarantor" on her lease for the first account, since she was a student and obviously student housing involves a parent or similar co-signing, doesn't it? Except with my email already in the system _that_ way, me actually signing up to get a room there confused their computer deeply, so to apply to RENT there I had to create a new account, which required a new email address... (I can bypass "guarantor" by just paying multiple months in advance.)

I continue to break everything. (And just now trying to e-sign the lease document, I noticed the "download a PDF copy" link was on the first page but hitting the checkbox to accept electronic delivery advanced to the second page, and hitting the back button put me back in the email, and clicking on the link again said it had already been used and was thus expired... Eh, the usual. Fade's handling it.)


February 6, 2024

Alas, devuan doesn't seem to have qemu-debootstrap (anymore?), so trying to reverse engineer it to set up an arm64 VM image, the root filesystem part looks like:

$ dd if=/dev/zero of=arm64-chimaera.img bs=1M count=65536
$ /sbin/mkfs.ext4 arm64-chimaera.img
$ mkdir sub
$ sudo mount arm64-chimaera.img sub
$ sudo debootstrap --arch=arm64 --keyring=/usr/share/keyrings/devuan-archive-keyring.gpg --verbose --foreign chimaera sub
$ sudo umount sub

And then fishing a kernel out of the network installer and booting the result:

$ wget http://debian.csail.mit.edu/debian/dists/bullseye/main/installer-arm64/current/images/netboot/debian-installer/arm64/linux -O arm64-vmlinux
$ qemu-system-aarch64 -M virt -cpu cortex-a57 -m 2048 "$@" -nographic -no-reboot -kernel arm64-vmlinux -append "HOST=aarch64 console=ttyAMA0 root=/dev/sda init=/bin/sh" -drive format=raw,file=arm64-chimaera.img

Which died because the ext4 driver is not statically linked into that kernel image and thus can't mount the root=. In fact the list of drivers it tried was blank, it has NO drivers statically linked in. Which implies you have to insmod from initramfs in order to be able to mount any filesystem from a block device, which is just INSANE. Swapping in the kernel mkroot builds for the aarrcchh6644 target, and using root=/dev/vda instead (because different drivers and device tree), I got a shell prompt and could then run:

# mount -o remount,rw /
# /debootstrap/debootstrap --second-stage
# echo '/dev/vda / ext4 rw,relatime 0 1' > /etc/fstab
# ifconfig lo 127.0.0.1
# ifconfig eth0 10.0.2.15
# route add default gw 10.0.2.2
# apt-get install linux-image-arm64

Which successfully installed packages from the net into the VM, but I'm not sure that last install is actually helpful? It installed a kernel, but didn't install a bootloader. Can qemu boot if I just give it the -hda and not externally supply a -kernel?

$ qemu-system-aarch64 -M virt -cpu cortex-a57 -m 2048 "$@" -no-reboot -drive format=raw,file=arm64-chimaera.img

Nope, looks like it did not. Or doesn't know how to produce any output? It popped up a monitor window but not a display window, and didn't produce serial console output. And fishing that kernel out of the ext4 filesystem and passing it to -kernel in qemu means I'd also need to pass -initrd in as well (still assuming it does not have any static filesystem drivers), and then what is it trying to display to? Where exactly does it think it's getting its device tree from? (If it's statically linked into the kernel then I haven't got one to feed to qemu to try to PROVIDE those devices. And still no way to add console= to point at serial console...)

Eh, stick with the mkroot kernel for now I guess. This should let mcm-buildall.sh build native arm hosted toolchains, both 32 and 64 bit, for next release. It would be way better to use one of the orange pi 3b actual hardware devices I can plug into the router via cat5 and leave on 24/7, that can do the qemu regression testing via cron job and everything. Plus my home fiber's faster than the wifi so stuff physically plugged into the router doesn't even count against the bandwidth we're actually using, it could act as a SERVER if they didn't go to such extreme lengths to make you pay extra for a static IP (four times the base cost of the service, for no reason except "they can").

But I don't trust the Orange Pi's chinese kernel not to have spyware in it (like... 30% chance?) and I haven't sat down to hammer a vanilla kernel into giving me serial output and a shell prompt on the hardware yet. Mostly because I can't power an orange pi from my laptop USB the way I can a turtle board, it wants a 2 amp supply and the laptop wants to give half an amp. I mostly think of working on it when I'm out-with-laptop...


February 5, 2024

I fell behind on email over the weekend (dragged the laptop along but didn't connect it to the net), and gmail errored out a "denied, you must web login!" pop-up during my first pop3 fetch to catch up.

So I went to the website and did a web login, and it went "we need need need NEED to send you an sms, trust us bud honest this will be the only one really please we just GOTTA"... I have never given gmail a phone number, and refuse to confirm or deny its guess.

So I clicked the "get help" option... which also wanted me to login. So I did and it said it needed to verify the account, and this time offered to contact my next-of-kin email (it's 2am, she's asleep).

So I decided to wait (and maybe vent on mastodon a bit, and look up what I need to do in dreamhost to switch my mx record to point at the "you are a paying customer" servers I get with my domain and website rather than the "you are the product" servers... yeah I'd lose the accumulated weekend of email but the main reason I _hadn't_ done it was screwing up and losing access to email for a bit would be annoying and here gmail has DONE IT FOR ME), and messed with some other windows for a bit, then out of habit switched desktops and clicked the "get messages" button in thunderbird...

And it's downloading email again just fine. (And did so for the 5 logins it took to grab a couple hundred messages at a time and clear the backlog: linux-kernel and qemu-devel and so on are high traffic lists and their pop3 implementation has some arbitrary transaction limit.) And it looks like a reasonable weekend's worth of email...? Nothing obviously wrong?

I haz a confused.

I don't _really_ want to move email providers at the same time I'm trying to sell a house and move, but... leaving this alone feels kind of like ignoring termite damage. Some things you descend upon with fire. Gmail is _telling_ me that it's unsafe.

I'm _pretty_ sure this is their out of control data harvesting trying to connect together pieces of their social graph to map every human being to a phone that has a legal name and social security number using it, and can be tracked via GPS coordinates 24/7. If there WAS any actual "security" reason behind it, it obviously didn't WORK. I got access back without ever providing more than the old login. I didn't get WEB access back, but that just means I can't fish stuff out of the spam filter. So... greedy or incompetent?

But why _now_? What triggered it...


February 4, 2024

I have a pending pull request adding port probing to netcat. It adds two flags: -z is a "zero I/O mode" flag where it connects and closes the connection immediately, which isn't really zero I/O because a bunch of TCP/IP packets go through setting up and tearing down the connection so the other side totally notices. Also a separate -v flag that just prints that we've connected successfully, which seems weird because we print a big error message and exit when we DON'T connect successfully, so saying that we did seems redundant.

The patch didn't invent these options, I checked and both are in busybox's "nc_bloaty" which seems to be a full copy of Netcat 1.10, because busybox has multiple different implementations of the same command all over the place in the name of being small and simple. In theory nc_bloaty.c is Hobbit's netcat from the dawn of time which Denys nailed to the side of busybox and painted the project's color in 2007, although maybe it's had stuff added to it since, I haven't checked.

(Sorry, old argument from my busybox days: making cartridges for an Atari 2600 and coin-op machines in a video arcade are different skillsets, and gluing a full sized arcade cabinet to the side of an atari 2600 is NOT the same as adding a cartridge to its available library. As maintainer I strongly preferred fresh implementations to ports because license issues aside, if it already existed and we couldn't do BETTER why bother? Hobbit's netcat is actually pretty clean and slim as external programs you could incorporate go, but Vodz used to swallow some whales.)

Anyway, that's not the part that kept me from merging the netcat patch from the pull request into toybox the first day I saw it. Nor is the fact I have the start of nommu -t support using login_tty() in my tree (another thing I need a nommu test environment for) and have to back it out to apply this.

No, the head scratcher is that the name on the email address of the patch I wget by adding ".patch" to the github URL is "कारतोफ्फेलस्क्रिप्ट™" which Google Translate says is Marathi for "Kartoffelscript" with a trademark symbol. Marathi is the 4th most widely spoken language in India (about 90 million speakers), and Kartoffel is german for Potato.

I mean, it's not some sort of ethnic slur or exploit or something (which is why I checked the Thing I Could Not Read), so... yay? I guess I could apply that as is, I'm just... confused.

And I'm also looking at the OTHER available options in the bigger netcat's --help output and going "hex dump would be lovely". I don't need a "delay interval" because the sender of data can ration it easily enough, and each call to netcat does a single dialout so the caller can detect success/fail and delay in a loop if they're manually scanning a port range for some reason. (Look, nmap exists.) I'm reluctant to add -b "allow broadcasts" because... what's the use case here? I can do that one if somebody explicitly asks for it, which means they bring a use case.


February 3, 2024

Moving is exhausting, and so far I've barely packed up one bookcase.

Follow-up to yesterday's email, my correspondent is still looking into the IP status of older architectures, sending me a quote from a reuters article:

> "In 2017, under financial pressure itself, Imagination Technologies sold the
> MIPS processor business to a California-based investment company, Tallwood
> Venture Capital.[47] Tallwood in turn sold the business to Wave Computing in
> 2018,[48] both of these companies reportedly having their origins with, or
> ownership links to, a co-founder of Chips and Technologies and S3 Graphics.[49]
> Despite the regulatory obstacles that had forced Imagination to divest itself of
> the MIPS business prior to its own acquisition by Canyon Bridge, bankruptcy
> proceedings for Wave Computing indicated that the company had in 2018 and 2019
> transferred full licensing rights for the MIPS architecture for China, Hong Kong
> and Macau to CIP United, a Shanghai-based company.[50]"

As far as I can tell mips imploded because of the PR backlash from screwing over Lexra.

Mips used to be all over the place: Linksys routers were mips, Playstation 2 was mips, the SGI Irix workstations were mips... Then they turned evil and everybody backed away and switched to powerpc and arm and such.

China didn't back away from mips, maybe due to a stronger caveat emptor culture and maybe due to not caring about lawsuits that couldn't affect them. The Lexra chips that got sued out of existence here were still widely manufactured over there (where US IP law couldn't reach at the time; that's how I got involved, somebody was importing a chinese router and trying to update its kernel to a current version, and it needed an old toolchain that didn't generate the 4 patented instructions). China's Loongson architecture recently added to the linux kernel is a Mips fork dating back to around 2001.

Yes, "homegrown clone". Don't ask, I don't know. See also this and this for the arm equivalent of what china did to mips. Any technology sent to china gets copied and then they claim to have invented it.


February 2, 2024

I get emails. I reply to emails. And then I cut and paste some long replies here:

> Is there an expiration on ARM patents such as the ARM7TDMI and ARM9? With the
> SH-2 being developed in 1992, and expiring in 2015, I am curious if the ARM7
> would be synthesizable.

In theory?

Ten years ago there was a big push to do open hardware arm, and Arm Inc. put its foot down and said they didn't mind clones of anything _before_ the ARMv3 architecture (which was the first modern 32 bit processor and the oldest one Linux ran on) but if you tried to clone ARMv3 or newer they would sue.

That said, the point of patents is to expire. Science does not advance when patents are granted, it advances when they expire. Lots of product introductions simultaneously from multiple vendors, such as iphone and android launching within 18 months of each other, can be traced back to things like important touchscreen patents expiring.

The problem is, the big boys tend to have clouds of adjacent patents and patent-extension tricks, such as "submarine" patents where they file a patent application and then regularly amend it so it isn't granted promptly but instead remains an application for years, thus preventing its expiration clock from starting since it expires X years after being _granted_, not applied for. (But prior art is from before the _application_ for the patent.) Or the way drug companies patented a bunch of chemicals that were racemic mixtures, and then went back and patented just the active isomer of that chemical, and then sued anybody selling the old racemic mixtures because it _contains_ the isomer. (Which CAN'T be legal but they can make you spend 7 years in court paying millions annually to _prove_ it. The point of most Fortune 500 litigation isn't to prove you're right, it's to tie the other side up in court for years until you bankrupt them with legal fees, or enough elections go by for regulatory capture to Citizens United up some pet legislators who will replace the people enforcing the law against you.)

Big companies often refuse to say exactly what all their relevant patents ARE. You can search yourself to see what patents they've been granted, but did they have a shell company, or did they acquire another company, so they control a patent their name isn't on? And this is poker: they regularly threaten to sue even when they have nothing to sue with. Bluffing is rampant, and just because they're bluffing doesn't mean they won't file suit if they think you can't afford a protracted defense. (Even if they know they can't win, they can delay your product coming to market for three years and maybe scare away your customers with "legal uncertainty".)

You can use existing hardware that was for sale on known dates, and publications, as prior art that would invalidate patents that hadn't yet been filed (there was some attempt to bring submarine patents under control over the past couple decades, but it's reformers fighting against unguillotined billionaires with infinitely deep pockets and they have entire think tanks and lawfirms on retainer constantly searching for new loopholes and exploits).

My understanding (after the fact and not hugely informed) was that a big contributor to J-core happening was going to Renesas with old hardware and documentation to confirm "anything implementing this instruction set has to have expired because this came out on this date and either the patent had already been granted or this is prior art invalidating patents granted later", and when Renesas still insisted on license agreements demanding per-chip royalties, refusing to sign and telling them to sue. Which they did not, either because they were bluffing or the cost/benefit analysis said it wasn't worth it. But standing up to threats and being willing to defend against a lawsuit for years if necessary was an important part of the process, because the fat cats never STOP trying to intimidate potential competitors.

The J-core guys could have chosen any processor from that era to do the same thing with: m68k, Alpha, etc. And in fact they initially started trying to use an existing Sparc clone but it didn't do what they needed. The sparc was memory inefficient and power hungry, which led to the research into instruction set density, which led to superh as the sweet spot. In fact superh development started when Motorola's lawyers screwed over Hitachi on m68k licensing, so their engineers designed a replacement. x86 is even more instruction dense due to the variable length instructions, but requires a HUGE amount of circuitry to decode that mess at all efficiently. Starting with the Pentium it has a hardware frontend that converts the x86 instructions into internal RISC instructions and then actually executes those. (That's why RISC didn't unseat x86 like everybody expected it would: they converted their plumbing to RISC internally with a translation layer in front of it for backwards compatibility. The explosion of sparc, alpha, mips, powerpc, and so on all jockeying to replace x86... didn't. They only survived at the far ends of the performance bell curve, the mainstream stayed within the network effect feedback loop of wintel's dominant market share. Until phones.)

Arm Thumb, and thus Cortex-m, was a derivative of superh. To the point it got way cheaper when the superh patents expired and arm didn't have to pay royalties to renesas anymore, which is why that suddenly became cheap and ubiquitous. But from a hardware cloning perspective, keep in mind "thumb" was not present in the original arm processors. Also, things like "arm 7" and "arm 9" are chips, not different instruction set architectures. (Pentium III and Pentium M were both "i686".) The instruction set generations have a "v" in them: armv1, armv2, armv3, up through armv8.

It goes like this:

Acorn Risc Machines started life as a UK company that won a contract with the BBC to produce the "BBC Micro" back in 1981 alongside an educational television program teaching kids how to compute. Their first machine was based on the MOS 6502 processor, same one in the Commodore 64 and Apple II and Atari 2600: that had 8-bit registers and 16 bit memory addressing, for 64k RAM total. (The story of MOS Technology is its own saga, the 6502 was to CPU design a bit like what Unix was to OS design, it showed people that 90% of what they'd been doing was unnecessary, and everybody went "oh".)

ARMv1 came from acorn's successor machine the Archimedes (released in 1987, circa the Amiga) which used a home-grown CPU that had 32 bit registers (but only 26 bit addressing, 64 megs max memory). ARMv2 added a hardware multiplier and a faster interrupt mode (which only saved half the registers), but still 26 bit addressing. Think of ARMv1 and ARMv2 as a bit like the 286 processor in intel-land: a transitional attempt that wound up as a learning experience, and fixing what was wrong with them means backwards compatibility doesn't go back that far.
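
(The address space numbers are just powers of two. A quick sketch in shell arithmetic, assuming a shell with 64 bit integers, which is the usual case on a 64 bit host:)

```shell
# Bytes addressable with $1 address bits, expressed in units of $2 bytes.
addr_space() { echo $(( (1 << $1) / $2 )); }

addr_space 16 1024          # 6502: prints 64 (kilobytes)
addr_space 26 1048576       # ARMv1/ARMv2: prints 64 (megabytes)
addr_space 32 1073741824    # flat 32 bit: prints 4 (gigabytes)
```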

The oldest one Linux runs on is ARMv3, which did a proper flat 32 bit address space, and is generally considered the first modern ARM architecture. ARMv4 introduced a bunch of speedups, and also a way of announcing instruction set extensions (like different FPUs and such) so you could probe at runtime what was available. These extensions were indicated by adding a letter to the architecture. The most important extension was the "thumb" instruction set, ARMv4T. (But there was also some horrible java accelerator, and so on.) ARMv5 had various optimizations and integrated thumb so it wasn't an extension anymore but always guaranteed to be there: recompiling for ARMv5 speeds code up about 25% vs running ARMv4 code on the same processor, I don't remember why. ARMv6 added SMP support which is mostly irrelevant outside the kernel so you generally don't see compilers targeting it because why would they? And then ARMv7 was the last modern 32 bit one, another big speedup to target it with a compiler, but otherwise backwards compatible ala i486/i586/i686. All this stuff could still run ARMv4T code if you tried, it was just slower (meaning less power efficient when running from battery, doing the "race to quiescence" thing).

Along the way Linux switched its ARM Application Binary Interface to incorporate Thumb 1 instructions in function call and system call plumbing, the old one retroactively became known as "OABI" and the new (extended) one is "EABI", for a definition of "new" that was a couple decades ago now and is basically ubiquitous. Support for OABI bit-rotted over the years similarly to a.out vs ELF binaries, so these days ARMv4T is pretty much the oldest version Linux can run without serious effort. (For example, musl-libc doesn't support OABI, just EABI.) In THEORY a properly configured Linux kernel and userspace could still run on ARMv3 or ARMv4 without the T, but when's the last time anybody regression tested it? But if ARMv3 was your clone target, digging that stuff up might make sense. Easier to skip ahead to ARMv4T, but A) lots more circuitry (a whole second instruction set to implement), B) probably more legal resistance from whoever owns ARM Inc. this week.

And then ARMv8 added 64 bit support, and kept pretending it's unrelated to historical arm (stuttering out aarrcchh6644 as a name with NO ARM IN IT), although it still had 32 bit mode and apparently even a couple new improvements in said 32 bit mode so you can compile a 32 bit program for "ARMv8" if you try and it won't run on ARMv7. Dunno why you WOULD though, it's a little like x32 on intel: doesn't come up much, people mostly just build 64 bit programs for a processor that can't NOT support them. Mostly this is a gotcha that when you tell gcc you want armv8-unknown-linux instead of aarrcchh6644-talklikeapirateday-linux you get a useless 32 bit toolchain instead of what you expected. Sadly linux accepts "arm64" but somehow the "gnu gnu gnu all hail stallman c compiler that pretends that one of the c's retroactively stands for collection even though pcc was the portable c compiler and icc was the intel c compiler and tcc was the tiny c compiler" does not. You have to say aarrcchh6644 in the autoconf tuple or it doesn't understand.

So what's Thumb: it's a whole second instruction set, with a mode bit in the processor's control register saying which kind it's executing at the moment. Conventional ARM instructions are 32 bits long, but thumb instructions are 16 bits (just like superh). This means you can fit twice as many instructions in the same amount of memory, and thus twice as many instructions in each L1 cache line, so instructions go across the memory bus twice as fast... The processor has a mode bit to switch between executing thumb or conventional ARM instructions, a bit like Intel processors jumping between 8086 vs 80386 mode, or 32 vs 64 bit in the newer ones.

Note that both Thumb and ARM instruction modes use 32 bit registers and 32 bit addresses, this is just how many bits long each _instruction_ is. The three sizes are unrelated: modern Java Virtual Machines have 8 bit instructions, 32 bit registers, and 64 bit memory addresses. Although you need an object lookup table to implement a memory size bigger than the register size, taking advantage of the fact a reference doesn't HAVE to be a pointer, it can be an index into an array of pointers and thus "4 billion objects living in 16 exabytes of address space". In hardware this is less popular: the last CPU that tried to do hardware-level object orientation was the Intel i432 (which was killed by the 286 outperforming it, and was basically the FIRST time Intel pulled an Itanium development cycle). And gluing two registers together to access memory went out with Intel's segment-offset addressing in the 8086 and 286, although accessing memory with HI/LO register pairs was also the trick the 6502 used years earlier (8 bit instructions, 8 bit registers, 16 bit addresses). These days everybody just uses a "flat" memory model for everything (SO much easier to program) which means memory size is capped by register size. But 64 bit registers can address 18 exabytes, and since an exabyte is a triangular rubber coin million terabytes and the S-curve of Moore's Law has been bending down for several years now ("exponential growth" is ALWAYS an S-curve, you run out of customers or atoms eventually), this is unlikely to become a limiting factor any time soon.
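The size arithmetic above is easy to sanity check from a shell prompt. This is just a rough sketch with made-up helper names (hilo, segoff, and eib are not from anywhere): gluing an 8 bit HI/LO pair into a 16 bit address the way the 6502 does, the 8086 real mode segment:offset calculation (segment shifted left 4 bits plus offset), and the size of a flat 64 bit address space in binary exabytes. That last one is where "16" vs "18" comes from: 2^64 bytes is 16 binary exabytes (2^60 each), which is about 18.4 decimal ones (10^18 each).

```sh
# Hypothetical helper names, for illustration only.

# 6502 style: two 8 bit values glued into one 16 bit address.
hilo() { printf '0x%04X\n' $(( ($1 << 8) | $2 )); }

# 8086 real mode: 16 bit segment shifted left 4, plus 16 bit offset.
segoff() { printf '0x%05X\n' $(( ($1 << 4) + $2 )); }

# Size of a flat address space with $1 address bits, in units of 2^60 bytes.
# Only meaningful for 60 to 64 bits, given 64 bit shell arithmetic.
eib() { echo $(( 1 << ($1 - 60) )); }

hilo 0x12 0x34        # prints 0x1234
segoff 0x1234 0x0010  # prints 0x12350
eib 64                # prints 16
```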

The first thumb instruction set (Thumb 1) was userspace-only, and didn't let you do a bunch of kernel stuff, so you couldn't write an OS _only_ in Thumb instructions, you still needed conventional ARM instructions to do setup and various administrative tasks. Thumb 2 finally let you compile a Linux kernel entirely in Thumb instructions. Thumb2 is what let processors like the Cortex-M discard backwards compatibility with the original 32-bit ARM instruction set. It's a tiny cheap processor that consumes very little power, and the trick is it's STUCK in thumb mode and can't understand the old 32 bit instruction set, so doesn't need that circuitry. Along the way, they also cut out the MMU, and I dunno how much of that was "this instruction set doesn't have TLB manipulation instructions and memory mapping it felt icky" or "as long as we were cutting out lots of circuitry to make a tiny low-power chip, this was the next biggest thing we could yank to get the transistor count down". Didn't really ask.

Thumb 2 was introduced in 2003. I don't know what actual patentable advances were in there given arm existed and they were licensing superh to add this to it, but I assume they came up with some kind of fig leaf. (People keep trying to patent breathing, it's a question what the overworked clerks in the patent office approve, and then what the insane and evil magic court that ONLY hears IP law cases on behalf of rich bastards gets overruled on as they perpetually overreach.) But it still came out 20 years ago: patents are going to start expiring soon.

The ARM chip design company the original Acorn RISC guys spun out decades ago was proudly british for many years... until the Tories took over and started selling the government, and then they did Brexit to avoid the EU's new financial reporting requirements (which were going to force billionaires doing money laundering through the City of London and the Isle of Man to list all their bank accounts and how much money was in each, Switzerland having already caved some years earlier so "swiss bank account" no longer meant you could launder stolen nazi gold for generations)... and the result was Worzel Gummidge Alexander "Boris" de Pfeffel Johnson (Really! That's his name! Look it up!) sold ARM to Softbank, a Japanese company run by a billionaire who seemed absolutely BRILLIANT until he decided Cryptocoins were the future and funded WeWork. Oh, and apparently he also took $60 billion from Mister Bone Saw, or something?

So how much money ARM has to sue people these days, or who's gonna own the IP in five years, I dunno.


February 1, 2024

Happy birthday to me...

Closing tabs, I have a bunch open from my earlier trudge down nommu-in-qemu lane, which started by assuming or1k would be a nommu target, then trying to get bamboo to work, then coldfire...

A tab I had open was the miniconfig for the coldfire kernel that ran in qemu, and that's like half the work of adding it to mkroot... except that was built by the buildroot uclibc toolchain. So I'm trying to reproduce the buildroot coldfire toolchain with musl instead of uclibc, but there IS no tuple that provides the combination of things it wants in the order it wants them, and patching it is being stroppy. Alas gcc is as far from generic as it gets. This config plumbing is a collection of special cases with zero generic anything, and it's explicitly checking for "uclinux" in places and "-linux-musl" in others, and that leading dash means "-uclinux-musl" doesn't match, but "-linux-musl-uclinux" doesn't put data in the right variables (because some bits of the config think there are 4 slots with dedicated roles) plus some things have * on the start or the end and other things don't, so sometimes you can agglutinate multiple things into a single field and other times you can't, and it is NOT SYSTEMATIC.

This isn't even fdpic yet! This is just trying to get the config to do what the other thing was doing with musl instead of uclibc. I can probably whack-a-mole my way down it, but if the patch is never going upstream... (Sigh. I should poke coreutils about cut -DF again.)

Now that Fade's graduated, we've decided to pull the trigger on selling the house. Fade's already done paperwork for me to move into the other room at her apartment for the next 6 months, and they start charging us rent on the extra room on the 15th I think? But if I fly back up there with an actual place to live, I don't really want to fly back here, and this place is EXPENSIVE. (I bought it thinking "room to raise kids", but that never happened.) So packing it out and getting it on the market... I should do that.

Fuzzy took the news better than I expected, although her father's been sick for a while now and moving back in to take care of him makes sense. She's keeping the 20 year old cat.

I bought 4 boxes at the U-haul place across I-35 and filled them with books. It didn't even empty one bookshelf. Um. Moving from the condo at 24th and Leon to here was moving into a BIGGER place, so we didn't have to cull stuff. And that was 11 years ago. Before that Fade and I moved a U-haul full of stuff up to Pittsburgh circa 2006... and then moved it all back again a year and change later. The third bedroom is basically box storage, we emptied our storage space out into that to stop paying for storage, and still haven't unpacked most of it. Reluctant to drag it up to Minneapolis (and from there on to wherever Fade gets a job with health insurance, it's the exchange until then). But I don't have the energy to sort through it either. I have many books I haven't read in years. (Yes I am aware of E-books. I'm also aware you don't really _own_ those, just rent them at a billionaire's whim.)

I'm reminded that packing out the efficiency apartment I had for a year in Milwaukee took multiple days (and that was on a deadline), and I'd gone out of my way to accumulate stuff while I was there because it was always temporary. And lugging it all to Fade's I pulled a muscle carrying the "sleeping bag repurposed as a carry sack" I'd shoved all the extra stuff that wouldn't fit into the suitcases into, while switching from a bus to minneapolis's Light Tactical Rail. This time Fade wants to do the "storage pod, which can be somewhat automatically moved for you" thing.


January 31, 2024

Parallelizing the make.sh header file generation is a bit awkward: it's trivial to launch most of the header generation in parallel (even all the library probes can happen in parallel, order doesn't matter and >> is O_APPEND meaning atomic writes won't interleave) and just stick in a "wait" at the two places that care about synchronization (creating build.sh wants to consume the output of optlibs.dat, and creating flags.h wants to consume config.h and newtoys.h).

The awkward part is A) reliable error detection if any of the background tasks fail ("wait" doesn't collect error return codes, creating a "generated/failed" file could fail due to inode exhaustion, DELETING a generated/success file could have a subprocess fail to launch due to PID exhaustion or get whacked by the OOM killer... I guess annotate the end of each file with a // SUCCESS line and grep | wc maybe?), B) ratelimiting so trying to run it on a wind-up-toy pi-alike board or a tiny VM doesn't launch too many parallel processes. I have a ratelimit bash function but explicitly calling it between each background & process is kinda awkward? (And it doesn't exit, it returns error, so each call would need to perform error checking.) It would be nice if there was a proper shell syntax for this, but "function that calls its command line" is a quoting nightmare when pipelines are involved. (There's a reason "time" is a builtin.) I suppose I could encapsulate each background header generation in its own shell function? But just having them inline with & at the end is otherwise a lot more readable. (I'm actually trying to REDUCE shell functions in this pass, and do the work inline so it reads as a simple/normal shell script instead of a choose-your-own-adventure book.)
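One possible way around the error detection half of that, sketched here rather than taken from make.sh: a bare "wait" always returns 0, but "wait PID" returns the exit status of that specific job, so recording $! after each inline & and waiting on each PID at the synchronization point collects failures without creating or deleting any marker files. The count_failures name and the stand-in jobs are made up for this example, and it does nothing about ratelimiting.

```sh
# Sketch only: not what make.sh currently does.

# Wait on each PID given as an argument and count how many exited nonzero.
# Sets $failed instead of printing, because this has to run in the same
# shell that launched the jobs (not in a $(subshell)).
count_failures() {
  failed=0
  for pid in "$@"; do
    wait "$pid" || failed=$((failed+1))
  done
}

# Stand-ins for real header generation jobs, launched inline with &.
pids=""
true & pids="$pids $!"
false & pids="$pids $!"
{ echo hello | grep -q hello; } & pids="$pids $!"

count_failures $pids
echo "$failed background job(s) failed"
```

Keeping the jobs inline with & preserves the "reads as a normal shell script" property; the only addition per job is the pids="$pids $!" suffix.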

While I'm going through it, the compflags() function in make.sh is its own brand of awkward. That function spits out nine lines of shell script at the start of build.sh, and although running generated/build.sh directly is pretty rare (it's more or less a comment, "if you don't like my build script, this is how you compile it in the current configuration"), it's also used for dependency checking to see if the toolchain or config file changed since last build. When we rerun make.sh, it checks that lines 5-8 of a fresh compflags() match the existing build.sh file, and if not deletes the whole "generated" directory to force a rebuild because you did something like change what CROSS_COMPILE points to. That way I don't have to remember to "make clean" between musl, bionic, and glibc builds, or when switching between building standalone vs multiplexer commands (which have different common plumbing not detected by $TOYFILES collection). The KCONFIG_CONFIG value changes on line 8 when you do that: it's a comment, but not a CONSTANT comment.

The awkward part is needing to compare lines 5-8 of 9, which involves sed. That magic line range is just ugly. Line 1 is #!/bin/bash and lines 2 and 9 are blank, so comparing them too isn't actually a problem, but lines 3 and 4 are variable assignments that CAN change, without requiring a rebuild. Line 3 is VERSION= which contains the git hash when you're building between releases, if we don't exclude that doing a pull or checkin would trigger a full rebuild. And line 4 is LIBRARIES= which is probed from the toolchain AFTER this dependency check, and thus A) should only change when the toolchain does, B) used to always be blank when we were checking if it had changed, thus triggering spurious rebuilds. (I switched it to write the list to a file, generated/optlibs.dat, and then fetch it from that file here, so we CAN let it through now. The comparison's meaningless, but not harmful: does the old data match the old data.)

Unfortunately, I can't reorganize to put those two at the end, because the BUILD= line includes "$VERSION" and LINK= includes "$LIBRARIES", so when written out as a shell script (or evaluated with 'eval') the assignments have to happen in that order.

Sigh, I guess I could just "grep -v ^VERSION=" both when comparing it? The OTHER problem is that later in the build it appends a "\$BUILD lib/*.c $TOYFILES \$LINK -o $OUTNAME" line to the end, which isn't going to match between runs either. Hmmm... I suppose if TOYFILES= and OUTNAME= were also variable assignments, then that last line could become another constant and we could have egrep -v filter out "^(VERSION|LIBRARIES|TOYFILES|OUTNAME)=" which is uncomfortably complicated but at least not MAGIC the way the line range was...
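A rough sketch of what that filtered comparison might look like, assuming TOYFILES= and OUTNAME= really have become plain assignments (they haven't yet). The stable_lines and needs_rebuild names are invented for this example:

```sh
# Drop assignments that are allowed to change without forcing a rebuild.
# Assumes TOYFILES= and OUTNAME= have become assignments, as proposed above.
stable_lines() {
  grep -Ev '^(VERSION|LIBRARIES|TOYFILES|OUTNAME)=' "$1"
}

# Returns 0 (true) when the two scripts differ in a way that matters.
# $1 = existing build.sh, $2 = freshly generated one.
needs_rebuild() {
  [ "$(stable_lines "$1")" != "$(stable_lines "$2")" ]
}
```

Usage would be something like "if needs_rebuild generated/build.sh fresh.sh; then ...; fi" with the existing delete-the-generated-directory logic in the body. It compares the whole file instead of a line range, so adding or reordering lines in compflags() wouldn't silently break the check.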

(The reason main.c lives in TOYFILES instead of being explicit on the last line is to avoid repetition. The for loop would also have to list main.c, and single point of truth... No, I'm not happy with it. Very minor rough edge, but it's not exactly elegant either...)


January 30, 2024

What does make.sh do... First some setup:

  • declares some functions
  • does a (safe) rm -rf generated/ if compiler options changed
  • check if build.sh options changed
    • function compflags, just check lines 5-8: $BUILD $LINK $PATH $KCONFIG_CONFIG
    • delete the whole "generated" dir if they don't match, forcing full rebuild
  • set $TOYFILES (grep toys/*/*.c for {OLD|NEW}TOY()s enabled in .config)
  • warns if "pending" is in there (in red)

And then header generation:

  • write optlibs.dat (shared library probe)
  • write build.sh (standalone build script, to reproduce this binary on targets that have a compiler but not much else, like make or proper sed)
  • Call genconfig.sh which writes Config.probed, Config.in, and .singlemake (that last one at the top level instead of in generated, because "make clean" can't delete it or you wouldn't be able to "make clean; make sed").
  • Check if we should really run "make oldconfig" and warn if so.
  • write newtoys.h (sed toys/*/*.c)
  • write config.h (sed .config)
  • write flags.h (compile mkflags.c, sed config.h and run newtoys.h through gcc -E, pipe both into mkflags)
  • write globals.h (sed toys/*/*.c)
  • write tags.h (sed toys/*/*.c)
  • write help.h (compile config2help.c, reads .config and Config.in which includes dependencies ala generated/Config.)
  • write zhelp.h (compile install.c and run its --help through gzip | od | sed)

And that's the end of header generation, and it's on to compiling stuff (which is already parallelized).
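To give a feel for what "sed .config" means in the list above, here's a deliberately simplified sketch of turning kconfig output into #defines. The config_to_header name is made up, only the "=y" and "is not set" cases are handled, and the real make.sh output has more in it than this, so treat the exact format as an assumption:

```sh
# Simplified sketch: turn kconfig style .config lines (on stdin) into #defines.
# Only handles "=y" and "is not set"; string and numeric symbols are ignored.
config_to_header() {
  sed -n \
    -e 's/^# CONFIG_\(.*\) is not set.*/#define CFG_\1 0/p' \
    -e 's/^CONFIG_\(.*\)=y.*/#define CFG_\1 1/p'
}

printf 'CONFIG_SED=y\n# CONFIG_WC is not set\n' | config_to_header
```

The point is that .config is only ever read as a text file, which is why this stage needs nothing but sed.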

It's awkward how scripts/genconfig.sh is a separate file, but "make menuconfig" needs those files because they're imported by Config.in at the top level, so that has to be able to build those files before running configure. Possibly I should split _all_ the header generation out into mkheaders.sh (replacing genconfig.sh), and just have it not do the .config stuff if .config doesn't exist? (And then make.sh could check for the file early on and go "run defconfig" and exit if it's not there...)

Having .singlemake at the top level is uncomfortably magic (running "make defconfig" changes the available make targets!) but getting the makefile wrapper to provide the semantics I want is AWKWARD, and if it's in generated/ then "make clean" forgets how to do "make sed".

The reason the above warning about calling "make oldconfig" doesn't just call it itself is that would be a layering violation: scripts/*.c CANNOT call out to kconfig because of licensing. The .config file output by kconfig is read-only consumed by the rest of the build, meaning the kconfig subdirectory does not actually need to _exist_ when running "make toybox". Kconfig is there as a convenience: not only is no code from there included in our build, but no code from there is RUN after the configuration stage (and then only to produce the one text file). You COULD create a .config file by hand (and android basically does). Blame the SFLC for making "the GPL" toxic lawsuit fodder that needs to be handled at a distance with tongs. (I _asked_ them to stop in 2008. Eben stopped, Bradley refused to.)

Of the three scripts/*.c files built and run by the build, the only one I'm _comfortable_ with is install.c I.E. instlist, which spits out the list of commands and I recently extended to spit out the --help text so I could make a compressed version of it. It's basically a stub version of main.c that only performs those two toybox multiplexer tasks, so I don't have to build a native toybox binary and run it (which gets into the problem of different library includes or available system calls between host and target libc when cross compiling, plus rebuilding *.c twice for no good reason). This is a ~60 line C file that #includes generated/help.h and generated/newtoys.h to populate toy_list[] and help_data[], and then writes the results to stdout.

The whole mkflags.c mess is still uncomfortably magic, I should take a stab at rewriting it, especially if I can use (CONFIG_BLAH|FORCED_FLAG)<<shift to zero them out so the flags don't vary by config. I still need something to generate the #define OPTSTR_command strings, because my original approach of having USE() macros drop out made the flag values change, and I switched to annotating the entries so they get skipped but still count for the flag value numbering. Maybe some sort of macro that inserts \001 and \002 around string segments, and change lib/args.c to increment/decrement a skip counter? I don't really want to have a whole parallel ecology of HLP_sed("a:b:c") or similar in config.h, but can't think of a better way at the moment. (Yes it makes the strings slightly bigger, but maybe not enough to care? Hmmm... Actually, I could probably do something pretty close to the _current_ processing with sed...)

The config2help.c thing is a nightmare I've mentioned here before, and has an outstanding bug report about it occasionally going "boing", and I'd very much like to just rip that all out and replace it with sed, but there's design work leading to cleanup before I can do real design work here. (Dealing with the rest of the user-visible configurable command sub-options, for one thing. And regularizing the -Z support and similar so it's all happening with the same mechanism, and working out what properly splicing together the help text should look like...)


January 29, 2024

It's kind of amusing when spammers have their heads SO far up their asses that their pitch email is full of spammer jargon. The email subject "Get High DA/DR and TRAFFIC in 25-30 Days (New Year Discount!" made it through gmail's insane spam filter (despite half of linux-kernel traffic apparently NOT making it through and needing to be fished out), but the target audience seems to be other SEO firms. (No, it didn't have an ending parenthesis.)

Wrestling with grep -w '' and friends, namely:

$ for i in '' '^' '$' '^$'; do echo pat="$i"; \
  echo -e '\na\n \na \n a\na a\na  a' | grep -nw "$i"; done
pat=
1:
3: 
4:a 
5: a
7:a  a
pat=^
1:
3: 
5: a
pat=$
1:
3: 
4:a 
pat=^$
1:

The initial bug report was that --color didn't work right, which was easy enough to diagnose, but FIXING it uncovered that I was never handling -w properly, and needed more tests. (Which the above rolls up into one big test.)

As usual, getting the test right was the hard part. Rewriting the code to pass the tests was merely annoying.


January 28, 2024

Managed to flush half a dozen pending tabs into actual commits I could push to the repo. Mostly a low-hanging-fruit purge of open terminal tabs, I have SO MANY MORE half-finished things I need to close down.

Heard back from Greg Ungerer confirming that m68k fdpic support went into the kernel but NOT into any toolchain. I'm somewhat unclear on what that MEANS, did they select which register each segment should associate with, or not? (Did that selection already have to be made for binflt and it just maps over? I'm unclear what the elf2flt strap-on package actually DOES to the toolchain, so I don't know where the register definitions would live. I was thinking I could read Rich's sh2 patches out of musl-cross-make but they vary WIDELY by version, and some of this seems to have gone upstream already? For a definition of "already" that was initially implemented 7 or 8 years ago now. It LOOKED like this was one patch to gcc and one to binutils in recent versions, but those mostly seem to be changing config plumbing, and grepping the ".orig" directory for gcc is finding what CLAIMS to be fdpic support for superh in the base version before the patches are applied? So... when did this go upstream, and at what granularity, and what would be LEFT to add support for a new architecture?)

People are trying to convince me that arm fdpic support was a heavy lift with lots of patches, but looking back on the superh fdpic support it doesn't seem THAT big a deal? Possibly the difference was "already supported binflt", except the hugely awkward bag on the end postprocessor (called elf2flt, it takes an ELF file and makes a FLT file from it) argues against that? But that doesn't mean they didn't hack up the toolchain extensively (pushing patches upstream even!) and THEN "hit the output with sed" as it were. You can have the worst of both worlds, it's the gnu/way.

I got a binflt toolchain working in aboriginal way back when. Maybe I should go back and look at what elf2flt actually DID, and how building the toolchain that used it was configured. (I honestly don't remember, it's been most of a decade and there was "I swore I'd never follow another startup down into bankruptcy but here we are" followed by the Rump administration followed by a pandemic. I remember THAT I did it, but the details are all a bit of a blur...)

But now is not the best time to open a new can of worms. (I mean there's seldom a GOOD time, but... lemme close more tabs.)


January 27, 2024

Sigh. I'm frustrated at the continuing deterioration of the linux-kernel development community. As they collapse they've been jettisoning stuff they no longer have the bandwidth or expertise to maintain, and 5 years back they purged a bunch of architectures.

Meanwhile, I'm trying to get a nommu fdpic test environment set up under qemu, and checking gcc 11.2.0 (the latest version musl-cross-make supports) for fdpic support, grep -irl fdpic gcc/config has hits in bfin, sh, arm, and frv. I'm familiar with sh, and bits of arm were missing last I checked (although maybe I can hack my way past it?) But the other two targets, blackfin and frv, were purged by linux-kernel.
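For repeating that check against other gcc versions, the grep can be boiled down to a list of target directory names. A small sketch (the fdpic_targets name is made up, and $1 is assumed to be the gcc/config directory of an unpacked gcc source tree):

```sh
# List the subdirectories of a gcc config directory that mention fdpic.
# $1 is the path to gcc/config in an unpacked gcc source tree.
fdpic_targets() {
  ( cd "$1" && grep -irl fdpic . | cut -d/ -f2 | sort -u )
}
```

Running "fdpic_targets gcc/config" from the top of the source tree prints one directory name per line. A hit in a file sitting directly in gcc/config would show up as a filename rather than a directory.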

I.E. the increasingly insular and geriatric kernel development community discarded half the architectures with actual gcc support for fdpic. Most of the architectures you CAN still select fdpic for don't seem to have (or to have ever had) a toolchain capable of producing it. That CAN'T be right...

Cloned git://gcc.gnu.org/git/gcc.git to see if any more fdpic targets spawned upstream: nope. Still only four targets supporting fdpic, two of which linux-kernel threw overboard to lighten the load as the Hindenburg descends gently into Greg's receivership. As the man who fell off a tall building said on his way down, "doing fine so far"...

Yes I still think driving hobbyists away from the platform was a bad move, but as with most corporate shenanigans where you can zero out the R&D budget and not notice for YEARS that your new product pipeline has nothing in it... the delay between cause and effect is long enough for plausible deniability. It "just happened", not as a result of anything anyone DID.

And which is worse: Carly Fiorina turning HP into one of those geriatric rock bands that keeps touring playing nothing but 40 year old "greatest hits" without a single new song (but ALL THE MONEY IN THE WORLD for lawyers to sue everybody as "dying business models explode into a cloud of IP litigation" once again)... or Red Hat spreading systemd? Zero new ideas, or TERRIBLE ideas force-fed to the industry by firms too big to fail?

Caught up on some blog editing, but haven't uploaded it yet. (Japanese has a tendency to omit saying "I", which has been a tendency in my own writing forever. "I" am not an interesting part of the sentence. That said, it technically counts as a bad habit in english, I think?) I made a mess of december trying to retcon some entries (I'd skipped days and then had too many topics for the day I did them and wanted to backfill _after_ I'd uploaded, which probably isn't kind to the rss feed), and I only recently untangled that and uploaded it, and I'm giving it a few days before replacing it with the first couple weeks of January.

My RSS feed generator parses the input html file (capping the output at something like the last 30 entries, so the rss file isn't ridiculously huge in the second half of the year), but that makes switching years awkward unless I cut and paste the last few entries from december after the first few entries of January. Which I've done for previous years, and then at least once forgotten to remove (which I noticed back when Google still worked by searching for a blog entry I knew I'd made and it found it in the wrong year's file). Trying to avoid that this year, but that means giving the end of december a few days to soak.


January 26, 2024

Hmmm... can I assume toybox (I.E. the multiplexer) is available in the $PATH of the test suite? Darn it, no I can't, not for single command tests. Makes it fiddly to fix up the water closet command's test suite...

So Elliott sent me a mega-patch of help text updates, mostly updating usage: lines that missed options that were in the command's long one-per-line list, tweaking option lists that weren't sorted right, and a couple minor cleanups like some missing FLAG() macro conversions that were still doing the explicit if (toys.optflags & FLAG_walrus) format without a good excuse. And since my tree is HUGELY DIRTY, it conflicted with well over a dozen files so applying it was darn awkward... and today he gave me a "ping" because I'd sat on it way too long (I think I said a week in the faq?) at which point my documented procedure is I back my changes out, apply his patch, and port my changes on top of it because I've had PLENTY OF TIME to deal with it already.

And of course trying to put my changes back on top of his was fail-to-apply city (the reason I couldn't just easily apply it in the first place), so I went through and reapplied my changes by hand, some of which are JUST conflicting documentation changes (like patch.c) and others are fairly low hanging fruit I should just finish up.

Which gets us to wc, the water closet word count command, where I was adding wc -L because somebody asked for it and it apparently showed up in Debian sometime when I wasn't looking. (It's even in the ancient version I still haven't upgraded my laptop off of.) It shows maximum line length, which... fine. Ok. Easy enough to add. And then which order do the fields show up in (defaults haven't changed and the new fifth column went in at the end, which was the sane way to do it), so I add tests, and...

The problem is TEST_HOST make test_wc doesn't pass anymore, which is not related to THIS change. The first failure is a whitespace variation, which already had a comment about in the source and I can just hit it with NOSPACE=1 before that test (not fixing it to match, one tab between each works fine for me, I do not care here; poke me if posix ever notices and actually specifies any of this).

But the NEXT problem is that the test suite sets LC_ALL=C for consistent behavior (preventing case insensitive "sort" output and so on), and we're testing utf-8 support (wc -m) which works FINE in the toybox version regardless of environment variables, but the gnu/dammit version refuses to understand UTF-8 unless environment variables point to a UTF-8 language locale. (Which makes as much sense as being able to set an environment variable to get the gnu stuff to output ebcdic, THIS SHIP HAS SAILED. And yet, they have random gratuitous dependencies without which they refuse to work.)

On my Debian Stale host, the environment variables are set to "en_US.UTF-8", so the test works if run there, but doesn't work in the test suite where it's consistently overridden to LC_ALL=C. (In a test suite it's more important to be CONSISTENT than to be RIGHT.)

I could of course set it to something else in a specific test, but nothing guarantees that this is running on a system with the "en_US" locale installed. And fixing this is HORRIFIC: in toybox's main.c we call setlocale(LC_CTYPE, "") which reads the environment variables and loads whatever locale they point to (oddly enough this is not the default libc behavior, you have to explicitly REQUEST it), and then we check that locale to see if it has utf8 support by calling nl_langinfo(CODESET) which is laughable namespace pollution but FINE, and if that doesn't return the string "UTF-8" (case sensitive with a dash because locale nonsense), then we try loading C.UTF-8 and if that doesn't work en_US.UTF-8 because MacOS only has that last one. (So if you start out with a french utf8 locale we keep it, if not we try "generic but with UTF-8", which doesn't work on mac because they just RECENTLY added mknodat() from posix-2008. As in it was added in MacOS 13 which came out October 2022. FOURTEEN YEARS later. Yes really. Steve Jobs is still dead.)

So ANYWAY, I have painfully hard-fought code in main.c that SHOULD deal with this nonsense, but what do I set it to in a shell script? There is a "locale" command which is incomprehensible:

$ locale --help | head -n 3
Usage: locale [OPTION...] NAME
  or:  locale [OPTION...] [-a|-m]
Get locale-specific information.
$ locale -a
C
C.UTF-8
en_US.utf8
POSIX
$ locale C.UTF-8
locale: unknown name "C.UTF-8"
$ locale en_US.utf8
locale: unknown name "en_US.utf8"

Bravo. (What does that NAME argument _mean_ exactly?) So querying "do you have this locale installed" and "what does this locale do" is... less obvious than I'd like.

I was thinking maybe "toybox --locale" could spit out what UTF-8 aware locale it's actually using, but A) can't depend on it being there, B) ew, C) if it performed surgery on the current locale to ADD UTF-8 support with LC_CTYPE_MASK there's no "set the environment variable to this" output for that anyway.

Sigh. I could try to come up with a shell function that barfs if it can't get utf8 awareness, but... how do I test for utf8 awareness? Dig, dig, dig...

Dig dig dig...

Sigh, what a truly terrible man page and USELESS command --help output. Dig dig dig...

Ah: "locale charmap". for i in $(locale -a); do LC_ALL=$i locale charmap; done
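Which suggests the shell function I was after could look something like the following. This is an untested-in-the-wild sketch, the first_utf8_locale name is made up, and it assumes "locale -a" prints one name per line with no embedded spaces:

```sh
# Print the first installed locale whose charmap is UTF-8.
# Returns nonzero (and prints nothing) if there isn't one.
first_utf8_locale() {
  for i in $(locale -a 2>/dev/null); do
    if [ "$(LC_ALL=$i locale charmap 2>/dev/null)" = "UTF-8" ]; then
      echo "$i"
      return 0
    fi
  done
  return 1
}
```

A test that needs UTF-8 awareness from the host tools could then do something like LC_ALL=$(first_utf8_locale) for just that one command, and skip itself when the function fails, while everything else keeps the consistent default.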

What was the question again?


January 25, 2024

Running toybox file on the bamboo board's filesystem produced a false positive. It _said_ it had ELF FDPIC binaries, but the kernel config didn't have the fdpic loader enabled. And the dependencies for BINFMT_ELF_FDPIC in the kernel are depends on ARM || ((M68K || RISCV || SUPERH || XTENSA) && !MMU) so I only have 5 targets to try to get an fdpic nommu qemu system working on. (And need to read through the elf FDPIC loader to figure out how THAT is identifying an fdpic binary, it seems architecture dependent...)
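That dependency expression is short enough to restate as a shell predicate, which makes the "5 targets" count (and why a powerpc board like bamboo can't be one of them) easy to see. The fdpic_allowed name and the lowercase "mmu"/"nommu" spellings are just conventions for this sketch, not anything from the kernel:

```sh
# Evaluate: ARM || ((M68K || RISCV || SUPERH || XTENSA) && !MMU)
# $1 = architecture name, $2 = "mmu" or "nommu" (names chosen for this sketch).
fdpic_allowed() {
  case "$1" in
    arm) return 0 ;;
    m68k|riscv|superh|xtensa) [ "$2" = nommu ] ;;
    *) return 1 ;;
  esac
}

for arch in arm m68k riscv superh xtensa powerpc; do
  for mmu in mmu nommu; do
    fdpic_allowed "$arch" "$mmu" && echo "$arch $mmu"
  done
done
```

Only arm gets through with an MMU, which is the asymmetry that makes it tempting to try moving superh outside the !MMU group.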

I haven't poked at arm because musl-cross-make can't build a particularly new toolchain and hasn't been updated in years, but maybe the toolchain support went in before the kernel support did? I should come back to that one...

SuperH I'm already doing but only on real hardware (the j-core turtle board), and qemu-system-sh4 having "4" in the name is a hint WHY sh2 support hasn't gone in there yet. (Since qemu-sh4 application emulation can run it might be possible to build a kernel with the fdpic loader if I hack the above dependency to put superh next to ARM and outside of the !MMU list? Dunno what's involved but presumably arm did _some_ of that work already.)

M68K is coldfire, I ran buildroot's qemu_m68k_mcf5208_defconfig to get one of those which booted, but all the binaries are binflt. I grepped the patched gcc that mcm built to see how its configure enables fdpic support, but the patches vary greatly by version. Hmmm...


January 24, 2024

Sigh, I really need to add a "--shoehorn=0xa0000000,128m" option to qemu to tell it to just forcibly add DRAM to empty parts of a board's physical address range, and a kernel command line option for linux to use them...

My first attempt at fixing grep -w '' didn't work because it's not just "empty line goes through, non-empty line does not"... Turns out "a  a" with two spaces goes through also. Which means A) the '$' and '^' patterns, by themselves in combination with -w, suddenly become more interesting, B) my plumbing to handle this is in the wrong place, C) 'a*' in the regex codepath has to trigger on the same inputs as empty string because asterisk is ZERO or more so this extension to the -w detection logic still needs to be called from both the fixed and regex paths without too much code duplication, but how do I pass in all the necessary info to a shared function...

Marvin the Martian's "Devise, devise" is a good mantra for design work.


January 23, 2024

I want a qemu nommu target so I can regression test toybox on nommu without pulling out hardware and sneakernetting files onto it, and or1k's kernel config didn't have the FDPIC loader in it so I'm pretty sure that had an mmu.

Greg Ungerer said he tests ELF-fdpic on arm, and regression tests elf PIE nommu on arm, m68k, riscv, and xtensa. Which isn't really that helpful: I still don't care about riscv, arm requires a musl-cross-make update to get a new enough compiler for fdpic support, and xtensa is a longstanding musl-libc fork that's based off a very old version. (I could try forward porting it, but let's get back to that one...)

The three prominent nommu targets I recall from forever ago (other than j-core, which never got a qemu board) are m68k (I.E. coldfire), powerpc (where bamboo and e500 were two nommu forks from different vendors, each of which picked a slightly different subset of the instruction set), and of course arm (cortex-m, see toolchain upgrade needed above).

Buildroot's configs/ directory has "qemu_ppc_bamboo_defconfig" and board/qemu/ppc-bamboo/readme.txt says "qemu-system-ppc -nographic -M bamboo -kernel output/images/vmlinux -net nic,model=virtio-net-pci -net user" is how you launch it. Last time I tried it the build broke, but let's try again with a fresh pull...

Hey, and it built! And it boots under qemu! And hasn't got "file" or "readelf" so it's not immediately obvious it's fdpic (I mean, it's bamboo, I think it _has_ to be, but I'd like to confirm it's not binflt). And qemu doesn't exit (halt does the "it is now safe to turn off" thing, but eh, kill it from another window). And from the host I can "toybox file output/target/bin/busybox" which says it's fdpic.

Ok, the kernel build (with .config) is in output/build/linux-6.1.44 and... once again modern kernel configs are full of probed gcc values so if I run my miniconfig.sh without specifying CROSS_COMPILE (in addition to ARCH=powerpc) the blank line removal heuristic fails and it has to dig through thousands of lines of extra nonsense, let's see... it's in output/host/bin/powerpc-buildroot-linux-gnu- (and of COURSE it built a uclibc-necromancy toolchain, not musl) so... 245 lines after the script did its thing, and egrep -v "^CONFIG_($(grep -o 'BINFMT_ELF,[^ ]*' ~/toybox/mkroot/mkroot.sh | sed 's/,/|/g'))=y" mini.config says 229 lines aren't in the mkroot base config, with the usual noise (LOCALVERSION_AUTO and SYSVIPC and POSIX_MQUEUE and so on)... static initramfs again, does bamboo's kernel loader know how to specify an external initramfs or is static a requirement like on or1k?

Yet another "melting down this iceberg" session like with or1k (which I'd HOPED would get me a nommu test system), but the other big question here is does musl support bamboo? It supports powerpc, and the TOOLCHAIN supports bamboo, but is there glue missing somewhere? (Long ago I mailed Rich a check to add m68k support, but he had some downtime just then and gave me a "friend rate" on an architecture nobody else was going to pay to add support for probably ever, and I was working a well-paying contract at the time so had spare cash. If nothing else, there's been some inflation since then...)


January 22, 2024

So, unfinished design work: I want more parallelism and less dependency detection in make.sh setup work (mostly header generation).

It's not just generating FILES in parallel, I want to run the compile time probes from scripts/genconfig.sh in parallel, and probe the library link list (generated/optlib.dat) in parallel, and both of those have the problem of collecting the output from each command and stitching it together into a single block of data. Which bash really doesn't want to do: even a=b | c=d | e=f discards the assignments because each pipe segment is an implicit subshell to which assignments are local, yes even the last one. I can sort of do a single x=$(one& two& three&) to have the subshell do the parallelizing and collect the output, but A) each output has to be a single atomic write, B) they occur in completion order, which is essentially randomized.
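Both halves of that are easy to see in isolation. A small sketch using echo as a stand-in for the real probes (the variable names are just for illustration):

```shell
# Assignments made inside pipeline segments happen in subshells and are lost.
unset a c
a=b | c=d
echo "a='${a-}' c='${c-}'"

# One command substitution can background several jobs and collect what they
# print. It keeps reading until every job has closed its copy of the pipe.
x=$(echo one & echo two & echo three &)
echo "$x"    # three lines, in whatever order the jobs happened to finish
```

Each echo here is a single small write, which is what keeps the lines from interleaving. A probe that prints its result in several pieces would not get that guarantee.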

The problem with A=$(one) B=$(two) C=$(three) automatically running in parallel is that variable assignments are sequenced left to right, so A=abc B=$A can depend on A already having been set. Which means my toysh command line resolver logic would need to grow DEPENDENCIES.
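A tiny illustration of the ordering that parallel assignment would have to preserve (variable names made up for the example):

```shell
unset A B first second
A=abc B=$A                 # B sees the value A got a moment earlier
first=${second:=def}       # expanding this also assigns "second" as a side effect
echo "B=$B first=$first second=$second"   # prints: B=abc first=def second=def
```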

In theory I could do this, the obvious way (to me) is another variable type flag that says "assignment in progress" so the resolver could call a blocking fetch data function. Also, I'd only background simple standalone assignments, because something like A=$(one)xyz where the resolution was just _part_ of the variable would need to both store more data and resume processing partway through... Darn it, it's worse than that because variable resolution can assign ${abc:=def} and modify ala $((x++)) so trying to do them out of sequence isn't a SIMPLE dependency tree, you'd have to lookahead to see what else was impacted with a whole second "collect but don't DO" parser, and that is just not practical.

I can special case "multiple assignments on the same line that ONLY do simple assignment of a single subshell's output" to run in parallel, but... toysh doing that and bash NOT doing that is silly. Grrr.

Alright, can I extend the "env" command to do this? It's already running a child process with a modified environment, so env -p a="command" -p b="command" -p c="command" echo -e "$a\n$b\n$c" could... resolve $a $b and $c in the host shell before running env, and if I put single quotes around them echo DOESN'T know how... Nope, this hasn't got the plumbing and once again my command would be diverging uncomfortably far from upstream and the gnu/dammit guys still haven't merged cut -DF.

The shell parallelism I have so far is a for loop near the end of scripts/make.sh that writes each thing's output to a file, and then does a collation pass from the file data after the loop. Which I suppose is genericizeable, and I could make a shell function to do this. (I try to quote stuff properly so even if somebody did add a file called "; rm -rf ~;.c" to toys/pending it wouldn't try to do that, and maintaining that while passing arbitrary commands through to a parallelizer function would be a bit of a thing. But it's also not an attack vector I'm hugely worried about, either.)
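A rough sketch of what the genericized version might look like. This is not the code in scripts/make.sh: collate_parallel is a hypothetical name, and it dodges the quoting problem by only accepting names of functions or commands that take no arguments:

```shell
# Run each named function/command in the background with its output going to
# its own numbered temp file, then print the files back in argument order.
# The ( ) body makes the whole function a subshell, so "wait" only sees the
# jobs started here and the scratch variables don't leak into the caller.
collate_parallel()
(
  dir=$(mktemp -d) || exit 1
  n=0
  for cmd in "$@"
  do
    n=$((n+1))
    "$cmd" > "$dir/$n" &
  done
  wait

  # Collation pass: argument order, not completion order.
  i=0
  while [ "$i" -lt "$n" ]
  do
    i=$((i+1))
    cat "$dir/$i"
  done
  rm -rf "$dir"
)
```

Going through files removes the "single atomic write" restriction and the random ordering, at the cost of a temp directory.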


January 21, 2024

Bash frustration du jour: why does the "wait" builtin always return 0? I want to fire off multiple background processes and then wait for them all to complete, and react if any of them failed. The return value of wait should be nonzero if any of the child processes that exited returned nonzero. But it doesn't do that, and there isn't a flag to MAKE it do that.

I'm trying to rewrite scripts/make.sh to parallelize the header file generation, so builds go faster on SMP systems. (And also to just remove the "is this newer than that" checks and just ALWAYS rebuild them: the worst of the lot is a call to sed over a hundred or so smallish text files, it shouldn't take a significant amount of time even on the dinky little orange pi that's somehow slower than my 10 year old laptop.) And the OBVIOUS way to do it is to make a bunch of shell functions and then: "func1& func2& func3& func4& func5& wait || barf" except wait doesn't let me know if anything failed.

Dowanna poke chet. Couldn't use a new bash extension if I did, not just because of the 7 year time horizon, but because there's still people relying on the 10 year support horizon of Red IBM Hat to run builds under ancient bash versions that predate -n. And of course the last GPLv2 version of bash that MacOS stayed on doesn't have that either, and "homebrew" on the mac I've got access to also gives you bash 3.2.57 from 2007 which hasn't got -n. So a hacky "fire off 5 background processes and call wait -n 5 times" doesn't fix it either. (And is wrong because "information needs to live in 2 places": manually updated background process count. And "jobs" shows "active" jobs so using it to determine how many times I'd need to call wait -n to make sure everything succeeded doesn't work either.)

Meanwhile, wait -n returns 127 if there's no next background process, which is the same thing you get if you run "/does/not/exist" as a background job. So "failure to launch" and "no more processes" are indistinguishable if I just loop until I get that, meaning I'd miss a category of failure.
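One workaround that avoids -n entirely: "wait" with an explicit PID does return that job's exit status (plain POSIX behavior, so old bash versions should cope). A minimal sketch, with run_all as a made-up name and the same "names of argument-free functions only" simplification:

```shell
# Start every named function/command in the background, remembering each PID
# from $!, then wait on them one at a time. Returns 1 if any of them failed.
run_all()
{
  pids=
  for cmd in "$@"
  do
    "$cmd" &
    pids="$pids $!"
  done

  rc=0
  for pid in $pids
  do
    # A job that couldn't launch exits 127, which counts as failure here.
    wait "$pid" || rc=1
  done
  return $rc
}
```

So "run_all func1 func2 func3 func4 func5 || barf" gets the behavior the bare wait doesn't provide, and the PID list is built by the loop rather than being a hand-maintained count.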

I made some shell function plumbing in scripts/make.sh to handle running the gcc invocations in the background (which, as I've recently complained, is just a workaround for "make -j" being added instead of "cc -j" where it BELONGS). (HONESTLY! How is cc -j $(nproc) one.c two.c three.c... -o potato not the OBVIOUS SYNTAX?) Maybe I can genericize that plumbing into a background() function that can also handle the header generation...

That said, I think at least one of the headers depends on previous headers being generated, so there's dependencies. Sigh, in GENERAL I want a shell parallelism syntax where I can group "(a& b&) && c" because SMP is a thing now. I can already create functions with parentheses instead of curly brackets which subshell themselves (a function body needs to be a block, but it turns out "potato() if true; then echo hello; fi" works just fine because THAT'S A BLOCK). I want some sort of function which doesn't return until all the subshells it forked exit, and then returns the highest exit code of the lot. It would be easy enough for me to add that to toysh as an extension, but defining my own thing that nobody else uses is not HELPFUL.
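The "any block will do" function bodies look like this in practice (leaky and sealed are illustrative names):

```shell
# A function body can be any compound command, not just { ...; }.
potato() if true; then echo hello; fi

# Curly brackets run in the calling shell, parentheses run in a subshell,
# so assignments made inside the parenthesized version never escape.
leaky() { x=changed; }
sealed() ( x=changed )

x=original; sealed; echo "$x"   # prints: original
x=original; leaky;  echo "$x"   # prints: changed
```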

Meanwhile, cut -DF still aren't upstream in gnuutils. Despite repeated lip service. Sigh, I should poke them again. And post my 6.7 patches to linux-kernel...


January 20, 2024

Oh dear:

unlike Android proper, which is no longer investigating bazel, the [android] kernel build fully switched to bazel, and doesn't use the upstream build at all. (but there's a whole team working on the kernel...

I had to step away from the keyboard for a bit, due to old scars.

On the one hand, "yay, multiple independent interoperable implementations just like the IETF has always demanded to call something a standard". That's GREAT. This means you're theoretically in a position to document what the linux-kernel build actually needs to DO now, having successfully reimplemented it.

On the other hand... oh no. Both "build system preserved in amber" and "straddling the xkcd standards cycle" are consulting bingo squares, like "magic build machine" or "yocto".

AOSP is actually pretty tame as fortune 500 examples of the Mongolian Hordes technique go: everything is published and ACTUALLY peer reviewed with at least some feedback incorporated upstream. Their build has to be downloadable and runnable on freshly installed new machines with a vanilla mainline Linux distro and retail-available hardware, and at least in theory can complete without network access, all of which gets regression tested regularly by third parties. And they have some long-term editors at the top who know where all the bodies are buried and shovel the mess into piles. (There's a reason DC comics didn't reboot its history with "Crisis on Infinite Earths" until Julius Schwartz retired. Then they rebooted again for Zero Hour, Infinite Crisis, 52, Flashpoint, the New 52, DC Rebirth, Infinite Frontier, Dawn of DC... I mean at this point it could be a heat problem, a driver issue, bad RAM, something with the power supply...)

This means AOSP does NOT have a magic build machine, let alone a distributed heterogeneous cluster of them. They don't have Jenkins launching Docker triggered by a git commit hook ported from perforce. Their build does not fail when run on a case sensitive filesystem, nor does it require access to a specific network filesystem tunneled through the firewall from another site that it both writes into and is full of files with 25 year old dates. Their build does not check generated files into an oracle database and back out again halfway through. They're not using Yocto.

(YES THOSE ARE ALL REAL EXAMPLES. Consulting is what happens when a company gives up trying to solve a problem internally and throws money at it. Politics and a time crunch are table stakes. It got that bad for a REASON, and the job unpicking the gordian knot is usually as much social skills, research, and documentation as programming, and often includes elements of scapegoat and laxative.)


January 19, 2024

Onna plane, back to Austin.

Did some git pulls in the airport to make sure I had updated stuff to play with: the most recent commit to musl-cross-make is dated April 15, 2022, updating to musl-1.2.3. (There was a 1.2.4 release since then, which musl-cross-make does not know about.) And musl itself was last updated November 16, 2023 (2 months ago). He's available on IRC, and says both projects do what they were intended to so updates aren't as high a priority. But the appearances worry me.

I am reminded of when I ran the website for Penguicon 1, and had a "heartbeat blog" I made sure to update multiple times per week, even if each update was something completely trivial about one of our guests or finding a good deal on con suite supplies or something, just to provide proof of life. "We're still here, we're still working, progress towards the event is occurring and if you need to contact us somebody will notice prompt-ish-ly and be able to reply".

Meanwhile, if a project hasn't had an update in 3 months, and I send in an email, will it take 3 more months for somebody to notice it in a dead inbox nobody's checking? If it's been 2 years, will anybody ever see it?

That kind of messaging is important. But I can't complain about volunteers that much when I'm not the one doing it, so... If it breaks, I get to keep the pieces.


January 18, 2024

If I _do_ start rebuilding all the toybox headers every time in scripts/make.sh (parallelism is faster than dependency checking here, I'm writing a post for the list), do they really need to be separate files? Would a generated/toys.h make more sense? Except then how would I take advantage of SMP to generate them in parallel? (I suppose I could extend toysh so A=$(blah1) B=$(blah2) C=$(blah3) launched them in parallel background tasks, since they already wait for the pipe to close. Then bash would be slow but toysh would parallelize...)

I originally had just toys.h at the top level and lib/lib.h in the lib/ directory, and it would make sense to have generated/generated.h or similar as the one big header there. But over the years, lib grew a bunch of different things because scripts/install.c shouldn't need to instantiate toybuf to produce bin vs sbin prefixes, and lib/portability.h needed ostracism, and so on. Reality has complexity. I try to collate it, but there's such a thing as over-cleaning. Hmmm...


January 16, 2024

Sat down to knock out execdir and... it's already there? I have one? And it's ALWAYS been there, or at least it was added in the same commit that added -exec ten years ago.

And the bug report is saying Alpine uses toybox find, which is news to me. (When they were launching Alpine, toybox wasn't ready yet. They needed some busybox, so they used all of busybox, which makes sense in a "using all the parts of the buffalo" sort of way.)

Sigh, I feel guilty about toybox development because a PROPER project takes three years and change. Linux took 3 years to get its 1.0 release out. Minix took 3 years from AT&T suing readers of the Lions book to Andrew Tanenbaum publishing his textbook with the new OS on a floppy in the back cover. The Mark Williams Company took 3 years to ship Coherent. Tinycc took three years to do tccboot building the linux kernel. There's a pretty consistent "this is how long it takes to become real".

Toybox... ain't that. I started working on it in 2006, I'm coming up on the TWENTIETH ANNIVERSARY of doing this thing. Admittedly I wasn't really taking it seriously at first and mothballed it for a bit (pushing things like my patch implementation, nbd-client, and even the general "all information about a new command is in a single file the build picks up by scanning for it" design (which I explained to Denys Vlasenko when we met in person at ELC 2010)). I didn't _restart_ toybox development until 2012 (well, November 2011) when Tim Bird poked me. But even so, my 2013 ELC "why is toybox" talk was a decade ago now.

I'm sort of at the "light at the end of the tunnel" stage, especially with the recent Google sponsorship... but also losing faith. The kernel is festering under me, and I just CAN'T tackle that right now. The toolchain stuff... I can't do qcc AND anything else, and nobody else has tried. (gcc and llvm are both A) written in C++, B) eldritch tangles of interlocking package dependencies with magic build invocations, C) kind of structurally insane (getting cortex-m fdpic support into gcc took _how_ many years, and llvm still hasn't got superh output and asking how to do it is _not_ a weekend job).)

And musl-libc is somewhere between "sane" and "abandoned". Rich disappears for weeks at a time, musl-cross-make hasn't been updated since 2022. Rich seems to vary between "it doesn't need more work because it's done" and "it doesn't get more work because I'm not being paid", depending on mood. It's the best package for my needs, and I... SORT of trust it to stay load bearing? And then there's the kernel growing new build requirements as fast as I can patch them out (rust is coming as a hard requirement, I can smell it). I would like to reach a good 1.0 "does what it says on the tin" checkpoint on toybox and mkroot before any more floorboards rot out from under me.

Sigh, once I got a real development environment based on busybox actually working, projects like Alpine Linux sprang up with no connection to me. I'd LIKE to get "Android building under android" to a similar point where it's just normal, and everybody forgets about the years of work I put in making it happen because it's not something anybody DID just the way the world IS. I want phones to be real computers, not locked down read-only data consumption devices that receive blessings from the "special people who aren't you" who have the restricted ability to author new stuff.

And I would really, really, really like to not be the only person working toward this goal. I don't mind going straight from "toiling in obscurity" to "unnecessary and discarded/forgotten", but I DO mind being insufficiently load-bearing. Things not happening until I get them done is ANNOYING. Howard Aiken was right.


January 15, 2024

I saw somebody wanting execdir and I went "ooh, that seems simple enough", although git diff on the find.c in my main working tree has debris from xargs --show-limits changing lib/env.c to a new API, which is blocked on me tracing through the kernel to see what it's actually counting for the size limits. (Since the argv[] and envp[] arrays aren't contiguous with the strings like I thought they were, do they count against the limit? If not, can you blow the stack with exec command "" "" "" "" ""... taking a single byte of null terminator each time but adding 8 bytes of pointer to argv[] for each one, so I have to read through the kernel code and/or launch tests to see where it goes "boing"?)

Elliott's going "last time you looked at this you decided it changed too often to try to match", which was true... in 2017. When it had just changed. But as far as I can tell it hasn't changed again SINCE, and it's coming up on 7 years since then. (My documented time horizon for "forever ago".) So it seems worth a revisit. (And then if they break me again, I can complain. Which if Linus is still around might work, and if Greg "in triplicate" KH has kicked him out, there's probably a 7 year time horizon for replacing Linux with another project. (On mastodon people are looking at various BSD forks and even taking Illumos seriously, which I just can't for licensing reasons.))


January 14, 2024

Bash does not register <(command) >(line) $(subshells) with job control, and thus "echo hello | tee >(read i && echo 1&) | { read i; wait; echo $?; }" outputs a zero. This unfortunately makes certain kinds of handoffs kind of annoying, and I've had to artificially stick fifos in to get stuff like my shell "expect" implementation to work.
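For illustration, the general shape of the fifo workaround (not the actual "expect" plumbing; fifo_status is a hypothetical name): the reader becomes an ordinary background job, so $! and wait can see it.

```shell
# Feed "hello" through tee into a fifo, and report whether the background
# reader on the other end got the line it expected ($1).
fifo_status()
{
  dir=$(mktemp -d) || return 1
  mkfifo "$dir/fifo" || return 1

  # Ordinary background job reading from the fifo: this one has a PID we know.
  { read -r i && [ "$i" = "$1" ]; } < "$dir/fifo" &
  reader=$!

  echo hello | tee "$dir/fifo" > /dev/null

  rc=0
  wait "$reader" || rc=$?
  rm -rf "$dir"
  return $rc
}
```

Here "fifo_status hello" returns 0 and "fifo_status goodbye" returns 1, because the exit status comes from the reader itself.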

On an adjacent note, a shell primitive I've wanted forever is "loop" to connect the output of a pipeline to the input back at the start of the pipeline. Years and YEARS of wanting this. You can't quite implement it as a standalone command for the same reason "time cmd | cmd | cmd" needs to be a builtin in order to time an entire pipeline. (Well, you can have your command run a child shell, ala loop bash -c "thingy", a bit like "env", but it still has to be a command. You can't quite do it with redirection because you need to create a new pipe(2) pair to have corresponding write to and read from filehandles: writing to the same fd you read from doesn't work. Which is where the FIFO comes in...)
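A toy version of the fifo trick, to show the plumbing a "loop" primitive would hide. This is a sketch with a made-up function name, and a counter standing in for a real pipeline:

```shell
# Count from 1 up to $1 by feeding the pipeline's output back into its input.
count_loop()
{
  dir=$(mktemp -d) || return 1
  mkfifo "$dir/loop" || return 1

  # Left side: seed value, then whatever comes back around through the fifo.
  # Right side: fd 3 keeps the real stdout, stdout itself goes to the fifo.
  { echo 1; cat "$dir/loop"; } | while read -r n
  do
    echo "$n" >&3
    [ "$n" -ge "$1" ] && break
    echo $((n+1))
  done 3>&1 > "$dir/loop"

  rm -rf "$dir"
}
```

When the loop breaks, the write end of the fifo closes, cat sees end of file, and the whole pipeline unwinds on its own.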


January 13, 2024

Ubuntu and Red Hat are competing to see who can drop support for older hardware fastest, meaning my laptop with the core i5-3340M processor won't be able to run their crap anymore.

I guess I'm ok with that, as long as Debian doesn't pull the same stupidity. (I bought four of these suckers, and have broken one so far, in a way that MOST of it is still good for spare parts. I am BUSY WITH OTHER THINGS, don't force me to do unnecessary tool maintenance.)


January 11, 2024

A long thread I got cc'd on turned into a "Call for LTP NOMMU maintainer", which... I want Linux to properly support nommu, but don't really care about the Linux Test Project (which is an overcomplicated mess).

Linux should treat nommu/mmu the way it treats 32/64 bit, or UP vs SMP, as mostly NOT A BIG DEAL. Instead they forked the ELF loader and the FDPIC loader the way ext2 and ext3 got forked (two separate implementations, sharing no code), and although ext4 unified it again (allowing them to delete the ext2 and ext3 drivers because ext4 could mount them all), they never cleaned up the FDPIC loader to just be a couple of if statements in the ELF loader.

It's just ELF with a separate base register for each of the 4 main segments, text, data, rodata, and bss. Instead of having them be contiguous following from one base register. Dynamic vs static linking is WAY more intrusive. PIC vs non-PIC is more intrusive. They handle all THAT in one go, but fdpic? Exile that and make it so you CANNOT BUILD the fdpic loader on x86, and can't build the elf loader on nommu targets, because kconfig and the #ifdefs won't let you.

And instead of that, when I try to explain to people "uclinux is to nommu what knoppix was to Linux Live CDs: the distro that pioneered a technique dying does NOT mean Linux stopped being able to do that thing, nor does it mean nobody wanted to do it anymore, it just means you no longer need a specific magic distro to do it"... Instead of support, I get grizzled old greybeards showing up to go "Nuh-uuuh, uclinux was never a distro, nobody ever thought uclinux was a DISTRO, the distro was uclinux-dist and there was never any confusion about that on anyone's part". With the obvious implication that "the fact uclinux.org became a cobweb site and eventually went down must be because nommu in Linux IS obsolete and unsupported and it bit-rotted into oblivion because nobody cared anymore. Duh."

Not helping. Really not helping.


January 10, 2024

Got the gzipped help text checked in.

My method of doing merges on divergent branches involves checking it in to a clean-ish branch, extracting it again with "git format-patch -1", and then a lot of "git am 000*.patch" and "rm -rf .git/rebase-apply/" in my main branch repeatedly trying to hammer it into my tree, with "git diff filename >> todo2.patch; git checkout filename" in between, and then once I've evicted the dirty files editing the *.patch file with vi to fix up the old context and removed lines that got changed by other patches preventing this from applying, and then when it finally DOES apply and I pull it into a clean tree and testing throws warnings because I didn't marshall over all the (void *) to squelch the "const" on the changed data type, a few "git show | patch -p1 -R && git reset HEAD^1" (in both trees) and yet MORE editing the patch with vi and re-applying. And then once it's all happy, don't forget "patch -p1 < todo2.patch" to re-dirty those bits of the tree consistently with whatever other half-finished nonsense I've wandered away from midstream.

Meanwhile, the linux-kernel geezers have auto-posters bouncing patches because "this looks like it would apply to older trees but you didn't say which ones". (And "I've been posting variants of this patch since 2017, you could have applied any of those and CHOSE not to, how is this now my problem" is not allowed because Greg KH's previous claim to fame was managing the legacy trees, and personal fame is his reason for existing. Then again it does motivate him to do a lot of work, so I can only complain so much. Beats it not happening. But there are significant negative externalities, which Linus isn't mitigating nearly as much as he used to.)


January 9, 2024

I've been up at Fade's and not blogging much, but I should put together a "how to do a new mkroot architecture" explainer.

You need a toolchain (the limiting factor of which is generally musl-libc support), you need a linux kernel config (using arch/$ARCH/defconfig has a file), and you need a qemu-system-$ARCH that can load the kernel and give serial output and eventually run at least a statically linked "hello world" program out of userspace. (Which gets you into elf/binflt/fdpic territory sometimes.)

The quick way to do this is use an existing system builder that can target qemu, get something that works, and reverse engineer those settings. Once upon a time QEMU had a "free operating system zoo" (at http://free.oszoo.org/download.html which is long dead but maybe fishable out of archive.org?) which I examined a few images from, and debian's qemu-debootstrap is another interesting source (sadly glibc, not musl), but these days buildroot's configs/qemu_* files have a bunch (generally uclibc instead of musl though, and the qemu invocations are hidden under "boards" at paths that have no relation to the corresponding defconfig name; I usually find them by grepping for "qemu-system-thingy" to see what they've got for that target).

Once you've got something booted under qemu, you can try to shoehorn in a mkroot.cpio.gz image as its filesystem here to make sure it'll work, or worry about that later. If you don't specify LINUX= then mkroot doesn't need to know anything special about the target, it just needs the relevant cross compiler to produce binaries. (The target-specific information is all kernel config and qemu invocation, not filesystem generation.)

Adding another toolchain to mcm-buildall.sh is its own step, of course. Sometimes it's just "target::" but some of them need suffixes and arguments. Usually "gcc -v" will give you the ./configure line used to create it, and you can compare with the musl-cross-make one and pick it apart from there.

The tricksy bit of adding LINUX= target support is making a microconfig. I should probably copy my old miniconfig.sh out of aboriginal linux into toybox's mkroot directory. That makes a miniconfig, which laboriously discovers the minimal list of symbols you'd need to switch on to turn "allnoconfig" into the desired config. (Meaning every symbol in the list is relevant and meaningful, unlike normal kernel config where 95% of them are set by defaults or dependencies.)

Due to the way the script works you give it a starting config in a name OTHER than .config (which it repeatedly overwrites by running tests to see if removing each line changes the output: the result is the list of lines that were actually needed). You also need to specify ARCH= the same way you do when running make menuconfig.

The other obnoxious thing is that current kernels do a zillion toolchain probes and save the results in the .config file, and it runs the probes again each time providing different results (so WHY DOES IT WRITE THEM INTO THE CONFIG FILE?) meaning if you don't specify CROSS_COMPILE lots of spurious changes happen between your .config file and the tests it's doing. (Sadly, as its development community ages into senescence, the linux kernel gets more complicated and brittle every release, and people like me who try to clean up the accumulating mess get a chorus of "harumph!" from the comfortable geezers wallowing in it...)

Then the third thing you do once you've got the mini.config digested is remove the symbols that are already set by the mkroot base config, which I do with a funky grep -v invocation, so altogether that's something like:

$ mv .config walrus
$ CROSS_COMPILE=/path/to/or1k-linux-musl- ARCH=openrisc ~/aboriginal/aboriginal/more/miniconfig.sh walrus
$ egrep -v "^CONFIG_($(grep -o 'BINFMT_ELF,[^ ]*' ~/toybox/mkroot/mkroot.sh | sed 's/,/|/g'))=y" mini.config | less
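To see what that last filter does without a kernel tree handy, here is the same idea on toy data. strip_base is a made-up helper name, and the symbol list in the usage example is invented (only BINFMT_ELF is known to start the real one in mkroot.sh):

```shell
# $1: comma separated list of symbols the base config already sets.
# stdin: miniconfig lines. stdout: the lines NOT covered by the base config.
strip_base()
{
  # Commas become regex alternation. The parentheses keep the ^CONFIG_ prefix
  # and the =y suffix attached to every alternative, not just the first and last.
  grep -E -v "^CONFIG_($(echo "$1" | sed 's/,/|/g'))=y"
}
```

For example, printf 'CONFIG_BINFMT_ELF=y\nCONFIG_SERIAL_8250=y\nCONFIG_TTY=y\n' | strip_base BINFMT_ELF,TTY leaves just the CONFIG_SERIAL_8250=y line.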

And THEN you pick through the resulting list of CONFIG_NAME= symbols to figure out which ones you need, often using menuconfig's forward slash search function to find the symbol and then navigating to it to read its help text. Almost always, you'll be throwing most of them away even from the digested miniconfig.

And THEN you turn the trimmed miniconfig into a microconfig by peeling off the CONFIG_ prefix and the =y from each line (but keep ="string" or =64 or similar), and combining the result on one line as a comma separated value list. And that's a microconfig.
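That conversion is mechanical enough to script. A minimal sketch (to_microconfig is a hypothetical name, and the symbols in the usage example are only examples):

```shell
# stdin: trimmed miniconfig, one CONFIG_NAME=value per line.
# stdout: the same symbols as a single comma separated microconfig line.
to_microconfig()
{
  # Keep only CONFIG_ lines, drop the prefix, drop a trailing =y but keep
  # ="string" or =64 style values, then join the lines with commas.
  grep '^CONFIG_' | sed -e 's/^CONFIG_//' -e 's/=y$//' | paste -sd, -
}
```

Feeding it CONFIG_SERIAL_8250=y, CONFIG_CMDLINE="console=ttyS0" and CONFIG_NR_CPUS=64 on separate lines gives SERIAL_8250,CMDLINE="console=ttyS0",NR_CPUS=64. A string value containing a comma would confuse anything splitting the result on commas, so those still need eyeballing.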

And THEN you need to check that the kernel has the appropriate support: enough memory, virtual network, virtual block device, battery backed up clock, and it can halt/reboot so qemu exits.


January 6, 2024

The amount of effort the toys/pending dhcpd server is putting in is ridiculous for what it accomplishes. Easier to write a new one than trim this down to something sane.

Easier != easy, of course.


January 5, 2024

I had indeed left the 256 gig sd card at Fade's apartment, which is what I wanted to use in the "real server". (I had a half-dozen 32 gig cards lying around, but once the OS is installed that's not enough space to build both the 32 bit and 64 bit hosted versions of all the cross compilers, let alone everything else. I want to build qemu, both sets of toolchains for all targets, mkroot with kernel for all targets, and set up some variant of regression test cron build. So big sd card.)

The orange pi OS setup remains stroppy: once I got the serial adapter hooked up to the right pins, there's a u-boot running on built-in flash somewhere, as in boot messages go by without the sd card inserted. Not hugely surprising since the hardware needs a rom equivalent: it's gotta run something first to talk to the SD card. (And this one's got all the magic config to do DRAM init and so on, which it chats about to serial while doing it. At 1.5 megabit it doesn't slow things down much.) Which means I'm strongly inclined to NOT build another u-boot from source and just use that u-boot to boot a kernel from the sd card. (If it's going to do something to trojan the board, it already did. But that seems a bit low level for non-targeted spyware? My level of paranoia for that is down closer to not REALLY trusting Dell's firmware, dreamhost's servers, or devuan's preprepared images. A keylogger doing identity theft seems unlikely to live THERE...)

Besides, trying to replace it smells way too bricky.

I _should_ be able to build the kernel from vanilla source, but "I have a device tree for this board" does not tell me what config symbols need to be enabled to build the DRIVERS used by that device tree. Kind of a large missing data conversion tool that, which is not Orange Pi's fault...

So anyway, I've copied the same old chinese debian image I do not trust (which has systemd) to the board, and I want to build qemu and the cross compilers and mkroot with Linux for all the targets on the nice BIG partition, and record this setup in checklist format. (In theory I could also set up a virtual arm64 debian image again and run it under qemu to produce the arm toolchains, but I have physical hardware sitting RIGHT THERE...)

I _think_ the sudo apt-get install list for the qemu build prerequisites is python3-venv ninja-build pkg-config libglib2.0-dev libpixman-1-dev libslirp-dev but it's the kind of thing I want to confirm by trying it, and the dhcp server in pending is being stroppy. I got it to work before...

Sigh. It's HARDWIRED to hand out a specific address range if you don't configure it. It doesn't look at what the interface is set for, so it's happy to try to hand out addresses that WILL NOT ROUTE. That's just sad.


January 2, 2024

I fly back to Minneapolis for more medical stuff on wednesday (doing what I can while still on the good insurance), which means I REALLY need to shut my laptop down for the memory swap and reinstall before flying out.

So of course I'm weaning mkroot off oneit, since writing (most of) a FAQ entry about why toybox hasn't got busybox's "cttyhack" command convinced me it could probably be done in the shell, something like trap "" CHLD; setsid /bin/sh <>/dev/$(sed '$s@.*/@@' /sys/class/tty/console/active) >&0 2>&1; reboot -f; sleep 5 presumably covers most of it.

But I'm testing mkroot to make sure reparent-to-init doesn't accumulate zombies and such. That's what the trap doing SIG_IGN on SIGCHLD is for: a zombie sticks around while its signal delivery is pending, presumably so the parent can attach to it and query more info, but if the parent doesn't know it's exited until the signal is delivered, and it goes away as soon as the signal IS delivered, I don't know how one would take advantage of that?

Anyway, I noticed that "ps" is not showing any processes, which is a thing I hit back on the turtle board, and it's because /proc/self/stat has 0 in the ttynr field, even though stdin got redirected. But stdout and stderr still point to /dev/console? Which means the kernel thinks we're not attached to a controlling tty, so of course it won't show processes attached to the current tty.

I vaguely remember looking at lash years ago (printed it out in a big binder and read it through on the bus before starting bbsh) and it was doing some magic fcntl or something to set controlling tty, but I'm in a systematic/deterministic bug squishing mood rather than "try that and see", so let's trace through the kernel code to work backwards to where this value comes from.

We start by looking at MY code to confirm I'm looking at the right thing. (It's worked fine on the host all along, but you never know if we just got lucky somehow.) So looking at my ps.c line 247, it says SLOT_ttynr is at array position 4 (it's the 5th entry in the enum but the numbering starts from zero), and function get_ps() is reading /proc/$PID/stat on line 749, skipping the first three oddball fields (the first one is the $PID we needed to put in the path to get here, the second is the (filename) and the third is a single character type field, everything after that is a space-separated decimal numeric field), and then line 764 is the loop that reads the rest into the array starting from SLOT_ppid which is entry 1 back in the enum on line 245. This means we started reading the 4th entry (if we started counting at 1) into array position 1 (which started counting at 0), so array position 4-1=3, and 4+3 is entry 7 out of the stat field table in the kernel documentation. (In theory we COULD overwrite this later in get_ps(), but it only recycles unused fields and this is one we care about.)
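The same field counting can be checked outside ps with a few lines of Python. This is a standalone sketch, not a translation of what ps.c does: the function name is invented for the example, and the sample line is hand-written rather than captured from a real system.

```python
def stat_tty_nr(stat_line):
    """Return field 7 (tty_nr) from the text of a /proc/$PID/stat file."""
    # The (filename) field may itself contain spaces or parentheses, so split
    # after the LAST ")" instead of counting spaces from the start of the line.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (the one character state), so field 7 is index 7-3=4.
    return int(rest[4])


# Hand-written sample: pid, (filename), state, ppid, pgrp, session, tty_nr...
sample = "1234 (my (odd) name) S 1 1234 1234 34816 1234 4194304"
print(stat_tty_nr(sample))
```

On a live Linux system, feeding it open("/proc/self/stat").read() gives the value the kernel reports for the calling process, with 0 meaning no controlling tty.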

The kernel documentation has bit-rotted since I last checked it. They converted proc.txt to rst (to make the git log/annotate history harder to parse), and in the process the index up top still says "1.8 Miscellaneous kernel statistics in /proc/stat" but if you search for "1[.]8" you get "1.8 Ext4 file system parameters". Which should not be IN the proc docs anyway, that should be in some sort of ext4 file? (Proc is displaying it, but ext4 is providing it.)

I _think_ what I want is "Table 1-2: Contents of the status fields (as of 4.19)" (currently line 236), but right before that it shows /proc/self/status which _looks_ like a longer version of the same info one per line with human readable field descriptions added... except it's not. That list skips Ngid, and if you look at the current kernel output it's inserted "Umask" in second place. So "which label goes with which entry offset" is lost, they gratuitously made more work for everyone by being incompatible. That's modern linux-kernel for you, an elegant solution to making the kernel somewhat self-documenting is right there, and instead they step in gratuitous complexity because "oops, all bureaucrats" drove away every hobbyist who might point that out. Anyway, table 1-2 is once again NOT the right one (it hasn't even GOT a tty entry!), table 1-4 on line 328 is ("as of 2.6.30-rc7", which came out May 23, 2009 so that note is 15 years old, lovely), and the 7th entry in that is indeed tty_nr! So that's nice. (Seriously, when Greg finally pushes Linus out this project is just going to CRUMBLE TO DUST.)

Now to find where the "stat" entry is generated under fs/proc in the kernel source. Unfortunately, there's not just /proc/self/stat, there's /proc/stat and /proc/self/net/stat so grep '"stat"' fs/proc/*.c produces 5 hits (yes single quotes around the double quotes, I'm looking for the string constant), but it looks like the one we want is in base.c connecting to proc_tid_stat (as opposed to the one above it connecting to proc_tgid_stat which is probably /proc/$PID/task/$PID/stat). Of course neither of those functions are in fs/proc/base.c, they're in fs/proc/array.c right next to each other where each calls do_task_stat() with the last argument being a 0 for the tid version and a 1 for the tgid version. The do_task_stat() function is in that same file, and THAT starts constructing the output line into its buffer on line 581. seq_put_decimal_ll(m, " ", tty_nr); is the NINTH output, not the seventh, but seq_puts(m, " ("); and seq_puts(m, ") "); just wrap the truncated executable name field, and subtracting those two makes tty_nr entry 7. So yes, we're looking at the right thing.

So where does tty_nr come from? It's a local set earlier in the function via tty_nr = new_encode_dev(tty_devnum(sig->tty)); (inside an if (sig->tty) right after struct signal_struct *sig = task->signal;) which is _probably_ two uninteresting wrapper functions: new_encode_dev() is an inline from include/linux/kdev_t.h that shuffles bits around (because major:minor are no longer 8 bits each, but when they expanded both, minor wound up straddling major to avoid changing existing values that fit within the old ranges). And tty_devnum() is in drivers/tty/tty_io.c doing return MKDEV(tty->driver->major, tty->driver->minor_start) + tty->index; for whatever that's worth. But really, I think we care that it's been set, meaning the pointer isn't NULL.
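For reference, here's the bit shuffling as a Python sketch. The two function names match the kernel inlines, but the bodies are my transcription of the layout (low 8 bits of minor, then 12 bits of major, then the rest of minor on top), so treat it as an illustration rather than a copy of kdev_t.h.

```python
def new_encode_dev(major, minor):
    """Pack major:minor the way the tty_nr field in /proc/$PID/stat shows it."""
    # Low 8 bits of minor stay put, major sits in bits 8-19, and the high
    # bits of minor land above major so old 8:8 values are unchanged.
    return (minor & 0xff) | (major << 8) | ((minor & ~0xff) << 12)


def new_decode_dev(dev):
    """Split an encoded device number back into (major, minor)."""
    major = (dev & 0xfff00) >> 8
    minor = (dev & 0xff) | ((dev >> 12) & 0xfff00)
    return major, minor


# 4:64 is the traditional /dev/ttyS0, which still fits the old 8:8 layout.
print(new_encode_dev(4, 64))
# A minor above 255 straddles the major and still round trips.
print(new_decode_dev(new_encode_dev(136, 300)))
```

So a nonzero tty_nr from the stat file can be turned back into the major:minor pair you'd see in ls -l /dev.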

So: where does task->signal->tty get set? I did grep 'signal->tty = ' * -r because the * skips the hidden directories, so it doesn't waste a bunch of time grinding through gigabytes of .git/objects. There's no guarantee that's what the assignment looks like, but it's a reasonable first guess, and finds 4 hits: 1 in kernel/fork.c and three in drivers/tty/tty_jobctrl.c. The fork() one is just copying the parent process's status. The assignment in proc_clear_tty() sets it to NULL, which is getting warmer. A function called __proc_set_tty() looks promising, and the other assignment is tty_signal_session_leader() again setting it to NULL. (Some kind of error handling path?)

So __proc_set_tty() is the unlocked function, called from two places (both in this same file): tty_open_proc_set_tty() and proc_set_tty() (a wrapper that just puts locking around it). The second is called from tiocsctty(), which is a static function called from tty_jobctrl_ioctl() in case TIOCSCTTY which means this (can be) set by an ioctl.

Grepping my code for TIOCSCTTY it looks like that ioctl is getting called in openvt.c, getty.c, and init.c. The latter two of which are in pending.

The main reason I haven't cleaned up and promoted getty is I've never been entirely sure when/where I would need it. (My embedded systems have mostly gotten along fine without it.) And it's STILL doing too much: the codepath that calls the ioctl is also unavoidably opening a new fd to the tty, but I already opened the new console and dup()'d it to stdout and stderr in the shell script snippet. The openvt.c plumbing is just doing setsid(); ioctl(0, TIOCSCTTY, 0); which is a lot closer to what I need, except I already called setsid myself too. Ooh, the man page for that says there's a setsid -c option! Which didn't come up here because it's tcsetpgrp(), which in musl is a wrapper around ioctl(fd, TIOCSPGRP, &pgrp_int); Which in the kernel is back in drivers/tty/tty_jobctrl.c and tty_jobctrl_ioctl() dispatches it to tiocspgrp() which does if (!current->signal->tty) retval = -ENOTTY; so that would fail here. And it's setting a second field, which seems to depend on this field.
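The setsid(); ioctl(0, TIOCSCTTY, 0); pair is small enough to sketch in Python using the standard fcntl and termios modules. The function name is made up for this example, and it assumes Linux and an fd that's already open on the terminal you want. It's an illustration of the call sequence, not a claim about what any toybox command does.

```python
import fcntl
import os
import termios


def claim_controlling_tty(fd=0):
    """Try to make the terminal open on fd this process's controlling tty."""
    try:
        # Become a session leader, since only a session leader without a
        # controlling tty can claim one.
        os.setsid()
    except PermissionError:
        # setsid() fails with EPERM when we already lead a process group,
        # which includes the case where setsid was already called for us.
        pass
    # Third argument 0 means "don't steal it from another session".
    fcntl.ioctl(fd, termios.TIOCSCTTY, 0)
```

If the ioctl fails (fd isn't a tty, or the process isn't a session leader) it raises OSError, so a caller can report the errno instead of silently carrying on with tty_nr still 0.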

TWO fields. Um. Ok, a non-raw controlling tty does signal delivery, when you hit ctrl-C or ctrl-Z. Presumably, this is the process (group?) the signal gets delivered TO?

Ah, man 4 tty_ioctl. Settling in for more reading. (I studied this EXTENSIVELY right when I was starting writing my own shell... in 2006. And I didn't really get to the end of it, just... deep therein.)

My real question here is "what tool(s) should be doing what?" Is it appropriate for toysh to do this for login shells? Fix up setsid -c to do both ioctl() types? Do I need to promote getty as "the right way" to do this?

I don't like getty, it SMELLS obsolete: half of what it does is set serial port parameters, which there are separate tools for (stty, and why stty can't select a controlling tty for this process I dunno). Way back when you had to manually do IRQ assignments depending on how you'd set the jumpers on your ISA card, and there was a separate "setserial" command for that nonsense rather than putting it in getty or stty. There's tune2fs, and hdparm, and various tools to mess with pieces of hardware below the usual "char or block device" abstractions.

But getty wants to know about baud rate and 8N1 and software flow control for historical reasons, and I'm going... This could be netconsole or frame buffer, and even if it ISN'T, the bootloader set it up already (or it's virtual, or a USB device that ACTS like a serial port but isn't really, or hardware like "uartlite" that's hardwired to a specific speed, so those knobs spin without doing anything) and you should LEAVE IT ALONE.

