Rob's Blog


December 31, 2019

Still receiving replies from yesterday's kernel thread. Still deleting them unread. As the saying goes, "Not my circus, not my monkeys". (I'm making good progress on toysh, and I fly to Japan on the 7th so there's a deadline to down tools on that for a while. Trying hard to stay focused on that.)

Speaking of drama, sadly the linuxconf.au "Why we're doing J-core" talk _isn't_ happening. Here's the emails that went back and forth about that, but really it's my bad for reading "miniconfs have no budget for tickets" on the CFP page as "you won't be able to attend the rest of the conference" instead of "you need to pay to speak, this is not optional". I should have clarified that _way_ earlier, but "negative honorarium" didn't seem plausible at the time, and we didn't go back and reread it until way late, then some people reacted badly to the clarification.

My first email:

We're scheduling flights and wondering how much of the rest of the conference we should attend. (We're probably going to leave thursday to spend friday in singapore whatever happens, but it could be thursday end of day, attending 2 days of conference proper.)

My question is tickets: do we need to buy tickets to attend _just_ the miniconf and not the conference proper? If so is it the same price ticket? (The CFP page said miniconfs don't have the budget to comp tickets, and I've never gone to a conference I have to pay to speak at. I don't know the budget procedure. It's not the same as for travel. I'd just as soon avoid the paperwork, and if flying out wednesday morning is one less form to fill out...)

The second email I sent:

> That is to say, if you're attending or presenting at the miniconf you either
> need a registration for the main conference (hobbyist is fine) or a miniconf
> pass which is AUD99/day

So, attend monday and fly out tuesday morning then.

> LCA itself is a pretty thoroughly technical event and I'd venture even with your
> obviously deep technical chops you'd not only enjoy it but get a good amount out
> of it and see plenty of familiar faces.

I spoke there in 2017, it was nice. (Sadly, I was so amazingly jetlagged I'm not happy with the talk I gave, hence my plan to fly to the Japan office first several days ahead of time and get the jetlag out of the way there.)

But since I spoke at the main event in 2017, ticket prices never came up, and unfortunately after poking the accounting department about conference tickets they seem to think paying _to_ speak means this conference is a scam (something about Yog's Law) and they canceled the _travel_ funding. (Presumably because we've never asked them to pay for speaker tickets at an engineering conference before. Neither Jeff nor I have ever previously paid _to_ speak.)

I'm trying to convince them to reinstate the travel funding, but Jeff and I would definitely be paying for tickets out of our own pockets. (The accountants and lawyers partially answer to investor-appointed board members and it all turns into corporate politics fast, not my area...)

> I've heard many an attendee rank it up
> with ELC, Linux Plumbers, FOSDEM etc.

Yes, because Linus likes to snorkel so it was the only conference he kept coming back to regularly, so everybody _else_ convened there. (That's also why the Linux Foundation decided to hold Plumber's in his home town, to make it easier for Linus to show up.) And ever since the W administration had some idiot arrest a guy on stage for talking about PDF passwords, half the overseas guests understandably won't set foot in the US, and the last Ottawa Linux Symposium was 2008 (I still have the shirt). That's why ELC has a European fork, to get those guys.

> I'm sorry we don't have budget available to comp the passes or full conference
> registration.

Understood, you didn't set the policy.

That said, if it seriously costs the convention $100/day for us to set foot in the building, such that they'd need money set aside to pay for the air we breathe or some such, it sounds like we can save you money by not coming.

(But then at our end $5k in travel and lodging expenses went past without a hitch, until 1/5 as much in tickets got added and it all went "boing".)

> We have however been able to arrange the earlybird discounted
> rate if you do want to attend more of the conf.

I don't think that's on the table at this point. I'm willing to personally cough up for a day pass, and Jeff might be too, but if so we'd fly out again the next day. You've priced yourself entirely out of the "hobbyist" market.

I note I was overworked enough in 2017 I burned out and stopped submitting talk proposals for a couple years, which meant I didn't attend any either. I ended that hiatus in August and even proposed a talk for OLS in the airport on the way back.

But the O'Reilly conference and Texas LinuxFest and such pass through my home town of Austin all the time, and I've been coincidentally in town for things like Plumbers more than once, and not gone. Even when travel and lodging are literally free I've never gone if I'm not speaking because the tickets aren't worth it. (Speaking of which, ELC is in Austin this summer, I should really submit a talk proposal to that. It would be nice to attend, but I'm not paying 3 figures for a ticket.)

The two conventions I founded (Penguicon and Linucon) each charged $45 for the three day weekend pass because they were hobbyist events bartering a room block for function space (the way science fiction conventions have been doing for a century now). There's been enough inflation since it might be more like $65 now, I haven't had time to do that stuff this decade so my numbers are a bit out of date. But this is presumably why actual hobbyist events like ohiolinux.org (which the Linux Foundation hasn't turned into profit centers) take place in a hotel.

So when I see $500 ticket prices from an organization that has four tiers of corporate sponsorships taking place in a convention center... I sigh and shake my head.

> Look forward to seeing you week after next,

I hope we can make it work too, but it sounds like it'll just be one day if it happens. Let us know if that should be monday or tuesday...

And then today I sent:
> The time has come for you both to create separate logins and register
> separately. We can easily accommodate supporting speakers in our
> system.
>
> I've cc'd the miniconf organisers just so everyone is on the same page.
>
> If you have any dramas, let me know.

Alas, there was drama. It doesn't look like we're coming.

We can record a talk and send you a video if you'd like, and/or skype in. But the travel funding didn't come back after "pay to talk" convinced the accountants/investors the conference is a scam, and Jeff doesn't want to spend political capital to override it, or something? (I think he got the funds repurposed to expand an upcoming ASIC shuttle tapeout instead.)

Sorry about that. I told them what it said on the CFP page at the time, but I don't think they'd noticed until it came time to cut a check...

And finally:
> I reached out to the conference director, Joel yesterday to see if something
> can be worked out with respect to the registration, yet to hear back.

That's not the problem at this point. My understanding (I'm just an engineer) is _asking_ for the registration money made the transportation money go away, and then it became political and Jeff decided not to spend political capital on explaining to investors that the people they appointed and who report to them are wrong, but instead negotiated for the money to go towards an engineering project rather than back into the travel budget.

Sorry, this is our screw-up, not yours. You stated the terms clearly on your CFP, that info didn't get propagated widely enough here because nobody else expected that. (None of us have ever paid to speak at a technical conference before.*)

Rob

* Ok, I bought a pass to Penguicon and Linucon every year I attended, but I founded both conventions, it was $45 for the weekend, and the last time I was there was 11 years ago. Given how much I spent on adding new things to it each year I attended, that was kind of a rounding error, but I never had to get budget approval since it was all my money.


December 30, 2019

Blah, what's accumulating on the todo list while I focus on toysh: "echo hello > a; echo also > b; mkdir sub; install -m 644 -Dv a b sub/" should NOT be calling strip on a and b. I have an selinux test case now and it looks like ls -Z is broken.

Is "exec" one of the MAYFORK-likes that take environment variable preassignment? Yes it is. I wonder what happens when you pipe the output of exec? It looks like the exec is ignored and it just runs the command? (Maybe it takes out an implicit layer of subshell, but not in a way I can find a behavior difference for?)

Fiddling with <( ) and that turns into /dev/fd/63 and such, which implies that /dev has an fd -> /proc/self/fd symlink, since the commands it runs aren't in the shell and thus the shell can't interpret that path specially for it. This says that mkroot needs a little more devtmpfs setup plumbing. (I'd submit a patch upstream to devtmpfs in the kernel, but why bother? Yes, I still have a patch to apply the devtmpfs automount config option to initramfs. Yes it's still out of kernel because three strikes.)

Ugh, and half of mkroot's current /dev is useless /dev/tty17 entries which are for VGA TEXT CONSOLE (which does not apply here, and is not in the kconfig I'm feeding it), so why... because it's hardwired on in menuconfig. And there is no obvious reason it is hardwired on. Great. (I'm pretty sure lkml peaked somewhere around 2005. Like java, it went through a period of "you're adding a mix of good stuff and some outright crap" that made it hard to say whether new or old versions were better, and now it seems to be in full flaming crap mode. There is NO EXCUSE for kconfig to be turing complete. That's not what it's FOR.)

Gee, why does no one in the embedded community participate in linux-kernel and explain how their needs differ from non-embedded users? It's a mystery for the ages.


December 29, 2019

Bash is inconsistent about single command environment variable definitions applied to builtins: "ABC=DEF command ls -l" works, but "ABC=DEF time ls -l" doesn't. There's no obvious pattern to it? (It's... _sort_ of a subshell? Do I need a TOYBOX_MAYSUBSHELL? Ah, no this _is_ MAYFORK.) Modulo the "command" builtin shouldn't be installed as a symlink, but all the things that _can_ run as their own PID (or in the current PID) should be able to accept environment variable pre-assignment. (Does that include "exec"? Yes it does. I wonder what happens when you pipe the output of exec? It looks like the exec is ignored and it just runs it.) And when you assign variables to them, they get their own PID. Although I could do it in the current PID too since I can save/swap the *environ pointer. And also, time acts differently as a builtin:

$ ./time true | sleep 1
real 0.001025
user 0.000000
sys 0.000000
$ time true | sleep 1
real	0m1.004s
user	0m0.000s
sys	0m0.000s

WOW do blockquote/pre blocks look terrible on my phone. It chooses a tiny illegible font way out of step with everything else around it. Looks fine on chrome on linux, this is some sort of android-specific insanity? Perhaps the "large font" I chose in the accessibility settings doesn't seem to apply to pre blocks? No reason for it _not_ to. I keep things under 80 characters in pre blocks and the font it's shrinking down <pre> blocks to looks like it could do maybe 200 chars across before I'd have to scroll the screen. That's INSANE. Oh well.

I need to make "ls -l /proc/$$/fd" work pretty soon in order to check for filehandle leaks (basic environment variable resolution, which is a can of worms with unquoting and such mixed in there, and if you don't want to realloc() every character it requires design and plumbing), and I need to actually implement "abc=def env" because the plumbing I have is enumerating the assignments... and then not performing them if it's got a command. Which I hadn't implemented yet because it's more per-process context that either has to be done after the vfork() or has to be done and then undone again in the parent process.

The second seems the obvious way to do it, because the heap's shared between parent and child after vfork(), so it would have to be undone in the parent anyway. I think what I need to do is memdup() a temp copy of environ[] (just the pointer array, not the string contents), search each variable and replace in situ if we find it, else append it to the array if we don't. This does NOT free the old variable, and doesn't copy the new one's memory (it's the TAKE_MEM logic: key=value is already evaluated, just add the pointer to the array).

Ok: temp environ, if (getenv()) replace in situ else append. Don't free old, TAKE_MEM new, then free temp environ and replace with original after exec.
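
Roughly what that plan could look like in C. This is a sketch under assumptions, not the actual toysh code: env_with() is a hypothetical helper name, it handles one already-evaluated key=value string at a time, and it copies only the pointer array so the old string is neither freed nor duplicated.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not toybox code): return a malloc()ed copy of the
// env[] pointer array with the "key=value" string kv swapped in.
// Strings are shared, not copied: the old entry isn't freed, kv isn't
// duplicated. Assumes kv contains an '='.
char **env_with(char **env, char *kv)
{
  size_t count = 0, keylen = strcspn(kv, "=")+1, i;  // keylen includes '='
  char **tmp;

  while (env[count]) count++;

  // Room for every existing pointer, one appended entry, and the NULL.
  tmp = malloc((count+2)*sizeof(char *));
  if (!tmp) return 0;
  memcpy(tmp, env, (count+1)*sizeof(char *));

  // Replace in situ if the key exists, else i ends up at the old NULL slot.
  for (i = 0; i<count; i++) if (!strncmp(tmp[i], kv, keylen)) break;
  tmp[i] = kv;
  if (i == count) tmp[count+1] = 0;

  return tmp;
}
```

The caller would point environ at the returned array for the child and free() just the array afterwards, leaving the original environ[] untouched in the parent.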


December 28, 2019

The world continues to have good news, and it looks like GOP's end of life rally (or possibly extinction burst) is finally coming to a head and people are noticing. We'll see. Unfortunately terminal senility is what Boomers have always really respected. Their ideal candidate is a doddering old white man who literally can't remember ever having been contradicted (or his wife's name) repeating the same old stories, over and over, about how much better everything was for him back in the day. Sad but predictable.

Ok, I can vfork() most of the toysh plumbing, but without proper fork() subshells and job control become a nightmare, and without implicit subshells around pipelines a lot of CTRL-Z suspending ain't happening. (Although I doubt most people have noticed that if you "read i" from an interactive command line, you can't CTRL-Z it. Same for "sleep 30 & wait": CTRL-C yes, CTRL-Z no.)

I have a theory about how to make subshells work with nommu, but it involves marshalling data through a pipe and re-parsing it in the new process in a way that's hideously expensive and thus I dowanna do for the common case of _pipelines_. Pipelines have an implicit subshell around them so "X=42 | true; echo $X" doesn't get the 42, and _that's_ why "zcat blah.gz | while read i; do thingy; done" is suspendable with ctrl-z: you're suspending (and potentially backgrounding) the subshell process. (Within which the read isn't suspendable because it ain't got a PID, but to the parent shell the whole subshell is suspendable because it does.)

I think what I need to do is initially implement subshells with fork(), simply not support them on nommu yet, and handle pipelines on nommu without the subshell. (Meaning "{ sleep 10; echo abc;} | read i | echo hello" wouldn't print hello immediately because it wouldn't launch the second echo until the read i returns. But given I'm not even sure how to make an automated test for that, is it likely to cause a problem?) And then I can do the expensive pipe marshalling solution later.
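
The reason an assignment inside a pipeline's implicit subshell never reaches the parent can be shown in a few lines of C. This is only an illustration with a made-up function name (assignment_leaks), using fork() to stand in for the subshell: the child changes its own copy of the environment, and the parent's copy stays as it was.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Set name=value inside a forked child (standing in for an implicit
// subshell), then check whether the parent sees that value afterwards.
// Returns 1 if the parent sees it, 0 if not, -1 on error.
int assignment_leaks(char *name, char *value)
{
  int status;
  char *seen;
  pid_t pid = fork();

  if (pid == -1) return -1;
  if (!pid) {
    // Child: the assignment lands in the child's own environment.
    _exit(setenv(name, value, 1) ? 1 : 0);
  }
  if (waitpid(pid, &status, 0) != pid) return -1;
  if (!WIFEXITED(status) || WEXITSTATUS(status)) return -1;

  // Parent: its environment was never touched by the child.
  seen = getenv(name);

  return seen && !strcmp(seen, value);
}
```

With the variable unset beforehand this returns 0, which is the same effect as "X=42 | true; echo $X" printing a blank line.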


December 27, 2019

So the NEXT chunk of toysh code I have to frog ("rip it, rip it". There's also tinking) is for "zcat file | while read i; do echo $i; done" because that pipeline takes a type 0 block and pipes it into a type 1 block.

Backing up: my shell parsing annotates pipeline segments with types: type 0 is a normal executable statement, type 1 is the start of a multiline flow control statement (the "while" statement in the above block), type 2 is a "gearshift" between the test and the body (the "do" above), and type 3 is the end of block (the "done" above). If you change the if (0) dump_state() in sh_main() to an if (1), it'll print the parsing state:

$ zcat file | while read i; do echo $i; done
arg[0][0]=zcat
arg[0][1]=file
type=0 term[0]=|
arg[1][0]=while
type=1 term[1]=(null)
arg[2][0]=read
arg[2][1]=i
type=0 term[2]=(null)
arg[3][0]=do
type=2 term[3]=(null)
arg[4][0]=echo
arg[4][1]=$i
type=0 term[4]=(null)
arg[5][0]=done
type=3 term[5]=(null)

I.E: the above line parses into 6 "segments": a type 0 segment with "zcat" and "file", a type 1 "while", a type 0 "read" "i", a type 2 "do", a type 0 "echo" "$i", and a type 3 "done". The first segment ends with "|" and all the rest have no special ending (either a semicolon or a newline).

When traversing the linked list of these segments, each type 1 has a corresponding type 3 ending it, and usually a type 2 in the middle, so it's easy to search forward and backward to match them up. The function block_end(pl) finds the end of a pipeline segment's current block (skipping nested blocks). The run_function() plumbing traverses the blocks and handles the flow control statements. The run_command() function takes an individual type 0 segment (more or less a parsed argc/argv pair) and calls expand_redir() to parse out the environment variable assignments and redirections to create a _new_ argv/argc with the cleaned command line arguments after environment variable substitution. Sometimes there's nothing else to do, such as when you just have an environment variable assignment, or just a redirection on its own line creating an empty file, or you run the line "$ABC" and that environment variable's empty. But when it _does_ return a nonzero argv, it then runs it either as a builtin or via vfork/exec(), and returns a struct sh_process * with the ->pid field set to the PID it launched (or 0 if it was a builtin that already ran, in which case ->exit is already set to its rc).
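
The "match each type 1 with its type 3, skipping nested blocks" search can be sketched with a depth counter. This is an illustration, not the real block_end(): struct seg is a hypothetical cut-down segment with only a next pointer and a type, and find_block_end() is assumed to be handed the type 1 segment that opens the block.

```c
#include <stddef.h>

// Hypothetical cut-down segment: the real toysh structure carries far more.
struct seg {
  struct seg *next;
  int type;      // 0 command, 1 block start, 2 gearshift, 3 block end
};

// Given the type 1 segment that opens a block, return the type 3 segment
// that closes it, skipping over nested blocks. Returns NULL if the list
// runs out before the block is closed.
struct seg *find_block_end(struct seg *pl)
{
  int depth = 0;

  for (; pl; pl = pl->next) {
    if (pl->type == 1) depth++;                         // entering a block
    else if (pl->type == 3 && !--depth) return pl;      // left the outermost
  }

  return NULL;
}
```

For the six segments dumped above (types 0, 1, 0, 2, 0, 3), starting at the "while" segment lands on the "done" segment.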

So, backing up. The problem is when I wrote this, I was thinking in terms of "traverse the type 0 segments, which may be piped together", because you can't pipe across a transition (both pipes in "if true | then echo hello | fi" are illegally placed and should throw syntax errors) which _also_ means you can't drop && and || tests there either. But you CAN pipe into and out of blocks: type 0 to 1 is legal, 3 to 0 is legal, and 3 to 1 is legal. The command line at the start up there is 0 to 1, and 3 to 0 looks like "if true; then echo hello; fi | cat", and 3 to 1 would be "for i in one two three; do echo $i; done | while read j; do echo $j; done". And of course all this can nest arbitrarily deep. And anywhere a pipe can go, you can have && and || and |& and probably & too. (I _still_ don't understand why "break &" is allowed, but then I don't understand why "echo | break" is allowed either. Did you know that in the Defective Annoying SHell you can just type "break" at a normal command prompt and it isn't an error? Seems to be a NOP. Return code 0.)

So all the pipe logic has to go in the type 0 block _and_ the type 3 block, and feed into a type 0 or type 1 following block. (Except I guess it feeds into whatever the following block is without caring, because the parsing to assemble the list caught the syntax errors for "statements in wrong order or not matching up" already?) But the problem is the _undo_ information for blocks attaches to the type 3 segment at the end of the block (which is already where trailing block redirects live, ala "( echo hello ) > file": note that "(" is a type 1 block and ")" is a type 3 block). But the undo information for type 0 is local and the undo happens at the start of the next command. (It inherits a modified stdin and has to undo it right after the exec.) Aha, wait, I already fixed that. The "struct blockstack" instance is where this lives, and that spans the whole (currently active) block: type 1 setup needs to record the current pipe-in state in there, and then block_end() to get the type 3 data and handle the redirects and pipe-out attached to that.

Hmmm... pipes out of type 3 are the same as type 3 redirects: seek ahead from type 1, parse them immediately and apply them to the whole block until you exit the block. (Said exit could be via "break" or "return", which need cleanup code. There's probably also "trap" contexts and such that need to shift but again, it seems like it should apply to the block.) And yes both need to happen at the start of the block, because they apply even to the test part before the type 2 gearshift:

$ if echo hello; then true; fi | tr l x
hexxo

It's interesting the little idiosyncrasies of programming style you develop doing this long enough. Back using turbo C under dos I always used to write "tempomat.c" as my temp file name (didn't have cut and paste with mouse so you'd write chunks to a file and read them back in elsewhere). These days I use "hnork" as a temp file name I'm 90% sure isn't there (and when it is it can always be deleted because that's what it _means_ to me), and in the past week or two I've started sticking "murgle" into code when I'm leaving off in the middle of a thought and about to jump elsewhere in the same file to look for something, and then I can search for that to get back where I was. I suppose on a larger scale this is how language evolves...


December 26, 2019

Longish call with Jeff this morning on work strategy for the new year. So much to do, and not enough people to do it, and not _nearly_ enough money to hire more people yet. (Just turtle testing: making a proper automated manufacturing test harness so they can test them at the factory requires a $10k part to get _started_, and testing them ourselves by hand is probably 20 hours to get through 500. We can start out with manual testing for the first batch and spring for the big test harness if we sell through that batch quickly, which would eat up pretty much all the profits from the first batch.)

Toysh is now to the point where I'm developing by adding tests to the test suite and fixing what breaks, except that small tests can result in very large development ratholes. In theory "abc=def env" can test variable assignment without needing to use the variable resolution plumbing I haven't done yet (I should just implement $ABC without opening the ${a:3:7} can of worms, but then I _should_ just add all the tests it passes and go back for the others in a second pass. I'm not good at avoiding tangents.)

The problem with that "simple" test is env dumps all the environment variables (an unconstrained set) so I piped the output to grep '^abc=' and of course I haven't implemented the pipe stuff yet... so I started. And it's its own can of worms.

Each pipeline segment except the last is backgrounded, you only wait for the _last_ pipeline segment. The last segment is kind of implicitly subshelled, as in "echo | read i" can be suspended with ctrl-z (which directly run builtins can't). Back in my first pass at shell stuff ages ago I dug into those corner cases while trying to understand job control. (I think I wound up reading the kernel source code to figure out what setsid() and tcsetpgrp() and friends actually do.)

One of the fiddly things about testing stuff with strace is that "strace true |& wc" is 25 lines. That's because it's dynamically linked against glibc; static toybox true under musl is 5 lines, only 2 of which are actually unnecessary. (The first is the exec line, the last 2 are the exit syscall and an "exited with 0" report from strace. The 2 unnecessary ones musl does are thread setup syscalls for a program with no threads. Rich does have his idiosyncrasies.)

Meanwhile my recent "how do I detach from a tty so bash stops finding it" led to me redoing bits of the setsid plumbing which turned into a _weird_ corner case: "setsid -cw echo hello" hangs. If I stick printfs into it, the last output is right before the tcsetpgrp(0, getpid()); It's not a BIG hang, ctrl-C gets my command prompt back, but if I strace it, I get my command prompt back immediately. So the hang ISN'T happening under strace!

Without strace, the hung process is there (stopped in T state) and sudo cat /proc/$PID/stack says it went through entry_SYSCALL_64_after_swapgs, do_syscall_64, exit_to_usermode_loop, vfs_write do_vfs_ioctl, do_signal get_signal, and ended up in do_signal_stop. And that's where it's hung: it tried to do the syscall and then hit some sort of signal check? (I sent it a SIGCONT to no effect.)

Alas I'm testing it on devuan and _not_ under a VM (if setsid as a normal user panics the kernel we have bigger issues), so my logical next step of "stick printk()s into the kernel" requires some retrenching and hoping the problem reproduces in the new environment. Which would be yet _another_ tangent from a tangent from a tangent... Aha! Nope, figured it out. What I'm trying to do boils down to:

if (!vfork()) {
  setpgid(0,0);
  tcsetpgrp(0, getpid()); // <- this never returns
}

And this could be vfork() weirdness: maybe it's waiting for the hung parent process to release the tty. In which case I can move the setpgid() before the vfork() and... Boom, it works. (The switch to vfork() caused a regression I didn't catch at the time.)


December 25, 2019

Christmas! So much food. All the food. (Fuzzy spatchcocked a capon, and baked a pecan pie, and HEB clearanced the bakery section yesterday. The rosemary bread makes _excellent_ capon sandwiches.)

Github seems like one of those services without an obvious archive to read through, so when somebody asks for an infodump maybe 3 people will ever read it unless it's pointed out. (Maybe I'm wrong about that? Mailing list posts can also get buried, but at least a web archive keeps them all in one place if you _do_ want to read through.)

I finally broke down and tested whether FD_CLOEXEC does what I want it to, and it does! (It doesn't survive dup2(). I can't dup3() because you have to #define GNU_GNU_GNU_ALL_HAIL_STALLMAN in order for the function prototype to show up in the headers, which I refuse to do on principle because Linux is not and never was part of the GNU project, nor is toybox. Linux forked off of the Minix development community, which was created to replace the Lions book after AT&T closed the source in 1983. Both were independent reactions to Apple vs Franklin extending copyright law to cover binaries, and the long legal path to Free BSD was a third independent reaction. And the first complete clone of Unix AT&T verified it had no copyright claim on was Coherent in 1980, 3 years before the "gnu manifesto", so he wasn't even proposing anything novel. What the FSF was good at was marketing itself and claiming credit for other people's work: case in point dup3 is a Linux system call that glibc is literally requiring you to "#define GNU" in order to access.)

Anyway, so dup3() isn't worth it but I can just fcntl() the dup2 result and it works, which lets me rip out half the unwind logic. (The other half's still necessary, using the hfd to put the old filehandle _back_. The thing I just established is that when I do that, the CLOEXEC annotation on the hfd doesn't persist on the replaced low filehandle.)
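
A small sketch of that check and workaround, using only plain POSIX calls. The helper names is_cloexec() and dup2_cloexec() are made up for illustration and are not toybox functions.

```c
#include <fcntl.h>
#include <unistd.h>

// Returns 1 if FD_CLOEXEC is set on fd, 0 if clear, -1 on error.
int is_cloexec(int fd)
{
  int flags = fcntl(fd, F_GETFD);

  return flags == -1 ? -1 : !!(flags & FD_CLOEXEC);
}

// dup2() and then mark the copy close-on-exec with fcntl(), avoiding
// dup3(). Returns the new fd or -1 on error. Assumes from != to (dup2()
// on identical fds is a no-op that leaves the flags alone).
int dup2_cloexec(int from, int to)
{
  if (dup2(from, to) == -1) return -1;
  if (fcntl(to, F_SETFD, FD_CLOEXEC) == -1) return -1;

  return to;
}
```

A plain dup2() from a close-on-exec filehandle yields a copy where is_cloexec() reports 0, which is the behavior the unwind logic relies on when putting the old filehandle back.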

I have no idea what "while true; do echo hello | break; done" is supposed to do, or why in bash you usually have to ctrl-C out of it multiple times. (I'm implementing pipeline support, and you can have flow control statements in a pipeline? Except they don't work?)


December 24, 2019

Ooh, sodium battery news! (Yes, new since last time.)

A battery is basically two electrodes with a solvent and insulator in between. The solvent lets the ions cross the gap but the electrons have to go the long way around through the wire. A big problem rechargeable batteries have to solve is if an electrode is _just_ your ionizing material, it'll dissolve completely away when the battery fully discharges and when you try to recharge it tends to precipitate out of solution all over the place, instead of going back where you want it to. That's why completely discharging lead/acid batteries two or three times will kill them. Another fun problem is if you _do_ get pure metal to recrystallize where you think it should, lithium or zinc like to form "dendrites", I.E. long stabby crystal whiskers that bridge the gap between anode and cathode and short out your battery (sometimes physically stabbing through an insulating membrane). That's why non-rechargeable batteries tend to explode when recharged.

So rechargeable batteries need insoluble cathode and anode substrates to hold the ions. The chemistry of conventional lithium batteries has lithium ions crossing the gap between cathode and anode, with a lithium/cobalt/oxygen alloy at one end being the insoluble part (it can lose some but not all lithium ions), and uses graphite or graphene to hold the lithium at the other end. Lithium and cobalt are rare/problematic enough we want to use something cheaper and more scalable, and sodium is the next element down in the same column as lithium in the periodic table, can use nickel instead of cobalt, and _mostly_ works the same way... except sodium atoms are physically slightly larger, and don't fit into the holes in graphite.

Sodium batteries have been using something called hard carbon in place of graphite, which has a random structure more like soot and thus (some) bigger holes, but it's an inherently uneven material and progress with it has been very slow.

But now there's a new anode substrate that can hold sodium, which you can literally make from air (it's entirely carbon hydrogen oxygen and nitrogen, basically seven rings stuck together into a pointy triangle that has three niches positive ions cling to because of the nitrogens there, and the manufacturing process they're using starts with corn or rice). Unfortunately when they charged and discharged the prototype battery a few times the molecules still tended to move around too much... so they polymerized it to hold them in place and got a sodium battery without significant degradation after 50k charge cycles. That's kind of awesome.

There's been lots of other activity around sodium. Lithium's scarce and increasingly expensive, but sodium's half of sea salt: 2/3 of the earth is covered a mile deep with commercially viable "ore" for the stuff, we literally can't run out. (And the USA already produces 11 million tons of chlorine annually, so the other half of the salt will presumably find uses.) It's not quite as energy dense as lithium (fewer watt-hours/kilogram) but for municipal battery banks and home battery walls and such that's irrelevant (size/weight aren't limiting, cost is), and an app-summonable self-driving-car with a 150 mile range instead of 200 miles would make little difference (that's still almost twice around the entire houston beltway, which is like 3 hours of highway driving which is _many_ hours of stop-and-go, and the vehicle goes to swap out the batteries between passengers so no human is inconvenienced).

P.S. Diesel locomotives have always been electric (the diesel generator powers electric motors) because no combustion engine has the torque to get a stopped fully-loaded train moving. The number of cars a locomotive can pull varies (with incline being a major factor; uphill is harder), but a rough estimate looks like about 25 cars per engine is reasonable. In that case, having even 4 or 5 battery cars after each engine (the way steam locomotives used to have coal cars) isn't a big deal as long as they're cheap enough, and an entire shipping container filled with sodium batteries would hold a _lot_ of power (and be quick to swap out with shipping container cranes, probably without even unhooking the cars).


December 23, 2019

Your regular reminder that land does not vote, people do. (And that's why the electoral college _existing_ is just another form of gerrymandering.)

Jeff's and my talk at linuxconf.au was accepted (as usual I submitted 4, they chose "Why we're doing J-core" in the Open ISA track). We're thinking we can convene in Japan on Jan 8th and then fly to Australia for our talk on the 13th.

Ok, toysh testing: my previous txpect design won't work because it was specifically matching stdout and using a blank string to mean "anything on stderr". I think what I need is more: txpect name command 'E$' 'Iecho hello\n' 'Ohello\n' 'E$'

Oh, _this_ is fun. I've hit a hang in bash where "wait" is uninterruptible (ctrl-C and ctrl-Z both ignored), and of course there's no timeout on wait the way there is with "read -t 2". (Sigh, some sort of bug in bash again. I'm not here to debug bash, I'm working to replace it.)


December 22, 2019

Oh good grief, if I run "timeout 10 bash -c 'X=0; while true;do sleep 1;echo -n $X;X=$(($X+1));done | tee'" with debian's tee I get "012345678" but with toybox tee I'm getting "001122334455667788" and I don't understand WHAT'S going on here because I shouldn't be getting anything? (Context: that shell snippet prints out a number, waits a second, prints the next higher number, rinse repeat, all on one line with no newlines. It's my simple tee smoketest: does it print data as it gets it, or wait until it reads "enough" data? Yes 0-8 is only 9 entries, not 10, but the shell loop logic takes nonzero time to run so an accurate timeout should kill it JUST before it would print the tenth digit.)

I was trying to test whether the setlinebuf(stdout) Elliott added to main.c broke tee so it didn't show output promptly, but it turns out it's already broken? Ah, it's because I didn't give tee any arguments, so the lib plumbing is supplying "-" as the default argument, except it's ALSO appending stdout to the list of things it's writing to. I need to record that it's already writing to stdout and not add it again. (Why has nobody found this bug before me?) And while I'm at it, there's no tests/tee.test and I should have at least this in there, on the "fix a bug add a test" principle...
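A minimal sketch of the kind of check that could back this up, in plain shell rather than the toybox test suite's own format (which isn't shown here). The helper name tee_copies is made up for illustration, and it assumes the tee under test is on the $PATH: feed one byte to tee with no file arguments and count what comes out.

```shell
# Hypothetical helper: count the bytes tee emits for one byte of input
# when it is given no file arguments. $1 names the tee to test.
tee_copies() {
  count=$(printf x | "${1:-tee}" | wc -c)
  echo $((count))   # arithmetic strips any padding wc adds
}

# A correct tee prints 1. The double-stdout bug described above would print 2.
tee_copies tee
```

This only catches the duplication, not the "does it show output promptly" question, which needs the timing-based loop from the smoketest.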

Meanwhile I got the next lump of sh.c changes more or less ready to check in, but again it's all thrashing with no progress indication if I haven't got TESTS (so much has changed there's likely to be regressions), so I finally wrote a "txpect" test function... and it works fine with busybox ash. But bash is DEEPLY BROKEN when it comes to interactive shells.

The problem is, bash goes above and beyond to try to get a tty: if stdin, stdout, and stderr are redirected to a non-tty device, AND you setsid() so it's in its own session, it will open /dev/tty and read from there. It does this EVEN WITH "-s" WHICH EXPLICITLY TELLS IT NOT TO. (Bug!) So I implemented a "detach" wrapper command that does "setsid(); ioctl(open("/dev/tty"), TIOCNOTTY); xexec(toys.optargs);" to make even /dev/tty not be a tty for this process, and what bash does THEN is write the $ prompt to stderr instead of stdout! (After spitting out two lines of complaint to stderr also, so I can't even do half-assed tests glueing stdout/stderr together and treating them as the same for these tests.)

That seems like a second outright bug: I finally get a big enough stick to keep bash away from the tty, and it gets stdin/stdout confused...

Aha! --noediting tells bash not to use gnu/dammit readline. Which I _think_ means I don't need to use detach anymore, but it's still outputting the prompts to stderr when there's no tty?

$ echo -e 'echo hello\nexit' | PS1=\$ bash --noediting --norc -i | cat
$$hello
exit

(And for some reason the shell is printing "exit" to stderr when it exits. Whether from the "exit" command or EOF on stdin?)

Wait... bash is _always_ printing the prompt to stderr, not stdout? Good grief, posix actually says to do this (section 2.5.3 shell variables, in the description of PS1, it gets written to standard _error_). And busybox ash, to this day, still isn't doing that. (But bash and dash both are, just "bash -i 2>/dev/null" and there's no prompts but ls -l runs and exit gives you the old prompt back.)
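A quick way to see which stream gets what is to capture the two separately. This is just a sketch (the function name is made up, and it assumes a bash binary on the $PATH): the same one-line script is run twice, once keeping only stdout and once keeping only stderr.

```shell
# Hypothetical helper: run an interactive bash with stdin from a pipe and
# report what landed on stdout versus stderr.
prompt_streams() {
  script='echo hello'
  out=$(printf '%s\n' "$script" | PS1='$ ' bash --norc --noediting -i 2>/dev/null)
  err=$(printf '%s\n' "$script" | PS1='$ ' bash --norc --noediting -i 2>&1 >/dev/null)
  printf 'stdout: [%s]\n' "$out"
  printf 'stderr: [%s]\n' "$err"
}

prompt_streams
```

The stdout line should contain only the command's own output. Whatever bash says on stderr (prompts, the "exit", complaints about job control) varies by version and environment, so a test probably shouldn't match it exactly.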

Well, that ate a day.


December 21, 2019

I'm told twitter continues to suck.

It's always weird when I stumble across myself repeatedly quoted in something, but in this case it turns out to be something broken I can't read. (A PDF of somebody's thesis, where the second half of each page seems to be blank? Some kind of scanner mishap maybe? Is chrome doing OCR of PDFs now?) Still: if you want to be remembered years later, sociology beats technology every time. People remember the people stuff.

Hey, the kernel broke kernel threads! They now have kthreadd (pid 2, which is apparently magic now) as their parent, so they're not recognized _as_ kernel threads anymore by toybox ps. (No square brackets.) Sigh, throw it on the todo heap...

Speaking of the todo heap: toybox's ulimit is always reporting bytes, but bash's has variable units, for example ulimit -p is 512-byte blocks. Mine is consistent, but not compatible? Hmmm... I just tried the Defective Annoying Shell and the ulimit _there_ said "63732" which is... just wrong? (bash gives 8*512, toybox gives 65536, dash gives 1804 bytes less than 65536? What?)

Anyway, that explains why I did _that_. (This is why I blog! I won't remember all the details when I need to! Trail of breadcrumbs...)


December 20, 2019

Elliott posted another android sitrep and I'm going "guilt guilt guilt!" because I've been focusing just about exclusively on toysh (don't have much to show for it but that's how this command goes; backup and redesign, rinse repeat).

I checked in the patch.c fuzz factor support. It looks done, doesn't seem to have introduced obvious regressions, and I haven't got patch tests. Added basic patch tests, but it's like 1/4 of what's needed. (And demonstrates that you CAN do multiline hunks easily with the current test plumbing; I thought it should work, just hadn't bothered?) There are tons of old corner cases and observations I need to make tests from...

An interesting side effect of having my blog caught up is judging the posts I write more harshly, because now there's a nonzero chance of somebody other than me reading them: "this isn't interesting enough to force anybody else to read..." Except 1) it never was, 2) I'm not forcing anybody. My solution to impostor syndrome has always been "most other people suck", and _trying_ to get people to steal my ideas and do them so I don't have to (which they seldom do, and I am sad). A corollary of the whole BS Jobs observation is that the baseline competence/relevance level of the US workforce is presumably subterranean. My solution to problems may be glued together from popsicle sticks, but it's a solution. (Duct tape: woo-oo.)

Alright, toysh redirect logic. The PROBLEM is that redirects nest (if true; then echo hello | cat < walrus; fi >&2) and the context needs to be inherited by individual commands, with some of it persisting and some of it getting cleaned up each time. And I'm doing vfork() so I can't finish this parsing _after_ the fork, it's gotta be before (partly because it's just creepy to malloc() after vfork(), partly because {var}<file assigns a value to $var, and partly because the shell has to distinguish "parsing had an error" from "the command we ran had an error"... and it all gets funneled through an rc code ranging from 0 to 255 with the top half meaning killed by a signal (unless I want to do something unconscionable with pipes: I have a black belt in disgusting solutions and know when Not To Go There)).

So I separated the "parse redirect" and "perform redirect" stages, with a linked list of arrays recording the todo list in between, caught most of the errors in the parse stage, and had the new vfork() child traverse the list to just do a lot of dup2() and close(): the minimum that _must_ apply to the new process context. (Which has its own file descriptor table, see "man 2 clone" section CLONE_FILES.) Which had two problems, the first of which is it can still error (<&37 when fd 37 isn't open, but if you 37</dev/null first it's _not_ an error, it's the filehandle state when we perform the operation that needs to be checked, and that's dynamic).
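The "it's dynamic" part is easy to poke at from outside. A small sketch (the helper name is made up, and it assumes nothing has left fd 37 open in the environment it runs in): the same "cat <&37" is an error or not depending purely on what ran before it in that shell.

```shell
# Hypothetical helper: run optional setup code, then try to read from fd 37,
# all inside a fresh bash so this shell's own descriptors are not disturbed.
probe_fd37() {
  if bash -c "$1 cat <&37" 2>/dev/null; then echo ok; else echo error; fi
}

probe_fd37 ''                      # fd 37 was never opened: error
probe_fd37 'exec 37</dev/null;'    # fd 37 is open by the time cat runs: ok
```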

The OTHER problem is that not all commands vfork(), some are builtins that _must_ run in the original shell's process context (otherwise chdir() and environment variable assignments and set -o pipefail and such get discarded), and THOSE can have redirections ("read i" reads from stdin to set an environment variable, that stdin can be redirected). And those redirections must be done in the parent process and _aren't_ cleaned up by the child process exiting, so need to unwind them. So I added an unwind array that would dup2() each filehandle I was about to redirect to a high (>10) filehandle so it could restore them later... and went "well if I must have that codepath in the parent anyway, it might as well handle all the cases".
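The save/redirect/restore dance is the same thing you'd do by hand in a script. A rough sketch in portable shell, not the toysh code: the function names are invented, and it parks the saved copy on fd 9 because plain POSIX sh only promises single-digit descriptors (the real thing would use something above 10).

```shell
# Hypothetical helper: run a command in THIS shell process with stdout sent
# to a file, then put stdout back the way it was.
with_stdout_to() {
  target=$1; shift
  exec 9>&1             # save: duplicate the current stdout onto a spare fd
  exec 1>"$target"      # perform the redirect in the parent process
  "$@"; status=$?
  exec 1>&9 9>&-        # unwind: restore stdout, close the saved copy
  return "$status"
}

greet() { greeted=yes; echo hello; }   # sets a variable, like a builtin would

log=$(mktemp)
with_stdout_to "$log" greet
echo "file: $(cat "$log"), greeted=$greeted"
rm -f "$log"
```

Because greet ran in the original process, the variable it set is still there afterwards, which is the whole reason builtins can't just be shoved into a child and cleaned up by its exit.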

So, rewrite: the current plumbing is pp = expand_redir(arg, assign_env()); then cleanup_process(pp); which calls cleanup_redir(rd); And we get INTO here (from block context) because run_function() calls run_pipeline() calls run_command() calls expand_redir(). Which are passing down a struct sh_redirects pointer-to-pointer (because fiddling with a doubly linked list can reallocate the HEAD node when the list empties or was empty). So DON'T pass that down, instead have expand_redir() just _do_ the redirects (closing its own temporary fds), assembling a simple int * list of the from/to pairs the cleanup function needs to dup2() and close().

Ripping out a LOT of code here, but that's generally a good sign. (The irritating bit is it's code I never got to work, and if this approach doesn't pan out I'd need to put it all _back_...)

P.S. In theory I could eliminate the child callback that closes all the high file descriptors by using FD_CLOEXEC, but if I'm dup()ing them _back_ to stdin/stdout/stderr I don't want _those_ to cloexec next time, and I'm unsure where exactly that attaches to or how it's inherited. (Experimenting with that is a todo item.)


December 19, 2019

Fade arrived shortly after midnight. Family together for the holidays.

Do I really need a third status bit on the redirects? It seems like what I need to do is not _record_ the redirect, but instead perform it immediately (in the parent). There is a case this wouldn't work for, of course: "123<&2 {abc}<&123" but I'm not entirely sure if that's guaranteed to work anyway? (Are redirects guaranteed to be evaluated left to right? Hmmm. Making that work requires liveness analysis.)

Ok, "echo {abc}<<<potato walrus; echo hi; cat <&$abc" should print walrus, hi, and potato in that order. The behavior difference with {} is that the high filehandle allocated for <<< is assigned into {abc} and left open in _both_ the parent and the child.
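For reference, here's that case as something runnable, using a here-string for the input and wrapped in an explicit bash -c since {var} redirects are a bash feature (4.1 or later, as far as I know). The function name is only for illustration.

```shell
# Hypothetical wrapper: the fd that bash allocates for the here-string stays
# open after echo finishes, so a later command can still read from it.
herestring_demo() {
  bash -c 'echo {abc}<<<potato walrus; echo hi; cat <&$abc; exec {abc}<&-'
}

herestring_demo
```

The trailing exec closes the descriptor by hand, since nothing else will.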

Sigh. Ever since I realized I need the "undo" logic, I've been trying to avoid rewriting this whole mess to do everything in the parent, vfork(), and then undo it again afterwards. (Basically treating every command the same way as builtins.) I was worried that it's more brittle and less efficient, but A) if signal handling is "note it and check later" then the recovery isn't significantly less brittle, B) the system calls have to be done either way, it's just a question of whether they're done in the parent or in the child. And the parent _waiting_ for the child is the common case, so moving them doesn't speed up the common case. And I _do_ need the undo logic for the builtins, so it's a codepath that has to exist, so I might as well have that codepath do everything.

But it's a large rewrite of code I haven't actually had a chance to test yet. Oh well, rip it out and do it over YET AGAIN. It's probably smaller, and it does mean I'd be able to tell immediately if <&25 is an open filehandle or not, for non-deferred error reporting. Hmmm, and the add_redirect() logic I've done might mean I can just replace the function body and not the plumbing calling it? Except no, now it can report errors... And I _do_ still need the xpopen() callback, except now it's for closing the high filehandles I saved displaced filehandles into. (The child and parent both still have to do cleanup, it's just _different_ cleanup.)

And an advantage of this is I don't have to pass a list of redirects through multiple functions: each can do (and then undo) its own redirects using the simple int array representation I started out with way back when.


December 18, 2019

So {abc}<&- reads $abc and closes the filehandle number in that, but {abc}<&2- duplicates 2 into a high filehandle (10 or greater), writes the fd number into $abc, and then closes 2. And the odd thing about these is they persist after the command that ran them, even without "exec".

The logic I've written _sort_ of covers this case (although I had to squint at it a bit to see how it fell through the first case to do that in the second case), but the "persist when it shouldn't persist" is something I haven't got signalling for. And how do you write TESTS for this?

(Elliott pointed me at some test infrastructure written in perl, which inspired him to write a version in C++. Sigh. Ok, what he seems to like about it is the file format. Back before ESR went crazy he and I argued a book with a chapter on text-based file formats, and I'm all for having a different file format, and even a tool written in C to parse the file format and run the tests. I'm just not convinced this one covers the use cases yet?)

Ok, the funky part about {var}<file redirects is that they persist past whatever created them, while "3<&1" doesn't even though it's a filehandle that wasn't previously open. (Filehandle saved in variable: caller's responsibility to close it, thus caller has control over lifetime.) Which means I need to explicitly signal this state so the cleanup machinery can deal with it. And the _funky_ part is 37<&2- doesn't leave 37 open even though it's in the "high" filehandle territory, so it's not a >9 test it's a third bit. (Can't chord 2 and 1 together to indicate this either because 37<&2 and 37<&2- need to be distinguished, which is currently bit 1. I mean I COULD punt this and just leave anything >9 open and I doubt anyone would ever notice, but _I_ would notice. And some script somewhere is bound to go weird. So, third signal bit it is. Still leaves 28 bits to store the filehandle (268 million), I'm pretty sure we're ok.)

(And yes, editing this in vi I have to type <& each time as &lt;&amp; (and had to type _that_ as &amp;lt;&amp;amp; and had to type THAT as...) which is the kind of thing I used to let build up and have to fix in the "editing pass" when my blog entries were written months before being posted. I still need to edit and post the second half of 2016...)


December 17, 2019

Sigh, I got an email from gmail saying I won't be able to use it without oauth after February 15, which means I won't be able to use pop3 instead of imap to download my email, which means I can't use gmail anymore. Given how feral its spam filter's gone, this isn't a big loss, but I hate having to do admin stuff. (Alright, let's see what's involved in getting dreamhost email servers to work with my domain...)

I'm frustrated that even after the Russians successfully flipped another UK election, despite everything Nancy Pelosi is still waiting for the upcoming US election to fix everything. With everything gerrymandered, the FEC's budget zeroed out and run by people intentionally trying to sabotage it, bulk deregistration of democrats in georgia again this week, the federal courts packed with cretins, and digital voting still ubiquitous. (Meanwhile, the under 25 crowd seems to be planning to march on washington on tumblr.)

I'm worried that if Xi's nationalist ambitions manage to isolate china enough it might restrict solar panel distribution enough to push back the collapse of the liquid part of "solar is killing fossil fuels in solid/liquid/gas order". (There's a $1.4 trillion global industry trying everything it can to survive, and they've been starting wars over oil for a century, I didn't _used_ to think china was dumb enough to get played, but no "dictator for life" can avoid being surrounded by vacuous sycophants for long.) I also took some solace in "a cult of personality is not transferable", but if the GOP manages to deepfake enough speeches to pretend the Resident's not in a wheelchair after January long enough for VP Pence to get a term, at which point by-then-vegetable #45 can have a sudden "stroke" so they don't have to admit they've been propping up a dementia patient for years, "oh no this is sudden and new, couldn't have foreseen it, not our fault"... if that happens I'd give even odds China would use the opportunity to invade Taiwan, and I can't predict ANYTHING after that.

When oil ends Russia's economy collapses. They're pushing gas as hard as they can, but as far as I can tell they still make far more off oil than off gas, although I'm not sure how to read those numbers. (It could be 5x as much? oil+diesel+petroleum products vs natural gas? Do those categories overlap? Are "liquid fuels" LNG?) Either way 60% of their export income is "energy" (including still coal and electricity), if they lose half of that to solar/wind/batteries they can't feed themselves. Note that whenever the energy share of their budget does go down it's because the absolute value of their income went way down because of declining oil prices. (It's never that the rest of their economy produced _more_ money. They've spent a century trying to improve that, with zero result.)

There is still good news. And the USA does have domestic solar panel manufacturers. And technology continues to advance. (This smells like silicon-on-insulator applied to solar cells, and there's an annual sodium battery conference.) The Dorito is dying, Pelosi's 79. McConnell is 77. Biden is 77, they CAN'T last that much longer. The power of the Boomers is waning, this entire episode is probably a prolonged "end of life rally". (Which is not a new thing, shakespeare wrote one for Desdemona in Othello.) The whole hairball of fossil fuels, Russia, The GOP, the Boomers, confederate nazis, billionaire capitalism, all of it can go bye-bye together. We can keep working towards that, outlast them, and get the "basic income with replicators" future Star Trek promised 50 years ago. We've already driven the publishing cost of information to basically zero. Solar power, self driving cars, admit that 90% of the modern economy is BS jobs nobody needs to do, guillotine the billionaires cornering the real estate market... When the Boomers die, we can unwind every sad misogynist racist consumerist assumption they unquestioningly obeyed until death, and shovel out of the mess they left. (We'll kind of have to.)

But the Boomers aren't dying fast enough to preserve a habitable climate. Part of the reason I moved to Austin in 1996 is we're 200 feet above sea level, but back then we didn't have a monsoon season with 120 degree summers, nor did we have fire ant supercolonies _downtown_ in Austin.


December 16, 2019

Oh this looks like fun to implement:

$ echo hello | { read i; echo $i;} | { read i; echo $i;} | cat
hello

That's a 4 part pipeline, the first and last parts of which are independent processes generating and consuming information at their own rate. In between are two locally executing blocks (NOT subshells) each of which runs a blocking builtin operation in it.

The way I've _currently_ implemented this stuff, the builtin will run (and block) before it advances to the next pipeline segment and runs it, meaning the "cat" won't get _launched_ until both reads return. And I'm pretty sure that's wrong. Hmmm. The curly brackets are there because "read" just sets an environment variable and you can't see it do anything without then using the environment variable. Although it's the usual context switch fun of "can't evaluate the environment variables in the next statement after the ; until the previous statement has finished executing", which is basically what's conflicting with the | logic here. You have to iterate over all the | and launch their segments in parallel to create the pipeline, but don't advance past a ; until the previous command completes (not just launching the process, but redirect files don't get opened/created, etc).

But sequencing aside, every pipeline segment is a subshell from the parent shell's point of view: "echo hello | read i" does _not_ persistently set $i in the host shell. Which means I need to implement the subshell plumbing (a FLAMING PAIN in nommu context; gotta marshall local variable and function definitions into the new process which had to do an exec() after vfork() to become an independently running PID with the parent PID also running, so all that data has to go through a pipe or the environment.)
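Both behaviors are easy to see side by side. This is a sketch of what bash and dash do with default settings (ksh and zsh run the last segment in the current shell, so they'd print something different on the first line); the function name is just for the demo.

```shell
# Hypothetical demo: a variable set inside a pipeline segment is gone after
# the pipeline, but is visible within the same block of that segment.
pipeline_demo() {
  i=unset
  echo hello | read i
  echo "after the pipeline: i=$i"
  echo hello | { read i; echo "inside the block: i=$i"; }
  echo hello | { read i; echo $i;} | { read i; echo $i;} | cat
}

pipeline_demo
```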

I previously wrote half an implementation that marshalled data through a magic environment variable (with both the parent and child PIDs in the name so you really can't do it by accident, and it would be a pain to try to exploit via "this web page CGI lets you set environment variables" exploits). But I'm also thinking "pipe" might work since bash is already using fd 255 as magic (I think that's where it stashes a copy of the controlling tty?) then I can feed the read side of a pipe into the child process, _and_ I can have the child fstat(255, &st) and S_ISFIFO(st.st_mode) it before trying to read context from it, and again send the parent and child PIDs across the connection before trusting any of the other data.

But the problem with pipes is the child can hang and that could hang the parent (pipe buffer full). Then again, environment variable space can get exhausted with the 1/4 stack size limit, a hang's not so bad because that can happen any time you "read i" without input waiting.

Aha, _that's_ why the implicit exec logic exists, it's for pipeline segments: "echo abc | read a" doesn't have an sh instance subshell around read, but "{ read a; echo $a;}" in the same place would.

Hmmm, I have layering confusion here: run_function() calls run_pipeline() which calls run_command(), but pipeline segments aren't commands, they're segments which are basically self-contained functions:

echo hello | while read i; do echo -=$i=- | sed s/=/@/g ; done | cat

So run_pipeline() has to call back into run_function() which is going to redundantly parse the block, which implies that actually run_pipeline() should be _integrated_ into run_function rather than a separate thing it calls... except that the subshell logic means you _do_ want to call back into run_function() because it's going to be doing it from a different PID. So NOW the question is how does the outer call to run_function() know to advance past that block so it won't get run twice? This is related to && and || parsing. I need to handle | at the same level, and I guess that means it does merge into run_function()? (Note that the middle | above is not part of this pipeline, because it's not at the same block level. It's handled by the subshell. This would be SO much easier if I could rely on fork(), but no, this is supporting nommu.)

This also wouldn't take _nearly_ as long if I didn't have to keep changing the design every time I figure out the next chunk of logic and redoing code I've already written to get that far.


December 15, 2019

Can we just put Finland in charge of everything? Seriously, they're doing it right, and have been for a while.


December 14, 2019

There is, of course, more shell continuation logic I haven't implemented yet. Specifically, "echo hello &&" on its own line needs to prompt for the next line before saying hello. And "echo one | cat ;& hello &&" needs to syntax error on ;& before running anything, _or_ before prompting for the next line of continuation. (And if you put a space between the ; and the & it changes to "unexpected token &", which is a different error: ;& is a special token used in case statements, it's part of the ;; family. And if you go ; > & it still complains about unexpected token & but now for a different reason, which I have logic for but at a different layer.)
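The "error before running anything" part can be checked non-interactively. A sketch (made-up function name, assumes bash is installed): capture stdout and the exit code of a one-line script, and compare the ;& case against the same line with a plain ; in it.

```shell
# Hypothetical helper: run a one-line script in bash and report what reached
# stdout plus the exit code, discarding the error text itself.
syntax_probe() {
  out=$(bash -c "$1" 2>/dev/null); rc=$?
  echo "stdout=[$out] rc=$rc"
}

syntax_probe 'echo one | cat ;& hello'   # rejected at parse time, "one" never prints
syntax_probe 'echo one | cat ; hello'    # "one" prints, then hello is not found
```

The exact nonzero code for the syntax error isn't something I'd hardwire into a test; that the output is empty is the interesting bit.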

Ok, the "nothing after a redirect operator" check needs to move to parse_line(), and the block parsing logic (matching up if/else and asking for continuations) also has to pass judgement on control operators, because all the ones starting with ; except ; itself are only valid in a case statement, and there's magic around single ) there too.


December 13, 2019

Not Biden. Not Buttigieg. Basically anybody the Boomers like is a bad idea. I used to think we're all just marking time until AOC can run and more Boomers die, but then she endorsed a 78 year old to fix all our problems and now I'm waiting for oil demand collapse to drive Russia into poverty (that's something like 6 years, the LD50 on the boomers is still 14 years away).

I've definitely reached the point where to advance toysh I need to turn my pile of tests into a test script, and I can't just write a normal tests/sh.test and run "test sh" for several reasons. In addition to yesterday's "expect" woes, "make sh" currently only makes sh, but none of the builtins, including "exit". The problem is it disables the multiplexer, so toysh's attempt to call the builtins (through the multiplexer infrastructure) doesn't work. (And then there's the MAYFORK commands elsewhere in the tree, which should be built into the binary too.)

Maybe the not-expect semantics I want are more like:

expect 0 "$" "if true\n" ">" "then echo hello\n" ">" "fi\n" "hello\n"

So it knows the shell should exit with rc 0, waits for "$", sends "if true\n", waits for ">", sends "then echo hello\n", waits for ">", sends "fi\n", waits for "hello\n", and then expects it to exit (with the rc 0 from before). And a 5 second timeout at each wait considered a failure if it didn't get the input (or exit). And a "wait for this output" of "" should wait for anything on stderr. (There should _be_ error output here, but don't try to match it because error output varies.)
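Here's roughly what that could look like faked up in portable shell with a pair of FIFOs. This is a sketch under a pile of assumptions, not the eventual tool: the name xpect is invented, the command gets run with "sh -c", stderr is glued onto stdout (exactly the thing that doesn't work for bash's prompts), an empty wait string just matches nothing instead of meaning "anything on stderr", and it leans on timeout and dd being installed and on fds 8 and 9 being free.

```shell
# xpect RC COMMAND [WAIT SEND]... [WAIT]
# Hypothetical helper: alternately wait for exact output and send input,
# with a 5 second timeout per wait, then check the exit code.
xpect() (
  want_rc=$1 cmd=$2
  shift 2
  dir=$(mktemp -d) || exit 2
  mkfifo "$dir/in" "$dir/out"
  sh -c "$cmd" <"$dir/in" >"$dir/out" 2>&1 &
  pid=$!
  exec 8>"$dir/in" 9<"$dir/out"   # same open order as the child, so no deadlock
  ok=1 waiting=1
  for step in "$@"; do
    # Expand \n escapes; the trailing dot protects a final newline from $().
    text=$(printf '%b.' "$step"); text=${text%.}
    if [ "$waiting" = 1 ]; then
      # Read exactly as many bytes as expected, one at a time, or give up.
      got=$(timeout 5 dd bs=1 count="${#text}" <&9 2>/dev/null; printf .)
      [ "${got%.}" = "$text" ] || { ok=0; break; }
      waiting=0
    else
      printf '%s' "$text" >&8
      waiting=1
    fi
  done
  exec 8>&- 9<&-                  # the child sees EOF on its stdin
  [ "$ok" = 1 ] || kill "$pid" 2>/dev/null
  wait "$pid"; rc=$?
  rm -rf "$dir"
  [ "$ok" = 1 ] && [ "$rc" -eq "$want_rc" ]
)

xpect 0 'printf "name? "; read n; echo "hi $n"' 'name? ' 'bob\n' 'hi bob\n' &&
  echo pass
```

It's brittle in the ways you'd expect: any output left unread when the fds close turns into a SIGPIPE for the child, and there's no pty, so anything that insists on a tty is still out of reach.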

The NEXT shell testing problem is some things want a tty. Making a pty master/slave is a whole can of worms I haven't opened yet. (I was going to do a "screen" and use that as my learning excuse, but there was bikeshedding so no.)

I've started writing a test tool command in the toys/examples directory. Dunno if I'll keep it yet, or how to fit it into something that handles _all_ these issues. (Or how many different test tools I need to write?)

Sigh. The problem is the shell isn't exactly a command, it's a language. My command tests aren't enough here. And testing shell _in_ shell (feeding weird shell input in from a _different_ shell) is painful, it trivially gets escaped/evaluated at the wrong layer while being marshalled along by the plumbing.


December 12, 2019

I'm told Buttigieg's support is mostly among Boomers. They're peeling off Biden and supporting him instead. On the plus side he's young (37) and gay, but those appear to be literally his only redeeming qualities. He's still a white male whose friends are all rich, a republican in democratic clothing, and somehow disingenuous on _top_ of that. He would emphatically _not_ declare billionaires a game species, which is pretty much where we needed to be about 10 years ago now. (Sigh, I should get an rss feed of just the "programming" tag set up. I used to vent this on twitter until twitter went feral about phone numbers for advertising tracking.)

Checked in the next round of toysh stuff. It's the actual redirect performing logic, including for here documents. All I debugged is that the shell can run "ls -l" and "exit", I haven't tried to do any actual redirects with the new plumbing yet, but as long as it's not causing obvious _regressions_ it's worth checking in so people can see it. I think the next big bottleneck is I need to tackle the test infrastructure. Testing shell corner cases _in_ shell is hard, your test breaks on fancy quoting just trying to marshall the data into the other shell instance. (And is it always "sh -c" or "echo | sh" or "sh file" or what?)

Hmmm, I kind of want shtest "command" "stdin" "stdout" "stderr" "rc" run through sh -ic so I can do:

shtest "echo hello" "" "$ hello\n" "" 0

Ok, what tests do I want to run? I've got 8 gazillion of them but categorizing them would be helpful. They're all "easy to run manually, hard to automate". The line continuation stuff cares about being prompted for > instead of $ which means sending a line and waiting for a result, which is "expect" territory. I want to test rc each time (explicit 0 when I expect that). I want to be able to say 'VAR=abc echo ${VAR}a' without having to escape the $ (which "sh -c 'VAR=abc echo \${VAR}a'" requires, and otherwise ' inside another ' is awkward, but " inside ' means it might get parsed at the wrong level of nested shell invocation?). I don't so much care _what_ I get on stderr as that I get something on stderr at an expected point. Possibly containing text I fed into it, which is regex territory.
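One way to take a quoting layer out of the picture, at least for tests that don't care about prompts, is to feed the script in through a here document with a quoted delimiter, so the outer shell passes the body through untouched. A sketch (the function name is made up, and it runs whatever "sh" is on the $PATH):

```shell
# Hypothetical demo: nothing in the body is expanded or unquoted by the
# outer shell, because the EOF delimiter is quoted.
quoting_demo() {
  sh <<'EOF'
unset VAR
VAR=abc echo ${VAR}a
echo "it's \"quoted\""
EOF
}

quoting_demo
```

The first echo prints just "a", because ${VAR} is expanded before the prefix assignment takes effect, which is the sort of corner case that's miserable to express through two levels of escaping. This does nothing for the interactive cases, though, since the script now occupies stdin.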

And if I'm going to test something like:

$ if true
> then echo hello
> fi

I need something more like expect, except I've always found tcl/expect a pain to use. And faking it up in pure shell is a pain. The problem isn't output, which waits for us if I just redirect it to a persistent high filehandle. The problem is _input_, as in where do I redirect it _from_? Years ago I did this through FIFOs (why do I always wind up playing with fifos when I'm sick?) but the result was brittle and fiddly. Hmmm.

I might have to make something in C. That would suck, but I could do it. I refuse to add a tcl dependency even to the test suite, no not even when it's built into a standalone expect binary. Any tool that gratuitously includes an entire programming language is a can of worms I'm reluctant to open, the complexity is unbounded by definition and people approaching the project will either have to come up to speed on a whole new domain or will try to introduce complexity from that new domain for seemingly minor changes. People give me enough grief about my extensive use of sed, which is posix and the basic regex syntax is in libc and shared with grep and vi and even stuff like perl so there's a reasonable chance people already know it, and learning it helps them with more than just _this_ project.

Maybe what I need is a general expect tool that _doesn't_ involve tcl, added to the toybox command list, except I dunno enough about the existing syntax for it (seems like it _requires_ a script file when I want a command line set of challenge/response things with at most a "sed" level of command complexity; you _can_ have sed scripts but I've never personally bothered and just do a lot of -e on the command line with backslashes), and I'm reluctant to reinvent the wheel when there's an existing de-facto standard wheel (even if it's one I personally am unwilling to use because of the attached can of worms).

Extending sed to do expect would be terrifyingly bad. I should not do this thing. (For my shell case I'm matching $ and > and such prompts that don't have newlines after them, that's not how sed thinks...)


December 11, 2019

I still have enough of a cold I just named a variable "noforg" to track whether toysh HERE documents should undergo variable expansion. (It's a nested muppet movie reference: "we take your friend, the little F-O-R-G" and "don't you frogs expand?" I try not to rely on people reviewing my sed source code having read Roger Zelazny's Amber series and so on, but when I'm tired it's how my brain works.)

I think Devuan has a cron job deleting the Wicd wireless daemon's remembered networks. I keep having to tell it my phone's wireless access point password (and click the automatically connect thing) something like twice a month? It survives a reboot but _times_out_. (And this time it remembered the "automatically connect" checkbox, but had forgotten the password, and then when I refreshed it had forgotten both.)

Linux on the desktop! Smell the usability!


December 10, 2019

Ooh, I just put together another one of those 3 part stories: part one is this excellent blog post from Brad Hicks about how FDR's New Deal really worked and what it actually did, part two is this 5 and 1/2 minute interview with David Graeber about the interaction between creativity and basic income, and part three is this 6 minute Michael Lewis interview about what the "inessential government workers" furloughed in the last government shutdown actually do. There is a DARN clear through line, isn't there?

(And yes, the recent links about Pete Buttigieg's means testing work in here too. And I wrote a longish analysis on this topic before. And if you really want to go down a rathole, I have a youtube playlist of Mark Blyth's economics talks, David Graeber's BS Jobs stuff has spun out from the original article into a whole constellation of stuff, CGP Grey's humans need not apply is still a good case that we're once again automating away buckets of jobs we used to need people to do and now don't...)

And of course this is also why self-selected open source developers are more motivated and persistent, which looks like more productive in the right light. The 3 waves stuff isn't exactly new. I still worry about the resource curse being triggered by enough automation, which once again highlights the need to guillotine the billionaires.


December 9, 2019

Sick. In bed more than half the day. Sounded like a frog until about 8pm. (It's not _quite_ laryngitis, or pneumonia, but it's trying. Alas I've already slept to the point of headache and need to be upright for a bit.)

Still waiting out the boomers.

I really _really_ want to get the next chunk of toysh checked in (finishing the redirection logic), but it has to work first. It's changed design midstream a half-dozen times now and I need to do a pass over it all to make sure it's coherent. But _first_ I need to work through the design contradictions until I've got something that handles all the weird corner cases.

Alright, here's a fun one:

$ { echo; } {abc}<<< walrus
$ echo $abc
10
$ cat <&10
walrus

Obviously it wrote "walrus" into a pipe buffer and closed the sending side of the pipe, but the receiving side was still open for "cat" to read data out of. First problem: what if there's too much data to write into the pipe buffer? Will the shell hang trying to write? Should it launch a persistent background task to write the data? (I am not introducing threads for this, but it's darn awkward to do that with vfork.)

(Note: you need to "exec $abc<&-" to close the fd or it just stays open.)
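To put a number on "too much data", here is a rough sketch in Python (toysh itself is C, Python is just shorthand) that fills a pipe nobody is reading until the kernel refuses more. The function name is made up for illustration, and the answer is kernel-specific. A blocking write() of anything bigger than what this returns is what would hang the shell.

```python
import os

def pipe_capacity(chunk=4096):
    """Fill an unread pipe and return roughly how many bytes it accepted."""
    rfd, wfd = os.pipe()
    try:
        # Non-blocking, so a full pipe raises instead of hanging us.
        os.set_blocking(wfd, False)
        total = 0
        while True:
            try:
                total += os.write(wfd, b"x" * chunk)
            except BlockingIOError:
                return total
    finally:
        os.close(rfd)
        os.close(wfd)
```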

But what if it's too big to write entirely into the pipe buffer, but the command is a builtin so we don't fork, what exactly is supposed to happen? Hmmm... Ah! ls -l /proc/$$/fd/10 says it's a link to "/tmp/sh-thd.FemSmx (deleted)" which means bash is making a /tmp file, opening it to get a filehandle, and then deleting it. Which... ok, I guess that's easier? This also explains why HERE documents weren't working on android mksh where /tmp wasn't writeable to the terminal app. (In 10; back in M it worked fine. They keep "securing" the system so my old apps don't work, but compatibility means nothing when they can excuse any and all breakage as "security". Sigh.)

I guess bash does document TMPDIR as the variable that says where to create temp files? I can do that.
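A minimal sketch of the create/open/delete trick, again in Python rather than toysh's C. The helper name is hypothetical, the "sh-thd." prefix just mirrors what /proc showed for bash above, and falling back to /tmp when TMPDIR is unset is an assumption. Writes to a regular file don't block on a reader the way a full pipe does, so nothing needs to fork.

```python
import os
import tempfile

def heredoc_fd(data):
    """Return a read-only fd at offset 0 of an already-deleted temp file
    holding data (bytes). Closing the fd is the only cleanup needed."""
    tmpdir = os.environ.get("TMPDIR") or "/tmp"
    wfd, path = tempfile.mkstemp(prefix="sh-thd.", dir=tmpdir)
    try:
        try:
            # Separate descriptor, so its offset starts at 0 regardless of
            # how much gets written through wfd.
            rfd = os.open(path, os.O_RDONLY)
        finally:
            os.unlink(path)  # the name goes away, the open fds keep the file
        try:
            while data:
                data = data[os.write(wfd, data):]
        except OSError:
            os.close(rfd)
            raise
    finally:
        os.close(wfd)
    return rfd
```

The caller reads from the returned fd (or dup2()s it onto the target fd) and closes it when done.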

Note on the comment "Match unquoted EOF": I'm skipping escapes in the comparison function instead of creating an unescaped copy because object lifetime tracking is horrible and I don't want to make sure we don't leak the allocation in error paths. The run-side stuff has a "delete" linked list it can add string allocations to so the cleanup code can call it from anywhere, parsing doesn't. Yes it's slightly slower, but HERE document parsing really isn't a fast path and the common case is comparing against a smallish EOF string that's going to fail to match in the first character or two anyway.
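The compare-while-skipping idea looks roughly like this, in Python instead of C. This is not the toysh function: the name is invented, and the quoting rules are simplified on purpose (a backslash escapes the next character, bare ' and " are dropped, and nothing smarter happens inside quotes).

```python
def eof_matches(line, word):
    """True if line equals word with its quoting removed, compared in
    place so no unquoted copy of word is ever allocated."""
    i = 0  # position in line
    j = 0  # position in word (the still-quoted delimiter)
    while j < len(word):
        c = word[j]
        if c in "'\"":
            j += 1  # quote characters never appear in the match
            continue
        if c == "\\" and j + 1 < len(word):
            j += 1  # skip the backslash, compare the escaped char
            c = word[j]
        if i >= len(line) or line[i] != c:
            return False  # the common case: bail in the first char or two
        i += 1
        j += 1
    return i == len(line)
```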


December 8, 2019

And I have changed the title of this page to say "2019", with 3 weeks left in the year.

Crunchyroll still doesn't work on my phone (15 seconds of video, 30 seconds of loading, when it doesn't hang entirely), and watching subtitled stuff on the TV isn't comfortable, so I found a playlist of "ascendance of a bookworm" on youtube and I'm watching that. (I'm paying for it anyway, we have the actual "take my money" _subscription_ to crunchyroll. I've TRIED to watch it legitimately. But the unofficial means remain easier and more reliable.)


December 7, 2019

Still struggling with redirection logic in the toybox shell. Builtin commands run in the shell's process, so their redirections will persist if I don't clean up after them. (The "exec" command is special cased _not_ to clean up after them, but all the others need to.)

I think the problem I'm having is I'm recording "<&-" (explicitly close an existing filehandle) the same way I recorded other redirects that open a high filehandle and then dup2() it down later. The problem is, those explicit closes need to be backed out again for builtins. Using the same close mechanism for "< /dev/null" doing "fd = open("/dev/null"); dup2(fd, get_highfd()); close(fd);" and then later "dup2(highfd, fd); close(highfd);" loses information: specifically that the FD we're closing in that second pass isn't one we created but one that was already there.

I went down a cul-de-sac of special casing stdin/stdout/stderr (saving/restoring just them) but I don't think that's sufficient. I need to annotate the explicit closes and back them out too. Which is related to the problem that <&37- can't error in the parent process because until I've _done_ the redirects (which are deferred until after the vfork) I dunno if that specific filehandle is open at that point. (This would be so much easier if I wasn't trying to support nommu, but I am.)

The other thing is I have a list of redirects from enclosing contexts ala "{ read i; /bin/echo $i; } < /dev/null", and if the builtin "read" closes the saved /dev/null it's not available to set up /bin/echo. But if it _doesn't_, "{ read i; } <&-" ignores the explicit close redirect. (This isn't a problem for child processes, which have their own fd context and can do the full redirection with closes each time.)

Yeah, I need to annotate explicit closes so they can be handled differently in builtins.

SO much work which nobody else is going to notice because it doesn't result in code. The resulting code should be fairly small and do one thing, but I need to try 30 things that don't _work_ to get there...


December 6, 2019

Remember how the finance industry is curtailing the expansion of natural gas? Turns out they're forcing an end to coal as well. Interesting watching the forces at play here. (Diplomacy has always been "let's you and him fight".) Also, Not Buttigieg (seriously), and not Biden. We can do better. And remember when the AMA limited the supply of doctors, drastically increasing US healthcare costs? The American Nurses Association is doing it too. Oh, and when the Boomers die and the GOP goes the way of the Federalists and Whigs before it, we need to not only abolish ICE, we'll need a new round of Nuremberg trials punishing the guilty. (But there are so many Boomer assumptions that need to be thrown out as soon as they die...)

(Don't mind me, I used to vent this sort of thing on twitter all the time.)


December 5, 2019

Took the cats to the vet. $700 and 3 hours to maintain the status quo.

Means testing is terrible and Pete Buttigieg needs to exit the race. (Or declare as a republican.)

Sigh, I've been putting <span> tags into my blog for years, but have yet to modify the rss parser (a small python script) to actually do anything with them. In theory I could have an rss feed that's just the entries (and sections of entries) with the "programming" tag.

For interactive shells, bash keeps fd 255 open to the tty (ls -l /proc/$$/fd), but for non-interactive shells it doesn't. Also, trying to _test_ this I found that bash -c 'command' has an implicit "exec" when it's a single command, but not for a compound 'echo;command'. Anyway, I suspect what it does is clone stdin/stdout/stderr back from the saved tty fd and that's all the cleanup it does for builtins? Except that suspicion:

$ bash -c 'echo -n; true 3</dev/null; ls -l /proc/$$/fd'
total 0
lrwx------ 1 landley landley 64 Dec  5 13:07 0 -> /dev/pts/60
lrwx------ 1 landley landley 64 Dec  5 13:07 1 -> /dev/pts/60
lrwx------ 1 landley landley 64 Dec  5 13:07 2 -> /dev/pts/60

Doesn't match the evidence? Hmmm... Well I don't (yet) see why I shouldn't just do that. (Modulo what do you do about filehandle exhaustion?)

Sigh. Back in 2006 I found out that "read i" can't be suspended with CTRL-Z, but "(read i)" can (I don't have vforkable subshell support yet, that means marshalling local and function data into the new context). But in the normal case it runs in the current shell's context, blocking the shell until it's done. You can interrupt it, but not suspend it.

So given that, how the HELL does "read i << EOF" work? They just about have to be special casing HERE documents with builtins because "make a pipe and have the parent/child read/write into it" ain't a thing here.


December 4, 2019

Hmmm... How much of the leaking in the US oil pipelines is due to theft of the contents?

Yay, email from Elliott on the toybox list. I was wondering if I should send an email just to make sure the list is still working. Ever since I gave up on twitter and started spending more attention on my blog again, I've been posting stuff here that could have gone to the list instead.

Speaking of, I backed out the start of the nofork redirect support. How do you recover from filehandle exhaustion? (If you run out of filehandles, it's quite possible I _can't_ put it back because I don't have a filehandle move primitive, just dup2() and close.) And once I have unredirect logic, why not use it for everything, instead of opening files in the parent process and dup2()-ing them down after the vfork? (A conventional shell would fork and do all the redirect logic in the new child, but I want to minimize the work done in vforked() but not yet execed() context.) But it's not going to be _fewer_ filehandles to save the displaced ones than to have the whole new set of high filehandles I dup() down to the proper numbers later. What vfork() would do is _close_ the extra ones.

But I kinda want to get the code that's there to work before rewriting it. Especially since I may fly to japan tomorrow (or may not, I still haven't got _tickets_ and there's now a 10:30 am vet appointment to figure out why George is barfing so much). I have enough backed out large rewrites already because I stopped halfway through and couldn't reestablish confidence that I understood in full detail what I meant to do when I get back.

I added a new xpopen_setup() layer (with xpopen_both() being a wrapper around _that_), which calls a callback() function from the vfork() process before exec, and was mulling whether switching to unredirect() meant I should rip out that lib/xwrap.c change again, but I still need the callback to do the environment variable changes (test: a=b env | grep a=).

I also have a bunch of lib/env.c changes I did for the environment variable resolution plumbing patch I backed out: I reverted the sh.c changes but not the lib parts, because they work well enough to be ignored for now. (I got it compiling again and not having regressions.) The plumbing changes let you specify which char ** to work on (with a wrapper to provide "environ" so existing callers don't have to all change to provide the extra argument; you may notice a pattern here).

Hmmm, so I've got a struct redirect_list() and I broke out free_redirects() into its own function, which closes the filehandles and frees the struct, but I wrote the function chain so the first caller allocates the structure and frees it on return, and the error handling path wants to close the filehandles and free them also. Object lifetime rules: always the fiddly part.

Ok, Jeff found me a flight, but has to wait for money to transfer into the right country to buy the tickets. So MAYBE I fly out Friday morning? Still don't have a flight number... nope, the money transfer from singapore to Bank of America was "unsuccessful" (as in the money went to Bank of America, which stole it). And we may have to sue them to get it back. Great. Too late to book the friday ticket, maybe tuesday...

Darn it:

$ X=42 </dev/null
$ echo $X
42

So if there's no command the variable assignments persist in the local environment, even if there's redirects on the line. But:

$ <&2-
$ ls boing
ls: cannot access 'boing': No such file or directory
$ ls boing <&2-
$

The "close stderr" command gets undone after the command, but applies TO a command locally. But if X=Y doesn't have a command to apply to we never fork so there's no new process context to contain and dispose of the redirect, and switching the order of redirect/assignment doesn't help if one persists and the other doesn't.

I'm to the point where I have to implement something that sort of works to have a base to test and reason against, and then add tests to the giant heap 'o tests so they can fail later and I can change it to do THAT part too. This is more of the nofork redirect stuff where the output of "set" gets redirected and then the redirect is undone afterwards. (Do they _just_ reestablish stdin/stdout/stderr specially?) The problem is if I go down these ratholes as I find them, I bite off way more than I can chew and don't have a series of known working checkins I can fall back to when I get pulled away for a month and then come back and don't remember what I was doing.

Ok, I broke out expand_redir() into its own function because redirects need to be parsed from two contexts: they can attach to a command or they can attach to a block (ala "if true; then echo hello; fi > filename"). Block context does _not_ parse environment variable assignments or allow commands. And bash is buggy so {X}

I _think_ I should put the environment variable assignment parsing back into run_command() because A) the above lifetime weirdness, B) it has no other callers I'm aware of. (This would be so much faster if the design didn't keep shuffling around as I try to write it.)

Meanwhile I have two objects with related but not identical lifetimes: the redir list is an int * and there's a struct sh_process it could go in, but blocks aren't a process and the redirection list is a doubly linked list of "all the nested contexts we're currently in and may need to clean up later". expand_redir() is returning a struct sh_process *, but it's _also_ allocating a redir array and adding it to rdlist. The problem is, it only allocates and appends a new rd when there are redirects, and the caller has to know when to pop it off the rdlist. Add in error paths (we may return NULL instead of an sh_process pointer) and the lifetime is just...

Ok, struct sh_process needs an rdlist **, not a pointer to the instance it added to the end of the list. (Popping the instance off the list will potentially change the pointer to the head of the list.) That's what I'm passing around from function to function anyway...

No, darn it: lifetime rules again. The problem is if I have a pipeline of commands inside one or more blocks:

{ { echo hello > one; cat; } | tee two; } > three < four

In that case, I need a struct redirect with stdout > open("three") and stdin < open("four"), and I need that as the base entry to which the stdout > open("one") gets later applied for the _first_ process (the echo) but not for the cat or the tee (the first's output goes to the pipe, the second's goes to three).

The problem is, I can't have one struct rdlist instance be in all those lists at the same time. It can be sequentially (I push/pop things onto the end of the list so it has the right contents when I launch each process and perform that process's redirects), but not when I'm cleaning up again for the nofork stuff afterwards.

This is why the builtin commands special case restoring stdin/stdout/stderr, isn't it? Because saving the full context is complicated...


December 3, 2019

So if you have a nofork command but redirect filehandles, how should it clean up after that? It's in the same process, so it's gotta redirect them _back_ again afterwards. So I need to save the old handles and add recovery code (corollary: signals have to set a global flag and return and then we check for the flag later, which was pretty much the case for memory allocation already). And if I've got recovery code, should the main path just use it rather than stuffing everything in high filehandles and moving them down after vfork()?

First things first: what does the recovery code look like? Each move is squashing a destination filehandle, so check if that filehandle's already open and dup2() it up to an unused high filehandle to save it, fine. But what happens when you do "1<&2-"? That closes stderr (having moved it to stdout) and we only saved stdout. Is this the only case that does that? Hmmm... (|& does a 2>&1 but it's a statement separator, not a redirect operator, and it doesn't _close_ the other filehandle. And it's actually stderr getting dup2'd over there, with stdout being the source that isn't closed.)
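One way the recovery code could look, sketched in Python instead of C. This is not what toysh checked in. All three function names and the "saved copies live at fd 10 and up" floor are assumptions for illustration. The point is that a move like "1<&2-" clobbers both ends, so both get an undo entry, and an fd that wasn't open beforehand is recorded as "close it again".

```python
import fcntl
import os

HIGH = 10  # assumed floor for saved copies, not a toysh constant

def save_fd(fd, undo):
    """Record how to put fd back: a high-numbered copy if it's open now,
    or None (meaning "close it again") if it isn't."""
    try:
        saved = fcntl.fcntl(fd, fcntl.F_DUPFD, HIGH)
    except OSError:
        saved = None
    undo.append((fd, saved))

def move_fd(src, dst, undo):
    """Roughly "dst<&src-": dup src onto dst, then close src."""
    if src == dst:
        return
    save_fd(dst, undo)
    save_fd(src, undo)
    os.dup2(src, dst)
    os.close(src)

def unredirect(undo):
    """Back out everything recorded in undo, newest first."""
    while undo:
        fd, saved = undo.pop()
        if saved is None:
            try:
                os.close(fd)
            except OSError:
                pass  # already closed, nothing to back out
        else:
            os.dup2(saved, fd)
            os.close(saved)
```

Note this does nothing about the filehandle exhaustion question: if F_DUPFD fails because the table is full, the sketch wrongly treats the fd as "wasn't open", so a real version would need to check errno for EBADF versus EMFILE.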


December 2, 2019

Let's see, toysh. Trying to break out of this "sick and still somehow jetlagged" rut I've been in for weeks. I took a quarter modafinil Fade left behind, and I'm having a can of assam milk tea from HEB's international aisle. (Hopefully my blood pressure will survive, I walked a couple miles round trip to Target last night. Well the black friday sale was ending.)

I backed out the changes to sh.c I'd made over the past month. Declaring patch bankruptcy: I can't remember what I was doing before I went to japan, and reverse engineering it has _already_ taken more time than just doing a new chunk from scratch. (Although most of that was re-reading the checked in code and going "oh right, that's where I left off in the redirect logic, and that's a bug in the cleanup code...")

I think a good scope would be "resolve simple environment variables". Except there are local and global environment variables, and the set logic isn't distinguishing between them either. (What I _haven't_ backed out yet is the lib/env.c changes to let it work on an argument-supplied environment list rather than the global "char **environ;" (and then the trivial wrapper to take fewer arguments and feed it environ).)

Fighting with rdlist[] in run_command(), I didn't properly document the data structure in it: it's an int array that acts as a doubly linked list... wait, this can't work. Type confusion between int and int *. Blah, no wonder I was having trouble understanding what I did last time, I did it wrong. (It _ran_, I remember that. Smoketest passed, but it was never loadbearing.)

Ok, break down and make a struct. Because I need two different types (int * and int), and I don't want to stick a 32 bit value into a 64 bit pointer. To be honest I want to use "short" for the type because it's never bigger than that (ulimit -n defaults to 1024, is the _shell_ ever going to care about an fd > 32767?), but I'm being good and sticking to int because that's the type here. Grrr. But yeah, struct because being clever with an array confused _me_ a couple months later.

Aha! The "help unset" text and the "man bash" description of unset (line 5789 in devuan ascii) say completely different stuff, because GNU. And the man page lists the variables that lose their special meanings when unset: COMP_WORDBREAKS, RANDOM, SECONDS, LINENO, HISTCMD, FUNCNAME, GROUPS, and DIRSTACK. (Why does "set" show GROUPS= but not RANDOM or SECONDS? And is there any way to get set to NOT show functions?)


December 1, 2019

I'm... caught up on my blog. It's live on the web and everything. (I mentioned having more time sans twitter.) I wonder if I should try to go back and edit the second half of 2016...

Huh, twitter has started a "if you don't use your account constantly we'll delete your history" thing. Are they having enough attrition they need a use-it-or-lose-it policy to intimidate people into an obligation to post? (What do they do with people who lurk? Use accounts to read a collated list, but don't post?)

I refuse to ever pay for youtube because their "we'll annoy you into paying for this" scheme (which they announced in more or less those words last year) is danegeld. But the interesting part is how many of the ads are for google stuff like chromebook, or just plain youtube red. (7 seconds of literal "pay us money to stop seeing this commercial".) Maybe half the ad space is being sold, and the rest is being consumed internally by google. Is this because of a soft market for youtube's advertising? Google/Alphabits did have a bad quarter recently. *shrug* I just mute the volume until it lets me skip or the content comes back. It's basically a malfunction where youtube shows me a different video than what I asked for semi-randomly, youtube is increasingly unreliable. WHY they're doing it is irrelevant.

I _think_ I'm flying back to Japan on the 5th (and returning on the 18th), but I'm not sure yet. No actual flight information so far. (Apparently the tickets are cheapest to get either way before, or at the very last minute. But at the last minute, you don't know what your departure date is until it happens.)

Walked to Target and bought a tiny air fryer on sale ($30) and impulse bought a tiny waffle iron for $10. Fuzzy squeed at the tiny waffle iron (it is tiny) and made waffles with it immediately. We haven't unboxed the air fryer yet. (Fade's air fryer, which is actually her roommate TH's air fryer, is apparently $350 on sale and looks like a large microwave crossed with a toaster. Let's see what the tiny one that looks like a coffee maker can do.)

Meanwhile, we're all still waiting for the Boomers to die.


November 30, 2019

Poking at the shell again. Not quite sure what the parsing is doing here:

$ X=42
$ echo $"X"
X
$ echo $"X"Y
XY
$ echo $XY

$ echo $"
> ^C
$ echo ${"X"}
bash: ${"X"}: bad substitution

November 29, 2019

Fade slacked that she was out of surgery and ok around 1am. She's recuperating with friends.

Fuzzy, meanwhile, has the cold I brought back from Japan and spent today interacting with chicken soup and the couch. (We did not go to Stu's.)

I figured out how Line is social media: it has a "timeline" tab that's apparently like faceboot's timeline. (Which I've never used, but given how Twitter's tried to turn itself into that...) Alas, I don't know anybody _on_ Line (except co-workers). Still pulling up individual twitter users' timelines the way I pull up individual tumblr users' timelines. (Because when you aggregate tumblrs via subscription it throws in WAY too many ads, so even though I made a tumblr once (in 2013, apparently) I stopped logging in years ago.)


November 28, 2019

Thanksgiving!

Fade had intense abdominal pain this morning and went to the hospital. Yay good health insurance and teaching hospitals attached to her university so the HMO du jour doesn't screw her over with "your surgeon's in-network but the anesthesiologist wasn't, here's a $25k bill". Still a 3 hour wait for a cat scan, but they gave her morphine in the mean time. (I'd be a lot more nervous about this if she wasn't already in really good hands.)

Fuzzy and I tried to take public transportation to Stu's place for thanksgiving (carrying the matcha buttermilk pie she made with some of the cooking matcha I brought back from Japan, on top of a pile of green bean casserole ingredients), and we found out while sitting at the bus stop that the light tactical rail isn't running today. (Because who would want to go anywhere on thanksgiving? Yeah, we have to take a bus to light rail because the Nimbys near us voted down the bond to put a rail stop across I-35 from the 24 hour HEB, and the closest stop is a 40 minute walk away.) So we called Stu and decided to visit him tomorrow (rather than spend $60 round trip on a Lyft, he's _way_ out in the boonies) and went home.

We had a container of bone marrow (not only is HEB selling it now, but there was a coupon) Fuzzy was thinking of making a fallback dinner out of, but we decided to check if HEB was still open (24 hour doesn't mean they won't close early on thanksgiving, although we knew they weren't closed _all_ day because they didn't clearance the bakery section yesterday), and they're open until 2. We purchased an actual turkey breast (clearance, 25% off) and can of cranberry sauce (and they had sweet potatoes practically clearanced, 7 pounds for a dollar).

Another of Fuzzy's friends is having some sort of Horrible Personal Thing and is coming over tonight for thanksgiving. Sadly, we found out right _after_ HEB closed, and they're vegan. Still, the sweet potato wedges can be made vegan without rendering them inedible (olive oil instead of butter), and the green bean casserole I wasn't going to eat anyway can be similarly modified. And the jars of "mince" we've had forever turn out to be vegan (it's not "mincemeat", it's some sort of apple-nut relish) so Fuzzy could make a mince pie with shortening crust.

Fade got diagnosed with an ovarian cyst causing torsion. My understanding is this is the female equivalent of a man being kneed in the groin every 30 seconds all day. (Hurts just as much, doesn't come up as often because internal instead of external.) They're going to operate, but it's laparoscopic so still not _that_ alarming.

Fade had mentioned her initial trip to the ER on twitter this morning, so her mother called me to see if she should fly there anyway. I told her to call Fade (since she has her phone with her and a big USB battery), she said she tried that and it went to voicemail, I reminded her that texting is a thing. (I'm learning all this from the household Slack. I called Fade briefly for moral support, but actual communication is via text. She's apparently coordinating with her local Minneapolis friends through a different slack channel.)

My brother and sister (both currently living in minneapolis) also wanted to show up and help. I dissuaded them. (Doctors! Hospital! She is _in_ a hospital bed surrounded by professionals. They have a diagnosis and a plan of action. Best we can do is stay out of their way. If that changes, _I'M_ on a plane.)

My headphones broke. I superglued them. Wild and crazy around here.


November 27, 2019

Uninstalled the crunchyroll app (clearing the cache and deleting the data first), Fade disassociated it from the account via the web thingy, I reinstalled it, and... it's still hanging every 30 seconds. When it doesn't lose its marbles and go into the "video is frozen and the audio is repeating the last 15 seconds in a loop" mode.

Installed the PS4 app. That works, but I don't sit in front of the television and watch stuff much. I like to watch things while out for long walks late at night on deserted sidewalks. Maybe I could get an exercise bike. Except I like recumbent exercise bikes, and they seem to have gone out of style...

Honestly, it's so much more effort to NOT pirate this stuff... This should not be the case, and yet.


November 26, 2019

Huh. Well good for him. (That's open source for you, if it's of interest to somebody they can always take the ball and run with it.)


November 25, 2019

This cold is really lingering.

Ran across a link that seemed like it would be of interest to David Graeber, but when you google "David Graeber email" you get... his twitter account. Which links to a personal website that's all about his books, and forwards all contacts to his literary agent.

Checked twitter just to see if there was an easy way to do this, and it still demands a phone number the instant I log in and won't let me do anything else, and literally won't take "no" for an answer. (Unsupported phone number.) Yes, the lip service about letting you remove your phone number is just lip service: Only @jack gets to do that, normal peons must remain hackable. Meanwhile, I bumped into an account named guillotine lover that seems to be doing just fine. Nope, not missing twitter.

(I eventually figured out that the london school of economics people page DOES actually have content, it just breaks (loads a 404 error page) if you haven't waited for the stylesheet to load before you click. So I was able to find an email address. Doubt it made it through the spam filter, but eh. I did the thing, moved on.)


November 24, 2019

Trying to poke at toysh. Haven't quite got the brain for it, but doing what I can.

Environment variable logic is weird. There's a bunch of magic builtin variables, some of which are just predefined locals (ala "$BASH") and some of which run code each time to evaluate them (ala $RANDOM) like /proc files do in the kernel. But you can unset the builtin variables so that they stop being special. Well, SOME of them: unset RANDOM and then you can RANDOM=walrus and $RANDOM will show walrus. (But if you assign it _before_ unsetting it, the assignment is ignored.) And if you "unset _" it seems to be ignored: it doesn't error, but $_ will still show you the last argument of the last command run. And "unset \!" says that ! is not a valid identifier, so $! is always magic (PID of most recent background job).

Also, although "X=2; {X}<&-" is apparently supposed to close the filehandle in X, it doesn't do so for stdin/stdout/stderr. Is this in the man page? It's been long enough I need to reread the man page...

None of this was designed. It's all piles of historical accretion you have to reverse engineer and match for compatibility.


November 23, 2019

I'm still jet-lagged from the 9 hour time difference, had a sore throat on the plane that turned into a cough and runny nose, went off caffeine when I left Japan, and I'm recovering from a three-week engineering sprint.

I hope to be coherent again sometime next week.

Meanwhile, "You can't be a billionaire in a country where people don't have clean water... and claim to be a good person" is an excellent quote. And so is "If there's a minimum wage, why isn't there a maximum wage?" (I continue to advocate for guillotines. Yes, "capital offense" is a pun in this case, but it's also true.)

(And Elon Musk is especially disgusting. He bought the right to be called the founder of Tesla as part of his acquisition of Tesla. There's some kind of hole in that dude's self-image he's trying to fill, but I suppose that's true for all billionaires. With the possible exception of Warren Buffett who seems to be gardening with money (an otaku/hikikomori whose savant fascination is compound interest), but even he's gotten a bit savage in his old age.)


November 22, 2019

Fade pointed out I could use her phone number to revive my twitter account, but I'm not particularly motivated to try to reclaim it? If twitter is going to call billionaires a "protected class" you can't say anything bad about, screw 'em. I read tumblr by going to individual accounts without a login, doing that for twitter isn't a huge ask. (And in a way, giving up twitter is like giving up broadcast television long ago: I have so much more time now. I'm mostly using it watching youtube, but still. I _want_ to be using it watching crunchyroll, but the darn android app is too crappy for words.)

Meanwhile, twitter claims to be removing the requirement to have a phone number because their CEO Jack's account got hacked by the general insecurity of SMS. (Gee, thanks for noticing. Please try to keep up with the group.) But I haven't even checked if the login is still demanding a phone number. Livejournal had its time, slashdot had its time, bulletin boards had their time, and I've _never_ had a Faceboot account (because they were too transparently scummy from day 1). Why on earth would I add content to twitter? That relationship became abusive. Screw 'em.


November 20, 2019

Sitting in the Haneda airport awaiting my flight, fielding a bug report from Elliott that I broke "ls -l missing" files from the command line, and the regression tests didn't catch it because the "missing" test he added was searching for specific output text which libc du jour didn't produce. This is why I never test for specific output text in error messages: it's not standardized and you can't rely on it. (If the "C" locale gave the macro names as the error message, ala "ENOENT", then you could rely on it. But alas, no.)

Hmmm. You know, technically I could portability.c something up for that. A perror_msg() variant that would give the darn macro name. (And if I make it CONFIG_DEBUG then Android would always do it. :)
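A rough sketch of that macro-name idea, written in Python rather than C because its errno module already carries the number-to-name table. The format_error() name and the message layout are invented for illustration; this is not what toybox's perror_msg() does today:

```python
import errno
import os

def format_error(msg, err):
    """Return 'msg: ENOENT' style text instead of the locale's strerror() text."""
    # errno.errorcode maps numbers to macro names; fall back to the raw number
    # for values this platform's table doesn't know about.
    name = errno.errorcode.get(err, "errno %d" % err)
    return "%s: %s" % (msg, name)

if __name__ == "__main__":
    try:
        os.stat("/nonexistent/missing")
    except OSError as e:
        print(format_error("ls: missing", e.errno))
```

A regression test could then match on the macro name without caring how any particular libc words the message.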

Anyway, his fix doesn't make sense, but given how early I got up to pack out my hotel room before the 10am checkout, and how the previous thing I checked in here was subtly broken, I don't quite trust my analysis. (I fit everything into my suitcase and backpack, for a definition of "everything" that involved throwing out several things I'd brought with me from Texas, and my suitcase is 0.2 kilograms under the point where they'd charge me more money.)

Along the way, I found out my make/single.sh PREFIX= fix from monday broke building "make ls" when PREFIX isn't set. Sigh. (WHY is ls defaulting to showing nanoseconds? How does this help?)


November 19, 2019

Jeff is back in Canada, first full day to myself. Walked to Akihabara and back. Jeff's right that it's gotten hugely touristy, although there's still electronics parts stores down the side alleys. I took a picture of the old office building. Bought yet another bag of japanese groceries to take back to Fade and Fuzzy. I need to get a box to use as a second piece of luggage.

I got Aboriginal Linux building mostly the old busybox version against new musl: I had to patch it and disable a half-dozen commands, and I still need cp and truncate and oneit from toybox because old busybox doesn't provide the right stuff to run the native build. But I worked through it, ran the build, and... it died the same way. The m4 build break is #error "Please port gnulib freadahead.c to your platform!" and it's because the old version of gnu/dammit/lib doesn't recognize musl, and doesn't have fallback "sane" behavior for unrecognized environments. (Because gnu: it's never sane.)

So the regression test failed for musl reasons, not toybox reasons. (Why does m4 have a bundled copy of gnulib?) Right, I can -DSLOW_BUT_NO_HACKS in the m4 build (which makes it abort() in the staircase, but in theory it should never call freadahead() when you define that...) and of course:

m4: internal error detected; please report this bug to <bug-m4@gnu.org>: Aborted

Sigh. I want to build the old LFS version because I want to reestablish the old baseline. But old LFS never built under current musl libc (because of crap like this: the gnu packages are brittle and badly designed), and current toybox doesn't build under old uClibc (because uClibc is dead).

Ok, last commit to Aboriginal Linux was switching a bunch of architectures to musl, but that version of Linux From Scratch never built under musl. So switch i686 back to uClibc, build a busybox version with it (with the toybox stuff I was doing last time because of busybox not building under current musl), undo the SLOW_HACK thing to the lfs build, and regression test LFS 6.8 build under that...

And toybox doesn't build under uClibc anymore because MOUNT is missing eight different MS_RELATIME and friends macros. Of course. And it's not just me failing to add uClibc compatibility, Elliott's been ripping it out to make MacOS work.


November 17, 2019

Jeff flies back to Canada tomorrow, but I'm here through the 21st. I get a few days in Tokyo by myself. I probably shouldn't sit in my hotel room programming the whole time.

Still haven't rebooted my laptop. Cleaned out 6 of the 8 desktops, but USB tethering to my phone still works to get me net so I've opened a bunch of windows again...


November 16, 2019

Dating these entries is weird, my laptop's still on Austin time, with daylight savings time recently having expired in the states (something Japan's never bothered with, but which Linux auto-adjusts for when set to the US central timezone), which means 9am my time is 6pm Austin time. So my laptop switches over to showing the current date at 3pm tokyo time.
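The offset arithmetic can be checked with a short sketch. It assumes fixed November offsets (UTC+9 for Tokyo, UTC-6 for Austin on standard time) instead of consulting a timezone database, and the function names are made up for the example:

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets for November (no daylight saving in effect).
JST = timezone(timedelta(hours=9), "JST")    # Tokyo
CST = timezone(timedelta(hours=-6), "CST")   # Austin, standard time

def tokyo_to_austin(dt_tokyo):
    """Convert a naive Tokyo wall-clock time to Austin wall-clock time."""
    return dt_tokyo.replace(tzinfo=JST).astimezone(CST)

def austin_midnight_in_tokyo(year, month, day):
    """When does a laptop on Austin time roll over to this date, in Tokyo time?"""
    return datetime(year, month, day, tzinfo=CST).astimezone(JST)

if __name__ == "__main__":
    print(tokyo_to_austin(datetime(2019, 11, 16, 9, 0)))
    print(austin_midnight_in_tokyo(2019, 11, 16))
```

The two zones are 15 hours apart, so 9am in Tokyo lands at 6pm the previous day in Austin, and Austin's midnight date rollover lands at 3pm in Tokyo.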

[And then there's the fun detail that I wrote this long before posting it, although it's only November 30 as I type this. Catching up!]

Poking at expr.c, because people on the list keep sending me patches and going "I plan to do a thing here" is getting old.


November 15, 2019

Sigh, people keep emailing me wanting to do "guest posts" on my website. It has this blog (which I'm 2 months behind on uploading the stuff I've written in a text file of sentence fragments with no HTML tags), my open source projects, a mirror of computer history research I did years ago, archives of presentations I gave and things I wrote way back when (such as the 3 years of Motley Fool columns I did from 1998-2000, I.E. 20 years ago), and random stuff like my resume.

There's nothing to put guest posts _in_. The only way they can have found that and not realized that is by randomly googling (presumably for things with a high SEO rank) and then blanket emailed people without looking at context. But then they follow _up_ with emails later asking if I saw their previous one, so it's sort of like there's a human behind it?

Unsure whether to respond. I don't know _how_ to respond in a non-mocking fashion since pointing out the above boils down to "what exactly are you asking for?"

(I usually assume this is some kind of russian virus spreading thing where the link initially goes to a valid article, but then it gets changed to exploit code later. Either that or it's trying to hack google rank by getting my site to link to theirs? It's a scam either way, I just don't know which one this week's is...)


November 14, 2019

Devuan's wicd wireless GUI thing hung yesterday trying to associate with an access point (the access point connected but the gui froze), and the gui was still hung that evening so I killed it, and on restart it couldn't scan for anything, and since it's at least 3 different interacting daemons (written in python for some reason) I don't know how to properly reset it without a reboot (or at least logging out of the desktop, which is basically the same thing: I lose all my open windows).

So I'm going through windows and closing them so I can shut down my laptop. I'm now on day 2 of this process.

Along the way, I bumped into the Aboriginal Linux stuff I was doing on desktop 6, and I poked at it some more and got it built: toybox wouldn't build against 2015's musl-libc, but a current musl, a couple new toybox checkins (been a while since I regression tested against a 12 year old gcc version), and a new busybox patch (the 2015 busybox doesn't like the new libc) fixed it. And I ran the ancient Linux From Scratch build control image under it and... the m4 build died.

So now I'm trying to get the _defconfig_ old version of busybox to build against current musl (so I can regression test that it's toybox and not musl that broke it), and... wow it doesn't like this new library. I've had to switch 6 commands off (sed against the config produced by defconfig)...

Ok, it built! And old busybox doesn't support truncate -s which dev-environment.sh is using to create the ext2 scratch disk for /dev/hdb.img. And the oneit symlink in build/host points to "{:+/usr/bin/}toybox" which can't be right. (Toybox didn't build at all, at least oneit should have built...)

Ok, been a while since BUSYBOX=1 got regression tested, looks like. (I mean when I left off work on this, it had been a while.) Let's add a stanza to the toybox build that in the BUSYBOX=1 case builds a few standalone commands and copies them to the target... and that's failing... because "STRIP= make cp" isn't producing a cp binary at the top level? Ok, look at generated/unstripped and... it contains "root-filesystem-i686cp". Sigh. The prefixed name changes two months back seem to have introduced a regression when cross compiling single commands.

But hey, regression testing is the point of this exercise. There are quite a few regressions, aren't there?


November 13, 2019

You can sing Basic Income to the Hallelujah Chorus. Also, I wrote about this long ago, then explained it again over four days, then gave a talk about it in 2016 (which sadly was recorded by Flourish, but never posted).

I understand why Tony Seba has given his talk so many times. Howard Aiken was right.

Jeff and I are going over the j-core vhdl CPU source files, triaging them so we can do a big CPU flow diagram, document and simplify everything. It's kind of brain melting, but hopefully the result can be simplified enough it won't be for other people. So far the commits resulting from this exercise have been adding comments and changing whitespace.


November 12, 2019

Really not missing twitter. A reminder: "guillotine the billionaires" is considered by twitter to be making threats against a protected class of people.

I'm still interested in things people share on social media like twitter (and livejournal before it, slashdot before that), mostly about the need to guillotine the billionaires. But honestly if I don't get enough from youtube (which is about all I can access from Tokyo on my phone at the moment, all the streaming services we _pay_ for have region locked me out) I could go to fark.com or read blogs or what's left of tumblr (where I've been looking at individual accounts without logging in all along). I'm treating twitter like tumblr, looking at individual people's feeds without logging in.

As for finding a new place to post stuff (other than here), there's an element of the same "does the world really need to hear more from a white male?" that led me to stop submitting talk proposals to linux conferences for a year or two there. (I await "ok boomer" turning on Generation X with a certain bored inevitability. Yeah, it's a fair cop. I intend to vote for Warren, but after that I don't want to vote for anybody who isn't younger than me. My generation probably shouldn't be driving either.)


November 11, 2019

I wrote "Place and route" as my note-to-self what today's entry was about. (That's it. The whole note. I wonder what I meant?)

There's an open source VHDL toolchain producing actual bitstreams now, although very alpha test quality and still under active development. Jeff explained to me what the chunks are back when I was in tokyo, and I suppose I should write up something for the website. What do I remember and/or can bother Jeff to clarify via Line?

The first stage is "analysis", which reads the VHDL source, syntax checks it, and stores the resulting abstract syntax tree in a database. (In theory anyway, GHDL with the mcode backend re-parses the source each time because it's basically instantaneous and there's no advantage to storing an intermediate format. I'm told the LLVM backend version stores LLVM's AST into the database, which is a human readable text file by the way.)

The second stage is "elaboration", which Jeff described as "super constant folding". The result of elaboration is what the simulator uses to test-run your circuit.

The next stage is "synthesis", which produces a schematic called a "netlist" from the source code: use these wires to connect these components. The first netlist is a high level abstraction: all the wires are zero length and the components it connects are generic things like "an adder" or "an or gate".

Next is "mapping" which converts the abstract netlist into components your target actually uses, producing a "mapped netlist". The mapping tool has a library of components it knows about and basically does a bunch of regex searches against the schematic to replace the generic components with the ones it knows about. (For an FPGA it's a set of cells and libraries, for ASIC it's a "Standard Cell Library" and "IP Macro Blocks". But each brand and model of FPGA can have its own incompatible set of components (even within Xilinx, Spartan and Kintex aren't compatible), and each foundry generally has its own too that you have to get from them, and they'll change every die size shrink.)

Finally there's "place and route", which takes the mapped netlist (using the right _types_ of components) and tries to figure out where to put them in 2D space. In an ASIC it has a little more freedom, but it's not easy. The immovable bits are mostly the I/O pins, and it can plonk down other components anywhere it needs to by drawing the right patterns on the right layers. The ASIC data has the size and spacing of all the cells, and there are physics rules for avoiding radio interference in your wires depending on what signals will be going down them. (Plus "transistor leakage current" and cooling and the entire can of worms that is integrated test circuitry.)

An FPGA has a fixed set of each resource (this many cells, this many of each library), and information about where each instance lives on the chip. There's also a "routing fabric" that can be programmed to connect stuff together, although it has a finite number of wires and sometimes you have to route into a cell and back out, passing the input along unmodified, just to glue two wires together. When you have big busses or a lot of clocks, routing can easily become the bottleneck.

But as long as there's enough of each resource, the important part is minimizing the length of the connections. The longer the trace a signal has to go down, the longer it takes and the slower you have to clock the chip. This is where timing constraints come in: if you have to make a known signal (HDMI, Ethernet, etc) with a minimum timing, then this connection can be at most _this_ long, so you've gotta shuffle stuff around to make it fit. If place and route can't figure it out (or takes too long to do so each time), you can add placement constraints: the cells implementing this circuit have to be within this part of the chip. (Usually near libraries or I/O pins that aren't movable.)
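To make "minimizing the length of the connections" concrete, here is a toy cost function of the sort a placer might try to reduce. Everything in it is an illustrative assumption: components sit on integer grid coordinates, every net joins exactly two components, and wire length is measured as Manhattan distance. Real tools use much richer models.

```python
def wire_length(placement, nets):
    """Sum of Manhattan distances over all two-point nets."""
    total = 0
    for a, b in nets:
        (ax, ay), (bx, by) = placement[a], placement[b]
        total += abs(ax - bx) + abs(ay - by)
    return total

# Hypothetical three-component design: an I/O pin feeding an adder feeding a register.
NETS = [("pin", "adder"), ("adder", "reg")]
SPREAD = {"pin": (0, 0), "adder": (5, 5), "reg": (0, 1)}
PACKED = {"pin": (0, 0), "adder": (1, 0), "reg": (1, 1)}

if __name__ == "__main__":
    print(wire_length(SPREAD, NETS), wire_length(PACKED, NETS))
```

The packed placement scores far lower than the spread one, which is the direction place and route is trying to push things while staying inside the timing and placement constraints.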

Then that last netlist gets either packed into a bitstream (the set of bits you load into the FPGA to tell each cell and route connection how to behave), or gets passed on to your fab's tools to mark it up with test pads and so on.


November 10, 2019

Cleaning up staircase and datapath.vhd.

Yay, new company registered. Alas it says it's a software sales company that performs no engineering, because racists in the bureaucracy insist white people can't do engineering. (Here they say the guy from busicom really invented the 4004 microprocessor back in 1969, because a white guy in the states couldn't possibly have. I think Intel officially disagrees.)


November 9, 2019

Sunday, day off. Wheee.


November 8, 2019

More staircase cleanup: fix default values and avoid redundantly setting them. Again, the optimizer's probably already finding most of this but we're paring down the code to see what's _left_.

Next up is finding patterns! Ala read-and-jump, which is done by RTE [002b], TRAPA #imm [3cii], and the system plane instructions "General Illegal", "Slot Illegal", "Reset", "Interrupt", and "Error":

1)
  -- W = MEM[Z] long
  ex_stall.ma_issue <= '1';
  ex.ma_wr <= '0';
  ex_stall.mem_addr_sel <= SEL_ZBUS;
  ex.mem_size <= LONG;
2) stall

3)
  ex_stall.zbus_sel <= SEL_WBUS;
  -- PC = Z
  ex_stall.wrpc_z <= '1';
4)
  id.ifadsel <= '1';
  id.if_issue <= '1';

All that stuff, done across 4 clock cycles, is actually 1 bit of information. A microcode step says "start doing this" and then for the next 4 clock cycles it does those steps in that order, because we read a value from memory and jump to it. (Which takes 4 clocks because reading from memory and modifying the program counter each have a stall cycle, for the value to come back and to read the next instruction into the IF stage, respectively.)

(The only instance of that pattern which varies slightly is that RTE does SEL_XBUS instead of SEL_ZBUS in the first cycle. I.E. the address to read from comes from a different source. Maybe we can modify it to be the same? Add zero to it and it'll come out of the ALU on ZBUS...)

A second pattern is "incpc, dispatch, if_issue". A third pattern (done by RTE, RTS, JMP, JSR, BRAF, BSRF, BF /S, BT /S, BRA, BSR) is "ifadsel, dispatch, if_issue, delay_jump". And there's LOTS of common "end this instruction, advance to next one" stanzas...

Jeff suggested that what we could do is have a bit shifter, where we set the bit for one of these action pipelines in our first/triggering clock cycle, and it gets shifted right each clock cycle triggering the next part each clock. And an instruction isn't done while any of those bits are set.
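A software toy model of that shifter idea, using the four read-and-jump cycles as the action pipeline. This is only a sketch of the concept: the step strings are descriptions rather than signal names from the VHDL, and run_pipeline() is a made-up name.

```python
# Labels for the four clock cycles of the read-and-jump pattern; the strings
# are descriptions for this sketch, not signal names from the VHDL.
READ_AND_JUMP = [
    "issue memory read (W = MEM[Z])",
    "stall",
    "PC = Z",
    "issue instruction fetch",
]

def run_pipeline(steps):
    """Return the step performed on each clock until the shifter drains."""
    if not steps:
        return []
    width = len(steps)
    shifter = 1 << (width - 1)      # triggering cycle sets the leftmost bit
    trace = []
    while shifter:                  # instruction isn't done while a bit is set
        stage = width - shifter.bit_length()   # position of the set bit
        trace.append(steps[stage])
        shifter >>= 1               # shift right once per clock
    return trace

if __name__ == "__main__":
    for clock, step in enumerate(run_pipeline(READ_AND_JUMP), 1):
        print(clock, step)
```

The microcode only has to set one bit; the shifter then walks through the remaining steps on its own, one per clock.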

Hmmm: CAS is nuts: maybe it ends on clock 3 and is followed by a system plane instruction? (I don't have the right vocabulary for this.)


November 7, 2019

Seasonal affective disorder. Really. That was quick.

It's because the timezone here is insane: I think they want to be in the same time zone as eastern china, but we're a full time zone east of that, so the sun goes down around 4:30 pm here at this time of year. The hotel kicks me out at 11am (so they can clean), and I'm in a city full of light canyons, so I'm only getting about 4 hours of sunlight before the sun goes behind the tall buildings and might as well be below the horizon.

It took about a week for my brain chemistry to start running dry from seasonal affective disorder. I wanna hibernate.

The solution is to get up at 6am, when the sun rises, and open the curtains. (And to be honest, midnight is at midnight and noon is at noon here, the time matches the actual astronomical whatsis. But in the states we've been doing daylight savings time long enough that the numbers on the clock are lying to me consistently and I'm misreading them. I'm treating 1am as the middle of the night and 9pm as "early evening", and here it's not. 6pm and 6am are equidistant from noon and midnight.)


November 6, 2019

I'm going to guess all of it?


November 5, 2019

The lady who runs APA hotels here in Japan took over from her husband, who had to step down for being too obviously racist. I'm told she's pretty famously racist too, but better at plausible deniability. Still, the equivalent of gideon bibles in the rooms here is "Theoretical Modern History: The Real History of Japan" which has articles (in english!) with titles like "World War II was a plot against Japan by the White Nations" (seriously, Volume IV page 22). It's the usual confused mess of "white people are horrible but cozy up to the US to defend us from china" you get from racists, and these particular racists really really really want nuclear weapons and to spend all the country's money maintaining an army on top of that. (In the USA we have Boomers, here they have their own Crisis Of Infinite Geezers. Racist grandpa is in full-throated whine.)

But the racism _here_ is aimed at white people (and these days china, and lots has historically gone towards korea too), and being on the receiving end of racism is... refreshing in a way? There's a certain karmic balance to it. At home I benefit from ambient racism even without consciously participating in it, our history has been completely rewritten by racists. Here it's a headwind. Ok then. I kinda feel I've earned that.


November 4, 2019

I showed up to early voting before flying out, but wasn't informed about any of the issues and they weren't allowed to tell me anything at the polls, so I went to look it up... and forgot before flying out again. At least I poked Fuzzy to vote today. Meanwhile, Nancy Pelosi's insistence that the election will fix stuff is flat wrong.


November 3, 2019

Huh. Interesting possible reason _why_ "dying business models explode into a cloud of IP litigation". If you give all your remaining money to lawyers, when your shareholders sue, all the money they're going after went to lawyers. That means the officers of the company conspicuously haven't got any money to go after anymore, and the logical people to sue to reclaim money is... a bunch of lawyers, which is picking a fight with a black belt. And if you have a remotely plausible reason to have given the money to the lawyers, they can't even say you breached your fiduciary responsibility.

It's an exit strategy defending against investor lawsuits. Want to recoup your worthless shares? Talk to the lawyers. No use suing us, and if you try we can show in court that you're not following the money.


November 2, 2019

So hang on: The Resident asked Russia to interfere in the 2016 election on live television. He asked China to interfere in the 2020 election on live television. The impeachment hearings are about whether he asked Ukraine to interfere in our elections... and the GOP are still arguing that despite the evidence, it's not a thing he would have done?

Honestly, the Boomers need to stop. Just stop. Grandpa is too old to drive now.


November 1, 2019

The current political avalanche started when the oil industry panicked in 2016 due to the start of China's current five year plan, which ended expansion of China's Strategic Petroleum Reserve.

The Energy Industry, 1/6 of the world's economy, is undergoing a profound transformation. The previous such transformation was the communications industry's "dot-com" transition, which took us from analog color television to smartphones in about 15 years, and we're still digesting the fallout. The energy transition is just as big, but energy tends to pull the transportation industry (another 1/6 of the world economy) along for the ride so this one's ridiculous.

When entrenched interests face change, they tend to try regulatory capture, ever since the buggy whip manufacturers association famously lobbied to introduce speed limits on Model T era cars 100 years ago (becoming synonymous with "behind the times" in the process). In the recent dot-com transition, when the audio and video entertainment industries faced digital streaming, they panicked and lobbied for the Digital Millennium Copyright Act making "copyright circumvention devices" illegal. (And yes, back then I was covering it and writing about how the new business model was obvious.) But in communications the various entrenched interests weren't synchronized, television and movies and music and communications were their own fiefdoms that hadn't yet consolidated into today's monopolies where Facebook exists, Disney is more than 50% of all box office receipts, and Time Warner's the only cable company in town for entire cities.

But energy consolidated a century ago (the Standard Oil breakup didn't stop the pieces from colluding, and Edison's General Electric drove real inventors like Nikola Tesla out of business). When 1/6 of the world economy does regulatory capture, they hijack governments and make the CEO of Exxon secretary of state in the USA, and create Brexit to derail the EU's transitioning off of russian gas. (After first inserting a "nearly" into the EU's net zero directive, then making them miss even that.)

Historically in these technological upheavals, the energy segment transformations tend to spill over into transportation. Heating homes with coal in the early 1800's led to steamboats and trains (and the coal-powered industrial revolution, and railroad "Robber Barons" owning the Gilded Age). Then indoor lighting and cooking switching from whale oil lamps and wood burning stoves to kerosene, electricity and methane gas ("natural" gas was a 1920's marketing term for methane, Bob Hope's "now you're cooking with gas" was a shout-out to his sponsors) in the late 1800's led to horses being replaced by cars and airplanes at the start of the 1900's. Transportation can switch _without_ energy, as the shipping container revolution (starting in 1956) most recently demonstrated, but energy spills over into transportation. (Other similarly major segments with their own revolutions and schedules include agriculture and finance, and manufacturing (the original european "industrial revolution", the "american system" of interchangeable parts, Henry Ford's assembly lines, robotic arms, etc.))

But the timing of _this_ energy transition is being driven by china. China's government issues 5 year plans, which start in years ending in 1 or 6. China started filling a Strategic Petroleum Reserve under their "10th plan" starting in 2001 and expanded it under the 11th plan 5 years later, which drove the cost of gasoline in the USA from under $1/gallon up to about $3/gallon and led to a huge oil industry boom that overcame the 2005 oil production plateau with gobs of money thrown into deep-sea drilling and fracking. I.E. the fossil fuel industry ran out of cheap oil and went after the really expensive stuff. None of the Fracking outfits have ever made a profit, but with china buying more oil than they need they can get unlimited credit and turn the money they borrow into oil, and fracking releases huge amounts of natural gas as an unavoidable side effect, which they might as well sell.

But when China hosted the 2008 olympics, China's air quality became a major national embarrassment so their 12th 5-year plan starting in 2011 included switching their electricity from fossil fuels to solar, wind, and nuclear, mostly at the expense of coal.

Of the 3, solar turned out to be the real winner and their 13th 5-year plan starting in 2016 moved the subsidies from nuclear over to solar. That's when it became clear to the fossil fuel industry that china wasn't going to start buying coal again (in fact it exported its domestic production to africa because nobody was buying it at home). But more importantly, China's 2016 plan declared that the strategic petroleum reserve they'd been growing was now big enough, and they stopped adding to it.

With china no longer buying way more oil than it was actually using, the price of oil crashed, the fossil fuel industry panicked, and the top three oil producers (Saudi Arabia, Russia, and the United States) started hijacking governments. Of the three, Russia was most ham-fisted about it. Saudi Arabia had a domestic coup where the current King got the throne because the claimants to the throne in front of him all conveniently died within a couple years of each other (and then he had a purge). And yet somehow THAT's the guy who covered his tracks better than Putin or Trump.

Anyway, the timing of Oil Oligarchs' Last Stand was driven by china, not just because they're manufacturing All The Solar Panels but because they capped the strategic petroleum reserve and oil prices crashed due to the drop in demand. This isn't some grand strategy, it's a reactionary scramble from dinosaurs losing their grip and trying desperately to hold on as long as possible.


October 31, 2019

I didn't get to see any halloween this year. I didn't get to see 6th street in Austin, and here in Japan it's a half-hour train each way to Shibuya (where the real halloween festivities are) and I'm too jetlagged to be up for it. There are decorations up in shops and halloween meals in restaurants, and I approve.

There are also many small children here in Japan now. The local economy looks like it REALLY picked up maybe 2 years ago. There were small children in strollers at the coffee shop in the airport where I met Jeff yesterday. There were mothers with babies in slings in the coffee shop we went to this morning. There were small children at the tables in the mall at lunch. To misquote M Knight Chamois: I See Obviously Pregnant People.

This was not the case when I was here at the end of 2017. It's deeply cool.


October 30, 2019

Crossed the international dateline. My plane took off around noon on the 29th in the USA, and landed in Japan around 2pm on the 30th something like 14 hours later.

Add in jetlag and the redeye flight, not a lot to blog about today.


October 29, 2019

On my way to Japan. 6am flight with soup and/or shuttle picking me up at 2:18 am. I'm a bit jetlagged already.

Three hour layover in San Francisco, with an outlet. I have _finally_ caught up on email from the backlog that accumulated in Canada.

Watched Hidden Figures (which was excellent but stressful to watch in a number of places) and the first Ant Man on the plane. Now I can watch Ant Man And The Lady Who The First Movie Totally Should Have Been About Honestly There's a TVTropes Page On This, which has been on Netflix forever.

Meanwhile, I have an outlet (woo!) that's loose enough the plug falls out of it every ten minutes but on balance my laptop's still got 80% battery so close enough. I've reproduced Jeff's sh2-elf toolchain build (for the boot ROM and such) and I'm trying to get it debugged so the "the build dies but it's ok, just run install afterwards and that'll die with an error too but it will have installed enough of the toolchain to use" setup Jeff gave me is not what I send to the list.

I was hoping to do a lot of toysh work on this trip, but... redeye flight.

Also watched the new Jumanji (better than I expected, I'm not usually a fan of Jack Black but he was born to play that role), and am now watching Lost in Translation (one of the co-stars is Scarlett Johansson when she was, apparently, 12).


October 28, 2019

Plane to Japan in the morning. May be doing some avoidance productivity.

And 0BSD is recognized by github! I poked them to rescan toybox as suggested in that comment, and it only took about 45 minutes to fix the toybox repository page.

Ok, what's wrong with xargs -0... I just checked in Elliott's workaround for now, I.E. expanding the fudge factor from 2048 to 4096 bytes. In theory libc should never return less than 128k anyway (even if 1/4 the stack is smaller than that), so I don't _think_ that'll go negative? And if it does, it should just veto any attempt to run anything rather than segfaulting.
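The "don't go negative, just veto" behavior can be sketched like this. The accounting is an assumption made for the example (each NAME=value string plus its NUL terminator plus one pointer counts against the limit), arg_budget() is a made-up name, and none of this is toybox's actual code:

```python
import os

def arg_budget(arg_max, env, fudge=4096, ptr_size=8):
    """Bytes left for command line arguments, or 0 to veto running anything."""
    # Assumed accounting: each NAME=value string, its NUL terminator, and one
    # pointer in the envp[] array count against the limit.
    env_bytes = sum(
        len(os.fsencode(k)) + 1 + len(os.fsencode(v)) + 1 + ptr_size
        for k, v in env.items()
    )
    # Clamp at zero so an oversized environment means "run nothing"
    # rather than a negative size leaking into later arithmetic.
    return max(0, arg_max - env_bytes - fudge)

if __name__ == "__main__":
    limit = os.sysconf("SC_ARG_MAX")
    print(arg_budget(limit, dict(os.environ)))
```

With a 128k limit and a tiny environment almost the whole limit survives the 4096 byte fudge factor, while a limit smaller than the fudge factor clamps to zero.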

Reading about what oil price makes fracking economically unviable, which led me to looking up the current price of oil ($56 for WTI crude, $62 for brent crude), which led me to look up what the difference is which is a fascinating article. I blog about this stuff periodically, and used to do twitter threads about it. (I should dredge through the last twitter archive I downloaded before the format change and copy the stuff that's worth keeping to my website.)

Oh hey, fracking was invented by Halliburton, who'da thunk? The Boomers are killing billions of innocent civilians for the benefit of about 90 people.


October 27, 2019

There's no obvious way to get vi to show whitespace (google suggests I type two lines of tricksy punctuation, which I'm not even going to try to cut and paste), but this works fine:

echo -e '\e[42m'; cat file.txt

Especially since my $PS1 has a color change in it so it resets itself.

The endless xargs saga continues, with 5 commits to the command already this month. The remaining issue is that the xargs size accounting is randomly failing for reasons I still don't understand, and which Elliott fixed by adding a full page fudge factor at the end so we just don't hit it. But sometimes it DOES work without that fudge factor?

Trying to get xargs to work continues to suck.

October 26, 2019

I fly out the morning of the 29th, missing halloween. On the bright side, I presumably get halloween in Tokyo. :)

While trying to add a test for the new ln -r, I accidentally did ln -s circular circular instead of ln -s . circular, and of course following that goes "boing" and gives an error message, but in my code it fell through and gave _two_ error messages, the first from relative_path() saying too many levels of indirection, and the second was ln -s saying it couldn't make a symlink from (null) to the destination. And of course I then had to fix that because you can't trust glibc to print (null) when passed a null pointer instead of segfaulting, because glibc loves breaking stuff that used to work in the name of purity. (The C++ developers who took over gcc and tried to spray down C with "undefined" behavior have been infecting glibc.)

So anyway, I fixed it, but I generally add a test to the test suite when I commit a fix, and this is specifically testing an error path, and I'm not quite sure how to go about it. Check that it did indeed exit with an error? The theoretical segfault would have exited with an error. I try not to check _specific_ error messages, because internationalization and version skew between implementations and so on. If nothing else it makes TEST_HOST extremely brittle.

I have a whole todo can of worms for fleshing out the test suite, and "testing error paths" is one of the big unscoped things with pending design work: I don't actually know how to do that yet. There are many things that are easy to test by hand but hard to test mechanically, and that's an entire category of them. Checking for _specific_ text, a _specific_ error code... that's more constrained than I want the tests to be. I guess "it segfaulted" is always an error, and that means error code 139:

$ echo "int main(int argc, char *argv[]) { *(int *)0 = 1; }" > test.c
$ gcc test.c
$ ./a.out
Segmentation fault
$ echo $?
139

Because SIGSEGV is 11, 128+11 = 139: killed by signal 11.
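
A rough sketch of how a test could check "did it die from a signal" without caring about the message or the exact error code. The killed_by_signal name is hypothetical, and this assumes the usual 128+signal convention shown above (some shells report signals differently, and a program can deliberately exit with a value above 128).

```shell
# killed_by_signal is a hypothetical helper: run a command, and if the
# shell reports an exit status above 128, print the signal number.
# Returns nonzero (and prints nothing) when the command exited normally.
killed_by_signal() {
  "$@" >/dev/null 2>&1
  rc=$?
  [ "$rc" -gt 128 ] || return 1
  echo $((rc - 128))
}

# A child shell that sends itself SIGSEGV stands in for a crashing command.
killed_by_signal sh -c 'kill -s SEGV $$' && echo "crashed"
# An ordinary failure (exit status 1) is not reported as a crash.
killed_by_signal false || echo "no signal"
```

That way an error path test can assert "failed, but didn't crash" and leave the message text alone.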


October 25, 2019

Darn it, most of the stuff I added to the crunchyroll to-watch list seems to be dubs rather than subs. Ever since I tried to watch the dub of Irresponsible Captain Tylor years ago, I just haven't been a fan of even good dubs. (Plus it means I'm not being exposed to more japanese, which I'd like to eventually learn, and foreign languages are a thing I DEEPLY SUCK at.)

Still drowning in toybox todo items. Trying to make ln -r work because when I scrolled back in my email inbox to the last point where I'm reasonably sure I dealt with everything, the github notification email about a push to that pull request is the first thing that came up as still todo. And investigating _that_ found a cp -s bug, so I'm doing a new relpath() function for lib/xwrap.c...

Cans o' worms all the way down. (My flight to Japan leaves tuesday morning at 6am, so I'm trying to get things to a good stopping point by then.)


October 24, 2019

Fade and I broke down and got a crunchyroll subscription, which has less anime on it than I expected. (It's got a bunch, but I scrolled to the end of the list in about an hour. Recognized maybe 2 dozen things and added them to a list. Lots of stuff I was hoping to find _isn't_ on there. Oh well, that's the modern streaming service landscape for you.)

The real problem with crunchyroll is it's unusable over phone data. It shows maybe a minute of show and then hangs for a full minute loading more. (Long enough to pop up a "this show is having problems, keep waiting or quit" dialog each time, and then wait almost as long again after you keep waiting.) You can't download shows to watch from local memory, you can't pause and let it fill its buffer... it's doing the IP pearl-clutching thing where data MUST be transient and it's INCONCEIVABLE that your connection can't keep up. (Which is extra-weird because I've downloaded files at 3 megabytes/second on this phone, what resolution did they encode these suckers in? I take it crunchyroll hasn't been paying the phone companies their danegeld to not get rate-limited the way youtube/netflix/prime have? Or do those just have 4 different encodings of the video at different data rates they switch between depending on quality? (And 2 of the 3 have a download to view locally later option.))

Part of the reason I stayed offline yesterday is I have yet another xargs patch waiting in my inbox, and I'm trying to remember what failure it was trying to fix. When I have the test, coming up with a fix for it is generally easy. When I have a fix without a test I have to try to figure out what _might_ be going wrong which takes forever and is error prone.

I think the issue here was "make sure you always try to run one argument by itself no matter how long it is"? Except the point of -s is to _not_ do that and I'm confused about what Elliott _wants_.

I remember somewhere in the middle of a thread he did send me the test and I reproduced it, but gmail dropping duplicates (which it's done for a decade and there's no way to tell it not to and it's the classic "that's not a bug, that's a _feature_" and it will remain a feature no matter how many times you tell us it's a bug) means my threads are broken and split between inbox and toybox folders, and you keep either sending me new patches without restating the test we're fixing, or going "but I told you once, check your back email". (And unfortunately the mailing list archive doesn't put DATES on anything unless you open individual messages and hunt.) And there's also the possibility that he mentioned it on github, in which case it's just lost.

I'm currently checking the web archive. At the start of the thread he said:

> It turns out that findutils xargs almost always uses an artificially low
> limit of 128KiB. You can observe this with --show-limits (which I
> refrained from adding to toybox since I believe it's only useful if
> you're debugging a broken xargs).
> 
> I think an alternative fix for all this would be to go back to counting
> the cost of the (char *)s, but separate the -s limit from the "system"

A fix for _what_? What is "all this"? He was describing findutils' behavior, not toybox's behavior. It sounds like toybox is allowing stuff to run that findutils would veto? How is it causing a problem?

When I asked for clarification, he said:

> I'm not sure what you're talking about. the point is that the
> findutils "don't count the (char *)s" algorithm is an _underestimate_
> of how much space we really need, so comparing that to the actual
> kernel limit is wrong. but findutils' other bug is that it does

We're not comparing anything to the actual kernel limit. The actual kernel limit's been at least 10 megabytes since something like 2007.

There was a patch to add xargs -P, then a patch to avoid a cast, then a -E fix, and I _think_ all of that is in and sorted now? But this issue is pending and I don't remember what the failure mode was...

Ok, there's a test in the patch. Which is a terrible test (the contents of /usr aren't stable, thus it's a dependency. If you want a long filename you can "head -c $SIZE /dev/zero | tr '\0' x" to get an arbitrarily long string of the same character.)
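
Wrapped up as a function, that trick looks something like this. (The long_name name is just for illustration.)

```shell
# long_name is a hypothetical helper: print $1 copies of the letter x,
# with no trailing newline, using the head|tr trick described above.
long_name() {
  head -c "$1" /dev/zero | tr '\0' x
}

# Ten of them, followed by a newline from echo
long_name 10; echo
```

A test can then say $(long_name 4096) or whatever size it needs instead of depending on what happens to be installed under /usr.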

Alright, the problem is that we're not testing _any_ limit, and when we hit the kernel's 10 meg modern limit it goes "boing" and fails. Right. And since he took the pointer size accounting out in a previous patch, if you feed it a string of single character names then you'll hit the limit at 1 million of them on a 64 bit system. (1 byte + 1 null terminator byte + 8 pointer bytes = 10 bytes per entry.)
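
Spelling that arithmetic out as a sketch: max_args is a hypothetical helper (not toybox code), the 10 meg limit is the figure from above rather than anything probed, and this ignores the environment and the command name, which also eat into the space.

```shell
# Hypothetical helper, not toybox code: how many arguments of a given
# length fit under a byte limit once each one is charged for its string,
# its NUL terminator, and its pointer in argv[]?
# Usage: max_args LIMIT NAMELEN PTRSIZE
max_args() {
  limit=$1 namelen=$2 ptrsize=$3
  echo $(( limit / (namelen + 1 + ptrsize) ))
}

max_args $((10 * 1024 * 1024)) 1 8   # 64-bit pointers: 1048576
max_args $((10 * 1024 * 1024)) 1 4   # 32-bit pointers: 1747626
```

Without the pointer term the 64 bit estimate would be 5 times too high, which is how you sail past the limit.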

Gotta put the accounting back, but -s doesn't use the accounting, which means -s is broken and will easily trigger "argument too long", but I can check FLAG(s) to see if you specified it and have TT.s be the probed value otherwise and do the math right.


October 23, 2019

Jeff's in Japan. He did not get arrested on his way through customs. Not a huge likelihood but one of the people at the previous company stole the company seal out of Jeff's desk, used it to file official paperwork with regulators to drastically increase his and his friends' salary, and then when the company ran out of money (not having _accounted_ for the extra money, since nobody else was told) sued for the "unpaid salary". This is why the old company went bankrupt and they had to start a new one.

Sadly, they hired that guy in the first place because he's the son of politically connected people who they thought might help with their VC funding, and apparently in Japan what legally counts is possession of the seal. (To the point that wives can steal their husband's seal and get a divorce that way. The Elementary episode about the long lost Imperial Seal of China makes more sense now.)

Anyway, Jeff's meeting with lawyers and getting the new company going, and it looks like I'll be flying out to Japan maybe sunday-ish.

I stayed offline today. Bussed to the DMV to get a copy of the car title for the car Fuzzy destroyed by driving after winding up in a ditch and putting a hole in the oil pan (we gave the corpse to the maintenance guy who removed our poison ivy, and he needs the title to sell it to a junkyard for parts). The republicans' voter suppression tactics remain in force: the only DMV location that could handle this was 10 miles north, an hour riding two different busses (plus a 15 minute walk) each way. Altogether it ate about 4 hours.


October 22, 2019

Caught up with the tweets tweetcaster had downloaded, then deleted the app off my phone. Fade tweeted that I'm not currently planning to come back.

The reason I refuse to attach my phone to my online accounts is that it doesn't increase security, it DECREASES it. If password reset requests go to things like email I can at least spread that over multiple accounts accessed via devices I control, and you can't pickpocket a sim card to get those account credentials: the phone may have a screen lock but the ability to receive texts is in the sim card, and that's not counting somebody stealing your number via social engineering at the carrier level, which is called phone slamming (the T-Mobile service to put a PIN on your account to prevent that is apparently called Lookout MobileSecurity Premium... even though it's free?).

2FA is nice (modulo losing _either_ locks you out of your account so either there's a recovery mechanism that's usually the weak link, or else denial of service and bricking is the weak link), but SMS ain't a security protocol. Heck, somebody could request a reset on the website and merely listen to the SMS broadcasts on the same cell tower my phone's on (the encryption's best described as "sad"). My _exact_ location may (or may not) be fuzzy but figuring it out within a few blocks isn't hard (and that's assuming you're _not_ tracking my phone). And of course this is assuming the phone itself is secure, which I generally don't. (I don't do anything like banking through my phone, and don't have any email accounts anything _else_ is tied to readable from my phone. I don't even use gmail's web UI through the phone browser. The only reason I had twitter on there is I didn't hugely mind if I lost it.)

Still annoying I didn't get to say goodbye to my followers, though. Oh well.


October 21, 2019

Sigh. Twitter just effectively deleted my account.

I made a "Guillotine the billionaires" tweet du jour (it was just that text with a link to another tweet), which went mildly viral (the webcomic artist behind Something Positive followed my account back when he was a Penguicon guest, and he retweeted it), and a couple dozen retweets later my account was suspended for making threats against a protected class of people: billionaires.

In theory the suspension is "12 hours" but that countdown apparently can't start until I give twitter my phone number, which I've never done before and refuse to do now. (Why would I make my phone a single point of failure where people can do a password reset and hijack my account by intercepting a text message anyone anywhere can send at will from the website? No I can't get a burner phone, then whoever inherited the number could take over the account. And you can't make new twitter accounts without supplying a phone number anymore either, because data collection for advertisers.)

I've tweeted "guillotine the billionaires" a hundred times or more in the past couple years, this is just the first one that got noticed I guess. Personally, I consider "hoarding a billion dollars should be a capital offense" to be a valid political position, and yes I would literally like to change the law so that controlling a billion dollars for more time than it takes to give it away (maybe 30 days) is federally punishable by guillotine. There were 607 billionaires in the USA in 2018, and the top 1% of the population owning more wealth than the bottom 90% of the population is why things like health care and climate change are not being addressed.

It's literally impossible to EARN a billion dollars in a human lifetime: somebody paid $1000/hour working 16 hours per day every day including weekends for 100 years would have $584 million. If you worked since the fall of Rome in 476AD earning $100/hour for 8 hours a day working monday through friday, without a vacation: $321 million.

Think what that means about spending it: a thousand dollars a week is $52k/year, so a million dollars would last about 19 years even without earning a nickel of interest, and it would take about 96 years to spend $5 million. If you already have the money it's tax free (you'd have to earn tens of thousands more to be able to _spend_ $52k/year), and the average rate of inflation from 1999-2019 was 2.2% even _with_ the trillions of dollars of "quantitative easing" pumped into the system to bail out the banks. (Turns out the 1970's "stagflation" was capitalists trying to sabotage FDR's New Deal so they could become billionaires.)

It just might be possible to earn a hundred million dollars through your own efforts via the work of a lifetime, and this is where most tech founders retire: Paul Allen of Microsoft, Steve Wozniak of Apple, Jim Manzi of lotus: all retired when they had $100 million, which at 5% interest means you get 96 thousand dollars more money each _week_.

A billion dollars is ten times that. Billionaires rely on financial tricks like leverage and compound interest and securitization to accumulate vast hoards of wealth that are literally impossible to earn. You can only ever get a billion dollars by receiving almost all of your money from other people's work.

And being rich means nothing without poor people: Warren Buffett does not hire Bill Gates to wash his car. Money is a social construct, it can only be spent on people (who then agree to make goods and perform services like not contesting your ownership of land that's been around since before humans evolved). So a lot of rich people do their best to _create_ poverty, slumlords and payday lenders are just more obvious about it. That's why republicans keep attacking health care, "you can't get good help these days" but if refusing to work for a large corporation is a literal death sentence... Only the truly desperate would put up with the modern workforce, so we're all as desperate as billionaires can make us.

Tens of thousands of deaths annually are directly attributable to inability to afford healthcare. In a world where not only are 1/3 of gofundmes for medical bills but hospitals and insurance companies are telling patients to set up gofundmes to cover their costs, yes the existence of billionaires is literally killing people. LOTS of people.

This "protected class" of people I'm supposedly threatening (billionaires) stays so voluntarily: you can give money away at any time, as people who aren't monsters generally do. And they race to accumulate more money because they think they're bidding on the lifeboats of the titanic and must be able to outspend the other billionaires to be safe when the apocalypse comes. (That's what you get for raising all those Boomers with "duck and cover", they're SURE the world's going to end in their lifetimes and there's not much of it left now. Half of all Boomers were born before 1955 and the actuarial tables say average lifespan in the USA is 79 so on average they've got about 15 years left.)

And when the tweet I linked to is about the Sacklers, who created the opioid crisis, getting off with a $260 million fine? Yes they're killing people. They've always been killing people.

We've had 50 years of lobbying saying that taxing billionaires is unrealistic. Let's take them at their word: guillotining 650 people could totally happen. Lobby and vote for the realistic goal. Move the overton window so that "billionaires continuing to exist" is not a thing. If some people want to head off guillotining them with the tepid solution of returning taxes to the 91% rate they were at before LBJ lowered them to 70% in 1964 (which gave the capitalists the money to sabotage the new deal and lobby to put Reagan into a position to lower the top tax rate to 28% in the 1980's thus giving us an ever-growing National Debt which is exactly the same amount of money that _didn't_ get taxed away from billionaires, as in the charts literally mirror each other)... I suppose that's one way of doing it. If there aren't billionaires, then laws requiring they be guillotined don't apply to any current living person, just like kings before them.

But guillotining is realistic. Taxing is not. All the billionaires say so, why not believe them? If they insist the only way they'll stop buying our political process (via citizens united) is if we pry their money from their cold dead fingers, why not take them up on it? And if they want to flee to an island somewhere and be afraid to ever set foot in this country again, that works too. They're parasites on our economy who insist we need them, using predatory philanthropy to launder their public image and cement control.

Oh, twitter also wants me to performatively delete the offending tweet, despite them having hidden it, as a second act of contrition after they have more of my personal data. No thanks: I wasn't wrong.

Twitter (or as its users call it, "this hellsite") is known today primarily for the Resident's twitter account (where he can call for the death of journalists without repercussions) and its inability to get rid of nazis (unless you set your location to Germany, in which case they magically vanish from your feed because it's required by law there and thus twitter _does_ have an is_nazi flag in the database). And now they've come out in defense of "people of wealth" (who do not like to be called billionaires, and no I did not make that up).

Another thing I regularly tweeted is "twitter is bad at being twitter", something even its founder admits.

I've never had a facebook account, and I've left a bunch of sites like slashdot and livejournal behind over the years. I've _nearly_ quit twitter at least 4 times (left for a month or so and came back) when they did various frog boiling "twitter is bad at being twitter" site changes I couldn't stomach. I was only staying after the most recent UI change because tweetdeck sucked less than their most recent site design, and because tweetcaster on the phone was usable-ish. If that's what twitter is now? They can keep it. No, I am not giving them my phone number in some sort of act of penance.

So anyway, that's what happened to my twitter account. *shrug* I downloaded my archive a week or two back (they changed the format to be MUCH less useful to search by hand for things you once tweeted, it used to be CSV and now it's hugely verbose json), maybe I'll try to put it somewhere useful someday...


October 20, 2019

I found the posix-2008 URL and I'm seriously tempted to change all the command URLs to point there, and modify the roadmap ala:

Although previous versions of Posix have their own stable URLs (where you can still find SUSv3 and SUSv2), the 2008 release of SUSv4 was replaced by a 2013 release also claiming to be SUSv4, then again by a 2018 release, all at the same URL. Similarly, the other version numbers claim not to have changed but adopted some sort of "Windows 95" naming scheme ("The Open Group Base Specifications Issue 7, 2018 edition"). Since a moving target isn't a standard, we're sticking with the 2008 version until they stop this upgrade-behind-your-back nonsense. Luckily you can still find the original content here.

But I'm still tired and irritable, and don't want to make decisions when it could just be low blood sugar talking.

I suspect I may have a very very very mild case of this darn flu that knocks people out for weeks. I've had a mild sore throat and congestion that's refused to turn _into_ anything for a week now.

This sometimes happens with me because my immune system basically learned ninjitsu on Kwajalein, the tropical island I grew up on. I used to think a strep throat meant it was sore and a staph cut meant it was pink, and when a little line runs up your leg or arm from the cut more than an inch you'd go to the doctor to get the nasty pink chalk liquid (amoxicillin). It was _normal_ as a kid on Kwaj that you'd get an ear infection bad enough to sound like you were hearing through a cardboard tube a couple times a year. When my mother got breast cancer back before Y2K (the bomb they dropped on Bikini Atoll took off from Kwaj; geiger counters were not allowed on the island) the x-ray tech checking her lungs asked when she'd had tuberculosis. Which is how she found out she'd contracted tuberculosis on kwaj and her immune system fought it off.

It's been a looooong time since then, and I had measles when I was 23 (vaccinated as a kid but no boosters) which tends to cause immunological amnesia, but I'll still sometimes shrug off things that take other people down for a while. (Or it could just have a really long incubation period. No idea.)


October 19, 2019

I've had opinions about venture capital being a net negative ever since I worked for WebOffice in 2002 and the startup got crushed by its VC in service of one of its other investments. (We got pulled off of our product and tasked with reverse engineering the realvideo server's encryption layer so some sports streaming service wouldn't pay so much for it. Youtube was founded about 3 years later.)

The fundamental problem with venture capital is it starts you at the bottom of a lake with a scuba tank, and instead of swimming to the surface where you can breathe your incentive is to go down or sideways to get ANOTHER scuba tank. (And whatever happens, it's a very heavy tank.) I prefer to fund from operations and leave crazy investors' notes out of it.

Anyway, this is a good article from the trenches on that.


October 18, 2019

I got a toybox release out by basically ignoring the mailing list and just retreating into a little ball until I got "make tests" passing and the release notes written up and all the binaries built and uploaded and so on. (I've got a checklist.)

I may just wander away from the computer for the rest of the week. I'm tired, and people are asking me "how can I help" on the mailing list and it's one of those things where anything I can think of would take me less time to do than to review and correct their patches. (It's a short term vs long term thing: Elliott is hugely helpful and I _want_ these developers to come up to speed and mature... except there's theoretically a finite amount of work to do on the project? My roadmap has an _end_. Yeah, bugfixes and undoing bit rot presumably goes on as long as it's in use, but in terms of adding features to it?)

Sigh. Tired.


October 17, 2019

Ok, finally got this github markdown thing figured out. To locally create the HTML more or less like it looks on github, you need the cmark-gfm package (which wants cmake but once that's installed just type "make"), then build/src/cmark-gfm < input.md > output.html and THEN you need the github-markdown-css package: copy the github-markdown.css file into the same directory as output.html and then copy the wrapper blob out of the readme there, namely:

<link rel="stylesheet" href="github-markdown.css">
<style>
.markdown-body { box-sizing: border-box; min-width: 200px; max-width: 980px;
margin: 0 auto; padding: 45px; }
@media (max-width: 767px) { .markdown-body { padding: 15px; } }
</style>
<article class="markdown-body">
[body of output.html]
</article>

I should make a wrapper script to do all that.
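
Something like this, maybe. It's a minimal sketch rather than a tested tool: the wrap_gfm_html name is made up, and the stylesheet link line is my assumption about how github-markdown.css gets pulled in when it sits next to output.html.

```shell
# wrap_gfm_html is a hypothetical name. It reads an HTML fragment (the
# output of cmark-gfm) on stdin and writes a complete page to stdout,
# assuming github-markdown.css sits next to the result.
wrap_gfm_html() {
  # Fixed header: stylesheet link plus the wrapper blob from the readme
  cat <<'EOF'
<link rel="stylesheet" href="github-markdown.css">
<style>
.markdown-body { box-sizing: border-box; min-width: 200px; max-width: 980px;
margin: 0 auto; padding: 45px; }
@media (max-width: 767px) { .markdown-body { padding: 15px; } }
</style>
<article class="markdown-body">
EOF
  # Pass the converted markdown through unchanged, then close the wrapper
  cat
  echo '</article>'
}

# Intended use, with the cmark-gfm path from the build described above:
#   build/src/cmark-gfm < input.md | wrap_gfm_html > output.html
echo '<p>hello</p>' | wrap_gfm_html
```

Keeping the conversion and the wrapping as separate pipeline stages means the wrapper doesn't care where cmark-gfm got installed.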


October 15, 2019

Trying to get ps.test to work and diff -b is being stroppy. It ignores changes in the _amount_ of whitespace, but "one thing is indented" and "the other thing is not indented" still counts as different. (Because zero is not an amount, apparently. There must still BE a space. Which is annoying for leading and trailing whitespace.)

Ah, I see. I want -w instead of -b, because having FIVE command line options dealing with whitespace is very gnu/fsf.
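
A quick way to see the difference, as a sketch with throwaway files (ws_verdict is a made-up helper name, and -w is an extension the common diff implementations all seem to have):

```shell
# ws_verdict is a hypothetical helper: run diff with one whitespace flag
# and report whether it considered the two files the same.
# Usage: ws_verdict FLAG FILE1 FILE2
ws_verdict() {
  if diff "$1" "$2" "$3" >/dev/null; then echo same; else echo different; fi
}

# Scratch files: same text, one copy indented with extra inner spacing.
dir=$(mktemp -d)
printf 'hello world\n' > "$dir/flat"
printf '    hello  world\n' > "$dir/indented"

# -b folds runs of whitespace together, but "some" vs "none" still differs.
echo "-b: $(ws_verdict -b "$dir/flat" "$dir/indented")"
# -w ignores whitespace entirely, so the indent no longer matters.
echo "-w: $(ws_verdict -w "$dir/flat" "$dir/indented")"

rm -rf "$dir"
```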


October 14, 2019

Saw the homeless lady from last month. She was spending the night at the picnic table in front of HEB again, and I stopped to catch up. She lost the burner phone the same way she lost the frosty keychain I gave her 6 months ago: when you don't have a secure place to store your stuff while you sleep, it's often not there when you wake up.

She spent the money I gave her on meth (instead of a week's lodging at that weekly boarding house we visited, but ok), and is mostly sleeping in an abandoned car now when not staying "with her friends" under the bridge.

She says she's back on the waiting list for a rehab slot up in dallas (multi-month inpatient program, she claims to be more interested in the housing than the rehab), but has now missed it twice because the busses from hancock center up to the greyhound station next to the ACC Highland Campus stop running early on sunday night. Yes, she's missed it twice for the same reason, and been surprised both times. (When I took that japanese class at the Highlander campus (same place: there can be only one) I walked back home more often than not, it's less than an hour each way. But ok.)

I'm starting to see where this 'Ok Boomer' thing comes from. She's a decade and change older than me and needs an adult. I want to help, but am not sure how.


October 13, 2019

According to github I should still be camping this page to see when 0BSD finally gets recognized. (The decision _to_ recognize it was made 8 months ago, but actually doing it has an awful lot of bureaucracy and staging.)


October 12, 2019

Elliott just poked me that a commit I did screwed up tests for commands _other_ than the one I was just testing, so now I'm trying to fix "make tests" so all the entries pass before cutting a release. (Once it all works, I should be able to use it as a regression test.)

Alas, there's host variance (hence wanting mkroot as a stable testing environment). When I switched from ubuntu to devuan, a number of tests broke.

The first "cmp" test isn't that problem, it's that I changed cmp to behave more like the host cmp (and less like posix): it used to error out with only one argument, and now it uses stdin as the second argument if you only give it one file. Which is fine, except I want to test both success and failure (for a proper test) and the message put out by both busybox and toybox (which is also the one posix requires) is "test - differ: char 5, line 2", but the one from diffutils says "byte" instead of "char". (I assume this gratuitous posix violation has something to do with utf-8, but it's kinda hard to TEST for. Grrr.)

What I really want is a regex (or at least wildcard) match, but I haven't written the infrastructure for that: it's just diffing the output. I can | grep in the test itself, but this seems awkward to repeat a bunch of times? I suppose if the match output starts with ~ it can grep and test for "does it contain this" maybe? Except the problem is then what do I print for VERBOSE=1? When it doesn't match, _how_ doesn't it match? Probably I should just pipe the output through sed to file off the pointy bits in each test instead of trying to extend the plumbing...
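
For the cmp case the sed approach could look roughly like this. It's a sketch: normalize_cmp is a hypothetical filter name, not part of the test suite, and the sample lines are the message quoted above.

```shell
# normalize_cmp is a hypothetical filter: rewrite the diffutils wording
# ("byte") to the posix/busybox/toybox wording ("char") so one expected
# string can match either implementation.
normalize_cmp() {
  sed 's/differ: byte /differ: char /'
}

# Both spellings come out the same on the far side of the filter.
echo 'test - differ: byte 5, line 2' | normalize_cmp
echo 'test - differ: char 5, line 2' | normalize_cmp
```

The downside is each test has to know which pointy bits to file off, but the comparison itself stays a plain diff.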


October 11, 2019

In Minneapolis, back at the McDonald's with the outlet. Not really getting much done, but it's good to have some recovery time.

Yay, my regex update made it into a man-pages release (version 5.03).

Elliott has pointed me at the file where selinux attributes are applied to files during filesystem packaging in the AOSP build. Unfortunately, I have no idea what tool _reads_ that file and actually Does The Thing. (It's done during filesystem packaging; neither the host nor the target system applies these xattrs, it's like creating /dev nodes with the genext2fs -D option. The archiver annotates them while adding them to the new filesystem image.)

There are about a hundred files in that directory. I wonder what a .te file is? Ah, there's a README in the parent directory, maybe that'll help. (Why is the web gui syntax highlighting and, for, as, is, by, into, when, final, this, and out? Why is the first word of each sentence purple?)


October 10, 2019

Redeye flight to minneapolis, got to Fade's, slept for 4 hours (under the dog, who is trying to do the nose-lunge thing as I type; if I don't pet him he'll lick me, which he knows I try to avoid).

The Android NDK seems to have been superficially broken in a profound way, which is quite a trick. I should just do a compile time probe for this.

I want to reply to this with "yes, that's why it's in pending", but they're trying to be productive. I should throw some cycles at cleaning up the pending directory, but it's full of cans of worms that each turn into a large timesink and wind up making _more_ work if I do the cleanup halfway and stop.

Meanwhile in Linux Kernel land, I'm having trouble getting psyched up about BPF. They've reinvented the JVM and stuck it in the kernel. Wheee? Somebody was trying to stick a lithp interpreter (and somebody else a forth interpreter) in there over 10 years ago and they got laughed down, but once one was in there they genericized it to do everything, because of course. (Maybe I'd care more if it wasn't Faceboot doing most of the BPF work? Them and Red Hat: sources of bad ideas.)

I'm trying to psych myself up to go to the protest. (The Resident is in Minneapolis for his usual nazi rally slash fundraiser. The giant baby balloon was shipped here yesterday, apparently.) Alas, I just don't have the energy...


October 9, 2019

Cleaning out the airbnb house. It was a good engineering sprint, but I'm tired and ready for it to be over.


October 8, 2019

Tired. Two more days here in Canada.

Jeff's got it into his head that it's very important that turtle produce analog video RCA output through the audio port (with a special adapter), which is apparently ten times as complicated as HDMI was: he's on day 3 of trying to work out how to do it, and has special equipment for it. His evidence that this is important is that Raspberry Pi 4 still does it.

The Raspberry Pi foundation started in 2009, the same year analog TV transmission ended, and back then it made sense to support surplus devices people had lying around that had _just_ gone offline and hadn't been disposed of yet. And once they did it's "this circuit we already have" as far as they're concerned, so why change it? (It's not like it's eating FPGA space that could be put to another use.)

But even 10 years ago analog TV was already an obsolete technology: Sony stopped production of CRTs in 2005 and Samsung in 2008, federal law required all new TVs sold to be digital starting in 2007, and analog broadcast licenses above "wall outlet" power ended on June 12, 2009 so the frequencies could be recycled. By 2013 even the market for digital to analog converter boxes for old TVs had "largely dried up". The last known CRT _refurbisher_ repairing old ones closed shop in 2015. These days old CRTs are hard to legally get rid of (the EPA classifies them as hazardous waste due to the lead, but the recycling market only ever turned them into new CRTs, and is thus long dead, so they're stockpiled like nuclear waste).

So yeah, Jeff is on day 3 of getting analog video to work.


October 7, 2019

Ooh, remember how the people denying the dangers of leaded gas, tobacco, and global warming are all the same people? There's a documentary about this.


October 6, 2019

Jeff's not back until afternoon, so I have another half day off. (Then again it _is_ a weekend. We sort of work until we drop, but it shouldn't feel weird to have time off on sunday.)

I took another stab at using CROSS_COMPILE with a current NDK that doesn't have the "export" script and once again can't figure out what to link llvm-cc to. I tried ./toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-cc -> ../../../../llvm/prebuilt/linux-x86_64/bin/clang but when I do CROSS_COMPILE=~/android/android-ndk-r20/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi- make distclean defconfig toybox it can't find any system headers...

Ah, I was just reminded that https://developer.android.com/ndk/guides/other_build_systems uses toybox as a build example. Let's see how that's changed since I last looked at it...

$ export TOOLCHAIN=$NDK/toolchains/llvm/prebuilt/$HOST_TAG
$ export AR=$TOOLCHAIN/bin/aarch64-linux-android-ar

The problem is the only ndk/toolchains/llvm/prebuilt directory is linux-x86_64. The other toolchains seem to be under toolchains/$ARCH ... Ah, it scrunched them all up into a single directory. Got it. So...

$ CC=clang CROSS_COMPILE=~/android/android-ndk-r20/toolchains/llvm/prebuilt/linux-x86_64/bin/i686-linux-android16- LDFLAGS=--static make distclean defconfig toybox

Fails because bionic doesn't have readlinkat (a _syscall_ wrapper) until API 21. Despite it having been added in the same commit (5590ff0d5528b) as openat() back in 2006 (shortly before posix-2008 started requiring the *at() functions). Ok, try building with -android21- and... now getgrgid_r() isn't there: bionic's sysroot/usr/include/grp.h is #ifdeffing it to api 24, the man page says it was introduced in glibc 2.19. The getgroups() syscall was already there in linux 1.0 in 1994.


October 5, 2019

An evening off in rural Canada while Jeff's running errands up near his apartment (about an hour's drive away from our airbnb house, so he's spending the night with his wife and dog).

I've plugged toybox into the old aboriginal linux build and gotten most of the way through the gcc build, at which point the tar "no ../ files outside this directory" check is false positiving on "./" for some reason, gotta track that down. Gotta constantly regression test all the corner cases, and old builds are a good way to do that. They test stuff I'd never think of. I have a full Linux From Scratch baseline build, I should reestablish that and forward port to the new stuff, like I used to do before mkroot. (Simpler base, but losing years of continuity and what I built under it sucks. Gotta bridge...)


October 3, 2019

So trying to cut a toybox release I'm doing lots of testing, and I'm trying to get the android NDK to work as an "llvm-cross" with scripts/cross.sh and it's being weird. For static linking with the old 2018-era NDK I have lying around (r18-beta2, the export-toolchain.sh version), it's working ok:

$ LDFLAGS=--static scripts/cross.sh llvm make distclean defconfig toybox
...
$ file toybox-llvm
toybox-llvm: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped

But for dynamic linking it's... creating a shared object?

$ scripts/cross.sh llvm make distclean defconfig toybox
...
$ file toybox-llvm
toybox-llvm: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /system/bin/linker64, not stripped
$ hd toybox-llvm | head -n 2
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 03 00 3e 00 01 00 00 00 40 e5 00 00 00 00 00 00 |..>.....@.......|
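
For reference, byte offset 16 of an ELF header holds the 16-bit e_type field, with e_machine right after it. A minimal Python sketch that decodes the two dump lines above (the function name is made up for illustration; the type values come from the ELF specification):

```python
import struct

# The first 32 bytes of toybox-llvm, copied from the hex dump above.
HEADER = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "03003e000100000040e5000000000000"
)

# e_type values from the ELF specification.
ELF_TYPES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}

def elf_type_and_machine(header):
    """Return (e_type name, e_machine number) from the start of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5) is 1 for little endian, 2 for big endian.
    endian = "<" if header[5] == 1 else ">"
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return ELF_TYPES.get(e_type, hex(e_type)), e_machine

print(elf_type_and_machine(HEADER))
```

ET_DYN is the type position-independent executables share with shared libraries, which would explain file(1) saying "shared object" for something that still has an interpreter line.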

That is a 3 in position 16... So I do:

$ wget https://dl.google.com/android/repository/android-ndk-r20-linux-x86_64.zip
$ unzip android-ndk-r20*.zip
$ ln -s clang android-ndk-r20/toolchains/llvm/prebuilt/linux-x86_64/llvm-cc
$ mkdir ccc
$ ln -s ~/android/android-ndk-r20/toolchains/llvm/prebuilt/linux-x86_64 ccc/llvm-cross
$ LDFLAGS=--static scripts/cross.sh llvm make distclean defconfig toybox

And the output is a dozen of:

generated/obj/id.o: In function `do_id':
id.c:(.text.do_id+0x1c2): warning: Using 'getgrouplist' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

... it's linking against the host libc? The ccc/llvm-cross/bin/llvm-cc path is preventing it from finding itself, or something?

I only had 4 options for clang in the r20 NDK:

$ find . -name clang
./toolchains/llvm/prebuilt/linux-x86_64/bin/clang
./toolchains/llvm/prebuilt/linux-x86_64/lib64/clang
./toolchains/llvm/prebuilt/linux-x86_64/lib64/cmake/clang
./toolchains/llvm/prebuilt/linux-x86_64/share/clang

I don't _think_ I picked the wrong one? V=1 confirms it's calling llvm-cc, but when I take off the --static I get:

$ file toybox-llvm
toybox-llvm: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, stripped

Hmmm...

Ok, look around for _other_ toolchains, and I find toolchains/ dir has architecture directories! Ok... So... what do I use as the cc symlink? Um... well, let's try the same clang as last time (since it _did_ compile stuff, it just linked wrong), so... Let's try: for i in toolchains/*-*/prebuilt/*/bin/*-nm; do ln -s ../../../../llvm/prebuilt/linux-x86_64/bin/clang ${i%nm}cc; done and then LDFLAGS=--static CROSS_COMPILE=~/android/android-ndk-r20/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/bin/aarch64-linux-android- make distclean defconfig toybox and it goes:

In file included from ./toys.h:9:
./lib/portability.h:41:10: fatal error: 'sys/types.h' file not found
#include <sys/types.h>

A lot. Hmmm... That seems to be the _only_ header it's not finding though? (I'm doing the distclean defconfig, it _should_ be rerunning all compiler probes in the new toolchain context! Modulo it should do that whenever the toolchain changes anyway, I need to test that more.)

Ah, that's the first header it tries to include. So it's saying "you have no system headers".

I need to write a FAQ entry about how to do this so I can refer to it in future. But first I need to make it _work_ with the current NDK version. I remember hitting and being confused about this before, but the lesson I took away at the time was "the prebuilt toolchain in AOSP doesn't work like the NDK does". Apparently that was the wrong takeaway because the NDK changed to work more like AOSP, and now I need to see if I left myself any notes...


October 2, 2019

Wrote a document for the investors talking about the 4 projects we're working on in this "engineering sprint" (I.E. time in Canada at an AirBNB). 1) a contract for a communications protocol, 2) turtle boards sellable on crowdsupply, 3) ice40 j-core systems sellable on crowdsupply (the first version of which is a clone of old HP calculators), 4) the synchrophasor stuff we were doing years ago (our noncompete with 3M has now expired!)

In the evening Jeff and I went to a surprisingly live mall to find the amazon locker our "10 atari games in one" joystick had been delivered to, because we need an RCA video output test load to make sure the fancy many-knobs studio monitor survived shipping (the case didn't but the electronics inside seem to have?) before we try to test analog video output from turtle. (We got HDMI working in the bitstream already, but raspberry pi's audio jack turns out to also do analog video if you have the right adapter, and we're supporting that. Hence the need for the analog video test box with All The Knobs to tell us _how_ our signal is wrong and let us debug it.)

Hanging out at the starbucks in the mall, Jeff's working on a design for a page faulter so the ice40 system can map the 8 megs of spi flash as memory and we have an sram cache with plumbing that loads spi flash into the sram replacing the old contents as necessary. (Basically like a read-only CPU cache, except it works for a specific physical address range and loads the contents from spi flash.)
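
A rough Python model of that idea, as a sketch only: the line size, the number of SRAM lines, the direct-mapped layout, and every name here are assumptions for illustration, not the actual design.

```python
LINE_SIZE = 32      # bytes per cache line (assumed)
NUM_LINES = 256     # lines of SRAM set aside for the cache (assumed)

class FlashCache:
    """Read-only, direct-mapped cache in front of a slow backing store."""

    def __init__(self, flash):
        self.flash = flash                  # bytes-like: stands in for SPI flash
        self.tags = [None] * NUM_LINES      # which flash line each slot holds
        self.lines = [b""] * NUM_LINES      # cached contents (the "sram")
        self.misses = 0

    def read(self, addr):
        line_no = addr // LINE_SIZE
        slot = line_no % NUM_LINES
        if self.tags[slot] != line_no:
            # "Page fault": fetch the line from flash, evicting the old one.
            # Nothing is written back because the cache is read-only.
            start = line_no * LINE_SIZE
            self.lines[slot] = self.flash[start:start + LINE_SIZE]
            self.tags[slot] = line_no
            self.misses += 1
        return self.lines[slot][addr % LINE_SIZE]

flash = bytes(i & 0xFF for i in range(64 * 1024))   # fake 64k flash image
cache = FlashCache(flash)
first = cache.read(0x1234)
again = cache.read(0x1235)                          # same line, no new miss
far = cache.read(0x1234 + LINE_SIZE * NUM_LINES)    # same slot, evicts it
print(first, again, far, cache.misses)
```

Because nothing is ever written back, eviction is just an overwrite, which is what keeps this simpler than a general CPU cache.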

I'm working on toybox release notes for the long delayed 0.8.2.


October 1, 2019

The 50 new turtle boards are held up in customs leaving china for a week, meaning they won't get here until I fly to Minneapolis and Jeff flies to Japan. Of course. Well we still have 2 of the original turtle production run in a salvageable shape (as in when the power connectors broke off they could be soldered back on, and this time we epoxied them in place too; the new ones have a metal clip holding the thing on so the electrical solder connection isn't load bearing anymore).

We dug up the HDMI test pattern generator we ran on the turtle years ago (doing DVI encoding, simple and out of patent), and bolted it to the side of the j-core processor (rather than a separate standalone bitstream like last time), which took hours longer than we expected because getting vhdl and verilog to share a clock turns out to be fiddly in the toolchain. But it works now, and we sent Martin a bitstream that tests Linux running and HDMI output, so when the boards arrive he can smoke test them.

Now we're designing the actual bitmap: 640x480x16 bit (64k color) means the frame buffer is 614400 bytes. At 60 frames a second (progressive scan) that's a continuous ~37 megabytes per second read from memory to display the screen. We have about 200 megabytes per second of memory bandwidth in our one bank of LPDDR2 DRAM, so it's just under 1/5 of our memory bandwidth eaten by the display. If we do interlaced instead, that's half as much bandwidth, or under 1/10th.
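
The arithmetic, spelled out as a quick Python check. The only assumption is reading "about 200 megabytes" as 200 million bytes per second; the variable names are just for illustration.

```python
WIDTH, HEIGHT = 640, 480
BYTES_PER_PIXEL = 2                 # 16 bit color
FPS = 60
DRAM_BANDWIDTH = 200_000_000        # assumed: "about 200 megabytes" per second

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
progressive = frame_bytes * FPS     # bytes per second, full frames
interlaced = progressive // 2       # half the lines per refresh

print(frame_bytes)                          # 614400
print(progressive / 1e6)                    # 36.864
print(progressive / DRAM_BANDWIDTH)         # 0.18432
print(interlaced / DRAM_BANDWIDTH)          # 0.09216
```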

That's just display, writing to the screen is done by the processor. But we can cheat and have an offset register saying where the top starts, and have 480 lines wrap around from there, so scrolling up/down is free and the CPU just has to redraw the new line in the framebuffer.
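
A minimal sketch of the offset register trick in Python. The class and method names are hypothetical; the real thing would be a hardware register the display reads, not software.

```python
HEIGHT = 480
LINE_BYTES = 640 * 2

class ScrollingFramebuffer:
    def __init__(self):
        self.top = 0    # the "offset register": which stored line is displayed first

    def line_address(self, y):
        """Byte offset in the framebuffer of display row y (0 = top of screen)."""
        return ((self.top + y) % HEIGHT) * LINE_BYTES

    def scroll_up(self):
        """Scroll the contents up one line; returns the offset of the row to redraw."""
        self.top = (self.top + 1) % HEIGHT
        # The old top line is now the bottom row, so that is all the CPU redraws.
        return self.line_address(HEIGHT - 1)

fb = ScrollingFramebuffer()
redraw = fb.scroll_up()
print(fb.line_address(0), redraw)   # 1280 0
```

No pixel data moves: one register write plus one line of redraw replaces copying roughly 600k of framebuffer.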

We can avoid worrying about memory contention/prioritization between the processor and the display by having a one scanline buffer (640x2=1280 bytes), which is exactly 40 of the 32 byte cache lines the processor uses. Treat the scanline buffer as a ring buffer, and each time we're done with a 32 byte chunk (because it went out the display) we queue up a read for the next scanline's replacement data which has 39 more cache lines to get back to us before we'd glitch the display. That's plenty of time, no need for fancy prioritization...
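
A toy timing model of that ring buffer in Python, to show where the "39 more cache lines" of slack comes from. It assumes 32 byte chunks (1280/40), measures time in chunks sent to the display, and treats the refill latency as a single made-up number; none of this is the actual hardware.

```python
SLOTS = 40          # 1280 byte scanline / 32 byte chunks

def simulate(latency, scanlines=4):
    """Count glitches when each refill takes `latency` chunk-times to arrive.

    Slot i holds chunk i of the current scanline. Once it has gone out the
    display we request chunk i of the next scanline into the same slot.
    """
    ready_at = [0] * SLOTS      # time at which each slot's data is valid
    glitches = 0
    for t in range(SLOTS * scanlines):
        slot = t % SLOTS
        if ready_at[slot] > t:
            glitches += 1       # display needed data that hadn't arrived yet
        ready_at[slot] = t + 1 + latency   # refill requested once displayed
    return glitches

print(simulate(39))     # 0: the data lands just as the display comes back around
print(simulate(40))     # 120: every chunk after the first scanline is late
```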


September 30, 2019

I have removed screen from my todo list because the arguing on the list convinced me it's not within scope for toybox. A screen that MERELY lets you share multiple terminals across a single ssh session and to detach and re-attach them so they persist when it's disconnected is clearly utterly useless if it doesn't reimplement tcl/expect. Oh well.

I also replied to a github pull request about adding ln -t (which is not the same as -T) with:

What's your use case here?

I ask because you're the first person to want this flag. It's not in posix (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ln.html) or LSB (https://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/cmdbehav.html) and busybox is 20 years old and still doesn't have this flag, presumably because nobody's ever asked them for it.

I'm happy to add features people need, but filling them in just because they're not there is "infrastructure in search of a user". If a user for such a thing ever _does_ show up, there's no guarantee the feature meets their needs, and it's gone untested in the tree for long enough it may have bit-rotted anyway as the context of the project (and kernel, and libc, and compiler) shifted under it.

Since it's just as easy to add the thing later when there's a real user to provide immediate feedback, I try to ask about the use case so I can at least add a real-world-inspired test to the test suite.

And they replied with a real-world use case, so I applied the patch. (I push back about "do we need this" a lot, but real world use is a pretty strong argument.)


September 29, 2019

Wow expedia's really crap. My trip itinerary expired, so the link can't show me my flight numbers and times anymore, even though I don't fly _back_ until October 10th? That's deeply stupid. (Then again, it was founded by Microsoft in 1996. I don't use it to book travel, but Jeff does.)


September 26, 2019

Sigh, bikeshedding on the mailing list about screen being insufficient and how we NEED every weird thing tmux does including scripting(?) which just makes me want to remove screen from the todo list. The perfect being the enemy of the good, I think tmux is definitely out of scope for toybox and people seem PASSIONATE about this... Not touching that mess with a cattle prod.


September 24, 2019

Added ln -T to toybox. Apparently it's just -n with an error if the destination is a directory, not sure why that option exists? But somebody asked and scripts are using it...


September 19, 2019

Flying to Canada to hang out with Jeff until October 10th, in a distant suburb of Toronto. (Apparently renting an entire airbnb house for 3 weeks is cheaper than trying to get office space, let alone hotels on top of that. *shrug* It still makes more sense than "wework". Pretty sure this is some variant of gentrification driving up real estate prices though.)

I did not get the toybox release out before my flight.

The old saying, "A journey of a thousand miles has a stopover in atlanta" applies to this flight.

In Toronto. Jeff's text to me in Atlanta was that because I took a redeye flight he's looking forward to a "low impact evening". Upon arrival he couldn't pick me up because he just had a car accident. I asked him not to phrase things that way in future. (He's a CEO: Forward Looking Statements are a thing they have to watch out for, especially ironic ones.) So I'm at the airport waiting for Jen to pick me up.

There's a "poutinery" near the waiting area (indoor food truck?), and I was staring at their menu trying to find the poutine on it until they explained that everything is poutine. Including the cheesesteak. I retreated in confusion.


September 18, 2019

Stallman's out at the FSF, due to his me too issues which I've heard about from a half-dozen people over the years. (It's not that he'd hit on basically every woman I met in the Linux community who'd ever met him, it's that he didn't take no for an answer.) Linux Weekly News has coverage.

I should probably do a longer reaction but... I'm glad he's gone? I've made no secret I've considered the man a net negative to open source for 20 years now, and to have more or less gone crazy sometime since the 1980's. (My friend Stu Green, who founded the original Austin Linux User Group and worked for both of the big Boston Lisp companies in the 1980s, said he knew RMS back then and was on suicide watch with him over rejection by women. More or less 30 years ago.)

Sometimes old problems are reviewed in a new context, and can finally be addressed.


September 16, 2019

Feeling a bit better. My sleep schedule is still deeply weird.

Jeff booked me a flight to Canada on thursday, we can get together up there and triage the past couple years' todo items.

Elliott posted an updated android sitrep about how macos and linux are both using the same toybox commands now for the AOSP build (yay), so naturally I want to look up what those are. I _could_ dig through my old twitter feed to find two questions about aosp top level, but ever since twitter "improved" its javascript last month, using CTRL-F search on twitter.com/i/notifcations finds no hits. (They're in some sort of nested tables or something, it doesn't search the actual _tweets_.)

So I thought "I can just look at the copy of AOSP I have checked out"... yeah, about that: bootstrap.bash doesn't say what it's for or why you'd run it. (It does say it's obsolete and that you should do an alternate invocation with new tools instead, but does not say _why_ you would do this. Does this rebuild the prebuilts?) And Makefile is just an #include of build/make/core/main.mk (bootstrap is a symlink, this is an include...?)

Trying to figure out what any of this stuff _does_ by reading it is very, very non-obvious. And quite a slog.


September 15, 2019

I'm trying to finish up pending stuff for a toybox release, which is mostly (but not entirely) doing a "git diff" on my tree and seeing what dirty files I can finish off today, and the commit message I _almost_ just checked in is:

Allow --mktemp to have an optional argument, while -p has a mandatory arg.

If somebody supplies both -p and --tmpdir, -p wins. Trailing square bracket optarg syntax acts on short options, I didn't give --tmpdir a _new_ short option so I can't have -p and --tmpdir switch each other off, so I can't track what order they happen unless I want to grovel through argc in the command and I really, really don't. I've already got manual code collating the results _and_ duplicating "/tmp" and such because the logic ISN'T THE SAME because the original command is nuts.

BUT... if you specify -p "" then it acts like --tmpdir with no arguments. (The mandatory argument being an empty string triggers the fallback behavior.) So I can have --tmpdir with no argument (in the absence of -p) trigger the same fallback and... now half of make test_tmpdir is failing. I think I'll shelve this until the cold is over.


September 13, 2019

My talk is up! But not on Youtube, and the link the Linux Foundation put it at asks for $5.

However, after I complained on twitter they clarified that they're not _intentionally_ charging $5 to view it, but that A) they're only doing that accidentally, B) there's a watch it for free link if you look closer than I did.

They also confirm that deleting all the 2015 videos was another accident, but say it's not recoverable.


September 12, 2019

The J-core website is back up! (The DNS registration had expired.) I have ssh access to update it again! And a cold that makes thinking kinda loopy so I don't trust myself to actually commit anything.


September 10, 2019

I have a terrible head cold. Got it from fuzzy, who had it for a week. This probably means I'll have it through the weekend. Great.

Hard to think through. Plus the air conditioner died, which turned out to be the drain line clogging until the condensation trough thing filled up. (They installed a pump, which should prevent it from happening again. Yay having been paid recently. I am not particularly motivated by money, but a _lack_ of money is highly distracting. Daniel Pink had a lovely talk about that, and I put it on the kernel doc page I maintained back in 2007. It's a pity they took away my access to update it or I'd _still_ be doing the occasional kernel documentation poking for free as a hobbyist, but the Linux Foundation doesn't believe in the concept of hobbyists anymore.)


September 9, 2019

Editing and posting old blog entries is a great way to find lost todo items, which is why I sent a patch to Michael Kerrisk to document REG_STARTEND, and since it's man page syntax the patch looked like:

+.SS BSD regex matching
+.TP
+.B REG_STARTEND
+Use pmatch[0] on the input string, starting at byte pmatch[0].rm_so and ending
+before byte pmatch[0].rm_eo. This allows matching embedded NUL bytes
+and avoids a strlen() on large strings. It does not use nmatch on input,
+and does not change
+.B REG_NOTBOL
+or
+.B REG_NEWLINE
+processing.
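
Python's re module has no REG_STARTEND flag, so the following is only a loose analogy, not the regexec() interface the patch documents: compiled patterns take pos and endpos arguments that bound the search to a slice of a larger buffer, embedded NUL bytes included. (Anchors like ^ behave differently between the two, so don't read this as equivalent semantics.)

```python
import re

# A buffer with embedded NUL bytes, which a plain C string API would stop at.
data = b"header\0key=value\0trailer"
pattern = re.compile(rb"key=(\w+)")

# Search only bytes 7 through 15, like rm_so=7 and rm_eo=16 in pmatch[0].
match = pattern.search(data, 7, 16)
print(match.span(), match.group(1))
```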

September 8, 2019

I'm muttering out loud to myself about shell design, which probably means I should blather into the blog.

Finally cycled back around to debugging for loops, and was thinking I should do environment variable resolution next because "for i in 1 2 3; echo $i; done" isn't useful when it's literally echoing "$i", not the contents of i. But where I left off was halfway through flow control and I really should finish that, except my flow control todo list had:

  1. if / then / elif / else / fi - done

  2. while / until / do / done - similar logic to if above, done

  3. for / select / do / done - technically 1/3 done, because for has 2 forms and I can't do (( )) until I implement a math parser, and I haven't done select prompting yet because I've honestly never used it, although it seems simple?

  4. case / esac - can of worms, and again I haven't personally used it. I'm probably missing the "case)" parsing, that would be a type 2 flow control blockstack entry I think?

  5. { / } - sort of already done? It's a NOP flow control statement that allows group redirects. (I.E. you can pipe the whole block into a command.) In theory it's the same as "if true;then contents;fi", I don't think there's anything _for_ me to do here (except implement group redirects, which are still a TODO).

  6. [[ / ]] - this is the "test" command plumbing with different command line parsing. I think I need to call test_main() directly, bypassing lib/args.c (which is fine because the NEWTOY(test) line has a 0 optargs so it already bypasses it and works on the raw argv[]). The tricky bit is toysh needs to SELECT test in the config so test_main() is available. And I need to do the expansion (bash man page says "tilde expansion, parameter and variable expansion, arithmetic expansion, command substitution, process substitution, and quote removal are performed" but NOT word splitting or pathname expansion. That's the same NO_PATH|NO_SPLIT as local variable assignments before commands, common codepath).

  7. ( / ) - same as {/} except I run the function in a child process, which is fun to do with vfork(). (I have the start of context marshalling code. I'll probably break down and have two codepaths eventually, but I need to implement and test the nommu one FIRST or it'll never be as good as the other one.)

  8. (( / )) - this is the same for ((;;)) plumbing. Right now it parses as a single word and I hand it off to a function that does it. INSANELY "((i=42)) > walrus" creates the file walrus. I have no idea how this would apply to anything, but it needs the end-of-block redirects, which means it needs to be a self-terminating block (type 1 _and_ type 3). Probably make it a type "(" and special case it or something. Anyway, stays on the todo list until I do a math parser. (Last one of those I wrote was in Java in 1998. I was quite proud of it, but that code got buried in a sea trench when I left Quest Multimedia, which is when I started really caring about my stuff getting open sourced so _I_ could access it again in future. The other hiccup here is it's 64 bit integer math and I keep wanting to use floating point in $((blah)), and does that require two codepaths or...? I _was_ going to have this also handle bc, but somebody wrote one instead.)

  9. function(){ / } - function definition. The code I have now parses a function into a data structure (linked list of "struct pipeline" containing the pipeline segments and flow control statements at the right granularity), and then run_function() runs the data structure. What a function() statement does is it copies the pipeline segments in its { } block into its own list, attached to a name in a list of function names. And the hard part here is lifetime rules: can I just point to the _existing_ pipeline segment list (I know where it ends, the type 1 and type 3 pipeline segments nest like parentheses), but if the block I'm calling it _from_ goes away...? (Because they typed it on the command line and we're done with it now...) Sigh, I should always copy it, but that means (gratuitously) copying the strings it points to in each argv[] which just seems _sad_. (Lifetime rules! *shakes fist*)

Of course the "not wanting to copy strings" problem is why toysh was on the todo list for so long, because if you mmap() a shell script in THEORY you can have all the strings just be pointers into the mmap() memory with null terminators stuck in the right places... except it doesn't work because (echo) is "(" "echo" ")" and there's no _room_ for null terminators. Which is a relief because working out the lifetimes otherwise was a NIGHTMARE. (You pretty much have to reference count each string block, at which point what are you really saving? Yes, I could do pascal-style start/length pairs but that's not how C works, and C won for _reasons_.)
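
The "no room" problem in a few lines of Python, as a sketch. The tokenizer is a deliberately tiny stand-in (parentheses, whitespace, plain words) and both function names are made up; it is nowhere near real shell parsing.

```python
def tokenize(src):
    """Split a tiny shell fragment into (start, length) tokens."""
    tokens, i = [], 0
    while i < len(src):
        if src[i].isspace():
            i += 1
        elif src[i] in "()":
            tokens.append((i, 1))
            i += 1
        else:
            start = i
            while i < len(src) and not src[i].isspace() and src[i] not in "()":
                i += 1
            tokens.append((start, i - start))
    return tokens

def fits_in_place(src):
    """True if no two tokens touch, so each one but the last has a spare
    byte after it that could be overwritten with a NUL terminator."""
    tokens = tokenize(src)
    for (start, length), nxt in zip(tokens, tokens[1:]):
        if start + length == nxt[0]:    # next token starts immediately
            return False
    return True

print(tokenize("(echo)"))          # [(0, 1), (1, 4), (5, 1)]
print(fits_in_place("(echo)"))     # False
print(fits_in_place("echo hi"))    # True
```

Six source bytes hold three tokens that would need nine bytes once each is NUL terminated, so pointing into the mapped file can't work in general.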

I should not fall back in that hole. I should do it the obvious but inefficient way where defining functions inside loops is just gonna thrash memory a lot, and be done with it.

Ha! I just tried "abc() { echo hi; def() { abc;echo lo;};def;}; abc" and it ran until the tab closed because the shell had crashed. (Stack exhaustion I'm guessing.) Yup, add a bash -c in front of that and it runs until Segmentation fault. (Printing "hi" but never "lo", of course.) I wonder if I took the lo out would it recognize tail recursion and not segfault? Right, tangent...

Ok, PROBABLY what I should do is finish the "select" querying so that bullet point's as done as I can get it without the math parser, then try to get [[ ]] working and calling test_main and add < > comparisons to test (which you can quote)... no, I need to disable redirects within [[ ]] and the way I did that for (( )) was treat it as a type of quoting so it parsed as one big word, I kinda _don't_ want that here. Except <= and >= and ~= are tokens. Hmmm...

$ [[a<b]] && echo yes
bash: b]]: No such file or directory
$ [[a<b ]] && echo yes
bash: b: No such file or directory
$ [[ a<b ]] && echo yes
yes

Ok, I should probably do basic environment variable expansion first. That's gonna have more of a "this is immediately useful" impact. And I should hook up the && || | pipeline plumbing so it's not just parsed but actually used after that. Then probably do redirects. THEN cycle back around to [[ ]]...

Except if it has parsing impact I want to do it now while it's fresh in my memory. Which implies I should handle case/esac now too. Grrr...


September 5, 2019

Email from Denys Nykula asking for "head -n -123" (all but last NUM lines of file), "sed q123" (exit code extension), "patch -f" (which is how patch is already behaving so it just needs to ignore it), and "find -newerXY". Ok then.

Yay, Jeff got his deposit into the new company account! The bitcoin sold (took 2 days because the lawyer inserted herself into the process demanding a limit order at a specific price it wasn't getting up to), and a deposit is pending into the account. And of course held up until monday because it's a US Dollar deposit into a foreign account, of course...

But still, looks like I'm working for SEI's successor company (Core Semiconductor) through the end of the year. (For very little money, but it's enough to pay the mortgage.) The question is whether I'll be flying to japan for this or what...


September 4, 2019

Busy week, what did I did...

Fuzzy was visiting Fade up in minneapolis for a week, dropping off the dog and seeing the state fair and such. She had a good time, and is back now.

I let a homeless old lady sleep on the couch for 3 days while I was otherwise home alone, bought her a burner phone from wal-mart over the weekend, and spent most of monday helping her find weekly lodgings in what's more or less a youth hostel. (We took a bus to see it, because I still don't have a car and although Fade has a lyft account I never installed the lyft app on my phone. Multiple beds per room but she has her own bed, locker for possessions, and access to a bathroom and kitchen. It's $125/week, I gave her cash for the first week.)

And then when I sent her off so I could clean up before fuzzy got back from the airport, I'm told she didn't show up at the weekly lodging. Oh well, I tried.

Programming-wise I got zero done on the shell this week, because there were all the sidequests. Something like 5 more half-finished things in my tree not yet checked in, of course...


September 1, 2019

Sigh. The Android guys convinced me to neuter xprintf() and xflush() to the point where xprintf() is useless and xflush() doesn't do what the name implies. I mention this because there's an fflush(stdout) in netcat right after a printf and I'm going "surely that's just xprintf()" and then... no, it isn't. Because xprintf() doesn't work anymore.

Sigh. I suppose I can replace it with dprintf(1, ...) at least.


August 31, 2019

Jeff got enough money for his new company to pay me 1/4 of what I was making at JCI, at least through the end of the year (by which time we need more money, preferably by having gotten something to market). *shrug* It keeps the lights on, and if we succeed it's changing the world for the better. He sent a wire transfer for the first 2 months. Ok then, I'm back onboard.

Still working on toybox while he gets everything else spun back up, but I need to get to a good stopping point. (Jeff is paying me. Android is not paying me. I do what I can, but it's $DAYJOB time again.)


August 29, 2019

Here's an updated version of the June 7 digression (in response to the new android status update).

I'm trying to break circular dependencies at the root of system builds. The minimal native system that can rebuild itself under itself, where you don't just have circular dependencies but _ALL_ the dependencies form one big circle, is conceptually 4 packages: kernel, command line, compiler toolchain, and libc. Anything needed to build any of those, and package the result into a bootable system, has to be in toybox. (There's more "convenience" stuff like top and vi, but those are non-negotiable.)

Long term what I really want to do is get linux, toybox, a vastly upgraded tinycc, and bionic to build standalone. (This means I need to write a makefile or similar for bionic; I have yet to get it to build outside of AOSP as a standalone host thing.)

Then as part of "bootstrapping up to more complexity" I want to use the LLVM C backend to produce a C version of LLVM that tinycc can build but which produces optimized output, then rebuild it with itself so it runs fast. I realize that new cfront won't run in the tiny system, but in theory the source it produces is architecture target agnostic, so I should be able to convert it once each release and import it into the tiny system. (Modulo tinycc not being able to build a lot of the code I tried to throw at it, and the people who've been "maintaining" it since making zero advances I'm aware of. Plus it supports maybe 3 architectures... There's work to do.)

One of the reasons to do this is countering trusting trust, but also because if you can't reproduce something from first principles (in isolation from external uncontrolled variables) what you're doing isn't science, it's an art like alchemy. (Computer security is definitely still defense against the dark arts.)


August 28, 2019

Ugh, it's one of those days where everything I touch explodes into tangents. Elliott's feedback on the DIRTREE_STATLESS change breaking find. A pull request of the "something vague broke, work out a test case by reverse engineering my patch" form. (I love bug reports. Patches that don't clearly say how to reproduce the problem they purport to fix, less so.) And I tried to run a test under the new "make root run_root" infrastructure and init SEGFAULTED. Which I think is a glibc problem (or glibc/qemu mismatch) but... seriously, segfault? What the...

Now I just got a patch to patch to drop support for filenames with tabs in them because someone, somewhere, is still using mercurial.

I just wanted to poke at toysh some more while I still have the chance. (Jeff says funding has closed and now they're waiting for it to deposit! I told two different recruiters that I'm busy for the next 3 months. Fuzzy's scheduled to get back tuesday, I may be flying out to Tokyo then. Dunno yet.)


August 27, 2019

Fuzzy flew out with Adverb, up to Minnesota to see Fade's new apartment (which is across the hall from her old one, but now she only has one roommate instead of 4).

Adverb threw up on my bed at 6am (probably sympathetic nerves from Fuzzy, who was completely melting down over the trip).


August 26, 2019

Took a recovery day after flying, and now I'm banging on... the ls command, oddly enough. The top level android directory is sprayed down with selinux rules to prevent normal users from listing its contents (in a way that doesn't actually work, but ok). And the result is you can readdir() the contents but calling stat() on any of it fails, which drives ls bananas.

Ubuntu's ls doesn't handle this well either, but I finally managed to reproduce it with a chmod 444 directory.
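
The reproduction can be sketched in Python, along with one way a listing could cope: keep the name and show a placeholder when stat() fails. That fallback and the function name are assumptions for illustration, not what toybox ls ended up doing. (Run as root the stat succeeds anyway, since root ignores the missing search bit.)

```python
import os
import stat
import tempfile

def list_long(path):
    """Return (name, mode string) pairs, using '?' when stat() fails."""
    rows = []
    for name in sorted(os.listdir(path)):
        try:
            st = os.lstat(os.path.join(path, name))
            rows.append((name, stat.filemode(st.st_mode)))
        except OSError:
            # readdir() gave us the name but we are not allowed to stat it.
            rows.append((name, "?"))
    return rows

with tempfile.TemporaryDirectory() as top:
    locked = os.path.join(top, "locked")
    os.mkdir(locked)
    open(os.path.join(locked, "file"), "w").close()
    os.chmod(locked, 0o444)         # readable but not searchable
    try:
        rows = list_long(locked)
    finally:
        os.chmod(locked, 0o755)     # restore so cleanup can delete it
print(rows)
```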


August 24, 2019

Wrote a thing from the airport (which translates a bit weird from email to a blog entry when it comes to URLs, but eh):

From: Rob Landley
To: speakers@ohiolinuxfest.org
Subject: Can I submit a talk for _next_ year yet?

I'm in the airport to fly back from ELC and I really want to do a talk on "Why the Linux Foundation is bad at its job." I had fairly extensive hallway conversations with people about it yesterday, and am having a twitter thread about it now, and really this should be a talk.

It would include bits of this and this and this but mostly use the framework of this to explain why this happened.

Rob

P.S. This year ELC was colocated with FIVE other events. There were 19 panel tracks running simultaneously. My talk was 35 minutes long and scheduled against 18 other talks. Microsoft and Facebook were everywhere. I should probably chart the Linux Foundation's destruction of a rather nice conference I was keynote speaker at in 2006: they took it over in 2010 and I've attended several times since, commenting on its status each time, for example...

P.P.S. I have kind of a weird insider view of the Linux Foundation because I applied for a Documentation contract with OSDL right before they merged with FSG to become the Linux Foundation, and I wound up reporting to the Linux Foundation's CEO for a bit (having checkin phone calls with him while he changed planes in airports).

It was a surreal experience and I watched them figure out who they were and what they wanted to do in realtime, and I was kind of unhappy with the result but not in a position to do anything about it. Oh well.

OLF's CFP closed on the 17th, and when I saw that I sent 'em email. They're the only venue I can think of that's NOT run by the Linux Foundation (yet).

I bumped into one of the Linux Foundation guys in the Starbucks at ELC (he recognized me, I didn't recognize him) and he said they still use the "accretion disk" phrase I used to describe them once, as an in-joke around the office. This led to a long talk with the guy I'd been talking to (Joe from IBM, I should have asked him what his last name was) about the three waves stuff and how the linux foundation's effectively driven all the hobbyists out of Linux and why that's a bad thing. Kinda burned out my voice from talking too much.

Sigh, I should just record a podcast about this, but I'm terrible at doing things without an externally imposed deadline. I have too much to do that people are _already_ waiting for...


August 23, 2019

I gave my talk! The presenter after me was VERY nice and let me go 20 minutes over my time. (He didn't have anything scheduled after him, so he could run late, but it was still hugely nice of him.)

It went ok. Coulda been better, I'm closing tabs now and going "ah, I should have mentioned that" on all sorts of stuff, but... it was a decent talk.

I meant to upload my talk outline to my patreon but didn't get it ready soon enough, and then forgot to mention it at all during my talk. Oh well.


August 22, 2019

I saw Tim Bird yesterday (he passed in the hall, said hi and apologized he couldn't stop and talk, total interaction about 10 seconds), and I saw a second person I recognize in the line at starbucks this morning. So far, that's it from the "familiar faces".

Up bright and early poking at my talk outline. The problem with writing documentation is the same problem as teaching: when you explain it to somebody else you realize how you SHOULD have done it all along and stop and rip apart the code to fix stuff.

Some other things I go try to clean up and bump into a comment I left myself about WHY it is that way. (.singlemake isn't in generated because "make clean" would delete it otherwise, and "make clean test_sed" should work. Hmmm, maybe I can add a build dependency to recreate it? Hmmm...)

Ah, avoidance productivity...


August 21, 2019

Left for the airport at 6:30 am, flew to San Diego on Frontier, which is now out of the "south building" of Austin's airport, which is _adorably_ tiny. It's 3 gates. You walk out to the plane on the runway and climb up stairs, like we used to do on Kwajalein.

Got to San Diego, the temperature was nice and google maps said the hotel was only about two and a half miles from the airport, so I walked along the riverfront. There was NO SHADE (solid concrete, no trees) and I got sunburned.

Arrived at the hotel and... I don't recognize anyone here? The hobbyists have abandoned ELC. I am sad. (My badge says "hobbyist" as my affiliation, and it's the only one. Every single other badge is a corporate employer.)

Not that I can really blame them. The Linux Foundation has "colocated" ELC with at least one other event for something like 5 years now, and this time it's colocated with something like five other events. (SO many emails trying to get me to buy tickets to more than one event, for more $$$ of course.) There are 19 talk rooms, and they're giving us 35 minute talk slots.

So my talk tomorrow is against 18 other talks, and a half hour long. I flew to San Diego for this? Oh well...


August 20, 2019

Working on my conference talk. I have about 2 hours of material and 35 minutes of scheduled talk time (so realistically 30 minutes).

Honestly, this slot is for a lightning talk. The Linux Foundation has "colocated" ELC with so many other events this year that I am literally scheduled against EIGHTEEN other talks in the same time slot. That's in ADDITION to cutting the time in half. This is probably the last year I speak at ELC, because seriously, this is insane.


August 19, 2019

Trying to clean up for a toybox release.

Oh hey, you can spam dmesg as a regular user:

[367266.808983] EXT4-fs warning (device sda1): ext4_ioctl:483: Setting inode version is not supported with metadata_csum enabled.

That's probably not a good idea, but it's a kernel issue.


August 18, 2019

An email reply I wrote that might be of more general interest:

On 8/17/19 11:36 PM, XXXXX wrote:
> Hey, someone linked me to:
>
> http://lists.busybox.net/pipermail/busybox/2010-December/074114.html
>
> which has a historical origin story for the /usr split.

That was written off the top of my head one day without checking references. It goes mildly viral every couple years, and one of the times it did a magazine asked if they could publish it and I sent them an updated version with a couple corrections and more sources.

> However, this is not the history I learned for it back in the 80s. I
> went looking for what I remembered, and found:
>
> https://www.bell-labs.com/usr/dmr/www/notes.html
>
> which matches what I recall.

Yes, a thing I wrote extemporaneously to a mailing list 9 years ago got a detail wrong. Well spotted. Did the corrected version from 7 years ago fix it?

> Specifically:
>
>> In particular, in our own version of the system, there is a
>> directory "/usr" which contains all user's directories, and which is
>> stored on a relatively large, but slow moving head disk, while the
>> othe files are on the fast but small fixed-head disk.

Originally /usr stored the user's directories, then later the OS leaked into /usr (duplicating /bin and /lib and such), and when they got a third disk they mounted it on /home and moved the user directories there so the OS could eat the whole of the second disk (still on /usr).

Is this not what I said?

> I haven't been able to find any references to the "two identical disks"
> explanation earlier than your post, and I'm wondering if you happen to
> remember where you ran into it.

See the corrected version above. Their initial 2 disks were a small fast one (0.5 mb) mounted on / and a big slow one (2.5 mb RK05 external disk, sort of like a USB hard drive) mounted on /usr. I had the total size right, but off the top of my head didn't remember how they were divided. They later got a second 2.5 meg RK05 disk pack (and thus _had_ two identical disks, but that wasn't the initial setup), mounted the new one on /home, and moved the user directories there.

As for more references, replace "notes" in your URL above with "hist" and there's another page of reminiscence from Ritchie. There's also an excellent book called "A Quarter Century of Unix" by Peter Salus, written by the head of Usenix for Unix's 25th anniversary in 1994. It might be out of print, but there's a PDF online. (He wrote a sequel called the Daemon the Gnu and the Penguin which I have a copy of somewhere but never finished reading: kindle on my phone is always with me and physical books mostly aren't. It started out serialized on groklaw, I should just pull up that copy...) I probably also have relevant stuff in my history mirror but haven't looked at it in a while.

> I note also, when I worked on one of the Berkeleys, I don't think
> we had to worry about /lib matching /usr/bin, because shared libraries
> didn't go in /lib;

Given that dynamic linking in unix wasn't a thing for its first decade, I'd be surprised if you did. This says Sun's first shared library test case was xlib, which can't have been before 1987.

Sun hired away the first BSD maintainer in 1982 and he stayed in charge of their OS dev until AT&T paid them to rebase from BSD to System V (hence sunos->solaris; see the long strange trip to java and also the book "Under the Radar" by Robert Young, and one of the things in that mirror page above is Larry McVoy's sourceware operating system proposal), so if BSD was doing it Sun would already have been doing it. Here's a paper on a shared library prototype from 1991, by which point Unix was over 20 years old. (Here's some good stuff on early Berkeley history circa 1979. And this was nice too, written by the guy who took over from Bill Joy.)

> root was statically linked so it could be used
> for recovery purposes, and shared libraries went in /usr/lib.

Everything was statically linked on the Bell Labs version of Unix. The System V guys were a different division within AT&T. Bell Labs continued putting out releases through version 10 but version 7 was the last publicly released one (because AT&T wanted to commercialize Unix, and the Apple vs Franklin decision in 1983 allowed them to do so, so they suppressed the internal competitor), and then the Bell Labs guys started over with Plan 9, and Dennis Ritchie let himself be promoted to management...

> As you note, a lot of this no longer makes any *sense*, but I'm
> confused by the divergent accounts of what things were like back in the
> 70s and 80s.

There are plenty of primary references if you dig.

Bell Labs withdrew from the Multics project in 1969 (which was doomed: it had 36,000 pages of specification document written before the hardware arrived, and had O(N^2) algorithms all over the place so it ran ok with one user logged in, could sort of stumble along with 2, and froze as soon as the third simultaneous user logged on). The bell labs guys knew it was doomed and wound up spending all their time playing an astronomical flight simulator called "sky" (a "test load" for the system; basically an early text version of lunar lander or kerbal space program). Management blamed them for killing multics not because it was doomed but because clearly they'd played sky too much, so decided to punish them.

The multics contract was still funded through the end of the fiscal year and management refused to give them a new contract until that funding ended, AND transferred their offices to a storage attic (as punishment). In the move the ex-multics team lost access to the GE 645 mainframe they'd been running code on (they no longer had a terminal connected to it, but could still send a paper tape via inter-office mail and get a printout or results tape mailed back), so they scrounged up a PDP-7 out of the storage closets (which had been set up to do graphics research with a vector display attached, but then the guy who'd done that had transferred to the Holmdel site and left it behind so it was theirs to play with, and was the ONLY computer they had exclusive 24/7 access to). Ken Thompson wrote his own assembler that could run on it, and used that to port "sky" to the PDP-7 now with _graphics_, but he and Dennis Ritchie also had filesystem ideas left over from multics they hadn't been able to do, so they wrote up a ramdisk based version on their PDP-7, which evolved into an OS they called "unix" which was either one of whatever multics was many of, or a castrated version of multics, depending on who was asking. All this was largely out of boredom waiting out their punishment. (There was a lovely presentation about this at Atlanta Linux Showcase in 1999 from one of the guys who was there. Apparently the roof of the attic they were exiled to leaked.)

When the Multics contract finally expired and they could bid on new contracts again, which came with funding for new hardware, they proposed buying a PDP-11 to create a typesetting system for the Patent and Licensing department (because those guys had money). The proposal was in 1970 and they got most of it working in 1971. (The timeline is in Quarter century of Unix.)

On the original PDP-11 system the Bell Labs guys ordered to service the patent and licensing contract for a typesetting system, their internal disk was 0.5 megs. Their RK05 disk pack was 2.5 megs but much slower, and was mounted on /usr. The OS leaked into /usr when they ran out of space on / hence /usr/bin. They got a second RK05 later, mounted it on /home, and moved the user account home directories to the new disk and let the OS consume the rest of the old one.

All this happened before the 1973 ACM paper (and 1974 presentation) that announced Unix to the world and got people requesting source licenses (which AT&T had to grant because of the 1957 antitrust consent decree, something the patent and licensing department knew very well), and it was before Ken Thompson took a year sabbatical from Bell Labs in 1975 to teach at his alma mater (the University of California at Berkeley; I _think_ it was fall 75 and spring 76 but would have to check the Quarter Century of Unix book to confirm), which resulted in the Berkeley Software Distribution which was awarded a Darpa contract in 1979 to connect VAX systems running Unix to the internet to replace the original (then ten year old) Honeywell IMP routers (see the book "Where Wizards Stay Up Late" by Katie Hafner for the BB&N stuff, and the Andrew Lenard link above for the 1979 replacement), and thus every internet connection DARPA sponsored for the next few years came with a VAX running BSD unix, which is why the Stanford University Network administrators decided to commercialize their m68k unix workstations under the SUN brand in 1982.

(Note: I haven't really checked references for this either, but I linked you to a few above if you'd like to.)


August 17, 2019

Grumble grumble bugs marshalling state between contexts. Writing the new shell I want to do as much as I can with local variables in each C function, but the function either has to call a "give me more data" function (which makes it deeply awkward to feed multiple different types of input into it), or it has to return to its caller with a return code requesting more input, and save parsing state into a structure that's maintained between calls and gets fed back in for the continue.

I've chosen the latter approach, and I use a structure passed in as an argument (well, a pointer to it anyway) rather than globals because I'm not sure we won't be calling this from multiple overlapping contexts. (Functions, trap handlers, here documents, files sourced with the "." command, (subshells) and backgrounding& and $(commands) run in various <(contexts) that are parsed as late as possible...)
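To make the shape of that concrete, here's a minimal sketch of the "return and ask for more input" approach, assuming a drastically simplified parser that only tracks quoting and counts words. The structure and function names are invented for illustration and bear no resemblance to the real toysh data structures. Everything that has to survive between calls lives in the structure the caller passes in, never in locals or globals.

```c
#include <stddef.h>

// Hypothetical state carried between calls. Nothing here is a global, so
// several independent parses can be in flight at once.
struct parse_state {
  char quote;   // 0, or the quote character we are still inside
  int inword;   // nonzero while in the middle of a word
  int words;    // completed words seen so far
};

// Feed one line of input. Returns 1 when more input is needed (the line
// ended inside a quote), 0 when the line ended cleanly.
// Simplification: no backslash escapes, no $() or HERE documents.
int parse_line(struct parse_state *ps, const char *line)
{
  for (; *line; line++) {
    char c = *line;

    if (ps->quote) {
      if (c == ps->quote) ps->quote = 0;
    } else if (c == '"' || c == '\'') {
      ps->quote = c;
      ps->inword = 1;
    } else if (c == ' ' || c == '\t' || c == '\n') {
      ps->words += ps->inword;
      ps->inword = 0;
    } else ps->inword = 1;
  }

  // Unfinished quote: keep the state as is and ask the caller for more.
  if (ps->quote) return 1;

  // Otherwise end of line also ends the current word.
  ps->words += ps->inword;
  ps->inword = 0;

  return 0;
}
```

The caller zeroes a struct parse_state, then keeps calling parse_line() with whatever input source it has (file, -c string, interactive prompt) for as long as it returns 1.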

Parsing was working nicely until I tried "if true then date fi" with a newline between each word and that _should_ work, but the "if" set done=1 thus signalling that the next iteration of the loop should break the pipeline segment, then it checks it for HERE documents and sets pl = 0 which will allocate a new pipeline segment when we get another word of input (without which we don't _need_ another segment; we're done and I try to avoid empty segments so flow control doesn't have to deal with them)... but in this case parse_word() returns "there is no next word on this line" so we return to request more input, and when we get called again with the next word on a new line, the local variable state from last time is gone so the setup no longer knows we need a new segment. (It would have done it next iteration of the loop, but we returned rather than looping.)

Except the only time we're CONTINUING an existing segment is when we had unfinished quoting? Otherwise newline means the same as semicolon: end the current pipeline segment. (A pipeline segment is a command or a flow control statement like "if", "do", or "(". There's probably a better name for this but as with the OSI model the layers they've named and what we're actually _doing_ when you implement it don't match up.)

So yeah, I think the fix is that when we're re-parsing the initial context (coming back in with a line continuation), set pl = 0 unless we were continuing a quote or dealing with a HERE document.


August 16, 2019

Bash has too many error messages for minor variants of the same thing.

$ break 1 37
bash: break: only meaningful in a `for', `while', or `until' loop
$ for i in 1; do break walrus; done
bash: break: walrus: numeric argument required
$ for i in 1; do break 37 walrus; done
bash: break: too many arguments
$ for i in 1; do break 0; done
bash: break: 0: loop count out of range

Also, why is & supported after break and continue?

$ while true; do break & done
[1911]   Done                    break
[1912]   Done                    break
[1913]   Done                    break

That's an endless loop, and in theory with continue it's a tiny forkbomb. Is this what posix says to do? What? I can see why "test && break || walrus" is supported...

Sigh. To make _any_ of that work, I need to pass the block stack as an argument to run_command(). (Or I need to stick it in GLOBALS but I'm trying not to do that because function calls and a single global context seem like they would combine poorly even before (subshell) and background& interacts with it.)

Hmmm, should I try to be _interoperable_ with bash? Specifically:

$ potato() { ls -l; }
$ export -f potato
$ env | grep potato
BASH_FUNC_potato%%=() {  ls --color=auto -l

That's not the export format I was using, but it _could_ be. (They did alias expansion in export? Why...)


August 15, 2019

Since toysh supports nommu and toybox doesn't do unnecessary multiple paths, I'm trying to make backgrounding and subshells work with vfork() instead of fork (if I implement them with fork() first they'll never get sufficient testing), and it's a SERIOUS PAIN.

The thing about vfork() is it doesn't duplicate the forked process, instead it suspends the parent process until the child exits or calls exec, and during that time the child is basically a thread not a process, and all its memory is shared with the parent. In fact, traditionally the child didn't even get a new _stack_ so changes to local variables and such show up in the parent too. Basically from the parent's point of view vfork() works like setjmp(): the actual fork() is deferred until the exec() happens and until then it's the parent doing everything, then at exec() time the parent longjmp()s back to the vfork() point and resumes what it was doing.

Except the child has its own PID and its own file handle table (so stuff you open/close after vfork() persists in the child after the exec(), and the parent's filehandles are unchanged when it resumes). So it's not a perfect analogy, but pretty close. And one thing it DOES help you keep straight is DON'T RETURN FROM THE FUNCTION THAT CALLED VFORK() because when the longjmp() happens your stack will be trashed and your next return will segfault.

ANYWAY, point of all this is vfork() has to be followed by an exec() or an _exit() and the parent's suspended until then. You only get ANOTHER process running at the same time once you exec. So implementing & or ( ) with vfork means toysh has to exec itself.
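For reference, the well-behaved vfork() pattern looks roughly like this. It's a minimal sketch, assuming all the child needs to do is exec a new program; run_child() is a made-up name, not a toybox function. The child branch touches nothing except exec and _exit(), and the function doesn't return until the parent has resumed.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run argv[0] via vfork()+exec and wait for it.
// Returns the child's exit status, or -1 on failure or death by signal.
int run_child(char *const argv[])
{
  int status;
  pid_t pid = vfork();

  if (pid < 0) return -1;
  if (!pid) {
    // Child: we are borrowing the parent's memory (and maybe its stack),
    // so modify nothing and never return from here. Exec or _exit only.
    execvp(argv[0], argv);
    _exit(127);
  }

  // Parent: only resumes once the child has exec'd or exited.
  if (waitpid(pid, &status, 0) != pid) return -1;

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with something like {"/bin/sh", "-c", "exit 3", 0} hands back the child's exit code. The 127 for a failed exec follows the usual shell convention for "command not found".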

Now being ABLE to exec yourself is another fun little detail, because /proc/self/exe is only there when /proc is mounted (chroot and initramfs don't start that way), and otherwise you can sometimes sort of figure it out from argv[0] and $PATH (assuming you do so before either is modified and that they were set correctly: execve has "file to exec" and "argument list including argv[0]" as two separate arguments and it's only by convention that they overlap), and there's no guarantee that your binary is still available in the current filesystem anyway (switch_root deletes all the initramfs files but it still exists as long as it's open, pivot_root and umount -l are similar)... A process keeps the file open but it's not in the filehandle _table_: I argued with the kernel guys that execve(NULL, argv, envp) should re-exec the current process, but they kept boggling at why I'd want to do that and it never went anywhere.
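A best-effort version of "figure out what to exec" might look like the sketch below. It assumes Linux's /proc/self/exe when /proc is mounted and otherwise falls back to guessing from argv[0] and $PATH. find_self() is a hypothetical helper, and it inherits every caveat above: the fallback is only as trustworthy as argv[0] and $PATH are, and a binary that has since been deleted can't be found this way.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: guess the path of the running binary.
// Returns 0 and fills buf on success, -1 if nothing plausible was found.
int find_self(const char *argv0, char *buf, size_t len)
{
  ssize_t n;
  const char *dir, *end;

  if (len < 2) return -1;

  // Easy case: /proc is mounted and the symlink is readable.
  n = readlink("/proc/self/exe", buf, len-1);
  if (n > 0) {
    buf[n] = 0;
    return 0;
  }
  if (!argv0 || !*argv0) return -1;

  // An argv[0] containing a slash is taken as a path, by convention only.
  if (strchr(argv0, '/')) {
    if (snprintf(buf, len, "%s", argv0) >= (int)len) return -1;
    return access(buf, X_OK) ? -1 : 0;
  }

  // Otherwise walk $PATH roughly the way execvp() would.
  for (dir = getenv("PATH"); dir && *dir; dir = end + (*end == ':')) {
    end = dir + strcspn(dir, ":");
    if (end == dir) continue;   // this sketch skips empty entries
    if (snprintf(buf, len, "%.*s/%s", (int)(end-dir), dir, argv0) < (int)len
        && !access(buf, X_OK)) return 0;
  }

  return -1;
}
```

This has to be called early, before anything modifies argv[0] or $PATH, and the result saved for later.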

Hmmm... you know, the container suspend/resume plumbing has to have solved that one, and they merged a lot of patches to allow exporting data that wasn't otherwise available to recreate a process's full state. I should look up how they do it... but they probably use and require /proc to be available. :P

Anyway, back up to the topic: the child needs to know what the local variables and functions defined in the parent are, plus variable annotations like "integer" and "read only". AND it may need to be told what the child process is running.

Except the bash "shellshock" nonsense was "set an environment variable and bash runs it on launch". That's what I _need_ here, but I don't want anything _except_ this to be doing it. Hmmm... I can do stuff like use the empty variable name that other stuff usually can't set, but you can set anything with env so that's out. I can include the PID and the PPID in the name, since those are known by the child process before vfork() and are trivial to query with getpid() and getppid() even in a no /proc environment, but are those two numbers (which usually have a range of 2-32768 unless you've written to /proc/sys/kernel/pid_max) enough security to prevent brute force attacks ala "this web page launches a shell script with an environment variable I control, imma hammer on it til I get it to run arbitrary code"?

Hmmm... ok, the child inherits filehandles from the parent, so if I open a specific filehandle number (derived from both pid and ppid) with specific flags, and then do the flag check fcntl() on it and get the right flags back (which also means it's open), that's a second check that's cheapish to do and hardish to set from exploit contexts.

Hang on, I shouldn't do that. I should just set -c "thingtorun" on the child's command line. Yeah it's externally visible in ps but that's way less silly than allowing anything in environment variable space to execute. Modulo functions still need to get marshalled between contexts, but they don't auto-run. For what that's worth:

$ function echo () { blat; }
$ echo
bash: blat: command not found

Functions even override bash builtins...


August 14, 2019

Ok: changes to parse_word(): $( (anywhere) and (( (at the start of a word) start a quoting context ended by ), within which we count parentheses and the quoting context only ENDS when the parentheses count drops to zero. If (( has an internal parentheses count of zero but gets a single ) it retroactively becomes two individual parentheses.

Dear bash: if I end a line with a #comment but there's a continuation, and I then cursor up in the history? Don't glue the later lines after the #comment so they get eaten.

$ burble() #comment
> { hello;};burble
hello
$ burble() #comment { hello;};burble

The two are not equivalent. (You can do multiline history! Really! Busybox ash does!)


August 13, 2019

One more week until my plane flight to San Diego. The shell is not ready to demonstrate yet. Still finding and adding parsing corner cases:

$ ((1<3))
sh: 3: No such file or directory

That's wrong, bash doesn't do that. Same for $((1<3)), it's a variant of quoting. We go into a different context (arithmetic context) within which < and & and ; aren't redirects or pipe segment terminators.

Revisiting quoting (again), bash treats [ and $[ differently for line continuation purposes:

$ [ one two
bash: [: missing `]'
$ $[ one two
>

Since [ is an alias for "test", it's not even treated as a flow control statement, while $[ ] is an environment variable expansion handled by the quote logic in the tokenizer. (Quotes must have matching end quotes or we need more input.)

I'm trying to work out how to treat $(( vs (( which both behave the same way: at the start of a line by themselves, if you then hit enter, it prompts for a line continuation. I _think_ one is being treated as a flow control statement and the other as a quote variant: if I go "true ((" it complains about an unexpected (. BUT both of those disable redirection within them. The variable/quote would do so automatically: it would parse as a single word starting with $(( and thus not be recognized as a redirection by the thing looking for redirections. (Environment variable expansion happens _after_ redirections are identified.) I have to handle (( before doing redirections, which means the "run a command out of this pipeline segment" logic needs to recognize (( before doing the normal command line processing. We still need to CALL the pipeline processing logic because:

$ ((3<2)) || echo hello
hello

But it looks like bash is doing a similar workaround, because:

$ X=3 ((3<2)) || echo hello
bash: syntax error near unexpected token `('

The local variable assignments don't get called. Even though just like with $((x+2)) the math logic will expand x into a variable (recursively!) until it gets a number or blank space... hang on, what happens if:

$ X=X; echo $((X+2))
bash: X: expression recursion level exceeded (error token is "X")

Ha! Ok, added to the test pile. Where was I. Quoting logic. Hmmm...

$ cat <(echo "$(((1<2)) && echo "hello")")
hello

The $(( there is actually $( (( which we can tell because $(( (1<2)) " puts us in negative parenthetical territory without it being a )) together. So the token parsing here also needs similar break-and-downgrade logic. Can I make them the same?

However it goes, I have to redo the parse_word() logic. And looking at it, I'm not sure it ever handled echo "$("${'THINGY'}")" properly...

$ echo "$("${'THINGY'}")"
bash: ${'THINGY'}: bad substitution

Of course neither does bash. But the parsing doesn't syntax error on it, and if you remove the single quotes:

$ THINGY=whoami; echo "$("${THINGY}")"
landley

Hmmm... For $(( I think treating it as $( followed by a ( is right from a parsing perspective, the variable expansion logic has to re-parse the block again anyway and bash already tracks parentheses nesting depth specially here:

$ echo $({hello)
bash: {hello: command not found
$ echo $((hello)
>


August 12, 2019

Dear bash: why does

for((
;
;
)); do echo hello; done

produce no output, but:

for(( ; ; )); do echo hello; done

does?


August 11, 2019

Ok, (( is a token when there is a matching )) rather than a matching ). Keeping in mind you can have ( ) inside (( )), but the balance can't go negative or the (( degrades to two ( (. I _think_ that's the rule, anyway. And of course in ((2>(1+3))) those last three characters don't parse as )) ) they parse as ) )). So the _tokenizer_ has to be tracking the parentheses count.
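Written down as code, that rule comes out something like the sketch below. It assumes the only thing being tracked is parentheses (no quotes, no backslashes, no $( nesting), so it's a statement of the rule as I currently understand it rather than a real tokenizer. arith_token_len() is a name invented for this example.

```c
#include <stddef.h>

// Hypothetical helper. s must point at "((". Returns the length of the
// whole (( ... )) token, 0 if the balance would go negative (so this is
// really two separate ( ( tokens), or -1 if the input ran out first and
// another line is needed.
int arith_token_len(const char *s)
{
  int depth = 0;
  size_t i;

  for (i = 2; s[i]; i++) {
    if (s[i] == '(') depth++;
    else if (s[i] == ')') {
      // Inner parentheses just nest.
      if (depth) depth--;
      // At depth zero only a )) pair can end the token...
      else if (s[i+1] == ')') return (int)i+2;
      // ...and a lone ) means it was never (( in the first place.
      else return 0;
    }
  }

  return -1;
}
```

With this, "((2>(1+3)))" is consumed whole, with the final three characters read as ) followed by )). "((echo a) | sed s/a/b/)" degrades to nested subshells, and "((1+" asks for a continuation line.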

Posix says that shell environment variables MUST allow names with letters, digits (not starting with a digit), and underscore, and it _can_ accept more names. And that environment variable names are anything except =.

Bash is a lot pickier, it won't allow any punctuation except underscore, and rejects unicode characters:

$ せ=42
bash: せ=42: command not found

I'm pretty sure I want to accept unicode characters in variable names. Hmmm.
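One cheap way to get there is sketched below. It takes the posix minimum (letters, digits not in first position, underscore) and adds the assumption that any byte with the high bit set is part of a UTF-8 sequence and is allowed in a name. That's a policy guess for illustration, not something posix or bash specify. It doesn't validate the UTF-8, and it lets unicode punctuation through. varname_len() is a made-up name.

```c
#include <stddef.h>

// Hypothetical helper: length in bytes of the variable name at the start
// of s, or 0 if s does not start with a valid name.
size_t varname_len(const char *s)
{
  size_t i;

  for (i = 0; s[i]; i++) {
    unsigned char c = s[i];

    // The posix minimum: letters and underscore anywhere...
    if (c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) continue;
    // ...digits anywhere except the first position.
    if (i && c >= '0' && c <= '9') continue;
    // Assumption: every high-bit byte belongs to a UTF-8 character and
    // is accepted as part of the name.
    if (c >= 0x80) continue;
    break;
  }

  return i;
}
```

An assignment is then a nonzero varname_len() followed immediately by '='. For "abc=def" the name is 3 bytes long, and the hiragana example above is also 3 bytes, since that character takes three bytes in UTF-8.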

Bash accepts "function abc=def () { echo hello; }" but abc=def is a variable assignment, so I'm not sure how you'd call it... Ah:

$ "abc=def"
hello

Strangely, it does NOT allow function "abc=def", says it's an illegal function name. (Function names are not dequoted?)


August 10, 2019

All this shell stuff is forcing me to learn corners of shell programming I didn't know. (Some I bumped into briefly years ago and totally forgot.) For example, in bash "for((i=1;i<5;i++));do echo $i;done" prints 1 through 4. No, you can't have spaces between the parentheses, (( and )) are tokens, which call for arithmetic evaluation of their contents. Busybox ash does not implement this syntax, it throws a syntax error parsing it.

Meanwhile in bash, "for true;do echo hello;done" parses, but prints nothing. But "for echo hello;do echo hello;done" is a syntax error, unexpected token 'hello'.

Speaking of which, in bash:

$ for ((i=3
> i<5
> i++))
> do echo $i; done
bash: syntax error: arithmetic expression required
bash: syntax error: `((i=3
i<5
i++))'

This is the third thing (other than # and HERE documents) that cares about newline vs ; and it's annoying. Grrr.

Meanwhile, trying all the corner cases:

$ X=ii;for $X in a b c; do echo $ii; done
bash: `$X': not a valid identifier

In theory I should add negative stuff like that to the test list, in practice it's a pain to test negatives. Hmmm. I should probably do it anyway. I don't want to match exact error messages, but I can see that no stdout output was produced and it gave an error return code...

Darn it, I can't have (( be a regular token because:

$ ((echo a) | sed s/a/b/)
b

So what IS the rule for when (( is special and when it isn't? Hmmm...

$ ((1+
> 1<2))
$ echo $?
1


August 9, 2019

Finally caught up and applied the toybox patches Elliott sent during The Dark Time. (Except for the 2 I want to do a different way.)

Poking at my ELC talk, I have way way _way_ too much material I'd want to cover for the timeslot. When I first spoke at ELC in 2006, I had an hour. Last time I was there, a slot was 45 minutes. Now it's 35 minutes, scheduled against _18_ other talks, and ELC is "colocated" with TWENTY other events! (At least the Linux Foundation spam asking me to pay to register so I could attend some of those colocated events said so, and they were proud they'd just added more!)

Guys: a normal convention already has more stuff in any timeslot than I can go to. Fade goes to 4th street, a writer's conference that has ONE panel track that everybody's at. At Penguicon and Linucon we had 5 tracks going in parallel, and that was _already_ more than anybody could go to. (We were a combination Linux expo and science fiction convention: 2 SF tracks, 2 Linux tracks, and a "crossover" track.) It gave people choice, but there were also inevitable conflicts, which is why we posted panel recordings for the ones you missed. (And at Linucon, we had a half hour between panels for recovery time and mingling, and scheduled food events in the con suite.)

This pathological "colocating" the linux foundation is doing is not an advantage, it's a schedule conflict. It means they're BAD at this. But they don't think so because their goal is to get money. These are for-profit conventions done as fundraising events for a trade association which is the same kind of legal entity as the Tobacco Institute and Microsoft's old "Don't Copy That Floppy" sock puppet. Microsoft has now JOINED the Linux Foundation.

Sigh. Anyway, back to my talk. Writing up all the different things I could talk about, and then hopefully there's a clear subset that I can cover coherently in the allotted time.


August 8, 2019

Laptop back together with new hardware! Woo! Much reinstall.

While buying an ssd, since I was at best buy anyway, I got a USB bluetooth adapter so I can use my wireless headphones with my laptop as well as my phone. Plugged it in and... nothing. Ok, google a bit, devuan didn't install bluetooth support by default so there are some packages to install, and now I've got a gui thing that's... really clunky. But ok, switch on headphones, click scan, it sees it, and it paired but couldn't associate? What? (These are separate steps?) It wants me to choose between "handsfree" and "audio sink" which should NOT be my problem. (At a guess one's with microphone and the other isn't, but that's JUST a guess and terribly phrased if so...)

Connection Failed: blueman.bluez.errors.DBusFailedError: Protocol not available.

Googled, found a page which didn't fix it. Maybe I need to reboot, but instead I yanked the usb thing out and threw it back in the box and dug out my wired headphones. Linux on the desktop! (My android phone Just Worked with these headphones. Of course android ripped out the "standard" Linux bluetooth daemon and wrote a new bluetooth daemon from scratch a few years back, which is presumably _why_ it just works.)

Doing more or less the April 16 setup, except I copied the home directory verbatim (new enough install I wasn't worried about version skew in the file formats of the .hidden rc directories) so half the setup is already done (because the xfce config was retained, so it already knows it has 8 desktops and so on). But I still have to reinstall a bunch of packages and uninstall xscreensaver and undo the "vimrc.tiny" sabotage...

Why on earth does the user account debian/devuan creates NOT automatically belong to sudoers? (Linux on the desktop!) And adding the group to the user account requires a reboot for it to take effect (because all the loaded command line tabs haven't got it, and new xterms fork from a logged in user process that hasn't got it), but adding the USER to the sudoers file (which you're not supposed to do) takes effect immediately... *shrug* Big hammer, move on. (Except while I'm at it, yank mail_badpass out of there because it's not 1992 anymore. Honestly.)


August 7, 2019

Remember that dodgy laptop hard drive? Yeah, it died. Not exactly a surprise, but a disappointment. I had backups, didn't lose anything worth mentioning. Still a productivity hit from needing to reinstall and set the distro up again.

Went to Best Buy (because they were open 2 hours later than Fry's and not on the other side of town), held a replacement drive for like ten minutes while staring at the SSDs, and broke down and bought an SSD instead. (3 year manufacturer's warranty? Eh, ok, give it a try. I survived the last drive failing, so my discomfort with "this storage thing will wear out from use" isn't gonna be _worse_ than the previous disk giving out after a month or two. My big worry is I tend to drive my machines into swap thrashing and on an SSD I may not _notice_ and burn out the disk fast, but... 3 year warranty.)

Back up early, back up often.


August 6, 2019

Jeff sent me a link to the github project that was extending llvm to support sh2 (until I think the guy doing it graduated and got hired by Apple).

Kinda up to my neck in todo items right now, but I added it to the list.


August 5, 2019

Going back and editing old blog entries so I can post them, and I'm at the end of February where I was leaving JCI due to burnout.

The thing is, I was out of actual productivity, but I could still debug stuff. I could probably debug stuff while on fire. That's not creative, merely clever. There IS an answer, bisect your way to it with an axe, figure out how to stick a flashlight into the right cracks to illuminate the problem, making new cracks with said axe as necessary. That's powered by anger. Frustration only makes my debugging STRONGER.

But JCI's new environment was Yocto: all their code was freshly written or supported by Wind River, so they had relevant people to debug it and didn't really need to call me in to reverse engineer 20 year old code nobody available was already familiar with. And in _order_ to debug it I'd have had to become a lot more familiar with yocto, and all my "there's your problem" instincts were swamped with "well there's your problem: it's running yocto with systemd"...

Sigh. It's easy to look back on it from a fairly recovered state and go "I could have done X, Y, and Z", but at the time I had trouble getting out of bed. I wish I'd paid down the mortgage, but life also has to be worth living.


August 4, 2019

Environment variable names can have any character but NUL and =, so "env '()=123' env | grep 123" shows ()=123 as a variable that was indeed set. But () isn't a valid _shell_ variable name, so if you launch a shell with that in the environment it's filtered out from what "export" shows.

I have yet to find the variable name rules in the bash man page to implement this filtering with. (I can also read the posix spec, but I'm implementing the bash man page as my first pass. Plus 8 gazillion tests like this where I think of a thing, try it, see what the behavior is, and then try to track it down in the documentation.)

I _think_ the rule is that variable assignments don't allow quotes or escaping in the name part, so any control punctuation like " ' ` ; or & can't be part of the name... except the existence of _any_ punctuation before the = seems to throw it off. (You can't have an @ or % in there, which are fine unescaped in command names because they don't mean anything special to the shell.)

And if you can't assign it, you can't export it with the "export" command (which _does_ allow escaping because it's an argument to a command, albeit a builtin).

Oh, the _other_ fun "not a variable" is zero length name. (I have an "env =3 env | grep '^='" test in env.tests because yes, it works. No, the shell doesn't allow it. But you can use env to feed any of this to programs...)
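For reference, the filter itself can be tiny. This is a minimal sketch assuming the rule is the POSIX definition of a shell "name" (letters, digits, and underscores, not starting with a digit), which matches everything the experiments above show being filtered. The function name is made up for illustration, it isn't toysh code.

```c
#include <assert.h>
#include <string.h>

// Hypothetical helper: return 1 if a "NAME=value" environment entry has a
// NAME the shell could have assigned itself, meaning [A-Za-z_][A-Za-z0-9_]*
// Explicit ASCII ranges so the answer doesn't depend on the locale.
int env_entry_is_shell_var(const char *entry)
{
  const char *eq, *s;

  assert(entry);
  eq = strchr(entry, '=');
  if (!eq || eq == entry) return 0;            // no '=', or zero length name
  if (*entry >= '0' && *entry <= '9') return 0; // can't start with a digit
  for (s = entry; s < eq; s++) {
    if (*s == '_' || (*s >= '0' && *s <= '9')) continue;
    if ((*s >= 'a' && *s <= 'z') || (*s >= 'A' && *s <= 'Z')) continue;
    return 0;                                  // punctuation, space, etc.
  }

  return 1;
}
```

With that, "=abc", "a%b=def", " =xxx" and "()=123" all get dropped on import, and "PWD=/home" survives.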

Hmmm, how to test it though? Devuan still has the Defective Annoying Shell hooked up to /bin/sh by default, so:

$ env -i =abc a%b=def ' '=xxx sh -c export
export PWD='/home/landley/toybox/toybox/lib'

So that's all filtered out and the only exported variable is PWD (which is created by the shell and updated every time you cd). But when I explicitly say bash:

$ env -i =abc a%b=def ' '=xxx bash --noprofile --norc -c export
declare -x OLDPWD
declare -x PWD="/home/landley/toybox/toybox/lib"
declare -x SHLVL="1"

Slightly more complicated. And uses " instead of ' because gratuitous inconsistency. Hmmm...

Ha. I dug up an old aboriginal linux image to try this out on bash 2.05b (well over 10 years old) and busybox "env -i =abc env" does _not_ set the variable with no name. (Hasn't been through the desert, I suppose.) Ah, that image hasn't got bash on it, it has busybox hush (which doesn't filter out the a%b and " " names I _can_ set with busybox env). But that doesn't mean anything here... darn it, I moved bash to the toolchain build, didn't I? Ok, ./dev-environment.sh instead of ./run-emulator.sh and yes, now bash is in the $PATH, and...

$ env -i a%b=abc " "=def bash --noprofile --norc -c export
declare -x  ="def"
declare -x OLDPWD
declare -x PWD="/home"
declare -x SHLVL="1"
declare -x a%b="abc"

Well that's nice. So way back when it didn't filter this out, BUT it leaked the same crap into the environment as default exported stuff. (And why is OLDPWD defined with no actual contents, by default? Expanding $UNDEFINED produces the same empty string? This smells like a bash bug.)


August 3, 2019

Heh. There's a subtlety in the redirection stack I'm not sure how to document.

My doubly linked list logic has three interesting functions: dlist_add(), dlist_pop(), and dlist_lpop(). (This is glossing over the difference between dlist_add() and dlist_add_nomalloc() which is about allocation granularity, I need to clean that up at some point.)

The dlist_add(list, entry) function adds an entry to the end of the doubly linked list, and when it returns the new entry is "list->prev", I.E. it's at the end of the list. The function dlist_pop(list) removes the first entry from the list and returns it. The function dlist_lpop(list) removes and returns the _last_ entry from the list. So pop() works like a queue and lpop() works like a stack.

Except the "list" passed to all three functions isn't a struct dlist *, it's a struct dlist **. Pointer to pointer. The reason is if you add to an empty list, dlist_add() receives a NULL argument and changes it to point to the newly added entry. If you dlist_lpop() the last entry, the list pointer becomes NULL. And each time you dlist_pop() list becomes list->next; So all three functions have circumstances under which they need to change the value of the list pointer passed in to them.
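To make the pointer-to-pointer business concrete, here's a stripped down sketch of the three operations. It assumes a circular list (which is what "the new entry is list->prev" implies), uses a two member node with no payload, and the sketch_ names are invented so nobody mistakes this for the real lib code.

```c
#include <assert.h>
#include <stddef.h>

struct dnode { struct dnode *next, *prev; };

// Append entry at the end. *list keeps pointing at the first entry, so the
// new one is reachable as (*list)->prev. An empty list (NULL) becomes entry.
void sketch_add(struct dnode **list, struct dnode *entry)
{
  assert(list && entry);
  if (*list) {
    entry->next = *list;
    entry->prev = (*list)->prev;
    (*list)->prev->next = entry;
    (*list)->prev = entry;
  } else *list = entry->next = entry->prev = entry;
}

// Remove and return the FIRST entry (queue order). *list advances to the
// next entry, or becomes NULL when the last one is removed.
struct dnode *sketch_pop(struct dnode **list)
{
  struct dnode *dl = *list;

  if (!dl) return NULL;
  if (dl->next == dl) *list = NULL;
  else {
    dl->next->prev = dl->prev;
    dl->prev->next = dl->next;
    *list = dl->next;
  }

  return dl;
}

// Remove and return the LAST entry (stack order). *list only changes
// when the list becomes empty.
struct dnode *sketch_lpop(struct dnode **list)
{
  struct dnode *last;

  if (!*list) return NULL;
  last = (*list)->prev;
  if (last == *list) *list = NULL;
  else {
    last->prev->next = *list;
    (*list)->prev = last->prev;
  }

  return last;
}
```

All three can write to *list, which is why callers pass &list and not list.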

The _next_ problem is that the types of these pointers are struct double_list, but most of the users aren't. A doubly linked list starts with two entries: a next pointer and a prev pointer, in order. The rest of the list's contents are irrelevant to the list plumbing, so you can define any structure with those first two entries and typecast it. Except remembering that it's a pointer to a pointer instead of just a pointer is then the caller's responsibility, with a segfault if you get it wrong. And you have to typecast both the list pointer and the entry pointer (either the second argument to dlist_add() or the return value), which is just awkward.

This ties into the dlist_add_nomalloc() stuff: I have ideas on how to clean it up: right now struct dlist has 3 entries, the third of which is a char * because the first thing I implemented with this plumbing was the patch command, and that's what that needed. dlist_add() mallocs a new struct dlist and sets its data pointer to point to the second argument you passed in, which is easy but guarantees your allocation is in two parts. But dlist_add_nomalloc() just adds the structure you gave it to the list without wrapping it. This determines whether you can just dlist_traverse(&list, free); or need to while (list) {dl = dlist_pop(&list); free(dl->data); free(dl);}

Anyway, a _better_ way to do this is probably to have struct dlist {struct dlist *next, *prev;} with no third member, and then create structs that _start_ with a struct dlist member. Because you can always typecast a structure pointer to its first member, C99 guarantees there's no leading padding and the alignment requirements are correct. Gotta touch a lot of files to do that, though. It's on the todo list.
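A sketch of what that embedding could look like, with made-up type and function names (dl_head, str_entry, head_add), so treat it as an illustration of the layout trick rather than the planned toybox API:

```c
#include <assert.h>
#include <stddef.h>

struct dl_head { struct dl_head *next, *prev; };  // list plumbing only

struct str_entry {          // hypothetical user type
  struct dl_head list;      // MUST be the first member for the cast to work
  const char *str;
};

// Append to a circular list, same pointer-to-pointer convention as before.
void head_add(struct dl_head **list, struct dl_head *entry)
{
  assert(list && entry);
  if (*list) {
    entry->next = *list;
    entry->prev = (*list)->prev;
    (*list)->prev->next = entry;
    (*list)->prev = entry;
  } else *list = entry->next = entry->prev = entry;
}

// A pointer to the first member is a pointer to the whole structure, so
// going from list node back to the containing entry is just a cast.
struct str_entry *entry_from_head(struct dl_head *head)
{
  return (struct str_entry *)head;
}
```

The caller passes &entry.list going in, which needs no typecast at all, and only casts coming back out. One allocation per entry, so a plain traverse-and-free cleans up.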

But that's not the current subtlety. The current subtlety is that my redirection logic is _extending_ linked lists, but never actually modifying the existing members that are passed in. And they clean the new members they've added back off before they return. So I don't need to pass around pointers to pointers in _these_ functions, I can just pass pointers and take the address of my local copy when messing with the lists. Extending and truncating the lists messes with the ->prev and ->next pointers of the list members, but it puts 'em back how they were at the end, and even if we had a signal handler longjmp() to a recovery function it's a valid linked list at all times anyway that can be handled normally from the "root" pointer. (If the list was empty, passing in 0 means the local copy of the list initializes properly anyway, although _that_ wouldn't clean up from a longjmp() context that couldn't see new members if it had passed in an empty list. It would leak memory then.)

The longjmp() stuff is for implementing the "trap" command, of course. Not quite there yet, but gotta keep it in mind...


August 2, 2019

I ran out of my blood pressure pills (refill request has already gone from 24 to 48 to 72 hours to fill; yeah I need to lose 80 pounds, the curse of the desk job), and for some reason being off the medication is making me tired and irritable. I thought that's what being ON it was supposed to do? I meant to post a patreon update yesterday, but spent a surprising amount of the day sleeping instead.

I'm working on my ELC talk, and already have three times as much material as space. (This talk is only 35 minutes. And it's against something like eighteen other talks.) I should do podcasts.

I got to a good stopping point on the filehandle redirection code, got the result to build, tried to run it, and after I got through segfault city I realized the if/then/else/fi code I _was_ working on before diverging into filehandles wasn't quite finished, let alone tested. And the dangling bit is that the pipeline segment list is still grouped before flow control parsing. (The pipeline list and the expect stack are different lists built at different times, with unfortunately different granularity: the pipeline list is what the shell script text parses into, and stays constant no matter how many times you call it. The expect stack is what's happening THIS run through it.) So I need to break the expect stack up differently.

And _that_ got me shuffling code around where I tripped over the lifetime rules on the different kind of here documents (<< vs <<<) being different, which is a problem because they otherwise get dumped into the same storage.

I need to get this to the point where I can feed lots of data through it and regression test against my 8 gazillion test cases. But that's a LONG way from now. A lot of shell has to work before you can run real shell scripts, let alone like the test suite. (Although you can #!/run the test suite in one shell and test a second shell.)


August 1, 2019

Fade flew back to Minneapolis yesterday. We've still got the dog for another couple weeks (until she gets back from a friend's wedding in Ireland, which she doesn't leave for until after her placement exams). So now I have ANOTHER reason I can't work at home; clingy dog. Clingy energy vampire dog that's making me irritable even now that I'm out elsewhere with my laptop.

The main reason I can't work at home is still cats climbing me and standing on my keyboard whenever I sit down to program. I grew up with cats, I love cats, and I don't want any more cats. I am tired of cats, and I am waiting for our current cats to die of old age so I can be cat-free. (Peejee and George are both 15, Zabina's much younger but we can probably find somebody to take her after that). I was thinking someday I'd graduate from cats to kids, but it didn't happen. (I had measles at 22, yes I got vaccinated as a kid, but didn't get the booster and my brother got it, and gave it to me.) And pets aren't a substitute for children.

The Kobayashi Dragon Maid music came on in my programming music rotation, it's the bounciest thing ever, but it's kinda melancholy right now because the animation studio that did it had _just_ started work on season 2... and then was firebombed in the worst domestic terrorist attack Japan's seen in decades, killing multiple dozens of people including the series' director and putting more in the hospital for a long time. Most of the victims are young women in their 20's or early 30's, because it was a progressive studio that actually hired women.

As I said, kinda melancholy. And I have the long versions of both the intro and outro themes (the anime used only about half of each composition) in my programming playlist. Bit of a mood whiplash.


July 31, 2019

Googling for something else I found a patent using a file from my old aboriginal linux toolchain as an example. I boggle.


July 28, 2019

Trying to redo the loopfiles_lines() plumbing so the "flush" call at the end of a file has the filename in *pline and instead signals it's done with len = -1. (A zero length line is valid if you're stripping the terminators.) That way we can report errors on empty files (the way sha1sum -c wants). Unfortunately, the filename isn't passed in to do_lines, and if loopfiles_lines() makes the function() call itself there's a duplicate (0, 0) call before it? Hmmm, needs design work to shuffle stuff around so it collapses down cleanly.

Meanwhile, toysh is now just over 1000 lines and I've implemented maybe 1/3 of it. I suppose 3000 lines isn't outrageous for a shell but it would be far and away the largest toybox command. (Not counting bc, which I need to find a month to clean up sometime. No real reason for that to be much larger than sed, that I know of...)


July 27, 2019

I've got the majority of the redirect plumbing implemented, and I'm rereading the bash man page to figure out what all the corner cases I need to implement are.

Making the & background operator work on nommu (with vfork) is kind of brain melting. How does backgrounding a HERE document work?

$ while true;do sleep 1;read a;echo $a;[ -z "$a" ]&&break;done << EOF &
> one
> two
> three
> EOF

I just tried in bash and it works fine, but... the child process is reading from a normal filehandle, so the shell process is writing the data to it. I guess it starts waiting for it to exit when the HERE document hits EOF? If the parent shell isn't going to block if it fills up the pipeline buffer bash has to fork _two_ processes, and the writer process does not exec(), so on nommu I gotta do the /proc/self/exe dance and marshall data through to the child. Possibly with -c. (Marshalling the local variables and function definitions is likely to be fun. I'm guessing "magic environment variable", and the obvious candidate is the one with the zero length name, see env.tests and why does "env =abc" work... then the body would be some sort of encoding of the data. That's gonna be a pain to secure...)

Sigh, maybe it's trusting the shell pipeline to hold the data and otherwise blocking the parent? That would be easier. How big a HERE document do I need to saturate a modern kernel's pipe buffer to _test_ that?
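One way to answer the "how big" question empirically is to fill a pipe nobody's reading and count. A minimal sketch (the function name is mine): it sets the write end nonblocking and writes small chunks until the kernel says no.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Returns how many bytes a fresh pipe accepts before a write would block,
// or -1 if the pipe couldn't be set up.
long pipe_capacity(void)
{
  int fds[2], flags;
  long total = 0;
  char chunk[512] = {0};
  ssize_t len;

  // _POSIX_PIPE_BUF is 512: writes this small are all-or-nothing.
  assert(sizeof(chunk) <= 512);
  if (pipe(fds)) return -1;
  flags = fcntl(fds[1], F_GETFL);
  if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1)
    total = -1;
  else for (;;) {
    len = write(fds[1], chunk, sizeof(chunk));
    if (len > 0) total += len;
    else if (len == -1 && errno == EINTR) continue;
    else break;                       // EAGAIN means the pipe is full
  }
  close(fds[0]);
  close(fds[1]);

  return total;
}
```

The Linux pipe(7) man page says the default capacity has been 65536 bytes since 2.6.11, so a HERE document comfortably bigger than 64k should be enough to see whether the parent blocks.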

And of course just plain (subshell) and backgrounded | pipelines& is the same problem on nommu: vfork() without exec, so you exec _yourself_ and pass the child process any data it needs through a pipe or something. Sigh, I should revisit my old exec(NULL) request, without which nommu requires /proc/self/exe to do this sort of thing, which means it requires /proc and you could be in a chroot...


July 26, 2019

Fade found a new coffee shop (cherry wood, around the corner from Fiesta), and I swung by to see how things were. It was quite nice. About 4x the walk as the Wendys or HEB food court though. Still, exercise...


July 25, 2019

Twitter migrated my web ui to New Coke yesterday, so I've stopped using it (except occasionally through my phone). Getting a bit more done. But it does mean I can't tweet boggles like:

$ {walrus}<&-
bash: walrus: ambiguous redirect
$ walrus=potato
$ {walrus}<&-
$ echo $walrus
potato

Which seems to be that atoi("potato") returns 0 and attempts to close 0-2 (stdin, stdout, stderr) are silently ignored. No message, no return code.

According to the bash man page, {var}<&- or {var}>&- indicates a filehandle to close by the contents of $var, but it doesn't say what the error handling should be if var is blank or has a non-integer value. I tried "strace bash -c 'var=blah; {var}<&-'" and it's... spawning a child process? I threw a lot of -F at it to follow forks and it's just doing a bunch of rt_sigaction and friends? All the open() and close() calls are either /dev nodes or endless locale nonsense.

I think it's a NOP? There's just no error message? Does that mean atoi() said 0 and it won't close stdin? Yes, it looks like "0<&-" is ignored. So are 1 and 2. And:

$ potato=-1; {potato}<&-
bash: potato: ambiguous redirect

Is the same message as unset.
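Whatever bash is doing internally (the atoi() theory is a guess), the ambiguity goes away if the number parsing is strict. A sketch of a stricter parser, with a made-up name, that rejects blank, negative, and non-numeric values instead of quietly turning them into 0:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

// Hypothetical helper: parse a file descriptor number. Returns the number,
// or -1 unless the ENTIRE string is a non-negative decimal int.
int parse_fd(const char *str)
{
  char *end;
  long val;

  assert(str);
  // Must start with a digit: rejects "", "-1", " 7", "+7", and "potato".
  if (*str < '0' || *str > '9') return -1;
  errno = 0;
  val = strtol(str, &end, 10);
  if (errno || *end || val > INT_MAX) return -1;  // overflow or trailing junk

  return (int)val;
}
```

For comparison, atoi("potato") really is 0 (no digits to convert), which is how a non-number could turn into "close stdin" if nothing checks.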

Sigh. I implemented <&- and then read in the man page that <&$BLAH can expand _to_ "-" and it counts... Lots of circular dependences in this stuff, hard to find good stopping points.


July 24, 2019

I'm grinding away on this shell stuff as fast as I can, but I'm not sure what I'll have ready by my ELC talk in a month. There's a lot of groping around. There are a lot of dark corners of shell programming I didn't previously know (because I'd never had to before).

I'm making a sh tests file I cut and paste each test I run against bash into. If I had to ask this question to work out what the correct behavior was, I want my shell to get it right. This is another manifestation of my todo list getting longer as I work.

I still hit weird corner cases I have to think through in basic stuff: is -1<<2 still -4? Because 0xffffffff becomes 0xfffffffc which is decrementing it by 3, which in two's complement takes -1 to -4, so yes, it works the same way! Two's complement is clever. Which means int*4 can become int<<2 whether it's positive or negative. (At the bit level, anyway: the C standard calls left shifting a negative signed value undefined, so do the shift on the unsigned type.)
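A quick sketch of doing that shift without poking the undefined behavior bear: shift the unsigned bit pattern, then convert back. (Strictly speaking the conversion back to a signed type is implementation-defined for out-of-range values, but gcc and clang both document it as the obvious two's complement wraparound.)

```c
#include <assert.h>
#include <stdint.h>

// Multiply by 4 via shift. Shifting the unsigned bit pattern is always
// defined, it just discards the bits that fall off the top.
int32_t times4(int32_t x)
{
  return (int32_t)((uint32_t)x << 2);
}

// Spot checks, using values small enough that x*4 itself can't overflow.
void check_times4(void)
{
  assert(times4(-1) == -4);         // 0xffffffff -> 0xfffffffc
  assert(times4(5) == 20);
  assert(times4(-1000) == -4000);
  assert(times4(0) == 0);
}
```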

(Yes, I _could_ trust the optimizer to do that for me, but the optimizer is written by C++ people these days and you only get sane behavior by -fstoppit-ing the optimizer to swiss cheese with command line arguments. Otherwise everything is "undefined behavior". Makes me reluctant to trust the optimizer to do anything, really.)


July 23, 2019

I'm confused by bash doing:

$ echo one two three four five' ' | while read -d ' ' a
> do echo 1$a; cat <(read -d ' ' a; echo 2$a); done
1one
2two
1three
2four
1five
2

(Without that trailing quoted space on the first echo I only got the first 4, which is a separate boggle. The boggle _here_ is the interleaving.)

The reason is, stdio reads use a buffer, and <() is a sub-process that in _theory_ has its own FILE * for stdin and thus its own buffer, and I don't see how the two are sharing the input cleanly? I did a get_line() function that WOULD get this right... by using inefficient single-byte reads to never overshoot the input data. Elliott is in the process of ripping that back out because it's slow to do single byte reads, but how else do you share lines of input between processes?
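For illustration, this is roughly what the "never overshoot" approach looks like. It's a sketch with an invented name (not the actual get_line() from the tree): one read() per byte, so whatever comes after the newline is still sitting in the pipe for the next process.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

// Read one line from fd into buf (at most size-1 bytes plus a terminating
// NUL), one byte per read() so nothing past the newline is consumed.
// Returns the length (newline included if there was one), 0 at EOF,
// or -1 on error.
ssize_t read_line_bytewise(int fd, char *buf, size_t size)
{
  size_t len = 0;
  ssize_t got;

  assert(buf && size > 1);
  while (len < size-1) {
    got = read(fd, buf+len, 1);
    if (got == -1 && errno == EINTR) continue;
    if (got == -1) return -1;
    if (!got) break;                  // EOF
    if (buf[len++] == '\n') break;    // stop right after the newline
  }
  buf[len] = 0;

  return len;
}
```

That's a system call per character, which is exactly the slowness being complained about. The tradeoff is correctness when two processes share the same input.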

Anyway, the hard bit I'm banging on right now is redirects, because making:

$ echo hello | while read a; do echo $a; done <<< boing
boing

Work without leaking filehandles _and_ reasonably supporting nommu is fun. I want to have one well-tested codepath, which means I want my shell to use vfork() internally all the time, which means I want to do as little work in the pre-exec child process as possible. Working out "as possible" is nonobvious.

Do I want to set up the filehandle redirects in the parent process, maybe dup2() the old stdin/stdout/stderr up to a high filehandle and then unwind it after the vfork()? Or set them up as high filehandles (>9 may be used internally by the shell, at least according to the bash man page), and then have the child dup2() them back down and close the old ones? (There doesn't seem to be a "move filehandle" option, just dup2()/close(). Oh well.)
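The first option (park the old filehandle up high in the parent, redirect, unwind afterwards) sketches out to something like this. The function names are hypothetical, and it's using fcntl(F_DUPFD) with a minimum of 10 as the closest thing to "move this filehandle up there":

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

// Point fd at newfd, after parking a copy of the old fd at 10 or above.
// Returns the parked filehandle, or -1 on error (nothing changed).
int redirect_save(int fd, int newfd)
{
  int saved;

  assert(fd >= 0 && newfd >= 0);
  saved = fcntl(fd, F_DUPFD, 10);     // lowest free filehandle >= 10
  if (saved == -1) return -1;
  fcntl(saved, F_SETFD, FD_CLOEXEC);  // don't leak it to exec()ed children
  if (dup2(newfd, fd) == -1) {
    close(saved);
    return -1;
  }

  return saved;
}

// Undo redirect_save(): put the parked filehandle back and free the slot.
int redirect_restore(int fd, int saved)
{
  int rc = dup2(saved, fd);           // dup2() result never has CLOEXEC set

  close(saved);

  return rc == -1 ? -1 : 0;
}
```

This doesn't cover the case where fd wasn't open to begin with (the fcntl() fails with EBADF), which a real shell would need to remember so the unwind closes fd instead of restoring it.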

I _can_ have the child call open(), it's a system call and a vfork() child has its own file descriptor table. But it should error out and not run the child process if the open didn't work, which means I prefer it to happen before the fork because error handling becomes much more awkward afterwards. And if you add in supporting the weird tcp/ip stuff (< /dev/tcp/host/port) I _really_ want that to happen before vfork()... But then you combine signals (the "trap" shell builtin) and unwinding interrupted setup without leaking anything...


July 22, 2019

This sort of thing is part of why I maintained a tinycc fork for 3 years. I really want a small simple C compiler with known C99 behavior that we can port to various platforms and NOT do crazy optimizations generating endless "undefined" behavior.

These days we'd need a new cfront converting C++ to C in order to use such a compiler to bootstrap gcc or llvm on a new architecture (and thus have an optimizing compiler, not merely a correct one). _BUT_ if we teach LLVM to produce C output as one of its targets, suddenly we have a modern cfront.

If I had all the money I'd ever need, I'd hire somebody to work on this... Alas, while I keep encouraging people to steal my ideas they never do.

I also note that the entire article on detecting and defeating insane compiler "optimizations" boils down to "this is why volatile was invented", so of course the C++ loons have proposed its removal. (As long as they don't break _C_ they can do whatever they like to C++. But never let C++ developers take over development of a C compiler. They will break it. It's what they _do_.)

Ooh, Jeff sent me a link to exactly that, an llvm backend that produces C. Have LLVM run itself through this, teach tinycc to compile the result, and you don't need cfront anymore! You can bootstrap up to a modern optimizing compiler from a tiny auditable system!

Ok, I need to dig up qcc once I get toybox to 1.0.


July 21, 2019

Here's a fun one:

$ chicken() { echo hello; chicken() { echo also ;}; chicken;}
$ chicken
hello
also
$ chicken
also

The lifetime rules here are awkward. The original definition of chicken() is still running when it's overwritten, so the memory can't be freed until the function exits. And given that it could be a _recursive_ function... I think I need a reference count? (Or a function to clone a pipeline?)
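The reference count version might look something like the following sketch. Everything here is hypothetical (struct sh_func, the string standing in for a parsed pipeline, the function names): the name table holds one reference and each running call holds another, so redefining a function mid-call just drops the table's reference.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sh_func {        // hypothetical, not the real toysh structure
  char *body;           // stand-in for the parsed pipeline
  int refs;             // 1 for the name table, plus 1 per running call
};

struct sh_func *func_new(const char *body)
{
  struct sh_func *f = malloc(sizeof(*f));

  if (!f) return NULL;
  f->body = malloc(strlen(body)+1);
  if (!f->body) {
    free(f);
    return NULL;
  }
  strcpy(f->body, body);
  f->refs = 1;

  return f;
}

// Call when a function starts running.
struct sh_func *func_ref(struct sh_func *f)
{
  f->refs++;
  return f;
}

// Call when a function returns, or when the name table lets go of it.
void func_unref(struct sh_func *f)
{
  assert(f && f->refs > 0);
  if (--f->refs) return;
  free(f->body);
  free(f);
}

// (Re)define: swap the table slot, dropping the table's reference to the
// old body. If a call is still running it, the memory stays valid.
int func_define(struct sh_func **slot, const char *body)
{
  struct sh_func *f = func_new(body);

  if (!f) return -1;
  if (*slot) func_unref(*slot);
  *slot = f;

  return 0;
}
```

Recursion just means more than one running reference on the same body, which the count handles without needing to clone anything.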

Do I need this for variables too? I don't think so... They're used atomically, you don't run _in_ a variable you just resolve it. Even ${abc:${abc:${abc}}} is resolving the innermost one first and working its way back out. You can modify a variable in $((x=y)) (although you can only set it to a number: "fred=rutabega; rutabega=123; echo $((x=fred))" outputs 123) or assign default values to unset variables with ${name:=value}, but these happen in the context of one variable resolution and then the result is used for the next: ${abc:$X:$Y} doesn't care what the value of abc is until it has the values of $X and $Y, and the whole thing becomes a string which is then parsed by ${}.


July 19, 2019

Had one of Fade's vitamin pills on an empty stomach this morning. Last time my stomach got _that_ unhappy from a vitamin pill on an empty stomach was ...20 years ago now, driving to work at Trilogy.


July 18, 2019

Sorry if this shell blogging has been terse bordering on unintelligible. I've been doing a lot of design work and when I try to blog my thought process it gets REALLY LONG. Here, lemme do today's just to demonstrate.

I'm still arguing with the flow control logic. Right now each flow control terminating statement is its own pipeline segment, but the flow control _starting_ statements are bunched together, because "if if if true; then false; fi; then false; fi; then false; fi" turns out to be a valid line.

The pipeline data structure is more or less a linked list of struct sh_arg { int c; char *v[];} structures. (Plus HERE document information, but that's irrelevant here, and it's doubly linked to make it easier to create in order rather than reversed.) Which means each pipeline segment is a command and its arguments, and each argument list ends with either NULL or a "control operator" like ; or | or && indicating how it interacts with the next one.

Flow control complicates this, because "(echo hello)|cat" is actually "(" is a starting flow control statement meaning "you fork and do all this in a subshell", "echo" and "hello" are a command you fork and exec, ")" is the corresponding end to the flow control statement, and then "|" applies to everything inside the parentheses (so the hello goes through it), and then "cat" is another command you fork and exec. So you actually wind up with:

( echo hello
) |
cat

And that's how I parsed it when I implemented the line continuation and syntax checking, which does the right set of ">" prompts until it has a complete thought, including ending quotes, flow control statements, and HERE documents.

This breaks the input down into pipelines, figuring out where each stage ends due to newlines or ; | || etc, with the special case of ")" being both a control operator _and_ a flow control statement. (So instead of being appended to the previous argument list as "how this ended", whenever ")" isn't already at the start of a new statement it ends the previous statement with a NULL and starts a new one that it's the first word of.) All the OTHER flow control terminators already have to be at the start of a pipeline segment, which is why "if true; then false; fi" needs the semicolons before then and fi, same with "{echo;}" needing that semicolon so } isn't an argument to echo. (Yes, ) ends both a word and a statement, but } is basically just a glorified command.) My initial trivial run logic just tried to run each pipeline segment in sequence (the control operators were saved but not used), so in the above "(echo hello)|cat" example "(" would be an unknown command not in the $PATH, so would ")", and then "cat" would hang awaiting input from the console.

Now I'm trying to actually implement the flow control, and there's several parts to getting it right. Each flow control statement has a corresponding end, and any pipes or redirects done to the end apply to the whole statement, as in "if echo hello; then echo next; fi | tee blah.txt" writes both hello and next to blah.txt. So before I can run _any_ of the statements, I need to apply the flow control redirects to the entire context. Meaning I need to find them up front, so I needed logic to search forward for the matching end _without_ losing my place at the beginning. My syntax checking logic had a stack that popped itself, meaning performing the search modified the stack contents...

The stack is because flow control _nests_, so unless I want a recursive function I need a stack of active flow control statements, which is why I did the "expect" stack back in syntax checking. (It's called that because it tracks what statement am I expecting next: while/do/done, if/then/fi, etc.)

I extended my syntax checking function to run in 3 modes, which answer 3 questions. The original mode answers the questions "is the expect stack empty" (if not we need more lines to balance it out), and "was there a syntax error" like "(echo&&)|cat" which needs a statement after the &&. (Which means the flow control needs to know when to expect a non-empty statement, and it adds NULL entries to the expect stack to track those.)

The second mode is "where do I start running this line, and in what context": this returns an offset into argv at which there's something to fork/exec, which equals arg->c (I.E. eats the whole line) if the whole line is flow control statements, ala:

if
  true
then
  echo hello
fi

The pipeline stage parsing logic doesn't glue lines together, just ends them early. (The quoting logic glues them together for unfinished quotes or trailing \ but that's a previous layer.) And since you can wind up on a separate line (and thus in a separate pipeline stage) when you actually have something to execute, the inciting statement (like "if" or "then") gets saved at the top of the expect stack, for the caller to pop.

The third mode is that "search ahead" mode, which doesn't pop the stack but instead returns whether or not the caller _should_ pop the stack (meaning we have an end of flow control statement). I'm oversimplifying a bit because if/then/elif/elif/elif/fi can go on for a while, but the flow_control() function already knows that when we expect "fi" we should also check for "else" and "elif". The point is it won't pop back past the "if", and only "fi" has the special redirect and pipe stuff that applies to the whole block, any pipes or redirects on "then" or "else" apply to the command that goes there.

Unfortunately, actually trying to use this sucks, because pipeline segments contain BOTH flow control AND command to execute. And not just ONE flow control statement either, you can have "if if if true" and we're back to needing a recursive function or another stack to deal with that.

I tried adding a second piece of data (segment and offset _into_ segment), but tracking this turned into a mess and I was snapshotting the expect stack to do the search forward for the end meaning I had a stack of copies of the stack...

So back up, redesign. If you sometimes wind up with flow control and command in a separate pipeline stage, and have to handle it, then make that ALWAYS the case and thus the only case to handle. What I need to do is break up the pipeline further, so each _starting_ flow control statement is also its own pipeline segment (line ending ones are now), so "if while true; do break; done; then echo hello; fi" becomes a list of 9 pipeline segments:

if NULL
while NULL
true ;
do NULL
break ;
done ;
then NULL
echo hello ;
fi NULL

Then the code to track where we are at runtime just needs a pointer to the pipeline segment to deal with next, and then I can implement a flow_control_end() function that takes a starting pipeline location and returns a pointer to corresponding end segment (just start a new expect stack, and when it's empty again we're done)...
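The splitting rule for that 9 segment example is simple enough to sketch. This is a toy version working on pre-split words, with a made-up keyword list and function names, ignoring ( and { and every control operator except ";":

```c
#include <assert.h>
#include <string.h>

// Keywords that become a one word segment of their own (with a NULL
// terminator in the notation above). Toy list, not the full grammar.
static int is_flow_start(const char *word)
{
  static const char *starts[] = {"if", "while", "until", "then", "do",
                                 "else", "elif", 0};
  int i;

  for (i = 0; starts[i]; i++) if (!strcmp(word, starts[i])) return 1;

  return 0;
}

// Record the index of the first word of each pipeline segment in starts[].
// Returns the number of segments, or -1 if there are more than max.
int split_segments(char **words, int count, int *starts, int max)
{
  int i = 0, segs = 0;

  assert(words && starts);
  while (i < count) {
    if (segs == max) return -1;
    starts[segs++] = i;
    // Keywords are only recognized as the first word of a statement.
    if (is_flow_start(words[i])) i++;
    else {
      // Commands and terminators (fi, done) run through the next ";"
      while (i < count && strcmp(words[i], ";")) i++;
      if (i < count) i++;
    }
  }

  return segs;
}
```

Feeding it the words of "if while true; do break; done; then echo hello; fi" gives the 9 segments listed above, and "echo if" stays one segment because "if" isn't the first word of its statement.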

Hang on, better: each pipeline stage isn't _really_ just a struct sh_arg, that's just one of the members. I can add another member to annotate each statement with "start of flow control", "flow control change", and "end of flow control" on the first pass. (The "change" ones are needed because "if one; two; three; then" can have multiple statements belonging to the "test" part before you get to the "perform conditionally" part, let alone else and elif and such.) Which means I don't have to re-call the flow control function, the information goes into the pipeline list. And I can ask a simpler question in the flow_control() function: is this WORD a flow control statement or not, yes/no, and then have the pipeline assembly use the result by noticing the stack got deeper (or have the function return the 3 categories in the return code).
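
As an illustration of the annotation idea only (the next paragraph explains why functions complicate it), a first pass could tag each segment with one of the categories. The sets and the annotate name below are assumptions, not toysh's data structures:

```python
# Hypothetical category tables, keyed on the first word of a segment.
START = {"if", "while", "until"}
CHANGE = {"then", "else", "elif", "do"}
END = {"fi", "done"}

def annotate(segments):
    """Pair each segment (a list of words) with its flow control category."""
    tagged = []
    for argv in segments:
        word = argv[0]
        if word in START:
            kind = "start"
        elif word in CHANGE:
            kind = "change"
        elif word in END:
            kind = "end"
        else:
            kind = "command"
        tagged.append((kind, argv))
    return tagged
```

The runtime can then walk the tagged list without calling the flow control logic a second time.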

Except I can't. And the reason it doesn't work is functions: "function walrus() {" is either 4 or 5 words depending on whether you say "function", and it can even have a newline between the ")" and the "{", but _can't_ have a newline before either of the parentheses. I have special case logic in the flow control to handle this now, but the function name, (, and ) get parsed into 3 separate words before this (kinda necessary: any or all of them could have spaces before or after them, or neither, they count as "control characters" and thus become their own words). To be a function, they MUST come in sequence on the same line (except the { can optionally be on a later line), and no _other_ words can come between them. Violating this is a syntax error, and yes "thing();{" is allowed but "thing()&{" is a syntax error, remind me to add a test for that. (The difference between "newline" and "semicolon" is pretty darn fuzzy at times.)

I was fudging the contents of the expect stack to always have the special "function" keyword on it (even when you didn't use the function keyword in the input), but I'm _not_ currently modifying the pipeline input. (I'm splitting it into chunks, but not changing what any of it says.) I suppose I could assemble an artificial "function NAME ( ) {" pipeline segment after syntax checking, but the problem is in order for it _to_ pass syntax checking, I need to either pass in more than one word at a time, or abuse the expect stack a lot. Hmmm... I need at least one word of readahead in order to recognize functions in the "posix" case: without the function keyword, the second word of a statement being "(" means it's a function. (Or a syntax error if the third word isn't ")" and the next word after that isn't "{".)

And _this_ means I can't do the flow_control() stuff until I've assembled a pipe segment terminated by ; or newline or such, at least for recognizing functions. Which makes splitting the pipeline segments a LOT more awkward, because I'm undoing work I already did instead of looking at words before they get grouped. (Possibly I need to move recognizing shell functions outside of flow_control() and have the caller do it?)

The existing flow_control() function appends to and pops the expect stack on its own (in the original "syntax checking" mode). The new mode 3 tells the caller when _they_ should pop the stack because it's found a block end. What I probably want is to go back to a single mode, have it report when it found a terminator via the return code...

Darn it, the SAD part is that you can have a command and a function declaration in the same segment: "blah(){echo hello; }" parses to _two_ segments... hang on, no it doesn't. Because I added the special case handling for ) meaning it's a terminator! So the segments are 1) "blah" "(", 2) ")" "{", 3) "echo" "hello" ";" (well, "echo" "hello" and then ";" instead of NULL at the argv[argc] position), and 4) "}". Even given the possibility for segment 2 to have a newline and split the ")" and "{" into its own segments, ")" is always at the start of a segment!

There's still sort of the 3 magic things on the same line for "function NAME (" with function being optional (parsing that had a goto at one point, now it has an if (thingy || (!strcmp(argv[i], "function") && i++)) which is less awkward than a goto but not by much...). Ok, the _rest_ of it should be doable, this particular knot I need to stare at more.

For one thing, if you type "name (" into bash and hit enter, it's a syntax error. It needs the other ) to be on the same line. And the parsing is losing that distinction. For bash, ) only starts a new line when it does NOT immediately follow a ( as the previous token. So that's wrong, although... does _accepting_ input that the other one rejects count as an error? (Valid input runs, error checking is less rigorous to the point where more things run? Hmmm... Ideally I'd like the behavior to be the same, the question is how much work I'm willing to put into it.)

Let's see, a function starts with either the word "function", or an otherwise unrecognized word followed by (. Then you must have (on the same line): word ( ). And then the next word must be {. So if I lift function recognition _out_ of flow_control(), it handles individual words, manages the expect stack itself, and returns "start of block", "gearshift", "end of block", "command", or "syntax error"...
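
The recognition rule as stated in that paragraph, lifted out into its own check, might be sketched like this in Python. function_name is a hypothetical helper, it only sees one already-tokenized line, and it follows the description here rather than everything bash accepts:

```python
def function_name(words):
    """Return the declared name, None for an ordinary command,
    or raise SyntaxError for a malformed declaration."""
    # Optional leading "function" keyword shifts the name over by one word.
    i = 1 if words[:1] == ["function"] else 0
    # Without the keyword, one word of readahead decides: is word two "("?
    if i == 0 and words[1:2] != ["("]:
        return None
    # From here on it must be NAME ( ) { in sequence.
    if words[i + 1:i + 4] != ["(", ")", "{"]:
        raise SyntaxError("bad function declaration: " + " ".join(words))
    return words[i]
```

So ["walrus", "(", ")", "{"] and ["function", "walrus", "(", ")", "{"] both give "walrus", ["echo", "hello"] gives None, and ["name", "("] is rejected. Allowing the "{" to arrive on a later line would need state carried between calls, which this sketch leaves out.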

Anyway, dinnertime. Didn't get any new code written. All design rumination, today in a "musing out loud" fashion rather than my usual "long walks staring vaguely into the distance".


July 17, 2019

Today's toysh question is about changing lifetime rules for the expanded command arguments. (Translation: moving the expand_args() call, which mallocs one or more new args[] for each arg (wildcards like * can expand into a hundred arguments) into run_command() along with the corresponding responsibility to free it. Which also means I don't need storage space in sh_process to marshall the data into the other function, so it's a cleanup?)

Object lifetimes are always one of the big design thingies. Get the object lifetimes, data representation, and data flow right, and the rest of the code tends to be the easy part.


July 16, 2019

I really liked the Good Omens miniseries Neil Gaiman did at Terry Pratchett's dying request, and Neil tweeted a link to the Good Omens BBC radio play from a few years back, so I gave the first episode a listen and...

I want to pat them on the head for trying. Having seen this done EXACTLY RIGHT, listening to this audio play is just painful. So many things the TV series did so smoothly in passing I didn't even notice them, get set up here with pliers and a block and tackle. It's nice that Neil and Terry had a cameo in the first episode, but they served no plot purpose. And Aziraphale and Crowley... I'm sorry, having seen the TV versions, I just can't take the radio versions seriously. They're... not right.


July 15, 2019

I promised Panagiotis (the magazine guy I've been talking to) a column and a head shot today. I need to do that, but I'm hip deep in shell coding, and _that_ needs to be demonstrable by August 22 so I don't have to yet again say that toysh is unusable in my "Toybox vs Busybox" talk. That's the _main_ thing preventing it from being usable in systems on its own. (The other, at least for mkroot, is a lack of a good "route" command. But I can open that can of worms after getting toysh basically working.)

I'm still trying to fit HERE documents elegantly into the design. I want to parse HERE documents as I finish each arg: struct sh_arg *arg is a single argv[argc] ended by | && ; and so on. (Or newline.) I either need to add more members, brutally abuse the ones I have, or wrap it in an enclosing struct...


July 12, 2019

Got the continuation logic checked in, now working on flow control. (Walked to UT to bang on stuff at that little table. Yay exercise.)

One problem is that flow control skips commands (sort of the point), which means HERE documents are parsed out of order, so my "assemble a linked list of them" approach isn't good enough. (Or at least awkward when you've got loops inside loops.) I need to attach each HERE document to the corresponding struct sh_arg holding the argc/argv[] pair that made us read the extra lines.


July 11, 2019

The reason you can declare functions inside other functions is that declaring a function is basically an "export" into a different array, so:

$ abc() { echo hello; def() { echo ghi; }; def; }
$ def
bash: def: command not found
$ abc
hello
ghi
$ def
ghi

And THAT's how they nest. Function declaration is basically a form of variable assignment, with similar lifetime rules. Yes, I added a test.
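
A toy model of that behavior in Python, assuming a single global table of functions (the names functions, run, and abc_body are invented for the illustration):

```python
functions = {}  # hypothetical global function table, one per shell
output = []     # collected "stdout" so the result is easy to inspect

def run(name):
    """Look the command up at call time, the way the shell does."""
    if name not in functions:
        output.append("%s: command not found" % name)
        return
    functions[name]()

def abc_body():
    output.append("hello")
    # Declaring def is just an assignment into the table, done when abc runs.
    functions["def"] = lambda: output.append("ghi")
    run("def")

functions["abc"] = abc_body

run("def")  # not declared yet
run("abc")  # prints hello, declares def, runs it
run("def")  # the declaration outlived abc
print("\n".join(output))
```

The inner declaration does nothing until the outer body executes, and once it has executed the name stays defined, which is the same sequence as the transcript above.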


July 10, 2019

Ok, back to flow control. Or specifically, back to running commands and trying to get redirection to work. Today's revelatory test is:

$ if cat <<< one; then cat <<< two; fi <<< three
one
two

Which means you open the outermost redirections first, and the inner ones discard them. So I have to parse to the end of the flow control block and work back in, on a stack. And you can have "if if true; then echo; fi; true; then echo; fi" nesting arbitrarily deep, so you need a stack. (Or to be able to parse backwards from the end, but I already had to parse from the front to get the line continuations right.)

If I break flow control into some kind of tree structure, it's gotta be linked lists because if/elif/elif/elif/elif/fi can go on arbitrarily long. Hmmm, but the _only_ place that redirections (or pipes) apply to the entire grouping is the group/block terminations, because there's a statement after each of the other flow control words, and the redirect applies to that statement. Only a redirect after a group terminator applies to the whole group (because there's no statement allowed there).

So I don't need a structure, I need a way to find the end of this block. Which _is_ a minor variant of the same parse-forward logic, I just need a test to know when I'm done. Hmmm...


July 6, 2019

I've been arguing with ( ) in shells for a bit, and finally worked out what cul-de-sac I went down:

( ( ( echo ) echo ) )

does not become

( ( ( echo )
echo )
)

It becomes:

( ( ( echo
) echo
)
)

And in THAT framework, "( echo ) >> blah" is the same logic as "if true; then echo; fi >> blah". I.E. flow control terminators can have trailing redirects but not trailing arguments.
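
Put as code, the rule is that ")" starts a new segment instead of ending the one it trails. A minimal Python sketch, assuming pre-split words and a made-up function name:

```python
def split_on_close_paren(words):
    """Start a new segment at every ")" instead of ending one after it."""
    segments = [[]]
    for word in words:
        if word == ")" and segments[-1]:
            segments.append([])
        segments[-1].append(word)
    return segments

for segment in split_on_close_paren("( ( ( echo ) echo ) )".split()):
    print(" ".join(segment))
```

This groups the words the second way shown above, so each ")" sits at the front of its segment, where a trailing redirect can attach to it like any other block terminator.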


July 5, 2019

Once upon a time Intel's x86 processors were optimized for price/performance ratio. They took over the desktop, but were very energy inefficient so (at least up until around 2004) you couldn't use them without wall current and a giant heat sink with a fan on it. Arm chips were optimized for energy efficiency (power/performance), which is why phones are all arm. Over the years, x86 went from "the processor" to kind of a sideline overshadowed by raspberry pi.

Something similar is happening with battery technology: lithium is the x86 of battery tech, light and highly reactive but also scarce and hard to work with. Lithium batteries evolved in a specific niche, and people are applying lithium to other niches because it's there, not because it's a good fit.

Lithium is the lightest metal, and lithium batteries are optimized for power/weight ratio. Size and weight are important in phones and laptops and electric cars, but a building storing solar power from the rooftop overnight really doesn't care how much the battery weighs. Since about 1970 battery technology has been driven by portable electric devices, which funneled a bunch of R&D into lithium and gave it a big headstart. But just like x86, the technology that dominated the old niches is kind of a bad fit for the new ones.

Lithium is _so_ chemically reactive its pure form explodes on contact with water, which makes it tricky to work with. Lithium batteries break down (wear out) easily by reacting in ways they're not supposed to, and making them last a long time involves sticking them in a high-tech refrigerator that can eat 15% of the power they store. And even today, lithium battery designers have to be very careful to avoid fires and explosions.

While Lithium isn't as scarce as many elements, it's not exactly abundant either. The places it's easy to mine tend to have political and logistical problems, and those concentrated deposits are finite. And batteries are made from _groups_ of chemicals: you need a cathode, anode, membrane, and electrolyte. So far the battery chemistries that best work around Lithium's chemical issues involve cobalt, which has even worse political and logistical problems. In most places cobalt is so dilute it's only produced as a side effect of mining something else (copper in Congo, Nickel in Australia). You get tons of the cheap metal, and a tiny fraction of cobalt as basically a side effect. (Similarly, while there's lithium in seawater, you have to process ten million liters of seawater, I.E. ten thousand metric tons, to get one liter of lithium. You'd only really do that as a side effect of water desalination efforts going after the water itself, and there are still cheaper ways to get water in most places.)

Meanwhile, if you don't care about weight, you can make batteries from other things, like sodium, zinc, and nickel. They haven't had a trillion dollars of R&D pumped into them over the past 50 years, but they're all _much_ easier to mine, and generally easier to work with. Sodium's half of table salt (seawater has 100,000 times as much sodium as lithium), and zinc is so cheap we switched to making pennies out of it when copper became too expensive. Annual production of Lithium is 35,000 tons (I.E. 0.035 million tons). Nickel is 2.25 million tons. Zinc is 11.9 million tons. Sodium's 225 million tons. That's about 65 times as much nickel, 340 times as much zinc, and 6500 times as much sodium produced each year as lithium.
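
Checking those ratios from the quoted production figures (a quick Python calculation; the dictionary simply restates the numbers in the paragraph, in millions of tons per year):

```python
# Annual production in millions of tons, as quoted above.
production = {"lithium": 0.035, "nickel": 2.25, "zinc": 11.9, "sodium": 225.0}

def ratio_to_lithium(metal):
    """How many times more of `metal` is produced per year than lithium."""
    return production[metal] / production["lithium"]

for metal in ("nickel", "zinc", "sodium"):
    print(metal, round(ratio_to_lithium(metal)))
```

The unrounded results are roughly 64, 340, and 6400, so the "about 65" and "6500" figures are rounded up slightly.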

The old nonrechargeable batteries from the 1980's ("D cells" in flashlights and such) were Zinc/Carbon chemistry. Basically made from pennies and charcoal. The Nickel-Iron battery chemistry is over 100 years old, and it NEVER WEARS OUT. Many of the nickel iron batteries manufactured 100 years ago by Thomas Edison's company are still in use (hence the nickname "Edison Battery", although as with everything with Edison's name on it the actual inventor was somebody else and he just commercialized it). The classic nickel/iron chemistry uses water as the electrolyte, and if you overcharge the battery it electrolyses the water to produce hydrogen and oxygen (which both consumes electricity and produces explosive byproducts, one of which also eats the ozone layer). So in practical terms you need to keep these batteries somewhere ventilated and top them off with water every few weeks. People mix and match chemistries all the time, here's a zinc anode and nickel cathode, and a colorado company called iron edison is making lithium/iron batteries.

And there's other chemistries like aluminum air using aluminum as the anode and oxygen as the cathode: the reason aluminum recycling pays isn't because aluminum ore is rare: Aluminum is the third most common element in earth's crust (47% oxygen, 28% silicon, 8% aluminum. It's light so when the planet was molten it floated to the surface.) But aluminum metal isn't found in nature, it oxidizes too easily. (The metal forms a thin layer of oxide that prevents the rest from oxidizing, but over time in nature it all turns into oxide.) Aluminum metal is basically stored electricity: it's created from ore by electrolysis, and oxidizing it gives off electricity. It's easy to get the electricity back out by dipping it in something that dissolves the oxide (like sodium hydroxide) so the air can get at it.

Aluminum's well-known for being light, and air batteries are great because they have twice the power/weight ratio of things that need to carry their cathode with them. This is also why burning gasoline has so much power for the weight, because half the reaction mass isn't carried with you but taken from the environment. What's not so easy is to create a _rechargeable_ aluminum air battery, because if you dissolve the oxide off and all the aluminum turns into oxide you've melted the aluminum into solution. You have to condense it back out when you recharge it, which is tricky. (This is also why lead/acid batteries try not to discharge too much: if you melt the lead into solution it condenses out in the wrong places when you try to get it back.) And if you just condense it out of solution onto one of the electrodes, you have a smooth surface with comparatively little surface area and thus low voltage. You generally want complex topology to increase surface area and thus rate of reaction, which you lose when your material dissolves and re-condenses. For things like self-driving fleet vehicles "slot in a new aluminum cartridge" isn't necessarily a big ask, but battery walls care about lifetime, generally as measured in charge cycles before capacity drops to 80%, and so far that's sidelined aluminum.

But they're working on it, and a thousand other possible battery chemistries. Heck, the guy from the "shipping container sized batteries" ted talk years ago is still at it, for some reason using molten calcium anodes and antimony cathodes now. (Antimony is about as common as thallium, not anybody else's first choice when it comes to "scaling up".)


July 4, 2019

Yup, the bad hard drive sector always happens when I "right click->inspect element->delete node" in chrome. At which point chrome crashes and I get the error in dmesg. The number of the sector in dmesg changes, but the sequence to trigger it doesn't.

The SCSI layer's remapping insanity continues to get weirder, but I think it's a single flaw that probably happened when my laptop got dropped? I should still get a new hard drive and keep it backed up, but it doesn't appear to be getting worse.

A guy named @tomjchicago on twitter is making a really good case that the Resident has frontotemporal dementia. Reagan had Alzheimer's in office (to the point Nancy took all his meetings his last year), now this. What the GOP really _likes_ turns out to be "senility".


July 3, 2019

So, continuing with shell sequencing:

$ if cat << EOF ; then
> one two three
> EOF
> echo hello
> fi
one two three
hello

Which confirms I need to peel off and satisfy HERE documents _before_ parsing other flow control. First the line gets broken down into ; & && | || statements, then peel off redirects (but don't interpret the non-here ones yet), then flow control. Except yesterday's test says the << EOF still has to BE there when flow control is parsed in order to start statements. (In case somebody has an executable named "else".) So they must be parsed but not removed, and a later pass performs the actual I/O redirection.

Another parsing problem with the code I have: things like "else" are only contextually special, so if you try to run a command "elif blah" outside any other flow control, my code so far will happily try. I might need to add gratuitous error checking just because people expect it.

So, the sequencing should be to check for flow control, but peel off redirects right after that, and act on the HERE document's request for more lines before the flow control's request for more lines.

Luckily flow control works on statement granularity (an argv[] ending with ; and friends), and so do redirects, and I've GOT a good data structure representing those and have already parsed the tokens into that. So even if it modifies... except the redirect parsing shouldn't remove them from the argv[], it's the expansion stage that should skip over those. Hmmm...

My "sh.tests" file is only 240 lines long. It should probably be longer.

Hmmm, { } queues up statements to execute later, because | applies to the lot of them. This is basically what functions are doing as well, and ( ) is similar but runs it in a subshell. And ) ends a line even when not the first statement, ala "(echo)|cat". Whereas { echo;}|cat requires the space and the semicolon.

Sigh. I made a design decision during token parsing about how input lines are represented that's really inconvenient for HERE documents. When it needs more data to finish a line, it returns the pointer to the character to resume at, and the line reader chops that bit off and uses it as the start of the next line. But what I want for HERE documents is new lines as they are, and I don't currently have a way to represent that. So now I need to go reread the token parser and see what it actually needs and if _it_ can glue the lines together, or if the line handling function should do it, or...

Ok, the problem is I haven't got a place to store "fragment of previous line". I'm not sticking much in TT.globals during parsing because that gets brittle fast: with "source filename" and such I may need to call it from multiple places, semi-recursively, and "echo else hi > hello.sh; if true; then . hello.sh; fi" says the else is an error, meaning sourcing more files is NOT transparent to flow control. So yes, this code needs to be reentrant.

The argument that _is_ passed in to parse_line() is a doubly linked list of argc/argv[] pairs I'm pretty sure I could abuse (off the top of my head, append a pointer to the continuation point as the last argv[] and make argc negative to indicate continuation in progress, although really an extra argument isn't too bad and I should just do that). BUT the next question is object lifetime rules. I'm strndup()ing each string into the argv so the argument can go away as soon as the function returns, and the current return value is saying what _portion_ of the line can go away. In theory I need to retain these lines for input history (cursor up/down to see previous lines, which I haven't implemented yet), but that's only for interactive mode. Still, if the caller saves the entire _set_ of lines until parsing's done and passes it in as another doubly linked list (easy way to make a stack, additions are already in order and top is just head->prev)... I still need the "where I left off" argument because it's position _within_ the line.

Other issue: sh -c arguments can have a HERE document in them! In which case they stop at embedded newlines: bash -c "$(echo -e "cat << EOF\nhello\nEOF")". So whatever I do already has to cope with that, meaning the definition of "lines" is _not_ what getline() is returning. The tokenizer currently treats \n as just whitespace (except for \ at end of line, which cannot end token parsing).

So: parse_line() returns fragment of current line left to parse, caller currently glues the next line to it and passes it back, which loses data HERE documents need. What does a HERE document need? There isn't an unfinished part of THIS line when we're doing HERE documents because we don't get to that point until we're done with lines, that's only tokenizing. Flow control (and redirects) deal with parsed pipeline info.

What I need is some place to store the HERE document status so all these functions can act on it.


July 2, 2019

Learning is contextual for me, I have to attach new information to what I know or it doesn't make sense, and I'm terrible at retaining things that don't make sense. That's why instead of reading the posix spec over and over, I'm implementing stuff and running tests to see how bash handles varying corner cases. Then when I'm done, I read the spec to see what I got wrong.

At the moment, I'm wrestling with HERE documents. The following is not exactly legal:

$ << EOF if true; then cat; fi
> hello
> EOF
bash: syntax error near unexpected token `then'

But it's revealing because the here document DOES get parsed. In fact:

$ << EOF
> boing
> EOF
$

A here document is parsed! And then the data discarded. See also:

$ <<EOF if true
> EOF
bash: if: command not found

Flow control can't go after redirects. Which means the flow control shouldn't parse words after redirects, they fall through as unknown symbols and start a command. (Except, for some reason, "(", which is "unexpected token", but not reported until after the HERE document concludes. *jazzhands*)


July 1, 2019

New month, I should say hi to patreon.

Interesting. Bash and dash do the same here:

$ cat << one << two
> echo hello
> two
> one
> and
> two
and
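
What both shells appear to be doing is reading the HERE document bodies in the order the << operators appeared, then letting the last redirect win. A Python sketch of that reading order, with read_here_docs as an invented name and the continuation lines passed in as a plain list:

```python
def read_here_docs(delimiters, lines):
    """Consume one body per delimiter, in order, from the same line stream."""
    stream = iter(lines)
    bodies = []
    for delimiter in delimiters:
        body = []
        for line in stream:
            if line == delimiter:
                break  # this document is complete, move to the next one
            body.append(line)
        else:
            raise ValueError("ran out of input looking for " + delimiter)
        bodies.append(body)
    return bodies

bodies = read_here_docs(["one", "two"], ["echo hello", "two", "one", "and", "two"])
print(bodies[-1])  # the last redirect is what cat actually reads
```

The first body swallows "echo hello" and the first "two" (it is only looking for "one"), the second body is just "and", and since the later redirect of stdin replaces the earlier one, "and" is all that cat sees.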

And mksh on my phone says "/system/bin/sh: can't create temporary file /data/local/shxijx9k.tmp: Permission denied", which is too broken to contribute usefully to this analysis.

Android's traded away a lot of usability for "security" in scare quotes. My new phone gives me a full-screen exception with java stack dump popup (which goes away after 3 seconds so it's hard to write down, RealInterceptorChain.proceed() calling StreamAllocation.findHealthyConnection() and so on) when the podcast app that worked fine under M tries to check half the feeds. After a week or so I figured out any rss feed that's still http instead of https (about half of them) is an "unknown service", which is silly if it's an rss feed of mp3 files. I expect they want me to bug the RSS feed providers, but instead I gave up on being able to listen to those feeds from my phone. I still have the old phone with a USB battery duct taped to the back of it which I can load podcasts on via wifi. Or plug my headphones into my laptop.

Yesterday the self serve soda machine at Wendy's was out of service all day "downloading updates" (the little clock said it had been doing so for 12 hours). Why? What's the point? Why give an avenue to hack soda machines (giving sugared drinks to diabetics and aspartame to phenylketonurics)?

I'm not really a fan of "the internet of things"...


June 30, 2019

Darn it. Because of HERE documents, we have to retain flow control state while parsing. I was flushing the state stack and re-parsing it when we'd added another line (to minimize the state passed between functions), but I can't because when you're in a HERE document you grab the next line verbatim without tokenizing (whitespace is significant), and flow control happens after tokenizing.

Gotta frog a bunch of functions and redo them. Again.

(No, I haven't bought a new hard drive yet. I _think_ the failure mode here is "process crashes", not "garbled data winds up pushed to github". That's why it _has_ the checksums. But I shouldn't leave it too long. And I'm to the point where "rsync to usb disk" isn't necessarily an ideal backup option? If it goes "I can't read this file", does it leave the backup copy alone? (I _think_ so, but should confirm?))


June 29, 2019

Biked downtown to Cuvee (a coffee shop on 6th) to meet Grant, the guy behind the upduino. I picked up two of the 2.0 boards from him, and we spoke for over an hour. Cool guy. He says he's got about 500 more of the 2.0 (and 300 of the 1.0) in stock, so we can order as many as we like.

This is the ice40 board that Jeff did the j2 bringup on a few months back, Jon-Tobias in Germany is porting the TRON os to j-core as a project to present at a conference in january, and I got 2 so I can send him one. (Alas, I need to install a horrible lattice binary-only toolchain with license key nonsense to get it to work...)

Grant _also_ says he's got a bitstream that emulates an FPGA (which is kinda meta), and can prove it only uses technologies whose patents have expired, and he'd like to make a chip from it. I poked Jeff and he said designing an FPGA is reasonably easy and well-understood: each cell is a lookup table with a small RAM using the input wires as bits to look up what the output should be, which lets each one work as any combination of and/or/nand/not/etc gates. The xilinx ones use 6 input wires (1<<6 is 64 bits of ram to find the output bit for every possible on/off combination of the inputs), the lattice ones use 4 bit inputs (so 16 possible input values, meaning 16 bits of ram to look up whether the output is on or off for each input).
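
The lookup table part is small enough to model directly. A Python sketch, where make_lut and lut_output are illustrative names and the bit ordering (first input is the most significant index bit) is an arbitrary choice, not anything specific to xilinx or lattice parts:

```python
from itertools import product

def make_lut(func, n_inputs):
    """Fill the LUT's RAM: one output bit per possible input combination."""
    return [func(*bits) for bits in product((0, 1), repeat=n_inputs)]

def lut_output(table, bits):
    """Use the input wires as an address into the table."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return table[index]

# A 4-input LUT programmed to act as a 4-way AND gate: 16 bits of "RAM".
and4 = make_lut(lambda a, b, c, d: a & b & c & d, 4)
print(len(and4), lut_output(and4, (1, 1, 1, 1)), lut_output(and4, (1, 0, 1, 1)))
```

Swapping the function passed to make_lut reprograms the same cell as an OR, a NAND, or any other function of its inputs, and a 6-input version needs a 64-entry table.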

The rest of the FPGA is just busses connecting the LUTs together, and then software doing the placement and routing is where all the actual brains are. But the busses are where the hard part is: the FPGA fabric is a lot of loops, and it's up to the programming in the LUTs (I.E. the software making the bitstream) to prevent these from acting like short circuits. All the ASIC mask generation tools flag these loops and go "possible short circuit! Error!" So you have to disable the ASIC tools' sanity checks to get it to output an FPGA mask, at which point the FAB refuses to guarantee the result works and wants a large insurance policy before they proceed.

Turning verilog or vhdl into actual circuitry means running it through a compiler, and each ASIC fab has its own compiler backend they wrote themselves by hand. Remember the days where every different hardware platform had its own proprietary compiler/linker toolchain with its own bugs and you had to test each port and work around everything? Fabbing is like that, only moreso. So you make a GIANT test suite to check every weird little bit of your circuit will work in simulation after the fab's tools are done with it (they spit out a "verilog netlist" which is kind of like decompiling assembly back to C; it's horribly unreadable but the important thing is the circuit simulation tools can consume it).

So designing an FPGA circuit is easy. Getting a modern FAB to manufacture it is a political nightmare, because their toolchain is guaranteed to barf on the routing fabric and they don't wanna get sued charging you tens of thousands if not millions of dollars for masks that they can't prove will actually work.

Fun stuff. Still don't understand how the LUTs connect together (how the whole routing mesh part is controlled), but that seems to be the fiddly bit.

Biked from the coffee shop to the Other Other Bank (not my credit union, not the bank our checking account's at, but the one our mortgage was sold to a few years ago) to take a couple months expenses out of the home equity loan. Fade's home for the summer and all the things recruiters have waved at me involve packing up and going to another city for months. And I'm making (slow and frustrating but measurable) progress on toysh. Don't wanna stop now...


June 28, 2019

Well that's not good.

My laptop fell off the arm of the couch onto a hard floor a week or so back, and landed on its side, and it was frozen with completely garbled graphics afterwards. But a power cycle _seemed_ to fix it.

A couple days ago, I noticed a /dev/sda disk sector error in dmesg. (And immediately re-backed-up the disk to an external drive.) But it was read only, and always the same number (which kept coming up because Linux kept trying to read it over and over for some reason), so I was thinking maybe the write glitched when the disk fell over?

It's reporting it again today... with a different sector number. Still the same one over and over, but that's "time to buy a new hard drive", which means reinstalling the OS. Great, that's gonna cost more than a day, I expect. (Sigh, should I run memtest86 overnight too? Did something get unseated? Devuan's crotchety and evil enough it's not always easy to figure out what's a hardware bug and what's a repeatable software pattern like "I requeried for available wifi too soon after enabling my phone's hotspot, now devuan will _never_ see it until I disable and re-enable wireless"...)

I mean, chrome crashes now semi-regularly. But every crash has been when I right click->inspect element->delete node. Maybe websites have found a bug to exploit to discourage that? (Not gonna stop, for the same reason I still block every single promoted tweet in my twitter feed, even though twitter now seems to have an endless stream of them. Don't care: still blocking every single one.)


June 26, 2019

Still writing shell stuff, and needing to support:

$ if blah () { echo bang; true; }; blah; then echo hello; fi; blah
bang
hello
bang

Which is just cheating.


June 25, 2019

Did my first checkin of the redone toysh infrastructure, which is parsing input to request more data, and then executing it without any flow control or variable expansion or redirection or... Working on it.

Meanwhile, Elliott is going over the toybox test suite and getting stuff to pass on Android. Which is quite nice. He's hitting todo items I've been meaning to do forever...


June 21, 2019

Why did I say 10kw of solar panels yesterday? Because ~8 hours of sunlight at 10kw is 80kwh, more than even the larger 75kwh battery could hold, which is _plenty_ for an average household. Remember 39kwh was the highest state average, and then we rounded up the battery size another 20%, and this doesn't even count electricity used during the day coming straight from the solar panels without even needing the battery; your battery should hardly ever be anywhere near fully drained, even at 6am. Plus you get more than 8 hours of sunlight anywhere south of canada. You might be fine with a 6kw kit, and even a 4.5kw kit (currently $5k at Home Depot) produces 36 kilowatt hours on a good day, vs the american average of 28 kilowatt hours. (8 extra kwh/day would take about a week to fill a 50kwh battery, but as long as you're net positive you'd get there.)

Sites that say 6kw is enough for half of american homes are talking about selling enough electricity back to the grid to wind up with a net $0 bill. (In which case how does the power company stay in business when half their cost is maintaining the transmission lines?) Utilities generally pay less for electricity than they charge for it, so you need to sell _more_ back to them to break even. (And _everybody_ can't do this if the utilities still need cash from somewhere, and lots of it.) With batteries, "enough" is much lower, and 6kw*8 hours is still 48kwh: one day away from home could just about fill the battery from fully drained.

I'm being conservative here because a bad enough thunderstorm can cut solar output in half while it lasts, and sometimes those go on for days, so extra's nice. Clouds and rain won't entirely stop solar production (there's still light and solar panels aren't just picking up visible frequencies), but a thick enough layer of snow can, and up north they run the panels in reverse for a bit to heat them up and melt the snow. If you don't let it accumulate, it's not hard to deal with. (When it's too cold for that to melt it, it's also too cold to snow.) Still, this is why I'm looking at an extra-large solar array. (People who assume they'll be charging and driving an electric car 100km every day might even want a 15kw system, but transportation as a service makes that highly unlikely to happen outside of rural areas. Then again, sufficiently rural people still use dial-up internet because the future has passed them by. Short of another New Deal style Tennessee Valley Authority program to dig them sanitary outhouses, this will too.)


June 20, 2019

I think electric utility companies might go the way of milkmen and paperboys. The numbers don't look good.

These days batteries are the limiting factor on solar power, and have been since solar panels became the cheapest way to generate electricity in 2016. It's cheaper to build new solar plants than keep running existing fossil fuel plants, but solar only produces power half the time (leading to the "duck curve"). Utilities have been installing gas turbines (which can go from "off" to "full power" in under a minute) to handle the rapid generation ramp-up as the sun goes down just as everybody gets home from work, but utilities are increasingly reluctant to buy more gas turbines because they're already "stranded assets" that won't be in service long enough to pay off their construction costs.

Utilities know that battery prices are declining rapidly, going down 50% over a recent 3 year period. Utilities want to install batteries to store extra power generated during the day, which they've got buckets of. California stopped installing new solar for a bit because at peak output, what they've got already supplies more power than they can use, leading to "curtailment" where they unplug the panels and waste the electricity that has nowhere to go. (For a while california law _required_ it to be used and they wound up paying other states to take it, but they got that fixed and now they just waste it.) They can install plenty more solar panels, it's cheap and easy, but they need batteries to make it work.

So how much battery power _does_ a household need? About as much as a low-end electric car has. The average american household uses 10.4 megawatt hours of electricity each year, but the highest average household electricity consumption is in louisiana at 14.2 megawatt hours/year, so let's use that. This means each louisiana household is using about 39 kilowatt hours per day, on average. The Tesla Model 3 battery packs come in 50 kilowatt hour and 75 kilowatt hour versions. This means even the small Model 3 battery pack can store more electricity than a household uses in 24 hours.
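
As a quick check of the arithmetic, the per-day figures are just the annual figures divided by 365 (nothing here beyond the numbers already quoted):

```latex
\[
\frac{14.2\ \text{MWh/year}}{365\ \text{days/year}} \approx 38.9\ \text{kWh/day},
\qquad
\frac{10.4\ \text{MWh/year}}{365\ \text{days/year}} \approx 28.5\ \text{kWh/day}
\]
```

Both come in under the 50 kWh of the small pack.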

That article estimates the 50kwh battery pack should cost around $7500 to make, but that tesla charged $9000 to upgrade from a 50 kwh battery pack to the 75kwh battery (battery production is still their limiting factor making vehicles so they're trying to discourage the big battery, it means they sell more cars). At $9000 per 25kwh, they're charging $18000 for 50kwh of batteries, which means even at that inflated retail price batteries are less than half the cost of a Tesla model 3.

Here's a site selling replacement model 3 battery modules, which each cost $1350, hold 5.2kwh (10 to assemble the "small" 50kwh battery costing about $14000 before volume discounts), and have a max discharge rate of 30kw (or 5kw "continuous"), which means _each_ of them can run an inefficient (old) electric stove, and the full 50kwh battery could handle ten of them. (For reference the new tesla superchargers work at 250kw, so if anything these numbers are conservative. 6000 watts from a 240 volt dryer outlet requires a 25 amp circuit breaker. Most homes have a total electrical service of 100 to 200 amps and 250 amps would be 60kw... once again the "small" model 3 battery neatly fits a house.)

So a modern electric car battery is sized to power a house... if you wanted to unplug that house from the grid _entirely_ and generate your own power. Why would anyone want to do that?

For one thing, if you sell power back to the grid, when the grid power goes out _your_ power is legally required to go out. (So you don't electrocute the workers trying to fix the downed lines.) If you unplug from grid power, your power doesn't have to go out when the lines go down.

For another, the utility companies need money, and lots of it. If everybody had a net $0 bill, they'd go bankrupt. The cost of maintaining the electricity distribution network is about $750/year per customer, or $62.50/month, which is more than half the average monthly electricity bill of $112. So even if electricity generation became _free_ for the utilities, the bill can't really go down that much. When _your_ bill goes down, they consider you a free rider, and start to scheme ways to get money out of you to pay for the distribution infrastructure you're only using as an insurance policy.

Today a 10kw solar kit (with not just solar panels but the inverter and such to use them) costs $11k, which can fill our 50kwh battery (costing ~$14k) in 5 hours of maximum production. This means a solar system that can entirely replace the grid currently costs less than a second car, and the prices are going down fast.

If battery prices drop another 50% over the next 3 years, and 50% more 3 years after that, that $14000 would become $7000 and then $3500 by around 2025. Meanwhile solar panel prices are getting 50% cheaper every 5 years (Swanson's Law). So in 2025, you'll probably pay less than $4k for a day of batteries and as much again for solar panels, or about $8k for the entire system. And these are _retail_ prices, not wholesale. You can probably get a solar system powerful enough not to _need_ a grid tie installed for under $10k. And if you're worried about running out of power, you can spend $4k on an extra day of batteries. (72 hours of batteries is enough to get your roof replaced without losing power.) And it only gets cheaper from there, just like computers did for 50 years.
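
The halving arithmetic is easy to get wrong by eye, so here's a small sketch of it. The function names are made up for illustration, and the model is only the back-of-envelope "price halves every N years" assumption from the paragraph above, not a forecast.

```c
#include <stdio.h>

/* Price after `years`, assuming it halves once per `halving_years`.
 * Only whole halving periods are counted. */
double projected_price(double price_now, int years, int halving_years)
{
  int i;

  for (i = 0; i < years/halving_years; i++) price_now /= 2;

  return price_now;
}

/* Print the battery price at each 3 year step starting from 2019 */
void print_battery_projection(void)
{
  int years;

  for (years = 0; years <= 6; years += 3)
    printf("%d: $%.0f\n", 2019+years, projected_price(14000, years, 3));
}
```

Feeding the $11k solar kit through the same function with a 5 year halving period gives the panel side of the estimate.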

At which point... what do the utilities do? Not everybody switches, but enough people to seriously screw up their business model, which only makes switching more attractive to the rest...

This is what Tony Seba meant when (starting back in 2014) he talked about "God Parity" coming after "Grid Parity". That $62.50/month infrastructure cost for a grid tie has to come _way_ down for a grid tie to still be a thing in 10 years, and as with gasoline cars going electric you only need about a quarter of the existing customers to defect before the economics of the old way go into a death spiral.


June 19, 2019

I reached the point where I have a toysh parsing pass that compiles! And immediately segfaults. And still has large unfinished parts in the parsing. But hey, _progress_.

There's a lot of sequencing issues. In theory << EOF is the same sort of command as < FILE, but it switches parsing modes so you request additional lines, and those lines are stored verbatim (not parsed) so the spaces within are retained. ("cat << EOF" demonstrates that.) I worried about that for a bit, but the test:

$ cat << EOF ; echo hello; EOF
>

Shows that the plumbing I've been writing already has this sequence right. :) (Also, the EOF it checks for is a full exact line, leading or trailing space will break it, as will \ or quotes.)
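
A minimal sketch of that "full exact line" check, assuming lines arrive the way getline() returns them (with an optional trailing newline). The function name is hypothetical, not what toysh calls it.

```c
#include <string.h>

/* Return 1 if `line` terminates a here document whose delimiter word is
 * `delim`. The match is exact: no leading or trailing whitespace is
 * skipped and no quotes or backslashes are interpreted. (The <<- variant
 * that strips leading tabs is not handled.) */
int is_heredoc_end(const char *line, const char *delim)
{
  size_t len = strlen(delim);

  if (strncmp(line, delim, len)) return 0;

  return !line[len] || (line[len] == '\n' && !line[len+1]);
}
```

Every line that fails this test just gets stored verbatim as part of the here document body.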

Yeah, I'm sure a lot of this is covered in the posix doc, but I read the thing, implement, and then read the thing again. And while I'm implementing and hit a question, testing against bash is better than looking it up in posix not just because it's faster, but because if the two diverge I'm going with bash.


June 18, 2019

I had tickets for Weird Al's "with strings attached" concert at Bass Concert Hall (just north of the giant football stadium) and headed to UT a few hours early with my laptop to get some programming in near the venue. At concert time I went over, stood in line, and was told that my backpack was against their TSA-style "clear bags only" policy (which I'd never heard of until I showed up), and that even though the doors we were lining up in front of said "bag check line" in large black letters, no they didn't check bags. If I'd brought cash I could pay to leave my laptop out in a locker in the texas sun in the parking lot, but I hadn't, and no I couldn't see the concert I'd paid for.

So I walked home (took most of an hour, round trip to go back afterwards would have been over an hour into the concert). On the walk home I called Bass Concert Hall's ticket line to see if I could get my money back (concert hadn't started yet), and could not get a human. Their phone tree didn't recognize my phone number, and wanted an "order number" that did not appear to be any of the numbers printed on my ticket.

I was thinking of disputing the charge through my credit card, but I don't mind Weird Al getting the money. Just Bass Concert Hall. And I don't blame Weird Al for this: last two times I saw him was at an open air venue (which I was wearing the t-shirt from), and the Drafthouse. Both were great. It's Bass Concert Hall that sucks, never go there for anything.

Stopped at HEB on the way home and they'd clearanced a bunch of lamb chops. Fuzzy made those for dinner, and Fade had one when her flight finally got in (late, it was delayed an hour or so by airline trouble du jour). The dog was ecstatic to see her, of course.


June 17, 2019

I've rebuilt the sh parsing stuff 3 or 4 times as I hit corner cases the old way doesn't handle right. It's mostly "ah, the data structures I'm parsing into have to look like _this_", and now I've hit another one because the parentheses stuff is funky.

This is the first pass parsing that tries to figure out when we've got a complete thought and don't need to read another line of input ($PS1 vs $PS2 prompts). I know bash does this before executing anything because I keep getting continuation prompts for:

$ echo hello; if
> true
> then
> echo boing
> fi
hello
boing

Parsing a line into words, I need a continuation if it ends with unbalanced quotes, including infinitely nested $( and ${ and yes "$(" means you need ")" in sequence to end it. I've already got a function for that which seems right-ish, returning a pointer to the next unconsumed character of the line. It returns 0 if it needs a continuation, and returns the pointer that was passed into it when it's done and there's nothing more to do. (This is presumably a pointer to the unquoted, unescaped NUL terminator at the end of a line, but that's not what I'm testing.) I was initially calling that to get an argc+argv[] list of unexpanded words (via xstrndup(start, end-start)) with all the quotes and environment variables and such still in them, and then looping through that list to handle flow control statements, which can _also_ request continuations. (See blockquote above.) However, flow control statements are only valid at the _start_ of a pipeline... wait, no they're not.

echo hello | if true; then read i; echo i=$i; fi

Ok, flow control statements have 2 parts, a test and a body, and the test... can also contain flow control statements.

$ if echo hello | if true; then read i; echo i=$i; fi; then echo hello; fi
i=hello
hello

Which obviously I knew because "if true && false; then echo hi; fi" has a pipeline in true && false. But "if true && then echo hi; fi" is a syntax error, same if you replace the && with |, which is why I was thinking it had to be after ; or newline. And that's still sort of right, it's just in between it can nest arbitrarily deep.

I keep sitting down to explain how my current understanding works and finding the loose thread where "here's the test that shows why that's wrong". Eh, keep this writeup, it shows what I was thinking. (Yes, I read the posix spec and the man page, and retained none of it because I mostly learn by doing.)

Ok, the POINT is, I made a tree structure which is a linked list of pipelines containing a linked list of argc+argv[] structures, thinking I could iterate through the list of pipelines to check for flow control statements at the start.

And where I got derailed is things like "( echo ) | cat" where the echo isn't part of this pipeline, it pops out into its own pipeline that slots back into this one as a single argv (which you can redirect output of or background as a group). I'm not trying to RUN it yet, I'm just trying to PARSE it into a discrete series of argv[] which I could exec() as their own process.

And it is "start of thought", but they nest. End of thought, then new flow control statement. Flow control statements similarly pop out, ala "while true ; do echo hello; done | tee blah" and to handle this I need a data structure that can arbitrarily nest "if (true); then echo hello; fi".

You know, except for the part about ) ending a pipeline, ( is basically a flow control statement. As far as I can tell the only reason ( is funkier than if or while is so it can match ) without ) having to be at the start of a command. Alright, what's the data structure for this. It's a tree with branch nodes and argument leaves. There's a "how did this command end" annotation recording | vs &&, but ; and implicit ; via newline are special, they're flow control _within_ the tree. Hmmm... Ok, | && || glue statements together, so do being in ( ) or { } or any flow control bracketing (if/else/fi, while/do/done, etc). Great.
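
One possible shape for that tree, as a sketch only: every struct, enum, and function name here is invented for illustration, and how ; and newline get recorded is left out because that part isn't settled above.

```c
#include <stdlib.h>

/* How a command was glued to whatever follows it */
enum sh_end { END_NONE, END_PIPE, END_AND, END_OR, END_BG };

/* A node is either a leaf (argv set) holding one unexpanded command, or
 * a branch (argv null) grouping children for ( ), { }, if/fi, and so on. */
struct sh_node {
  struct sh_node *next;   /* next sibling in this group */
  struct sh_node *child;  /* first child, branch nodes only */
  char **argv;            /* unexpanded words, leaf nodes only */
  const char *type;       /* "(", "{", "if"... branch nodes only */
  enum sh_end end;        /* the "how did this command end" annotation */
};

struct sh_node *sh_node_new(const char *type, char **argv, enum sh_end end)
{
  struct sh_node *node = calloc(1, sizeof(*node));

  if (!node) abort();
  node->type = type;
  node->argv = argv;
  node->end = end;

  return node;
}

/* Count leaf commands anywhere under a list of siblings */
int sh_count_leaves(struct sh_node *node)
{
  int count = 0;

  for (; node; node = node->next)
    count += node->argv ? 1 : sh_count_leaves(node->child);

  return count;
}
```

With this layout "( echo ) | cat" is a "(" branch whose end is END_PIPE, with the echo leaf as its child and the cat leaf as its next sibling.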

So getting back to parsing: ( and ) are their own words wherever they occur, and ) starts a new line but ( does not, although it increments a counter so we do continuation until we get the next ), but so does { and that's not special to the parser it's just a flow control statement.

So I think ) always ends a line...

$ boom(){ echo hello; }; boom
hello

Yup. But the ) is not on a NEW line, because:

$ boom (
bash: syntax error near unexpected token `newline'

So it doesn't work like "}", it's special and magic. Great. Sigh, it's a little bit like | that stacks. It's not a pipeline transition character because each statement has _one_ and you can't have two of those in a row. But it's not a flow control statement either because it's not logically at the start of a line, it's sort of an attribute of the previous line? Where does this slot into parsing? I can detect it, what do I do to RECORD it? It's really the function definition case that's nasty... Hmmm.


June 16, 2019

I've been having long email conversations with a magazine publisher who wants me to start podcasting for his network. I'm interested, but don't have the time.

On the other hand, I seldom get much done without externally imposed deadlines. I wrote a regular column for 3 years, I organized 2 conventions, I should _make_ time... which is why I'm talking to him I guess? But there's a certain amount of "launching a new thing" which means it isn't real yet, and toysh -> ELC talk and then I need to find a new $DAYJOB...


June 14, 2019

Elliott hit a weird bug in find, which is frustrating because I can't reproduce his test case. I don't see _how_ the change I made could have user-visible effects, with one exception that _shouldn't_...


June 12, 2019

I was scheduled to meet with Amazon today for an all-day interview, and when I sat down to sign the NDA at 8am before heading out, I got a simultaneous headache and stomach ache, and decided to listen to myself. Contacted them to call it off. (There were like 8 other applicants at the bar on monday, they'll be fine.)

They wouldn't tell me what they're building. There were hints dropped at the monday meet-n-greet (consumer product, in people's homes, runs apps, has an app controlling it, video is involved somehow). But they explicitly said I wouldn't learn what I was actually working on until I showed up for my first day of work, and YET they wanted an NDA for the interview.

Amazon treats its warehouse workers terribly. The idea of working for a _different_ part of a company that places no obvious value on human life doesn't exactly fill me with warm fuzzies. They see people as "customers", and the only thing that excited the amazon guys wasn't technology, it was the possibility of profit. This could sell well and make money.

It was probably this tweet that set off my subconscious. Making the richest man in the world richer isn't what I want to do with my life. His wife got $36 billion in a divorce (what, 1/10 of his fortune?) and she's giving it away. That's what normal people do. Hoarding more cash than you can physically spend (by carrying suitcases full of $100 bills from the bank to a purchase place) while people performatively beg for their life on gofundme is sick.


June 11, 2019

Fade flew off to Minneapolis for a week, attending 4th street. I'm grinding away at shell script parsing.

Some things bash does I don't currently feel obligated to reproduce:

$ time echo hello | sleep 3
real	0m3.003s
...
$ echo hello | time sleep 3
bash: time: command not found

And then there's:

$ echo abc)dev
bash: syntax error near unexpected token `)'

Sigh. It's because switch/case, I know. Still kinda silly.


June 10, 2019

Off to meet with amazon at a happy hour. (They're sort of offering me a job-ish. Well, an interview. I'm torn about whether I _want_ to work for them. I'm willing to work with Google and my household is paying the amazon prime danegeld and buying stuff from them. Is taking money from them a higher bar than giving money to them? Hmmm...)

Back from the happy hour. Talked 3 waves with one of the amazon guys, he explained about the company having multiple "pillars". Google had multiple pillars before it did Alphabet to turn them into properly separated business units. Sounds like it's going to get ugly.

Headed home at 9 to spend the evening with Fade, she flies out in the morning for 4th street (annual writing conference) back in Minneapolis.


June 9, 2019

Ok, if you go:

{ ls & sleep 5; echo hello; } | tee blah &

Then you background two jobs, but everything in the curly brackets gets redirected through tee. (The ; at the end of hello is so } isn't an argument to echo, it's command logic not quoting logic.)

I still need to work out what this means:

[[ X == && ]]
bash: unexpected argument `&&' to conditional binary operator
bash: syntax error near `&'

This is such an enormous can of worms. Oh well, dinking away at it...

Went through the bash "help" with no arguments output and got a list of builtins to implement in the first pass. Rereading the posix shell spec "chapter 2" page for comparison...

Also:

$ X=echo
$ $X hello
hello

Gotta expand the first argument before checking for commands, EXCEPT:

$ X='['
$ $X -gt 3 ]
bash: [: -gt: unary operator expected
$ X='{'
$ $X ls ; }
bash: syntax error near unexpected token `}'

Environment variables are expanded _after_ the flow control statements, and "{" is parsed like "if".
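
In code terms, that ordering means the keyword check runs on the raw word before any expansion happens. A sketch, with a hypothetical function name and a deliberately partial keyword list:

```c
#include <string.h>

/* Return 1 if the raw, not yet expanded word is a flow control keyword.
 * The list is an illustrative subset, not bash's full reserved word set. */
int is_flow_keyword(const char *raw)
{
  static const char *words[] = {"if", "then", "else", "elif", "fi", "while",
    "until", "do", "done", "for", "case", "esac", "{", "}"};
  size_t i;

  for (i = 0; i < sizeof(words)/sizeof(*words); i++)
    if (!strcmp(raw, words[i])) return 1;

  return 0;
}
```

Since the word tested is the literal "$X" rather than its value, a variable containing "{" never opens a block; it only turns into a (nonexistent) command name later.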

The sh.tests file is going to need its own subdirectory of shell snippets, isn't it?

$ BLAH="ls
> echo hello"
$ sh -c "$BLAH"
notes.txt
...
hello

The -c context parses like shell script; newline _there_ is a statement break. Ok, I can probably work with that.


June 8, 2019

Bash is terribly inconsistent:

$ (( 1 +
> 2 ))
$ [ thing
bash: [: missing `]'
$ [[ thing
bash: unexpected token `newline', conditional binary operator expected
bash: syntax error near `thing'

June 7, 2019

The promised digression.

I want toybox to provide the commands used by the base OS build (compiler, libc, kernel, and toybox itself) so a minimal build system can rebuild itself under itself from source, and then bootstrap up to arbitrary complexity natively under the result (solving the "circular dependency" problem with OS builds _and_ providing a good base for Android to build itself under itself someday).

This starts with the Posix and LSB commands that still matter today (ignoring crap like sccs or remove_initd), adding commands scripts/record-commands finds in package builds that would otherwise be a circular dependency, and commands used by simple init scripts like mkroot's to boot a minimal build environment to a usable state. If you can't rebuild the system under itself without a command (including being able to boot and run the init script and set up the network and wget more source to build), then toybox needs to provide that command.

I also want to provide conveniences like top and lsusb you really miss when trying to debug stuff on a build system (or initramfs), but those are negotiable. They could be external packages as long as they build natively under the base system.

"A package build needs it" doesn't mean it needs to be in toybox if the package providing it can be built under the simple toybox+compiler+libc+kernel system. Toybox _can_ provide it, but by definition it's a convenience then.

This is why "we need three types of decompressor (zcat, bzcat, and xzcat), but only one type of compressor (gzip)" makes sense to me: you can build bzip2 and xz packages natively under your minimal system if you need that output format, but a source tarball you wget may _come_ compressed in a lot of formats, and it's nice to have some compression available when producing output (the deflate algorithm is the 80/20 rule of compressors).

My 1.0 goal is building Linux From Scratch under toybox because that's easy for me to test (especially cross compiling to a new target where host binaries won't run and natively building LFS there) and is demonstrably "enough" to then build all the Beyond Linux From Scratch packages (I.E. bootstrap up to arbitrary complexity in old world Linux).

I want to do the same with AOSP next, but that's a huge can of worms which involves writing enough of a read-only git client to drive repo and clone source repositories, since aosp doesn't provide wgettable package tarballs. Plus tackling the whole "how to bootstrap ninja" issue. (Posix has make, and I can teach it to handle kbuild. Ninja is such a moving target AOSP wouldn't build under debian's ninja last I checked, it needed its own...)

(Shipping prebuilt binaries is convenient, but unless you have the ability to rebuild EVERYTHING then A) you can't counter trusting trust, B) bootstrapping a new target architecture is black art, C) there's magic in your build that will collect bugs and bit rot over the years, the way gentoo forgot how to do a "stage 1" build and we had to work it out again.)

Building AOSP is going to involve building a lot of packages under toybox, which aren't in toybox. If the getprop/setprop package has to be early in the build because later AOSP packages depend on it, that's fine. As long as the kernel, compiler, libc, and toybox builds don't, and then it _can_ build without further circular dependencies.

I should start working "what's left" into release notes, but I _just_ cracked open the toysh can of worms again, haven't quite merged mkroot into toybox as a "make root" target, and I'm also trying to stick modern toybox into my old aboriginal linux build to regression test it against the ancient Linux From Scratch build there. (Since I had that working and automated, it's still a good regression test even if the package versions are old. I can migrate the build over to mkroot and update to current in a controlled manner once I've reestablished a working baseline...)


June 5, 2019

So then there's this nonsense:

$ while true
> do echo hello; break; done
hello
$ while true; do echo hello; break; done

In bash if I enter a continued line and then cursor up to see the command in command history, it's been stitched together with a semicolon in that case. But if I echo "hello[enter]world" that still has the newline. Why?

Out of curiosity I checked busybox ash's command history (at least the version installed in debian) and it's not doing that, it gives the last line entered and doesn't collate them. (Not that I'm working on fancy command editing and history right now, I'm trying to get a shell that can run the mkroot init script at the moment.)


June 4, 2019

Shell argument parsing: "echo > blah" is obvious, but "echo ab>blah" and "echo ab> blah" also work. The spaces around > are optional which says that the "ab", ">", and "blah" need to parse as 3 arguments and then the second pass consumes those arguments rather than adding them to the new process context.

And of course >> and << and && and || are all a thing. So is ;; in case statements. And there's 2>&1 and the <<< single line here document. So it's _runs_ of these characters.
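
Here's a rough sketch of splitting those operator runs out as their own words. It assumes longest match against a fixed operator list (my assumption about how to bound a "run"), ignores quoting entirely, and treats the 2 in 2>&1 as an ordinary word. Both function names are made up.

```c
#include <string.h>

/* Length of the operator at the start of s, or 0 if there isn't one.
 * Longer operators are listed first so "<<<" wins over "<<" and "<". */
size_t operator_len(const char *s)
{
  static const char *ops[] = {"<<<", "<<", ">>", "&&", "||", ";;", ">&", "<&",
    "<", ">", "&", "|", ";"};
  size_t i, len;

  for (i = 0; i < sizeof(ops)/sizeof(*ops); i++) {
    len = strlen(ops[i]);
    if (!strncmp(s, ops[i], len)) return len;
  }

  return 0;
}

/* Skip leading blanks, point *start at the next token, and return its
 * length (0 at end of line). */
size_t next_token(const char *s, const char **start)
{
  size_t len;

  while (*s == ' ' || *s == '\t') s++;
  *start = s;
  if ((len = operator_len(s))) return len;
  while (s[len] && s[len] != ' ' && s[len] != '\t' && !operator_len(s+len))
    len++;

  return len;
}
```

Calling next_token() repeatedly on "echo ab>blah" (advancing by the returned length each time) yields the words echo, ab, >, and blah, which is the three-arguments-after-echo split described above.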

Then there's this nonsense:

$ echo $((3<7))
1
$ [ 3<7 ] && echo true
$ [[ 3<7 ]] && echo true
bash: unexpected token 284 in conditional command
bash: syntax error near `3<'

The middle [ test ] is either a true inequality or a nonzero string (which also resolves to true) so I don't understand why it's returning false, and [[ test ]] is a different parsing context I don't understand at all yet.

And how do I represent "$("ls")". Even with recursion it's unhappy because the $( needs to end with ) but the " needs to end with " so I need a quote stack, and if I need a quote stack I don't need to do recursion. (In the parsing, anyway.)
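
A sketch of the quote stack idea: each stack entry is the closing character currently being waited for, so no recursion is needed. This is a simplification under stated assumptions (fixed maximum depth, plain ( also pushes a ) so nested subshells balance, case statement patterns and here documents are ignored), and the name is hypothetical.

```c
#include <stddef.h>

#define MAX_DEPTH 128

/* Return 1 if `line` ends inside a quote, $( ) or ${ } and so needs
 * another line of input, 0 if everything is closed, -1 if nesting goes
 * deeper than MAX_DEPTH. */
int needs_continuation(const char *line)
{
  char stack[MAX_DEPTH];
  int depth = 0;
  const char *s;

  for (s = line; *s; s++) {
    char top = depth ? stack[depth-1] : 0, push = 0;

    /* Inside single quotes nothing is special except the closing quote */
    if (top == '\'') {
      if (*s == '\'') depth--;
      continue;
    }

    if (*s == '\\') {
      /* Backslash escapes the next character; a trailing backslash
       * (or backslash newline) asks for a continuation */
      if (!*++s || (*s == '\n' && !s[1])) return 1;
    } else if (*s == '$' && (s[1] == '(' || s[1] == '{')) {
      push = (*++s == '(') ? ')' : '}';
    } else if (*s == '"') {
      if (top == '"') depth--;
      else push = '"';
    } else if (top != '"' && *s == '\'') push = '\'';
    else if (top != '"' && *s == '(') push = ')';
    else if (top && top != '"' && *s == top) depth--;

    if (push) {
      if (depth == MAX_DEPTH) return -1;
      stack[depth++] = push;
    }
  }

  return depth != 0;
}
```

For "$("ls")" the stack goes ", then ), then " again, and unwinds in the opposite order, which is exactly the case that's awkward to express with a single "am I in quotes" flag.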

Note: PS1= and unset PS1 are not the same thing. I should add that to the test suite...


June 2, 2019

Shell design!

I need to parse a command line into an intermediate format, and save that intermediate format for loops and functions and so on. The intermediate format does _not_ have environment variables resolved, but needs to parse them to know where ls dir/$(echo hello; echo walrus)/{a,b}* ends.

This is the big design blocker I stopped on last time, because there's inherent duplication and I want one codepath to handle both, a bit like find has. But it's fiddly to work out. In this first pass, things like "$@" and *.{a,b,c} don't expand into multiple arguments, but in the second pass they do.

Hmmm, earlier I was doing a lot of linked lists (which I thought were more nommu friendly), but now I'm doing more realloc() arrays because Linux on nommu has per-process heaps so small memory allocations aren't as terribly fragmenty as they would be on bare metal. I think struct pipeline makes more sense as an array, except it would need a "number of entries" count (unless I null terminate it?)
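
For the array version, one way to sidestep the "count or null terminator?" question is to keep both. A sketch with hypothetical names, growing the array in chunks so small appends don't realloc() every time:

```c
#include <stdlib.h>

/* Growable argument list: `c` is the entry count and v[c] is always a null
 * terminator, so v can be handed straight to something like execvp(). */
struct arg_list {
  char **v;
  int c;
};

/* Append one string (not copied) to the list, growing 32 slots at a time */
void arg_add(struct arg_list *arg, char *data)
{
  if (!(arg->c&31)) {
    char **v = realloc(arg->v, sizeof(char *)*(arg->c+33));

    if (!v) abort();
    arg->v = v;
  }
  arg->v[arg->c++] = data;
  arg->v[arg->c] = 0;
}
```

Start from a zeroed struct (struct arg_list list = {0, 0};) and free(list.v) when done; the strings themselves belong to whoever allocated them.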

The other design hiccup was that I wanted to avoid gratuitous copying of strings (use the environment string passed to -c directly, mmap the file and use it directly, if all else fails the memory returned from getline() can be used directly...) except then I'm mixing allocation contexts where "echo a $BLAH c" had some arguments you don't free() and some you do when you're done, and tracking them was annoying.

Get it working first, _then_ optimize... (I've been researching this on and off so long there's been years of optimization ideas mixed into the design, which is premature optimization.)

Parsing a pipeline into intermediate format needs to understand continuations, which means if/while, {, and (, plus $(( and strangely ${ although I'm not sure what you could legitimately put in there that would resolve? Fiddle fiddle... Ah, ${BLAH: 1 : 3 } works. But you can't have a space (or newline) before that first colon. This is such an ad-hoc undocumented mess. Right. At a guess number parsing eats leading and trailing whitespace. (And while blah; do thingy needs the newline/semicolon there because do would be an argument to while otherwise.)

If I go BLAH=12345 and echo $BLAHa it doesn't show 12345a: which environment variables are set has no influence on parsing, it does the lookup after determining the bounds of the variable. (Hence the ${blah} syntax.) It takes letters, digits, and the _ character, but not the rest of the punctuation. There's probably a posix stanza on this buried in the mess, and might also be something in the bash man page. I gotta re-read both of those, it's been ages...
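
That rule is small enough to sketch directly. The function name is made up, and positional or special parameters like $1 and $? are not handled here.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the variable name starting just after a '$'. The bounds depend
 * only on the characters, never on which variables are currently set. */
size_t varname_len(const char *s)
{
  size_t len = 0;

  while (isalnum((unsigned char)s[len]) || s[len] == '_') len++;

  return len;
}
```

So for $BLAHa the name is the full five characters "BLAHa", and the lookup of that (unset) name happens afterwards.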

And of course the problem with line continuations is the design right now is a getline() loop that calls a function on each line, so when I have a "I need more data" continuation point, it has to _return_ that and then the outer loop needs to prompt differently or something. Of course I have global variables for the prompt state. Hmmm...
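
One way out of the "has to _return_ that" problem is to push the getline() loop down a level, so a helper keeps reading until the text is complete and picks the prompt itself. This is only a sketch: read_thought() and odd_quotes() are hypothetical names, and odd_quotes() is a stand-in for a real completeness check.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Stand-in completeness check: an odd number of double quotes so far */
int odd_quotes(const char *text)
{
  int count = 0;

  for (; *text; text++) if (*text == '"') count++;

  return count&1;
}

/* Read lines from `in` until incomplete() says the accumulated text is a
 * complete thought. Writes `ps1` before the first line and `ps2` before
 * each continuation line to `out` (if not null). Returns malloc()ed text,
 * or 0 at end of input with nothing read. If input ends mid-thought the
 * partial text is returned and the caller has to report the error. */
char *read_thought(FILE *in, FILE *out, const char *ps1, const char *ps2,
  int (*incomplete)(const char *text))
{
  char *text = 0, *line = 0, *grown;
  size_t text_len = 0, line_size = 0;
  ssize_t len;

  for (;;) {
    if (out) fputs(text ? ps2 : ps1, out);
    if ((len = getline(&line, &line_size, in)) < 0) break;
    if (!(grown = realloc(text, text_len+len+1))) abort();
    text = grown;
    memcpy(text+text_len, line, len+1);
    text_len += len;
    if (!incomplete(text)) break;
  }
  free(line);

  return text;
}
```

The prompt state becomes a local decision (has anything been read yet?) instead of a global, at the cost of rescanning the accumulated text on every continuation line.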

$ bash -c 'ls $('
bash: -c: line 0: unexpected EOF while looking for matching `)'
bash: -c: line 1: syntax error: unexpected end of file

Not quite the error message I expected, but yeah, that's the logicalish thing to do there. (Quotes, another thing needing line continuation. And of course here documents. I'm gonna have to whack-a-mole this adding one more thing at a time...)

But the fiddly part is still that parsing has to understand if; then; fi _and_ command execution has to understand it too, and I really don't want that in two separated places that can get out of sync. Hmmm...

Why is ; on its own line an error? I can hit enter. It only ends a nonzero command? Two trailing ; are an error? (They aren't in C...) Why is bash complaining about this? Huh, the Defective Annoying SHell is doing it too, is this some posix corner case?

"ls one;cat" even mid-argument the semicolon ends the line when not escaped, got it. Same for "(" for some reason (when would that be valid mid-word?) And of course:

$ cat blah<(echo hello)thing
cat: blah/dev/fd/63thing: No such file or directory

Bash's parsing is a bit self-defeating at times. Why would you recognize that THERE? What's the POINT? When could it serve a purpose?

Another fun thing is I run a lot of tests from the command line, like:

$ echo one\
> two
onetwo
$ echo one\;two
one;two

Which are non-obvious how to turn into tests/sh.test entries. I just did:

$ sh -c "$(echo -n 'echo one\\\ntwo')"
one
two
$ bash -c "$(echo -n 'echo one\\\ntwo')"
one\ntwo

And the Defective Annoying SHell (which is what devuan points /bin/sh to) behaved differently from bash, but bash is NOT behaving like it just did from the command line! (readlink /proc/$$/exe says /bin/bash because my account's shell in /etc/passwd is /bin/bash; this is the compromise ubuntu made to avoid admitting they made a mistake redirecting /bin/sh to point to dash in 2006, change everybody's login shell so you're never actually using dash at the command line, and every #! in the world changes to explicitly say "bash" and we all avoid dash THAT way. Idiots.)

Let's try again...

$ bash -c "$(echo 'echo -e one\\\ntwo')"
one
two
$ dash -c "$(echo 'echo -e one\\\ntwo')"
-e one
two

Of course. And notice the newline IN THE SECOND ONE. Grrr...

It is really hard to implement shell unit tests _in_ shell scripts. Anyway, can of worms for another day, add comments with the command lines I ran and work out how to turn that into regression tests later...
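
One possible way around the echo mess, as a rough sketch: build the script text with printf '%b' (whose escape handling doesn't depend on which shell's echo you got) and hand it to the shell under test with -c. The check() helper and the SH variable are hypothetical, this is not what tests/sh.test does today.

```shell
#!/bin/sh
# Shell under test; override with e.g. SH=bash or SH=dash.
SH=${SH:-sh}

# check NAME SCRIPT EXPECTED: hypothetical helper. SCRIPT and EXPECTED
# are expanded by printf '%b', so \n and \\ in the test definition turn
# into a real newline and a real backslash before the shell sees them.
check() {
  script=$(printf '%b' "$2")
  expect=$(printf '%b' "$3")
  actual=$("$SH" -c "$script" 2>&1)
  if [ "$actual" = "$expect" ]; then
    echo "PASS: $1"
  else
    echo "FAIL: $1"
    return 1
  fi
}

# "echo one\" newline "two": backslash-newline is a line continuation.
check "continuation" 'echo one\\\ntwo' 'onetwo'
# "echo one\;two": an escaped semicolon does not end the command.
check "escaped semicolon" 'echo one\\;two' 'one;two'
```

Neither test passes a backslash to echo itself, so the bash vs dash echo difference doesn't come into it.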


June 1, 2019

Discovered that if you pull down TWICE from the top of the phone screen (I.E. try to open the pulldown menu when it's already open), the pulldown menu expands and THEN you have a brightness slider and the little gear that lets you edit what's in the original pulldown.

Only took me a week of owning the phone to figure out how to do really basic things with it that were obvious last version.


May 31, 2019

Posted about the ls -l / android issue to the list, waiting to hear back about what they're trying to accomplish here.

Upgrading tar to extract the old tarball (to unblock the aboriginal linux linux from scratch build control image experiment). It's not very well documented and I have one example. Maybe I should dig up more old tarballs to test against, but this has "rathole" written all over it...

May 30, 2019

Installed adb to push stuff to my phone, and "adb shell" can ls /, but it complains about trying to stat a lot of directories.

The current dirtree() infrastructure is opening directories without O_PATH, and won't recurse if stat() fails (because it can't tell it's a directory). Hmmm.

I did a fresh aboriginal linux checkout and I'm trying to plug current toybox into it, building the old versions of all the other packages (including the kernel). If I can get it building, I want to plug in the old Linux From Scratch 6.8 build control image and try to build that.

This is a regression test I haven't done in forever, which I stopped doing because I couldn't update the old toolchain (for license reasons) and then updating the kernel got progressively more painful with the old toolchain, and I rebased to mkroot but still haven't reproduced all the infrastructure the old build had to do an automated LFS build, so... let's try to plug current toybox into the old thing and see what happens.

One problem is I haven't supported uclibc in forever, might need to swap in musl too but let's see what uclibc does first...

Heh, there's an old toybox patch to work around a uclibc bug I never merged (because that nonsense doesn't belong in the toybox repo, but I need it here). The old patch doesn't remotely apply to current grep, had to rewrite it.

Toybox tar can't extract the genext2fs tarball, because it's _so_ old it doesn't say "ustar" in the header. Huh.


May 29, 2019

Finally got the 0.8.1 release out. I forgot to update the date of the release notes from when I started writing them to when I actually got the release out (mkroot wouldn't build because AOSP wants tar to call "bzip2 -d" and toybox hasn't _got_ bzip2 compression side just bzcat, so I had to make it try both and fall back).

I want to fix "ls -l /" on Android Pie, which is basically the same bug as 527045debecb. I installed terminal, ran ls -l as my first command, and got "permission denied" on "." with no other output.

Elliott fixed xabspath() but I'd like to fix dirtree(). The current dirtree() infrastructure is opening directories without O_PATH, and won't recurse if stat() fails (because it can't tell it's a directory). Hmmm, needs some design work.

The real question is what's Android trying to _prevent_ here?


May 28, 2019

Fade's back!

No blogging. Fade's back.


May 27, 2019

New phone doesn't have wifi hotspot in the pulldown list of stuff.

The pulldown has a wifi icon, and if I hold it down it goes to a wifi menu, but it's not the HOTSPOT menu, it's too far down the selection tree. The rest are bluetooth (don't care), do not disturb, flashlight, display (android M had the brightness slider in the pulldown, this is less good), and battery. If there's any way to change the list of what's in the pulldown, I haven't found it yet.

So what I have to do to enable the wifi hotspot is exit to the desktop, pull up, gear icon, network and internet, hotspot & tethering, wifi hotspot, on. Lateral progress. Android M had a brightness slider in the pulldown menu, this makes me go into a menu (so turning the brightness back _up_ if I left it down and wind up in sunlight where I can't see the display is basically impossible with this phone now, instead of blind fumbling that takes three or four tries like with the old one).


May 26, 2019

Coming up to the end of the 3 months I said I'd take off.

My ELC talk that got approved was "toybox vs busybox" and I don't want to say "busybox has a usable shell, toybox doesn't" as one of the big differences _still_, so trying to get the shell at least basically usable before I have to go disappear into the land of $DAYJOB for who knows how long because none of the android ecosystem wants to pay me to work on this stuff.


May 25, 2019

New phone. Went in to try to enable it yesterday and worked out I'd have to buy a new sim card for $30, went home to try to fish the old sim card out of my old phone and got talked out of it by Jeff D. who explained that the encryption algorithms in the sim card get quietly updated regularly so switching to a new sim card periodically is a good thing.

So today I went back and gave T-mobile its gratuitous $30 profit. (If I'd bought the phone through them they'd throw in a sim card for free, but I wanted one guaranteed to be unlocked.)

I wound up using the default AOSP image it came with rather than trying to image it because I need a working phone. As with the last 3 phones...


May 21, 2019

Taking some notes for the "toybox vs busybox" talk I volunteered to do in August. I was maintainer of busybox for a while, and had written about 1/3 of its code when I handed it off, and I can at least explain what I was trying to accomplish.

I also created and maintain toybox, I can certainly explain what I'm trying to do here and why I couldn't do it in a busybox context. And there was also a period between leaving busybox and refocusing toybox on android where I maintained it outside of busybox for my own reasons, largely "better design"...

So I don't have a shortage of material. But ELC shortened its talks from an hour to 45 minutes a few years ago, and I should probably leave 15 minutes for questions...


May 20, 2019

Oh good grief, no it is not called that and only ever WAS by Richard Fontana, who is weird about it to this day. Stop deadnaming my license!

Richard Fontana made a mistake, refused to admit the mistake, tried to get SPDX to replicate his mistake so he didn't look so weird standing out like that, defended his mistake for _years_ after losing that battle with a shifting array of reasons (his conclusion never changed but his justification for it constantly did), and when he finally got outvoted at OSI has done his best to memorialize his mistake everywhere he has control over (such as the OSI page on the license).

Long before I got that confusion cleared up, people were using and recommending 0BSD out in the wild and they NEVER used Fontana's name for the thing. This has his fingerprints all over it.

I don't modify wikipedia[citation needed] in part because I'd never have time to do anything else, and these days they block the entire IPv6 address range (so I can't use phone tethering) and every McDonalds and similar open wifi, so I can't even leave a nasty note on the talk page. But seriously, dude. This is misinformation. I started calling it zero clause BSD long before Fontana ever heard of it. I got permission from Marshall McKusick to call it a BSD license in 2013, _years_ before Fontana ever heard of it.


May 19, 2019

Did Norman Borlaug's work make China's one child policy look stupid, or was it always stupid?

Norman Borlaug is possibly the single most important person in the 20th century. He's why india and china can feed themselves. The man literally quadrupled global food production with "semidwarf" varieties of rice and wheat. He started his breeding programs to improve disease resistance, but when nitrogen fertilizer became cheap and ubiquitous (the Haber-Bosch process predates World War I, but took a while to scale up and branch out), plants were limited by how much fertilizer you could give them before stalks fell over under the weight of the grain they were growing. Borlaug's solution was to make plants shorter, both so they put less energy into growing stalk and because shorter plants were sturdier and can hold more grain before collapsing, so you could nitrogen the HELL out of them.

But the real gains came when he applied the same trick to rice. These days most of the world's population lives in the circle of rice (sung, obviously, to the lion king), and they're all growing dwarf rice. This provides enough food for ~4 billion people in and around India and China.

Meanwhile, China had a revolution kicking out its royalty a century back, and just like the French revolution they killed all their scientists and wound up with a tin pot dictator in charge who may not have been a net positive _despite_ how bad the royalty they replaced was. Napoleon got millions of his countrymen killed by declaring war on the entire world (twice), but Chairman Mao mostly just starved his subjects to death. He had Aristotle's problem of never looking at the world around him and instead making stuff up divorced from reality, then enforcing that vision upon the world. During the "great leap forward" he said the country wasn't producing enough iron and demanded they "make iron" without procuring more iron _ore_. Rather than explain to him where iron comes from, his subjects melted down all their farming tools into neatly stacked iron ingots for inspection by party officials, and wound up starving. (Humoring the Great Leader is one of the classic blunders, "Potemkin village" is from 1700's Russia but Mao got plenty of such tours.) Other parts of the great leap forward included exiling all the schoolteachers and college students to rural farms (where they starved, farming is a skill they didn't have). Mao ordered everyone to kill birds because he thought they ate crops, and then when the insect population exploded without predators his solution was to spray bulk insecticides that drove pollinators extinct so large swaths of china pollinate by hand. (More modern china has tried to bury the history of all these failures, just like they've buried Tiananmen Square. They insist that their "lack of bees" is due to the shenanigans Bayer's been pulling recently, not due to Mao's edicts three generations back. *shrug* US public schools don't exactly open history class with smallpox blankets and the trail of tears, or the way we staged a coup to get Hawaii. The War of 1812 was approximately as stupid; we lost to _canada_ and they set the white house on fire.)

The one child policy was another of Mao's ideas, which led not only to their "bare branches" problem (millions of surplus unwed men because of sex selection in the one and only child parents were allowed to have), it also means China's baby boom problem is somehow even worse than the USA's. China has two generations of only children, I.E. a generation of only children whose parents were themselves a generation of only children, meaning each child has 4 grandparents with no other descendants. In a society where "retirement" meant having enough kids to take care of the parents in old age, this is an _issue_.

So Norman Borlaug's work increasing the food supply means Mao's one child policy was outright pointless. Add in the fact that educating women means they have more options than just being barefoot and pregnant their entire lives, and birth rates among the young need government support (maternity leave, free daycare, etc) to get _up_ to replacement rates. (Around the world: Europe, the USA, China, you name it. This is apparently a side effect of late stage capitalism viewing productivity exclusively as various forms of manufacturing while completely ignoring and refusing to fund caretaking work, but China's gone all in on capitalism lately so has acquired this problem too...)

And only the young have the _option_ to replace themselves, the overhang of old people that can't have kids anymore can't. I have _no_ idea what China plans to do about any of this, but am glad to be very far away.


May 18, 2019

Debugging the sparse tar compression side, which means I have run "diff -u <(tar cv --owner root --group root --mtime @1234567890 --sparse fweep | hd) <(./tar cv --owner root --group root --mtime @1234567890 --sparse fweep | hd) | less" with malice of forethought. (Well, actually I ran "TAR='tar cv --owner root --group root --mtime @1234567890 --sparse fweep | hd' diff -u <($TAR) <(./$TAR) | less" because I'm me.)


May 17, 2019

Finally went to the store to order a new phone and they're out. Ordered from the website instead. They estimate delivery on the 23rd.


May 16, 2019

Got the grep bug sorted out, it was a missing else and an inverted test that was hidden by the missing else. (So it _seemed_ to work but what it was actually doing was ignoring the -x.)

And now of course people are trying to use it, there's another grep bug after that...


May 14, 2019

Here's an email I wrote but didn't send in response to this, because it went to a dark place (which is nevertheless true):

> Technically yes, because the first initrd could find the second by some
> predefined means, extract it to a temporary directory and do a
> pivot_root() and then the second would do some stuff, find the real
> root and do a pivot_root() again.

You can't pivot_root off of initramfs, you have to switch_root. (You _used_ to be able to, which moved initramfs from / and allowed you to unmount it, at which point the kernel locked hard endlessly traversing the mount list. I know because I hit that bug in 2005 and they fixed it.)

No, I'm saying that if /init is in the static initrd and you _also_ specify an external initrd the kernel _also_ extracts the external one into initramfs, _after_ having extracted the built-in one. (Both archives are extracted, one after the other, into the same ramfs/tmpfs mount.)

If the semantics are O_EXCL and it skips files it can't extract properly, then the external one couldn't replace files in the static one. You just have to make sure that it extracts both before trying to exec /init (which it looks like it currently does but I haven't tested it). And such an init could do anything and end with "mv newinit /init; exec /init".

(And while we're there it's _embarrassing_ that you have to enable CONFIG_BLK_DEV_RAM to get the external image unpacked, which means you have to enable CONFIG_BLK_DEV which depends on CONFIG_BLOCK meaning you have to enable the block layer when running entirely from initramfs? That's one of the things I pointed out years ago but nobody ever did anything about it, and I tend not to send many patches here these days because dealing with linux-kernel is the opposite of fun. You literally have a bureaucratic process with a 26 step checklist for submitting patches now, which you're supposed to read _after_ the 884 line submitting-patches document which I guess comes after the 8 numbered process documents. And then get dinged by https://lkml.org/lkml/2017/9/14/469 and it's just... no fun. You've gone _way_ out of your way to drive the hobbyists out. Congratulations, you succeeded, it's all middle aged cubicle dwellers arguing about how to help John Deere prevent farmers from modifying their tractors. The development-process.rst file is aimed at developers "and their managers" because the linux-kernel committee can no longer comprehend developers without managers. Nobody's doing it for fun anymore because it stopped being fun a long time ago.)

And now, I mute the _rest_ of the thread. Do what you like, I think teaching the kernel to do magic in-band signaling here is a terrible technical idea _and_ unnecessary but it's obviously not my call. I'm aware I'm about 7 years too late for that sort of concern to matter to the bureaucracy linux-kernel has become (since at least https://lwn.net/Articles/563578/), and I'm only replying because I was cc'd.

Sigh. I should do the patch to make external initramfs loading work if you've disabled the block layer. And resend the patch making DEVTMPFS_MOUNT apply to initramfs. And there's like 5 others on the todo list...


May 13, 2019

I've misplaced my phone. The downside of the "no sim card" state is if you lose track of your phone, you can't call it to have it ring. Black phone on black background.

Hey, one of my talk proposals was accepted to ELC in August. It's the "toybox vs busybox" one, which personally I think is the _least_ interesting of the topics, but eh, that's what they want to hear...


May 11, 2019

My phone is dying. It keeps saying "no sim card" randomly (requiring a reboot to see it again), and randomly switches itself off as if the battery's died, but when I charge it the battery says it's at 80% or some such.

It's been like that ever since I got caught in a thunderstorm on wednesday with the phone in my pocket. It dried out and started working again after a few hours, but not reliably...

Looking at Pixel 3a xl. Should I buy from t-mobile or from google? I'd like to do the whole AOSP install route if I can...


May 9, 2019

BSD development predated the web, or even widespread internet availability by about 5 years. (It was, in fact, responsible for much of it.) This means it had the problem of privileged communication channels dividing its community into "first class" and "second class" citizens.

BSD started off with a single development office with its devs physically located together in Berkeley, the same problem which prevented mozilla and openoffice from becoming real open source projects for many years after they _tried_ to open up. When almost all your devs are right down the hall from each other, any devs _not_ participating in those privileged channels of communication (face to face due to physical proximity) are sidelined so your development "in group" erodes the "out group" into irrelevance.

Remember that the original 1997 Usenix paper "The Cathedral and the Bazaar" wasn't about proprietary software, it was about 2 different types of open source development. The paper was written by the EMACS Lisp extension library maintainer, and was a comparison of the Free Software Foundation's members-only "cathedral" (with physical copyright assignments on paper) with the Linux "bazaar" taking patches from anybody and everybody on an open mailing list.

This is why toybox's "privileged" communication channel is a mailing list anybody with email can join, and even _then_ I deal (grudgingly) with github pull requests and bug reports and such (even though I MEANT to use that as just a distribution channel), because the younger generation of devs prefers that to email and I don't want to exclude them. (Go where the people are.)

Sigh. I gave a talk about this, but alas it was at Flourish in Chicago and their recordings were screwed up both times I went there. I should do podcasts, but I suck without externally imposed deadlines and feedback. The great thing about programming is the box tells me what's wrong every time I try to compile and run anything.


May 8, 2019

Here's a portion of an email I _didn't_ send to scsijon on the toybox list. (It was off topic.)

I would have expected glibc rather than gcc to be the one to break that, it's not the compiler's BUSINESS to care about this. But ever since the C++ developers took over the C compiler they've been expanding "undefined" behavior as much as possible, presumably because C _not_ being a giant pile of irrational edge cases that make no sense so you just have to memorize them was a big advantage C had over C++, and they can't have that.

*shrug* I consider gcc kind of end-of-life at this point. LLVM doesn't act nearly as insecure about C's continued existence, and not being able to compile existing code with gcc 9 would be a bug in gcc 9 as far as I'm concerned.

Presumably there's a -fstop-being-stupid flag for this too if it did turn out to be relevant, or it would be trivial to work around gcc 9's bug if we did wind up hitting it, but this is 100% a gcc bug in cutting edge gcc. (Are they building with -Werror or something? Looks like -Werror=no-format-overflow is what would turn it off, which sounds like the "may be used uninitialized, but isn't" inability to turn off the broken warnings without turning off unrelated non-broken warnings all over again...)

Rob

P.S. C++'s entire marketing strategy, going back to 1986, is "C++ contains the whole of C and is thus just as good a language, the same way a mud pie contains an entire glass of water and is thus just as good a beverage". C is a simple language with minimal abstraction between the programmer and the hardware. The _programs_ are complex but the language is not. C++ adds a lot of language complexity and abstractions that unavoidably leak implementation details so you can't debug them without knowing every detail of how they were implemented. C is a portable assembly language, as close to the hardware as it can get without a port from x86 to arm being a complete rewrite, and even then hardware details like endianness and alignment and register size peek through. I'm all for replacing C++ with go/swift/rust/etc. I object to a drowning C++ climbing on top of C and dragging it down with it.


May 7, 2019

Elliott continues to make AOSP (the Android Open Source Project, the base build for all android devices) do a "hermetic" build, which is their name for an airlock step. (They're shipping prebuilt binaries instead of building an airlock locally, but fine. Either way it means android is building under the tools android is deploying, which is halfway to native build support.)

Which also means they hit problems in the toybox tools, which I have to drop everything and fix. Which is why today I'm working on tar sparse support: they have tarballs generated by the host tar which they want to extract in the airlock, and they've got sparse files in them toybox tar can't currently understand.

And of course if I'm adding it to extract, I'm adding it to create too. The options are "not doing it" and "doing it right", the middle ground is called toys/pending.


May 6, 2019

Today Elliott pointed me at a fix to his sed performance issue, which is to use REG_STARTEND. This tells regexec() to use the contents of the regmatch_t on _input_ to say where the end of the string is, which means A) no strlen() on the input each call to regexec() (which is really slow when you're replacing lots of small matches on a very long string, hence the performance issue), and B) I can implement regexec0 to include null terminators without a hacky for loop over the data.

REG_STARTEND seems to have started life on freebsd over a decade ago, and is now supported by _everything_ except musl libc. It was picked up by glibc in 2004, it's on macos and freebsd and bionic, and had even made it into uClibc before that project died, but it's not in musl. And the reason is that Rich declined to support it when the issue came up, saying his users were wrong for wanting to support those use cases, which he called "hideous hacks". (There's a lot of that going around in musl; the users are wrong for wanting to do what the users want to do, musl is only for people who think like Rich.)

It's also not documented in the regex man page, so I poked Michael Kerrisk to fix the man pages, complained at Rich, and checked in the fix and a test with a 5 second timeout.

It was actually a multi-stage fix because I had to edit the string in place and avoid gratuitous realloc() because libc does _not_ short circuit same-size realloc, that's the caller's job. I'd have xrealloc() do it but that doesn't know how big the old one was...


May 5, 2019

Banging on the board I took a paid sidequest to work on (making the WF111 work on the SAMA5D3), and its BSP bit-rotted. The company that made it got acquired a few years back, and the 6 year old youtube videos on how to do stuff with this board point to websites that redirect to the new corporation's main page. Great.

I feel guilty charging them for the time it takes to learn how this stuff works, but the guy they had working on it retired.


May 3, 2019

The grep --line-buffered thing has been pending for a while, but the _input_ is also line buffered. I need to rewrite do_lines() to read large blocks of data (or even mmap it, dunno where the "it's a win" size is for that though, need to benchmark).

I'd like to avoid gratuitous copying, which means read a large buffer and pass in a pointer/len within the buffer, except for three problems: 1) where/when do I null terminate? (Inserting a NUL modifies the buffer, and if I keep the \n it has to stomp the next character _after_ the terminator, which may be off the end of the allocation.) 2) lines wrap off the end of the buffer and I have to either memmove or realloc(), possibly both, 3) some of the users want to keep the buffer, at which point they strdup.

Basically I have to audit all callers to come up with a design, which is hard to do with a dirty tree.


May 1, 2019

Finally got around to updating my resume. I'm not looking for work yet but a recruiter wanted to know and I presumably have to do it eventually.

What I'd _really_ like to do is grow my patreon to the point I can do open source full time, but I don't expect that to happen before I run out of savings again. Or alternately get some of the big companies using toybox to buy "support contracts" so again, I can do this full time...


April 29, 2019

Broke down and saw Ant man Endgame, primarily because Fade saw and enjoyed it and would want to talk about it. (I'd happily see Carol Danvers II but I'd already been told she wouldn't even have half the screen time of Rocket Raccoon.)

I only wanted to walk out of the theatre in disgust once this time, when the same trap that fridged the lead female character of guardians of the galaxy fridged the lead female character of the original MCU avengers lineup, and put to rest the calls for a "Black Widow Movie". We got a single female-fronted MCU movie, the topic is done forevermore! (Other than that, lots of plot points were predictable by going "which actors want out of their contract", and the shakycam was so bad I lost track of the plot a bunch of times. I think I followed how they got 4 of the 6 McGuffins? No idea what happened to the one Loki stole, for example...)

And unfortunately, the movie put me in an irritable mood to review the second man.c patch. I had to back out my second round of changes (more than a day's worth of work) to apply it, and now I'm looking at the various changes that messed up code I'd carefully cleaned up the first time and it's triggering my "can I ignore this command forever" reaction, to which the answer is "no, the android guys will use the broken code out of pending and then it's even more work to clean up because I keep breaking their use cases behind the great google proprietary firewall"...

I've put a lot of skill points into programming. I'm not really that good at managing the work of others, but I can do it. But when the two overlap and other people are messing up my code I want them to GO AWAY and let me get on with it, which is not how open source is supposed to work and I know it. (I program by debugging an empty screen. Things moving behind my back while I'm debugging is BAD DEBUGGING and the way to fix it is to MAKE IT STOP. I can do pair programming just fine, but "I was working on a redo of this code and you sent me a patch for the old version..." I pretty much back out and discard all my work and start over.)

I said I was irritable.


April 28, 2019

Went to the farmer's market today. Learned it takes 4 lamb hearts to make a pound, and Fuzzy got a raspberry mead she's quite happy with. Plus duck eggs. (Woo-oo.)


April 27, 2019

I'm not hugely interested in seeing them kill off the _other_ half of the Marvel universe, so I bought a ticket for Shazam. I've already been spoiled on it, but eh. Sounds like a good movie anyway.

I got an automated email that my old [PATCH v3] Make initramfs honor CONFIG_DEVTMPFS_MOUNT stopped applying, so I'm doing The Lazy Way to see what happened:

1) patch -p1 -i blah.patch
2) git log init/main.c
3) git checkout -f $LASTHASH^1 # the ^1 means commit before that

And repeat until you find the last commit it applied to, and then the one you just ^1'd is the one that made it stop applying.

I'm tempted to automate this (git log $FILE | head -n 1 | awk '{print $2}') and I could do patch --dry-run even, and I should probably do --no-merges on the log but I _have_ hit cases where a merge commit is what broke. (And those are "make puppy eyes at upstream" or "dig into the code and try to figure out what the fsck is going on".)
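
A minimal sketch of what that automation could look like, with some substitutions worth naming: last_applies() is a made-up function, it uses "git apply --check" as the dry run instead of patch --dry-run, and it walks the file's history newest to oldest instead of repeating the ^1 step by hand. It does checkout -f, which throws away uncommitted changes, so this assumes a scratch clone and a patch file that lives outside the tracked tree.

```shell
#!/bin/sh
# last_applies PATCH FILE: hypothetical helper. Prints the newest commit
# touching FILE that PATCH still applies to. The commit listed just
# before it in "git log -- FILE" is the one that made it stop applying.
last_applies() {
  patch=$1 file=$2
  # Remember where we started (branch name, or hash if detached).
  start=$(git symbolic-ref -q --short HEAD || git rev-parse HEAD)
  found=
  for hash in $(git log --format=%H -- "$file"); do
    git checkout -q -f "$hash" || break
    # --check only tests whether the patch would apply, it changes nothing.
    if git apply --check "$patch" 2>/dev/null; then
      found=$hash
      break
    fi
  done
  git checkout -q -f "$start"
  [ -n "$found" ] && echo "$found"
}

# Usage, from the top of a kernel checkout:
#   last_applies /tmp/blah.patch init/main.c
```

This doesn't pass --no-merges, so a merge commit that touched the file still shows up as a candidate.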


April 26, 2019

A few tangents I edited out of emails today:

It's a crying shame there isn't yet a chromebook shell you slide google phone du jour into and the usbc gives you keyboard, touchpad, display, battery/charger, and a better heat sink/fan for the phone... Yes I'm aware google wants everybody's data in the cloud so you can't work when the net's down. I'm weird.

Java was my primary programming language for ~3 years, about as long as C++ was. I went commodore basic to C to C++ to C to Java to C to Python to C. I'm the guy who told Sun's Mark English that the java 1.0 spec didn't have a way to truncate a file (just missed the 1.1 cutoff but they added it to 1.2).

Java _stopped_ being my primary programming language when they replaced the lightweight AWT with "Swing" and all that model/view/controller nonsense. Plus "no JDK for Linux" was the #1 bug on developer.java.sun.com for 11 months with no official response and _then_ sun screwed over blackdown on the Linux JDK stuff hugely, then they bloated the language so much they had to start doing javaEE subsets, refused to open source it forever and then turned into a patent troll when they did (I'm aware defending themselves against "Microsoft J" was a useful legal battle, but the antitrust breakup should have handled that if we didn't have an infestation of republicans.)


April 25, 2019

The nice clean keyboard of my new laptop is getting slowly grunged up by a human body hunching over it for hours. Sad but inevitable. (I've been trying to keep my fingernails trimmed to slow the rate at which the letters wear off the keys, but it's been like a week and the slow slide away from New Laptop Smell is inevitable. Which is weird because google says this model is from somewhere around 2012. I guess it was in a box or something? Anyway, big step up from what I _was_ using, even if I haven't tried to reflash the screen firmware to get it to Stop Trying To Help With The Brightness. It only does it when I switch windows, so it's not as annoying as it could be.)


April 24, 2019

Benchmarked grep and found my version's way slower than devuan's version. (Which Elliott's been complaining about, and I confirmed he's right. Well, first I wrote a giant email I didn't send arguing about it, and then I benched and went "ok".)

Thought of a new approach where do_lines() chops text out of a buffer without copying it, which should be much faster, which brings up lifetime rules and requires changing the callers. _BUT_ it would allow me to get rid of the old get_lines(fd) API, which I've meant to do forever.

Ideally I'd want to mmap() that buffer when possible, but how long is the file? I'd need to llength() the file (except really I just want the simple length, the whole llength() mess was to get the size of noncompliant block devices that didn't properly report their size, and since the cdrom went away that's probably not a thing anymore?)

Anyway, giant files can be bigger than the available address space (certainly on 32 bit), so we'd want to map chunks of them. And if we're reading instead of mapping we definitely need a finite size because a read() is into an allocated buffer that doesn't discard clean pages it can read back in from a file. Which raises the problem of what to do when a line crosses the boundary. With mmap we can munmap(), lseek() and mmap() again, with a larger size if necessary. (And the data's probably still in the page cache afterwards.) With read() we can copy the data down, fill out the buffer to the end, and realloc() as necessary. (There's always the possibility of a pathologically large line that's bigger than any finite buffer we've allocated. Although the question of what to DO about such lines remains: we don't want "tr '\0' X < /dev/zero | grep" to trigger the OOM killer.)
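
A minimal sketch of the read() half of that idea, not the actual do_lines() rewrite. The names chop_lines(), line_fn, struct tally, and tally_line() are made up for illustration, and delivering an oversized line in pieces once the buffer hits its cap is just one arbitrary answer to the "what to DO about such lines" question:

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Hypothetical callback type: line points into the shared buffer (no copy,
// no NUL terminator) and is only valid until the callback returns.
typedef void (*line_fn)(char *line, size_t len, void *arg);

// Read fd in chunks and hand each line to fn() in place. size is the
// starting buffer size (nonzero), max caps how far it may grow. A line
// longer than max is delivered in pieces (an assumption of this sketch).
// Returns 0 on success, -1 on read or allocation failure.
int chop_lines(int fd, size_t size, size_t max, line_fn fn, void *arg)
{
  char *buf = malloc(size), *tmp;
  size_t used = 0, seen = 0, start, i;
  ssize_t len;

  if (!buf) return -1;
  for (;;) {
    len = read(fd, buf+used, size-used);
    if (len < 0) { free(buf); return -1; }
    used += len;

    // Hand out every complete line, skipping bytes already searched.
    for (start = 0, i = seen; i < used; i++) {
      if (buf[i] != '\n') continue;
      fn(buf+start, i-start, arg);
      start = i+1;
    }

    if (!len) {
      // EOF: flush a trailing line that had no newline.
      if (start < used) fn(buf+start, used-start, arg);
      break;
    }

    // Copy the partial line down to the start of the buffer.
    used -= start;
    memmove(buf, buf+start, used);
    seen = used;

    // Partial line fills the whole buffer: grow it, or flush at the cap.
    if (used == size) {
      if (size < max) {
        size_t bigger = size*2 > max ? max : size*2;

        if (!(tmp = realloc(buf, bigger))) { free(buf); return -1; }
        buf = tmp;
        size = bigger;
      } else {
        fn(buf, used, arg);
        used = seen = 0;
      }
    }
  }
  free(buf);

  return 0;
}

// Example callback: count lines and remember the longest one.
struct tally { size_t lines, longest; };

void tally_line(char *line, size_t len, void *arg)
{
  struct tally *t = arg;

  (void)line;
  t->lines++;
  if (len > t->longest) t->longest = len;
}
```

Because the callback gets a pointer into the buffer rather than a copy, anything it wants to keep past its own return has to be copied out, which is the lifetime rule the callers would need to learn.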

Anyway, I'm too tired to implement this right now. Which is odd because it doesn't seem like I've done anything today? But I wrote the giant email I didn't send, which was a lot of work, so I guess I have? We all have our process...


April 23, 2019

Hired dudes took down 3 sections of fence in the back yard to get at the poison sumac _tree_ growing between our fence and the neighbor's fence. They estimate it's been there for 15 years, but didn't try to take over the entire yard until we took out the bamboo that was crowding it out.

I gave them more money. I blogged about my weeks of misery and being afraid to touch the cats, and it's just been _looming_ ever since. (Not so much when I was in Milwaukee and Minneapolis and Tokyo and such, but if you're wondering why travel seemed like a great idea...)


April 22, 2019

Hired Dudes (as @kbspangler likes to say) are removing the poison ivy from the front and back yard (which turns out to be poison sumac, not poison ivy). Identifying it all, chopping it up, hauling it away, and painting poison on the stumps. It's so nice to finally find people willing to do that, and I've given them more money than they asked for to do it because YES, THANK YOU!


April 21, 2019

The "Wicd" wifi network chooser thingy doesn't work on the UT campus. I'm guessing there are too many networks here and it's overloading and saying no networks found. How they could be stupid enough to hardwire in a limitation like that... Eh, it's a Linux GUI tool. Of course they would.

(And every time I tell it my phone's password, it doesn't save it. I looked under "preferences" to see if maybe I needed to put it there, but I can't find anything there? How would I get it to forget a network that isn't currently present? There's no list of historical associations like in ubuntu. This thing was not well thought through.)

Anyway, I got Tar switched over to the new environment variable plumbing. I'm not sure the --to-command stuff works reliably (a short write will error_exit() out of tar entirely, even when writing to one of many short-lived child processes here), but this isn't a _new_ problem. In my "tar xf linux-4.20.tar.gz --to-command sha1sum | head" tests the data for the whole file tends to go into the pipe before the consumer responds to it, so each sha1sum instance complains it got a short write and then tar says it exited with error code 1, but it neither exits nor gets out of sync with the tarball. Pretty sure if I gave it a tarball with a big enough file in it the toybox one would exit, the question is what the debian one would do?

And now that mkroot isn't using busybox gzip (and thus needs gzip/bzip2/xz built in to busybox or tar doesn't know what they are and doesn't have the -z and -j command line options), I can enable gzip! Which I haven't quite finished cleaning up and promoting yet because I couldn't use it yet...


April 20, 2019

It's the evening before a holiday that HEB's 24 hour location actually closes for, so they're clearancing the bakery again. (Well, putting the 50% discount stickers on anything that expires tomorrow.) We have a chest freezer. Camped the spawn and spent $60 on many bags of discount baked goods.

Got new lib/env.c infrastructure checked in yesterday, so now I make tar use it.

The reason for going down this tangent isn't just that the shell needs it soon, it's that tar should use vfork(), but you can't independently modify your environment variables after vfork() because it's a common heap. But if I set and reset them in the _host_ before the vfork() and do the normal "leak the variable contents" thing setting the environment before the vfork(), and it sets a dozen-ish variables each file, for an unlimited number of files (how long's the tarball?) it can do bad things to memory. (I could putenv() and track them manually but if I need infrastructure to do that and the shell needs it eventually... So tangent.)

Switched my email to the new laptop today. Thunderbird's file selector can't select hidden directories (you can't type a path in, and the chooser doesn't show hidden directories) so I had to sed the config files by hand to change the path where the new copy of the "Local Folders" lives, but luckily they're text files. (Yeah yeah, Linux: Smell the Usability. I think we've all given up on "linux on the desktop" ever happening at this point. I'd just like to avoid Android being as stupid as Firefox.)

(For some reason there was a 2 gigabyte sqlite file lying around with a last updated date of 2017. I'm guessing version skew? Yay freeing up disk space I suppose. The new laptop has a 2 terabyte disk in it, but I'm sure that'll be insufficient at some point.)


April 19, 2019

Spent a chunk of today arguing with Dell's firmware. Might know how to fix it, but haven't convinced myself a display annoyance is worth possible bricking yet. (How do you get the specific right firmware update for an aftermarket laptop? Apparently this thing was the height of technology at the end of 2012, Moore's Law is stone dead at this point.)


April 18, 2019

The behavior of debian's "env" command is... the same naive one I just noticed and was about to fix in toybox:

$ env =blah | grep blah
=blah
$ env =blah env | grep blah
=blah
$ env =blah /bin/sh
  $ env | grep blah

Bash sanitizes out an environment variable with a blank name, but env doesn't.

Sigh, I should modify env to test the new lib/env.c infrastructure. It doesn't _need_ it (it's not persistent, it's fire and forget, nobody cares if it leaks a little memory before printing output or calling exec(), it's limited by the command line and setenv(argv[i]) directly is less memory than strdup anyway). BUT I want the env infrastructure to get a workout.


April 17, 2019

Brought new laptop out to the nice UT courtyard, tried to build mkroot, and... the kernel build failed because it hasn't got flex. Ok, try to phone tether... no networks found in the network gui thing. (It's not using networkmangler, which is great, but it also means I'm less familiar with this one's knobs.)

So, check from the command line, ifconfig says wlan0 is there, maybe I've hit the RF kill switch? Where is it on this laptop... the right side. Accidentally turned it off, turn it back on again and... stack dump in dmesg.

[30254.731639] iwlist: page allocation failure: order:4, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK)
[30254.731651] CPU: 3 PID: 27021 Comm: iwlist Not tainted 4.9.0-6-amd64 #1 Debian 4.9.88-1+deb9u1
[30254.731653] Hardware name: Dell Inc. Latitude E6230/0YW5N5, BIOS A19 02/21/2018
...
[30254.731738]  [] ? get_page_from_freelist+0x8f0/0xb20
[30254.731742]  [] ? ioctl_standard_iw_point+0x20b/0x3d0
[30254.731779]  [] ? cfg80211_wext_siwscan+0x480/0x480 [cfg80211]
[30254.731785]  [] ? ioctl_standard_call+0x81/0xd0
[30254.731789]  [] ? wext_handle_ioctl+0x75/0xd0
[30254.731793]  [] ? dev_ioctl+0x2a3/0x5b0
[30254.731798]  [] ? sock_ioctl+0x120/0x290
[30254.731802]  [] ? do_vfs_ioctl+0xa2/0x620
[30254.731806]  [] ? SyS_ioctl+0x74/0x80
[30254.731810]  [] ? do_syscall_64+0x8d/0xf0
[30254.731814]  [] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

[30254.731825] active_anon:78161 inactive_anon:78636 isolated_anon:0
                active_file:907684 inactive_file:151303 isolated_file:0
                unevictable:4 dirty:1471 writeback:0 unstable:0
                slab_reclaimable:2798373 slab_unreclaimable:8816
                mapped:16671 shmem:9549 pagetables:4208 bounce:0
                free:38992 free_pcp:23 free_cma:0

Why is it trying to do an order 4 allocation? That's 16 pages (64k) of contiguous memory on an active system that's doing an rsync from a usb drive to the main system? (Second pass of file copying from backups.)

So I have to stop the rsync, sync && echo 3 > /proc/sys/vm/drop_caches, and _then_ toggle the RF kill switch? That's kinda pathetic... ok, and now it's back to finding no networks.

[30900.379840] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[30900.380069] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[30900.380159] iwlwifi 0000:02:00.0: Radio type=0x1-0x2-0x0
[30900.618546] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[30900.618776] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[30900.618866] iwlwifi 0000:02:00.0: Radio type=0x1-0x2-0x0

What does disabled mean here? Is this because the apt-get upgrade yesterday updated the iwlwifi firmware? (I dunno why it did it, the one it installed with from the dvd worked fine. Would it work after a reboot if I'd never suspended?)

Ok, I clicked "disable wifi" in the gui thing, waited 10 seconds, "enable wifi", and then something like 30 seconds later (it toggled the rf kill bit again according to dmesg) hit "scan" and NOW it can see my phone... darn it, AND THEN IT WENT AWAY AGAIN.

This is amazingly brittle and I don't know what the magic incantation is, but it's CLEARLY software being broken here. Ugh, in addition to the driver needing an order 4 allocation and being unable to get it if the system isn't COMPLETELY IDLE, the gui tool is horked: "iwlist wlan0 scanning" shows me a bunch of networks. Lemme see if I can remember how to associate by hand... wow it's congested here, my phone is cell 24 in this list. (Lots of instances of "utexas" and "eduroam", that's a university for you.)

Ok, "iwconfig wlan0 essid Arkleseizure key s:password" ... is not it because it doesn't support wpa passphrase. Which EVERYTHING uses now. Right, let's see, that's the wpa_passphrase command which takes the ssid and the password... is that the same as the essid? Let's try it... I got a 64 byte hex string, which is longer than "key" will accept as an argument. And the iwconfig man page's section on the "key" and "enc" options (what's the difference between them?) talks about registering multiple keys and referring to them by number...? What is this nonsense.

Alright, let's try USB tethering. In dmesg I get:

[32786.261845] usbcore: registered new interface driver cdc_ether
[32786.268344] rndis_host 3-2:1.0 usb0: register 'rndis_host' at usb-0000:00:14.0-2, RNDIS device, de:d3:48:09:5c:0e
[32786.268382] usbcore: registered new interface driver rndis_host

And cdc_ether is presumably the ethernet thing going "hey, right here" and there's no second ethernet interface in ifconfig. Still just eth0, wlan0, and lo. The gui thing is apparently ONLY for wireless, doesn't have any wired control options. Do I need to insmod something myself? Why is this not working? Ok, dig down into /lib/modules/*/kernel/drivers/net/usb and it looks like I need to "modprobe cdc_ether"... which seems to have been a no-op? Ah, it's already there in /proc/modules. And that's the rndis_host stuff I guess? What does /sys/class/net say, that shows a usb0...

AH! My bad. ifconfig -a shows it, usb0 is down and ifconfig only shows up interfaces by default, -a shows all of them. So the only problem is the net app isn't responding to it, so run "dhclient usb0" manually to do dhcp on it and...

Ha! I have net!

And rain and thunder and lightning 20 feet away from my table. Quite the storm, might be here for a while. Of course when I go walking there's a storm. Yes I brought an umbrella, but downpour and lightning seem a bit much for it. Been going for a while, though, starts and stops a lot, maybe I can head out in a gap and make it to another overhang to wait out the next outburst?

Hmmm. Devuan's "Power Manager Plugin" has critical power level set to suspend at 10% but it never triggered, and at 1% (!) I noticed because the power light went solid orange, and I closed the laptop and plugged it back in for a bit (despite the lightning) to have enough power to get home.

Devuan has different bugs than Ubuntu 14.04 did, but the whole "Linux: smell the usability" thing is out in full force. 28 years of Linux and we still SUCK at this.


April 16, 2019

So, new laptop! Installed Devuan Ascii with xfce, and this time the setup is (still derivative of last time):

Install devuan, selecting xfce and leaving most of the options stuff unclicked. Boot into new system.

Fiddle with GUI stuff at the top a lot. I have cpu graph, disk performance monitor, network monitor, workspace switcher (8 desktops in 2 rows of 4), free space checker, DateTime doing "%a, %b %d, %r" in 14 point font, the network "notification area", power manager plugin, pulseaudio plugin, and clipman. Plus I went into settings->panel from the applications menu and told the bottom panel to hide itself "intelligently".

Then click on the battery icon, power manager settings, and suspend when lid is closed, system sleep mode "suspend" for both, on critical battery power suspend, disable "lock screen" checkbox, display power management blank after 10 minutes, sleep 11, switch off 12, reduce brightness after "never".

Then since that didn't stop the stupid screen lock on suspend (physical access to my laptop is game over, don't pretend it isn't until you've fixed badusb and friends), "apt-get remove xscreensaver".

Next up apt-get install aptitude chromium mercurial subversion git-core pdftk ncurses-dev xmlto libsdl-dev g++ make flex bison bc strace diffstat

The hard drive is annoyingly clicking. hdparm -B 254 /dev/sda fixed it, added that to /etc/rc.local. (Wow the /etc directory has a lot of crap in it in devuan 2.0.) Googled for a bit to see if the hard drive parking itself every 2 seconds was worse for its longevity than the vibration of hitting keys and jostling it around the table when the heads aren't parked, but nobody seems to have studies. (Presumably it still has the impact accelerometer that does the emergency park.)

aptitude install -R thunderbird (to get it _not_ to install the "lightning" calendar extension because this isn't outlook).

Set the terminal background color to _actually_ black, not just a dark grey, and make the text color fully white. I'm in enough bad lighting situations as it is, I don't need grey-on-grey in terminal windows. (I also switched from monospace to monospace bold, but I'm not sure it's an improvement? Hmmm... no, don't think so. Switched it back.)

apt-get remove libgnome-keyring0 (which I don't use and causes stupid chromium popups)... and that didn't stop it. Nor did telling chrome never to store passwords, or switching off lots of password things in chrome://flags. And I dunno how to change the xfce pulldown menu to start it with --password-store=basic (what I _want_ is --password-store=none). When I need a darn password I'll enter the darn password, stop trying to "help" here. I NEVER tell my browser to save passwords, it defeats the purpose of passwords. Save a key cookie if you want to do that...


April 15, 2019

Dug up the 2 terabyte hard drive I bought in Milwaukee and walked back to the discount electronics place to try to pay them to install it into the new laptop (exercise!), and they more or less declined. (They'll install stuff I buy from them for free, but trying to hire them to install _my_ stuff is more expensive than buying the part from them.) Oh well, I can do it myself, just didn't want to.

Huh, this one is MUCH less painful than swapping out hardware in my netbook was, that required popping out the keyboard and digging _down_ into the machine, this one the bottom panel comes off and the memory and hard drive are right there. Convenient! (And totally made in china; Dell had nothing to do with the design of this hardware.)

Installing Devuan on it. Unlike the new oversized system76 laptop (which I still have on a shelf) it did not ask for strange binary firmware that's not on the USB stick! Woo! (The easy way to get something to work with Linux is to pick hardware that's several years old.)

Dug up a 2 terabyte hard drive for it.


April 14, 2019

Bought a new laptop! Walking back from Tax place again (this time carrying an umbrella as a parasol) because I had to drop off the check for the bank routing info, and on the way back I stopped in to the "discount electronics" place on Andersen near Lamar, and they had a Dell Latitude E6230 for cheap that's a nice form factor, reasonable processor (core i5), and can hold 16 gigs of ram.

I'm not asking for _that_ much in a laptop. It's a several year old model (2015 I think?), but Moore's Law dying means that matters a lot less than it used to. (Technically the S-curve started bending down around 2001 and the exponent gradually decreased until it's asymptotically approaching 1, meaning the advances these days are linear rather than exponential. The technology's still advancing, but not in a world changing way.)

And unlike anything I've ever seen from System76, THIS one is reasonably sized. (It doesn't QUITE fit into the netbook bag because the extended battery sticks out too much, but would if it didn't so points for trying.)


April 13, 2019

Back in Austin, went to my tax appointment and got a bad sunburn walking back. (Even though I was in the shade of I-35 for at least half of it.) Gotta go back tomorrow to drop off a check. Then I went to natural gardener with Fuzzy. They're out of African Basil.

Rideshare is expensive (between one way to taxes and both ways to natural gardener spent over $50 on it today), but my car is dead and self-driving is coming so I don't really want to replace it if I don't have to. Waymo's Guy In Charge Of That estimates they'd like to charge $1k/year ($85/month) for a flat rate subscription in a municipal metro area, as in your phone's Google Maps app grows a "take me there" button next to "directions" that when pressed turns into a countdown of seconds until your vehicle arrives. They're already prototyping this in Arizona, the tech is ready it's just regulation catching up to allow them _not_ to have someone sitting in the driver's seat "just in case". (Because nothing says paying full attention like somebody _not_ driving. "Driver assist" is an accident waiting to happen, either the human is driving or the human is NOT driving, halfway states are called a "distracted driver".)

To clarify: Google's technology is ready, but they've been working on it for over 15 years now. Tesla's keeps killing people because they suck, and are trying to play catch-up, but keep in mind Musk didn't found Tesla, Martin Eberhard did. Musk acquired it in a hostile takeover with the money he made from Paypal in the dot-com boom. His technology only advances when he buys other companies (like SolarCity and Maxwell) or when he hires people away from them who are already doing things (which doesn't work so well for them and tends to turn into lawsuits).

All the others are still playing catch-up, but everybody's working on it because it's a game of musical chairs. Lots of people are doing parts of the business model the way lots of people were doing parts of smartphones (apple newton, palmpilot, the motorola razr ran apps written in java) before iPhone and Android shipped in 2007.

The thing is, one car per person was always a terribly inefficient model. Individually owned cars are only driven about 4% of the time (parked 96%), even human-driven taxis are driven about 40% of the time (the humans still sleep). Assuming self-driving cars are on the road 10x as much (which is a conservative estimate) you'd need 1/10th as many cars to serve a given metro area (yes even at rush hour, which is about 3 hours each way meaning multiple round trips even without carpooling).

Then add in the fact that an electric car lasts a million miles before you have to service anything (other than replacing the tires every 30k miles: no air filters, no oil to change, no transmission, the batteries have active liquid cooling so they last a long time...)

So if you're a car company seeing the rise of cheap electric self-driving vehicle fleets, you're playing a game of musical chairs: your industry's manufacturing volume is about to drop by an order of magnitude and there's only enough market to support 1/10 as many car companies as we have today. They're all racing to switch over before their competitors do.

People immediately go "but what if somebody barfs in the car" (then you can report the car soiled in the app and request another, and they know which phone was riding in the car so they can place blame appropriately and prevent a recurrence), and this is why they're doing trials and limited rollouts and building service centers and so on.

The estimates are that the gasoline distribution network will collapse around 2025 when volume falls below fixed costs and the whole mining/shipping/refining/delivery/sale network we have now becomes unprofitable. At that point gas stations stop selling gas and become convenience stores, and gasoline becomes something those who still need it order delivery of (like liquid nitrogen from airgas), and keep their own tank on site. Sufficiently rural areas will be "stuck on dialup" for a couple extra decades, but cities will get rid of parking lots fast: that land's way too valuable when an app-summonable vehicle can pick you up and drop you off from the curb and never need to be parked anywhere but the fleet maintenance depot.

That's why I dowanna get a new car if this is only a couple years away. It's like installing a land line once digital cell phones have arrived, but not yet having a cell tower in range of my house. Or using dialup when cable modems are available, or still having cable TV when you have streaming services. I don't want to own a car, I want the app. I just need coverage to reach where I live.


April 8, 2019

I haven't been posting here as much because I've been posting to the mailing list. Today's issue is reestablishing the setenv lifetime rules again so I can reopen the toysh can of worms.


April 7, 2019

Went to see Captain Marvel again, this time with my sister and the niecephews. Shortly before the big fight on the spaceship (the one set to I'm Just a Girl) a guy in the back row was discovered unconscious, and the next 20 minutes the theatre had the lights on while the movie played and everybody was looking at the back of the theatre instead of the screen as the theatre staff asked him loudly if he was diabetic until the EMTs showed up and carried him out.

They didn't pause the movie, but they did give us free passes to see another movie some other time as we left (and told us that he'd had an epileptic seizure but was otherwise fine).

This is why people wear bracelets for this sort of thing. On the one hand I feel bad for the guy, on the other he cost the theatre the revenue from a packed house and my niecephews missed the climax of the movie. (Not the punching spaceships part, but the entire facing down Annette Bening part and the montage the internet will inevitably set to "I get knocked down but I get up again". That's this movie's version of the camera circling the avengers while the theme plays, the punching spaceships bit is denouement.)


April 2, 2019

Submitted 5 ELC talk proposals (I think 3 of them were to "whatever conference they're hiding ELC behind this year", this pairing thing is terrible). Of course it was at the last minute (which due to pacific time was 2 hours later than I thought). I should memorialize them for posterity, but didn't.

Trying to finish and promote tar today so it can go in Android Q, which is mostly filling out the test suite so everything's tested (and fixing what that finds), and SKIP_HOST isn't granular enough.

What I want to say is "some non-toybox versions of this are expected to fail, but it's still a good regression test for us", such as the fact that the gnu/dammit tar can't handle "cat tarball.tgz | tar tv" and mine can. (I can autodetect type on a nonseekable stdin. It was a pain, but I refused to let it _not_ do it.)

But if you extract toybox source onto a mkroot system where the host is toybox and want to run the tests on the host toybox? That should be fine.


April 1, 2019

I hate april fool's day. Trying to stay off twitter.

The gnu/dammit tar has a --full-time but doesn't have --no-full-time, which is annoying because I'm printing --full-time by default. Sigh. I can add the other thing for compatibility, but ow? (If ls does --full-time, why doesn't tar -tv?) Sigh. Ok, implement --full-time just so the test suite can pass TEST_HOST...

Huh, I added a TARHD variable I can set where this test passes the output through "hd >&2" so a hexdump of the created tarballs goes to stderr. That way if they differ, I can figure out what differs. But what I _really_ want is to catch the failure and run the host and target versions through hd _then_, which means I need to be able to register an error handler function. Hmmm... It's a pity bash hasn't got an "is this function defined" check. No wait, that's under the shell builtins... "type -t name". Returns "function" if it's a shell function. Ok...


March 31, 2019

The L records the gnu/dammit tar outputs for long filenames have the permissions and user/group names filled out. They're not needed (they're in the next header and those are the ones that get used), but they're filled out. Meanwhile fields like timestamp are zeroed. There's no obvious pattern to it, I think it's an implementation detail (sequence packets are initialized?) leaking through into the file format.

No, it's worse. The owner/group is always "root" and the permissions are 644. So the field could be zeroed but it's instead nonsense. As with the " " after the checksum, just gotta match the nonsense to get binary equivalent tarballs.


March 30, 2019

I'm writing tar tests, trying to do a proper thorough job of testing tar (which the previous tests didn't really), and I did "tar c --mtime @0 /dev/null | tar tv", which should more or less be ls -l on a char device, but:

--- expected
+++ actual
@@ -1 +1 @@
-crw-rw-rw- root/root 1,3 1970-01-01 00:00 dev/null
+crw-rw-rw- root/root 0 1970-01-01 00:00:00 dev/null

It's showing size, not major, minor. (This is the gnu/dammit one.) I want TEST_HOST to pass, but they're showing useless info here. "Be compatible" is fighting with "do it right". Hmmm...

What does posix say? Hmmm. The last posix spec for tar was 1997, before they removed it (just like cpio, the basis for rpm and initramfs; Posix went off the rails and I'm waiting for Jorg Schilling to die before trying to correct anything). And that says:

The filename may be followed by additional information, such as the size of the file in the archive or file system, in an unspecified format. When used with the t function letter, v writes to standard output more information about the archive entries than just the name.

Great, EXPLICITLY unspecified. Thanks Posix! You're a _special_ kind of useless.


March 28, 2019

Ok, the Embedded Linux Conference and Open Source Summit are colocated in San Diego in August (the Linux Foundation does this to dilute the importance of conferences, it's about like how Marvel had endless crossovers to force you to buy more issues back in the 90's right before they went bankrupt). The CFP closes April 2. I should submit a thing.

Topics. Ummm. I could do an updated 3 waves thing (lots of good links for that, credentials, AO3 is fan run and thus better at what it does, more on that, credentials vs accomplishment, and so on.) I could do a talk on 0BSD, on mkroot, on toybox closing in on 1.0...


March 27, 2019

So, tar paths...

$ tar c tartest/../tartest/hello | hd
tar: Removing leading `tartest/../' from member names
00000000  74 61 72 74 65 73 74 2f  68 65 6c 6c 6f 00 00 00  |tartest/hello...|

It's matching .. sections (the code I'm replacing was just looking at _leading_ ../ which isn't good enough).

$ tar c tartest/../../toy3/tartest/hello | hd
tar: Removing leading `tartest/../../' from member names
00000000  74 6f 79 33 2f 74 61 72  74 65 73 74 2f 68 65 6c  |toy3/tartest/hel|
00000010  6c 6f 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |lo..............|

And the gnu/dammit code is stupid.

$ tar c tartest/sub/../hello | hd
tar: Removing leading `tartest/sub/../' from member names
00000000  68 65 6c 6c 6f 00 00 00  00 00 00 00 00 00 00 00  |hello...........|

_really_ stupid.

Of course figuring out what/how to canonicalize is weird too, because I don't have abspath code that stops when it matches a directory, and there's no guarantee it would anyway rather than jumping right over it. I want the _relative_ path to be right.

Sigh. Compatibility, do what the existing one's doing...
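
For reference, the behavior in those three transcripts (drop everything up to and including the last ".." component) can be approximated like this. It's a sketch inferred only from the output above, not the gnu code or the toybox replacement, and skip_dotdot() is a made-up name:

```c
#include <string.h>

// Return a pointer into path just past its last ".." component, also
// skipping any leading slashes on what's left. No allocation, no
// filesystem access: this only mimics the member names shown above.
const char *skip_dotdot(const char *path)
{
  const char *keep = path, *s = path;

  while (*s) {
    size_t len = strcspn(s, "/");   // length of this path component
    const char *next = s+len;

    while (*next == '/') next++;    // step over the separator(s)
    if (len == 2 && s[0] == '.' && s[1] == '.') keep = next;
    s = next;
  }
  while (*keep == '/') keep++;

  return keep;
}
```

Fed "tartest/sub/../hello" it yields "hello", which is a file that was never at that relative location: the complaint above in one line.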


March 25, 2019

Got a heads up from Elliott that auto-merges of external projects into the Android Q branch end on April 3, feature freeze in run up to the release. So if I want to get tar promoted and in, I've got until then.


March 24, 2019

Once again trying to work out if old = getenv("X"); setenv("X", "blah", 1); setenv("X", old, 1); is allowed. Because old is a pointer into the environment space, and setenv replaces that environment variable. Under what circumstances do I need a strdup() in there?

I dug into this way back in 2006 but don't remember the details...
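
The conservative version, as a sketch: POSIX says the pointer getenv() returns may be invalidated or overwritten by a later setenv(), unsetenv(), or putenv(), so copying the old value first is always safe even where a given libc would let you get away without it. with_env() is a hypothetical helper name, not anything in lib/env.c:

```c
#define _POSIX_C_SOURCE 200809L  // for setenv(), unsetenv(), strdup()
#include <stdlib.h>
#include <string.h>

// Run fn() (if not NULL) with name temporarily set to value, then put the
// old value back, or unset name if it had none. Returns 0 on success.
int with_env(const char *name, const char *value, void (*fn)(void))
{
  char *old = getenv(name), *saved = 0;
  int rc;

  // Copy BEFORE setenv(): old points into the environment and may go
  // stale as soon as that variable is replaced.
  if (old && !(saved = strdup(old))) return -1;
  if (setenv(name, value, 1)) {
    free(saved);
    return -1;
  }
  if (fn) fn();

  rc = saved ? setenv(name, saved, 1) : unsetenv(name);
  free(saved);  // setenv() made its own copy, so ours can go

  return rc;
}
```

The cost is one strdup()/free() pair per save and restore, which is noise for a one-shot command but is exactly the kind of churn that adds up when it happens a dozen times per file.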


March 18, 2019

Tar cleanup corner case: the gnu/dammit tar fills out the checksum field weird, I kinda dowanna do that but the resulting tarballs won't be binary identical if I _don't_...

Backstory: tar header fields are fixed length records with left-justified ascii contents, padded with NUL bytes. The numerical ones are octal strings (because PDP-7 used a 6 bit byte, we say the machine Ken and Dennis wrote Unix on had 18k of ram but that was 1024 18-bit words of memory).

The "checksum" field is just the sum of all the bytes in the header, and is calculated as if the checksum field itself is memset with space characters. (Then you write the checksum into the field after you've calculated it.) The checksum has 7 digits reserved (plus a NUL) but due to all the NUL bytes in the header, the checksum is almost always 6 digits. So it _should_ have 2 NUL bytes after it... but it doesn't. It has a NUL and a space, ala:

00000090  31 34 31 00 30 31 32 32  36 36 00 20 30 00 00 00  |141.012266. 0...|

The _reason_ for this is historical implementations would memset the field, iterate over the values, and then sprintf() into the field which would add a NUL terminator but not overwrite the last space in the field. And the gnu/dammit tar is either _doing_ that, or emulating it.

I'm not memsetting spaces into the cksum field, I'm starting with 8*' ' and skipping those 8 bytes... but the result is I'm printing out two NUL bytes at the end instead of NUL space. And if you check for binary identical files...
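
A sketch of the historical approach that produces the "NUL space" ending, assuming the standard ustar layout (512 byte header, 8 byte chksum field at offset 148). tar_checksum() is a made-up name, not the toybox function:

```c
#include <stdio.h>
#include <string.h>

// Fill in the chksum field of a 512 byte tar header and return the sum.
unsigned tar_checksum(unsigned char *hdr)
{
  unsigned sum = 0;
  int i;

  // The field counts as 8 spaces while the header is being summed.
  memset(hdr+148, ' ', 8);
  for (i = 0; i < 512; i++) sum += hdr[i];

  // 6 octal digits plus the terminating NUL cover bytes 0-6 of the
  // field. Byte 7 keeps its space from the memset above.
  snprintf((char *)hdr+148, 7, "%06o", sum);

  return sum;
}
```

The largest possible sum is 512*255 = 130560, which is 377000 in octal, so six digits always fit. An all-zero header sums to 8*32 = 256 and gets "000400", NUL, space.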

It's _almost_ certain no tar program out there is going to care about this, but if I don't and I use canned tarballs in my tests, CHECK_HOST would always fail with the gnu/dammit implementation. (Or possibly busybox, I haven't looked at what that's doing yet.)


March 16, 2019

Oh FSM. I feel I should do a response to LWN's motivations and pitfalls for new "open source" licenses article, but you could just watch my 3 minute rant on there being no such thing as "the GPL" anymore, copyleft fragmentation inevitably increasing as a result, and the need for a reclaimed public domain via public domain equivalent licenses that don't have the "stuttering problem".

Of course there's no mention of 0BSD or similar, they haven't noticed it yet. A lot of people haven't worked this sea change through to a logical conclusion yet, they're still trying to make a better buggy whip because their old one stopped serving their needs. Fighting the last war...


March 15, 2019

That side gig is hanging over me. I want to do the thing for them, it's not hard, but I'm huddling under an "out of service" sign.


March 13, 2019

At Fade's. Well, currently at the McDonald's down the street from Fade's.

Tar has an interesting corner case in autodetecting file type: if it's a seekable file you can read the first tar header block (512 bytes) and if it doesn't have the "ustar" magic at offset 257 (unix standard tar, posix-2001 and up so an 18 year old format we can pretty much assume at this point, albeit with extensions) then check for compression signatures for gzip and bzip...

At which point, if it's _seekable_ you seek back to the beginning, fork, and pass the filehandle off to gzip or similar. I just redid xpopen() so it can inherit a filehandle from the host namespace as its stdin/stdout. (It can still do the pipe thing: feed it -1 and it'll create a pipe, but feed it an existing filehandle and it'll move it to stdin/stdout of the new process; I should probably have it close it in the parent space too but haven't yet because when you pass along stdin/stdout _those_ shouldn't get closed and is that the only case?)

But if it's _not_ seekable, I have 512 bytes of data I need to feed into the new process, and there's no elegant way to do that. I kind of have to fork another instance of tar with the appropriate -zjJ flag and then have _this_ one spin in a loop forwarding it data through a pipe(2).

Which is awkward, but doable...
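
The signature check itself is the easy part. A minimal sketch, assuming the first 512 bytes are already in a buffer (the enum and function names are hypothetical, not what toybox uses):

```c
#include <string.h>

enum archive_type { TYPE_TAR, TYPE_GZIP, TYPE_BZIP2, TYPE_UNKNOWN };

// Classify the first 512 bytes read from an archive.
enum archive_type sniff_archive(const unsigned char *block)
{
  // The "ustar" magic lives at offset 257 of the header, after the name,
  // mode, uid, gid, size, mtime, checksum, type, and link name fields.
  if (!memcmp(block+257, "ustar", 5)) return TYPE_TAR;

  // gzip streams start with the bytes 0x1f 0x8b.
  if (block[0]==0x1f && block[1]==0x8b) return TYPE_GZIP;

  // bzip2 streams start with "BZh" followed by a block size digit 1-9.
  if (!memcmp(block, "BZh", 3) && block[3]>='1' && block[3]<='9')
    return TYPE_BZIP2;

  return TYPE_UNKNOWN;
}
```

The awkward bit is still what to do with those 512 bytes afterwards when the input can't seek back to the start.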


March 12, 2019

Packed out of apartment, onna bus to Fade's.

Hey, ubuntu found a new way to fail. Doesn't suspend because kworker/u16 (Workqueue: kmemstick memstick_check [memstick]) failed to suspend for more than 120 seconds, and so the suspend was aborted _after_ I'd closed the lid and put it in my laptop bag, so instead it got VERY HOT.

Bravo, ubuntu. Yes of course the correct thing to do if the memory stick probe hangs for 2 minutes is to MELT THE HARDWARE. Linux: smell the usability!


March 11, 2019

First day where I would be working if I hadn't quit the job. Sitting in the apartment poking at computer stuff. I had a long todo list I haven't done any of yet. Luckily, over the years I've learned that "not doing stuff" is an important part of the process. I need cycle time. Rest, recovery, sleep, staring out windows. I gave up a lot of money to be able to afford _not_ to do stuff today, and am enjoying it.

That said, I should at the very least drop off the "moving out of the apartment" form, and maybe take my bike back to the bike co-op I got it from and go "here, free bike". (It's a vintage Schwinn, it's lovely. Someone will want it as much as I did. Alas, can't easily take it out of state with me.)

Somebody tried to sign up to the https://landley.net/aboriginal/about.html mailing list, and I forwarded them to mkroot, but as I told them in the email... "I mostly talk about it on the toybox mailing list. And patreon. And my twitter. And my blog..." (It had a mailing list but I stopped using it after a thing happened. I have a vague roadmap to merge it into toybox and stop doing it as a standalone project, but need to implement route and toysh in toybox first.)


March 10, 2019

And thunderbird filled up all memory, wasn't watching, didn't kill it fast enough, and it locked the machine hard. Had to power cycle. Wheee.

Lost 8 desktops full of open windows, most of which had many tabs. Rebuilding much state. The most irreproducible loss is, of course, all the thunderbird windows where I clicked "reply" to pop open a window to deal with later. Thunderbird keeps no record of that whatsoever. (Kmail would reopen them all when restarted, but alas that was bundled into a giant desktop suite and went down with the ship it was tied to the mast of. Pity, it was a much better mail client than thunderbird. Oh well.)

Once upon a time, Linux had an OOM killer that would kill misbehaving processes if the system was in danger of running out of memory and locking up. People complained that their process might get killed. So the kernel devs neutered the OOM killer so it doesn't work remotely reliably and now the whole system locks up as often as it's saved by the OOM killer, because killing _every_ process is clearly an improvement to killing _a_ process.

Sigh. Lateral progress.


March 9, 2019

Thunderbird's sluggish again so I tried to clean out the linux-kernel folder. Since this is the big machine with 16 gigs of ram and 8 gigs swap, I told it to move 96k messages instead of the usual 20k at a time. It moved all the messages, and then did its Gratuitous Memory Hog thing it always does at the end (because Thunderbird is programmed strangely). It ate all 16 gigs DRAM, worked its way through all 8 gigs swap, and then I called killall thunderbird from the ctrl-alt-F1 console before the machine could hard lock (because the OOM killer doesn't work anymore, no idea why).

And of course when I started it back up, none of the messages it had spent hours copying to the new folder had been deleted from the old one.

Could somebody not crazy write an email client? This doesn't seem hard. Far and away the _most_ annoying thing about thunderbird is when it pops up a pulldown menu or hovertext, and then freezes for 6 minutes doing something where the CPU or disk is pegged, and the darn pop-up follows me when I switch desktops, blocking whatever's behind it.

So now I tried right click delete... and it's moving 96k messages to a trash folder. Sigh. NO, DELETE THEM! NOT MOVE TO TRASH! NOW WHEN THIS CRASHES I'M GOING TO WIND UP WITH _THREE_ COPIES!

It's a good thing this machine has gigabytes of free disk space because this email client is written by idiots. And once you start one of these operations that's going to take 4 hours (and then maybe try to crash the OS again afterwards if you're not babysitting it), there's no way to interrupt it short of kill -9 which would leave the files in who knows what state...


March 8, 2019

Last day at JCI. Stress level: curled into a ball, whimpering.

Sigh. I'd really like to move the Android guys to a more conventional build approach, where the Android NDK toolchain is not just a generic-ish toolchain but is the one used by AOSP, so that 1) you can export CROSS_COMPILE=/path/to/toolchain/prefix- and if your build is cross compile aware it just works, 2) Android isn't shipping 2 slightly different toolchains that do the same thing.

They are reluctant to do this because A) windows, B) they see me trying to apply conventional embedded-ish development to android as weird. (Everybody except them is an app developer. This isn't how you build apps!)

Sigh. I keep going "this reduces to this, just implement the general case and it should work in a lot more situations" and getting "but that's not how we've ever thought of it, you'll confuse people". I get different variants of it from the linux kernel guys, the distro maintainers, embedded developers, the android guys, compiler developers... everybody's in their own niche.


March 7, 2019

I've been doing a review pass of pending/tar.c and adding a bunch of "TODO: reading random stack before start of array" and so on, and I've come to the conclusion I need to change the xpopen_both() api. Because if the child process needs its stdin or stdout hooked up to an existing filehandle, there's no current way to do that.

The way it works now is you pass in an int[2] array and it hooks up a pipe to each one that's zero, and writes the other end of the pipe into that slot (int[0] going to the stdin of the process and int[1] coming from the stdout of the process). But what I _want_ is if I feed an existing filehandle to the process, THAT filehandle should become the stdin or stdout of the process. (So gzip can read from or write to a tarball.)

Also, once upon a time I had strlcpy() which was like strncpy but would reliably add a null terminator and didn't do the stupid (memset the rest of the range after we copied). It was just something like "int i; if (!len--) return; i = strlen(src)+1; memcpy(dst, src, i>len ? len : i); dst[len] = 0;" and it worked fine. But unfortunately BSD had the same idea, and added it to libc in a conflicting way (const const const str const *const) and I think uClibc picked that up, so I switched to xstrncpy() which will error_exit() if the string doesn't fit. Which 99% of the time is what you want: don't silently corrupt data. BUT with tar and the user and group name fields...
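
Written out as a real function, that old helper would be roughly the following (reconstructed from the one-liner above, and renamed copy_trunc() here as a hypothetical name so it can't collide with the libc strlcpy()):

```c
#include <string.h>

// Copy src into a dst buffer of len bytes, truncating if it doesn't fit.
// Always NUL terminates (when len is nonzero) and doesn't zero-fill the
// space between the end of a short string and the end of the buffer.
void copy_trunc(char *dst, const char *src, size_t len)
{
  size_t i;

  if (!len--) return;
  i = strlen(src)+1;
  memcpy(dst, src, i>len ? len : i);
  dst[len] = 0;
}
```

With a 4 byte buffer, copying "abcdef" leaves "abc", and copying "hi" leaves "hi" with the terminator copied along with the string.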

Hmmm, except if they don't fit what _do_ we want? Truncating could (theoretically) collide with another name, and if the lookup by name fails we've already got UID/GID. (I did bufgetpwuid but didn't implement a negative dentry mechanism for optimizing _failed_ username lookups...)

Ah, it's using snprintf(), close enough. (I keep confusing that with strncpy, which is stupid and will memset the rest of the space with zeroes for no apparent reason. But snprintf() will just _stop_writing_ at the appropriate spot, leaving a null terminator and not gratuitously molesting the rest of the field.)


March 6, 2019

Last week at work. Totally listless. Paralyzed, basically. I'm stress eating and stress tweeting.

Also, SEI has resurfaced with Probably Money (not yet the same as Actual Money but you never know), and I've mentioned my recruiter found me a side gig (telecommuting getting a medical sensor board upgraded to new driver versions), and I'm kind of annoyed that I quit my $DAYJOB (which paid quite well) so I would have TIME, and that time is already filling up with other less-well-paying work.

I'm totally aware this is a self-inflicted problem, but... dude. I should be better at saying no.


March 4, 2019

Dreamhost has been poking me about renewal for landley.net. Got the check in the mail today.

(I know way too much about how the sausage is made to be comfortable doing financial transactions online. I'm aware it's silly, and yet...)


March 3, 2019

Poking at toys/pending/tar.c and of course the first thing I do (after a quick scan and some "this sprintf is actually a memset" style cleanups) is build it, make an empty subdirectory, and "tar tvzf ~/linux-4.20.tar.gz". And I get a screen full of "tar: chown 0:0 'linux-4.20/arch/mips/loongson64/common/serial.c': Operation not permitted".

Sigh. This is unlikely to be a small task.


March 2, 2019

Fighting bad Linux userspace decisions.

So top -H is showing the right CPU usage for child threads, but the main thread of a process has the cumulative CPU usage. I _think_ this is because /proc/$PID/stat and /proc/$PID/task/$PID/stat have different data (I.E. the kernel is collating when you read through one API but not reading the same data through another API).

I have a test program that spawns 4 child threads and has them spin 4 billion times in a for(;;) loop, and I just poked it to dprintf(1, "pid=%d") the PID and TID values (to a filehandle so I don't have to worry about stdio flushing for FILE *), and I hit my first problem: glibc refused to wrap the gettid() system call? (What the... the man page bitches about thread IDs being an "opaque cookie" and I'm going "this is legacy crap from back when pthreads was an abomination, before NPTL, isn't it?") Sigh, so use syscall() to call gettid so I have the number I can look in /proc under.
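
The workaround is tiny. A Linux-only sketch (my_gettid() is just a name for this example):

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Ask the kernel for the calling thread's ID directly, since libc doesn't
// provide a wrapper. This is the number that shows up under /proc/$PID/task.
pid_t my_gettid(void)
{
  return (pid_t)syscall(SYS_gettid);
}
```

In the main thread this returns the same value as getpid(); each child thread gets its own, which is what something like dprintf(1, "pid=%d\n", my_gettid()) would print from inside the thread function.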

Second problem: the process doesn't _end_ until the threads finish spinning and exit, which means the output doesn't close, so my little pipeline doing:

./a.out | sed -n s/pid=//p | (read i; cat /proc/$i{,/task/$i}/stat)

Is sitting there blocked in read until a.out exits, at which point the cat says the /proc entries don't exist anymore. This is DESPITE the fact that if you chop it at the second | you get the value followed by a newline immediately! It's just that bash's read is blocking trying to get more data AFTER the newline, for reasons I don't understand? (Even read(4096) should return a _short_read_. And yes the "read i;" needs a sleep 1 after it to accumulate enough data to see the divergence reliably, but this bug hits first and that confuses debugging right now.)

This totally needs to be a test case for toysh. My "bash replacement" should get this RIGHT, even if ubuntu's bash doesn't. (I was even desperate enough to check /bin/dash, which also got it wrong in the same way. Well, ok dash didn't understand the curly bracket syntax, but it waited out ./a.out's runtime _before_ getting that wrong.)


March 1, 2019

Two different coworkers basically need the toybox version of a command to fix a problem they're having. One is that busybox's ar can't extract an ipk file, another is a busybox tar bug where if you tar -xC into a subdir that results in broken symlinks (in this case a root filesystem install from initramfs into a mount point where /etc/localtime points to a timezone file that's there in the subdir but the symlink points to the absolute path of where it would be on the final system), busybox tar does NOT chown the symlink. So the symlink belongs to root:root instead of whoever it's supposed to belong to, even though the tar file has the right info.

Alas, I haven't implemented toybox tar and ar because I've been too busy with $DAYJOB. I'm not sure if this is ironic or merely unfortunate. I'd ask Alanis Morissette, but I'm told she had problems with that too.


February 28, 2019

It's the last day of the month and I kept meaning to check if any conference calls for papers were expiring, but I just couldn't bring myself to care.

I told my boss at $DAYJOB on monday I'm too burned out to accomplish anything else, but they still haven't let me know when my last day is. They keep saying they're _not_ unhappy with my performance on the morning call, but _I_ am unhappy with my performance.

One of the big differences between my mental health in my 20's and now is I know when I need to bow out for self care. (I often miss when I _should_, but am reasonable about working out when I _need_ to.)


February 27, 2019

I'm doing board bringup for that side gig, and they just emailed me a large explanation of the hardware they need working. I unboxed the new board yesterday and confirmed the bits connect together, but haven't actually powered up the result yet.

My first goalpost on any new board is "boot to an initramfs shell prompt on serial console", at least when I'm trying to understand everything and rebuild it properly from source. Getting that working means:

1) Our compiler toolchain is generating the right output for the board, both in kernel mode and userspace/libc.

2) We know how to install code onto the board and run it. (Whether it's tftp into memory or flash it to spi or jtag or what.)

3) The bootloader is working, running itself and doing setup (DRAM init, etc), then loading and starting our kernel.

4) If we get kernel boot messages then the kernel we built is packaged correctly, has a usable physical ram mapping, and is correctly writing to the serial port.

5) If we can run our first program (usually rdinit=/bin/sh) then the kernel is enabling interrupts properly (the early_printk stuff above is simple spin-and-write-bytes with interrupts disabled, that's why printing fewer early boot messages can speed up the board booting), finding a clock to drive the scheduler, and this is where we verify the libc and executable packaging parts of the toolchain work right (because we're finally using them; often I do a statically linked rdinit=/bin/helloworld first if it's giving me trouble.)

When we're done "I built and ran a userspace program that produced output" means I should be able to build arbitrary other ones, and a toybox shell is the generic universal "do lots of stuff with the board" one, where you can mount /proc and /sys and fiddle with them, load modules, etc. That's basically where you get traction with the board.

When an existing BSP gives you a working Linux reference implementation, most of these steps are probably just isolating and copying what it's doing, but I like to step through and move all that stuff into the "I know what it's doing, or at least where to look it up if it breaks" category on any new board I have to support in a nontrivial way.

Then the next thing is usually digesting the kernel .config into a miniconfig and seeing what's there, coming up with the minimal set of options to do the shell prompt thing and cataloging the rest of them.


February 26, 2019

I'm trying to figure out if my normal response to spam callers is "punching down". I always try to hit the buttons to get through to a human, then say "You spam people for a living. That's sad." and then hang up.

The problem is, I'm doing this to the minimum wage drones in some poverty-stricken rural area who are... doing it for a living. Not the people benefitting from it and collecting 90% of the money from whatever scam it is. But alas, this is the only way I know to push back. (It's not like our current government will do anything about it, not until the GOP finishes imploding, which won't happen until the Boomers die and the fossil fuel companies lose their position as 1/6 of the planet's economy.)


February 25, 2019

Told my boss I'd like to wrap up at work. The money is _lovely_ and this is work I could do in my sleep _if_ I could do it. Unfortunately I've got a variant of writer's block, which is a bit like having a big term paper due and being unable to start because you're so stressed out.

I've been spinning my wheels here so long that I've exhausted my coping mechanisms.


February 22, 2019

How is this page's bit on toybox wrong, let me count the ways:

The Toybox license, referred to by the Open Source Initiative as the Zero Clause BSD license,[7] removes all conditions from the ISC license, leaving only an unconditional grant of rights and a warranty disclaimer.[8] It is also listed by the Software Package Data Exchange as the Zero Clause BSD license, with the identifier "0BSD."[9]

It's not important that it's from toybox, other projects use it too. It was the OpenBSD suggested template license and I got Kirk McKusick's permission to call it zero clause BSD. It doesn't remove _all_ conditions, it removes half a sentence. And SPDX approval came long before OSI, so a better phrasing would be:

The Zero Clause BSD license [7] (SPDX identifier "0BSD"[9]) removes half a sentence from the OpenBSD suggested template license [https://www.openbsd.org/policy.html], leaving only an unconditional grant of rights and a warranty disclaimer.[8]

Anybody want to edit wikipedia[citation needed] to fix this?


February 21, 2019

Still deeply burned out.

VirtualBox's .vdi files provide "sparse" block devices that grow as you use more space in them (up to the maximum size specified at creation time). The ext4 filesystem assumes any block device it's on might be flash under the covers, and attempts to wear level them via round-robin allocation.

Guess how these two interact! Go on, guess!

I set up a new VM, and because my previous one ran out of space I was generous about provisioning it, thinking it would only use the space when it actually needed it. After deleting two other VMs and a DVD iso and trying to figure out why a VM using 60 gigs in the virtual Linux system was consuming 160 gigs on the host...

I had a BAD DAY. And now I need to redo the VM from scratch because even if I could shrink the ext4 partition (the resize tool can grow them while mounted, but not shrink them), I dunno how to tell the emulator to give back the space it would stop using...

Darn it, I was excited about this, but no. The person who pointed me at it said it was a bash test suite that might help me with toysh being a bash replacement. But the readme didn't say what to _do_ to run the bash tests. I figured out that bin/bats was the thing to run, but its output with no arguments was useless and --help didn't really help either. I eventually figured out "bin/bats test" but then it only ran 44 tests and they tested the test suite, not the shell?

At which point I figured out that it's not a shell test, it's test plumbing written _in_ bash. That's useless, I've written and _published_ 2 sets of test infrastructure in bash myself already (one in toybox, one in busybox). That's uninteresting, what's interesting is the _tests_, and this has none. And it's doing the "#!/usr/bin/env bash" thing which is INSANE: why do you trust /usr/bin/env to be there at an absolute path? Posix doesn't require that. Android (until recently) didn't even have a /bin directory. It's /bin/bash even on weird systems like MacOS X. The ONLY place that installs it but puts it somewhere else is FreeBSD, and that's FreeBSD-specific breakage. It's a fixable open source system: drop a symlink and move on. (Just like we all fix /bin/sh pointing to the defective annoying shell on debian.)


February 20, 2019

Upgrades to the su command came up recently, and it's been on my todo list forever: if you want to run a command as an arbitrary UID/GID, it's kinda awkward to do so with sudo or su because both conventionally want to look up a name out of /etc/passwd, and will error out on a uid with no passwd entry even for root. But these days with things like containers, there's lots of interesting UIDs and GIDs that aren't active in /etc/passwd. (And then there's the whole android thing of not having an /etc/passwd and using their version of the Windows Registry instead, because keeping system information in human readable text files is too unixy or something....)

So anyway, I want su -u UID and su -g GID[,gid,gid...] to work, at least for root. And I want to be able to run an arbitrary command line without necessarily having to wash it through a random command shell. And _implementing_ this is fairly straightforward. No, the hard part is writing the help text to explain it, especially if I've kept compatibility with the original su behavior.

A word on the legacy su behavior: way back when setting a user's shell in /etc/passwd to /bin/false or /dev/null was a way of preventing anybody from running commands as that user. Then su grew -s to override which shell you were running as, so this stopped working from a security standpoint. (Besides, if you were running as root you could whip up a trivial C program to do it anyway, but the point was _su_ no longer enforced it.) And it let you specify -c to pass a command line to that shell so su could "run a command as a user" instead of being interactive, so this ability is already _there_ for most users, just awkward to use.

But su has an awkward syntax where it runs a shell and unrecognized options are passed through as options _to_the_shell_. (So the -c thing was kind of accidental at first.) So using su as sudo isn't just "su user ls -l", it's su user -s $(which ls) -l if you don't want to invoke a gratuitous shell in between. And defining new su options means they _don't_ get passed through to the shell.

What would have made sense was a syntax like xargs, where the first command that's not an option stops option parsing for the wrapper. But that's not what they did back circa 1972...


February 19, 2019

Burnout. So much burnout.

When I came to this job a year ago, I was interested in the technology. I was helping get realtime Linux running on an sh4 board. (The larger context was they shipped a Windows CE product back in the 90's, and Windows CE was being end of lifed by Microsoft. So this Microsoft shop was switching to Linux, which I'm all for and happy to help with. As for the sh4 boards, they had a bunch of this hardware installed at customer sites, and a large stock of part inventory to make more boxes with at the factory, so getting Linux running on those was useful to them.)

Coming _back_ in January was because the money was good, it was easy to just keep going, I didn't have another job lined up, and we've still got about half the home equity loan to pay off from riding down SEI.

But this time... they've already built up a reasonable Linux team (including people I know like Rich Pennington of ellcc and Julianne Haugh of the shadow password suite), all the new work is on standard x86 and arm boxes with gigahertz and gigabytes, they're using wind river's fork of yocto's fork of openembedded with systemd ru(n|i)ning everything, the application is still dot net code running on mono talking to a windows GUI app...

And I'm not entirely sure what I'm doing. Not "I don't know how to do this", I mean what am I trying to accomplish? What is this activity _for_?

I'm part of an enormous team where we have over a dozen people in a room for an hour twice a week going over excel spreadsheets reacting to comments on the design of things like "background file transfer" (strangely not rsync) which is somehow a 12 week project for over a dozen people, told "this is what you're doing this week" more or less via the waterfall method. There's an API document, an implementation of this API via gratuitous translation layer with a management daemon using dbus to talk to systemd, and then functions you plug in for a given architecture that the guy who wrote the daemon could have done in a couple hours.

I think this has turned into a "bullshit job". And I am unhappy. The money remains excellent, but... that's pretty much it.


February 18, 2019

If I titled blog posts, this one would be "Tabsplosion is a symptom of overload".

When I say "that's on the todo list", I'm fudging a bit. The toybox todo list does indeed have a todo.txt. And a todo.today. And a todo2.txt, todo3.txt, todo/*.txt, and various commandname.txt files with notes on individual commands.

My toybox work directory (for a couple years now) is ~/toybox/toy3, following my convention of doing a git checkout in a directory with the name of the project, so various debris that doesn't get checked into git has someplace to live. This _starts_ as ~/toybox/toybox and there's a ~/toybox/clean for testing that I've committed sane chunks and it builds properly. Eventually so much half-finished cruft builds up in my work directory I clone a clean one and do some "needs to happen fast" project in there, and keep the old one around in hopes of salvaging the old work. (Which, as with viewing bookmarked pages again, never happens. This is why I have so many open tabs, there's a _chance_ I'll get back to the todo item it represents.)

This is how I wound up with toy3. (And in fact a toy4 and toy5 that didn't stick.) Those other directories have their own todo files in them. (Much of which overlaps, but not all.)

And then there's ~/toybox/pending which is full of things like a checkout of Android's minijail, libtommath, jeff's fixed point library from the GPS stuff we did, my old dvpn code (from like 2001), the rubber docker containers tutorial I attended at linuxconf.au, a CC0 reference implementation of sha3, snapshots of this and this in case the pages go down, and so on. The todo item is implicit in a lot of those. :)

I also have old tweet threads and blog entries and such that I should collate at some point. A lot of my todo items point to those.

As for the topic sentence, my todo list grows fastest when I don't have time to follow the tangents right now. So I make a note of it and move on.


February 17, 2019

The bus back from Minneapolis left at 9:25pm, and was supposed to get in at 3:30 am but got in at 4am.

I'm still using the giant System76 laptop from 2008, which is 6 years old but has 16 gigs of ram and 8x processor and a terabyte hard drive and is fairly reasonable now that I've gotten a new battery for it, except for 2 things. It's still fairly ginormous, and the hard drive is rotating media so I'm nervous using it in a high-vibration environment. Such as on my lap on a bus for 6 hours, even when there is a working outlet.

A coworker at Johnson Controls (Julianne Haugh, the long-ago author of the Shadow password suite) has a "laptop" that's a tablet with a case and keyboard. Except it's a mac. I want an Android device that does that (and in theory I can add a 128 gig sd card to however much built in storage the sucker has so I should be able to get something reasonable), but every time I actually buy something it's a cheap clearance device like the annual Amazon Fire tablet sales during "prime day", and they're so locked down that it's just not worth the effort to crack them. This is a structural problem: what I'm trying to do with toybox is turn android into a usable general-purpose computing environment you can actually use as a development workstation more or less out of the box, but they're terrified of the "evil butler" problem. (Which isn't _just_ a tablet problem, EFI TPM nonsense does this for PCs, there are periodic LWN articles on that.) You should be able to aftermarket blank 'em, but how do you distinguish that from "an organized crime organization like the NSA or GOP sent a dude into your room for 30 minutes while you're at dinner and now your device serves them not you until they decide to assassinate you"?

Sadly, I haven't installed devuan on the other System76 oversized monstrosity because firmware nonsense and too busy to care. I got email from System76 that they've introduced a laptop to their lineup that _isn't_ visible from space, but I don't trust them. If buying System76 _doesn't_ mean I can just slap an arbitrary Linux distro on it because it's full of magic firmware that never went upstream, what's the _point_? If I have to install a magic distro-specific Linux distro fork, I might as well get a GPD Pocket or something.


February 16, 2019

Hanging out with Fade in Minneapolis. I have deployed heart-shaped butterfingers at her. (It's her favorite candy bar, and there was a sale.)

Yay, the github pull request adding 0BSD to the license chooser got merged!

This means I have developed just enough social skills to disagree with someone about how to help without pissing them off to the point they no longer want to help! (Although it's still a close thing, I wouldn't say I'm _good_ at this. I'm still far too easily irritated and have to really _push_ to compromise. In this case that would mean swallowing my principles and editing a wikipedia page directly.)


February 15, 2019

There are over 100 toybox forks on github. I did not expect that. Hmmm... The most forked of which just added a logo and half an "rdate" command, back in 2016...

The downside of 0BSD licensing is when you find a nice patch in an external repo that wasn't submitted upstream (or if it was, I missed it), I'm nervous about merging it because forks of toybox are not actually required to be under the same license.

In this case the repo it's checked into still has the same LICENSE file and no notes to the contrary, and I can probably rely on that, but I'm still nervous and like to ask. Submissions to the list mean they want it in, which means it has to be under the right license to go in. The submission _is_ the permission grant, the specific wording is secondary.

The intent of the GPL was to force you to police code re-use: if you accidentally sucked GPL code into your project, you had to GPL your project. (In reality you just as often had to remove it again and delete the offending version, as Linux did with the old-time unix allocation function Intel contributed to the Itanic architecture directory back during the SCO trial. Solving infringement via a product recall and pulping a print run has plenty of precedent.)

Then GPLv3 happened and "the GPL" split into incompatible versions, and suddenly you had to police your contributions just as hard, your GPLv2 or later project couldn't accept code from GPLv3 or GPLv2-only sources, and the easy thing to do was break GPLv2-only. These days there's no such thing as "The GPL" anymore, thanks to the FSF. "The GPL" fragmented into three main incompatible GPL camps (GPLv2 and GPLv3 can't take code from each other, and the dual license of "GPLv2 or later" can't take code from either one), and then there's endless forks like Affero GPL complicating it further. This means there is no longer a "universal receiver" license covering a united pool of all copyleft code into a single common community of reusability, which is why copyleft use has slowly declined ever since GPLv3 came out. These days with GPL code you have to police in both directions, incoming _and_ outgoing code.

0BSD goes the other way from the glory days of "The GPL": you have to be careful about accepting contributions (and I'm more paranoid than most about that, having been involved in more copyright enforcement suits than any sane person would want). But what that buys you is the freedom for anyone wanting to reuse your code elsewhere to just do it, whenever, wherever, and however they like. No forms to fill out, no signs to post, have fun. They don't even have to tell me if they did it. (The internet is very good at detecting plagiarism, I'm not worried about that.)

A fully permissive license holding nothing back is the modern equivalent of placing the code into the public domain. The Berne convention grants a copyright on all newly created works whether you want it to or not (the notice is just for tracking purposes of _who_ has the copyright, so you're not in the "the original netcat was written by 'hobbit', how do I get in touch with 'hobbit' or their estate?" situation), but there's no enabling legislation for disposing of a copyright. You can't STOP owning a copyright, except by transferring it to someone else.

And thus the need for public domain equivalent licensing. You can't free(copyright) but you can work out a solution.


February 14, 2019

Date is funky. The gnu/dammit date didn't implement posix, and busybox gets it wrong. Time zones changing names because of daylight savings time.

Testing day of the week. Found a hack. Coded it up. Went to test it.

$ ./date -D %j -d 1
Sun Jan  0 00:00:00 CST 1900
landley@halfbrick:~/toybox/toy3$ busybox date -D %j -d 1
Thu Feb 14 00:00:00 CST 2019

Sigh.

The C API for this is kinda screwed up too, although we need a new one that handles nanoseconds anyway.
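A minimal sketch of one way to turn a bare day-of-year (what %j parses) into a full date, including the day of the week. This is not necessarily the hack mentioned above, and the function name yday_to_date() is made up for illustration. It relies on standard mktime() behavior: out-of-range fields such as "January 45th" get normalized, and tm_wday and tm_yday are filled in.

```c
#include <string.h>
#include <time.h>

// Convert a 1-based day of the year (the value %j describes) into a fully
// populated struct tm by letting mktime() normalize "January Nth".
// Returns 0 on success, -1 if mktime() can't represent the date.
int yday_to_date(int year, int day_of_year, struct tm *out)
{
  memset(out, 0, sizeof(*out));
  out->tm_year = year - 1900;
  out->tm_mon = 0;             // January...
  out->tm_mday = day_of_year;  // ...the 45th, say: out of range on purpose
  out->tm_hour = 12;           // noon, to stay clear of DST transitions
  out->tm_isdst = -1;          // let libc decide whether DST applies

  // mktime() rewrites out-of-range fields and fills in tm_wday and tm_yday.
  return mktime(out) == (time_t)-1 ? -1 : 0;
}
```

Calling yday_to_date(2019, 45, &tm) should leave tm describing Thursday, February 14, 2019. The year has to come from somewhere (here it's an argument), because a day-of-year alone doesn't say whether it's a leap year.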


February 13, 2019

The biggest sign that "const" is useless in C is that string constants have been rodata forever, but their _type_ isn't because that would be far too intrusive.

Putting "const" on local variables or function arguments doesn't affect code generation (which has liveness analysis). It can move globals from the "data" segment to the "rodata" segment, which is nice, and which the compiler can't work out for itself without whole-tree LTO because the use crosses .o boundaries, but everywhere else it just creates endless busywork propagating a useless annotation down through multiple function calls without ever affecting the generated code.

I periodically recheck on new generations of compiler to see if it's _started_ to make a difference, but I don't see how it can because liveness analysis already has to happen for register allocation/saving/restoring, and that covers it better than manual annotation can? In this respect "const" seems like "register" or non-static "inline", ala "Ask not for whom ma bell tolls: let the machine get it".

Sadly, even though I do add "const" to various toybox arrays to move them into rodata, the actual toy_list[] isn't const because sticking "const" on it wants to propagate down into every user through every function argument (otherwise it's warning city and in fact errors out about invalid application of sizeof() to incomplete types when all I did was add "const" in two places).
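A small sketch of the three cases discussed above, with made-up names (day_names, day_name(), literal_text()); none of this is toybox code. Where the table actually lands depends on the toolchain: typically .rodata, or a relocated read-only section in position-independent builds.

```c
#include <stddef.h>

// "const" on a global table lets the toolchain place it in read-only
// storage instead of .data (check with "size" or "objdump -h" on the .o).
static const char *const day_names[] = {
  "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
};

// A string literal already lives in read-only storage, but in C its type
// is plain char[], so this assignment needs no cast and draws no warning
// by default. Writing through the pointer is still undefined behavior.
static char *unqualified = "read-only in practice";

// The const on the argument documents intent, but the compiler can see for
// itself that "index" is never modified, so it isn't expected to change
// the generated code.
const char *day_name(const int index)
{
  size_t count = sizeof(day_names) / sizeof(*day_names);

  if (index < 0 || (size_t)index >= count) return NULL;
  return day_names[index];
}

// Accessor so the example has something to call for the literal.
const char *literal_text(void)
{
  return unqualified;
}
```

Comparing the output of "size" on the object file with and without the const qualifiers on day_names is a quick way to see whether the annotation moved anything for a given compiler and set of flags.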


February 12, 2019

Phone interview with the side gig, I'd get to poke at a new architecture (we are the knights who say nios) which qemu has a thing for! But no musl support for it, and Linux support is out of tree? Really? (A whole unmerged architecture that people are still using?) It's frustrating there's no easy way to get qemu-system-blah to tell you what it provisions a board emulation with. (How much memory, I/O controllers, disks, network, USB...)

It would be nice if "qemu-system-nios -M fruitbasket --whatisit" could say these things. The board has to _know_ them, somehow. Maybe through the device tree infrastructure? I might try to teach it, but all my previous qemu patches languished unmerged for years. Not worth the effort.


February 8, 2019

Very very tired. Went off caffeine monday but it's 4 days later and still tired. Burned out, half days yesterday and today.

I turned down a job in Minnesota a recruiter offered me. 20% less money isn't a deal breaker, but... they're not on the green or blue lines? It's an hour and a half each way to Fade's via public transit (green line, bus, then walk) so I'd need to get an apartment near the work site to avoid a longish commute from the university (and Fade), and they're in some sort of suburban industrial park where there are family houses but no efficiency apartments? And this employer moves to seattle in june anyway.

Contracting company at the recruiter I got the JCI job through wants me to skype with somebody for evening and weekend jobs. It would pay off the home equity loan faster...


February 6, 2019

I'm trying to build Yocto in a fresh debootstrap. You'd think this would be documented, but it's a bit like the "distros only build under earlier versions of itself" problem, because Yocto is a corporate pointy-haired project and Red Hat is Pointy Hair Linux.

As a first pass I want to run a yocto instance under qemu, but when I downloaded it yocto wanted me to install a bunch of packages like "makeinfo" that I don't want on my host system. Hence debootstrap chroot.

So install debootstrap (I used apt-get on ubuntu), then the wiki instructions say the setup is:

debootstrap stable "$PWD/dirname" http://deb.debian.org/debian/

Where "stable" is the release name, next argument is the directory to populate, and the third is the repository URL to fetch all the packages and manifest data from.

So clone yocto (git clone git://git.yoctoproject.org/poky), checkout the right branch (current stable appears to be "thud"), and then "source oe-init-build-env" and...

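# As root inside the chroot: mount /proc, /sys, and /run, then set up a UTF-8 locale.
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t tmpfs tmpfs /run
apt-get install locales &&
echo en_US.UTF-8 UTF-8 >> /etc/locale.gen &&
locale-gen &&
update-locale LANG=en_US.UTF-8

# Then switch to a non-root user and build from that shell:
su - user
cd /home/poky &&
source oe-init-build-env &&
LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 bitbake core-image-minimal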

What on earth is a uninative binary shim? All I can find is this and it's at best "related". It's downloading a binary it has to run on the system, and can't build from source. So much for building yocto on powerpc or sh4 or something. Thanks yocto!

Python 3 refuses to work right if you haven't got a UTF8 locale enabled, and yocto's bitbake scripts explicitly check for this and fail... but don't say how to fix it. So I read the python docs and downloaded the python 3 source code. Python's getfilesystemencoding() is calling locale.nl_langinfo(CODESET) (at least on unix systems), which comes from langinfo_constants[] in _localemodule.c in the Python3 source...

Right, you have to install the "locales" package, then run locale-gen, but the online examples showing how you can feed it a locale on the command line are wrong (including the one in the "Setting up your chroot with debootstrap" section of the ubuntu wiki): it ignores the command line. You have to edit the locale.gen file to add the locale you want, then you need to update-locale to get it to use it, and THEN you can set the LC_ALL environment variable.
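For poking at this from C instead of through Python, here's a minimal sketch that asks libc the same question via nl_langinfo(CODESET). The function names current_codeset() and locale_is_utf8() are made up for illustration, and the exact strings returned for a non-UTF-8 locale vary between C libraries.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

// Report the character encoding of the locale selected by the environment
// (LC_ALL, LC_CTYPE, LANG). Falls back to the "C" locale when the requested
// locale isn't installed, which is the situation being debugged above.
const char *current_codeset(void)
{
  if (!setlocale(LC_ALL, "")) setlocale(LC_ALL, "C");
  return nl_langinfo(CODESET);
}

// Returns 1 if the active locale uses UTF-8, 0 otherwise.
int locale_is_utf8(void)
{
  const char *codeset = current_codeset();

  return !strcmp(codeset, "UTF-8") || !strcmp(codeset, "utf8");
}
```

If LC_ALL names a locale that was never generated, setlocale() fails and the fallback reports whatever the "C" locale uses, so a 0 from locale_is_utf8() inside the chroot is a hint that locale.gen still needs editing.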

Darn, yocto's parallelism ignores "taskset 1 cmdline...". It's building on an 8x SMP machine so it's trying to do 8 parallel package downloads through phone tethering, and the downloads keep timing out and aborting. Hmmm... Google, google... It's bitbake controlling this, I can set the environment variable BB_NUMBER_THREADS to the number of parallel tasks.

Ok, core-image-minimal is currently building gnome-desktop-testing and libxml2. I object to 2 of the 3 words in this target name. I'll give them "image". Yeah, I accept that this is probably an image. But gnome-desktop-testing is neither core, nor minimal.


February 5, 2019

Doing release cleanup on sntp.c I hit the fact that android NDK doesn't have adjtime(). Grrr. I dowanna add a compile-time probe for this, and unfortunately while I have USE_TOYBOX_ON_ANDROID() macros to chop out "a" from the optstr, I never did the SKIP_TOYBOX_ON_ANDROID() macros (only include contents if this is NOT set) because I haven't needed them before now.

Sigh, I can just #define adjtime to 0 in lib/portability.h. It's a hack, but android isn't using this anyway (they presumably set time from the phone baseband stuff via the cell tower clocks, not via NTP). It doesn't make the whole code stanza drop out like making FLAG(a) be zero would (then the if(0) triggers dead code elimination), but... I wanna get a release out already, it was supposed to happen on the 31st.
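A rough sketch of the USE/SKIP macro idea, for anyone who hasn't seen the pattern. Every name here (CFG_ON_ANDROID, USE_ON_ANDROID, SKIP_ON_ANDROID, FLAG_a, optstr, apply_offset) is hypothetical and chosen for illustration; it's not the actual toybox plumbing, just the general technique of macros that either keep or swallow their arguments.

```c
#include <string.h>

// Hypothetical config switch; in a real build this would come from a
// generated header. Set to 1 to simulate building for Android.
#define CFG_ON_ANDROID 0

// USE_x(...) keeps its arguments when the option is enabled and drops
// them otherwise; SKIP_x(...) is the inverse.
#if CFG_ON_ANDROID
#define USE_ON_ANDROID(...) __VA_ARGS__
#define SKIP_ON_ANDROID(...)
#else
#define USE_ON_ANDROID(...)
#define SKIP_ON_ANDROID(...) __VA_ARGS__
#endif

// Adjacent string literals concatenate, so "a" vanishes from the option
// string when SKIP_ON_ANDROID() expands to nothing.
static const char optstr[] = "s" SKIP_ON_ANDROID("a") "p:";

// Expands to the constant 1 or 0, so a test against it can be resolved
// at compile time and the guarded branch discarded.
#define FLAG_a SKIP_ON_ANDROID(1) USE_ON_ANDROID(0)

// Returns 1 if the clock would be slewed (the adjtime() path), 0 if it
// would be stepped. The real calls are left out of this sketch.
int apply_offset(int want_adjtime)
{
  if (FLAG_a && want_adjtime) return 1;
  return 0;
}
```

With CFG_ON_ANDROID set to 1, optstr becomes "sp:" and the if() condition is a constant 0, which is what lets dead code elimination remove the whole stanza without needing a stub for the missing function.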


February 4, 2019

Ok, toybox release seriousness. What do I need to finish up to cut a release...

SNTP is the main new command and I've already used the "Time is an illusion, lunchtime doubly so" Hitchhiker's Guide quote. Oh well.

I've got an outstanding todo item from the Google guys about netcat, but it's a bug I found so I haven't quite been prioritizing it. (As in nobody else reported this bug to me, so it's not holding anybody else up.) Still, I got the ping (once they know about it, they wanted it fixed)...


February 3, 2019

Greyhound topped itself on the bus ride back to Milwaukee. Of course it left most of an hour late, when we got on it hadn't been cleaned (my seat's drink holder had an empty coke bottle in it), for the first time in my experience they checked photo IDs (and the woman behind me couldn't get on the bus because she hadn't brought hers, bus left without her), somehow 2 stops later every single seat was full even though they'd left a through-to-chicago passenger behind at the first stop, the outlets didn't work for the first 2 hours, and the heat was stuck on full the entire time and was somewhere over 80 degrees. (Eventually they opened the emergency exits on top of the bus and left them open so we wouldn't die, but it was never comfortable.) Around 6pm the bus tracker web page decided that before the next stop our bus would travel back in time to 1pm and continue on to retroactively reach chicago around 3:30 pm (going something like 200 miles per hour along that part of the route), and we were kind of looking forward to it by that point but alas, we were disappointed. Then they switched drivers in Madison, and the new driver started heading south straight to Chicago and had to BACK UP to go to Milwaukee when enough people checking Google Maps noticed and yelled at them. Over the intercom the driver claimed to have "missed an exit", and threatened to pull over and let anybody who complained out on the side of the road (we were in Janesville at that point, 40 miles south along I-90), and then drove back north (reconnecting with I-94 at Johnson's Creek) instead of taking I-43 diagonally to our destination. According to phone speedometer apps, on the trip north (along non-interstate roads) the bus sometimes got up to 55 miles per hour, but averaged less than that.

Still, I arrived in Milwaukee only 2 hours late. Not my worst greyhound trip, but still memorable. (Beats the trip _to_ minneapolis where the driver intentionally triggered feedback on the intercom six times and said "wakey wakey" between each one as we got in around 1:30 am.) I'm told Greyhound was an oil company ploy to discredit travel by bus and encourage individual driving instead. Given that the "buy up the busses and destroy them to promote freeways" plot in "Who Framed Roger Rabbit" is the part of the movie based on real events (in our world, they did it and won)...

There's also a significant element of "punishing people for being poor" going on here. I'm taking the bus not just because it's cheaper, but because the shortage of direct flights from milwaukee to minneapolis gives me a lot fewer departure options, and even with a direct flight the "arrive 2 hours early at an airport many miles south of town" plus the minneapolis airport requiring multiple transfers to get to Fade's apartment via public transportation (meanwhile greyhound is right on the Green Line, which lets off about 500 feet from Fade's apartment)... end result is the bus gets me there about as fast as flying, and if I'm lucky I can work the whole way. The bus terminal's a 15 minute walk from work without having to opt out of the Porno-Scanners for the Freedom Grope.

But there's very very strong signaling "this is for the Poors, you shouldn't be here if you have any other choice, we punish you now"... ("We" being "republicans", which is a "we" I personally am very much NOT a part of even when I'm not hanging out with the tired poor huddled masses yearning to breathe free that they despise so much.)


February 1, 2019

Our story so far: I got the record-commands plumbing checked into toybox and hooked up to mkroot, and along the way I found and fixed a sed bug that was preventing commands from building standalone with toybox in the $PATH. (The regex to figure out which toys/*/*.c file this command lives in was returning empty, because -r wasn't triggering.)

So I fixed that, got the record-commands wrapper hooked up, built everything, and... all the targets built? Except I just fixed _sed_ and I knew the kernel build break was a _grep_ bug because replacing the airlock's grep symlink with a link to the host's grep made the build work! (I often do "what commands changed recently" guesses like that before trying to narrow it down systematically...)

Sigh. I pulled linux-git to a newer version so I'm not quite testing the same kernel source, or was it 4.19 or 4.20 I was testing? I hate when things start working again when I DIDN'T FIX THEM, it just means I lost a test case and whatever loose flakiness it revealed is still there but has gone back into hiding. It's possible switching grep versions changed something that got fed into sed, but that's still a bug: the output should be the same.

Darn it, now I've got to waste time figuring out how to break it again the right way.


January 31, 2019

Bus to Minneapolis so I can spend my birthday tomorrow with Fade.

I emailed Linus about arch-sh not booting, he pointed me at a pending fix that hadn't quite made it into mainline yet, and I confirmed it fixed it for me, but oddly lkml.iu.edu has both my emails but not Linus's in between?

Yesterday's toybox build break wasn't a grep bug, it was a sed bug, which broke toybox building anything with toybox in the $PATH. (The regex to figure out which toys/*/*.c file this command lives in was returning empty, because -r wasn't triggering.) Apparently I haven't got a tests/sed.test that checks "does -r do anything".


January 30, 2019

It's -20F out. The expected high is -7. I got permission to work from home today. (Mostly poking at yocto and going "huh".)

There's some sort of bug in grep that's breaking the kernel build, but I haven't reduced it to a test case yet, and what I used to use for this sort of thing in aboriginal linux was my old command line logging wrapper. So I spent most of a day getting the command line wrapper logging merged into toybox and integrated into mkroot, and... the toybox build is broken by the same grep bug, which means the logging wrapper install won't work in the context of the airlock (I.E. I can't build toybox with toybox in the $PATH, due to the bug I'm trying to _diagnose_).

Going back to bed.


January 28, 2019

It's too cold. And we have 8 inches of snow. My normal 20 minute walk to work (12 if I hurry) took 35 minutes today, including helping push a stuck car out of an intersection (along with a snowplow driver who got out to push on the other side).

When I got in only two coworkers I recognized were here. I'd go home early, but I'm already here and outside is the problem.


January 26, 2019

Busy week at work, wasn't sleeping well. Meant to spend today working on toybox release, but spent it recovering instead.

The big overdue thing at work is "timesync", which is where the SNTP stuff comes in. Back in late October we tried to figure out how the box keeps its clock up to date: it was close enough to just doing standard NTP that people had glossed it over as NTP... but not quite.

First of all, it's using SNTP ("Simple Network Time Protocol"), which is a subset of the NTP protocol (same 48 byte UDP packets with fields in the same place) that oddly enough has its own set of RFCs, and then in NTPv4 it all got bundled into one big SNTP+NTP RFC that's more or less illegible. So I went back to the earlier ones and am pretty much just implementing the old stuff and asking wikipedia[citation needed] whether it's safe to ignore whatever they changed.

An SNTP client can read data from an NTP server (it just doesn't care about several of the fields), but an NTP client can't read from an SNTP server (the fields SNTP doesn't care about are zeroed), and windows "NTP servers" tend to be SNTP. So if you use the Linux NTP client with a windows server, it doesn't work. (That took a while to figure out, and started us down this whole tangent.)
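As a rough illustration of how little an SNTP client has to fill in, here's a sketch that builds a request and pulls a couple of fields out of a reply. The function names are made up, and the field offsets are my reading of the standard 48-byte NTP header (first byte is leap indicator, version, and mode; second byte is stratum; the transmit timestamp starts at byte 40). It's not the code from the sntp command described here.

```c
#include <stdint.h>
#include <string.h>

#define SNTP_PACKET_LEN 48

// Fill in a minimal 48-byte client request. Only the first byte needs a
// value: leap indicator (2 bits), version (3 bits), mode (3 bits).
// Mode 3 means "client". Everything else may stay zero in a request.
void sntp_build_request(uint8_t packet[SNTP_PACKET_LEN], int version)
{
  memset(packet, 0, SNTP_PACKET_LEN);
  packet[0] = (uint8_t)((0 << 6) | ((version & 7) << 3) | 3);
}

// Extract the fields a simple client cares about from a reply.
// Returns the mode (4 = server, 5 = broadcast) and stores the stratum
// and the 32-bit seconds part of the transmit timestamp (bytes 40-43,
// big endian, counted from 1900).
int sntp_parse_reply(const uint8_t packet[SNTP_PACKET_LEN],
                     int *stratum, uint32_t *transmit_seconds)
{
  *stratum = packet[1];
  *transmit_seconds = ((uint32_t)packet[40] << 24)
                    | ((uint32_t)packet[41] << 16)
                    | ((uint32_t)packet[42] << 8) | packet[43];
  return packet[0] & 7;
}
```

A version 4 request comes out as the byte 0x23 followed by 47 zero bytes, which is the whole query. The fields this sketch skips over (root delay, root dispersion, reference ID and so on) are presumably the ones a full NTP client cares about and an SNTP server leaves zeroed.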

The box needs to be able to act as an sntp client (sntp not ntp because some existing installs use the windows server), and it needs to be able to act as an ntp server (possibly sntp would be good enough because the downstream boxes are also running our software, but nobody seems to have _written_ an sntp server for Linux, because full NTP server works for SNTP client). And then it's got multicast.

Multicast? Yeah, there's a multicast variant in the sntp RFC, and JCI implemented it in old stuff (back in the 90's), but it's not working for some reason and it's .NET code which is a language I don't know (which isn't entirely a blocker but does slow me down) and which I haven't got a build environment for (which is the real blocker). And the ISC reference implementation in C doesn't appear to do multicast (because it's not 1996 anymore).

Note: Napster pretty much killed off Multicast starting around 1999. No podcasts use multicast. Youtube, Netflix, Hulu, and Amazon Prime do not use multicast. The original use case for multicast was "all that" and when it arrived it didn't, which means there isn't really a use case out there for it. The Mbone shut down years ago. Wikipedia[citation needed] says it's still used inside some LANs to do hotel televisions and stuff, but it's not routed through the wider internet anymore, and there really isn't a modern userbase for it, just the occasional LAN-local legacy install.

Instead we got MP3 and MP4 compression which shrinks data to 1/10 of its original size but means a single dropped packet is fatal. (As you can see with HDTV broadcasts "smearing" when the signal is marginal; and that's with a lot of effort put into implementing recovery!)

But JCI wants multicast because the old one they're replacing did multicast and they want to sell the Linux image as a strict upgrade to the WinCE image on the same hardware, without a single dropped feature. And long long ago their salesbeings pushed multicast as a Cool Thing We Can Do. So I wound up reading the RFC and writing a new one in C.

P.S. Although there isn't a Linux SNTP server, there _is_ a Linux SNTP client. It's one of the binaries the ISC source tarball _can_ build, but generally doesn't. I'm trying to convince buildroot to enable it. I suspect this was last tested by an actual human a decade ago, but we'll see...


January 23, 2019

Added multicast support to the sntp stuff. Should probably not name the multicast enabling function leeloo_dallas() but I've had enough sleep deprivation lately that's the sort of name I'm using. (Look, my brain takes the word "multicast", sticks a fifth elephant reference on the front and sings the whole thing to camptown races (doo dah, doo dah). When I'm tired enough this sort of thing leaks out into the outside world.)

All the config is on the command line: if you "sntp 1.2.3.4" it queries the server, prints the time, and how off the current clock is. Adding -s sets it, -a sets it via adjtime().

I initially had it so you could list as many servers as you liked on the command line and it would iterate through them, but if it switches between ipv4 and ipv6 I'd have to reopen the socket and I dowanna.


January 20, 2019

Ok, I need record-commands from Aboriginal Linux (which is built around wrappy.c), and rather than just dumping them into scripts/ I want to break that up into make/ and tests/harness...

Except that directory also has bloatcheck and showasm (halfway between build and testing), and mkstatus.py which generates documentation (is that build?) and I have a todo item to split up make.sh into a script that generates the headers and a script that builds the .c files. I think all the second half of make.sh is using from the first half is the do_loudly() function (which turns a command's output into a single dot unless V=1 is set)...


January 19, 2019

Working on sntp, and FreeBSD build/testing.


January 18, 2019

Darn it, poking at mkroot and I updated toybox to current git and swapped in "test" with the newly promoted toybox version, and the Linux kernel build is breaking on all architectures. And it's a funky one too, even on a -j1 build it goes:

  LD      vmlinux
  SORTEX  vmlinux
  SYSMAP  System.map
make: *** [vmlinux] Error 2

That provides no information about what went WRONG! Thank you make.

Which means I need to dig up my old command line wrapper from Aboriginal Linux; I should probably stick it in the toybox scripts/ directory, except that's getting pretty crowded with build and test infrastructure. (I provide make wrappers as a gui and "make help" lists the options but DEPENDING on make is uncomfortable, it would be nice if running stuff directly was easy to not just do, but figure out at a glance...)

I should split scripts/ up somehow. I can move the make stuff into a make/ subdirectory, but then scripts/ isn't all the scripts so shouldn't be called that. The problem is "tests" is a bunch of *.test files, one per command, and I'd like to keep that accessible and clean. It's already got a tests/files directory under it that's a bit awkward, but manageable. I could put tests/harness under there with the infrastructure part, but then running it would be tests/harness/runtest.sh which is awkward. I could put "harness" at the top level but then it's much less obvious what the name means. Hmmm... tests/commands/sed.test? A top level tests directory with _three_ things under it?

Maybe I should add symlinks to the top level, ./make.sh and ./test.sh pointing into the appropriate subdirectory where the infrastructure lives...

Sigh. Naming things, cache invalidation, and off by one errors remain the two biggest problems in computer science.


January 17, 2019

Human reaction time is measured in milliseconds, plural. A 60fps frame rate is a frame every 17 milliseconds. Computer reaction times are measured in nanoseconds. A 1ghz processor is advancing its clock once per nanosecond.

Those are pretty much the reason to use those two time resolutions: nanoseconds is overkill for humans, and even in computers jitter dominates at that level. (DDR4 CAS latency's like 15 nanoseconds, an sh4 syscall has an ~8k instruction round trip last I checked, even small interrupts can flush cache lines...) Meanwhile milliseconds aren't enough for "make" to reliably distinguish which of two files is newer when you call "touch" twice in a row on initramfs with modern hardware.

64 bits worth of milliseconds is 584 million years, so a signed 64 bit time_t in milliseconds "just works" for over 250 million years. Rich Felker complained that multiplying or dividing by 1000 is an expensive operation (doesn't boil down to a binary power of 2 shift), but you've already got to divide by 60, 60, and 24 to get minutes, hours, and days...

Using nanoseconds for everything is not a good idea. A 32 bit number only holds 4.2 seconds of nanoseconds (or + or - 2.1 seconds if signed), so switching time_t to a 64 bit number of nanoseconds would only about quadruple its range. (1<<31 seconds is just over 68 years, 1970+68 = 2038 when signed 32 bit time_t overflows. January 19 at 3:14 am, and 7 seconds.)
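The range arithmetic here is easy to check mechanically. A small sketch, assuming an average Gregorian year of 365.2425 days; the helper name range_in_years() is made up for illustration.

```c
#include <stdint.h>

// Average Gregorian year: 365.2425 days of 86400 seconds.
#define SECONDS_PER_YEAR 31556952.0

// How many years fit in an unsigned counter of the given width, when
// each tick lasts 1/ticks_per_second of a second.
double range_in_years(int bits, double ticks_per_second)
{
  double ticks = 1.0;

  // Powers of two are represented exactly in a double.
  for (int i = 0; i < bits; i++) ticks *= 2.0;
  return ticks / ticks_per_second / SECONDS_PER_YEAR;
}
```

range_in_years(64, 1e3) comes out around 584.5 million years for milliseconds, range_in_years(31, 1) is about 68 years for the positive half of a signed 32 bit count of seconds, and range_in_years(64, 1e9) is only about 584 years for nanoseconds, roughly 4.3 times the 136 years of an unsigned 32 bit count of seconds.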

Splitting time_t into a structure with separate "seconds" and "nanoseconds" fields is fiddly on two levels: keeping two fields in sync (check nanoseconds, then check seconds, then check nanoseconds again to see if it overflowed between the two and you're off by a second), _and_ the fact that you still need 64 bits to store seconds but nanoseconds never even uses the top 2 bits of a 32 bit field, but having the seconds and nanoseconds fields be two different types is really ugly, but guaranteed wasting of 4 bytes that _can't_ be used is silly, but if you don't a 12 byte structure's probably going to be padded anyway...

And computers can't accurately measure nanoseconds: A clock crystal that only lost a second every 5 years would be off by an average of over 6 nanoseconds per second, and that's _insanely_ accurate. Crystal oscillator accuracy is typically measured in parts per million, each of which is a thousand nanoseconds. A cheap 20ppm crystal is off by around a minute per month, which is fine for driving electronics. (The skew is less noticeable when the clock is 37khz, and does indeed produce that many pulses per second, and that's the common case: most crystals don't naturally physically vibrate millions of times per second, let alone billions. So to get the fast rates you multiply the clock up (double it and double it again), which means the 37000.4 clock pulses per second becomes multiple wrong clock pulses at the higher rate.)
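The drift numbers are simple ratios, sketched here with made-up helper names so they can be rechecked for other crystals and intervals.

```c
// Nanoseconds gained or lost per second of real time for a clock that
// drifts by drift_seconds over interval_seconds.
double drift_ns_per_second(double drift_seconds, double interval_seconds)
{
  return drift_seconds / interval_seconds * 1e9;
}

// Seconds of accumulated error after interval_seconds for a crystal
// rated at the given parts per million.
double ppm_error_seconds(double ppm, double interval_seconds)
{
  return ppm * 1e-6 * interval_seconds;
}
```

One second over five years (5 * 365.25 * 86400 seconds) works out to a bit over 6.3 nanoseconds per second, and 20ppm over a 30 day month is just under 52 seconds.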

The easy way to double a clock signal is with a phase locked loop, a circuit with a capacitor and a transistor in a feedback loop that switches from "charging" to "discharging" and back when the charge goes over/under a threshold, so it naturally swings back and forth periodically (which is trivial to convert to a square wave of high/low output as it switches between charging and discharging modes). The speed it cycles at is naturally adjustable: more input current makes it cycle faster because the capacitor's charging faster, less current makes it cycle slower. If you feed in a reference input (add an existing wave to the input current charging the capacitor so it gets slightly stronger/weaker), it'll still switch back and forth more or less constantly, but the loop's output gradually syncs up with the input as long as it's in range, which smooths out a jittery input clock and gives it nice sharp edges.

Or the extra input signal to the PLL can just be quick pulses, to give the swing a periodic push, and it'll sync up its upswing with that too. So to double a clock signal, make an edge detector circuit that generates a pulse on _both_ the rising and falling edges of the input signal, and feed that into a phase locked loop. The result is a signal switching twice as fast, because it's got a rising edge on _each_ edge of the old input signal, and then a falling edge halfway in between each of those. Chain a few doublers in sequence and you can get it as fast as your transistors can switch. (And then divide it back down with "count 3 edges then pulse" adder-style logic.)

But this also magnifies timing errors. Your 37khz clock that's actually producing 37000.4 edges per second becomes multiple wrong nanosecond clock ticks per second. (You're still only off by the same fraction of a percent, but it's a fraction of a percent of a lot more clock pulses.) Clock skew is ubiquitous: no two clocks EVER agree, it's just a question of how much they differ by, and they basically have _tides_. You're ok if everything's driven by the same clock, but crossing "clock domains" (areas where a different clock's driving stuff) they slide past each other and produce moire patterns and such.

Eventually, you'll sample the same bit twice or miss one. This is why every I/O device has clock skew detection and correction (generally by detecting the rising/falling edge of signals and measuring where to expect the next one from those edges. Of course you have to sample the signal much faster than you expect transitions in order to find the transitions, but as long as the signal transitions often enough it lets you keep in sync. And yes this is why everything has "framing" so you're never sending an endless stream of zeroes and lose track of how MANY zeroes have gone by, you are periodically _guaranteed_ a transition.).

Clock drift isn't even constant: when we were working to get nanosecond accurate timestamps for our synchrophasors at SEI, our boards' thermally stabilized reference clock (a part we special-ordered from germany, with the crystal in a metal box sitting on top of a little electric heater, to which we'd added half an inch of styrofoam insulation to keep the temperature as constant as possible and then put THAT in a case) would skew over 2 nanoseconds per second (for a couple minutes) if somebody across the room opened the door and generated an _imperceptible_ breeze. (We had a phase-locked loop constantly calculating the drift from GPS time and correcting. And GPS time is stable because the atomic clocks in the satellites are regularly updated from more accurate atomic clocks on the ground. In the past few years miniature atomic clocks have made it to market (based on laser cooling, first demonstrated in 2001), but they're $1500 each, 17 cubic centimeters, and use 125 milliwatts of power (thousands of times the power draw of the CMOS clock in a PC; not something you run off a coin cell battery for 5 years).)

Sigh. Working on this timing SNTP stuff, I really miss working on the GPS timing stuff. SNTP should have just been milliseconds, it's good enough for what it tries to do. In toybox I have a millitime() function and use it for most times. (Yes another one of my sleep deprivation names. "It's millitime()". And struct reg* shoe; in grep.c is a discworld reference. I renamed struct fields *strawberry in ps.c already though.)

Rich Felker objected that storing everything in milliseconds would mean a division by 1000 to get seconds, and that's expensive. In 2019, that's considered expensive. Right...


January 16, 2019

Sigh. No Rich, that's not how my relationship with Android works. I cannot "badger Android until they fix this nonsense".

I have limited traction and finite political capital. Leading them with a trail of breadcrumbs works best, which means I do work they might find useful and wait (often years) for them to start using it. And I can explain _why_ I want to go in a certain direction, and what I hope to achieve, and make as compelling an argument for that vision as I can.

But often, they've already made historical technical decisions that then become load-bearing for third party code, and you can't move the rug because somebody's standing on it. And their response is more or less "that might have been a nice way to go way back when, but we're over here now".

I'm trying to clean out the rest of the BSD code so that they're solidly using toybox, and making it so they can use as much of "defconfig" as possible. If the delta between android's deployment and toybox defconfig is minimized, then adding stuff to defconfig is most likely to add it to android. (This maximizes my traction/leverage. But it's _always_ gonna be finite, because they're way bigger than me.)

This means work on grep (--color), mkfs.vfat, and build stuff. The macos (and now FreeBSD) build genericization helps, as does the android hermetic build stuff. (Getting them closer to being able to use my build infrastructure, although they haven't got make and don't like arbitrary code running in their build.)

It's a bit like domesticating a feral cat. Offer food. Then offer food in the utility room. Except instead of a feral cat, one of the biggest companies in the world has a large team of full-time employees that's been doing this for 20 years now (The "Android One" came out in what, 2007?) which is constantly engaging with multiple large teams of phone vendor developers, collectively representing a many-multi-billion dollar industry that's on such a vastly different scale they can't even _see_ me.

I can't even afford to work full time on this stuff. I'm doing what I can. You wanna post your concerns on the toybox list, go for it.


January 15, 2019

Sigh, $DAYJOB needs sntp, so let's do that for toybox...

Reading RFC 4330 (well a half-dozen RFCs, this has had a lot of versions and the new ones have added useless crap that's more complexity than help). Oh great, this protocol doesn't have a Y2038 problem, it has a Y2036 problem. They have a 64 bit timestamp: the bottom 32 bits of which is fraction of a second (meaning they devote 2 bits to recording FRACTIONS OF A NANOSECOND), leaving them 32 bits for seconds... starting from January 1 1900. For a protocol designed in the 1980's. So they ate 2/3 of the space before the protocol was _designed_. That's just stupid.

Anyway, the common workaround is if the high bit's _not_ set then it wrapped, which buys another 60 years or so. Still utterly insane to design the protocol that way.
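A rough sketch of that workaround, in Python for brevity. The function name is illustrative and this is not the toybox implementation; the only hard facts it leans on are the 32.32 fixed point layout and the 2208988800 second offset between 1900 and 1970.

```python
from datetime import datetime, timezone

NTP_TO_UNIX = 2208988800   # seconds from 1900-01-01 to 1970-01-01
ERA = 1 << 32              # seconds until the 32 bit seconds field wraps

def ntp_to_unix(ntp64):
    """Convert a 64 bit NTP timestamp to (unix_seconds, nanoseconds)."""
    seconds = ntp64 >> 32
    fraction = ntp64 & 0xffffffff
    # High bit clear: assume the seconds field already wrapped in 2036.
    if not seconds & 0x80000000:
        seconds += ERA
    # The fraction counts 1/2**32 second units, so scale it to nanoseconds.
    nanoseconds = (fraction * 1_000_000_000) >> 32
    return seconds - NTP_TO_UNIX, nanoseconds

# First timestamp after the wrap: all 64 bits zero.
wrap_seconds, _ = ntp_to_unix(0)
print(datetime.fromtimestamp(wrap_seconds, tz=timezone.utc))

# Units per nanosecond in the fraction field, a bit over 4 (hence the "2 bits").
print(2**32 / 1e9)
```

With the heuristic, values with the high bit set cover the years up to the 2036 wrap and values with it clear are pushed into the following era, so a single 32 bit field spans both.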


January 14, 2019

Exhausted. Not sure I slept at all last night, just lay awake in bed. Is it possible to get jetlag without changing time zones?

Back at work: spent most of the day going through a month of missed email. They assigned a number of issues to me.

Back in my apartment, the manager was happy to see me and had a desk and a bed in storage, and says he'll replace the gas stove with electric (yay!). They should really put some solar panels on this building. (They don't just go on the roof, you can put them down the sides of tall buildings too, you don't even have to worry about sweeping the snow off of those.)

Poking at patch.c because I got reminded of todo items. Trying to add fuzz factor, which was easy enough (and my design for it's better) but... there's no tests/patch.test, and I don't seem to have patches that _require_ fuzz factor lying around.
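For reference, a minimal sketch of what fuzz factor means in the traditional sense: if a hunk doesn't match with all its context, retry while ignoring up to N context lines at each end. This is Python, not toybox's patch.c, and it is not the design mentioned above (which the entry doesn't describe). The `apply_hunk()` name and the (tag, text) hunk representation are assumptions for illustration, and it ignores the hunk's line number hints entirely.

```python
def apply_hunk(lines, hunk, fuzz=0):
    """Apply one hunk to a list of lines; return the new list, or None on failure.

    hunk is a list of (tag, text) pairs where tag is ' ', '-' or '+'.
    """
    # Count the context lines at the start and end of the hunk.
    lead = 0
    while lead < len(hunk) and hunk[lead][0] == " ":
        lead += 1
    trail = 0
    while trail < len(hunk) - lead and hunk[-1 - trail][0] == " ":
        trail += 1

    # Try an exact match first, then drop one more context line per end each pass.
    for drop in range(fuzz + 1):
        head, tail = min(drop, lead), min(drop, trail)
        trimmed = hunk[head:len(hunk) - tail]
        old = [text for tag, text in trimmed if tag != "+"]
        new = [text for tag, text in trimmed if tag != "-"]
        if not old:
            break  # nothing left to anchor the hunk on
        for pos in range(len(lines) - len(old) + 1):
            if lines[pos:pos + len(old)] == old:
                return lines[:pos] + new + lines[pos + len(old):]
    return None

target = ["one", "two", "three", "four", "five"]
# Hunk made against a file whose first line was "ONE", so exact context fails.
hunk = [(" ", "ONE"), (" ", "two"), ("-", "three"), ("+", "THREE"),
        (" ", "four"), (" ", "five")]
print(apply_hunk(target, hunk, fuzz=0))
print(apply_hunk(target, hunk, fuzz=1))
```

A test case that _requires_ fuzz looks like the example at the bottom: the change itself still applies, but a context line at the edge of the hunk no longer matches the target file.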

I _used_ to just throw new commands through Aboriginal Linux and the LFS build, which was applying lots of patches. I suppose I could dig through the repo there and find where I adjusted them to eliminate fuzz factor. (Because even though I ported toybox patch to busybox over a decade ago, they still haven't added fuzz support to it. There's a lotta that going around, where things I was planning to do ages ago still aren't done in various projects, and it ranges from crickets to insistence that status quo is perfect and we've always been at war with eastasia. (People declared busybox "done" at the 1.0 release, which was before the majority of my contributions and long before you could use it in a build environment. Thing didn't happen therefore shouldn't happen is a failure of imagination.) As Howard Aiken said long ago, you don't need to worry about people stealing your ideas. Heck, I've been trying to get people to steal my ideas for a very long time, in a Tom Sawyer "paint the fence" way so I don't have to do it myself.)


January 13, 2019

Flight back to Milwaukee. Sigh. Conflicted, but... this is the path of least resistance, and I know I can do it. (Neither Google nor the phone vendors will pay me to do Toybox or the android self-hosting stuff, nobody's interested in mkroot (hardly anybody was interested in aboriginal even after I got it building LFS), and I can't afford to just do open source all the time. Gotta pay the mortgage. (I should really try to at least pay off that home equity loan this time.))

Got a hotel. It's $130/night, that's more per week than my old efficiency apartment here cost in a month. I should try to get that back in the morning. (They hadn't rented it out last I heard, and it's paid through the end of the month since I have to keep paying for it until they rent it out or 60 days goes by.)

I wrote up a thing about how patches work, because somebody on the list asked. I should collect and index those somehow, I suppose...


January 12, 2019

I committed a fix:

> Which is the "mode" of the symlink, except that mode says the filetype _is_ a
> symlink and you can't O_CREAT one of them so it's gonna get _really_ confused...
>
> Try now? (I added a test.)

Except that's inelegant (race condition between dirtree population and this stat: what if the filesystem changes out from under us?) and we're _supposed_ to feed dirtree the right flags so the initial stat() is following or not following the symlink appropriately. Why is it not doing that in this case... Hmmm...
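The following-or-not distinction is easy to see in isolation. This sketch is Python rather than toybox's C and says nothing about dirtree's actual flags; `demo()` is a made-up name that just builds a file and a symlink in a temporary directory and compares the two calls.

```python
import os
import stat
import tempfile

def demo():
    """Return (lstat says symlink, stat says regular file) for a fresh symlink."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        with open(target, "w") as f:
            f.write("data\n")
        os.symlink("target", link)

        unfollowed = os.lstat(link)  # describes the link itself
        followed = os.stat(link)     # describes what the link points at
        # Anything else touching the directory between these two calls can
        # change the answer, which is the race described above.
        return stat.S_ISLNK(unfollowed.st_mode), stat.S_ISREG(followed.st_mode)

print(demo())
```

Taking the mode from the unfollowed result and then using it as if it described a regular file is the kind of confusion the quoted reply describes, so the choice has to be made once, up front.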


January 11, 2019

Broke down and told chrome _not_ to restore state, just let it forget all those todo items. So now I have one window with only a dozen or so open tabs, which can restart itself without wasting half an hour fighting with it every time I open my laptop. I give it a week.

I should really pack my suitcase...


January 10, 2019

The battery on my laptop no longer holds ANY charge. Unplug it and it switches off instantly. Serious crimp in my "wander out somewhere and program for a bit at a quiet table" workflow. Even when I go somewhere with an outlet (which I now feel guilty about because I'm costing the place money, even if it's only a few cents), it loses all context going there and going back. Complete reboot each time.

And convincing chrome NOT to reload 8 windows with 200 tabs each in them (maintain the todo item links but leave the tabs in "failed to load" state rather than trying to allocate 30 gigabytes of RAM and max out my phone tether for 2 hours) is a huge pain. Doing "pkill -f renderer" USED to work but now SOMETIMES works, sometimes causes tabs to hang (still display fine but I can't scroll DOWN and it won't load new contents in that tab, but I can cut and paste the URL to a new tab that WILL load it so the URL is retained which is all I really wanted), and sometimes randomly crashes the whole browser process. Even pointing /etc/resolv.conf at 127.0.0.1 while chrome starts up to force the resolve to fail no longer prevents the reloads, these days it just _delays_ its load; it tries to reload periodically and once it can reloads everything.

They keep "upgrading" chrome to make it a worse fit for my needs, and of course I can't stick with old versions because "security". (You can sing "cloud rot" to the tune of Love Shack.)


January 9, 2019

Looming return to Milwaukee, starting to get paralyzed. Fade flies out tomorrow, although it's so early in the morning that it's essentially tonight. (She and Adverb are visiting family in California before heading back to Minneapolis for the spring semester; both her sisters live there and I think more of her family is flying in for a reunion?)

I should get a plane ticket, but the TSA and air traffic controllers miss their first paycheck on Friday. Bit reluctant to fly with air traffic controllers considered "nonessential"... (Bit reluctant to _eat_ with FDA inspection considered nonessential.)


January 8, 2019

Visited the eye doctor for my 6 month follow-up. Not obviously going blind! Yay!

Eyes dilated, not a lot of programming today.


January 7, 2019

Wandering back to an open tab in which I have:

$ truncate -s $((512*68)) test.img && mkfs.vfat test.img && dd if=/dev/zero of=test.img seek=$((0x3e)) bs=1 count=448 && hexdump -C test.img

Which at the _time_ was the smallest filesystem mkdosfs would create. (The dd blanks some stuff that varies gratuitously between runs so I can diff two of them and see what changed when I resize the filesystem.)
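The same blank-then-compare step can be sketched in memory. This is Python operating on byte strings, with the offset and length copied from the dd above; the function names and the synthetic test images are assumptions for illustration, not real mkfs.vfat output. Working on a copy also leaves the bytes after the blanked span in place, whereas dd truncates its output file unless given conv=notrunc.

```python
BLANK_START = 0x3e   # same offset as the dd's seek=
BLANK_LEN = 448      # same length as the dd's count=

def blank_region(image, start=BLANK_START, length=BLANK_LEN):
    """Return a copy of image (bytes) with the given span zeroed."""
    data = bytearray(image)
    end = min(start + length, len(data))
    data[start:end] = bytes(max(0, end - start))
    return bytes(data)

def changed_offsets(a, b):
    """List the byte offsets where two images differ outside the blanked span."""
    a, b = blank_region(a), blank_region(b)
    shared = min(len(a), len(b))
    return [i for i in range(shared) if a[i] != b[i]]

# Two made-up 68 sector images: one real difference, one inside the blanked span.
first = bytes(512 * 68)
second = bytearray(first)
second[0x13] = 0x64   # differs where it matters
second[0x40] = 0x90   # differs inside the blanked span, so it gets ignored
print([hex(i) for i in changed_offsets(first, bytes(second))])
```

With two real images, read each file in binary mode and pass the contents to `changed_offsets()`.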

But now I'm running a newer dosfstools version and it's saying that 512*100 is the smallest viable filesystem. And THAT is clearly arbitrary. Sigh, I should look up the kernel code for this and see what the actual driver says.


January 6, 2019

Rebuilt mkroot with linux-4.20 (after rebuilding the musl-cross-make toolchains with current musl). The s390x kernel wants sha256sum now.

Sigh. Throw another binary in the PENDING list of the airlock install in toybox/scripts/install.sh. (It's in the roadmap.)


January 5, 2019

Attempting to install devuan on the giant new laptop, because the ubuntu they stuck on it has systemd and it's possible I'd use a BSD first. Devuan is basically a debian fork retaining the original init system and with a really stupid over-engineered nigh-unmaintainable mirror overlay system written in python. (I have no idea why they did that last part, and hope it's merely a transitional problem.)

The System76 bios is "black screen with no output" until their ubuntu boots, which is kinda annoying. I guessed "reboot several times and hit escape and alt-f2 and so on a lot during said blackness" and eventually got a bios screen that let me boot from a USB stick.

Devuan's installer is really _sad_ compared to Ubuntu. What Ubuntu did was boot to a live CD, then run a gui app. That's basically copying the cutting edge knoppix technology from 2003 (which is 15 years ago now), and they've been doing it since... 2004 I think?

Devuan started with a menu of multiple install options (I have no clue here and cannot make an informed decision, STOP ASKING ME FOR INFORMATION I DO NOT HAVE YET), but all of them seem to go to a fullscreen installer with a font that's way too small for comfort, and no way to change it. Ok, soldiering on: it's freaking out that I used unetbootin to create the USB boot stick, promising a plague of locusts and possibly frogs if I continue. But it doesn't say how I SHOULD have created it, and it seems to be working fine, so I ignored it and continued.

It's refusing to provide binary firmware for the wireless card (iwlwifi-8265) because Freedom Freedom Blue Facepaint Mel Gibson. If a manufacturer was too cheap to put a ROM in their hardware and they expect the driver to load the equivalent data into SRAM, debian sits down in the mud and sulks. Great.

I think I've found where to get the firmware from debian, but "devuan ascii" isn't clearly mirroring any specific debian distro? (The previous ones were, the newest one... isn't.) The instructions say to put it in a "/firmware" directory on the USB stick, which seems separate from _booting_ from the USB stick... All the devuan ascii docs say that all necessary firmware is bundled. Hmmm...

Ok, downloading the 4+ gigabyte "DVD" version of the devuan installer (for a complete offline install) to make a new USB stick from, and I should try to fish the firmware files out of the system76 ubuntu install before wiping it. (There's a certain amount of "should I use the 2 gb hard drive or the 1gb flash drive" for this install, I left the flash disk in because it's already there and I don't ever intend to use systemd ubuntu.)

This has already eaten all the time I allocated to poke at this.


January 3, 2019

Three days of rain and I've gotten nothing done. Barely left the house. I'm not recovered enough from seasonal affective disorder yet for the gloom outside not to put me in hibernation mode.

I was ok moving up to Milwaukee in January from Austin, that was a discontiguous break and my internal clock did not adjust. But staying in Milwaukee for 3 months while the days got shorter, _that_ screwed me up.

Partly it's that the sun coming up reliably knocks me out, because college. The last couple years at Rutgers were primarily night courses due to governor Witless destroying the comp-sci program with stupid budget cuts so they lost _all_ their full-time faculty (including the head of the department; if you're denied tenure you _can't_stay_ past 5 years and they blanket denied tenure to everybody, and comp-sci had only peeled off of the physics department to become its own thing 4 years before the budget cuts...). This was the #2 most popular major on campus after "undecided" and everything had to be taught by adjuncts after their day jobs, and now you _couldn't_ complete it without lots of night classes. So I'd get home long after sunset and do more programming, then the sun would come up and I'd go "oh, didn't realize the time" and go to bed. (Which was fine if I didn't have to catch a bus to go back to class until 3pm or so.)

Now the sun coming up knocks me out. Being awake at night is fine... until the sun comes up. When my alarm's set at 6:30 am and the sun comes up over an hour later, getting up in the morning is a _problem_. And that sort of anchors the rest of it...


January 2, 2019

Did a little research for the multicast doc in the ipv4 cleanup stuff.

Multicast failed to take off because improved compression schemes (like mp3 and mp4) greatly reduced the storage and bandwidth requirements of media while rendering partial delivery of data useless, and due to the widespread deployment of broadband internet via cable modem and DSL. The decline of multicast started in 1999 when Napster provided a proof of concept that distributing MP3 files via unicast could scale. RealAudio quickly lost market share to unicast media delivery solutions. These days Youtube, Netflix, Hulu, and Amazon Prime all use unicast distribution.

The decline started 20 years ago and the multicast mbone (which this address range was reserved for) essentially ceased operations about 15 years ago. The last signs of life I can find are from about 2003.

Multicast was never widely used, the range was allocated for growth that did not occur, and remaining users are treating it as a LAN protocol which could use any other LAN-local address range their routers were programmed to accept. Note also that LAN-local multicast was conserving bandwidth on 10baseT local area networks, and we have widely deployed cheap gigabit ethernet now (with 10gigE available for those who want to spend money).

Reserving 268 million IPv4 addresses for multicast, in 2019, is obviously a complete waste. We can put them back in the main pool.
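The 268 million figure falls straight out of the prefix length. A quick check using Python's standard ipaddress module, assuming the range in question is the whole IPv4 multicast block 224.0.0.0/4:

```python
import ipaddress

multicast = ipaddress.ip_network("224.0.0.0/4")
total_ipv4 = 2 ** 32

# A /4 leaves 28 host bits, so the block holds 2**28 addresses.
print(multicast.num_addresses)
# Share of the entire IPv4 address space taken by the block.
print(multicast.num_addresses / total_ipv4)
# Both ends of the range, for reference.
print(multicast[0], multicast[-1])
```

That is one sixteenth of all IPv4 addresses.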


Back to 2018