Rob's Blog


todo: Raspberry PI

June 1, 2016

Alright, it's been a MONTH since I should have put out a toybox release. I meant to before the Japan trip but didn't get around to it, and have been busy since, but it's time.

I set an Aboriginal Linux all-architectures build going with a snapshot of what I've got to get binaries, and I've started going through the git repo to do release notes. I keep thinking I haven't done that much, but there are a lot of commits to collate into release notes...

May 28, 2016

Reading the wget man page is weird: it starts by explaining to you how unix options work. As in "wget -drc" is equivalent to "wget -d -r -c", and -- prevents anything after it from being interpreted as an option, and so on. And then it says:

The options that accept comma-separated lists all respect the convention that specifying an empty list clears its value.

And I went, "This is a convention?" So I tried ps, which considers -o "" an error, and mount, which considers -o "" to add nothing to the existing option list (but doesn't blank it).

Anyway, I'm _not_ having a ".wgetrc" file, not doing the -e option to execute random commands (pipelines exist, and I've needed to make a "circle" command to route a pipeline's output back to its input for over a decade).

As for progress indicator, when I submitted my old "count" program to busybox years ago it got bikeshedded into pipeline-progress, and sometime later "pv" showed up.

May 25, 2016

Todo list critical mass is a funny thing. I've had wget on my todo list forever, and last year Isaac Dunham mentioned he was doing a wget, but never sent me a patch.

A month or so back I finally got a wget submission, from Lipi Lee, which is... not up to my standards: it's full of hardwired array sizes (which generally aren't bounds checked), it doesn't escape spaces in URL names (the GET syntax cares), and so on.

I threw it in pending, then I got a SECOND submission adding https support via command line stunnel variant, which Isaac mentioned in his submission, so I started looking at cleaning up the first submission... and wound up rewriting it from scratch. (As you do.)

I'm doing wget because busybox had wget years ago and never implemented curl, and seemed to get away with it. There's no standard for wget OR curl, so six of one...

To test it, I fired up netcat to act as a sad little server, which didn't work because I was using netcat -l /bin/cat which immediately backgrounded because it was calling a command with stdin/stdout redirected to that command. What I wanted was netcat -l with no command. And here I'm thinking if _I_ can't get this right and I wrote it, how does anybody else have a chance, I really need to upgrade the help. But haven't figured out what it should say yet.

Going back to wget, I implemented the URL escaping behavior (modulo the question of whether domain names can have low ascii or % characters in them, I'm escaping the URL _before_ parsing it and I THINK that's ok...) and then I got to the User-Agent field, and obviously I put "toybox" there but... "toybox wget"? Version number? How would a command other than the multiplexer get access to the version number anyway? (Answer: it wouldn't and I'd need to redo the build a bit to make that available, which re-raises the question of where the version number should live when git describe doesn't return anything, because tarball builds _are_ supported and git is NOT a build prerequisite...)
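A minimal sketch of that escape-before-parse step. The function name url_escape() is hypothetical and this is not the toybox code; the choice of which bytes to escape (space, control characters, and anything at or above 0x7f, leaving an existing % alone) is an assumption made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not the toybox code): return a malloc()ed copy of url
// with spaces, control characters, and bytes >= 0x7f replaced by %XX.
// Everything else, including an existing '%', is passed through untouched.
char *url_escape(const char *url)
{
  size_t len = strlen(url);
  // Worst case every byte becomes three characters, plus the terminator.
  char *out = malloc(3*len+1), *to = out;

  if (!out) return 0;
  for (; *url; url++) {
    unsigned char c = *url;

    if (c <= ' ' || c >= 0x7f) to += sprintf(to, "%%%02X", c);
    else *to++ = c;
  }
  *to = 0;

  return out;
}
```

Calling url_escape("http://example.com/a b.txt") gives back "http://example.com/a%20b.txt", which the caller frees. Because the escape runs over the whole string before any parsing, the host part gets the same treatment as the path, which is exactly the open question above.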


May 22, 2016

Toybox development is the same "constant series of cleanup passes" thing I used to do with busybox, or at least it should be that way. This combines strangely with the "push to 1.0" I'm doing.

I recently added bufgetgrgid() and bufgetpwuid() to lib/ because it turns out libc does _not_ cache any lookup values from /etc/passwd and /etc/group, not even the one it most recently looked up, so things like ps and ls constantly re-parse /etc/passwd and /etc/group, over and over, even if there's just one user being displayed.
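The idea can be sketched roughly like this. It is not the code in lib/: cached_username(), struct uid_cache, and the miss counter are hypothetical names, and a plain linked list is assumed to be good enough for the handful of distinct users a typical ps or ls run sees.

```c
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

// Hypothetical cache, not the toybox lib/ code: remember each uid->name
// lookup in a linked list so getpwuid() is called at most once per uid.
struct uid_cache {
  struct uid_cache *next;
  uid_t uid;
  char *name;
};

static struct uid_cache *uid_list;
int uid_cache_misses;  // counts real getpwuid() calls, for demonstration

const char *cached_username(uid_t uid)
{
  struct uid_cache *uc;
  struct passwd *pw;
  char buf[32];

  for (uc = uid_list; uc; uc = uc->next) if (uc->uid == uid) return uc->name;

  // Cache miss: do the expensive lookup once and keep the answer.
  uid_cache_misses++;
  pw = getpwuid(uid);
  // Unknown uid: fall back to the number as a string.
  if (!pw) snprintf(buf, sizeof(buf), "%u", (unsigned)uid);
  if (!(uc = malloc(sizeof(*uc)))) return 0;
  uc->name = strdup(pw ? pw->pw_name : buf);
  if (!uc->name) {
    free(uc);
    return 0;
  }
  uc->uid = uid;
  uc->next = uid_list;
  uid_list = uc;

  return uc->name;
}
```

Asking for the same uid twice returns the same pointer and bumps uid_cache_misses only once, so a display loop showing one user over and over touches /etc/passwd a single time.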

I noticed this cursoring left and right through the "top" fields, which should change the sort order and redisplay, but not re-fetch the data from /proc faster than normal. Cursoring into the username field was _really_ slow, which implied that a significant chunk of the processing time was loading the names. (Of course only sort by name needs to load _all_ the names, the rest only do so when displaying.) And I confirmed, once I had the cache implemented cursoring around different sort types was smooth again but cpu time of top doing normal refreshes didn't go significantly down.

So the CPU time of top is from the get_ps() code, not the show_ps() code. I know because cursoring around to do a lot of displaying, now that the cache is in there, doesn't hugely affect top's CPU usage even though it's displaying 5 or 6 times per refresh.

As far as I can tell, the get_ps() overhead is mostly... proc. How does the other top implementation manage to do it in less CPU? No idea.

May 18, 2016

Dreamhost's "Let's Encrypt" integration actually seems to have worked, and https is working on the site now.

I did a much simpler root filesystem build script that downloads and verifies packages (but doesn't patch them), populates a root directory, and installs toybox+busybox+dropbear into it. Still not sure what Aboriginal Linux should look like going forward, but there's one way it could be simpler.

May 17, 2016

Starting in 1980, MIT had a famous introductory computer programming course series that tried to teach people how computers worked, hardware and software, from the ground up.

In 1997 they stopped teaching this course because they decided building a system from the ground up was no longer relevant. So they gave up, and decided instead to teach students how to poke at black boxes to get behavior out of them, without ever trying to understand how the hardware and software actually worked internally.

I think that's wrong.

On the software side, I did Aboriginal Linux because I wanted the smallest Linux system capable of rebuilding itself under itself, so you could understand what all the pieces do. On the hardware side, J-core lets us do the same thing with a processor.

The systems that nobody understands will bit rot and go away as the people who created them cease being available.

May 15, 2016

I've been useless and mostly sleeping for 4 days now. Combining a stomach bug, jetlag, and going cold turkey off caffeine is a recipe for sleeping 16 hours a day. (Either that or I've managed to contract mono.)

I have Giant TODO List Of Doom to work through. (As always, the trip to Japan made my todo list _longer_, despite getting lots of stuff done.) And I'm just not up to doing it in more than about half-hour bursts. Blah.

Going off caffeine is mostly because my eyes are in terrible shape and caffeine seems to make it worse. I should find a proper eye doctor and get a proper diagnosis, but they keep saying they can't find anything. (Other than the floaters, cataracts, myopia, and visual migraines, none of which they can do anything about.) But it's hard to see (harder in the mornings) and there's all sorts of weird flashes.

May 12, 2016

Regained a day going back across the international dateline, so what was the 12th became the 11th again and now it's the 12th again and it's hard to figure out whether this is a new blog entry or a continuation of the previous one? Return layover in the SF airport, I slept for half of it because I couldn't do ANYTHING useful on the United international flight with every seat filled, in the backmost row where the seat doesn't recline (but the one in front of me did and was), with a large man in the next seat spilling over into mine, and epic sleep deprivation already going on but no way to sleep where I was.

So I mostly watched stuff on the seatback entertainment system, which is actually a strange torture device United devised to annoy its passengers. The captain or one of the stewardasses would blather something irrelevant every 15 minutes (such as turning on the fasten seatbelt sign 8 times during the flight, or telling us things the map screen on the seatback had been showing us live all along), which paused the video with a pop-up about "your entertainment will resume after you listen to this important message". Not once was it an important message. The intercom would click on and have fifteen seconds of hum before they actually spoke, then another dozen seconds of hum afterwards (and often they'd blather more irrelevant drivel after a long pause). And then the thing would resume for 5 seconds before it started all over again in Japanese. And then if you tried to back up the video, you hit the fact the search controls in the video don't work. (You can view forward/back at up to 16x speed, and then when you hit "play" it resumes either at the start of the minute or the start of the scene, which could be 8 or 9 minutes earlier. Seemingly randomly.)

United! They're cheap right now for a reason.

Still, Deadpool was entertaining. A movie where "irreverent" is actually an accurate description, and it's the most perfect comic book movie casting since Robert Downey Jr. was Tony Stark, except this time it really seems like they hired Deadpool to play Ryan Reynolds. Not sure how. (Also not sure how it was that _obvious_ given I wasn't a fan of the character before this movie, yet here we are.)

Tried to watch several other movies, but mostly made it about 10 minutes in and stopped it again. (I don't understand why the Alice in Wonderland movie starring Jack Sparrow got made. Or at least not from the first 10 minutes where Alice is dancing with some chinless person surrounded by horrible people. I'm sure it's important setup, maybe I was too sleep deprived to appreciate it, but entertainment utterly failed to happen. I'm told they're doing a sequel. Ok then.)

I slept through half the 6 hour layover in SFO because the concrete floor here is SO much more comfortable than anything involved with United. But now I've got a couple hours to try to get a little work done.

The loopfiles stuff still needs a design update so the callbacks aren't constantly calling openfd(). Let's see what's using that:

md5sum: doesn't care (one read, auto-close), base64: one read, auto-close, blkid: one read, auto-close...

There's an inconsistency: md5sum and blkid continue bad reads (read, test error_msg), but base64 aborts (xread). Meanwhile md5sum uses read, blkid uses readall. The blkid callback doesn't care about fd vs fp, but the direct call of the same function does (checking errno=ENOMEDIUM). bzcat is convertible but sort of prefers fd? dos2unix cares because it's using copy_tempfile() which is fd. fsync cares because it's doing ioctl() stuff...

Sigh, converting this to something coherent looks like an enormous timesink. I _think_ what I should do is have loopfiles always use a FILE * and then things that need fileno() can trivially get it, and that avoids the main problem which is once you do fdopen() you can't dispose of that FILE * object _without_ closing that fd. (It's a one way transition.)

Speaking of enormous timesinks, time to get on the next airplane.

May 11, 2016

Sick most of last night, got about 3 hours of sleep. Not _quite_ food poisoning this time, in that I didn't throw up. Some kind of stomach bug. Alas, I couldn't take a sick day because I had to check out of my hotel, and the airplane home leaves shortly after midnight.

Wound up kinda useless all day.

The Turtle boards arrived! They boot Linux! Ethernet works. The HDMI connector works! The USB hub doesn't work (although it's apparently a simple fix if you have a soldering iron, an active hi/low got swapped in the overcurrent protection). I need to take Martin's wiki material and do a public version for

One of the people who emailed to express interest in our proposed Open hardware track at Linux Plumber's mentioned that 2 days after our proposal was submitted, an "FPGA development" track was submitted and the two have been splitting the proposed audience and no WONDER it's been 4 months since we submitted it and there's just crickets chirping. (I emailed to go "do we still have a chance" last month and got a one line reply back boiling down to "maybe", and that's the only communication with them in 2 months.)

So I emailed them to put the proposal out of its misery. If we really want to do something like that we could do it at ELC, where the CFP doesn't close for weeks yet and then they give you a reply back in a month, and that's WITHOUT having an entire page on your responsibilities doing their convention organizing job for them. (If you're going to ask for a significant resource investment from us, don't leave us hanging for 4 months about whether our proposal is actually accepted. I have other things to do.)

May 10, 2016

I've spent the past few days trying to get the VHDL git repository conversion finished, and it's HARD. The information mercurial has exported is nuts, and git provides no error messages when something doesn't meet its expectations. Between the two of these, it's slow going.

I threw up my hands and asked Rich for help, and he confirmed that the merge info mercurial is exporting does not _remotely_ match reality. Unfortunately, he's busy trying to make gdb work for j-core, so I don't want to distract him with my problems.

I think what I have to do is create synthetic merge commits. Not entirely sure how.

May 9, 2016

Busy week. Have not kept the blog up to date.

So we have a J-core roadmap with a 64 bit strategy now, and Jeff's also been working on scaling SMALLER to something we'll probably call J1. The basic idea is to rip the cache, prefetch unit, and multiplier out of j2 and try to fit it into the 8k version of the ICE 40 FPGA, which has 32k of sram. If we can make that work (and it looks doable if we can just get a VHDL toolchain targeting it properly), then we can move jcore down into the Arduino space in a _truly_ tiny chip.

To make this work, Jeff's been poking at nvc which is a simulator like GHDL but written in C, and seems capable of producing output that yosys could consume to produce a bitstream for the ICE40. Jeff stripped down J2 into something tiny enough to fit, and then nvc couldn't simulate it, so he sent the tarball of VHDL source as a giant test case to the nvc maintainer, who's been fixing bugs ever since.

Apparently you can replace a hardware multiplier with bit shifts and adding, which means a 32 bit multiplier becomes a 33 clock cycle microcoded instruction, which is horrible from a performance perspective but small in terms of transistor count. However, the really _fun_ bit is that the original SuperH instruction set (SH1) had a much smaller multiplier, which becomes something like a 9 clock cycle microcoded instruction. Hence this stripped down thing being J1; we can tell the compiler to output original sh code for it, not even sh2.
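The shift-and-add technique, sketched in C rather than microcode. The function name and the bits parameter are assumptions for illustration; the point is only that the loop runs once per multiplier bit, so a narrower multiplier means proportionally fewer steps.

```c
#include <stdint.h>

// Shift-and-add multiply: one loop iteration per multiplier bit, which is
// roughly what a microcoded version would spend its clock cycles on.
// Only the low "bits" bits of b take part in the multiplication.
uint32_t shift_add_mul(uint32_t a, uint32_t b, int bits)
{
  uint32_t result = 0;
  int i;

  for (i = 0; i < bits; i++) {
    if (b & 1) result += a;  // add the shifted multiplicand if this bit is set
    a <<= 1;
    b >>= 1;
  }

  return result;
}
```

shift_add_mul(6, 7, 32) walks all 32 bits to produce 42. Passing 16 for bits does half the iterations and still gets the right answer whenever the multiplier fits in 16 bits.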

No, it wouldn't run Linux. Although if you hook up 256k of ram you can sort of run nommu Linux out of ROM, according to a presentation at ELC last year (video, slides).

Jeff says he's done that, but he'd rather port the Arduino GUI to connect to a jcore toolchain and hook into that ecosystem instead, which wouldn't involve running Linux on the result at all. Still pretty exciting, if we can make it work...

May 3, 2016

Pinging the computer science departments of various women's universities around Tokyo: if we're gonna hire another couple developers we have to train in our stuff _anyway_, we can correct the project's gender balance while we're at it.

Speaking of which, a message on the toybox list asked me to add contribution guidelines to the README, and while I was there I dithered for a long time about adding more text to the "code of conduct" bit, and then committed an update with great trepidation. (The original was intentionally extremely terse because any attempt to address issues affecting 51% of the population gets complaints from much smaller groups insisting it should be all about them instead, how dare I, etc. Since they can only be helped by perfect people, and would rather I do nothing than do my usual half-assed job, I compromise by doing nothing for them and getting on with what I originally planned. I'm aware this makes me a horrible person in their eyes, and I'm ok with that.)

The patreon news post situation has gotten to the "sorry, sorry, sorry!" stage where I'm avoiding looking at it because I feel bad. It's a self-imposed goal to write monthly updates and I've been SUCKING at it. (That effort gets sublimated into updating this blog, so that's something.)

And I need to do a monthly writeup of the traffic. There's been a lot of news (traveling to tokyo always flushes the todo list), and it should be collated.

May 2, 2016

Texas LinuxFest wants talk proposals by the 5th, LinuxCon Japan wants them by the 6th, and the Open Hardware Summit wants them by July 1st. I still haven't heard if the Linux Plumber's conference wants the open hardware track, but I'm more or less assuming they don't since I've heard nothing for two months. (I emailed last month and got back a one line reply that the organizer was "cautiously optimistic" we might still make it. Nothing for a month before that, nothing for a month since. For something we're proposing to invest significant resources in, and have been waiting to hear back about since January. Wheee.)

April 30, 2016

I've implemented a lot more -o fields in ps than the documentation actually lists (mostly needed for things like top and iotop), so I'm trying to go back and fill them in. And I'm at -o PR vs -o PRI, which gets into what the "priority" value the kernel exports means.

The ps value is exported from fs/proc/array.c function do_task_stat, which calls task_prio() and prints that value as is. That function lives in kernel/sched/core.c where it's returning the process's p->prio - MAX_RT_PRIO, and the comment on the function says:

Return: The priority value as seen by users in /proc. RT tasks are offset by -200. Normal tasks are centered around 0, value goes from -16 to +15.

Except that MAX_RT_PRIO is defined in include/linux/sched/prio.h as MAX_USER_RT_PRIO and that (in the same file) is #defined to 100.
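Working the arithmetic through as a sketch. The constants are assumed to match include/linux/sched/prio.h, and the two helper functions are hypothetical: they model the calculation, ignoring priority inheritance and deadline tasks, and are not kernel code.

```c
// Constants as they appear in include/linux/sched/prio.h (assumed unchanged).
#define MAX_USER_RT_PRIO 100
#define MAX_RT_PRIO      MAX_USER_RT_PRIO
#define NICE_WIDTH       40
#define DEFAULT_PRIO     (MAX_RT_PRIO + NICE_WIDTH/2)

// What task_prio() would report for a normal task at a given nice level,
// assuming prio = nice + DEFAULT_PRIO for such tasks.
int proc_prio_for_nice(int nice)
{
  return (nice + DEFAULT_PRIO) - MAX_RT_PRIO;
}

// What it would report for a realtime task, assuming the kernel sets
// prio = MAX_RT_PRIO - 1 - rt_priority for those.
int proc_prio_for_rt(int rt_priority)
{
  return (MAX_RT_PRIO - 1 - rt_priority) - MAX_RT_PRIO;
}
```

Under those assumptions nice -20 through 19 comes out as 0 through 39, and realtime priorities 1 through 99 come out as -2 through -100. Neither range matches what the comment on the function describes.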

April 28, 2016

I miss kmail. Pity they glued it to a dead desktop, and then glued _that_ to an rss reader and 12 other things in a giant inseparable hairball. Oh well. Microsoft does bundling, kde and gnome do bundling. The GPL is itself an attempt to use copyright to do bundling.

Not really a fan of bundling.

I forgot to plug in my netbook yesterday, and the battery died while I was out at dinner. Died as in "I lost all my open windows/tabs again". So I had to relaunch chromium, which brought up the pervasive breakage in current chromium. Specifically, when I relaunch it and it connects to the network it tries to reload all its tabs, using gigabytes of memory and pegging the CPU in perpetuity as javascript in background tabs endlessly cycles animations I can't see and contacts servers to refresh ads I'm not watching and probably mines bitcoins for all I know.

The 12.04 fix was to fire up "top", find the tab processes eating nonzero cpu, and kill them. Repeat until no tabs were using CPU. I even had a script to do this (top -n 1 | screen scrape the output). But the new improved chromium damages its process environments so the names are truncated and I can't distinguish which tabs I can safely kill, and which will take down the whole chromium instance (closing all the windows). And then on the relaunch, the tabs I did manage to kill before it went down try to reload as soon as I connect to the network.

I tried connect/disconnect (wait) connect/disconnect (wait) to starve the tabs, but plenty still manage to load (and use CPU perpetually), and killing them is russian roulette and after a half-dozen accidental forced restarts of chromium, I figured out what I SHOULD have done.

As root, "while true; do echo nameserver > /etc/resolv.conf; sleep 1; done". Leave that running (dhcpcd will periodically overwrite it, this puts it back). Launch chromium and let all the tabs fail their DNS lookups. That DOESN'T get retried every time you connect to the net.

Another thing you can do is use toybox top instead of busybox top, which shows the "chro" processes and can sort by memory usage (cursor right so the memory column is selected, that's what you sort by). Killing one or two memory hogs can free gigabytes of ram, and it usually takes 20 or so kills before you accidentally hit a critical process that takes down the whole of chromium.

So I'm slowly adapting to my new Linux desktop version, and working around the "lateral progress" that desktop Linux is known for. Every time I upgrade, I have to come up with a new set of workarounds, and this is why Linux on the desktop isn't any more popular today than it was 10 years ago. They keep breaking stuff that used to work and calling it progress.

I call it "lateral progress", and you get it on Android in spades, where it's called "cloud rot"...

April 27, 2016

Still in Japan, coming up on Golden Week, which keeps getting described in English for some reason.

As far as I can tell, English is to Tokyo what Latin and Greek were to English a century or two back. You sprinkle in phrases in that language to show you had an expensive education, but said phrases sometimes make very little sense in the original language or out of context.

Had a discussion with Jeff and Jen about testing at dinner tonight (everybody went out for pizza), and I think I realize one of the disconnects I had with Jen (who is big into testing).

To me "complete testing of all the behavior" includes making sure the thing can emit all the error messages you think it can emit. (Can I trigger all the failure cases I think I'm catching? It's a codepath, what happens when we go along that codepath...)

That means testing is complete when I'm triggering all the positive _and_negative_ behavior I've implemented. If I didn't think to implement something because I didn't think of all possible inputs and some unexpected input causes weirdness when this environment variable contains this value while it's run with stdin closed so the first open becomes the new stdin and stdout has O_NONBLOCK set on it and we inherited umask 777 and were creating a filename with an invalid utf8 sequence on a filesystem where the driver cares and then we have an unexpected filename collision due to case insensitivity and our cwd has been overmounted by another filesystem so "." and /path/to/cwd give different results and our parent process set SIGCHLD to sigignore which we inherited across exec (POSIX!) and selinux made setuid() fail for a sudo root process trying to _drop_ permissions and writing to the middle of a file failed because it was sparse but the disk filled up and another file was on NFS where close() can fail (and leaving a deleted file open causes rmdir() on the enclosing directory to fail) we've run out of systemwide filehandles right before the OOM killer takes us out...

I'm aware there's generally stuff I didn't think of. Testing is complete when it covers everything I _can_ think of. More or less by definition. And then the real world breaks it, every time, and you add more tests as you fix each bug.

April 26, 2016

Highly productive day (somewhat confused by the international dateline). But Jeff keeps thinking it's wednesday so I don't feel so bad.

Among other things, I sat down with Niishi-san and asked about the memory controller, and wrote up the result on the j-core mailing list. I should put it on its own page, but despite the design walkthrough talk actual VHDL source documentation probably comes after finishing the git repo conversion.

April 25, 2016

Lost a day to the international dateline (there's a REASON the future begins in Japan, it's already Tuesday here).

April 24, 2016

9 hour layover in the San Francisco airport. Trying to close windows so I can charge the second battery, but there's a todo list critical mass where working through todo items opens new tabs faster than you close them and I am _so_ far past that event horizon....

Got a couple talk proposals sent to Linuxcon, which is in Toronto this year, about a 45 minute drive from SEI's Canada office. Kind of a shame NOT to propose something for that, since I've never been to their Canada office. (My only visits to Canada were Ottawa Linux Symposium. When it was in Ottawa, before moving out of Ottawa did to it what moving out of Atlanta did to Atlanta Linux Symposium. If your city is in the event's name, don't move out of that city. It won't end well.)

I'd hoped to at least flush through toybox bug reports and get a release out, and I did get the bzip segfault fixed (it was a bad error message, the error case was detected and the attempt to report it was segfaulting), and the latest find bug, and the start of ps thread support. And I got some basic test infrastructure for toysh in (ability to run a test under a different shell than sh.test is running under).

The loopfiles stuff needs a design update so the callbacks aren't constantly calling openfd(). Not quite sure what that should look like yet.

April 20, 2016

The long-delayed trip to Japan has finally been scheduled! I fly out on the 24th (at 7:30 am!), and get back on something like the 14th.

I was trying to get a toybox release ready by the end of the month. Now I need to get it ready in 4 days.

Hmmm. I need to get serious about the toybox release. That said, I have a 10 HOUR LAYOVER in San Francisco on the way out. (Do I know anybody in San Francisco?)

Poking at the shell some more to try to get what I'm doing to a good stopping point. I'm not sure how the signal handlers should work with nofork commands, specifically if I ctrl-C while it's doing a malloc, it could theoretically interrupt between the malloc allocating the data and the assignment of the return value. That's a memory leak. I don't know how to make it NOT be a memory leak other than never allocating memory. (It could do the same with filehandles but I can presumably clean up after that because I can check what filehandles are in use with /proc/self/fd, or even by hand if necessary. I don't think any of the nofork commands are doing mmap but that's at least theoretically recoverable too.)

If you hit ctrl-C at a bash prompt and then "echo $?" it says 130, which is 128+2. But when I had toysh exit with 130 from the error handler it said 2 instead, and after elaborate head scratching it's because bash (or maybe the syscall?) is doing an &127 on it. Ok then, reasonable behavior according to posix. So it's locally preserved, but not from child processes.
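For reference, a sketch of how a parent shell could arrive at 130. The function names are hypothetical and this is not toysh code; it assumes the usual convention that a child killed by a signal is reported as 128 plus the signal number, and a child that exits normally is reported by its exit code.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Turn a wait() status into the number a shell would put in $?:
// the exit code for a normal exit, 128+signal for death by signal.
int status_to_shell_rc(int status)
{
  if (WIFEXITED(status)) return WEXITSTATUS(status);
  if (WIFSIGNALED(status)) return 128 + WTERMSIG(status);

  return -1;
}

// Fork a child that either dies from SIGINT (as ctrl-C would do) or calls
// _exit(code), and return what the parent shell would report.
int run_child(int die_by_sigint, int code)
{
  int status;
  pid_t pid = fork();

  if (pid < 0) return -1;
  if (!pid) {
    if (die_by_sigint) {
      signal(SIGINT, SIG_DFL);
      raise(SIGINT);
    }
    _exit(code);
  }
  if (waitpid(pid, &status, 0) != pid) return -1;

  return status_to_shell_rc(status);
}
```

SIGINT is signal 2 on Linux, so the killed child reports 128+2. Since 130 & 127 is also 2, one possible source of a stray 2 is the number 130 being fed through WTERMSIG()-style masking as if it were a raw wait status instead of an exit code, though that is only a guess at what happened here.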

April 19, 2016

Once upon a time (back at Quest Multimedia in 1998) I wrote an arithmetic expression parser in java that worked fast enough on a 75 mhz 486 DX to not just evaluate each point of a line but smoothly animate it as the equation changed.

This was, alas, almost 20 years ago. But it was a pretty standard "two stacks, one for arguments, one for operators" approach I think I read about in a book once. Implementing my own in toybox was never something I dreaded doing, just something I never got around to because there's so much else to do. That's why looking at the contributed "expr.c" in toybox, even with layers of cleanup from somebody at Google, I really want to throw it out and start over again.
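For anyone who hasn't seen it, the textbook two-stack approach looks roughly like this. It is a generic sketch handling only integers, + - * / and parentheses, with made-up function names; it is neither the old Java parser nor anything from toybox's expr.c.

```c
#include <ctype.h>
#include <stdlib.h>

// Precedence of a binary operator; higher binds tighter, 0 means "not one".
static int prec(char op)
{
  if (op == '+' || op == '-') return 1;
  if (op == '*' || op == '/') return 2;

  return 0;
}

// Pop one operator and two arguments, push the result back on the arg stack.
static void reduce(long *args, int *nargs, char *ops, int *nops)
{
  char op = ops[--*nops];
  long b = args[--*nargs], a = args[--*nargs];

  if (op == '+') a += b;
  else if (op == '-') a -= b;
  else if (op == '*') a *= b;
  else a = b ? a/b : 0;  // division by zero yields 0 in this sketch
  args[(*nargs)++] = a;
}

// Evaluate a well-formed expression of non-negative integers, + - * / and
// parentheses. No error handling: malformed input and expressions deeper
// than the fixed 64 entry stacks are not detected.
long eval(const char *s)
{
  long args[64];
  char ops[64];
  int nargs = 0, nops = 0;

  for (; *s; s++) {
    if (isspace((unsigned char)*s)) continue;
    if (isdigit((unsigned char)*s)) {
      char *end;

      args[nargs++] = strtol(s, &end, 10);
      s = end-1;
    } else if (*s == '(') ops[nops++] = '(';
    else if (*s == ')') {
      while (nops && ops[nops-1] != '(') reduce(args, &nargs, ops, &nops);
      if (nops) nops--;  // discard the '('
    } else {
      // Flush anything of equal or higher precedence first, so operators
      // in the same group run left to right.
      while (nops && prec(ops[nops-1]) >= prec(*s))
        reduce(args, &nargs, ops, &nops);
      ops[nops++] = *s;
    }
  }
  while (nops) reduce(args, &nargs, ops, &nops);

  return nargs ? args[0] : 0;
}
```

The "flush equal or higher precedence" loop is what makes eval("8/4*3") come out as 6 instead of 0: the division is reduced before the multiplication is pushed.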

The big thing I _really_ want to do is have the same code handle expr and shell arithmetic expansion (I.E. $((1+2)) evaluation). The problem is $(( )) treats unknown strings as environment variables (force converted to int, or 0 if their value isn't representable as an integer), and expr treats unknown strings as strings in certain circumstances.

In expr, | and & can have string arguments, = > >= < <= and != can have string arguments, : always works on string arguments, and arguments by themselves are strings.

In $(( )) there are separate logical and boolean operators, = is an assignment not a comparison, and there are whole categories of operators (prefix and postfix, bit shift, the ? : conditional thing...) that expr doesn't do. And it's not like either can be extended to do the other's thing: there are several obvious conflicts in the functionality: ":" as regex vs "? :" as conditional, "=" as comparison vs "=" as assignment, "|" as comparison vs "|" as boolean, and whether "BLAH" is a string or the contents of $BLAH coerced to integer type.

The other bit is that expr operates off of separated arguments and $(( )) takes a string. Parsing a string into separate arguments isn't that big of a deal, but "expr 1+2" returns "1+2" because it's a string argument.

That said, they can be extended _most_ of the way towards each other, and it's easy enough to have a loop grab the next token out of a string and feed it start+length to a "deal with it" function, and another loop traverse an array to call the same function with strlen().

My big question is should the common plumbing have a mode flag, or should it operate off of two different operator tables? I'm leaning towards mode flag because the enum gluing together the table and the users of the old enum table is kinda unpleasant. (Yeah I did the TAGGED_ARRAY stuff for that case over in ps, but the natural names for these entries are things like + that don't work in a symbol name. Also, the TAGGED_ARRAY stuff made an enum of _index_ positions, and this has the values stored in the table. Yeah the tagged array stores strings in the table, but ps needed that. Pretty sure I could make it not store the strings when they're not actually needed, maybe SPARSE_TAGGED_ARRAY or some such...)

Another wrinkle is indicating precedence grouping. I added a bunch of tests to expr.test to show that * and / being the same priority (and thus happening in the order presented) matters (especially with integers), so I can't just use a simple order but have to indicate demarcations. But I can put NULL entries in the table to indicate each group end.

What I want is a SPARSE_TAGGED_ARRAY of operator strings, sorted in priority order from "(" to "|=", with NULL pointers at each priority group change. That way flush checking can search from the start of the array to the NULL after this operator's index position. That makes the "prec" and "op" fields of the current table go away, leaving just "sig", which indicates whether this operator takes strings or integers on each side, but this is a property of a priority _group_, not an individual entry, so it should probably be its own array and not in this array.
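A small sketch of the NULL-separated table idea. The table contents are a made-up subset rather than the real operator list, the names optable and op_group() are hypothetical, and using an empty string as the end-of-table marker is an assumption (NULL is already taken as the group separator).

```c
#include <string.h>

// Hypothetical operator table (a subset, not the toybox one): strings in
// priority order, with a NULL marking the end of each precedence group.
// An empty string terminates the table.
static const char *optable[] = {
  "*", "/", "%", 0,
  "+", "-", 0,
  "<", "<=", ">", ">=", 0,
  ""
};

// Return the precedence group of op (0 binds tightest), or -1 if unknown.
int op_group(const char *op)
{
  int i, group = 0;

  // Keep going past NULL separators, stop at the empty string terminator.
  for (i = 0; !optable[i] || *optable[i]; i++) {
    if (!optable[i]) group++;
    else if (!strcmp(optable[i], op)) return group;
  }

  return -1;
}
```

With this layout "*" and "/" land in the same group, so a flush check only has to compare group numbers, and there is no per-entry "prec" field to keep in sync with the ordering.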

April 18, 2016

Guess what the reason for the numato flashing problems was? Go on, Guess.

Hands up everybody who picked Ubuntu being flaming stupid.

Seriously, they've got a demon that sends random AT commands to any newly attached serial port because clearly nothing says it's 2016 like assuming all serial ports have a modem attached supporting the patented (no really) Hayes AT command set. And yes, Hayes press releases used to have "+++ATH" in their header fields, in case you're wondering why base64 encoding message contents got so popular...

April 17, 2016

Elliott indicated that mksh might be displaceable, so I should iron while everybody's on strike due to overheating, or something like that.

So I'm reading the posix shell spec! And it's crap! Seriously, it says you should have builtins like "alias", "cd", "bg", "getopts", "read", and "umask" available in the $PATH so you can exec them and have env call them. What would "env cd" _mean_? Don't alias and read basically set environment variables in the current process environment space?

I'd ask the Posix list, but you know, Jorg.

The posix standard doesn't include any escapes in PS1 except "!" (not backslash escaped, just by itself, with !! being literal !). Anyway, attempting to do the PS1 escapes from the bash man page, which brings up the question of what \[ and \] actually _do_. The man page doesn't obviously say, but google found an explanation: they're a hack working around the fact bash doesn't query its terminal cursor position. So let's just ignore them and do it right.

(Meanwhile, Elliott's also expressed interest in mke2fs, an mtools replacement, and thread support for ps. I should do these things. I'm also working through the new bunzip2 segfault test case John Roeghr sent me, and Andy Chu's long todo list of testing stuff, and I should get expr, lsof, and file promoted out of pending for the upcoming release.)

Wanna do a release at the end of the month. It's already looking kinda... squishy, as deadlines go... Let's see how close I can get!

April 16, 2016

The darn numato flash tool is breaking for people, because the python3-serial package is buggy and introduces noise when you open a connection. Jeff suggested I write a new one in C, which seems like overkill until you ponder "what I'd be debugging isn't Numato's script, it's python 3".

I admit removing the python 3 dependency is almost worth a C rewrite. It's not quite to "kill it with fire" levels of perl dependencies, but python 3 was not python 2. The motto of Python 2 was "there should be one obvious way to do it", and the very existence of Python 3 contradicts that.

April 15, 2016

The j-core list is up!

There has GOT to be a better sysadmin in the company than me. Oh well, got there eventually. I still need to set up DKIM or SPF or some such, but let's see how trigger happy gmail gets on the spam filtering first.

April 13, 2016

The reason buildroot's qemu-sh4 defconfig isn't working is a recent commit changed the default kernel type from zImage to uImage. (The first is a default output type qemu knows how to boot, the second is the weird packaging u-boot and only u-boot needs, for no obvious reason.) So they know about it now, and are fixing it upstream.

People don't always believe me when I say I break everything, but seriously: most things don't work for me on the first try. This is why the "continuous integration" proposals replacing releases with the expectation that the git commit du jour must always be as stable as a release would be... I am not a fan. I don't care what your test suite looks like, it will break for me.

April 11, 2016

Since the reinstall of Ubuntu 14.04 I keep finding missing packages: they don't seem to offer openoffice anymore, but libreoffice works, mame gets more annoying each release (stoppit with the full screening and the mouse grabs, more --stop-doing-that arguments every time...)

And I apparently don't have a proper qemu install on here (x86 is there, sh4 isn't; for some reason upstream breaks this up into multiple packages for no apparent reason). I usually build it from source because of that, and the git repo on this box apparently hasn't been updated since October 2014. So I did a git pull, then make clean, in that order. The clean complained that "config-host.mak is out-of-date, running configure" (boggle), and then died complaining that pixman isn't installed.

This is "make clean" saying this. Thanks QEMU!

Making sure configure enables virtfs is another fun one (it switches it off by default, when you force it on the complaint is you haven't got libcap-devel and libattr-devel, which are the Pointy Hair Linux names for libcap-dev and libattr1-dev). And of course after ./configure is happy, make immediately dies because autoreconf isn't there (of COURSE configure didn't check for it)...

autoreconf: not using Libtool
autoreconf: running: /usr/bin/autoconf error: possibly undefined macro: AC_PROG_LIBTOOL

Seriously? There is NO EXCUSE FOR LIBTOOL ON LINUX. EVER. They notice it's not there, and then break because they try to use it anyway. That's just SPECIAL.

I could just about put up with having autoconf persistently installed on the box, but libtool gets uninstalled after each use, which means I wouldn't be able to build qemu du jour unless this gets fixed. Hmmm... The problem is pixman, possibly last time I just installed pixman-dev rather than initializing the git subrepo? If it keeps the dependencies down to a dull roar...

How do you delete a git subrepo out of the project again? One of those "if you don't already know how to do it, there's no obvious way to look it up" things that are so prevalent with git. Time to try some google-fu... The answer is "git submodule deinit -f pixman". Clearly, I should have just known that intuitively.

After two minutes of watching "make clean" cd into each subdirectory and GENERATE HEADERS just so it could delete them again (before I killed it and did a "git clean -fdx" which took about 3 seconds), I started wondering what pixman actually _does_? Yes, I can see that the description says it's a pixel manipulation library for X11, the question is what does X11 already do that does NOT count as pixel manipulation? Oh well...

April 10, 2016

FINALLY figured out how to enable xbithack on which means I can do a nav bar with a news page.

And the j-core design walkthrough slides are up.

April 9, 2016

Drilling through back email, various broken link notifications for and and the toybox readme. That means people are reading my stuff. Woo! (And fixed, in all three cases. Still haven't got the presentation slides up because Jeff has the current version of those, mine are stale.)

Another package I forgot to install after the upgrade: VLC. Needs to download 21.6 megabytes of archives, and then probably another package with the various codecs so I can actually watch real videos.

No, it apparently installed all the codecs. Query: why does a VLC video display window have a minimum width it'll let me drag to, but not a minimum HEIGHT? The video will continue to scale down to really tiny if I reduce the height (the aspect ratio stays the same), but whoever wrote the window sizing code artificially limited it for some reason. Clearly because they know better than mere users.

Didn't install mame. Didn't install openoffice. (But they don't package that anymore, it's libreoffice only... ok then? It still gives me an "soffice" binary.)

April 8, 2016

I normally run a webserver on loopback, and it's not working. I installed apache, but apache's config file syntax du jour doesn't like my old loopback.conf file: It's saying "Forbidden: you don't have permission to access / on this server." According to /var/log/apache2/error.log the reason is "[authz_core:error] [pid 13161:tid 139812817176320] [client] AH01630: client denied by server configuration: /home/landley/www/favicon.ico, referer:" which is USELESS for debugging why it's complaining.

Solution: I've meant to write an httpd in toybox. Now seems like a good time to do that. Apache has complicated itself into uselessness, and the entire Apache2 rewrite was about becoming multi-threaded instead of multi-process to scale better on windows, it never had anything to do with Linux and was in fact WORSE for Linux (that's why nobody ever wanted to upgrade off Apache 1.3), but they shoved it down our throats anyway.

So yeah, abandon apache and write a simple replacement. Not much harder than netcat, and I've got the start of a wget that can pipe through an external program to do ssh. This is how my todo list shuffles around...

April 7, 2016

CELF/ELC. Very convention, much wow.

Sitting in my hotel room until it's time to airplane again, and trying to finish up the Linux From Scratch 7.8 build for Aboriginal Linux. There are several ways to do it: should I build all the packages with the toolchain I provide, or build a new toolchain with the old toolchain? If I do build a toolchain with current gcc/binutils (and gmp, mpfr, and mpc because that mess leaks complexity like a drunk fratboy leaks beer), should I build it against glibc or against musl-libc?

Since building glibc requires _perl_, there are a certain number of packages that would need building under the original toolchain just to get to the point of replacing them, so I started by building a couple dozen packages with my old toolchain and confirming they build fine.

The glibc build also explicitly tests for binutils 2.22 or newer, and ./configure sits down in the mud and throws a tantrum under 2.17. So I tried building/installing binutils 2.25, and although it compiled fine the result went "ld: cannot open linker script file elf_x86_64: No such file or directory" so there are some path mismatches in here.

Also pondering upgrading bash in aboriginal to the last GPLv2 release, which looks like 3.2. It... really doesn't want to compile. Hmmm...

April 6, 2016

We gave a talk! It was fun. Video may go up someday. I am exhausted.

UT has finally officially rejected Fade (because even though she got her undergraduate degree from Occidental College in California, she did some of her prep work burning through graduate prerequisites for _this_ degree at UT, and continuing on there would be incest or something. I don't understand it in the slightest).

Meanwhile, the University of Minnesota actively wants Fade, offering her a scholarship with stipend and everything MONTHS ago (while UT dithered), so she's moving into student housing up there, and probably taking Adverb with her. (If Adverb's too barky for an apartment, I can drive up and swap him out for Peejee, who has the advantage of not being a dog.)

Fuzzy and I are staying in the house in Austin (every time I've moved out of Austin I moved _back_ about 18 months later, so we might as well keep Very Nice House with 4% fixed interest mortgage), but this gives me an excuse to visit my sister up near Minneapolis for larger blocks of time.

April 5, 2016

Sitting in Khem Raj's LLVM panel at CELF, I downloaded the llvm 3.8.0 packages and tried to compile llvm:

The LLVM project has deprecated building with configure & make. The autoconf-based makefile build system will be removed in the 3.9 release.

Please migrate to the CMake-based build system. For more information see:

Oh look, another random dependency. This just FILLS me with reassurance. (On top of the gcc 4.7 requirement, and the fact that their linker doesn't work well enough for THEM to use it most of the time. And the fact that openembedded's "list of packages that won't build with cmake" includes _CMAKE_. And the fact that I left the compile running for HOURS after the panel and it's STILL NOT DONE, and that's just llvm, there's like 5 more packages in this chain...)

Looks like I need to maintain my old lastgplv2 toolchain a while longer. And get serious about QCC after toybox's 1.0 release. (Which needs a cfront implementation... this is not going to be fun.)

April 4, 2016

Back at CELF! Or ELC, as they're calling it this year. Fun convention, Rich Felker is here, as is Jeff Dionne although he hasn't made it to the actual convention yet. (He's holed up in the hotel restaurant being a CEO on the phone, and working on slides for our talk Wednesday.)

Saw the Openembedded Talk, kernelci talk, and the yocto vs buildroot, and reducing android's memory footprint.

While I was talking with Rich in the hallway, Linus passed by and I said hi (the actual quote was "So you _didn't_ beam out after your keynote"), and he actually stopped and talked to us, which I did not expect. This would totally be a "Sempai noticed me!" moment if I hadn't been doing this for something like 18 years now, but it was nice to finally talk to him in person.

Mostly I introduced Linus to Rich ("meet your new superh architecture maintainer") and told him about the j-core stuff. He reminisced about transmeta a bit and said he wouldn't have time to use a numato board if we gave him one. (We brought a dozen to the convention to give away.) He also gave Rich permission to blow away anything in arch/sh he wants to, because Linus was this close to removing the thing during its orphan period and just didn't get around to it. (We want to keep the old stuff anyway because it shows prior art, and a lot of Japanese engineers who poured their careers into this technology are glad somebody's picked it up and run with it even if their companies don't care. We're "respecting the spirit of superh".)

So that was fun. Now back to the hotel to prepare for our talk.

April 3, 2016

Another travel day, plane from Chicago (where I spoke at Flourish) to San Diego for ELC (which used to be CELF).

I'd hoped to hang out with Jeff and Rich, but my luggage wound up in Burbank for some reason, so I hung out at the airport until 7pm when it showed up in San Diego. (Southwest gave me a $50 travel voucher for the inconvenience, which was nice of them.)

Got some more grinding in on the repository conversion, but I'm still working my way through 2012.

April 2, 2016

I would appear to be allergic to chicago. Constant sneezing and my skin itches. I wonder why?

Both talks went well, the outline of the Three Waves talk is up, the second was basically a walkthrough of what Aboriginal Linux does under the covers (assuming you don't want to use my scripts and instead build your own busybox/uClibc or toybox/musl initmpfs by hand). I look forward to the videos going up, presumably on their youtube channel.

I also attended a talk about Software Radio, but it was a lot more introductory than I was looking for (if I didn't think software radio was a good idea, I wouldn't have gone to the talk?) and half of it was the presenter showing us a video of a more interesting talk on software radio that he didn't give us the URL to.

Went to a very nice talk about how the distribution sausage is made which I'd like to re-watch when the video goes up. It's Red Hat focused, but the guy giving it knew his stuff. Talked with him a bit afterwards about how I want to bootstrap distributions from essentially a Linux From Scratch chroot, and the troubles I had with Gentoo's INSANE annotation of every single package with every architecture it had been tested on (so adding a new target requires touching every file in the entire tree, so the hexagon guys told portage that hexagon was a variant of x86 to get it to work, and then jettisoned portage for a linux from scratch approach without ever trying to push anything upstream into gentoo ever). So now I'm hoping rpm and dpkg would be less stupid.

He then explained how the situation was worse than I knew for Red Hat, because not only will Fedora 24 not build under Fedora 22, it won't build under Fedora 24 either. It builds under a modified fedora 23 and once they get all ~16,000 packages to build once they move on. And attempting to do a verification step of rebuilding under the result and fixing anything that broke would be too much work, they haven't got the resources.

So that's nice. I guess I should go back to looking at Debian, since they actually seem to care about this.

April 1, 2016

Today's the first day of Flourish, although both my talks are tomorrow. I flew to Chicago yesterday, and am staying with Beth Eicher, an old friend and coworker from Timesys who's one of the driving forces behind Ohio LinuxFest. (She has a kid now, who is ambulatory but prevocal.)

The bus we took to Flourish is the Jackson Park Express, which she pointed out Weird Al did a song about, specifically mentioning her stop. (Since she's been personally serenaded at two Weird Al concerts, she's decided that song is about her. I hadn't actually heard that one yet because the past two albums I bought the CD but don't actually have a convenient CD player hooked up to anything anymore, so I just listened to the songs on youtube. Apparently I missed a few.)

Yup, it's a weird al love song.

Flourish is a bit sparsely attended this year, but it's rebuilding after a gap year. That's the downside of student run conventions: the previous group that runs it graduates and there tends to be some rebuilding. Aggiecon tends to go in 4 year cycles as the new group who takes over flounders for a bit, learns to do it well, and then graduates together. You'd think it would involve smoother handoffs and gradually onboarding junior members, but it never seems to work out that way. The people who know how to run it do so until they're no longer available, and then newbies are thrust into the spotlight...

Penguicon tried to cycle the concom out and hand off to new people every year for the first couple years, that's why year 3 almost didn't happen. And then when they stopped that we got Mr. Penguicon trying to build his personal identity around it, which was

March 29, 2016

Tried to tether xubuntu 14.04 to my phone, spent ten minutes wrestling with the fact that any access point needing a password fails because it won't pop up a prompt (the hover text says it needs a password but it won't ASK for one), and when I went into settings->network connections manually it wouldn't let me edit anything (all the fields were greyed out). I thought maybe it had created a corrupted entry so I deleted the entry for my phone's wifi, but it recreated it corrupted. The trick turned out to be delete it, and then create a new wireless network WITHOUT trying to associate with it, and THEN I could give it a wifi password.

Seriously, this is craptacularly bad. Anyway, it took me ten minutes to get it to work, after which I couldn't remember what I wanted to Google.

Linux on the Desktop!

Meanwhile, the advantage of chromium over firefox was always that I could kill tabs. But the chrome web browser (chromium-browser) is broken in xubuntu 14.04. It truncates its command lines (the process is writing a NUL byte into its environment space so all the children of the --zygote process show their command line as "/usr/lib/chromium-browser/chro"). The problem is, when chromium restarts itself and reloads all its existing tabs, it eats insane amounts of CPU and memory. (I have hundreds of tabs in a dozen or so windows. I've learned from experience that merely bookmarking stuff never gets looked at again, the firehose of new data doesn't STOP. But I'll periodically go through and harvest old tabs, and do the implicit todo items in them.)

So if I let my old "./" script run to completion, calling top -b and working out which chromium tabs are eating cpu long after they should have stopped, and killing them all, it eventually kills chromium. The whole thing, not just individual tabs. Because they destroyed the ability to distinguish between the child processes, thus defeating the whole purpose of chrome in the first place.

March 28, 2016

In my previous installation checklist I had to sudo ln -sf vimrc /etc/vim/vimrc.tiny and I still need to do that, because somebody at ubuntu hates vi and tries to sabotage it every single install. I don't know why.

I need to "sudo apt-get install aptitude" and then "sudo aptitude install chromium-browser mercurial subversion git-core pdftk ncurses-dev xmlto libsdl-dev apache2 xfce4-clipman-plugin g++". I didn't install flashplugin-nonfree libreoffice-gtk yet.

Disable that horrible "light locker" thing (if you don't know what the screensaver is called, you'll never find it). Make the power manager always show the icon. Set a root password...

The xubuntu terminals are SORT of white on black, except it's grey on a darker grey which is a STUPID source of eyestrain. You used to fix this by deleting terminalrc under /etc/xdg, which has now moved into an "xfce4" subdirectory for some reason, but deleting it didn't set the background to proper black so I went into edit->preferences and changed the color myself.

Eventually I should be able to move the mail over, but there's a lot of other random fixups to do first. (Copying my home directory with its random dotfiles fixed some of it, but not all. Ordinarily I DON'T do that, and force a proper re-setup so I can document what it all was and make sure everything's reproducible, but I just want to get this working again so I can do things before Flourish and ELC. I didn't expect it to eat 3 days, and can't afford for it to eat MORE than 3 days.)

March 26, 2016

The giant rsync finally finished. (I think I was rsyncing over a backup of the mac's virtualbox linux partition, not the most recent netbook rsync.) Then I let xubuntu update itself to 14.04, and when it rebooted the networkmangler icon had vanished so I can't select which wireless network to log into. (Bravo, ubuntu. I knew better, and I tried an upgrade anyway. It has NEVER worked for me in the entire history of ubuntu.)

So, time for a clean install. Had an alarming moment looking at (which is an apache fresh install page), but it's (Right, I knew that, it's just been a while.) I downloaded the iso (first attempt had some sort of permission problem that firefox won't tell me anything about, why is my backup machine running firefox? Eh, retried and it worked.) Then usb-creator-gtk wasn't installed, but easy enough to add it. Then the usb stick had a GPT partition that usb-cd-creator didn't know how to handle (and fdisk doesn't know how to delete, but "cat /dev/zero > /dev/sdb" does)...

It IS possible to install over an existing partition without reformatting it if you manually mount the partition and delete everything but /home, then click "do something else" at the bottom of the selector thing (really, that's what it's called) and then double click on the partition, manually select the same filesystem type in the pulldown, and THEN it lets you not select "format" while assigning a mount point.

Unfortunately, the resulting install STILL doesn't have networkmangler launching. (All I kept was the /home directory, I guess that's screwing it up?) Luckily other people have had this problem, and after a bit of googling I found a page that said I need to run nm-applet under the "dbus-launch" wrapper, and edit the Exec= line in the file /etc/xdg/autostart/nm-applet.desktop to add that wrapper. (If keeping /home broke it how is modifying /etc fixing it? Is this just plain broken?)

So yeah, if you're wondering why Linux on the desktop never happened, and why I don't consider Android on the Desktop a step backwards from this nonsense, I'm trying to use Ubuntu 14.04.4. That's 4 bugfix only dot releases after a Long Term Stable release shipped, and this is the kind of hoops you have to jump through to do esoteric things like "connect to the wireless access point in the next room".

Meanwhile, the box makes an annoyingly loud pc speaker beep every time I accidentally plug/unplug the power cord (it's a replacement adapter so it jostles out easily) which is actually done by the BIOS (system management mode or some such nonsense), so I can't STOP the beeping, but in 12.04 I had the volume turned way down. How to replicate that? Let's see... Pavucontrol was useless (didn't have a knob for this), I installed aumix but it had even fewer controls (none for the pc speaker beep). I had this working in 12.04 so I know there IS a way... I tried fiddling with the "pcspkr" module but that's not it. (It wasn't installed and shouldn't be.)

Ah, I finally stopped looking for ubuntu pages on this and looked for LINUX pages on this, which led me to alsamixer which (when you hit F6 and select the bottom audio connection) pulls up a menu of 6 audio controls, the rightmost of which is called "beep" and controls the thing I want to control. I have no idea if it'll persist over a reboot... ah, "sudo alsactl store" makes it persistent.

Linux: smell the usability!

March 25, 2016

Yesterday I thought "Three day weekend coming up (celebrating the release of the movie Ishtar), I should probably upgrade my netbook to 14.04 so I can read my email on the machine I actually carry with me when I head off to Flourish." I wanted to actually do some programming this weekend, so I decided to upgrade instead of reinstall.

Since I don't trust the upgrade not to eat all my files, I decided to do a full backup of everything, through the network. The server already has the individual directories I particularly care about rsynced and periodically tarballed (with old tarballs on various USB disks), but it's been a while since I did the "everything under /home" rsync.

I should have plugged in a cat 5 cable. Even an rsync over an older incremental backup has taken a day already. Not sure when it'll finish, rsync isn't exactly known for a global progress indicator. (Maybe I could add that to the toybox rsync someday? Hmmm...)

March 23, 2016

As long as I've opened the can of worms that is sed again (because Debian did something stupid), emit() can return an error when it can't write data to the output. One codepath prints an error message, another doesn't. Most callers don't check its return code, two do. One of those two does an error_exit(), which may not be the right thing if you're doing -i? I don't know.

The hard part's almost always working out what the right behavior _is_, not implementing it. Posix isn't close to detailed enough to notice this stuff, and I can't ask the austin group list because they're thoroughly Jorged.

Oh well. Throw it on the todo list.

March 16, 2016

The reason Android expects to stabilize at an installed base of 6 billion phones: 1.3 billion people have no access to electricity. So basically they expect everybody in the world who can get an android phone will.

March 15, 2016

One of the issues with vi and cp and such is figuring out when they should overwrite the existing file in-place, vs when they should copy data to an adjacent file then mv it over the original.

The first one has the downside that symlinks and hardlinks can modify multiple copies of the same file, and sometimes you don't want to. (For example, the bunzip2 install overwrites an existing /bin/bunzip2 in place, and if that's a symlink to busybox it'll brick the system. Alas "cp -f" doesn't help because it tries to overwrite first and only deletes the file if the first attempt fails.) This is why toybox binaries are chmod read-only: so at least cp -f won't stomp the shared file in-place.

A downside of the second one is that creating another file in the same directory doesn't guarantee you're on the same filesystem, because we have bind mounts now. In fact when I'm developing an Aboriginal Linux build control image (either using more/ or more/ ../control-images/build/lfs-bootstrap"), I cp read-only files out of /mnt to /tmp and then --bind mount the /tmp version over the /mnt version. That gives me a writeable file, but renaming it doesn't work because the filesystem it's on is read only. Similarly some scripts do "cp file.txt /dev/ttyS0" instead of cat, because historically that's worked; it expects overwrite in-place.

I implemented "create another and mv" logic for patch, both because inserting text in an existing file is awkward (you have to read and rewrite all the data after that point, and you corrupt the file with data loss if interrupted partway), and because you should be able to leave the file unmodified if a patch hunk fails. I then factored it out into lib.c and polished it for sed -i.
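The "create another and mv" approach looks roughly like this. It's a Python sketch of the general technique, not the toybox lib.c code, and the function name is invented. Note it inherits both downsides described above: os.replace fails when the two names aren't on one writable filesystem, and hardlinks to the old file keep the old contents.

```python
import os
import tempfile

def replace_via_temp(path, new_data):
    """Write new_data to a temp file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_data)
        # Atomic when both names live on the same filesystem.
        os.replace(tmp, path)
    except BaseException:
        # Any failure leaves the original untouched; just drop the temp file.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

A real version would also copy the original's permissions and ownership onto the temp file before renaming; mkstemp creates it owner-only.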

But for editing arbitrarily LARGE files with vi, I want to be able to mmap the file and read the data in place. Fine: allocate an array of lines that have starting offset and length of each line within the file. (Do we want to limit ourselves to 4 gigs offset, or 4 gigs line length? If not that's 16 bytes overhead per line, possibly more if we do some type of tree structure for the indexes instead of a giant array that's a big memcpy to insert into the middle of... Eh, start with an array and wait for bottlenecks to present themselves.)
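Here's what the "start with an array" version of that index might look like, as a Python sketch (the function name is made up, and Python tuples cost far more than 16 bytes each, so this only shows the structure, not the memory budget).

```python
import mmap

def index_lines(path):
    """Return [(offset, length), ...] for each line, newline excluded."""
    lines = []
    with open(path, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            # An empty file can't be mapped; it has no lines to index.
            return lines
        with mm:
            pos, size = 0, len(mm)
            while pos < size:
                nl = mm.find(b"\n", pos)
                end = size if nl == -1 else nl
                lines.append((pos, end - pos))
                pos = end + 1
    return lines
```

Inserting a line in the middle of that list is the "big memcpy" mentioned above, which is the bottleneck to watch for before reaching for a tree.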

I don't want edits to go live until you save, so modified lines need to be malloced and saved locally until written. The historical behavior of vim's ":w" is to overwrite existing files, updating hardlinks. (Probably truncate and write? I should strace it.) That also says that the offset should be a "long" so I can store a pointer or offset in it, but I'd need a way to indicate which...

And then there's the problem that if somebody else modifies the file while we're editing it, our mmap changes and our offset/length indexes become invalid. Which says we need to read everything into malloc memory, which is what I was trying to avoid...

Sigh. Implementing: easy. Figuring out what the correct behavior should be: hard.

March 14, 2016

Today I wrote up why you get filesystem corruption when you use a writeable non-flash filesystem on flash in email, so I might as well copy it here.

Conventional filesystems are based on the assumption that blocks the filesystem didn't write to won't change, and the standard block size on Linux has been 4096 bytes for many years. [1] Hard drives used 512 byte blocks, and the newer ones use 4096 byte blocks, so you could update the underlying storage with fine granularity, and parts you weren't writing to stayed the same.

But flash erase blocks are enormous by conventional storage device standards, I've seen anywhere from 128k to 2 megabytes. If the flash hardware is interrupted between the block erase and the corresponding block write (power loss or reset both do this), then the contents of the entire erase block is lost. Meaning you could lose a megabyte of data on each _SIDE_ of the area you wrote, which can knock out entire directories and allocation tables or even take out your superblock. Blowing a 1 megabyte hole in a conventional filesystem tends to render it unmountable, and filesystems designed for use on conventional hard drives don't know this is an option.
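To put numbers on "each side", here's a small arithmetic sketch (function names invented) that rounds a write out to the erase blocks it touches and reports how much unrelated data shares them.

```python
def at_risk(offset, length, erase_size):
    """Byte range [start, end) of the erase blocks touched by a write."""
    start = (offset // erase_size) * erase_size
    end = -(-(offset + length) // erase_size) * erase_size  # round up
    return start, end

def collateral(offset, length, erase_size):
    """Bytes before and after the write that share its erase blocks."""
    start, end = at_risk(offset, length, erase_size)
    return offset - start, end - (offset + length)
```

For example, a single 4096 byte filesystem block written at the 3 megabyte mark of a device with 2 megabyte erase blocks puts a full megabyte before it and nearly a megabyte after it in the blast radius.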

FAT is especially vulnerable to this: the file allocation table is an array of block pointers all next to each other at the start of the partition. A single failure to rewrite the data after erasing an erase block will take out the entire FAT and trash the whole filesystem unrecoverably.

It's a small race window, but the results are catastrophic.

This is why there are "log-structured" filesystems designed specifically for flash, which cycle through all the available erase blocks and make a tree pointing back to the data that's still valid in the previous ones. Linux has several implementations of this concept.

This technique is sometimes confused with "journaling", because it provides many of the same benefits, but it's implemented differently. Log filesystems are organized into an array of erase blocks. To format one, you have to know the flash erase block size, and they must be aligned to the start of an erase block. Because of this you usually _can't_ use them on non-flash devices because the filesystem driver will try to query the flash hardware to determine the erase block size, and if that fails they don't know how to arrange themselves. They're designed ONLY to work on flash.

In operation, they cycle through all the available erase blocks and make a tree pointing back to the data that's still valid on the previous ones. Each new erase block contains both new data and any existing data collated out of the oldest block in the filesystem, I.E. the one which will be overwritten next. If there are free erase blocks the filesystem can just write new data (often leaving most of that erase block blank) without deleting an old block. If there are sparsely used erase blocks it copies the data from the oldest one to a new one and adds its new data to the extra space.
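A toy model of that cycle, in Python. This is a deliberately simplified sketch, not how any real Linux flash filesystem is laid out: blocks hold whole key/value dictionaries with no size limit, there's no tree, and the class and method names are invented. It only shows the ordering: collate live data out of the oldest block, write the new block, and erase the oldest afterwards.

```python
class ToyLog:
    """Toy log-structured store: a ring of erase blocks holding key/value data."""

    def __init__(self, nblocks):
        if nblocks < 2:
            raise ValueError("need at least two erase blocks")
        self.blocks = [None] * nblocks      # None means erased/free
        self.seq = 0                        # stamp of the newest block

    def _used(self):
        return [b for b in self.blocks if b is not None]

    def _live(self, block):
        """Entries in block that no newer block has superseded."""
        newer = [b for b in self._used() if b["seq"] > block["seq"]]
        return {k: v for k, v in block["data"].items()
                if not any(k in b["data"] for b in newer)}

    def write(self, updates):
        free = [i for i, b in enumerate(self.blocks) if b is None]
        data, oldest = {}, None
        if len(free) == 1:
            # Last spare block: collate what's still valid out of the oldest.
            oldest = min(self._used(), key=lambda b: b["seq"])
            data.update(self._live(oldest))
        data.update(updates)
        self.seq += 1
        self.blocks[free[0]] = {"seq": self.seq, "data": data}
        if oldest is not None:
            # Erase the oldest only after the new block is written.
            idx = next(i for i, b in enumerate(self.blocks) if b is oldest)
            self.blocks[idx] = None

    def read(self, key):
        """Newest block containing the key wins."""
        for b in sorted(self._used(), key=lambda b: b["seq"], reverse=True):
            if key in b["data"]:
                return b["data"][key]
        raise KeyError(key)
```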

When a log-structured filesystem is near full writes get slower because it has to cycle through a lot of blocks to find enough free space, copying the oldest data to the new one and collating the free space until it has enough space to write the new data. (The smarter ones can skip entirely full blocks and just replace blocks that had some free space in them.)

Mounting them can also be a bit slow because it has to read the signature at the start of each erase block to figure out which one has the newest timestamp, I.E. the one that contains the current root of the tree.

The advantage of doing this (other than automatic wear-leveling) is that if writing is interrupted after an erase, the single erase block that got trashed can be ignored (each erase block is checksummed, detecting invalid data is easy). The previous block still has a root node describing the contents of the filesystem as it was before the last attempted write, and the oldest block never gets trashed until after the newest block is written. (That means it always needs one free block between the oldest block still in use and the newest block, to accommodate these failures. So you're never erasing a block that still contains valid data, the data had to be copied out to a new block first.)
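The mount-time scan and the "ignore the trashed block" recovery can be sketched together. This is an illustration under assumptions: the record layout, the 8-byte sequence stamp, and the use of CRC32 are stand-ins chosen for the example, not the on-flash format of any particular filesystem.

```python
import zlib

def make_block(seq, payload):
    """Build a block record: sequence stamp, payload, CRC over both."""
    return {"seq": seq, "payload": payload,
            "crc": zlib.crc32(seq.to_bytes(8, "little") + payload)}

def is_valid(block):
    """Recompute the checksum and compare it to the stored one."""
    expect = zlib.crc32(block["seq"].to_bytes(8, "little") + block["payload"])
    return block["crc"] == expect

def find_root(blocks):
    """Scan every block; return the valid one with the newest stamp, or None."""
    good = [b for b in blocks if b is not None and is_valid(b)]
    return max(good, key=lambda b: b["seq"]) if good else None
```

If the newest block was torn by a power loss its checksum fails, so the scan falls back to the previous block, which still describes the filesystem as it was before the interrupted write.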

Note: read-only filesystems don't have this problem. You can stick a squashfs or read only ext2 image in flash and it's fine, because it never erases blocks so the granularity difference between what the filesystem was designed to expect and what the hardware actually does never comes up. It's only when _writing_ to flash that you need a filesystem designed for flash to avoid data corruption.

[1] It used to be 1024 bytes, but the longest an individual file could be on ext2 with 1024 byte blocks is 16 gigs, and the largest with 4096 byte blocks is 4 terabytes, so everybody switched years ago. (Because it uses a 3 level tree to store metadata and each level can hold more branches in a 4096 byte block than a 1024 byte block, that's why the difference is so big.)
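The arithmetic behind those two numbers, as a small sketch. It assumes the classic ext2 inode layout (12 direct block pointers, then single, double, and triple indirect blocks, with 4 byte block numbers) and only counts what the pointer tree can address; any other limits in the on-disk format or the kernel are ignored here.

```python
def max_file_bytes(block_size: int, pointer_size: int = 4, direct: int = 12) -> int:
    """Bytes addressable through direct plus 1, 2, and 3 level indirect blocks."""
    per_block = block_size // pointer_size   # pointers that fit in one block
    blocks = direct + per_block + per_block ** 2 + per_block ** 3
    return blocks * block_size
```

With 1024 byte blocks each indirect block holds 256 pointers and the result is a bit over 16 GiB; with 4096 byte blocks it holds 1024 pointers and the result is a bit over 4 TiB. The cube term dominates, and the block size multiplies in once more on top of that, which is why a 4x bigger block gives roughly a 256x bigger file.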

A follow-up question: can we use log structured filesystems on SD/MMC cards? The real question seems to be "can you disable the FTL (Flash Translation Layer) and enable MTD (Memory Technology Device) mode". To quote Free Electrons:

Two types of NAND flash storage are available today. The first type emulates a standard block interface, and contains a hardware "Flash Translation Layer" that takes care of erasing blocks, implementing wear leveling and managing bad blocks. This corresponds to USB flash drives, media cards, embedded MMC (eMMC) and Solid State Disks (SSD). The operating system has no control on the way flash sectors are managed, because it only sees an emulated block device. This is useful to reduce software complexity on the OS side. However, hardware makers usually keep their Flash Translation Layer algorithms secret. This leaves no way for system developers to verify and tune these algorithms, and I heard multiple voices in the Free Software community suspecting that these trade secrets were a way to hide poor implementations. For example, I was told that some flash media implemented wear leveling on 16 MB sectors, instead of using the whole storage space. This can make it very easy to break a flash device.

If you can figure out what erase block size your sd card is using, you can theoretically use block2mtd, at least according to the Raspberry PI and Debian guys.

That takes a block device and adds manually supplied flash erase block information. This only reclaims reliability if the FTL implementation, when receiving an aligned erase block sized write, won't break it up and do silly things with it behind the scenes. (Depends on your sd card vendor, apparently? How do you tell you've fixed it except by yanking the power a zillion times?) Even some "senior embedded engineers" gloss over these issues because injecting failures at the operating system level won't trigger this: the FTL chip will automatically follow each erase with a write replacing its contents behind the scenes unless it loses power (or gets hard reset) at the wrong time. Simple tests in the lab won't hit this issue.

According to block2mtd.c in the kernel source, you either insmod block2mtd from initramfs with "block2mtd=[,]" or else write to /sys/module/block2mtd/parameters/block2mtd for the static version. In theory it should be able to provide this in the kernel command line too, but I'm not sure they wired that up in this module? (I'd have to examine further...)

Another alternative is to make sure your partitions are aligned to a nice big power of 2 size (so whatever your erase block is, you're not crossing them) and be prepared to lose the writeable partition. If you're logging to FAT, and the FAT partition is toast when you try to mount it, reformat the thing. That way the system can at least boot and start logging new data.

Also, you only lose data while writing it. Once it becomes read only it should be safe, so when doing firmware updates you can have two partitions, write the update over the "old" one, and then have your boot software do the same "check the partitions, which has a valid checksum and the newest date stamp, use that" dance, and you should always have one valid even if the other is toast. (This is a common approach in the embedded world, it reduces bricking the device on update.)
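A minimal sketch of that two-partition update logic, assuming each partition carries a date stamp and a stored checksum. The `Slot` structure and the choice of CRC32 are hypothetical, not taken from any particular bootloader.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Slot:
    name: str
    stamp: int       # date stamp written along with the image
    image: bytes
    checksum: int    # CRC32 the image is supposed to have


def pick_boot_slot(slots: Sequence[Slot]) -> Optional[Slot]:
    """Boot the newest slot whose image still matches its checksum."""
    valid = [s for s in slots if zlib.crc32(s.image) == s.checksum]
    return max(valid, key=lambda s: s.stamp, default=None)


def pick_update_target(slots: Sequence[Slot]) -> Optional[Slot]:
    """Write the update over whichever slot we would NOT boot from."""
    keep = pick_boot_slot(slots)
    for s in slots:
        if s is not keep:
            return s
    return None
```

An update that dies halfway leaves the target slot with a bad checksum, so the next boot picks the untouched slot, and the next update attempt targets the broken one again.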

March 13, 2016

Long thread on the list about the toybox test suite. To me it sounds like Android has a hammer, so everything looks like a nail, but I admit testing is not an area of expertise of mine. Along the way, I seem to have posted a lot of "it's on the todo list!" that weren't in the recent post about that.

Poking at ls and trying to fix the -q and -v stuff. Posix describes ls -q with "Force each instance of non-printable filename characters and <tab> characters to be written as the <question-mark> ( '?' ) character. Implementations may provide this option by default if the output is to a terminal device." So what happens if the username has a tab in it? And presumably "character" is "utf8 wide character" now, right?

I'd ask the posix list about that, but Schilling. So I have to decide the right thing for myself and ignore the broken standards committee.

Meanwhile, I redid ls to use crunch_str() with its own escape for -qb, which brings up a problem: there are three failure cases needing escapes, low ascii (0-31), invalid sequences, and unmapped unicode points. The problem is the escape function gets passed a "wide character" (I.E. a decoded int), and then I need to dismantle that back to the raw bytes for -b byte escapes. I _think_ wcrtomb() will do it, but the docs are unclear? (Will the wc->bytes always exactly undo the bytes->wc transform? Is this guaranteed symmetrical even when it's an unknown unicode point?)

Another problem is that for -b I need to escape spaces, which are a valid printable character the escape isn't currently called for. And if the escape is sticking a backslash before characters, I need to escape backslashes. So I need to add an argument to crunch_str() to tell it what printable characters to pass through to the escape function.
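To make the three failure cases plus the space/backslash problem concrete, here's a rough Python model of a -b style escaper. It is not the toybox code and doesn't use wcrtomb(): `escape_b()` is a made-up name, the output format (three digit octal per byte, backslash before space and backslash) is an assumption, and Python's `surrogateescape` error handler is standing in for "remember the raw bytes of an invalid sequence".

```python
def escape_b(raw: bytes) -> str:
    """Escape a filename's raw bytes in the style of ls -b (toy model)."""
    out = []
    for ch in raw.decode("utf-8", errors="surrogateescape"):
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # Invalid UTF-8 byte smuggled through by surrogateescape:
            # recover the original byte and escape it.
            out.append("\\%03o" % (code - 0xDC00))
        elif ch == "\\":
            out.append("\\\\")      # the escape character itself
        elif ch == " ":
            out.append("\\ ")       # printable, but -b still escapes it
        elif ch.isprintable():
            out.append(ch)
        else:
            # Low ascii or a code point Python considers nonprintable
            # (which includes unassigned ones): escape each encoded byte.
            out.extend("\\%03o" % b for b in ch.encode("utf-8"))
    return "".join(out)
```

In this model re-encoding a decoded character always gives back the bytes it came from, because invalid input never becomes a real code point in the first place. Whether wcrtomb() gives the same guarantee is the open question above.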

(My brain's still screwed up by the cold and lack of caffeine, I'm just chipping away at the darn coal face anyway. Somewhat ineffectively, but eh.)

March 12, 2016

The reason "make test_mv" is failing is that mv depends on cp, so the single build of "mv" has cp in there also, and since it's first in the table mv always thinks it's cp, and thus acts like cp. (The multiplexer code gets yanked but the command table isn't length 1, so only the first entry in the table is noticed.)

The problem is cp is implemented in two chunks, the posix cp functions from the 1970's and CP_MORE which adds the giant stack of non-posix options (-adlnrsvF) you need to implement a modern cp.

In general I've been leaning more and more towards removing per-command options. There should be one obvious way each toybox command behaves, having to check a config file to see which variant you built is silly. When I started toybox I had a lot of busybox influence that took a while to clear out: toysh had a dozen menuconfig options before I even implemented environment variables. It doesn't anymore, now you select which commands you want in menuconfig, but other than a few global options there aren't any that change the behavior of a command, just whether or not to include it. I started seriously cleaning the sub-options out when writing the outline for my 2015 ELC talk. (The old story: you start to write documentation and change the code rather than documenting what it currently does.)

My recent experience with the posix committee (laws, sausages, and standards: do not watch them being made) has tipped me over the edge in removing the CP_MORE and MV_MORE config options. Toybox cp and mv implement a lot more options than Posix mentions, and I'm ok with that. You can't select to make them _not_ do it in the name of compliance with a dead standards committee.

March 11, 2016

I composed the following reply to this message but didn't send it to the list. I suppose my blog is the right place for it.

On 03/11/2016 06:04 AM, Joerg Schilling wrote:

> ...


Not wanting to interact with the guy who kept a memorial to decade-old Linux bugs in cdrecord's README as a reason Solaris was better and everybody should use it instead of Linux is _exactly_ why I didn't post any of this here when I maintained busybox, or since toybox was accepted as the Android standard command line implementation going forward.

Him, specifically, and he's STILL doing it:

> Note that your observation about Linux is not complete, there
> are other strange things in the Linux history. After 2001
> (4 years after a related POSIX proposal was withdrawn), Linux
> started to implement that POSIX proposal for ACLs and extended
> arrtibutes. Other platforms at that time did already decide to
> implement the NTFS ACLS that now have been standardized
> together with NFSv4.

15 years ago this happened! See how evil they are!

I knew better, and I posted anyway. I'll stick to Posix-2008 as my frame of reference to diverge from and stop bothering you guys.


I suppose another objection would be that their mailing list software doesn't make it obvious how to get a link to the thread a message is in, (in this case here.)

The reason I haven't been engaging with the posix mailing list is one of its most prolific contributors is Joerg "Linux Sux Solaris Forever" Schilling, who I don't want to get on me. I'm getting ready to throw out the Posix baby with the Schilling bathwater and just treat them the same way I do LSB and the man pages. They document one way of doing it, but not necessarily the only one. Posix is of historical interest, but has sadly fallen into disrepair in modern times.

The posix committee is dead, time to move on.

March 10, 2016

Phone call with Jeff to try to work out our ELC talks. Lots of good material about why open hardware in general, and why j-core specifically. Now I just need to write it all up.

I also need to contact the various Tokyo women's universities to see which have comp-sci departments, and try to arrange visits so we can try to hire graduates and/or try to interest them in computer hardware courses based around J-core to hire _next_ year's graduates.

March 8, 2016

I've had a cold all week. Not combining well with the lack of caffeine. I am getting NOTHING done. Oh well, recovery time...

I'm trying to fix rm -r, which should be able to handle infinite-ish recursion depth (as described in the recent giant todo list). I've gotten a basic cleanup pass done on the dirtree stuff, but just can't focus well enough to design the new stuff with any confidence in the result.

March 6, 2016

Still working out what file.c should do, but at least I got another update checked in.

March 5, 2016

There's an article (and here's an interview with the author) that says single women now outnumber married women in the US, and the trend line's pretty clear going forward. In 1960, 60% of women ages 18-29 were married, today it's 20%. The median age of first marriage for women wandered from 20 to 22 between 1890 and 1980, then went up to 23 in 1990, and now it's 27.

Fade and I got married in 2007 so she could get on my health insurance, after happily living together unmarried for years. Since then Obamacare happened and we've been getting individual plans, so we're still married but our particular catalyst for going through with it's gone away.

In related news, Fade's girlfriend broke up with her today (amicably, and they're still friends; long distance relationships are hard). I hope she finds another one in Minnesota.

Backing up: Fade hasn't heard back from the University of Texas yet, but she applied to a few other grad schools and got accepted by the universities of Michigan and Minnesota, and Minnesota's offering her a scholarship with a living stipend. Fuzzy is strongly against moving (she loves the house and her garden), and in the past 20 years I've moved out of Austin three times and moved back again each time, so would prefer to keep the house even if we spend a few years up north again. I highly doubt we'd find this nice a house as convenient to 24 hour grocery stores and a large university and so on that we could afford. (We could only afford it because 4% fixed interest rates on the mortgage, if we sell we lose that.) But Fade's run out of graduate school prep and needs to actually start her Doctorate if she's going to get one.

We're not sure how this is going to work out, but I might wind up spending 6 months up north with Fade (near my sister and the nicephews), and 6 months back here in Austin, with Fuzzy taking care of the house (and the dog and cats) when I'm not there. (Minus however much time I spend in Japan, of course...)

March 4, 2016

During my most recent trip to Arkansas, Nick pointed out that some of my vision weirdness sounds like optic nerve swelling. (The official name for which is a latin phrase I'm not remembering at the moment which literally means "optic nerve swelling", but sounds much more ominous.)

This can cause flashing at the sides of your vision when you rapidly move your eyes side to side, as the irritated nerves signal through the only channel they've got. One of the things that can cause this: bad/chronic sinus issues causing adjacent swelling of things near the sinuses.

This explains much! It would be nice if one of the various doctors I've been to over the years could have mentioned this, but given that the American Medical Association decided back in the 1970's to raise doctors' salaries by restricting the supply, I.E. engineering a shortage of doctors by enforcing medical school graduation quotas (for example the new Dell Medical School at UT is starting with a class of 50 students for the whole school), these days you get like 10 minutes with a doctor, and you're one of 40 patients she sees that day, so only the most immediately obvious stuff gets noticed before they're off to the next patient. (And when we started importing foreign doctors, the AMA cartel moved to limit their visas. Of course obamacare does nothing to address this because challenging the power of for-profit entities that have cornered the market is not Obama's thing, instead it's about channeling money through for-profit insurers to pay the higher fees for any share at all of the smaller pie.)

Anyway, I know from past experience that caffeine has been making the flashing worse, but couldn't understand why. Given that it seems to be sinus swelling screwing up nerves that makes sense, so I went off caffeine and started taking store-brand zyrtec daily until cedar pollen season's up.

I really, really miss caffeine. My productivity has been noticeably reduced by its absence. But still being able to see next decade would be really nice.

March 3, 2016

I wrote a big long reply to the mailing list that probably should have been a blog post, so copying it here for posterity. (I.E. in case Dreamhost's darn web archive eats itself again.)

On 03/01/2016 09:18 PM, enh wrote:
> No worries. Is it easier to keep track of things if I use github pull
> requests?

Not really. That's not the problem.

This week I fixed the bzcat integer overflow John Regehr reported, and the base64 wrap failure, dealt with Mike's heads up on glibc breaking the makedev includes, fixed three different build environment breaks (prlimit probe and MS_RELATIME define for uClibc, finit_module probe for ubuntu 12.04) and redid the test suite so it's consistently printing the command name at the start of each test and then factored that out so the test infrastructure was doing it.

Right now I'm trying to figure out how to redo lib/dirtree.c so that rm works with infinite depth. (I have a test in the test suite, but haven't checked it in yet.) Posix requires infinite recursion depth out of rm, but if dirtree closes the parent's filehandle and tries to reacquire it later (via openat("..") and drilling down again from the top if that doesn't match up, and keeping symlink parent filehandles open because reacquiring those is nuts at the best of times), then you have to do breadth first search because dirclose() loses your place and order isn't guaranteed, so I need a mode that eats filehandles and a mode that eats memory.
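For comparison, here's a deliberately simple Python sketch of an rm -r that copes with chains deeper than PATH_MAX and deeper than the filehandle limit. Note this is a swapped-in technique, not the dirtree design described above: it walks with chdir() instead of openat(), keeps a stack of directory names (the "eats memory" tradeoff), and skips the check that ".." really leads back to the directory it came from, so it is not safe against something renaming directories underneath it. `remove_tree()` is a hypothetical name.

```python
import os
import stat


def remove_tree(top):
    """Remove `top` and everything under it, one path component at a time."""
    home = os.open(".", os.O_RDONLY)   # single fd so we can get back afterwards
    names = []                         # names of the directories we descended into
    try:
        os.chdir(top)
        while True:
            descended = False
            for name in os.listdir("."):
                if stat.S_ISDIR(os.lstat(name).st_mode):
                    os.chdir(name)     # go down one level, remember how we got here
                    names.append(name)
                    descended = True
                    break
                os.unlink(name)        # files and symlinks are deleted on sight
            if descended:
                continue
            if not names:
                break                  # back at `top`, which is now empty
            os.chdir("..")
            os.rmdir(names.pop())      # the directory we just left is empty
    finally:
        os.fchdir(home)
        os.close(home)
    os.rmdir(top)
```

Every system call here takes a single name, so no individual path ever gets long, and only one filehandle stays open no matter how deep the tree goes. The cost is re-reading a directory after each subdirectory is finished, and the name stack growing with depth.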

I haven't written the dirtree code but I've written the start of the test case for it:

+# Create insanely long dir beyond PATH_MAX, them rm -rf it
+// Create directory string 1024 entries deep (and half of PATH_MAX),
+// then use it to create an 8192 entry directory chain. Note that this
+// "mv a chain under another chain" technique means you can't even enforce
+// a $PATH length limit with a mkdir check, the limit can be violated
+// afterwards, so rm -r _must_ be able to clean up.)
+// This one is _excessively_ long to also violate filehandle limits,
+// so naieve dirtree openat() implementation keeping filehandle to each
+// parent directory would _also_ exhaust limits (ulimit -Hn = 4096).
+// (But hopefully not so long we run out of inodes creating it.)
+for i in 1 2 3 4 5; do X=$X$X$X$X; done
+for i in 1 2 3 4 5 6 7 8
+ mkdir -p "$TOPDIR/$i/$X" &&
+ mv "$TOPDIR/$i ." &&
+ cd "$i/$X" &&
+ continue
+ i=
+if [ ! -z "$i" ]
+ break 2>/dev/null
+ exit 1

I'm trying to get NFS working with toybox mount. (You'll notice the recent "mount doesn't pass through -o leftover string data" fix.) I documented a simple NFS test environment years ago, ala (in qemu you mount rather than but busybox was passing through the old binary config blob format instead of the string config format, and there's no reason the string version SHOULDN'T work but I haven't made it do so yet so I'm sticking printk()s in the kernel I'm running under qemu to see why.

The need for file to detect cpio files got me looking at cpio again, remember my pending "add xattr support to cpio/initramfs" half-finished patches, and I also noticed that it wasn't checking the cpio magic (fixed that, it accidentally got checked in with the makedev header stuff, although the major()/minor() rename in lsof didn't because of pending lsof cleanup I didn't want to check in yet).

My pending cleanup of lsof needs to address the fact it takes 18 seconds to produce its first line of output when run with no arguments on my netbook (the ubuntu version takes 0.7). That's a largeish thing. Plus this looks like it could share the ps /proc parsing infrastructure, which needs to be genericized into /lib. (I did some but not all of that refactoring work in the last interminable round of ps meddling; the common code can't access TT.* or FLAGS_* because that's all command-specific. Doing that would ALSO let me break out ps/pkill/top into separate command files.)

I was using the bash 2.05b limitations (toybox not rebuilding under it because jobs -p wasn't implemented yet) as an excuse to reopen the toysh can of worms, which I'm tempted to continue because digging into it has reminded me that the research I did circa 2006 is full of things I no longer remember. (Plus the busybox ash/hush split is sad, and aboriginal linux needing to use two shells simultaneously goes against the entire point of the project. So I need a proper bash replacement that works on nommu, and that means I write one.) But for big things like this I need to devote large blocks of time, because chipping away at them produces no progress. (I spend all my time figuring out where I left off and why.)

I have a build break in my local sed.c to remind me to add the "sed,+NUMBER" range extension feature and "s///NUMBER" extensions.

I have another build break in netcat to remind me to:

A) finish factoring out the xconnect() and xpoll() stuff into lib/net.c, taking into account the ipv6 name lookup stuff ala:

egrep "(->|[.])ai_" toys/*/*.c

Not to mention also converting:

grep gethostby toys/*/*.c

B) Add the UDP support for:

C) Figure out whether I should merge this with tcpsvd.c and/or telnet.c, or if factoring out the common code into lib/ is enough. (Also, does the tail -f code work into that merge or stay separate? There's a whole infrastructure design rat's nest I need to do a lot of pacing and staring off into space about there. It's on the todo list.)

I have xdaemonize_nofork() in lib/portability.c to remind me to do a second pass of nommu support over everything. (That's ANOTHER thing to fix in netcat.c.)

scripts/ has a one line patch not checked in to remind me of THIS issue I need to deal with:

+ # todo: install? --remove-destination? (Will install stomp symlinks?)
  [ "$1" == "--force" ] && DO_FORCE="-f"

Remember the help.c rewrite to more intelligently shuffle and recombine help text? That's pending too.

In tests/expr.test I have:

+# expr +1
+# expr 2+1
+# expr 2 + 1
+# expr 2 +1
+# expr X * 2
+# expr X + 2

Which is a year-old reminder of this and that's just a side issue, the real reason expr needs a rewrite is priority grouping ala this.

In tests/printf.test I have:

+# The posix spec explicitly specifies inconsistent behavior,
+# so treating the \0066 in %b like the \0066 not in %b is wrong
+# because posix.
+testing "printf posix inconsistency" "$PRINTF '\\0066-%b' '\\0066'" \
+ "\x066-6" "" ""

Which isn't checked in yet so "git diff" shows it and thus I remember it. (I hope to get to the point where just HAVING a failing test in the test suite reminds me of an issue to fix, but I tried to do a triage pass on the test suite last month to split out "contributed test needs to go in 'pending' even though I don't have a category for that because it's not remotely testing the right things", from "tests are valid but incomplete" from "there is no test for this command at all" from "these tests need to run as root in a controlled environment and the last time I sat down to make an aboriginal linux test environment run under qemu I tried to test 'ps' and couldn't figure out how to get output that wasn't hugely subject to kernel version skew then got distracted" from "I went through every #(%(&# line of the posix spec for this command and I've covered every corner case and this one's DONE"...)

(Sheesh, the test suite could eat multiple months all on its own...)

I have another note in tests/test.test that "test" is a shell builtin so this test suite never actually tested our version, and that I need to add a way to detect this. (VERBOSE=false to symlink the binary to /bin/false maybe? No, VERBOSE=hello to substitute the "hello world" command out of toys/examples which is _guaranteed_ to produce the wrong output, if not the wrong exit code... :)

I was partway through doing vi and less (completing the lib/interestingtimes.c and lib/linestack.c stuff), and I really need to get back to that before the "I forgot the details of how shells should work, need to reread everything again" problem sets in on that. (I was halfway through a dd.c rewrite once, tabled because the comma_iterate() infrastructure wasn't there yet and I knew I'd need common code for mount -o and ps -p and so on. That infrastructure's still not exactly DONE, but I've forgotten the details about dd and have to relearn them again at this point anyway.)

The recent wget submission from Lipi Lee (the second patch to which didn't apply, last hunk failed and I'm not sure why, but it brought up that my patch.c not only doesn't handle the git rename and permissions extensions which I need to add, but it doesn't even handle that "\ no newline at end of file" message if it comes before the last line of the diff. I note the failure was "git am" refusing to accept it, neither did ubuntu's patch, it was broken for some reason anyway, I just started debugging it in mine because I can get better debug output from my code with CONFIG_DEBUG and -x. Maybe I should make -x a default option, "why this patch didn't apply", but the output's way too verbose...)

Oh, speaking of patch, it's the main user of the "data is char * instead of void *" feature of double_list, and I keep meaning to go look if double_list would more naturally be something else, and if so how to modify patch.c to deal with it. Lots of OTHER things cast away the char * when void * wouldn't need a cast...

Anyway, the problem with wget itself (modulo cleanup and adding support for cookies and redirects and logins and a progress indicator and making sure it handles all the headers from here and...)

The REAL problem is I need to make it understand https:// and shell out to that openssh command line utility Isaac Dunham pointed me at last year that's better than stunnel. AND I need to make it handle ftp:// and combine that with the eventual ftpget/ftpput/ftpd commands (which is why I wasn't opening this can of worms yet, I eventually want an httpd that can do enough CGI support to run ph7).

Also, Rich Felker recently fixed strace for nommu, and I have that bookmarked along with this. (Given that I worked to port strace to hexagon in 2010 I'm reasonably familiar with the underlying mechanisms and could probably implement a really basic one for toybox in about a week, if I had a spare week.)

It's on the todo list. So is collating the unescape logic between printf.c, echo.c, sed.c, and sh.c and maybe adding a new wrapper layer to handle hex and/or octal escapes. (How that works in with the printf.test above, I don't know yet.)

Another nommu issue is that "did exec work or not" is difficult to determine, I need to fix xopen() to pass the "did it exec" failure back through pipe[3] to make vfork() be able to detect inability to exec as described here.
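The usual trick for reporting exec failure back to the parent is a close-on-exec pipe: if the exec works the kernel closes the write end and the parent reads EOF, if it fails the child writes errno down the pipe before exiting. A rough sketch follows, with caveats: Python doesn't expose vfork() so this uses fork(), `spawn_checked()` is a hypothetical name, and it isn't the toybox implementation.

```python
import errno
import os
import struct


def spawn_checked(argv):
    """Fork and exec argv. Return (pid, 0) if the exec happened, or
    (pid, errno) if it failed (in which case the child is already reaped)."""
    rfd, wfd = os.pipe()            # Python creates both ends close-on-exec
    pid = os.fork()
    if pid == 0:                    # child
        os.close(rfd)
        try:
            os.execvp(argv[0], argv)
        except OSError as e:        # exec failed: tell the parent why
            os.write(wfd, struct.pack("i", e.errno or errno.ENOEXEC))
        finally:
            os._exit(127)           # never fall back into the parent's code
    os.close(wfd)                   # parent keeps only the read end
    data = os.read(rfd, struct.calcsize("i"))
    os.close(rfd)
    if not data:                    # EOF: exec closed the pipe, so it worked
        return pid, 0
    os.waitpid(pid, 0)              # collect the failed child
    return pid, struct.unpack("i", data)[0]
```

On success the caller still owns the child and has to wait for it. Calling it with a path that doesn't exist should hand back ENOENT instead of a child that silently exited 127.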

I still haven't gotten all the singleconfig commands working, for example "make ftpput" fails because the command is an OLDTOY() alias of ftpget and the singleconfig infrastructure isn't smart enough to work out what to do to build it. The config is wrong, the help text logic is wrong (unless I already fixed the help text logic part, I'd have to check. I was working on it and got distracted by emergency du jour.)

I need to figure out if readfile() should trim the \n and if so audit/change all the callers (and I thought I'd allowed a chomp() variant into lib but couldn't find it when I went to look.)

I'd like to figure out why date is failing with weird error messages:

$ sudo ./toybox date -D %s 1453236324
date: bad date '1453236324'; Tue February 53 23:63:00 CST 2024 != Wed Mar 26 01:03:00 CDT 2025

Ubuntu's wc -mc shows both, it would be nice if ours could.

This is not a complete list. This is me glancing around at my todo heaps and remembering a thing or going "why have I got that browser tab open", "what's in THIS directory's todo.txt... ah 'collate todo.txt files out of these other three directories', of course..." I have TODO: comments at the top of a bunch of toybox files I haven't looked at in ages. That's not even counting tackling the rest of the pending directory or lib/pending.h, this is just the top couple layers of my todo list. The stuff I've been working on _recently_.

And all that's just toybox. I need to do an Aboriginal Linux release soon, so I'm trying to figure out how to fix my old version of binutils, which the 4.4 kernel broke on arm and on mips. (And on superh but Rich is doing a workaround for that. The other two I've root caused but not fixed yet. The core issue is nobody but me regression tests on the last GPLv2 release of binutils anymore, so I have to patch either their projects or binutils, and the patches to binutils are wandering into "add entire new features" territory.)

I'm trying to finish the Linux From Scratch build control image upgrade from LFS 6.8 to LFS 7.8 (a multi-year gap and more or less a complete redo, which is where the perl sed fix came from).

Luckily the recent fix for the chromeos guys also fixed the "toybox doesn't build under bash 2.05b in aboriginal" problem. The other uClibc regressions I checked in fixes for recently.

I'm trying to convert the j-core mercurial repository (which has a half-dozen subrepos) to a single unified git repository. I figured out that the previous blocker was that git "rename" patches weren't getting their paths rewritten consistently and git was dying with an error message so misleading it didn't even blame the right _line_. (Found it out by removing chunks of patch at the end until I found the specific line that made the error manifest.)

My ELC talk was accepted (well, one of them), but it's really a talk that my boss Jeff Dionne knows 10x more about, but he has even less time than I do, so we somehow have to coordinate on putting together a slide deck and co-presenting by April 4.

I've been invited back to talk at Flourish, I have a half-dozen potential talks I could give there and need to select/prepare a subset for them. That's April 1 and 2, I plan to fly up to Chicago on the 31st, be there April 1 and 2, fly to San Diego on the 3rd, be at ELC, probably be there for a while longer due to $DAYJOB, and then fly back to Austin. (There's presumably a trip to Japan at some point but that's sometime after this batch of travel.)

I'm trying to put together an open hardware microconference at plumber's which involves asking people if they _would_ want to go (and possibly speak), assuming it makes and without explicit promise of travel budget because we dunno who will get what yet.

I need to finish migrating to, which has turned into a complete rewrite of the site. I need to install the VHDL development tools from a different tarball onto a different machine (updating the install instructions) and then build the current version from the converted git repository to post current binaries into a downloads directory that does not yet exist.

Oh, and I need to write a VHDL tutorial, which would involve me learning VHDL first.

I had a lot more todo items written down on a piece of paper, but I lost it.

> Also, did you say a few months back that you'd started on 'ioctl'? If
> so, do you want to check that in? If not, correct my misapprehension
> and I'll do it at some point. (For that and prctl I'm also slowed
> down by the fact that there's much to dislike about the existing
> commands...)

Oh right, that. I should go do that.


P.S. The email notification for came in since I hit "reply" to this message.

P.P.S. Sorry I haven't set up a proper Android test environment. Dismantling AOSP is something I really look forward to, and do NOT have the desk space for at present...

P.P.P.S. Same for Tizen and chromeos, and I need to resubmit the toybox addition patch to buildroot...

P.P.P.P.S. Oh just hit send already.

March 3, 2016

My ELC talk got approved (J-core design walkthrough), but I'm co-presenting that one with Jeff Dionne and need help putting together the presentation. (I'm presenting about stuff I don't actually KNOW there yet, although Geoff Salmon can help me come up to speed if Jeff is too busy. And Jeff can do it off the top of his head.)

Flourish contacted me and asked me to speak again, and work seems willing to send me there. It's right before ELC, but if I fly up to Chicago on the 31st and fly from there to San Diego on the 3rd it works out.

I have a bunch of topics I could do at Flourish:

  • The turtles all the way talk on j-core/open hardware. (Work's sponsoring the trip, practice for ELC, there's new material, it's of general interest anyway. I could do the J-core tutorial too, maybe work would spring for a dozen Numato boards again and I could set people up with them?)

  • The prototype and the fan club (again), the talk I gave there 5 years ago, which never quite got properly recorded. I keep wanting to refer people to it, and the material's still relevant, so...

  • The rise and fall of copyleft. Important talk on the widespread return to public domain licensing, I gave a halfway decent version at OLS in 2013 but only the audio was recorded and I didn't retain the list of web pages I showed (instead of slides I gave primary references), I've wanted to do a proper one ever since. Maybe "Copyleft vs the Public Domain" is a better title?

  • The three waves: Hobbyists, Employees, and Bureaucrats (oh my). I did a series on this for The Motley Fool and there's a BUNCH of new material (among other things it explains the existence of Google Alphabet.)

  • Android as a self-hosting build environment. The PC is now Big Iron, but Apple's read-only iPad future isn't appealing. Android on the workstation is necessary to replace the PC with smartphones, the sooner we do this the less scar tissue we'll have to overcome (preferably before the FBI/NSA forces phone vendors to lock us out of our own devices).

  • Embedded Linux From Scratch. How do you make the tiniest system capable of rebuilding itself under itself from source code, then bootstrapping itself up to arbitrary complexity?

Or I could do a toybox walkthrough, or come up with something entirely new...

March 1, 2016

Not to be outdone, a kernel upgrade also broke the sh2 target, although in this case it's not 4.4 but more like 4.6. Rich's git tree has the new cmpxchg instruction, which my binutils 2.17 doesn't understand the mnemonic for.

Easy enough fix, but that's 3 architectures that dowanna build with the toolchain that 4.3 built with. Disturbing trend.

February 25, 2016

Oops. Trying to make NFS work with the toybox mount command, which is a giant pain, and I remember why I switched NFS support off in Aboriginal Linux's baseconfig: all I added was CONFIG_NFS_FS=y, CONFIG_NFS_V2=y, and CONFIG_NFS_V3=y, and the infrastructure it pulls in DOUBLES the i686 kernel compile time!

Anyway, I found part of the problem: The mount command wasn't passing through the string options. It was parsing out the mount flags, but the rest of the string was carefully assembled and then a NULL got passed to the kernel instead.

That'll screw things up, yes. :)
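To make the failure concrete, here's a minimal sketch of the split the mount command has to do: peel the options the kernel wants as flag bits out of the -o list, and hand everything else to the filesystem driver as a string. The helper name and the short table of known flags are made up for illustration; this is not toybox's actual code.

```c
#include <string.h>
#include <sys/mount.h>

// Hypothetical helper (not toybox's actual code): split a comma separated
// option list into MS_* flag bits plus a leftover string for the filesystem
// driver. Returns the flags; unrecognized options are copied into data.
unsigned long split_mount_opts(const char *opts, char *data, size_t len)
{
  static const struct { const char *name; unsigned long flag; } known[] = {
    {"ro", MS_RDONLY}, {"nosuid", MS_NOSUID}, {"nodev", MS_NODEV},
    {"noexec", MS_NOEXEC},
  };
  unsigned long flags = 0;
  size_t used = 0;

  if (len) *data = 0;
  while (*opts) {
    size_t n = strcspn(opts, ","), i;

    // Is this option one of the flag names?
    for (i = 0; i < sizeof(known)/sizeof(*known); i++)
      if (strlen(known[i].name) == n && !strncmp(opts, known[i].name, n)) break;
    if (i < sizeof(known)/sizeof(*known)) flags |= known[i].flag;
    else if (n && used+n+2 <= len) {
      // Not a flag: append to the string the filesystem driver will parse
      if (used) data[used++] = ',';
      memcpy(data+used, opts, n);
      used += n;
      data[used] = 0;
    }
    opts += n;
    if (*opts == ',') opts++;
  }

  return flags;
}
```

The point of the bug above is the last argument of mount(2): after a split like this the call needs to be mount(dev, dir, type, flags, data), not mount(dev, dir, type, flags, NULL), or NFS never sees port= and friends.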

Anyway, I wrote down the test environment setup many moons ago, just build the old Userspace NFSv3 server and run it ala:

echo "$PWD (no_root_squash,insecure)" > blah.cfg
./unfsd -d -s -p -e $PWD/blah.cfg -l -m 9999 -n 9999

Then in theory, inside the kernel I do:

mkdir blah
sudo mount -t nfs -o ro,port=9999,mountport=9999,nolock,v3,udp \ blah
ls -l blah

(Where "/home/landley/test" is whatever the $PWD above was.)

Of course, it doesn't work even with the first mount problem fixed, so I'm adding printks to the kernel to find out why...

February 20, 2016

Paul McKenney pinged me about the Open Hardware Microconference I'm trying to put together at LPC in November, and asked who else I'd invited.

Nobody so far, because I didn't know if it would make, but it's easy to find ideas.

We _need_ a RISC-V person to have any sort of reasonable coverage, and it would be nice to ask the people who tried to clone arm why they stopped at armv2 (answer is almost certainly intellectual property law, but even the Thumb instruction set was announced in 1995...)

Anyway, I'm off to shake the tree and see who's interested. (I know the Qualcomm Hexagon processor architect and Linux kernel maintainer socially, at least a little, but dunno if their employer approves of open hardware or how interested they'd be...)

February 19, 2016

Got a tweet out of the blue from an old friend (ex-friend?) whose sudden religious conversion a couple years ago I never understood. According to her twitter stream she apparently just had surgery to remove her Thetans or ritual circumcision or something, and has now officially been born again as the person she always really was, and is sending out "you didn't believe in me, look at me now" notices? (I think?)

My attempts to understand this new religion when it first happened deeply hurt my ex-friend's feelings (I was supposed to just smile and nod), and since she was already systematically cutting off contact with everybody from her old life at the time and declaring us all evil anyway, I just shut up and let that happen. I was under the impression afterwards she was pretending I never existed, but now random contact? Confusing.

The tweet didn't show up in my stream, but when I saw Fade's reply (it was aimed at both of us. I wonder if I used the "mute" feature a year or more ago, or if twitter's being weird?) I read back a bit in her twitter stream, and her new religion seems to be working out for her, so yay? I'm glad she's happy?

The odd part is I saw her father Dale yesterday, because Nick knows him and visited him at work twice during the apartment hunt. In fact Dale recommended the real estate office (down the hall from his day job) where Nick found the new apartment.

My ex-friend's father is one of the people she cut off contact with, and when he asked me how she was doing I honestly said I had no idea. A mutual friend of Nick's and Dale's died recently, someone they said my ex-friend knew growing up (and the reason Nick's move to Austin got aborted, so I had to return the cat). So Dale emailed his kid for the first time in forever, and apparently the response was, in its entirety, "Do not ever contact me again." That's about the point I stopped reading my ex-friend's twitter stream: when she started talking about "emotional blackmail" from relatives trying to "drag her back in" I decided I was caught up enough.

Since Dale complied and this was before my most recent trip to Arkansas, presumably this tweet is just weird timing? I admit hearing the story of the "six word reply" as Nick put it contributed to my decision not to renew contact. I haven't got the social skills to navigate minefields. It was made clear to me way back when that I was not helping matters, so I stopped. I only tried to understand in the first place because this was a friend of many years, otherwise it's none of my business. If her new religion is making her happy, have fun with it. I don't understand the guy who made TempleOS either, but I respect the amount of work that went into it. They can go off and do their thing, and leave me out of it.

February 18, 2016

Still in Arkansas, but Nick has an apartment! (I'd link to Nick's coverage of the week but that's over on Faceboot and I've never had an account there. I keep meaning to introduce Garrett and Nick, but it always boils down to them both being on Facepalm and me not being. I have twitter and a phone, I didn't have an AOL account the first time around either.)

The Android guys just sent me a mount fix that I think is the same issue the Cyanogenmod guys put in their tree a while back. Since going through the cyanogenmod tree is still on my todo list, and they never actually submitted their fixes to me (or informed me of their tree's existence, I found it on a google search for something else)... yeah.

I already fixed one of their issues (install.c no longer #includes toys.h so it doesn't use LSB headers that aren't available when cross-compiling from MacOSX) in a way they'll probably never notice. They applied a patch to toys.h to #ifdef out the LSB headers, I fixed it a different way, no idea if they'll ever notice or if they'll carry a presumably now-unnecessary patch forever. *shrug*.

The downside of the fixes from the android guys is they don't regression test on uClibc or older distros like ubuntu 12.04. Accumulating build breaks I need to clean out...

Today's example, Elliott's insmod patch hooking up "insmod -" is very nice, except that finit_module isn't available on kernels before 3.8. Ubuntu 12.04 uses 3.2 plus 98 security patches. So that's a build break on my netbook.

February 17, 2016

Still in Arkansas.

Finally got the darn perl build fixed. Here's a quote of the commit message:

The perl build's attempt to escape spaces and such in LD_LIBRARY_PATH is _SA It uses a sed expression that assumes you can escape - to use it as a literal (you can't, it has to be first or last char of the range), and assumes you have to escape delimiters in sed [] context (you don't), and/or that non-printf escapes become the literal character (they don't, the backslash is preserved as a literal), meaning it winds up doing "s/[\-\]//" which is a length 1 range, which is officially undefined behavior according to posix, and regcomp errors out.

But if we don't accept it (like other implementations do) the perl build breaks. So collapse [A-A] into just [A].

Testcase taken from perl 5.22.0 file Makefile.SH line 8.


On the bright side, this unblocks the Linux From Scratch build, I think? It would be nice to ship with that again next release. If I ever get the darn toolchain building 4.4...

February 16, 2016

Still in Arkansas

Nope, I plead the fifth, although the ability to look at text files as text files remains kinda important. (As long as they still accept text files and their markup doesn't stop them from working as textfiles, life is good. But if I ever have a patch bounced because my change to Documentation/blah.txt screws up magic markup, I will not be resubmitting it.)

Elliott contributed a nice file.c (not posix compliant but posix can go hang on this one, its spec for file is uselessish; they removed cpio from the command list but not from file?). I'm doing a cleanup pass on it, and there is SO much bikeshedding on the list. (I declined bikeshedding on the kernel list, it followed me home.)

The main argument about file.c isn't about posix, but about whether or not it should match what the de-facto standard implementation does. The problem is, the de-facto standard is both inconsistent and crazy. (It's not a gnu program, so it could be way way worse, but still.)

Posix has a file standard which suffers from Posix failure mode #1: it doesn't include enough detail to be useful. (Posix failure mode #2 is not noticing the 1970's are over yet. They still maintain the standard itself in SCCS. No really. To this day. I don't know why.) So complying with posix here is reasonably easy and completely irrelevant, although Elliott's first version largely didn't bother.

I was thinking of producing mime type output, which at least has a standard, but most of the argument so far has been about describing ELF binaries, and if you do that it just says "application/x-executable" for EVERYTHING, which brings us back to the posix failure mode.

February 15, 2016

Upgrading Aboriginal Linux to the 4.4 kernel and the mips build is breaking, because they enabled VDSO, which requires linker features added in binutils 2.23. So they have a test, scripts/ which dies with busybox awk saying the regex has an unbalanced ")" (apparently that awk is extended regular expressions and it expects basic, or something?) Turning that into [?] the build then fails with a linker error.

Digging deeper, here's why:

$ ld --version scripts/
GNU ld (GNU Binutils for Ubuntu) 2.22
$ ld --version | scripts/
$ mips-ld --version
GNU ld (GNU Binutils)
$ mips-ld --version | scripts/

Obviously, the ten digit number is larger than the eight digit number, so it assumes binutils 2.17 from July 2007 must be new enough. (That's using ubuntu 12.04's host awk there, can't blame busybox for that.)
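The failure mode is flattening a dotted version into one big integer, so a version with extra fields (a snapshot date tacked on the end, say) produces more digits and wins. A minimal sketch of comparing field by field instead; vercmp() is a made-up name, not something in the kernel scripts:

```c
#include <stdlib.h>

// Compare two dotted version strings one numeric field at a time.
// Returns <0, 0 or >0 like strcmp(). Missing fields count as 0, so "2.23"
// equals "2.23.0", and extra trailing fields can't outrank an earlier one.
int vercmp(const char *a, const char *b)
{
  for (;;) {
    char *ea, *eb;
    unsigned long x = strtoul(a, &ea, 10), y = strtoul(b, &eb, 10);

    if (x != y) return x < y ? -1 : 1;
    if (*ea != '.' && *eb != '.') return 0;   // both ran out of fields

    // Step over the separator (a string that already ended stays put)
    a = ea + (*ea == '.');
    b = eb + (*eb == '.');
  }
}
```

With this, vercmp("2.17", "2.23") is negative no matter how many trailing fields the 2.17 string grows, which is the answer the VDSO test actually wanted.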

February 14, 2016

Quiet day in a hotel with sick kitten and netbook.

The toybox test suite is in horrible shape, and my poking at various loose ends snagged some ratholes. Lots of commands don't have any tests, but the recent makefile update to list available commands makes that easier. (I tweaked it some more today, now there's "make list_working" to show working commands, "make list_pending" to show pending commands, and "make list" to show both sorted together.)

Most (but not all) "testing" macros start with the command name in the description field. I thought I'd remove the repetition and have the infrastructure do that, but it turns out not ALL of them do, so I made a giant evil sed invocation (for i in tests/*.test; do X="$(echo $i | sed 's@tests/\(.*\)[.]test@\1@')"; Y="$(grep "testing[ \t]" $i | sed "/testing [\"']$X/d" | grep -v 'testing "name"' | grep -v "testing ['\"]['\"]")"; [ ! -z "$Y" ] && echo $X && echo "$Y"; done) and have been slowly regularizing them so that I can use a variant of that to _remove_ them all at once.

This means I'm looking in various tests/command.test files, and finding some things we aren't testing well, such as the fact that according to posix, "rm -r dir" should handle infinite directory depth. Currently toybox rm is limited by open filehandles (1024 or 4096 depending on ulimit), which is longer than PATH_MAX but not infinite. I have a design worked out to make that work properly (close parentfd for non-symlink traversals, open ".." to reclaim it and if stat doesn't match drill down from last saved fd (root or last symlink) and if the saved stat information diverges from the current stat info, error out because somebody did a "mv" while we're traversing the tree and that's cheating).

Alas, I haven't _implemented_ this yet. But I now have a test creating an 8192 entry directory chain and trying to rm -r it. (Fun thing: mkdir -p on the host fails with an argument greater than PATH_MAX, and thus "scripts/ rm" fails to create the dir because single command tests use host commands for the rest of the script, but if you make a 2048 byte string of 1024 directory entries and loop doing a mkdir, cd, mkdir, you can drill down past the limit. In fact if you mkdir the paths in parallel and do "cd longpath; mv ~/nextbit .; cd longpath;" to assemble the chain, the OS would have to traverse the contents of each directory to _enforce_ path length limits. And even then you can mount a filesystem that's _already_ got one too long, which is why rm has to just cope with it.)

February 13, 2016

Spent the day driving to Nick's parents' house to return Moose, the sick kitten we've been taking care of since December. (Moose has a "liver shunt" which means he's on a special diet and requires medication before each feeding, which does not combine well with a house full of free-fed cats and Adverb.)

Nick's plans to move to Austin fell through, so I'm back here to deliver cat and help with apartment hunting in Little Rock. But hey, driver's license again...

February 12, 2016

Geoff Salmon hugely improved the Numato j-core bitstream, enabling d-cache and i-cache (there was room in an lx9!), switching over to the new ddr controller, and upping the clock speed from 32mhz to 50mhz. The result is _noticeably_ snappier, and I really really really need to get the VHDL repository conversion finished and posted.

But today, I'm fighting with the perl build in LFS 7.8. The package sequencing LFS gives is nuts, if you try to build chapter 5 _or_ chapter 6 starting from Aboriginal Linux's 7 package build, you hit fun things like coreutils needing automake which needs perl which needs who knows what.

The problem with LFS is chapter 5 has too many packages (because glibc is a giant pig and needs perl to build). Building chapter 5 assumes the moon, and building chapter 6 assumes all of chapter 5 (including perl).

So I'm back to coming up with my own sequence, where we reverse the umbilical from the lunar module to get the extra 4 amps necessary to build perl.

The perl build is weird. We explicitly tell it _not_ to use its own built-in forked copies of zlib and libbzip2, for reasons I'm not entirely clear on. (Making them prerequisites to perl because they're... trying to optimize... perl? Really? It's _perl_. Not even the one that knows swordfighting, one of the useless homeworld protocol droid ones C3P0 evolved into since the "long time ago" bit.)

Where was I?

Right: musl had a bug (#ifndef __cplusplus; then clearly we're C11) which was easy to work around. Then toybox sed had a bug that was hard to track down, but I eventually got it to "s/[A-A]//" is an error if a length 1 range [A-A] ever makes it to regcomp. (Why? Because posix says that's undefined behavior. Why is it undefined when it's clearly a synonym for [A]? I have no idea.)

The path to figuring this out was long and elaborate because the regex was trying to escape spaces and such in $PWD, and they were doing so via backquotes so THING=`sed stuff` was setting the variable to an empty string, which became "LD_LIBRARY_PATH= ./perl" and with glibc that found in the current directory (because posix says empty path segments are synonymous with . thus PATH="/blah::/thing" means check the current directory between /blah and /thing, and musl didn't implement that because it's a bad idea security-wise.)

But digging further, LD_LIBRARY_PATH _should_ have been /home/perl, the sed expression was failing when used with toybox sed, and the failure was silently ignored due to backquotes not propagating errors back to the caller. (You _just_ get the output to stdout, output to stderr goes to stderr where you don't see it amongst the rest of the perl config messages.)

The problem is the regex was written by somebody who thinks "s/[_\-\/]//" makes _sense_. Instead it has 2 mistakes: 1) you can't escape - in [], if you want to match a literal - put it at the beginning or end of the list, 2) square bracket context trumps s/// context, the delimiter ending the range doesn't count in square brackets so you don't need to escape it. So regex was seeing \-\ which is the length 1 range I referred to earlier (and yes that means it _won't_ match - but the range is actually inverted so extra things get escaped and the shell doesn't care), and regcomp threw an error, meaning sed exited with an error message. Apparently the gnu/dammit sed is collapsing that range to "s/[_[\]/]//" before calling regcomp, so I should too.
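A rough sketch of the collapse step, assuming the regex has already been separated from the s/// wrapper: walk the string, and inside each [] turn any "X-X" into plain "X" before regcomp() sees it. The function name is invented here and this is a simplification rather than the code that went into toybox sed; in particular it doesn't understand [:class:], [.x.] or [=x=] inside the brackets.

```c
#include <string.h>

// Rewrite every length 1 range "X-X" inside a bracket expression to "X",
// in place. Simplified: no [:class:] / [.x.] / [=x=] handling. Backslash
// is an ordinary character inside [] (per posix), an escape outside it.
void collapse_ranges(char *re)
{
  char *in = re, *out = re;

  while (*in) {
    if (*in == '\\' && in[1]) {         // escaped char outside brackets
      *out++ = *in++;
      *out++ = *in++;
      continue;
    }
    if (*in != '[') {
      *out++ = *in++;
      continue;
    }

    *out++ = *in++;                     // copy the opening [
    if (*in == '^') *out++ = *in++;     // negated list
    if (*in == ']') *out++ = *in++;     // leading ] is a literal
    while (*in && *in != ']') {
      if (in[1] == '-' && in[2] == *in) {
        *out++ = *in;                   // X-X becomes X
        in += 3;
      } else *out++ = *in++;
    }
    if (*in) *out++ = *in++;            // copy the closing ]
  }
  *out = 0;
}
```

Fed the perl build's [_\-\/] this produces [_\/], which regcomp accepts: underscore, backslash, and slash, with no range left to object to.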

Gotta drilldown through the incidental breakage to find the actual problem needing to be fixed. That's perl for you.

February 10, 2016

Elliott pointed out that the updated grep doesn't work on Ubuntu 14.04 because sometime after Ubuntu 12.04, glibc acquired a bug where printf("%.*s", INT_MAX, string) only prints one character. Since I tested on Ubuntu 12.04 (which doesn't have the bug) and musl (which doesn't have the bug), and he tested on bionic (which doesn't have the bug), we didn't immediately notice.

This is a category of bug where I suppose I should stick in a workaround, but remember to yank it OUT again once I no longer care about glibc being broken (by whatever path that occurs).
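One possible shape for such a workaround (not necessarily what ends up in toybox) is to never hand printf a precision bigger than the string actually is. This sketch assumes the string is NUL terminated, which isn't always true of things you print with %.*s, and print_limited() is a name invented for the example:

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

// Format at most max bytes of s into buf, clamping the precision to the
// real string length so libc never sees INT_MAX. A negative max means
// "no limit". Returns what snprintf() returns.
int print_limited(char *buf, size_t size, const char *s, int max)
{
  size_t len = strlen(s);

  if (max < 0 || (size_t)max > len) max = len > INT_MAX ? INT_MAX : (int)len;

  return snprintf(buf, size, "%.*s", max, s);
}
```

It costs an extra strlen() per call, which is part of why it's the kind of thing to rip back out once the broken libc versions age out.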

Meanwhile, toybox now warns if it's building code out of pending. I should

February 9, 2016

Implemented ulimit, which was so much fun. The bash version SUCKS.

No really: bash's ulimit documented -b but didn't implement it, had a -x based on an RLIMIT_LOCKS feature the Linux kernel removed in 2003 (so it hasn't worked in 13 years), used 1024 byte units for -f when posix explicitly said 512, and then it used 512 byte units for -p which was displaying a hardwired value that changed in 2010 (linux commit 35f3d14dbbc5) so it's been wrong for over 5 years. Linux grew a very nice RLIMIT_RTTIME feature back in 2008 (linux commit 8808117ca571) that ulimit never bothered to hook up (I made it -R, it limits realtime tasks to X microseconds before they have to block or get a kill signal)...

And of course Linux grew a "prlimit" syscall ages ago (2.6.36 in 2010) but bash's ulimit doesn't use it. So I added a -P to specify the pid to ulimit on, and made it default to getppid().

February 6, 2016

Kitten-related drama last night, almost drove to Arkansas because of it. Probably doing so this coming weekend instead.

I finally bothered to look up the makefile syntax to include a file but not mind if it's not there (it's "-include filename"), and made a generated/ file with targets for each command name ala "make ls" (each of which just calls "scripts/ ls"), plus a "make list" to show them all, and clean:: target adding them to "make clean". (It builds them in the top directory, but that's where toybox goes, so... The "make change" target puts them in a subdir already.)

I had to filter out the "help" and "install" commands, which already had other meanings, but my real problem is that the generated directory is blown away by "make clean", so if you do a "make defconfig; make clean; make ls" it doesn't know make ls.

Right now the .config file is at the top level directory because kconfig is an existing de-facto standard I'm borrowing (linux, busybox, uclibc, buildroot...) and there are like five variants of it that can be crapped into the top directory by that stuff (why does it make a .config.old? Your guess is as good as mine) but they're all hidden files. I could make this be another hidden file at the top level directory (ala .git) or I could teach make clean to leave one file in generated. Not sure which is uglier.

February 5, 2016

ELC call for presentations submission deadline is today, so here's the ideas I've got off the top of my head:

First I'm ruling out resubmissions of existing talks: the rise and fall of copyleft one, turtles all the way, toybox status update... I've done that. People haven't heard them, and there may not be good recordings of some, but I could do podcasts if I really wanted to. (The blog that nobody reads could grow a youtube channel nobody watches.)

(Heck, I could redo the prototype and the fan club, or repackage the three waves stuff as "Hobbyists, Employees, and Bureaucrats (oh my!)" updated to explain what Google's "Alphabet" project is actually for, why the kernel summit's "call for hobbyists" flopped, understanding why the Linux Foundation ending individual memberships was probably inevitable, maybe describe it as a synthesis of the mythical man-month and the innovator's dilemma, drawing heavily from the computer history book "accidental empires"... But convincing people that the talk is worth listening to without GIVING the talk is a chicken and egg problem, and I'm not up for it.)

So here's a couple talks I could reasonably summarize at the last minute:

Android on the Workstation - From Aboriginal Linux to AOSP, the hairball and selfhost sections, and the fact that independent iOS devs beat Android to self-hosting, and that recent android on the desktop article wasn't a Google initiative, just somebody trying it.

Portability in 2016 (I.E. "Ze autoconf, it does nothing!"). Hmmm, I should note that bell labs' original unix was a reaction against multics, and linux was a reaction against minix, and have a section on glib being terrible...

Beyond that I did three variants of jcore stuff, mostly in hopes of getting Jeff Dionne to attend. An Open Hardware BOF, a jcore processor design walkthrough (which I'd need Jeff or Geoff to do), and a redone Turtles talk with Rich going into reviving the toolchain and the kernel and nommu userspace and so on. (Still starting with how great it is that patents expire, but this time talking about what we've DONE, not what we plan to do. Using the build to emit specsheets, GHDL simulator, FPGA bitstreams, and ASIC masks all from the same BSD licensed VHDL source on github. SMP, DSP, EIEIO.)

I'd also love to have arranged a libc round table with Elliott Hughes (Bionic maintainer) and Rich Felker (musl-libc and linux superh maintainer), plus maybe the buildroot and openembedded maintainers. Alas, they moved the darn conference from San Jose (where Elliott lives) to San Diego (a 7 hour drive away), so no.

February 4, 2016

My local copy of netcat hasn't compiled in a while because it's halfway converted to use xconnect() out of lib/net.c. The reason I stopped halfway is this is a hard problem, because the ipv4 to ipv6 transition was handled TERRIBLY.

Why didn't they make the ipv6 stuff handle IPv4? If an ipv6 address with all but the bottom 32 bits zeroed was an ipv4 address, userspace could switch exclusively to the ipv6 api and not have to care. But no, 20 years into this "transition" every userspace program that wants to support ipv6 without _dropping_ support for ipv4 (I.E. the entire english speaking internet) still needs duplicate codepaths.

The network stack always used variants of an "addr" structure, which has a bunch of flags specifying what type it is, and then different structures depending on what those flags (in the first few fields, common between different structures) say.

Did I mention that the Berkeley guys, who gave us this network stack, also invented "vi"? Yeah. Wanna know why BSD hasn't taken over the world? There's a hint. (Not that AT&T's "streams" was any better. Neither group that inherited Bell Labs' work had a worthy successor until Linux, and I explicitly include the minix and gnu projects in that. The original Bell Labs Unix combined pragmatism with taste in a way that's hard to do, let alone sustain. They were reacting _against_ Multics in the same way Torvalds was reacting against Minix and Gnu.)

In terms of a server, we've got socket(), bind(), connect(), and listen(). For name resolution, getaddrinfo() replaces gethostbyname() and getnameinfo() replaces gethostbyaddr().
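The closest thing to a single codepath is letting getaddrinfo() hand back a list of addresses of either family and trying each in turn. Here's a minimal sketch of that pattern. The function names are invented for the example and this is not the xconnect() in lib/net.c:

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Connect to host:port over whichever of IPv4/IPv6 answers first.
// Returns a connected socket fd, or -1 if nothing worked.
int connect_any(const char *host, const char *port)
{
  struct addrinfo hints, *list, *ai;
  int fd = -1;

  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_UNSPEC;      // accept both families
  hints.ai_socktype = SOCK_STREAM;
  if (getaddrinfo(host, port, &hints, &list)) return -1;

  for (ai = list; ai; ai = ai->ai_next) {
    fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (fd == -1) continue;
    if (!connect(fd, ai->ai_addr, ai->ai_addrlen)) break;
    close(fd);
    fd = -1;
  }
  freeaddrinfo(list);

  return fd;
}

// Which family does a numeric address string parse as? Returns AF_INET,
// AF_INET6, or 0 if it isn't a numeric address at all. No DNS lookup.
int addr_family(const char *addr)
{
  struct addrinfo hints, *ai;
  int family;

  memset(&hints, 0, sizeof(hints));
  hints.ai_flags = AI_NUMERICHOST;
  if (getaddrinfo(addr, 0, &hints, &ai)) return 0;
  family = ai->ai_family;
  freeaddrinfo(ai);

  return family;
}
```

The caller never touches sockaddr_in or sockaddr_in6 directly here, but the moment you need to print, compare, or construct an address by hand the per-family structures come right back.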

February 1, 2016

Birthday lunch at Dead Lobster with Fade and Fuzzy, and then when I got home Fuzzy had a three layer parfait style birthday cake ready, with a layer of hard chocolate on top.

Putting together a Linux From Scratch 7.8 build control image to test the pending toybox release. There is sooo much subtle breakage in this thing. The groups file it says to create has a systemd user in it (for the non-systemd lfs). The glibc build says the test suite is critical and not to be skipped under any circumstances, but it's also expected to fail in multiple ways. The pkg-config source bundles its own copy of the glib source, but when autoconf detects that glib isn't installed on the host it DOESN'T USE IT, instead telling you you can add --oh-get-on-with-it to the configure command line to tell it to do the obvious thing it refuses to do on its own for no apparent reason.

I asked about that last one on twitter and was told A) openbsd wrote its own in perl (not an improvement), B) there's a fork that removes glib from the thing, but LFS doesn't use it.


There's an attr package which won't work with gettext-stub because it needs some command line thing (misericorde or some such) installed by gettext. So I built gettext, which took longer to build than glibc, and then it broke because of the __BEGIN_DECLS thing, so I shoved the appropriate #defines into musl's features.h (since my build script was already adding a #define __MUSL__ to that, yes I know the _right_ fix is to add a sys/cdefs.h) and rebuilding toolchain...

I suspect I can cut a toybox release at this point though. Nothing's broken on current toybox yet, and if something does I can throw a patch in the next aboriginal release when I check the fix in to toybox git.

January 31, 2016

Got email from Flourish (conference I spoke at in Chicago in 2010), which is scaling back up and wants me to speak there again. I told 'em sure if they can pay to fly me there. (Eh, maybe work would be willing to sponsor a trip if they can't?)

Also got email this morning from somebody asking me to finish the Aboriginal Linux "history" page. So many todo items...

I did hear back from the O'Reilly conference: all three of my talk proposals were turned down, although one they were interested in promoting through "other channels" (whatever that means). (A while back I got an email asking if I could do the jcore tutorial in a smaller timeslot, and I said yes, but they turned it down anyway. *shrug* Not my normal stomping grounds, only applied because they're in Austin this year. Not going if I have to pay my way in.)

Still no word on a possible next Japan trip.

January 30, 2016

Now that I'm temporarily freed from the tyranny of the todo list (toybox is in release freeze, just testing and bugfixing checked in at the moment), of course I have a giant surge of things I want to do to it.

It occurs to me that my struggles with the hash table code are because I'm trying to make it too generic. What I specifically need is an inode hash, so tar and gene2fs and mkisofs and rsync and such can detect hardlinks. (In theory I can make hardlinks in a fatfs too. If it's read only. Technically that's filesystem corruption in that context, though. :)

This means I don't have to handle the delete case, and am doing fixed size "long->pointer" associations. (I'm tempted to say "if you try to deal with a filesystem with more than 4 billion inodes on a 32 bit system, get a 64 bit system". As long as the failure mode is just not detecting hardlinks, I'm ok with this.)

So what I can do is have a 4k block of struct {long inode; void *data}; (on 64 bits that's 16 bytes, so ~256 entries per 4k block), do insertion sort into that, and then when one fills up split it into two child blocks of 128 entries each... except how to balance the sucker so it doesn't degrade to a linked list? Gets us back into tree territory, but what I want to do here is avoid having 100,000 individual tiny allocations in the tree, each with its own pair of child pointers. Batching data this small makes sense, but the existing tree descriptions aren't talking about how to balance trees of _ranges_. (Probably haven't googled the right keyword yet, this isn't a new problem. Shouldn't be that hard to work out from first principles either, but the question with each algorithm is "what are the pathological cases".)

My normal approach of doing something simple and waiting for the real world to overwhelm it doesn't apply here because tarring up a terabyte filesystem is a common-ish case these days. Except... the filesystem tells me the link count, so I only have to remember the ones with a hardlink count >1. So insertion sort in a realloc()ed array wouldn't actually be a huge deal. Hmmm...
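A minimal sketch of the simple version: one realloc()ed array kept sorted by inode number, binary search to find things, memmove() to open a gap on insert. All the names here are made up, and keying on the inode alone is an assumption that everything lives on one filesystem (across mount points you'd need the device number in the key too).

```c
#include <stdlib.h>
#include <string.h>

struct ino_entry { unsigned long long ino; void *data; };
struct ino_table { struct ino_entry *e; size_t count, size; };

// Binary search: index of ino if present, else where it should go.
static size_t ino_find(struct ino_table *t, unsigned long long ino)
{
  size_t lo = 0, hi = t->count;

  while (lo < hi) {
    size_t mid = lo + (hi-lo)/2;

    if (t->e[mid].ino < ino) lo = mid+1;
    else hi = mid;
  }

  return lo;
}

// Return the data stored for ino, or NULL if we haven't seen it.
void *ino_lookup(struct ino_table *t, unsigned long long ino)
{
  size_t i = ino_find(t, ino);

  return (i < t->count && t->e[i].ino == ino) ? t->e[i].data : 0;
}

// Remember data for ino (replacing any old value). No delete operation.
// Returns 0 on success, -1 if out of memory.
int ino_insert(struct ino_table *t, unsigned long long ino, void *data)
{
  size_t i = ino_find(t, ino);

  if (i < t->count && t->e[i].ino == ino) {
    t->e[i].data = data;
    return 0;
  }
  if (t->count == t->size) {
    size_t size = t->size ? t->size*2 : 64;
    struct ino_entry *e = realloc(t->e, size*sizeof(*e));

    if (!e) return -1;
    t->e = e;
    t->size = size;
  }
  // Shift the tail up one slot to keep the array sorted
  memmove(t->e+i+1, t->e+i, (t->count-i)*sizeof(*t->e));
  t->e[i].ino = ino;
  t->e[i].data = data;
  t->count++;

  return 0;
}
```

Usage would be: start from a zeroed struct ino_table, and for each file with st_nlink > 1 call ino_lookup() first, then ino_insert() if it came back NULL. Each insert is still a memmove of up to the whole array, so this only stays cheap because the link count filter keeps the table small.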

January 28, 2016

Tidying up pgrep/pkill for the release, implementing -o and -n there, and I noticed pgrep -f is only checking the command line that fit in toybuf (ballpark 2-3k after the slot table and other strings loaded), and /proc/$PID/cmdline can be essentially unlimited length now.
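Getting the whole command line means not using a fixed buffer at all. /proc files report a size of zero, so you can't stat and allocate; you read until EOF and grow as you go. A minimal sketch, with read_all() being a made-up helper rather than anything in toybox's lib:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

// Read an entire file of unknown length (such as /proc/PID/cmdline) into
// a malloc()ed, NUL terminated buffer. Stores the byte count in *len.
// Returns NULL on any error.
char *read_all(const char *path, size_t *len)
{
  size_t size = 4096, used = 0;
  char *buf = malloc(size), *bigger;
  int fd = open(path, O_RDONLY);
  ssize_t n = -1;

  // Always leave one byte free for the terminator
  while (buf && fd != -1 && (n = read(fd, buf+used, size-used-1)) > 0) {
    used += n;
    if (size-used < 2) {
      bigger = realloc(buf, size *= 2);
      if (!bigger) {
        n = -1;
        break;
      }
      buf = bigger;
    }
  }
  if (fd != -1) close(fd);
  if (!buf || n) {                  // n == 0 means we hit a clean EOF
    free(buf);
    return 0;
  }
  buf[used] = 0;
  *len = used;

  return buf;
}
```

The arguments in cmdline are separated by NUL bytes, so for a pgrep -f style match you'd walk the first *len bytes and turn the embedded NULs into spaces before handing the result to the regex.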

Hmmm, when we do an ANSI size probe, scankey doesn't return an indication that it received it. (It saves the new size, but doesn't return a notification that we need to adjust.) Ah, we should send sigwinch to ourselves, and install a handler for it.

Redoing the Linux From Scratch build. It's been long enough that this is basically "start over from scratch", but I already did it before (and did BLFS once for qualcomm, although I didn't get to keep that code because you couldn't open source anything you write on the clock there).

That probably _is_ holding up a toybox release because my checklist includes running the new toybox through the full LFS build, and that hasn't worked since the switch to musl, so...

January 27, 2016

Got an email from the guy who owns the domain asking for half his previous price and saying he's contacted all the other people who showed interest earlier and doing an act now supplies running out thing. Um... no? (It would have been nice to have, but he said no and I went on to other things.)

Finally got top to a point I'm happy to include it in a release. The headers are outputting the process and CPU and memory info. It's still using more cpu time than the other top (2% vs 1%, about 50 milliseconds per redraw on this 5 year old netbook that was ultra-low-end when I got it). I suspect a lot of that is glibc's utf8 fontmetrics code being APPALLINGLY slow, possibly I should hardwire the utf8->wide char stuff into a lib function. (Meh, shouldn't suck in musl or bionic, I think? Haven't benched it there yet.)

Darn it, benched musl. It's 3% there.

Right, it's PROBABLY the sort stuff that does string compares of otherwise numeric fields in the fallback sort. I could annotate those better to indicate which ones actually need it. (It could also be that we load all the data and then do generic processing of it, so it could be thrashing more data through the CPU cache.)

Eh, not holding up the release for this though.

January 26, 2016

Ah, the kitten _did_ break something when he pushed my netbook off the counter onto tile. The case for the nice new battery is cracked and a piece broke off on one end, revealing a flimsy looking electrical contact and the end of what looks like a fairly standard A battery. (Well they said it was a 9 cell...)

This is what duct tape is for. (Although Fade suggested electrical tape instead, and that seems to work quite well.)

Actually trying to implement the header stuff was painful enough I ripped the parsing out and just hardwired the fields in question. I had to read a new file to get the memory info anyway, and the "number of different values" vs "sum of values" vs "count of entries" vs "highest" vs "lowest" vs "number with this specific value"... it was implementing custom code for the majority of outputs anyway.

I did make the iotop header output the summable fields in the order they appear in -o, so if you feed it a different -k that should adapt.

January 24, 2016

The last bit of "top" I'd like to get in before a release is the header stuff that shows total memory and running processes and all that. In theory, %PID style escapes can use all the existing ps -o FIELDs, but that's per-process data which needs to be collated, and there's more than one way to collate it, plus sometimes it needs to be filtered, or even outright replaced.

Let's look at each header line in ubuntu's top:

top - 13:52:01 up 2 days, 23:15, 1 user, load average: 0.23, 0.42, 0.76

Uptime isn't an existing per-process field, it would be a new escape to show a global field.

The user count could be %USER counting number of unique entries. The best way I currently have to calculate unique entries is to sort by this field and then iterate counting transitions, so this is re-sorting the proc data list every escape, although I'm told qsort() got tweaked so feeding it an already-sorted array is no longer its worst-case pathological n^2 behavior case. It feels expensive, but the common case is what, 300 processes? Hmmm...
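
A minimal sketch of the sort-then-count-transitions approach described above, assuming the field has already been pulled out into an array of strings. The names count_unique() and cmp_str() are hypothetical, not toybox's actual code:

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical comparison callback for an array of C strings.
static int cmp_str(const void *a, const void *b)
{
  return strcmp(*(char *const *)a, *(char *const *)b);
}

// Sort the array in place, then count how many times the value changes.
// Returns the number of distinct strings (0 for an empty list).
int count_unique(char **list, int len)
{
  int i, count = 0;

  qsort(list, len, sizeof(char *), cmp_str);
  for (i = 0; i < len; i++)
    if (!i || strcmp(list[i-1], list[i])) count++;

  return count;
}
```

For a field known to be unique already (the PID case), the caller could skip this and just use the entry count.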

At some point I need to add something hash-like to at least track seen entries for hardlinks, and doing an insertion sort into an array is N^2. I'm tempted to do a self-balancing tree, but trees have data and two child node pointers so you're tripling the size of the data storage even without each one being a separate allocation...

(Even though I wrote the red/black tree documentation in the kernel Documentation directory, I'd still have to look up or work out how to actually implement one. The Linux Foundation guys asked me to write a red-black tree explainer for linux/Documentation when I applied for the documentation fellowship back in 2007, I complained that there was already a perfectly good article on it, and a Wikipedia[citation needed] entry, but they said Documentation should have one so I read those two and wrote one, citing both of them. But both handwaved how to do the actual node balancing decision, so I referred to them. The big problem with that entire position is they didn't have a clue what they wanted me to actually _do_, stating a problem is not the same as coming up with a plan to solve it.)

Really my problem is I don't want to go off on ANOTHER tangent implementing infrastructure right now. And for the hardlinks problem I can do insertion sort into an array for now, which is efficient up through about 4k block size, then maybe do something that splits nodes and makes a tree out of them later to scale past that point.
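
As a rough sketch of the "insertion sort into an array" stopgap for tracking seen hardlinks: binary search for the slot, then memmove() the tail up to make room. The struct layout, the seen_before() name, and the grow-by-64 policy are all assumptions made for illustration:

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical record identifying a file: device plus inode number.
struct seen_file {
  unsigned long long dev, ino;
};

// Hypothetical growable array of seen files, kept sorted.
struct seen_list {
  struct seen_file *entries;
  int count, allocated;
};

static int seen_cmp(struct seen_file *a, struct seen_file *b)
{
  if (a->dev != b->dev) return a->dev < b->dev ? -1 : 1;
  if (a->ino != b->ino) return a->ino < b->ino ? -1 : 1;
  return 0;
}

// Returns 1 if (dev, ino) was already in the list, 0 if newly added,
// -1 on allocation failure.
int seen_before(struct seen_list *list, unsigned long long dev,
  unsigned long long ino)
{
  struct seen_file key = {dev, ino};
  int lo = 0, hi = list->count;

  // Binary search for the first entry >= key.
  while (lo < hi) {
    int mid = lo + (hi-lo)/2;

    if (seen_cmp(list->entries+mid, &key) < 0) lo = mid+1;
    else hi = mid;
  }
  if (lo < list->count && !seen_cmp(list->entries+lo, &key)) return 1;

  // Grow in chunks when full.
  if (list->count == list->allocated) {
    struct seen_file *bigger = realloc(list->entries,
      (list->allocated+64)*sizeof(struct seen_file));

    if (!bigger) return -1;
    list->entries = bigger;
    list->allocated += 64;
  }

  // Shift the tail up one slot (this is the N^2 part) and insert.
  memmove(list->entries+lo+1, list->entries+lo,
    (list->count-lo)*sizeof(struct seen_file));
  list->entries[lo] = key;
  list->count++;

  return 0;
}
```

The lookup is cheap; the memmove() is what stops scaling once the array gets large, which is where splitting into nodes would come in.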

As for the rest of the line, I honestly don't care about load average (it's trivial to fetch out of /proc/loadavg but I've personally never made any sense of those numbers, wait for somebody to complain). I note that _all_ busybox shows of this line is loadavg.

Tasks: 288 total, 2 running, 285 sleeping, 0 stopped, 1 zombie

%PID could also use number of unique entries for the field, so same code as %USER, but in the PID case we know they're all unique so that's just an expensive way of counting total number of entries. So... special case would save CPU time at the expense of codesize, but it's a small one, PID is the first field and all we have to do is skip the sort, so if (which) qsort().

Except... if we do "-u root" filtering, do we show the number of MATCHING processes or the number of TOTAL processes? And while we're at it, we don't have to count %PID at all because TT.kcount has it, although that's still matching not total.

Showing "running processes" is %S=R, or really %S=R= so the parser can tell where the match ends. (That's a string compare, but eh.) Sleeping is %S=S=, stopped is %S=T=, Zombie is %S=Z=... easy enough.

Cpu(s): 38.3%us, 7.4%sy, 1.5%ni, 50.9%id, 1.7%wa, 0.0%hi, 0.2%si, 0.0%st

Three problems here. The first is "the htop problem", ala how many CPUs worth of data are we showing? If we aggregate processes we also aggregate CPUs, although "400% user" on a quad processor system is 100% usage of all 4 processors so do we divide by processors or not? Maybe Cpu(4): 400.0%us? Except a 32-way machine expands a length 5 field into a length 6 field if we do that...

Second: I'm not currently tracking/displaying these individually, just cumulatively. So I have to add more typos[] fields and possibly more slot[] entries.

Third: the easy way to program this is to iterate over the list for each field. The efficient way to program this with regards to cache fetch behavior is to create an array of display fields and populate them all in one pass. I can't do that for SORTED fields (where I care about unique entries), but when adding up totals it makes sense. Then again, "when in doubt use brute force" says to do the simple to program version and wait for somebody to complain. (Either way counting is way faster than sorting, and doing an optimization only for the already-faster cases seems sillyish?)
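
To illustrate the "populate them all in one pass" variant, here is a sketch that fills in every counter for the Tasks: line while walking the process states once. The struct and function names are hypothetical, and the state letters are just the ones from the %S=R= discussion above:

```c
// Hypothetical totals for the "Tasks:" header line.
struct task_totals {
  int total, running, sleeping, stopped, zombie;
};

// Walk a list of per-process state characters (the ps "S" field) once,
// updating every counter in the same pass instead of one pass per field.
void tally_states(const char *states, int len, struct task_totals *tt)
{
  int i;

  tt->total = tt->running = tt->sleeping = tt->stopped = tt->zombie = 0;
  for (i = 0; i < len; i++) {
    tt->total++;
    switch (states[i]) {
      case 'R': tt->running++; break;
      case 'S': tt->sleeping++; break;
      case 'T': tt->stopped++; break;
      case 'Z': tt->zombie++; break;
      default: break; // other states only count toward the total
    }
  }
}
```

The brute force version would instead call a generic "count entries matching X" function four times, touching the whole list each time.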

Mem: 7899148k total, 7770796k used, 128352k free, 281016k buffers
Swap: 3909628k total, 286084k used, 3623544k free, 2338096k cached

The last two lines are basically fetched out of /proc/meminfo. As with loadavg, this really doesn't have anything to do with existing ps info, it's not per-process it's global (per-container).
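
A sketch of fishing one value out of /proc/meminfo text, assuming the file has already been read into a NUL-terminated buffer. meminfo_field() is a hypothetical helper; the "Key:   value kB" line layout is what the kernel emits:

```c
#include <stdio.h>
#include <string.h>

// Find "Key:" at the start of a line in meminfo-style text and return its
// value (in kB, as the kernel reports it), or -1 if the key is missing.
long long meminfo_field(const char *text, const char *key)
{
  size_t len = strlen(key);
  long long val;

  while (text && *text) {
    // Require the colon so "Mem" can't match "MemTotal".
    if (!strncmp(text, key, len) && text[len] == ':'
      && sscanf(text+len+1, "%lld", &val) == 1) return val;

    // Advance to the start of the next line, if any.
    text = strchr(text, '\n');
    if (text) text++;
  }

  return -1;
}
```

The Mem: line would then come from keys like MemTotal, MemFree and Buffers, and the Swap: line from SwapTotal, SwapFree and Cached, with "used" computed as total minus free.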

These are padded to match up, so I genericized the printf format scanning stuff out of seq -f and did "%9.10KMEM" that can turn into %9.10d, which handles the padding but not the fields. I have human_readable(), but this is forced to kilobyte output (ala iotop -k but per-field), which raises the fun case of terabyte memory systems needing more columns so if this was hardwired output I'd just measure the "total" and use that length for the rest of the fields on the line, but that's not the approach I've been taking here...
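
For comparison, the hardwired "measure the total and use that length for the rest" approach could look something like this sketch (not the %9.10KMEM escape route; format_kb() is a hypothetical name):

```c
#include <stdio.h>

// Format val as kilobytes, right-justified to the number of digits in
// total, so every field on the same header line comes out the same width.
int format_kb(char *buf, size_t len, long long total, long long val)
{
  // snprintf() with a zero-length buffer just reports the length needed.
  int width = snprintf(0, 0, "%lld", total);

  return snprintf(buf, len, "%*lldk", width, val);
}
```

A terabyte system just produces a longer total, and the other fields on the line widen to match without any column count being hardwired.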

(Other fun thing: how to describe this mess in the help text! Pretty much provide the default header format as an example at the end?)

January 23, 2016

I've mentioned that Linux on the Desktop ain't happening, and included the material in several talks. The most recent update on this front is that something like a decade into attempting to upgrade our infrastructure, an attempt to redo x11 to use 3D graphics is as likely to work with a random application as linux is to install on a random laptop purchased without researching it first.

(Did I mention that Stu gave back the new netbook I got last year (the one I stuck the SSD in) because he couldn't get linux to work with it either? It installed; touchpad won't work and it doesn't suspend right, don't remember what else was wrong with it.)

Speaking of conference talks, I haven't heard back from the O'Reilly folks which I assume means all four of my talk proposals were turned down. (Meh, never spoke there before. Only submitted because they're in Austin this year. Given I've never been to SxSW and didn't visit the recent Worldcon an hour away in San Antonio...)

The ELC deadline was pushed back to February 5th, I should submit something to them soonish. Not sure what. And there's a call for tracks at plumber's that I waved at Jeff about doing an Open Hardware thing and he thought that was a good idea. Not sure who should run it though.

January 22, 2016

I looked at adding const to toy_list[] to force it into the read-only data segment (nice on nommu systems with no copy on write), but this gave a screen or so of "expected pointer and got exact same pointer with const on it!" warnings, so I reverted it. I'm willing to put const in two places to change a data type, I refuse to clean up after it with dozens of (void *) typecasts to whap the compiler on the nose until it stops begging at the table. (Hint: string constants are in the read only data segment. If you try to modify one it segfaults. That's the behavior I want. Static checkers generate false positives, annotating every single false positive with "no it's ok" is busy work.)

And yes, typecasting to (void *) is the correct way to signal "the compiler's pointer type-checking is wrong here, and I am explicitly disabling it". Typecasting to some _other_ type implies it actually means something.

In theory there's a less-portable (attribute)(_thingy_) that could force it into the read-only segment without the type pollution. Seems silly, though.

Digging through a zillion bug reports on the toybox and aboriginal mailing lists. Elliott says running mount as a normal user (without the suid bit set) causes a segfault in android's build, because error_msg() tries to print command name and gets a null pointer dereference. Except when I try "./toybox mount" it exits immediately because toy_exec() is returning to force a re-exec to reacquire suid after an earlier command dropped it. But that's not what's happening here, the "earlier command" is STAYROOT but never got root to begin with, but since it's not the first command this check comes before the toy_init() check due to reentering through toy_exec(). Hmmm...

I want debug output for a misconfiguration, at least when CFG_TOYBOX_DEBUG is enabled. I think to get that I have to record when the process had root permissions but dropped them; that seems to be the only way we know when to get them back.

Well, I _could_ stat /proc/self/exe and see if it has the suid bit. Not that /proc is always mounted in a chroot. I was tempted to create a function that just returns a mode so you don't have to declare a stat struct each time, but given that there's stat, lstat, fstat, fstatat(), and xstat()... not worth creating that many functions. Plus you test if (!stat) but 0 is a valid mode so the test would have to be if (-1 != getmode("file")) and... eh, not quite worth it?
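
For reference, the helper I talked myself out of would look roughly like this sketch. getmode() is hypothetical and only covers the plain stat() case, which is part of the problem:

```c
#include <sys/stat.h>

// Hypothetical helper: return the st_mode of a path, or -1 if stat() fails.
// Returning long rather than mode_t keeps -1 distinct from any valid mode.
long getmode(const char *path)
{
  struct stat st;

  return stat(path, &st) ? -1 : (long)st.st_mode;
}
```

The suid check would then be something like "mode = getmode("/proc/self/exe"); if (mode != -1 && (mode & S_ISUID)) ...", which isn't much shorter than declaring the struct stat.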

Hmmm, draw_str() needs to know the current x position to pad tabs out right. Well, it doesn't when it's escaping them. :)

But the more and vi logic needs to know the current position to handle tabs.

January 21, 2016

And lo, I hath waged battle with the DMV and emerged victorious! (Ok, spent 10 minutes with a seriously cute driving instructor proving that I can still parallel park. No euphemisms were harmed in the making of that sentence.) My new driver's license photo is immensely silly, at least the black and white version they gave me on a piece of paper. (New card mailing in 2 weeks.)

Did a big cleanup pass on ps/top/iotop/pgrep/pkill removing the last batch of pervasive magic constants. (I introduced an enum. I am not fond of enums, but sometimes you just gotta enum. Reinventing it with a stripped down version of TAGGED_ARRAY was the coward's way out, it was alas the proper tool for the job.)

This is the largest chunk of the work necessary to factor the shared ps logic out into lib/ but not quite all of it. For example, TT.bits is a bitfield of seen types set in parse_ko, and get_ps() uses that to figure out which files it needs to open. (Always /proc/$PID/stat, the rest are negotiable.) Then again, if that TAGGED_ARRAY moves to lib as well... I need to figure out how the translation list should work (ps_main() has a bunch of "-o NICE" is actually "-o NI", possibly those should just become full-fledged typos[] entries).

And I still have to figure out what to do about the headers. I want to let top and iotop use the same -o and -k as ps to define what output fields to show, but making the headers similarly configurable is nonobvious. I could use some kind of %FIELD escape except the header fields currently being shown are totals for all processes... Hmmm, I suppose the header display logic could traverse the array and add up the appropriate data. Well, for the numeric fields anyway. For the string fields it gets less obvious; something like "user" becomes number of different entries. I may need a second layer of annotation on the slot indexes.

Right now it does | 64 to indicate string fields to ksort(). Are there any such types where the correct response ISN'T count how many different types we see? Hmmm... doesn't look like it. There's a few (like "F" and "S") where sticking them in the header doesn't make any sense.

As for adding up the numeric ones... for memory usage add it up, for STIME you want oldest, CPU and PGID and UID and so on are number of different entries (like string), somebody probably wants average...
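
One way to picture the "second layer of annotation" for numeric fields is a per-field collation mode. This is only a sketch of the idea with made-up names (enum collate, collate_field()), not how the slot indexes are actually annotated:

```c
// Hypothetical collation modes for a numeric header field.
enum collate { COLLATE_SUM, COLLATE_LOWEST, COLLATE_HIGHEST, COLLATE_COUNT };

// Collapse one numeric field, gathered from every process, into the
// single value shown in the header. Returns 0 for an empty list.
long long collate_field(const long long *vals, int len, enum collate how)
{
  long long result = 0;
  int i;

  if (how == COLLATE_COUNT) return len;
  for (i = 0; i < len; i++) {
    if (how == COLLATE_SUM) result += vals[i];
    else if (!i) result = vals[i]; // first entry seeds lowest/highest
    else if (how == COLLATE_LOWEST && vals[i] < result) result = vals[i];
    else if (how == COLLATE_HIGHEST && vals[i] > result) result = vals[i];
  }

  return result;
}
```

Memory usage would use the sum, STIME the lowest (oldest start time), and the "number of different entries" fields still need the sort-and-count treatment instead.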

Maybe instead of %CPU I could have $CPU and #CPU and @CPU for different display types? Hmmm...

January 20, 2016

Updated our health insurance today. Went back to the place with the "Obamacare" sign out front and downshifted from a "gold plan" to a "pyrite" because it had gone up over $100/month for slightly worse coverage, as of January 1st.

I am really rooting for Bernie to move the overton window towards single payer. Yes, he's talking about a ridiculous idea, and his job is to keep bringing it up until the joke wears off. Then it's no longer a ridiculous idea. Alas, this is basically what Trump and the mad hatter's tea party crowd are doing in the other direction with a new crusade against muslims. We should have somebody pushing back the other way. Hillary triangulates to whatever the center currently is, and with the tea partiers moving the far right fringe out into indian country, the "split the difference" center moves further right every year and she happily trundles along with them. That's not helpful.

We need to deploy the GOOD kind of crazy, the so crazy it just might work kind, to counterbalance. I'm aware Krugman is running the numbers and tsk-tsk-ing at Bernie, which makes me think of Han Solo's "Never tell me the odds." JFK's moon shot wasn't proposing something rational, FDR's new deal wasn't a careful measured response to anything. Obama mothballed and outsourced NASA, I'm not excited about following him up with a lady who wants to put Snowden in Guantanamo, and whose campaign premise is "inevitability" just like it was last time when she lost to Obama. (All the journalists using "Titanic" to describe her campaign know exactly what they're doing.)

Yes I want a woman to be president, but Elizabeth Warren isn't running. Sarah Palin is also female, not interested in voting for her either.

I got top checked in! Working on polishing stuff now. The way ubuntu's top selects which fields to display is the "f" and "o" keys, which pull up modal selection dialogs with a list of fields and a letter key assigned to each. While I could do that, I'd prefer to have the same -o and -k options that ps takes available to top. (Why do it _differently_ in different commands?)

This would mean you'd have to exit and re-run top in order to change the display, but I think I'm ok with that. The question is, what would everyone else think?

I should ask on the list....

January 19, 2016

Started on the "top" command. In theory fishing things out of pending and plugging them into defconfig is straightforward, even for a complete rewrite of the command (as in this case, using the "ps" infrastructure and sharing most of its code with iotop). In practice, Android is using several commands out of pending, including pgrep, pkill, and top, which are three that share infrastructure with ps.

What I really need to do at some point is break the shared infrastructure out of ps and move it to lib/procps.c, but that's cleanup after I implement all the things using the shared infrastructure.

I need to update the www/code.html page too. And possibly break it up into www/code/args.html, www/code/linestack.html, www/code/main.html, and so on.

Ah-hah! The reason for the netbook battery being loose is the little locking latch hadn't quite engaged. Push it in REALLY HARD and it makes an alarming click that means either "you just broke something" or "it's latched". (I noticed this when it fell out, which did not seem a normal thing to happen to laptop batteries.)

January 18, 2016

Visited my new doctor to have her look at my wrist, which is currently exhibiting BOTH failure modes. (The "mosquito bite doing runaway swelling/scarring until I hit it with steroid cream", and "that tiny discolored bit under my wristwatch which I put steroid cream on until it grew to cover my entire wrist and the medispring lady shook her head at me for putting steroid cream on a fungal infection, since that suppresses the immune system". Yeah, pilot error there.)

This time I got a cream with both a fungicide and a steroid in it. (Ok, oversimplification: I had some of this stuff before but right after the tube ran out I got two mosquito bites right next to each other on the troubled wrist, during one of Austin's gaps that make our 6 weeks of annual winter non-consecutive, and each developed a different problem.)

Remind me to implement an rss filter for the "<span id=health>" and "<span id=programming>" tags, so you can select which content you care about. I guess this paragraph goes under programming.

Speaking of which, I got ps and iotop doing proper UTF8 fontmetrics with all three types of escapes. In the process I found a bash bug where putting three DIFFERENT kinds of invalid UTF8 sequences into bash's command line history, one right after the other, made cursor up/down measure wrong and advance two places to the right each time. (You could call it an xfce terminal bug, but bash can detect/escape these and toysh needs to, so: bash bug.)

Did you know the ansi escape "reverse" (esc [ 7 m) is _modal_? I thought that since its job is to swap foreground and background color, you'd just do it _again_ to switch it back, but no... (Why not? Because that would be too obvious.) Instead you need a 27m to switch it back off (or 0m which is what I was doing everywhere else, but since I dunno what foreground/background colors you're currently using I want to preserve that state, so...)
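
In code, the difference comes down to which sequence you send afterwards. A small sketch (format_reversed() is a made-up name) that turns reverse on with 7m and off with 27m, leaving any colors alone:

```c
#include <stdio.h>

// Wrap str in reverse video, then switch only the reverse attribute
// back off (27m) so the current foreground/background colors survive.
// A second 7m would not toggle it, and 0m would reset the colors too.
int format_reversed(char *buf, size_t len, const char *str)
{
  return snprintf(buf, len, "\033[7m%s\033[27m", str);
}
```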

The people I want to smack for being stupid here presumably implemented this crap in the 1970's, and are probably dead now. (Intel extended their first processor to 8 bits for a "glass tty" contract, which lets us know _when_ these suckers were new and shiny and competitive: when the 8008 came out.) Let's see, I learned that from this interview with the design engineer who created the 4004. (I also have interviews with the 4004's layout engineer who felt he didn't get enough credit and left Intel to form Zilog and create the Z80 (Zilog was the 8-bit era's AMD, making compatible clones of Intel's chips, Z80 ran 8080 programs). And I have an interview with the customer Intel created the 4004 for. And the above design engineer's manager was none other than the Moore's Law Guy.)

Where was I? Oh right, the Ted Hoff interview says the Computer Terminal Corporation contract that led to the 8008 (the 8080 was a process refresh of the 8008 that they stuck some improvements in) was negotiated in December 1969. So yeah, 1970's.

(That was "<span id=comphist>".)

Anyway, toybox: I should probably make ls use this too. (If it's doing --color output, it can escape invalid UTF8 sequences.) And I can finally do a proper "top" implementation, and this was a big blocker for vi, and for shell command editing/history.

What else needs this? Easy way to check: what's querying terminal_size()? Don't care about anything in pending yet, got hexedit, ls, and ps... vmstat just outputs numbers... sed? Why is sed querying... because the "l" command wraps output to terminal width using backslashes. Yeah, that needs... huh. Should that let invalid utf8 sequences through, or escape them?

I _think_ it should escape them, on the theory it's already abusing the output (escaping \a and \v and so on). So yeah, need to update sed for utf8 fontmetrics. Wheee!

Doing it _right_ is often nonobvious. I feel sorry for the guy who did "boxes", but _this_ was always going to be the hard part. Changing all the places in the code that need each change, and dealing with the ramifications of all the edge cases.

I reeeeeally need to do a test suite fill-out pass. But when I start that I probably won't get any new functionality implemented for months, so that's a near-1.0 thing. Pending first...

Oh, and I got pgrep and pkill implemented (as long as I was poking at ps, pgrep and pkill use the same infrastructure. Well done _right_ they do, all that -u filtering and such). And I made it so that pkill DOES NOT MATCH ITSELF, and made pgrep do that as well. I'm not sure if this is the right behavior, since the pgrep man page doesn't mention this and posix doesn't mention pgrep, but when testing "pkill -fl stop renderer" and having pkill suspend itself before it had hit half the chromium tabs... Yeah, that's a thing it should not do.

January 17, 2016

Oh bravo Google. My netbook battery jostled in the bag, and when I restarted chromium and hit "restore" there was no net connection. I just connected to HEB wifi, and it tried to reload all the tabs it couldn't get a net connection for before (several hundred of them). Every one redirected to the HEB login page, which does not store the original destination.

I have been annoyed by chromium's reload behavior from day one, I normally "pkill -f renderer" right after a reload to make it stop (main reason for not using firefox: I can't kill individual cpu-hog or memory-hog tabs findable with "top"), although a blanket kill like that discards the cached ones too (which is inconvenient but beats the alternative). I forgot this time, and chromium was epically stupid because Google Knows What's Best For You And Won't Let The Behavior Be Disabled (tm). (Just like gmail's filtering!).

The most STUPID part of chromium's bad design is that the "back" button doesn't remember the location past the redirect. Why not? Because Google knows what you REALLY want, and it's "to lose your place".

Of course when "reload everything" does work, it swap thrashes the machine to death (way too much active memory at once) and uses up the 10 monthly new york times logins reloading pages I already loaded (mostly from twitter links, which doesn't count against my total) and so on. Really, it has LOTS of bad effects. But losing what the tabs POINTED to (todo list items mainly) is the most annoying part.

I suppose it's another argument for https everywhere, those turn into a "your connection is being hijacked!" page that retains the original URL (which is all I want). Defanging stupid chrome browser breakage.

January 16, 2016

I really like the new netbook battery except that it sometimes jostles loose while the netbook is banged around in the bag during suspend (not a problem I've seen when it's out and in use, seems to be because it's physically large and awkward and catches in the bag a lot), and since Acer doesn't seem to put a capacitor in the thing even a short unplug causes it to power down, forcing a reboot and closing all my tabs. (Then again, multiple hours of use vs 45 minutes... I'm getting used to it.)

So, toybox utf8 support. The "standard" unprintable character escaping is for low ascii characters (0-31) to be "^X", if mbrtowc() fails (invalid utf8 sequence) show the byte in hexadecimal, and if it converts from utf8 but isn't iswprint() (generally noticed because wcwidth() is negative), print "U+XXXX".

This isn't the only escape regime; hexedit uses its own and less generally does different stuff. That said, I'm leaning towards coercing as many as possible to that because it covers all the cases and is reasonably generic.

One hiccup of distinguishing the three types is that my callback to actually print characters takes a wchar_t, and for an invalid utf8 sequence I haven't _got_ a wchar_t. That said, I can take the literal value (first byte thereof, anyway; an invalid sequence doesn't say how LONG the invalid sequence is). Luckily, wchar_t is an int (on both gcc and clang, "cc -E -dM - < /dev/null | grep WCHAR_TYPE" says so), so I can feed in negative values.
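
A sketch of what the escape side of that callback could look like, with invalid bytes passed in negated. The function name is hypothetical, and the "<XX>" shape of the hex escape is an assumption for illustration; the ^X and U+XXXX forms are the ones described above:

```c
#include <stdio.h>

// Format the escape for one unprintable character into buf and return
// its length. wc is a wide character value, or the NEGATED byte value
// for a byte that wasn't part of a valid utf8 sequence.
int escape_char(char *buf, size_t len, int wc)
{
  // Invalid utf8 byte: hex escape (the <XX> format is an assumption).
  if (wc < 0) return snprintf(buf, len, "<%02X>", (unsigned)-wc);

  // Low ascii control character: ^@ through ^_
  if (wc < 32) return snprintf(buf, len, "^%c", '@'+wc);

  // Valid wide character that isn't printable.
  return snprintf(buf, len, "U+%04X", (unsigned)wc);
}
```

The caller would still be the one running mbrtowc() and iswprint() to decide which case applies; this just shows that a single int argument is enough to tell the three apart.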

January 14, 2016

New netbook battery arrived last night, upgrading from a 3 cell to a 9 cell. My netbook is now wearing high heels, although the screen not opening more than 90 degrees is far more of a distraction. (I could probably file some plastic off the back edge of the screen to fix that, but let's see how big a deal it is in practice.)

I bisected binutils git last night (shudder) to find where the arm linker behavior changed, and the commit added a test to a chunk of code that my version doesn't even have, so not necessarily hugely useful. (Technically that was Red Hat code, not FSF code, but I feel dirty nonetheless.)

Circling back to find.c, I left myself a cryptic note I don't immediately understand. This sort of thing happens a lot. (Mostly I was sticking english into the code to force a build break where I left off, so I didn't forget to finish fixing a problem I don't have a test for in the test suite yet.)


The actual bug I was tracking down is the "find -execdir +" behavior getting the sequencing wrong so that each new directory it descends into _starts_ with the name of the directory. It should add the directory name to the parent's list _then_ descend to the new directory with a new list.

Given that, the comment makes more sense: move the directory "push" down after the append. Except there's conditional logic in there, the -execdir may not always be enabled, but we still descend/traverse? The push/pop behavior has to match to avoid memory leaks.

January 13, 2016

Since Linux 4.3 was such an easy upgrade, I thought I'd try a quick 4.4 build to see how it went. Arm and Mips both broke.

The ARM problem is funky, I bisected it to a commit that's basically doing this:

armv5l-ar rcs virt/lib/built-in.o
armv5l-ld -EL -r -o virt/built-in.o virt/lib/

Then when virt/built-in.o is linked into vmlinux, it's got zero FLAGS so it's EABI version 0, and the vmlinux is EABI version 4. The problem is when the linker reads an empty *.a archive to produce an empty *.o file, it has no ELF objects to copy values from, so initializes the ELF header fields to zero. And then the next link catches a type conflict and breaks.
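
For anyone wondering where the "EABI version" lives: on ARM it's the top byte of the ELF header's e_flags field. A sketch of the check, with the mask copied from the EF_ARM_EABIMASK definition in elf.h and the function names made up for illustration:

```c
// Mirrors EF_ARM_EABIMASK from <elf.h>: the top byte of an ARM ELF
// header's e_flags field holds the EABI version.
#define ARM_EABI_MASK 0xff000000u

// Extract the EABI version number from e_flags. An object whose flags
// were never filled in (all zero) reports version 0.
int arm_eabi_version(unsigned e_flags)
{
  return (e_flags & ARM_EABI_MASK) >> 24;
}

// Hypothetical check: would linking these two objects mix EABI versions?
int eabi_mismatch(unsigned flags_a, unsigned flags_b)
{
  return arm_eabi_version(flags_a) != arm_eabi_version(flags_b);
}
```

So an all-zero e_flags from the empty archive reads as version 0, which is what then collides with the version 4 objects in the final link.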

This isn't just doing it with my 2.17 toolchain, I have Code Sourcery's arm2009q1 toolchain lying around (2.19.51) and it's zeroing the flags in the .o file... ah, but the final link isn't failing. So that answers the "how did they not notice obvious breakage" question; it affects 2.17 but 2.19 has a workaround.

FYI you can build Aboriginal Linux's kernel version out of tree ala:

cd ~/linux
git clean -fdx && git checkout -f
for i in ~/aboriginal/sources/patches/linux-*.patch; do patch -p1 -i $i || break; done
make ARCH=arm allnoconfig KCONFIG_ALLCONFIG=<(cd ~/aboriginal; more/ armv5l getconfig linux)
make ARCH=arm CROSS_COMPILE=~/aboriginal/build/cross-compiler-armv5l/bin/armv5l- -j 8

Substituting different CROSS_COMPILE= paths for different compilers to test, and dropping the || break if you don't care about patch version skew. (In the arm case it's needed to plug more processor types into arm "versatile".)

January 12, 2016

Jeff asked about whether $DAYJOB should migrate to git or not (something we've been talking about behind the scenes for a while), and rather than write a long argument about it I swap-thrashed over to the jcore repository git conversion stuff. (The best argument is having already done the work and being able to go "here, try this".)

We're mercurial internally, which is great except that mercurial is dying. The git userbase is big enough that users and projects are moving from mercurial to git, and it's snowballing at this point.

The real argument for _us_ is that if we want jcore to work as an open source project with contributors outside of $DAYJOB, a git repo is way better than a mercurial repo. But we need to dogfood that external repo, having our internal hardware extensions be a subdirectory (subrepo?) under the rest of the build, and the rest just using the external repo and putting all our non-proprietary commits straight into that.

A mercurial/git straddle there is possible, but painful. So we might as well move to not require new hires to know two systems.

The reason the mercurial conversion is hard is the code was originally developed as a half-dozen separate repos, and I'm trying to merge them into one big history. This is entirely on the mercurial side, coming up with one clean mercurial repo I can then convert to git. (A sign of mercurial's decay is that there are no mercurial tools to do this, so I'm writing shell and python scripts.)

January 11, 2016

Did a quick Aboriginal Linux release with the 4.3 kernel, since it applied more or less cleanly. No other changes, just trying to catch up now that 4.4 is out.

Pondered cutting a toybox release to go along with it, but almost all the work that's been done is infrastructure, not commands. There are a lot of commands that are now low hanging fruit, but I haven't actually done them yet. (Started on pkill while I was there, since the ps work makes it low hanging now. Yes, it's starting over rather than using the pending one, but I need -u to be comma separated lists and I want to enable multiple sort criteria at once, and...)

I still need to make new build control images that work with musl.

January 10, 2016

Rich Felker threw his hat into the ring for sh maintainership, to general acclaim.

What's been more acrimonious is the accompanying patch to remove the non-superh noise that Renesas has been dumping on the list ever since they moved to arm.

There are people arguing in favor of squatters' rights, saying that in the past year or two (of a list more than a decade old) noise has taken over, therefore noise has a right to be there. Personally, I am not impressed by these arguments:

> I know it's not a perfect comparison... but assume the
> original x86 Linux mailing list was called "linux-i386".

It was called "comp.os.minix".

Right now I'm holding off until the first patch gets in, then we can reopen this bikeshed.

Meanwhile I dug up my previous phone (Samsung Galaxy S) but it won't upgrade itself to Marshmallow. (It had one upgrade when connected to wireless, a bugfix dot release, then nothing.) I can try shoehorning cyanogenmod on it, but the instructions online are... awkward and bricktastic.

I'm reminded of this because I accidentally brought up the bootloader menu on my Nexus 5 trying to charge the thing. The darn mini-A cables they use warp so easily, as do the sockets, that it needs very careful positioning to actually make electrical contact. Even when I take the case off, it keeps spitting the cable back out unless I hold it in at the right angle (about 5 degrees off from straight). I tried jamming it in extra-hard (not like I'll bend it worse at this point) and brought up the bootloader menu. (Similar to finding the virtualbox window properly fullscreened when I came back to my mac to find Peejee curled up on the keyboard. No idea how she did it, and it didn't last past switching out and back in, and she brought up around 50 "really delete all your email Y/N" windows in the process.)

Said bootloader is, of course, "locked", and "secure boot" enabled. This is not YOUR phone, this phone belongs to Google, the NSA, and whatever organized crime group bought the NSA leaks from people less patriotic than Snowden. (You sell secrets to people who KEEP them secret, you don't wind up in exile. I expect organized crime takes care of its own way better than the american people do. Seriously, if it gets collected and retained, it will leak. Only question is when.)

And, next trip to Japan looks like it'll probably be in late February. I have a ginormous todo list before then, so what am I doing? Toybox.

January 9, 2016

And lo, the new chest freezer has arrived!

I'm behind on my monthly patreon reports. In part because at the end of November I hadn't _done_ anything (lots of grinding but not much in the way of results yet), and now I'm trying to do all the things at once again...

My editing and formatting of last year's blog entries continues to be way behind. I spent an entire day editing stuff and advanced it by a month, so there's another week's worth of stuff to do. Alas the way I wrote blog entries... well, I talked about that last week. Sequencing makes blockers, even when I have a lot of later stuff already written up. Just another thing I need to sit down and grind through.

I am, however, cheating by taking the year break as an opportunity to upload notes-2016.html before finishing the second half of 2015. I haven't moved the notes.html symlink or switched the rss generator (the rss feed is also in strict chronological order). I can post about it in a patreon update, and everybody else... I should catch up eventually. :)

In the absence of an rss feed I might be a little more lax about posting half-finished entries and finishing them later. It's similar to rebasing git commits that haven't been pushed yet: once somebody else might have pulled it, you're stuck with it. If somebody's rss feed already read the entry, even minor changes might cause duplicates...

January 8, 2016

Netbook rebooted again last night, due to me holding down the button after fifteen minutes of it updating the cursor position ONCE, due to swap thrashing. (Too many open chrome tabs.) Lost all my open windows again.

Implementing the cursor left/right sort stuff for iotop. I'm doing iotop before top both because there's less expectations about its behavior (largely due to not being nearly as widely used as top, due to the default implementation being a python program that wants to run as root), and because there isn't an existing one in the tree so I'm not being rude by throwing out an existing implementation and starting over. (Of course once I've got iotop working, top shares probably 90% of the code and possibly even can share a main() function...)

I really should have named this project dorodango, but apparently somebody's camping that domain name. (I emailed the guy who's got, which displays a PHP error message, but he wants "over $2000" for the name. I can't say having my projects being on my personal domain site indirectly boosting my resume's google search rank is entirely a bad thing for me personally, although if I really cared about that I'd do the minimal pandering required to avoid ranking penalties. Plus, I really like my current job so put a big "not looking" notice at the top. So really, six of one...)

January 7, 2016

Got an Aboriginal Linux release out, 1.4.4 using the 4.2 kernel. Thanks to the -rc8 I beat the 4.4 kernel, meaning I'm technically only one release behind at the moment!

And I FINALLY got most of the architectures switched over to musl, the stragglers are armv4l, m68k, mips64, sh2eb, and sparc. Musl hasn't got support for m68k and sparc yet (or the half-finished alpha and s390x ports I was working on), armv4l is still oabi (because eabi requires thumb instructions, although there's apparently a linker workaround), mips64 is broken even under uClibc so I should fix that before migrating, and I wanted a clean sh2eb reference version for one release before switching that over (and opening the can of worms that is static PIE on gcc 4.2).

I need to do build control images, but that isn't necessarily part of the release, so...

January 5, 2016

The 11 months of mailing list archives Dreamhost deleted are still gone. Not surprising, since the weeks of archives they never properly archived LAST christmas are still gone too.

Elliott Hughes sent a patch to convert various perror_msg(x) calls to perror_msg("%s", x), and each one was already a constant but their static checker can't tell, and they have mandatory -Werror on the false positive. This only comes up in the CONFIG_TOYBOX_DEBUG case because I wasn't enabling the __attribute__(this uses printf format arguments) thingy except when DEBUG was enabled precisely _because_ it produces false positives if you have 'char *s = x ? "blah" : "blah"; printf(s);'.

We went back and forth a bit until I decided the right fix is to have "error_msg_raw(char *msg);", except I need three of them. (error_msg, perror_msg, and error_exit. I haven't wrapped help_exit yet because there isn't a user.)
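A minimal sketch of the idea, for illustration only: format_msg() and format_msg_raw() are hypothetical stand-ins writing into a buffer, not toybox's actual error_msg() family. The attribute asks gcc/clang to check callers against the format string, which is what lets options like -Wformat-security complain about a non-literal format even when every value it could hold is a constant. The raw variant gives callers a place to hand over a finished string without sprinkling "%s" everywhere.

```c
#include <stdarg.h>
#include <stdio.h>

// Hypothetical helper: the attribute says argument 3 is a printf-style
// format and the values it consumes start at argument 4.
static int format_msg(char *buf, size_t len, const char *fmt, ...)
  __attribute__((format(printf, 3, 4)));

static int format_msg(char *buf, size_t len, const char *fmt, ...)
{
  va_list va;
  int rc;

  va_start(va, fmt);
  rc = vsnprintf(buf, len, fmt, va);
  va_end(va);

  return rc;
}

// The "raw" variant: msg is copied as-is, so any % characters in it are
// never interpreted, and the checker only ever sees a literal format.
static int format_msg_raw(char *buf, size_t len, const char *msg)
{
  return format_msg(buf, len, "%s", msg);
}
```

With this split, a caller holding 'char *s = x ? "blah" : "blah";' passes s to the raw function and the checker has nothing to object to.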

Meanwhile, I found out that cyanogenmod switched from busybox to toybox in November, and don't seem to have mentioned it to me? (Possibly it fell through the cracks.) They have a github repo with lots of local changes.

This raises an interesting point with regard to public domain licensing: busybox can pull patches out of any repo it sees, without asking, because the result has to be GPL. (Of course there's random non-GPL crap glued into GPL projects all the time, even Linux removed some ancient AT&T malloc code from the itanic arch when SCO trolled them, but the _assumption_ is it's GPL if they were competent.)

Toybox can't pull changes out of random repos, because those changes could be licensed any which way. (There may be a de facto assumption the license hasn't changed, but nothing requires it.) So we have to ask, or wait for submissions, or re-implement.

I expect I'll re-implement.

January 4, 2016

FINALLY figured out what was wrong with Aboriginal Linux, and of course it was retroactively obvious. A mixture of pilot error and "cross compiling is hard".

The second stage cross-compiler-$ARCH.tar.gz builds are statically linked, have thread support enabled, and various other cleanups that let you extract them in an arbitrary directory and run them from there. They are portable in a way that simple-cross-compiler-$ARCH.tar.gz are not.

You don't need a second stage cross compiler to build aboriginal linux (the reason to make them is you can tar them up and distribute them), but it'll use it (instead of simple-cross-compiler) if it's there. The more/ script that builds all targets in parallel creates both cross compilers, but ./ will only do so if you set the CROSS_COMPILER_HOST environment variable to one of the recognized targets. (The script is a small wrapper around, it's basically the same infrastructure doing a modified "canadian cross" both times.)

Making these portable cross compilers involves rebuilding the cross compiler packages using both the earlier simple-cross-compiler-$ARCH target toolchain (for the target code like libgcc), and a host toolchain to build the parts that run on the host (like the gcc executable itself). Because Ulrich Drpepper broke static linking in glibc (intentionally over a period of years because he hated the concept), I can't reliably use ubuntu's toolchain for this, and instead use the simple-cross-compiler-i686 toolchain I built earlier. If that code is statically linked, it should run on both 32-bit and 64-bit x86 targets. (The 32 bit code is actually very slightly faster because the smaller pointer sizes thrash the cache less, this is why Linux added the x32 target.)

When I switched the i686 target from uClibc to musl-libc, all the other targets still built fine... until I enabled the second stage cross compiler. So ./ sh2eb worked because it skipped, but more/ set CROSS_COMPILER_HOST=i686 and then the build broke trying to build elf2flt with a musl toolchain (duplicate symbol definition due to a stupid autoconf test putting -lc before the FSF's -libjustinbieberty).

What took forever to figure out is a clean worked, a dirty didn't (used if it was there in build/), and more/ always triggered the failure after the musl switch. Bisecting when you're not actually running the same test each time is a frustrating experience.

(Eventually I gave up on bisecting and sat down and traced the bug through the build until I figured out what was failing and why and where it came from. "What changed" is easier to debug than reverse engineering code when you don't understand how it ever worked in the first place. Or "if" it ever worked, since sometimes the difference is it decided to use a different file or include a different #ifdef section that was never compiled at all in the working version. In this case, digging through to figure out what specifically the code was complaining about finally let me figure out what HAD changed: musl in cross-compiler-i686.)

January 3, 2016

Continuing to poke at find. The -depth option (to trigger depth first search) was recursing before loop checking, so endless loops (with -L) wouldn't be detected in that case. Oops.

I go back and forth on various design issues. Supporting multiple -execdir on the same find command line is one of them: some days I'd go "just detect the second and error out saying it's unsupported", but I'm already halfway through implementing support for it and it's not that bad. (It's _obscure_ but there's a strong element of "do it RIGHT" in toybox. What doing it right entails is the wibbly bit: is _not_ doing it at all the right thing?)

I've got it to the point where "find toys -execdir echo '{}' +" is _sort_ of working. As in it's pushing a new context onto the stack _before_ saving the current directory, so each exec list starts with the enclosing directory name. Oops.

My "natural addition point" for the list push yesterday is elegant and simple and (unfortunately) wrong. And -depth is still tangled up in it, because in that case the whole directory processing needs to happen from COMEAGAIN. I think that -depth is actually currently wrong, not processing directories at all (because it returns early to change processing sequence, but comeagain isn't doing the extra processing in that case).

And all of this makes me really want a worthwhile test suite already filled out so I can at least spot introduced regressions. And of course poking at that reveals that the ubuntu version of "find toys -execdir echo {} + -execdir ls -C {} +" _isn't_ doing the collation thing, it's calling each one individually with a single argument (so execdir with "+" acts like execdir with ";"). Which means building the test suite by testing the host version before I've got the corresponding target stuff in testable shape... is a little harder.

I hate having a test suite that only tests ONE version of the code, because how do I know if the tests are right or just confirming that the code does what it does? Catches regressions, but doesn't prove anything _else_. But when I start getting into "ubuntu does this wrong, busybox does this wrong"... I guess if nobody's noticed it can't really _matter_? But it's still not RIGHT. Grrr.

Anyway, this is why I added SKIP_HOST to tests that the ubuntu version is expected to fail.

January 2, 2016

Poking at toybox find.c again. The -depth and -xdev options aren't affected by parentheses, they're global. I wonder if there's an easy way to note that in the --help output? (Terse vs thorough, the eternal struggle.)

Isabella Parakiss reported (and fixed) the same bug Daniel K. Levy reported back in September. It took a while for me to confirm this because Dreamhost's list archive is STILL down and what I had bookmarked was a link to the web archive. ( snapshotted the september index page, but not the individual posts.)

Anyway, dug through and confirmed that, cherry-picked out the actual fix (and the "linux environment space hasn't been limited to 128k/process since 2007" fix), and a pending "we're checking S_ISDIR(dev) instead of S_ISDIR(mode)" fix that I thought I'd already checked in? Right...
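The S_ISDIR() mixup is easy to make because the compiler can't catch it: st_dev and st_mode are both integer fields, so either one compiles. A small sketch of the correct check (is_directory() is a hypothetical helper for illustration, not a function from find.c):

```c
#include <sys/stat.h>

// Returns 1 if path names a directory, 0 otherwise (including on error).
static int is_directory(const char *path)
{
  struct stat st;

  if (stat(path, &st)) return 0;

  // The file type bits live in st_mode. Feeding st_dev to the macro
  // compiles without complaint but tests unrelated bits.
  return !!S_ISDIR(st.st_mode);
}
```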

Anyway, the remaining big find hairball is making -execdir + and friends work, which is where I left off last time I was looking at this.

So -exec, -ok, -execdir, and -okdir all collect a list of names and arguments to call exec() on. The ok variants prompt, the -exec ones just call. The dir variants collate them by directory and do fchdir() into each dir, the others use paths from the top directory and do the exec from there.

All of these can be terminated with either ";" or "+", the first calls exec on each file as it's discovered, the second collates a bunch together and makes one exec call on a pile of them. In the + context the collating is different between dir (flushing as it exits each directory) and non-dir (collecting a giant pile of absolute paths).
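A rough sketch of the "+" collation for the dir variants, under loud assumptions: struct batch, flush(), and add_name() are hypothetical names, flush() just prints the command line instead of calling fork/exec, and it flushes when the directory changes rather than when the traversal exits a directory. Those two only match if each directory's entries arrive contiguously. A real traversal that descends into subdirectories partway through needs one pending list per directory on a stack, which is the part that gets fiddly.

```c
#include <stdio.h>
#include <string.h>

#define BATCH_MAX 64

// Hypothetical accumulator for illustration; not toybox's data structure.
struct batch {
  const char *dir;              // directory the pending names belong to
  const char *names[BATCH_MAX]; // pending arguments for one exec call
  int count;                    // how many names are pending
  int flushes;                  // how many "exec calls" have been made
};

// Stand-in for fchdir() plus exec: print the command we would have run.
static void flush(struct batch *b)
{
  int i;

  if (!b->count) return;
  printf("[in %s] cmd", b->dir);
  for (i = 0; i < b->count; i++) printf(" %s", b->names[i]);
  printf("\n");
  b->count = 0;
  b->flushes++;
}

// "+" behavior: collect names, flushing when the directory changes or the
// batch fills up. For ";" you would instead flush after every single name.
static void add_name(struct batch *b, const char *dir, const char *name)
{
  if (b->count && (strcmp(b->dir, dir) || b->count == BATCH_MAX)) flush(b);
  b->dir = dir;
  b->names[b->count++] = name;
}
```

The caller zero-initializes a struct batch, feeds it each match through add_name(), and calls flush() once more at the end to run whatever is still pending.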

January 1, 2016

Happy new year.

I put "update the darn blog" as a patreon goal, and even though we never got near it (not that I've been exactly shouting about my patreon from the rooftops) I've updated my text file whenever I remember. Unfortunately, when I trail off in the middle of a thought I do the next day's entry and then have to go _back_ because I can't post them out of order (rss feed, for one thing; policy of not editing entries after they go up for another). And when there are LARGE entries like "I'm sad my friend's gone nuts" that I need to write up more than the first 25% of, or things where "there's a link that goes here but I'm not on the machine I saved that on...", or it's a half-hour's research to finish this paragraph and I haven't got time right now...

Well they can add up. Plus I'm doing all the tags by hand and I have to check the thing for typos and unfinished paragraphs and forgetting the closing quotes in an <a href=> so it eats the next several sentences...

Anyway, I was hoping the patreon would guilt me into it, but I guess I should tap the amount down. (The patreon totals recently got adjusted to be take-home pay after fees anyway, so everybody's goals should shift since then.)

I'm conflicted about patreon. On the one hand, it would be amazing if my open source work did become my full-time job. On the other, I _like_ my $DAYJOB at a company that's redoing the electrical grid to make solar and wind mainstream viable (Problem: legacy electrical grid designed for centralized generation, if you feed nontrivial amounts of electricity back in at the edges people's light bulbs start exploding. Solution:)

And the WAY they're doing it is awesome, creating open hardware by cloning architectures all the patents have expired on, releasing the resulting hardware source files under a BSD license, and writing up both how to test and modify them on FPGAs and how to negotiate with fabs to burn small runs of your own wafers to make your own SOC. (Right now the smallest run it's worth doing is 6 wafers, resulting in around 36,000 chips with our little SOC, which would cost around $50k for a 150 nanometer process. Kickstarter regularly funds projects dozens of times more expensive than that.)

The founder of the company is the founder of uClinux; my entire open source career has basically been following in that guy's wake and I get to hang out with him in person for weeks at a time. They fly me to japan to do this. He walked me through how nommu should work. I got to drag my friend Rich Felker into this and get HIM doing important work on their dime too.

I LIKE this job.

But... turning android into a self-hosting development environment is _important_. I found the place to stick the lever to shift the entire course of the computer industry and I'm PULLING AS HARD AS I CAN. And it's time-critical and NEEDS to happen and I can help it happen the right way and... I gotta do a systemd replacement init. I need to do proper container infrastructure. I need to clone a non-GPL git for Android's repo. I need to de-hairball AOSP. Clearly Google isn't going to hire me to do this (we've been through this. We've ESTABLISHED this), but they're LISTENING. They're taking my code and contributing to my code and I need to ramp development UP and get it all DONE and...

I'd like to convince $DAYJOB to hire more people to let me focus on the bits where their interests and android's overlap, but we're at that ironic stage where we're too busy to hire more people. (Money they have, engineering bandwidth to bring new hires up to speed, not so much. Part of the reason I've been quiet about the open source nommu/jcore stuff, apart from being so busy with everything else, is our first big push got an interested person who wanted to come work for us and... we dropped him on the floor. We were too busy to follow-up on hiring desperately needed people. We also had a consultant in Japan who we were too busy to properly bring up to speed to the point we could use her...)

I keep thinking I should really focus on fixing _THAT_, but I'm not management and don't want to be. (I'll team lead your ear off but hiring and firing decisions are pure Bistromath to me.)

Back to 2015