Rob's Blog

December 30, 2014

Heard back from Dreamhost. They assure me it's not swap thrashing, the hardware handling the mailing list is just "too old".

That's... not a diagnosis. It really isn't.

On the bright side, @roytam1 on twitter reminded me that toybox has been mirrored on gmane for like a year now, so there _is_ a working web archive. Added a link to the web page and posted a note to the list.

December 27, 2014

Part two of the ongoing saga of my attempts to explain what swap thrashing is to Dreamhost tech support is up. In case you missed part one, there it is. (Part zero was back in August, when I was distracted by flooding.)

As far as I can tell (without a login to the box) they have a server that's been constantly swap thrashing for a month and is now 10,000 messages behind in processing its mail queue, and their fix is to increase the number of queue handler processes. (Because adding cars to a traffic jam clears the problem right up.)

In my third attempt at explaining things, I'm linking them to wikipedia. I've honestly lost track of the line between "sarcasm" and "helping".

(There is a process that feeds messages into the database, and a second process that creates html files from the database. When the first one got constipated, they re-ran the second one manually for me. I explained that the second one had been running about once per day (it should be running every ten minutes or so, but close enough), but it only noticed a "new" (actually multiple weeks old) message every couple days. Obviously the indexing half wasn't the problem. After a couple weeks of explaining that over and over, they "fixed it" by adding more "messages into the database" processes, and since then the "html file from the database" task hasn't run once. I'd _like_ to run top and iotop on the box, but I don't think I need to in order to diagnose swap thrashing. Maybe someday they will too.)

December 26, 2014

Toybox has xwrap.c containing functions (all starting with x) that exit instead of reporting errors. These are mostly wrappers, like xmalloc(), that do the same thing as the function they wrap but just never return error. (This is why atolx() isn't called xatol(): it's not a wrapper for atol, it has significantly different behavior, parsing extensions for kilobytes and megabytes and so on.)
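To make the "significantly different behavior" part concrete, here's a minimal sketch of a suffix-aware number parser. The name parse_size_sketch(), the choice of k/m/g, and the 1024 multipliers are all assumptions for illustration. This is not toybox's actual atolx(), and unlike an x-function it reports errors through a flag instead of exiting. It also doesn't check for overflow.

```c
#include <ctype.h>
#include <stdlib.h>

// Hypothetical sketch, not toybox's atolx(): parse an integer with an
// optional k/m/g suffix (powers of 1024, an assumption made here).
// Sets *err to 1 on bad input instead of exiting.
long parse_size_sketch(const char *str, int *err)
{
  char *end;
  long val = strtol(str, &end, 0);

  *err = 0;
  if (end == str) {
    *err = 1;               // no digits at all
    return 0;
  }
  switch (tolower((unsigned char)*end)) {
    case 'g': val *= 1024;  // fall through
    case 'm': val *= 1024;  // fall through
    case 'k': val *= 1024; end++; break;
    default: break;
  }
  if (*end) *err = 1;       // trailing junk after the number or suffix

  return val;
}
```

A plain atol() wrapper would stop at the "k" and hand back 4 for "4k", which is why a different name for the suffix-aware version makes sense.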

But a lot of the time, I want a "warn" instead of "error" version. I want it to print the error message (and note that it did so in toys.exitval), but return an error code I can handle and continue the loop. Lots of commands take multiple arguments and continue to later arguments even if earlier ones produced errors. At the moment I'm just calling the non-wrapped version and adding my own error_message() calls in the individual command rather than having "w" prefixed versions of all the xwrap.c contents.

But it's one of those "pressure on both sides, both ways suck" things. Double the size of xwrap.c or bloat the commands?

There's a third option, but making it work is tricky. In order for the eventual shell implementation to call this infrastructure without exiting, I made the actual xexit() function able to longjmp() back to a recorded setjmp() location. In order to make this any kind of generic I'd need all sorts of cleanup stuff (what malloc()s didn't get freed? Did somebody close stderr() because the -q option said no error messages? They changed the umask! Signal handlers! Lotsa random subtle possibilities...) but just keeping control is less of a big deal.

In theory, when I want a warning version of an xfunction() I can setjmp() the toys.rebound entry, call the function, and then zero toys.rebound again.
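A minimal sketch of that setjmp() dance, under stated assumptions: the "state" struct, xexit_sketch(), xfopen_sketch() and warn_fopen() are hypothetical stand-ins, not the real toybox code. It also ignores all the cleanup problems listed above and just keeps control.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical stand-in for the toys global: a rebound pointer plus exit code.
struct {
  jmp_buf *rebound;   // if set, exiting jumps back here instead
  int exitval;
} state;

// Exit, unless somebody recorded a place to longjmp() back to.
void xexit_sketch(void)
{
  if (state.rebound) longjmp(*state.rebound, 1);
  exit(state.exitval);
}

// An x-style wrapper: returns a valid FILE * or doesn't return at all.
FILE *xfopen_sketch(const char *path)
{
  FILE *fp = fopen(path, "r");

  if (!fp) {
    perror(path);
    state.exitval = 1;
    xexit_sketch();
  }

  return fp;
}

// "Warning" version: set the rebound, call the x-function, clear the rebound.
// Returns 0 on success, 1 if the x-function tried to exit.
int warn_fopen(const char *path, FILE **out)
{
  jmp_buf here;
  int failed;

  state.rebound = &here;
  if (setjmp(here)) failed = 1;
  else {
    *out = xfopen_sketch(path);
    failed = 0;
  }
  state.rebound = 0;

  return failed;
}
```

The three lines around the call in warn_fopen() are the boilerplate that would have to be repeated for every x-function called this way.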

The problem is, I want to _automate_ this. Make it a wrapper around the call to avoid cut and paste boilerplate. And I honestly can't figure out how to do that: any "wrapper(function_call(args));" will call the inner function before the wrapper. I could make WRAPPER() into a macro, but then this disguises significant inlined code/complexity and that's exactly the kind of bloat I want to avoid...

Hmmm, I suppose I can add it as an option to loopfiles(), which iterates over command line arguments. If I can feed that a flag...

December 22, 2014

One of the sed uses I collected and then discarded as "that broke but I know what's wrong with it" was doing "r po/POTFILES" and my collection infrastructure didn't grab po/POTFILES because it's not actually parsing the sed _script_ (just the command line). So I deleted tests with that specific failure and it made it through the end of the LFS build!

So I promote sed, swap it in, and it went "boing". First because of yesterday's "four packages weren't instrumented right, and they did weird stuff", and _then_ because those "r" tests also hit other bugs later in the script.

Now I'm to the point where I've fixed those, but collecting a full test run is now showing all sorts of divergences. Meaning my fixes look like they introduced another incompatibility. (It's mostly "extra newlines", but who knows what that could screw up, it's gotta go...)

I look forward to being done with sed, some glorious day...

December 21, 2014

After weeks of tweaking the sed debugging infrastructure to collect all the data of an entire aboriginal build through the end of the native bootstrap of linux from scratch, and compare every sed invocation with the host version to confirm toybox and ubuntu were producing the same output, I got to the end, threw the switch and promoted sed from pending to posix, swapped it in for the aboriginal linux build...

And it broke in host-tools.sh, step _two_ of the build process. Well of course it did.

Two problems:

1) an aboriginal bug where resetting record-command.sh after building toybox didn't actually update the $PATH, so the redone wrapper directory didn't take _effect_ so it didn't wrap the distcc build meaning my extra sed wrapper didn't get called for the remaining 4 packages in the host-tools build.

2) the e2fsprogs build is actually using multiline continuations! This is the non-posix feature where a single 'c' can append more than one line via -e "c\" -e "line1\" -e "line2". It's a gnuism, and while I made a stab at implementing it nothing had actually tested it yet, and it broke.

December 20, 2014

The toybox mailing list archives are still borked. The most recent message today is from December 3, which is still an improvement. (Two messages have shown up in the past 3 days! Progress!)

Of course I myself sent more messages than that to the list just _yesterday_, so...

I posted a message that may someday show up on the list archive detailing the history of this issue over the past year. Dreamhost's twitter account tells me that it's now an "admin bug". This might be progress! Except... if it wasn't an admin bug before, does this mean that instead of looking at the technical issue, they're now looking at their support process? Has my bug gone meta?

I just want the mailing list archive to work. I didn't set up my own copy of mailman because this is a service they claimed to offer, and every time I migrate mailing lists (through them.com and impactlinux.com and so on) I lose history...

December 19, 2014

Still testing sed. Just ate a day and change trying to figure out why the busybox build was dying with my instrumented version of busybox sed (collecting all the inputs), and it turned out to be that my extra logging stuff is eating (and storing) stdin for sed... which busybox's scripts/gen_build_files.sh calls in a "find | while read d" loop, so when sed eats stdin it eats the find output and the loop terminates prematurely.

Right. I keep having to make the _logging_ smarter in order to collect all this data. Possibly I should switch back to "try it and check for deviances in the build output". Which is hard because gcc creates randomly named /tmp files and then calls ld with the output so there just _are_ normal variances, but it might be easier to filter them out...

December 18, 2014

Finally got around to listening to Bradley Kuhn's talk about GPL enforcement, and now I really need to do an updated version of my "rise and fall of the GPL" talk because _dude_ is he doing revisionist history.

It's not that he only ever refers to me as "a troll" (yes, that's a quote), it's that he completely elides the SFLC. Does not mention them once. Eben Moglen, co-author of GPLv2 and v3? Who's that?

Oh well, you can tell he worked for the FSF for many years: taking credit for things other people did and revising history to paint yourself in a better light is what they _do_...

December 17, 2014

Bit of a hiccup developing in my workflow: I work best when I can ignore the outside world and focus on the thing that's grabbed my attention until I reach the end of it. Work has given me a dozen vague goals of varying importance, and keeps interrupting me with new ones that would take a whole lot of studying just to figure out what's involved in doing them. (With no awareness of any difference in difficulty between "write a new GPS driver" or "come up to speed on this ethernet timekeeping protocol" or "what does Linux actually need to do SMP"... (two weeks later) "by the way we were thinking of doing cache coherency in software, would that have any downsides", and of course "this other developer wrote this extensive math library that apparently uses dbus somehow but isn't really documented, can you reverse engineer what he did and explain it to the client"...)

Anyway, this means that any time I try to focus on something I'm bugged by the awareness of a dozen other todo items I've never even been able to SCOPE. And outside of toybox the biggest issues I _have_ dug into turned out to be "pull a thread and the whole thing unravels" sort of deals (the toolchain stuff), so I did a lot of investigative work and haven't got anything to show for it yet. Reluctant to follow up on that without knowing how long it'll take to produce results, especially since things are HAPPENING in the toybox world so I can't focus on that to the exclusion of all else because this other stuff is _important_. (Just... how important is it within the context of $DAYJOB?)

Oh well, I've still got weekends to catch up on the stuff I consider important. The pressure temporarily goes off and I can _focus_. (Ok, even within that context there's "following up with dreamhost support tickets", "aboriginal linux really needs a release", toybox has giant pending patch lists to merge, and I promised I'd finish the gzip compression side for a company that needs it to ship their product, and I'm _almost_ done debugging sed, and a european cable company wants me to prioritize dhcp, and ANDROID IS REPLACING TOOLBOX WITH TOYBOX AND I'VE GOT PENDING TODO ITEMS RELATED TO THAT WHICH SHOULD BE DONE YESTERDAY... but at least then it's _my_ schedule and I'm not constantly second-guessing my decisions on what is and isn't important.)

(That capitalized bit isn't a secret, it's in the AOSP core git repository. The problem is Dreamhost still isn't reliably updating the toybox mailing list archive. Luckily the actual emails are going through...)

Interesting times indeed.

December 12, 2014

Exciting things are happening in the toybox world, and I'd like to link to mailing list archive posts. Unfortunately, Dreamhost has been having issues for months now.

Mailing list archives are not a thing they competently do, apparently. And filing support tickets about it seems to have broken their support mechanism...

(It's weird, every few days another message will process, from the trailing historical set. My theory is the hosts running the containers are massively oversubscribed to the point where cron jobs are basically never running, so processing the "new message arrived from the list" triggers is not happening. The messages are going into a spool and trickling out literally _weeks_ later. It's currently a couple weeks behind. And the tech support guys do not comprehend the nature of the problem, no matter how many times I point them at it...)

December 08, 2014

The reason ELF and fork() can't run as-is on nommu is that if you haven't got a memory management unit translating addresses, then physical and virtual addresses are always the same. This means "this address is already used by another program" becomes a problem.

One place this sucks is stacks: you can't auto-grow a stack without a memory management unit, because adding an extra 4k here or there means finding a large enough contiguous address range and moving all the existing data (think realloc()). The "contiguous" limitation makes things uglier: you can't assemble a 4 meg block of data from several smaller chunks without an MMU, so allocations _fail_ more in nommu. So when you compile a nommu program you have to tell the linker how big a stack this program needs up front, and that memory is dedicated entirely to being that stack until the program exits.

Oddly enough "you can't read/write this bit" is much less of a problem, because you can have some low and high water mark registers saying "any write above this address causes an interrupt/segfault", so the kernel can set those to limit the window of memory the userspace program can access. Of course this means all a program's memory (executable pages, heap, stack) have to be together in one (or a small number of) contiguous address ranges, because you only have so many mask registers and can only set up so many windows into memory for userspace to access. (Or you can just mask off the kernel and let userspace processes stomp each other all they want. Depends on your definition of "secure".)

When you fork(), your new copy of the heap is full of pointers into the old heap's address space. In theory you could adjust all these pointers if you knew where they were, but this turns out to be basically the same problem as making garbage collection work in C. (In theory you could fix this by having every heap pointer be an offset into your address space, and having the compiler add a base pointer to the start of the heap when dereferencing them. Except that not all pointers are in the heap: you've got a stack segment, read only global data, writable global data, bss (which is writable global data that starts zeroed so you don't have to record it in the executable), memory returned by mmap(), and so on. You have to teach the compiler that there are different _types_ of pointers, and this sucketh hugely at a design level.)
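Here's a small sketch of the "every pointer is an offset" idea, done by hand instead of by the compiler. Everything in it (struct onode, deref(), sum_from(), offset_demo()) is a made-up name for illustration, and it only covers a single heap. The other pointer types listed above are exactly the part it doesn't solve.

```c
#include <stddef.h>
#include <string.h>

// Links are stored as byte offsets from the start of the heap rather than
// as absolute addresses. Offset 0 is reserved to mean "no next node".
struct onode {
  size_t next;
  int value;
};

// Turn an offset back into a real pointer by adding the base, which is what
// the compiler would have to do on every dereference.
struct onode *deref(void *base, size_t off)
{
  return off ? (struct onode *)((char *)base + off) : 0;
}

// Walk a list that lives in the heap starting at base.
int sum_from(void *base, size_t first)
{
  struct onode *n;
  int sum = 0;

  for (n = deref(base, first); n; n = deref(base, n->next)) sum += n->value;

  return sum;
}

// Build a three node list in heap "a", copy the raw bytes to heap "b" the
// way a nommu fork() would have to, trash the original, and walk the copy.
int offset_demo(void)
{
  struct onode a[4], b[4];

  memset(a, 0, sizeof(a));
  a[1].value = 1; a[1].next = 2*sizeof(struct onode);
  a[2].value = 2; a[2].next = 3*sizeof(struct onode);
  a[3].value = 3; a[3].next = 0;

  memcpy(b, a, sizeof(a));
  memset(a, 0xff, sizeof(a));

  return sum_from(b, sizeof(struct onode));
}
```

The copy still works because nothing in it refers to where the original lived. With ordinary pointers the copied list would still point back into the trashed heap.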

This is why vfork() exists, which creates a new process running in the old process's memory and suspends the old process until the new one calls exec() (or exits), then the new one gets an entirely new address range with a new heap and everything, and the old one resumes running from where it left off. (The old one has to suspend because if they're using the same stack things get ugly quickly, and even if you _do_ waste a temporary stack you've basically got two threads without semaphores and overlapping calls to malloc()/free() could corrupt the heap's free space tracking. vfork() could be made to suck less now that people have spent 20 years trying to make threads work, by treating it as a special case of threading, but there's no real point. If you want threads, you can get threads. Except combining threads with exec sucks in _different_ ways...)
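For reference, the usual vfork() calling pattern looks something like this sketch. The function name run_and_wait() is hypothetical. The important part is that the child does nothing except exec or _exit(), because until then it's borrowing the parent's memory.

```c
#define _DEFAULT_SOURCE   // ask glibc to declare vfork()
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run argv[0] with vfork() and return its exit status,
// 128+signal if it was killed, or -1 if we couldn't launch or wait for it.
int run_and_wait(char *const argv[])
{
  int status;
  pid_t pid = vfork();

  if (pid < 0) return -1;
  if (!pid) {
    // Child: the parent is suspended and we're running in its memory,
    // so touch nothing and either exec or leave immediately.
    execvp(argv[0], argv);
    _exit(127);
  }

  // Parent: resumes here once the child has called exec or exited.
  if (waitpid(pid, &status, 0) < 0) return -1;

  return WIFEXITED(status) ? WEXITSTATUS(status) : 128+WTERMSIG(status);
}
```

Calling it with something like {"/bin/sh", "-c", "exit 3", 0} should hand back 3, assuming /bin/sh exists on the system.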

December 07, 2014

I've been wrestling with nommu for a bit, which means getting either binflt or fdpic working in the toolchain. I started with a working toolchain, a fork of the old Code Sourcery release from back before Mentor Graphics bought them, and I thought "I'll reproduce this with current upstream packages and rebase our development to that".

This has been unexpectedly hard, and I finally figured out why. I've been operating under a false premise, thinking that "nommu has been around forever", "sh2 has been around forever", "binflt has been around forever", "linux has been around forever", and "gcc has been around forever" meant that these things in any way overlapped.

Linux nommu support originally meant "coldfire", an m68k derivative. I learned about coldfire back at timesys circa 2006: a system with a large number of deployments but a small number of projects. It did few things but those things shipped in very large volumes. It was of no interest to timesys because nobody was doing any new projects with it, and the development of setting up new projects was their bread and butter. (Back before the company imploded, anyway. No idea what they do these days, nobody I knew who worked there was still there even a year later.)

After coldfire, blackfin took point on nommu stuff for a while. That was the most inward-facing company I've ever encountered in the Linux world. For example, they used uClibc and their full-time maintainer for the blackfin uClibc fork was Mike Frysinger, who Erik briefly handed uClibc maintainership off to since he was the only guy who had a full-time job doing it. During the 4 month period before I gave up and staged a coup, Mike did not post to the uClibc mailing list once (blackfin had its own internal mailing lists), and he continued to maintain his own blackfin internal uClibc tree where he did all his work, which nobody else could see until he deigned to port stuff upstream (which was not a priority for him). Needless to say, this didn't work well.

So blackfin forked the nommu stuff coldfire had done, and did a lot of work off in its little corner that only ever applied to blackfin, and this stuff never really made it to the outside world (with the exception of the kernel guys insisting stuff went upstream or else they'd break it twice a year by gratuitously renaming interfaces and such).

During all this, the theoretically vendor-neutral uclinux.org website bit-rotted, the repositories it used to host went down, and its links offsite went 404. The main site _itself_ stayed up, it just stopped usefully hosting anything.

So these days the "upstream" elf2flt package that buildroot points to is a git repo on Mike Frysinger's personal website, one which last had a checkin circa 2012. This package _claims_ to have support for sh2, but that's just autoconf boilerplate.

The code sourcery fork of the elf2flt package (from 2006) has buckets of sh2 code, which never went upstream. As with all the code sourcery stuff, it's based off their in-house fork which has thousands of lines of unrelated changes (mostly for arm), so picking out the sh2-specific changes from the noise is really hard. The best I've managed to do is try to bisect the vanilla packages for the version with the shortest diff (since some of the changes code sourcery did trickle upstream... but very gradually over a period of many years, and not all of them were upstream when Mentor Graphics came along and swallowed the company whole, and lots of the developers left).

Then there's Renesas, which was the spin-off of Hitachi that inherited the SuperH architecture. (You know how when AT&T gave up on commercializing unix they unloaded it on a new subsidiary that was 50% owned by Novell, and then sold the rest to Novell a few years later? Renesas is that for Hitachi and SuperH, I forget who the partner was and it really doesn't matter, they became independent instead.) They're a company so insular that _blackfin_ stands next to them and goes "dude, your head is _way_ too far up your ass, that's not cool".

What I want to know is, the Linux kernel _claims_ to support sh2, which has never had a Memory Management Unit. (sh4 does, sh2 does not. I forget whether sh3 had one.) The uClibc project claims to have an sh2 target. And gcc claims to have sh2 support as well... but it only produces ELF output, and you can't run plain ELF on nommu. (Not without section-level base pointers that gcc doesn't do, although I'm told coldfire tried it; never went upstream of course.)

So at some point, either somebody had an elf2flt binary that worked for sh2, or they went "we got this to compile! Victory!" and never tried to _run_ any userspace programs. And nobody noticed for about 15 years...

(It can't be "this last worked with a.out binaries" because A) I don't think those necessarily help, B) the switch from a.out to ELF started in 1995.)

December 2, 2014

The text of the posix "expr" description said there were horizontal lines between priority groups, but there were no horizontal lines in the html version (going back to at least SUSv2, I checked).

I poked the posix standard development list and they replied that it's a defect in the troff to html conversion. The PDF version supposedly has the lines, but is only available to members with a login. (The site said membership cost $2500, but apparently the login you create to join the mailing list counts, and other people have documented workarounds.)

Best of all, after I poked them they fixed it.

November 30, 2014

Watching busybox build, the filename shell/ash_ptr_hack.c wandered by and I got curious. It's some horrible "#ifdef your way around compiler behavior to do something obscure to change the behavior of global variables" thing, but the full facepalm part was that it does it _twice_, once for the #ifdef and once for the #else of GCC_COMBINE. Looking up where _that's_ set, it comes from scripts/Makefile.IMA which is a compelling example of why I no longer have anything to do with busybox development.

I'm sure there's a lot of cause/effect feedback in there, but I'm not sure which way it goes. If I was still involved would I have cleaned that out, or given enough exposure would I have come to consider it a good idea?

The current busybox codebase is not my idea of fun reading. I don't like the extensive collection of #ifdefs in .c (not .h) files. I don't like the way each command's main function is prototyped on the line before it's declared, _and_ has two different magic macro modifiers in those two lines doing something non-obvious. (Nor do I like FAST_FUNC.) I don't like the main() entry point for the entire program being in the #else case of an ifdef in libbb/appletlib.c (the first place you'd look for it), and then the first thing it does is micromanage libc's memory allocation (which is libc's job to get right). The existence of arch/i386/Makefile is kind of creepy too (it's been there for 7 years), but not nearly as creepy as the way various files (such as coreutils/env.c) have the GPL boilerplate at the start of the file and the BSD boilerplate at the end: BSD says "you must copy this specific text", GPL says "no additional restrictions", this contradiction is not resolved by putting the two at opposite ends of the file. Pay no attention to the license terms behind the curtain...

I'm poking at busybox code for the first time in forever because I'm annotating their sed.c so I can capture all the sed inputs of a complete build and run them through toybox sed and compare what my version and another version do given the same data to work on. This means I need to snapshot the complete command line, and each file argument, and stdin. Doing that as a wrapper is actually kind of tricky because "what's an argument" and "what's a file input" is a bit idiosyncratic. (Not impossible, I just thought it would be faster to bash this in situ into sed because once upon a time I wrote rather a lot of this code. I underestimated how much the project has changed since I left.)

Yes I could just annotate the new toybox one instead, but this way I can capture an entire build run's outputs at once, because the busybox one currently produces the right answers so the build doesn't break. (Years ago I spent several months making that work, and even though I'm not reusing any of that code I might as well take advantage of it somehow.)

In parallel I'm making toybox sed tests based on a pedantic reading of the posix spec, but that's not going to show me the full extent of actual weirdness and corner cases in autoconf. The "this is not gnu sed" thing for --version is because autoconf changes its behavior if it doesn't detect _gnu_ sed. Why bother having two codepaths at all if you have to make it work for posix sed anyway? Because gnu. I did the LIE_TO_AUTOCONF stanza because the posix codepath never gets _tested_ on Linux, so I don't trust it.

So it's been a while since I've looked closely at busybox code. The busybox guys have been engaged in a certain amount of frog-boiling since I left, and I'm getting the stylistic changes all at once. They seem to ship reasonably working code despite all this, so presumably it's not as big a deal to them as it is to me. But it no longer serves the goal of simplicity that originally attracted me to it...

Busybox still isn't to the "DON'T LOOK MARIAN, KEEP YOUR EYES SHUT!" levels of the face-melting GNU/Necronomicon codebase, but "I wonder how busybox did it" hasn't been a regular part of my development process on toybox since before the relaunch. Licensing issues aside, it's just not worth the effort trying to navigate through it. I'm trying to figure out how much of it's them and how much is me. It's not _that_ dirty a codebase compared to a bunch I've trawled through recently, but those don't disappoint me with how they used to be so much better, and exemplify an ideal now forsaken...

My fault for leaving I suppose. I don't get to complain about dirty floors in a building I used to mop, but don't anymore. But it's my blog. :)

November 28, 2014

A pending todo item for toybox is making all the commands build standalone, maybe eventually adding some kind of "make allsingle" target. Right now, each single build redoes a lot of work the other commands already did, because of really trivial variations on the output. So I'm looking at the toybox generated files and trying to figure out which ones actually vary based on the config.

The most obvious one that does is config.h, but that only depends on .config and sed and is cheap to generate. (Once I've got sed.c finished I may need to build a generated/sed in order to reliably produce it, but eh.)

The most obvious ones that _don't_ are config2help, mkflags, Config.in, and Config.probed. Less obviously, globals.h and newtoys.h don't. (They're included after config.h, which #ifdefs the appropriate bits out.)

And optlibs.dat seems like it would, but it doesn't: it's the list of libraries that are available in the build environment, I.E. the ones we can link against without causing the toolchain to complain about a missing library. We use the --as-needed flag to remove the _dependencies_ on the libraries if we didn't use any symbols out of it.

Creating build.sh is basically free, and the obj directory is needed in order to do parallel builds. (We could avoid rebuilding generated/obj/*.o files that didn't change, the common libs stuff for example, but I'm concentrating on the header/setup code at the moment. Probably not that hard to do, though, if the test is just "any header or the relevant .c file".)

The help.h file sometimes does vary because of collating, but if config2help is already there it shouldn't be that hard to recreate it.

The complicated one is flags.h (and to a lesser extent oldtoys.h). This varies based on every file and on the config, due to the USE_ strings. And although it shouldn't do quite as much as it's been doing, it's still going to vary a bit.

I also need a new make target to build every standalone command. I'm leaning towards "make change", since it breaks the toybox binary up into a bunch of little ones.

November 26, 2014

Listening to a lady named Karen Souza do sultry nightclub singer covers of Never Gonna Give You Up, Like a Virgin, Take on Me, Billie Jean is Not My Lover, Tainted Love, Every Breath You Take, Sweet Dreams are Made of These... (The kind of arrangement that has a tinky piano and a drummer hitting a cymbal with a brush.)

Some of them (like "Time After Time") barely change at all. Some (like "November Rain") sound like this was the original arrangement and the other ones were the covers. And nobody can do "Don't You Forget About Me" like Bowie (29:48). But other songs...

Lemme just say that the sultry jazz cover of Radiohead's "Creep" is epic and awesome and wrong.

It's all on Youtube.

November 14, 2014

Debugging all the things!

Ran the busybox test suite against the toybox sed I'm doing and it's got some weird ones in there: "s@[@]@@" does not treat the @ in square brackets as a delimiter. (I checked, there's no obvious variant of "s[\[[][[" that's accepted by the gnu/dammit implementation.)
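The bracket rule can be sketched as a delimiter search that skips backslash escapes and bracket expressions. The function name find_delim() is hypothetical and this isn't the toybox code. It's also deliberately simplified: it doesn't understand [:class:] or [.collating.] elements inside brackets, and it assumes the delimiter itself isn't a backslash or an open bracket.

```c
// Hypothetical sketch: return the index of the first unescaped delimiter
// that isn't inside a [bracket expression], or -1 if there isn't one.
long find_delim(const char *s, char delim)
{
  long i = 0;

  while (s[i]) {
    if (s[i] == '\\' && s[i+1]) i += 2;    // skip an escaped character
    else if (s[i] == '[') {
      i++;
      if (s[i] == '^') i++;                // negation marker
      if (s[i] == ']') i++;                // a leading ] is a literal
      while (s[i] && s[i] != ']') i++;     // everything up to closing ]
      if (s[i]) i++;
    } else if (s[i] == delim) return i;
    else i++;
  }

  return -1;
}
```

Fed the pattern part of that test, "[@]@@" with '@' as the delimiter, this skips the bracketed @ and stops at index 3.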

I'm not supporting nul bytes in the pattern space, just in the data.

November 8, 2014

Entering the home stretch on sed, and it's tricksy all the way down. When you 'r' the contents of a file, no newline processing is done. (Any previously unterminated line gets its newline, but the file contents are copied verbatim and if they didn't end with a newline, ok then.)

If two 'w' commands write to the same file, the second one deletes the first. (Presumably the first is writing to an unlinked file that goes away when the filehandle is closed. How 'w' works for /dev/ttyS0 and such, I couldn't tell you. (But will probably have to determine experimentally, and then add to the test suite.) Does O_APPEND use the file length as the write position? If so, why wouldn't two O_APPEND writers to the same file

The s/// command is, as always, funky. I have three argument slots in "struct step" from the argument parsing, and when I wrote it the only thing using the third was the filename for option 'w'. However, the spec states that the files are opened (and existing contents discarded) during option parsing, so what I really need is a persistent filehandle. Except I _also_ need the filename for error messages if the disk fills up and the write fails.

I made 'w' write a dynamic data blob like regex and strings were doing. (Integers are 4 bytes says LP64, deal with it.) I haven't done a close pass at the end that parses the pattern list, finds the w entries (and s entries with w flag) and closes their filehandles, but given that exit() closes them all for us and there's no flushing to be done on raw filehandles, I'm fairly ok with this.

November 7, 2014

The posix sed spec remains profoundly unclear. For example, what does 'D' actually do?

$ echo -e "one\ntwo\nthree" | sed 'N;D;a that'

The above prints "three". Why does it do that? We can follow the logic with:

echo -e "one\ntwo\nthree" | sed -e 'i ===' -e 'p;N;i ---' -e 'p;i +++' -e 'D;a that'

So it looks like the restart hits 'N' which has no additional data so acts like 'q' which prints default output on its way out. Ok...

Also, the 'c' command says it should substitute for the last line of the range, but the command can be in multiple nested ranges. So what do you do about:

$ echo -e 'one\ntwo\nthree\nfour\nfive' | sed -e '/two/,/four/{;c hello' -e '}'

The answer is, apparently:

one
hello
hello
hello
five

So only the range actually _on_ 'c' counts.

November 6, 2014

For many moons the binutils ./configure in aboriginal has been saying 'expr: syntax error' as it goes along. Even though it doesn't seem to be hurting anything, I wanted to track it down.

The problem is that the binutils autoconf is doing:

if test -n $blah; then
blah=$(expr $blah \/ 4)
fi

Which is turning into expr "/" "4" which is nonsense and deserves the syntax error. Guess WHY it's doing this? Because the $blah in the "if" isn't in quotes, and thus turns into "test -n", which SUCCEEDS (returns error code 0) for no readily apparent reason. As in the host version, on ubuntu, does that.
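As far as I can tell from the posix description of test, the behavior is chosen by argument count: with exactly one argument the result is true if that argument is a non-empty string, and "-n" all by itself is a non-empty string. Here's a sketch of just that dispatch. The name test_sketch() is made up, argv here does not include the command name, and only -n and -z are handled.

```c
#include <string.h>

// Hypothetical sketch of test's argument count rules, returning the exit
// status (0 means true). Only covers -n and -z.
int test_sketch(int argc, char *argv[])
{
  if (argc == 0) return 1;            // no arguments: false
  if (argc == 1) return !*argv[0];    // one argument: true if non-empty
  if (argc == 2) {
    if (!strcmp(argv[0], "-n")) return !*argv[1];    // non-empty string?
    if (!strcmp(argv[0], "-z")) return !!*argv[1];   // empty string?
  }

  return 2;                           // anything else isn't handled here
}
```

So an unquoted empty $blah takes the one argument path and returns 0, while a quoted "$blah" would have taken the two argument path and returned 1.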

I boggle.

November 5, 2014

So here's why my pending redesign of toybox option parsing's flag allocation won't work the way I described: at compile time the code that fills out toys.optflags only sees the currently enabled flags in its option string, that's why I was skipping gaps in the first place when creating the flag macros.

So if I _am_ to have stable flag values that don't change in response to config options, I need to communicate to the runtime version where the gaps are, meaning I can't just pass through the string containing the USE_BLAH() macros but need to generate something with in-band signaling about where the gaps are. (And then use that in the OPTSTR_blah macros in oldtoys.h as well.)

This is... fiddly. The flags go right to left (actually much easier for me to reason about because those are bit positions if you write the flags out in binary), but string parsing goes left to right (it's creating a singly linked list in the naive "results are in reverse order" way).

On the bright side, we have a maximum of 32 flags (the number of bits in a long on 32 bit systems) and characters below ascii 32 (space) are never used as flag values (they're all "nonprintable" control characters like tabs and newlines), so I can just say if it's < 32 it's the skip amount. Generating that's likely to be fun, but seems possible?
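A rough sketch of what consuming that in-band skip encoding could look like, assuming a stripped down option string containing only flag letters and skip bytes. (flag_mask() is a hypothetical helper, and the real option string syntax has a lot more punctuation in it than this handles.)

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: find the bit mask for "flag" in an option string
 * where any byte below 32 means "skip this many bit positions" (a gap
 * left by flags that are configured out). Bit values count up from the
 * right end of the string, like digits in a binary number.
 * Returns 0 if the flag isn't in the string. */
unsigned long flag_mask(const char *optstr, char flag)
{
  size_t i = strlen(optstr);
  unsigned bit = 0;

  assert(flag >= 32);
  while (i--) {
    unsigned char c = optstr[i];

    if (c < 32) bit += c;                    /* gap: skip c positions */
    else if (c == (unsigned char)flag) {
      if (bit >= sizeof(long) * 8) return 0; /* ran out of bits */
      return 1UL << bit;
    } else bit++;                            /* another flag owns this bit */
  }

  return 0;
}
```

So "ab\002c" (two flags configured out between b and c) gives a, b, and c the same bit values they'd have in the fully enabled "abxyc".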

Let's see why THIS design doesn't work in practice...

November 4, 2014

Memory corruption always sucks to track down, because the crash is way the heck after the actual problem occurs. There's a significant Wile E. Coyote period where the program is chugging along fine only because it hasn't looked down yet, then you have to backtrack to find the cliff edge you went off.

Luckily if you know what the sucker _should_ be doing and you can reproduce the problem, you can eventually find the deviation. But there is a certain amount of suckage in the process. In this case, five days worth.

In the sed option parsing I'm writing, each command gets turned into a "struct step", and as it goes along it assembles a doubly linked list of them to make in-order creation easy. In my first pass I used toybuf as the scratch memory to create them in, and had a pointer to the end of the structure that I could advance each time I consumed more memory (initially for the regex_t instances in line match addresses). Then at the end there was just one xmalloc(end-start) and memcpy of the same data, then add it to the list.

Unfortunately, once I started throwing arbitrary length strings into the mix (sed commands i, a, the part s inserts, branch labels...) I couldn't guarantee it would fit in toybuf anymore. So, snapshot toybuf about halfway through and remalloc from there.

Problem: if the start of the allocation moves, the "write new data here" pointer has to move too. Additional problem: if I realloc() something that's already in a doubly linked list (line continuations ala "i \" need a second line of data, so we modify the existing entry after adding it, luckily), the pointers back into this node from the next and previous nodes have to move too.

I was writing everything into the correct place, then it moved.
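A minimal sketch of the fixups involved, assuming a NULL-terminated doubly linked list with the data block living at the end of each node. The struct layout and grow_step() are hypothetical stand-ins for illustration, not the actual sed.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for "struct step": list pointers followed by a
 * variable length data block in the same allocation. */
struct step {
  struct step *next, *prev;
  size_t len;                /* bytes allocated for data[] */
  char data[];
};

/* Grow a step that's already linked into the list. *writeptr points
 * somewhere inside s->data and gets moved along with the block.
 * Returns the (possibly moved) step, or NULL with nothing changed if
 * the allocation failed. */
struct step *grow_step(struct step **head, struct step *s, size_t extra,
                       char **writeptr)
{
  /* Save the write position as an offset BEFORE realloc(), because the
   * old address means nothing afterwards. */
  size_t off = (size_t)(*writeptr - s->data);
  int was_head = (*head == s);
  struct step *moved;

  assert(off <= s->len);
  moved = realloc(s, sizeof(*s) + s->len + extra);
  if (!moved) return NULL;
  moved->len += extra;

  /* Everything that pointed at the old block has to follow it. */
  if (moved->prev) moved->prev->next = moved;
  if (moved->next) moved->next->prev = moved;
  if (was_head) *head = moved;
  *writeptr = moved->data + off;

  return moved;
}
```

Miss any one of those four fixups and you get exactly the "fine until it looks down" behavior: the stale pointer usually still points at memory that looks plausible for a while.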

I like keeping the data block together rather than have lots of dangling allocations (modulo what regcomp/regex_t is doing internally) not just because it makes freeing the sucker easier, but because nommu systems are vulnerable to memory fragmentation and these are fairly long-lived allocations (duration of the program running, anyway). And if I don't realloc() the line continuations, I need to special case the multiline version anyway. (Or at least treat it differently than I'm doing now, I.E. it can have content on the same line and content on a following line, but not both.)

Anyway, now that it's not segfaulting I can probably clean this up and make it less complicated (the change of direction in the design's a bit obvious), but making unrelated changes to code that doesn't _work_ is the opposite of debugging, and making a problem go away is not the same as fixing it. My choices were throw it out and do it again, or make it work as-is before proceeding, otherwise I wouldn't trust the result.

November 1, 2014

The very first energy drink I tried (on my first visit to Austin's Fry's location) was a highly tasty citrus thing that I then couldn't FIND again for years. It came in a black can but the liquid inside was bright orange, more or less tangerine flavored, and the name and packaging gave no HINT of what flavor it actually was.

Years later, I saw a bright orange energy drink can in a gas station mart and tried it, and what do you know it's an exact match for that flavor I couldn't find. I tried to stock up, HEB didn't carry it, but Fiesta did... and they still had it in the older black cans. The packaging changed recently, but it's the same thing I couldn't find all these years.

(Aside: Fiesta is the _other_ twenty four hour grocery store within easy walking distance of my house. Yeah, I'd be jealous too.)

The reason I couldn't find it again is it's called "Monster Khaos", with text on the can (still there on the new bright orange one) that starts "Our Pro Athletes are always looking for an edge, so when they've got an idea we listen." No really. It's an utterly fruity 30% juice beverage, and the juices are apple, tangerine, pineapple, and peach. (As in "fruit was harmed in the making of this beverage." Not the usual "made with" meaning "adjacent to" where said lemons are in a block of lucite in the parking lot which the workers salute on the way into the factory (both workers; they oil the robots).)

A few days later fiesta was doing a "3 for $5" deal and only had two cans of "Khaos" left, so I tried the flavor on the shelf _next_ to it, "Khaos Assault", which is bubble gum flavored. Not something I personally go for (what the...???), but I can at least see the theme here. And that theme is "foofy".

What I _don't_ understand is the mindset that thinks the obvious name for tangerine and bubble gum flavors is "Khaos" and "Assault" respectively (only one of which needs to be misspelled, but food manufacturers have a long and difficult history spelling anything "Froot" related). I do not understand why they need to come in a black can with flavor text about Pro Athletes (who are capitalized despite occurring in the middle of a sentence).

These are beverages that deserve tiny umbrellas, and Monster seems _terrified_ that somebody less than totally manly might find that out. I question the marketing decisions that went into this packaging, is what I'm saying.

Oh well, at least I found it again. Now if only the tangerine-pineapple one came in a diet version, but I guess that's a bridge too far...

(Because the diet "rockstar" versions trigger sparkly visual migraine symptoms for me, that's why I switched. Cheap, tasty, and full of random things like "milk thistle" that I apparently react badly to. Yes the lemon rockstar still tastes like dishwashing detergent, but I don't actually mind that and they have nice orange and fruit punch flavors anyway. No, it wasn't a problem before all the incandescent lighting got replaced with vibratey strobing fluorescents, but the two effects stack badly to make everything look like it's sparkling, and last year's adverse reaction to dental anesthetic sensitized me a bit. Eyestrain's sort of a limiting factor for my ability to program.)

November 1, 2014

Another month snuck up on me.

Trying to get a toybox release out for the 15th. I'm behind on kernel releases again and I'd like to get ahead a bit, which means an aboriginal linux release, which means a toybox release before that, so I might as well do it early.

This aboriginal linux release should support sh2, and maybe coldfire. Nommu support! I'm learning it at my dayjob, so I might as well apply it. (Coldfire is already in qemu, sh2 I have local hardware for.)

October 29, 2014

Studying the intricacies of the "binflat" format, which doesn't work ANYTHING like I thought it did. In theory ELF files are a container format a bit like zip or the .a files made by "ar", just optimized for holding snippets of executable code and listing entry and exit points so a linker can go along and splice things together.

Because of this, I thought "elf2flt" was basically working like "objcopy -O binary" and producing a flat executable file from the information ELF already had. I very vaguely recalled that people were running binflt files out of flash filesystems and ramfs, where the executable code lived in the filesystem memory and didn't need to be copied into DRAM, and that the main reason for the format conversion was to basically turn the file into a runnable memory image.

Basically if you go to the toybox source and do "CFLAGS=--static scripts/single.sh cat" and then "./cat /proc/self/maps" you get something like (on my x86-64 netbook):

$ ./cat /proc/self/maps
00400000-004c1000 r-xp 00000000 08:01 11846866 /home/landley/toybox/toybox/cat
006c0000-006c3000 rw-p 000c0000 08:01 11846866 /home/landley/toybox/toybox/cat
006c3000-006ca000 rw-p 00000000 00:00 0
00b8b000-00bae000 rw-p 00000000 00:00 0 [heap]
7fff107ea000-7fff1080b000 rw-p 00000000 00:00 0 [stack]
7fff109d8000-7fff109d9000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

If those first two sections were contiguous, and you dumped them to disk, that would be a binflt file, right? Unfortunately... no, it isn't. That's not remotely what's happening.

The big fiddlyness is making them "relocatable", which is necessary for shared libraries because otherwise each library in the system needs a unique virtual address space it expects to load at, which means you have to have all your shared libraries built by the same group that can allocate unique address ranges to them. (That's how the old a.out stuff worked, and it meant that combining a.out with shared libraries was _insane_. It worked ok for static binaries, but not for libraries, which is why everybody moved to ELF.)

Alas, nommu systems have this problem for _static_ binaries too, because the processes don't have virtual address translation. With an mmu every executable can start its text segment at the same address (a bit up from 0 so they can map a dummy page there with all the protection bits off, for the purpose of segfaulting if you try to access a null pointer even with a largeish offset). But without an mmu, you can't have "cat" and "ls" start at the same address range, not without copying the data out every context switch (which would be insanely slow).

(A nommu system may have low/high water mark registers specifying what memory you can currently access, so the operating system can mask out everything but the currently runnable process as unreadable, unwriteable, or unexecutable, so a single rogue process won't crash the system. Then again, they might _not_.)

So in a nommu system, binaries really need to be relocatable. Which means binflt needs to be relocatable. Which means it needs the equivalent of all those ELF tables listing every instruction that tries to access something that can move, so the address at which it's trying to access it can get an offset added or subtracted to it. And in order to make those adjustments, you need to know where the relevant symbol is this time around, so you need sources and destinations.

Static linking is sort of less horrible because you can relocate an entire block of data at a time. Everything is relative to the start of a section, so you have a single offset you're adding to a lot of places.
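Here's a generic sketch of that "single offset added to a lot of places" idea, assuming an image linked as if it loaded at address zero plus a table of where the address words live. apply_relocs() and the table layout are assumptions for illustration, not the actual binflt on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical load-time relocation for a flat image linked at address
 * zero. relocs[] lists the byte offsets (within the image) of every
 * 32-bit word that holds an address, and each of those words gets the
 * real load address added to it.
 * Returns 0 on success, or -1 if an entry points outside the image
 * (entries before the bad one have already been applied). */
int apply_relocs(uint8_t *image, size_t image_len, const uint32_t *relocs,
                 size_t count, uint32_t load_addr)
{
  size_t i;

  assert(image);
  for (i = 0; i < count; i++) {
    uint32_t word;

    if (relocs[i] > image_len || image_len - relocs[i] < sizeof(word))
      return -1;

    /* memcpy sidesteps alignment and aliasing problems. */
    memcpy(&word, image + relocs[i], sizeof(word));
    word += load_addr;
    memcpy(image + relocs[i], &word, sizeof(word));
  }

  return 0;
}
```

The loader doesn't need to know _what_ each word points at, only where the words are, which is why relocating a whole statically linked block at once is the less horrible case.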

A statically linked C executable has four interesting sections: text, data, rodata, and bss. The "text" section is where the executable functions live, so you need a list of places attempting to call functions (and set function pointer variables). The data section contains all your writeable global variables that are initialized to nonzero values, think of it as a giant struct with a pointer to the start of where you've loaded it this time and each variable has an offset from that starting position. The rodata is the read only data, it's like the data except this section has the read only bit set on the memory. And then bss contains all the global symbols that _aren't_ initialized to something other than zero, these don't need to live in the file (you just need the length: malloc the appropriate block of memory and memset it to zero before calling main), but they _do_ require relocations to access the symbols because the start of bss is dynamically determined at runtime (what's a free chunk of memory you could allocate, given everything that's already used).

There are more interesting sections at runtime, the stack and the heap and the environment and maybe TLS if you're using threading, but those aren't loaded from the executable file.

In _theory_ on an architecture with buckets of registers (just about everything except x86), you can assign a register to each of those sections and have all attempts to access any of the symbols be "register plus offset", where the contents of the register are set when you load the section and the offsets don't need to change. In practice... that's not what happens, the offsets get adjusted at load time to be a standalone address (sometimes called "zero based", because the address is "zero plus this offset"), so we don't need to tie up a register acting as the base pointer for each section.

So, getting back to binflt: for each of those four sections it needs a table of all the code that's trying to access a symbol out of each section. In theory you can extract this from the ELF file, right?

For some reason, you can't. There's an elf2flt but it doesn't work on _normal_ elf files. It doesn't even work if you build -g with all the debug info. No, it creates a new binary to wrap the linker (moving the old ld to ld.real), and the linker wrapper calls the "nm" binary to dump offsets, and after parsing that data it calls elf2flt.

On top of this, there's a linker flag, -Wl,-elf2flt, but all it seems to do is change the entry point ("lf2flt") or something?

But that isn't the truly head-scratchy part. That would be dynamic linking binflat binaries. They're not all statically linked. You can have "flat" shared libraries, and "flat" binaries using those shared libraries.

At this point, I officially have no IDEA what this format is designed to accomplish. It's the only binary format that the example linux system I got from work supports, and I'm trying to replicate it so I understand it. But... dude.

(Oh: there's also some version skew here. The original uclinux.org stuff seems to have bit-rotted, the old cvs repository for this stuff went bye-bye and buildroot switched to a git mirror of it years ago. Which was last touched in 2012. There's still some documentation if you look, but "long int" means "32 bit" there, so even the most current documentation I can find assumes a 32 bit build host.)

At this point, I'm reading the kernel's fs/binfmt_flat.c to try to understand what the heck is going on. But the last time I read binfmt_elf.c was 2010, so I need to go reread _that_ to refresh my context...

October 25, 2014

Posix sed spec: "A function can be preceded by one or more '!' characters". Gnu/dammit implementation:

$ echo -ne "one\ntwo\nthree" | sed -n '2!!p'
sed: -e expression #1, char 3: multiple `!'s

But of course.

Also, when posix says that the delimiter for s/// or y/// can be any character except backslash or newline, I _really_ want to add NUL to my test suite and MAKE IT WORK IN TOYBOX. (Well they shouldn't say any character if they don't MEAN it. Standards should have _standards_.)

Also, the gnu/dammit version of "sed 42,p" uses the error message "unexpected ," but "sed 42,3p" works. Define "unexpected".

October 18, 2014

Toybox design issue:

The automated option parsing digests the command line before the relevant command's main() function is called, turning "-x" into a bit set in toys.optflags with a FLAG_x macro you can mask to check that bit. Each command's NEWTOY() macro includes an option string that says what command line arguments to expect and how to treat them, the big comment at the top of lib/args.c explains the syntax, and it's also explained in the code walkthrough.

There's a generated/flags.h header with #ifdef blocks for every command #defining the appropriate FLAG_x macros, and you #define FOR_commandname before #including toys.h to select the appropriate set of flag macros for the command you're currently implementing. You can even #define CLEANUP_oldcommand and #define FOR_newcommand and re-#include "generated/flags.h" to switch from one flag context to another in the middle of a C file.

The "too clever for my own good" bit is that if a flag is switched off in the configuration, the relevant FLAG_x macro becomes 0, so dead code elimination can zap the relevant code implementing it. (Any if (toys.optflags & FLAG_x) {blah;} becomes if (toys.optflags & 0) which becomes if (0) which goes away at compile time.)
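A small sketch of the effect, using hand written stand-ins for what the generated header would provide for a command with options "abc" where -b is configured out. (The macro values and run_features() are made up for illustration.)

```c
#include <assert.h>

/* Hypothetical stand-ins for generated flag macros: option string "abc"
 * with -b disabled in the config. */
#define FLAG_c (1<<0)
#define FLAG_b 0          /* configured out: the mask is constant 0 */
#define FLAG_a (1<<1)     /* slid down into the gap -b left behind */

/* Returns a number recording which features ran for these optflags. */
int run_features(unsigned long optflags)
{
  int ran = 0;

  /* Enabled flags must not share a bit. */
  assert(!(FLAG_a & FLAG_c));

  if (optflags & FLAG_a) ran += 100;
  /* (optflags & 0) is constant false, so the compiler can drop this
   * whole branch without any #ifdefs in the command's code. */
  if (optflags & FLAG_b) ran += 10;
  if (optflags & FLAG_c) ran += 1;

  return ran;
}
```

Note that FLAG_a changed value when -b went away, which is the same sliding that makes sharing flag values between commands awkward.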

The problem comes in when I try to implement multiple commands in the same C file. Any given line of code can only be in one flag context, and if that command is disabled in the config all its flags are 0. So even if I have the righthand parts of a string match (the bit values count up from the right, the same way binary numbers do when you write them out as "011010") so two commands have a subset of the same flags for use in a common function... if the command whose flag context we're in is configured out, all those flags become zero. And a scripts/single.sh build never has more than one command enabled at a time.

In theory I could add flag macros prefixed with the command name (or an all-caps version thereof) that would show what the bit values would be if the command wasn't disabled, except... it doesn't work that way. A command can have sub-options allowing just _part_ of its flags to be disabled, in which case the other FLAG_x macros will move down to different bit values.

Possibly what I should do is change that so the bit values are always what they would be at allyesconfig...

Ooh. I can have a FORCE_FLAGS that's normally #defined to 0, and when I #define it to 1 it can switch on all the flags for the command even when it's disabled.

October 17, 2014

Sed addressing weirdness, from reading posix: you can have regular expression address matching using either "/address/" or by escaping the first character and then having it appear unescaped to end the regex, ala "\@address@". If the escaped address character is escaped again in the address, it's treated as a literal, but a \n in the address is treated as a newline. So two obvious questions: A) what about the other printf escapes like \t, B) what happens with "\naddressn", if you escape the \n in there is it treated as a literal n or as a newline? (The spec is unclear. The useful thing to do would be to check that we're escaping the delimiter first, and then check for printf style escapes if it isn't one of them. Of course, that's not what gnu does. It treats \n as a newline unconditionally and you can't have a literal n in an n-delimited regex.)
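A minimal sketch of that "delimiter first, printf escapes second" ordering. parse_address() is a hypothetical helper that only knows about \n and \t and ignores things like bracket expressions, so treat it as an illustration of the ordering rather than a real address parser.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: "in" points at the delimiter character (just
 * after the leading backslash of a \cREGEXc address, or at the '/' of
 * /REGEX/). Copies the unescaped regex text into "out", which must be
 * at least as big as the input string. Returns a pointer just past the
 * closing delimiter, or NULL if the address is unterminated. */
const char *parse_address(const char *in, char *out)
{
  char delim = *in++;

  assert(delim && delim != '\\' && delim != '\n');
  while (*in && *in != delim) {
    if (*in == '\\' && in[1]) {
      in++;
      if (*in == delim) *out++ = delim;     /* escaped delimiter: literal */
      else if (*in == 'n') *out++ = '\n';   /* then printf style escapes */
      else if (*in == 't') *out++ = '\t';
      else {                                /* leave the rest for regcomp */
        *out++ = '\\';
        *out++ = *in;
      }
      in++;
    } else *out++ = *in++;
  }
  *out = 0;

  return *in ? in + 1 : NULL;
}
```

With this ordering "\naddressn" matches a literal n followed by "address", and "\n\txnp" matches a tab followed by x, because the \n only means newline when n isn't the delimiter.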

Presumably, this being gnu, it's because they just didn't think of it. But that means adding support for \t and similar (useful!) screws up strings the standard says you should match.

So riddle me this, batwoman: why does "echo -e '\tx' | sed -n '/\t/p'" match and print a tab and an x, but "echo -e '\tx' | sed -n '\t\txtp'" does not match? But "echo -e '\tx' | sed -n '\n\txnp'" _DOES_ match?

They implemented the more complex behavior and then SPECIFICALLY DISABLED it for \n. BECAUSE GNU.

October 16, 2014

Sed is funky. Did you know "sed 0p" has a specific error message? (But "sed 1,0p" is allowed?) Because gnu, apparently.

(I am trying very hard not to turn this entire sed.c file into a giant series of Amber references, but _you_ try repeating "pattern" a bunch of times without throwing in "logrus" once or twice.)

(It's a fiction series from the 80's and 90's. Trails off into unresolved plot threads at the end of the second series because the author, Roger Zelazny, died. Good books though. Hard to get through the first two because the protagonist isn't particularly sympathetic, but he gets better, and then book 6 starts with a new viewpoint character. First series I read that started with an adult who underwent character development to become a different kind of adult.)

October 15, 2014

I've botched up a sh2 target for Aboriginal Linux (a fork of my sh4 file with toolchain and uClibc target adjustments) and it's being stroppy.

Sanity test: building Hello World.
aboriginal/build/simple-cross-compiler-sh2eb/bin/../lib/libc.a(fwrite_unlocked.o): In function `fwrite_unlocked':
fwrite_unlocked.c:(.text+0x88): undefined reference to `__udivsi3_i4i'

So I go in and try to fix that and get caught in gcc developer insanity: there's a horrible lib1funcs-Os-4-200 mess that turns into a second libgcc appended after the first one which includes -Os (size optimized) functions that sh2 doesn't implement speed-optimized versions of, so why the heck did they split it like this? (Possibly this has been cleaned up since gcc 4.2.1 but sh2 existed back in the old days and I'm trying to make it work in a toolchain I _know_ before going too in depth dissecting what Code Sourcery did to gcc 4.5.x back before Mentor Graphics bought them. I got sh4 working years ago, this isn't _that_ big of a delta.)

As far as I can tell this second library isn't actually being _built_ (there's code to add it in the specfile, but the library file it wants to link in is nowhere to be found in the build directory), so I just did a fairly heavy handed #include "lib1funcs-Os-4-200.asm" from the sh/lib1funcs.asm to put the symbols in the library it _is_ building.

This inexplicably does not work, even when I blow away the output directory and rebuild to make SURE it's rebuilding everything. The build isn't breaking, but the link still fails. So I rsync my entire aboriginal working directory over to a faster build machine and redo it there...

Sanity test: building Hello World.
/home/landley/abotest/build/simple-cross-compiler-sh2eb/bin/../lib/crt1.o: In function `L_uClibc_main':
(.text+0x2c): undefined reference to `__uClibc_main'
/home/landley/abotest/build/simple-cross-compiler-sh2eb/bin/../lib/crt1.o: In function `L_abort':
(.text+0x30): undefined reference to `abort'
/tmp/ccB2Xlqv.o: In function `main':
hello.c:(.text+0x18): undefined reference to `puts'

That's a completely different build break. Leaving aside how the L_ prefixes are supposed to be stripped off by the build, it's a different break from the same build. (Ran it again on both boxes to be sure.) The difference is xubuntu 12.04 vs xubuntu 13.04. (Yeah, that's what the system76 box came preinstalled with. No, I don't know why system76 is shipping non-LTS releases. No I can't upgrade it without a reinstall, the ubuntu upgrade servers went away at the end of the 1 year support period for non-LTS releases. Yeah, I thought you could still do a distro version upgrade to at least the next LTS after that. Apparently not, because Canonical.)

Digging into all this weirdness. Have I mentioned how much I hate the entire gcc build process?

October 14, 2014

Today's "nobody but me will ever notice" programming corner case: Implementing sed, I have an existing lib.c loopfiles() function that operates on an array of filenames (generally argv[]), opens each one, calls a callback with a filehandle and name. If loopfiles was told to open the file read only (which is what you should do for source files, you actually _can't_ always open them read/write), then it closes the filehandle again on return. Otherwise it's the caller's job to close them.

The new code needs to iterate over the file contents line by line, meaning it wants getlines(), which uses a FILE pointer instead of a file descriptor. A file descriptor is an integer the operating system uses for read() and write() system calls, a FILE * is the ANSI C wrapper structure (from the C89 standard) that includes a buffer, allowing calls that use it to read ahead a bit even on nonseekable input (like pipes). This means you can read lines without doing single byte read() system calls, because if you read say 512 bytes at a time (500 times more efficient than reading 1 byte at a time, those round trips to the kernel add up) what do you do with the _extra_ data when you overshoot the end of the current line? The next reader will want to start with that data, and you can't shove it BACK into the filehandle if it's not seekable.

So: buffer in the program, which future readers can use. (In this case there _aren't_ any future readers, but this is the C library's mechanism for doing line reads that doesn't suck. So I have to turn the file descriptor integer into a FILE * structure.)

The function for doing this is fdopen(), which takes an int and returns a FILE pointer. Unfortunately, when you call fclose() on that FILE *, it closes the underlying file descriptor.
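One common way around that (not necessarily what the sed code ends up doing) is to dup() the descriptor first and hand the copy to fdopen(), so fclose() eats the copy and the caller's descriptor survives. A minimal sketch, where count_lines_keep_fd() is a hypothetical example function and not anything in lib.c:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for the fdopen() declaration */
#endif
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical example: count the lines readable from fd using buffered
 * stdio, without closing the caller's descriptor.
 * Returns the line count, or -1 on error. */
long count_lines_keep_fd(int fd)
{
  int copy, c, last = '\n';
  long lines = 0;
  FILE *fp;

  assert(fd >= 0);
  copy = dup(fd);
  if (copy == -1) return -1;
  fp = fdopen(copy, "r");
  if (!fp) {
    close(copy);
    return -1;
  }

  while ((c = getc(fp)) != EOF) {
    if (c == '\n') lines++;
    last = c;
  }
  if (last != '\n') lines++;      /* final line with no trailing newline */

  /* This closes "copy", not the fd the caller handed us. */
  fclose(fp);

  return lines;
}
```

The catch is that the duplicate shares its file position with the original, and stdio reads ahead, so after this returns the original descriptor is wherever the buffered reads left it rather than at a tidy line boundary.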

October 13, 2014

Fell out of the habit of blogging for a bit there. (Downside of blogging with vi and raw html: if I trail off halfway through a big entry, and don't go back and edit previous entries (which I don't, modulo the occasional broken URL and typo), and I can't post it partially completed, it goes on the todo list until I get around to finishing (or at least capping off) the partial entry. Yeah I know, not a good system. Coming up with a better one is ALSO on the todo list, but relying on stuff like livejournal or wordpress exposes you to cloud rot. (I have a vague interest in tumblr, but it's of a generation of social media sites that believes navigating archives is only something the NSA will ever want to do. At least twitter lets me periodically download my entire tweet history as a .csv file I can grep. Twitter sucks at being twitter, but letting me get my data back out makes up for a lot.))

Let's see, what did I do this month...

Left my previous job (last day was the 3rd, first day at the new one was the 6th), I'm now working at a company that's not only doing cool things (yay solar power and smart grids in an actual non-vaporware way) but lets me bang on toybox and musl on the job. (Which is awesome; I never did toybox programming on the clock at work even when I was blocked waiting for something else because as cool as my old boss is, he's not the entire company nor will he be there forever. I don't want future lawyers making ownership claims.)

Initial ramp-up period's eating my life though. I have to get an sh2 toolchain working (they have one but it's a fork of a fork, from back before Mentor Graphics bought Code Sourcery), and come up to speed on nommu (realtime sensors want VERY CONSTANT LATENCY, plus the chip has a skeletal transistor count). Their current sample board has a vmlinux image using binflt and similar deep embedded trickery that I've largely avoided up until now because I didn't need it. I know the theory, but now I need to make it work.

Some of the original implementors of uClinux work on this project (they're still annoyed Erik took out the dash from their library name), and they want me to make toybox and musl work on nommu.

Ok then.

September 11, 2014

Playing with pcbsdos. Its make is so deeply broken I'm trying to bypass it and run the build commands directly. Hitting all sorts of weird stuff like the fact bash is installed... under /usr/local. So why have it at all? (And what on earth does this distro think "local" means?)

The environment is creepy. Take tab completion: at the top level of the toybox source there is a file "Config.in". It is the only file that starts with a capital C, so I went "grep Con" and hit tab. It turned the C lower case, because there's _also_ a lower case "configure" in the directory. The fact one with a capital C exists is _ignored_, so it can go "but didn't you mean to maybe match the lower case one even though you gave me a mixed case string so _you_ were clearly specifying case?"

I haven't _just_ used Linux. Over the years I've used Solaris and AIX, I came to Linux from OS/2, and before that I had an amiga. But this thing is just trying too hard. I keep hitting not just things that are missing, but things it _shouldn't_ do. It's trying to be smarter than normal, and thus second-guessing what I do, but not smart _enough_ for those guesses to be right. It's trying hard to _distinguish_ itself, and thus the benefit I thought it might have (simplicity) is clearly not present. Alas the BSD variants that _are_ simple aren't _complete_, and I haven't worked out how to add stuff to the base image yet to get a functional development environment. (The package management is only used by people with decades of experience, no current tutorials on any of this aimed at newbies who want to develop....)

September 4, 2014

So if systemd is so bad, why are people bothering with it? Backstory time.

In the beginning (1969-1974 at Bell Labs), unix system bringup was basically a shell script which backgrounded daemons (with the & operator) as appropriate. You didn't have to track the backgrounded daemons after launch because they shouldn't exit. ("What do you do when a process just spontaneously crashes?" "I dunno, I don't run Windows.")

In 1975 Unix creator Ken Thompson took a year off from Bell Labs to teach operating system design at his alma mater, the University of California at Berkeley, and his students created BSD unix (the "Berkeley Software Distribution"), which still uses simple shell scripts to bring up the system. This style of init script is generally called "BSD init scripts".

When AT&T started commercializing its version of Unix (allowing itself to be broken up in 1984 so it could expand out of the telephone business into computers and telecommunications), it worked to differentiate itself from the existing unix systems by complicating the hell out of itself, creating features it thought might lure customers away from the existing BSD or Labs unix versions they already had installed. It made a lot of "infrastructure in search of a user", some of which got used and others (such as system V driver "streams") didn't.

In retrospect, AT&T's commercial unix is called "System V" because they used roman numerals for their versions of "The Unix System", and version 5 was as far as AT&T managed to get before realizing what a mess it had made, and selling the remains of its unix business to Novell. (I described that history a while back.)

System V init broke the one big BSD style init script into a bunch of little files, one per service to be configured or daemon to be launched. Each script took an argument "start" or "stop" to launch or bring down its service. These scripts were gathered together in a directory under /etc where each script's name began with "S" or "K". The "S" scripts were run to bring up the system, the "K" scripts were ignored. This way a GUI tool could let you enable and disable services by renaming the files, and different packages you installed could add more and more services at startup without having to edit an existing file.

As always, responding to clutter with boxes to put it in meant clutter bred to fill the available space. (Why get rid of anything when you can file it?) So System V init scripts proliferated, and system boot got really, really slow.

Running init scripts in parallel was an obvious solution, even before the widespread availability of SMP starting in 2005. In a BSD-style init environment this was easy (just fire off background processes with & and use the "wait" to synchronize and process exit status), but sysvinit wasn't designed with this concept in mind. The only information about which init scripts depended on which other parts of the system already being up was the alphabetical order the init scripts ran in (which became numerical order with fixed length numerical prefixes). This didn't let you run _any_ init script until all the others before it ran, so there could be no parallelism without additional information.

The classic unix tool to run tasks in parallel was "make", and developers in the AIX world were already using make to bring up the system in the 90's. (TODO: link to Linux Weekly News article about this.) Unfortunately, make is a horrible grandfathered-in pile of crap from the 1970's that makes similar-era tools like "ed" and "sccs" look elegant (someday, configure/make/install needs a complete rethink the way CVS was replaced by git), so no major distro took this suggestion seriously.

Meanwhile, back before Steve Jobs died, MacOS X introduced a new parallel init system, "launchd". It worked quite well. Unfortunately it used xml based configuration files, which nobody wanted to copy. Since

Trying to speed up sysvinit is how Ubuntu made its first "too dumb to live" technical decision: redirecting the /bin/sh symlink from bash to dash. Their stated reason was to speed up init scripts, but they wrote upstart anyway because it failed to do that. And they insisted that changing the #!/bin/sh line at the start of each init script to say /bin/dash instead, and leaving the /bin/sh symlink alone, was far too intrusive a change... so they broke kernel compiles.

Hint: any time you ship a Linux distro that can't compile the Linux kernel, you have screwed up. (Red Hat did this a little over a decade ago with "gcc 2.96" and "kgcc", and over the next few years went from over 50% of the Linux workstation installed base to less than half that. There's a _lot_ more to that story, but this particular mistake has a history as a canary in the coal mine for poor technical judgement.)

Redirecting /bin/sh from bash to dash was especially stupid because bash as the default Linux shell was just about the one thing Linux distros agreed on. Linus Torvalds' biography "Just For Fun" described how he created Linux by adding unix system calls (which he looked up in the Sun Workstation manuals in his university library) to a terminal program he wrote so it could run bash without rebooting into minix (which was a microkernel architecture, and thus too broken to keep up with a 2400 bps modem without dropping characters). This means bash was the default shell of Linux before it was called Linux, and remained so from 0.01 right up until Ubuntu screwed it up.

Since moving to a slightly faster (but still sequential) shell parser barely put a dent in the init speed problem, Ubuntu decided to write its own parallel init system, citing launchd as inspiration but creating its own incompatible version (to avoid the xml config file format) which it hoped would become the new Linux standard init. Unfortunately, Ubuntu never admitted their /bin/sh move had been a mistake and reverted the original breakage now that it had _clearly_ failed to accomplish its original purpose. So when Ubuntu followed it up with upstart, enough people viewed the new program as doubling down on their original mistake to keep it out of other distros. (Even Debian, Ubuntu's parent that had rolled to a stop until Ubuntu hired full-time developers to unblock a years-overdue "debian stale" release, did not adopt upstart.)

Numerous other small distros tried to come up with their own init system rewrite (such as gentoo's openrc), but in the absence of a standard people rallied around, this merely increased fragmentation. When Ubuntu failed to convince other distributions to adopt upstart, it launched new questionable initiatives such as the Unity desktop (the classic "microvax" mistake that Microsoft also made with Metro) and the Mir 3D graphics server. (And they based their "launchpad" sourceforge clone on one of the failed "not git" source control systems, bazaar, attesting that their technical judgement was universally poor.) This loss of focus blunted the impact of all of these projects.

This leads us to systemd, from the people who brought us gcc 2.96 and kgcc back _before_ they abandoned the workstation market in favor of fat corporate procurement contracts and became "pointy haired linux". Red Hat approached the problem with the goal of becoming a standard, not of doing a good job, and they focused on systemd as the One Big Thing they were currently doing. They took the Mozilla/Microsoft Office approach of bundling as much functionality as possible into a giant monolithic entity that would be as hard as possible to clone, and set about convincing everybody else to use it. Not that it was technically superior, but that it would become so ubiquitous that it would be nonstandard _not_ to use it.

There's a famous line from the BBC series "Yes, Prime Minister": "Something must be done. This is something. Therefore we must do it." The argument for systemd has never been about the quality of the solution, but about the severity of the problem, and the same "inevitability" Hillary Clinton had in the 2008 Democratic primary (and has since regained for the 2016 nomination). This _will_ win, so you might as well support it, or you're wasting your ~~vote~~ time.

Me? I waited out AOL, windows, devfsd, and hald. I've never had a facebook account, and have just as little interest in using systemd.

September 3, 2014

Aboriginal Linux continues to hiccup, due to the new ccwrap not quite working to build everything with the old packages. I'm trying to get a release out using the new wrapper with the old C library, to reduce the number of variables. I want it to build linux from scratch 6.8 (yeah, stale, it's a regression test) on all the targets. And it's not quite working.

Right now it's complaining that the busybox build can't find _start, which is kind of impressive. Before that there was a library hiccup because the -L specified on the command line was going after the -L the wrapper was adding for standard libraries, and of course somebody in some gnu/dammit package in LFS was supplying their own util.a. I mean why wouldn't they?

(I say "gnu/dammit" because the FSF zealots are always going on about gnu/linux/dammit and I don't see what Linux has to do with it.)

Anyway, reshuffling the -L stuff broke an _earlier_ package build. Of course it did. Fiddly stuff, this.

All of this is holding up toybox because I use aboriginal linux as a test harness for toybox commands, _especially_ the ones requiring root access such as mount, umount, and useradd. Launch a virtual system in qemu, don't run things as root on the host that may screw up the config and (at best) require a reboot.

In theory I could just grab an older version that's not horked. In practice it's easiest to rebuild the image with a new toybox snapshot, because it's just drop in and rerun the script. I have to fix everything in both projects to put out the two overdue releases, what order I do so in is secondary.

In practice, I need a toybox release first. But I don't know that toybox does everything right until I run a full aboriginal build and linux from scratch build with it. Bit of a catch-22...

September 2, 2014

The end of the month snuck up on me again.

Intensely focusing on toybox debugging, so of course I've watched the entire last season of Matt Smith's Dr. Who episodes on my phone. (See "T-Mobile tethering limit" during the flooding.)

Caught up now, anyway. The embarrassment squick at the start of the last episode was completely unnecessary. (More than once I paused an episode because I knew exactly what was going to happen next and wasn't that interested in waiting for it to be over. Wasn't wrong any of those times.)

Also, the unnecessary intro on the episode where the "what shall we be afraid of this week" committee decided wireless routers would be the monster of the week... that whole pointless homage to "tell, not show" was kinda spoiled for me because the dude didn't have a flashlight under his chin making "wooooooo!" noises. The lack was palpable.

Jenny, Vastra, and Strax were awesome though. (Yeah, even the memory slug slapstick. If you're going to have a comedy relief sontaran, I guess that would be how.) And John Hurt was perfect, as was that woman's scarf. And of course David Tennant's last words as the doctor remain consistent.

Nobody correcting Matt Smith's pronunciation of Metebelis 3 though... that was just sad. It was a recurring planet (and the crystal he picked up from it a recurring mcguffin) for an ENTIRE SEASON of Pertwee episodes, pronounced consistently. Including his regeneration episode into Tom Baker. (Yes, Sarah Jane went to Metebelis 3. She organized a rebellion. As you do.) "The reason I'm writin' is how to say chitin" indeed.

August 25, 2014

What did I do last week...

Finally fixed the darn spacebar, after three days. (The little metal clips two straight wires go into needed to be bent into exactly the right position, something like a 30 degree angle. That was non-obvious, and had to more or less be worked out from first principles.) Yeah, a tiny issue to anyone else but "the keyboard on the machine I do most of my typing on" is kinda important to my workflow.

Finally got the aboriginal linux toolchain working properly with uClibc again. (I had the dynamic linker subtly wrong several ways, and every time the diagnostic messages were _useless_. Also, if you tell it the dynamic linker is uClibc.so.0 (which exists) instead of ld-uClibc.so.0 it gets really confused.) This unblocked me enough that the linux from scratch build is dying attempting to build the util-linux "mount" command, probably due to the install command in the repository version of toybox.

Got "blockdev" (a command to call block device ioctls) cleaned up and promoted. (Somebody submitted it to pending, it is apparently a thing their use case needs.)

Checked in a giant pile of half-finished mount code, including changes to umount and common getmountlist code (which is accumulating all those comma separated list parsing functions I didn't have a place for). I needed to get the aboriginal environment working again to test these. (Ok, I could have pulled out the binaries from last release but where's the fun in that? Without dogfooding (I.E. "eating your own dog food") you never get a reasonably usable system. I've gotta hit all the bugs before the users do, and if it's not a good enough tool for my use, I need to fix it...)

August 22, 2014

The accumulation of probably-cathair under the spacebar on my netbook got so bad I finally popped the key off to clean under it. Don't ever do this on an Acer Aspire One 722. The spacebar is held on by spite and a rube goldberg contraption of three separate wires designed by someone who hates you, personally.

I called in a professional hardware fiddler and _they_ couldn't get it back on any better than I could. (At a 15 degree angle with the top flipping up to 90 degrees if I pull on it). The closest youtube video we found demonstrated _gluing_ it back into place.

I've taken the darn thing off to try to figure out how to put it back on better three times now. (Four.) It's actually slightly easier to use with the plastic bar completely off. (I have to depress a tiny little plastic switch in the middle, but it triggers reliably and is always in the same position...)

August 17, 2014

Still knocking the dust off of Aboriginal Linux. (Rust? Whichever is a better metaphor.)

My build system needs to do a _lot_ of things, simultaneously. Even though it's designed with a fairly tight focus (build the simplest Linux system capable of rebuilding itself from source code), within that I want it to do a lot of things. (There was a whole presentation on this.)

One of them is buildall.sh, compiling each target in parallel. Which is tricky because there are some shared pieces of infrastructure that need to be built first (host-tools), and then _not_ disturbed by later builds.

If you run record-commands.sh and then run host-tools.sh, I want the second to re-run the first when it swaps the $PATH. This first happens about midway through the run (when busybox and toybox and the host toolchain symlinks are in), but before it's built mksquashfs and distcc and such. (Which I don't count as part of the 7 packages because _technically_ you don't need 'em. Once upon a time there was a mke2fs in busybox, and I still intend to do one in toybox. But you never have to fsck squashfs, and can run multiple emulator instances from the same filesystem at once. I might actually do a mksquashfs at some point after I finish the gzip compression side...)

August 16, 2014

That was a fun one to track down.

Smoketesting aboriginal for a release with the 3.15 kernel, just to get it out there. Of course testing against current toybox, with the new "install" command I completed yesterday. First bug: install needs to ignore -c. (No really, the man page says "ignored", but binutils is providing it. Easy enough.)

Second bug: when natively running the lfs-bootstrap build, the m4 ./configure stage fails saying it can't run C programs the compiler produces, and the actual failure message (buried in config.log because gnu) is "Accessing a corrupted shared library". What?

So: an hour of digging into ./configure trying to tease out of it what it's actually trying to _do_. For some reason the C program and compiler command line it's trying to build aren't recorded in config.log. (Because gnu.) And it deletes both before reporting that something was wrong with them (because gnu). Also because gnu, despite the configure shell script being 862k of block copied boilerplate, the delete command isn't a simple "rm" line I can delete; there are plenty of those but the one actually relevant here is buried in some environment variable being expanded by eval or some such. (Or maybe a "trap EXIT", I tried "trap '' EXIT" to stop it but that might append rather than resetting, not sure.)

I fished out conftest by doing a "cp conftest walrus" before it deleted it, and confirmed from the command line that running it wasn't happy. ldd says it's got libc.so.0 (which is there) and ld-uClibc.so.0 (which is also there)... and then says "not a dynamic executable" _after_ listing libraries. What the? (This is the uClibc ldd, so there could easily be a bug, but something's clearly fishy with this binary.)

I dug up my static strace and ran that against it, but it just said that exec was returning -ELIBBAD. Not instrumenting the kernel right now, so I need to see what the compiler command line is. Back to fishing in configure.

Grabbed the conftest.c file the same way as conftest, but it's not very interesting. (Not quite hello world, but it just does an fopen() of a log file, doesn't write anything to it, and returns based on no ferror() and fclose returning 0. Yes, it checks the return value of close, because gnu.)

You can't easily stick echo statements into the configure script to see where you are and what it's doing because ~~gnu~~ it's redirected stdin and stdout and stderr to some random place, not in a wrapper function when it needs to but for the whole darn file. (It turns out >&6 will get you stdout, but > /dev/stdout won't. > /dev/tty might, I didn't think to try that.) Luckily, redirecting to a _file_ still works.

Eventually I tracked down the '(eval "$ac_link") 2>&5' line that was doing the compile (line 4 thousand something), but it turns out $ac_link evaluates to:

$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5

Yes, a string of a half dozen more environment variables, ending with ">&5". So if I eval "echo $ac_link" it will redirect the output somewhere I don't want it to. I wound up doing "eval echo ${ac_link:0:76} >&6" to trim the >&5 _off_ so I could see what the actual compile line was, and it turns out to just be "gcc -o conftest conftest.c". Ok then. Run that and confirm it's producing the same broken binary.
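For the record, a less fragile way to do that trim than counting to 76 is suffix removal. This is a standalone sketch, not a patch to configure: the ac_link string is the one quoted above, and the variable values are assumptions picked to match what this particular run turned out to use.

```shell
#!/bin/sh
# The template string from configure, single quoted so nothing expands yet.
ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'

# Assumed values for this run (empty flags, C source, no exe extension).
CC=gcc ac_exeext= CFLAGS= CPPFLAGS= LDFLAGS= ac_ext=c LIBS=

# Strip the trailing " >&5" instead of hardcoding a character count.
trimmed=${ac_link%' >&5'}

# eval expands the embedded variables; echo shows the resulting command line.
compile_line=$(eval echo "$trimmed")
echo "$compile_line"
```

Inside the real configure script you'd still want to send the echo to >&6 or to a file, since plain stdout has been redirected away.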

At this point it's pretty likely my new wrapper script is hiccuping, but I've _built_ stuff with it. The root filesystem I'm running in was cross compiled with it. So here's a command line that's triggering bad behavior, let's compare its behavior with the old one. (What, did I get crti.o/crtbegin.o/crt1.o in the wrong order? Once upon a time that broke powerpc because one was falling through to the next...)

Setting CCWRAP_DEBUG=1 makes the wrapper spit out the rewritten command line before calling rawcc, and a cut and paste of that into the previous release of root-filesystem-i586 produces an identically broken binary. So it's not something to do with the way I'm building the libraries themselves, it's this command line. Doing a CCWRAP_DEBUG=1 on the _old_ wrapper (in the old root-filesystem) gives another big long command line which has things in a different order (the new one does -nostdinc and the headers first, the old one did -nostdlib and the libraries first).

The header chunk is identical except for the old one defining -U__nptl__, and adding that to the broken command line doesn't fix it. So that chunk can go away, and focus on the libraries...

And there it is. Old: "-Wl,--dynamic-linker,/lib/ld-uClibc.so.0". New: "-Wl,--dynamic-linker,/lib/libc.so". Which is correct for musl, and which _compiles_ for uClibc. But the whole of musl is one binary, _including_ the dynamic linker. (Different ELF entry points when called different ways, that's why it works.) In uClibc, each part is in a separate file.

Yes, it was something stupid I did. Of COURSE it was something stupid and entirely my fault, that's how these things work. The reason everything else worked is I was testing static compiles, the only dynamic compiles I tested were musl, I didn't _regression_ test that I hadn't broken dynamic compiling with uClibc. (The aboriginal build is mostly building static packages, because the native toolchain should run on an arbitrary host and because busybox is 20% faster running ./configure under qemu without the dynamic linking fixups causing repeated retranslation of executable pages. Self-modifying code is the worst thing you can throw at QEMU from a performance standpoint.)

Now to fix it, rebuild, and see what else breaks in the linux from scratch native build. (Well, still the old linux from scratch 6.8 build. I need to finish the update to 7.4. I know there are newer ones now but that was a LFS and BLFS simultaneous release.)

August 15, 2014

My netbook rebooted last night, for the first time since February. A moment of silence for approximately 85 open windows on 6 desktops. I have no _idea_ what I was in the middle of doing.

Possibly I just drove it so deeply into swap that 5 minutes after a keypress it still hadn't unblanked the screen. Dunno.

Oh well, I needed to do something like that to swap out the hard drive in the netbook with an SSD. Wondering if I should maybe take the giant laptop to a repair place first and have somebody frown at it expensively.

Speaking of which: google's chromosome browser has a "did not properly shutdown, reopen all those windows" button which opens them all to "could not load" if you do it without net access. This is exactly what I want. But then the next time you connect to the net, it triggers a reload of every tab that couldn't connect. All 500 of them. Simultaneously. Playing every youtube video at full volume, focus-grabbing popups for every page requiring a login, doing the stupid "time warner's nameservers couldn't look up the DNS entry for this thing that's only available on an intranet you're not currently connected to, let's redirect you to a page of advertising that loses the original URL"... The _real_ fun is if I'm on the bus and driving past a McDonalds without remembering to turn off wireless and it loads every tab with the "click this pointless button before we'll give you internet oh by the way this destroyed the saved URL" page.

I _actively_ do not want chromatin to try to reload all those tabs the next time I connect to the net. Google programming this behavior into their web browser (which it did NOT do a few months back) was a stupid regression.

I have no idea who I'd even complain at. Might locally hack it away, if I can be bothered to waste 6 hours figuring out how, knowing the next upgrade will break my local hack. (For a real open source project I'd submit the complaint and hack upstream, but Google has its head too far up its ass to ever notice the existence of outside contributions. Epic Not Invented Here syndrome, they produce regularly updated abandonware.)

I consider this another instance of "cloud rot": it used to work, it stopped working, because I haven't got a fixed locally installed set of software I can retain known behavior for.

August 14, 2014

Darn it, I saw a bug I can't reproduce. The new parallel make for toybox hit an error in one of the files and didn't stop, instead it continued on until linking (which noticed the missing symbols and barfed there, but still).

I think what's happening is the PID number is wrapping (the pid range defaults to signed short, I.E. 1-32767, to avoid confusing really old brain-dead programs that haven't been recompiled since 1992, it's /proc/sys/kernel/pid_max).

pid_t is actually int, although finding that in the glibc/headers from the gnu/dammit project is absurdly complicated, if you grep for pid_t and typedef on the same line, you get this pile of redundancy:

/usr/include/termios.h:typedef __pid_t pid_t;
/usr/include/sched.h:typedef __pid_t pid_t;
/usr/include/unistd.h:typedef __pid_t pid_t;
/usr/include/x86_64-linux-gnu/sys/shm.h:typedef __pid_t pid_t;
/usr/include/x86_64-linux-gnu/sys/types.h:typedef __pid_t pid_t;
/usr/include/x86_64-linux-gnu/sys/msg.h:typedef __pid_t pid_t;
/usr/include/gcrypt.h:  typedef int  pid_t;
/usr/include/utmpx.h:typedef __pid_t pid_t;
/usr/include/signal.h:typedef __pid_t pid_t;
/usr/include/time.h:typedef __pid_t pid_t;

I.E. making DARN sure that typedef happens. No, we couldn't have one place that does that, it would make too much sense!

But what's __pid_t then? A bit more grepping finds:

/usr/include/x86_64-linux-gnu/bits/types.h:__STD_TYPE __PID_T_TYPE __pid_t;

Which is just silly. So what's __PID_T_TYPE?

/usr/include/x86_64-linux-gnu/bits/typesizes.h:#define __PID_T_TYPE __S32_TYPE

Because gnu. No, seriously, because gnu.

/usr/include/x86_64-linux-gnu/bits/types.h:#define __S32_TYPE int

And THERE is your int. Which the LP64 standard says is going to be a signed 32 bit integer.

Meanwhile, musl-libc has just the one hit, easily grepped for:

$ grep -r "typedef .* pid_t;" musl/include
musl/include/bits/alltypes.h:typedef int pid_t;

If you want to know why I prefer to use musl, it's because it's like that _everywhere_. Turtles all the way...

August 13, 2014

Poking at getting the 3.15 kernel into aboriginal, and commit 5d2acfc7b974 broke miniconfig fairly fundamentally.

The commit is "kconfig: make allnoconfig disable options behind EMBEDDED and EXPERT", and it exists to switch ON symbols in allnoconfig, so it can disable all the default values behind them.

This is a big behavior change. The point of miniconfig is "start with allnoconfig, then switch on these symbols as in menuconfig, letting the dependencies run." Yes it was a bad historical decision to use visibility to mean two different things (this symbol is shown when configuring and this symbol takes effect when compiling), but it's done and people made a lot of decisions about what's a good configuration granularity based on that. Should you _have_ a symbol to do something deeply esoteric which would be required to be "y" in order for your kernel to boot at all? That's fine if you can hide it away in an "expert" menu. But now this change is forcing miniconfig to add all those symbols, and I won't do it because the change is stupid. There were uses of "allnoconfig" other than debug code, and turning it into a debug-only thing breaks them.

August 13, 2014

Sigh. My email machine (the one I couldn't find the power cord for during the flooding) died last night. Powers on to a black screen, not even the manufacturer's logo, the hard drive light comes on for half a second and then not again...

It's always something.

The problem could be as simple as a SIMM got unseated, but I spent a lot of the evening digging through boxes to find the terabyte USB drive so I can make a '2014' snapshot of my netbook since it's been about a year, and said email machine was my "regularly rsync to this" backup box. Also refreshing the usb key copies of the aboriginal and toybox directories.

My workspaces tend to have a lot more than actually gets checked into the repositories. I generally have directories like "toybox/toybox", "aboriginal/aboriginal", "linux/linux" and so on, where the upper level ones have all sorts of debris that shouldn't go in the repo dir. For toybox I've got copies of all the other projects listed in the roadmap, snapshots of nacl and the bsd port of openlaunchd, a zillion saved emails and documentation files, half-composed cleanup descriptions, something called "shocco" which is apparently a documentation generator I was looking at, various shell scripts with names like "fiddle.sh" that I'd have to stare at for a while to remember what they _do_...

For the aboriginal one, one of those shell scripts is "boxen.sh" which looks at build/host to see what's enabled in busybox and in toybox, and in build/logs to see how many times each command line is run, and then it produces a report of which busybox commands are still in use in the aboriginal build, sorted by number of times it's called. The tail of that currently looks like:

    10 busybox dd
    12 busybox gzip
    46 busybox bzip2
    81 busybox tar
    85 busybox find
   682 busybox diff
  1041 busybox awk
  1942 busybox sh
  2024 busybox install
  3202 busybox tr
 16603 busybox expr
130099 busybox sed

And that's what I've got left to replace before aboriginal can rebuild itself. (Of course to build LFS and provide a tolerable native command line it also needs: ash bunzip2 fdisk ftpd ftpget ftpput gunzip less lspci man mount pgrep ping pkill ps route sha512sum test unxz vi wget xzcat zcat.)
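The counting half of a report like that is a classic sort/uniq pipeline. A rough sketch, assuming a hypothetical log with one "binary command" pair per line (the real build/logs format is different and boxen.sh does more than this):

```shell
#!/bin/sh
# Hypothetical sample log: which multiplexer binary ran which command.
log='busybox sed
toybox ls
busybox expr
busybox sed
busybox tr
busybox sed'

# Keep only busybox invocations, count identical lines,
# then sort numerically so the most-called command ends up last.
report=$(printf '%s\n' "$log" | grep '^busybox ' | sort | uniq -c | sort -n)
echo "$report"
```

Sorting ascending puts the worst offenders at the bottom of the terminal, which is why the list above ends with sed.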

So anyway, when I backup my workspace I back up the higher level directory, not just the repository. Meaning it's generally a lot bigger than what's checked in because there's tarballs and such in there...

Blah. What I _should_ do is finish backing this netbook up, install the 500 gig ssd I bought just before the flooding, and install the current ubuntu on here. Then I could host email locally. (The email machine has a more recent ubuntu version than the netbook.)

There's a problem where I tend to be too busy _using_ my machine to do any upgrades that require a reboot. This netbook was last restarted in February. I have many, many open windows (in 8 virtual desktops), and the current set of software that's on it hasn't panicked or failed to come back from suspend in all that time.

August 12, 2014

A while back I told Chrome (chromium? Whatever Google's web browser is called this week) to disable javascript for all google.com domains, because 15 second load times for the main google.com page (waiting for that stupid little microphone at the end of the text entry bar to show up) got old, and I don't _want_ it constantly bothering me about G++ or "on this box you're using firefox, why would you do that, here's a giant ad for chromatite".

It's amazing how buggy the site gets when you do this. When you go to maps.google.com you get a big grey unresponsive screen with "google" at the bottom. In theory, you can go to http://maps.google.com?output=classic and use the OLD google maps, which worked fine without javascript. (You couldn't scroll or see satellite view, but "look up address, display result" worked fine and that's 95% of the use cases right there.)

Alas, when you go there with javascript disabled, Google redirects you to http://maps.google.com?m/output=classic&oi=nojs which shows the main google.com page. (Which is not the maps page.) So what you have to do is delete the crap it added and enter the original URL a _second_ time, and _that_ works (if you immediately do it after the first, failed attempt).

Google! Suffering from "cloud rot" since... around 2011 I'm guessing.

I'm wrestling with this trying to look up where the new Traitorous Joe's went in at the Arboretum. Apparently it's replaced the Sephora next to the Starbucks on Great Hills Trail. Of course the only way I found this out was by googling for "trader joe's austin arboretum" and finding an article about its grand opening two months ago which listed the address. Google Maps still thinks it's not open yet. So Google Maps has a combination of stale data that doesn't include what I want, and horrible new interface that lost capabilities it used to have.

Can't leave well enough alone. Gotta squeeze blood from a stone... There's a cloud rot song in there, somewhere. Possibly Steven Brust could write it.

I'm still angry that chrominator is giving me a nonfunctional Google page instead of the new tab with the recently visited grid. Still haven't found a way to switch it off short of hacking chronometer's jar files. When my machine's overloaded to swap city (I.E. basically always), it can take 30 seconds to open a new tab because of this garbage. (It's trying to _update_ to have the new graphic du jour. Stoppit.) And if you _do_ click on the google search bar of said page (done so by accident more than once), the first character you type switches focus to the title bar. If I _wanted_ to type into the title bar, I'd have started doing so years ago. Sometimes I want to _search_ for lwn.net instead of go there, and I'm not meddling with in-band signaling for a field with that many overloaded meanings. Not gonna do it.

Sigh. I'd switch the default search engine to duckduckgoose or similar if that would make it stop doing this, but I haven't found a knob. Google is not big into giving people a choice these days.

August 11, 2014

There are a lot of fiddly corner cases in mount. Even ignoring all the LABEL= and UUID= stuff for the moment.

Case in point: in fstab the "user" option lets anybody mount it, but only the user who mounted it can unmount it. (How does it keep track? Dunno yet.) The "users" option is a minor variant that lets anybody unmount it. (There are no corresponding "group/groups" things, but groups are an anachronism from the minicomputer days that really need to go away now we have containers.) There's also an "owner" option, which tracks based on the owner of the block device. (Of course there are four categories of filesystems: block backed, pipe backed (usually called "network" but fuse and virtfs are in here too), ram backed, and synthetic (proc, sysfs, debugfs, devtmpfs...), and three of them don't have a local file you can check permissions on. Yes, in this categorization loopback mounts count as block backed.)

What I'm trying to look up is if there's any equivalent of

There is craziness like "mount -a -O _netdev" (because they decided "mount -a -O no_netdev" should have an underscore?)...

Sigh. [read, read, read...]

August 10, 2014

I've been plugging toybox into the aboriginal build (with the 3.15 kernel, yeah still a release behind) to regression test stuff. That's why I've been doing 8 gazillion "find" tweaks, because stuff kept breaking. (Sometimes in subtle ways, such as "find / -mindepth 1 -maxdepth 1" traversing the entire filesystem because the test being false doesn't stop it from recursing. The behavior isn't _wrong_, modulo being extremely wasteful and potentially having permission problems with directories it can't access.)
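To make the expected result concrete, here's a small self-contained check in a scratch directory (the directory layout is made up for the demo, and it assumes a find that supports the -mindepth/-maxdepth extensions). Only the immediate children should come out, however deep the tree goes:

```shell
#!/bin/sh
# Build a throwaway tree with something buried a few levels down.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b/c" "$tmp/d"
touch "$tmp/a/b/c/deep" "$tmp/top"

# Depth 0 is "." itself, so -mindepth 1 drops it,
# and -maxdepth 1 should keep find from descending into a/ at all.
listing=$(cd "$tmp" && find . -mindepth 1 -maxdepth 1 | sort)
echo "$listing"

rm -rf "$tmp"
```

This only checks the output, not the wasted work: an implementation that walks the whole tree and filters afterwards prints the same three lines.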

I can trim the busybox-defconfig yet more, switching off find, cpio,

Diverged into cleaning up lspci because that was in the busybox list and I went "wait, didn't I already clean that up"? Mostly, but that was before the vendor database support went in.

Ok, people, please don't do this:

strncpy(vname, vtext, strlen(vtext) - 1);

It is earnest, well-meaning, and COMPLETELY USELESS.

August 9, 2014

Toybox issue: I taught the build how to build individual commands for about 2/3 of toybox, namely the commands where the C file and the command name match up. I'd like to add the remainder, but it's a bit trickier.

The problem isn't finding out which file the command is in, some variant of grep "OLDTOY($COMMAND," toys/*/*.c can do that. It's figuring out which config symbols need to be switched on. Because commands that share a file sometimes depend on each other, this command is only available when this other command is enabled.

For actual OLDTOY variants I can grab the second argument from the OLDTOY macro and capitalize it, but even that's not necessarily sufficient. Look at the relationship between "cp" and "mv". Currently both are in cp.c, and the config symbol for MV is actually "CP_MV", I.E. an option of cp. I can break it out, but it still depends on CP (in the kconfig), and more to the point there's a CP_MORE and a CP_MV_MORE, and the second depends on the first. The existing scripts/single.sh will switch on subsymbols (When building "cp" it enables both CP and CP_*), but switching on MV_MORE won't switch on CP_MORE because the "depends" is not easily parsable with sed.

What I need is something I've needed for a while: kconfig infrastructure that can switch on a symbol this symbol depends on. Then I wouldn't have to keep adding symbols for menu guards every time the kernel guys decide to visually tidy up menuconfig a bit more.

Of course at some point I need to write my own kconfig stuff that isn't GPL. That's been a todo item for years.

August 7, 2014

Pixel (the dog) died in her sleep last night. The autopsy says she had a tumor the size of a quarter next to her heart, which ruptured.

Very, very late to work and then distracted enough I left my bike on the bus again. (Only got 4 hours of sleep.)

August 6, 2014

Still banging on the find implementation, and WOW are the gnu/dammit extensions stupidly designed.

So I hit a couple different things that required -mindepth and -maxdepth. It does not work like other commands. If you type "find . \! -mindepth 2", it complains that they never bothered to implement that because GNU.

All the other numerical values in find let you go +N, N, or -N to indicate whether you want equal, greater than, or less than. (Posix requires this.) But not this crazy gnu extension, it just takes a number, and has different option names for interpreting that as greater than or less than.

Also, there's already a "-depth" (triggering depth-first search), and this has nothing to do with that, so why choose that name..?

The Free Software Foundation remains really bad at any sort of engineering activity.

August 5, 2014

Chipping away at the remaining mount todo items, which turned into fixing umount todo items. (Specifically "user" support, to let umount regulate the suid bit via /etc/fstab annotation.)

August 2, 2014

Day 2 of flooring installation. It looks really nice. The couch is on its side in the kitchen. Bookcases have been moved from the office to the bedroom to the living room. The dog is confused but taking it in stride. The kitten is enthusiastic about all the excitement, and wants VERY MUCH to interact with wet tile grout in ways that cannot possibly be healthy. Peejee has written us all off and is staying in the backyard for the duration. George is in a cupboard, but that's actually fairly normal for her.

On the way home from work yesterday, I had to buy more tile, for reasons that remain unclear. (I think the store misplaced some of our first order, we can worry about it once it's all .)

So: toybox! I got find -exec working today (just dumb bugs and off by one errors). Doing array manipulation in C it's possible to have like 5 different off by one errors in different lines.

I need to read the spec one more time and maybe consult Wikipedia[citation needed]'s opinion on what the find command should do. It's about as far as you get from "authoritative" (Wikipedia is the world's largest collection of anecdotal evidence), but I missed the need to detect loops on my first reading of the spec, so... (Not caring about Slowaris relationship with Jim Morrison in the -type section, and the bit about I/O redirection is shell stuff that has nothing to do with find... Still, could be good test cases in there.)

There are actually 8 gazillion test cases that are kind of hard to add to the test suite. Implementation-wise: "find -exec" should not segfault if there's no semicolon, nor should "find -exec \;" with no command. (Both did for a bit and I fixed it.) It should _fail_, but do so with a coherent error message. How do you detect that when the standard doesn't specify an error message or return code? (Keep in mind I prefer my tests to pass in the gnu/dammit version too, and busybox where applicable. Although sometimes those ones do something _stupid_, so it's not always achievable except omitting a test...)

Anyway, taking a break from find for a bit and going through my list of unfinished commands. My next big "thing I should tackle" is getting mdev up to speed, I've had multiple requests for that. (And I should cleanup sysvinit and work on a launchd implementation before too much longer, before systemd is badly entrenched. I need to set up an android build/test environment so I can fiddle with locally built stuff. Doesn't have to be the current version since Google's Not Invented Here syndrome now extends to hiding the code from the world until it's been dead for a full iteration of Moore's Law.)

But first: I was adding grep -ABC support, implementing mount, and rewriting dd.

And I should finally attack sed.

Plus this morning I got a diff implementation from Ashwini Sharma and although I thought I was being pretty vocal about wanting to use Bram Cohen's "patience diff" algorithm, a quick google pulled up the most recent mention on the list in 2012. Oops.

Possibly the roadmap should have "plans" sections for each command?

I need to update the roadmap. And the status page. I have large additions to code.html that I haven't checked in yet because they're not finished. (I could get function lists with sed but filling out the descriptions of the resulting functions has to be done by hand. And then how do I keep it _current_?)

Intermittently tempted to launch a patreon to see if I could replace $DAYJOB with fulltime work on toybox, but then things like this flood happen and suddenly household finances have a lot less margin for error. Between evenings and weekends I'm actually carving out about 20 hours/week for toybox these days, which is a lot better than I was doing six months ago. But then implementing "find" eats somewhere between 40 and 60 hours, when I thought I could knock it out in a day or two...

July 30, 2014

Blah. Gradually feeling worse since monday, and it's finally turned into enough of a sore throat and chest cold (which combines poorly with lingering heatstroke) that I'm taking a day off from work. (Ok, I fixed one pending issue and emailed it in, today was _supposed_ to be my telecommuting day but I'm not up for it.)

Before the flooding I was actually feeling _recovered_ for about a week there, but between hauling wet carpet (probably where I picked this up), the air conditioner going out twice, and general disruption in sleep schedule... yeah, not fun.

Took the phone in to the T-mobile store, where they plugged it in and it booted up first try. Of course. (With the old background so the OS upgrade didn't take.) Their theory is it wasn't charging after I moved it, and the OS install was aborted by this (cpu froze) but the screen didn't go off until the battery died the rest of the way, and then it was too drained to charge properly until the battery "rested" a bit.

A phone can be too drained to _charge_. That seems like a design flaw.

July 29, 2014

No _clue_ what box my email laptop's power cable is in. I should just go to a computer store and see if I can buy a replacement.

Programming stuff! What have I been doing... Finally got the next big chunk of find.c checked in. Needs a whole lot of debugging (it compiled, and I fixed enough regressions that "find -type f" works).

Here's the sed invocation I used to make an unordered list of functions in the lib.c and xwrap.c files for the code.html page (last week, but for the record):

sed -n '/^[a-z].*(/s@.*@<li><b>&</b></li>@p' lib/lib.c

July 28, 2014

T-mobile's "unlimited" internet is not actually unlimited if you pay them for streaming: that's capped at 3 gigs. If we'd known that, Fade probably wouldn't have watched Hulu on her laptop.

However, the unlimited bandwidth DOES include watching netflix on the phone itself, and I haven't even watched the last Matt Smith season yet. I may make an _effort_ to make sure T-mobile does not come out ahead on this one.

(Fade swapped the cable modem back today, but I'm still annoyed at them.)

And my phone decided to brick itself. It's had a pending Android version upgrade for a month that I've been ignoring, but apparently if the battery completely dies and you boot it up plugged into USB, it starts the install without asking. This is particularly awkward if the power source is a laptop that itself has an almost dead battery.

I got it swapped to actual wall current, but it then went half an hour with no screen update (progress bar at 0% the whole time), and when I gave up and force power cycled the thing, it wouldn't reboot. Plugging it back into the wall gave a red flashy light instead of the battery charging animation.

So that's nice.

July 26, 2014

Having tile installed is going to cost more per square foot than the tile itself does. Lovely.

So: fixing the dirtree stuff in toybox for find -execdir:

Last night I tried xoring 1<<31 into the filehandle to signal the comeagain call, but AT_FDCWD is -100 which already has that bit set, so I'd have to special case that with a dirtree_notagain() function, and although negative numbers count _down_ from all bits set (two's complement!) so -100^(1<<31) is somewhere over 2 billion and not a reasonable filehandle number with any real-world system resource limits... I just don't want to have to _explain_ it.
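For the record, the arithmetic checks out. As a 32 bit two's complement value -100 is 0xffffff9c, and flipping the top bit gives 0x7fffff9c, which is 2147483548. A small sketch of that (flip_top_bit() is a made-up name for illustration, and it does the math in unsigned because shifting a 1 into the sign bit of a signed int is undefined behavior in C):

```c
#include <assert.h>

// Flip bit 31 of a file descriptor value. On Linux AT_FDCWD is -100, so
// its top bit starts out set and flipping it yields a large positive number.
unsigned flip_top_bit(int fd)
{
  assert(sizeof(int) == 4);         // the numbers here assume 32 bit int
  return (unsigned)fd ^ (1U << 31);
}
```

So an ordinary small descriptor comes back with the top bit set, and -100 comes back as a positive number no real descriptor table will reach, which is exactly the asymmetry that needs the special case.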

I have reached the point where the limit on the amount of cleverness I'm willing to put in my code is: do I want to write up a chunk on the code.html page describing how this works, or would I rather put that effort into trying to come up with something I wouldn't have to explain.

Looking at the structure layout again, there _is_ a place I can add a byte without pulling in padding up to sizeof(pointer), or more accurately where such padding already _happens_: the char name[] array at the end is already a random length, and it can start unaligned without ill effects (because C slices and dices strings all the time so padding the start of a char array wouldn't _help_, and if the compiler is doing it the compiler is broken).

Yeah, ok current versions of gcc are known broken and increasingly deteriorating since the gplv2->gplv3 license switch drove all the pragmatists into the arms of apple. But that last bit means I don't have to care, because the FSF can go off in its corner and die and won't take anything important with it anymore. It's already trashed all the projects it had access to.

So anyway, if I add a "char again;" before the name array at the end, we only add one byte (to the potentially tens of thousands of entries in a tree representing the metadata for a terabyte filesystem, but there you go).
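A sketch of that layout, using a cut-down hypothetical node rather than the real struct dirtree (the field names other than "again" and "name" are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical miniature of a tree node: the flag byte sits directly in
// front of the variable length name, so the name just starts one byte later.
struct node {
  struct node *next;
  char again;
  char name[];
};

// Allocate a node with room for a copy of name. Returns NULL on failure.
struct node *node_new(const char *name)
{
  struct node *n;
  size_t len;

  assert(name != NULL);
  len = strlen(name) + 1;
  n = malloc(sizeof(struct node) + len);
  if (!n) return NULL;
  n->next = NULL;
  n->again = 0;
  memcpy(n->name, name, len);

  return n;
}
```

Since a char array has an alignment of 1, offsetof(struct node, name) is exactly one more than offsetof(struct node, again), with no padding between the two.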

The problem with keeping the filehandle open after dirtree_recurse() returns is that dirtree_recurse does an fdopendir(), readdir(), and then closedir() (which closes the filehandle). So instead I have to have dirtree_recurse() itself call the comeagain callback (moving it out of dirtree_handle_callback()), meaning it needs the _unfiltered_ flags (not just symfollow, it also needs COMEAGAIN), which means that commands using dirtree_recurse directly (I.E. ls.c) need to feed in DIRTREE_SYMFOLLOW instead of the zero/nonzero of toys.optflags & FLAG_L.

And yes, I could align FLAG_L to be the same bit position as DIRTREE_SYMFOLLOW, but see "don't want to explain it", above.

Many years ago, Intel's designs used "racy logic" where they saved transistors by having sections of their chips consume signals right when they were generated, so they didn't need any circuitry to latch the value and keep it around until it was needed, they just carefully lined everything up. And then they did a die size shrink and all their timings changed and their yields went into the toilet, and they had to yank the Pentium III 1.13 ghz chip from the market because they'd essentially overclocked their own chip.

It's a bit like that. Making everything "just line up" seems like the most clever design until you try to change stuff later and the house of cards all falls down. As probably Ken Thompson said, "When in doubt, use brute force." I have reached that point.

July 25, 2014

Hired Dudes (a phrase I grabbed from the disasterhouse blog) took away the fans last night. So quiet. We can use the lights again. Still haven't found the box with the power cord for the laptop that has all my email filters set up on it. I don't even know which pile to start looking in.

I've been chipping away at find.c all week anyway, and I've hit a glitch in -exec parsing that says I've got a design issue in my dirtree stuff.

Backstory: the -exec option to find runs a command in response to matches. (I usually just "find -print0 | xargs -0" because it wants the shell metacharacters "{" and ";" as arguments to use this, which you have to escape, but that's posix for you.)

There are two ways of calling -exec: ending with ";" and ending with "+". The + version aggregates together names and calls them (like, you know, xargs would), so I have to accumulate a list of commands. (Or use the DIRTREE_SAVE thing on the dirtree nodes, except in that case I'd have to save the directories the nodes descend from, and then mark whether or not that was saved because it had a find match or just because it was needed for the tree, which is why I'm not doing that).

(There's also "-ok" which is the "-exec" version of cp -i: it asks you "y/n" before each match. It's not required to support the + mode, only the ; mode, because combining the aggregation and the questioning is not really well defined. I _think_ I should ask before adding each name to the list, not before running the child process with lots of names. That's not the problem.)

There's a general problem of environment space exhaustion, because Linux caps a process's environment space at 32 pages, times 4k that's 131072 bytes, and that has to store the environment variable strings (each null terminated), the argv strings (each null terminated), and a pointer to each string (the argv[] and envp[] arrays, each _also_ null terminated). Oddly argc is passed in a register or calculated from argv[], so does not consume environment space. Find is required to break the file list into chunks small enough to not exhaust the environment space (although in theory you can define enough environment variables you haven't got enough environment space left to run "hello world".) But traversing a large directory tree can easily give more than 128k of output, so it's a real world issue. I can do math to figure out when the next filename puts us over, but only using incestuous Linux knowledge: I'd like to be able to probe this somehow...
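The "do math" part might look something like the sketch below. It hardwires the 32 page figure quoted above as an assumption, and the function names are invented for illustration. (Posix does define sysconf(_SC_ARG_MAX), which may or may not be the probe wanted here; what a given kernel reports through it isn't checked in this sketch.)

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARG_BUDGET (32 * 4096)      // 131072 bytes, the figure quoted above

// Bytes one argv[] or envp[] entry costs: the string, its null terminator,
// and the pointer to it.
size_t entry_cost(const char *s)
{
  return strlen(s) + 1 + sizeof(char *);
}

// Cost of a whole NULL terminated pointer array, including the NULL pointer
// that ends it.
size_t array_cost(char **list)
{
  size_t total = sizeof(char *);

  assert(list != NULL);
  while (*list) total += entry_cost(*list++);

  return total;
}

// Would adding name to a batch currently using "used" bytes stay in budget?
int fits(size_t used, const char *name)
{
  return used + entry_cost(name) <= ARG_BUDGET;
}
```

The idea would be to start "used" at array_cost(environ) plus the cost of the fixed part of the command line, then flush the batch and run the child whenever fits() says no.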

That's not the problem.

One of the extensions Linux has is "-execdir", which does a chdir into each directory and runs children in that directory with the local path, not generating a path from root. This is kind of cool: it uses less memory and does less repeated directory traversal, meaning it's not just faster and more space efficient, but actually more secure in response to people moving things around while you're traversing them.

I implemented dirtree based on openat() and friends, the new Linux system call set that takes a filehandle to a directory and uses that as its current working directory. Alas, not every system call works that way: there's no mountat(), no chrootat(), and although there's an fexecve() the filehandle is to the executable to run. So I have to actually chdir into the directory to create a child process with that directory as its cwd.

Note that rm is supposed to be able to handle arbitrarily deep paths, which you can create on any filesystem by going "mkdir -p dir1/big/long/path/..." and "mkdir -p big/long/otherpath/.../dir2" and then moving dir1 into dir2, so even if there _was_ a limit it's not actually enforceable. But dirtree is opening a filehandle for each level, and the default "ulimit -n" is 1024 simultaneous open filehandles. Since PATH_MAX is no help here... Yeah, that's on my todo list.

That's not the problem either.

So I'm interested in making -execdir (and -execok) work. The _fun_ part is when you go "is there anything stopping a single find instance from having two -exec calls? No? In that case can one of them be -exec and the other -execdir? Yes?" Which means I need to grab a filehandle to the original directory and store it in the global state. But I don't want to "chdir ." with every single normal -exec, and my first reaction (have the child chdir after the fork() but before the exec) doesn't work with vfork(), and I've had interest from the uclinux guys in using toybox on nommu systems. So my _second_ guess is I can initialize the global TT.topdir file descriptor to -1, and then if it's -1 have -execdir's initial parsing pass (called with a NULL dirtree pointer) do an xopen("."), so we know we need to chdir to it for normal exec.

(While doing all this, I modified xopen() to default to adding O_CLOEXEC to the flags so I'm not leaking arbitrary file descriptors into children. What I did was XOR the O_CLOEXEC flag, so I reversed its meaning. If you want to switch it OFF, so a filehandle is inherited by children, xopen(O_CLOEXEC). Or, you know, just use the non-x open() function.)

(And then, because that's sufficiently non-obvious, I started describing it in code.html, which turned into adding descriptions of lib/xwrap.c and lib/lib.c functions in general... Yeah. Tangent. Big tangent, as always.)

All of this is still not the problem. Here is the problem:

For "-execdir {} +" mode I need a callback when we're done with a directory to flush all the files we found in that directory. That's what DIRTREE_COMEAGAIN is for: the first callback says "deal with children and then revisit the directory node afterwards". And to distinguish the first and second calls, you only get a second call for directories (so S_ISDIR(dirtree->st.st_mode)), and on the second call the directory filehandle (dirtree->data) has been closed and set to -1.

The problem is I can't perform directory local operations on the second callback because the filehandle I need to do so is gone. It was a design decision (obviously I can do all the directory operations _before_ recursing into the child... except when I can't), and it's coming back to bite me.

No, I can't just chdir to each directory as I enter it because it's a depth first traversal where each directory could have more directories under it and once I chdir into one of those I have to chdir back _up_ the stack afterwards to deal with more contents of the previous directory. And no I can't use .. because A) depending on the -HLP flags we may have followed a symlink to get here, and B) some malicious user can mv a directory while we're in it and then we'd recurse up to the wrong part of the tree. (For rm -r I have a vague plan to implement a DIRTREE_INFINITE mode flag which closes the filehandle and traverses back up via ".." but would compare the parent's saved st.st_dev and st.st_ino values with an fstat() of the reopened directory filehandle and error_exit if somebody is being malicious. It's on the todo list. Part of the _reason_ it's still on the todo list is readdir() has a second implicit filehandle open anyway, so I have to _stop_ readdirring the parent in order to descend into the child and then restart the parent readdir when I back up again and delete the child directory, and that means if I hit a file I can't delete the whole thing can loop forever, and _that_ says I should read and cache all a directory's entries before trying to delete any of them which can get darn memory hungry so maybe I should only cache the ones I _couldn't_ delete... I have design work to do here.)

So anyway, it looks like I need to modify dirtree to not close the filehandle until after the second callback, which leaves the question of "how do I signal we're doing the second callback?" That's the reason I did the -1 thing in the first place: I really don't want to add a field to the dirtree structure just to signal this. What with malloc() padding and alignment padding and such, even a char would wind up being 4 or 8 bytes, and with large directory trees that adds up to many kilobytes (even megabytes) of data doing something like "zip or gene2fs a big directory" which needs the entire tree instantiated up front.

(Not that I've implemented either of those yet, but as Dave Lister said, "I'm gonna get a sheep and a cow, and breed horses. It's me plan! I planned it!")

July 20, 2014

I spent today cutting apart carpet and pad with kitchen knives, and carrying long dripping strips of it out to the back porch where it can fester _outside_. Then I collapsed from exhaustion and napped on the bathroom floor for a bit because it turns out I'm not _entirely_ recovered from that heatstroke thing yet. (Who knew?) Still gotta deal with the tack strips. (Pointy bits of metal soaked by floodwater? As the person with the most recent tetanus shot of the household, those are my job. But they're a bit damp to come out cleanly right now. Dry they pry up reasonably well, wet you get an inch at a time broken off at the next nail holding 'em in.)

We called a professional remediation company, which had a guy come out and stick an electronic beepy device into the walls and floors and carpets and such, and frown expensively at them, and write things down, and basically go "yeah, it's all soaked". So they're charging us 3 grand to tear off our baseboards, drill holes behind where the baseboards were, and fill the house with 15 giant blowers and 2 industrial dehumidifiers for the next 3 days. This should dry out the sheetrock before Texas' rather aggressive microfauna can do anything particularly unpleasant to it. (This is also why the carpets had to come up a little faster than we'd expected.)

They're coming back at some point with a machine to tear up our nice wood laminate flooring, which is soaked and warping. We're replacing it all with ceramic tile, which was in the kitchen and is apparently our only flooring to survive intact. (It got soaked too, it just doesn't _care_.)

Some of Fade's friends also came over to help us move furniture away from the walls so we could be ready for the blower setup. We went to the storage space to get empty boxes _out_ of it, so we could pile the contents of the bookshelves into boxes, move everything away from the walls, and make our electric meter spin like mad.

The house is full of so many fans that we have to keep the lights off to avoid tripping a breaker. The air conditioner is still set at 72 but the house is 85 because SO MANY FANS. (Not looking forward to this month's electric bill.)

Asked about avoiding this in the future. The main Hired Dude had some landscaping suggestions and possibly redirecting the gutter outflow things, but it mostly boils down to "try not to have 7 inches of rain in a single evening again". (Did I mention the house has been here for half a century and apparently hasn't had this problem before? And here we thought being hundreds of miles inland and a couple hundred feet above sea level meant the oil companies screwing up the weather would be less of an issue locally. Well, modulo the wildfires we had a couple years back burning down an entire suburb, anyway. That was back before the floods that trashed the suburb Steve Jackson lived in.)

Global warming: one star review, would not recommend.

(If this blog entry seems slightly incoherent, that's because it is. We bought Camine a metal bedframe so her mattress is up off the floor, but her box spring got hauled away by the people delivering the bedframe because it's soaked. She'll need a new one eventually. There's also an inflatable mattress, and the couch. We'd probably just get a hotel room but there are three cats and a dog who would be very unhappy to be left alone in a dark house with loud blowers and giant wheeled dehumidifiers the size of our dishwasher with tubes leading into sinks...)

July 19, 2014

Dog at the vet overnight due to allergic reaction. Power went out due to a thunderstorm. Woke up at 3am to a quarter inch of water on the floor. The rugs are soaked. The cable modem is soaked. My netbook was up on a table, so that's something.

There were apparently 7 inches of rain at 183 and I-35, which is more or less uphill of us, and Austin is on granite so the water flows freely down to the river but can't really soak _in_ most places. There's a few feet of topsoil, then bedrock. So there was flooding in Hyde Park and UT and all over where we are.

Of course, considering Fade's friend (and ex-boss) Steve Jackson had his house _totaled_ by a flood last year (he got several feet of mud, we got a quarter inch of rainwater that didn't even make it to the bottom shelf of our bookshelves), could be worse.

We've got the air conditioning down 5 degrees lower than normal until we can get somebody to come tell us how badly soaked we are. (This is Texas: you do not trifle with potential mold here.) The carpet in Fade's office and Camine's bedroom are both completely soaked. The other half of the house _seems_ ok, but puddles of water keep appearing from nowhere which means it's coming up through the slab. (Apparently the water table is currently at ground level.)

Nope, none of it helps the lake or the aquifer, that would need rainfall further west. This all runs into the river they've called "town lake" ever since they put dams at each end of it, and goes downstream to deliver bat guano to rice farmers. Running through a hydroelectric generator along the way, apparently. As far as the city's water supply is concerned, we're still in stage 3 drought with our water supply capacity percentage in the high thirties somewhere.

I am _so_ glad that when we had a small roof leak right after we got this place, we just replaced the whole roof. Our _ceiling_ is fine. The top half of the walls seem fine. But water's coming in through the baseboards, which is not fine.

It's been a week.

July 18, 2014

Finally feeling a bit recovered from the heatstroke at Texas LinuxFest, which means I'm catching up on the overdue deadlines for the factory packaging stuff I couldn't get working when I couldn't focus properly.

Picked up my bike from the Cap Metro lost and found today. They have a dedicated lost and found number with a three minute recording that announces their hours... but not their address. Kinda sums up Cap Metro, really.

I was out of it enough last thursday that I left my bike on the bus when I got off, and they'd pulled away before I remembered. (The busses have a pull-down bike rack in the front, which holds 2 bikes and is often full so I have to wait for the next bus, and sometimes the one after that. It would be about as fast to bike in to work as to wait for the next bus then spend time on the bus stopping every 80 feet, but I'd need a change of clothes by the time I got there and if I didn't bring one... I mostly just bike home, where a shower and change of clothes aren't a big deal.)

So I called Cap Metro and arranged for Fade to pick it up when the bus got back to the start of the line, where it has a 15 minute break. Except they told Fade the _departure_ time not the arrival time (Cap Metro!), so when she showed up at the specified time the bus had already left, and at that point I had to go to the lost and found, which is only open weekdays and closes at 5pm (Cap Metro!) so I had to take time off from work to do this.

Did I mention the reason light rail got repeatedly voted down by the populace of Austin the first decade I lived here was that everybody was entirely in favor of it as long as it WASN'T going to be run by Cap Metro? It's sort of a giant whirlwind of nepotism, embezzlement, and incompetence. Alas, the nepotism part meant it politically prevented anybody ELSE from doing light rail until enough new people had moved to Austin who didn't know the history to go "sure, light rail, what a good idea" and finally vote it through with them ~~screwing it up~~ in charge.

Anyway, yay having my bike back.

July 12, 2014

The "cp" command never had -HL hooked up, and now that I'm trying to make it work it's fiddly.

One thing I noticed is that posix's definition of "-f" is more limited than the one I'm using: if you copy a directory over an existing file ("cp -rf dir file"), posix says that should fail. I even have a test in cp.test checking for that (and currently failing). But the definition I implemented is that cp -f removes existing target files, and doesn't care _what_ you're trying to copy over it.

I'm not sure whether the right thing to do is change the cp implementation or document the deviance and yank the test. I'm leaning towards the latter.

So back to -HL, it's actually -HLP where P is the default behavior and any of those flags switches off the previous one (radiobutton style), except it's _actually_ -HLPd except -a implies -d but you don't switch off the _other_ parts of a?

I _think_ -d is a synonym for -P. On linux symlinks haven't got attributes (except timestamps), so the difference between the man page's --no-dereference and --preserve=links isn't immediately obvious to me.

If you have a symlink to a directory, "cp -r" should copy the symlink, cp -rH should copy the directory, and cp -rL should follow symlinks _in_ the directory (potentially causing bad loops).

July 11, 2014

On the way to the bus home, found $66 in cash lying in a parking lot of a grocery store. Turned it in to the manager of the grocery store, feeling kind of silly (no identifying information, just three twenties a 5 and a 1 lying in a 10 foot square patch of asphalt, but losing that probably screwed up somebody's day). Didn't quite make myself late enough to miss the bus, but did make myself late enough that the 24 hour bus pass expired by 3 minutes so it cost me a dollar to get on the bus. Yeah, that one's self-inflicted.

Ok, it was an arboretum grocery store with built-in farmer's market so maybe they didn't care _that_ much, but still. (Or somebody running a really weird sociology experiment, but you'd think a single $20 would cover that and I was _looking_ for somebody watching during the three minutes or so I stood in the parking lot holding a wad of cash yelling "hello?"...)

Yes, I could have kept it. I didn't. I expect the grocery store probably will. Yes, I'm agnostic bordering on atheist these days. (Can't tell you where the universe came from, but Jesus and Santa Claus go in the same bucket.) Doesn't mean I feel comfortable profiting from the misfortune of others or that I won't try to figure _out_ what the right thing to do in a given situation is, even when it seems largely futile to follow through on it. Everybody needs a hobby...

July 10, 2014

Ah, the function I want is fnmatch(). Carry on. (But wordexp() is still conceptually sad.)

I think I've worked out how to implement find's parentheses non-recursively, using toybuf as a stack. The easy way is to use a byte per level, but I'm only storing 2 bits (the "not" status, I.E. whether there was a ! before the parentheses, and the "active" status, I.E. whether or not anything in this parenthetical gets evaluated because we're on the wrong side of an -a or -o that's short circuiting this entire parenthetical out).

The less easy way is to hijack the bitbuf code out of compress.c and use only 2 _bits_ per level. That means instead of limiting parentheses to 4096 deep, we limit them to 16384 deep, and given that Linux currently has 32 pages of environment space per process (32*4k=128k), and each argument takes up sizeof(long) bytes for the argv[] entry plus the actual null terminated string, so each parenthetical level (a "(" and a ")" entry) would require (on a 32 bit host) 12 bytes, total 196608 which is bigger than 131072, so you physically _can't_ hit that limit on linux...

Ahem. For the moment, 4096 should be fine. Wait for somebody to complain. (Heck, that's _nested_. Adjacent pops back off the stack.)
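
For reference, a rough sketch of what the 2-bits-per-level version could look like. This is hand-rolled shifting rather than the bitbuf code from compress.c, and the struct, function names, and bit assignments are all invented for illustration (the real thing would presumably live in toybuf rather than its own buffer).

```c
#define STACK_BYTES 4096                // assumed to match the size of toybuf
#define MAX_DEPTH   (STACK_BYTES * 4)   // four 2-bit entries per byte = 16384

#define PAREN_NOT    1   // there was a ! before this parenthesis
#define PAREN_ACTIVE 2   // contents of this parenthesis get evaluated

struct paren_stack {
  unsigned char buf[STACK_BYTES];
  unsigned depth;
};

// Push 2 bits of state. Returns 0 on success, -1 if nested too deep.
int paren_push(struct paren_stack *s, unsigned bits)
{
  unsigned shift = (s->depth & 3) * 2;
  unsigned char *p;

  if (s->depth >= MAX_DEPTH) return -1;
  p = s->buf + (s->depth >> 2);
  // Clear the old 2 bits in this slot, then store the new ones.
  *p = (*p & ~(3 << shift)) | ((bits & 3) << shift);
  s->depth++;

  return 0;
}

// Pop the most recent entry. Returns its 2 bits, or -1 if the stack is empty.
int paren_pop(struct paren_stack *s)
{
  if (!s->depth) return -1;
  s->depth--;

  return (s->buf[s->depth >> 2] >> ((s->depth & 3) * 2)) & 3;
}
```

The byte-per-level version is the same thing with the shifting removed, which is why it's the easy way.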

Speaking of the "active" bit, we also need to evaluate the entire thing once with the active bit off from the beginning (and a null dirtree), for two reasons: 1) syntax checking so we get error messages about bad command line arguments out of the way up front, 2) because posix says some tests (-user, -group, and -newer) must evaluate their argument only _once_, so we need to store it. I can just push those suckers on a linked list and then traverse the list during usage; nice thing about in-order evaluation is that works.

Possibly "concise" is a word I should put in the toybox goals. It says simple, correct, and small, but the source code being "concise" (which is not the same as terse) is part of what I mean by small. ("Small source" and "small binary" are things you actually wind up _trading_off_, as anybody who's seen busybox's ongoing #ifdef forest can attest...)

July 9, 2014

Still re-implementing find.c, and I've gotten to -name so I want to do a regex-like search except using glob() syntax. But the glob() and wordexp() functions in posix are INSANE.

I'm searching the directory already, I want to filter the results. I want to go "does this name match this pattern". And I can't _get_ that because both these functions want to search the directory themselves. Huh?

I don't want a function to muck about with the filesystem, I want a function to check a string like regex does but using a slightly different syntax: * means .*, ? means ., and [a-z] means same as regex.

But neither glob nor wordexp() will let me decouple the "match" and "directory traversal" functions. So deep in that sandwich is code I want, but I can't _use_ it.

The especially hilariously bad one is wordexp(), which does "shell expansion". It expands shell arguments so tilde looks up usernames and $VARIABLE gets substituted and $(blah) calls subshells and all sorts of useful stuff... except for these bits (from the man page):

The strings must not contain characters that would be illegal in shell command parameters. In particular, there must not be any unescaped newline or |, &, ;, <, >, (, ), {, } characters outside a command substitution or parameter substitution context.

If the argument contains a word that starts with an unquoted comment character #, then it is unspecified whether that word and all following words are ignored, or the # is treated as a non-comment character.

The result of expansion of special parameters ($@, $*, $#, $?, $-, $$, $!, $0) is unspecified.

I grepped the old bash 2.05b source Aboriginal still builds (good enough to build linux from scratch, 1/3 the size of current bash with fewer dependencies) and both the current busybox git and the old one from my day. There are several _mentions_ of wordexp (bash has a --wordexp mode, lash.c had comments explaining why it _didn't_ use wordexp), but nobody implementing these shells actually calls the wordexp() function. Yes: even busybox, implementing at one point _five_ different shells and with a mandate to avoid code duplication (yes, I know) does not call the libc function but instead reimplements the functionality from scratch.

Why on EARTH would you make a shell expansion function that CAN'T BE USED TO IMPLEMENT A SHELL?

July 8, 2014

So I'm rewriting find.c and as with the dd.c rewrite, every time I reach a good stopping point... I can't check it in. There's already one in pending and the infrastructure can't handle two commands with the same name. If my change doesn't apply to the one in pending, I can locally replace it but don't feel good about deleting the one out of pending until the new one is better than that one.

Said dd rewrite is still languishing out of tree, while mount.c is more than half-done and that part's checked in. Not quite sure how I should handle that. It's a workflow issue really...

July 7, 2014

Toybox 0.4.9 is out. Didn't get half what I wanted to done, but I got various unexpected bugs tracked down and fixed. (The aboriginal ./download.sh stage was deleting the entire packages directory after a host-tools build because date -r was saving data in the wrong field. If said data for %s was positive it was generally way in the future, but when it was negative every file registered as older than the start of download time, and was thus stale. Yeah. Fixed that...)

Only promoted 9 commands though. Got over a dozen new pending submissions. Still falling behind. Working on it...

July 3, 2014

A very nice contributor sent toybox a half-dozen test suite entries, and a lot of the things they test for don't actually work. I started with chmod, which was breaking because it created a chmod 000 directory and then rm -rf wouldn't delete it. That's an rm bug.

Meanwhile, musl remains stroppy. The powerpc and mips dynamic linker is segfaulting. (Static builds work, dynamic builds do not.)

July 2, 2014

Still doing fixups to Aboriginal Linux. Part of it's musl (so much testing and debugging), and part of it's just that I haven't paid proper attention to this project in a while and small todo bits have accumulated.

On the bright side: basic musl support is in! (Just comment out the UCLIBC_CONFIG entry in a sources/targets/$TARGET file and build the target.) I would be _amazed_ if it can actually build Linux From Scratch, and there's no musl support for sparc, m68k, or alpha yet. (Not that I've finished and checked in Alpha, it's just one of those pending todo items. I was working on it in Minnesota and got most of the way there...)

This means switching wholesale over to musl would be a bit of a regression, but it's worth it to switch from the dead project to the actively maintained one. I might be able to add sparc support to musl myself. (Ok, given my todo list we're talking about 2016, but still...)

June 30, 2014

Tried very hard to get a release out by the end of the month, but instead I went down the rathole of getting musl to work in Aboriginal Linux. (Well posix_fallocate() didn't work in uClibc even though baseconfig-uClibc has the symbol enabled that it LOOKS like it depends on, and one thing led to another, and it just ate the whole weekend.)

Given the number of half-finished things I should be able to close off with another week's work, I bumped the release to _next_ weekend.

June 27, 2014

Doctor's appointment early this morning to ask why I still feel terrible two full weeks after my brush with, apparently the official term is "heat exhaustion". He said if I really screwed myself up it can take a month to recover, that I was obviously dehydrated there in his office, and that I should avoid caffeine and salt, stop exercising, stay indoors in the air conditioning, and drink gatorade until the electrolyte balance in my cells recalibrates itself. (Basically a couple days bed rest would be ideal.)

I'm just glad I didn't do obvious cardiovascular damage. (Heart attacks are not a good idea.)

So, programming. Trying to get a release out monday and I've gotten over a dozen new pending commands. Other than getting "mount" and friends finished, I'm just doing a sweep of the pending directory focusing on cleanup. (It helps that I have like half a dozen partial cleanups lying around, but I need to finish and promote stuff to avoid falling further behind submissions.)

One such thing is lib/password.c which involves terminal control. There's a horribly time consuming can of worms I need to open at some point. (I researched a lot of this in 2006, but that was 8 years ago.) Might as well dink at it now:

Terminal control is inherently semi-obsolete. The original Unix group at Bell labs had DEC PDP minicomputers hooked up to teletypes, a keyboard attached to a printer that wrote what they typed on form feed paper. (And not even a dot matrix printer, but moveable type attached to a rotating steel ball so there was a hardwired set of symbols it could print.) These printers were so primitive they didn't even know when they'd hit the right edge of the paper, you had to send them a "carriage return" telling them to go back to the left edge of the page, and some of them needed a "new line" character on top of that to move the paper down to the next line.

A tty's printer and keyboard weren't necessarily directly connected: what actually happened was the TTY's keyboard sent bytes to a serial port, and the printer part read bytes from the serial port, so the computer had to echo what you typed back out for it to show up on the paper. Except no two brands of teletype spoke quite the same language: they didn't have the same keys on the keyboard, the same punctuation layout on the daisy wheel, and when they did have common keys/symbols they didn't even use the same bit patterns on the serial port to represent them. So the unix "teletype" support layer (that's what "tty" stands for) had to have a table saying "this byte means backspace" and "add a newline to the output after every linefeed", and so on. (Luckily by the time Unix came along the ASCII vs EBCDIC debacle was mostly over, with everybody outside IBM standardizing on ASCII. Ok, even in the 80's Commodore still had uppercase vs lowercase swapped, and used ctrl-C to switch colors, and everybody used different stuff past 7 bits. But compared to the previous mess, this was totally standardized.)

The late 70's saw the introduction of the "glass tty", a terminal with a CRT built into it instead of a printer, and a graphics card rendering a monospace font. This let you do new things like move the cursor around and replace existing text on the screen, which combined to let you _insert_ text in a line. (Radical new concept, killer app for the new tech.) To do these magic new things, the glass tty devices introduced "escape sequences", multibyte sequences to do the new things printers couldn't do. Of course DEC's VT100 and IBM's TN3270 (and a dozen others) still had incompatible hardwired sequences telling them what to do, so they needed translating, and the tty layer in the kernel didn't know how to handle more than one byte at a time, so userspace software packages like "termcap" were invented to handle that in userspace, solving the problem by adding another layer of indirection and a lot more complexity.

But in the 80's hardwired terminals gave way to PCs running terminal software, emulating the old VT100 and 3270 escape sequences and all the different newline/linefeed/backspace behavior. From this point on, you had software talking to other software. Neither end had hardwired requirements, each one was emulating earlier behavior. Both sides were even using the same lookup tables: the termcap package was replaced by terminfo, which was replaced by curses, which was replaced by ncurses, but they still emulate the same earlier interfaces to support all the earlier software trying to use them.

These days, /lib/termcap/l/linux is a config file describing how a Linux terminal should behave, at least when the $TERM environment variable is set to "linux". There are 38 other files in /lib/termcap/*/* in my ubuntu install, and most of them are named after the pieces of _software_ that want that command set. Things like xterm, screen, rxvt, mach, hurd, cygwin...

Back in the 80's, DOS came with "ansi.sys", which intercepted output before it got to the IBM PC bios screen writing routines, and handled the escape sequences standardized by the American National Standards Institute. 30 years later, everything in the world understands these sequences. If nothing else THAT provides a common subset of behavior that you can expect to just work.
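
To make "output the escape sequences directly" concrete, here's a small sketch covering three of the common ones: cursor positioning, clear screen, and clear to end of line. The function names are invented for illustration; the byte sequences themselves are the standard ANSI ones.

```c
#include <stdio.h>

// Format "move cursor to row, col" (both 1-based) into buf: ESC [ row ; col H
// Returns what snprintf() returns (the length of the sequence).
int ansi_goto(char *buf, size_t len, int row, int col)
{
  return snprintf(buf, len, "\033[%d;%dH", row, col);
}

// ESC [ 2 J erases the whole screen. (Some implementations also move the
// cursor home afterwards, so follow it with an explicit goto if you care.)
const char *ansi_clear_screen(void)
{
  return "\033[2J";
}

// ESC [ K erases from the cursor to the end of the current line.
const char *ansi_clear_eol(void)
{
  return "\033[K";
}
```

Write the resulting strings to the terminal with fputs() or write() and that's the whole "library".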

At this point you'd expect them to pick a behavior, standardize on it, and discard the obsolete emulation machinery. But no, outputting ANSI escape sequences directly is somehow unclean. We've got all these horribly overengineered layers, there must have been a very good reason for implementing these giant piles of crap, and therefore it's wrong not to use them.

Sigh.

June 26, 2014

On a video conference call today. 13 people at 5 sites on 2 continents, and it had one woman (in ireland). The rest were white men.

We're excluding 51% of the population of earth. I think that's an issue. (It's also an issue that various ethnic groups in the united states and people in countries around the world also get excluded, but US women are speaking the same language, going through the same education system, and benefitting from the same infrastructure. Their participation rate being an order of magnitude less than males is a single isolable variable.)

I've been interested in this topic and what (if anything) I can do to help for a long time. Unfortunately, the software industry seems to have decided that the fix for this is not to do anything to help the 51%, but instead to have members of the 49% get surgery and cosplay.

This is why austinlinuxchix.org died: two of the founders (Kandy and Sally) were coworkers of mine at Boxxtech, where I got to see them gradually lose interest due to (as Sally phrased it) the "shortage of factory original females". And when the founders wandered off, the organization collapsed. Kandy went on to work for the CIA, and Sally went off to get a biology degree and exit programming entirely. (Another organization I followed at the time, girlstart, avoided this fate by transitioning from high school to middle school, and they also backed off from programming to general "STEM". I'm still on their mailing list, but 5th graders dissecting squid wasn't quite their original focus back in the 90's. Oh well.)

These days, 2/3 of the "women in tech" I currently follow on twitter have turned out to be trans. (It's an oversimplification to say my apprentice Nichole had a bad experience with "skirtman" in Arkansas and switched majors to pre-med, but technically that's what happened.)

So when lwn.net links to a talk on diversity and says of the presenter "she's a 20 year industry veteran named Coraline"... Neil Gaiman invented the name "Coraline" (originally a typo) in a book published in 2002. I don't even have to view the video to go "this passes about as well as that balrog on the bridge with Gandalf". But obviously this isn't a brogrammer mansplaining because... they're just not. Ok then.

(Note: programming hasn't really ever had a problem with gay men that I've noticed. From the early bluebox hacker "Captain Crunch" to the author of Sendmail to Joel Spolsky, they've always been part of the tribe. Google isn't finding Joel Spolsky talking about "diversity" despite 15 years of blogging, authoring books, hosting a podcast...)

I'm still interested in learning about 51% of the population recovering from millenia of systematic oppresion and how my chosen field can stop being so unrelentingly hostile to them. But white dudes' ability to Make It About Them remains awe-inspiring, and the fix my industry has chosen is for dudes to move over into those seats and continue to outnumber the 51% even among "women in tech". I still don't see how this solves the actual _problem_, but I guess that just means I'm too old.

June 22, 2014

So many weird little corner cases with ccwrap, about 1/3 of which may be my fault ("-shared" is a link option, do I still need to pass it through when I'm manually supplying the right entry/exit code? Well, maybe. Do it to be safe...) and a bunch are the old wrapper passing through silly stuff (duplicate "-static", or "-static" with "-c", or any link options with "-S"...).

So a few fixes go into ccwrap2.c, and a lot of sed gets applied to logrus.txt (the log file recording the input and output lines of the old wrapper for a full aboriginal build). Then my test script feeds the old input line into ccwrap2.c, sorts the output words alphabetically to ignore position differences (some of which I _know_ I don't care about), compares if they match, rinse repeat.

Next up: -nostdlib with -S. Old wrapper, fix with sed... Ah, I got -print-file-name=include and -print-libgcc-file-name wrong. (Yes, there are single and double dash versions of some commands. But not all!) Building uClibc's libpthread has both -nostartfiles and -nostdlib, the second of which implies the first. (Old wrapper was passing through -nostartfiles anyway.)

Fun deviances that don't matter, such as the old wrapper converting --print options to corresponding -print options, so things like "--print-multi-lib" and "--print-multi-os-directory" got converted to the respective single dash versions. The new wrapper passes them through unchanged, which isn't _quite_ right (for qcc I'd have to write my own support for this) but the toolchain I'm building is not multilib so it's printing some variant of "." for all of them. I.E. A NOP at present.

Hmmm, the old wrapper in c++ mode added -fno-use-cxa-atexit to everything, the new one isn't. I _think_ musl doesn't need that, but I'm not sure? Have to test...

June 20, 2014

Poking at the wrapper, finding things that were theoretically wrong with it all along, because gcc developers "never thought of that".

For example, -nostdinc removes all the -I directories but -nostdlib does _not_ remove all the -L directories. So if you go -nostdlib -l libc it _finds_ it. Which is bad when you're cross compiling and don't _want_ it to check the host directories for the wrong category of library. (And don't get me started on -B having -L as a side effect.)

Luckily, the horrible gcc internals have a "CROSS_COMPILE" #ifdef that gets set in gcc/configure if host != target. Which is why I do the target=$arch-unknown-linux host=$arch-walrus-linux trick so they NEVER MATCH.

June 19, 2014

Fade got on the airport shuttle around 4am, off to 4th street. (As much as I enjoyed last year, I'd much rather she have alone time with her girlfriend, and this way they can share a hotel room.) I did ask her to pack some bottles of Pumpkin Woodchuck for Seanan McGuire, who is one of the GoHs. (I am a great fan of her "things ending in -een" oeuvre, and she got another chapter of Velveteen posted.)

The original plan that Nik could visit and overlap with Fade's travel (say hi a couple days and then be here when Fade was out to avoid introvert burnout) hit the snag that Nik went incommunicado and hasn't answered email, twitter, or phone since last month. Oh well. More programming time. Might bike to Fry's, haven't done that in forever. (Yeah, heat and hydration. I know. Need to be more careful this weekend than last.)

Got the first stab at the ccwrap rewrite checked in to the aboriginal linux repository. Through the todo list, now it's debugging time.

One interesting thing is I had the old one log the incoming and outgoing command lines for an entire build (so I can compare what the new wrapper does with the same commands), and it turns out to be fairly hard to get bash to remove "quotes" around arguments. X=$(sed blah | head -n 1) gives me the line in a variable, but echo $X gives me quotes. Feeding it to a shell function and echo "$@" gives me quotes. I have to "eval" a command line to re-parse enough to make the quotes go away.

June 18, 2014

At texas linuxfest I got pointed at yet another BSD variant, so giving it a try. It installed from DVD ok. The menu bar at the bottom isn't the right size at 1024x768 and qemu-system-x86_64 keeps the CPU pegged when the system is presumably idle (and if I suspend it for a while so the laptop fan stops, it freezes on resume until it's delivered all the accumulated synthetic clock timer events).

But those are teething issues: it's a build environment. Fetched toybox source with wget, did "make allnoconfig", and... It died trying to build the allnoconfig binary. Because the make line:

$(HOSTCC) -o $@ kconfig/conf.c kconfig/zconf.tab.c -DKBUILD_NO_NLS=1 -DPROJECT_NAME=\"$(KCONFIG_PROJECT)\"

Becomes:

cc -O2 -pipe ./kconfig/conf.c -o ./kconfig.conf

Note the lack of -DKBUILD_NO_NLS without which it tries to #include a nonexistent header.

Running make with -B: no dice. Tried to use strace: not there. Tried to install strace, the "AppCafe" thing had never heard of it (and searching for just "trace" suggested something called "FontForge" as the "Best Match").

After an hour of head scratching and a bunch of blind alleys, I eventually worked out that kconfig/conf and ./kconfig/conf are considered different targets by BSD make, and that it's triggering an implicit build rule. Fine, take the ./ off of $(obj) in kconfig/Makefile and try again, and it goes:

make: don't know how to make kconfig/zconf.tab.c. Stop.

Apparently the "%.c: %.c_shipped" rule saying "@ln -s $(notdir $<) $@" is ignored. It's not even _failing_, it's not _triggering_. Does % not include path names maybe?

Someday, maybe, I'll get to actually trying to compile _my_ code instead of fighting with the makefile to get it to build old forked kernel code from 2006? That would be nice...

June 15, 2014

Confirmed UCLIBC_HAS_ADVANCED_REALTIME is set in the aboriginal uClibc config, and always was, so the lack of posix_fallocate() is a bug even by their standards. I could do yet another local patch to the obsolete uClibc release, but musl beckons. I've just gotta do the work to take advantage of it...

Rather annoyed this is blocking the mount/umount testing, but there you have it. (Kinda hard to test "umount -a" safely on the host...)

It's ironic that for years, things like busybox and toybox features were blocking sidequests thrown off by aboriginal development. Now aboriginal development is a blocking sidequest to fix my test environment for toybox.

June 14, 2014

Texas Linuxfest: not quite the way I planned.

Took friday off from work to prepare my talk: redoing "the rise and fall of copyleft" in hopes of fitting it in the allotted time this time. (Last time I started with ~12 hours of material and whittled it down to ~3, for a 1 hour timeslot. Since then, I've done more research, but that seldom actually _reduces_ the amount of material there is to cover.)

I thought I'd stop by the convention center and see if any of the Friday programming was interesting. (The website didn't say what the friday programming actually _was_, just which corporate interest was behind it, in about 4 hour blocks.) Then hang out there and do the piles of editing to shoehorn my talk into the allotted time.

Unfortunately, the number 10 bus doesn't stop at second street anymore, it goes from 5th street right across the bridge into south austin. And since that was the 1st street bridge rather than congress avenue, walking from where the bus let me off to the convention center was a half hour, dressed in black, in full texas sun. So I was fairly dehydrated when I got there, and since it's not the kind of event that has a con suite and the only two obvious concession options the convention center had were both closed, I rehydrated by chugging one of the two energy drinks I'd brought with me.

Don't do that. I felt fine for an hour and a half, then my blood pressure crashed so hard my fingernails turned blue. That... was not fun.

Managed to recover enough to go to the speaker's dinner that evening with the nice editor lady from england (who works for apress), but didn't get any talk prep done. (Shoulda just stayed home and done it there.)

Saturday I woke up with a hangover, which I didn't think was an option if you don't drink, but apparently is. I tried to do some editing but kept acting as a social condensation nucleus: sit at a quiet table in a deserted hall and 15 minutes later I'm surrounded by a knot of jabbering people. (Nobody else for 30 feet in either direction, but humans are herd animals and boy does it show.) In theory there was a speaker's lounge, in practice it wasn't labeled and I couldn't find it. Never did wind up getting much editing done, and I was still vaguely out of it when it came time to give the talk, where I actually covered less material than I did in Ohio. Sigh.

Met up with David Mandala and hung around with him and his daughter the rest of the day, so on balance a good day. (He's working for Linaro now. Apparently ubuntu's technical decisions and what David thought was a good idea diverged to the point he couldn't help them anymore. I'm glad xubuntu 14.04 has 5 years of LTS because I'm not upgrading to the one that has systemd; Android on the desktop should be ready by then.)

Sunday I intended to work to make up for taking Friday off, but I was too fried to get anything done.

Moral of the story: if you give yourself borderline sunstroke, do NOT rehydrate with an energy drink. Not even the "recovery" ones. It'll screw up your whole weekend.

June 12, 2014

Tested toybox under the aboriginal build and it broke because posix_fallocate isn't there and fallocate.c is now in defconfig. The posix-2001 function (in the standard for 13 years now) is only present in uClibc if the config symbol UCLIBC_HAS_ADVANCED_REALTIME is enabled.

*boggle*

What does preallocating disk space have to do with advanced realtime? I... what?

My own darn fault, I really want to be testing under musl. Not the last-ever release of uClibc before development finally collapsed after eight years of thrashing and stagnation. Just a lot of intricate and tedious debugging to get from point A to point B. (Reimplement and retest all the things!)

June 11, 2014

The "peek" and "poke" primitives were things I implemented for stat way back, but the rewrite went in another direction (table based made it worse than an if/else staircase). So I have functions in the library, one of which is used in one place (poke is used once in ifconfig), and the other isn't used at all.

I noticed this because dalias wrote a "host" binary to test musl, and sent it to me in case I could use it in toybox. And cleaning it up to use toybox shared infrastructure, I'm finding infrastructure that isn't actually quite shared. :)

What makes this fiddly is A) I want to be able to read integers from arbitrary offsets, and some processors throw a wobbly at unaligned access, B) sometimes I want to specify big/little endian, and other times I want native endianness.
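
One way to square that circle, sketched under the assumption that going a byte at a time is acceptable: assembling the value from individual bytes sidesteps alignment entirely, and the endianness falls out of which end you start from. These are illustrative helpers with made-up names, not the actual functions in the toybox library.

```c
#include <stdint.h>

// Read a "size" byte (1 to 8) little endian unsigned integer from ptr.
// Byte at a time, so ptr can be at any alignment.
uint64_t peek_le(const void *ptr, unsigned size)
{
  const unsigned char *c = ptr;
  uint64_t ret = 0;
  unsigned i;

  for (i = 0; i < size; i++) ret |= (uint64_t)c[i] << (8 * i);

  return ret;
}

// Same thing, big endian.
uint64_t peek_be(const void *ptr, unsigned size)
{
  const unsigned char *c = ptr;
  uint64_t ret = 0;

  while (size--) ret = (ret << 8) | *c++;

  return ret;
}

// Native endianness: look at which byte of a known integer comes first.
uint64_t peek_native(const void *ptr, unsigned size)
{
  const uint16_t probe = 1;

  return *(const unsigned char *)&probe ? peek_le(ptr, size)
                                        : peek_be(ptr, size);
}

// Write the low "size" bytes of val to ptr, little endian.
void poke_le(void *ptr, uint64_t val, unsigned size)
{
  unsigned char *c = ptr;

  while (size--) {
    *c++ = val & 255;
    val >>= 8;
  }
}
```

The tradeoff is speed: a processor that handles unaligned loads fine is doing extra work here, but for parsing headers and packets that rarely matters.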

June 9, 2014

Had to look at some busybox code for work, and that turned into quite the rathole. It's gotten very, very complicated since I last really looked at it (2011 maybe?)

Busybox grew systemd support (CONFIG_FEATURE_SYSTEMD), which only seems to affect sysklogd (and that in a tangential way) but _conceptually_ says bad things.

CONFIG_EXTRA_COMPAT is a global option meaning "grep -z support". CONFIG_USE_PORTABLE_CODE replaces a single variable sized array in "find" with an alloca(). Why are these global options? They're examples of what I refer to as "infrastructure in search of a user": this config option might come in handy for something else, so let's make it global and put it far away from its only user. (Of course, examples like CONFIG_DESKTOP that do become widely used then have the problem that it's hard to figure out what the config option actually DOES.)

CONFIG_FEATURE_SKIP_ROOTFS says to special case rootfs which is dumb because if you're not using it, it's overmounted. This is not special, any filesystem can be overmounted with something else and then it shouldn't show up in df or similar because it's _not_visible_. Toybox df got this right in _2006_.

CONFIG_FEATURE_DEVPTS is in general configuration, not library tuning, even though the only place it's used is libbb/getpty.c. That defines xgetpty() which is used in script.c and telnetd.c... but for some reason if the config symbol isn't enabled, ash.tests in the testsuite is disabled. (Why? Not a clue. Not interested enough to try to dig deep enough to understand it, I'm just noting that nobody _else_ has either. Nobody seems to be doing cleanup, it's all accumulation.)

CONFIG_FEATURE_BUFFERS_GO_ON_STACK is weird because the help text for CONFIG_FEATURE_COPYBUF_KB says "Buffers which are 4kb or less will be allocated on stack. Bigger buffers will be allocated with mmap." So there's two different places to micromanage something that really shouldn't be the user's problem in the first place.

CONFIG_FEATURE_COMPRESS_USAGE is one of those things where if you have zcat (and thus deflate) built in, go ahead and compress the usage messages. If you DON'T have it built in, don't. Making this a separate config option so the user has to decide this is silly.

CONFIG_INSTALL_NO_USR is another "why is this a thing?" There's already a CONFIG_PREFIX which you don't need either: in toybox, going "./toybox" lists all the command names and you can iterate over the output with "for" and symlink. "./toybox --long" gives you the paths. If you want to strip off "/usr", sed exists. (And this is why I don't have CONFIG_INSTALL_APPLET_SYMLINKS either: A for loop with ln is "for i in $(/bin/toybox --long); do ln -s /bin/toybox $i; done". Hmmm, I should have the help for the "toybox" command say that...)

The "Build Options" menu SHOULD NOT EXIST. You can build busybox as a static binary with "LDFLAGS=--static make". The Debugging Options menu exists to do things like add -Werror to CFLAGS.

And don't get me started on misunderstanding the ENABLE_BLAH and IF_BLAH() macros I added. The old CONFIG_BLAH macros are still there: they're either #defined or undefined depending on whether the symbol is set, so the ONLY thing you can do with them is #ifdef or #if defined() preprocessor tests. The ENABLE_BLAH macros are set to 0 or 1, so you can use them in normal tests where they're compile-time constants that drop out, and let the dead code eliminator chop out unused code. The IF_BLAH() macros resolve to their contents when the config symbol is enabled, and nothing when it's disabled. This is a more compact way of doing an #ifdef without requiring three lines each time, and was mostly meant for chopping out varargs.
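
A sketch of the three macro flavors side by side, assuming a made-up pair of config symbols (FEATURE_A switched on, FEATURE_B switched off). In a real build these definitions would be generated from the .config file; here they're written out by hand, and the function names are invented for illustration.

```c
#include <stdio.h>

// Hypothetical generated config: FEATURE_A is on, FEATURE_B is off.
#define CONFIG_FEATURE_A 1          // only useful for #ifdef tests
#define ENABLE_FEATURE_A 1          // always defined, 0 or 1
#define IF_FEATURE_A(...) __VA_ARGS__

#undef CONFIG_FEATURE_B             // off means "not defined at all"
#define ENABLE_FEATURE_B 0
#define IF_FEATURE_B(...)

// ENABLE_ style: ordinary C. Both branches get parsed and syntax checked on
// every compile, then the optimizer drops whatever is guarded by constant 0.
int count_features(void)
{
  int count = 0;

  if (ENABLE_FEATURE_A) count++;
  if (ENABLE_FEATURE_B) count++;

  return count;
}

// IF_ style: chop pieces out of a varargs call without three lines of
// #ifdef/#endif around each piece.
int describe(char *buf, size_t len)
{
  return snprintf(buf, len, "base" IF_FEATURE_A(" a=%d") IF_FEATURE_B(" b=%d")
                  IF_FEATURE_A(, 1) IF_FEATURE_B(, 2));
}
```

With this config describe() boils down to snprintf(buf, len, "base a=%d", 1), which is the sort of job the IF_ macros were meant for. Flow control belongs to the ENABLE_ ones.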

A big problem with configuration options is that your testing is spread over the various codepaths, so the more codepaths you have the less testing each one gets, and it's weighted so your less used ones can get no testing at all. Add in the problems with ifdefs and this can get bad fast.

So what is busybox doing? Tons of #if ENABLE_BLAH lines, and "if (param == PARAM_a IF_DESKTOP(|| param == PARAM_and)) {" (from find.c), because "if (param == PARAM_a || (ENABLE_DESKTOP && param == PARAM_and)) {" would just make too much sense. Or the way coreutils/tty.c selects code with IF_INCLUDE_SUSv2() and IF_NOT_INCLUDE_SUSv2() when if (ENABLE_INCLUDE_SUSv2) { blah } else { blah } would work just fine.

One of those approaches looks like C, has all branches processed and syntax-checked every compile, and does not require a programmer to have any domain-specific knowledge to understand what the code is doing.

The other is basically inventing its own flow control primitives via the macro preprocessor. That wasn't the INTENT of those primitives, but that's how they're being used.

June 7, 2014

Went with Camine to visit Stu, using Austin's Light Tactical Rail. The train itself was nice: air conditioned, the seats were like new, there were maybe three other people on the entire train.

The ticket machines are hilariously bad. The first one took my money, navigated through confusing menus, failed to vend a ticket, refunded more money than I'd given it and declared itself out of service. The second machine vended a ticket along with an 'electronic change card' (I'd given it exact change), and then I couldn't figure out what to DO with the ticket: no turnstile, no conductor, we just got on the train when it arrived and the doors opened. Apparently buying tickets is on the honor system?

(Yes, I'm calling two cars a train. Benefit of the doubt and all.)

On saturdays, it doesn't start running until just before 5pm. (Why?) It only goes all the way to Leander (where Stu is) on weekdays, on saturday it stops at Lakeline. The Lakeline station is nowhere near the lakeline mall, instead it's several blocks away across 183 in the middle of nowhere, with no obvious pedestrian way to get anywhere.

Why is it "the red line" when there's exactly one stretch of track in use? Is there a non-red line?

When we got to Lakeline station, there was a bus stop saying the bus continued on to Leander. We _could_ have taken the bus to get there and ignored the light rail completely, but the bus is run by Cap Metro so it's incompetent.

When I first got to Austin adding light rail was a perennial voting topic, where the people of Austin wanted it but did NOT want it handed over to Cap Metro (you will never find a more wretched hive of nepotism and incompetence, and that was before their operating funds got embezzled). But Cap Metro had the political leverage to prevent anybody else from getting it, and kept bringing the topic up for a vote every year until enough new people had moved to Austin who didn't know the history that they went "light rail, what a good idea" and gave it to Cap Metro.

So now we have train stations in the middle of nowhere, which nobody's going to build anything around until the train runs a reasonable schedule. The train won't run a reasonable schedule if all the stops are in the middle of nowhere, and it's run by Cap Metro. Stalemate.

June 5, 2014

Checked in some mount code but... there's a snag. Autodetecting bind and loopback mounts involves statting the first argument to see if it's there (and if so what type it is: dir on dir or file on file is a bind mount, file on dir is a loopback mount)... except what if it's a network filesystem? It's theoretically possible that any random name could exist, so we could misdetect a loop or bind mount when they specified -t cifs.

What I've checked in only detects loop and bind if they didn't specify a -t, but for loopback mounts "-t ext2" is a valid thing to do.

What I want is to check for loopback only if we're mounting a block backed filesystem. Those are the non-nodev ones in /etc/filesystems, but that only lists loaded modules. If they haven't tried squashfs yet and it's a module, it won't be in the list until the attempt is made (which will load the module because the kernel will trigger a userspace callback to do the module load).

I think what I have to do is try the mount once, and then try _again_ as loopback if the first one fails? Because otherwise I have no idea what filesystem type they're doing...
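The stat-based guess described above, as a minimal sketch. The enum and function name are made up for illustration, and this is just the classification rule, not the code that got checked in. It deliberately knows nothing about -t, which is exactly the part that's still unresolved.

```c
#include <sys/stat.h>

enum guess { GUESS_NONE, GUESS_BIND, GUESS_LOOP };

// Guess from the two paths alone: dir on dir or file on file is a bind
// mount, file on dir is a loopback mount. Anything else (including a first
// argument that doesn't exist, or a block device) is left to a normal mount.
enum guess guess_mount(const char *dev, const char *dir)
{
  struct stat src, dst;

  if (stat(dev, &src) || stat(dir, &dst)) return GUESS_NONE;
  if (S_ISDIR(src.st_mode) && S_ISDIR(dst.st_mode)) return GUESS_BIND;
  if (S_ISREG(src.st_mode) && S_ISREG(dst.st_mode)) return GUESS_BIND;
  if (S_ISREG(src.st_mode) && S_ISDIR(dst.st_mode)) return GUESS_LOOP;

  return GUESS_NONE;
}
```

A caller that was given -t cifs would have to skip this entirely, since the first argument there isn't a local path even when a local file happens to have the same name.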

June 6, 2014

On twitter somebody said that Gen X didn't have much YA so they wound up reading Heinlein, and now regret what it did to their brain. (I replied that I read Podkayne of Mars, Friday, The Cat Who Walks Through Walls, and the first half of Stranger in a Strange Land until the point where the protagonist suddenly had a personality-altering stroke and formed a cult, at which point I skipped to the end to see "they're all dead but in a good way" and decided not to bother with the rest. I also have a copy of "Job, a comedy of justice" but never made it more than 5 pages in.)

More than one person has said my experience with Heinlein was like my experience with X-files, where I saw the episodes "slug boy", "better living through cannibalism", and the one where some guy's shadow ate people. These are, apparently, the three worst episodes the series ever did, but I feel I gave it a fair chance.

But the bit that stuck with me was the "off the top of my head" list of titles to refute the "no YA books" claim. In two tweets I listed the Mad Scientist's Club, The Ship Who Sang, the Pern Harper Hall series, the Oz books, stuff by Alexander Key, the Amber books, the Riftwar saga (and "daughter/servant/mistress of the empire" side-series), Dragonlance, Arrows of the Queen, Andre Norton's Witchworld and Star Kaat series, Anna to the Infinite Power, and The Girl with the Silver Eyes. (And I was trying to list "The Watcher in the Woods" but couldn't remember the name.)

Two thirds of those books had female protagonists. Not only was Dorothy Gale the protagonist of books 1, 3, 4, 5, and 6, but in book #2 the male protagonist was born female and turned back at the end. (Yes, spoilers; the Oz book with the transgender queen came out in 1904. As with comic books, there seems to have been a significant step _backwards_ between then and now so we can re-fight these battles today.)

This got me thinking about why I don't understand modern transgender issues again. I don't understand how gender matters enough to fight _for_ a gender role. Fighting _against_ them sure: a man can bake, a woman can do construction work, they shouldn't get in the _way_. (I'm aware shouldn't and doesn't aren't the same thing: the pervasive anti-woman prejudice in this culture is horrible and stupid and needs to be _fixed_. But we used to treat the Irish about as horribly as any other persecuted group, fixing this sort of thing is possible.)

But "I'm really an X, and it's fundamental to my identity that I be recognized as such"... if you fill in "vietnam veteran", "type O universal blood donor", "descendant of Abraham Lincoln"... it is possible to be in or not be in any of those groups. It is not easy to join any of those groups after the fact. Needing to _redefine_ those groups to include you and have everyone else acknowledge the new definition of said group... I don't get it?

If you couldn't identify with a female protagonist growing up, your YA reading list was indeed a bit truncated. (And this was before Tamora Pierce got on the scene.) The switch from Tip to Ozma was handled so matter of factly it just didn't _matter_, the queen had been kidnapped as a baby and hidden via transformation, so they had to undo the transformation to make her queen again. (I think he/she had the option to stay as Tip and _not_ be Queen, but it wasn't a very appealing one given the witch who'd raised Tip being fairly nasty and now defeated. The _next_ time Ozma was kidnapped (a few books later), she was sort of genie-ified and hidden in a peach pit in the middle of an orchard, and a large expedition led by Dorothy had to go on a fetch quest for her. It's that sort of setting.)

The gender of the protagonist of these books didn't _matter_, at least not any more than any other aspect of their characterization. (Some of the protagonists of the books I read were aliens. And not just humanoid aliens like Alexander Key's Tony and Tia, but The Star Kaat series had chapters from the cats' point of view. Diane Duane's Book of Night With Moon had that too, and her Star Trek book The Wounded Sky had a giant glass spider alien as just another character. The Pern books had dragons participating in politics. I could put myself in their ~~shoes~~ talons, but didn't look down at the end of the book and go "clearly, I am a dragon, and must publicly be recognized as such at school or Life Is Pain". I could also put myself in _my_ ~~shoes~~ flip-flops.)

I think I understand physical sex (including intersex and klinefelter syndrome and so on), understand attraction (homosexual/bisexual/asexual...), understand transvestites, and understand effeminate/tomboy/butch/flaming/androgynous presentation. But unilaterally declaring oneself to be a woman isn't any of those. It seems like people are fighting to change the definition of "woman" to retroactively include them... and seem to think that sufficient cosmetic surgery will somehow help? If you want to redefine away the word "woman" the way the word "literally" now means "not literally" so we haven't got a word that specifies the original meaning anymore... ok? But how does shaving your beard help this process? I don't understand the motivation here.

My friend undergoing hormone injections tweeted "The only feminists I'm scared of are the ones who want to eliminate gender roles altogether, rather than just equal." I think that was directed at me? I don't see how gender roles are different than racial roles. "That's a thing women do/don't do" means the same to me as "that's a thing black people do/don't do". I don't see either statement as something to defend, let alone increase? What am I not seeing?

May 30, 2014

Loopback mounts! Another fun instance of cross-command sharing.

I don't want to put the losetup stuff in lib/lib.c because posix is missing huge chunks of obvious 20 year old functionality. In order to setup loopback associations, losetup has to #include <linux/loop.h>, and including linux/*.h does not belong in common code (in case BSD becomes interesting someday). Which means mount needs if (CFG_LOSETUP) and calling functions out of the other command.

When I wrote losetup, I had it save the loopback file that "losetup -f FILENAME" attaches to into toybuf. But: losetup also uses a lot of xopen() and error_exit() and so on, and we don't want mount -a to exit halfway through the list because it hit an error with one of the entries.

Once again, some sort of popen() for toybox commands looks like something I need to implement...

May 29, 2014

Toybox infrastructure sharing question: I'm banging on mount and I want to call "fstype" (an alias for blkid that just prints out the filesystem name), except I basically want to do "X=$(fstype)" but from C. I.E. fork a child process and capture the output, with the newline at the end stripped off.

There's a couple ways I could go about this. At some point the shell needs infrastructure to create a pipe pair and getline() the result, and if that was implemented I could just reuse it here. But it's not yet. Should I diverge to do that now? (I can't use popen() because there's no guarantee toybox is in the $PATH and I don't really want to re-exec anyway, just fork() and interact with the child.)

At the other end of things, I could implement a "save it to toybuf" magic flag and just call blkid_main() nofork, but that's kind of ugly. I could check for *toys.which->name == 'm' to see if it's called from mount (the way it's distinguishing fstype/blkid now) and just save the result to toybuf, but that stanza wouldn't optimize out if blkid was built standalone, and it's the kind of incestuous inter-command knowledge I've tried to avoid. (mount can check if (CFG_BLKID) and call blkid, but blkid shouldn't have a "come from" for mount.)

I could also factor out the filesystem detection code into something mount could call and get a char * back, except that blkid needs more than that, it also needs "label" and "uuid", which are bundled together into a structure that would have to move to toys.h or lib/lib.h to be shared, but returning a pointer to the structure instance isn't good enough because ext2/ext3/ext4 are distinguished by a separate test once we've got the structure (checking feature flags in the superblock after matching the magic number).

Sigh. However I do it, the granularity's slightly wrong.

Hmmm... the default behavior of mount is to try to open all the filesystems in /etc/filesystems if the filesystem type isn't specified (or is "auto"). These days the kernel puts ext2 after ext3/ext4 in that list, so the sequencing works out. The fstype stuff is just an optimization, trying to find the type more quickly, and potentially identifying types that would require a module load. I can defer implementing that until after I implement X=$(subshell) support in toysh, and then reuse the common infrastructure for that. (I don't want to implement said infrastructure now, because the shell is likely to be the more demanding user with job control and such, so I'd most likely have to redo it later for the shell if I did just what mount needs now.)

It _looks_ like I need a ptoyopen() and ptoyclose() that work more or less like popen() and pclose() except with xexec(). (And if I need xexec_optargs() the caller does the setup themselves.)
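The "X=$(fstype) but from C" idea, as a rough sketch: fork, point the child's stdout at a pipe, read the result in the parent, strip trailing newlines. None of this is toybox code. capture_output() and fake_fstype() are hypothetical names, the child runs a plain function pointer instead of xexec(), and there's no job control, which is the part the shell would actually need.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical stand-in for a command's main function (think blkid_main()).
int fake_fstype(void)
{
  puts("ext4");
  return 0;
}

// Run fn() in a forked child with stdout pointed at a pipe, and collect what
// it prints into buf (up to len-1 bytes), minus trailing newlines.
// Returns the child's exit status, or -1 on failure.
int capture_output(int (*fn)(void), char *buf, size_t len)
{
  int fds[2], status;
  size_t used = 0;
  ssize_t got;
  char junk[256];
  pid_t pid;

  if (!len || pipe(fds)) return -1;
  fflush(NULL);  // so the child doesn't inherit (and re-emit) buffered output
  pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (!pid) {
    // Child: stdout becomes the write end of the pipe.
    dup2(fds[1], 1);
    close(fds[0]);
    close(fds[1]);
    status = fn();
    fflush(NULL);
    _exit(status);
  }

  // Parent: read until the buffer is full or the child closes its end.
  close(fds[1]);
  while (used < len-1 && (got = read(fds[0], buf+used, len-1-used)) > 0)
    used += got;
  // Discard anything that didn't fit, so the child isn't killed by SIGPIPE.
  do got = read(fds[0], junk, sizeof(junk)); while (got > 0);
  close(fds[0]);

  buf[used] = 0;
  while (used && buf[used-1] == '\n') buf[--used] = 0;

  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
  return WEXITSTATUS(status);
}
```

Calling capture_output(fake_fstype, buf, sizeof(buf)) leaves the child's output in buf without the newline puts() added. Because the child is a fork() of the current process rather than a re-exec, nothing here depends on what's in the $PATH.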

May 28, 2014

I'm trying to listen to Tim Bird's keynote speech in the Linux Foundation's "oh yeah, this url will totally still be here in a year because we've learned NOTHING about archiving for posterity, why would we care about the future, we're the Linux Foundation!" web page.

And they've got some random Intel vice president opening for him, the other guy's speech glued into the same mp3. So I'm listening to this VP boast that Intel has dropped all support for 32 bit targets and doesn't build (or regression test) anything 32 bit anymore, as if that's a good thing.

Intel has built yet another completely automated build environment that lets you churn out an android system without having to do anything, or understand anything, because the ability to maintain software once it ships is not Intel's problem.

Their IDE runs on windows! Woo.

They have an html5 app builder that's _not_ tizen! It's got its own html5 runtime they wrote, crosswalk, crosstalk, something like that. Because this wheel totally needed reinventing.

They have "great compatibility", and a "compatibility layer" but never said "with what". (Presumably "with the arm version of android", since they're still trying to shove x86 down the throats of phone and tablet vendors. Your PC _will_ run an IBM S/360 mainframe processor, because how can it _not_?)

They're working to make sure "all the ecosystem pieces are in place". Lovely.

You know how a mobile processor like arm gets good performance? By not wasting work. That means things as basic as branch prediction, speculative execution, conditional assignment, and so on, optimization techniques which consume power only to discard the result a lot of the time, aren't necessarily a win when running from battery. They up the absolute performance but not the power consumption to performance ratio.

So yes, Intel has great performance when running from wall current with a cooling fan (or at least a giant heat sink). But the mobile space is very different. And _in_ that space, arm does a pretty darn good job. You can't run old windows binaries on your phone or tablet, just like you couldn't run VAX code on a PC.

But Intel and Microsoft are totally strapping rocket engines to this turtle, and are going to keep spending all the money until it takes off. (At the other end of the candle, this gives us ACPI for Arm because Windows RT, and they try to shoehorn it into Linux. Sigh.)

May 27, 2014

Chipping away at toybox mount, doing the no arguments behavior of listing current mount points. (Basically a simplified version of the existing df logic, except it needs options.)

And... I don't understand what the host mount is doing. It says "/run" has option size=10%, but /proc/mounts shows size in kilobytes, adding up to about 1.6 gigs on a box with 8 gigs ram, and pulling up python using the size given by "top" for free memory, we have:

print 1579840.0/7899188
0.200000303829

I.E. 20%. So where is it getting 10%...?

Ah. There's an /etc/mtab file. Which of _course_ doesn't match reality (they never do). Guys, this went away in 2005. It doesn't WORK with shared subtrees, which are a component of containers. (Heck, it didn't even work with chroot.) You have to ask the kernel what the mount points are, you can't maintain a file with a global list of a per-process attribute.
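Asking the kernel instead of trusting a file somebody has to keep up to date looks something like the sketch below. The assumptions: a Linux-style libc that provides getmntent() in <mntent.h>, and mount_options() is a name made up for this example. The function takes an already-open mount table so it can be pointed at whatever the caller wants; on a live system that would normally come from setmntent("/proc/mounts", "r").

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

// Copy the options reported for mount point dir into buf.
// Returns 1 if dir was found in the table, 0 otherwise.
int mount_options(FILE *fp, const char *dir, char *buf, size_t len)
{
  struct mntent *me;
  int found = 0;

  // A later mount on the same path covers an earlier one, so keep going
  // and let the last match win.
  while ((me = getmntent(fp))) {
    if (strcmp(me->mnt_dir, dir)) continue;
    snprintf(buf, len, "%s", me->mnt_opts);
    found = 1;
  }

  return found;
}
```

Run against /proc/mounts, this would report the size in kilobytes for /run, not the "size=10%" a stale /etc/mtab claims.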

Meanwhile, /proc/mounts is reporting "relatime" on most of the mounted filesystems, despite this bit being filtered out by the kernel. (It's not just the default behavior, the old "spam the log with atime updates" behavior got _removed_. I investigated this last year (eep, almost two years ago now).) Is it spitting back what it initially got on the mount call, even though the kernel doesn't use it? Or is it going "you didn't specify any _other_ behavior so this is the default"? Sigh, more research needed...

May 24, 2014

Three day weekend! (An unexpected one, it wasn't until a co-worker said "see you on tuesday" on the way out the door friday that I looked up the holiday calendar at $DAYJOB. I get memorial day off. Yay!)

I need it because I'm trying to catch up on toybox and aboriginal. Both have fallen out of sync with the kernel releases (currently at -rc6) and I want to get toybox's 1.0 release out this year if at all possible.

May 23, 2014

Scott Fraser on twitter suggested I try an Arch livecd with installer, but I think I've figured it out: the toybox library probe isn't using $CFLAGS but the actual build is. So if you have an insane build environment that provides different libraries for static and non-static builds, and you feed --static in $CFLAGS without mentioning to me that you did that (or ever trying a non-static build), it could break the way reported.

I tried the livecd, by the way. It booted up to a desktop I didn't recognize, which offered no obvious icons and in which right click on the background offers two options, "settings" and "change background". It had "activities" on the toolbar in the upper left, which [trigger warning] pulled down a unity-like sidebar (shudder), containing six icons: cnchi, files, chromium, empathy, xnoise, and show applications. I have no idea what cnchi, empathy, or xnoise are, files and chromium aren't a terminal, but "show applications" might do it. Except that popped up an icon grid, no obvious way to scroll, the only thing terminal-like on it was "Root Terminal". I clicked on that, it prompted me for a password, I left it blank, it said incorrect password.

Note: I am booted from a livecd, and it wants a password for the root user. This is a Linux desktop with no obvious way to start a terminal as the logged-in user.

This livecd is obviously "still Arch". I have no idea who the target audience is, but clearly not me. I've run out of interest in it.

May 20, 2014

Email from Stephen, who can't get toybox to build under arch linux. I've built toybox under ubuntu, fedora, gentoo, slackware, debian, suse, aboriginal, pclinuxos (whatever that is)... but not arch.

The log he attached says:

Compile toybox...
/usr/bin/ld: cannot find -lutil
/tmp/cckhpcDA.o: In function `do_id':
id.c:(.text.do_id+0x1ae): warning: Using 'getgrouplist' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
[a dozen more of that...]
collect2: error: ld returned 1 exit status
Makefile:8: recipe for target 'toybox' failed
make: *** [toybox] Error 1

Which is just _weird_. Ok, arch linux isn't installing the dynamic libraries in its linker path by default, only the static ones? But that's just warning spam, the actual build break is -lutil not found. Except scripts/make.sh runs a loop trying each library and only including the ones that exist, so how...?

Time to install arch! It has... archlinux-$BLAH-dual.iso. What does "dual" mean? Also bootstrap-*.tar.gz, and an arch subdirectory with more stuff in it... (arch/boot/x86-64/archiso.img?)

Backing up a level, it has an "installation guide" that says they used to provide a complete install image, but they no longer support installing _without_ network access. That's nice. What does _dual_ mean?

Ah! I remembered the line from Blues Brothers, "We have both kinds of music, country _and_ western." It's referring to i686 and x86-64. Of course. No support for anything heathen like arm.

Ok, I booted under kvm and... got a shell prompt at root. No installer, no explanation... This thing learned _nothing_ from Knoppix. (Oh well, that only came out 11 years ago, you can't expect them to have learned anything _that_ quickly.) Back to the darn installation guide.

Wow, this is like installing gentoo. You manually partition and everything. (I typoed "mount /dev/sd1 /mnt" and it complained that /dev/sd1 was read-only before saying it didn't exist.)

Ok, gentoo-level micromanagement of keyboard mapping, timezones, locales, fstab, hostname, whatever mkinitcpio is (do we _need_ a ramdisk?)... setting a root password but _not_ creating a non-root account. (Because why would you want to do that?)

The "install and configure a bootloader" bit is just sad. Click on grub it goes "grub or grub legacy" which I DON'T CARE so back up and do syslinux, scroll past the "it'll eat your brain!" warning about btrfs, and... it says install the syslinux package but does not say _how_.

This is the point at which the Arch install guide disappears up its own ass. It has not, as of yet, said how to run the package manager. I know from other people talking about it that it's called "pacman", but apparently pacman does not have an "install" command. It has help, version, database, query, remove, sync, deptest, and upgrade, but no install. Because why would a package manager install packages? And we google... which says that "sync" (-S) is what it calls install, and -Ss is how you search. (Which you can't figure out from the help.)

Ok, time to reboot, and... the initramfs died waiting for /dev/sda3 when there's only sda1. What? The instructions _said_ to append genfstab output to /mnt/etc/fstab, it did _not_ say to edit the fstab stuff that was there. (Where did it get sda3 from, anyway? There isn't one.)

I can mkdir /mnt, mount /dev/sda1 /mnt, chroot /mnt, and then when I try to 'exec init' it dies with "Trying to run as user instance, but the system has not been booted with systemd".

Terrific. What a distro. Because if you have to do this much manual micromanagement to put the system together, _obviously_ what you want the result to be running is the giant black box of magic impenetrable heuristics that is systemd.

May 19, 2014

The date command is commonly extended to understand nanoseconds, both with .ss on the end of the posix date format, and via %N in the conversion specifiers.

Except... the conversion specifiers are basically just strptime(). But strptime doesn't understand %N. How utterly GNU to have extended date but NOT strptime(). (Why do a generic fix when you can add a special case? It's the FSF way. Ok, you'd need a nanoseconds field appended to struct tm, which could conflict with a future standard extending it, but you could submit your change to them as something obvious for them to do. Or create a "struct ntm {struct tm tm; int nanoseconds;};" and try to resist declaring "struct ntm *uncle_henry, *a_twister;"...)
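For what it's worth, the struct ntm idea is about this much code. This is a sketch under assumptions, not anything date actually does: strptime_ns() is a hypothetical wrapper name, it only handles a fraction stuck on the end of the input (it does not teach the format string about %N), and it needs _XOPEN_SOURCE defined before the first #include to get the strptime() prototype on glibc.

```c
#define _XOPEN_SOURCE 700   // must come before any #include for strptime()
#include <ctype.h>
#include <time.h>

// A struct tm with a nanoseconds field bolted on.
struct ntm {
  struct tm tm;
  int nanoseconds;
};

// Parse str with strptime(), then accept an optional ".digits" fraction of
// a second on the end. Returns a pointer to the first unparsed character,
// or NULL if strptime() rejected the input.
char *strptime_ns(const char *str, const char *format, struct ntm *out)
{
  char *end = strptime(str, format, &out->tm);
  int digits = 0;

  out->nanoseconds = 0;
  if (!end || *end != '.' || !isdigit((unsigned char)end[1])) return end;

  // Keep at most nine digits; anything past that is below a nanosecond.
  for (end++; isdigit((unsigned char)*end); end++) {
    if (digits < 9) {
      out->nanoseconds = out->nanoseconds*10 + (*end - '0');
      digits++;
    }
  }
  // Scale short fractions: ".5" is 500000000 nanoseconds, not 5.
  while (digits++ < 9) out->nanoseconds *= 10;

  return end;
}
```

So strptime_ns("12:34:56.5", "%H:%M:%S", &t) fills in the hour, minute and second the usual way and sets t.nanoseconds to 500000000.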

May 16, 2014

Every time I run Thunderbird, it says:

GLib-CRITICAL **: g_slice_set_config: assertion `sys_page_size == 0' failed

Most people don't notice because they don't run it from the command line, so these messages go to /dev/null. But this is why I don't like assertions (they're often wrong), and one of many, many reasons why I don't like glib. (This is its idea of "critical", a thing that can apparently be completely ignored, and how could it be true anyway, system page size of zero would mean what?)

Of course the biggest problem with "glib" is the g. Which stands for "gnu", which means "the code is just a sideline, the real goal of the organization is spreading a religious doctrine". (Well it can't be political if they're not lobbying to change any laws...)

May 15, 2014

In toybox I want to add a generic signal handler to set toys.signal, and write to toys.signalfd if that's set. Except... toys is initialized to zero, and zero is a valid file descriptor. Ok, it's stdin, but if you daemonize you can close stdin and then the next open file descriptor would be zero and that's potentially a quite reasonable usage case with a sharp edge.

Sigh. I could instead initialize it to -1 in toy.singleinit (which is the "not a filehandle" value open returns on error, so The Right Thing). Probably the less disgusting option, but it's more scattered complexity...
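The -1 version, sketched out. Everything here is a stand-in: toys_sketch is a made-up two-field struct rather than the real toys global, and sketch_init() plays the role of the one-time init function. The handler only calls write(), which is on the async-signal-safe list, and puts errno back afterwards.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical stand-in for the relevant bits of the global state.
// As a global it starts out all zeroes, which is the problem: 0 is a
// perfectly valid file descriptor.
struct toys_sketch {
  volatile sig_atomic_t signal;  // last signal seen, 0 for none
  int signalfd;                  // fd to poke on each signal, -1 for none
} toys_sketch;

// One-time setup: mark signalfd as "not a filehandle".
void sketch_init(void)
{
  toys_sketch.signal = 0;
  toys_sketch.signalfd = -1;
}

// Generic handler: record the signal, and if somebody asked for it, wake
// them up by writing one byte (the signal number) to signalfd.
void generic_signal(int sig)
{
  int saved_errno = errno;

  toys_sketch.signal = sig;
  if (toys_sketch.signalfd != -1) {
    char c = sig;
    ssize_t rc = write(toys_sketch.signalfd, &c, 1);

    (void)rc;  // nothing useful to do about a failed write in a handler
  }
  errno = saved_errno;
}
```

With this, a daemon that closed stdin and then got fd 0 back from pipe() can still set signalfd = 0 and have it work, because "unset" and "zero" are no longer the same value.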

May 14, 2014

Ah, the Cherokee Freedmen are probably a better example than "royalty" when describing misfile. Clearly artificial state, neither better nor worse than not being that, but fighting to keep it because it's who you see yourself as.

And Glenn Greenwald's Fresh Air interview about Snowden: once again, no mention of Manning. (Wikileaks and pentagon papers: yes. Any comparison with the other giant leak: no. I sort of expected Terry Gross to jump in with Manning's solitary confinement when the discussion was about Snowden staying overseas because he couldn't get a fair trial here, but apparently she didn't want to diverge into the "Who's Chelsea" bit either? Dunno.)

May 9, 2014

So busybox defconfig doesn't build under aboriginal with the new busybox version, and the culprit is busybox tr. If I put busybox tr in the $PATH and rebuild busybox defconfig on an ubuntu host, it doesn't work.

Working out an isolated reproduction sequence outside the aboriginal build (so I can bother the busybox guys with it) turns out to be nontrivial. For starters, they took out scripts/individual in git commit 02788ac7e2a4. How nice.

Ok, do allnoconfig, then menuconfig and forward slash search for TR... ok, 8 gazillion hits, try prepending an underscore... Didn't help.

Ok, let's find it in the menus... wow, that's a lot of menus. Ok, after looking in the 8th guess and not finding it, back out, "find . -name tr.c" says it's in coreutils? I looked there. Ah, the menu is not in alphabetical order. It _sort_ of is, but not really. Great.

Now to figure out what I need to switch back on under the "busybox" menu to get the "default" behavior defconfig would provide (so building just one command hasn't disabled support for --long-options or some such that would break this usage _anyway_...)

As someone said on twitter earlier today, "by Mycroft and Rassilon": FIVE SUB-MENUS!

Ok, HERE is your reproduction sequence. (Rassilon help you.)

make defconfig
make
mv busybox tr
make clean
PATH=$PWD:$PATH make

(Wow. It's... kinda hard to use busybox these days. There's a REASON my approach to toybox has evolved to the point where most commands have no sub-options, and a grand total of seven global options in the "toybox" menu. When I first specced out toysh it had 11 sub-options, and plans to add more as I went along. The one in my tree currently has one sub-option: "Interactive Shell". You can build one for doing scripts only, or you can build one with -i and all the funky command line editing and history and stuff. But all that paradox of choice stuff in between? Nobody cares. Make it small and simple, if it's not worth doing don't do it.)

Back to busybox: unfortunately, that didn't reproduce it. Which means the weird linky bit where it can't figure out it needs libm, so the build breaks? That only happens when linking against uClibc. So I guess I'm not surprised they didn't notice, but it's still a deficiency in busybox tr (it links against uClibc when you're _not_ using busybox tr) and thus a reason for me to write a toybox tr that _doesn't_ have this problem.

(And move to musl, which presumably doesn't have this problem either. Working on both. Still spread a bit thin...)

May 8, 2014

Chipping away at dd. Remembering why this command's been on the todo list for so long. It's theoretically easy and practically a forest of corner cases.

Interesting hiccup in the command line option parsing: if you have two commands with the same name, sed can't match up the lines and produces one with the wrong number of entries, so the converter aborts early but doesn't flag it as an error. So you get a truncated list, and commands after that say TT isn't #defined. (Which doesn't let you know what the problem is.)

Fiddled with it a bit to try to produce a better error message...

May 7, 2014

Watching society change around me, trying to figure out what's fad and what's permanent.

Three people on my twitter feeds just got their first tattoo. Is that a thing that's going around now? I guess laser tattoo removal makes that less of a big deal than it once was. Last decade it was piercings. I don't really care about that either way.

The erfworld kickstarter lists a bitcoin donation total. Pretty sure bitcoin's a fad: being declared property and not a currency, so you have to report price fluctuations while it's in your possession on your taxes as a capital gain or loss, adds enough overhead to the thing that it's probably going to get really unpleasant. But... dogecoin just sponsored a car in some race. (Or, more accurately, somebody sponsored a car on behalf of dogecoin.) Despite starting as a joke, that one at least isn't a horrible deflationary goldbug attempt to FUD the entire concept of cryptocurrency with a libertarian asshole implementation thereof. So... will that still matter in 10 years? I'm really hoping it's beanie babies and I don't have to care.

I still don't get the "sex != gender" thing. I've tried, and I don't get it. I know nobody cares whether or not I do, but I'm on the wrong side of this one, and it bothers me (and makes me feel old).

I get tomboys and lesbians and gay men and bisexuals and transvestites and so on... I just don't get "my body isn't female but I am" as a statement people find meaningful to the point of being passionate about it. I don't understand what that statement is supposed to _mean_. "This color isn't _optically_ purple, it just has purpleness." Um... ok?

This is some sort of social construct that's apparently passed me by, a gender _role_ people want to inhabit. I grew up with the women's liberation "anything you can do, I can do better" thing, with women wearing men's clothes and insisting that imposed gender roles weren't _real_. I internalized that. And on the male side, my response to the hulking Conan the Barbarian types on the football team was pretty much "please try not to drip sweat on me, I don't want to catch jock itch, ah you threw my textbooks in a mud puddle again and this is your idea of hilarious"... generally not wanting to get involved with performed "masculinity" in any way. I never had anything to prove to people who cared about that, because people who cared about that were some strange alien species. (If I put a dozen random people in a building and had them play hopscotch, it wouldn't be nationally televised, so why are these people chasing a ball considered such a big deal?)

Most of these roles were going around calling yourself king due to strange women lying in ponds distributing swords, as it were. Graduating from college didn't make me a College Graduate as a different type of person, it meant I'd obtained permission to say something on my resume and people working for some bureaucracy would agree if you phoned them to check. I used to say about my first professional job, "there's no such thing as IBM, just a bunch of people pretending". I overcame "impostor syndrome" early on by figuring out everybody _else_ had it, with the possible exception of the Dunning-Kruger crowd. Makeup and dresses and suits and shaving were just more forms of play-acting our society expected, the suit and tie you wore to job interviews and funerals was every bit as silly (and stylized) as a rubber red nose and clown shoes, and if you don't wear anything they lock you up because our society can't handle what human beings actually look like.

I suspect I don't get gender roles for the same reason. In 7th grade I was the only boy in the choir, but I liked singing so why wouldn't I join it? I also took home ec, because it seemed to be teaching more useful things than most of the other classes. If there were no "jobs only fit for men", then there were no "jobs only fit for women".

Now people are insisting that these gender roles are not only real but that reversing them has a deep meaning to their identity. And I go "um, so the furries who insist they're _really_ an ocelot, or otakukin or something, and they must be able to show up to work dressed as a badger or their rights are being infringed... this is different from that?" How is this different from insisting that you're _really_ asian, and moving to live there and insisting the Japanese treat you as a native and never comment on your caucasian features standing out like a sore thumb in tokyo? Not "I want to move there" but "pretend I'm not what I was born as, or that makeup and surgery can erase the difference"? If I developed a hormone imbalance and gained 300 pounds I literally couldn't lose without surgery, nobody would be allowed to mention it because I'm _really_ a thin person and even if I choose not to have liposuction I can dress as a thin person and you all have to tell me how thin I look because it's not my fault I'm _not_ thin and it distresses me so much I get to pretend it's not the case and you all must play along. Not "ignore my weight", but specifically tell me I'm _thin_? Yes, society has issues about this, but instead of confronting prejudices against fat people and trying to fix society, we just pretend there's a "thin" that has nothing to do with the state of people's bodies, and insist that responding to what you actually see is just horribly insensitive because this delicate flower needs to be lied to except it's not lying but seeing a deeper reality?

Sigh. I know what side of this I should be on. Old men when I was growing up telling boys to cut their long hair so they don't look like a girl: they were wrong. But on the gender roles thing I went to "this is a social construct, it's not real, they're all just people", and the current generation is going "no, it's completely real, this is fundamental to my identity, challenging my beliefs is painful to me, the world must change to match my conception of myself". Um... ok?

Maybe it's a religion. Never hugely got those, either.

Oddly enough I follow, and enjoy, the webcomic misfile, one of the characters of which is a boy changed into a girl by supernatural means, trying to turn back into a boy despite his life actually being better as a girl in a number of measurable ways. To me, it's a bit like the "prince and the pauper" story, him being raised as a boy is like him being raised in a medieval castle, it's an artificial state but it's the cultural context of his upbringing and what he spent his life preparing for, and therefore something to fight to regain rather than adapting to a world in which it is no longer the case. This doesn't mean there's actually such a thing as "royalty", it's still just a bunch of people pretending. But Emperor Norton apparently had a nice life, or at least the Sandman comics said so...

I dunno, I'm used to being attacked by people who insist that "they" isn't an appropriate gender neutral pronoun (it was the default pronoun until sexist grammarians changed it to "he" in the 18th century), and that they get to pick what pronoun you use to refer to them with things like "Ze", and that it is _offensive_ that I refer to them by any other pronoun. (That conversation just about devolved into the "word the Knights who say Ni cannot hear" bit from Monty Python and the Holy Grail.) I normally avoid this topic because there are some really raw nerves, but... I don't get it. And it _bothers_ me that I don't get it.

Speaking of monty python, the "I want to be called Loretta" bit from Life of Brian sums up my attitude to this fairly well: blank incomprehension. What's the _point_? I'm sure that skit is now considered horribly insensitive these days. Outdated. Society has moved on. I'm trying to understand what everybody younger than me seems to instinctively get, and I'm stuck back at blank incomprehension.

My chosen field (computer science) has a horrible gender imbalance and I try to do what I can to change that, or at least pay attention to the people who are fighting the good fight. But I watched "austin linuxchix" dissolve a decade ago due to a shortage of what Sally and Kandy (two of my co-workers who belonged to it) referred to as "factory original females". The people born female were so outnumbered they all lost interest and wandered off, and since they'd founded the local branch of the organization it rolled to a stop without them. I'm sure that makes them evil and something-ist for not spending all their time working towards somebody else's agenda. That's not the only "women in tech" event I've seen get derailed by "I'm a woman too, your scholarship must pay out to me or you're racist grandma". I watch old white men constantly make everything be about them... and here come men saying they're women now so this too must be about them. Oh well, if the women running the organization this time want to let their goalposts move, it's their show...

Sigh. I want to understand this. A good friend of mine's going through hormone treatments right now (after years of suicide attempts, hospitalization, alcoholism, and extensive body modification, although of course this isn't just another symptom) and I want to support them. But trying to understand isn't the same as actually understanding. It looks to me like people hurting themselves, which is of course their right as adults, but apparently I'm seeing it wrong?

If Michael Jackson had done all that plastic surgery to himself until his nose fell off because he hadn't wanted to be male (instead of hadn't wanted to be black), would that have made a difference? Either way, it was his body and his right to do it, it's the "no this is healthy, and that's who he always _really_ was, so pretend he was never _not_ that and has now turned into a perfectly executed whatever it was he was trying to achieve" aspect I'm wondering about.

I've never seen an article about Snowden mention Bradley Manning, because you have to say "Chelsea Manning" and then explain that's not a british punk rock group from the 1970's, nor does it have anything to do with "Milton Keynes", instead that's who Bradley Manning turned into after a year of solitary confinement and sleep deprivation (being woken up every 15 minutes all night every night), but no that's not a sign of a mental breakdown but a liberation of who they always really were and it's a good thing they decided that this expression of self was totally worth making all news reports about them dry up and blow away so the government could do anything they like without notable opposition.

Meanwhile, Eddie Izzard wears dresses and makeup (has a whole "executive transvestite" bit explaining it as part of his act), but still goes by "he". (Not even gay, just likes dressing that way.) He wants to be treated as a man dressed in what's traditionally women's clothing, and if women get to wear business suits then of course. Turnabout is fair play. (Corollary to that whole do unto others bit.)

But... Izzard is _not_ transgender. He's a man dressing as a woman, and is quite clear on that. I think I understand what Izzard is doing. I don't understand "no, this actually makes me _be_ a woman". I don't know what that's supposed to _mean_.

I'm a mostly non-practicing nudist, because long ago I asked "why do we wear clothes" until I worked through the unthinking biases to actually _see_ the social conventions. But I wear stuff every day anyway, to make other people comfortable. And it turns out I get about 25% more money if I wear shirts with collars to work instead of t-shirts, which is _insane_ but fairly reliable. (And a decade ago I hit the point where I'm more effective interviewing in work clothes than in a suit and tie, not sure if that's because I'm "looking the part" (and _not_ looking like a recent graduate or corporate cog but like, as they say in show business, "the talent") or because the suit and tie is dying in our culture.) All of which adds up to "Society is both weird and complicated", but doesn't make any of it _real_. A generation ago they wanted a suit and tie every day while _doing_ the job, now it would just look weird. A generation before that you needed a hat, so on back to elizabethan neck ruffs. This too shall pass. You humor the social conventions of the day to avoid getting arrested, unless you feel protesting the conventions is worth getting arrested for.

Society does strange, powerful things to people. You can get dancing and laughter as contagious diseases. Five years ago there was a panic about, I kid you not, penis theft. All this can make you sick, make you bleed, even kill you. So saying it's "not real" when they stone you to death because you ate cheese on a tuesday and offended the great broccoli... people fight wars over ideas all the time, and the wars are real. The "weapons of mass destruction" in Iraq didn't have to exist to cause the deaths of thousands of american soldiers. People have to live in a social context, and a lack of social interaction is bad for your health.

But this "sex != gender" thing... I don't get it. I see the social hole where people are pointing, between "gay", "transvestite", "misogynist", "gender assignment surgery", and so on... but it's totally Emperor's New Clothes for me. There's _nothing_there_ that I can emotionally connect with.

And it's making me feel really old and out of touch. It's not like not getting football or rap music, that was "my parents moved me from kwajalein to new jersey at age 10, and I do not want any part of new jersey". But this... Society is doing a new thing I do not grok, and it's because I'm old. I'm being insensitive because I do not sense this. I should shut up and smile and nod and play along because This Is The Future, but I do not actually understand.

Still trying to, though...

May 3, 2014

Saturday again.

Still slowly grinding away at posix's tr command, but it's got a lot of non-obvious edges. Basically, the sucker works in a lot of different modes that interact in non-obvious ways.

The base mode is "tr IN OUT" where IN is a series of characters, and OUT is a corresponding series of characters. Each character of IN is replaced with the character in the same position of OUT. So in the naive case, the position of each character in each character set matches, which implies the length of the two sets is important. But the lengths not matching isn't an error: if OUT is shorter than IN, the last character of OUT gets repeated to fill out the rest of the space. If OUT is longer, only the ones matching positions in IN ever get used, so the end of it is ignored.
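To make the padding rule concrete, here's a minimal sketch in Python (not toybox code, which is C). The names build_tr_map() and tr() are made up for illustration, and having a later duplicate in IN override an earlier one is an assumption on my part, not something the description above settles.

```python
def build_tr_map(in_set, out_set):
    """Map each character of in_set to the character at the same position in out_set."""
    if not out_set:
        raise ValueError("OUT must not be empty")
    # A short OUT is padded by repeating its last character.
    padded = out_set + out_set[-1] * max(0, len(in_set) - len(out_set))
    mapping = {}
    # zip() stops at the end of in_set, so any extra OUT characters are ignored.
    for src, dst in zip(in_set, padded):
        mapping[src] = dst  # assumption: a later duplicate in IN wins
    return mapping


def tr(text, in_set, out_set):
    """Translate text, leaving characters that are not in in_set untouched."""
    mapping = build_tr_map(in_set, out_set)
    return "".join(mapping.get(ch, ch) for ch in text)
```

So tr("hello", "elo", "ab") maps e to a and both l and o to b, and an OUT longer than IN just has its tail ignored.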

Then there's the [a-z] ranges, which a quick check of the gnu/dammit version shows doing:

$ echo hello | tr [g-va-b] woot
oettt
$ echo hello | tr [g-vb-a] woot
tr: range-endpoints of `b-a' are in reverse collating sequence order

Well of course. You can also say "[g*8]" which expands to 8 repeats of g. And there's some escape sequences so \n and \0 and such are matchable. (Except posix doesn't _explicitly_ say \0, instead it says "octal digit", maxing out at 3 digits, so \012 is the same as \12 but \0123 is \12 followed by 3.) Remember to double the \\ if shell escape processing is going to eat one, and that 'tr a "\n"' works for different reasons than 'tr a \\n'.
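The "at most three octal digits" rule is easy to get subtly wrong, so here's a rough Python sketch of just that part. The function names are hypothetical, and it deliberately leaves every other backslash escape (\n, \\, and so on) alone.

```python
OCTAL_DIGITS = "01234567"


def parse_octal_escape(s, i):
    """s[i] is a backslash followed by octal digits. Return (character, next index)."""
    j = i + 1
    value = 0
    # Consume at most three octal digits after the backslash.
    while j < len(s) and j < i + 4 and s[j] in OCTAL_DIGITS:
        value = value * 8 + int(s[j])
        j += 1
    if j == i + 1:
        raise ValueError("no octal digits after backslash")
    return chr(value), j


def expand_octal(s):
    """Expand \\NNN sequences in s. Other escapes are passed through unchanged."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in OCTAL_DIGITS:
            ch, i = parse_octal_escape(s, i)
            out.append(ch)
        else:
            out.append(s[i])
            i += 1
    return "".join(out)
```

With that, "\012" and "\12" both come out as a newline, while "\0123" comes out as a newline followed by a literal 3.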

But the really fun ones are the dozen or so isdigit() variants of the form "[:class:]", such as "[:alpha:]". Given that we care exactly which characters that expands to, in what order, and how many of them there are... and given that I want to support utf8...

Tricksy hobbitses.

Posix says that only upper and lower are allowed as substitution targets, matching each other, but that doesn't address things like:

echo "one two three" | tr [:space:][:alpha:] abcdefghijklmnopqrstuvwxyz
zzzfzzzfzzzzz

And yes, that implies the gnu/dammit version does indeed expand to the whole darn set. Which is _crazy_ to try to do in the utf8 namespace, you're looping through tens of thousands of entries unless there's domain knowledge of character layout (is klingon official?) that would simplify matters.
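For a feel of what "expand to the whole darn set" costs, here's a brute-force sketch in Python. It uses Python's own str.isalpha() and friends as stand-ins for libc's iswalpha() family, which is an assumption: Python's Unicode tables won't match any particular libc or locale exactly. The expand_class() name and the CLASS_TESTS table are made up for this example.

```python
import sys

# Stand-ins for libc's isw*() functions. These follow Python's Unicode
# database, not any specific libc or locale.
CLASS_TESTS = {
    "alpha": str.isalpha,
    "digit": str.isdigit,
    "upper": str.isupper,
    "lower": str.islower,
}


def expand_class(name, limit=sys.maxunicode + 1):
    """Return every character below limit that is in the class, in code point order."""
    test = CLASS_TESTS[name]
    # Brute force: test every single code point, which is the expensive part.
    return [chr(cp) for cp in range(limit) if test(chr(cp))]
```

Restricting limit to 128 gives the familiar 52 ASCII letters for "alpha". Leaving it at the default walks all 1,114,112 code points (sys.maxunicode is 0x10FFFF), which is exactly the loop you don't want to run on every invocation.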

I was hoping I could just do a 0-128 table and then any character above that have a slow path matching function against the pattern string that either takes a character and gives its position within the set, or a position within a set and returns the character. So the real world use is fast and the theoretical cases are right. I still sort of can, except I need to know the full character set each isboing() macro expands to.
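Here's roughly the shape I mean, sketched in Python rather than C. It assumes the pattern has already been reduced to an ordered list of code point ranges, which sidesteps the character class problem entirely, and it assumes the first range containing a character wins. The TrSet name and its methods are invented for the sketch.

```python
class TrSet:
    """A tr set stored as ordered (lo, hi) code point ranges, inclusive."""

    def __init__(self, ranges):
        self.ranges = list(ranges)
        # Fast path: precomputed position for each of the first 128 code points.
        self.fast = [self._slow_position(cp) for cp in range(128)]

    def _slow_position(self, cp):
        """Walk the ranges to find cp's position in the set, or -1 if absent."""
        base = 0
        for lo, hi in self.ranges:
            if lo <= cp <= hi:
                return base + (cp - lo)
            base += hi - lo + 1
        return -1

    def position(self, ch):
        """Character to position: table lookup for ASCII, range walk otherwise."""
        cp = ord(ch)
        return self.fast[cp] if cp < 128 else self._slow_position(cp)

    def char_at(self, pos):
        """Position to character, or None if pos is past the end of the set."""
        for lo, hi in self.ranges:
            size = hi - lo + 1
            if pos < size:
                return chr(lo + pos)
            pos -= size
        return None
```

Translating a character is then out_set.char_at(in_set.position(ch)), plus the repeat-the-last-character rule when the position runs off the end of OUT. The cost of the slow path scales with the number of ranges in the pattern, not with the size of unicode.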

(I could generate a table at compile time, make it a bitmask... except the libc should already be _doing_ this...)

(What's the highest defined utf8 character anyway? Ok, technically utf8 is an encoding and "unicode" is what I should be saying for the integer it expands to. I prefer to handle utf8 and leave unicode to people who speak it, but in this case I can't.)

Sigh. There are times when "if I'm going to do this, I want to do it _right_" bumps against "this is genuinely a hard problem to address corner cases it's highly unlikely a single person other than me will ever notice, let alone genuinely care about".

That's why this one keeps getting kicked down the todo list every time it bubbles to the top...

Hmmm. Musl does this via a header file called alpha.h which has a big comma separated list of bytes "generated by a simple tool based on the code posted previously to the mailing list". On or before April 23, 2012 according to the git commit's datestamp. Well that's nice. And "git log alpha.h | grep" has no other hits since then. Time to track down a mailing list posting...

April 26, 2014

Saturday!

The toybox cpio command isn't doing chown(), or setting the time right, or detecting hardlinks. The third one needs a hash table (to store inode, device major, and device minor), and although we've got hcreate(), hsearch(), and hdestroy() in posix I've never actually encountered code using them.
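The hardlink bookkeeping itself is small. Here's a sketch of the idea in Python, with a dict standing in for whatever hash table the C code ends up using. The find_hardlinks() name and the tuple layout of its input are assumptions for the example, not how toybox does (or will do) it.

```python
def find_hardlinks(entries):
    """entries: iterable of (path, dev_major, dev_minor, inode, nlink).

    Returns a list of (path, first_path), where first_path is the earlier
    path sharing the same file, or None if this is the first time it's seen.
    """
    seen = {}  # (major, minor, inode) -> first path seen for that file
    result = []
    for path, major, minor, inode, nlink in entries:
        key = (major, minor, inode)
        if nlink > 1 and key in seen:
            result.append((path, seen[key]))
        else:
            # Only files with more than one link need to be remembered.
            if nlink > 1:
                seen[key] = path
            result.append((path, None))
    return result
```

In a real archiver the tuples would come from lstat() on each file: in Python that would be os.major(st.st_dev), os.minor(st.st_dev), st.st_ino, and st.st_nlink. Skipping the table for nlink == 1 keeps it small, since most files only have one link.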

April 25, 2014

Oh good, if I use the qemu-system-ppc from the qemu 2.0 release, its idea of -hdc lines up with the kernel's. So I don't have to fiddle with the powerpc image more, just require the current qemu release to run aboriginal linux powerpc system images.

(I feel bad about that, but they can always do manual setup for the old one. The automation scripts have always been a bit hazy between "automation" and "example code so you can set up your board".)

April 23, 2014

When I reply to the first release of ubuntu with systemd in it being called "Utopian Unicorn" by saying "I roll to disbelieve", people ask me what's wrong with systemd. I can point them at various technical articles on the subject, but the important issues are more fundamental. There are several problems with systemd unrelated to code quality:

The first issue is that Linux has always been a commodity white box OS built from interchangeable parts available from multiple sources, just like the PC. Systemd goes out of its way to break that, and in doing so represents an enormous loss of flexibility.

Linux systems are modular, with multiple implementations of anything important. If you don't like the gnu tools, use the BSD ones, or busybox. If you don't like openssh, use dropbear. Early in its history Linux switched from libc5 to glibc, then uClibc and dietlibc showed up, and now there's musl. When udev happened I wrote mdev. Even when we had a dominant piece of infrastructure like gcc it was implementing standards (c89/c99 and posix) and there were always prominent forks (gcc vs egcs happened long before pcc and llvm). Browsers, word processors, pdf viewers, chat programs, desktops, you name it: if it's worth doing we did multiple versions of it. We were _not_ a monoculture. The IETF standards process requires two interoperable implementations for good reason.

This provides a lot of flexibility. When putting together a linux from scratch system, I can run init=/bin/sh. If I do that PAM and SELinux and such won't work, but I can turn those off. It's a modular system, and none of the modules is actually _required_.

Even the kernel (which _defines_ Linux, and is thus the one thing that has some sort of excuse to always be there) is internally modular, via menuconfig, so you can switch off almost everything it does. You're not even required to be able to run ELF executables. Everybody switched over from a.out to ELF in 1996, but if you want to use a.out, support is still technically there. The kernel even has pluggable modules so you can add or remove things at runtime, which lets people disagree about whether they should be using proprietary video drivers or open source reimplementations, and that very flexibility is what allows those open source reimplementations to be developed alongside those proprietary versions. The forcedeth and nouveau drivers (and dozens more) exist because we can have multiple implementations of the same functionality, and people are free to say "no, I don't want to use that, I want to do this instead".

Back around 2003 Alan Cox took a year off to get a master's degree because he didn't _want_ his kernel tree to displace Linus's (which it almost did until Linus adopted source control). People who didn't like the xfree86 license change were free to fork x.org. The idea that people might disagree with you and want to do it another way is baked in.

Busybox is just about the definition of a monolithic integrated userspace linux package. Busybox is a giant monolithic thing I worked on for years, building a single swiss-army-knife binary that incorporates/implements hundreds of other commands. Busybox copied the kernel's kconfig infrastructure to configure it, allowing you to switch off everything it does. (You can literally build an "allnoconfig" busybox that doesn't do _anything_.) And if you don't like the swiss-army-knife binary, you can build each of its commands as individual binaries, and even get a shared library of the busybox common code (which busybox itself doesn't use). If you want to use gnu tar with busybox's gzip, or vice versa, go for it, we _explicitly_ made that work. One of the central design ideas of busybox is letting you _not_ use it, to select as much or as little of it as you needed for your particular setup. Our _most_ monolithic userspace package goes out of its way to interoperate with alternate implementations of the same functionality.

But systemd isn't like that. It's a single all or nothing blob, highly integrated from a single development team with a strong "not invented here" syndrome that presents a moving target. Even if you were interested in cloning it (or a compatible subset of it), there's no "it" to clone. No spec, no clear goal...

There are a bunch of _other_ init attempts. MacOSX did launchd, which is being ported to freebsd _and_ doesn't have to run as PID 1. (Reason Linux hasn't cloned it yet: xml config files.) Ubuntu did upstart, but ubuntu also did unity, and mir, and pointed /bin/sh to dash, and their launchpad is based on the obsolete "bazaar" source control system... People not following Ubuntu's technical lead is pretty deeply justified at this point. There's also svinit and openrc and so on, and presumably more to come. Those of us waiting to see how it shakes out feel we're having systemd shoved down our throats by a development team actively threatened by the idea of people _not_ using their stuff.

The biggest "other init" is Android's. To me this is probably the biggest issue: Systemd's license is incompatible with android, and it's made itself hard to clone.

Android has its own init process (a hardwired monstrosity that reads a config file that _looks_ like a shellscript, but isn't). Some of us have been pondering what an init upgrade path for Android looks like, and consider this a big as yet unsolved problem.

Android also has a no GPL in userspace policy, so it won't be using the existing implementation of systemd _ever_. (If this was easy to overcome, Android would have incorporated busybox. Busybox predates android, still doesn't ship with it. Android isn't adopting systemd before it adopts busybox, which means it isn't adopting this implementation of systemd, ever.)

But even if it wanted to, android can't easily clone systemd, because it's (intentionally) a moving target with no standard and no documentation. The only way to figure out what systemd does is to reverse engineer the magic implementation and match its behavior _exactly_ because there's no spec to appeal to. Whatever it does is how you do systemd, by definition.

So the systemd developers are saying a billion android devices aren't interesting, that a shrinking pool of desktop PCs (as the smartphone kicks their comparatively big iron up into "the cloud") are what's interesting, and that the embedded world should take its cues from Red Hat Enterprise.

Other historical examples of this kind of magic implementation include websites that "work best in internet explorer", excel and word back before _massive_ effort went into cloning them, flash, and perl. (Which is why Perl's attempt to rebase on Parrot for version 6.0 languished for a decade; Perl's own developers couldn't clone it.) And now systemd.

The systemd developers are responding to upstart and launchd and android init as things they must _defeat_, and establish a new standard by crushing all the competing implementations. This means developers who want gradual staged transitions, and thus ask questions like "what if I don't want to switch yet", or "how do I get the old behavior out of the new thing", are enemies of systemd. Those questions are anathema to the systemd plan for world domination, if you're not using their stuff already you're the enemy, a relic of history to be buried. We can't opt out and see how it goes, we must fight to stay where we are. The systemd developers are basically taking the Microsoft approach to development: they don't want you to have the option of NOT using their stuff.

Again, this is not how Linux _ever_ worked. Linux maintained the 2.4 kernel for a full decade after 2.6 was out, x.org beat xfree86 on its own merits, ncurses and termcap coexisted for a decade. But the systemd developers are actively hostile to the _idea_ of non-systemd systems, because they want to establish themselves as The Standard, and they see the rest of us "waiting for a git" (and then actually _using_ mercurial's git interoperability mode) as enemies obstructing their grand vision of The One True Way.

To be comfortable upgrading, a lot of people want to have a fallback position. This isn't a Linux-specific issue: windows users staying with XP for years after microsoft wanted them to move forced its end of life date to be moved several times. Python 2.7 got a 10 year time horizon for support, because nobody wants to go to 3.0. People were free to stay with xfree86 until _they_ were ready to move to x.org, the change was never forced upon them.

We've been down this road before with devfsd, and there was a time when x11 wouldn't work without hald. They went away. The systemd guys are blustering that they're not the same kind of blind alley, and they have Red Hat behind them just like devfsd did.

This is why resisting systemd is justifiable. Yes, sysvinit needs replacing the way cvs needed replacing. But systemd ain't git. Having Ubuntu shoehorn people to Bazaar and Red Hat shoehorn people to Monotone does not improve matters, this is not the new system we need. Making it ubiquitous won't make it a better system.

I firmly believe that the Linux community needs to derive a new workstation OS from Google's Android code the way BSD derived a new system from AT&T's Unix code. Clone the proprietary bits, install on the same hardware, provide our version as a series of upgrades to the preinstalled version where possible until we can get vendors to preinstall our version, and meanwhile be closely compatible with the existing preinstalled version that has the giant userbase.

Systemd has no role in this. It has excluded itself, and that makes systemd an irrelevant dead end.

April 22, 2014

So let's try a thing. In the toybox source dir:

mkdir -p good bad ; for i in $(sed -n 's/.*TOY(\([^,]*\),.*/\1/p' toys/*/*.c); do PREFIX=good/ scripts/single.sh $i || touch bad/$i; done

That tries to build every NEWTOY() and OLDTOY() in toybox as a standalone command using scripts/single.sh. The ones it can build wind up in "good". The ones it _can't_ build wind up in "bad".
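If the sed expression is hard to read, this is approximately what it extracts, sketched in Python: the text between the last "TOY(" on a line and the next comma. The toy_names() helper is hypothetical, and the sample lines in the usage note are made up to look like toybox macros, not copied from the source tree.

```python
import re

# Greedy ".*" before "TOY(" picks the last "TOY(" on the line that is
# followed by a comma, like the sed expression. The capture group is
# everything up to the first comma after it.
TOY_RE = re.compile(r".*TOY\(([^,]*),")


def toy_names(lines):
    """Return the command name from each line containing a NEWTOY()/OLDTOY() macro."""
    names = []
    for line in lines:
        match = TOY_RE.match(line)
        if match:
            names.append(match.group(1))
    return names
```

Feeding it lines such as 'USE_CAT(NEWTOY(cat, "u", TOYFLAG_BIN))' and 'USE_EGREP(OLDTOY(egrep, grep, TOYFLAG_BIN))' gives back cat and egrep, and lines without a TOY( macro are skipped, same as sed -n with the p flag.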

Afterwards, we have 170 commands in "good", and can do a quick smoketest on them ala:

cd good ; for i in *; do ./$i --help | grep -q "usage: $i" || echo $i; done | xargs

Which says the ones that don't identify themselves via usage: strings are: arping brctl clear dhcpd dumpleases false fsck getty groupdel last more nbd_client sh syslogd telnetd top true watch.

Some of that's because "Usage:" is capitalized, and some is because there isn't a usage line. All of that should probably be fixed.
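To tell those two failure cases apart instead of lumping them together, the check could classify the help text rather than just grep it. A small Python sketch, where check_usage() and its three labels are names I made up for the example:

```python
def check_usage(name, help_text):
    """Classify --help output as 'ok', 'capitalized', or 'missing'."""
    needle = "usage: " + name
    if needle in help_text:
        return "ok"
    # Same string but with different capitalization, e.g. "Usage: top".
    if needle.lower() in help_text.lower():
        return "capitalized"
    return "missing"
```

Anything that comes back "capitalized" is a one character fix, and anything "missing" needs an actual usage line written.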

In the bad directory, we've got: addgroup adduser cd chown deallocvt delgroup egrep exit fgrep fstype ftpput groups halt logger mv nc pkill poweroff sha1sum toysh traceroute6 udpsvd unix2dos zcat

Most of that is OLDTOY():

sed -n '/OLDTOY/s@.*TOY(\([^,]*\).*@\1@p' toys/*/*.c | sort | xargs

Produces: addgroup adduser chown delgroup egrep fgrep fstype ftpput groups halt mv nc pkill poweroff sha1sum toysh traceroute6 udpsvd unix2dos. Another, zcat, is using the new "two NEWTOY() in one file". I need to upgrade the infrastructure to deal with that.

Another two, cd and exit, are shell builtins. So building them standalone makes no sense. (That's what TOYFLAG_NOFORK currently means.) I've pondered adding some kind of TOYFLAG_CANFORK for things that could be built standalone, but aren't required to run in a separate process context, but cleanup for all the error_exit() cases (often implicit in xmalloc() and such) is a bit of a pain. Yes, it can do longjmp() for those, but resource cleanup is a pain. Freeing memory, closing files, even undoing mmap() are the easy bits, there's so much _subtle_ state implicit in setsid(), chdir(), unshare(), nice(), umask(), where stderr points... for the moment I'm just going "simple is good" and requiring at least a fork() so you can exit. Needing to exec is optional, but forking and letting the OS clean up the debris covers a plethora of sins.
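The "fork so the OS cleans up" approach is easy to demonstrate outside of C. In this Python sketch (POSIX only, since it uses os.fork()), run_forked() is a made-up helper: the command runs in a child process, so whatever it does to the working directory or umask dies with the child.

```python
import os


def run_forked(func):
    """Run func() in a child process and return its exit status (-1 if it was killed)."""
    pid = os.fork()
    if pid == 0:
        # Child: run the command, then exit without returning to the caller.
        status = 1
        try:
            status = func() or 0
        finally:
            os._exit(status)
    # Parent: wait for the child. None of the child's state changes reach us.
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    return -1
```

A command that does chdir("/") and sets a hostile umask before exiting leaves the caller's working directory and umask exactly as they were, with no cleanup code written for either.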

That leaves deallocvt and logger, both in pending. The problem with both is "shared infrastructure neither in the current file nor lib": deallocvt depends on openvt, logger depends on syslogd, each uses infrastructure from a completely separate command that it doesn't share a file with. (This command needed to use the two NEWTOY() in one file thing, but it predates it.)

So, the fixes for those are to put deallocvt in openvt, and put logger in syslogd, both using the new CLEANUP infrastructure.

April 19, 2014

This post was a reply to the patentless public domain software idea, and it seems to boil down to an objection that if we don't solve the entire problem in one go, we shouldn't do anything.

There's two levels of objection here, what we _can_ do and what we _should_ do.

On the "can" front, public domain software offers opportunities that licensed software does not. Licensed software has already been extensively litigated; you can't easily make a new argument through the courts about it. With public domain software you can point at standing (the authors have no more rights over this software than anybody else), historical precedent (back when public domain software was common, there were no software patents), possibly make a constitutional case about the stated purpose of intellectual property being served, and so on.

On the "should" front, we can break this into two stages. 1) Can we get this specific narrow thing more easily than a larger more ambitious goal? 2) Where would we go from there?

A limited, easily bounded, easily defined exception is easier to get through a dysfunctional system full of strong opposition. This doesn't threaten pharmaceutical companies, which are to patent law what Disney is to copyright.

The very fact there _isn't_ as much public domain software right now is an advantage if it makes it nonthreatening: we have to build everything all over again in order to take advantage of this, that gives them a few more years of trolling and corporate mexican standoffs. (Of course given that toybox is reimplementing what busybox reimplemented after gnu reimplemented it after bsd reimplemented it after the bell labs guys wrote it in the first place, and that's ignoring all the _side_ branches like minix and coherent... I'm not hugely worried about the need to write new code.)

Encouraging the creation of more public domain software is not a bad thing. Requiring all software to drag along legal boilerplate is a sad relic of the pre-internet days and something we should really outgrow. The only people who really consider elaborate open source licenses a _superior_ approach (instead of something we had to put up with in the days of floppies and 300 baud modems with long distance bills) are the copyleft crowd. I can see that abandoning the GPL seems like a loss to them, but if the FSF releasing GPLv3 fatally split "the GPL" into incompatible camps, moving on is actually progress. (Perhaps some people will stay and mope for the rest of their lives, with decreasing relevance. Or maybe they'll come up with a GPLv4 that miraculously unifies the Linux and GNU codebases instead of further fragmenting stuff. The future is hard to predict, but I'm not holding my breath.)

It's also an argument we can pull out if and when we get sued: it gives public domain software authors incentive to fight patent suits because we have a potential upside, and that potential might help fund a kickstarter to cover legal fees if enough other people want to see how it plays out. (Of course you'd try to invalidate the patents in question in parallel...)

Let's go through the clauses of the post's objection:

1) DOD procurement regulations are not the same as copyright law, which is not the same as patent law. If open source is so easily defined, what is the OSI for?

The Open Source Initiative just redid its board again and is consulting with the FSF about going forward. This is an organization that tries to adjudicate what is and isn't open source. Their open source definition is up to version 1.9. It has 10 clauses. They've approved sixty nine licenses.

As for FUD against the public domain being a problem: that's not case law, that's public perception. You change public perception by not buying into it and publicly stating the opposite, and the whole being the change you want to see thing. (So if you're looking for a bicentennial quarter, carry a drum and wear the appropriate hat.) We _need_ guys like Lawrence Lessig defending fair use in order to still have fair use. I think the public domain is a good thing, and I think it's a more coherent response to the FSF splitting the GPL into warring factions than the Dan Bernstein "just don't ever state what the legal status of your code is, put it out there and ignore anyone who brings up licensing" that became so popular on Github last year.

2) Precisely because public domain software is currently small, it's nonthreatening. If you start litigating the future of android, large entrenched interests descend from the clouds to deadlock over the status quo. Small voices can be heard over small issues, and writing new public domain code is not a big deal.

Pursuing this doesn't _block_ other people from trying to dismantle patent law entirely, or get a copyleft carveout. A "huge shift in the imagination of the movement" means what? That it's not what we're currently doing doesn't mean we shouldn't start.

3) Public domain is smack dab in the center of this "commons". You can't get more commons than public domain. If putting stuff in the public domain is a price, some of us are already paying it. Asking for something in return for sunk costs isn't a big deal for those of us already releasing public domain software.

4) Everybody I've spoken to who is not a lawyer has a strong opinion about the law. Including the people "releasing code into the public domain under the GPL". (Yes really, more than 10k Google hits each for "public domain under GPL" and "public domain under the GPL".)

Asking programmers to be lawyers is a _bad_idea_, but it's inherent in depending on specific license text to do specific things. Back when GPLv2 was "the GPL" and you only had to ever care about a single license (and everything else was either perfectly convertible to that, or it wasn't), this was just about manageable. But GPLv3 destroyed that: the Linux Kernel and Samba implement two ends of the same protocol, each end under a GPL, and they can't share code.

5) Licenses being inferior to the public domain is what my sadly incoherent Ohio LinuxFest talk was about.

For a couple decades, the GPL was synonymous with copyleft. It was a category killer license. But GPLv3 rendered copyleft moot by irreparably shattering "The GPL" into multiple incompatible factions. In the absence of a universal receiver the under-30 crowd is switching en masse to universal donor (hence the resurgence of BSD variants), but without the actual public domain acting as a category killer in that space, nobody using a near public domain license can quite re-use anybody else's code without answering whether ISC and 3-clause BSD are more or less compatible than ISC and 2-clause BSD. Until then, "not invented here, let's reinvent the wheel" reigns supreme.

Meanwhile, over in the BSD world, you get long accumulations of license text as projects just append license after license together into an ever-increasing compost heap of legal boilerplate. Dropbear is an interesting example: it's based around the public domain libtommath and libtomcrypt, each of which have a simple one sentence declaration placing the code in the public domain. But Dropbear itself has a long multi-page license compost heap anyway, because the rest of the components are BSD variants that say "copy this slightly different phrasing of the same thing, verbatim", so you can only append, never simplify.

(Note: if there was anything _wrong_ with the public domain, it would be wrong with dropbear, which depends on large swaths of public domain code for everything it does. But people act like a random third party taking it and relicensing it somehow fixed that.)

The main thing public domain does that BSD doesn't is allow the compost heap of duplicate boilerplate license text to collapse together. We have the internet now, we don't have to glue permission statements to every single file in the tarball to avoid losing track of the code's status.

Then there's the crowd that's decided software copyrights are as bad as software patents and they're not going to participate. "No license specified" was the #1 license on github for the same reason Napster was so popular, and some portion of that is widespread civil disobedience opting out of a broken system while waiting for its collapse.

License fragmentation is not going to _improve_. Adding more licenses can't solve the problem of too many licenses. THAT is what the public domain accomplishes. It's the zero of licenses, and "solving for zero" cracks open lots of problems.

Promoting the public domain is the only way I can see of returning to a common state where I know a large pool of projects can share code without endless uncertainty, armchair lawyering, and the xkcd strip about standards. Now that GPLv2 has been dethroned, no one license is going to become "the license" the way that once was. But _no_ license can become the license, because "it's either public domain or it's not" is still a valid distinction that means programmers don't have to be lawyers.

Public domain is when the author has retained _no_ special rights, and placed _no_ restrictions on the use of this code. Whether they used CC0, the unlicense, zero clause BSD, or simple statements like the ones on libtommath and libtomcrypt, they're all interchangeable. If you took CC0 code and put it in libtomcrypt, you wouldn't have to commemorate this forever with an entry in an endlessly increasing LICENSE file (and cthulhu help you if you ever had to figure out what the contents of that LICENSE file actually _meant_ in any sort of legal context).

April 18, 2014

Yes, I am crazy enough to implement a "tr" with utf-8 support.

(Because busybox 1.21.1 won't build defconfig using its own tr, the link fails trying to figure out what libraries to include. No, I didn't notice before because I wasn't building defconfig, just the restricted config aboriginal needs minus all the toybox stuff. Well _obviously_ the correct fix is to write a tr that can build this package.)

April 17, 2014

Ah, "stdbuf" is a command now. Ok.

It's good that people are finally implementing various obvious missing pieces I spent the whole of the 2000's complaining about lacking. When I submitted "count" to busybox back around 2004 they turned it into pipe_progress (in an epic bout of bikeshedding). I removed the "touch -l" option from toybox when I found out about the truncate command. Now somebody FINALLY made a buffer command part of the standard set. (Still no ratelimiting though. Found one that did that a few years ago, but it was kind of overdesigned.)

I await the "loop" command to plug the output of a pipeline back into the input at the start of said pipeline. (Alas, from poking at it myself I think it has to be implemented as part of the shell. It's one of my many toysh todo items.)

April 16, 2014

I hope Bloomberg has good luck with his single issue advocacy group to counter the NRA, but I don't think it'll work.

The reason I don't think it'll work is anybody with a "stop thinking and do what I say" button in their heads was already grabbed by the Republicans to buttress the racist core of the confederacy that switched poles when LBJ signed the civil rights act, and went from "voting against Abraham Lincoln because the Emancipation Proclamation stopped slavery" to "voting against Lyndon B. Johnson because the Civil Rights Act stopped Jim Crow".

I wrote about this at length last year.

April 11, 2014

I'm sad that Python no longer follows its own "There should be one obvious way to do it" dictum. It would have been nice if they'd admitted Python 3 is a different language, called it "Anaconda" or something, and had Guido go off to do that the way Ken Thompson went off to do Plan 9 instead of Unix, and Go instead of C.

Then the rest of us could have stayed where we were happy, and he wouldn't have undermined the existing userbase via the Bad Sequel effect. (See also GPLv3 vs GPLv2 and the Police Academy movies.)

April 10, 2014

Watching a ted talk on my phone this morning, it occurred to me: maybe we can fix software patents by arguing they shouldn't apply to public domain software?

Establishing that public domain source is not appropriate material for patent coverage is a narrow exception, which legally _was_ the case back before copyright applied to software by the Apple vs Franklin decision in 1983. The revival of the public domain should mean a return to unpatented status for that category of software. This used to be normal; by copyrighting software, people opened themselves up to other IP claims on it. Removing copyright should also remove the patent attack surface.

We can of course push to get a law passed to that effect (and should try), but there are also historical and constitutional arguments which can be made through the courts. The constitutional purpose of patents, to promote the progress of science and industry, is better served by open source software than by the patent system. The tradeoff for patents was documenting the invention in exchange for a limited monopoly, open sourcing the code is the best possible documentation of how to do it, and by doing so the authors give up any proprietary interest in the code. How do you have grounds to sue them for code they have no proprietary interest in?

The reason to say "public domain" instead of "open source" is partly that open source is difficult to legally define: the open source initiative couldn't even defend a trademark on it. Microsoft released the old dos source with a clickthrough agreement preventing reposting, is that "open source"? GPLv2 and GPLv3 are incompatible, and neither can contribute code to an LGPL project, how much "project X can't use my code" is allowed while still being open source? Does the GPL's "or later" clause invalidate the defense if a hijacked FSF could release a future version with who knows what clauses in it? Does the Artistic license qualify? What about the licenses where anybody can use it but this one company gets special privileges, ala the first draft of the Mozilla license?

Public domain hasn't got that problem. It avoids the whole can of worms of what is and isn't: the code is out there with zero restrictions. The price for freedom from patents should be zero restrictions: if the authors have no control over what people can do with it, why should uninvolved third parties have a say? Ideally the smooth, frictionless legal surface of the public domain should go both ways.

That's the constitutional argument: freely redistributable, infinitely replicable code serves the stated constitutional purpose of copyrights and patents better than patents do. Releasing your copyrights into the public domain should also prevent patent claims on that code.

The historical reason to say "public domain" instead of "open source license" is possible legal precedent: back when software was unpatentable, it was also uncopyrightable. An awful lot of public domain software used to exist, and when people added copyrights to it, they opened it to patents as well. Software that _isn't_ copyrighted, historically, also wasn't patented. If somebody tries to enforce patents against public domain software, we can make a clear distinction and ask a judge to opine.

(The Apple vs Franklin decision went in Apple's favor because Franklin looked bad. There was clear and obvious copying going on, Franklin took Apple's work and profited from it. If a patent troll or large for-profit company sues a public domain open source project, they'll look bad. If we can say "these other patent suits were against copyrighted software, this is public domain software, we _can't_ profit directly from this", it might be a legally significant distinction.)

A few obvious objections:

"This makes patents meaningless!" Nope. This doesn't apply to Microsoft Office, because it's not in the public domain. If you think you can win a patent suit against Microsoft, go for it. This doesn't apply to a proprietary for-profit competitor to Microsoft Office either. Heck, it doesn't even apply to Linux. It's a very narrowly tailored exception. Beyond that: drug patents aren't affected, which gets the biggest patent mover off our backs. And the "reducible to a device" test we've been making in court still applies: selling a dedicated device that does something may still be patented even if public domain software and a general purpose computer by themselves aren't.
"But Piracy!" Except you can't place a copyright you don't own into the public domain, so this doesn't apply to pirated software. If you clone somebody else's program maybe they can get you for trademark infringement (you don't need to patent pac-man to sue pac-man clones), but that's a separate (existing) issue.
"So if somebody makes a racist shoot-em-up and releases it into the public domain there's nothing anybody can do?" That has nothing to do with patents. Software authorship is still speech so you may be able to prosecute them for hate speech.
"But this doesn't advance the goals of the FSF to put everything under the GPL!" So what?
"You didn't say human readable public domain source code." That's right. Reverse engineering a binary isn't hard. Legally defining "human readable" might be. And being able to sue people for patents on public domain binaries would suck. Keep it simple: trying to complicate a system to shoehorn people into the specific actions you want them to take just causes more corner cases people can use to game the system. Let's not go there.

The old "math isn't patentable" arguments once held sway. They got abandoned about the time the public domain was abandoned (with the Apple vs Franklin decision and the rise of proprietary software), because once math was _copyrightable_ it was a short step to making it patentable. We can position this as a return to a historical position, possibly even an unlitigated corner case that was ignored by those original decisions in the rush to resolve _competing_ IP claims, not jurisdiction over a _lack_ of IP claims.

Doing so doesn't threaten the business model of anybody actually _doing_ anything. Microsoft sucked in the BSD network stack and happily profited off it for years. Despite this, BSD survived long enough for Apple to suck in the WHOLE of BSD a decade later. (And yet BSD still exists...)

April 2, 2014

Hmmm, glitch with toybox when cross compiling, it builds an a.out with the cross compiler then tries to run it (to probe the value of O_NOFOLLOW). I think the reason for this is some recent-ish variant of ubuntu was refusing to #define it without the _GNU_DAMMIT micromanagement symbol, or some such? I forget, and the checkin comments aren't particularly revealing. I haven't been blogging as much as I used to. Hard to search your notes-to-self if you didn't write it down in an obvious place. Hopefully now I've gotten kernel documentation maintainership handed off (and the corresponding frustration) I'll get back to that.

Anyway, put it back in portability.h with an #ifndef to set it to 0 if the headers haven't got it. (That's also the failure mode of the cross compiling case: if O_NOFOLLOW isn't there, we can't use it.)
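The fallback described above might look roughly like this. It's a minimal sketch, not the actual text of toybox's portability.h, and open_nofollow() is a hypothetical helper added only to show the flag in use:

```c
#include <assert.h>
#include <fcntl.h>

// If the libc headers don't provide O_NOFOLLOW, define it as 0 so that
// OR-ing it into the open() flags is a harmless no-op.
#ifndef O_NOFOLLOW
#define O_NOFOLLOW 0
#endif

// Hypothetical helper (not from toybox): open read-only, refusing to
// follow a trailing symlink when the platform supports that.
int open_nofollow(const char *path)
{
  assert(path);
  return open(path, O_RDONLY | O_NOFOLLOW);
}
```

Because the test happens in the preprocessor, nothing has to be compiled and run on the build machine, which is what broke the cross compile.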

But first, I need to fix the date command, which broke when I added checking for invalid inputs: sscanf() returns number of entries read, not number of bytes consumed. If you're reading an unsigned 2 digit number (%2u) and get "1Q", the number of elements read won't help (it got a 1 and set it, it's happy), you need %n to say the position it ended parsing at...
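Here's a small sketch of the %n approach. parse_2digits() is a made-up name for illustration, not the code that's in the toybox date command:

```c
#include <assert.h>
#include <stdio.h>

// Read an unsigned number from a two character field at the start of str.
// Returns 0 and stores the value if a number was read AND it consumed
// exactly two characters, otherwise returns -1.
int parse_2digits(const char *str, unsigned *out)
{
  unsigned val;
  int len = 0;

  assert(str && out);
  // %n stores how many characters have been consumed so far. It does not
  // count toward sscanf()'s return value, so a successful read returns 1.
  if (sscanf(str, "%2u%n", &val, &len) != 1 || len != 2) return -1;
  *out = val;
  return 0;
}
```

With "1Q" the conversion succeeds (sscanf() returns 1) but len comes back as 1, so the input is rejected. Note that %u still skips leading whitespace and accepts a sign character, so a stricter parser would check the characters directly.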

April 1, 2014

The next Linux Luddites is up with replies to my interview last time, and one of the questions was about BSD kernels. My own flirting with PCBSD was ambiguous (got it installed, but the package management system didn't want to work, without which the toolchain was missing Important Bits). Still better than free, net, or open managed.

So I pinged a BSD developer on twitter about kernel building, and she pointed me to the BSD source and instructions.

Something to poke at...

March 31, 2014

Dalias pointed me at tinyssh, which has no source or mailing list yet, but is somebody poking at building ssh out of the NaCL public domain crypto library. So I looked into nacl. (Pronounced "sodium chloride".)

It's _hilaribad_. Dan Bernstein is involved, so you know this library hates you, personally. But I expected basic competence. (Yeah, I dunno why either.)

The library download is an http URL. If I change that to https, it's a self-signed key. There are no signatures for the tarball on the website (md5, sha1, sha2, gpg, nothing).

I complained about this on twitter, and Dan Bernstein replied that anybody wanting to inject flaws into the tarball would have no trouble subverting the https registrars, and that's why they don't even bother trying.

That's presumably also why there are no signatures on the website so you can verify the tarball after download is the one the developers think they wrote. Further exchanges with other nacl users were about how "delegated trust" is bad in the absolute sense, so what you must do is read the code and become such an expert that you yourself can detect things like the recent subtle iOS and gnutls flaws that the maintainers of the relevant projects themselves didn't spot for years. And if you can't do that, you have no business using Dan Bernstein's code.

This is why I don't use code written by Dan Bernstein. I'm sure he's an excellent crypto researcher and/or ivory tower academic, but as a software project maintainer he's deeply annoying. And why I've gone back to poking at libtomcrypt, which is also public domain, and which I can get a known copy of through dropbear to compare against other versions. (Maybe dropbear was compromised years ago, but a lot more people have looked at that and I can diff against a known base to see what changed. And the maintainer hasn't expressed incredulity about why I might want to do that, or suggested that only people capable of writing this code are ever qualified to use it.)

March 27, 2014

Finally emailed Randy Dunlap asking if he wants kernel documentation maintainership back. Off-list, because I don't need the drama. It's not "Mom, James Bottomley was clueless at me!" and it's not that the kernel.org guys might as well be cardboard cutouts. It's that I have too many other things to do with my time, which are more important and _way_ more fun.

If I want to engage with a bureaucracy, I _am_ still subscribed to the posix standards committee mailing list and there are Things They Should Do. Both tar and cpio _used_ to be standardized in the 2001 version (SUSv2) and they need to come back and be modernized; admit "pax" was a mistake nobody uses. Add "truncate" which Linux picked up from freebsd over 5 years ago. Explain what happens to getline()'s lineptr field when you pass in NULL requesting it allocate memory but the read fails: does it reliably stay NULL or does it return a memory allocation you need to free even in the failure case, or does it change the pointer but free the memory itself so you DON'T free it for failure? The last one seems unlikely if it's doing realloc() but I can't quite rule it out...

The posix committee at least never _claimed_ to be a hobbyist endeavor, so if nothing else they're not being hypocritical about it.

Looking back, I can see Linux development "selling out" at least as far back as 2007, where IBM's needs trumped Linux on the Desktop's needs because IBM was just coming off its billion dollars a year annual investment in Linux, and they wanted to follow the money. Red Hat had retreated up to the "Enterprise" market eating Sun's lunch, so who cared about something silly like Knoppix? What was important was who paid developer salaries! I found that creepy.

But I thought there was at least a remaining _role_ for hobbyists until now. Live and learn: it's corporate all the way down now. Forms and procedures so they can categorize your submission progress through the kernel development process. It's all tracking and delegation and risk management assessment now, collecting required approvals. They have procedure update procedures (discussed at the kernel summit). Multiple submissions policy documents (in triplicate, and that's before you get to semipractical details like security or coding style). There's even a checklist (currently 26 steps).

The bureaucracy isn't paralyzing yet. But if you're wondering why there are no more hobbyists joining in...

March 26, 2014

I got cpio.c cleaned up and promoted out of "pending", but haven't done a proper test suite yet, and only really tested the extraction side (as non-root). I have a longish todo list for it, including teaching it to understand the kernel's initramfs creation syntax.

The kernel's scripts/gen_initramfs_list.sh and usr/gen_init_cpio.c respectively create and consume a file list where each line has not just a filename but type, ownership information, and so on. All the stat info so you only ever need to look at files to get contents. This is the same general idea behind squashfs -p or genext2fs -D, both of which let you specify filesystem entries (such as device nodes) that you can't actually create without root access. This ability to supply extra data lets you create root filesystem images without running as root.

This is really useful, and the data format's been stable for just under a decade (symlink, pipe, and socket support added January 2005).
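For a sense of what those lines look like, a device node entry reads something like "nod /dev/console 644 0 0 c 5 1": name, octal mode, uid, gid, device type, major, minor. Here's a rough sketch of parsing one. The struct and function names are invented for illustration, this is not how gen_init_cpio.c itself does it, and the other entry types (file, dir, slink, pipe, sock) would need their own handlers:

```c
#include <assert.h>
#include <stdio.h>

// Illustrative container for one "nod" entry (hypothetical, not kernel code).
struct nod_entry {
  char name[256];
  unsigned mode, uid, gid, major, minor;
  char type;  // 'c' for character device, 'b' for block device
};

// Parse one "nod <name> <mode> <uid> <gid> <dev_type> <maj> <min>" line.
// Returns 0 on success, -1 if the line doesn't match.
int parse_nod(const char *line, struct nod_entry *e)
{
  assert(line && e);
  // Mode is octal (%o), everything else numeric is decimal. The space
  // before %c skips the whitespace that %c would otherwise read.
  if (sscanf(line, "nod %255s %o %u %u %c %u %u", e->name, &e->mode,
             &e->uid, &e->gid, &e->type, &e->major, &e->minor) != 7)
    return -1;
  if (e->type != 'c' && e->type != 'b') return -1;
  return 0;
}
```

All the stat info comes from the text line, so an archiver never has to create (or even look at) a real device node.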

March 13, 2014

Linux Weekly News's article about development statistics for Linux Kernel version 3.14 says "The number of contributions from volunteers is back to its long-term decline." and later "Here, too, the presence of volunteer developers has been slowly declining over time."

Gee, whoda thunk?

March 12, 2014

Got Isaac's cpio -d patch handled, and now I'm cleaning up the rest of cpio. The vast majority of Isaac's patch factored some common code out of mkdir (albeit in a way that subtly broke mkdir so I had to reimplement it), but as long as we're touching cpio, it's not actually that big...

I'm making an executive decision that cpio belongs in the posix directory, because it _was_ in posix. Just not posix-2008. It was in posix-2001, and they removed it from the standard just about the time that RPM and initramfs and such started heavily using the format. (The same thing happened to "tar", although that was even more widely used for longer.) Both were deprecated in favor of Sun Microsystems' "pax" command, which nobody uses for anything, and which I have no interest in implementing.

I am a bit concerned that cpio has 8 hexadecimal digits for date: that's a 32 bit value and thus the 2038 problem. Ok, interpreting it as unsigned gives us another ~68 years beyond that (until 2106) so it's not an immediate problem. But still. I should poke the initramfs guys and go "huh"?
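To put rough numbers on that, here's a sketch that decodes an 8 digit hex field and estimates the year it lands in. Both function names are made up for this example, and the year math is an approximation using the average Gregorian year length rather than real calendar arithmetic:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Decode one 8 character hexadecimal header field. The field isn't null
// terminated inside the header, so copy it out first.
uint32_t hex8(const char *field)
{
  char buf[9];

  assert(field);
  memcpy(buf, field, 8);
  buf[8] = 0;
  return (uint32_t)strtoul(buf, NULL, 16);
}

// Approximate year a seconds-since-1970 count falls in, using the average
// Gregorian year of 31556952 seconds. Good enough for "which decade".
unsigned approx_year(uint64_t seconds)
{
  return 1970 + (unsigned)(seconds / 31556952);
}
```

Feeding it "7fffffff" (the largest signed value) gives 2038, and "ffffffff" (the largest unsigned value) gives 2106.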

Unspecified posix question du jour: if I feed a pointer containing NULL as the first argument to getline() (which posix 2008 says tells it to allocate its own buffer), and the read fails (function return value -1), does it still write non-NULL into the pointer in that case, and if so is it a still-valid memory allocation I'm responsible for freeing?
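Until that's answered, the defensive pattern is to free the pointer unconditionally after a failure. A sketch, with count_lines() as a made-up example function. It assumes the implementation never leaves a pointer it has already freed in there; as far as I know the Linux man page says to free the buffer even when getline() failed, which matches:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

// Count the lines in a stream using getline()'s allocate-for-me mode.
long count_lines(FILE *fp)
{
  char *line = NULL;  // NULL pointer plus size 0 asks getline() to allocate
  size_t size = 0;
  long count = 0;

  assert(fp);
  while (getline(&line, &size, fp) != -1) count++;

  // getline() returned -1 (end of file or error). Whether line is still
  // NULL at this point is exactly the open question, so free it either
  // way: free(NULL) is defined to do nothing.
  free(line);
  return count;
}
```

This covers the first two possibilities (pointer stays NULL, or points at a live allocation). If some libc really did free the buffer and leave the stale pointer behind, this would be a double free, which is why the standard ought to say.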

March 8, 2014

This morning a little programmed phone alarm reminded me, one hour before the fact, that I was on a podcast! (Ok, that's an oversimplification, but we'll run with that, shall we? I did eventually remember what it was for.)

Alas, I _meant_ to set up skype and garageband on Fade's computer a week ago, when she was still here. Of course doing a skype password reset meant downloading my 5000 pending emails (I apparently hadn't checked my email since _last_ weekend), but it didn't quite take the full hour to get to the end of it through pop3. (Thunderbird and gmail have conflicting hardwired assumptions about folder layout, using pop bypasses these irreconcilable differences.)

Anyway, we got it to work and we talked for an hour and change, so presumably when Linux Luddites episode 11 comes out, I should be on it. Woo! (I'd trust them to edit me down to something coherent, but they say that their editorial policy is just to cut out pauses. Not sure that's ideal in my case, but oh well.)

Meanwhile, in that giant pile of email, amongst the endless flood of "every kernel patch series that includes a documentation component, plus all ensuing discussion of said patches" (which the kernel's find maintainer script says I should be cc'd on, and then the kernel social norm is to reply all), there were actually interesting things!

It turns out there are prebuilt ellcc binaries. (Somebody emailed me about it on Tuesday; I'd say who, but my current email reading solution is X11 forwarding over ssh from a machine that isn't world accessible, so having internet on my phone doesn't help when I'm out. Forwarding an ssh port to it is a todo item, not helped by the fact the box with the working mail client is dhcp and its address changes weekly. You probably did not need to know this.)

Anyway, I downloaded these tarballs, tested one to see what its file layout looked like (I have _learned_ that even though the unix/linux norm is "tarballs extract into a directory with the same name as the tarball" nothing actually _enforces_ this, and indeed this tarball didn't do that), found out it was creating a "bin" and "libecc" directory, and went "hmmm" because how do the files in "bin" find that libecc? Do they find the directory their binary is in and do ../libecc?

The answer, from experimentally building hello world, is "no, they don't":

$ bin/ecc -v hello.c
clang version 3.5 (trunk)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/4.6
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/4.6.3
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6.3
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6
"/home/landley/ellcc/x86_64/bin/ecc" -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -mrelax-all -disable-free -main-file-name hello.c -mrelocation-model static -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -target-cpu x86-64 -target-linker-version 2.23.2 -v -resource-dir /home/landley/ellcc/x86_64/bin/../libecc -internal-isystem /usr/local/include -internal-isystem /home/landley/ellcc/x86_64/bin/../libecc/clang -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdebug-compilation-dir /home/landley/ellcc/x86_64 -ferror-limit 19 -fmessage-length 79 -mstackrealign -fobjc-runtime=gcc -fdiagnostics-show-option -vectorize-slp -o /tmp/hello-dd5e79.o -x c hello.c
clang -cc1 version 3.5 based upon LLVM 3.5svn default target x86_64-unknown-linux-gnu
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/local/include
/home/landley/ellcc/x86_64/bin/../libecc/clang
/usr/include/x86_64-linux-gnu
/usr/include
End of search list.
"/usr/bin/ld" -z relro --hash-style=gnu --build-id --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o a.out /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crt1.o /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/4.6/crtbegin.o -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../.. -L/lib -L/usr/lib /tmp/hello-dd5e79.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-linux-gnu/4.6/crtend.o /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crtn.o

What the New Jersey?

It's... finding gcc on the host. Why does it _care_? It's then using /usr/bin/ld (which is not the bin/ecc-ld linker) and calling it against host headers and libraries. So that's totally derailed at that point.

I tried adding --sysroot both with $PWD and with $PWD/libecc and both times it died saying it couldn't find stdio.h. Looking _into_ libecc there's a Makefile in there (?) but it seems like there are headers and library binaries in there too? Sort of? (It's a multilib setup, which I generally avoid, but this is a compiler that supports every target in one binary. How, I'm not quite sure. What you set CROSS_COMPILE= to when building with this, I dunno. But I have to get it to work on the host before worrying about that...)

This worked for somebody, so having it is progress. It's just not as much progress as I'd hoped.

The ironic part is the obvious way forward here is for me to finish the ccwrap rewrite, and task it with wrapping THIS compiler so I can tell it where to find its darn headers and libraries and give it the -nostdlib -nostdinc stuff so it ignores the ones on the host. :)
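The first thing a wrapper like that needs is to work out paths relative to its own binary, the same bin/../libecc trick the -resource-dir argument in the output above is using. A rough sketch of that piece follows. These functions are illustrations written for this entry, not code from ccwrap.c, and the /proc/self/exe lookup is Linux-specific:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Build "<directory of exe>/../<sibling>" into out.
// Returns 0 on success, -1 if the result doesn't fit in outlen bytes.
int sibling_path(const char *exe, const char *sibling, char *out, size_t outlen)
{
  const char *slash;
  int dirlen, n;

  assert(exe && sibling && out);
  slash = strrchr(exe, '/');
  if (!slash) {
    // No directory component: treat it as the current directory.
    exe = ".";
    dirlen = 1;
  } else dirlen = (int)(slash - exe);

  n = snprintf(out, outlen, "%.*s/../%s", dirlen, exe, sibling);
  return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

// Linux-specific: find the running binary by reading the /proc/self/exe
// symlink. Returns 0 on success, -1 on failure. readlink() truncates
// silently, so pass a generously sized buffer.
int self_exe(char *out, size_t outlen)
{
  ssize_t len;

  assert(out && outlen);
  len = readlink("/proc/self/exe", out, outlen - 1);
  if (len < 0) return -1;
  out[len] = 0;
  return 0;
}
```

Given "/home/landley/ellcc/x86_64/bin/ecc" and "libecc", sibling_path() produces "/home/landley/ellcc/x86_64/bin/../libecc". A wrapper could then pass directories under that to the compiler as explicit include and library paths.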

So... back to what I was doing then.

March 7, 2014

I meant to add many, many links to the previous two blog entries, but the problem with "furious apathy" is oscillating between doing WAY TOO MUCH and not being sure I care at all.

As Zaphod Beeblebrox said, "Nuts to your white mice."

March 6, 2014

Work's picking up. When I interviewed for this job I actually interviewed for a QA position on the theory it was something I hadn't really done before, and now they've set up a spare BVT (Build Verification and Test) machine, and I'm slowly picking through a giant python 2.4 test suite on top of my ongoing poking at buildroot.

This means work isn't just taking up most of my time and energy, it's actually taking up a certain amount of headspace because I'm learning stuff and doing designs and plans, which the previous janitorial work didn't stress so much.

On the one hand, this is sort of energizing for my open source work. (Nothing quite so draining as enforced boredom.) On the other hand, it's making me LESS interested in trying to make nice with James Bottomley. His day job is doing Linux kernel stuff. Mine is not, and he's made sure I won't find it _fun_ either.

I've got plenty of other hobbyist programming that IS fun. Wasting time on kernel documentation stuff is drudgery I don't get paid for, and now that Bottomley was kind enough to clarify that _nobody_cares_, and in fact the kernel guys can't comprehend the idea of anyone NOT having a day job doing kernel development (I have yet to _look_ at kernel code for this job, it's all userspace; Jose does the initial board bringup and the broadcom or San Jose guys handle driver issues), it's probably time to hand it off.

Yes, I _can_ persist through things like the 5 years necessary to get the perl removal patches in, or the giant gap between this and the initmpfs patches going in. But mostly I don't care enough to bother. I only submitted miniconfig upstream three times, and even though other people still find it useful enough to namecheck in the docs today (No, I didn't add that) and a few years later it was the obvious solution to Linus's arm defconfig woes... all I really care about is that it works for me. Sure, I'll tell other people how to use it when it seems useful, but if the kconfig maintainer says no to it and tells me to go off and do an order of magnitude more work before he'll even pay attention again, upstream can go hang. (I'm aware that guy is no longer maintainer. Just as Paul Mundt is no longer sh4 maintainer. They "got over it", and I'm still using the solutions that they rejected. And I'm off to do other things...)

(Yes, I gave a talk a few years ago explaining bounceback negotiation in free software projects. I understand what they're trying to do. But the threshold of dedication they expect from people is way beyond hobbyist and somewhere between "this is my day job" and "cult status".)

My kernel documentation todo list is all things like try yet again to collate the arch directories because after it got bikeshedded repeatedly it fell down my todo list. Moving that many files does require a git tree because the gnu/dammit patch doesn't understand the move file syntax git introduced, but after the kernel.org guys went "of course you upload websites in git, just like you browse the web using git" I decided not to set up a public git tree until I get rsync back. Since they've made it clear the current kernel.org admins aren't actually capable of doing that, even if they wanted to... (Despite me pointing them at the appropriate ssh passthrough syntax for restricted rsync from an Ohio LinuxFest security presentation...)

Heck, even trying to filter out the device tree stuff in the Documentation maintainers entry got bikeshedded enough that I wandered away and lost interest. (I was also trying to filter out the internationalization directories, which I argued against including but was overruled by somebody who doesn't speak chinese either. Endless todo items...)

My non-kernel todo list hasn't gotten _shorter_ since the busybox days. I have tons of things other than kernel development that I'd like to do. Making time for linux-kernel documentation was public service, one that makes it significantly harder to read my email.

In that context, James questioning my commitment to sparkle motion because I'm not putting in as much time as _he_ does (with his full-time job at Parallels working on the kernel), calling me weird for being stubborn enough to persist with this Linux thing in the face of Windows or iPhone's continuing market dominance (I.E. weird for not getting over things and moving on)... fundamentally being offended that this is _not_ my day job and might compete with other things for my hobby time? How dare I _volunteer_? Nobody does that anymore, they're all PAID to do it...

If that's what kernel development's come to, he's right. I do have other things to do with my time.

(P.S. I thought it was a bad sign when the kernel guys did a whole "Is there still such a thing as a kernel hobbyist? Let's find such a unicorn and sponsor their trip to LCA!" And then Philip Lougher's horror story where he took a year off from work to finally get squashfs merged did _not_ win him the trip; as far as I can tell nobody noticed it. No wonder Con Kolivas flounced (although he seems to be back lately). No wonder the average age of kernel developers is Linus's age and rising: no new hobbyists in the past decade, and people like me are replaced by people working on Oracle's fork of Red Hat Enterprise...)

Anyway, if you wonder why I haven't been able to politely reply to James Bottomley's questioning my commitment to sparkle motion... this is the tip of the iceberg of the _anger_ that comes out on the topic.

I wouldn't be angry if I didn't care, but I'm working on that.

March 5, 2014

Musl-libc is in feature freeze for 1.0, meaning I spent most of last night on irc with the maintainer working out the wording of a press release to announce it to the world. (I'm pretty sure the phrase "for immediate release" is in Linux Weekly News' spam filter, but Rich insisted.) I learned the difference between marketing and sales many years ago (and that I can do marketing pretty well, but can't close to save my life), so I worked out their initial marketing plan and now we're digging it up for 1.0.

My todo items from this are to bring the wiki's current events page up to date (basically another kernel-traffic variant like the qemu weekly news I tried to do for a while), and rewriting ccwrap.c to work with musl so I can port aboriginal Linux from uClibc to musl.

Over on the aboriginal side, I'm way behind on release (largely due to the sparkle motion thing making me not want to look at the new kernel). I decided to skip a release, but next one's coming up, and I still need to fix powerpc. A week or so back the musl guys asked me for an sh4-strace binary, which needs to build natively. The sh4 emulated board is crap (64 megs ram, one disk, and if you hit ctrl-c it kills the _emulator_ rather than passing it through). I made an ext2 sh4 root filesystem with 2 gigs of extra space to combine my /dev/hda and /dev/hdb into one disk, and then added a 256 meg swap file to overcome the insufficient memory thing, and then wget the static-tools.hdc image and loopback mounted it. At that point the build failed because the board doesn't emulate a battery backed up clock so the clock thinks it's 1990, meaning make complains all the file dates are in the future. When I tried to set the date by hand I found a bug in the toybox date command, so I need to fix that. (Meanwhile the musl guys got their sh4 port largely complete without me, using the last aboriginal sh4 release image. But still: I should finish that up. Oh, and the sh4 kernel config forces EXPERT which causes collateral damage I need to fix up too, by ripping "select EXPERT" out of the sh4 kconfig.)

The big aboriginal todo item is the ccwrap rewrite so I can port aboriginal's toolchain to building everything against musl. (Yes ellcc remains a todo item but the build breakage there goes pretty deep.)

Meanwhile over in toybox I'm working on the deflate compression side, because I don't want to ship with half a gzip implementation (sflate) I'm not going to keep. The japanese guys have shown they'll happily use code and become dependent on code out of "pending" that's default n in the config, so if I'm going to swap implementations I want to do it before the release. (I'm also partway through adding grep -ABC, need to rewrite the cpio -d patch, and so on. Figure out which of those go in the toybox release I need to cut to include in the Aboriginal Linux release.)

Oh, and one of the busybox guys emailed me to ask me to update the busybox binary images for the current busybox release, which is also sort of blocked on getting an aboriginal release out. (New aboriginal uses new busybox, I usually build binaries with that. But I might just do a one-off by hand with the old release images to get it off my plate.)

I'm probably behind on the toybox and aboriginal mailing lists again, but since Sparkle Motion I've only been able to stomach reading my email once or twice a week because 95% of what I have to wade through there is irrelevant kernel documentation crap that I can't just _ignore_ but have to filter for bits to go upstream. Any patch series that includes a documentation component cc's me personally on the entire series AND the ensuing discussion, and that's something you brace yourself to wade through at the best of times. And of course getting documentation on a topic you know nothing about and having to _evaluate_ it requires more focus and study time than I usually have when I'm so tired I can't do anything more productive than catch up on email...

I also need to renew my domain registration (expires on the 11th) but I don't want to just renew it, I want to move it to dreamhost (which throws in a free domain with web hosting anyway) and that _also_ involves reading documentation (on both the old and new services) to unlock and transfer the domain without bricking my website and email. Might wind up just paying the old guys another year to not have to deal with it right now, but I'm trying not to do that.

Oh, and I have to set up skype and a recording thing on Fade's macintosh because some guys in... ireland? want me on a podcast this weekend.

Anyway, that's the immediate, time-critical stuff. I think.

March 4, 2014

Today, I remembered my netbook's power cord. And getting logs on my netbook turns out to be approximately as time consuming as I expected, not just because it's slow to build (a full target build on the netbook takes most of an hour). No, it's because development involves iteratively answering the question "_now_ what have I screwed up?"

Forgetting to pass a mode to the open() of the log file so all opens after the first fail because the stack trash it used as permissions for the newly created file didn't have the write bit set. Doing a build for an architecture that doesn't currently compile because I'm in the process of redoing its config to not force "EXPERT" and it turns out there's kernel version skew in the patch that applies that. Logging just the after without logging the before command lines. And a typo in ccwrap that breaks the build didn't get noticed until the end of simple-cross-compile.sh, _twice_, and then I had to redo it with CPUS=1 because the before and after sequences aren't stable otherwise and it's kinda important to match them up...

Three minutes of fixing the last bug, start the build over from the beginning, go do dayjob for an hour or more until I get a break, check the log, three minutes of fixing the next bug, rinse repeat...

March 3, 2014

Banging on ccwrap, actually debugging the build in place is kinda horrible (especially on the netbook), so I've come up with the idea of logging the before and after gcc command lines, and running the 'before' through the new ccwrap and having it print out the new 'after' instead of running it, and then I can compare the files. There's a gazillion other fiddly bits (such as environment variables), but it's a start.
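The comparison step could look something like this rough Python sketch (ccwrap itself is C, and none of this is from the aboriginal tree). It assumes each log is already a list of command lines in matching order, which is why a CPUS=1 build matters. The function name and the sample gcc invocations are made up for illustration.

```python
import difflib

def compare_command_logs(old_after, new_after):
    """Return unified-diff lines between two logs of rewritten gcc command lines.

    Both arguments are lists of strings, one command line per entry, in the
    same order (so the build that produced them should not be parallel).
    An empty result means the new wrapper rewrote every command identically.
    """
    return list(difflib.unified_diff(
        old_after, new_after, "old-ccwrap", "new-ccwrap", lineterm=""))

# Hypothetical log contents, just to show the shape of the output.
old_log = ["gcc -nostdinc -isystem /tools/include -c hello.c",
           "gcc -nostdlib -L/tools/lib hello.o -lc"]
new_log = ["gcc -nostdinc -isystem /tools/include -c hello.c",
           "gcc -nostdlib -L/tools/lib hello.o -lc -lgcc"]

for line in compare_command_logs(old_log, new_log):
    print(line)
```

Environment variables and the other fiddly bits aren't covered here; this only diffs the argument lists.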

At least that's what I _would_ have worked on today if I hadn't forgotten to bring my netbook's charger. (Enough battery for the bus ride(s) in and the bus ride(s) home, but not enough to leave it running a long build while I'm actually at work...)

Of course getting the logs on my netbook turns out to be a bit time consuming, not just because it's slow to build. (Forgetting to pass a mode to the open() of the log file so all opens after the first fail because the stack trash it used as permissions for the newly created file didn't have the write bit set. Doing a build for an architecture that doesn't currently compile because I'm in the process of redoing its config to not force "EXPERT" and it turns out there's kernel version skew in the patch that applies that. Logging just the after without logging the before command lines. And a typo in ccwrap that breaks the build didn't get noticed until the end of simple-cross-compile.sh, _twice_, and then I had to redo it with CPUS=1 because the before and after sequences aren't stable otherwise and it's kinda important to match them up.)

March 2, 2014

My new phone has netflix, which has the same problem with the nightly netflix watches with Fade: if I'm programming, I want some background noise but not something hugely distracting.

Which is why I'm currently re-watching the Disney tinkerbell movies (which I have in fact already seen with the niecephews).

You've gotta wonder about the ecological catastrophe that required all these manual fixups of that parallel earth's biosphere. Luckily there was a friendly alien race around to leave a colony to do just that. (Presumably out of guilt from having contaminated our biosphere in the first place? Dunno, they seem to have regressed a bit, maintaining some very user-friendly nanotechnology but not a whole lot of actual records...)

(In other news, George Carlin's 1978 HBO special wasn't as funny as his later HBO specials. Presumably it's something HBO acquired later rather than having existed for in-situ...)

(Possibly I'm not being as un-distracted as was the intent...)

March 1, 2014

Deflate compression side is eerily familiar. I've written this code before. (In Java! In 1996. Ported from info-zip.)

Corner cases I need to add tests for: gunzip -S "" blah, gunzip .gz, gunzip ..gz, touch ook && gunzip ook.gz...

That first one, gunzip prompts to overwrite and if you say y it deletes the file. That's nice of it. I notice that -f doesn't force it to decompress an unknown suffix.

I'm sort of tempted for the "gunzip .gz" case to produce "a.out", on general principles.

February 22, 2014

Chipping away at the email backlog. Still not coming up with a civil answer to James Bottomley's sparkle motion thing. I _am_ coming up with a long list of other things I want to do that's convincing me bothering at all with kernel documentation is a complete waste of time.

Started to send a message to the list describing my solution to the "multiple commands with different command line option parsing in a single C file" problem, and during the writeup realized I hadn't solved the entire problem and I have to redo more of the header file generation. (Disabled FLAG_x symbols need to be present but defined to 0 to avoid build breaks. Right now if the command is disabled the flag block isn't present in the header, so the clean happens but the new command's definitions don't.)

It would also be nice if all CLEANUP_ blocks preceded all FOR_ blocks, because otherwise if they've both got a FLAG_b it clashes. My original idea was that command code go in alphabetical order within a file, because that's how the stanzas occur in the file so a CLEANUP_A FOR_B pair will work if A comes before B alphabetically.

On the one hand, that's annoyingly subtle. On the other, it's a pain to teach mkflags.c to cache output. On a third hand (or possibly tail), I'm not sure if complicating the build infrastructure or complicating the code is worse, it's one of them tradeoff things without an obviously superior way to go...

It's always the near ties, where it probably doesn't hugely matter which one you pick because both suck indistinguishably equally, that are hardest to decide. Precisely because neither _is_ clearly better...

February 21, 2014

Blah, what have I done this month.

So after the last entry before the big gap, where a kernel developer questioned my commitment to sparkle motion (how _dare_ I not have a day job working on this stuff, and have multiple other things competing for my hobby time), I pretty much stuck my fingers in my ears on email and did other things.

One such thing was writing that new inflate implementation from scratch. Took a week to work out some details (at one point there's a huffman code table used to decode two other huffman code tables, and there doesn't seem to be an obvious _name_ for this meta-table so the documentation talking about it is unnecessarily vague) and then another couple weeks to debug it (fun fencepost error in my input bit buffer, got the static huffman literal and distance tables swapped, the usual teething troubles). But I could do that on my netbook without even internet access.

The largest gzip file I had lying around was the gnu/dammit patch source tarball I needed to reproduce a bug last year, and wow is the deflate encoding in gnu stuff crazy. Almost every encoding block is followed by an unnecessary zero length literal block, for NO REASON.

I have two wild guesses about the reason behind this crazy:

1) The pipe through gzip works as a streaming protocol, and it needs to flush the output data promptly when the input stops writing (so if you pipe ssh through gzip, when you type a command and hit enter you want it to execute immediately, not when you type another 64k of commands. Is there some sort of Nagle in there?) And when tar pipes data through gzip, it's in this mode so it's treating each short read as an explicit flush, which is marked by these zero length literal blocks to make sure the far end knows.

Of course this is a horrible thing to do for PERSISTENT storage, you want a tarball to be optimized and explicit blocks that store no data are clearly wasted bytes. And you can probably tell this mode is not appropriate when the input isn't a terminal or similar...

2) It's some setup to let you decompress in parallel? Scan ahead in the data to find the start of the next block? You'll have a byte aligned "0x0000FFFF" at each literal block. In theory you could have that as a false positive in the data and there aren't per-block checksums to show the next block is valid, but there are a couple ways to deal with that: 1) when you've finished decompressing the previous block check if it ends where you thought the next block started, and discard any that get passed by other blocks. So output's a bit serialized but that's I/O bound anyway. 2) There are a number of ways decoding can error out, and that shows it's not a real block. The huffman table symbol decoding needs to add up to a specified length (it's sort of a "hit me/bust/blackjack" thing that should always match the number and not exceed it), and with most huffman tables an encoding can lead off the end of the table. Either way, that's not a valid block.
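To make the scan-ahead part concrete, here's a small Python sketch (not the toybox C code). It uses the standard zlib module's Z_SYNC_FLUSH as a stand-in for whatever produced those empty blocks in the tarball, which is an assumption on my part. A sync flush emits an empty stored block, so the output ends in the byte aligned 00 00 FF FF. The helper name candidate_block_starts is made up, and it only finds candidate offsets; it doesn't attempt the parallel decode or the false positive filtering.

```python
import zlib

# Length 0x0000 followed by its complement 0xFFFF: the body of an empty stored block.
MARKER = b"\x00\x00\xff\xff"

def candidate_block_starts(stream):
    """Return every offset just past a 00 00 FF FF marker.

    These are only candidates: the same four bytes can occur by chance
    inside compressed data, so each one still needs validating.
    """
    offsets = []
    i = stream.find(MARKER)
    while i != -1:
        offsets.append(i + len(MARKER))
        i = stream.find(MARKER, i + 1)
    return offsets

# Build a raw deflate stream (wbits=-15 means no zlib/gzip wrapper) with one
# explicit flush in the middle.
comp = zlib.compressobj(9, zlib.DEFLATED, -15)
first = comp.compress(b"first chunk " * 20) + comp.flush(zlib.Z_SYNC_FLUSH)
rest = comp.compress(b"second chunk " * 20) + comp.flush()
stream = first + rest

print(candidate_block_starts(stream))
```

The offset where the flushed part ends shows up in the candidate list, which is the property a scan-ahead decoder would lean on.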

Next I need to do the compression side...

February 20, 2014

Finally got a doctor's appointment about the way the endless cold went into my lungs a few weeks back. (Might have something to do with inhaling a chunk of potato, but when your chest HURTS on both sides, coughing feels like something snapped in your chest every time, and you're kept awake at night by the crackling noises your breathing is making... yeah, time to talk to a professional. At least by week two...)

They gave me a prescription for ten days of Cipro. I remember when that was the superdrug they gave to all the people mailed weaponized anthrax in 2001. Now it's apparently a first line antibiotic they hand out like candy, because everything's immune to the older stuff. (We've been giving 80% of all antibiotics to animals for decades, they tend to stop working after that...)

The list of side effect warnings on Cipro is nuts. It apparently eats your tendons, and if you exercise while on this stuff they snap. It also makes you sensitive to sunlight (in Texas, that should end well). Oh and it can give you peripheral neuropathy, and trigger depression, because it eats your brain too.

Also, if you complain about the side effects on twitter, the antivax people come out of the woodwork and insist you're somehow just recreationally using antibiotics, and would be better off with the pneumonia because not dying of simple infections at a young age is unnatural. (Um, yes? Not getting eaten by lions is unnatural. I'm all for it.)

February 6, 2014

I've been sitting on a reply to James Bottomley until I can answer him in a civil manner.

Don't hold your breath.

February 5, 2014

Yesterday's comment about busybox wasn't because I was looking at their deflate implementation (when I did the "bruce didn't build that" analysis back in 2007, it was just a stale version of gzip). It was to see what command line options busybox users decided were an essential subset.

The sflate approach of doing gzip, zlib, zip, and raw deflate in a single binary is clever, but using "-z" to mean zlib and "-p" to mean zip is strange, and "-l" has an existing meaning in the gnu/dammit version of gzip, and "-L" means output the complete copy of the GPL text stored in the gzip binary because the FSF thinks that's a good idea...

The posix-but-dead "compress" command has more freedom of command line options. Compress fell out of use because some idiot asserted a patent on the compression algorithm it used, thus causing users to flee the protocol. In fact Phil Katz released the "deflate" algorithm he did for his original zip implementation gratis after a lawsuit with the guy behind the ARC algorithm. That's why deflate took over, and where the "appnote.txt" file I mentioned earlier comes from. It was Phil's "by all means, clone this, make it a standard and make ARC a historical footnote" writeup of his own algorithm, which took out unix compress for the same reasons.

Zip itself is a combination archiver and compressor, but unix already had an archiver (tar) that glues files together, and then it ran the result through compress to create *.tar.Z files. So unix needed a streaming compressor that _didn't_ try to map the content to a directory of files, which is where both gzip and zlib came from. Those were two independent implementations of Phil's deflate algorithm with different wrappers: gzip using crc32 checksumming and zlib using adler32, with different magic bytes at the start to identify which format it was. (Zip checksummed each file, and the checksum was part of the directory metadata it stored.) So, three formats, and the fourth is just raw deflate output with no wrapper. The magic bytes identifying each format are that zip files start with the ascii characters "PK" (Phil Katz's initials), gzip starts with the 3 bytes 0x1f, 0x8b, and 0x08, and zlib is crazy (first byte & 0x8f = 0x08, second byte can't have bit 5 set because a "preset dictionary" of data you feed into the compressor WITHOUT PRODUCING ANY OUTPUT is just nonsense in a DATA COMPRESSION PROTOCOL and we're not even trying to support that, and then first two bytes must be divisible by 31 when viewed _big_ endian even though everything else deflate does is _little_ endian Because Reasons. When compressing just use 0x78 0xda, but we can't trust zlib itself to produce that because "crazy", above.)
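Those magic byte rules are easier to check as code than as prose. A quick Python sketch follows (not the toybox implementation). The function names are made up, and rejecting the preset dictionary bit is the policy described above rather than anything the format itself requires.

```python
import zlib

def looks_like_zlib(header):
    """Apply the three zlib header rules to the first two bytes."""
    if len(header) < 2:
        return False
    cmf, flg = header[0], header[1]
    if (cmf & 0x8f) != 0x08:   # low nibble 8 = deflate, top bit must be clear
        return False
    if flg & 0x20:             # bit 5 = preset dictionary, not supported here
        return False
    # The two bytes, read big endian, must be a multiple of 31.
    return ((cmf << 8) | flg) % 31 == 0

def sniff_format(data):
    """Guess which wrapper (if any) surrounds a deflate stream."""
    if data[:2] == b"PK":
        return "zip"
    if data[:3] == b"\x1f\x8b\x08":
        return "gzip"
    if looks_like_zlib(data[:2]):
        return "zlib"
    return "raw deflate or unknown"

print(sniff_format(zlib.compress(b"hello")))
```

Raw deflate has no magic at all, so the last case is a shrug rather than a positive identification.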

So when I inevitably write my own from scratch rather than trying to clean the external submission up some more (I try not to dissuade contributors, but this one wasn't contributed, the contributor instead requested I prioritize adding the functionality, without specifying how)... anyway, having "compress" be the deflate multiplexer probably makes sense.

Which sort of implies I should teach it -Z, since the patent's expired now. Hmmm...

February 4, 2014

The more I read the sflate code, the more I just want to write a new deflate implementation from scratch. It's doing the "switch/case into the middle of loops" thing that the original bunzip did.

I also want to reuse the bunzip bit buffers, but reading the deflate spec everything there is little endian and bzip is big endian. Not just the byte order, the _bit_ order. Adding a test for that to the hot path would not be fun. Haven't looked at xz yet, because it's time to go sit in a cubicle again...
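For the record, here's the bit order difference as a Python sketch (the real bit buffers are C, and this one goes a bit at a time for clarity instead of speed). The class name and the lsb_first flag are made up for illustration. Note the per-bit test on lsb_first is exactly the sort of branch you don't want in the hot path. It also only covers plain fixed-width fields, not how deflate packs its huffman codes.

```python
class BitReader:
    """Pull n-bit fields out of a byte string in either bit order."""

    def __init__(self, data, lsb_first):
        self.data = data
        self.lsb_first = lsb_first
        self.pos = 0  # absolute bit position within data

    def read(self, nbits):
        value = 0
        for i in range(nbits):
            byte = self.data[self.pos >> 3]
            bit_index = self.pos & 7
            if self.lsb_first:
                # deflate style: consume bit 0 of each byte first, and the
                # first bit consumed becomes the low bit of the result
                value |= ((byte >> bit_index) & 1) << i
            else:
                # bzip2 style: consume bit 7 of each byte first, and the
                # first bit consumed becomes the high bit of the result
                value = (value << 1) | ((byte >> (7 - bit_index)) & 1)
            self.pos += 1
        return value

# Same byte, same field width, different answers depending on bit order.
print(BitReader(b"\x4b", lsb_first=True).read(3))
print(BitReader(b"\x4b", lsb_first=False).read(3))
```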

Heh. The busybox binary I have on my system (same one I uploaded to the busybox website, I should do a new one but that's in the aboriginal release todo heap) implements gzip and its help doesn't mention -9 but it supports it anyway. (Then again the gnu/dammit version supports -3 even though --help just mentions -1 and -9.) Tricksy hobbitses.

February 2, 2014

So kernel releases require aboriginal releases which require toybox releases, pretty much driving my open source development schedule based on an external calendar.

This time, the hiccup is that the powerpc target broke. I bisected it to commit ef1313deafb7 and got back a "works for me, what toolchain are you using", meaning they almost certainly leaked a new toolchain feature into their build that gcc 4.2.1 (the last GPLv2 release) doesn't have.

And checking my email, that's exactly what happened. And it's not just an extra #define, it's "altivec support" which was a large elaborate patch that I can't use under GPLv2.

The long-term fix is to switch toolchains to ellcc, although they're of the opinion that supporting the 2013 C++ "standard" means rewriting the compiler in that dialect so it only builds with compilers less than 18 months old. This is the sound of a project disappearing up its own ass, but it's that or the FSF so no contest really. (You can get away with a lot and still be less crazy than the FSF.)

The short term solution is #ifdeffery in the kernel headers. I should work on that, but haven't got the heart for it right now. Banging on sflate/gzip instead.

February 1, 2014

Happy birthday to me. I am now The Answer years old.

For dinner we went to "Emerald Tavern", which is a newish place right next door to that Sherlock Holmes themed bar. It's a combination game store, coffee shop, and bar, which sounds like they designed the place with Fade in mind. I had a peanut butter and jelly sandwich run through a panini press. Inexplicably, the place does not sell energy drinks.

I got a new phone for christmas. Nexus 5, and a switch back to T-mobile, this time with the "we acknowledge that tethering your phone is something you will be doing" plan. Did you know Dungeon Keeper is now a free download in the app store? (They try very hard to sell you gems for real money.)

January 31, 2014

In celebration of the fact we now have enough pieces of paper to file taxes, I made rice pudding. (As with so much in my life, it's a Hitchhiker's Guide to the Galaxy reference.)

The stuff's pretty easy to make: 4 cups milk, 1 cup dry white rice, 6 heaping tablespoons sugar, pinch of salt. Boil the lot of it slowly, stirring enough to keep it from sticking to the bottom and dissolving the skin back in (basically every couple minutes), until you've run out of liquid (maybe 20 minutes). Add a shot of vanilla extract (it boils out if you do it at the start), and maybe some raisins if you feel like quoting Better Off Dead.

Went over well with both Fade and Camine, and since making the two of them happy is one of my major life goals, I'm calling it a good day.

January 30, 2014

Finally glued together Szabolcs Nagy's "sflate" deflate/inflate code into a single file I can nail onto toybox and then spend forever cleaning up.

Way back in the dark ages (1997, back when I was working on OS/2 for IBM) I ported the info-zip deflate code from C to Java 1.0, by which I mean I read the info-zip code (and the pkzip appnote.txt file) to understand the algorithm and then wrote a java implementation of said algorithm. It worked, in that the C version could extract the compressed output my Java code produced.

But before I got around to implementing the decompression side, java 1.1 came out with inflate/deflate added to the java standard library (implemented in C and thus several times faster than a native Java implementation), so I abandoned it and went on to do other things. But the important thing is that at one point I did wrap my head around how deflate works, so I wasn't too worried about doing one for toybox. It's just one of those "need to get around to it" things like mount, mdev, toysh, or updating the initmpfs code in the kernel so automounting devtmpfs works for that. (The hard part is working up the energy to do more programming when I'm not sitting in my cubicle at work. The hard part at work isn't the programming, it's SITTING IN A CUBICLE. Those things suck the life out of me for some reason.)

Anyway, nsz (his irc handle, he hangs out on the freenode #musl channel) wrote a perfectly serviceable implementation of this stuff with gzip, zlib, zip wrappers, and one of the Japanese companies using toybox that wishes to remain anonymous after that whole "tempest in a toybox" nonsense (yup, there's more than one, and no they've still never given me a dime) reminded me I said I had plans for data compression stuff and asked me to prioritize adding it. Since I try to be responsive to my users (whether or not they're deploying this stuff to millions of people), it's time to check in what I've got of the help parsing code and switch my chinese-water-torture level of development effort (drip, drip, drip) to deflate.

Step 1: glue everything together like I did for xzcat and ifconfig. Step 2: make it run as a toybox command. My first impulse was to make it "zcat" (ala bzcat and xzcat), but I guess "gzip" is the logical name for said command since that's the command line streaming version and can do both inflate and deflate (ala the -d switch). Historical accuracy says it should be zip since Phil Katz invented the algorithm for pkzip (and documented it, which is why there's so many independent clones ala info-zip and zlib and gzip and so on), but zip is an archiver that handles multiple files and that's still a todo item here. (Note to self: dig up appnote.txt again when back on the internet, maybe archive.org has it. Actually these days there's almost certainly a wikipedia article on deflate, which there wasn't last time I messed with this.)

There's probably about as much cleanup to do here as there was for ifconfig. Oh well. I need to get the command line option parsing behavior (including the OLDTOY aliases) done before I can cut a release, because people _do_ use stuff out of pending and I don't want them getting too used to "gzip -z" or similar...

(Step 3: remove the camel case.)

January 25, 2014

Eh, what am I working on... The ellcc build broke in binutils, because I haven't got makeinfo installed on the host. I applied the aboriginal patch and A) ellcc rebuilt everything from the start again, B) binutils then broke for a second makeinfo reason. (The binutils build has a "missing" script that its configure substitutes for makeinfo when it's not available, but it doesn't work. I hacked it to work, and then one of the binutils subdirectories doesn't use it and calls the nonexistent makeinfo directly. Wheee. The FSF, ladies and gentlemen!) So I need more tries.

Re-poked the kernel guys about powerpc not building in the new kernel, and they said it builds for them with my config and want to blame my toolchain. So I need to debug it myself, and that's blocking an aboriginal release.

I wanted to do a patch to make initmpfs auto-mount devtmpfs when the config option's enabled. If it's to go in this merge window, I should do/submit that this weekend.

The toybox help parsing is being weird, I've tracked it down to one of the strings in the to-sort array having the value of its pointer written into the string. (I.E. there's a redundant write that's doing an extra dereference). Even though I wrote this, I'm boggling a bit. (How did I manage to screw it up _that_ way?) Areas of the code currently have as many fprintf(stderr) lines as actual code.

Need to merge Szabolcs Nagy's flate (deflate/inflate) into toybox because some of the project's japanese users need it. This can share code with the bunzip implementation, which needs a proper bunzip2 front end and not just bzcat...

(Note: the help parsing glitch was a malloc(clen+tlen) needing to have its size multiplied by sizeof(char *). Not the last bug, but a weird one to track down because the effect and cause were separated by a few steps. I should do some screencap podcasts on debugging as part of the cleanup.html series.)

January 20, 2014

Thread on the toybox mailing list about building toybox with llvm (from BSD guys) made me dig up ellcc again, and it's still... annoying. Checking in all the extracted source instead of building from proper tarballs and patches means A) the endless svn checkout of doom, B) I can't easily see what versions they're using, what they've changed locally, or try swapping in my own stuff.

Still ignoring all that, the _real_ annoyance is the way llvm/configure barfs because Ubuntu 12.04.3 LTS has gcc 4.6 instead of 4.7. It's specifically checking for gcc 4.7 at configure time and refusing to build if it's older than that.

Obvious step 1: find that test and COMMENT IT OUT. Because refusing to support last year's toolchain is just dumb.

In other news: 3.13 kernel is out. Muchly todo items in the near future. (One of which should probably be a patch to make initmpfs mount devtmpfs on /dev when that config symbol is enabled. Because putting /dev/console in your initramfs cpio file is just sad. I should do a doc patch noting that while I'm at it.)

January 19, 2014

Torn about kernel documentation stuff. There's so much I want to DO, but it's hard to care until the kernel.org guys give me rsync back to update kernel.org/doc, and it doesn't look like they're capable of that anymore.

Possibly I should hand it off to somebody else. But who? I basically do monthly roundups of patches that fell through the cracks, and the occasional reply. Even my attempt at updating the maintainers entry to exclude the devicetree stuff (which is well maintained by multiple other people and just clogs both my inbox and the kernel doc mailing list with enough noise to render it useless) got bikeshedded enough that I lost interest in pushing it.

That's Documentation/ for you. Even when I grab pieces quickly, it's all about stuff that's got other maintainers so other people put it in through their trees (without telling me) resulting in collisions. Plenty of patches go in to Documentation that never went to linux-kernel or me anyway. I've taken to ignoring anything that's part of a patch series, because it'll go in through somebody else's tree. (Maintaining this directory is janitorial work at best.)

What I really want to do is reorder it all, such as putting the arch directories in an arch/ directory. But last time I tried that, oh the bikeshedding...

I suppose I should make another attempt to care.

January 17, 2014

The internet's a weird place. Thirty years ago, millions of people had strange side projects in dusty notebooks in the back of a closet or under a pile of papers on a desk. Something they spent hundreds of hours on at one point, and then got buried in day to day rush. Maybe they occasionally came back to it, showed a couple friends, but mostly nobody else remembered it existed.

Now that sort of thing tends to go on the net, where it can sit fallow for years before somebody else bumps into it and goes "hey, look at this thing somebody put hundreds of hours into, lemme reference it in this new thing I'm doing."

This is the real power of the internet. Harnessing people's junk drawers. It's still horribly indexed, but not _impossibly_ indexed. A file in somebody's house they forgot they even did isn't something I can stumble across. A five year old web page comes in handy all the time.

January 15, 2014

So we want to collate help entries that are: 1) enabled, 2) have the same usage: name, 3) in declaration order. (This means we don't have to parse depends.)

If the first entry in each usage: string is an [-option] block (single dash, no space), collate the blocks and alphabetize. For the remainder, put the later config symbols' usage first on the collated usage line, because the non-option arguments tend to be in the first bit and should remain at the end of the usage line.
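Here's roughly what those rules look like as a Python sketch (the real thing is C string bashing in the toybox build, this is just the logic). The function name, the input shape (each entry is the text after "usage: NAME ", already filtered to enabled entries in declaration order), and the case-insensitive sort are all assumptions for illustration. So are the sample cp usage strings.

```python
import re

# A leading option block: "[-" then option letters, no spaces, then "]".
OPT_BLOCK = re.compile(r"^\[-([^\s\-\]][^\s\]]*)\]\s*")

def collate_usage(name, usages):
    """Merge several usage strings for one command into a single line."""
    letters = []
    remainders = []
    for usage in usages:
        match = OPT_BLOCK.match(usage)
        if match:
            letters.extend(match.group(1))
            usage = usage[match.end():]
        if usage.strip():
            remainders.append(usage.strip())
    parts = ["usage:", name]
    if letters:
        # Sort ignoring case, lowercase-vs-uppercase only breaks ties.
        merged = sorted(set(letters), key=lambda c: (c.lower(), c))
        parts.append("[-" + "".join(merged) + "]")
    # Later entries go first, so the first entry's non-option arguments
    # stay at the end of the line.
    parts.extend(reversed(remainders))
    return " ".join(parts)

print(collate_usage("cp", ["[-rf] SOURCE... DEST", "[-a] [-Z CONTEXT]"]))
```

A block like "[-Z CONTEXT]" has a space in it, so it doesn't count as an option block and rides along with the remainder.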

Getting the spacing right's a bit of a challenge, but then string parsing is always horrible in C. (It's not really suited for it. Beats shell, though.)

January 14, 2014

Huh, looks like Howard Tayler (the Schlock Mercenary guy) has swapped Penguicon for ConFusion. Makes sense.

Probably takes a bit of backstory to explain _why_ it makes sense.

Many moons ago, Tracy Worcester and I created a convention called Penguicon. The third year we were both distracted (her with thyroid cancer, me by replicating the whole thing in Austin as Linucon), but the first, second, fourth, and fifth years involved a lot of hard work from both of us. I recruited all the guests of honor those four years, and each year I tried to introduce something new. It was a dumping ground for "wouldn't it be cool" ideas. The "food track" was inspired by the Minnesota Munching Movement my sister's involved in (she's been behind the scenes at ConClave since before Minicon became Microcon; less so with 4 kids but I went with her to her convention during my stay in Minnesota). Liquid nitrogen ice cream was something I saw in the General Technics party at Millenium Philcon, and Mirell and I worked out how to reproduce it for Linucon. For panel recording I bought 5 mp3 lecture recorders and taped 'em down to tables in the panel rooms (presumably they have something better now).

Of course lots of things Tracy's crowd took and ran with, such as when I posted a youtube video of local Austin guys with musical Tesla coils to the penguicon-general mailing list and they found their own local version to have a concert... And of course there was plenty of stuff I had no hand in at all: the swordfighting workshops, turkish coffee, scotch tasting, chocolate tasting, brazilian beef, whatever that 'brick panel' was... Tracy and her friends poured in tons of ideas, and they were the locals who actually _ran_ the convention. The only reason I could build Penguicon higher each year is that she and her friends were holding it up. (I regret that I _didn't_ jump on the "we should invite the mythbusters" suggestion back around year 2; at the time I'd never heard of them. My "why don't you go do that" was sincere and heartfelt: I wasn't against anybody else pulling in more stuff they found cool, I just wasn't motivated to go research people I'd never heard of. I haven't had cable since last century and they didn't have a big net presence yet.)

The local Michigan convention scene had 3 conventions, one of which had only died recently (ConTraption, due to political infighting on the part of its con staff), and Tracy used their timeslot and mailing list to launch Penguicon. Tracy had previously been a con chair of the larger of the other two conventions, chairing "ConFusion 19100" (the Y2K version of the convention Howard's going to now). Why does it make sense that Howard rebased from Penguicon to ConFusion? Because of Mr. Penguicon.

Unfortunately a guy named Matt Arnold went to Penguicon 1 (his first ever convention) and got fixated on Penguicon (to the point he lost more than one job over it), and in the best Igor from Dork Tower tradition went "It must be mine!", i.e. the convention had to become all ABOUT HIM. For example, I did the website for the first year and made sure there was a "heartbeat blog" letting everybody know that we were still hard at work and cool things were coming. The second year he created the "minister of communications" position so every public communication from the convention had his name attached to it, and he was the one getting interviewed on local television about it (although he wasn't the con chair until after I stopped attending).

The real problem is that in the process of taking it over, taking complete credit for it, making it about himself, and turning himself into Mr. Penguicon... he had to eliminate all competition for the title, starting with the actual founders. During the third year where Tracy was sick and I was busy back in Austin, he started a whispering campaign against us and we didn't even notice. I wasn't local and had largely moved on to other things, but introducing LN2 ice cream and panel recording were both things I had to do with no help from the concom (I just showed up and did them), because any idea I proposed was automatically blocked. And I only bothered through year 5; the following year (the last year I attended) a group of Matt's friends stood around in a circle chanting mocking rhymes about one of Tracy's proposals. (Tracy and I weren't the only ones, everybody who might conceivably overshadow Mr. Penguicon had to be written out of history so Mr. Penguicon could stand alone as the creator-god.)

The reason _that_ was a problem for the convention is that in the entire time I had to deal with him, Matt never actually had an idea. He did extensive social engineering and took credit for other people's ideas, but I think a big part of what impressed him about Penguicon was that it fundamentally wasn't something he could have done himself. Tracy and I never viewed Penguicon as irreplaceable: if all else failed we could do it again from scratch. (And I did, with Linucon, but couldn't sustain it without the support network Tracy had in Michigan. I needed to go work at an existing convention for a couple years to recruit concom. Moving to Pittsburgh during what would have been year 3 didn't help. But those were learning experiences, not blockers. I haven't done it again because I'm busy with aboriginal and toybox and being married and staying properly employed to pay for an actual house... but mostly because I already _did_ it. Been there, done that, twice. Moving on...)

But to Matt, Penguicon was magic. Combine that with some deep psychological need for the spotlight, and it meant he wanted to take credit for this thing that had impressed him, so Tracy and I had to go, as did anyone else who might conceivably take the spotlight away from his starring role in Matt Arnold's Penguicon by Matt Arnold. This incessant politicking took all the fun out of it for me, so I stopped pushing new content in after Penguicon 5, and just tried attending for a year: the year where Matt wasn't technically chairing but invented the "assistant con chair" position for himself (we hadn't had such a thing before, year 1 I stepped back ~3 months before the event so Tracy could chair because you need one point to the wedge). Matt organized opening and closing ceremonies so he had twice as many lines as the actual con chair, and when he announced he would be con chair the following year, I didn't bother coming back, and haven't been back since.

Since then I've mostly tried to ignore it. Bruce Perens trolling busybox may have helped dissuade me from caring _too_ much what happened to it after I left: if other people are having fun knocking down a sand castle I helped build after I went home, it's none of my business. I wasn't boycotting it or anything, I didn't stop Fade from going on her own the year Matt chaired to have fun hanging out with her friends in the area. I vented about it a bit while stuck in an airport with nothing better to do (that link has lots of links to sources for things I've mentioned here, because I was bored and sleep deprived with internet access; haven't bothered this time around). But mostly, Penguicon just hasn't come up much in the past 5 years.

Sure I heard rumors of trouble from other convention organizers next time I passed through the area, but that was a "Don't you want to fix this? No? Oh, ok then." professional courtesy sort of thing. I actually got such rumors from multiple angles: one of our guests of honor in one of the last years I attended was Randy Milholland of Something Positive. (Randy did the "Gurps Marriage" book cover that Steve Jackson used when he officiated at our wedding at P5. I still have it, in a box, signed by both of them. Yes, I abused the fact I was still helping arrange the panel schedules to get us a private panel room for an hour. We had to move it twice, once because it was scheduled opposite an Elizabeth Bear panel Fade wanted to go to, and once because it was scheduled opposite a Charlie Stross panel that Steve and Eric Raymond (best man) wanted to go to).

Anyway, Randy returned to the convention as a vendor in later years, and the first I heard that Penguicon might be going stale from an attendee point of view was when he tweeted that it had become "just another con". (We had a tradition of trying to give our guests such a good time they'd come back on their own time. Hence the "nifty guest" designation, a lot of which were previous Guests of Honor who got lifetime free admission to the con if they came back to attend.)

Howard Tayler was another perennial Nifty. I was a fan of his comic from early on, and back when he still worked at Novell we tried to use the technical nature of my half of the convention to convince his employer to fly him in on their dime, since Novell had _just_ bought SuSE. (Year 2 I think? Year 1 was in the hotel with the leaky ceiling, the Dick Van Dyke or some such. I don't remember him wandering around that building. Year 2 was the year Eric Flint came to talk about the Baen Free Library, and Howard gave me a Novell Linux shirt that I think I was wearing when I went to Flint's panel, so that sounds about right? It's been a while...)

Howard made a bunch of friends and new fans in Michigan, and came back to visit them each year. (The first convention he attended as a full-time web cartoonist was my Linucon 1 in Austin, which was a few months before Penguicon 3, I think?)

So that's the context in which Howard going to ConFusion instead of Penguicon is a "huh, makes sense". Given that ConFusion is in the same city 3 months earlier, it's not a big stretch to go to that instead of going to Penguicon. When Penguicon started we were pulling in lots of new people who'd never attended a science fiction convention before. (Aegis Consulting, the swordfighting people, originally found out about us because of their Linux dayjobs.) But now? If you go to ConFusion, you can see all the same people. Skipping Penguicon makes sense.

Maybe I should pencil in ConFusion next year. Sounds like fun.

January 12, 2014

Toybox uses the menuconfig help text for its --help output. For the longest time a python script was harvesting the kconfig data to produce generated/help.h, but A) python should not be a build-time dependency (and the hacks to work around that are brittle and crappy), B) lots of commands have more than one config symbol and it hasn't been collating them.

A while back I decided to rewrite it in C, but haven't had time to actually do it. I'm too tired when I get home to get much done, so I'm back to getting up at 5am to try to steal a couple hours before work.

The first step is just writing a C parser so I can discard the python. That bit's pretty much done now.

The second step is to come up with the list of commands with config stuff to merge, which can be found via:

$ grep -h usage: toys/*/*.c | awk '{print $2}' | sort | uniq -d

Giving cd, cp, df, ls, mke2fs, mv, netcat, and sort.
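
For reference, what that pipeline computes can be sketched in a few lines of Python: collect the usage: lines, key them on the command name (the second word), and report the names that show up more than once. This is only an illustration fed from a list of strings instead of toys/*/*.c; the helper names group_usage and duplicated are made up, and unlike the grep it only counts lines whose first word is "usage:".

```python
def group_usage(lines):
    """Group usage: lines by command name, keeping declaration order."""
    groups = {}  # dicts preserve insertion order in Python 3.7+
    for line in lines:
        words = line.split()
        if len(words) >= 2 and words[0] == "usage:":
            # Store the line with its whitespace normalized.
            groups.setdefault(words[1], []).append(" ".join(words))
    return groups

def duplicated(groups):
    """Command names with more than one usage: line (like sort | uniq -d)."""
    return sorted(name for name, entries in groups.items()
                  if len(entries) > 1)

sample = [
    "usage: cp [-f] SRC DEST",   # made-up help text, not the real entries
    "    usage: ls [-a]",
    "usage: cp [-v]",
    "some other help text",
]
groups = group_usage(sample)
print(duplicated(groups))   # prints: ['cp']
print(groups["cp"])         # prints: ['usage: cp [-f] SRC DEST', 'usage: cp [-v]']
```

Keeping the grouped lines around (rather than just the names) is what the merge step needs, since it has to see every entry for a command in the order it was declared.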

Step 3: look at the config entries for those commands and work out rules by which their text can be merged.

(Note: I want to delete everything but the README out of generated/ and right now it's listing each file to delete. Possibly I should just move the README to the code walkthrough on the web page? Or mark the README read only. There don't seem to be a lot of exclusionary wildcards...)

January 11, 2014

My current email workflow involves starting thunderbird via ssh -X, and every time I do so it goes:

(process:30776): GLib-CRITICAL **: g_slice_set_config: assertion `sys_page_size == 0' failed

Doesn't seem to hurt anything. There's a reason I'm against asserts. After a few years of python programming, the correct approach to C coding (other than code inspection) is to have a good regression test harness to show that the result actually does what you think it does.
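
As a rough idea of what "regression test harness" means here: run the command with known input and compare what comes out against what's expected. The sketch below is a generic illustration in Python, not toybox's actual test suite; the check() helper is hypothetical, and the example command is just a Python one-liner standing in for a real binary.

```python
import subprocess
import sys

def check(name, argv, stdin_text, expected_stdout):
    """Run argv with stdin_text on stdin; pass if it exits 0 and
    stdout matches expected_stdout exactly. (Hypothetical helper.)"""
    result = subprocess.run(argv, input=stdin_text,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    ok = result.returncode == 0 and result.stdout == expected_stdout
    print("%s: %s" % ("PASS" if ok else "FAIL", name))
    return ok

if __name__ == "__main__":
    # Stand-in command: upper-case stdin using the current interpreter.
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    check("upper basic", upper, "abc\n", "ABC\n")
```

The point is that the check runs against the finished program's observable behavior, instead of aborting from inside it the way an assert does.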

I need to fluff out the test suite for toybox. But as of yet, I still haven't caught up on the cleanup writeups...

January 4, 2014

The weekend!

Catching up on the cleanup writeups for toybox.

January 2, 2014

I was just getting over the cough that's been plaguing me since December 2, and then today the Cedar pollen started up.

By our powers combined we are, seriously annoying!

*cough*