Oct 25, 2005

So I'm ready for a release, but the webserver on Eric's machine is still down. Sigh. (Email's back, web isn't.) I need to get my own server installed and up and running anyway. I should work on that today. A real-world use case for Firmware Linux...

Let's see, ssh, web, dns, and email. Dropbear, busybox's built-in httpd and postfix seem like obvious choices, but what to use for dns? Not bind, not oak, not djbdns... Hmmm...

Oct 24, 2005

So a longstanding problem with funneling the build through UML is that ctrl-c doesn't do anything. This is because UML is running the build on /dev/console, which has no controlling TTY, and running it as PID 1, which has the kill signal blocked anyway.

So I've made a dumb little Advanced Init Substitute I'm calling outnit, which basically forks and runs its arguments as a process attached to /dev/tty0, then does the whole zombie reaping thing (and calls reboot() if the process it forked exits.) The upshot of all this is, ctrl-C works.

Unfortunately, feeding arguments to init through UML turns out to be non-trivial. In theory, anything unrecognized on the kernel command line gets passed through as arguments to init. In practice?

UML running in SKAS0 mode
Checking PROT_EXEC mmap in /tmp...OK
Unknown boot option `/home/landley/newbuild/firmware-build/sources/scripts/1.0-tools-umlsetup.sh': ignoring
System halted.

Beautiful. Turns out if you have a period in the argument, it gets rejected. That is a broken heuristic, and I've submitted a patch to linux-kernel.

Oct 23, 2005

The full development tools build is 18 megabytes. OUCH. The previous full development tools build (based on gcc 3.3 and a correspondingly older binutils) was about 12 megabytes. I knew FSF code tends to bloat with time, but an extra 6 megabytes is just PAINFUL.

A little of that is because my build script didn't delete the lilo source code out of tmp before making the squashfs. But that's not even a megabyte of the compressed total. It installed megabytes of info pages (hello FSF: nobody, anywhere, uses info for anything, except you), and all sorts of internationalization crap that I forgot to tell it not to install. So I can strip it down a bit from where it is, and plan to. Another chunk is /usr/include, but that's actually needed and is highly compressible text...
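The trim I have in mind looks roughly like this (an untested sketch; the paths are illustrative, this isn't what the build script does yet):

rm -rf tmp/lilo-*                    # don't package the lilo source
rm -rf tmp/tools/info                # nobody uses info for anything
rm -rf tmp/tools/share/locale        # or the internationalization stuff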

Still, 18 megabytes. Ow. The base system with no developer tools is 2.5 megabytes, so we're talking a good 15 megabytes for the toolchain. At least 10 megabytes of that has to be gcc and binutils. Compare that to tcc, which is about 100k...

Oct 22, 2005

Spent most of the day in San Antonio, doing the tourist thing with Fade's parents. Continue to be unimpressed by the Alamo. Email still out.

Now I'm fiddling with stage 1.1 (the /tools build). This is a pain and a half.

Ok, intro to toolchain building. Toolchain building sucks because it has three times as many dependencies as normal building, and two of those three are mirror images that it's really easy to confuse. (The compiler and linker create binaries linked to a set of shared libraries and a library loader. But the compiler and linker also _are_ binaries linked to a set of shared libraries and a library loader. Yes, a necessary cross-compiling step is creating a gcc that runs against glibc but creates binaries linked against uClibc. And this is glossing over the whole idea of where all the standard header files live.)

This would be easier if there was consistent terminology for it, but yesterday I had to give ldd dumps of the various stages and point. "That bit is what's currently wrong." You can keep it straight in your head if you can think visually, but how do you document or take notes?

The other fun thing is when you're building on a system that is different from the one you're going to be running the software on. I've growled about libmudflap before, which doesn't keep this straight. I recently found out you can just rm -rf libmudflap out of the gcc sources, and the gcc build seems ok with that. Yay.

In theory binutils and gcc have grown some new potentially useful config options, which I intend to play with in future. In theory, I can tell binutils configure --with-lib-path=/tools/lib, and then tell gcc configure --dynamic-linker=/tools/lib/ld-uClibc.so.0 --prefix=/tools --exec-prefix=/tools, possibly specify --oldincludedir (I don't know what that does yet)...
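Spelled out as configure invocations, that's roughly the following (untested; the source paths are placeholders, and whether gcc's configure really honors --dynamic-linker is part of the experiment):

# untested: the binutils and gcc configure lines described above
path/to/binutils-source/configure --prefix=/tools --with-lib-path=/tools/lib
path/to/gcc-source/configure --prefix=/tools --exec-prefix=/tools \
  --dynamic-linker=/tools/lib/ld-uClibc.so.0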

Of course it's not distinguishing the environment these tools will run in from the environment the binaries produced by these tools should run in. If I can just consistently get the second meaning, I can control the first by making temporary tools and building a final set of tools with those temporary tools. (It's what I'm doing now, actually. Only instead of specifying stuff cleanly on the command line I'm performing surgery on the source code with sed and grep.)

Add in the fact that juggling around the order of things can easily break the _other_ things that got reordered, even if you don't think you changed them. (I broke the uClibc utilities build by doing a "make clean" after the library build and then trying to build the utilities. It cleaned the temporary headers out of the source's headers directory, and the utilities build needed those.)

Fiddly. And I miss having working email. Busybox less is segfaulting, and I haven't got time to track it down myself right now...

Oct 21, 2005

Busy couple of days. Build compiling != build working, of course.

I mentioned I need switch_root, because pivot_root doesn't work on ramfs in 2.6.13 (and it was a bug it ever did in the first place). And I have most of one coded up, to submit to busybox if my mail ever gets working again. But that's a side issue: to debug the build I can just cd/mount --move/chroot. Deleting the old stuff out of rootfs before it becomes inaccessible just saves space is all. (P.S. If you mount something on /, you can't ever umount it later, short of a reboot. Fun.)
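For the record, the manual sequence is roughly this (a sketch, assuming the new root is already mounted on something like /newroot):

cd /newroot            # the new root filesystem, already mounted here
mount --move . /       # move it over the top of rootfs
exec chroot . /bin/sh  # rootfs is still there underneath, just unreachable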

Fiddled with the initramfs a bit, got things working again (took a while, not the world's greatest debugging environment). Got the build generating one that works, now with msh instead of ash even.

I also fiddled with making things more configurable, so you could _not_ build the development toolchain and stuff. Built a minimal system with just uClibc and busybox, but it _didn't_ _work_. None of the stuff would execute. Suspicion? Library paths were wrong. Huh, try running ldd. That wouldn't execute either. Try "mount --bind / /tools" and run it again, and yes ldd is linked against /tools.

So the linker paths in gcc 4.0.2 are _all_screwed_up_. Big surprise. If I don't build development tools, everything is linked against /tools/lib (which isn't there on the final system). Even if I _do_ build a new toolchain, anything built before the new gcc is on the system (including uclibc's helper utilities like ldd, and all the bin86 binaries like ld) is still linked against /tools. So building a toolchain to go in the firmware doesn't actually fix the problem, just makes it less obvious...

My little 3.3 trick tweaking the spec file at the end of the /tools build so the toolchain starts building stuff linked against /lib? Doesn't work in 4.0.2. That specfile no longer exists. I go find the new specfiles, and feed them the correct information. No effect, whatsoever. It turns out that the paths are hardwired into the binaries, and nothing short of rebuilding those binaries will change the paths.

Time out for our regularly scheduled swearing at any code ever touched by the FSF...

I want a /tools directory that internally depends on /tools/lib but _generates_ binaries dependent on /lib. This is not that esoteric a requirement. But to get this, I need to build one set of tools that link against /tools/lib and then build another set of tools that link against /lib.

Of course I'm already building both binutils and gcc twice during the /tools build. The existing reason for that is the first build works fine for building more binaries to live in /tools, it just doesn't run out of /tools. (It runs against the host system's libraries, probably glibc.) I.E. (pseudo code):

/tools/bin/gcc hello.c -o hello
ldd hello
	/tools/lib/ld-uClibc.so.0
	/tools/lib/libc.so.0 => /tools/lib/libuClibc-*.so
ldd /tools/bin/gcc
	/lib/ld-linux.so.2
	/lib/libc.so.6 => /lib/*/libc-*.so

I want a gcc binary whose libraries look like the hello world binary's. Hence the second build, built with the first.

But if I move that second build to the _end_ of the /tools build, the build can generate binaries whose run-time dependencies live in /tools, and will thus execute out of the chroot environment. The fact that the toolchain itself won't run anywhere but the host system is fine. Then at the _end_ of the tools build, when I don't need to add anything more to /tools itself, I can tweak the source directory so any binaries the new tools generate now point to /lib, and build these tools with the ones I'm about to replace so that the generated toolchain binaries depend on /tools/lib. So I get:

/tools/bin/gcc hello.c -o hello
ldd hello
	/lib/ld-uClibc.so.0
	/lib/libc.so.0 => /lib/libuClibc-*.so
ldd /tools/bin/gcc
	/tools/lib/ld-uClibc.so.0
	/tools/lib/libc.so.0 => /tools/lib/libuClibc-*.so

Gotta be worth a shot. Still _way_ more finicky than I'd like, but then gcc always was. It only _ever_ works by a series of carefully engineered coincidences. (Too bad the output of tcc is so crappy from an optimization standpoint. And it doesn't do c++...)

Oct 18, 2005

Ok. Build is back together, and everything is now compiling. (The dropbear install is failing, but that's just an application not in the base OS anyway. It's trivial breakage: for some reason when I try to create the scp symlink it says it's already there. Not a big deal, commented it out for now, fix it later.)

And that means I should soon have a build that's more or less ready to snapshot and call 0.9. Tons of to-do items left, of course. The biggest is that it's still only creating the UML version. Creating the bootable version is easy (it's the same packaging on a different file), but I haven't made an installer yet, which renders such a bootable version noticeably less useful. I dragged the server from the upstairs closet and intend to inflict this thing upon it, possibly even later this evening.

Of course this would be the perfect time for Eric's server (the one my website and email are on) to get cracked and start spewing spam, so naturally that's what happened. (He was running sendmail, so it's not much of a surprise. His machines are all Fedora and he wants to install Fedora Core 4 on it, but nobody's been able to download FC4 CDs with good checksums. The current theory is that the published checksums are wrong.) In any case, it also means that my website and email are both down right now, until the machine in question gets reinstalled.

So I stare at the server next to the table and go "hmmm"... It'll need a webserver, but I've configured apache before and busybox has one too. I've needed a second nameserver for a while anyway (I've used bind before but a _small_ one would be nice, and djbdns has the downside of being written by Dan Bernstein, which rules that out...) Gotta get dropbear working of course, but that's just fixing the install. The mail server's going to be tough (postfix is the obvious candidate, but I've never set up a mail server before). And of course get the kernel install setup working...

I shouldn't hold up the 0.9 release for this, though. It's finally downloading all the source code dynamically from the right websites (today's snapshot of busybox finally has everything in it I need), so putting up a source tarball (containing my build scripts, a few small patches, and some directories full of symlinks for organizational purposes) is only about 42k, gzipped. If the website was up, I'd just post that as-is as soon as I get a working UML image out of it. (Well, after fixing the dropbear build.)

The fact that gcc can't build in 128 megs of ram is a bit of a pain, too. I still don't know if it's a real gcc issue or a UML memory leak. If I can't easily make it go away, I suppose I could feed it a block device to use as swap. Dunno what the performance impact of that would be, but it's unlikely to suck much more than taking a quarter-gig from the host system when you haven't got it...

And I should mention on the main webpage that the sucker's designed to give you a quick status report via grepping for "===" on the output. (I.E. if you've redirected it to a file, it prints a === line before each major checkpoint. Although I suspect some are missing...)

Ah, of course. Build made it to the end, created a UML image, and it failed to boot. Why did it fail to boot? Because I upgraded to 2.6.13 and pivot_root on rootfs now fails. Which is a good thing, and I knew it was coming, but it means I have to incorporate switch_root into busybox before I can get a working system.

Right. Existing to-do item moves to the front of the list...

Oct 17, 2005

The stage 2 build broke because my cut and paste of the libiberty fix smashed the tab into spaces. (Patching a makefile, I need to insert a tab. Busybox sed doesn't support \t. It's a to-do item.) Turned the spaces back into a hard tab, right.
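The workaround in the meantime is to smuggle a real tab in through the shell, along these lines (the filename and the eight-space pattern are illustrative, not the actual patch):

TAB="$(printf '\t')"    # a literal tab character, since busybox sed has no \t
# turn the pasted-in leading spaces back into a real tab
sed -e "s/^        /$TAB/" Makefile > Makefile.new && mv Makefile.new Makefile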

Now the gcc build is barfing because busybox awk can't parse "gcc-4.0.2/gcc/opt-functions.awk". Wheee. That's one I can't easily fix myself (awk is big and complicated). Posted about it to the list to see if the awk maintainer's around. I guess that's it for the night, unless I want to clean up the rest of the source tarball downloads...

Oh that was _prodigiously_ stupid. You know how the "unwrapped" 2.2 build involves --bind mounting the source code into the directory? (And it's not a read-only mount, because the kernel doesn't support read-only bind mounts as of 2.6.10?)

Guess what I just did!

Yup, forgot to umount sources before doing a rm -r in the temporary build directory. (And _this_ is why the unwrapped version is dangerous!) Right, I have a backup from... the 15th. Beautiful. Anything newer? That's from the 12th...

Right. Time to recreate the last two days' changes. Most of the work I was doing was in busybox or long debugging sessions that resulted in fairly small fixes once the problem was known. Not too bad. I saved when I got the tool build working unwrapped...

Oct 16, 2005

So I'm reading through the source code to bb_getopts_ularg. Luckily, vodz seems to speak C.

The limiting thing is it's a wrapper around the libc function getopt_long...

Later that day...

Ok, the busybox bug was that the getopt string contained "-1:", which meant it always needed at least one unparsed argument left over. This may be true for create, but create already has the "cowardly refusing to create empty archive" test and message. For extract, "tar xvjf thing.tbz", f takes an argument so there are no leftover unexpected arguments and thus it dies. I.E. I didn't break it, it didn't work before my patch.

Apparently, gcc no longer builds in 48 megabytes of RAM. (UML's out of memory killer triggered on genattrtab. Fun. Upped to 64 megs... Good grief, 64 megs isn't enough? Ok, gcc 4.0.2 is a _pig_. How about 80 megs? Sheesh! 96?)

Ok, now the OOM killer isn't triggering on genattrtab, but on the gcc invocation _after_ it. Unless this puppy's shelling out to gcc or we've got some weird asynchronous thing going on, it sounds like a memory leak in UML. (There was one of those mentioned recently, but I'm not using 3-level page tables, am I?) Right.

Ok, first off, let's try giving the bastard 256 megs of ram and see if _that_ makes it happy. (Since UML is a normal userspace program the parent system can swap it out. Performance may suck, but we'll see...)

Yeah, it builds with UML allocating 256 megabytes of RAM. That's just _sad_. I'll leave it for now and see about trimming it down later. (Either UML is leaking or gcc is an _amazing_ pig.) And the stage 2 build has a couple of non-obvious breaks. Ok, do the same run 2.2-* outside of UML trick I did to debug 1.1. (And top's ability to sort by memory usage would be cool if I was paying attention at the right time during such an unwrapped build...)

At midnight a snapshot of the fixed busybox goes up so I can upgrade the build process to auto-download something rather than the handcrafted tarball I'm using at the moment. Meanwhile, Fade and I are going to see the Nightmare Before Christmas sing-along at the Alamo Drafthouse in an hour, and we're biking so we should leave now...

P.S. Wrote this earlier:

I need to document the "how to bypass UML" build bit. I built /tools ok with gcc 4.0.2 before because I did it outside of UML, and my laptop has half a gig of ram with swap space on top of that. The UML setup only has as much memory as I tell it to allocate, and I haven't configured any swap for it.

To do the /tools build, you need to do one thing as root (make a symlink, /tools, pointing at an empty directory), and then drop back to your normal user and run sources/scripts/1.1-* which should build the /tools directory. Doing this does briefly require root access, and you have to be running a recent 2.6 kernel, but it's noticeably easier to debug. (And runs a bit faster.)
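In other words, something like this (the directory path is just an example):

# as root: point /tools at an empty directory the normal user can write to
mkdir -p /home/landley/tools-build && chown landley /home/landley/tools-build
ln -s /home/landley/tools-build /tools
# then, back as the normal user:
sources/scripts/1.1-tools-build.sh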

The 1.0 stage is the UML init script that sets up the environment for 1.1, and 1.2 packages /tools into tools.sqf.

Now running stage 2 outside UML, that requires root access in lots of places. (It has to be run in a chroot environment, it calls mknod, it changes the ownership of files...) But what you can do is something like:

mkdir -p sub/{tools,sources}
cd sub
mount --bind /tools tools
mount --bind /path/to/sources sources
chroot . sources/scripts/2.2-*

Oct 15, 2005

So gcc 4.0.2 seems to be building now, and compiling stuff against uClibc-0.9.28. But to install uClibc-0.9.28 I need to patch busybox to have an --exclude option (because the uClibc install uses it). In theory, I have now done so. In practice, I did so late last night and intended to test it in the morning, and between then and now Vodz (the Russian busybox developer who communicates via a babelfish derivative) "fixed" it. And since he's the one who wrote the code and I mentioned in the checkin that I couldn't understand his babelfishese documentation, I heartily approve of him doing so. But:

Now busybox tar doesn't work at all. Even trying to do "tar tvjf file.tbz" it dumps a usage message, meaning it fails to parse the arguments. I don't know if vodz broke it or if I did, and I'm waiting for him to take another look at it before I go in and do more damage, because this is his area of the code and not mine.

After several consecutive days of working on this, I could use a break anyway. I've updated the index.html file in preparation for the new release, and I just sent my friend Mark and my fiance Fade off to see Mr. Sinus Theatre professionally make fun of the movie "Lost Boys" (I already saw them do this last time, and it was quite good). I think I'll go biking.

Oct 14, 2005

So gcc 4.0 has apparently grown support for uClibc, using the --target=i386-linux-uclibc option to ./configure. Except it considers this cross-compiling (which I suppose it is) and wants to find binaries named i386-linux-uclibc-ar and such.

I think I'll try to get the "sledgehammer and soldering iron" approach working again first, since their automated method can't handle relocating /lib and /include under the /tools directory anyway. (Chock-full-o' hardwired assumptions, that's gcc for you. So whatever I do, running sed against their source multiple times is still necessary.)

Right.

So there's a fairly delicate sequencing issue. Building binutils 2.14 and gcc 3.2 used to require symlinks from /tools/lib and such to the parent system's libraries, so the build could find things. This was because the cross-compiler ran on the parent system, and thus used the parent system's libraries. The new uClibc I was building wouldn't necessarily run under an older kernel version than the one described by the headers package I gave it.

Now that I'm using UML, the tools build is done under the new kernel, no matter what the system is actually booted with. So in theory these tools can run against uClibc and everything should be ok, and this means that if I move the uClibc build before the binutils and gcc builds, the libmudflap thing might magically stop causing a problem...

And it got me past that bug to a _new_ bug. Woot.

Nope, false alarm... Grrr...

Later that day...

Right. Another fun little issue: uClibc yanked a symbol called dl_iterate_phdr, and gcc 4.0.2 wants to use it. I posted about it to the uClibc mailing list, but in the meantime I've just patched the gcc source to turn the relevant test into if(0). I think it's for debugging anyway (the symbol does stack unwinding, which shouldn't normally be needed for anything by C code).

And apparently uClibc doesn't successfully export __libc_stack_end, which gcc really seems to want now. But Linux From Scratch has a patch for it! Woot. (I'm told this patch is wrong. But it builds, and seems to work...)

Ok, _that_ was a pain. But the base toolchain seems to actually be working now. (Dunno if it's building stuff properly, but compiling the rest of the system should give me a hint...)

And gcc 4.0.2 breaks bash 2.05b. The fsf doesn't even like the fsf. (The error is that a goto label is the last thing in a function, except for some #ifdeffed out code. And instead of calling this a warning, they made it an error. Solution? Add do {;} while(0); right after the label to TELL THE BROKEN COMPILER TO SHUT UP.) It's not like it's unclear what the program says to do. What's wrong with a goto to the end of the function instead of a return? A compiler should not stop, sit down in the middle of the road, and throw a tantrum over something like this. Sheesh! Did they not NOTICE they broke bash 2.05b? What, "everybody upgrade to 3.0"? Not happening: I'm going to write a new shell for busybox to replace bash, and until then I'm sticking with the old version, thanks.

Ok, 1.1 built to the end, now let's see about the rest of it...

Added --exclude support to tar. Didn't quite do it right, but the guy who wrote busybox's option parsing code doesn't speak english, and uses a russian to english translator instead. The result is seldom fully comprehensible.

Huh. The hardwired paths in collect2 went away. On the one hand, this is a positive sign. On the other hand, now I have to figure out where /lib and /usr/lib are being specified now so it doesn't include both and do bad things since I collapsed them together...

Oct 13, 2005

Huge amounts of progress recently. Got everything building under the new 2.6.13.2 UML (with -skas0 mode, much faster) and decided to go for broke. I've now gotten about 80% of the packages upgraded to the most recent version and it's automatically downloading the new ones from the relevant websites (and checking sha1sum, and keeping the old copy around to use next build...)

The automatic downloading of source means I only have to put up a <100k tarball for people to download, not the 90 megabyte monstrosities I've avoided putting up because I don't want to suck up too much of Eric's bandwidth. Once I get the rest of them converted I can have a serious _release_.

The packages left to migrate over are mostly easy, and include uClibc (I'm still at 0.9.27, need to upgrade busybox tar to understand --exclude for the 0.9.28 to install its headers properly), busybox (using a daily snapshot of the 1.1 tree, just need to pick one recent enough), lilo/bin86/nasm (haven't gotten around to it yet)...

And of course the one I've spent too many hours on already, gcc. Ow.

Ok, it's my fault for trying to move from 3.3 not just to 3.4, but to 4.0. But there's also the fact that the people who wrote this need to be HARMED. Yeah, fixincludes is pointless braindamage you have to disable to keep it from screwing stuff up, and the one sed invocation to do this became three because they're now installing a README file in includes warning people that they broke your headers and it's too much effort for them to actually care. Right. Fine. Three sed arguments and that's toast.

Then there's the fact I have to adjust the paths for the /tools step, because despite having specfiles they hardwire the suckers into their source code anyway (well of course, this is the FSF we're talking about). So I have to look at their actual source code.

I have now been deeply and thoroughly reminded why nobody should ever look at any source code the FSF has ever had anything to do with, ever. Not unless they've done something to really really deserve it, and even then you should wait a bit to see if there's a last minute phone call from the governor or something...

I'd go into more detail but I literally have a headache now, and am going to bed. I'm going to try to finish the conversion in the morning. I suspect not having done the darn gcc and collect2 patches properly last time is the reason the tools.sqf build was about the same size as the final system, despite having less in it. (Redundant linker paths cause gcc to go nuts and create binaries as big as if they're statically linked, but still needing the libraries to link against. Remember how I squashed together /lib and /usr/lib? Sigh...)

Later that day...

The previous entry was written just after midnight, and this one is just before. It's been about a day.

So, imagine The Tick (the big blue nigh-invulnerable one) saying "Heh, those wacky gcc developers" (ala "those pesky ninjas"). This is about how I feel, as an alternative to going postal.

There's something called "libmudflap", a subdirectory under the gcc build. I don't know what it is, or what it does (no README), and I really don't care. What I do know is that in 4.0.2 its ./configure script tests to see if the compiler that just got built can create a runnable a.out file, which it can't, because it's trying to dynamically link against a C library loader that I haven't installed yet. I'm fairly certain 3.3 didn't do this.

This configure script creates a log file. It also deletes the log file at the end of the run, whether or not it was successful. It was easier for me to edit the configure script and stick in printfs (well, echoes) than it was to figure out how to prevent it from deleting the log file.

Now _that's_ good design...

Oct 11, 2005

Since I finally had a decent reproduction sequence, I sat down and debugged the problem last night. Of course it wasn't a UML problem. UML was merely triggering a long-dormant bug in busybox that had survived 22 months without biting anybody except me.

The bug (I fixed it) survived for 22 months (since Nov 27, 2003) by A) only occurring under uClibc (not glibc), B) only occurring under a kernel newer than 2.6.11 due to _what_ random garbage was left on the stack, C) and then only occurring intermittently.

You may have been wondering why this project is proceeding so slowly? In addition to being entirely done in my spare time, it spins off so many tangents that I wander down, and can get distracted by for a long time before wandering back to work on FWL. I've got a dozen distractions: Linux From Scratch, BusyBox, uClibc, User Mode Linux, Matt Mackall's 2.6-tiny tree, squashfs, dropbear... I haven't even got X11 in this thing, can you imagine what kinds of tangents I'll wander down once that sucker comes in?

I want to get a release out other people can use so I have people poking me to draw my attention back to the central project. Some reason to set up a better web page and a mailing list and a subversion repository (or maybe even Mercurial). More tangents to wander down, of course... :)

It'd be different if this is what I did for a living, but it isn't. Work is over in the "real life" category taking time away from working on this.

By the way, I'd just like to take a moment to say how much I hate "vim". It does gratuitous synchronous I/O I didn't ask for and can't seem to stop, and sometimes I have processes running in the background that keep the disk I/O bound. And when this happens, my attempts to edit a file in vi will spontaneously pause for upwards of _30_seconds_ while it tries to sync its stupid log file, and then behave normally for another 15 seconds or so, and then freeze again. (Yeah, I looked up how to make it not keep a log file once, but then it has no "undo" functionality at all. What I want is "don't call fsync() unless you _mean_ it", which doesn't seem to be an option.)

Yes, it's currently doing this to me. The busybox implementation of vi doesn't do this. (Then again, I'm not sure the busybox version has "undo"...)

Oct 10, 2005

Many things going on behind the scenes that I haven't updated this to mention. Let's see.

Linucon happened, Eric and Cathy visited for a bit, and so on. (I.E. "real life".) Lots of busybox stuff too. (It now has a "less" implementation, so I can drop that package once I upgrade. I'm putting together a 1.0.2 release and then trying to lock down 1.1-pre1 in hopes of getting busybox 1.1 shipped in January.)

My blocking problem with Firmware Linux has been the inability to upgrade beyond a 2.6.11 kernel without UML going wonky, but if my two most recent posts to the UML-developer list (here and here) don't let them reproduce the sucker, I don't know what will.

And once _that's_ working, I can unblock all sorts of stuff...

Sep 17, 2005

I spent a couple weeks working on other things (busybox and Linucon come to mind), but I'm wandering back to FWL now.

The makefile reorg is finished, although I'm still adding new functionality: I need to autodetect when it's running as root and skip the UML wrapper steps, and I need the source tarball downloader.

The new script to download all the source code from the original locations is something I've meant to do for a while. Among other things, documenting where all this stuff comes from is important. Also, I'm trying to get a release together to put on the website, and the 100 megabyte tarball I put together last time is unwieldy. Keeping previous versions of such a tarball around is a real pain, discouraging minor updates with only one or two files changed, and being a bit of a pain for anybody trying to download the new version. The new script needs to download them only if they're not already there, should do an sha1sum to verify their integrity, should fallback to my website as a place to get them, etc.
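The shape of the thing is roughly this (a rough sketch; the function name, URLs, and checksum are all placeholders):

download()   # fetch $2/$1 unless a copy with the right sha1sum is already here
{
  [ -f "$1" ] && [ "$(sha1sum "$1" | cut -d' ' -f1)" = "$3" ] && return 0
  wget "$2/$1" || wget "http://example.com/mirror/$1"    # fall back to my website
  [ "$(sha1sum "$1" | cut -d' ' -f1)" = "$3" ]           # bad checksum = build stops
}
download busybox-1.01.tar.bz2 http://www.busybox.net/downloads 0123456789abcdef0123456789abcdef01234567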

I'm also putting together a monster to-do list to figure out what I need to do before the release and what has to happen after. I'm not on a schedule with this, but I've let it languish far too long already...

Another thing about getting it up is I need to move my website off of the machine in Eric Raymond's basement to the server in my bedroom closet hooked up to my cable modem with the static IP and the symmetrical 3-megabit connection. But I want to install firmware linux on my server before doing this. (Eating my own dogfood, and all that. Real field test of the thing.) The downside of not being on a schedule is I've been going down too many side paths rather than focusing on getting this up and running...

Sep 16, 2005

Linux kernel 2.6.13.1 has a working SKAS0 mode (Single Kernel Address Space mode, without requiring a modified host kernel), which should be noticeably faster than the -tt (Tracing Thread) mode I've been using. I initially couldn't use this because it doesn't support TLS (some threading optimization) yet, and recent glibc versions (such as the one in ubuntu) detect "ooh, a 2.6 kernel should support TLS!" and then barf if it can't initialize it. (Yes, glibc has an error message, but won't fallback to not using the feature. I consider this a glibc bug.)

The whole reason I was using -tt mode in the first place is that even though it's slower, it runs everywhere. The UML guys want SKAS0 to replace -tt, but if it wouldn't run due to this glibc bug, I couldn't use it.

BUT: It turns out there's an obi-wan "this is not the kernel you're looking for" move. If you set the environment variable "LD_ASSUME_KERNEL=2.4.1", glibc doesn't even try to use TLS. I confirmed that the following UML command line gives me a working command shell inside UML running in skas0 mode:

./linux rootfstype=hostfs rootflags=/ rw mem=48M init=/bin/sh LD_ASSUME_KERNEL=2.4.1

Aug 27, 2005

Ok, the gcc build break turns out to be because the uClibc install isn't installing the headers. The problem is that when running uClibc-0.9.28's make install_dev with busybox tar, the header copy is done with a pipe between two tar instances, and the creating tar instance has an "--exclude CVS" option but there isn't a CVS directory anymore. The busybox 1.00 version of tar complains about this and dies, thus no files are copied.
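The idiom in question is a pipe between two tars, roughly like this (paths illustrative, not the literal uClibc makefile rule):

# roughly what uClibc's "make install_dev" header copy boils down to:
tar -cf - --exclude CVS include | tar -xf - -C /tools/usr
# busybox 1.00 tar aborts on the --exclude because no CVS directory exists,
# so the extracting tar gets nothing and make never sees the failure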

This is both a uClibc bug and a busybox bug. First of all, busybox shouldn't die because you tell it _not_ to copy a file that isn't there. It should be fine to --exclude something that doesn't exist. On the uClibc side, it shouldn't say --exclude CVS now that it no longer uses CVS, and more seriously the install should notice that tar failed rather than falling through and claiming success.

I hath pestered both mailing lists, and in the morning I'll take a whack at dealing with it myself, but for right now it's time to pass out.

Aug 26, 2005

Grinding away.

Breaking up the makefiles into a master "build.sh" that runs 0-make-tempdir.sh, 1-make-tools.sh, and 2-make-firmware.sh. Stage 1 runs the sub-scripts 1.0-tools-umlsetup.sh (the UML init script), 1.1-tools-build.sh (the actual build), and 1.2-tools-package.sh (to make the squashfs image for tools). Stage 2 also has three corresponding sub-scripts.

The upside of this is that you can choose to run the build without UML: you just need to run 1.1-tools-build.sh (as root, on a recent enough kernel), bind mount the resulting /tools directory into an empty directory, and chroot into that to run 2.1-firmware-build.sh. It also means I can swap in different 2.2 stages to build the UML demonstration firmware or the actual bootable lilo firmware.

Lots of little details, though, such as the fact that a UML instance really doesn't exit with an interesting error level, so it can be a bit difficult to figure out if it's finished or not. (To get around that, I create a file in the filesystem when it's finished successfully. But, of course, _where_ to do that is non-obvious because the files in the loopback mounted ext2 image aren't readable outside of the UML, because the non-root user in the parent system can't loopback mount...)

And yes, this turns out to be a necessary step before upgrading too many new packages, because under the current monolithic UML-based build script (which I apparently haven't posted to the website yet and really need to), a build break that happens partway through what is now stage 2.1 is a real pain to debug.

Building on the parent system as root (instead of as a non-root user under UML) is much easier to debug. And way faster to build, too. (Of course one particularly bad mistake this way could lobotomize my laptop, but that's why backups are a good idea.)

Grind, grind, grind...

Aug 24, 2005

Ok, reverted to a reliably working build. 2.6.11 UML, 0.9.27 uClibc, busybox 1.00 (with patches), and obsolete versions of just about everything else.

Started out the upgrades with uClibc to 0.9.28. The tools part built fine, but then gcc died trying to build under that.

Let the head scratching commence...

Aug 23, 2005

Wow, it's horked.

I upgraded the kernel to 2.6.12.3, updated uClibc to 0.9.28, updated busybox to 1.01 (which should be 1.0.1 but ask Erik what we call 1.1, or better yet 1.1.1), redid the build script to not have nested here documents...

Now I've reverted to the version on the website to figure out where I broke it so badly...

I'd grumble at myself about changing too much at once without regression testing, but I actually don't think that's what happened here. The problem is, an intermittent race condition cropped up, meaning sometimes the build runs to completion and sometimes it breaks in a completely random place, meaning I wasted a while examining entirely the wrong things, until I ran it unmodified three times in a row and got breaks in three completely different areas. Fun.

I _think_ the problem is that UML in 2.6.12.3 has a race condition in its filesystem; the build never breaks in the same place twice. I know I built it and it ran to completion when I first put UML in, but that doesn't help here. (It's clearly a race condition: the failure is always a file not found error, usually for files that the build should have just created. For example, install breaks trying to chmod the file it's installing...)

It's also possible that changing the loopback ext2 image to be a sparse file (dd now does a seek to the end and writes 4k, rather than filling the whole space from /dev/zero) triggered a bug in the kernel. It _is_ a rather impolite thing to do to the poor linux kernel (both UML and the host kernel), but it saves space on the hard drive, skips a build step that lags the rest of the system considerably, and the downside should just be a bit of extra fragmentation in a temporary file (the loopback image is deleted after the build)...
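For reference, the sparse trick is just a seek past the end before writing the last block, something like this (filename and size illustrative):

# create a 256 meg ext2 image as a sparse file: skip to the end, write the last 4k
dd if=/dev/zero of=image.ext2 bs=1k count=4 seek=$((256*1024-4))
mke2fs -F image.ext2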

It could also be that I'm trying to build it under ubuntu, which has a number of things wrong with it. (A make allyesconfig of busybox 1.01 fails with an ipv6 error, and what you have to remember is I put this sucker together and it _did_ work, but I think I tested that bit before my laptop had ubuntu on it.) The server upstairs also has ubuntu on it now (well, in both instances kbuntu), and although it builds the current version of dropbear, it doesn't work. Each connection is immediately dropped with an ipv6 error...

However, the whole point of building the tools directory is to isolate the final system from the parent system, and ubuntu misbuilding stuff is highly unlikely to cause race condition errors on two different machines. A problem caused by a horked build environment _should_ be deterministic...

Update: Wow. What's on the website is _ancient_. I need to update the website.

Ok, what I need to revert to is the last build using the 2.6.11 kernel. A UML built from that worked reliably, and all the other changes (like the sparse ext2 image) came later too. Luckily, I have lots of backup tarballs, I just have to figure out which one I'm looking for...

Update to update: Ok, "reliably" is an overstatement. There are some very strange things about stdin and stdout in my setup, and gnu tar cares too much about these things.

The build runs within UML, and as far as the build is concerned it's talking to /dev/console. The UML instance providing that /dev/console is configured with the "stdio console" option, which means it forwards the input and output of that to file handles 0 and 1 (stdin and stdout) of the UML process running on the host system. If that UML process is running with its output redirected on the host kernel (piped to tee so it can be copied to a file), gnu tar's verbose option causes tar to abort because it thinks it hasn't got a stdio (even though it does).

Admittedly, this stdio setup is strange and evil, but gnu tar is just broken. First it's hallucinating a problem that doesn't exist, which is just a simple bug. The fact it's aborting in response to something that could safely be ignored even if it was real is a bigger problem, and a design issue. It's the verbose option to tar that's failing; aborting tar because it thinks it can't be verbose is _wrong_.

tar: Error in writing to standard output
tar: Error is not recoverable: exiting now

Yes, failure to write -v output is fatal. Nothing else is being written to stdout.

I'd already have replaced gnu tar with busybox tar throughout the build (since busybox tar isn't broken in this way), but A) the busybox I build is linked against uclibc, so I'd have to build a second one to run on the parent system, B) I need the parent system's tar in order to extract the initial tarballs anyway (including the busybox tarball). So I might as well use it consistently. Except for it being broken...

Aug 17, 2005

And so I return to work on FWL.

I put together a busybox 1.01 release (which was kind of overdue), and got the mount rewrite checked into the 1.1 line. Over in uClibc land, Erik got out uclibc 0.9.28. And it's long past time that I bump the compiler version up to at least gcc 3.3, maybe 3.4. Which means dredging up a new binutils, and I should upgrade the kernel to 2.6.12, plus the build script needs to be broken up back into the stage files because nested here documents are just silly...

Right.

Jul 22, 2005

Ok, the problem with ubuntu is you have to modprobe loop in order for the loop devices to show up. (Note: they'll WORK if you don't have this module loaded, but udev doesn't get notified to make the /dev entries. This is one of the dumbest things I've come across in a while. And yet /dev/loop/0 is there without the loop module installed, even though lanana.org is quite clear that /dev/loop0 is what it should be.)

In other news, 2.6.12.3 has working tt-mode UML support again (bugfix went in), and I've shoehorned it and the new 2.6.12 libc headers into the build to see what breaks. That's 2 packages updated, several dozen left to go.

My laptop is running grub (that's what Ubuntu has), and the only bootloader I've patched to have a length option is LILO. I'm under the impression adding a length option to grub would involve modifying assembly code. And grub doesn't have a -R option to change the command line for the next boot only. (This lets you do a provisional boot; have the new firmware change the default only if it comes up successfully. If it can't boot the new firmware, power cycle the box and the old firmware comes up. As far as I know, grub can't do that if there's no keyboard attached to the machine and the only interface to the thing is via a web browser...)
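For reference, the provisional boot dance with lilo goes roughly like this (the label name is made up):

lilo -R new-firmware    # use the "new-firmware" entry on the next boot only
reboot
# if the new image comes up and checks out, make it the default for real:
#   set default=new-firmware in /etc/lilo.conf and rerun lilo
# if it doesn't come up, power cycle the box and the old default boots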

I'm still spending most of my hacking time playing World of Warcraft... er, working on the Busybox 1.0.1 release. I'll still need to apply the sort patch to that since sort is a new feature, not a bugfix. But on the bright side, it looks like I'll be able to use busybox gzip/gunzip in the build, which is cool.

Jul 10, 2005

There's no such thing as "/dev/loop/0". Ubuntu people, are you listening to me? You use udevd. The standard device name for the first loop device is /dev/loop0.

I'm also not particularly thrilled about breaking all these libraries into two packages (one of which, the "-dev" package, installs the headers so you can compile stuff against it). I thought this was stupid when Red Hat did it: headers are small and eminently compressible. Look into squashfs or something.

Yes, this broke the firmware linux build when I finally downloaded the code and just tried to run it on my new laptop. (Running the existing build before trying to do new work, as a sanity measure to make sure my new environment was working on a known good build...)

And no, I didn't change my script, I made the darn symlink in /dev. In this case, ubuntu is wrong.

Jul 9, 2005

New laptop, new distro (kbuntu). Can't say I recommend either. Took two weeks to get it together, between the Dell hardware going south (I didn't know a power supply brick could beep; who is stupid enough to put circuitry in there? It's supposed to be a dumb transformer, its JOB is to get hot and put off electromagnetic interference well away from the circuitry. But no, Dell makes it all flaky...)

I didn't install Fedora because Red Hat's continuing deterioration is just too painful to watch. (I can see not shipping decss because you don't care if your customers can play their DVDs, but not shipping mp3 playing software when the patents only cover recording and not playback is way too yellow for me to stomach association with a company that not only won't stand up for anybody but backs down in the face of _imagined_ threats. They don't even install OpenOffice, instead they install an _installer_ for OpenOffice that makes you agree to a license before downloading it from the website. Things like GCC 2.96 and shoving gnome apps down KDE users' throats are just gravy compared to the sheer legal cover-your-ass cowardice Red Hat has developed. Oh yeah, these guys are going to stand up and fight the good fight about letting us actually play DVDs on our laptop someday. Not.)

Knoppix hasn't got a decent installer; I didn't leave it running overnight to copy 20 gigs of backup files to an ext3 partition just so it can insist on reformatting it reiserfs. I don't even like reiserfs. And peeling it off the CD and loopback mounting it manually got trickier since they introduced union mounts. Plus a few new fun bugs moving from 3.6 to 3.9 (switching tabs should not make Konqueror resize its window). Running it from the CD on an ongoing basis is incredibly painful: not just slow spinning the CD up all the time but the endless stupid confirmation boxes... (Yes, entering text at the google prompt is going to send it unencrypted across the network. Shut up. Yes, it's ok to let google set cookies: it's ok to let everybody set cookies, just treat them all as session cookies. A pop-up prompt for every single cookie is _INSANE_. Yes, I sometimes want to move from encrypted to unencrypted pages, such as when I type a completely new page in at the URL bar to re-use an existing tab. Shut up. No, I NEVER want you to save login information in some stupid wallet for any site, anywhere...)

And because it's running from the CD, you have to jump through these hoops every time you boot. The way I do email via ssh tunneling is another fun thing that needs setup under a cd version of knoppix, that's assuming I set up one of its strange "scour the hard drive and find my home directory" things...

Slackware 10.1 is still using a 2.4 kernel. Yeah, I know Patrick got sick and has some catching up to do, but if I consider a distro that doesn't use udev to be a bit behind the times, what can I say about a 2.4 kernel in 2005?

I will never use unfiltered Debian. Back in 1998 I was wandering back to Linux (after a long divergence through OS/2 and Java). The last time I'd actually run it was 1993 or 94 (and that was SLS, which wasn't around anymore), and after a bit of searching I found that Debian was still around. Downloaded over a dozen floppies, got them all installed, then tried to configure my dialup connection to install the rest. My modem was on a nonstandard IRQ for that serial port, and I knew this, and know how to tell OS/2 this, but didn't know how to tell Linux this. (The answer I was looking for was "man setserial".) I made the mistake of asking on the debian developer list. I didn't get an answer to my question, but reading the list over the next few days the permanent flamewar expressed the sentiment that the people on the list would "rather see Debian die than start pandering to newbies like Red Hat". And I went "Great! Red Hat! That's what I'm looking for." That was 5.something, and I was a happy Red Hat user for years... And I am NEVER giving Debian another chance. I don't care what their technology is like, and I'll use Debian derivatives, but the politics of their developer community is something I just don't want to be directly exposed to ever again. (Plus the fact it takes them several years to release each new "Debian stale", and no there isn't a b missing from that.)

My friend Stu Green gave me some Ubuntu CDs a while back, and based on that I downloaded their current KDE version last week. Boy is there a lot wrong with it. In order to get a root login I had to crack my own system (boot with init=/bin/sh, mount -o remount,rw, vi /etc/shadow, cut and paste to replace the "*" in root with the password from my user...). The base install was laughably insufficient and the "install additional stuff" option in the menus was not obvious to me (stu pointed out system->kynaptic, which is apparently obvious to Debian people). Even after it had installed gcc it didn't have "gcc" in the path. (It had gcc-3.3, I had to make my own symlink.) The installer didn't notice it had run out of disk space, leaving lots of packages broken until I figured this out and reinstalled. It spins the hard drive down every 15 seconds when it's on battery power (I need to tweak the init scripts, probably just run "laptop-mode stop" at the end). And so on, with a gazillion small idiocies...

But kbuntu is less than a year old, and the direction it's heading in seems to be promising. It's at last trying to serve the desktop market. It uses udev, it correctly recognized the Dell laptop's weird monitor size (widescreen: 1280x800), set up the sound properly... It's _trying_. It's not ready for end-users yet, but someday it might be, and in the meantime it's something I can hammer into a useful shape.

Jun 11, 2005

My new job has taken a lot of my time getting up to speed (especially learning perl). I've only gotten to check my personal email every three days or so.

What free time I've had in the past few weeks has gone to trying to keep busybox development from stalling again, since I seem to be the only lieutenant of Erik's more concerned about Busybox than uClibc.

That said, it seems to be easing off a bit, and I looked at a new laptop today that would make it a lot easier to work on this without network access. (Right now to do a build, I have to ssh into a faster server at home. My current laptop is a cheap replacement I bought used after Linucon, and it swaps itself to death just with the browser windows I keep open...)

We're having a Linucon concom meeting tomorrow, and I'd like to have a 1.0 release of FWL by Linucon (end of September) to give a presentation on.

Deadlines. What would I get done without them? Not much...

May 20, 2005

I've been slacking off a bit, I know. And to top it off, I got a new job, which I start monday.

My server in the closet is much faster than my laptop (and has twice as much memory, and doesn't have a couple dozen open Konqueror tabs causing it to swap its guts out so badly the mouse pointer freezes for 30 seconds at a time on a regular basis) when I'm _not_ running a compile in the background. As a result, I tend to want to ssh from my laptop to the remote system to run a build.

This turns out to be non-obvious. You can't just "./make 2>&1 | tee out.txt" because the build takes hours, and if your network connection cuts out, the ssh session will die and kill your child processes (the build). I often want to kick a build off and go to lunch, which involves suspending my laptop...

In theory, "(./make > out.txt 2>&1 &)&" followed by "tail -f out.txt" is the answer, and the double fork does indeed detach the make process from its parent shell. (Of course ssh still refuses to exit if there are unfinished child processes, but this hang is a bug in ssh that is not shared by xterm, telnet, or anything else but ssh. Yes, the ssh developers are aware of it and refuse to fix it because they're OpenBSD guys who hate Linux.)

In practice, if you try the second incantation you'll notice that User Mode Linux exits immediately. Why? If it gets EOF on stdin while initializing the stdio console, it throws a temper tantrum.

The answer is "(cat /dev/zero | ./make > out.txt 2>&1 &)&", which works.

May 7, 2005

Ok, it's May, I can start writing "2005" in the date field now. (Went back and fixed all the previous dates back to January.)

So I'm finally popping my head up and looking around at what the rest of the world has been doing. I got subversion installed so I can merge patches into busybox, and rather than starting by merging my own patches I've been clearing up the patch backlog from the mailing list. Of course the only way to get people to reliably object to a patch is to merge it, so I'm having to revert several of them, but still. Progress is being made.

I burned a Knoppix 3.8.1 CD. (I bought one at Penguicon, but it wound up in Cathy's luggage and is in Pennsylvania at the moment.) There's a show-stopper UI bug preventing me from actually upgrading my laptop to it yet (Konqueror resizes its window when you switch tabs. Web browsing is over 50% of what I do at the computer, and that kind of constant annoyance is just... *Shudder*...), but they're using Unionfs!

Yes, Linux now has a patch that provides working union mounts! I've been waiting for this for years. Now I just need to figure out how to get it to compile built-in rather than as a module. (Have I mentioned I hate makefiles?)

Debian stale is about to ship a new release still using XFree86 rather than X.org. The X.org switch happened at least two fedora core releases ago, and before the Ubuntu project was even commissioned, yet Debian "stable" still hasn't got its new release switched over yet. Is there any wonder it's acquired the nickname "Debian stale"?

Apr 30, 2005

Ok, back from Penguicon.

I finally broke down and installed subversion on my laptop, and I've been going through the busybox archives to check in some old patches that have been lounging around on the list. Once I get the patch backlog worked through a bit, I'll start doing my own fixes to the busybox to-do list I wrote. The upgrade to "find" that currently prevents most packages' "make clean" from working would be a good start.

In parallel, I need to upgrade the versions of the various packages and put together an installable version of the firmware image. (Doing the firmware image is easy, writing the installer that lets you do anything useful with it is the hard bit. Bit of a chicken and egg problem of needing a modified lilo to install it, which is in the new system that you boot into. Yeah, I could do a boot CD, and eventually probably will, but for the moment I'm thinking a UML image could lock a UBD onto a normal block device like /dev/hda, although getting the geometry information is tricky. UML can't read from a pipe through hostfs, so actually getting data out of the parent system's /proc can be problematic...)

It'll be a few more days before I get around to working on it, though.

Apr 18, 2005

I just put up a release, version 0.8. It builds the User Mode Linux version of the firmware image, from source. I need to redo the main page and put out some kind of announcement on freshmeat or some such, but not until I get my website moved to a faster internet connection.

I also did a first pass of a todo list. Which is not complete, of course...

P.S. I don't know why tar was unhappy. I was piping the output to "tee", and when I replaced gnu tee with busybox tee, it built all the way through. (I also moved it from the fedora machine upstairs to my knoppix laptop, so I could remove the network as a consideration as well.) So this seems to be a case of one GNU program not liking another GNU program, possibly with some interference from Fedora. Odd. I'll worry about it later.

Apr 17, 2005

The build process has been having the strangest crash. Right before the end, right as it finishes untarring the kernel tarball to build the UML instance to make the firmware-uml file out of, tar exits with:

linux-2.6.11/MAINTAINERS
linux-2.6.11/Makefile
linux-2.6.11/CREDITS
linux-2.6.11/README
tar: Error in writing to standard output
tar: Error is not recoverable: exiting now

But it successfully wrote that message, didn't it? And it only seems to happen if I let the whole build run: if I run just parts of it, the parts complete fine.

Odd. Still tracking it down. It would help if I could find a reproduction sequence that took less than 2 hours to run on my fastest machine...

Apr 12, 2005

It works, for certain values of works. For the moment, here's a test binary of the UML version. As I ranted about last time, it's 13.7 megabytes until I have time to strip it down.

My build script is creating a single file, firmware style UML executable with built-in root partition. (See link above.) You run it like a normal executable and it boots up UML and gives you a shell prompt. (No, ctrl-c doesn't work because that shell is attached to /dev/console of the UML instance, which doesn't give a controlling terminal. To-do item.) I need to splice the script snippets together into one big build script, and do a 0.90 release of the build for that. Hopefully tomorrow.

/bin/sh is running as pid 1 in that, and if you try to exec /sbin/init it gets confused. Right now when it tries to run busybox init, there's no inittab so busybox supplies a default inittab internally, which tries to open consoles on tty1 through tty3. The UML config tries to open those as an xterm, but support for that isn't built into this UML instance (mostly because it needs some helper program I don't seem to have), and it starts spamming the UML console with this:

Bummer, can't open /dev/tty3
Using a channel type which is configured out of UML
Using a channel type which is configured out of UML
Using a channel type which is configured out of UML
Using a channel type which is configured out of UML
Using a channel type which is configured out of UML

With init running there is a working shell attached to /dev/console, so you can type commands blind and they happen, but without being able to see the results about the only useful thing you can do is "halt". Then again, halt works beautifully with init running. (Without it, just exit the shell. Yes, it'll panic "tried to kill init", but that shouldn't hurt anything. Call sync first if you have any writeable space mounted. I should fix that so exiting init syncs before panicking...)
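
The obvious short-term fix is to ship a minimal /etc/inittab that keeps everything on the console so init never tries to open tty1 through tty3. Something like this (untested; the exact entries are just a guess at what's needed):

# keep init on the console so it never touches tty1-tty3
::askfirst:-/bin/sh
::ctrlaltdel:/sbin/reboot
::shutdown:/bin/umount -a -r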

Yes, it's 13.7 megabytes, although there are a lot of opportunities to trim it (some of which I detailed last time). I'll do a size reduction pass after I do an update pass (the versions of some of the tools I'm using are pretty old), which I'll do after I clean it up enough to do a release. (Although getting the static initramfs busybox to be built against uClibc instead of glibc I can do now. The UML instance might have to stay built against the parent system's library; it's statically linked but the headers it was built against are probably for a newer kernel than the systems it's expected to run on, so what syscalls is it making? Something to ask on the UML list when I'm more awake...)

The main remaining unresolved issue is where do I mount the hostfs so the build can access the parent's files. There are a few ways of doing this: it can supply its own environment entirely (which the test version is doing now, although its only writeable space is ramfs at the moment), or it can supply its own environment but attach the writeable hostfs at a known location, or it can glue one new directory on to a hostfs and use the host's tools and libraries to run stuff out of that directory.

Eventually I'll probably code up all those options, or at least examples of them. But for right now, I need to clean up what I've got, get my website using it, and ship it as 0.90...

Oh yeah: the "strip" command will chop off the entire root partion I've appended. Right now, Don't Do That Then (tm). Eventually, I should be able to make this look like an ELF section that strip can understand enough to retain, but that's way in the future.

Apr 11, 2005

Build bashing going along fine...

(I'd like to mention that World of Warcraft has already delayed work on this project by a month, probably with more to come...)

Apr 9, 2005

So last night, while the laundry was going, I sat down to make a firmware style UML image. Guess what? The losetup command opens the file read/write, and if the file is /proc/self/exe then it's a running executable and this fails with "text file busy". The mount is going to be read only, but the losetup command doesn't know that. The mount command knows this and can open the file read only when it does the loopback mounting (it doesn't shell out to losetup, it calls the two ioctls itself), but then you can't pass an offset. There's no offset option.

Now I can easily make a little C program to #include <linux/loop.h>, fill out a loop_info struct, open(O_RDONLY) the file, and call ioctl(LOOP_SET_FD) and ioctl(LOOP_SET_STATUS). This is not brain surgery. But a possibly cleaner solution is to add a -r option to busybox losetup. Otherwise, simplicity says I might as well just move all the functionality of the initramfs/init script into the C program, but readability argues against it. I'd rather not open that can of worms if there's an option.

So I whipped up a patch to busybox. Dunno if they'll take it...
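
With that in place, the interesting part of the initramfs init script boils down to something like this (a sketch: -r is the new option from the patch, the offset handling is still hand-waved, /host is wherever the hostfs got mounted, and $UML_BINARY and $OFFSET are placeholders):

losetup -r -o $OFFSET /dev/loop0 /host/$UML_BINARY
mount -t squashfs -o ro /dev/loop0 /newroot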

Anyway, with that patch, I now have a "firmware" style UML which weighs in at 14 megabytes and boots to a command prompt in a loopback mounted squashfs firmware linux root partition. (And there was much rejoicing.) I can trim several megabytes off that (the big low-hanging fruit is the two megabyte busybox in the initramfs: it's statically linked against glibc and is "make allyesconfig". That gzips down to much less than that in the initramfs cpio, but still. Then Matt Mackall's -tiny tree can trim a megabyte or so off the linux kernel, although a naive application of the whole patch results in a broken UML compile, so I'm going through the broken-out patches and adding them one at a time to see what works and what doesn't for UML...)

The 12 megabyte root partition I'll have to look at some more. A lot of that's gcc and binutils: they're enormous. Not a lot I can do about that except give the option to exclude them. And if they go, a lot of the rest of the system can get a quick trim for embedded usage. (The uncompressed size of some obvious candidates: /usr/include is 5915k, /usr/share/info is 4607k, /usr/share/locale 4678k, /usr/share/man is 1526k... I could parse the squashfs output to see how much space the compressed versions take up, or I could just delete 'em and see.)
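
(The lazy way to answer that is to build the image both ways and compare, something like the following, where rootdir is wherever the root filesystem got staged:)

mksquashfs rootdir before.sqsh -noappend
rm -rf rootdir/usr/share/info rootdir/usr/share/locale rootdir/usr/share/man
mksquashfs rootdir after.sqsh -noappend
ls -l before.sqsh after.sqsh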

But all that goes on the to-do list after getting a 0.9 release up with everything upgraded to the current versions of all the packages and an install mechanism that lets you actually boot into a firmware file. (And of course getting the build to make these images automatically rather than me doing it by hand. :)

Oh, and I found another bug, probably in UML. (Might be in busybox mount, but I doubt it.) After the pivot_root, I have the old initramfs mounted on /mnt and under that is proc mounted in proc. So I want to move that mount to /proc. When I try to "mount -o move sub/proc /proc", it hangs hard. I have a theory...

Apr 3, 2005

Ha!

I don't need the bingrep program to search for the start of the squashfs partition. The ELF identification block (e_ident) at the start of an executable is 16 bytes, and only bytes 0-8 are actually used; bytes 9-15 are padding. I can use bytes 12-15 to store a 32 bit offset for the start of the squashfs and still have a three byte safety margin in case of future expansion.
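
On the build side, poking the offset in could look something like this (a sketch; the filenames are placeholders and the printf/dd details are off the top of my head, so don't trust them blindly):

OFFSET=$(wc -c < linux-uml)
cat linux-uml root.sqsh > firmware-uml
# stuff the offset into e_ident bytes 12-15, least significant byte first
printf "$(printf '\\%03o\\%03o\\%03o\\%03o' $((OFFSET&255)) $((OFFSET>>8&255)) \
  $((OFFSET>>16&255)) $((OFFSET>>24&255)))" | \
  dd of=firmware-uml bs=1 seek=12 count=4 conv=notrunc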

This doesn't help me with the actual bootable firmware file, but it does help with making a firmware-style UML image. Which I should go do. (So naturally, I'm fixing the CONFIG_FEATURE_SH_STANDALONE_SHELL of busybox. But after I do that...)

Apr 1, 2005

New code is up. No, it's not an april fool's joke. (Never really liked that holiday anyway; I've never seen the humor in trying to trick people.) You can download it here.

Currently, it's pretty cheesy. Download the tarball and run ./make.sh, and when it finishes the file tmpdir/workspace.img is an ext2 image containing the root filesystem. Not very useful at the moment, but you can loopback mount it and chroot into it and play around a bit.

The next step is to create firmware-uml and firmware-linux images you can run and try it out with. The UML version will just run ala "./firmware-uml" so you can play with it. Using the actual bootable firmware-linux one would be a bit of a chicken and egg problem because you need the modified copy of lilo it contains in order to install it. I mentioned last time a potential approach using ./firmware-uml with a script that attaches a UBD device to a boot device to run lilo against, if the drive geometry querying isn't too big a deal. I'll have to play with it.

I'm working my way to a 0.9 release. The build got to the end, so I'm putting a snapshot of it online to replace the hideously out of date code that's up there.

I have a list of things I want firmware linux to accomplish, and it's not quite doing most of 'em yet. A 1.0 release is where it does all of 'em.

Show that busybox can fully replace the GNU tools.

I want to show that busybox can entirely replace the GNU tools, for regular real-world use. The heaviest test case I can come up with here is using them in a development environment to build an entire Linux distribution. Hence firmware linux should be able to rebuild itself under itself, with just busybox and a compiler toolchain.

I'm most of the way there. Busybox has replaced most of the GNU packages, but there are still specific bugs or omissions in things like gzip, find, diff... (See the busybox TODO file. I wrote it and it's basically my busybox todo list for firmware linux.)

Firmware image

I want a single file that boots and runs the whole of Linux, including any bundled applications. Note that such a system wouldn't have a package management system like .RPM or .DEB files, because the firmware image IS its package management system. This makes sense for things like bootable CD/DVD distros (which are read only anyway), for corporate workstations, for embedded devices... It's one more package management option, an atomic read-only system image you can easily upgrade and version control.

I don't know how squashfs interacts with rsync. It's possible that upgrading multiple workstations via rsync would be pretty easy for small changes, or it might require upgrading the whole image anyway. I also don't know what it would take to combine rsync and bittorrent, or how much work it would be to make a bittorrent-based block device...

Desktop stuff

I want to be able to use it on my laptop to replace knoppix. Right now, it isn't even running X.

Configurable build

It should be possible to make a really stripped down version that doesn't have build tools in the final image. (Right now it's a little over 10 megs with gcc and all the header files and such; without it should be possible to get it under 3, and with the linux-tiny tree down to about 2.)

Administrative stuff

I need to move my website off the server in Eric Raymond's basement. I've been paying for a fairly beefy cable modem connection with a static IP for well over a year now, yet where's my website? Behind a 40 kilobyte per second DSL line in Pennsylvania that isn't even mine. Putting up an 80 megabyte tarball for download from there is impolite at best...

I also need a download option where the 80 megs of source code aren't bundled with the build scripts, but instead have a script that downloads them all from their original websites. (Not brain surgery, I just need to do it.) And for the one big tarball version, I should set up bittorrent.

Getting what I currently have booting in a firmware image (which is just a matter of doing it), plus the administrative changes to the website, is enough for the 0.9 release. Possibly this weekend, if I'm lucky...

Mar 30, 2005

Spent yesterday playing World of Warcraft on my fiance's computer, but today the WoW servers are down (again), so I'm taking that as a sign I should get real work done.

The only tweak to the /tools directory in the second stage was changing the gcc .spec file to link against /usr/lib instead of /tools/lib, and it did that right after building uClibc (so there was something to link against). In theory, uClibc doesn't link against external libraries, so that tweak can be done before building uClibc and thus /tools can be a read-only squashfs mount. In practice, gcc wants libgcc and a few other things to build _anything_. The fix is to symlink /usr/lib to /tools/lib until the end of the uClibc build, and then replace the symlink with an empty directory before doing the uClibc install. So now /tools can be a squashfs mount.
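
In script form, the dance looks roughly like this (paths approximate; the actual uClibc build and install steps are whatever the build already does):

ln -s /tools/lib /usr/lib    # so gcc can find libgcc and friends from read-only /tools
# ... build uClibc here ...
rm /usr/lib                  # swap the symlink for a real (empty) directory
mkdir /usr/lib
# ... install uClibc into the now-empty /usr/lib ...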

Eventually, this will allow me to ship a unified UML binary (User Mode Linux plus appended squashfs filesystem) of the first half of the build, which can do more flexible kinds of builds, with the option to do things like not have any build tools in the resulting system, to save space. That comes later, though.

It's also getting close to time to think about an installer. I use a modified version of lilo, and having them install that on a parent system is a bit silly. A bootable CD would be good (that's what I did a few years back), but another fun little thing I can do is make a UML based installer that runs my modified version of lilo against a UML Block Device (UBD) that's set up to point at one of the real block devices (/dev/hda or some such). Feeding it the appropriate drive geometry might take a bit of doing, though...

Mar 28, 2005

I'm up at Trianon, a coffee shop in the arboretum, and the sun has just gotten to the right angle to shine off the cars in the parking lot, through the window, and into my eyes. Sigh.

I bought a new computer. I shouldn't have, but it was $170 or so at Fry's ($250-ish with extra memory and taxes) and came preinstalled with Linux ("Linspire"), and presumably working 3D. I dunno the details yet; I immediately swapped the hard drive out from my server and put it in the closet, and I'm running builds on it. (It's got a 1500 MHz "AMD Sempron 2200", which is three or four times as fast as my previous server.)

Lots of progress getting the build together. The tools build is working, and the second half of the build (equivalent of LFS chapter 6) is getting de-bitrotted. (Mostly directories moved under it.) Hopefully, I can get a snapshot up later this evening.

Only _after_ I get the new version up do I worry about updating all the apps to current LFS versions, applying the -tiny tree to the linux kernel, upgrading the various busybox apps I need to fix, redoing the web page, installing the firmware thing on the server and moving my web page to it...

Home stretch...

Mar 22, 2005

Blaisorblade on the UML mailing list has a patch for the problem I've been seeing with hostfs permissions, so possibly I'll be able to stop mucking about with a ramfs mount over /dev early on. That would be cool.

More cool is that my contract at Dell ends Friday (Yay!) so I may finally have a bit of free time (non-sleep deprived, even) coming up to seriously get this project in shape. That would be good; my current code is way past what's on the website, and I also need to install it on my home server and get my website moved over to run under it.

The other thing I did recently was move all the scripts and source under a firmware-linux directory to make it easy to ship a tarball. (No more "download sources.tar.bz2 and download these scripts", it's all one tarball now with sources in a subdirectory.) Once I get my new server configured, I can make a torrent out of it...

Mar 17, 2005

I'm playing with Matt Mackall's "tiny" tree. Lots and lots of little patches to figure out what they do. (Among other things, they can make UML smaller...)

Updating everything to current versions is an important to-do item after I get the build back together and producing an actual runnable firmware image. One of the things I intend to do is create the correct list of where all the source code can be downloaded from. (Among other reasons, I can never remember http://ep09.pld-linux.org/~mmazur/linux-libc-headers/ and have to keep looking it up. Each time. I can't spell "Mariusz Mazur", either...)

As always, the sourceforge mirror system screws things up by not making it easy to select a darn canonical download location. (The only way to get a URL you can point wget at is to pick a mirror and fight with your browser for a bit. And may I say how NICE it is that they send you to a page that isn't the download you want, but auto-starts the download, so there's no easy point at which you can right click on a link to "save as". You have to hit "stop" to abort the download it's trying to do (with the way my browser is set up, pointing it at a tarball will _OPEN_ the tarball and view its contents with a file manager), and then right click on the "if your download doesn't auto-start" link...)

Whoever designed sourceforge's mirror system really should be punished...

Another issue is seeing what else is out there other than gcc and binutils. There are lots of free compilers out there, but after eliminating the closed source ones (like Intel's), the 16 bit ones, and the ones that aren't on Linux, the list narrows a lot. Ideally, I'd want one that could be a reasonable replacement for GCC, not just on x86. It has to be able to compile the Linux kernel (which is tricky with lots of gcc extensions and dependencies on things like inlining in the proper place), produce reasonable code, and C++ support would be nice too.

Fabrice Bellard's Tiny C Compiler is interesting, but with no C++ support it can't handle python or anything derived from qt (like Konqueror), the code it generates is still pretty sucky (big and slow, although the compiler itself zooms through the build amazingly fast), and although it has compiled the linux kernel (google for tccboot) it wasn't an unmodified Linux kernel because it's missing a few things yet. On the bright side, the fact the codebase is small and simple means it's the easiest of the lot to extend...

SGI has GPLed their "Open64" compiler, and a company called PathScale derived a commercial thing from that (which is confusingly licensed, I _think_ it's still GPL, but their website doesn't make it easy to find out unless I want to download a free 30 day trial version...) I think PathScale gets eliminated for sheer cluelessness on that front, and I don't know much about Open64 yet.

There are a few other possibilities; I really need to spend a lot of time trying them out, and that won't be any time soon.

Now that busybox really can replace the gnu tools in an actual development environment, my goal is to present Richard Stallman with a working system based on busybox, the linux kernel, and lilo that hasn't got any GNU code in it, and ask him if he'd still call it "GNU/Linux". Building with gcc, he'd probably say yes, even though the FSF hasn't been in charge of gcc since the EGCS fork inherited the name. (Presumably, if I build with Intel's compiler I have Intel/Linux.) Shipping gcc in the result, he'd definitely claim credit for it...

Actually, whatever I do he's still going to say yes, but I'd like to highlight the absurdity of it as much as possible. Everybody needs a hobby...

I should put up my todo list on the main firmware linux page...

Mar 16, 2005

Wandered to Metro (my local 24 hour coffee shop with wireless internet) with my laptop to get some work done on Firmware Linux. Spent the evening banging on the makefiles of User Mode Linux instead. Of course...

I don't even LIKE make files, but I want to do away with the snapshot .config file I'm using for the UML build and instead cat the new symbols I'm switching on to the end of a .config produced by "make allnoconfig", and then run "echo '' | make oldconfig" to update it. This is only guaranteed to work if I don't have to remove any existing symbols, and right now there's CONFIG_LD_SCRIPT_STATIC=y vs CONFIG_LD_SCRIPT_DYN=y, and it's basically one boolean (are we statically linking or not).
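
The plan, roughly (the symbols below are just examples of the sort of thing that gets appended, not the real list):

make ARCH=um allnoconfig
cat >> .config << EOF
CONFIG_BINFMT_ELF=y
CONFIG_HOSTFS=y
CONFIG_LD_SCRIPT_STATIC=y
EOF
echo '' | make ARCH=um oldconfig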

So I'm puzzling out enough of the kernel's makefile syntax to make it an if/else condition on just the static symbol, which defaults to unset and is one of the ones I currently add for TT mode.

I probably don't even really NEED to do this, since oldconfig seems to zap the DYN symbol if both it and the STATIC one are set, but still. There should be only one, so I'm cutting the other symbol's head off and ripping it out of the config menu too. (I should probably zap CONFIG_STATIC_LINK too, but not tonight...)

So I spend the evening solving a problem only I'm ever likely to notice, but I get to rip stuff out and throw it away. Always a plus...

On an unrelated note, I'm putting /tools (LFS chapter 5) in a squashfs appended to a usermode linux instance. This means everything in it is read-only, but after I build the new uclibc I need to update a spec file so that what it builds gets linked against the library in /lib rather than /tools/lib. Since uclibc is statically linked, I don't know whether having the version of the spec file that links against /tools around for the uclibc build is necessary at all. I should test that...

If I'm going to play around with UML much longer, I _really_ need a version of dump_stack that can write to an arbitrary print function. I'm just saying...

Also, the /tools directory is about as big as the final build because the binaries in it are HUGE. Looks like the old duplicate library in the search path problem triggering whatever bug that is in the gnu linker. To-do item to track this down, I've taken a whack or two at it and it's not easy...

Oh, and I'm playing around with getting busybox applets to compile into independent executables, too. It's pretty simple for some of 'em:

#!/bin/sh

# build busybox's shared helper library first
cd libbb
make
cd ..

# link each applet against the standalone.c stub, renaming its main() and usage text
for i in sed awk patch vi
do
  APPLET=$i
  FILES=editors/${i}.c
  gcc -Os -s -o $APPLET standalone.c $FILES libbb/libbb.a -Iinclude -DAPPLET_main=${APPLET}_main -DAPPLET_full_usage=${APPLET}_full_usage
done

Others take actual thought. And turning the above into makefile syntax is unlikely to be fun, of course. (Did I mention I really don't like makefiles?)

Mar 15, 2005

Woah, long time no update.

I spent last weekend thumping on User Mode Linux to fix the console output issue (patch sent to the list, and accepted into the queue), and along the way learned more about the Linux tty layer than man was meant to know. I have an inkling why they keep saying it needs to be cleaned up, and will probably look into it some more soon.

The busybox mount rewrite on Feb 4 has been "done save for polishing" for a while now. I polished that a bit too, need to resubmit it to the busybox list. (Or wrestle with the darn bug database, although this is more a new feature than a bug.) I should replace the sort patch on the firmware page with that, since the sort patch is now checked in to busybox for the next release. (Speaking of longstanding patches, I need to dig up and polish my bzip2 compression side rewrite and get that in, and also there's recently been interest in my old init rewrite that got eaten by the many-month busybox freeze (both the feature freeze before 1.0 and the endless repository switch to subversion afterwards). Naturally, now that all that's cleared up and I can get stuff in again, I'm swamped with day jobness...)

None of this is necessary for moving firmware linux forward, of course. (The UML console thing is cosmetic, and I worked around the mount bug ages ago.) Now that Linucon's website is on Mark's server (ooh, another thing that's been eating time: Linucon hotel contract negotiation. Once again, not going smoothly, but nobody's surprised...), my upstairs server is free to be reinstalled with firmware linux. If I just have time to put it together.

A really simple "proof of concept" I can post to the UML list is a UML instance that has a squashfs appended, and does the hostfs mount trick to borrow the parent filesystem long enough to open /proc/self/exe and loopback mount said squashfs. I'm thinking of making build stage 2 work like this, although I haven't figured out if I want the 80 megabytes of source code to stay on the hostfs or be in the squashfs. Probably the former, but I have to figure out the cleanest way to do it...

I moved all the build stuff under a single directory last week, with sources under there and the scripts in there, so that I can make just one tarball you extract, cd into, and run. I need to make time to finish that...

Feb 27, 2005

Grumble grumble grumble...

Replacing /dev with a minimal ramfs substitute is a hard problem. The gnu toolchain blows chunks left and right for subtle reasons, and often the actual build break is several steps after where the problem was...

Yet using the host system's /dev via hostfs is brittle and requires several things to be replaced anyway: /dev/loop0 has the wrong permissions, and /dev/console only belongs to your user if you're logged in under X, not when logged in from a text console or via ssh. And that's just what broke on the distros I've tried; who knows what else is strange elsewhere?

So to get something portable, I need to make the ramfs /dev work, and just trying to see what breaks as I go along is frustrating and time consuming. So I'm setting up parallel builds with the original /dev and the ramfs /dev, so I can stop them at various points and compare what's diverged.

Doing things right can really be a lot of work sometimes...

By the way, strace -f still segfaults under UML. Dunno why. (Possibly it segfaults without it, I don't know.) Replacing the actual system binary of /usr/bin/ld with a script ala "#!/bin/sh\n\necho $* >> logfile\nld $*" proved useful for drilling through the "collect2" wrapper, though. When in doubt, use brute force...
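
Written out, that wrapper is just the following (assuming the real linker got moved aside to ld.orig first, otherwise the script calls itself; the log path is arbitrary):

#!/bin/sh
echo "$@" >> /tmp/ld.log
exec /usr/bin/ld.orig "$@"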

Feb 26, 2005

It's taking a little while to figure out what I need to add to /dev to make the /tools build happy. (I could just run the Linux From Scratch makedev script in there, but I'm trying to replace that with udev soon so adding another instance of it would be kind of uncool.)

The ar from binutils 2.14 needs access to either /dev/urandom or /dev/random, or else it reports an incorrect error message (that the archive it's trying to create doesn't exist). This problem does NOT occur with a more recent binutils (maintained by somebody other than the FSF, and yes I plan to upgrade after I get an actual release out doing the firmware thing).

So the build broke on my laptop. I took it to a faster machine running Red Hat Enterprise instead of knoppix, and the break happened there too. I hacked the mktools-umlsetup.sh script to run /bin/sh instead of mktools-build.sh, ran the build script by hand, and then ran the offending line that caused the break in the shell, but forgot to set the path to include /tools/bin like my build was doing, so the problem wouldn't occur.

I lost hours to this. Naturally, my first instinct was to think something else went screwy with UML, but UML is actually getting pretty robust. (By process of elimination it would seem.) Once I figured out /tools/bin/ar behaved differently than /usr/bin/ar, and managed to reproduce the problem from the command line, I tracked down the actual problem pretty quickly by running ar under strace. Yes, in UML. This used to segfault immediately, but now it works. (Yes, this is running a ptrace-based debug tool under an emulation environment based on ptrace. So I'm ptracing the thread doing the ptracing, and it _works_. The mind boggles. Pretty cool, though.)

So that's how I spent my evening. (Well, that and reading Roger Zelazny's "The Changeling" while variants of the build ran to the breaking point over and over. And this is not counting the first half of the evening, spent watching my fiance play world of warcraft and eating (I'm not making this up) leftover cherry custard/cheesecake pizza from Mustache Pete's. It might actually be better cold out of the fridge than it was hot out of the oven...)

New code drop on the website Real Soon Now (tm). But not today.

Feb 24, 2005

Finally feeling better, just in time to help my fiance move in with me. Much packing, but managed a little coding time in there too.

Various idiosyncrasies using UML with a hostfs root filesystem have been dealt with. A new tools build should be up in the next day or two.

The tools build script is now three parts. The master build script (mktools.sh) compiles User Mode Linux, and then runs the rest of the build under that.

User Mode Linux starts out with a hostfs mounted copy of the root filesystem, so we can use all the build tools and libraries of the host system and don't have to supply a binary filesystem image of our own. (I'm trying very hard to let this thing build under any reasonably sane Linux distro. A system that can only build itself under itself just isn't flexible enough.)

Unfortunately, there are a number of problems with trying to actually do much under a hostfs mounted root filesystem. For one thing, hostfs is weird about /dev entries: the permission check is done as if they were normal files, which means if the host user (the user running the UML binary) can't access it, then the root user inside UML can't access it either. This means that /dev/console is inaccessible when running from a tty rather than an xterm, and /dev/loop0 is inaccessible either way.

Also, even though we've got writeable space we can't mknod or chown in it, and when extracting a tarball as root, tar exits with an error if it can't chown the files to the UID and GID of the original owner. (Yeah, there's a long option ala --don't-do-that-then in gnu tar, but not in busybox tar, and it's too obscure for me to really want to add it to busybox.) And on top of that, we still need to make the /tools entry at the top of the root directory, and the user running UML can't write there...

The solution to all this is the mktools-umlsetup.sh script, which UML runs to set up an environment we can write into. It makes a half gig loopback mounted ext2 image, bind mounts all the hostfs directories into it, adds a ramfs /dev with the devices we need (with the right permissions) and the UML procfs, redirects stdin/stdout/stderr to the new /dev/console just in case UML couldn't access the old one, and chroots into it to run the third script, mktools-build.sh.
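
Boiled down, mktools-umlsetup.sh does approximately the following (heavily abbreviated; the directory list, device nodes, and filenames here are from memory rather than copied out of the actual script):

dd if=/dev/zero of=tmpdir/workspace.img bs=1M count=512
mke2fs -F tmpdir/workspace.img
mount -o loop tmpdir/workspace.img /mnt
for d in bin etc lib sbin usr tmp; do
  mkdir -p /mnt/$d
  mount -o bind /$d /mnt/$d
done
mkdir -p /mnt/dev /mnt/proc /mnt/tools
mount -t ramfs ramfs /mnt/dev
mknod /mnt/dev/null c 1 3
mknod /mnt/dev/urandom c 1 9
mknod /mnt/dev/console c 5 1
mknod /mnt/dev/loop0 b 7 0
mount -t proc proc /mnt/proc
exec chroot /mnt /bin/sh /mktools-build.sh < /mnt/dev/console > /mnt/dev/console 2>&1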

If you're not using UML, you can skip mktools-umlsetup.sh and just run mktools-build.sh instead. To do this, you need to be running the same (or newer) kernel version we're building, and you must be running the build as root.

The other change is that all the writing the build does is now done into a directory called "tmpdir". At the end of the build, tmpdir contains a tarball (tools.tar.bz2) with the new /tools directory it built. Everything else in there is a tempfile that can be deleted.

What I'd like to do is use a variant of the firmware linux technique to glue a squashfs onto the end of the UML instance, containing the /tools and the source code. That way, there's a self-contained build executable that you just run ./build and away it goes. Unfortunately, to get this to actually _work_, the init program in the initramfs needs to know A) the complete path to the UML executable so it can do the hostfs mount of that directory and loopback mount the executable to get at the squashfs, B) the current directory so it can hostfs mount that and create a temporary file to put a loopback mountable ext2 filesystem into to have writeable space. (This could be in /tmp, but assuming that /tmp has 500 megs of free space is a much more questionable assumption than assuming the current directory does.)

So anyway, the new flow of control for the first half of the build is to run ./mktools.sh, which builds UML and uses it to run mktools-umlsetup.sh, which runs mktools-build.sh once it's got a sane environment to do so in. Then you grab tmpdir/tools.tar.bz2 and delete tmpdir.

That is, once I get the new build scripts and tarball uploaded. Still tweaking stuff...

Feb 19, 2005

Still sick. This cold just hangs on...

Okay, instead of using a ramfs to make / writeable in the UML instance, I've switched to making a half-gig loopback mountable ext2 image and using that. This means that /tools (I.E. the cross-compiled toolchain created by Linux from Scratch chapter 5) doesn't have to be a symlink, it can just be at the root level of the ext2 image, and everything under it is fully writeable space that the UML root user can chown and mknod and such into. Life is good.

I was worried for a bit that reiserfs-addicted distros like suse might not have mke2fs available, but I checked SuSE 9.2 and that had it, so I'm declaring victory and moving on.

Feb 15, 2005

Sick since the 9th. (*cough*) If this explanation makes even less sense than usual, well, my head isn't entirely clear.

So User Mode Linux is building (2.6.11-rc4 is fine out of the box and it looks like 2.6.11 final might actually be usable). I built it at the start of mktools.sh and tried to run the rest of mktools under it, ala:

linux rootfstype=hostfs rootflags=/ rw init=/path/to/myscript.sh

That uses the parent system's filesystem, but runs my script within the new UML kernel. First I tried out just running UML as root with init=/bin/sh, upgrading the kernel headers to the new versions I couldn't use under the old kernel (because they caused anything trying to use the resulting uclibc to segfault), and running the existing mktools.sh to build a system. That eventually worked, although on the first try the gcc build failed because it's such a memory hog that a system with 32 megabytes of ram and no swap (what UML allocates for itself by default) runs out of memory trying to build gcc. Adding mem=48M to the UML command line got me past that, and everything else went smoothly. (Adding the keyword "quiet" to the UML command line is nice, too.)
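
So the working invocation ended up looking something like this (the script path is a placeholder):

./linux mem=48M quiet rootfstype=hostfs rootflags=/ rw init=/path/to/myscript.sh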

So then I tried to modify the mktools.sh to build and use UML automatically. Not only that, but to get everything to work as a normal user rather than as root. In theory, running UML as the normal user should let me use the simulated root user inside UML to do all the mount and chroot dirty work I need to do. In practice, filesystem permissions are still a bit of a problem.

UML still needs a filesystem with runnable binaries in it, so we're borrowing the parent system's filesystem via hostfs. (Yes I could supply my own binaries, but that's what the tools directory is, and this is the script to build the tools directory.) So even though we're root within the UML, the hostfs filesystem we're using only gives us the access permissions of the user the UML binary is running as. So when we try to meddle with the filesystem to chown files, or create or delete files where we don't have write access, it fails.

In theory, /tools doesn't need any files with nonstandard permissions or ownership, and once we've got /tools we can . In practice, some tools get very confused when root can't do stuff.

The first hurdle is that I couldn't create the /tools symlink in the hostfs. (The symlink has to be at the top of the filesystem so binutils and such can access libraries and the shared library loader at the right path before the chroot actually puts it at that location.) No problem: I whipped up a little script to mount a ramfs, --bind mount all the top level directories into that (and duplicate any top level symlinks), and then chroot into that directory. Voila, have the UML run the script and the root directory is now a writeable ramdisk. (Okay, not quite voila since the busybox mount bug on January 23 was uncovered by this, but I got it working eventually.)

The next problem is that when extracting a source tarball as root, gnu tar tries to chown all the files to their original owner. And this generates an error on hostfs, so after 8 gazillion error messages tar exits with an error code, bringing my build script to a halt. Grrr. It doesn't try to do this if it's not running as root, but if it thinks it's running as root and the filesystem doesn't agree, the gnu code makes a mess.

This is a small problem with a surprisingly thorny potential resolution set. There is a --same-owner option to the gnu thing that no other tar in the world recognizes, so as-is busybox tar would barf on that option. If I'm modifying busybox tar I can just hack up a quick patch to never call chown and build that right before building UML, although building anything in busybox causes it to compile the whole libbb directory because nobody's ever bothered to do accurate dependencies for that library, they just let the linker cherry-pick what it wants at the end. And who knows if any of the actual build or install phases will barf on a similar attempt to chown something?

I could try to call 'whoami' before UML is run, pass the username in, and have su call tar as that user, but that's pretty ugly too and I suspect that pam or shadow passwords (hostfs can't read /etc/shadow) or something similar might find a way to interfere with this.

I can also mount a filesystem that has full write access, and do the build in that. This can't be a ramfs because last I checked the high water mark of the mktools build is 370 megabytes, so I guess I should create a loopback mountable filesystem. The obvious candidate is ext2, but what if somebody runs this on SuSE, which has an unhealthy fascination with reiserfs? Can I assume mke2fs is available there? Should I build it from source before calling uml?

I hate having a cold.

Feb 4, 2005

Been a bit busy. Got engaged, had a birthday, adopted a feral kitten, lots of day job stuff, the Linucon hotel search... And of course my personal programming time has been mostly devoted to completely rewriting busybox mount. (Which is almost done, by the way.)

The major to-do areas for the weekend are:

That ought to be more than I can get done in one weekend. :)

January 23, 2005

Spent six hours tracking down a User Mode Linux bug that turned out to be a busybox bug.

I now have a working, more or less mainline version of UML (2.6.11-rc1-mm2). Build that with hostfs and go:

./vmlinux rw rootfstype=hostfs rootflags=/path/to/chroot/dir init=/bin/sh

(Still some idiosyncrasies with the console permissions to work out, but I think I can work around that with an initial ramdisk...)

So I'm making a little script to run the build under User Mode Linux, and since part of the point of this is to avoid needing to run as root, this means that the hostfs I'm mounting when I switch into UML is effectively read only. Namely, the /tools symlink (from Linux From Scratch chapter 5) has to go in the top level of the filesystem, and a normal user can't write to the top level of hostfs. You may be root within UML, but the hostfs server is running as a normal user on the parent machine.

To get around this, I'm making a script that mounts a ramfs and bind mounts all the directories at the root level into the ramfs, does some other prep work (like mounting the UML proc, which has different info than the host proc), and then chroots into it. Voila, we have UML running with the host's binaries, but with the ability to write to the root device. We also did the equivalent of a chroot without needing to be root, and we can run binaries linked against the uClibc we build without segfaults, even if the kernel headers we build uClibc against are newer than the kernel we're running. The UML kernel is new enough to understand the syscalls the new kernel headers tell that uClibc to make, and it can translate them into calls the host kernel understands.

The thing that didn't work was trying to do a bind mount within uml. "mkdir /tmp/walrus; mount -o bind /proc /tmp/walrus" worked fine on the parent system, and worked fine in a chroot, but it refused to work under the UML I built.

I spent six hours tracking this down. Learned a lot about UML and how the linux kernel in general handles syscalls along the way. (I love being able to stick printf calls in the kernel source to make it tell me what it's doing. Being able to compile and run the result without rebooting is really nice too.) Still, this is no replacement for the thing actually WORKING.

The result? It turned out to be a bug in busybox mount. Once I'd stuck a printf into execute_syscall_tt (in arch/um/kernel/tt/syscall_kern.c) to tell me the syscall number, looked up what each number meant in include/asm-i386/unistd.h, then looked up what function that connected to in arch/um/include/sysdep-i386/syscalls.h, and then tracked down the actual functions (mostly in places like fs/open.c and fs/stat.c, but some are in arch/um/kernel/syscall_kern.c)... Well, I now have a lot more appreciation for strace. (Too bad it doesn't seem to run under UML yet...)

Busybox wasn't actually making the "mount" syscall for my bind mount. For the proc mount, yes, but not for the second one. Instead, it opened /proc/filesystems, and once I figured out it works if I go "mount -t anything -o bind /proc /tmp/walrus", I knew what the problem was.

If I specify the type (which isn't _used_ when you do a bind or move mount), life is good. This is because I didn't select any of the block device backed filesystems when I was configuring UML. (No point, I'm only using hostfs and ramfs at the moment, and I want to see what the minimum feature set I can strip it down to is.)

The problem is that when busybox mount is doing an "auto" mount (which it does when you haven't specified a filesystem type), it reads /etc/filesystems (an obsolete relic I don't have), and then /proc/filesystems to get the list of filesystem types supported by the kernel, and skips all the ones labeled "nodev". It tries the mount with every filesystem type it doesn't skip, and stops trying with the first success. That's how auto mount is implemented.

Since I don't have any non-nodev filesystems, it reaches the end of /proc/filesystems (and reports a failure) without ever having tried to do an actual mount. This is despite the fact that, with -o bind or -o move, the type is completely ignored and could be anything. Filesystem type "walrus" works just fine with a bind mount. But not if it never actually tries a mount...

I've been meaning to rewrite busybox mount.c for a long time, because the code could definitely be cleaned/tightened up. I now have an incentive, it seems.

January 17, 2005

Made some progress, wrote about it in my livejournal.

January 15, 2005

Spent a week of Copious Free Time (tm) fighting with UML. Results are mixed.

The "gcc couldn't find ld" thing turns out to be brain damage from the free software foundation: if "." is in the path (even at the very end, as the default $PATH of uml puts it), gcc decides it knows better than the sysadmin and ignores the entire path. Thank you, Free Software Foundation. That took a long time to find because I never expected them to be that arogant, or so stupid that if they had a problem with the $PATH they'd silently ignore it rather than giving an error message. (And you wonder why I'm trying to rip out all their stuff and replace it with something sane?)

Blaisorblade's 2.6.9-bb4 tree provides a fairly stable user mode linux, and that's apparently the trick to getting something usable. But it's not perfect: now that I know how to get gcc to work, the makefile is hanging after the ./configure stage. (Spins eating 100% of the CPU and never makes progress.) Maybe it's a disguised out of memory condition with the OOM killer not getting triggered. UML claims 32 megs for itself on boot, which _should_ be enough to build things but possibly isn't. I'll try again with mem=64M perhaps... Nope. Same hang.

Meanwhile, the UML guys insist their patches are getting merged into 2.6, and they are. But 2.6.10-rc1 doesn't build UML for me, with a stripped down .config disabling all the subsystems I think I can do without. The first error (duplicate definition) was easy to fix: pick one and yank it. The second (lots of other stuff undeclared in arch/um/kernel/sys_call-table.c) requires more work. Maybe I should grab the -bk top and see what that does...

Or I could just say "screw it, UML still ain't finished" and just go upgrade all the packages I still use out of Linux From Scratch to the 6.0 versions. Probably a better use of my time. Come back to UML when 2.6.11 comes out...

Meanwhile, busybox and uclibc just introduced some random defect tracking database that all changes are now supposed to go through. Since I'm never going to use it, I suppose I should just maintain a patch list here on my little project page and occasionally post a notification to the list when I make some new change to busybox. Hmmm...

January 8, 2005

So I'm still fighting with User Mode Linux, which I need to build a version of uclibc with more recent kernel headers than the build environment the system is being compiled under. (Right now, firmware linux won't build on anything older than a 2.6.6 kernel, which is just wrong. But applications linked against the uclibc it builds have to run on the parent kernel. With user mode linux, this would not be the case...)

I just joined the UML mailing list, which took a while to find. (I found it because I know how sourceforge works and where it keeps its mailing lists. Later, I found that way down the list of links on the user-mode-linux.sf.net web page is a contacts link, which links to the mailing lists. Right.)

I posted an info dump about the problems I've been having with UML, and hopefully somebody will say something useful about it soon.

By the way, the UML build has a menuconfig option for a path to a directory containing the root partition to cpio up an initramfs out of. But the normal x86 build does not seem to have this option. Luckily, I know how to use cpio to make one. (Well, a small one worked anyway. The knoppix initrd hasn't yet, but the night is young...) Unluckily, busybox doesn't support cpio so there's another to-do item while I grab the conventional (gnu?) cpio package and throw it in the build list...
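
For the record, the incantation is the usual one, run from the top of the directory tree you want packed (the kernel wants "newc" format):

find . | cpio -o -H newc | gzip > ../initramfs.cpio.gz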

Getting close to the actual one-file-system here. All the pieces work, now it's just putting them together...

January 5, 2005

Squashfs!

This is the filesystem I want to use. Its compression is comparable with cloop (slightly larger on the knoppix root partition in my tests, but much better than zisofs), without the speed penalty and need for module arguments to specify a filename.

Yes, it's an out-of-tree filesystem, but it does what I want, which none of the in-tree filesystems quite manage. It's not like my build process isn't applying numerous patches to various things already... :)

December 21, 2004

I have a cold. I hate having a cold.

Over the past few days I've surveyed various filesystems to use in the firmware. I want an "archiver" type filesystem, where instead of making a block device of a given length, loopback mounting it, and filling it up with files, I instead point a command at a directory and have it make some kind of archive that can be mounted as a filesystem.

The first of these is romfs, but it doesn't compress, doesn't do the full range of ownership and file permissions, and uses a simple linked list approach that doesn't seem designed to scale to large deeply nested filesystems.

The compressed rom fs is "cramfs", which would be ideal except that it's old. The biggest file it can handle is 16 megabytes, and the maximum starting offset of a file in the filesystem is 256 megabytes. (Plus it has 8 bit gids and 16 bit uids, and various other limitations.) I should offer it as an option, but it's not something I could use to repackage knoppix, for example.

Last time I did something like this, I used the transparent compression cd-rom extensions (zisofs). That still works, and scales up nicely, but it's not that efficient. A quick prototype converting the knoppix cloop image to zisofs bloated it from 725 megabytes to almost a gigabyte.

Knoppix uses cloop, which has a number of problems. The performance sucks (a design issue more than an implementation issue), you have to load it as a module because module arguments tell it which filename to operate on (ouch), and not only am I building a static kernel but I don't know if that would be compatible with losetup -o anyway. Plus it's not integrated into 2.6.9, and apparently the reason it's not integrated is the code is really ugly.

Still, in terms of absolute size cloop is smallest. It's the old tar/gzip vs zip file issue again: compressing each individual file is less efficient than compressing a group of files all at once. That's also where the performance problems come in, of course. (And why you can't loopback mount a gzipped tarball, although you theoretically could do so with a zip file. Not that I have time to write my own filesystem driver to be able to mount a zip file; I've hoped somebody would for years, but no. And no, I'm not playing with userspace filesystem drivers either.)

JFFS2 has compression support, but it's not an archiver style filesystem: I'd have to guess at what size block device I need. I could also track down the old ext2 compression extensions, but again: I want something that's already in the kernel and works like an archiver...

So it looks like zisofs does what I want. I should see if cramfs produces noticeably smaller archives for small filesystems, and give the option to use that if so. In actual embedded devices with a few megabytes of firmware, cramfs would be nice. But for boot-from-cd (or live alongside windows) style desktop distributions, it could be very limiting...
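
For reference, the zisofs route is a two step pipeline (tools from the zisofs-tools package; flags approximate):

mkzftree rootdir rootdir.z
mkisofs -z -R -o firmware.iso rootdir.z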

December 19, 2004

New job definitely impacting my open source hacking time. Oh well, if I can do this for a few years I can retire...

Merged build.sh and make-chroot.sh so that the second is now a "here document" in the first. It was also kind of ugly that I was using a modified busybox tarball, so I upgraded to a new/clean busybox cvs snapshot and am now applying my "sort" patch against that. (Everything else, up to and including the loop.c fix, is checked into CVS by now...)

The gcc build is dumping out lots of errors about unknown usage of busybox "tail". Yet another to-do item...

But first: Whip up a knoppix image using the combined kernel/root image trick, both as a proof of concept and something to run on my laptop. (My laptop is still running a knoppix image I pried off the CD and nailed to the hard drive. Knoppix is more or less the minimal distro that has both a build environment I can compile stuff with and all the desktop goodies I want: web browser, email, openoffice, xmms, and so on. Eventually I'd like to get Firmware Linux to replace it, but it doesn't even have X11 yet.)

December 11, 2004

Puttered around making a sources.txt file listing all the websites each package comes from, both to document where they all come from out on the web and so I can make a script that will download all the sources a user needs so I don't have to put up big tarballs.

Right now, every small tweak that changes one file in the source collection means I put up a new 100 megabyte tarball, and keeping a half-dozen old versions around would eat up a disproportionate amount of space. Especially considering that 95% of each tarball would be the same stuff. I wouldn't mind having a CGI on my website that creates each tarball on the fly, but A) I've got to move the site off of Eric Raymond's DSL line to my cable modem before I really try to attract too much attention to the thing, B) it would be nice if people know where to go to get package upgrades anyway.

Took out perl, which I'm not building at present. Also moved groff, man, and man-pages out to the todo pile, hopefully to be replaced with doclifter and a tiny man shell script that calls lynx. (Gotta get python working first to run doclifter, though. At least as a build tool.)

I suppose I should come up with some way to put the rest of the development tools in /tools, so you can have a finished system with just the bare essentials. New to-do item, as if I had a shortage...

Speaking of which, finally put this log up on the firmware linux web page. Go me.

December 9, 2004

Added a new feature to lilo, a length option for kernel images to tell it the length (in bytes) of a file. (Goes with the other options like append=, feed it a decimal number.) That way, I can use an initramfs, append a zisofs image to the end of the kernel, and voila: the whole OS in one file.
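
The assembly step itself is trivial; the new lilo option then gets fed the relevant byte count (the lilo.conf spelling below is a guess at how the option would be written, and the number is a placeholder):

cat bzImage root.zisofs > /boot/firmware
# then in lilo.conf, something like:
#   image = /boot/firmware
#       length = 1234567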

Sent the patch off to the lilo maintainer, who said he'd "take it under advisement". He's not sure how much demand there is for it, but oh well. It was three lines...

Now I need to get initramfs working (which means port cpio creation to busybox), get a zisofs creation step going, actually whip up some init scripts, work out some kind of install mechanism... And of course get X11 building so it's got a desktop.

December 7, 2004

Upgraded the version on the website to the new no-coreutils version. Made a note at the top of the firmware linux main page, "I upgraded busybox sort so I could finally dump the last of coreutils. Go me." Decided I should do a more regular log. (Shamelessly using vi, like I did on my old flash.net webpage back in 1997, before the word "blog" was invented.)

Still not integrated yet. Back on the 3rd I asked Eric Andersen if I could check in the new sort and he said he wants to convert the busybox cvs repository to subversion and fork off a new development branch first, which could take a while. Oh well.

December 6, 2004

Actually built firmware linux with the new sort. (There was a thinko, of course. All my tests supplied input from stdin, actually feeding it a file led to an infinite loop. Fixed, with patch sent to the busybox list...)

Along the way did more work on sort.c. I cleaned up the -n support so it has an integer implementation if you don't declare SORT_BIG. I also took the separate GNU extensions selector away and just made it all part of SORT_BIG, and moved a lot of other stuff under SORT_BIG so that the tiny version doesn't support -o and such. Actually added a menuconfig entry for it too, and posted it all to the list.

But the point is, coreutils is _toast_. And there was much rejoicing. Yes, I'll need to implement comm when I want to build perl. I want a busybox diff too, so I can dump diffutils. (diff -u, anyway.) Sort has some "read files into memory, line by line" functionality that I should genericize to put into libbb, eventually.

November 27, 2004

Posted my new sort.c to the busybox list today. I started that months and months ago, before Linucon started eating up my time and put firmware linux on hold. This one should be SUSv3 compliant, with a few GNU extensions. (I need to make more tests before I can really say whether or not it IS SUSv3 compliant. And the real test, of course, would be building gcc and binutils and the rest of firmware linux with it...)