Dec 30, 2005

Insanely busy this week, but finally I get to go collapse for a bit.

I haven't looked at Firmware Linux or busybox all this week. No time.

Dec 22, 2005

It's Christmas, I'm in Pennsylvania, and my Copious Free Time for most of this month has gone towards the busybox 1.1.0 release. Don't expect much before my return to Austin on January 3...

I now have about 5 people interested in FWL, and will probably start a mailing list after I get settled in Pittsburgh. Possibly late February.

Dec 18, 2005

I got the Pittsburgh job, and will be moving sometime in January.

So: fixing busybox mount so "mount -o remount /dir" works (without specifying a block device). Doing it right involves getting the flags from /etc/mtab (or /proc/mounts) rather than /etc/fstab, i.e. reusing the _current_ flags as the base rather than the default flags. That's ok because the format's the same, so the change to the parsing code is fairly minor.
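A shell-level sketch of the same idea: look up the options a directory is _currently_ mounted with, rather than its fstab defaults. This is not busybox's C code; current_mount_opts is a made-up name, and it doesn't unescape the \040-style octal escapes /proc/mounts uses for spaces in paths.

```bash
#!/bin/bash
# Hypothetical helper: print the current mount options for a mount point,
# read from /proc/mounts (or another file in the same format, such as
# /etc/mtab). Fields are: device mountpoint fstype options dump pass.
# Paths containing spaces appear escaped (\040); this sketch ignores that.
current_mount_opts() {
  local dir=$1 mounts=${2:-/proc/mounts}
  # The last matching line wins, since a later mount on the same
  # directory covers the earlier one. Exit nonzero if nothing matched.
  awk -v dir="$dir" '$2 == dir { opts = $4; found = 1 }
                     END { if (found) print opts; else exit 1 }' "$mounts"
}
```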

But what if somebody does "mount -a -o remount"? Hmmm...

Dec 16, 2005

Ok, design time.

The /tools build is happy and feeding into stage 2, which is happily building a squashfs image. The firmware squashfs image build is happy, and it's building an installer squashfs. The packing step (2.3) isn't making

Dec 14, 2005

So I have a cold, of the "feeling very blah" variety more than the specific symptoms variety. Most likely due to travel on Friday. It happens. Good time to close out busybox bugs that have already been fixed.

I'm beating on mount, getting "mount -o remount /path" to work. It's possible that supplying /dev/dev0 instead of /path might work too; I haven't worked through the logic yet. If remount of /path/to/file works (due to the automatic translation to /dev/loop? re-using any existing binding to the same) I shall laugh hysterically, or possibly maniacally. And no matter how I try to spell maniacally it looks wrong right now.

I have a cold.

Dec 12, 2005

I really, really, really should be spending the evening working on the bug triage list. So what do I do? Clean up the new mdev.c Frank Sorenson posted to the busybox list, of course. (It's sadly misguided, but in a "don't ask questions, post errors" sort of way that sort of _compels_ you to go "no, you do it like this...")

You know I've been banging on C code for too long when instead of putting a close-paragraph tag on the end of the last paragraph, I put */

Right.

Dec 11, 2005

Mostly relaxing, somewhat working on the busybox 1.1.0-rc2 bug triage list.

Need to properly export my snapshots directory so the truly bored can download my latest stopping point between releases...

Dec 8, 2005

I'm in a plane heading from Atlanta to Pittsburgh with a 170 mile an hour tailwind that the first officer says puts us at 9/10 the speed of sound relative to the ground. Despite this, we're getting in an hour late. It's 27 degrees and snowing in Pittsburgh. If the job wasn't _really_cool_...

Triaged the busybox bug list last night, and 49 of them fall in the "todo for 1.1" category. I'd hoped to bang on them during the plane flight, but queuing up that many separate changes to go into SVN, without net access, would suck. (It's more screwing up my repository that's bad. Yes, I could back it up, but what's there now is a mess and I want to clean it up and revert to a fresh checkout, not clone the mess.) It's times like this that learning git seems almost worth the pain.

Catching up on the linux-kernel mailing list instead. While the battery holds out...

Dec 7, 2005

Haven't done a darn thing on FWL yet today. Too many consecutive days of racing for a deadline and trying not to face issues that would cause a delay if properly dealt with have left a bit of a mess I'm unlikely to have cleaned up before I have to get on a plane tomorrow.

Besides, it's all rainy and cold out. Hard to feel inspired while it's all rainy and cold...

Dec 6, 2005

Okay.

Finally got all the fallout from the reorganization fit back together, and now I'm faced with the design problem. It's now building the components of an installer, and it's putting them in a squashfs. I could follow what Firmware Linux is doing and bolt the squashfs on and do the initramfs dance to bring it up (that code's all already written), but there's just not enough _there_. The initramfs code is bigger than the lilo I need to invoke, so I might as well just put it all in the initramfs. The problem is, right now the build's just making a squashfs and the host kernel can't extract that. (Simple solution: have the build make both a tarball and a squashfs each time. Better for debugging anyway.)

Dec 5, 2005

Obviously didn't get 0.9 shipped yesterday.

Building the installer. It's being persnickety, of course. There are several times where I should have just duct-taped something and come back and fixed it properly later. Instead I'm refactoring things to do it right and breaking other stuff. Just moved the config files out of system and into their own configs directory, which broke a dozen things of which I've fixed maybe half...

The usual. The lack of somebody actually breathing down my neck on this means I'm not doing the last minute duct-taping to get it out on time. Well, not enough.

Dec 4, 2005

Ok, I want to ship 0.9 today. I need to get the installer working. The problem I left off at last night is that /tools builds binaries linked against /bin/uclibc and I didn't put one in the system because I want to make a statically linked lilo and busybox. For some reason this caused gcc to misbehave and believe it was building a debug version of lilo. (Not that getting gcc 4 to misbehave is hard to do.)

A deeper problem is that to build lilo, I first have to build as86 and bin86, and it's a bit messy to have those on the final system. I suppose I could always append them to the /tools build, but only a bootloader needs a 16-bit assembler or linker anymore. They're pretty lilo-specific. Logically they go in extra build tools if you want them in the final image, but I need to build them elsewhere in order to _use_ them to build lilo, which is needed for the installer for even the most minimal images. Hence the refactoring binge over the last two days.

I suppose I could build and then cherry-pick. Or just let the installer bloat a bit since it doesn't go on the final system anyway.

Coffee shop time!

(Much later...) Still grinding away at the build. The refactoring made the current directory relevant: scripts cd into tmpdir and then cd back out again, and there are places I call a script from something that's already in tmpdir. Need to make sure DIR is globally set to the build directory and cd "$DIR/tmpdir" instead.
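A minimal way to make the current directory irrelevant, assuming DIR should end up as an absolute path to the build directory: set it once, and do each cd inside a subshell so there's never anything to cd back out of. in_tmpdir is a hypothetical name; DIR and tmpdir are the ones from the build.

```bash
#!/bin/bash
# Pin DIR to the build directory as an absolute path, once. If a parent
# script already exported DIR, keep its value; otherwise fall back to the
# directory this script lives in (an assumption about the layout).
: "${DIR:=$(cd "$(dirname "$0")" && pwd)}"
export DIR

# Hypothetical helper: run a command inside $DIR/tmpdir. The parentheses
# make it a subshell, so the caller's working directory never changes and
# nothing ever has to "cd back out" afterwards.
in_tmpdir() {
  ( cd "$DIR/tmpdir" && "$@" )
}
```

A script that's already sitting in tmpdir can call in_tmpdir too, since the cd target no longer depends on where it started.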

I also need to make my snapshot script scp this log up to the website as well. I often update this thing 2 or 3 times a day, but only remember to sync it to the website a couple times a week...

Wasting too much time with the restart stuff. Lots of stuff needs to be decoupled to make it work properly. (For example, it would be nice to grab tools.img, zap tmpdir, throw tools back in there, and restart the build. Except that it won't re-build UML if I do that, because that's 0.2- and 0- is guarded by tools.img in ./build.sh...)

Metro's network went down an hour ago. Not the stupid ipw2200 bug in Ubuntu "Horny Hedgehog" that keeps blinding it to the access point I've been using. No, I'm connected to the access point just fine, I just can't dhcp or route packets through it. (About five other people I've talked to in the coffee shop can't either. I bugged the lady at the front desk and her idea of power cycling the box was to unplug and plug the phone cord to the DSL. Madame: I can't ping the box, and it won't give me a dhcp address for the masqueraded side of things. I don't think that's the side you need to worry about...) I should probably wander home, but the big long /tools build is going again. (Currently building make, which means it's got another binutils and gcc build to go. Building gcc takes a while. It's a pig.)

I need to make a better progress indicator. I already have the "===" anchors for each major build section. I can do a full build with the output going to a log, and then have a script count how many lines get printed out between each ===, and work out a bar graph progress indicator that gets updated with each line printed based on that many ticks. (Plus a global bar graph for the total number of sections.) Then it's a simple matter of piping the output of the build into the thing displaying this progress meter. Global progress, current section (with previous and next sections shown too), bar graph for current section, and about the last 3 lines of raw output from the build itself.
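A minimal sketch of that scheme in plain bash and awk (no python), assuming every section header line starts with "===". It only draws the per-section bar plus an overall section count, leaving out the previous/next section names and the last-few-lines window; the function names are invented.

```bash
#!/bin/bash
# Two-pass progress meter sketch. Assumes each major build section starts
# with a line beginning with "===". Function names are hypothetical.

# Pass 1: from the log of a complete earlier build, print how many output
# lines each section produced, one count per line, in order.
section_sizes() {
  awk '/^===/ { if (n++) print count; count = 0; next }
       { count++ }
       END { if (n) print count }' "$1"
}

# Print a bar WIDTH characters wide with the first FILLED marked '#'.
bar() {
  local filled=$1 width=$2 out=
  while [ ${#out} -lt "$filled" ]; do out+='#'; done
  while [ ${#out} -lt "$width" ]; do out+='-'; done
  printf '[%s]' "$out"
}

# Pass 2: read live build output on stdin and redraw one status line per
# input line. \r goes back to column 0; \033[K is the ANSI "erase to end
# of line" sequence, so shorter lines don't leave stale characters.
progress_meter() {
  local -a sizes
  mapfile -t sizes < <(section_sizes "$1")
  local total=${#sizes[@]} section=0 ticks=0 width=30 size filled line
  while IFS= read -r line; do
    if [[ $line == ===* ]]; then
      section=$((section + 1))
      ticks=0
    else
      ticks=$((ticks + 1))
    fi
    if [ "$section" -gt 0 ]; then size=${sizes[section-1]:-1}; else size=1; fi
    filled=$(( ticks * width / (size > 0 ? size : 1) ))
    [ "$filled" -gt "$width" ] && filled=$width
    printf '\r\033[K%d/%d ' "$section" "$total"
    bar "$filled" "$width"
    printf ' %.40s' "$line"
  done
  printf '\n'
}
```

Something like "./build.sh 2>&1 | tee new.log | progress_meter old.log" would also keep a fresh log around to supply next time's counts.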

Not hard to do, but I'd probably want to do it in python or something and that's not in the build. (I still want this sucker rebuilding under itself, remember? It's been a while since I've tested it and I'm sure something's broken for 0.9, but 1.0 will rebuild under itself and although I'm fine adding python as an application, I'm _not_ adding it as a build dependency. Don't even bring up perl, it's evil.) I suppose I could do it in C. Seems a bit silly, but not actually difficult.

I remember fiddling with curses a few years ago, thinking about something like this. Curses sucks pretty badly too. Luckily, everything these days understands good old ANSI escape sequences from the days of DOS, and that's all that I need.

(Time passes...)

I debug the _strangest_errors_. The _way_ lilo barfs when there's no /lib directory (yet the compiler is creating binaries linked against /lib/ld-uclibc.so.0 and /lib/libc.so.0) is that it builds a binary called "version", attempts to run it, this fails, the build doesn't notice (because its output is piped somewhere and in cases like that you get the error code from the _right_ half of the pipe), and then the build happily continues thinking it's building a debug version of lilo (minor version code >50, don't ask me where it got that from), and _that_ triggers a build break because line 609 of lilo.c is debug code that uses an identifier called semi that is #defined to ";" in common.h but only under #ifdef __MSDOS__. Fun, eh? The lilo debug code only compiles under dos. So I waste time boggling at what broke, which gives no indication of what's actually wrong.
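The pipe behavior that hid that failure is easy to demonstrate. By default a pipeline's exit status is the last command's; bash's PIPESTATUS array and "set -o pipefail" are two standard ways to see failures on the left side. Both are bash features, so don't assume every /bin/sh (including the one make invokes) has them.

```bash
#!/bin/bash
# By default a pipeline's exit status is that of its LAST command, so a
# failure on the left side of the pipe goes unnoticed.
false | cat
echo "default: $?"                 # prints "default: 0"

# PIPESTATUS records the status of every stage of the last pipeline.
false | cat
echo "stages: ${PIPESTATUS[*]}"    # prints "stages: 1 0"

# With pipefail set, the pipeline fails if any stage fails.
set -o pipefail
false | cat
echo "pipefail: $?"                # prints "pipefail: 1"
set +o pipefail
```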

At times like this, where you've put together an environment from scratch and things break, you have to figure out what's wrong from first principles. The symptoms you get can just be _strange_, because the fiery explosion can occur after miles of skidding and swerving, and you have to track it back to where the nail lying in the road is. You can study the fireball and the wreckage all you like, it won't give you a _clue_ about what actually went wrong. And getting distracted by "the steering didn't work right, let's check the hydraulics"... That can eat a day.

Dec 3, 2005

It'd be nice if the -f option to ln actually worked. But no. If the destination is a symlink to a _directory_, it happily traverses it and drops a redundant symlink _in_ the directory. It seems -f will only overwrite the destination if it's a file, not if it's another symlink.

Of course what I want is something like mkdir -p, where it's happy if the thing is already there and doesn't fail with an error. Instead I have to hand-code a test for each one, which makes something like "ln -s share/{man,doc,info} usr" a little tricky to do all at once. (No, that's not ambiguous since the usr target is a real directory and not a symlink.)
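For what it's worth, GNU ln does have a flag for the symlink-to-a-directory case: -n (--no-dereference), which together with -f replaces the existing link instead of descending through it. Combined with a readlink check, that gets roughly mkdir -p semantics. A sketch; ensure_symlink is a made-up name.

```bash
#!/bin/bash
# Hypothetical helper: make LINK a symlink to TARGET, succeeding quietly if
# it already is one, roughly the way mkdir -p treats existing directories.
# -n stops ln from following an existing symlink to a directory (which is
# what drops the redundant link _inside_ the directory); -f replaces it.
ensure_symlink() {
  local target=$1 link=$2
  if [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]; then
    return 0                      # already correct, nothing to do
  fi
  ln -sfn "$target" "$link"
}
```

The brace-expansion case then becomes a loop, run from the directory containing usr: for i in man doc info; do ensure_symlink "share/$i" "usr/$i"; done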

(What annoys me about the gnu tools isn't that they have 8 gazillion options but that they can have so many strange and useless options yet not provide the ones I need to do what I'm doing. I don't mind sparse, I mind pointless. Then of course there are situations where there are two ways to do the same thing (like [ -h blah ] and [ -L blah ]), or cases where something fairly straightforward (like tar --exclude) can _only_ be accessed via a long option...)

After the reorganization to split 2.2 into 2.2.[0-5], I'm now making the build more robust about rebuilds. If you ctrl-C in the middle of it and resume, it should be better about rerunning only the sections you need. This is of course lightly tested just now. I'm trying to err on the side of "do it over again" rather than "build break", but I'm not holding up the 0.9 release for this.
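One common way to get that "rerun only what's needed" behavior is stamp files written after each section succeeds. A sketch; STAMPDIR, run_section and invalidate_from are hypothetical names, not the build's own. Erring toward "do it over again" here means no stamp unless the section finished, and invalidating a section also invalidates everything after it.

```bash
#!/bin/bash
# Stamp-file sketch for resumable build sections (names are hypothetical).
STAMPDIR=${STAMPDIR:-./stamps}

# run_section NAME COMMAND...: run COMMAND unless NAME already completed.
# The stamp is written only after COMMAND succeeds, so a ctrl-C or a
# failure leaves no stamp and the section is redone on the next run.
run_section() {
  local name=$1
  shift
  mkdir -p "$STAMPDIR" || return 1
  if [ -e "$STAMPDIR/$name" ]; then
    echo "=== $name (already done, skipping)"
    return 0
  fi
  echo "=== $name"
  "$@" && touch "$STAMPDIR/$name"
}

# invalidate_from NAME SECTION...: given the full ordered list of sections,
# remove the stamps for NAME and every section after it, since later
# sections build on its output.
invalidate_from() {
  local name=$1 s found=
  shift
  for s in "$@"; do
    [ "$s" = "$name" ] && found=1
    [ -n "$found" ] && rm -f "$STAMPDIR/$s"
  done
}
```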

(Time passes...)

Remind me to break out an installer.sqf file. Right now lilo is forced into the build (non-optional) because I can't make the hard drive image without it. But really what I should do is build a dedicated installer with yet another stripped-down busybox, statically linked against uClibc, and run lilo out of a ramdisk or some such.

Darn it, I'm on a deadline here, and I'm redesigning the build...

Dec 2, 2005

I hate this kind of bug. If I switch on static linking in the busybox build and build under the ./firmware-uml image, it _hangs_ right at the end. Tried it three times, left it hung for an hour...

The reason I hate this kind of bug is it could be a busybox bug, a uClibc bug, a gcc 4.0 bug, or a User Mode Linux bug. At the moment, I'm leaning towards gcc bug because when I cut and paste the final link command and ran it by itself, it did link properly but I got this:

/mnt/home/landley/newbuild/firmware-build/sources/packages/busybox/coreutils/coreutils.a(nohup.o): In function `close_stdout':
nohup.c:(.text+0x6a): warning:

That's it. There's no more to that warning. I'm guessing it's finding something to warn about, but the attempt to warn is apparently following a wild pointer? Weird. (Somebody suggested I should join the gcc mailing list today. Whole lotta "I'm just not going there" goin' on. I'm already on lists for busybox, uclibc, linux-kernel, uml-devl and uml-user, and what all those have in common is I'm willing to poke around in the source code if necessary. I poked at the gcc source code extensively at Rutgers, and it was really, really evil. In the decade since, the source and binaries have gotten much much bigger, and its memory requirements have increased more than tenfold. Not positive developments, really...)

Anyway, I got the busybox binary I needed out of it. One with the ls command, and you have _no_ idea how much you miss that until you have to do without it. You can do "echo *" but that's not quite the same: it won't distinguish directories from files or symlinks, won't tell you how big a file is, where a symlink points, the major and minor numbers on a block device...

The actual problem, by the way, was that I had the major and minor numbers swapped around in the device node I was creating. That'll do it. Time for lunch...

Oh yeah, I fly up to Pittsburgh on Thursday. Just a 24 hour stay, but it gives me a good deadline, so I'm trying to get a Firmware release out this weekend and a busybox-1.1.0-pre2 out Wednesday.

Helps to configure squashfs into the bootable kernel as well as the UML version. (Applying the patch is not the same thing as selecting it in config. Right.) NOW go to lunch.

(Fiddling around with laptop at lunch...)

It booted!

I now have a script that is creating a hda.img that qemu can boot from. It generates 8 gazillion warnings and just gives you a command prompt without even bothering to populate /dev, but the initramfs loopback mounted the squashfs correctly and the command prompt is coming up as pid 1 in the correct filesystem.

This would be a -pre release if I did prereleases. Ok, what's my remaining todo list. Integrate mkhda.sh into stage 2.3 (with the quiet option), make sure the initramfs-busybox file the build is making contains everything that's needed, update the documentation, fix the init scripts to set the PATH and populate /dev...

Nope, not quite that simple. Gotta make an installer. The parent is unlikely to have my patched version of lilo, and the one I built is in a squashfs image. So, make a UML, cherry-pick the relevant files out of the squashfs, throw it all in an initramfs, and make an installer. Good semantics for the installer would be "firmware-install blockdev image", but can I shoehorn UML into passing that along without worrying about strange collisions? (What if they call the image "quiet", or "mem"? Hmmm...)

I don't want to require anything be in the final squashfs image, either. Hmmm... I could just build lilo again, except that needs bin86 and as86, which is a bit of a pain. Looks like I should split that off from 2.2. It should build as long as there's /tools, so... Hmmm...

Right, the obvious thing to do is split 2.2 into a half-dozen 2.2.x scripts. Then I can build the build.sqf image and an install.sqf image with a different subset of stuff...

Dec 1, 2005

So I upgrade to 2.6.15-rc3 and what happens? 2.6.15-rc4 the next day, due to some showstopper bug that I haven't seen. (The build made it to the end, haven't gotten to do much other poking.)

The darn busybox mount error message bug is reproducing!

$ mount -t proc /proc /proc
mount: Mounting  on  failed: Device or resource busy

Now if it just continues to reproduce as I poke at it...

(Time passes...)

Got distracted by other things. Spent a rather frustrating hour trying to swap a "make allyesconfig" busybox into initramfs so I can do better debugging of the init scripts, and having it tell me that no init was found (which could mean that the file's not there, the filename is wrong, the permissions are wrong, or it can't find any of a number of shared libraries). Thinking about the latter, I noticed that allyesconfig busybox links to libm and libcrypto, and set about eliminating the applets that caused those dependencies (which took a bit of finding, and there should be an easier way to do this in the configurator). But eliminating those didn't help...

Finally I copied the new one next to the old one, booted up, poked around a bit, realised that there _are_ no shared libraries in my initramfs. The old one was statically linked. (This isn't an "Oh, I knew that" moment, this is an "I did that, and then forgot about it".) The things I lose an hour of work to...
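Checking that sort of thing before booting is scriptable. A sketch: binutils' readelf -d lists a binary's NEEDED shared libraries (a statically linked binary has none), and the second function reports which of them are missing from an initramfs tree. Both function names are made up, and this doesn't check the dynamic loader itself, which readelf -l shows on its "Requesting program interpreter" line.

```bash
#!/bin/bash
# List the NEEDED entries of an ELF binary using binutils' readelf.
# A statically linked binary has no dynamic section, so this prints nothing.
needed_libs() {
  readelf -d "$1" | sed -n 's/.*Shared library: \[\(.*\)\]/\1/p'
}

# Hypothetical helper: read library names (one per line) on stdin and
# print any that exist in neither ROOT/lib nor ROOT/usr/lib.
# Returns nonzero if anything is missing.
missing_libs() {
  local root=$1 lib status=0
  while IFS= read -r lib; do
    [ -n "$lib" ] || continue
    if [ ! -e "$root/lib/$lib" ] && [ ! -e "$root/usr/lib/$lib" ]; then
      echo "$lib"
      status=1
    fi
  done
  return $status
}
```

Usage would be along the lines of: needed_libs busybox | missing_libs initramfs-root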

I told somebody on the uClibc list I'd have a release out sometime this weekend. Time to buckle down and get the hard drive image booting, make some init scripts, integrate at least the script version of mdev...

Nov 29, 2005

This is all immensely, _deeply_ silly. If I started spouting Star Trek technobabble (the heisenberg compensators are interfering with the inertial drift dampeners, and the ship's only toilet has backed up in the warp drive) it would probably be less silly than what I write here every day or two. How can anybody possibly _care_ about my weird little hobby? If I built stuff out of popsicle sticks I'd at least be able to photograph it.

Right.

It turns out libgcc_s was pointing into /tools after the build, which prevented me from compiling anything. My fault, leftover crud from Linux From Scratch that doesn't apply to what I'm doing now. Twiddled the /lib part of the build into submission (the symlinks uClibc installs can get a bit weird, but at least they're the right ones now).

Now that I can compile stuff under ./firmware-uml again (after populating /dev and mounting something writeable over /tmp), I've reproduced the losetup problem in a stripped down busybox with just losetup in it that I can repeatedly recompile from source. Now to find out what's wrong with it.

But first, lunch...

(Time passes...)

So, losetup is now beaten into shape ("losetup /dev/loop1 file" and "losetup -d" both work properly now, plus I added "losetup /dev/loop1" to tell what it's currently bound to, plus decent error messages for all of the above when they _don't_ work). For example:

losetup /dev/loop0
/dev/loop0: 941692 /mnt/home/landley/newbuild/firmware-build/firmware-uml

The number is the offset into the file it's mounted at (-o). I should go improve the documentation... Done.

The "mount" error message problem is, of course, refusing to reproduce for me right now. Hmmm... When I try to mount an invalid file.img (fresh copy from /dev/zero, unformatted) the error message I'm getting says failed to mount /dev/loop2 on sub, which is technically correct but doesn't match the command line I gave it. Hmmm...

Nov 28, 2005

Sigh, busybox losetup seems to be broken again. (Darn it, this was working. I tested it. Now what broke?)

One of the main reasons I bang on my Firmware build so much is that it really gives Busybox a workout. I find stuff wrong with it all the time. (And when I'm not finding stuff wrong with busybox, I'm finding stuff wrong with busybox linked against the most recent release of uClibc. Yes, lots of stuff in busybox that works fine linked against glibc doesn't under uclibc. Probably what broke losetup was linking it against uClibc; I tested under glibc. Or possibly gcc 4.0.2 is miscompiling something, that happens too...)

Mostly all these little fixit notes are accumulating in the todo list of doom at the moment (158 lines and counting), but I intend to swing round and focus on that after I get Firmware Linux 0.9 and busybox 1.1 out, and try to catch up a bit.

Right now, I'm making fairly good progress on the partitioned image creation script. Not only is a partitioned image file something I can test boot under QEMU, but I can trivially make a bootable CD out of it. (The El Torito format doesn't just take floppy images, it'll take hard drive images too. But they have to be partitioned...) And there's my installer I talked about.

So I'm running the script under a UML instance (so I can loopback mount the partition I create to copy the firmware image into it and run lilo against the hard drive image). The difference between today and yesterday is that today I'm running it with hostfs and borrowing the host kernel's tools, which means I'm not using the busybox versions of fdisk, mke2fs, and losetup. Yesterday I was trying to put together this script under the firmware-uml instance I built, and it was _not_ happy. (See busybox losetup seems broken, above.) Also, the busybox version of fdisk.c is a mess, but at least it's a usable mess.
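For the loopback step: losetup's offset (the -o value, and the number it reports) is in bytes, so mounting a partition out of a whole-disk image means converting the partition's start sector. A sketch assuming 512-byte sectors; the function names are made up. The CHS helper is the standard LBA = (C * heads + H) * sectors + (S - 1) conversion, for when all you have is geometry.

```bash
#!/bin/bash
# Byte offset of a partition inside a raw disk image, for losetup -o.
# Assumes 512-byte sectors; START is the partition's first sector as
# reported by fdisk/sfdisk (63 was the traditional start of partition 1).
partition_offset() {
  echo $(( $1 * 512 ))
}

# Convert a CHS address to a linear sector number (LBA). Cylinders and
# heads count from 0, sectors from 1; HEADS and SECTORS are the disk
# geometry (heads per cylinder, sectors per track).
chs_to_lba() {
  local c=$1 h=$2 s=$3 heads=$4 sectors=$5
  echo $(( (c * heads + h) * sectors + (s - 1) ))
}
```

So losetup -o "$(partition_offset 63)" /dev/loop1 hda.img would bind the loop device 32256 bytes into the image, where a partition starting at sector 63 begins.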

Currently bugs like this probably prevent it from rebuilding itself under itself. There's nothing fundamentally wrong; there can't be since stage 2 builds under /tools, and stage 1 builds /tools with nothing but busybox, uclibc, gcc, binutils, make, and bash. (And bbsh will zap bash there eventually.) It's already a system built under a busybox and uclibc based toolchain, if anything particularly important goes wrong the build can't complete at all. But the packing step is run on the host system, and UML is run on the host system. There's 8 gazillion little details like "is /dev/shm a tmpfs mount so you can run UML" that have to be right to actually get the whole build to work from executing "./build.sh" through to final image creation, and things like the packaging steps that usually run against the host system's tools haven't been tried under a Firmware partition for a while.

In fact right now when I run ./firmware-uml it isn't even bothering to populate /dev on the way up, just jumping straight to /bin/sh with as little initialization as I can get away with. I need to replace /bin/sh with an actual init, and make some init scripts for the thing. (The /dev populating should be done with mdev in busybox. I could do a quick hack using the /dev populating script I use to bring up stage 2. But that's not the proper fix, mdev is.)
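Until mdev is wired in, the quick hack amounts to walking sysfs for "dev" files (each holds "MAJOR:MINOR") and calling mknod, whose argument order is name, type, major, then minor; getting those last two backwards is the swapped device node problem from the Dec 2 entry. A sketch assuming the 2.6-era sysfs layout; sysfs_mknod_cmds is a made-up name, and it prints commands rather than running them.

```bash
#!/bin/bash
# Sketch of a quick /dev-populating hack, assuming the 2.6-era sysfs layout
# where each device directory holds a "dev" file containing "MAJOR:MINOR"
# (block devices under SYS/block, partitions in subdirectories of their
# disk, character devices under SYS/class). On newer kernels many of these
# entries are symlinks, which this find does not follow.
# Prints mknod commands instead of running them; pipe to sh as root.
sysfs_mknod_cmds() {
  local sys=${1:-/sys} devdir=${2:-/dev}
  find "$sys/block" "$sys/class" -name dev -type f 2>/dev/null |
  while IFS= read -r f; do
    name=$(basename "$(dirname "$f")")
    majmin=$(cat "$f")
    [ -n "$majmin" ] || continue
    case $f in
      "$sys"/block/*) type=b ;;
      *) type=c ;;
    esac
    # mknod NAME TYPE MAJOR MINOR: the major number comes first.
    echo mknod "$devdir/$name" "$type" "${majmin%%:*}" "${majmin##*:}"
  done
}
```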

So I'm starting to wander from "building the system" to configuring the final system's init scripts and fstab and such. Getting closer...

Nov 27, 2005

Eventually beat lilo into submission (with a big disk= block manually specifying _everything_, including ubda1's relationship to ubda). Now trying to get the early boot stuff happy. (The -boot version needs a few busybox applets that the uml one doesn't, including echo, cat, mknod, and sed.)

Wandered through the "unwrapped" build again, and spent a confused half hour tracking down why /dev/shm wasn't working. (Because the 2.6.10 kernel actually honors the size=0 thing I put on /dev for security reasons. Ah. 2.6.15 doesn't anymore, so I hadn't noticed that was still there.)

The thing is, it's nice to have all writeable mounts on the Firmware system mounted noexec and nodev, so even if they crack root they have to jump through hoops to actually drop a rootkit on the system that they can run. (You can't stop somebody who's cracked root, but you can annoy the hell out of them.) Years ago on 2.4 I had a system with a static /dev on zisofs, and this was easy. But now /dev is tmpfs, which is writeable to root, and obviously it can't be nodev, but I _can_ make it so that it has 0 free space. Except that in order to run UML on the new system, /dev/shm has to be writeable and executable. Argh.

Nov 26, 2005

So I'm trying to inflict lilo upon a User Mode Linux block device, and it's saying that ubda is not a device with partitions. Grrr. I mounted ubda1 and put the appropriate kernel image there, synthesized at least a first guess at a lilo config file, but lilo just ain't happy here...

I could always tear lilo apart and duct-tape the appropriate bits of it into the disk image via dd. (I need the boot sector, the menu loader, and a sector list for the kernel image. This should not be brain surgery.) But that would massively suck.

Time for a dredge through the source code to see if there's an easy way to lie to the sucker and convince it "look, it's just like hda!"... Oh yeah, explicitly specifying the disk geometry. Well, maybe it'll help... Nope. I was afraid of that. It doesn't know that ubda1 and ubda are connected... Ah, it's got a partition option... Yes indeed, lilo allows you to lie to it quite extensively through the config file. And there was much rejoicing...

"Could not umount /dev/udba1: Invalid argument." That can't be a good sign. So is busybox umount broken, or is it a UML problem? Grrr. Right, to-do item...

"Unrecognized token "length" at or above line 17 in file '/dev/lilo.conf'"? Sigh, what happened to my patch? Rummage, rummage... Ah. I switched the patches from .bz2 files to uncompressed (easier to flip through and edit), and didn't edit the line that applies that one, and of course in a "bzcat blah.bz2 | patch -p1 &&" pipeline, the && is on the _second_ command which completed without error because it got no input. Of course.

Much breakage, much breakage...
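Back to the bzcat | patch trap above: one way to make a renamed or missing patch fail loudly is to check the file exists, skip the pipe entirely for uncompressed patches, and check both pipeline stages for compressed ones. A sketch; apply_patch_file is a hypothetical name, and PIPESTATUS is a bash feature.

```bash
#!/bin/bash
# Hypothetical helper: apply a patch stored either bzip2-compressed or
# plain, failing loudly if the file is missing. Plain patches go through
# "patch -i" (no pipe at all); for .bz2 files the status of BOTH pipeline
# stages is checked, so a missing or corrupt file can't turn into an
# empty patch that "succeeded".
apply_patch_file() {
  local file=$1
  local -a st
  if [ ! -f "$file" ]; then
    echo "apply_patch_file: no such patch: $file" >&2
    return 1
  fi
  case $file in
    *.bz2)
      bzcat "$file" | patch -p1
      st=("${PIPESTATUS[@]}")    # save before the next command resets it
      [ "${st[0]}" -eq 0 ] && [ "${st[1]}" -eq 0 ]
      ;;
    *)
      patch -p1 -i "$file"
      ;;
  esac
}
```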

Nov 24, 2005

Burp. Boston Chicken does a remarkably good Thanksgiving lunch when you don't feel like extensive cooking and aren't off visiting relatives who do.

Found $HOSTTYPE yesterday, which looked useful for making an architecture independent build. (Still playing with the x86-64 server.) Unfortunately, in stage 2 of my build it was set to i686 instead of i386. Fortunately, digging through the bash source code to see where it got that info (it's a hard-wired option that ./configure gets from config.guess) I found out it ultimately comes from the kernel via `uname -m` and the uname() syscall. Unfortunately, that's what's returning i686.

Right, switch my script to using uname -m, but throw in a test for i686 and substitute i386 instead. I should probably genericize that to i?86, and I have no idea what other platforms have special cases, but at least it's getting x86_64 right. (I was worried about the "_" vs "-" thing.)
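A sketch of that substitution with the i?86 generalization mentioned above. Only the x86 mapping comes from these notes; other architectures pass through untouched, which may or may not be right for them. normalize_arch is a made-up name.

```bash
#!/bin/bash
# Map `uname -m` output onto the architecture name the build wants.
normalize_arch() {
  case $1 in
    i?86) echo i386 ;;      # i386, i486, i586, i686 all become i386
    *) echo "$1" ;;         # e.g. x86_64 passes through unchanged
  esac
}

ARCH=$(normalize_arch "$(uname -m)")
```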

On the kernel list, I can't figure out if Roman Zippel doesn't understand what I'm trying to do with miniconfig, or simply doesn't care, but in the course of discussion his disgust at somebody else touching his .c files did point out that a ./configure wrapper script wouldn't be a bad way to implement miniconfig. If ./configure by itself looks for mini.conf (with a helpful error message when it can't find it), and ./configure /path/to/file is the way to specify an alternate name, and ./configure -s (or --shrink) does the miniconfig.sh stuff, and ./configure -? (or --help) dumps the documentation... The whole thing can be one file. And it doesn't even have to depend on the recent changes to allnoconfig either, it can just be a sed invocation and then it'll work with busybox and uClibc, too. (Although I'm not quite sure about the performance of the shrinker script, but I can zap blank and comment lines to get back some of that.)
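The proposed wrapper's interface is mostly argument dispatch, so here's a sketch of just that part. expand_miniconfig and shrink_config are hypothetical stubs standing in for the real expansion and the miniconfig.sh shrinking logic; only the option names come from the paragraph above.

```bash
#!/bin/bash
# Hypothetical stubs for the real work.
expand_miniconfig() { echo "expand $1"; }
shrink_config()     { echo "shrink"; }

configure_main() {
  case ${1-} in
    -s|--shrink)
      shrink_config
      ;;
    '-?'|--help)          # quoted: an unquoted ? is a glob wildcard here
      echo "usage: ./configure [FILE] | -s|--shrink | -?|--help"
      ;;
    '')
      if [ ! -f mini.conf ]; then
        echo "configure: no mini.conf here; give a file name or see --help" >&2
        return 1
      fi
      expand_miniconfig mini.conf
      ;;
    *)
      [ -f "$1" ] || { echo "configure: $1 not found" >&2; return 1; }
      expand_miniconfig "$1"
      ;;
  esac
}
```

The actual script would end with configure_main "$@".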

I figured out why User Mode Linux has punished my laptop so much: it assumes that /tmp is mounted tmpfs, and this is not the case in Ubuntu "Horny Hedgehog". So all the pages that UML dirties get scheduled for writeout, and thus when it's doing the build the disk light is constantly on (and anything else that wants to talk to the disk just has to wait, especially if like vi or kmail it does an fsync at irregular intervals to save the file you're typing on and blocks in the middle of typing for ten or more seconds at a time).

On ubuntu there is a tmpfs mount, it's /dev/shm. And setting the environment variable TMPDIR=/dev/shm does indeed make life much nicer while running UML. I added a check to runuml.sh to see if /dev/shm exists and if so set TMPDIR to point to it.
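The check described above only tests whether /dev/shm exists. A slightly stricter sketch also asks the mount table whether it's actually tmpfs, since that's the property UML cares about. is_tmpfs is a made-up name, and the TMPDIR logic mirrors the runuml.sh change described above but isn't its actual code.

```bash
#!/bin/bash
# Is DIR a tmpfs mount? Reads a mount table in /proc/mounts format
# (device, mount point, type, options, ...). If several entries share the
# mount point, the last one wins, since a later mount covers earlier ones.
is_tmpfs() {
  local dir=$1 mounts=${2:-/proc/mounts}
  awk -v dir="$dir" '$2 == dir { type = $3 }
                     END { exit (type != "tmpfs") }' "$mounts" 2>/dev/null
}

# Only redirect UML's temp files if /dev/shm really is a writeable tmpfs
# and the user hasn't already chosen a TMPDIR.
if [ -z "${TMPDIR:-}" ] && is_tmpfs /dev/shm && [ -w /dev/shm ]; then
  export TMPDIR=/dev/shm
fi
```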

The reason this wasn't noticed earlier is that 2.4 had an optimization to not schedule writeouts of dirty pages for files that have been deleted, but only do so to free up pages under memory pressure. UML creates a temp file, mmaps it, and then deletes it, so on 2.4 it doesn't matter if you're using tmpfs or not. But on 2.6, this optimization got yanked. (And I can sort of see why since "has been deleted" is somewhat nebulous in a world with multiple hard links to files and where open filehandles also count as a link to the file, and real deletion of the file only happens when there are no more references. This optimization needed to distinguish between directory references and open filehandles, for a relatively obscure case that was basically used to simulate shared memory before SysV shared memory was invented.)

I also brought up the possibility of using SysV shared memory to the UML guys, and they collectively went "ick". Can't say I blame them.

A quick check of the systems I can currently remember having access to (the x86-64 PLD box, a gentoo box, a Fedora Core 4 box, whatever sourceforge is running, and my laptop) shows that _none_ of them have tmpfs on /tmp, and all but the PLD box have it on /dev/shm, world writeable with the sticky bit set. So /dev/shm sounds like a much better default for UML than /tmp, actually...

Nov 23, 2005

User Mode Linux is working on x86 again, thanks to a patch from Jeff Dike. Ok, time to reassemble this mess...

So of course the immediate question is, "Why isn't most of the contents of init/Kconfig showing up for ARCH=um"? (It's a totally unrelated question, but it's the tangent I'm going down at present...) Ah, it is. Just somewhere other than I was expecting it. Ok...

And arch/um/Kconfig needs a minor patch to make a CIFS dependency shut up. (Make miniconfig exits with an error if there was a config error, and my script exits on errors...)

Unified the stage 2.3 build to create both the -uml and the -boot firmware from the same script, in a for loop. Finding the resulting image needs a case-dependent if statement (uml doesn't produce a bzImage), but it should make keeping them in sync noticeably easier.
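A sketch of that loop shape: one pass per variant, with a case statement to locate the resulting image. A UML build produces a "linux" executable at the top of the kernel tree instead of a bzImage; the bzImage path assumes a 2.6-era i386 kernel tree, and kernel_image and build_variants are hypothetical names.

```bash
#!/bin/bash
# Where does each variant's kernel end up? KDIR is the kernel source tree.
kernel_image() {
  local variant=$1 kdir=$2
  case $variant in
    uml)  echo "$kdir/linux" ;;                    # UML: an executable
    boot) echo "$kdir/arch/i386/boot/bzImage" ;;   # bootable x86 kernel
    *)    echo "unknown variant: $variant" >&2; return 1 ;;
  esac
}

build_variants() {
  local kdir=$1 variant
  for variant in uml boot; do
    # (configure and build the kernel for this variant here)
    echo "firmware-$variant: $(kernel_image "$variant" "$kdir")"
  done
}
```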

Nov 20, 2005

My fiancée, Fade, has consented to have Linux installed on her laptop. (The Windows that was on it got taken out by viruses and spyware and she had to back up and reinstall anyway, and she was annoyed enough at the spyware to give Linux a try.) I installed Ubuntu on the theory that they're at least trying to attract desktop users, which Red Hat and SuSE stopped doing a while ago.

The verdict so far is intense hate. (She whispers over my shoulder "Intense, very intense hate.") I'm having her make a list of everything that annoys her in a text file, to email to me at some point in the future.

The verdict so far seems to be that Ubuntu is unsalvageable. Part of it is Gnome, which I don't use. I use KDE, but Kubuntu doesn't seem to have nearly as many packages available in its installer as Ubuntu does. (For example, where's the GUI tool for Kubuntu to list wifi access points and select which one to associate with? The gnome side has one. Ok, it _sucks_, but it's there...)

Might try Slackware next. The upside of the laptop being newly formatted and reinstalled is it's easy to wipe it and try again. This may happen more than once...

Nov 19, 2005

If I'd known to do "man sfdisk" I could have saved some time yesterday. Why that man page didn't come up on my google search is an open question (other than man pages being only slightly less obsolete a data format than info pages, which are only slightly _more_ obsolete than gopher).

So I like the new mini.conf feature of 2.6.15-rc1, and have already switched the build scripts over to use it. But I can't get the sucker to build a working UML. I've used this as an opportunity to bang on the unwrapped build, but that's working now and it's getting to the end where it needs to build the firmware-uml image. Meaning it needs to build a working UML.

Hmmm. I banged on x86-64 for several days and got that working without realising that straight -rc1 doesn't work on my x86 laptop. (It compiles, and it boots up to a PID 1 shell prompt, and you can even type something at the prompt. But as soon as you hit enter and it tries to fork: segfault.) About half the changes to fix x86-64 were to the assembly syscall interface (stub_segv_handler and stub_clone_handler), and as far as I can tell that's also the problem area on x86. So in theory another round of the same kind of debugging effort might fix it.

However, there are now enough patches in flight (in Jeff's custody or funneling through the -mm tree straight from Blaisorblade) that trying to put together the current cutting edge -UML tree involves a significant amount of guesswork and scraping stuff off two different mailing lists. I can just designate Jeff's tarball of patches against 2.6.15-rc1 as the thing to test and try to get that working, except that's a mess at the moment too (to get it to compile at all you have to "find . -size 0b | xargs rm", and _then_ the build breaks in the linker stage). It's a nicely broken out series and I could iterate through and see exactly which patches are breaking the build (and I should). But I'm _hoping_ that enough of the pending patches land in 2.6.15-rc2 for it to go back to working out of the box. I should wait for -rc2 to clear the air, and then debug whatever's left...

So for the moment I'm commenting out the firmware image build and focusing on getting the bootable firmware working under qemu. When 2.6.15-rc2 ships, I'll see if UML has started working again.

(Time passes...)

So what did I wind up doing instead? Implementing make miniconfig in the linux configurator, of course...

Nov 18, 2005

It turns out that the x86 build was never quite happy anyway under 2.6.15-rc1. Lots more debugging of Jeff's patches on top of -rc1. Possibly a good fix for the /lib64 problem on x86.

Meanwhile, I now have Stage 1 building "unwrapped". When you run the build as root, it doesn't invoke UML but just builds directly. This is much faster, and easier to debug. It's also sort of dangerous (if something's wrong with the script it can eat the host system), and it requires that you be running a recent 2.6 kernel.

(And yes, I have had systems that require building as root zap my host system. A year or two back I tried "make uninstall" in buildroot and it deleted gzip off my host system. Installing uclibc binaries over a glibc system's binaries is another fun way of winding up with it-don't-boot-no-more-itis. There is a REASON I came up with the wrapped build.)

As soon as I get the unwrapped build working I need to do the bootable firmware file and corresponding installer, neither of which is brain surgery. (In fact with qemu, making a boot CD becomes pretty easy. Don't have to burn it to test it, just "qemu -cdrom thingy.iso -hda thingy.img".) That's marvelous and I'm looking forward to using it.

I'm currently in the "making slow progress on 11 different fronts" stage, which means it's hard to see what I'm doing but a lot of stuff should get done relatively close together. (It always works that way: as you get stuff in the cluster done there's more effort to devote to the remaining tasks, so they accelerate...)

Not far from 0.9...

Stage 2 is now building unwrapped. (And I put in sheer paranoia umount calls because the readonly option doesn't seem to be honored with mount --bind yet. Yes, I deleted my source code already, thanks. Had to redo a couple days' work.)

The uClibc guys might appreciate a copy of /tools, since it's a really easy and fairly nonintrusive way to build uClibc stuff. I should write up a thingy for the uClibc list when I get 0.9 released.

(Time passes...)

So I need to feed qemu a _partitioned_ disk image. Luckily, qemu always seems to guess that all IDE drives have 16 heads and 63 sectors (the maximum values for those two fields, respectively, and what most modern large disks return when asked for a C/H/S value.) So each "cylinder" is 16*63*512 bytes long, or 516096 bytes. So it's about 2 cylinders per megabyte, or 2048 cylinders for a gigabyte. Easy enough, and then I have the C/H/S values to feed to both fdisk and to lilo.

But then I want to put data on these suckers from the host system. Hmmm. Figuring out where partitions start so you can "losetup -o $offset" them isn't very well documented. Since qemu is guessing all hard drives have 16 heads and 63 sectors I guessed that the second and later partitions start at byte offset (starting_cylinder*16*63*512), which was _almost_ right. (It turns out it's starting_cylinder-1, which makes sense. It's how much space you have to _skip_ to find the start of this partition, and fdisk starts counting cylinders at 1, not 0. Ok.)

But it seemed highly unlikely that the first partition would start right at the start of the disk because there's the master boot record there and space for bootloaders. How much space? Well, I remember that there was a bug in dos (way back in the dark ages) where dos couldn't use a partial something or other, and had to round up to the start of the next one, so a certain number of sectors after the master boot record were wasted and _that_ is what people started shoehorning bootloaders into. (I remember this from the OS/2 days.) But the details are fuzzy and google is not being helpful.

And so:

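#!/bin/sh

# Slide a loop device across hda.img one 512-byte sector at a time until
# something mounts. Needs root and an existing "sub" mount point.
# Uses POSIX $((...)) rather than bash-only $[...], and gives up after
# 2048 sectors (1 MB) instead of looping forever.
x=1
while [ $x -lt 2048 ]
do
  echo $x
  losetup -o $((512*x)) /dev/loop1 hda.img || exit 1
  if mount /dev/loop1 sub 2>/dev/null; then exit 0; fi
  losetup -d /dev/loop1
  x=$((x+1))
done
echo "no mountable filesystem found" >&2
exit 1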

And the answer of how many sectors to skip is 63! Which is the "sectors" number of C/H/S, and matches my vague recollection that dos couldn't use a partial track and was thus rounding up to the start of the next one at the start of the disk. (Why there isn't a web page out there explaining this already, I have no idea. I should probably write one.)

I think that's all I need. I know losetup doesn't have a "length" option, but mke2fs has a "blocks-count" and mkswap has a "size", and in theory that should cover it...

Nov 17, 2005

The unwrapped build is progressing, and the new mini.config feature of 2.6.15 is quite cool. Just wrote up a howto and made a couple suggestions for improvements...

Nov 16, 2005

The patch to make UML build on x86-64 breaks the x86 build.

Sigh.

The reason I've felt under the weather for the past few days has been tentatively traced to an allergic reaction. Of course my sleep schedule is all screwed up now, but what else is new?

I'm currently booting Knoppix under qemu. This is highly cool. (Also highly weird. But cool!) In some ways it's extremely slow. (It's using my 2 GHz laptop to fake a 600 MHz Pentium II, and I suspect even that speed is a touch on the optimistic side.) But things like the bouncing progress indicator next to the KDE mouse cursor are moving at full speed (based on a timer, I guess).

Due to the "loopback doesn't understand partitions" issue, I have to make an installer of some kind in order to actually install firmware into a hard drive image in such a way that qemu can boot the sucker. For the moment, I'm looking at knoppix as an easy way to avoid making an installer.

Wow, it grabbed my mouse when I clicked on the Knoppix desktop, and was using it, and here I am thinking "ok, how do I get it back", and then I notice the title bar says "press ctrl-alt to exit grab". Now that is cool. The version of Frozen Bubble in Knoppix 4.0 is surprisingly playable under qemu. (30 levels later) Ok, stop now.

So, running UML under an emulated knoppix running in qemu... Would the build complete today? I must do this thing. (I must also implement the autodetection for the "unwrapped" build.)

The interesting question is: how do I get data _into_ the knoppix running under qemu? There are several painful and ugly ways of doing this: loopback mount the partition out of the image (calculating the losetup offset -- remind me to add some kind of "mount -o offset=12345" thing to busybox mount), set up the qemu network device and scp it in, try to attach an unpartitioned hdb image to qemu... The hdb option sounds least painful on an ongoing basis.

Ok, trying that... You know, hdparm -t says that the emulated /dev/hda is managing just under 12 megabytes per second. That's pretty impressive. (Have I mentioned Fabrice Bellard is cool? That's something I like about the open source community, it has a number of people in it smarter than I am. I spent _years_ being far and away the best programmer I knew. That ended when I got involved with Linux. Thank goodness.)

There are many times I've been proud of this community, but right now, testing a hardware boot on entirely simulated hardware... It's a beautiful thing. It is shiny and happy and life is good. (Ok, it's dog slow, but I don't care. It does not, in point of fact, exist, and it's _running_. Wheeeeee!)

I even like the way that when the emulated Knoppix has nothing to do, qemu drops from 100% CPU usage down to about 6%. (Because Linux uses the HLT instruction rather than spinning in the idle task, and it's done this from day 1 long before power consumption or virtualization were issues simply because Linus thought it was the Right Thing to do.)

It is currently running UML in Knoppix under QEMU. Oh wow is it slow. UML has allocated 64 megs of ram and 256 megs of swap space. QEMU has 128 megs of ram and 256 megs of (different) swap space. The parent system has 1024 megs of ram and 750 megs of swap space. And they're fighting it out.

Nov 15, 2005

Ok, an x86-64 version of User Mode Linux is finally building after four or five separate fixes (thanks Jeff), and I've got a version of qemu to build with graphics (I broke down and installed the SDL headers on my ubuntu laptop) and thus boot a kernel for me. Did you know that if you boot a linux kernel under qemu against an empty image file the kernel will tell you what C/H/S to use in its boot messages, so you can then run lilo against it and partition the sucker properly? That was easier than I expected...

I'd like to be able to run UML against a partitioned disk image too. I might be able to figure out the offset by hand (starting cylinder * (heads+sectors) * 512?). Running mke2fs from there I'd also have to specify how long it is, and although losetup has a --offset it doesn't seem to have a --length. I keep running into this problem. :)

Still working on the writeable system. The "unwrapped" build and the "boot against this ext2 image rather than squashfs" are really different problems, but I need both for various reasons...

Nov 14, 2005

So busybox "tar xCf directory file.tar" is broken, but only under uClibc. Under glibc it works fine, but under uClibc the argument that C gets is "f", not "directory". I posted a note to the uClibc list, but the response was strange demands relating to their testsuite, which is of course undocumented, highly user-unfriendly, and doesn't seem to be testing much at the moment.

So I need to fix uClibc myself, but I've spent today fighting with various other todo items. Getting User Mode Linux working on x86-64 has eaten a surprising amount of time considering how little progress has been made, but the bug can't hide forever, and I don't know how long I'll have access to the x86-64 machine to test on so it takes precedence.

I have a couple other issues (such as the "less" segfault) that only seem to reproduce under uclibc, so I suspect I'll be patching this a lot. And no, I'm not patching their darn testsuite. I'm patching the busybox test suite, which is where I reproduce these things.

I checked in the "modprobe multiple arguments" patch verbatim, because even though there are changes to it I want to make, they're all related to post-1.1 issues. (For one thing, this parsing code is related to the needs of bbsh, and then there's the process_escape_sequence() stuff that this and busybox sed and bbsh should all be using...) And of course one of the blocking issues for 1.1 is the long file support in wget and ftpgetput, and this brings up the question of why the heck aren't they sharing any code? Which is another post-1.1 issue. I need to thump on the bug list some more and try to close out more issues...

In other news, I shoehorned oneit into the final build, and it broke. I'm tracking that down too, but that and uclibc both really require a final system with a writeable root partition, and that's related to the "unwrapped build" issue. So I'm working on _that_ so I can work on the other things.

Someday, I need to acquire minions so I can delegate tasks to them, in hopes of removing items from my todo list faster than I add them. I didn't say it would _work_, just that it would be nice to try...

Nov 11, 2005

The tar segfault was in the "v" display code: localtime(&struct->mtime) segfaulted and localtime(&(struct->mtime)) didn't. Yet the compiler didn't complain...? Might be a gcc 3 issue. Dunno.

Used the x86-64 machine to reproduce the UML failure and got the new and improved debug trace to Jeff Dike. Hopefully something good will come out of that.

I need to make a tar test suite. I need to fix mount. My todo list is getting kind of huge.

I'd be making _much_ faster progress on all this if I hadn't started playing World of Warcraft again...

Nov 10, 2005

Got a login to an x86-64 machine (woo!) from Tomasz Mateja. It doesn't have zlib installed, which means mksquashfs won't build, which is one of the first things the build does (stage 0.1). I encountered this before (ubuntu doesn't seem to install it by default either, or at least not the development headers), and since I'm building zlib in stage 2 anyway, it would be good to eliminate this dependency on the host environment. So last night I threw a quick zlib build into tmpfs to statically link mksquashfs against. (The result is over 100k, but it's just a temporary build tool so I don't really care.) I'll try it on the x86-64 machine (which is apparently in Poland) once I've got it building again on my laptop.

I also got encouraging email from Niklas Brunback (who has dots in his name). I should get the new server installed and get a mailing list started.

On the busybox front, I'm arguing with Vladimir again. He doesn't speak English (uses a Russian-English translator program), his ISP spam-blocks my email so we can't speak off-list, I often disagree with his technical judgement, and due to the translation issue he's not good at explaining himself and probably finds it as frustrating as we do reading it. Add in the fact that he sometimes takes disagreement with his technical judgement as a personal attack, and it gets interesting. Oh yeah, and he's one of the top five most prolific busybox developers, so I'm trying very hard _not_ to alienate him. Yeah, I get all the easy problems. (Since this is the busybox list I could theoretically just call for a Deus Ex Erik on the technical issue, but on the political side of things I'm trying not to drag him into this.)

I got the sed tests converted over to the new format (except for one big one that tests the a, i, and c commands, which I should probably break into smaller commands). Next up, going through the spec and testing the 80% of the commands we have no test for yet. Plus going through my implementation notes and testing funky corner cases and gnu extensions that the spec doesn't mention. By no means done there.

Upgraded the Firmware build to last night's snapshot just to see if anything broke and it turns out tar is broken. (Extracting files it aborts on the first file.) Head scratching. Ok, back in my main busybox thwacking directory, make allyesconfig; make; ./busybox tar tvjf testfile.tgz... And it segfaults. Beautiful.

So, I think, here's an opportunity to try out this new "ups" debugger I stumbled across recently. (A source level debugger with a C interpreter in it so you can stick in printfs on the fly! ups.sourceforge.net) Of course their downloads are via SourceForge's pathological mirror system, but I can also get it through ibiblio. Cool. Grab the latest presumably stable release (3.36, is it?), do a ./configure; make... Breaks because it can't find the X11.h files, thanks ubuntu. Ok, install all that crud (and SDL so I can build qemu with video), try again... Build breaks. Think for a bit... Run make clean and ./configure again, try one more time... Now it breaks trying to use logf(), and man logf says it's a math function when they're obviously trying to use it to write log output. This is the only call to it in the program and it looks unnecessary, comment it out... Try again, and now ao_elfcore.c has a duplicate case statement? Rethink earlier impression that this was a stable version...

Nov 7, 2005

David Lang has been kind enough to test my build under x86-64. It doesn't work. Most of the problems are UML related, and the UML list is collectively scratching its head about them as we speak. On a related note, I've been offered a login to an x86-64 system, which would be cool. Making everything work on other platforms is why I've got qemu on my to-do list.

I've started on the grand unified busybox shell, which was going to be called "bush" until I remembered the current occupant of the White House and changed it to "bbsh". We need one codebase that, via config options, scales up from at least as small as lash all the way to a full bash replacement. What we have right now are four shells that don't share any code (lash, hush, msh, and ash), none of which is a reasonable bash replacement. Right.

The first requirement of a new shell is that it be able to replace the smallest shell, lash. (This is why hush failed: it couldn't CONFIG down small enough to obsolete the smallest shell, and this is busybox we're talking about: size matters.) So when I say "started on bbsh" what I mean is I made a "hello world" applet called bbsh and then spent a day and a half reading through lash.c. I've now read most of it, and my brain hurts. (Ok, I sort of knew about terminal types and process groups from my long-lost init.c rewrite, but I didn't think the shell needed to mess with that so much.) Command line parsing is _ugly_ (apparently glob() was designed by drunken weasels or something; I need to read through all that again). I've found several bugs in bash already just from trying out "hang on, how would _this_ corner case work?" For example, echo `sleep 10` and then hit ctrl-z. Congratulations, your shell is hung. (Ctrl-c gets you back the first time. Now do it again.) My shell not only needs to do all of this right, but somehow I need to come up with automated test cases for this stuff.

Speaking of which, last night I beat the busybox testsuite infrastructure back into shape. Since I wandered away from it, new features had been added without taking into account my original design goals, so I had to shuffle lots of stuff around to make everything work again. This clears the way for The Great Project, I.E. doing a comprehensive sed test case. I've meant to do this for a while, but it's kinda huge. The whole SUSv3 compliance audit of busybox (and associated upgrading) shouldn't just be "oh, we looked at it and yep, it's standards compliant". It should be an automated regression test that PROVES it's compliant.

Meanwhile, I need to make an installer for Firmware Linux so I can release 0.9. :)

Nov 5, 2005

Hmmm... Still playing with ifenslave. (I suppose it makes sense to bond together two gigabit interfaces for a particularly active server, since a single 10gig interface seems to be running around $1600 right now, and who knows how much the switches cost, and the ones I found don't run over standard cat5.) Finally got it to work between two UML instances, but only after this fun little hiccup. Trying this:

./linux LD_ASSUME_KERNEL=2.4.1 eth0=daemon,33:33:33:33:33:33 \
eth1=daemon,44:44:44:44:44:44 rootfstype=hostfs rw init=/bin/sh

Sets the mac address on eth1, but not on eth0. Why? I traced it to arch/um/drivers/net_kern.c:

    if(addr[0] & 1){
        printk(KERN_ERR
            "Attempt to assign a broadcast ethernet address to a "
            "device disallowed\n");
        return(0);
    }

Is that a valid test? I thought the broadcast address was FF:FF:FF:FF:FF:FF, is there more than one? (I'm not online to look up the relevant standard right now.) Easy enough to work around, of course (use 00:33:33:33:33:33), but it's just one more little piece of trivia you have to learn along the way to getting this stuff to work...

There are of course five or six other things I should be banging on right now, and I'll probably switch to one of them soonish...

Nov 4, 2005

New release almost ready. I could have released it today. (I emailed a copy to David Lang, who is trying to get things to build under x86-64, which is highly cool.) But I want to get the build actually using init (or at least oneit) before releasing 0.8.10.

So what did I spend today doing? Banging on other things, including busybox and UML. Since I finally managed to get a busybox 1.1-pre1 release shoved out, I've been going through the bug list and trying to close them. Today's bug is the request for ifenslave, which is sort of reasonable but the patch they provided was only lightly busyboxed, and I want to do a lot of cleanup on it.

The problem with banging on ifenslave is I don't have a test environment to see if I broke it. The last time I thumped on this sucker was several employers ago, and I had rackmount servers with two ethernet interfaces apiece, and crossover cables between them. My laptop only has one ethernet interface. (And one wireless interface, but they'd be hard to bond. :)

So, fire up UML and try to bond two of its interfaces together, talking to another UML instance. Easy? Nope. Ubuntu doesn't have TUN/TAP, nor does it have SLIP. And there's no tarball of the source code to the daemon. There are instructions on how to find a CVS server, but I installed subversion instead.

Eventually, I downloaded the source files out of the CVS web viewer (one by one, with my browser) and got it to build. Relatively simple to get the UML network connected after that, except that now that I fire up the bonding thing I remember how much of a niche it is, and am wondering if we really want it? (I tested this out back at BoxxTech, and we ended up not going with it because buying twice as many switches cost as much as upgrading everything to gigabit ethernet anyway.) And it's not like it's hard to "gcc linux-2.6.*/Documentation/ifenslave.c -Os -s -o ifenslave", and the resulting binary is about 14k... So I asked if we care on the busybox list. Let somebody else make the call...

I might be moving to Pittsburgh in six months. Odd thought, that.

Oct 30, 2005

Wheee. While tracking down a bug in busybox mount (preventing the tmpfs mount for /dev from working reliably and screwing up the error message as well), I found a bug in busybox sed (preventing make allbareconfig from working).

This is why the "dogfood" option in the busybox TODO is so important. When you develop a system under itself, you find lots of bugs...

(Time passes...)

I don't know how the people in the closed source operating systems do it. I'd be totally lost if I couldn't stick printfs into the user mode linux kernel right now. (Of course the possibility that the bug is _in_ the UML kernel is also pretty strong just now. Weeeeeird behavior... I want a printf(stack_dump());)

(Time passes...)

Ok, odd. CONFIG_TMPFS wasn't enabled (it was at one point, guess I mis-merged script versions). Turns out tmpfs is still _in_ the kernel when it's not enabled, it's just not mountable from userspace. (I sort of knew this, but hadn't seen it manifest before and didn't match up the symptoms.) Besides, I thought the darn thing was enabled...

Right. (And yes, most debugging is like this. You only stick printfs into somebody else's source code to find bugs in the other code maybe 1/5 of the time. The rest of the time, you're sticking printfs into other people's source code to see how exactly it expects to be configured/used.)

Oct 29, 2005

Huh. Apparently, 2.6.14 isn't putting ubda in /sys/block. I'm not sure if this is because the command line argument changed (although the ./linux --help description didn't), or because something broke, or because the CONFIG option changed name, or...?

Always something to track down when you upgrade, and the kernel's a component that touches everything. (Running the build under UML helps find kernel breakage during the build, rather than after. Admittedly in this case it's a UML idiosyncrasy I'm tracking down, but oh well.)

Not quite sure how the -c argument in switch_root is supposed to work. If /dev is maintained by udev, then we haven't got /dev/console until udev runs. What, is udev supposed to run from initramfs? Or should we synthesize a block device

Oct 28, 2005

Downloaded linux-2.6.14, and as long as I'm upgrading that I thought I might as well make the build less version-dependent. Did the 1.1 script, anyway. Section 2 I'll worry about in the morning. The symlink names in the non-package directories (system and builtools and such) no longer have version numbers in them, and the build itself does a lot of things like "cd linux*". Since it's extracting these things into an otherwise empty temp directory (which at least for script 1.1 I just renamed from /tools/sources to /tools/tmp) this should be fairly safe.

Oct 27, 2005

What have I been doing the past few days...

Renamed outnit to oneit since the point of the thing is a wrapper to run just one executable. Yeah, init could do this if I wanted to feed it a config file, but it's swatting a fly with a sledgehammer.

Added switch_root to busybox. Need to resubmit the initramfs documentation I did for linux-kernel with the typo corrections that were pointed out and the new info that busybox has switch_root. First I need to test my switch_root implementation to make sure it actually works, of course.

Splitting the linux-kernel build to have separate source and object directories (make ARCH=um menuconfig O=/tmp/linux-build), which means that I can patch the source once (squashfs) and then build both the UML and bootable versions cleanly. (Woot.) Of course right now I build UML in stage 0 and again in stage 2.3 (the first has no initramfs, the second has an initramfs containing a busybox built against uClibc), and in between it builds both tools.sqf and build.sqf. That's why I'm deleting the kernel tarball and re-extracting it, although disk space in the build environment really isn't that much of an issue these days. Still...

I'm also thinking of adding an "unwrapped build" option, possibly "./build.sh --crazy" that, when run as root, builds the system without going through UML. It's noticeably faster, takes less memory, is easier to debug if something goes wrong, but has to run as root under a recent Linux kernel. Running a big script under root (which internally does rm -rf several times) should justifiably make people nervous.

What else? I figured out how to get /dev populated from /sys with a half dozen lines of shell script. The permissions are all wrong, but other than that it holds off the need for udev for a bit longer. The shell script that runs in initramfs currently just has an "echo fixme" for the "could not mount hostfs" case (meaning we're running in the real root, not UML), and I'm adding the ability to parse a kernel command line option a la: "FIRMWARE=hda2:/path/to/firmware.img". No, this is not easy to autodetect. (I might be able to squeeze the boot device out of the bootloader, but the path to the file? Lilo doesn't even know, it stores sector arrays.) Adding an argument to the bootloader is the easy way. Anyway, this means that initramfs has to have /dev, but you just have to mount sys and do a find "/sys/block/$DEVICE/dev", extract the major and minor out of that, mknod, and life is good...

Of course I'm trying to convert the build to use my trivial dynamic /dev script (and ditch the MAKEDEV script: Everything it does is under root anyway, so permissions 700 are fine), but trying to mount a dynamic /dev uncovered a bug in busybox mount. Sigh. For security reasons (certainly in the final system, so I might as well have the build work that way to make sure this configuration can withstand some stress), I want /dev to be a tmpfs with no free space except dentries. Which is simple enough: "mount -t tmpfs -o size=0k,nr_inodes=8k,mode=700 /dev /dev". Works fine with util-linux mount. Unfortunately, that not only fails to work with busybox mount, but gives a corrupted error message. Beautiful. (And less is still segfaulting when built with gcc 4.0.2.) These are to-do items...

Speaking of udev, you'd think there would be some simple documentation for udev (download, run _this_ to populate /dev on bootup and point /proc/sys/kernel/hotplug at _this_), but no. In addition to assuming you already know all that (or will figure it out from the source code), you need to provide a big complicated config file to specify persistent naming and permissions. Pondering adding udev to busybox anyway. Looks like the config file parser is most of the work, possibly there's a simple subset that's good enough for embedded use...

And speaking of stuff that's hard to configure, I downloaded qemu again. In theory this means I can try my build under various other systems easily (I downloaded UML's Red Hat 7.2 image, which would be good to see how busybox compiles under anyway). Plus I can eventually try to build FWL for x86_64, PPC, ARM... And I can debug my firmware-bootable image without endless reboots and periodically lobotomizing my test system. Coolness. (Yeah, I have a knoppix CD. That's the backup plan. I'm not playing with the bootloader on my laptop, though, which means at the moment I can only test at home with the server physically next to me so I can power cycle it.)

Built qemu from source, which disabled graphics because ubuntu doesn't have SDL development stuff installed. (Well of _course_ not.) Then tried to run it without installing it, which it's very unhappy about. (Look, your darn bios files are RIGHT HERE. Check the source code... The -L option! Why doesn't the -? help mention this option? The one compiled from source lists a completely different set of options than the downloadable 386 binary, and the downloadable binary says that -L sets the elf interpreter prefix...?) But in the _source_ (vl.c) we have:

            case QEMU_OPTION_L:
                bios_dir = optarg;
                break;

...which has NOTHING TO DO WITH elf interpreters. What would that _mean_, anyway? Override the /lib/ld-linux.so.2 that the qemu binary is linked against? Isn't qemu a hardware emulator you feed a hard drive image so it can load the boot sector and start emulating processor instructions and hardware? It could be running Windows in there, which doesn't even use ELF (it uses PE instead, and it supplies its own darn interpreter, thanks). That's working at a completely different level than qemu...

What it _wants_ me to do is log in as root, cd /, and extract the tarball of precompiled binaries. Which is not my first choice for playing around with new software.

The other thing is it has no standard terminology for whether an image is a whole disk (with bootloader and bootsector and partitions and everything), or just a filesystem image. (I.E. are we talking /dev/hda or /dev/hda1?) I suspect the UML thing I have is just a filesystem image. Of course qemu has options to load a linux kernel and act _as_ a bootloader, but how...?
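One way to answer the /dev/hda-versus-/dev/hda1 question for an image file is to peek at magic numbers. Here's a hedged C sketch (the names are made up, and it's a heuristic only). An ext2/ext3 superblock starts 1024 bytes in and carries the little-endian magic 0xEF53 at byte 1080. A partitioned PC disk ends its first sector with 0x55 0xAA at bytes 510 and 511. FAT filesystems also carry the 0x55AA signature, so a bare FAT image would be misreported as a whole disk.

```c
#include <stddef.h>
#include <stdio.h>

enum image_kind { IMAGE_UNKNOWN, IMAGE_EXT2, IMAGE_DISK_MBR };

/* Classify an image from its first bytes. Heuristic only: ext2/ext3 keeps
 * its superblock at byte 1024 with magic 0xEF53 (little-endian) at byte
 * 1080; a partitioned PC disk ends sector 0 with 0x55 0xAA at 510/511.
 * The ext2 test runs first, since a bootloader in a filesystem's boot
 * block could also leave a 0x55AA signature behind. */
enum image_kind classify_image(const unsigned char *head, size_t len)
{
    if (len >= 1082 && head[1080] == 0x53 && head[1081] == 0xEF)
        return IMAGE_EXT2;
    if (len >= 512 && head[510] == 0x55 && head[511] == 0xAA)
        return IMAGE_DISK_MBR;
    return IMAGE_UNKNOWN;
}

enum image_kind classify_image_file(const char *path)
{
    unsigned char head[2048];
    size_t n;
    FILE *f = fopen(path, "rb");

    if (!f) return IMAGE_UNKNOWN;
    n = fread(head, 1, sizeof(head), f);
    fclose(f);
    return classify_image(head, n);
}
```

If the UML image is a bare ext2 filesystem, as I suspect, classify_image_file() on it should come back IMAGE_EXT2.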

Hopefully the answers are either here or here. Maybe I can feed it tomsrtbt and work my way up from there...

What else. Redoing the index.html for Firmware Linux, that should go up when 0.9 is ready (which will be when it's making a usable firmware-boot, including at least a cheesy installer). My current test machine should become the new landley.net server once I get firmware-boot installed on it. (Which means I'm poking a bit at servers, although that's relatively easy. It needs ssh, web, and dns to start with. Email comes later...)

Just puttering around, really. Need to put out busybox 1.0.2. Re-reading Andre Norton's old Witch World novels (yes, from the 1960s). Biking. Getting serious about job hunting even though my cell phone is still in the mail. (Mailed five days ago from California, where it was extracted from the luggage of Fade's parents when they got home. Unfortunately, via the US postal service, not something competent, so I may never see it again. Did I mention the usps.com parcel tracking page is a joke? It says the post office was notified the package exists, and the page will be updated when it has been delivered. That's it. I have no _idea_ where my cell phone is.)

Oct 24, 2005

So a longstanding problem with funneling the build through UML is ctrl-c doesn't do anything. This is because UML is running the build on /dev/console, which has no controlling TTY, and running it as PID 1, which has the kill signal blocked anyway.

So I've made a dumb little Advanced Init Substitute I'm calling outnit, which basically forks and runs its arguments as a process attached to /dev/tty0, then does the whole zombie reaping thing (and calls reboot() if the process it forked exits.) The upshot of all this is, ctrl-C works.
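Here's roughly what that looks like in C. This is a hedged sketch of the behavior described, not the actual outnit source, and the function names are invented. The child calls setsid() so it's allowed to acquire a controlling terminal, opens the tty and claims it with the TIOCSCTTY ioctl, and execs the command. The parent sits in wait() reaping zombies until that particular child exits. ctrl-C then works because the terminal delivers SIGINT to the child's process group, which is in the foreground on that tty.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/reboot.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that starts a new session, takes `tty` as its controlling
 * terminal and stdin/stdout/stderr, then execs argv. Returns the child's
 * pid, or -1 if fork() failed. */
pid_t spawn_on_tty(const char *tty, char *const argv[])
{
    pid_t pid = fork();
    int fd;

    if (pid != 0) return pid;          /* parent (or -1 on failure) */

    setsid();                          /* new session, no controlling tty yet */
    fd = open(tty, O_RDWR);
    if (fd >= 0) {
        ioctl(fd, TIOCSCTTY, 0);       /* claim it; fails harmlessly on a non-tty */
        dup2(fd, 0);
        dup2(fd, 1);
        dup2(fd, 2);
        if (fd > 2) close(fd);
    }
    execvp(argv[0], argv);
    _exit(127);                        /* exec failed */
}

/* Reap every child that exits (PID 1 inherits orphans, so there may be
 * many) until `target` exits; return its wait status, or -1 if there
 * are no children left to wait for. */
int reap_until(pid_t target)
{
    for (;;) {
        int status;
        pid_t got = wait(&status);

        if (got < 0) {
            if (errno == EINTR) continue;
            return -1;                 /* no children left */
        }
        if (got == target) return status;
    }
}

/* The whole outnit idea: run the command on tty0, reap zombies, and
 * reboot when it exits. Only meaningful when running as PID 1. */
int outnit(char *const argv[])
{
    reap_until(spawn_on_tty("/dev/tty0", argv));
    sync();
    return reboot(RB_AUTOBOOT);
}
```

Passing "/dev/null" instead of "/dev/tty0" is a handy way to exercise spawn_on_tty() and reap_until() outside UML, since the TIOCSCTTY call just fails harmlessly. Only outnit() itself is PID 1 material, since it ends in reboot().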

Unfortunately, feeding arguments to init through UML turns out to be non-trivial. In theory, anything unrecognized on the kernel command line gets passed through as arguments to init. In practice?

UML running in SKAS0 mode
Checking PROT_EXEC mmap in /tmp...OK
Unknown boot option `/home/landley/newbuild/firmware-build/sources/scripts/1.0-tools-umlsetup.sh': ignoring
System halted.

Beautiful. Turns out if you have a period in the argument, it gets rejected. That is a broken heuristic, and I've submitted a patch to linux-kernel.
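The check behind that message lives in init/main.c (unknown_bootoption(), if I'm reading the 2.6 source right). Here it is paraphrased as a standalone C function, not the kernel's actual code: an argument nothing else claimed gets dropped if its name part, the text before any '=', contains a period, on the theory that it's a misspelled module.parameter.

```c
#include <stdbool.h>
#include <string.h>

/* Paraphrase of the kernel's boot-option heuristic (not its actual code):
 * an argument no handler claimed is dropped if its name part, the text
 * before any '=', contains a '.', because it looks like a misspelled
 * module.parameter. Otherwise it would be handed to init (as an argument,
 * or as an environment variable if it contains '='). */
bool rejected_as_module_param(const char *arg)
{
    const char *eq = strchr(arg, '=');
    size_t name_len = eq ? (size_t)(eq - arg) : strlen(arg);

    return memchr(arg, '.', name_len) != NULL;
}
```

So init=/path/with.dots sails through (the period is after the '='), but a bare path argument with a period anywhere in it gets eaten.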

Later that day...

So I'm ready for a release, but the webserver on Eric's machine is still down. Sigh. (Email's back, web isn't.) I need to get my own server installed and up and running anyway. I should work on that today. A real-world use case for Firmware Linux...

Let's see, ssh, web, dns, and email. Dropbear, busybox's built-in httpd and postfix seem like obvious choices, but what to use for dns? Not bind, not oak, not djbdns... Hmmm...

Killed httpd and re-ran it and now the website's working again. I have no _idea_ what was wrong. Right.

Updated the web page. I'm calling it 0.8.9 because it's still not building a bootable version, but it's _out_! (Making an installer is going to be interesting. I have to install lilo on the new system: should I make a bootable CD, or try to get UML to bind a ubd to hda and see if I can somehow determine the appropriate geometry...?)

In the meantime, I should get the final build running either init or outnit, so ctrl-c works. (Now trying to do ctrl-z in that is likely to do something stupid, and I have yet to think of an easy way around that...)

Oct 23, 2005

The full development tools build is 18 megabytes. OUCH. The previous full development tools build (based on gcc 3.3 and a correspondingly older binutils) was about 12 megabytes. I knew FSF code tends to bloat with time, but an extra 6 megabytes is just PAINFUL.

A little of that is because my build script didn't delete the lilo source code out of tmp before making the squashfs. But that's not even a megabyte of the compressed total. It installed megabytes of info pages (hello FSF: nobody, anywhere, uses info for anything, except you), and all sorts of internationalization crap that I forgot to tell it not to. So I can strip it down a bit from where it is, and plan to. Another chunk is /usr/include, but that's actually needed and is highly compressible text...

Still, 18 megabytes. Ow. The base system with no developer tools is 2.5 megabytes, so we're talking a good 15 megabytes for the toolchain. At least 10 megabytes of that has to be gcc and binutils. Compare that to tcc, which is about 100k...

Oct 22, 2005

Spent most of the day in San Antonio, doing the tourist thing with Fade's parents. Continue to be unimpressed by the Alamo. Email still out.

Now I'm fiddling with stage 1.1 (the /tools build). This is a pain and a half.

Ok, intro to toolchain building. Toolchain building sucks because it has three times as many dependencies as normal building, and two of those three are mirror images that it's really easy to confuse. (The compiler and linker create binaries linked to a set of shared libraries and a library loader. But the compiler and linker also _are_ binaries linked to a set of shared libraries and a library loader. Yes, a necessary cross-compiling step is creating a gcc that runs against glibc but creates binaries linked against uClibc. And this is glossing over the whole idea of where all the standard header files live.)

This would be easier if there was consistent terminology for it, but yesterday I had to give ldd dumps of the various stages and point. "That bit is what's currently wrong." You can keep it straight in your head if you can think visually, but how do you document or take notes?

The other fun thing is when you're building on a system that is different from the one you're going to be running the software on. I've growled about libmudflap before, which doesn't keep this straight. I recently found out you can just rm -rf libmudflap out of the gcc sources, and the gcc build seems ok with that. Yay.

In theory binutils and gcc have grown some new potentially useful config options, which I intend to play with in future. In theory, I can tell binutils configure --with-lib-path=/tools/lib, and then tell gcc configure --dynamic-linker=/tools/lib/ld-uClibc.so.0 --prefix=/tools --exec-prefix=/tools, possibly specify --oldincludedir (I don't know what that does yet)...

Of course it's not distinguishing the environment these tools will run in from the environment the binaries produced by these tools should run in. If I can just consistently get the second meaning I can control the first by making temporary tools and building a final set of tools with those temporary tools. (It's what I'm doing now, actually. Only instead of specifying stuff cleanly on the command line I'm performing surgery on the source code with sed and grep.)

Add in the fact that juggling around the order of things can easily break the _other_ things that got reordered, even if you don't think you changed them. (I broke the uClibc utilities build by doing a "make clean" after the library build and then trying to build the utilities. It cleaned the temporary headers out of the source's headers directory, and the utilities build needed those.)

Fiddly. And I miss having working email. Busybox less is segfaulting, and I haven't got time to track it down myself right now...

Oct 21, 2005

Busy couple of days. Build compiling != build working, of course.

I mentioned I need switch_root, because pivot_root doesn't work on ramfs in 2.6.13 (and it was a bug it ever did in the first place). And I have most of one coded up, to submit to busybox if my mail ever gets working again. But that's a side issue: to debug the build I can just cd/mount --move/chroot. Deleting the old stuff out of rootfs before it becomes inaccessible just saves space is all. (P.S. If you mount something on /, you can't ever umount it later, short of a reboot. Fun.)
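Here's the general shape of a switch_root, sketched in C. This isn't the one I'm submitting to busybox. The names are hypothetical and error handling is minimal. It must run as PID 1 out of rootfs with the real root already mounted at newroot. It chdirs into the new root, deletes rootfs's contents without crossing into other filesystems (comparing st_dev, so the new root itself survives), then mount --moves the new root over /, chroots into it, and execs the real init.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Recursively delete everything under `dir`, but never touch anything on
 * a different device than `rootdev`, so filesystems mounted below it
 * (like the new root) survive. Leaves `dir` itself in place. */
void delete_contents(const char *dir, dev_t rootdev)
{
    DIR *d = opendir(dir);
    struct dirent *e;

    if (!d) return;
    while ((e = readdir(d)) != NULL) {
        char path[PATH_MAX];
        struct stat st;

        if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, "..")) continue;
        snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
        if (lstat(path, &st) != 0 || st.st_dev != rootdev) continue;
        if (S_ISDIR(st.st_mode)) {
            delete_contents(path, rootdev);
            rmdir(path);
        } else {
            unlink(path);              /* files, symlinks (not followed), nodes */
        }
    }
    closedir(d);
}

/* The switch_root sequence. Only returns on failure. */
int switch_root_sketch(const char *newroot, char *const init_argv[])
{
    struct stat rootst, newst;

    if (chdir(newroot) || stat("/", &rootst) || stat(".", &newst)) return -1;
    if (rootst.st_dev == newst.st_dev) return -1;   /* newroot isn't a mount */
    delete_contents("/", rootst.st_dev);            /* free rootfs's memory */
    if (mount(".", "/", NULL, MS_MOVE, NULL)) return -1;
    if (chroot(".") || chdir("/")) return -1;
    execv(init_argv[0], init_argv);
    return -1;
}
```

The st_dev comparison is what keeps delete_contents() out of the newly mounted root and anything else mounted under / (like /proc), since those directories report a different device.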

Fiddled with the initramfs a bit, got things working again (took a while, not the world's greatest debugging environment). Got the build generating one that works, now with msh instead of ash even.

I also fiddled with making things more configurable, so you could _not_ build the development toolchain and stuff. Built a minimal system with just uClibc and busybox, but it _didn't_ _work_. None of the stuff would execute. Suspicion? Library paths were wrong. Huh, try running ldd. That wouldn't execute either. Try "mount --bind / /tools" and run it again, and yes ldd is linked against /tools.

So the linker paths in gcc 4.0.2 are _all_screwed_up_. Big surprise. If I don't build development tools, everything is linked against /tools/lib (which isn't there on the final system). Even if I _do_ build a new toolchain, anything built before the new gcc is on the system (including uClibc's helper utilities like ldd, and all the bin86 binaries like ld) is still linked against /tools. So building a toolchain to go in the firmware doesn't actually fix the problem, just makes it less obvious...

My little 3.3 trick tweaking the spec file at the end of the /tools build so the toolchain starts building stuff linked against /lib? Doesn't work in 4.0.2. That specfile no longer exists. I go find the new specfiles, and feed them the correct information. No effect, whatsoever. It turns out that the paths are hardwired into the binaries, and nothing short of rebuilding those binaries will change the paths.

Time out for our regularly scheduled swearing at any code ever touched by the FSF...

I want a /tools directory that internally depends on /tools/lib but _generates_ binaries dependent on /lib. This is not that esoteric a requirement. But to get this, I need to build one set of tools that link against /tools/lib and then build another set of tools that link against /lib.

Of course I'm already building both binutils and gcc twice during the /tools build. The existing reason for that is the first build works fine for building more binaries to live in /tools, it just doesn't run out of /tools. (It runs against the host system's libraries, probably glibc.) I.E. (pseudo code):

/tools/bin/gcc hello.c -o hello
ldd hello
	/tools/lib/ld-uClibc.so.0
	/tools/lib/libc.so.0 => /tools/lib/libuClibc-*.so
ldd /tools/bin/gcc
	/lib/ld-linux.so.2
	/lib/libc.so.6 => /lib/*/libc-*.so

I want a gcc binary whose libraries look like the hello world binary's. Hence the second build, built with the first.
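The loader path at the top of each ldd listing is one of those values baked in at link time: it's stored in the binary's PT_INTERP program header. Here's a hedged C sketch for pulling it out of a binary of the host's word size. The function name is invented, and it doesn't handle cross-class or cross-endian binaries.

```c
#define _GNU_SOURCE
#include <link.h>     /* ElfW() plus the Elf32/Elf64 types from <elf.h> */
#include <stdio.h>
#include <string.h>

/* Copy the dynamic loader path (PT_INTERP) out of an ELF binary of the
 * host's word size. Returns 0 on success, -1 if unreadable, a different
 * ELF class, or statically linked (no PT_INTERP). */
int elf_interpreter(const char *path, char *out, size_t outlen)
{
    ElfW(Ehdr) eh;
    ElfW(Phdr) ph;
    FILE *f = fopen(path, "rb");
    int ret = -1;

    if (!f) return -1;
    if (fread(&eh, sizeof(eh), 1, f) != 1 || memcmp(eh.e_ident, ELFMAG, SELFMAG)
        || eh.e_ident[EI_CLASS] != (sizeof(void *) == 8 ? ELFCLASS64 : ELFCLASS32))
        goto out;
    for (int i = 0; i < eh.e_phnum; i++) {
        if (fseek(f, (long)(eh.e_phoff + (ElfW(Off))i * eh.e_phentsize), SEEK_SET)
            || fread(&ph, sizeof(ph), 1, f) != 1)
            goto out;
        if (ph.p_type != PT_INTERP) continue;
        if (ph.p_filesz == 0 || ph.p_filesz > outlen
            || fseek(f, (long)ph.p_offset, SEEK_SET)
            || fread(out, 1, ph.p_filesz, f) != ph.p_filesz)
            goto out;
        out[ph.p_filesz - 1] = '\0';   /* PT_INTERP includes its NUL */
        ret = 0;
        break;
    }
out:
    fclose(f);
    return ret;
}
```

Pointed at /tools/bin/gcc and at a hello world that gcc produced, it would show the two loader paths this whole two-stage dance is trying to pull apart.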

But if I move that second build to the _end_ of the /tools build, the build can generate binaries whose run-time dependencies live in /tools, and will thus execute out of the chroot environment. The fact that the toolchain itself won't run anywhere but the host system is fine. Then at the _end_ of the tools build, when I don't need to add anything more to /tools itself, I can tweak the source directory so any binaries the new tools generate now point to /lib, and build these tools with the ones I'm about to replace so that the generated toolchain binaries depend on /tools/lib. So I get:

/tools/bin/gcc hello.c -o hello
ldd hello
	/lib/ld-uClibc.so.0
	/lib/libc.so.0 => /lib/libuClibc-*.so
ldd /tools/bin/gcc
	/tools/lib/ld-uClibc.so.0
	/tools/lib/libc.so.0 => /tools/lib/libuClibc-*.so

Gotta be worth a shot. Still _way_ more finicky than I'd like, but then gcc always was. It only _ever_ works by a series of carefully engineered coincidences. (Too bad the output of tcc is so crappy from an optimization standpoint. And it doesn't do c++...)

Oct 18, 2005

Ok. Build is back together, and everything is now compiling. (The dropbear install is failing, but that's just an application not in the base OS anyway. It's trivial breakage: for some reason when I try to create the scp symlink it says it's already there. Not a big deal, commented it out for now, fix it later.)

And that means I should soon have a build that's more or less ready to snapshot and call 0.9. Tons of to-do items left, of course. The biggest is that it's still only creating the UML version. Creating the bootable version is easy (it's the same packaging on a different file), but I haven't made an installer yet, which renders such a bootable version noticeably less useful. I dragged the server from the upstairs closet and intend to inflict this thing upon it, possibly even later this evening.

Of course this would be the perfect time for Eric's server (the one my website and email are on) to get cracked and start spewing spam, so naturally that's what happened. (He was running sendmail, so it's not much of a surprise. His machines are all Fedora and he wants to install Fedora Core 4 on it, but nobody's been able to download FC4 CDs with good checksums. The current theory is that the published checksums are wrong.) In any case, it also means that my website and email are both down right now, until the machine in question gets reinstalled.

So I stare at the server next to the table and go "hmmm"... It'll need a webserver, but I've configured apache before and busybox has one too. I've needed a second nameserver for a while anyway (I've used bind before but a _small_ one would be nice, and djbdns has the downside of being written by Dan Bernstein, which rules that out...) Gotta get dropbear working of course, but that's just fixing the install. The mail server's going to be tough (postfix is the obvious candidate, but I've never set up a mail server before). And of course get the kernel install setup working...

I shouldn't hold up the 0.9 release for this, though. It's finally downloading all the source code dynamically from the right websites (today's snapshot of busybox finally has everything in it I need), so putting up a source tarball (containing my build scripts, a few small patches, and some directories full of symlinks for organizational purposes) is only about 42k, gzipped. If the website was up, I'd just post that as-is as soon as I get a working UML image out of it. (Well, after fixing the dropbear build.)

The fact that gcc can't build in 128 megs of ram is a bit of a pain, too. I still don't know if it's a real gcc issue or a UML memory leak. If I can't easily make it go away, I suppose I could feed it a block device to use as swap. Dunno what the performance impact of that would be, but it's unlikely to suck much more than taking a quarter-gig from the host system when you haven't got it...

And I should mention on the main webpage that the sucker's designed to give you a quick status report via grepping for "===" on the output. (I.E. if you've redirected it to a file, it prints a === line before each major checkpoint.) Although I suspect some are missing...

Ah, of course. Build made it to the end, created a UML image, and it failed to boot. Why did it fail to boot? Because I upgraded to 2.6.13 and pivot_root on rootfs now fails. Which is a good thing, and I knew it was coming, but it means I have to incorporate switch_root into busybox before I can get a working system.

Right. Existing to-do item moves to the front of the list...

Oct 17, 2005

The stage 2 build broke because my cut and paste of the libiberty fix smashed the tab into spaces. (Patching a makefile, I need to insert a tab. Busybox sed doesn't support \t. It's a to-do item.) Turned the spaces back into a hard tab, right.
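The missing piece of busybox sed is small. Here's a hedged sketch, assuming the translation runs over replacement text before backreferences are expanded. The function name is made up, and this isn't busybox code. It turns \t into a real tab and copies every other backslash pair through untouched, so \1, \& and \\ still mean what they did, and \\t stays a literal backslash followed by t:

```c
/* Turn each "\t" in s into a real tab, in place. Every other backslash
 * pair (\1, \&, \\, ...) is copied through unchanged for later stages,
 * which also means "\\t" stays a literal backslash followed by 't'. */
void expand_tab_escapes(char *s)
{
    char *out = s;

    while (*s) {
        if (s[0] == '\\' && s[1] == 't') {
            *out++ = '\t';             /* the one new escape */
            s += 2;
        } else if (s[0] == '\\' && s[1] != '\0') {
            *out++ = *s++;             /* keep the backslash... */
            *out++ = *s++;             /* ...and whatever it escapes */
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```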

Now the gcc build is barfing because busybox awk can't parse "gcc-4.0.2/gcc/opt-functions.awk". Wheee. That's one I can't easily fix myself (awk is big and complicated). Posted about it to the list to see if the awk maintainer's around. I guess that's it for the night, unless I want to clean up the rest of the source tarball downloads...

Oh that was _prodigiously_ stupid. You know how the "unwrapped" 2.2 build involves --bind mounting the source code into the directory? (And it's not a read-only mount because the 2.6.10 kernel doesn't support read-only bind mounts?)

Guess what I just did!

Yup, forgot to umount sources before doing a rm -r in the temporary build directory. (And _this_ is why the unwrapped version is dangerous!) Right, I have a backup from... the 15th. Beautiful. Anything newer? That's from the 12th...

Right. Recreate the last two days' changes, and most of the work I was doing was in busybox or long debugging sessions that resulted in fairly small fixes once the problem was known. Not too bad. I saved when I got the tool build working unwrapped...
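Here's a guard that would have saved me, sketched in C (the function name is hypothetical). Before rm -r, scan the mount table for anything mounted at or below the directory. Comparing st_dev doesn't help here, because a bind mount of a directory from the same filesystem reports the same device number as its neighbors. This assumes the /proc/mounts text format and an absolute directory with no trailing slash.

```c
#define _GNU_SOURCE
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if any mount point listed in `table` (/proc/mounts format) is
 * `dir` itself or somewhere underneath it, 0 if none, -1 if the table
 * can't be read. */
int has_mount_below(const char *table, const char *dir)
{
    FILE *f = setmntent(table, "r");
    struct mntent *m;
    size_t len = strlen(dir);
    int found = 0;

    if (!f) return -1;
    while (!found && (m = getmntent(f)) != NULL) {
        const char *mp = m->mnt_dir;

        /* match "dir" or "dir/...", but not "dirx" */
        if (!strncmp(mp, dir, len) && (mp[len] == '\0' || mp[len] == '/'))
            found = 1;
    }
    endmntent(f);
    return found;
}
```

Calling has_mount_below("/proc/mounts", builddir) before deleting, and refusing on a return of 1, turns this mistake into an error message.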

Oct 16, 2005

So I'm reading through the source code to bb_getopt_ulflags. Luckily, vodz seems to speak C.

The limiting thing is it's a wrapper around the libc function getopt_long...

Later that day...

Ok, the busybox bug was that the getopt string contained "-1:", which meant it always needed at least one unparsed argument left over. This may be true for create, but create already has the "cowardly refusing to create empty archive" test and message. For extract, "tar xvjf thing.tbz", f takes an argument so there are no leftover unexpected arguments and thus it dies. I.E. I didn't break it, it didn't work before my patch.
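The failure is easy to reproduce with plain getopt(3), outside busybox. Here's a hedged sketch (the function name is invented, and the option string is a cut-down tar). Because f takes an argument, "-xvjf thing.tbz" consumes every argument and leaves zero over, so any "at least one leftover argument" rule kills extract. Plain getopt needs the leading dash. Busybox tar deals with the dashless "xvjf" form separately.

```c
#define _GNU_SOURCE
#include <unistd.h>

/* Parse tar-style options with plain getopt(3) and return how many
 * non-option arguments are left over. 'f' takes an argument, so the
 * archive name is consumed as f's optarg rather than left over. */
int leftover_args(int argc, char *argv[])
{
    optind = 0;            /* glibc and musl treat 0 as "reinitialize" */
    opterr = 0;            /* don't print errors for unknown options */
    while (getopt(argc, argv, "ctxvjzf:") != -1)
        ;                  /* a real tar would record each flag here */
    return argc - optind;
}
```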

Apparently, gcc no longer builds in 48 megabytes of RAM. (UML's out of memory killer triggered on genattrtab. Fun.) Upped to 64 megs... Good grief, 64 megs isn't enough? Ok, gcc 4.0.2 is a _pig_. How about 80 megs? Sheesh! 96?

Ok, now the OOM killer isn't triggering on genattrtab, but on the gcc invocation _after_ it. Unless this puppy's shelling out to gcc or we've got some weird asynchronous thing going on, it sounds like a memory leak in UML. (There was one of those mentioned recently, but I'm not using 3-level page tables, am I?) Right.

Ok, first off, let's try giving the bastard 256 megs of ram and see if _that_ makes it happy. (Since UML is a normal userspace program, the parent system can swap it out. Performance may suck, but we'll see...)

Yeah, it builds with UML allocating 256 megabytes of RAM. That's just _sad_. I'll leave it for now and see about trimming it down later. (Either UML is leaking or gcc is an _amazing_ pig.)

And the stage 2 build has a couple of non-obvious breaks. Ok, do the same run 2.2-* outside of UML trick I did to debug 1.1. (And top's ability to sort by memory usage would be cool if I was paying attention at the right time during such an unwrapped build...)

At midnight a snapshot of the fixed busybox goes up so I can upgrade the build process to auto-download something rather than the handcrafted tarball I'm using at the moment. Meanwhile, Fade and I are going to see the Nightmare Before Christmas sing-along at the Alamo Drafthouse in an hour, and we're biking so we should leave now...

P.S. Wrote this earlier:

I need to document the "how to bypass UML" build bit. I built /tools ok with gcc 4.0.2 before because I did it outside of UML, and my laptop has half a gig of ram with swap space on top of that. The UML setup only has as much memory as I tell it to allocate, and I haven't configured any swap for it.

To do the /tools build, you need to do one thing as root (make a symlink, /tools, pointing to an empty directory), and then drop back to your normal user and run sources/scripts/1.1-* which should build the /tools directory. Doing this does briefly require root access, and you have to be running a recent 2.6 kernel, but it's noticeably easier to debug. (And runs a bit faster.)

The 1.0 stage is the UML init script that sets up the environment for 1.1, and 1.2 packages /tools into tools.sqf.

Now running stage 2 outside UML, that requires root access in lots of places. (It has to be run in a chroot environment, it calls mknod, it changes the ownership of files...) But what you can do is something like:

mkdir -p sub/{tools,sources}
cd sub
mount --bind /tools tools
mount --bind /path/to/sources sources
chroot . sources/scripts/2.2-*

Oct 15, 2005

So gcc 4.0.2 seems to be building now, and compiling stuff against uClibc-0.9.28. But to install uClibc-0.9.28 I need to patch busybox to have an --exclude option (because the uClibc install uses it). In theory, I have now done so. In practice, I did so late last night and intended to test it in the morning, and between then and now Vodz (the Russian busybox developer who communicates via a babelfish derivative) "fixed" it. And since he's the one who wrote the code and I mentioned in the checkin that I couldn't understand his babelfishese documentation, I heartily approve of him doing so. But:

Now busybox tar doesn't work at all. Even trying to do "tar tvjf file.tbz" it dumps a usage message, meaning it fails to parse the arguments. I don't know if vodz broke it or if I did, and I'm waiting for him to take another look at it before I go in and do more damage, because this is his area of the code and not mine.

After several consecutive days of working on this, I could use a break anyway. I've updated the index.html file in preparation for the new release, and I just sent my friend Mark and my fiance Fade off to see Mr. Sinus Theatre professionally make fun of the movie "Lost Boys" (I already saw them do this last time, and it was quite good). I think I'll go biking.

Oct 14, 2005

So gcc 4.0 has apparently grown support for uClibc, using the --target=i386-linux-uclibc option to ./configure. Except it considers this cross-compiling (which I suppose it is) and wants to find binaries named i386-linux-uclibc-ar and such.

I think I'll try to get the "sledgehammer and soldering iron" approach working again first, since their automated method can't handle relocating /lib and /include under the /tools directory anyway. (Chock-full-o' hardwired assumptions, that's gcc for you. So whatever I do, running sed against their source multiple times is still necessary.)

Right.

So there's a fairly delicate sequencing issue. Building binutils 2.14 and gcc 3.2 used to require symlinks from /tools/lib and such to the parent system's libraries during the build, so it could find things during the build. This was because the cross-compiler ran on the parent system, and thus used the parent system's libraries. The new uClibc I was building wouldn't necessarily run under an older kernel version than the one described by the headers package I gave it.

Now that I'm using UML, the tools build is done under the new kernel, no matter what the system is actually booted with. So in theory these tools can run against uClibc and everything should be ok, and this means that if I move the uClibc build before the binutils and gcc builds, the libmudflap thing might magically stop causing a problem...

And it got me past that bug to a _new_ bug. Woot.

Nope, false alarm... Grrr...

Later that day...

Right. Another fun little issue: uClibc yanked a symbol called dl_iterate_phdr and gcc 4.0.2 wants to use it. I posted about it to the uClibc mailing list, but in the meantime I've just patched the gcc source to turn the relevant test into if(0). I think it's for debugging anyway (the stack unwinder uses the symbol to find its unwind tables, which shouldn't normally be needed by plain C code).
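For reference, here's what dl_iterate_phdr looks like from the caller's side, in a small C sketch against glibc's <link.h> (the struct and function names are made up). It calls back once per loaded object with that object's program headers. An unwinder walks those looking for the PT_GNU_EH_FRAME segment that points at the unwind tables.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>

struct phdr_counts { int objects; int with_eh_frame_hdr; };

/* Called once per loaded object: the program plus each shared library. */
static int count_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct phdr_counts *c = data;

    (void)size;
    c->objects++;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_GNU_EH_FRAME) {
            c->with_eh_frame_hdr++;    /* what an unwinder would look up */
            break;
        }
    }
    return 0;                          /* nonzero would stop the walk */
}

struct phdr_counts count_loaded_objects(void)
{
    struct phdr_counts c = { 0, 0 };

    dl_iterate_phdr(count_cb, &c);
    return c;
}
```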

And apparently uClibc doesn't successfully export __libc_stack_end, which gcc really seems to want now. But Linux From Scratch has a patch for it! Woot. (I'm told this patch is wrong. But it builds, and seems to work...)

Ok, _that_ was a pain. But the base toolchain seems to actually be working now. (Dunno if it's building stuff properly, but compiling the rest of the system should give me a hint...)

And gcc 4.0.2 breaks bash 2.05b. The FSF doesn't even like the FSF. (The error is that a goto label is the last thing in a function, except for some #ifdeffed out code. And instead of calling this a warning, they made it an error. Solution? Add do {;} while(0); right after the label to TELL THE BROKEN COMPILER TO SHUT UP.) (It is not unclear what the program says to do. What's wrong with goto the end of the function instead of return? A compiler should not stop, sit down in the middle of the road, and throw a tantrum over something like this. Sheesh! Did they not NOTICE they broke bash 2.05b? What, "everybody upgrade to 3.0"? Not happening, I'm going to write a new shell for busybox to replace bash, and until then I'm sticking with the old version, thanks.)
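Here's the error in miniature, with the smallest fix. The function and the EXTRA_DEBUGGING macro are made up for illustration, and bash's actual code differs. C's grammar says a label labels a statement, so once the #ifdeffed code drops out, "done:" is followed only by the closing brace, which newer gcc rejects. A lone ';' (a null statement) is enough, and do {;} while(0); works for the same reason:

```c
#include <stdio.h>

/* Sum the array until the first negative value. The label sits at the
 * end of the function, followed only by optional debugging code. */
void sum_until_negative(const int *v, int n, int *total)
{
    *total = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] < 0)
            goto done;
        *total += v[i];
    }
done:
#ifdef EXTRA_DEBUGGING
    fprintf(stderr, "total=%d\n", *total);
#endif
    ;   /* null statement: keeps "done:" legal when the #ifdef is off */
}
```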

Ok, 1.1 built to the end, now let's see about the rest of it...

Added --exclude support to tar. Didn't quite do it right, but the guy who wrote busybox's option parsing code doesn't speak English, and uses a Russian-to-English translator instead. The result is seldom fully comprehensible.
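The matching half of --exclude is mostly fnmatch(3). Here's a hedged sketch, not vodz's code or my patch, with deliberately simplified rules (an assumption, not GNU tar's exact semantics): a pattern excludes a member if it matches either the whole path or just the last component.

```c
#define _GNU_SOURCE
#include <fnmatch.h>
#include <string.h>

/* Decide whether an archive member should be skipped: a pattern
 * excludes it if it matches the whole path or the last component. */
int is_excluded(const char *name, const char *const patterns[], int npatterns)
{
    const char *base = strrchr(name, '/');

    base = base ? base + 1 : name;
    for (int i = 0; i < npatterns; i++)
        if (!fnmatch(patterns[i], name, 0) || !fnmatch(patterns[i], base, 0))
            return 1;
    return 0;
}
```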

Huh. The hardwired paths in collect2 went away. On the one hand, this is a positive sign. On the other hand, now I have to figure out where /lib and /usr/lib are being specified now so it doesn't include both and do bad things since I collapsed them together...

Oct 13, 2005

Huge amounts of progress recently. Got everything building under the new 2.6.13.2 UML (with -skas0 mode, much faster) and decided to go for broke. I've now gotten about 80% of the packages upgraded to the most recent version and it's automatically downloading the new ones from the relevant websites (and checking sha1sum, and keeping the old copy around to use next build...)

The automatic downloading of source means I only have to put up a <100k tarball for people to download, not the 90 megabyte monstrosities I've avoided putting up because I don't want to suck up too much of Eric's bandwidth. Once I get the rest of them converted I can have a serious _release_.

The packages left to migrate over are mostly easy, and include uClibc (I'm still at 0.9.27, need to upgrade busybox tar to understand --exclude for the 0.9.28 to install its headers properly), busybox (using a daily snapshot of the 1.1 tree, just need to pick one recent enough), lilo/bin86/nasm (haven't gotten around to it yet)...

And of course the one I've spent too many hours on already, gcc. Ow.

Ok, it's my fault for trying to move from 3.3 not just to 3.4, but to 4.0. But there's also the fact that the people who wrote this need to be HARMED. Yeah, fixincludes is pointless braindamage you have to disable to keep it from screwing stuff up, and the one sed invocation to do this became three because they're now installing a README file in includes warning people that they broke your headers and it's too much effort for them to actually care. Right. Fine. Three sed arguments and that's toast.

Then there's the fact I have to adjust the paths for the /tools step, because despite having specfiles they hardwire the suckers into their source code anyway (well of course, this is the FSF we're talking about). So I have to look at their actual source code.

I have now been deeply and thoroughly reminded why nobody should ever look at any source code the FSF has ever had anything to do with, ever. Not unless they've done something to really really deserve it, and even then you should wait a bit to see if there's a last minute phone call from the governor or something...

I'd go into more detail but I literally have a headache now, and am going to bed. I'm going to try to finish the conversion in the morning. I suspect not having done the darn gcc and collect2 patches properly last time is the reason the tools.sqf build was about the same size as the final system, despite having less in it. (Redundant linker paths cause gcc to go nuts and create binaries as big as if they're statically linked, but still needing the libraries to link against. Remember how I squashed together /lib and /usr/lib? Sigh...)

Later that day...

The previous entry was written just after midnight, and this one is just before. It's been about a day.

So, imagine The Tick (the big blue nigh-invulnerable one) saying "Heh, those wacky gcc developers" (ala "those pesky ninjas"). This is about how I feel, as an alternative to going postal.

There's something called "libmudflap", a subdirectory under the gcc build. I don't know what it is, or what it does (no README), and I really don't care. What I do know is that in 4.0.2 its ./configure script tests to see if the compiler that just got built can create a runnable a.out file, which it can't because it's trying to dynamically link against a C library loader that I haven't installed yet. I'm fairly certain 3.3 didn't do this.

This configure script is creating a log file. It's also deleting the log file at the end of the run, whether or not it was successful. It was easier for me to edit the configure script and stick in printfs (well, echoes) than it was to figure out how to prevent it from deleting the log file.

Now _that's_ good design...

Oct 11, 2005

Since I finally had a decent reproduction sequence, I sat down and debugged the problem last night. Of course it wasn't a UML problem. UML was merely triggering a long-dormant bug in busybox that had survived 22 months without biting anybody except me.

The bug (I fixed it) survived for 22 months (since Nov 27, 2003) by A) only occurring under uClibc (not glibc), B) only occurring under a kernel newer than 2.6.11 due to _what_ random garbage was left on the stack, and C) even then only occurring intermittently.

You may have been wondering why this project is proceeding so slowly? In addition to being entirely done in my spare time, it spins off so many tangents that I wander down, and can get distracted by for a long time before wandering back to work on FWL. I've got a dozen distractions: Linux From Scratch, BusyBox, uClibc, User Mode Linux, Matt Mackall's 2.6-tiny tree, squashfs, dropbear... I haven't even got X11 in this thing, can you imagine what kinds of tangents I'll wander down once that sucker comes in?

I want to get a release out other people can use so I have people poking me to draw my attention back to the central project. Some reason to set up a better web page and a mailing list and a subversion repository (or maybe even Mercurial). More tangents to wander down, of course... :)

It'd be different if this is what I did for a living, but it isn't. Work is over in the "real life" category taking time away from working on this.

By the way, I'd just like to take a moment to say how much I hate "vim". It does gratuitous synchronous I/O I didn't ask for and can't seem to stop, and sometimes I have processes running in the background that keep the disk I/O bound. And when this happens, my attempts to edit a file in vi will spontaneously pause for upwards of _30_seconds_ while it tries to sync its stupid log file, and then behave normally for another 15 seconds or so, and then freeze again. (Yeah, I looked up how to make it not keep a log file once, but then it has no "undo" functionality at all. What I want is "don't call fsync() unless you _mean_ it", which doesn't seem to be an option.)

Yes, it's currently doing this to me. The busybox implementation of vi doesn't do this. (Then again, I'm not sure the busybox version has "undo"...)

Oct 10, 2005

Many things going on behind the scenes that I haven't updated this to mention. Let's see.

Linucon happened, Eric and Cathy visited for a bit, and so on. (I.E. "real life".) Lots of busybox stuff too. (It now has a "less" implementation, so I can drop that package once I upgrade. I'm putting together a 1.0.2 release and then trying to lock down 1.1-pre1 in hopes of getting busybox 1.1 shipped in January.)

My blocking problem with Firmware Linux has been the inability to upgrade beyond a 2.6.11 kernel without UML going wonky, but if my two most recent posts to the UML-developer list (here and here) don't let them reproduce the sucker, I don't know what will.

And once _that's_ working, I can unblock all sorts of stuff...

Sep 17, 2005

I spent a couple weeks working on other things (busybox and Linucon come to mind), but I'm wandering back to FWL now.

The makefile reorg is finished, although I'm still adding new functionality. I need to autodetect when it's running as root and skip the UML wrapper steps, and I need to add the source tarball downloader.
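A minimal sketch of the root autodetection, assuming the build scripts are plain sh and that "native" means running the stage scripts directly while "uml" means wrapping them in User Mode Linux. The function name build_mode and the two mode names are hypothetical, not taken from the actual scripts.

```shell
#!/bin/sh
# Hypothetical helper, not from the actual FWL scripts: pick how to run
# the build based on whether we're root.
build_mode() {
    if [ "$(id -u)" -eq 0 ]; then
        echo native   # root: loopback mounts and chroot work directly
    else
        echo uml      # non-root: wrap the stages in User Mode Linux
    fi
}
```

A caller could then branch with case "$(build_mode)" in native) ... ;; uml) ... ;; esac.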

The new script to download all the source code from the original locations is something I've meant to do for a while. Among other things, documenting where all this stuff comes from is important. Also, I'm trying to get a release together to put on the website, and the 100 megabyte tarball I put together last time is unwieldy. Keeping previous versions of such a tarball around is a real pain, it discourages minor updates with only one or two files changed, and it's a bit of a pain for anybody trying to download the new version. The new script needs to download tarballs only if they're not already there, should do an sha1sum to verify their integrity, should fall back to my website as a place to get them, etc.
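Roughly what such a downloader could look like, as a hedged sketch rather than the real script. The names fetch_source and sha1_of are hypothetical; it assumes wget and sha1sum exist on the parent system, and it leaves it to the caller to list the original location first and the fallback mirror last.

```shell
#!/bin/sh
# Hypothetical sketch: fetch_source FILE SHA1 URL [URL...]
# Keeps FILE if it already exists with the right SHA1; otherwise tries
# each URL in order and keeps the first download whose SHA1 matches.
sha1_of() {
    sha1sum "$1" | cut -d' ' -f1
}

fetch_source() {
    file=$1 want=$2
    shift 2
    # Already downloaded and intact: nothing to do.
    if [ -f "$file" ] && [ "$(sha1_of "$file")" = "$want" ]; then
        return 0
    fi
    for url in "$@"; do
        rm -f "$file"
        if wget -q -O "$file" "$url" && [ "$(sha1_of "$file")" = "$want" ]; then
            return 0
        fi
        echo "fetch_source: $url failed or had a bad sha1" >&2
    done
    # Nothing worked: don't leave a corrupt or partial file behind.
    rm -f "$file"
    return 1
}
```

Usage would be something like fetch_source busybox-1.00.tar.bz2 "$BUSYBOX_SHA1" "$UPSTREAM_URL" "$MIRROR_URL", where all three variables are placeholders for values the real script would have to record.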

I'm also putting together a monster to-do list to figure out what I need to do before the release and what has to happen after. I'm not on a schedule with this, but I've let it languish far too long already...

Another thing about getting it up is I need to move my website off of the machine in Eric Raymond's basement to the server in my bedroom closet hooked up to my cable modem with the static IP and the symmetrical 3-megabit connection. But I want to install firmware linux on my server before doing this. (Eating my own dogfood, and all that. Real field test of the thing.) The downside of not being on a schedule is I've been going down too many side paths rather than focusing on getting this up and running...

Sep 16, 2005

Linux kernel 2.6.13.1 has a working SKAS0 mode (Single Kernel Address Space mode, without requiring a modified host kernel), which should be noticeably faster than the -tt (Tracing Thread) mode I've been using. I initially couldn't use this because it doesn't support TLS (some threading optimization) yet, and recent glibc versions (such as the one in ubuntu) detect "ooh, a 2.6 kernel should support TLS!" and then barf if it can't initialize it. (Yes, glibc has an error message, but won't fall back to not using the feature. I consider this a glibc bug.)

The whole reason I was using -tt mode in the first place is that even though it's slower, it runs everywhere. The UML guys want SKAS0 to replace -tt, but if it wouldn't run due to this glibc bug, I couldn't use it.

BUT: It turns out there's an obi-wan "this is not the kernel you're looking for" move. If you set the environment variable "LD_ASSUME_KERNEL=2.4.1", glibc doesn't even try to use TLS. I confirmed that the following UML command line gives me a working command shell inside UML running in skas0 mode:

./linux rootfstype=hostfs rootflags=/ rw mem=48M init=/bin/sh LD_ASSUME_KERNEL=2.4.1

Aug 27, 2005

Ok, the gcc build break turns out to be because the uClibc install isn't installing the headers. The problem is that when running uClibc-0.9.28's make install_dev with busybox tar, the header copy is done with a pipe between two tar instances, and the creating tar instance has an "--exclude CVS" option but there isn't a CVS directory anymore. The busybox 1.00 version of tar complains about this and dies, thus no files are copied.

This is both a uClibc bug and a busybox bug. First of all, busybox shouldn't die because you tell it _not_ to copy a file that isn't there. It should be fine to --exclude something that doesn't exist. On the uClibc side, it shouldn't say --exclude CVS now that it no longer uses CVS, and more seriously the install should notice that tar failed rather than falling through and claiming success when it actually failed.
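On the uClibc side, here is a sketch of a tar-pipe copy with both fixes applied: no stale --exclude CVS, and a check that notices when either end of the pipe fails. A plain POSIX pipeline only reports the exit status of its last command, so the writing side records its own failure in a temporary file. copy_tree is a hypothetical name; this is not uClibc's actual install_dev rule.

```shell
#!/bin/sh
# Hypothetical sketch of a header copy that notices failure; this is not
# uClibc's actual install_dev rule. Copies the contents of SRC into DST.
copy_tree() {
    src=$1 dst=$2
    status_file=$(mktemp) || return 1
    # A POSIX pipeline only reports its last command's exit status, so the
    # writing side records its own failure in a temp file.
    { (cd "$src" && tar cf - .) || echo failed > "$status_file"; } |
        (cd "$dst" && tar xf -)
    rc=$?
    if [ -s "$status_file" ]; then
        rc=1
    fi
    rm -f "$status_file"
    return "$rc"
}
```

With bash, set -o pipefail or $PIPESTATUS would do the same job; the temp file just keeps it POSIX.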

I hath pestered both mailing lists, and in the morning I'll take a whack at dealing with it myself, but for right now it's time to pass out.

Aug 26, 2005

Grinding away.

Breaking up the makefiles into a master "build.sh" that runs 0-make-tempdir.sh, 1-make-tools.sh, and 2-make-firmware.sh. Stage 1 runs the sub-scripts 1.0-tools-umlsetup.sh (the UML init script), 1.1-tools-build.sh (the actual build), and 1.2-tools-package.sh (to make the squashfs image for tools). Stage 2 also has three corresponding sub-scripts.

The upside of this is that you can choose to run the build without UML: you just need to run 1.1-tools-build.sh (as root, on a recent enough kernel), bind mount the resulting -tools directory into an empty directory, and chroot into that to run 2.1-firmware-build.sh. It also means I can swap in different 2.2 stages to build the UML demonstration firmware or the actual bootable lilo firmware.
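A minimal sketch of the master build.sh loop, assuming each stage is a separate sh script in one directory. run_stages is a hypothetical name. Stopping at, and naming, the first stage that fails is what makes a break partway through easier to track down.

```shell
#!/bin/sh
# Hypothetical sketch of a master build.sh: run each stage script in
# order and stop at the first one that fails, naming it.
run_stages() {
    dir=$1
    shift
    for stage in "$@"; do
        echo "=== $stage"
        if ! sh "$dir/$stage"; then
            echo "build failed in $stage" >&2
            return 1
        fi
    done
}

# Usage sketch, with the script names from the entry above:
# run_stages . 0-make-tempdir.sh 1-make-tools.sh 2-make-firmware.sh
```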

Lots of little details, though, such as the fact that a UML instance really doesn't exit with an interesting error level, so it can be a bit difficult to figure out if it's finished or not. (To get around that, I create a file in the filesystem when it's finished successfully. But, of course, _where_ to do that is non-obvious, because the files in the loopback mounted ext2 image aren't readable outside of the UML, because the non-root user in the parent system can't loopback mount...)
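Since the UML exit status says nothing useful, the host side can judge success purely by whether the marker file exists afterwards. A sketch, assuming the marker path is somewhere the host can actually see (a hostfs directory, say, rather than the loopback-mounted ext2 image); run_with_marker is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical sketch: judge a stage by a marker file instead of by the
# exit status of the wrapper (UML's exit status is uninteresting).
run_with_marker() {
    marker=$1
    shift
    rm -f "$marker"   # never trust a marker left over from an earlier run
    "$@"              # e.g. the UML binary; its exit status is ignored
    [ -f "$marker" ]  # success only if the inner build created the marker
}
```

Inside the UML, the last line of the build would then be something like touch "$MARKER", with MARKER pointing at that host-visible path.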

And yes, this turns out to be a necessary step before upgrading too many new packages, because under the current monolithic UML-based build script (which I apparently haven't posted to the website yet and really need to), a build break that happens partway through what is now stage 2.1 is a real pain to debug.

Building on the parent system as root (instead of as a non-root user under UML) is much easier to debug. And way faster to build, too. (Of course one particularly bad mistake this way could lobotomize my laptop, but that's why backups are a good idea.)

Grind, grind, grind...

Aug 24, 2005

Ok, reverted to a reliably working build. 2.6.11 UML, 0.9.27 uClibc, busybox 1.00 (with patches), and obsolete versions of just about everything else.

Started the upgrades with uClibc 0.9.28. The tools part built fine, but then gcc died trying to build under that.

Let the head scratching commence...

Aug 23, 2005

Wow, it's horked.

I upgraded the kernel to 2.6.12.3, updated uClibc to 0.9.28, updated busybox to 1.01 (which should be 1.0.1 but ask Erik what we call 1.1, or better yet 1.1.1), redid the build script to not have nested here documents...

Now I've reverted to the version on the website to figure out where I broke it so badly...

I'd grumble at myself about changing too much at once without regression testing, but I actually don't think that's what happened here. The problem is, an intermittent race condition cropped up, meaning sometimes the build runs to completion and sometimes it breaks in a completely random place, meaning I wasted a while examining entirely the wrong things, until I ran it unmodified three times in a row and got breaks in three completely different areas. Fun.

I _think_ the problem is that UML in 2.6.12.3 has a race condition in its filesystem; the build never breaks in the same place twice. I know I built it and it ran to completion when I first put UML in, but that doesn't help here. (It's clearly a race condition: the failure is always a file not found error, usually for files that the build should have just created. For example, install breaks trying to chmod the file it's installing...)

It's also possible that changing the loopback ext2 image to be a sparse file (dd now does a seek to the end and writes 4k, rather than filling the whole space from /dev/zero) triggered a bug in the kernel. It _is_ a rather impolite thing to do to the poor Linux kernel (both UML and the host kernel), but it saves space on the hard drive, skips a build step that lags the rest of the system considerably, and the downside should just be a bit of extra fragmentation in a temporary file (the loopback image is deleted after the build)...
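The sparse image trick, as a sketch: dd seeks to the last 4k block and writes only that one, so the file has its full apparent size but almost nothing allocated. make_sparse_image is a hypothetical name, and taking the size in megabytes is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch: make_sparse_image FILE SIZE_MB creates FILE with
# an apparent size of SIZE_MB megabytes by writing only its last 4k block.
make_sparse_image() {
    file=$1 size_mb=$2
    # 256 blocks of 4096 bytes per megabyte; seek past all but the last.
    dd if=/dev/zero of="$file" bs=4096 seek=$((size_mb * 256 - 1)) count=1 2>/dev/null
}
```

After that, mke2fs -F on the file formats it as usual, and comparing ls -l against du -k shows the apparent size versus what's actually allocated (on filesystems that support sparse files).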

It could also be that I'm trying to build it under ubuntu, which has a number of things wrong with it. (A make allyesconfig of busybox 1.01 fails with an ipv6 error, and what you have to remember is I put this sucker together and it _did_ work, but I think I tested that bit before my laptop had ubuntu on it.) The server upstairs also has ubuntu on it now (well, in both instances kbuntu), and although it builds the current version of dropbear, it doesn't work. Each connection is immediately dropped with an ipv6 error...

However, the whole point of building the tools directory is to isolate the final system from the parent system, and ubuntu misbuilding stuff is highly unlikely to cause race condition errors on two different machines. A problem caused by a horked build environment _should_ be deterministic...

Update: Wow. What's on the website is _ancient_. I need to update the website.

Ok, what I need to revert to is the last build using the 2.6.11 kernel. A UML built from that worked reliably, and all the other changes (like the sparse ext2 image) came later too. Luckily, I have lots of backup tarballs, I just have to figure out which one I'm looking for...

Update to update: Ok, "reliably" is an overstatement. There are some very strange things about stdin and stdout in my setup, and gnu tar cares too much about these things.

The build runs within UML, and as far as the build is concerned it's talking to /dev/console. The UML instance providing that /dev/console is configured with the "stdio console" option, which means it forwards the input and output of that to file handles 0 and 1 (stdin and stdout) of the UML process running on the host system. If that UML process is running with its output redirected on the host kernel (piped to tee so it can be copied to a file), gnu tar's verbose option causes tar to abort because it thinks it hasn't got a stdio (even though it does).

Admittedly, this stdio setup is strange and evil, but gnu tar is just broken. First it's hallucinating a problem that doesn't exist, which is just a simple bug. The fact it's aborting in response to something that could safely be ignored even if it was real is a bigger problem, and a design issue. It's the verbose option to tar that's failing; aborting tar because it thinks it can't be verbose is _wrong_.

tar: Error in writing to standard output
tar: Error is not recoverable: exiting now

Yes, failure to write -v output is fatal. Nothing else is being written to stdout.

I'd already have replaced gnu tar with busybox tar throughout the build (since busybox tar isn't broken in this way), but A) the busybox I build is linked against uclibc, so I'd have to build a second one to run on the parent system, B) I need the parent system's tar in order to extract the initial tarballs anyway (including the busybox tarball). So I might as well use it consistently. Except for it being broken...

Aug 17, 2005

And so I return to work on FWL.

I put together a busybox 1.01 release (which was kind of overdue), and got the mount rewrite checked into the 1.1 line. Over in uClibc land, Erik got out uclibc 0.9.28. And it's long past time that I bump the compiler version up to at least gcc 3.3, maybe 3.4. Which means dredging up a new binutils, and I should upgrade the kernel to 2.6.12, plus the build script needs to be broken up back into the stage files because nested here documents are just silly...

Right.

Jul 22, 2005

Ok, the problem with Ubuntu is you have to modprobe loop in order for the loop devices to show up. (Note: they'll WORK if you don't have this module loaded, but udev doesn't get notified to make the /dev entries. This is one of the dumbest things I've come across in a while. And yet /dev/loop/0 is there without the loop module installed, even though lanana.org is quite clear that /dev/loop0 is what it should be.)

In other news, 2.6.12.3 has working tt-mode UML support again (bugfix went in), and I've shoehorned it and the new 2.6.12 libc headers into the build to see what breaks. That's 2 packages updated, several dozen left to go.

My laptop is running grub (that's what Ubuntu has), and the only bootloader I've patched to have a length option is LILO. I'm under the impression adding a length option to grub would involve modifying assembly code. And grub doesn't have a -R option to change the command line for the next boot only. (This lets you do a provisional boot; have the new firmware change the default only if it comes up successfully. If it can't boot the new firmware, power cycle the box and the old firmware comes up. As far as I know, grub can't do that if there's no keyboard attached to the machine and the only interface to the thing is via a web browser...)

I'm still spending most of my hacking time ~~playing World of Warcraft~~ working on the Busybox 1.0.1 release. I'll still need to apply the sort patch to that since sort is a new feature, not a bugfix. But on the bright side, it looks like I'll be able to use busybox gzip/gunzip in the build, which is cool.

Jul 10, 2005

There's no such thing as "/dev/loop/0". Ubuntu people, are you listening to me? You use udevd. The standard device name for the first loop device is /dev/loop0.

I'm also not particularly thrilled about breaking all these libraries into two packages (one of which, the "-dev" package, installs the headers so you can compile stuff against it). I thought this was stupid when Red Hat did it: headers are small and eminently compressible. Look into squashfs or something.

Yes, this broke the firmware linux build when I finally downloaded the code and just tried to run it on my new laptop. (Running the existing build before trying to do new work, as a sanity measure to make sure my new environment was working on a known good build...)

And no, I didn't change my script, I made the darn symlink in /dev. In this case, ubuntu is wrong.

Jul 9, 2005

New laptop, new distro (kbuntu). Can't say I recommend either. Took two weeks to get it together, between the Dell hardware going south (I didn't know a power supply brick could beep; who is stupid enough to put circuitry in there? It's supposed to be a dumb transformer, its JOB is to get hot and put off electromagnetic interference well away from the circuitry. But no, Dell makes it all flaky...)

I didn't install Fedora because Red Hat's continuing deterioration is just too painful to watch. (I can see not shipping decss because you don't care if your customers can play their DVDs, but not shipping mp3 playing software when the patents only cover recording and not playback is way too yellow for me to stomach association with a company that not only won't stand up for anybody but backs down in the face of _imagined_ threats. They don't even install OpenOffice, instead they install an _installer_ for OpenOffice that makes you agree to a license before downloading it from the website. Things like GCC 2.96 and shoving gnome apps down KDE users' throats are just gravy compared to the sheer legal cover-your-ass cowardice Red Hat has developed. Oh yeah, these guys are going to stand up and fight the good fight about letting us actually play DVDs on our laptop someday. Not.)

Knoppix hasn't got a decent installer; I didn't leave it running overnight to copy 20 gigs of backup files to an ext3 partition just so it can insist on reformatting it reiserfs. I don't even like reiserfs. And peeling it off the CD and loopback mounting it manually got trickier since they introduced union mounts. Plus a few new fun bugs moving from 3.6 to 3.9 (switching tabs should not make Konqueror resize its window). Running it from the CD on an ongoing basis is incredibly painful: not just slow spinning the CD up all the time but the endless stupid confirmation boxes... (Yes, entering text at the google prompt is going to send it unencrypted across the network. Shut up. Yes, it's ok to let google set cookies: it's ok to let everybody set cookies, just treat them all as session cookies. A pop-up prompt for every single cookie is _INSANE_. Yes, I sometimes want to move from encrypted to unencrypted pages, such as when I type a completely new page in at the URL bar to re-use an existing tab. Shut up. No, I NEVER want you to save login information in some stupid wallet for any site, anywhere...)

And because it's running from the CD, you have to jump through these hoops every time you boot. The way I do email via ssh tunneling is another fun thing that needs setup under a cd version of knoppix, that's assuming I set up one of its strange "scour the hard drive and find my home directory" things...

Slackware 10.1 is still using a 2.4 kernel. Yeah, I know Patrick got sick and has some catching up to do, but if I consider a distro that doesn't use udev to be a bit behind the times, what can I say about a 2.4 kernel in 2005?

I will never use unfiltered Debian. Back in 1998 I was wandering back to Linux (after a long divergence through OS/2 and Java). The last time I'd actually run it was 1993 or 94 (and that was SLS, which wasn't around anymore), and after a bit of searching I found that Debian was still around. Downloaded over a dozen floppies, got them all installed, then tried to configure my dialup connection to install the rest. My modem was on a nonstandard IRQ for that serial port, and I knew this, and knew how to tell OS/2 this, but didn't know how to tell Linux this. (The answer I was looking for was "man setserial".) I made the mistake of asking on the debian developer list. I didn't get an answer to my question, but reading the list over the next few days the permanent flamewar expressed the sentiment that the people on the list would "rather see Debian die than start pandering to newbies like Red Hat". And I went "Great! Red Hat! That's what I'm looking for." That was 5.something, and I was a happy Red Hat user for years... And I am NEVER giving Debian another chance. I don't care what their technology is like, and I'll use Debian derivatives, but the politics of their developer community is something I just don't want to be directly exposed to ever again. (Plus the fact it takes them several years to release each new "Debian stale", and no there isn't a b missing from that.)

My friend Stu Green gave me some Ubuntu CDs a while back, and based on that I downloaded their current KDE version last week. Boy is there a lot wrong with it. In order to get a root login I had to crack my own system (boot with init=/bin/sh, mount -o remount,rw, vi /etc/shadow, cut and paste to replace the "*" in root with the password from my user...). The base install was laughably insufficient and the "install additional stuff" option in the menus was not obvious to me (Stu pointed out system->kynaptic, which is apparently obvious to Debian people). Even after it had installed gcc it didn't have "gcc" in the path. (It had gcc-3.3, I had to make my own symlink.) The installer didn't notice it had run out of disk space, leaving lots of packages broken until I figured this out and reinstalled. It spins the hard drive down every 15 seconds when it's on battery power (I need to tweak the init scripts, probably just run "laptop-mode stop" at the end). And so on, with a gazillion small idiocies...

But kbuntu is less than a year old, and the direction it's heading in seems to be promising. It's at last trying to serve the desktop market. It uses udev, it correctly recognized the Dell laptop's weird monitor size (widescreen: 1280x800), set up the sound properly... It's _trying_. It's not ready for end-users yet, but someday it might be, and in the meantime it's something I can hammer into a useful shape.

Jun 11, 2005

My new job has taken a lot of my time getting up to speed (especially learning perl). I've only gotten to check my personal email every three days or so.

What free time I've had in the past few weeks has gone to trying to keep busybox development from stalling again, since I seem to be the only lieutenant of Erik's more concerned about Busybox than uClibc.

That said, it seems to be easing off a bit, and I looked at a new laptop today that would make it a lot easier to work on this without network access. (Right now to do a build, I have to ssh into a faster server at home. My current laptop is a cheap replacement I bought used after Linucon, and it swaps itself to death just with the browser windows I keep open...)

We're having a Linucon concom meeting tomorrow, and I'd like to have a 1.0 release of FWL by Linucon (end of September) to give a presentation on.

Deadlines. What would I get done without them? Not much...

May 20, 2005

I've been slacking off a bit, I know. And to top it off, I got a new job, which I start monday.

My server in the closet is much faster than my laptop (and has twice as much memory, and doesn't have a couple dozen open Konqueror tabs causing it to swap its guts out so badly the mouse pointer freezes for 30 seconds at a time on a regular basis) when I'm _not_ running a compile in the background. As a result, I tend to want to ssh from my laptop to the remote system to run a build.

This turns out to be non-obvious. You can't just "./make 2>&1 | tee out.txt" because the build takes hours and if your network connection cuts out, the ssh session will die and kill your child processes (the build). I often want to kick a build off and go to lunch, which involves suspending my laptop...

In theory, "(./make > out.txt 2>&1 &)&" followed by "tail -f out.txt" is the answer, and the double fork does indeed detach the make process from its parent shell. (Of course ssh still refuses to exit if there are unfinished child processes, but this hang is a bug in ssh that is not shared by xterm, telnet, or anything else but ssh. Yes the ssh developers are aware of it and refuse to fix it because they're OpenBSD guys who hate Linux.)

In practice, if you try the second incantation you'll notice that User Mode Linux exits immediately. Why? If it gets EOF on stdin while initializing the stdio console, it throws a temper tantrum.

The answer is "(cat /dev/zero | ./make > out.txt 2>&1 &)&", which works.
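The same incantation wrapped as a reusable function, as a sketch; detach_build is a hypothetical name. The subshell backgrounds the pipeline and exits immediately, so the login shell no longer tracks the build as one of its jobs, and cat /dev/zero gives the build a stdin that never hits EOF.

```shell
#!/bin/sh
# Hypothetical sketch: detach_build LOG CMD [ARGS...] runs CMD detached
# from the current shell, with all output in LOG and an endless stdin so
# UML's stdio console never sees EOF.
detach_build() {
    log=$1
    shift
    (cat /dev/zero | "$@" > "$log" 2>&1 &) &
}
```

Usage is then detach_build out.txt ./make followed by tail -f out.txt, same as before.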

May 7, 2005

Ok, it's May, I can start writing "2005" in the date field now. (Went back and fixed all the previous dates back to January.)

So I'm finally popping my head up and looking around at what the rest of the world has been doing. I got subversion installed so I can merge patches into busybox, and rather than starting by merging my own patches I've been clearing up the patch backlog from the mailing list. Of course the only way to get people to reliably object to a patch is merge it, so I'm having to revert several of them, but still. Progress is being made.

I burned a Knoppix 3.8.1 CD. (I bought one at Penguicon, but it wound up in Cathy's luggage and is in Pennsylvania at the moment.) There's a show-stopper UI bug preventing me from actually upgrading my laptop to it yet (Konqueror resizes its window when you switch tabs. Web browsing is over 50% of what I do at the computer, and that kind of constant annoyance is just... *Shudder*...), but they're using Unionfs!

Yes, Linux now has a patch that provides working union mounts! I've been waiting for this for years. Now I just need to figure out how to get it to compile built-in rather than as a module. (Have I mentioned I hate makefiles?)

Debian stale is about to ship a new release still using XFree86 rather than X.org. The X.org switch happened at least two fedora core releases ago, and before the Ubuntu project was even commissioned, yet Debian "stable" still hasn't got its new release switched over yet. Is there any wonder it's acquired the nickname "Debian stale"?

Apr 30, 2005

Ok, back from Penguicon.

I finally broke down and installed subversion on my laptop, and I've been going through the busybox archives to check in some old patches that have been lounging around on the list. Once I get the patch backlog worked through a bit, I'll start doing my own fixes to the busybox to-do list I wrote. The upgrade to "find" that currently prevents most packages' "make clean" from working would be a good start.

In parallel, I need to upgrade the versions of the various packages and put together an installable version of the firmware image. (Doing the firmware image is easy, writing the installer that lets you do anything useful with it is the hard bit. Bit of a chicken and egg problem of needing a modified lilo to install it, which is in the new system that you boot into. Yeah, I could do a boot CD, and eventually probably will, but for the moment I'm thinking a UML image could lock a UBD onto a normal block device like /dev/hda, although getting the geometry information is tricky. UML can't read from a pipe through hostfs, so actually getting data out of the parent system's /proc can be problematic...)

It'll be a few more days before I get around to working on it, though.

Apr 18, 2005

I just put up a release, version 0.8. It builds the User Mode Linux version of the firmware image, from source. I need to redo the main page and put out some kind of announcement on freshmeat or some such, but not until I get my website moved to a faster internet connection.

I also did a first pass of a todo list. Which is not complete, of course...

P.S. I don't know why tar was unhappy. I was piping the output to "tee", and when I replaced gnu tee with busybox tee, it built all the way through. (I also moved it from the fedora machine upstairs to my knoppix laptop, so I could remove the network as a consideration as well.) So this seems to be a case of one GNU program not liking another GNU program, possibly with some interference from Fedora. Odd. I'll worry about it later.

Apr 17, 2005

The build process has been having the strangest crash. Right before the end, right as it finishes untarring the kernel tarball to build the UML instance to make the firmware-uml file out of, tar exits with:

linux-2.6.11/MAINTAINERS
linux-2.6.11/Makefile
linux-2.6.11/CREDITS
linux-2.6.11/README
tar: Error in writing to standard output
tar: Error is not recoverable: exiting now

But it successfully wrote that message, didn't it? And it only seems to happen if I let the whole build run: if I run just parts of it, the parts complete fine.

Odd. Still tracking it down. It would help if I could find a reproduction sequence that took less than 2 hours to run on my fastest machine...

Apr 12, 2005

It works, for certain values of works. For the moment, here's a test binary of the UML version. As I ranted about last time, it's 13.7 megabytes until I have time to strip it down.

My build script is creating a single file, firmware style UML executable with built-in root partition. (See link above.) You run it like a normal executable and it boots up UML and gives you a shell prompt. (No, ctrl-c doesn't work because that shell is attached to /dev/console of the UML instance, which doesn't give a controlling terminal. To-do item.) I need to splice the script snippets together into one big build script, and do a 0.90 release of the build for that. Hopefully tomorrow.

/bin/sh is running as pid 1 in that, and if you try to exec /sbin/init it gets confused. Right now when it tries to run busybox init, there's no inittab so busybox supplies a default inittab internally, which tries to open consoles on tty1 through tty3. The UML config tries to open those as an xterm, but support for that isn't built into this UML instance (mostly because it needs some helper program I don't seem to have), and it starts spamming the UML console with this:

Bummer, can't open /dev/tty3
Using a channel type which is configured out of UML
Using a channel type which is configured out of UML
Using a channel type which is configured out of UML
Using a channel type which is configured out of UML
Using a channel type which is configured out of UML

With init running there is a working shell attached to /dev/console, so you can type commands blind and they happen, but without being able to see the results about the only useful thing you can do is "halt". Then again, halt works beautifully with init running. (Without it, just exit the shell. Yes it'll panic "tried to kill init", but that shouldn't hurt anything. Call sync first if you have any writeable space mounted. I should fix that so exiting init syncs before panicking...)

Yes, it's 13.7 megabytes, although there are a lot of opportunities to trim it (some of which I detailed last time). I'll do a size reduction pass after I do an update pass (the versions of some of the tools I'm using are pretty old), which I'll do after I clean it up enough to do a release. (Although getting the static initramfs busybox to be built against uClibc instead of glibc I can do now. The UML instance might have to stay built against the parent system's library; it's statically linked but the headers it was built against are probably for a newer kernel than the systems it's expected to run on, so what syscalls is it making? Something to ask on the UML list when I'm more awake...)

The main remaining unresolved issue is where do I mount the hostfs so the build can access the parent's files. There are a few ways of doing this: it can supply its own environment entirely (which the test version is doing now, although its only writeable space is ramfs at the moment), or it can supply its own environment but attach the writeable hostfs at a known location, or it can glue one new directory on to a hostfs and use the host's tools and libraries to run stuff out of that directory.

Eventually I'll probably code up all those options, or at least examples of them. But for right now, I need to clean what I've got up, get my website using it, and ship it as 0.90...

Oh yeah: the "strip" command will chop off the entire root partition I've appended. Right now, Don't Do That Then (tm). Eventually, I should be able to make this look like an ELF section that strip can understand enough to retain, but that's way in the future.

Apr 11, 2005

Build bashing going along fine...

(I'd like to mention that World of Warcraft has already delayed work on this project by a month, probably with more to come...)

Apr 9, 2005

So last night, while the laundry was going, I sat down to make a firmware style UML image. Guess what? The losetup command opens the file read/write, and if the file is /proc/self/exe then it's a running executable and this fails with "text file busy". The mount is going to be read only, but the losetup command doesn't know that. The mount command knows this and can open the file read only when it does the loopback mounting (it doesn't shell out to losetup, it calls the two ioctls itself), but then you can't pass an offset. There's no offset option.

Now I can easily make a little C program to #include , fill out a loop_info struct, open(O_RDONLY) the file, and call ioctl(LOOP_SET_FD) and ioctl(LOOP_SET_STATUS). This is not brain surgery. But a possibly cleaner solution is to add a -r option to busybox losetup. Otherwise, simplicity says I might as well just move all the functionality of the initramfs/init script into the C program, but readability argues against it. I'd rather not open that can of worms if there's an option.

So I whipped up a patch to busybox. Dunno if they'll take it...

Anyway, with that patch, I now have a "firmware" style UML which weighs in at 14 megabytes and boots to a command prompt in a loopback mounted squashfs firmware linux root partition. (And there was much rejoicing.) I can trim several megabytes off that (the big low-hanging fruit is the two megabyte busybox in the initramfs: it's statically linked against glibc and is "make allyesconfig". That gzips down to much less than that in the initramfs cpio, but still. Then Matt Mackall's -tiny tree can trim a megabyte or so off the linux kernel, although a naive application of the whole patch results in a broken UML compile, so I'm going through the broken-out patches and adding them one at a time to see what works and what doesn't for UML...)

The 12 megabyte root partition I'll have to look at some more. A lot of that's gcc and binutils: they're enormous. Not a lot I can do about that except give the option to exclude them. And if they go, a lot of the rest of the system can get a quick trim for embedded usage. (The uncompressed size of some obvious candidates: /usr/include is 5915k, /usr/share/info is 4607k, /usr/share/locale 4678k, /usr/share/man is 1526k... I could parse the squashfs output to see how much space the compressed versions take up, or I could just delete 'em and see.)

But all that goes on the to-do list after getting a 0.9 release up with everything upgraded to the current versions of all the packages and an install mechanism that lets you actually boot into a firmware file. (And of course getting the build to make these images automatically rather than me doing it by hand. :)

Oh, and I found another bug, probably in UML. (Might be in busybox mount, but I doubt it.) After the pivot_root, I have the old initramfs mounted on /mnt and under that is proc mounted in proc. So I want to move that mount to /proc. When I try to "mount -o move sub/proc /proc", it hangs hard. I have a theory...

Apr 3, 2005

Ha!

I don't need the bingrep program to search for the start of the squashfs partition. The e_ident block at the start of an ELF executable's header is 16 bytes, and only bytes 0-8 are used; bytes 9-15 are padding. I can use bytes 12-15 to store a 32 bit offset for the start of squashfs and still have a three byte safety margin in case of future expansion.
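Storing and retrieving that offset takes nothing more exotic than dd and od. The sketch below assumes the offset is written little-endian into bytes 12-15 (the byte order isn't specified here), and `put_offset`, `get_offset` and `is_elf` are hypothetical helper names.

```sh
#!/bin/sh
# is_elf FILE: succeed if FILE starts with the ELF magic 7f 'E' 'L' 'F'.
is_elf() {
  [ "$(od -An -t x1 -N 4 "$1" | tr -d ' \n')" = "7f454c46" ]
}

# put_offset FILE OFFSET: overwrite bytes 12-15 (inside the EI_PAD area)
# with OFFSET, least significant byte first (an assumed layout).
put_offset() (
  file=$1 off=$2
  esc=$(printf '\\%03o\\%03o\\%03o\\%03o' \
    $(( off & 255 )) $(( (off >> 8) & 255 )) \
    $(( (off >> 16) & 255 )) $(( (off >> 24) & 255 )))
  # conv=notrunc patches the four bytes in place without truncating FILE.
  printf "$esc" | dd of="$file" bs=1 seek=12 conv=notrunc 2>/dev/null
)

# get_offset FILE: read bytes 12-15 back as a little-endian 32 bit number.
get_offset() (
  set -- $(od -An -t u1 -j 12 -N 4 "$1")
  echo $(( $1 | ($2 << 8) | ($3 << 16) | ($4 << 24) ))
)
```

Once the offset is known, something like `losetup -o "$(get_offset firmware)" /dev/loop0 firmware` could attach the appended squashfs, still subject to losetup opening the file read/write.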

This doesn't help me with the actual bootable firmware file, but it does help with making a firmware-style UML image. Which I should go do. (So naturally, I'm fixing busybox's CONFIG_FEATURE_SH_STANDALONE_SHELL. But after I do that...)

Apr 1, 2005

New code is up. No, it's not an April Fools' joke. (Never really liked that holiday anyway; I've never seen the humor in trying to trick people.) You can download it here.

Currently, it's pretty cheesy. Download the tarball and run ./make.sh, and when it finishes the file tmpdir/workspace.img is an ext2 image containing the root filesystem. Not very useful at the moment, but you can loopback mount it and chroot into it and play around a bit.

The next step is to create firmware-uml and firmware-linux images you can run and try it out with. The UML version will just run ala "./firmware-uml" so you can play with it. Using the actual bootable firmware-linux one would be a bit of a chicken and egg problem because you need the modified copy of lilo it contains in order to install it. I mentioned last time a potential approach using ./firmware-uml with a script that attaches a UBD device to a boot device to run lilo against, if the drive geometry querying isn't too big a deal. I'll have to play with it.

I'm working my way to a 0.9 release. The build got to the end, so I'm putting a snapshot of it online to replace the hideously out of date code that's up there.

I have a list of things I want firmware linux to accomplish, and it's not quite doing most of 'em yet. A 1.0 release is where it does all of 'em.

Show that busybox can fully replace the GNU tools.

I want to show that busybox can entirely replace the GNU tools, for regular real-world use. The heaviest test case I can come up with here is using them in a development environment to build an entire Linux distribution. Hence firmware linux should be able to rebuild itself under itself, with just busybox and a compiler toolchain.

I'm most of the way there. Busybox has replaced most of the GNU packages, but there are still specific bugs or omissions in things like gzip, find, diff... (See the busybox TODO file. I wrote it and it's basically my busybox todo list for firmware linux.)

Firmware image

I want a single file that boots and runs the whole of Linux, including any bundled applications. Note that such a system wouldn't have a package management system like .RPM or .DEB files, because the firmware image IS its package management system. This makes sense for things like bootable CD/DVD distros (which are read only anyway), for corporate workstations, for embedded devices... It's one more package management option, an atomic read-only system image you can easily upgrade and version control.

I don't know how squashfs interacts with rsync. It's possible that upgrading multiple workstations via rsync would be pretty easy for small changes, or it might require upgrading the whole image anyway. I also don't know what it would take to combine rsync and bittorrent, or how much work it would be to make a bittorrent-based block device...

Desktop stuff

I want to be able to use it on my laptop to replace knoppix. Right now, it isn't even running X.

Configurable build

It should be possible to make a really stripped down version that doesn't have build tools in the final image. (Right now it's a little over 10 megs with gcc and all the header files and such; without it should be possible to get it under 3, and with the linux-tiny tree down to about 2.)

Administrative stuff

I need to move my website off the server in Eric Raymond's basement. I've been paying for a fairly beefy cable modem connection with a static IP for well over a year now, yet where's my website? Behind a 40 kilobyte per second DSL line in Pennsylvania that isn't even mine. Putting up an 80 megabyte tarball for download from there is impolite at best...

I also need a download option where the 80 megs of source code aren't bundled with the build scripts, but instead have a script that downloads them all from their original websites. (Not brain surgery, I just need to do it.) And for the one big tarball version, I should set up bittorrent.
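A download script along these lines would cover the first half of that. It's a sketch under assumptions: the two-column manifest format, the `fetch_sources` name and the `DRYRUN` switch are all made up here, and it relies on wget being installed.

```sh
#!/bin/sh
# fetch_sources MANIFEST DEST: MANIFEST holds "filename URL" per line
# (hypothetical format). Files already present in DEST are skipped; with
# DRYRUN set, the wget commands are printed instead of run.
fetch_sources() (
  manifest=$1 dest=$2
  mkdir -p "$dest" || exit 1
  while read -r name url; do
    # Skip blank lines and comments.
    case $name in ''|'#'*) continue ;; esac
    if [ -f "$dest/$name" ]; then
      echo "have $name"
    elif [ -n "$DRYRUN" ]; then
      echo "wget -O $dest/$name $url"
    else
      # Remove a partial download so a rerun retries it.
      wget -O "$dest/$name" "$url" || { rm -f "$dest/$name"; exit 1; }
    fi
  done < "$manifest"
)
```

Something like `fetch_sources sources.list sources` would then only hit the network for tarballs that aren't already there, which also keeps the one-big-tarball version and the download-it-yourself version working from the same list.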

Getting what I currently have booting in a firmware image (which is just a matter of doing it), plus the administrative changes to the website, is enough for the 0.9 release. Possibly this weekend, if I'm lucky...

Mar 30, 2005

Spent yesterday playing World of Warcraft on my fiance's computer, but today the WoW servers are down (again), so I'm taking that as a sign I should get real work done.

The only tweak to the /tools directory in the second stage was changing the gcc .spec file to link against /usr/lib instead of /tools/lib, and it did that right after building uClibc (so there was something to link against). In theory, uClibc doesn't link against external libraries, so that tweak can be done before building uClibc and thus /tools can be a readonly squashfs mount. In practice, gcc wants libgcc and a few other things to build _anything_. The fix is to make /usr/lib a symlink to /tools/lib until the end of the uClibc build, and then replace the symlink with an empty directory before doing the uClibc install. So now /tools can be a squashfs mount.
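Spelled out as shell, that juggling looks roughly like this. `ROOT` and the two function names are hypothetical; ROOT exists only so the steps can be rehearsed in a scratch directory rather than on a live /usr.

```sh
#!/bin/sh
# usr_lib_borrow: before building uClibc, make $ROOT/usr/lib a symlink to
# the read-only /tools/lib so gcc still finds libgcc and friends.
usr_lib_borrow() {
  mkdir -p "$ROOT/usr" || return 1
  if [ -L "$ROOT/usr/lib" ]; then
    rm "$ROOT/usr/lib"
  elif [ -d "$ROOT/usr/lib" ]; then
    rmdir "$ROOT/usr/lib" || return 1   # refuses unless the dir is empty
  fi
  ln -s /tools/lib "$ROOT/usr/lib"
}

# usr_lib_restore: just before the uClibc install, swap the symlink for a
# real, empty directory so the install lands in /usr/lib, not /tools/lib.
usr_lib_restore() {
  if [ -L "$ROOT/usr/lib" ]; then
    rm "$ROOT/usr/lib" || return 1
  fi
  mkdir -p "$ROOT/usr/lib"
}
```

Using rmdir instead of rm -rf means the borrow step refuses to clobber a /usr/lib that already has something in it.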

Eventually, this will allow me to ship a unified UML binary (User Mode Linux plus appended squashfs filesystem) of the first half of the build, which can do more flexible kinds of builds, with the option to do things like not have any build tools in the resulting system, to save space. That comes later, though.

It's also getting close to time to think about an installer. I use a modified version of lilo, and having them install that on a parent system is a bit silly. A bootable CD would be good (that's what I did a few years back), but another fun little thing I can do is make a UML based installer that runs my modified version of lilo against a UML Block Device (UBD) that's set up to point at one of the real block devices (/dev/hda or some such). Feeding it the appropriate drive geometry might take a bit of doing, though...

Mar 28, 2005

I'm up at Trianon, a coffee shop in the arboretum, and the sun has just gotten to the right angle to shine off the cars in the parking lot, through the window, and into my eyes. Sigh.

I bought a new computer. I shouldn't have, but it was $170 or so at Fry's ($250-ish with extra memory and taxes) and came preinstalled with Linux ("Linspire"), and presumably working 3D. I dunno the details yet, I immediately swapped the hard drive out from my server and put it in the closet, and I'm running builds on it. (It's got a 1500 MHz "AMD Sempron 2200", which is three or four times as fast as my previous server.)

Lots of progress getting the build together. The tools build is working, and the second half of the build (equivalent of LFS chapter 6) is getting de-bitrotted. (Mostly directories moved under it.) Hopefully, I can get a snapshot up later this evening.

Only _after_ I get the new version up do I worry about updating all the apps to current LFS versions, applying the -tiny tree to the linux kernel, upgrading the various busybox apps I need to fix, redoing the web page, installing the firmware thing on the server and moving my web page to it...

Home stretch...

Mar 22, 2005

Blaisorblade on the UML mailing list has a patch for the problem I've been seeing with hostfs permissions, so possibly I'll be able to stop mucking about with a ramfs mount over /dev early on. That would be cool.

More cool is that my contract at Dell ends friday (Yay!) so I may finally have a bit of free time (non-sleep deprived, even) coming up to seriously get this project in shape. That would be good, my current code is way past what's on the website, and I also need to install it on my home server and get my website moved over to run under it.

The other thing I did recently was move all the scripts and source under a firmware-linux directory to make it easy to ship a tarball. (No more "download sources.tar.bz2 and download these scripts", it's all one tarball now with sources in a subdirectory.) Once I get my new server configured, I can make a torrent out of it...

Mar 17, 2005

I'm playing with Matt Mackall's "tiny" tree. Lots and lots of little patches to figure out what they do. (Among other things, they can make UML smaller...)

Updating everything to current versions is an important to-do item after I get the build back together and producing an actual runnable firmware image. One of the things I intend to do is create the correct list of where all the source code can be downloaded from. (Among other reasons, I can never remember http://ep09.pld-linux.org/~mmazur/linux-libc-headers/ and have to keep looking it up. Each time. I can't spell "Mariusz Mazur", either...)

As always, the sourceforge mirror system screws things up by not making it easy to select a darn canonical download location. (The only way to get a URL you can point wget at is to pick a mirror and fight with your browser for a bit. And may I say how NICE it is that they get you to a page that isn't the download you want, but auto-starts the download, so there's no easy point at which you can right click on a link to "save as". You have to hit "stop" to abort the download it's trying to do (with the way my browser is set up, pointing it at a tarball will _OPEN_ the tarball and view its contents with a file manager), and then right click on the "if your download doesn't auto-start" link...)

Whoever designed sourceforge's mirror system really should be punished...

Another issue is seeing what else is out there other than gcc and binutils. There are lots of free compilers out there, but eliminating the closed source ones (like Intel's), the 16 bit ones, and the ones that aren't on Linux narrows the list a lot. Ideally, I'd want one that could be a reasonable replacement for GCC, not just on x86. It has to be able to compile the Linux kernel (which is tricky with lots of gcc extensions and dependencies on things like inlining in the proper place), produce reasonable code, and C++ support would be nice too.

Fabrice Bellard's Tiny C Compiler is interesting, but with no C++ support it can't handle python or anything derived from qt (like Konqueror), the code it generates is still pretty sucky (big and slow, although the compiler itself zooms through the build amazingly fast), and although it has compiled the linux kernel (google for tccboot) it wasn't an unmodified Linux kernel because it's missing a few things yet. On the bright side, the fact the codebase is small and simple means it's the easiest of the lot to extend.

SGI has GPLed their "Open64" compiler, and a company called PathScale derived a commercial thing from that (which is confusingly licensed, I _think_ it's still GPL, but their website doesn't make it easy to find out unless I want to download a free 30 day trial version...) I think PathScale gets eliminated for sheer cluelessness on that front, and I don't know much about Open64 yet.

There are a few other possibilities; I really need to spend a lot of time trying them out, and that won't be any time soon.

Now that busybox really can replace the gnu tools in an actual development environment, my goal with a system based on busybox, the linux kernel, and lilo is to present a working system to Richard Stallman that hasn't got any GNU code in it and ask him if he'd still call it "GNU/Linux". Building with gcc, he'd probably say yes, even though the FSF hasn't been in charge of gcc since the EGCS fork inherited the name. (Presumably, if I build with Intel's compiler I have Intel/Linux.) Shipping gcc in the result, he'd definitely claim credit for it...

Actually, whatever I do he's still going to say yes, but I'd like to highlight the absurdity of it as much as possible. Everybody needs a hobby...

I should put up my todo list on the main firmware linux page...

Mar 16, 2005

Wandered to Metro (my local 24 hour coffee shop with wireless internet) with my laptop to get some work done on Firmware Linux. Spent the evening banging on the makefiles of User Mode Linux instead. Of course...

I don't even LIKE make files, but I want to do away with the snapshot .config file I'm using for the UML build and instead cat the new symbols I'm switching on to the end of a .config produced by "make allnoconfig", and then run "echo '' | make oldconfig" to update it. This is only guaranteed to work if I don't have to remove any existing symbols, and right now there's CONFIG_LD_SCRIPT_STATIC=y vs CONFIG_LD_SCRIPT_DYN=y, and it's basically one boolean (are we statically linking or not).

So I'm puzzling out enough of the kernel's makefile syntax to make it an if/else condition on just the static symbol, which defaults to unset and is one of the ones I currently add for TT mode.

I probably don't even really NEED to do this, since oldconfig seems to zap the DYN symbol if both it and the STATIC one are set, but still. There should be only one, so I'm cutting the other symbol's head off and ripping it out of the config menu too. (I should probably zap CONFIG_STATIC_LINK too, but not tonight...)
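The append-then-oldconfig trick gets safer if the append step first deletes whatever the base .config already says about each symbol being appended. A minimal sketch (`merge_config` is a made-up name) that handles both the `CONFIG_FOO=y` and `# CONFIG_FOO is not set` spellings:

```sh
#!/bin/sh
# merge_config BASE FRAGMENT: append FRAGMENT to BASE, first deleting any
# line in BASE that sets (or unsets) a symbol FRAGMENT mentions, so
# oldconfig never sees two values for one symbol.
merge_config() (
  base=$1 frag=$2
  # Symbol names from "CONFIG_FOO=..." and "# CONFIG_FOO is not set".
  syms=$(sed -n -e 's/^\(CONFIG_[A-Za-z0-9_]*\)=.*/\1/p' \
                -e 's/^# \(CONFIG_[A-Za-z0-9_]*\) is not set$/\1/p' "$frag")
  tmp=$base.merge.$$
  cp "$base" "$tmp" || exit 1
  for s in $syms; do
    # The trailing "=" keeps CONFIG_FOO from matching CONFIG_FOOBAR.
    sed -e "/^$s=/d" -e "/^# $s is not set\$/d" "$tmp" > "$tmp.new" &&
      mv "$tmp.new" "$tmp" || exit 1
  done
  cat "$frag" >> "$tmp" && mv "$tmp" "$base"
)
```

After `make allnoconfig`, calling `merge_config .config extra.config` and then running oldconfig (`yes '' | make oldconfig` if more than one new question comes up) should leave one value per symbol. It won't notice that STATIC and DYN are mutually exclusive, though; the fragment would have to say `# CONFIG_LD_SCRIPT_DYN is not set` explicitly.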

So I spend the evening solving a problem only I'm ever likely to notice, but I get to rip stuff out and throw it away. Always a plus...

On an unrelated note, I'm putting /tools (LFS chapter 5) in a squashfs appended to a usermode linux instance. This means everything in it is read-only, but after I build the new uclibc I need to update a spec file so that what it builds gets linked against the library in /lib rather than /tools/lib. Since uclibc is statically linked, I don't know that having the version of the spec file that links against /tools around for the uclibc build is necessary at all. I should test that...

If I'm going to play around with UML much longer, I _really_ need a version of dump_stack that can write to an arbitrary print function. I'm just saying...

Also, the /tools directory is about as big as the final build because the binaries in it are HUGE. Looks like the old duplicate library in the search path problem triggering whatever bug that is in the gnu linker. To-do item to track this down, I've taken a whack or two at it and it's not easy...

Oh, and I'm playing around with getting busybox applets to compile into independent executables, too. It's pretty simple for some of 'em:

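#!/bin/sh

# Build the shared helper library (libbb.a) once, up front.
make -C libbb || exit 1

# standalone.c is the glue file; the -D flags point its generic
# APPLET_main/APPLET_full_usage names at the applet being built.
for i in sed awk patch vi
do
  APPLET=$i
  FILES=editors/${i}.c
  gcc -Os -s -o "$APPLET" standalone.c $FILES libbb/libbb.a -Iinclude \
    -DAPPLET_main=${APPLET}_main -DAPPLET_full_usage=${APPLET}_full_usage
done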

Others take actual thought. And turning the above into makefile syntax is unlikely to be fun, of course. (Did I mention I really don't like makefiles?)

Mar 15, 2005

Woah, long time no update.

I spent last weekend thumping on User Mode Linux to fix the console output issue (patch sent to the list, and accepted into the queue), and along the way learned more about the Linux tty layer than man was meant to know. I have an inkling why they keep saying it needs to be cleaned up, and will probably look into it some more soon.

The busybox mount rewrite on Feb 4 has been "done save for polishing" for a while now. I polished that a bit too, and need to resubmit it to the busybox list. (Or wrestle with the darn bug database, although this is more a new feature than a bug.) I should replace the sort patch on the firmware page with that, since the sort patch is now checked in to busybox for the next release. (Speaking of longstanding patches, I need to dig up and polish my bzip2 compression side rewrite and get that in, and also there's recently been interest in my old init rewrite that got eaten by the many month busybox freeze, both the feature freeze before 1.0 and the endless repository switch to subversion afterwards.) Naturally, now that all that's cleared up and I can get stuff in again, I'm swamped with day jobness...

None of this is necessary for moving firmware linux forward, of course. (The UML console thing is cosmetic, and I worked around the mount bug ages ago.) Now that Linucon's website is on Mark's server (ooh, another thing that's been eating time: Linucon hotel contract negotiation. Once again, not going smoothly, but nobody's surprised...) my upstairs server is free to be reinstalled with firmware linux. If I just have time to put it together.

A really simple "proof of concept" I can post to the UML list is a UML instance that has a squashfs appended, and does the hostfs mount trick to borrow the parent filesystem long enough to open /proc/self/exe and loopback mount said squashfs. I'm thinking of making build stage 2 work like this, although I haven't figured out if I want the 80 megabytes of source code to stay on the hostfs or be in the squashfs. Probably the former, but I have to figure out the cleanest way to do it...

I moved all the build stuff under a single directory last week, with sources under there and the scripts in there, so that I can make just one tarball you extract, cd into, and run. I need to make time to finish that...

Feb 27, 2005

Grumble grumble grumble...

Replacing /dev with a minimal ramfs substitute is a hard problem. The gnu toolchain blows chunks left and right for subtle reasons, and often the actual build break is several steps after where the problem was...

Yet using the host system's /dev via hostfs is brittle and requires several things to be replaced anyway: /dev/loop0 has the wrong permissions, and /dev/console only belongs to your user if you're logged in under X, and not when logged in from a text console or via ssh. And that's what broke on the distros I've tried, who knows what else is strange elsewhere?

So to get something portable, I need to make the ramfs /dev work, and just trying to see what breaks as I go along is frustrating and time consuming. So I'm setting up parallel builds with the original /dev and the ramfs /dev, so I can stop them at various points and compare what's diverged.

Doing things right can really be a lot of work sometimes...

By the way, strace -f still segfaults under UML. Dunno why. (Possibly it segfaults without it, I don't know.) Replacing the actual system binary of /usr/bin/ld with a script ala "#!/bin/sh\n\necho $* >> logfile\nld $*" proved useful drilling through the "collect2" wrapper, though. When in doubt, use brute force...
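One wrinkle with that kind of wrapper: if its `ld` call resolves back to the wrapper itself on the $PATH, it recurses forever. A sketch that avoids this by moving the real binary aside and exec'ing it by absolute path (the `ld.real` name and the `install_ld_logger` helper are hypothetical):

```sh
#!/bin/sh
# install_ld_logger DIR LOG: move DIR/ld to DIR/ld.real and install a
# wrapper in its place that appends each invocation's arguments to LOG,
# then execs the real linker with the arguments unchanged.
install_ld_logger() (
  dir=$1 log=$2
  [ -e "$dir/ld.real" ] || mv "$dir/ld" "$dir/ld.real" || exit 1
  # Unquoted heredoc: $dir and $log expand now, \$* and \$@ at run time.
  cat > "$dir/ld" <<EOF
#!/bin/sh
printf '%s\n' "\$*" >> "$log"
exec "$dir/ld.real" "\$@"
EOF
  chmod 755 "$dir/ld"
)
```

Something like `install_ld_logger /usr/bin /tmp/ld.log` before the failing step, then reading /tmp/ld.log, shows exactly what collect2 handed the linker. Using printf instead of echo keeps an argument like -n from being swallowed.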

Feb 26, 2005

It's taking a little while to figure out what I need to add to /dev to make the /tools build happy. (I could just run the Linux From Scratch makedev script in there, but I'm trying to replace that with udev soon so adding another instance of it would be kind of uncool.)

The ar from binutils 2.14 needs access to either /dev/urandom or /dev/random, or else it reports an incorrect error message (that the archive it's trying to create doesn't exist). This problem does NOT occur with a more recent binutils (maintained by somebody other than the FSF, and yes I plan to upgrade after I get an actual release out doing the firmware thing).

So the build broke on my laptop. I took it to a faster machine running Red Hat Enterprise instead of knoppix and the break happened there too, so I hacked the mktools-umlsetup.sh script to run /bin/sh instead of mktools-build.sh, ran the build script by hand, and then ran the offending line that caused the break in the shell. But I forgot to set the path to include /tools/bin like my build was doing, and the problem wouldn't occur.

I lost hours to this. Naturally, my first instinct was to think something else went screwy with UML, but UML is actually getting pretty robust. (By process of elimination it would seem.) Once I figured out /tools/bin/ar behaved differently than /usr/bin/ar, and managed to reproduce the problem from the command line, I tracked down the actual problem pretty quickly by running ar under strace. Yes, in UML. This used to segfault immediately, but now it works. (Yes, this is running a ptrace-based debug tool under an emulation environment based on ptrace. So I'm ptracing the thread doing the ptracing, and it _works_. The mind boggles. Pretty cool, though.)

So that's how I spent my evening. (Well, that and reading Roger Zelazny's "The Changeling" while variants of the build ran to the breaking point over and over. And this is not counting the first half of the evening, spent watching my fiance play world of warcraft and eating (I'm not making this up) leftover cherry custard/cheesecake pizza from Mustache Pete's. It might actually be better cold out of the fridge than it was hot out of the oven...)

New code drop on the website Real Soon Now (tm). But not today.

Feb 24, 2005

Finally feeling better, just in time to help my fiance move in with me. Much packing, but managed a little coding time in there too.

Various idiosyncrasies using UML with a hostfs root filesystem have been dealt with. A new tools build should be up in the next day or two.

The tools build script is now three parts. The master build script (mktools.sh) compiles User Mode Linux, and then runs the rest of the build under that.

User Mode Linux starts out with a hostfs mounted copy of the root filesystem, so we can use all the build tools and libraries of the host system and don't have to supply a binary filesystem image of our own. (I'm trying very hard to let this thing build under any reasonably sane Linux distro. A system that can only build itself under itself just isn't flexible enough.)

Unfortunately, there are a number of problems with trying to actually do much under a hostfs mounted root filesystem. For one thing, hostfs is weird about /dev entries: the permission check is done as if they were normal files, which means if the host user (the user running the UML binary) can't access it, then the root user inside UML can't access it either. This means that /dev/console is inaccessible when running from a tty rather than an xterm, and /dev/loop0 is inaccessible either way.

Also, even though we've got writeable space we can't mknod or chown in it, and when extracting a tarball as root, tar exits with an error if it can't chown the files to the UID and GID of the original owner. (Yeah, there's a long option ala --don't-do-that-then in gnu tar, but not in busybox tar, and it's too obscure for me to really want to add it to busybox.) And on top of that, we still need to make the /tools entry at the top of the root directory, and the user running UML can't write there...

The solution to all this is the mktools-umlsetup.sh script, which UML runs to set up an environment we can write into. It makes a half gig loopback mounted ext2 image, bind mounts all the hostfs directories into it, adds a ramfs /dev with the devices we need (with the right permissions) and the UML procfs, redirects stdin/stdout/stderr to the new /dev/console just in case UML couldn't access the old one, and chroots into it to run the third script, mktools-build.sh.
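Condensed into shell, those setup steps read roughly as follows. Treat it as an outline rather than the actual mktools-umlsetup.sh: the directory list, device modes, the chroot script path and the `RUN` dry-run hook (set `RUN=echo` to print the commands instead of running them) are all assumptions. The major/minor numbers are the standard Linux ones.

```sh
#!/bin/sh
# uml_setup IMG MNT: outline of building a writeable root inside UML.
uml_setup() (
  RUN=${RUN-}
  img=$1 mnt=$2
  # Sparse half-gig image; ext2 lets the UML root user chown and mknod.
  $RUN dd if=/dev/zero of="$img" bs=1M count=0 seek=512 || exit 1
  $RUN mke2fs -q -F "$img" || exit 1
  $RUN mount -o loop "$img" "$mnt" || exit 1
  # Borrow the host's top-level directories (an illustrative subset).
  for d in bin etc lib usr; do
    $RUN mkdir -p "$mnt/$d"
    $RUN mount --bind "/$d" "$mnt/$d"
  done
  # Private ramfs /dev with just what the build needs (binutils 2.14 ar
  # wants /dev/urandom or /dev/random).
  $RUN mkdir -p "$mnt/dev" "$mnt/proc" "$mnt/tools"
  $RUN mount -t ramfs dev "$mnt/dev"
  while read -r name mode type major minor; do
    $RUN mknod -m "$mode" "$mnt/dev/$name" "$type" "$major" "$minor"
  done <<EOF
null 666 c 1 3
zero 666 c 1 5
random 666 c 1 8
urandom 666 c 1 9
tty 666 c 5 0
console 600 c 5 1
loop0 660 b 7 0
EOF
  # UML's own procfs, not the host's.
  $RUN mount -t proc proc "$mnt/proc"
  if [ -n "$RUN" ]; then
    $RUN chroot "$mnt" /mktools-build.sh
  else
    # Point stdio at the new console in case UML couldn't open the old one.
    exec chroot "$mnt" /mktools-build.sh \
      <"$mnt/dev/console" >"$mnt/dev/console" 2>&1
  fi
)
```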

If you're not using UML, you can skip mktools-umlsetup.sh and just run mktools-build.sh instead. To do this, you need to be running the same (or newer) kernel version we're building, and you must be running the build as root.
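Both of those requirements are cheap to check up front. A sketch, where `version_ge` and `can_build_natively` are hypothetical names; the comparison only handles numeric dotted versions and simply drops any -rc or -mm suffix, so 2.6.11-rc4 counts as 2.6.11, which is looser than strictly correct.

```sh
#!/bin/sh
# version_ge A B: succeed if dotted version A >= B, comparing numerically
# field by field (so 2.6.10 > 2.6.9). Text after the first '-' is ignored.
version_ge() (
  a=${1%%-*} b=${2%%-*}
  while [ -n "$a$b" ]; do
    x=${a%%.*} y=${b%%.*}
    x=${x:-0} y=${y:-0}      # a missing field counts as 0
    [ "$x" -gt "$y" ] && exit 0
    [ "$x" -lt "$y" ] && exit 1
    # Drop the field just compared.
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  exit 0
)

# can_build_natively TARGET: the non-UML path needs root and a running
# kernel at least as new as TARGET.
can_build_natively() {
  [ "$(id -u)" -eq 0 ] || { echo "must run as root" >&2; return 1; }
  version_ge "$(uname -r)" "$1" ||
    { echo "kernel $(uname -r) is older than $1" >&2; return 1; }
}
```

A build script could then do `can_build_natively 2.6.11 || echo "use the UML path instead"`.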

The other change is that all the writing the build does is now done into a directory called "tmpdir". At the end of the build, tmpdir contains a tarball (tools.tar.bz2) with the new /tools directory it built. Everything else in there is a tempfile that can be deleted.

What I'd like to do is use a variant of the firmware linux technique to glue a squashfs onto the end of the UML instance, containing the /tools and the source code. That way, there's a self-contained build executable that you just run ./build and away it goes. Unfortunately, to get this to actually _work_, the init program in the initramfs needs to know A) the complete path to the UML executable, so it can do the hostfs mount of that directory and loopback mount the executable to get at the squashfs, and B) the current directory, so it can hostfs mount that and create a temporary file holding a loopback mountable ext2 filesystem for writeable space. (This could be in /tmp, but assuming that /tmp has 500 megs of free space is a much more questionable assumption than assuming the current directory does.)
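One way init could learn those two paths is for the wrapper to pass them on the kernel command line, with init picking them out of /proc/cmdline. A sketch of the parsing half; the parameter names fwl_exe and fwl_cwd are invented for illustration, and it assumes the paths contain no spaces. (The kernel also hands unrecognized name=value parameters to init as environment variables, so reading $fwl_exe directly may work too.)

```sh
#!/bin/sh
# Host side, roughly (hypothetical parameter names):
#   ./linux rootfstype=hostfs rootflags=/ rw \
#     fwl_exe="$(readlink -f "$0")" fwl_cwd="$PWD"

# cmdline_value NAME [CMDLINE]: print the value of NAME=... from a kernel
# command line (default: /proc/cmdline); fail if NAME isn't there.
cmdline_value() (
  set -f   # don't glob-expand words like init=/bin/*
  line=${2-$(cat /proc/cmdline)}
  for word in $line; do
    case $word in "$1"=*) printf '%s\n' "${word#*=}"; exit 0 ;; esac
  done
  exit 1
)
```

Init would then do something like `exe=$(cmdline_value fwl_exe)` and hostfs mount `${exe%/*}` to reach the executable.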

So anyway, the new flow of control for the first half of the build is to run ./mktools.sh, which builds UML and uses it to run mktools-umlsetup.sh, which runs mktools-build.sh once it's got a sane environment to do so in. Then you grab tmpdir/tools.tar.bz2 and delete tmpdir.

That is, once I get the new build scripts and tarball uploaded. Still tweaking stuff...

Feb 19, 2005

Still sick. This cold just hangs on...

Okay, instead of using a ramfs to make / writeable in the UML instance, I've switched to making a half-gig loopback mountable ext2 image and using that. This means that /tools (I.E. the cross-compiled toolchain created by Linux from Scratch chapter 5) doesn't have to be a symlink, it can just be at the root level of the ext2 image, and everything under it is fully writeable space that the UML root user can chown and mknod and such into. Life is good.

I was worried for a bit that reiserfs-addicted distros like suse might not have mke2fs available, but I checked SuSE 9.2 and that had it, so I'm declaring victory and moving on.

Feb 15, 2005

Sick since the 9th. (*cough*) If this explanation makes even less sense than usual, well, my head isn't entirely clear.

So User Mode Linux is building (2.6.11-rc4 is fine out of the box and it looks like 2.6.11 final might actually be usable). I built it at the start of mktools.sh and tried to run the rest of mktools under it, ala:

linux rootfstype=hostfs rootflags=/ rw init=/path/to/myscript.sh

That uses the parent system's filesystem, but runs my script within the new UML kernel. First I tried out just running UML as root with init=/bin/sh, upgrading the kernel headers to the new versions I couldn't use under the old kernel (because they caused anything trying to use the resulting uclibc to segfault), and running the existing mktools.sh to build a system. That eventually worked, although on the first try the gcc build failed because it's such a memory hog that a system with 32 megabytes of ram and no swap (what UML allocates for itself by default) runs out of memory trying to build gcc. Adding mem=48M to the UML command line got me past that, and everything else went smoothly. (Adding the keyword "quiet" to the UML command line is nice, too.)

So then I tried to modify the mktools.sh to build and use UML automatically. Not only that, but to get everything to work as a normal user rather than as root. In theory, running UML as the normal user should let me use the simulated root user inside UML to do all the mount and chroot dirty work I need to do. In practice, filesystem permissions are still a bit of a problem.

UML still needs a filesystem with runnable binaries in it, so we're borrowing the parent system's filesystem via hostfs. (Yes I could supply my own binaries, but that's what the tools directory is, and this is the script to build the tools directory.) So even though we're root within the UML, the hostfs filesystem we're using only gives us the access permissions of the user the UML binary is running as. So when we try to meddle with the filesystem to chown files, or create or delete files where we don't have write access, it fails.

In theory, /tools doesn't need any files with nonstandard permissions or ownership, and once we've got /tools we can . In practice, some tools get very confused when root can't do stuff.

The first hurdle is that I couldn't create the /tools symlink in the hostfs. (The symlink has to be at the top of the filesystem so binutils and such can access libraries and the shared library loader at the right path before the chroot actually puts it at that location.) No problem: I whipped up a little script to mount a ramfs, --bind mount all the top level directories into that (and duplicate any top level symlinks), and then chroot into that directory. Voila, have the UML run the script and the root directory is now a writeable ramdisk. (Okay, not quite voila since the busybox mount bug on January 23 was uncovered by this, but I got it working eventually.)
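The ramfs-plus-bind-mounts step can be written as a loop over the top level of the old root. A sketch: `mirror_root` is a made-up name, the caller is assumed to have already mounted the ramfs (for example `mount -t ramfs ramfs /newroot`), proc and dev are skipped on the assumption that UML provides its own, and `RUN=echo` turns it into a dry run.

```sh
#!/bin/sh
# mirror_root SRC NEWROOT: recreate SRC's top level inside NEWROOT,
# bind mounting directories and copying symlinks as symlinks.
mirror_root() (
  RUN=${RUN-}
  src=${1%/} new=$2
  for path in "$src"/*; do
    name=${path##*/}
    case $name in proc|dev) continue ;; esac
    if [ -L "$path" ]; then
      # Duplicate top-level symlinks rather than following them.
      $RUN ln -s "$(readlink "$path")" "$new/$name"
    elif [ -d "$path" ]; then
      $RUN mkdir -p "$new/$name"
      $RUN mount --bind "$path" "$new/$name"
    fi
    # Plain files at the top level are ignored.
  done
)
```

After `mirror_root / /newroot`, creating /newroot/proc, mounting the UML proc there and running `exec chroot /newroot /bin/sh` would give a root directory that's writeable while still running the host's binaries.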

The next problem is that when extracting a source tarball as root, gnu tar tries to chown all the files to their original owner. And this generates an error on hostfs, so after 8 gazillion error messages tar exits with an error code, bringing my build script to a halt. Grrr. It doesn't try to do this if it's not running as root, but if it thinks it's running as root and the filesystem doesn't agree, the gnu code makes a mess.

This is a small problem with a surprisingly thorny potential resolution set. There is a --no-same-owner option to the gnu thing that no other tar in the world recognizes, so as-is busybox tar would barf on that option. If I'm modifying busybox tar I can just hack up a quick patch to never call chown and build that right before building UML, although building anything in busybox causes it to compile the whole libbb directory because nobody's ever bothered to do accurate dependencies for that library, they just let the linker cherry-pick what it wants at the end. And who knows if any of the actual build or install phases will barf on a similar attempt to chown something?
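If gnu tar is what's on the $PATH, its --no-same-owner option is one way around the chown errors without touching busybox at all. A sketch that only adds the option when `tar --version` reports GNU tar (`tar_owner_flag` is a hypothetical name, and matching on the version string is an assumption about how each tar identifies itself):

```sh
#!/bin/sh
# tar_owner_flag: print --no-same-owner if the tar on $PATH is GNU tar,
# otherwise print nothing so other tars don't choke on the option.
tar_owner_flag() {
  if tar --version 2>/dev/null | grep -q 'GNU tar'; then
    echo --no-same-owner
  fi
}
```

`tar $(tar_owner_flag) -xjf foo.tar.bz2` then runs with either tar, although busybox tar would still attempt the chown and fail on hostfs.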

I could try to call 'whoami' before UML is run, pass the username in, and have su call tar as that user, but that's pretty ugly too and I suspect that pam or shadow passwords (hostfs can't read /etc/shadow) or something similar might find a way to interfere with this.

I can also mount a filesystem that has full write access, and do the build in that. This can't be a ramfs because last I checked the high water mark of the mktools build is 370 megabytes, so I guess I should create a loopback mountable filesystem. The obvious candidate is ext2, but what if somebody runs this on SuSE, which has an unhealthy fascination with reiserfs? Can I assume mke2fs is available there? Should I build it from source before calling uml?

I hate having a cold.

Feb 4, 2005

Been a bit busy. Got engaged, had a birthday, adopted a feral kitten, lots of day job stuff, the Linucon hotel search... And of course my personal programming time has been mostly devoted to completely rewriting busybox mount. (Which is almost done, by the way.)

The major to-do areas for the weekend are:

Get my build actually producing a firmware image. (Use gnu cpio to make the initramfs for now, there's source code in the kernel tarball that I mean to glue into busybox, but that can come later.)
Boot said firmware image. (Mostly a question of making the appropriate initramfs script to mount and pivot_root and such.)
Update all the packages to current versions. (The tricky bit is the compiler and binutils, which I hacked for uClibc and to merge /lib and /usr/lib. I'm staying with the old bash because I hate the new "smarter" tab completion in bash 3. Maybe I WANT to tar xvjf a file that doesn't end in tbz or bz2, ever think of that?)
Build under User Mode Linux, and upgrade to newest kernel headers.
Add uClibc++ and corresponding C++ support to gcc.

That ought to be more than I can get done in one weekend. :)

January 23, 2005

Spent six hours tracking down a User Mode Linux bug that turned out to be a busybox bug.

I now have a working, more or less mainline version of UML (2.6.11-rc1-mm2). Build that with hostfs and go:

./vmlinux rw rootfstype=hostfs rootflags=/path/to/chroot/dir init=/bin/sh

(Still some idiosyncrasies with the console permissions to work out, but I think I can work around that with an initial ramdisk...)

So I'm making a little script to run the build under User Mode Linux. Since part of the point of this is to avoid needing to run as root, the hostfs I'm mounting when I switch into UML is effectively read only. In particular, a normal user can't create the /tools symlink (from Linux From Scratch chapter 5) in the top level of hostfs. You may be root within UML, but the hostfs server is running as a normal user on the parent machine.

To get around this, I'm making a script that mounts a ramfs, bind mounts all the directories at the root level into it, does some other prep work (like mounting the UML proc, which has different info than the host proc), and then chroots into it. Voilà: we have UML running with the host's binaries, but with the ability to write to the root filesystem. We also did the equivalent of a chroot without needing to be root, and we can run binaries linked against the uClibc we build without segfaults, even if the kernel headers we build uClibc against are newer than the kernel we're running. The UML kernel is new enough to understand the syscalls the new kernel headers tell that uClibc to make, and it translates them into standard libc calls on the host.
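Roughly, that setup looks like this in C. This is a sketch under assumptions, not the actual script: the /tmp/newroot location, the join_path and setup_writable_root names, and the caller-supplied list of top-level directories are all hypothetical. It has to run as root inside UML; the hostfs server underneath is still a normal user on the host.

```c
#define _GNU_SOURCE          /* declares chroot() even under strict -std= */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical location of the writable root; the real script may differ. */
#define NEWROOT "/tmp/newroot"

/* Write NEWROOT "/" dir into buf; returns -1 if it would not fit. */
int join_path(char *buf, size_t size, const char *dir)
{
    int n = snprintf(buf, size, "%s/%s", NEWROOT, dir);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* dirs[] lists top-level host directories such as "bin", "usr", "etc".
 * It should not include "proc", which gets UML's own proc instead. */
int setup_writable_root(const char *const dirs[], int ndirs)
{
    char target[256], source[256];

    assert(ndirs == 0 || dirs != NULL);
    mkdir(NEWROOT, 0755);                    /* EEXIST is fine */

    /* A ramfs is writable even though the hostfs top level is not. */
    if (mount("ramfs", NEWROOT, "ramfs", 0, NULL)) return -1;

    /* Bind-mount each host directory into the ramfs. */
    for (int i = 0; i < ndirs; i++) {
        if (join_path(target, sizeof target, dirs[i])) return -1;
        snprintf(source, sizeof source, "/%s", dirs[i]);
        mkdir(target, 0755);
        if (mount(source, target, "none", MS_BIND, NULL)) return -1;
    }

    /* Mount UML's proc, which differs from the host's. */
    if (join_path(target, sizeof target, "proc")) return -1;
    mkdir(target, 0755);
    if (mount("proc", target, "proc", 0, NULL)) return -1;

    /* Switch into it; "/" is now writable, so /tools can be created. */
    if (chroot(NEWROOT) || chdir("/")) return -1;
    return 0;
}
```

Calling mount(2) directly like this passes the bind mount straight to the kernel, so it does not depend on any mount binary's type detection.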

The thing that didn't work was trying to do a bind mount within uml. "mkdir /tmp/walrus; mount -o bind /proc /tmp/walrus" worked fine on the parent system, and worked fine in a chroot, but it refused to work under the UML I built.

I spent six hours tracking this down. Learned a lot about UML and how the linux kernel in general handles syscalls along the way. (I love being able to stick printf calls in the kernel source to make it tell me what it's doing. Being able to compile and run the result without rebooting is really nice too.) Still, this is no replacement for the thing actually WORKING.

The result? It turned out to be a bug in busybox mount. Once I'd stuck a printf into execute_syscall_tt (in arch/um/kernel/tt/syscall_kern.c) to tell me the syscall number, looked up what each number meant in include/asm-i386/unistd.h, then looked up what function that connected to in arch/um/include/sysdep-i386/syscalls.h, and then tracked down the actual functions (mostly in places like fs/open.c and fs/stat.c, but some are in arch/um/kernel/syscall_kern.c)... Well, I now have a lot more appreciation for strace. (Too bad it doesn't seem to run under UML yet...)

Busybox wasn't actually making the "mount" syscall for my bind mount. For the proc mount, yes, but not for the second one. Instead, it opened /proc/filesystems, and once I figured out it works if I go "mount -t anything -o bind /proc /tmp/walrus", I knew what the problem was.

If I specify the type (which isn't _used_ when you do a bind or move mount), life is good. This is because I didn't select any of the block device backed filesystems when I was configuring UML. (No point, I'm only using hostfs and ramfs at the moment, and I want to see what the minimum feature set I can strip it down to is.)

The problem is that when busybox mount is doing an "auto" mount (which it does when you haven't specified a filesystem type), it reads /etc/filesystems (an obsolete relic I don't have), and then /proc/filesystems to get the list of filesystem types supported by the kernel, and skips all the ones labeled "nodev". It tries the mount with every filesystem type it doesn't skip, and stops trying with the first success. That's how auto mount is implemented.
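To make the failure mode concrete, here is a rough C sketch of the candidate-list step described above. It is not busybox's actual code: auto_mount_candidates is a hypothetical helper that works on the text of /proc/filesystems (one entry per line, with "nodev" before the tab for filesystems that don't need a block device), and it skips the /etc/filesystems step entirely.

```c
#include <assert.h>
#include <string.h>

/* Copy the names of block-device-backed filesystems from a
 * /proc/filesystems-style buffer into out[], skipping "nodev" lines.
 * Returns the number of candidates found (at most max). */
int auto_mount_candidates(const char *procfs_text, char out[][32], int max)
{
    int count = 0;
    const char *line = procfs_text;

    assert(procfs_text != NULL && out != NULL);
    while (*line && count < max) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *tab = memchr(line, '\t', len);

        /* Each line is "[nodev]\t<name>"; lines without a tab are ignored. */
        if (tab && !(tab - line == 5 && strncmp(line, "nodev", 5) == 0)) {
            size_t name_len = len - (size_t)(tab + 1 - line);
            if (name_len > 0 && name_len < 32) {
                memcpy(out[count], tab + 1, name_len);
                out[count][name_len] = '\0';
                count++;
            }
        }
        if (!end) break;
        line = end + 1;
    }
    return count;
}
```

With a stripped-down UML kernel where every registered filesystem is nodev, this returns 0, so a loop that tries mount(2) once per candidate never makes a single syscall.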

Since I don't have any non-nodev filesystems, it reaches the end of /proc/filesystems (and reports a failure) without ever having tried to do an actual mount. This is despite the fact that, with -o bind or -o move, the type is completely ignored and could be anything. Filesystem type "walrus" works just fine with a bind mount. But not if it never actually tries a mount...
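One way to avoid that, sketched under the assumption that the caller has already turned -o options into mount(2) flags: check for MS_BIND or MS_MOVE before doing any type detection, since the kernel ignores the type for those. The needs_type_probe name is hypothetical, and this is not necessarily how a rewritten mount.c would structure it.

```c
#include <assert.h>
#include <string.h>
#include <sys/mount.h>   /* MS_BIND, MS_MOVE, MS_RDONLY */

/* Decide whether mount has to go hunting through /proc/filesystems for
 * a type before calling mount(2). fstype is whatever -t said, or NULL
 * if -t was not given. */
int needs_type_probe(const char *fstype, unsigned long flags)
{
    /* Bind and move mounts ignore the type, so any string will do. */
    if (flags & (MS_BIND | MS_MOVE))
        return 0;
    /* An explicit type other than "auto" is used as-is. */
    if (fstype && strcmp(fstype, "auto") != 0)
        return 0;
    return 1;
}
```

With a check like this up front, "mount -o bind /proc /tmp/walrus" could go straight to mount("/proc", "/tmp/walrus", "none", MS_BIND, NULL) without reading /proc/filesystems at all.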

I've been meaning to rewrite busybox mount.c for a long time, because the code could definitely be cleaned/tightened up. I now have an incentive, it seems.

January 17, 2005

Made some progress, wrote about it in my LiveJournal.

January 15, 2005

Spent a week of Copious Free Time (tm) fighting with UML. Results are mixed.

The "gcc couldn't find ld" thing turns out to be brain damage from the Free Software Foundation: if "." is in the path (even at the very end, as the default $PATH of uml puts it), gcc decides it knows better than the sysadmin and ignores the entire path. Thank you, Free Software Foundation. That took a long time to find because I never expected them to be that arrogant, or so stupid that if they had a problem with the $PATH they'd silently ignore it rather than giving an error message. (And you wonder why I'm trying to rip out all their stuff and replace it with something sane?)
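When chasing this kind of thing, a quick check for whether $PATH names the current directory might save some time. This is a hypothetical helper (path_has_cwd is not part of gcc or busybox). It treats both a literal "." and an empty component (leading, trailing, or doubled colon) as the current directory, since POSIX gives an empty PATH entry that meaning. It only detects the condition; it says nothing about exactly how gcc reacts to it.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if path contains "." or an empty component, else 0.
 * A NULL path (e.g. getenv("PATH") with PATH unset) returns 0. */
int path_has_cwd(const char *path)
{
    const char *p = path;

    if (!p) return 0;
    for (;;) {
        const char *colon = strchr(p, ':');
        size_t len = colon ? (size_t)(colon - p) : strlen(p);

        if (len == 0 || (len == 1 && p[0] == '.')) return 1;
        if (!colon) return 0;
        p = colon + 1;
    }
}
```

Typical use would be path_has_cwd(getenv("PATH")) before kicking off a build, with getenv() coming from <stdlib.h>.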

Blaisorblade's 2.6.9-bb4 tree provides a fairly stable user mode linux, and that's apparently the trick to getting something usable. But it's not perfect: now that I know how to get gcc to work, the makefile is hanging after the ./configure stage. (Spins eating 100% of the CPU and never makes progress.) Maybe it's a disguised out of memory condition with the OOM killer not getting triggered. UML claims 32 megs for itself on boot, which _should_ be enough to build things but possibly isn't. I'll try again with mem=64M perhaps... Nope. Same hang.

Meanwhile, the UML guys insist their patches are getting merged into 2.6, and they are. But 2.6.10-rc1 doesn't build UML for me, with a stripped down .config disabling all the subsystems I think I can do without. The first error (duplicate definition) was easy to fix: pick one and yank it. The second (lots of other stuff undeclared in arch/um/kernel/sys_call-table.c) requires more work. Maybe I should grab the -bk top and see what that does...

Or I could just say "screw it, UML still ain't finished" and just go upgrade all the packages I still use out of Linux From Scratch to the 6.0 versions. Probably a better use of my time. Come back to UML when 2.6.11 comes out...

Meanwhile, busybox and uclibc just introduced some random defect tracking database that all changes are now supposed to go through. Since I'm never going to use it, I suppose I should just maintain a patch list here on my little project page and occasionally post a notification to the list when I make some new change to busybox. Hmmm...

January 8, 2005

So I'm still fighting with User Mode Linux, which I need to build a version of uclibc with more recent kernel headers than the build environment the system is being compiled under. (Right now, firmware linux won't build on anything older than a 2.6.6 kernel, which is just wrong. But applications linked against the uclibc it builds have to run on the parent kernel. With user mode linux, this would not be the case...)

I just joined the UML mailing list, which took a while to find. (I found it because I know how sourceforge works and where it keeps its mailing lists. Later, I found that way down the list of links on the user-mode-linux.sf.net web page is a contacts link, which links to the mailing lists. Right.)

I posted an info dump about the problems I've been having with UML, and hopefully somebody will say something useful about it soon.

By the way, the UML build has a menuconfig option for a path to a directory containing the root partition to cpio up an initramfs out of. But the normal x86 build does not seem to have this option. Luckily, I know how to use cpio to make one. (Well, a small one worked anyway. The knoppix initrd hasn't yet, but the night is young...) Unluckily, busybox doesn't support cpio, so there's another to-do item while I grab the conventional (gnu?) cpio package and throw it in the build list...
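For reference, the archive format the kernel unpacks for an initramfs is cpio's "newc" variant, which is simple enough to write by hand. The following is a minimal sketch (cpio_newc_entry and cpio_newc_trailer are hypothetical names) that only handles regular files with in-memory contents, with uid, gid, and mtime all zeroed. A real tool such as the kernel's usr/gen_init_cpio.c also handles directories, device nodes, and symlinks.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one newc entry at offset off in out[] and return the new offset.
 * The caller must make sure out[] is big enough: 110 header bytes plus
 * the name, its NUL, the data, and up to 6 bytes of padding. */
size_t cpio_newc_entry(unsigned char *out, size_t off, unsigned ino,
                       unsigned mode, const char *name,
                       const void *data, size_t datalen)
{
    size_t namesize = strlen(name) + 1;      /* includes the trailing NUL */

    assert(out != NULL && name != NULL);
    /* 110-byte header: magic, then 13 fields of 8 hex digits each:
     * ino mode uid gid nlink mtime filesize devmajor devminor
     * rdevmajor rdevminor namesize check. */
    off += (size_t)sprintf((char *)out + off,
        "070701%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X",
        ino, mode, 0u, 0u, 1u, 0u, (unsigned)datalen,
        0u, 0u, 0u, 0u, (unsigned)namesize, 0u);
    memcpy(out + off, name, namesize);
    off += namesize;
    while (off % 4) out[off++] = 0;          /* pad header+name to 4 bytes */
    memcpy(out + off, data, datalen);
    off += datalen;
    while (off % 4) out[off++] = 0;          /* pad file data to 4 bytes */
    return off;
}

/* The archive ends with an empty entry named "TRAILER!!!". */
size_t cpio_newc_trailer(unsigned char *out, size_t off)
{
    return cpio_newc_entry(out, off, 0, 0, "TRAILER!!!", "", 0);
}
```

A one-file archive would then be a single cpio_newc_entry() call for "init" with mode 0100755 (a regular executable file) followed by cpio_newc_trailer().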

Getting close to the actual one-file-system here. All the pieces work, now it's just putting them together...

January 5, 2005

Squashfs!

This is the filesystem I want to use. Its compression is comparable to cloop's (slightly larger on the knoppix root partition in my tests, but much better than zisofs), without cloop's speed penalty or its need for module arguments to specify a filename.

Yes, it's an out-of-tree filesystem, but it does what I want, which none of the in-tree filesystems quite manage. It's not like my build process isn't applying numerous patches to various things already... :)