

December 20, 2010

So far, Moscow is like a strange cross between Minneapolis, Detroit, and New Jersey. Except for the language thing. And they seem... not better dressed, exactly, but I don't see old/worn clothing. (Maybe it's under the layers?) There's a bit of tagging on buildings and such. (Wow, they got refugees from Hurricane Katrina too?)

And the body language is different, it says to me "I don't care if you live or die, but I'm going to be polite about it", which is oddly reassuring. (It makes the women seem less threatening and the men more.) There are some _stunningly_ pretty women here, and I don't feel as intimidated by them as I normally would because "they don't care if I live or die, but are going to be polite about it" is a situation I can understand and relax about. I know what's expected of me. In the US (where making eye contact is a form of sexual harassment), there's all this subtle "subliminal conflicting flirting signals I can't understand" stuff which makes me think they're trying to sell me something or that high school _really_ screwed them up. My country is full of people who put on make-up and high heels and then get angry if anyone notices. Or doesn't.

I may be just a little bit jetlagged.


December 18, 2010

QEMU git is borked in Ubuntu 10.04. I bisected the qemu mips bug to commit ec990eb622ad46df5ddc, and the commit before that works for qemu-system-mips.
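
For the record, the bisect itself was just the usual dance, something like this (the known-good endpoint is a placeholder; the boot test at each step is whatever reproduces the bug on your setup):

cd qemu
git bisect start
git bisect bad master              # current git misbehaves for mips
git bisect good LAST_GOOD_TAG      # placeholder: some revision that still worked
# at each step, rebuild and boot-test the failing image by hand, then report the verdict:
./configure --target-list=mips-softmmu && make
git bisect good                    # or "git bisect bad", until it names the first bad commit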


December 17, 2010

Building LFS on non-x86 targets produces inconsistent results, which is the most frustrating type of result. For example, sometimes on mips a package will complain about being unable to recognize the machine type, as if "gcc -dumpmachine" is returning "xx", but I ran gcc -dumpmachine in a for loop that'd break out if it ever returned something other than "mips-unknown-linux" and it chugged along for a couple minutes without complaint. (Under both bash and ash.)
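
The loop was nothing fancy, roughly this (the expected tuple is whatever gcc -dumpmachine normally prints on that target):

while true
do
  X=$(gcc -dumpmachine)
  [ "$X" != "mips-unknown-linux" ] && { echo "got: $X"; break; }
done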

On another run, the mips build made it further and died building bash, with the error:

./mksyntax -o syntax.c
make: ./mksyntax: Command not found
make: *** [syntax.c] Error 127

Which is odd because it had previously made mksyntax:

gcc -DPROGRAM='"bash"' -DCONF_HOSTTYPE='"mips"' -DCONF_OSTYPE='"linux-gnu"' -DCONF_MACHTYPE='"mips-unknown-linux-gnu"' -DCONF_VENDOR='"unknown"' -DLOCALEDIR='"/usr/share/locale"' -DPACKAGE='"bash"' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./include -I./lib -I./lib/intl -I/home/bash/lib/intl -g -o mksyntax ./mksyntax.c

But when I told bash to build again, it completed. (This involved a fresh copy of the bash source directory, and a complete configure/make/install in the new directory.)

Perhaps the mips semaphore implementation isn't protecting things it needs to?


December 17, 2010

My new job has been keeping me busy enough that I haven't fired up my home laptop in a few days, and tomorrow I fly to Russia until Christmas Eve.

I've been blogging about work over at my old livejournal.

I still intend to get an Aboriginal Linux release out by the end of the month, mostly what's in source control right now plus better documentation.


December 12, 2010

After Camine went home last night I set my laptop running buildall.sh to build all the Aboriginal Linux targets with the unstable uClibc snapshot, and it filled up the disk. This makes XFCE really unhappy; I've freed up 4 gigs since then, but the available disk space applet seems to be stuck...

Ah, it had popped up a modal dialog which was hidden behind other windows. Right.

So, powerpc, sh4, sparc, m68k, and alpha are all horked. Alpha isn't checked in (and isn't finished, because no qemu system support last time I checked and the hardware was abandoned a decade ago anyway) so that doesn't count. I haven't been paying close attention to m68k because QEMU doesn't support it properly and the patch aranym needs for all the virtual devices is too big to maintain.

The sh4 target was already broken in 0.9.31, although now it's failing in uClibc and it used to make it all the way to the kernel before breaking:

arch/sh/kernel/process_32.c:303: error: conflicting types for 'sys_execve'
/home/landley/aboriginal/aboriginal/build/temp-sh4/linux/arch/sh/include/asm/syscalls_32.h:24: error: previous declaration of 'sys_execve' was here

But I hadn't particularly been paying attention to that one ever since the sh4 maintainer had a "platform limiting moment". It's on my todo list to fix, the same way sparc shared libraries are.

The other two (powerpc and sparc) built under 0.9.31 but break in the uClibc build with current -git. Emailed the list...


December 11, 2010

I'm slowly grinding through the to-read heap for work. I feel kind of bad that all this past week was taken up with HR paperwork and travel arrangements and setting up my new laptop and so on, and that I haven't actually gotten any _code_ written for them yet. (I'm aware that ramp-up at a new job is normal, but even after all these years I still feel really weird getting paid to do it. Being hired for your expertise isn't the same thing as a 100% fit that lets you hit the ground running, but it seems like it SHOULD be. As a consultant there was never really _time_ to come up to speed on anything.)

I'm also actually poking at http://kernel.org/doc a bit, checking in some tweaks I made over the past year and adding the 2010 OLS papers. I need to get quadrolith to update the Documentation and htmldocs directories from a cron job (although the htmldocs are once again not compiling cleanly). Is there really only a _draft_ proceedings on the OLS website for 2010, months after the actual conference happened? (One which has the abstract for the Open Invention Network guy's talk but not an actual paper?)

Somebody made noises about taking over kernel.org/doc a few months ago, but it didn't happen. I suppose I should yank the topic index from the main page (since I'm unlikely ever to fill it out, and other people have written books on the kernel already), so that the resources that ARE there (sections 1 and 2 basically) don't get lost in the noise.


December 10, 2010

Got up at 5:30 am, drove to Houston to visit the Russian consulate, where I spent a couple hours filling out visa application paperwork with the help of the nice lady womaning the desk. (The consulate was staffed by extremely pretty women and extremely dour men. I had not previously encountered "dour" in person, but there is apparently a use for that word. Wikipedia is of the opinion that Russian culture has different body language, and that smiling in public is considered gauche over there. So "frowning while being friendly" is just something you get used to.)

Got a dozen empanadas from Marini's on the way home. Camine was visiting, so they didn't quite last until dinner. According to the little visa application receipt I get to pick it up at 4pm on the 15th, so possibly I'll drag Fade along with me this time and we can Empanada in person.


December 9, 2010

Reading about cgroups for my new job (as part of Parallels' OpenVZ team). I'm also reading this and this and this and this and this and the net/sunrpc directory of the kernel source code and a few other things buried in my tabsplosion.

Alas, my first week on the job (I started Monday) has not actually allowed me a lot of time to plow through the above. Human resources paperwork to be received, read through, filled out, and faxed. Parallels sent me a laptop (with Windows 7 on it), so I tracked down who knew the login password, shrank the partition, installed Xubuntu LTS on it, and when I get back to that I need to figure out how to get the built-in wireless to work (according to dmesg it's not finding the iwl firmware, gotta track that down and install it, then fling git on there and set up a working environment).

And of course travel arrangements. I'm visiting Russia (to meet the rest of the engineers and get up to speed) on the 18th, which is a surprisingly expensive proposition at the last minute (only had a couple thousand liquid cash in the bank, and between the ticket and visa applications... I think I can avoid cashing out investments and just wait for reimbursements and/or paycheck to arrive, but it's going to be tight around here until then).

Needing a visa is a new experience, neither Canada nor the UK require one for US citizens visiting for less than 6 months. The Russian Consulate in Houston is open until 6pm but stops accepting visa applications at noon. I missed that detail, all prepared to start the 3 hour drive at 10am, checking to make sure I had everything I needed (I didn't, they don't accept personal checks so you need a cashier's check, and they need a passport sized photo stapled to the application meaning not the one in your passport)... And I find out they close 2 hours into a 3 hour drive.

Poked the various visa services the consulate recommends if you don't want to show up in person, but I confirmed that they just go down to the consulate and apply in person for you. So they couldn't do anything today either, plus _they_ need the passport photo (and my passport) physically mailed to them (to arrive _before_noon_ if it's to do any good)...

Yeah, getting up reeeeally early in the morning.


December 7, 2010

I've unsubscribed from the democratic party's mailing list and unfollowed @barackobama. I can't take the spinelessness anymore, the preemptive unilateral "compromise". I don't mind rooting for somebody who fights and loses, but I will not support someone whose opening position is to concede on a regular basis.

From FISA to yanking Wikileaks' DNS, these guys are laughable on Net Neutrality. Guantanamo is still open a year after the deadline he set for closing it, and he escalated in Afghanistan to make it _his_ war. Airports have the porno-scanners and the Freedom Grope. The commander in chief never _tried_ to issue an executive order suspending enforcement of "don't ask don't tell", and appealed a court order that actually _had_ it stopped for a few days. The lack of a public option in the health care thing means I'll soon be legally obligated to buy from the for-profit insurance companies that caused the problem in the first place. Citizens United happened on his watch and all he did was express lip-service dismay. The gulf oil spill was a huge opportunity to propose alternative energy research and it never even came UP.

And of course the last straw: when I voted in the midterms, it wasn't for tax cuts for the rich. (Clue: the recession came _after_ Bush passed his tax cuts. What makes you think renewing them is good for the economy when passing them in the first place so obviously wasn't? Borrowing money from China does not strengthen this country, it didn't when half the stimulus was wasted and it doesn't now.)

I no longer care what Obama does, or that dishrag Reid, because it bears no fruit. Nancy Pelosi at least DID HER JOB, but she's the one who lost it. I can't escape the conclusion that a vote for a Democrat is a wasted vote, because they do not effectively oppose Republicans. Heck, they can't keep the blue dogs in line. (The paralysis in the senate is their fault too: if you have 60 seats and they have 40, 1/3 of your guys can sleep and you still have enough guys on hand to win votes. Keep the senate in session 24/7 until their health gives out, they're not exactly _young_. Bill Clinton forced a government shutdown to neuter Gingrich and it _worked_, the other side was made to pay a political price for their actions. But Obama has all the offensive capabilities of Jimmy Carter, and that's really sad.)


December 3, 2010

When I try to build Linux From Scratch for all targets in parallel, armv4l, mips, and mipsel all fail building the "gmp" package:

configure: WARNING: you should use --build, --host, --target
checking build system type... Invalid configuration `xx': machine `xx' not recognized
configure: error: /bin/sh ./config.sub xx failed
Command exited with non-zero status 1

But when I launch a build of one of those targets from the command line (more/native-build-from-build.sh armv4l build/control-images/lfs-bootstrap.hdc), it builds just fine!

I hate debugging inconsistent behavior. It's reproducible, but only as part of a much larger LUMP. My first guess at what's happening is that I've replaced config.guess with a call to "gcc -dumpmachine" (the build-one-package.sh script does this as part of package setup, something like the sketch below), and that's bouncing off of distcc, and when the system's under sufficient load the distcc call is failing and retrying locally, and that failure is outputting an error message that confuses the steaming heap of autoconf that called it.
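
For reference, the config.guess replacement is roughly this shape (the real build-one-package.sh may differ in detail):

cat > config.guess << 'EOF'
#!/bin/sh
gcc -dumpmachine
EOF
chmod +x config.guess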

But that's just a guess. And why it's always those three targets (consistently, reliably) is...

Weird.

Sigh. Added logging to the gcc -dumpmachine output (piped it through tee) and the next build, _zlib_ died, with "make: vfork: Function not implemented".

You know, I think what's making the build work is "-j 1". If so, that's kind of sad...


December 1, 2010

Yeah, I'm overdue for getting an Aboriginal Linux release out. My real deadline is Monday, when I start my new job telecommuting for Parallels. The todo list between now and then is:

There's a bunch of other todo items like refreshing the snapshots and redoing the cron jobs to properly test the git repos that I should really do, but which aren't blocking the release. (I'm holding off on uClibc upgrade, or trying to make more stuff build with the busybox "hush" shell, until after the release.)

The main new feature this release is a Linux From Scratch 6.7 control image, to natively build the Linux From Scratch 6.7 packages under each system image. This keeps the existing toolchain and C library (so I don't build glibc, binutils, and gcc, although I probably could), and installs each new package into the existing root filesystem (gradually replacing most of the busybox stuff as it goes along).

This turned out to be a necessary step before trying to bootstrap Gentoo, Fedora, or Debian, because "this distro's build system doesn't work" and "the packages it's trying to build don't work" are two separate problem sets which are only manageable if handled separately. I think natively building LFS has flushed most of the obvious problems out of the base packages. I've got stub internationalization support, and several fixes for things busybox and uClibc were doing wrong.


November 29, 2010

My laptop only has 2 gigs of ram, and running the memory hog that is chrome means it's constantly swapping, even when I'm not doing some crazy disk hog activity (like more/build-control-images.sh).

Every hundred characters or so, vi calls fsync() on its swap file, because making sure what I'm typing hits disk is apparently more important than letting me continue typing. This causes pauses of about a minute (and much worse behavior is easy to trigger).

In theory, "set swapsync=n" tells it not to do that. In practice, this hasn't been working for me with Ubuntu's increasingly damaged vim since I installed 9.04 (which I'm still using because newer versions of ubuntu are even less pleasant to deal with). I eventually got really sick of sitting there waiting for vi to stop thumb twiddling, and hit it with a big hammer.

I made a file "shutup.c" containing this:

int fsync(int fd) { return 0; }
int fdatasync(int fd) { return 0; }
void sync(void) { return; }

Then built it with:

gcc -fpic -c shutup.c
gcc -shared shutup.o -o shutup.so

Move shutup.so into /usr/local/lib, and replace the /usr/bin/vi symlink (which was a symlink to /etc/alternatives/vi which was a symlink to /usr/bin/vim.tiny because Ubuntu is insanely overcomplicated for no apparent reason)... Anyway, replace the symlink with this shell script:

#!/bin/sh

export LD_PRELOAD=/usr/local/lib/shutup.so
exec /etc/alternatives/vi "$@"

And viola (a type of stringed instrument), vi no longer takes an insane amount of time to do anything, because it calls NOP stubs for the sync functions instead of waiting for data to hit disk, which doesn't do me much good because :recover doesn't work reliably anyway.


November 28, 2010

Granularity and module organization is always the hard part of design. I want a "just do it" button for convenience (build.sh), and I want every single piece individually callable so you don't have to sit through unnecessary work (sources/sections), and I want nice intermediate levels that make logical sense (all the scripts build.sh calls). And I don't want to overwhelm the user with a tardis console full of options they don't know how to use.

Case in point: the native-build.sh stuff. This mounts three disk images into the virtual system, /dev/hda is the root disk (squashfs, read only), /dev/hdb is a 2 gigabyte ext2 partition mounted on /home so the development environment has writeable space, and /dev/hdc is the control image containing build scripts and source code to drive automated builds. (Possibly the script should be called "automated-build.sh", or "use-control-image.sh", but renaming at this point when I'm not 100% sure of the new name would just cause more confusion.)
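
Under the covers the emulator invocation looks something like this (a sketch only: the exact image filenames, kernel name, and console arguments vary by target):

qemu-system-mips -nographic -no-reboot \
  -hda image-mips.sqf \
  -hdb hdb.img \
  -hdc lfs-bootstrap.hdc \
  -kernel zImage-mips -append "root=/dev/hda console=ttyS0"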

The native-build.sh script packaged with each system image launches the emulator and hooks up those three drives to perform an automated build. Along the way, it's also deleting any existing hdb.img so the build will happen fresh from the beginning each time it's re-run. The re-use of existing hdb.img is where today's design tangle comes up. The default behavior is safe, convenient, and limiting.

If the script _doesn't_ delete hdb.img, then running the build twice can screw things up. Running two builds with different control images will create different /home/control-image-name directories and possibly fill up the space on the disk.

This is running them one after the other. Running two at once would be bad because mounting the same disk writably from two systems will corrupt the filesystem. I _could_ make it so the hdb.img is essentially a temp file with a random name that gets deleted right after qemu launches, which would make two parallel runs in the same system image (I.E. building in the same target architecture using two different control-images) seamlessly work. (Remember, Linux won't actually reclaim the disk space while a program has the file open.) But it's too black magic and hard to explain, and means you can't look at the disk after the emulator exits to see logs and such, and in general makes me uncomfortable.

Right now you _can_ specify which file to use with $HDB. I suppose what I should do is have it only delete any existing hdb.img if you didn't specify a file to use. This is more subtle than I'm comfortable with, but _not_ cleaning up leftover debris from previous builds introduces subtle breakage I'm not comfortable with either.
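
In shell terms, the behavior I'm leaning towards would look something like this (a sketch; $HDB is the existing knob, the rest is illustrative):

if [ -z "$HDB" ]
then
  # nobody asked for a specific scratch image, so clean up the default one
  HDB=hdb.img
  rm -f "$HDB"
fi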

I need more documentation.


November 27, 2010

Ran memtest86 on quadrolith for 14 hours, and it didn't find anything wrong.

I built everything on my laptop and all the targets I expected to build did so. (Sparc, sh4, and mips64 didn't work and I need to fix them, but that's not a big surprise.) That builds with -j 3, I.E. number of CPUs plus 50%, and my laptop has two, so I did a nofork CPUS=3 build over on quadrolith, and lots of stuff broke, in seemingly random ways.

Randomness is generally caused by some weird asynchronous race condition (otherwise the darn thing fails in a deterministic manner, even for things like uninitialized variables), so I tried building with FORK=1 CPUS=1 (which builds all the targets in parallel, but builds each target with -j 1), and that built all the targets successfully. So it's happily building more than a dozen architectures in parallel, but only as long as each architecture is a separate single-threaded build happening in its own directory.

Something about gentoo screws up builds with more than -j 1. It's not the hardware, it's the OS I have installed on the box.

Sigh.


November 26, 2010

Intermittent bugs are creepy. For example, the uClibc build for armv5l just did this on quadrolith (my 4-way cron job server):

In file included from libc/misc/time/_time_localtime_tzi.c:8:
libc/misc/time/time.c: In function '__time_localtime_tzi':
libc/misc/time/time.c:738: error: 'LONG_MAX' undeclared (first use in this function)

That tzi.c file is a wrapper around time.c, which #includes limits.h on line 138, which #defines LONG_MAX. So, as far as I can tell, that should work just fine. And in fact running it again several times in a row has worked just fine.

This could be a problem with quadrolith, perhaps bad memory. It could be something that only shows up doing a -j 6 build with really precise timing (but... what?) It could be some deeply obscure gcc bug, or something with ccwrap, or some kind of weird kernel problem with dentry caching, or some kind of crosstalk due to the hardlinks sharing a file.

Except that if dentries are going bye-bye or file contents are getting corrupted, gcc should have complained about something ELSE first, such as not being able to find a file to #include or having unbalanced #ifdefs. Bad memory leads to lots of weird intermittent errors and kernel panics.

Just re-ran buildall and it went into an endless loop in the armv4eb build, configuring gcc:

./config.status: line 680: test: -gt: unary operator expected
expr: syntax error
./config.status: line 680: test: -gt: unary operator expected
expr: syntax error

And then ran it again and it didn't do that.

Sigh. Either the component package builds are insanely non-deterministic in the presence of more than 2 CPUs worth of parallelism, it's unhappy running on a Gentoo host, or quadrolith's hardware is horked. No hints in dmesg about any of this.

I'm used to doing development/stabilization cycles. I'm even used to things that USED to work having weird regressions for no obvious reason (because I updated packages and such). But I really want _DETERMINISTIC_ bugs.

How hard do I have to hit this, and with what, to make these darn failures deterministic? Sigh...

Step 1: run memtest86 on the box, and leave it running overnight.


November 22, 2010

I hate udev.

Currently, I hate this bit of udev's Makefile:

# move lib from $(libdir) to $(rootlib_execdir) and update devel link, if needed
libudev-install-move-hook:
        if test "$(libdir)" != "$(rootlib_execdir)"; then \
                mkdir -p $(DESTDIR)$(rootlib_execdir) && \
                so_img_name=$$(readlink $(DESTDIR)$(libdir)/libudev.so) && \
                so_img_rel_target_prefix=$$(echo $(libdir) | sed 's,\(^/\|\)[^/][^/]*,..,g') && \
                ln -sf $$so_img_rel_target_prefix$(rootlib_execdir)/$$so_img_name $(DESTDIR)$(libdir)/libudev.so && \
                mv $(DESTDIR)$(libdir)/libudev.so.* $(DESTDIR)$(rootlib_execdir); \
        fi

If you don't specify --prefix, this hunk of unnecessary complexity will attempt to mv the libraries it just installed from /usr/lib to /lib. In my case, /lib is a symlink to /usr/lib so this breaks because the source and target are the same file.

Let's list what's wrong with this, shall we? This makefile just installed the file in one place, and is now trying to move it to another. Why not just install it in the right place in the first place? Why not set the --prefix default to not install into /usr/lib in the first place? (Note that setting --prefix=/ means that libdir becomes "//lib", which is still not equal to /lib.) Why care where it's installed at all, since /usr/lib can just as easily be in initramfs or initrd as /lib can? This distinction was already obsolete 10 years ago, and udev only launched 5 years ago.

But mostly: this is code layered on top of other code. Rather than fix the underlying code, they just slathered another layer of crap on top. The above hunk SHOULD NOT EXIST. And the solution is to yank the libudev-install-move-hook dependency out so it never triggers. (I bother complaining about this because it's actually _not_ FSF code, where this kind of self-defeating programming is a matter of course.)
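
The quick way to do that yank is a one-liner against the generated makefile, along these lines (the INSTALL_EXEC_HOOKS variable name is an assumption about udev's build; grep for libudev-install-move-hook first and adjust to match):

sed -i '/^INSTALL_EXEC_HOOKS/s/libudev-install-move-hook//' Makefile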

Topic switch!

I've been enjoying a Harry Potter Fanfic written by a logician (not something I normally go in for), so I decided to read the guy's website linked from his profile. At the upper right corner, it has a "sequence" link for reading his old posts in batches. So, clicking on the first link each time, we get to "core sequences", "map and territory", and the essay The Simple Truth. And I read over twenty pages of meandering, masturbatory drivel about sheep and buckets before I start skipping and hoping there's more of a point to the post than contained in his introductory paragraph. (I repeat: this is where his website _suggested_ I start, I _already_ like his writing from the fanfic, and this is seriously turning me off.)

It's always sad when smart people prove biologically incapable of explaining themselves to outsiders. It makes me question the validity of what they're trying to explain, because I start to wonder if they can really see other people's point of view, or if they honestly never seriously took other people's positions into account because they couldn't get their heads around the other guy's position well enough to actually understand their argument.

This is a failure mode I've most often previously encountered among libertarians. The long rambling delivery gradually filling itself with in-jokes until the dude must be preaching to a _subset_ of the choir, and the writings devolve into identity goods rather than any meaningful attempt to communicate information or argue a point. I suspect the underlying failure is an unquestioning belief that their point of view is the only correct one, so obviously everyone will come around to it eventually. Despite thousands of years of humanity not doing this, and the majority of people on the planet today disagreeing, it's somehow a fait accompli. As inevitable as lunar colonies, or Linux on the Desktop, powered by the strength of the idea itself, things that just _happen_ rather than anybody specific ever actually needing to _do_...

Sigh. I suppose "that's a blog for you", but if you're going to bother to collect them together into series and say "start here", turning off your audience on the very first one? Seems a bit of a waste.


November 21, 2010

I hit a bug that requires pages of backstory just to explain the _context_ for.

The Aboriginal Linux package cache infrastructure (which I documented in mind-numbing detail) downloads tarballs and stores them in the "packages" directory. (The obvious alternative to giving you a download script is including the tarballs in my source control, but I didn't write them, and I want to document where they come from so you can see for yourself if there's something newer. So I want to let you download them directly from the upstream sources, although I mirror them on my website and the download infrastructure will fall back to trying that mirror if the upstream source vanishes.)

The script "download.sh" does this downloading, using shell functions so that an actual download invocation looks like:

URL=http://cxx.uclibc.org/src/uClibc++-0.2.2.tar.bz2 \
SHA1=f5582d206378d7daee6f46609c80204c1ad5c0f7 \
download || dienow

The URL is the place to download the file if we haven't currently got an existing file matching that sha1 checksum value. The _only_ place anywhere in the Aboriginal build scripts that stores the _version_ of the package is that URL in download.sh. Everything else is version-agnostic (wasn't easy to do, but I made it work). This is also the only file that lists all the packages we use in one place.

Extracted copies of the contents of these tarballs live in build/packages, and are re-extracted whenever the tarball changes, or when one of the patches applied to that tarball (from sources/patches) changes. (Including patches getting added or removed.) We use a special snapshotting technique ("cp -l", which links to the files rather than copying them) so that when we do multiple target builds in parallel, we don't spend all our time extracting and deleting the same source code over and over again (which uses up buckets of memory for disk cache, uses buckets of disk space, and uses buckets of I/O bandwidth reading and writing all that redundant data; I implemented this cache for a _reason_, it more than doubled the speed of the build, but I also tried very hard to make it easy to ignore).
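
The snapshot step itself is conceptually just a recursive hardlink copy out of the shared cache, something like this (destination path assumed for illustration):

cp -rl build/packages/busybox build/temp-mips/busybox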

When doing multiple builds in parallel you really want to populate the cache up front, to avoid a race condition where two builds needing the same package at once both try to populate the cache at the same time and fight. Since the only file that lists all the packages we use is download.sh, it made sense to teach the download function how to extract and patch the tarball into the cache as soon as it confirms it has a good copy of the tarball. Setting the variable $EXTRACT_ALL tells it to do this.

The problem comes in when you update a package's URL in download.sh to point at a new version. This means it downloads a new file, but the old file is still there in the "packages" directory. The build has infrastructure to deal with this: it records the time download.sh was run, updates the timestamp on each file as it checks its sha1sum, and then a function cleanup_oldfiles gets run at the end of download.sh to delete any files older than that start time. However, $EXTRACT_ALL extracts the file into the cache at the end of the download() function, which is before cleanup_oldfiles can run. So it has to deal with there potentially being two copies of this file.
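
A stripped-down version of that timestamp dance looks something like this (the marker filename is made up for illustration; the real download.sh differs in detail):

touch "$SRCDIR/.download-started"        # record when download.sh began
# ...each successful sha1 check re-touches its tarball...
cleanup_oldfiles()
{
  find "$SRCDIR" -maxdepth 1 -type f ! -name ".download-started" \
    ! -newer "$SRCDIR/.download-started" -delete
}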

And there's a way to handle it: use "ls -tc" to sort by ctime and then pick the first entry, which is the most recently updated file. That's not the bug I just hit, that's a bug I hit (and fixed) many moons ago.

The bug I just hit is that when you have "udev-123.tar.bz2" and "udev-tests-123.tar.bz2", feeding "/path/to/udev-*.tar.bz2" to the ls -tc (to zap the version info) can return udev-tests when you asked for udev. Which is why udev isn't building in my LFS 6.7 stuff.

The _FIX_ is this horrible mess:

get_package_tarball()
{
  ls -tc "$SRCDIR/$1-"*.tar* 2>/dev/null | while read i
  do
    if [ "$(noversion "${i/*\//}")" == "$1" ]
    then
      echo "$i"
      break
    fi
  done
}

Plus patching the call site. This leverages the existing monster regex (in the noversion shell function) to strip the version information off a tarball, which doesn't consider a string like "tests" to be a version number. The magic right after that strips off everything before the last slash, to trim the path off the filename.

Yes, this is what I do for fun.


November 20, 2010

Halfway decided to just jettison internationalization support in lfs-bootstrap for the moment, since the uClibc code for it is unfinished, brittle, and stagnant since roughly 2003, and the autoconf stuff that tries to use it is even worse. Trying to get LFS to build through to the end even just on my _laptop_ (let alone portably) with internationalization support spun off so many TODO items I started to lose track of them.

So I just backed off and added --disable-nls to util-linux-ng, kbd, and psmisc... And the shadow build died with an _assert_ in the findutils version of the find command. (Not the busybox one, the findutils one.)

Hmmm... Finish debugging NLS, or switch it all back off and use a big hammer to teach packages not to use it. Decisions, decisions... Both are a bit of a mess, really. LFS wants internationalization, and I don't want to diverge farther than necessary from building stock LFS 6.7. That's kind of the point...

# find man -name Makefile.in -exec sed -i 's/groups\.1 / /' {} \;
find: mbuiter.h: 171: mbuiter_multi_next: Assertion `iter->cur.wc == 0' failed.

The fact that I break busybox all the time, and have to fix it, doesn't disturb me. The fact that I break supposedly "standard" tools, that I find disturbing.

Right, rephrase it as "find blah -print0 | xargs -0 blah" and just _ignore_ the absence of pipefail for the moment...

configure: error: posix_spawn is needed for nscd support

Ok, "./configure --help | grep nscd" says there's a --with-nscd, so try adding "--without-nscd"...

Then texinfo went:

..//makeinfo/makeinfo: mbuiter.h: 171: mbuiter_multi_next: Assertion `iter->cur.wc == 0' failed.

Ok, something is wrong with uClibc somewhere. Google found a gentoo bug report that says it's been a known issue for a year, and backing off to uClibc 0.9.28 is their workaround.

Why do I insist on thinking of uClibc as _maintained_? That's not really an accurate description of what's going on here, is it? And of course their repository is such utter crap that I can't exactly git bisect my way between the known good version and the current version, can I? Because none of the versions in between work at _all_, so there's nothing to test...

A quick google for "gettext stub" found this, which is probably the way to go if I revert all this internationalization insanity. But this also looks like a one line fix in uClibc, which NOBODY HAS BOTHERED TO DO FOR A YEAR...


November 19, 2010

Cross compiling still sucks, part 8 gazillion:

So, uClibc has a libintl.so, which provides at least stub versions of the various functions, except they have names like gettext() and various programs are trying to link against libintl_gettext() so I added a bunch of weak_reference() prefixed versions to the .c file...

And then found out that the libintl.so isn't getting installed. Why not? Because even though UCLIBC_HAS_XLOCALE is on, there's a separate symbol called UCLIBC_HAS_GETTEXT_AWARENESS controlling large bits of the internationalization support (including the libintl.so stub install), and that depends on UCLIBC_MJN3_ONLY, which is a debug symbol switching on Manuel Novoa's personal warnings. The last time he had anything to do with the project was 2006, and this code dates back to 2003.

Right.

When I switch on UCLIBC_MJN3_ONLY, it doesn't reliably build in parallel anymore. But that's not even the worst part. When I export CPUS=1 so everything's building with make -j 1, the bash build dies during native-compiler.sh:

gcc --static  -rdynamic -g -O2 -o mkbuiltins mkbuiltins.o -ldl -lintl 
/home/landley/aboriginal/aboriginal/build/host/ld: cannot find -lintl

So I changed what was available in the _target_ compiler, and that made the bash build change what flags it sent to the _host_ compiler. (You know how I keep saying cross compiling sucks? It's because it sucks.)

Tried changing CC_FOR_BUILD=$CC but that just builds a target binary and tries to run it on the host. Passing --disable-nls to configure didn't fix it. Neither did throwing "ac_cv_func_bindtextdomain=no" in config.cache because there's a stupid test in the autoconf-is-useless ./configure script that checks if it's currently off and if so _retests_ halfway through. The fix? Putting "ac_cv_func_bindtextdomain=nyet" in there, which is neither "no" (which causes a retest) nor "yes" (which causes the wrong behavior to trigger).
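
Concretely, that works out to something like this (a sketch; the real bash configure invocation in the build has more flags):

echo "ac_cv_func_bindtextdomain=nyet" >> config.cache
./configure --cache-file=config.cache --disable-nls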

I hate autoconf. I hate cross compiling.

So, that builds again, although the builds now crap all sorts of #warnings because the UCLIBC_MJN3_ONLY configure symbol poisoned the header files. Great. Proper fix: break the dependency between UCLIBC_HAS_GETTEXT_AWARENESS and UCLIBC_MJN3_ONLY. (Or really, change all the GETTEXT_AWARENESS tests to be testing for XLOCALE since that's what that symbol is supposed to MEAN, isn't it?) Oh well, todo item for later, let's see if it fixed the darn intl_thingy() not found warnings... gettext still builds...

But util-linux-ng does not. Because its ./configure tests for how you find gettext() and sets INTLLIBS to the libraries you have to enable to add that support, and an assignment to INTLLIBS gets propagated into 33 different Makefiles generated by ./configure, but the value is NEVER READ. Every single usage of it is an _assignment_.

I hate autoconf. This is working in glibc by accident and it's never tested against anything else. That ./configure song and dance of running a thousand different tests is ENTIRELY FOR SHOW, because it's ONLY EVER BEEN TESTED AGAINST GLIBC and the tests are contradictory crap and the results from the tests are often never actually used.

I'm going to go watch meerkats narrated by Samwise Gamgee.


November 17, 2010

And landley.net is back up. Yay!

The lfs-bootstrap Aboriginal Linux control image is coming along nicely. (I'm trying to get the next Aboriginal Linux release out December 1, with it natively building Linux From Scratch 6.7 for all the targets.) I've got all the package build scripts coded up and the various bug workarounds in place. Perl is building (which is by far the most horrible package of the lot, although iproute2 was surprisingly fiddly).

But in order to build gettext I had to enable uClibc's internationalization support and tell it to build "minimal locales", which still requires more locale support on the host than quadrolith (my gentoo server running the cron jobs) has installed. So the cron jobs aren't working and the build isn't really properly portable until I fix this.

I could instead have my lfs-bootstrap install the libiconv package from the FSF's gnu-gnu-gnu-dammit project, but current LFS isn't doing this and adding more gnu-gnu-gnu-dammit packages is unpleasant. Fixing upstream uClibc locale support would be a better solution, so I've been saber rattling a bit in that direction.

The most disturbing part about that is that uClibc locale support uses data that _isn't_in_uClibc_. It grabs locale data off the build machine and repackages it, so that if you rebuild under a uClibc system it's just passing that information along to the next cycle, like one of those time travel plots where your future self tells you the password and it never _came_ from anywhere. Needless to say, this is not good engineering practice. It's also a potential LGPLv2 license violation of upstream glibc (or possibly libiconv) that this came from.

The solution in uClibc is "prebuilt locales", which involves downloading a tarball of generated files from 2003, dropping them into the build, and hoping nothing has changed:

$ tar tvzf uClibc-locale-030818.tgz 
-rw-r--r-- mjn3/mjn3     67513 2003-08-18 23:07 c8tables.h
-rw-r--r-- mjn3/mjn3   1188953 2003-08-18 23:07 locale_data.c
-rw-r--r-- mjn3/mjn3      3375 2003-08-01 15:12 locale_mmap.h
-rw-r--r-- mjn3/mjn3      1212 2003-08-18 23:07 lt_defines.h
-rw-r--r-- mjn3/mjn3    106643 2003-08-18 23:07 uClibc_locale_data.h
-rw-r--r-- mjn3/mjn3     34704 2003-08-18 23:07 wctables.h

This is unmaintainable to the level of being _creepy_, and again might be a license violation. (Do these files count as the "preferred" source format, if there's a deeper layer they're generated from? When you use source files to generate other source files from which you generate binary code, _which_ source files does the license oblige you to redistribute? Darn it, I need to ask Bradley Kuhn on that one, I think.)

In theory the build I'm doing is generating new versions of those files from the locale data on my Ubuntu host, which ultimately came from either glibc or the gnu-gnu-gnu-dammit libiconv package. So I can put together my own version of this package, although see "license violation" above on why I'd rather not attempt to distribute that myself. And there's rumors of word size and endianness dependencies in there, so I'd need four versions to do it right even though the current one doesn't. (So far I'm just testing i686 because I can do the lfs-bootstrap build as a chroot on that one, which is fast to test. Once that's working, I need to try all the other targets and make sure they build too. Powerpc and mips are big endian 32-bit, mips64 is big endian 64 bit, x86_64 is little endian 64 bit.)

But... why those six files? The only new file that shows up on target is the uClibc_locale_data.h file, the other .h files don't seem to get installed (or maybe they're already there in the non-locale case). What does the .c file get used for?

Alas, the makefile in extra/locale is as impenetrable as any other Makefile, and trying to trace through the horrible combination of declarative and imperative code just gives me eyestrain without answering my questions about what it's DOING...

I also need to whip up a 64 bit sparc target and see if uClibc supports that. Oh, and the reason uClibc doesn't support 64 bit PPC is that all the 64 bit PPC systems use a 64 bit kernel with a 32 bit userspace, apparently 64 bit PPC is hideously inefficient or something. So I'd need to hack up simple-cross-compiler.sh to produce gcc+binutils but _not_ add the kernel headers and the C library, then use that to build a kernel, but use the 32 bit root-filesystem tarball. While I _can_ do this, the current design didn't really have that in mind. (Hmmm, add a NO_LIBC config entry for the first part, but the second pretty much says "don't use build.sh, that's not what it's for". Either I hack up build.sh with a horrible special case or I have an architecture you invoke the build of in a different way from all the others. Neither is really appealing...)


November 15, 2010

Eric attempted to upgrade Grelber (the server among other things running landley.net) to Ubuntu 10.10. And it ate itself. And apparently multiple attempts to install Ubuntu 10.10 on different pieces of hardware all _hung_, and he's calling 10.10 a loss and backing off to Ubuntu 10.04 and reinstalling on a brand new machine...

But it's a server reinstall, and landley.net is down, so you won't see this blog entry until it goes back up, will you?

Oh well, twitter's still up...


November 13, 2010

Just read an old blog post of Eric's which seems to boil down to "having separate repositories and cherry-picking submissions is an artifact of not having a good regression test suite". I.E. that having a single shared repository all the project's developers commit to is fine as long as you have a good regression test suite to catch bugs early.

I disagree with this on architectural grounds. As Alan Cox once told me, "a maintainer's job is to say no". The reason for cherry picking commits is because there's more to it than just "does this code compile and perform a function", there's "is this code a good idea". Is it the right approach, is it something the project should be doing at all, is it hideously unreadable, does it paint us into a corner precluding future expansion without ripping it out again... These things are part of the maintainer's architect role.

Perhaps in some projects a maintainer's editorial judgement can be replaced by a test suite, but so far I've found projects with a "mob branch" to be remarkably uninteresting from a development perspective. Then again, my perspective may be skewed by the embedded world. My years on BusyBox were all about writing _better_ implementations of commands that already existed out there in FSF bloatware, and I recently had words with my successor about how "managing what you can measure" with Matt Mackall's bloat-o-meter has led to a severe loss of simplicity and elegance in the project's code. (This isn't Denys being a bad architect, this is him and me having very different priorities. It's also easier to see this stuff when you aren't trying to drink from the firehose of patch submissions needing review.)

My point is that none of it would show up in a regression test suite. And presumably glibc would pass the uClibc regression test suite without ever giving you the first clue about why the uClibc project even _exists_.


November 12, 2010

So yesterday I came to the conclusion I have to rewrite the gcc wrapper script, and did maybe the first quarter of it. I should probably go and finish that at some point because what's left of the original wrapper could use some serious cleanup, but reading through the old code I managed to figure out how to do the resequencing I needed entirely by _removing_ bits of code the old thing shouldn't have been doing in the first place, which is generally the correct fix when appropriate. (I'm surprised because usually that turns up when dealing with FSF code; this is the first time in _years_ I've done it to non-FSF code. Of course, that's probably because I Don't Do Windows...)


November 11, 2010

And it's time for our weekly installment of the ongoing series "Autoconf is useless".

FSF software has this nasty habit of ./configure testing for various libc functions, deciding it doesn't like how they smell, and replacing them with its own rpl_thingy() versions. (The rpl_ presumably standing for "replacement" or some such.) It does this by a four step process:

A) having a header file #define the offending function name to the same name with an rpl_ prefix.

B) building its own internal version of the function (which picks up the new rpl_ name thanks to that #define), while all callers of that function are similarly renamed by the same #define.

C) Putting the .o file with the rpl_ version of the function into a library.

D) Forgetting to have the makefile link that library into the binaries it creates, so the build breaks at link time with an unresolved call to an rpl_ function.

It's like clockwork. Gettext did this with rpl_btowc, and now inetutils is trying to do it with rpl_ioctl. Autoconf is really good at coming up with codepaths that haven't been tested in multiple releases (if ever), fiddly workarounds with combinatorial complexity that have never undergone any sort of systematic scrutiny. It's there to pile up complexity, on the theory that you can never make things worse by _adding_ more conditionals and code paths to your program.

Kind of obnoxious, really. Adjusting to broken build environments means those build environments never get _fixed_.

Oh, and don't ask me what the gettext package's ./configure having a --disable-nls option means. (As far as I can tell, it's a NOP, but it's there in ./configure --help. Disabling national language support in a package designed to do string translation for internationalization is kind of brain-bending on a design level...)

So next we move on to inetutils, which is dying with:

../libinetutils/libinetutils.a(tftpsubs.o): In function `synchnet':
/home/inetutils/libinetutils/tftpsubs.c:295: undefined reference to `rpl_ioctl'

WHY is it trying to replace ioctl()? Let's grep the ./configure output for "ioctl":

checking sys/ioctl.h usability... yes
checking sys/ioctl.h presence... yes
checking for sys/ioctl.h... yes
checking for ioctl... yes
checking for ioctl with POSIX signature... no
checking whether <sys/ioctl.h> declares ioctl... yes
checking whether ioctl is declared without a macro... yes
checking for sys/ioctl_compat.h... no

And we've got two "no" hits, the first of which is "ioctl with POSIX signature". Grepping the configure script for that string, we see it's compiling this test program:

#include <sys/ioctl.h>

int
main ()
{
extern int ioctl (int, int, ...);
  ;
  return 0;                                                             
}

The result of building/running that is saved in config.log as:

 In function 'main':               
279: error: conflicting types for 'ioctl'
/include/sys/ioctl.h:42: error: previous declaration of 'ioctl' was here

And in the header file we have:

extern int ioctl (int __fd, unsigned long int __request, ...) __THROW;

Ok, unsigned long != int, so the declarations do conflict, however the type promotion rules say that calling it with an int should work fine. The bug here is that the test cares about something it shouldn't care about. Once again, the problem _is_ autoconf.

Luckily, we can just set the environment variable that autoconf itself tests, and it'll go "oh, I already tested that" because it has this huge caching layer of extra complexity to try to make up for the fact that it runs the same tests repeatedly. (Adding extra complexity to make up for having too much complexity is never a good idea, but this is autoconf we're talking about so we already _know_ it's not a good idea.)

So, re-run configure with "gl_cv_func_ioctl_posix_signature=yes" exported, and then re-run the build, and now it says:

  CC     forkpty.o
In file included from /usr/bin/../include/pty.h:26,
                 from ./pty.h:29,
                 from forkpty.c:20:
./sys/ioctl.h:362: error: conflicting types for 'ioctl'
/usr/bin/../include/sys/ioctl.h:42: error: previous declaration of 'ioctl' was here

Autoconf causes code paths that never get tested to breed. The standard headers declare standard functions like ioctl(), which you will be using out of your C library. Declaring your own prototypes for functions out of the C library is INEXCUSABLY STUPID, but they're doing it anyway. From m4/ioctl.m4:

dnl On glibc systems, the second parameter is 'unsigned long int request',
dnl not 'int request'. We cannot simply cast the function pointer, but    
dnl instead need a wrapper.

I'm guessing that having a typedef for the function pointer in the headers and having the configure test select _that_ never occurred to them. (Or an #ifdef on glibc, since that's what the comment says they're testing for and you don't need a ./configure test for that since there's a #define for it...)

Ok, fix what they're _trying_ to do, which is that libgnu.a contains the functions' implementation but libinetutils/Makefile isn't feeding libgnu.a into the link (even though "libinetutils_a_DEPENDENCIES = daemon.o $(top_builddir)/lib/libgnu.a")... No, hang on... it's there in the link command line.

*boggle*

Poke, poke. Get the command line, run it myself... Add -v to see what the linker is getting... The argument order is changing.

DARN IT. MY FAULT. It's ccwrap. The wrapper is reordering the arguments, and in the case of mixing shared libraries with static libraries that screws stuff up.

Huh. Ok, this part of the ccwrap design came from the uClibc guys (via timesys), and although I've fixed it up a bit I've never really looked at the design much. I assumed they knew what they were doing and had reasons for this, but argument order is _important_. It can change header and library search paths which can cause it to find different stuff, and I'm kind of surprised this hasn't broken things before now. Ok, what ccwrap really needs to do is prefix the command line with -nostdinc -nostdlib, and then some fixups go at the beginning and others go at the end, but the arguments that you're passing through should go in the order they originally occurred.
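
To make that ordering rule concrete, here's a shell caricature of what the wrapper should assemble (the real ccwrap is C, REAL_CC and STAGE_DIR are placeholder names, and this leaves out details like the crt startup files):

# prefix fixups first, the caller's arguments untouched and in order, trailing fixups last
exec "$REAL_CC" -nostdinc -nostdlib \
  -isystem "$STAGE_DIR/include" \
  "$@" \
  -L"$STAGE_DIR/lib" -lc -lgcc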

Possibly I should just write a new ccwrap.c. After this, there pretty much won't be any of the original design left...


November 10, 2010

My cell phone connection through t-mobile was absolutely crap recently, and they're apparently also the reason I can't connect to freenode anymore, so I called their tech support line and got escalated to an actual support person after the usual rigamarole of reciting jabberwocky and the gettysburg address into the automated system until it gave up and got me a human.

I've been paying $20/month to connect my laptop to my cell phone for a couple years now, this predates my switch from the phone they gave me to a nexus one. But for the past few days, the connection has been as the british would say, "total pants". It has four bars of signal but is behaving like it's either barely connected or the tower is completely overloaded. So I called support and asked them about it.

They said that my plan doesn't support my phone (despite six months of it working just fine), and said that I needed to upgrade to a $30/month "android" plan with a 5 gigabyte/month bandwidth cap. They assured me that triggering the bandwidth cap wouldn't cost extra, it would just slow down the connection until the next billing cycle. "From what to what?" They couldn't tell me. Not even ballpark figures.

The supposedly level 3 tech support guy gave me a speech about how "blackberries need to connect with a blackberry plan, iphones need to connect with an iphone plan"... Dude, A) you fail to understand the entire point of the Internet, B) it's been working fine for 6 months already so obviously this is not actually physically the case.

I kept paying the $20/month even when I couldn't use the internet (due to losing my bluetooth dongle with my old phone, reinstalling my laptop and not having the crotchety bluetooth stuff set up again yet, or not having upgraded to 2.2 when I got my new phone). I _kept_ paying it even though I strongly suspected that the USB tethering feature I was using didn't actually require it (since I got internet _on_ the old phone without it, and this is forwarding the internet on the phone).

The point of that plan was permission to tether my phone to my laptop. Now, t-mobile wants to increase the price of the data plan I kept TO BE NICE from $20/month to $30/month, on top of the actual voice plan.

I know AT&T is feeding at the iPhone trough, and Verizon can't tell dollars from cents. I left sprint to go to t-mobile because they wanted $75/month for the data plan (on top of the voice plan), which was just stupid. Who does that leave?

Oh well, back to picking my programming spots based on free wireless, I suppose...

(By the way, the solution seems to have been "the thing saying you have four bars is lying, put the phone on the windowsill".)


November 9, 2010

It's sort of hilarious that resume from disk takes so long that by the time it brings the desktop up, the screen saver has kicked in and blanked the screen. So the screen goes black during resume and stays black until at some point I hit a key and it unblanks itself and very, very slowly starts redrawing.

I'd estimate that resume takes about 10 minutes in all before the disk stops grinding. My laptop continues to have only 2 gigabytes of memory.

Poking at the gentoo chroot. I need to get xfce working under that, with the wireless card and the 3D card. Gonna be fun...


November 8, 2010

Had a few very nice days off, and now getting back to programming. Digging up the linux from scratch bootstrap, setting it up in an i686 chroot, and figuring out why it broke. (Because /bin/sh is pointing to busybox, and I switched defconfig from ash to hush. Right, todo item for later, get them working with #!/bin/sh -> bash first.)


November 3, 2010

Software Suspend is confusing. The first attempt to resume hung, and I had to reboot the machine. The second time, it happily resumed back to where it left off. Go figure.

And Mark deleted impactlinux.com over the weekend. Wonderful. No idea why he did that. (Probably related to him taking any mention of aboriginal linux out of the #edev channel on freenode?) No, I don't have the subscription records for the mailing list.

Oh well, at least http://landley.net/aboriginal is still up. I should move that domain to a server I administer, one with a bit more bandwidth. Still working on the next release, aiming for early December and trying to automate the Linux From Scratch 6.7 build as an hdc native-build.

Watching Obama give his concession speech, which I suppose was inevitable after Press Secretary Gibbs' "Stop Whining You Crybabies" motivational speech to the democratic base. Can't hear a word, but the scroll across the bottom is like an empty platitude drinking game.

My disappointment isn't about his FISA vote, or Guantanamo still being open, or escalating the war in the Graveyard of Empires, or wasting an ENTIRE YEAR arguing about health care without ever taking the concept of a public option seriously, or the whole "Don't Ask Don't Issue An Executive Order Suspending Enforcement Since You're Commander In Chief And All" mess, or any other specific disappointment. It's that he still thinks he can find common ground with The Party of No. He has yet to figure out that every compromise so far was made unilaterally, by him. And now that they have MORE power he expects them to be LESS belligerent? (One definition of insanity is trying to do the exact same thing over and over and expecting different results.)

No idea if that spineless dishrag Reid won or not, although if somebody else is put in charge of the senate maybe they'll force Republicorp to actually filibuster instead of perpetually cringing at the mere _mention_ of the word. (Pelosi isn't in charge of the house either way.)

Sigh, so much wasted potential. I has a sad. Oh well. I voted, but I'm not in office. It's up to the ones who are to actually do something with it.


October 28, 2010

Watching some of my co-workers at qualcomm, I saw a bunch of strange bad habits that had never occurred to me before. A couple guys would constantly, every time they saw a ".sh" extension on a file, invoke it with "sh file.sh" instead of "./file.sh" like other executables. And then it would break because the worst technical decision Ubuntu ever made was to redirect /bin/sh to the Defective Annoying SHell back in 6.06, and the file said #!/bin/bash at the top, and they wouldn't understand why their casual override of its built-in specs didn't work.

Another guy put ./ at the start of every single filename in the current directory. As in "cat ./thing". I have no idea where you pick that sort of thing up...

No real point, just odd. Where do people learn this stuff?


October 27, 2010

Today, The Linux Foundation destroyed the Consumer Electronic Linux Forum.

I am sad. Totally by coincidence, I was wearing one of my CELF t-shirts when I got the news. My opinion of the matter is unlikely to be a surprise to anyone who's read my previous posts on the topic, but I scooped up a few more details in the email I posted to the celinux-dev mailing list offering my condolences for their loss.

Between the FSF zealots at one end and the voracious pointy haired corporate drones intent on spraying "leadership" all over the other end, it's hard for hobbyists trying to do actual engineering to find a place to hang out.

Oh well. Just keep your head down and do the work, I suppose. One nice thing about corporate efforts is if you ignore them long enough, they go away. I'm sure the Linux Foundation's various new projects will have all the impact of Less Watts, the Desktop Linux Consortium, Linaro, Project Trillian, and so on. The Linux Foundation already seems to have forgotten Meego (itself formed by dinosaurs mating) in their rush to create a NEW pointless and unnecessary project that ignores what's already out there in hopes that everyone else will drop what they're doing and jump on board THIS thing instead.

But I am sad to see CELF go away. It was a lot of fun while it lasted.


October 26, 2010

R.I.P. Freenode. You will be missed.

A week or two back, my phone stopped being able to connect to freenode, complaining that I had to identify using SASL (whatever that is) to access this server. The repeated failures drained the phone battery a few times before I figured out I needed to kill the IRC client to stop it, then it went on the todo list.

Today I fired up pidgin on my laptop (it's what Ubuntu provided as an IRC client, yell at them) and used my existing config to log in there. It doesn't work anymore either, it insists I need SASL.

I happily used freenode for years without this SASL thing. Both of these IRC client configurations _used_ to work just fine, and were broken by changes in freenode's servers. I changed nothing at my end, nor was I notified of the need for any changes, yet freenode no longer works for me.

I googled a bit, but nothing came up with a rationale why they were doing this, how to fake it from the login bar (it's a simple telnet protocol), how to beat this functionality out of the IRC clients I have without installing new ones, or really anything useful about it. I still don't know what SASL stands for. (I'm sure I could find out, if nothing else by asking Mark, but that's not the point.) I did find a freenode page mentioning sasl, but nothing about why, or how to make it work on my android phone, or even hinting that it was suddenly mandatory. (That page is from january.)

I checked the main freenode.net page, and their "using the network" page, and neither contained the string "sasl". I tried changing my username and removing the password, to log in as an unverified user. No dice. I can't _connect_ to any servers in the irc.freenode.net pool unless I jump through this sasl hoop.

This means that a new user, coming to freenode today and trying to create a new account, would not be able to use the service. That's just BRILLIANT.

*shrug* I consider my freenode account essentially deleted by this change, and the service decommissioned. I'm sad to see it go, but any service that would throw this much of a monkey wrench in the way of newbies trying to use the thing is too snobbish to get a single new user _ever_, and will inevitably die, so it's not worth my effort to try to fix its self-inflicted wounds.

I should track down more people's twitter accounts...


October 25, 2010

I don't have time to write my own strace from scratch. I don't have time to write my own unfsd from scratch. I don't have time to write my own nbd server from scratch. I don't have time to rewrite busybox's md5sum and the shaXXsums other than sha1sum, or add bind mount autodetect to mount or "umount -r subdir" or tar autodetect or generally go into busybox and rip about half its applets a new one....

I don't have time to update my git bisect HOWTO. I don't have time to restart qcc (or at least rip off the linker to try with llvm).

I need a Tardis.


October 24, 2010

One of my co-workers forwarded my resume to Oracle, which was nice of him, and I got an email from an oracle recruiter asking me to give him a call.

Unfortunately, I've been following Oracle. On the Java front, there's the lawsuit suing Google over dalvik, Java creator James Gosling's blog after he quit, and this interview with him explaining why he quit (also this interview where he starts swearing at them). That alone would mark Oracle as a troll happy to sue open source projects over patents and alienate their senior technical staff. (On the patent front, using the "mutually assured destruction" arsenal for a first strike is a definite kick-the-puppy moment.)

And of course their other open source projects such as MySQL immediately forked. When OpenOffice forked away, Oracle's reaction to the formation of LibreOffice was predictable but sad. The OpenSolaris board was ignored until it resigned en masse and created a new fork, Illumos, while Oracle made plans to undermine them.

In each case, their prerogative, and I can't say I actually care about most of these projects. Java is Cobol jr, kept alive by the fact that Y2K happened during its flash-in-the-pan popularity, and all the emergency rewrites of pointy-haired mainframe code were done in Java, which must now be maintained and extended. OpenSolaris/Illumos simply does not matter. I'm a fan of in-memory databases and the NoSQL trend so MySQL can go hang. LibreOffice might breathe some life back into OpenOffice (which has stagnated ever since StarOffice got acquired, it's a mozilla-style "how not to do it" free-range source project, not truly open but merely allowed to wander around in a fenced area). And I didn't even mention VirtualBox (an also-ran to KVM, Xen, VMWare, Parallels...) and Apache Harmony and so on.

But then going and doing Linux work for this company? A patent troll? A company whose core business is a dinosaur being undermined by a disruptive technology and well into an upward retreat? A company that's even less effective at interacting with the open source community than _Sun_ was (which takes some doing)?

That really doesn't sound like fun. Even from a position of pending unemployment, I don't think I'm interested in interviewing there.


October 23, 2010

So almost a month ago Niklas Brunback got userspace NFS working with Aboriginal, and emailed me how. I've been so busy at Qualcomm that I haven't gotten around to poking at it until now.

The server was a bit of a pain to build. It requires lex to be installed on the host, but doesn't complain if it's not there. It ships a cached lex.yy.c file, presumably for use on systems without lex, but it's broken in multiple ways. (The ./configure sets the name of the output file to "" instead of lex.yy, the pre-lexed file is configured to use yywrap() which lives in one of lex's .a files, and of course the build doesn't _stop_ if a subdirectory build fails because it uses recursive makefiles and doesn't check their return values.)

Emailed a long thing to the maintainer. (I'm aware there's no mailing list or contact info on the website, but the README in the source has a feedback email. Dunno if it works.)

When you run the result with --help or -? it says "unknown option" and refuses to elaborate. And if you run it without arguments it tries to launch the daemon, so there's no obvious way to beat a simple usage message out of it. It comes with a long man page, which I read the first page of and then went back to Niklas Brunback's email, where he told me how to launch the thing.

Got it to work anyway. Seems reasonably useful, but I can't integrate it into the Aboriginal host-tools.sh stage while it requires flex and such as a build requirement. Pondering reading its code a bit and then writing one that doesn't, perhaps for busybox...


October 22, 2010

So Aboriginal creates its own (stub) /etc/passwd and /etc/group files, and Linux From Scratch has instructions to create its own (replacing any files that might already be there, since there won't be). Mine has a "guest" account that LFS doesn't, and I'd prefer not to lose that. (Or whatever existing content you have, people can provide their own overlay files.)

I could append the new LFS entries to the existing ones, but I want to make it so that if I run these build scripts multiple times, they'll cleanly reinstall. Naive appending would extend the files with redundant entries on every rerun.

I guess I need to make some kind of quick and dirty add_entry shell function that seds the existing file, removes the old entry (if any), and appends the new one.
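
Something along these lines, probably (an untested sketch; the function name, arguments, and the example entry are just placeholders):

add_entry()
{
  # Usage: add_entry name file "new:entry:line"
  # Delete any existing entry for "name", then append the new line.
  sed -i -e "/^$1:/d" "$2" &&
  echo "$3" >> "$2"
}

add_entry lfs /etc/passwd "lfs:x:1001:1001::/home/lfs:/bin/bash"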

This is complicated by the fact that Yifan Zhang pointed out I'm not setting up dropbear correctly. Either the /etc/passwd needs some way for somebody to log in, or it needs a home directory you can add .ssh/authorized_keys to. This is exacerbated by the fact that squashfs is read only, so if I get it wrong it can't necessarily be fixed at runtime.

Currently the passwd field is ":x:" which means no valid password in the absence of /etc/shadow. I could make that no password with "::" but the file is read only. (Hack to work around it: "cp /etc/passwd /tmp/passwd; mount --bind /tmp/passwd /etc/passwd; vi /etc/passwd". Yes, you can bind mount individual files. I'm not sure how that interacts with "mv", but busybox vi seems to modify files in place anyway.)

The other problem is that /etc/passwd says root's home directory is /root which doesn't exist in the squashfs, so there isn't even a mount point to mount something else on. (Oops.)


October 21, 2010

The sort of thing I tend to read for fun includes phrases like this:

Due to SR-IOV, the new cxgb4vf driver for guest systems can now directly access some of the features offered by Chelsio's 1-Gigabit and 10-Gigabit network chips; as with other SR-IOV drivers, this is designed to reduce latencies and CPU loads for network transfers and increase data throughput.

I should get out more.

I'm also seriously considering installing a gentoo chroot on my laptop, making the sucker dual-boot into it, and gradually migrating over from my old obsolete Ubuntu. (Because really, when Mark Shuttleworth and an FSF apologist start fighting, I don't want to get any of it on me. Third option time.)

I'm seriously looking forward to being able to spend quality time with my own open source projects weekend after next. (Being unemployed takes some of the fun out of it, but having to stop working in the computer industry in order to get some real programming done is not a new experience for me. Of course if I could stay consistently employed at even half my consulting rate for five years, I could retire and do open source development full time.)


October 20, 2010

And X hangs again, forcing me to reboot my laptop. Ubuntu continues to be vaguely unhappy with the concept of user interfaces.


October 16, 2010

Banging on Aboriginal, making lfs-bootstrap use the new bootstrap-skeleton infrastructure. So far, I've found out that m4 won't build against uClibc 0.9.31 (came up with a patch to fix it, probably not the _right_ fix but it works). And I found out that the gmp c++ extensions won't build against uClibc++ 0.2.2 (posted a question to the uClibc list and asked Garrett whether he's interested in fixing it; if not I'll have to dig up a new maintainer for the project despite thinking that C++ is a bad idea and not really knowing many people who do it anymore).

And then one time I rebuilt the lfs-bootstrap.hdc squashfs image, qemu refused to mount it (insisting it was corrupted). Loopback mount on the host said it was fine, meaning it's either a kernel build issue or a QEMU issue. Theory: the file size is being rounded down to an appropriate number of device blocks (either 512 or 4096 bytes), thus qemu is truncating the file. Except that when I do a wc -c on the file and then fire up python to do the division, it says it's an even multiple of 4k.

Ah, nope, it wasn't the hdc, it was the root filesystem squashfs that got truncated. (I interrupted the build and thought it hadn't already started remaking it, but it had. My bad. It's always the time-consuming debugging sessions that start off looking at the wrong thing.)


October 15, 2010

My contract at work expires at the end of the month, and the budget for the project hasn't been approved yet, so my contract can't be renewed. Sigh. I liked this job. Oh well.

Still, two more weeks to try to finish stuff up, so I'm building packages as fast as I can. One of them is gnu libiconv. Its configure stage says:

checking for iconv... (cached) no, consider installing GNU libiconv

Again, that's the package it's the ./configure stage _for_.

All together now, in your best Vogon guard voice: "Autoconf is useless".

Not looking forward to job hunting. On the bright side, I should finally get some downtime to catch up on Aboriginal Linux todo items in November. My todo list runneth over, and I just haven't had the time (or energy) to work on it...


October 9, 2010

It's just so _cliche_.

My friend Eric's been blogging about smart phones taking over the world, and since I'm the one who told _him_ that back in 2002 (at a vaguely chinese restaurant where he ordered pulled pork even though it wasn't on the menu) and he didn't believe me, I decided I'd catch up on his blog.

My argument could be summed up "mainframe -> minicomputer -> micro/PC -> handheld". He argued that ergonomics would render that impossible, you can't stick a usable keyboard/mouse/screen on anything that small. I pointed out that you could plug a big keyboard and display into a small phone, pointed out the epidemic of texting RSI among japanese teens (no worse than carpal tunnel, just "all the kids are doing this already"), and that his attachment to enormous expensive 28 inch CRT monitors (this was that long ago) was like a hi-fi junkie's attachment to vacuum tube amplifiers; the disruptive technology would eat 95% of the market before he personally switched over but that didn't change the obvious trend.

The technologies I predicted are already here, Google for "USB docking station". Now note that a Nexus One has a USB port on it, which will charge the thing. It also has half a gig of ram, a gigahertz processor, 32 gigs of storage space, and multiple types of wireless internet. It's a reasonable development workstation, all it needs is the right software, and you can take it _with_ you when you go out.

So, something like 7 years later it's actually coming to pass, and he's blogging about it, and I'm overcoming my normal aversion to reading his blog (he's my friend, but I'm really not a fan of most of his politics) to catch up on this topic.

Which brings us to this entry. And I follow the link. And I spend the next two days staying up until 5am reading Harry Potter fanfic.

No really. Hence the topic sentence.

And the SAD part is, when I reached the end (it's not finished yet, most recent update was two days ago), I read the author page, and that linked me to TV Tropes. WITH MALICE AFORETHOUGHT!

Grumble. I had _plans_ for this weekend...


October 5, 2010

Fade's in the northeast at Viable Paradise. Camine's visiting, for a definition of visiting that involves playing Kingdom Hearts 2 all the way through. I've mostly been at work, with occasional restaurant expeditions.

I had to step away from Aboriginal Linux for a couple days because the thread about giving the web page botox just got too frustrating, but I'm back banging on it now. I've been automating Linux From Scratch in my day job, but not the right way. So in my copious free time I'm trying to do it _right_ as an Aboriginal build control image.

(Yes, it's ironic that the whole project started circa 2001 by automating Linux From Scratch, and now I've circled back around to do it again. But there are a thousand ways to build systems, and they all suck. My hope is that this one might suck slightly less, and then I rewrite it to suck slightly less than that...)


September 30, 2010

Things I didn't want to have to know about git:

To create a new shared git repository, using ssh tunneling as the pull/push mechanism, first create a "bare" copy of your repository, then tell it any new files it creates in future should be group writeable, then mark the existing files group writeable:

git clone --bare /path/to/oldrepo /path/to/newrepo.git
cd /path/to/newrepo.git
git config core.sharedRepository group
chmod -R g+w .

Note that git clone can't use a URL as the target, it has to be a local directory name. So you may have to tar up newrepo.git and extract it on the remote machine.

Then adjust your original repository to consider the new clone its "upstream" version, so push and pull work on that:

git push --set-upstream git+ssh://server.name/path/to/newrepo.git

Where server.name is whatever server you ssh into.

Now to create new clones from that, do:

git clone git+ssh://server.name/path/to/newrepo.git

September 27, 2010

Heh, now _there's_ a new way to kill setupfor.

I've slowly been automating the Linux From Scratch build over the past week, superseding the old mklfs.sh with a new version based on the gentoo bootstrapping stuff. Between the two of them, I think I've worked out about how each distro-bootstrap should go design-wise, and LFS is a pretty easy one since it _is_ build-from-source documentation. (Refactoring gentoo-bootstrap into a subdirectory was part of this.)

The hdc maker script grabs the lfs-packages tarball from osuosl, extracts it, and then iterates through the resulting tarballs doing a setupfor on each one, which applies their patches for them. (So the squashfs can re-compress them, but the target board just mounts the squashfs rather than spending a lot of time extracting and cleaning up after tarballs.) Then each build is a separate file, so they can be called individually as necessary if something breaks, for debugging purposes.

And of course applying random upstream patches is finding more bugs in patch. For example, in LFS 6.6 lots of things needed fuzz factor support to apply. In 6.7, the perl code has a patch for hints/linux.sh, but that file is read only, so patch can't modify it. Oh, and one of the patches is whitespace damaged so the filename ends with three spaces instead of a tab, and yet The Other Patch applies it. The ramifications of this are almost as disturbing from a "this is subtly broken" perspective as for fuzz factor. The FSF's "let me help!" heuristics aren't quite to Clippy the Anthropomorphic Paperclip levels of awful yet, but they're well beyond advisable in a number of places...


September 25, 2010

Busy week. Mostly recovered from the air conditioner dying, trip to Kelly's, working overtime to catch up, and sleeping. But boy, do I need exercise now.

Poking at the native target builds, both because that's a big post 1.0 todo item, and because it's as far away from web issues as I can get and still be in the same project.


September 24, 2010

I've gotten no Aboriginal Linux documentation written for a while now. I've been uninterested in touching anything web-related.

Most of the Aboriginal Linux list traffic the past couple weeks has been Michael Zick mocking up a new website with an actual stylesheet, which is simultaneously really nice of him, and which sends me fleeing in terror. I want it to look better, but this isn't a series of incremental tweaks, nor is it a "here, try this existing package, just plug it in and flip the switch".

No, this was its own project, which continued piling up complexity for over a week after I entirely stopped replying to it, but which is fundamentally a bespoke system that's almost guaranteed to bit-rot at some point in the future. And I honestly can't tell why it's supposed to look better than what I already had (which isn't an endorsement of what I had, but turning blue underlined links, which are universally understood, into black text on a dark grey background... Why? And the color choices are apparently dictated by the fact that some people are colorblind. Yes, he said that.)

It's an Enormous Lateral Migration, and I'm hunkering down until it's over. I don't want to discourage his enthusiasm for the project, but so far the results have utterly failed to engage my own interest. When I tried to explain this, he put it up for a vote on the mailing list. I don't think open source works that way?

And thus I've gotten no documentation written, because I just haven't wanted to go there. It's no fun at the moment. (Yes, I feel guilty about it, but that doesn't exactly increase the fun level.)


September 15, 2010

So, trying to take the gentoo-bootstrap that Aboriginal is outputting, and build an actual stage 3 natively on target. I'm starting with i686, because if anything's going to work that should.

The /etc/make.conf contents are basically:

CHOST="i686-unknown-linux"
CFLAGS="-Os -pipe"
CXXFLAGS="$CFLAGS"
MAKEOPTS="-j 1"
GENTOO_MIRRORS="http://gentoo.osuosl.org/"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE=""

I added an /etc/portage/profile/package.provided which lists the base Aboriginal packages, plus the dozen-plus packages busybox provides, plus the extra packages the gentoo bootstrap builds. (There's a _lot_ in there.)
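
For flavor, package.provided is just a list of category/package-version lines, one per package portage should believe is already installed. Something like this (the names and version numbers here are illustrative, not the actual list):

sys-devel/gcc-4.2.1
sys-devel/binutils-2.17
sys-libs/uclibc-0.9.31
sys-apps/busybox-1.17.1
sys-kernel/linux-headers-2.6.35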

For /etc/make.profile I'm linking to /usr/portage/profiles/uclibc/x86 which is the Gentoo Embedded profile, which should know it's uClibc-based. Unfortunately, that profile has circular dependencies, so I added the "file" package to the bootstrap list to break the circle.

So now when I go "emerge --pretend system" it gives me a list of 33 packages it wants to build. The "readline" package looked nice and simple, so I did an emerge --pretend on that, and it wanted a dozen prerequisites including perl. That would be a "no".

In theory the list is sorted in install order, and the first two entries are "gnuconfig" (No idea what that is), and "flex". It says that flex doesn't depend on gnuconfig, so let's try to emerge flex, I know what that is and it should be reasonably self-contained...

And it died in ./configure because it wants the m4 package. (Macro preprocessor, basically an #include resolver on steroids.) The m4 package is #13 in this list, so there _should_ be a dependency there, but there isn't. Right. I could add this to the bootstrap list like file, but I'm trying to get _portage_ to build stuff, so...

Going "emerge --pretend m4" says it depends on xz-utils. What on earth is xz-utils? "Description: utils for managing LZMA compressed files." (So why does m4 need this...?)

Um, I think busybox is providing that, but let's try building it anyway...

Ok, it wanted to run "scanelf" and couldn't find it... And my Ubuntu laptop hasn't got it installed either. And it's in pax-utils. NO. WE DO NOT NEED PAX, EVER, PERIOD. Grrr... (Hmmm, should I just ignore this, or make a symlink to /bin/true? Eh, wait for it to cause a problem...)

And then it detected file collisions with xz, unxz, lzcat, unlzma, xzcat, and lzma (all in /usr/bin), which makes sense because busybox provides this. I want it to ignore collisions, and there's a config thing for that, but I'm worried about the "overwrite the busybox binary" thing that root-filesystem was accidentally doing, so I'll just add xz-utils to package.provided and keep the busybox version for the moment. (Catalyst should build a new chroot anyway, once I've got enough for catalyst...)

Now emerge says the first thing it wants to build for system is sys-apps/texinfo, so let's let it do that...

HA! It built and installed!

One down, several zillion to go... sys-devel/m4 died with the "_sp has incomplete type" bug.


September 12, 2010

Spent the weekend fighting with Gentoo Catalyst. Wow, what a brittle piece of software.

It can't build a uClibc system based on a glibc system. There's no amd64 uClibc gentoo-stage3 tarball, and the package says ~amd64 meaning it's "experimental" and I have to do a keyword override. (Of course they've only packaged up 0.9.30, not the 0.9.31 that's been out for 5 months, so maybe there was some bug they didn't manage to patch their way around. I remember it working for me, but I did have a fairly extensive stack of patches for 0.9.30...)

The gentoo developers told me about catalyst's (apparently completely undocumented) portage_confdir setting, which I can put in the spec file pointing to a directory with a package.keywords file containing 'sys-libs/uclibc ~amd64', but what with the "expected not to work" thing I decided I should retrench to i686.
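
For the record, what they described would look something like this, one line in the catalyst .spec file pointing at a config directory, plus a package.keywords file in that directory (paths here are made up, and I haven't actually tried it):

portage_confdir: /root/catalyst-overlay

# /root/catalyst-overlay/package.keywords
sys-libs/uclibc ~amd64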

So I started over with an i686 based gentoo chroot, but then it insisted that emerging squashfs needed a 2.6.29 kernel and my laptop is still using the ubuntu 9.04 kernel. (Um, so why didn't the amd64 one complain about this?)

Ok, tar up the chroot directory, boot up gentoo's install-x86-minimal CD for i686, extract and chroot into that new directory, source /etc/profile, and...

The build is _now_ dying saying it can't find a directory called "arch". Apparently, the chroot I made on my host is completely not working when transplanted to run in qemu under a gentoo boot CD, and I have no idea why. And re-running the gentoo handbook steps in this context would take about 3 hours of CPU time to grind away at everything...

Sadly, devmanual.gentoo.org is a big wiki you have to read online; there's no obvious way to print it out so you can flip through the whole thing without having to load pages and juggle tabs. (The handbook lets you do this, devmanual does not.)


September 11, 2010

All week Rachel Maddow has been asking the same question: "What is too extreme a position for current politics?" She's missing the point: It's an intentional strategy, called "Moving the Overton Window", and it's working.

I blogged about this back in July, and the whole thing is simple enough to sum up in a Football metaphor. It's "moving your own goalposts out of the stadium, until the 50 yard line winds up in your end zone, and each face-off starts out as a touchdown for your side". (Obviously it requires the referee to be asleep, but with the collapse of traditional newspaper and TV journalism this is the case.)

The Overton Window is the range of political positions you can discuss without being immediately dismissed as crazy by the majority of your audience. It's also the title of Glenn Beck's new book, so this isn't exactly a hidden strategy.

A small number of old white male billionaires are intentionally funding the most radical libertarian/anarchist fringe elements of the Republican party they can find. Rupert Murdoch (owner of Fox news) created an organization that can extensively cover made-up stories like "The Acorn Scandal" until other channels start talking about them for fear of being left out, and the Koch brothers (who inherited the third largest fortune in the US, second only to Bill Gates and Warren Buffett, although after they split it between them each only ties for tenth richest American) hijacked libertarian wingnut Ron Paul's "tea party" and turned it from a simple annual (4th of july) rally into a perpetual year-round headline manufacturing machine.

By engineering extensive media coverage of crazy-extreme positions, the public becomes progressively desensitized to them, until discussion of anything _less_ radical is seen as a moderate view and gets taken seriously. On a political level, this makes previously extreme positions much easier to defend: "Of course _I_ don't think we should eat babies, I'm merely suggesting we repeal the child labor laws..."

This takes advantage of the collapse of traditional journalistic institutions (newspapers, network news, CNN) which no longer have the budget to fact-check anything, and have resorted to giving equal weight to any opposing views and inviting their audience to split the difference. Their legacy production models can no longer afford journalism, merely "reporting" (I.E. passing along what they're told), and the internet news sources are still too fragmented and opt-in to counter this. (You can always find a blog that agrees with you, and probably won't follow ones that don't.)

This pretty much leaves us with Rachel Maddow and Jon Stewart holding the fort for actual investigative journalism, but neither of them seems to have figured out specifically what they're up against yet. The frothing extremism is there to protect the real agenda from scrutiny the way the flapping cape protects the bullfighter. You can quell opposition by distracting it into focusing on something else, and even Maddow and Stewart are chasing the cape, not the bullfighter.

Most unfortunately, Obama and the Democrats haven't figured it out either. They're still focused on countering the moves of the Republican leadership, which isn't _making_ any. Moving the Overton window is an external initiative dragging those clowns along for the ride, but from Washington it's hard to distinguish when "outside the beltway" and "grassroots" aren't necessarily synonymous.

The Republican party itself seems to have mixed views about all this, but they've always followed the money and this is a concerted effort on the part of multiple billionaires. Besides, it's easier to ride the tiger than face the consequences of getting off.

And politically it's extremely useful to them. Moving the Overton window is working like a _charm_ on this administration, because they still believe they're up against rationally held beliefs that can be negotiated with or explained away, even when it's the Tea Party screaming about "death panels".

In reality, this opposition _cannot_ be mollified by any concessions on Obama's part, because the Fox/Koch groups exist to push the envelope. They will always be jumping up and down just past the edge, wherever that edge currently is, with the exact same level of funding/intensity, because THAT'S WHAT FOX AND THE TEA PARTY ARE THERE FOR.

And thus Obama's administration is allowing themselves to be led around by the nose, and don't even seem to have _noticed_ that they're reacting predictably to a coordinated political strategy. Not only did they water down every major initiative so far in exchange for perpetual filibusters and zero votes from "The Party of No", but they're attacking their own people. Not just firing them based on spurious accusations that turn out to be made up, they actually let their press secretary attack their own base right before the midterm election primaries.

This administration's perception of "a moderate position" has shifted so far that the most recent initiative of a Democratic administration is literally corporate tax cuts. They've been reduced to proposing the other side's legislative goals for them as an opening position, and then publicly wondering why so many of the people who voted for them have lost enthusiasm.

Obama is playing chess while his opponents play poker, and he's letting himself get bluffed and outbid. Who needs the Citizens United decision when the 50 yard line is already in your own end zone?


September 10, 2010

Apparently November 2 Wootstock comes to Austin, with Neil Gaiman as well as Wil Wheaton.

November 5, Weird Al comes to Austin.

Due to the air conditioner dying and needing to be replaced the weekend before the Maryland trip, I have not yet bought tickets to either of these.

(I'm not blogging enough detail. The Maryland trip: Kelly and Steve used to run Mensa games night, but retired and moved to Maryland a few years ago. They were back for a day a few months ago, touring the country in a rented RV. Steve died of a sudden heart attack a month ago, so Fade and I went up to visit Kelly, offer general sympathy, and help her sort and move stuff.)

(We visited Traitorous Joe's twice, and Kelly gave us an extra suitcase to haul home the results in.)

I'm now working 12 hour days this week trying to catch up from the time off. Might have to work some weekend days too, I need to get this darn thing done so I'm not blocking other people...

Found a new webcomic: http://exocomics.com.


September 09, 2010

And my laptop reboots, and my zillion open windows go away. Sadness.

Suspending when the lid closes doesn't work reliably, and software suspend corrupts the text console you run it from and takes upwards of 5 minutes to complete, so you can't tell if "cursor up, enter" is really suspending or just the little machine swapping its guts out with a mere 2 gigabytes of ram...


September 8, 2010

So, talking to @solar on #gentoo-embedded:

The package list in /usr/portage/profiles/default/linux/packages.build is what you need to build to make a stage 1. (That's the set of dependencies that are assumed to always be there, and thus not necessarily tracked by individual ebuilds.) The /usr/portage/profiles/uclibc/packages.build switches off sys-devel/gettext and switches on sys-apps/shadow (the shadow password suite), for reasons that are slightly unclear. (gettext isn't needed if you aren't using internationalization, which uClibc can turn off, but why wasn't shadow in the first list? It provides login stuff.)

Of course busybox provides most of this too, but there's no way to tell portage that other than lots of package.provided and swearing. The virtual/* infrastructure isn't flexible enough.

He also pointed me at a script to trim down the portage tree to a much more manageable squashfs (only about 8 megs instead of 52).
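
I haven't looked at the script yet, but the general shape is presumably something like this (the excluded directories are my guess, not his actual list):

mksquashfs /usr/portage portage.sqf -no-progress -e distfiles packages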

Finally, the Catalyst project has a HOWTO. That's what builds stage1, stage2, and stage3 images (natively) these days. Previously, the documentation was just the FAQ, and the first entry in that says "follow the steps above"... with no steps above. (Sigh.)


September 4, 2010

Ok, beating Aboriginal 1.0 into shape.

Tracked down the mips64 failure (it was commit f6be75d03c88 which is a one line change that broke the build, and reverting it fixes it -- no idea what it was trying to accomplish, some register swap).

The armv4l thing was the conversion to miniconfig, it was configured for EABI instead of OABI.

I guess that's close enough.

SHIP IT!


September 1, 2010

Ok, I can't use this from my phone, but I _can_ cut and paste that on my laptop and then whip up an ugly little list:

There. Now I should be able to download them from my android phone and watch them there. (Don't ask me why blanking the display makes the sound stop playing. I want to listen to some of these without watching them, and if I put the thing in my pocket with the display up it interprets random commands from walking. Oh well...)


September 2, 2010

On a plane to Maryland, to visit Kelly.

Might get an Aboriginal release out this weekend, dunno. When I updated the kernel, armv4l-oabi and mips64 broke. I don't know why. Do I just ship with the infrastructure doing what I want, or do I actually try to debug the various weirdness in the component packages now that I've got them all back to current versions?

My bisectinate.sh remains moderately useless because the test granularity is wrong, because what I'm testing for keeps changing. In the mips64 case, the native compiler is horked (and has been for ages, need to dig into why; it happily builds a binary, which segfaults immediately when run whether dynamically or statically linked). So I need to check that it gives me a shell prompt. I suppose I could whip up an hdc that just uploads a quick "It worked!" text file, and test for that.

Need to refactor bisectinate so the test it runs can be passed in on the command line. Always more to do...


August 31, 2010

So, in the most recent test run, armv4l, mips64, and sparc unexpectedly failed. (sh4, m68k, and armeb all expectedly failed.)

All of them compiled. The armv4l issue is failure to launch userspace, presumably an OABI issue.

The sparc problem is a hang instead of launching the shell prompt. That's... odd. It runs the init script. It prints "Type exit when done." Then nothing. It's oneit hanging. Because oneit isn't built statically even when BUILD_STATIC=all. Right, fix that...

And mips64 is dying because compiling thread-hello2.c ends with "Aborted". Ok, that one could be a kernel issue, let's see...


August 30, 2010

Why do I consider the Linux Foundation incompetent? Today, they sent me this email:

Videos from all of our keynotes sessions and a number of our
conference sessions from LinuxCon 2010 are now available to everyone
for viewing.
Simply register here, and you will automatically receive a password to
view the videos:

And so on. They want me to register and receive a password for the privilege of viewing their videos. There is a gratuitous permission step, pure pointy-haired bureaucracy with no obvious purpose.

I don't even read news websites that require me to log in, why would I log in to view their videos? I don't need to create yet another account to get bog-standard read only access to something you're claiming to be posting on the web free to all. If you don't comprehend how Youtube works, that's your problem, not mine.

The Linux Foundation: so open you need a password to access our web pages.


August 26, 2010

So I need to use a Network Block Device (nbd-client) on target. I wget the source from sourceforge and attempt to build it natively, but it's autoconfed, and has grown pointless prerequisites.

checking for GLIB - version >= 2.6.0... no
*** A new enough version of pkg-config was not found.
*** See http://www.freedesktop.org/software/pkgconfig/
configure: error: Missing glib

Ok, nbd does _not_ need to suck in a library from ftp.gnome.org. And a 0.15 megabyte source package sucking in a 5.33 megabyte library... Code reuse is nice and all but have a sense of _scale_. But they refuse to build without it, so in the interest of getting this over with let's humor them...

checking whether to enable garbage collector friendliness... no
checking whether to disable memory pools... no

The purpose of this library is to make command line option parsing slightly easier, correct? It's over five megabytes of source code and contains a garbage collector with multiple configuration options.

The G in Gnome stands for GNU project. It's the FSF's desktop. This explains much.

checking for pkg-config... no
configure: error: *** pkg-config not found. See http://www.freedesktop.org/software/pkgconfig/

That's not really a configure thing, is it? That's a "the build breaks if you don't have this, so you might as well just try it and die if it doesn't work". That's the behavior you'd get in the absence of ./configure, so this is an _invented_ problem. Also, one that's really easy to switch off if you know the magic incantation: PKG_CONFIG=/bin/true and try again...

checking for iconv_open... no
checking for libiconv_open in -liconv... no
checking for iconv_open in -liconv... no
configure: error: *** No iconv() implementation found in C library or libiconv

It refuses to build without internationalization support. This is SAD.

According to ./configure --help the "optional packages" section says:

  --with-libiconv=[no/gnu/native]
                          use the libiconv library

So let's try building with that...

checking for special C compiler options needed for large files... no
checking for _FILE_OFFSET_BITS value needed for large files... 64
checking for pkg-config... /bin/true
checking for gawk... (cached) awk
checking for perl5... no
checking for perl... no
checking for indent... no
checking for a Python interpreter with version >= 2.4... none
checking for iconv_open... no
configure: error: *** No iconv() implementation found in C library or libiconv

Ok, let's gloss over the idea that a C compiler has a size limit on the incoming files that you'd want to support by doing anything _other_ than breaking your source file up into pieces of the appropriate size. And let's gloss over large file support being something other than 64 bits (and your code needing to care on any level that sizeof() wouldn't address). And that Posix awk isn't good enough for what configure is doing and you need to check specifically for gawk, yet work without it. (Um, and the second code path exists for _what_ reason, exactly?) And the idea that an internationalization library might optionally need perl or python. And let's gloss over why the heck indent, a code reformatting tool, is in there at all. (Bwah?)

But the documented --with-libiconv=no option turned the configure check for iconv into a build break. Bravo.

How can anybody ever use autoconf without stopping to THINK about this stuff? Does nobody else ever question the utility of this giant steaming pile of crap?

Right. The easy way for me to do this is to port nbd support to busybox.


August 24, 2010

Putting together an Aboriginal Linux release. Documenting what I've done always brings up new things that maybe I _should_ have done. (Should build.sh produce a simple-system-image-$ARCH by default?) But I'm holding off on adding more stuff right now, for the moment anyway.

One persistent problem with building target-agnostic systems is that identifying the system you're on at runtime is a flaming pain.

There's a command to do it of course, "uname -m", and it works fine. The problem is the format of output it produces doesn't match what any of the build tools expect when told to produce code for a given format. On simple targets like x86_64 you can add "-unknown-linux" to the end of it to medicate the FSF packages into a semblance of rationality, but x86 is i486, i586, or i686, and that glosses over the floating point options entirely. The kernel calls them all "x86" and provides further detail in the .config file. Then on something like arm you've got armv4tejl-eabi...
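
Which is how you end up writing mapping code like this in every build system (a rough sketch; the arm and x86 cases are oversimplified, which is exactly the problem):

case "$(uname -m)" in
  i?86) echo "$(uname -m)-unknown-linux" ;;   # loses the floating point variants
  arm*) echo "armv4tejl-unknown-linux" ;;     # ignores EABI vs OABI entirely
  *)    echo "$(uname -m)-unknown-linux" ;;
esac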

The compiler knows what target it's building for, and #defines various macros, although in a highly inconsistent manner that leads to the programming equivalent of the blind man solving a Rubik's Cube in UHF. ("Is this it? Nope... Is this it? Nope...") But of course the FSF couldn't leave well enough alone, and invented multilib, so you have to tell the compiler what it's building for even when it's a native compiler. (I switch it off. That thing can't find _one_ set of libraries and headers reliably...)


August 23, 2010

There's a theory going around that Germany's economic boom a century ago (the one that made it strong enough to try to take over Europe twice in the early 20th century) was helped by its weak-to-nonexistent copyright laws.

Well duh.

China has been categorically ignoring copyright and patent for the past 30 years, and they're kicking our asses. Meanwhile, over here in "the land of the free" you need an ASCAP license to sing in public. We keep thinking the "free market" will doom China's totalitarian regime, but we're the ones patenting chemicals, genes and math. We've lost track of what "free" means, and we're wondering why we can't compete.

Remember how medieval guilds cornering the market on just about everything extended the dark ages by centuries? Today we've got the RIAA, MPAA, DVD CCA, MPEG LA... They're guilds. When Monsanto sues farmers because pollen from their genetically modified crops drifted into adjacent fields and thus their crops now contain patented genes, that's not a "free market". That's a shakedown extorting money from people who ACTIVELY DO NOT WANT TO BUY YOUR PRODUCTS, but have no choice.

Cornering the market does not improve the market. It's not good for the market. It's not HEALTHY. It makes a minority rich at the expense of the majority, which is bad for that majority. Making cartels very rich at the _expense_ of everyone else acts as a drag on the larger economy. Lowering entry barriers and transaction costs (as the internet does, the printing press did, pretty much the conventional definition of "free trade") is the _opposite_ of cornering the market.

Intellectual property laws are protectionist, raising transaction costs and entry barriers. They put moats around businesses so they don't have to compete with imitators to survive. They are obviously bad for the economy. Yet people go "But they allow you to corner the market and get rich! Getting rich is healthy for the economy, isn't it?"

Not always. Kim Jong-Il eating off solid gold plates doesn't mean anybody else in North Korea has enough to eat. Mogadishu's full of warlords with suitcases full of cash; that doesn't mean you want to live there. Cornering the market benefits individuals at the _expense_ of everybody else who has to pay monopoly prices to the single supplier, and intellectual property laws are all about cornering the market on ideas.


August 22, 2010

I continue to be completely uninterested in both Chrome and Meego, which is sad because they _seem_ like the sort of thing I'd go for. But a random example from a recent article on Meego:

I asked whether MeeGo had any core or significant contributions outside developers employed by Intel and Nokia. So far, not much. Foster did mention that Novell had been "very active" and that a few developers from other companies had been involved but not very many.

Look: hobbyist != "other companies". Hobbyists aren't just doing the minimum amount of work required to get paid. Open source produces superior code in part because it attracts people to work on things they actually care about, not just things somebody else pays them to look at. Things we can be _proud_ of. In the commercial world, "fixing things that aren't broken" is a waste of money. But we constantly improve stuff that already works because there's a better way to do it, at least in active, healthy open source projects.

If your open source project isn't interesting enough to attract people to spend their own free time playing with it, taking it apart to see how it works, and picking at the rough edges, in the long run it's going to lose out to projects that _do_. Getting corporate interest in a project is just a question of signing checks. Getting hobbyist interest is a measure of _worth_. Failing to value hobbyist interest is an enormous strike against long-term viability. Failing to understand what hobbyist interest _is_ makes you a laughingstock.

This is why I think Meego, on its current trajectory, has a limited lifespan.


August 21, 2010

Gentoo from scratch progresses.

And it hung because it's using the "patch filename patchfile" command line arguments. Wow, I've never seen anybody actually use that before. Toybox patch didn't support that, and it's harder to add to the busybox version.
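
For reference, the invocation styles in question (filenames made up):

patch -p1 < fix.patch       # what everybody actually types
patch -p1 -i fix.patch      # same thing, via -i
patch file.c fix.patch      # the posix "patch [file [patchfile]]" form that tripped this up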

Sigh. The busybox option parsing remains impenetrable. Poked the list about it. Simple to implement in toybox, but making busybox getopt32argsthing() do it is kind of unpleasant at a conceptual level...


August 20, 2010

So ncurses ./configure has --without-cxx-binding but never tests either "is g++ installed on the host" or "does the host cc support c++". So it runs 8 gazillion probes, none of which is "disable c++ support", even though it has an option to do so manually.

Meanwhile, the distcc autoconf probes for python (and won't build python extensions if it's not in the $PATH), but does _not_ have any command line way to --disable-python. Because consistent behavior out of autoconf is anathema.

Here's a fun section of the rsync autoconf:

checking size of int... 4
checking size of long... 4
checking size of long long... 8
checking size of short... 2
checking size of int16_t... 2
checking size of uint16_t... 2
checking size of int32_t... 4
checking size of uint32_t... 4
checking size of int64_t... 8

Yes, it's checking the size of uint32_t, which has the number of bits in the name of the symbol.

How many different _levels_ of wrong are we talking here?

It's the last one that's really, really boggle-worthy. They're testing the size of types which were specified by the standard that introduced those symbols. They _don't_trust_c99_.

Then there's this chunk of the bash ./configure:

checking limits.h usability... yes
checking limits.h presence... yes
checking for limits.h... yes
checking locale.h usability... yes
checking locale.h presence... yes
checking for locale.h... yes

Three checks for each header. Apparently checking the exact same thing. Doing something pointless, in _triplicate_.

And yes, it's pointless. If you're worried a header might not be there, make an "includes" directory that contains stub versions of each header you're worried about, #defining whatever WE_DO_NOT_HAVE_THIS guards or other alternate behavior the ./configure test is enabling, and then stick it at the end of your include path. That way if it can't #include the thing out of the system headers, it gets your "we do not have this" version instead. There are compiler command line options for this. Having ./configure do it is dumb.
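
Concretely, something like this (a sketch; the header, macro, and directory names are made up):

mkdir -p fallback-includes
cat > fallback-includes/locale.h << 'EOF'
/* Only seen if the real <locale.h> isn't in the system include path. */
#define WE_DO_NOT_HAVE_LOCALE_H 1
EOF
gcc -idirafter fallback-includes -c program.c

The -idirafter directory gets searched after the standard system directories, so the real header wins whenever it exists and the stub only kicks in when it doesn't.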

Dear autoconf authors: please stop. Just give it up. You are taking a horribly wrong approach. Even before we moved to ubiquitous SMP and your tool remained resolutely single-processor and started taking longer than the rest of the compile combined, it was a BAD IDEA.

And of course my day job (helping port Linux to a new processor) hits the fact that every instance of config.guess is pretty much trying to do "echo $(uname -m)-unknown-linux" except it sits down and dies if it doesn't recognize the uname -m output. (Which it won't on a new processor, but when you're trying to build something like "patch" or "rsync" it DOESN'T MATTER, and the fix is "for i in $(find . -name config.guess); do echo 'echo $(uname -m)-unknown-linux' > $i; done" in each package that was dumb enough to use autoconf.) Adding support upstream in autoconf would take years to percolate through to these various packages that block-copied it into their infrastructure.

(And yes, I know about variants like the armv5l-unknown-linuxgnugnugnudammiteabi tuple; FSF tuples are horribly designed and inconsistent but that's _another_ rant.)

People keep taking things like dropbear that didn't use autoconf and adding autoconf support to it because they think it's somehow an improvement. People wrote this huge pile of infrastructure, look how huge and complicated it is, obviously we don't want to duplicate all that!

You're right, you don't. You don't want to USE it either. It's a solution in search of a problem that encourages pointless complexity to breed, and you really don't want to get any of it on you. It's a steaming pile of garbage that should not _exist_. Don't go there.


August 19, 2010

Sigh. This whole "War on Science" thing is getting old.

Take global warming. Remember acid rain in the 70's? We put scrubbers on chimneys to take out the sulphur. But the carbon dioxide remained, and it's causing a bigger problem over a longer period of time.

Are they claiming acid rain wasn't caused by humans? Are they also claiming that the rain forests in south america aren't being destroyed by human activity? We can watch deforestation happening, an entire continent changing, clearly visible on satellite photos. "Oh, deforestation is us, but the stuff with the glaciers can't possibly be us because human activity doesn't result in changes on that scale". Pick one, guys.

And the "God wouldn't let it happen" people are hilarious. Your book has a Great Flood story that says the whole world was under water, and you're using that book as evidence that the sea level can't rise if we're stupid enough to do it to ourselves. As Bill Engvall said, "Here's your sign".


August 18, 2010

Hmmm... uClibc++ won't build in gcc 3.4 because:

  #pragma GCC visibility push(default)
  #pragma GCC visibility pop

Were introduced in gcc 4.0. How much effort would that be to backport...
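
The lazy alternative would be wrapping the pragmas in a compiler version check so gcc 3.4 just skips them (a sketch, not necessarily what uClibc++ upstream would want):

#if defined(__GNUC__) && __GNUC__ >= 4
#pragma GCC visibility push(default)
#endif

#if defined(__GNUC__) && __GNUC__ >= 4
#pragma GCC visibility pop
#endif

That loses the visibility control on old compilers, but they didn't have it anyway.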


August 17, 2010

Ooh, the Ted talk URLs are _regular_. The first ever Ted talk put up on the website (Al Gore in 2006) is at http://ted.com/talks/view/id/1 and then the next one is http://ted.com/talks/view/id/2 and so on... There's even a spreadsheet of them all.

This could turn into an enormous time sink. (And if I ever bother to learn android userspace programming, a good first app would be to turn the spreadsheet into a ted talk browser/viewer.)


August 16, 2010

The Oracle suit against Google goes like this:

A few years back, Sun released their Java code under the same license as the Linux kernel (GPLv2). They can't really sue people for patent infringement on that code, because suing people for using the code they released under the license terms they offered is an extremely weak legal position.

But when Google released Android, it refused to allow any GPL code in userspace. They wrote (inferior) clones of BusyBox and uClibc (toolbox and bionic, respectively), and came up with their own "clean room" clone of Java called Dalvik. (I'm not sure why they bothered to still use a fork of the Linux kernel instead of writing their own version of _that_, since it's GPLv2 also, but you'd have to ask them what they were smoking, er, thinking. I don't pretend to understand it.)

When Oracle bought Sun, they went "we may have licensed our patents for use in GPLv2 code, but your Dalvik thing isn't based on that code or under that license, so you're violating our patents". And they sued them for software patent infringement.

On the one hand, Oracle's being an evil patent troll, after years of saying its patents were only for mutually-assured-destruction defensive purposes (preventing other people from suing _it_ over software patents). Oracle smells of greed and desperation, and deserves to die.

But on the other hand, Google's strange refusal to allow GPLv2 in userspace is what opened them up to this lawsuit in the first place. Their "Not Invented Here" syndrome is what made it possible for Oracle to sue them specifically, without having to take on the whole Open Source community at once. (Also, Google made Android pointlessly dependent on Java. At first they didn't even let you write native code, they insisted that all Android apps _had_ to be modern cobol. That was another _highly_ questionable technical decision that wound up exacerbating their vulnerability to this patent troll.)

I'm not saying Google deserved to be sued, I'm just saying they shouldn't have been _surprised_ that an obviously dying company would be acquired by amoral cretins intent on strip-mining its IP. (After all, we just went through this with SCO, and Sun Microsystems' ongoing loss of viability was increasingly obvious for the whole of its last decade.)

I can't say I'm really rooting for either of them here. The one I want to see go out of _business_ is Oracle, but Google's pointy haired, holier-than-thou approach to Android isn't really something I'm inclined to defend, either. That project's approximately as Open Source as MacOS X, it's just a fork off of Linux instead of from FreeBSD.

Eh, however it works out, this FUDs the heck out of Java as a programming language. "Use this, get sued." Good riddance, I hope.


August 14, 2010

I miss Konqueror. When I clicked on a PDF, it displayed it in the browser tab, with the URL at the top so I could cut and paste it. Firefox (and Chrome) download the file to /tmp and pop up an external viewer, so that by the time I'm reading the actual document, the URL it came from is gone.

Linux: losing capabilities it used to have since 1991. (Go ahead and boot it in 2 megs of RAM.)

Anyway, these are notes from "Hardware Modeling and Optimization: A critical assessment with case studies" by Norbert Wehn. Keynote from "ECTRS 2010", whatever that was. Somebody linked to it from twitter last week (and of course twitter refuses to go back more than a few pages in the feed). (Update: Now that I'm back online, I googled for it. Still annoying.)

The paper is basically about the limits of Moore's Law.

The first microprocessor was the Intel 4004 in 1971. It had 2250 transistors, each about 10 micrometers across, running at 108KHz in a chip 11 square millimeters. A reasonably modern processor is the Intel Dunnington, a server chip released in 2008. It had 1.9 billion transistors, each about 45 nanometers across, running at 2.66 ghz in a chip 403 square millimeters. That's about 1 million times the transistors, 25k times the frequency, 40x the chip size, and 1/200th the component size.

Diverging for a moment, that's not just a quantitative difference but an enormous _qualitative_ difference. The 4004 had 4-bit registers and 256 bytes of address space, it was barely powerful enough to run a calculator. The Dunnington has over a dozen 64-bit registers and 16 megabytes of _cache_ built into the chip (the onboard memory controller has terabytes of physical address space and the registers can index 18 exabytes of virtual memory; an exabyte being a million terabytes). It's SMP (6-core) with its own internal bus connecting them, an onboard memory controller, each core has staged pipelines with branch prediction feeding multiple execution units... And it's a couple years old already, there's better stuff out now.

The other massive difference between them is power consumption: the 4004 lived inside a battery powered calculator, and the Dunnington is hooked up to wall current drawing somewhere over 130 watts. That power is consumed in a space less than half a centimeter, I.E. the power draw of a low-end coffee pot focused into the tip of your little finger. Keeping this chip from _melting_ is already a challenge, and continuing to increase power consumption isn't a realistic option anymore.

(Slide 3 has a nice graph showing power consumption vs giga-ops of work produced in a cell phone, the takeaway being that dedicated hardware is way more power efficient than general purpose programmable hardware. But let's skip ahead to slide 5: the limits of Moore's Law.)

If the 4004 was a bacterium, the Pentium M had evolved up to about the insect level, but then we hit a wall. We couldn't make the ant any bigger or it would collapse under its own weight, couldn't make the bee any larger or it couldn't fly. So we went SMP and started making anthills and beehives.

The reason is "Intel's fundamental theorem of multi-core" on page 5: they measured that a 20% reduction in clock speed yields a 13% reduction in performance but 50% reduction in power. So for the same power budget, you can get 1.7 times the performance from a 2-core chip running at 80% of the clock speed. (Assuming you can make the software scale to SMP.)
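
Spelling out the arithmetic (my numbers, rounded, not the paper's):

1 core  at full clock: performance 1.00, power 1.00
1 core  at  80% clock: performance 0.87, power 0.50
2 cores at  80% clock: performance 1.74, power 1.00

Call it 1.7 times the work for the same power budget.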

The rest of the presentation is (among other things) about why that's an oversimplification. :)


August 13, 2010

I've ranted at length about my theory that open source is fundamentally incapable of handling user interface issues (hence the continuing failure of everything from "Linux on the desktop" in general to The Gimp's and OpenOffice's attempts to become as usable as their proprietary counterparts): any time "shut up and show me the code" is not the correct response to the problem at hand (which it isn't for the touchy-feely "usability" stuff), the open source development model melts down into one of three distinct failure modes.

But don't take my word for it. Here's Ian Jackson (author of Debian's package management system) explicitly telling non-coders he doesn't want to hear from them, and explaining why. He didn't mean to, but he excellently summarized why open source development can't handle user interface design.

This is why Linux on the desktop _can't_ happen. Our UI success stories involve a very small team (like the FireFox guys) going off and producing something they think looks good. And then we either enjoy their aesthetic vision (or in the case of FireFox grudgingly tolerate it because IE is so much worse), or we try something else. We don't collaboratively participate in the aesthetic design, because nobody's ever worked around Brooks' Law in this aspect of software development. Open source can produce Wikipedia, but it can't produce a story with a plot (let alone The Great American Novel).


August 12, 2010

Turns out the build doesn't rebuild itself without toybox patch. I really need to put a proper patch into busybox.

Ooh, Bernhard already started porting toybox patch to busybox...


August 10, 2010

Gentoo From Scratch is currently blocked on busybox "patch" being broken (can't handle offsets, which most patches accumulate fairly quickly). And I can't "emerge patch" because they might apply patches to it during the emerge. (Most packages do.)

Grabbed toybox patch and looked at breaking it out from toybox (the way I did oneit), but it turns out to be a bit of a pain. Patch makes use of the option parsing infrastructure, which is incredibly convenient to use and a real pain to do without. I also have some library code, such as the line based streaming/caching infrastructure (copy_tempfile(), replace_tempfile(), delete_tempfile()) and linked list code (llist_free() and friends) that patch is built around. I.E. I'm making extensive use of common code, just like a good swiss-army-knife program should, and the busybox equivalents either don't exist, don't map easily, or are crap. Even turning it into a standalone program involves rolling up rather a lot of code and making stubs to glue it together (a toy_exec() replacement that calls the command line parsing).

I can't just build toybox on the target because the Hexagon uses gcc 3.4 and the toybox build uses flags that aren't there in 3.x, and I don't want to upgrade the toybox build when I'm trying to wean things off of it.

I think the easy thing to do is just grab gnu patch and build that on target along with ncurses and zlib. (That way I know I won't be patching the tarball before building it.) Long term, getting my patch implementation into busybox means I could do without toybox, but the port is nontrivial and I'm not getting it finished tonight...


August 9, 2010

Why do I consider the FSF incompetent at programming?

checking whether strstr must be declared... no
...
checking whether declaration is required for strstr... yes

Yeah, it's an old version of binutils, but _dude_. Autoconf is an abomination, but this kind of gratuitous duplication is just _sad_, there shouldn't be the _option_ for the two tests to get out of sync because it shouldn't be testing for the same thing twice in different ways. Code that doesn't exist can't break.

Of course code duplication is why the FSF exists in the first place; it's not like there wasn't already a Unix. The charitable view of their goal was that they were trying to pull a Compaq and reverse engineer the Existing Thing to let a thousand clones flourish. Except they didn't. They failed to produce a clone, and they refused to let other people build on their work without trying to hijack those projects and claim ownership of them (from copyright assignment to the Gnu/Linux/Dammit campaign, there are some serious turf issues going on with those guys). I gave up on them when they split the community with an unnecessary GPLv3, because it seemed like such a good idea when Sun came out with CDDL.

Yeah, glass houses: Busybox similarly reimplemented a bunch of existing command line utilities. Our excuse was that the existing versions (from the FSF) were obviously, objectively, demonstrably crap, and we were trying to improve the technology. They were doing it for ideology.

Every time I see Stallman say "Freedom" I keep imagining him screaming the word, painted blue, holding up a sword, channeling Mel Gibson.


August 6, 2010

Christian pointed me at the "git describe --tags" command, which is the bespoke thing that does what I was looking for, and which there's no way to work out logically unless you already know the magic incantation. This command does what I asked about, and nothing else. Knowing "git tag -l" exists gives you no hint that this other command does; the logical relationship between these UI elements is zero.

That's git for you. (And yes, the --tags option is on a command that deals exclusively with tags.)
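
For the record, the two commands side by side (output paraphrased from memory, so treat the details as illustrative):

# List tags: sorted as strings, so the "newest" one is anybody's guess.
git tag -l | tail -n 3

# Walk back from HEAD to the nearest tag and report how far we've come:
git describe --tags
# prints something like v2.6.35-rc3-270-gdeadbeef: the nearest tag, the number
# of commits since that tag, and "g" plus an abbreviated hash of this commit.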


August 4, 2010

I hate git. I really do. The people who designed the plumbing never stopped to think how any of it would be used, and the user interface is a bunch of ad-hoc bolted on independent bits that have nothing to do with each other.

For example, the kernel mercurial repository just ate itself, so I decided to move my old http://kernel.org/doc scripts over to git, since that's all the kernel maintainers care about. I had a simple mercurial invocation that would find the current tag, because it listed the tags in the order they were applied to the repository, and the last one in the list would therefore be the most recent. (If I pulled tags from multiple sources the order would get weird, but I don't. There's one upstream tagging source I care about, and pulling from that won't re-order the tags.)

So all I had to do to figure out the current kernel version was list tags and take the one at the end.

Git does not work that way. It's showing me the tags in alphabetical order or something, so 2.6.35 is always before 2.6.35-rc1, even though 2.6.35 is the final and -rc1 was the start of the development series that led up to it.

So I look at "git help tag" to see if there's some way to go "tell me the most recent tag preceding this commit". There is not, at least in the documentation. It never occurred to them.

In Mercurial, this is trivial to work out how to do for yourself. In git, it's random magic that somebody had to implement special and you have to learn the magic word, and if nobody coded up some bespoke code path to fish this piece of data out of the incoherence of their repository, it can't be done.

This is pervasive. I hate git.

Note that "git log -v" doesn't show tags in the log entries, so I can't search that way.


August 3, 2010

I should really add some way to specify CPUS=1 for just an individual package. I've got the is_in_list infrastructure already, and the build_section stuff. The hard part is figuring out what to call the config variable.

BUILD_THIS_PACKAGE_WITH_JUST_ONE_CPU=linux

Yeah, that's going to go over well...
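
Whatever it ends up being called, the plumbing is the easy part. A rough sketch (untested; ONE_CPU_PKGS is a placeholder name, and I'm assuming is_in_list "item" "space separated list" works the way the existing scripts use it):

# Placeholder variable, would live in the config file:
ONE_CPU_PKGS="linux uClibc"

# In whatever picks the parallelism for a package build:
if is_in_list "$PACKAGE" "$ONE_CPU_PKGS"
then
  JOBS=1
else
  JOBS=$CPUS
fi
make -j $JOBS
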

August 2, 2010

Doctor Who "The End of Time" was surprisingly bad. It seems like it was made by people who have no respect for the franchise.

Putting the doctor on San Dimas time in the beginning was kind of odd. Leaving the "somebody's accelerating the development of the Ood" as a dangling plot thread, maybe it's a setup for future episodes, whatever.

Turning the Master into a cut-rate spider-man supervillain was just sad. The dude was always a menace because he was smart, devious, and evil, not because he was a cross between The Toad and The Shocker. (Dude had access to rayguns under the 3rd doctor; it simply wasn't his _style_.) He made _plans_, and went out to Earth and its colonies to be a punch-clock villain the same way a British explorer in a pith helmet would go on safari to Africa to shoot the native wildlife (and the occasional native). He was so obviously _enjoying_ it, but in an unthinkingly smug, superior way. Giving him pointless superpowers actually REDUCES the character.

Speaking of which, The Doctor is not Batman. This episode _wrote_ him as batman, but that's not his character normally.

I'm not always bothered by plot holes you could drive a truck through, but the main MacGuffin was _stupid_. "It's a gate! It's a medical device! And a floor wax! Oh, and it affects entire species at once!" Leaving aside how they implemented that last addendum, _why_ would you build something like that? What use case did they have in mind, exactly? Obviously there were no security implications to be considered in doing so, since they left it lying around to be salvaged after all.

When the master _did_ transform everybody, what happened to babies and fat people? It wasn't Timelord technology (it was familiar to the cactus people, whose ship was nothing to write home about), yet it affected an entire planet at once in a way that wasn't bothered by conservation of mass, yet left clothing alone... Right.

And then the "gate" _did_ get used as a transportation device later on for the time lords to (almost) come through. MAKE UP YOUR MIND.

The human trying to salvage the gate was apparently supposed to be a villain (I think?), but came across as a watered-down version of The Man Your Man Could Smell Like. They really didn't bother with any characterization for him (or his daughter). Lots of abandoned plot threads. I have no idea why Donna, Rose, Martha, or Ricky were even in this episode. None of them had anything to do; it was just extended cameos. Donna not being transformed was another plot thread that went nowhere; it had zero effect on the outcome of the plot. Apparently the doctor planted an EMP generator in her skull though.

Digging up Rassilon as the main bad guy is a bit like the americans always being represented by George Washington. We've had lots of Time Lord names (Borusa and Thalia were chancellors, Leela married Andred, and of course Romanadvoratrelundar... IMDB gives a half-dozen other time lord names for The Deadly Assassin alone), plus a history of addressing people by titles (The Doctor, The Master, The Rani, Omega, Castellan). Going for Rassilon was _lazy_.

The Doctor's attempted execution in Arc of Infinity was "only the second time the Time Lord race has sentenced one of its own to death" (after Morbius), and Rassilon casually offs one of his advisors. The "race of dusty senators" is way, way, way off model.

Rassilon himself is hugely out of character. Let's dig up the dude who invented regeneration (that's why Rassilon's famous, Omega was the one who invented time travel) who put a limit on it (12 times) because he was worried about society stagnating if nobody ever died, and then bowed out of politics and bogged off to The Tomb of Rassilon despite being immortal... Let's dig up THAT guy and make him a power-mad bond villain played over the top by Timothy Dalton, with such an enormous drive to monologue that he narrates the episode. Ok, we all knew James Bond was a time lord, he's regenerated often enough, but this was played _straight_. That's the sad part.

I don't expect them to explore more of Gallifrey in an episode like this, but some BASIC RESPECT FOR CANON would be nice. There is a bit. The War Games introduced Gallifrey under the second doctor, the Three Doctors had cutscenes on it, The Deadly Assassin was the first episode that took place entirely on Gallifrey, and Invasion of Time was set there as well. The Five Doctors, Arc of Infinity, and the whole season of Trial of a Time Lord were set there. Plus a bunch of time lords bumped into the doctor elsewhere: Drax and the Meddling Monk and that professor from Shada and the one who hijacked the 4th doctor's transmat beam to send them to ancient Skaro and that guy from the 6th doctor episode with the giant slugs... They've been the bad guys before, and put the doctor on trial and/or tried to kill him at least four times (war games, deadly assassin, arc of infinity, trial of a timelord). But they weren't "destroy the universe for personal gain" type nutso evil; they had _motivations_ and acted reasonably consistently. The Planet of Hats ridiculous headgear treatment they got this time around was just _sad_.

The seer from The Ribos Operation was on a primitive planet. Putting her on Gallifrey with the chanting and weird face paint made no sense. "The bones, the bones I seek! Gallifrey Falls!" Why did they do that?

Doctor Who is not Star Wars. The "Jar-Jar addresses the Imperial Senate" moment substituting Rassilon for Jar-Jar is easy to do with CGI cloning your five extras to make them look like a crowd inset in a matte painting, but really lazy. They blew budget on CGI and skimped on sets and extras. Also, the bad rip-off of the bit from the original Star Wars where the Millennium Falcon fought off the TIE fighters (shooting down the missiles) was way too obvious, and also unnecessary. (The music tried to make it exciting, and failed badly.)

The Doctor's Sonic Screwdriver has turned into a magic wand, and is way overpowered. It was actually used as an offensive weapon this time. "Disable the whole spaceship with a zap" then "Oh, it's fixed" later on... I can only assume the two cactus people are mechanical idiots who don't know how their own ship works, despite the fact they were lead techs trying to salvage the gate thing? You can't pull that kind of stunt on a car (the engine block's cracked and the cylinders seized up due to overheating? No problem, trivial fix.) But obviously spaceships are much simpler to maintain...

Having the Doctor jump out of a moving aircraft, crash through a skylight onto concrete and live... for no obvious reason. (Couldn't have rigged a parachute? Apparently didn't _need_ to. He's batman.) If internal injuries from _that_ had been the excuse for the regeneration, ok. But no, it wasn't.

Radiation poisoning taking days to kill him is one thing. Having his injuries magically disappear right afterward with a comment "it's starting" but then he still has a week to travel around and hook Jack up with a date and everything... They didn't NEED to do that.

Oh, and numbers. Have a sense of scale. Way back in Unearthly Child, the Doctor and Susan were from about 5000 years in the future (they didn't say which planet). Under the sixth doctor (trial of a timelord) it was something like ten million years of history. Now it's a billion. Those are very, very different numbers. Being pulled out of someone's ass.

Just about everything this episode got wrong wasn't _necessary_. They had enormous potential, and they blew it. Any reasonable scientific advisor or continuity researcher would catch a lot of this stuff, they just didn't _try_. Under the 7th doctor, they treated the show as a children's program and by doing so slowly strangled it to death. They seem to be doing it again. That's sad.


August 1, 2010

America, where any noun can be verbed and any verb is subject to nouning. When you verbify a noun by adding an s to it, you mostly seem to follow the pluralization rules ("jumps, watches"). But what do I do for the "cd" command, when somebody "cds" into a directory? Is it "cd's"? The thing is, it's an acronym for "change directory" but it's got no vowels, so everybody pronounces the individual letters, and adding an s causes a mental clashing of gears. I _really_ want to put in an apostrophe, but that's possessive and I haven't got a good excuse.

I'll probably do it anyway, I'd just like to come up with a good _reason_. (I've also tried to rephrase the sentence, but "extracts the tarball, patches it if necessary, and cd's into the new directory" doesn't lend itself easily to rephrasing. "changes the current directory into that new directory"... awkward.)

Yeah, more documentation work.

And as always, doing documentation work dredges up things I need to fix (because explaining how something works often reveals how it's subtly broken, and documenting the broken thing is less appealing than fixing it). My todo list runneth over, as usual...

Today, toybox patch has "possibly reversed hunk" issues, because if a hunk moved in the file then the first instance looks backwards if you're doing a position-independent search (which I am). The correct thing to do is make a note of it but only complain if we hit the end of the file and didn't find a correct place to put the hunk.

Did breaking apart extract_package from setupfor fix the old "we downloaded a new version, and setupfor tries to clone a working copy before cleanup_oldfiles runs" issue? Or do I have more work to do? (Eh, remove the workaround from sources/native-builds/gentoo-stage1.sh and see if it dies. And while we're at it, put FORK=1 back into buildall.sh for that. And move build-control-images.sh to the start of buildall.sh so the rest can rely on them being there. And make the USE_UNSTABLE=busybox build run the busybox test suite...)

Wow, the busybox test suite is sort of craptacular, isn't it? Yeah, I bumped into this a month and change back, but it really doesn't get a lot of love. I need to go bang on it at great length. I should figure out how to check stuff into git and push it upstream via ssh to morris, and what Denys's policy on that is...


July 31, 2010

Note to self, get "Last Call" by Daniel Okrent from the library.

I haven't been blogging as much as I used to because I haven't been doing as much open source programming. (The whole day job thing really sucks up more time and energy than it seems. I keep forgetting that. If I've been programming for 8 hours, I don't really feel the need to do more of it when I get home most days. I make plans to do so, but I don't sit down and actually _do_ much of it.)

It's not so much "I'm tired, no brain left" as "I just spent 8 hours doing this, it's stopped being fun just now". I can sometimes switch gears to doing documentation, and that's different enough to be fun, but a lot of the times I just don't fire up my laptop when I get home...

Not really complaining, just wondering when I'll manage to get this darn 1.0 release out...


July 26, 2010

Why does Google's "Chromium OS" exist? I've been trying to get a clear answer to that, and failing. People keep pointing me at some infomercial video so I can see it slice and dice and such, but they can't explain its _purpose_. The closest I've gotten was "like Gentoo, only less so".

Chromium's main design goal seems to be to _not_ run local apps. Everything runs in your web browser. Ok, except you can do that with existing Linux distros already, a web browser is a standard app that runs on Linux today (in fullscreen mode if you like), and people have been making dedicated kiosk systems out of Linux for 15 years now. (Often for the purpose of running a web browser to display some info.) You can make a squashfs root filesystem with nothing but a web browser on it, running as PID 1 on the frame buffer. Or have init fire up X11 and have .xinitrc bring up the web browser. Add a jffs2 mount for persistent config data if you want. This is not a new distro, this is a weekend project for a couple of embedded Linux geeks.
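
To be concrete, the weekend project version looks something like this (a sketch; the browser name, flags, and paths are placeholders, nothing to do with how Chromium OS actually works):

# /root/.xinitrc on the squashfs: the browser _is_ the user interface.
exec chromium --kiosk http://example.com

# And init (or a tiny PID 1 shell script) just keeps X and the browser running:
while true
do
  xinit /root/.xinitrc -- :0
done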

Speedy boot that doesn't probe for hardware... The hard part is making a speedy boot that _does_ probe for a wide range of hardware, and people have been working on that hard part for about 10 years.

They announced this thing back in 2009; their goal was to come up with a distro that did _less_ than existing distros, and they're still working on it. But... working to make it do what?

It's so uninteresting it almost comes out the other side to be fascinating: _why_ are they doing this? There's utterly no point to it I can see. Android had to add cell phone functionality and aggressive power saving stuff, and while they were at it they rewrote the whole of userspace to avoid Stallman's GPLv3 crusade and make a Linux that even the FSF can't bracket with "GNU//DAMMIT". But Chrome OS... what's the _point_? What exactly are they trying to accomplish?

No idea...


July 25, 2010

Haven't rsynced up my blog file in a while both because I've been insanely busy at work, and because I was trying to fill in the July 13th entry. (Even though this is just a big text file, I try not to edit entries that have already gone up. It's bad manners.)

However, it's been long enough that I'm just going to make an exception, stub in that one, and go back and finish it later.

Biked to work today, hoping to get some cleanup done to prepare for the new guys starting monday. (And hoping to be able to take Friday off so I could spend an entire 3-day weekend trying to ship Aboriginal Linux 1.0 a week from now.) An hour and change later, arrived to discover I'd left my badge at home, and can't get into the building. Biked to Chick-fil-a. It's Sunday, they're closed. I've already got as much sunburn and heatstroke as I'm prepared for, and have finished both bottles of tea I brought with me. It's July in Texas, and only "flirting" with triple digits if the definition involves extensive use of tongue, so I'm not biking back home until the sun goes down a bit.

So I'm at Jimmy John's, trying to do the cleanup to put out an Aboriginal 1.0 release. My "todo.today" file has gotten to be 54 lines, no longer quite what I'm likely to get done today. Or even before the next release, which is monstrously overdue because I haven't wanted to put out yet another stopgap before 1.0.

One of the big things I'm doing is going through the hg log since the last release (commit 1020) and writing up release notes explaining what changed. Of course writing documentation always spawns new todo items, and I've done a few simple cleanups as I go along, but 1089 got me thinking about commit 997 and the "unsets" I put in for Wolfgang Denk (who breaks stuff about as much as I do). I was never happy with a whack-a-mole blacklist; I want a _whitelist_ of variables allowed through, and to unset the rest of them.

And there is a way to do it: the "env" command lists all currently exported variables. I can run sed against the "config" file to get all my configure variables, and add an extra whitelist using the is_in_list logic. (I'll have to special case the DISTCC_* and CCACHE_* variables in case people use those.)
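
Something like this, probably (an untested sketch; the sed expression depends on what the config file actually looks like, and it leans on the existing is_in_list helper):

# Whitelist: variable names mentioned in "config", plus a few basics.
WHITELIST="$(sed -n 's/^export \([A-Z_][A-Z0-9_]*\)=.*/\1/p' config) HOME PATH SHELL TERM USER"

for VAR in $(env | sed 's/=.*//')
do
  case "$VAR" in
    DISTCC_*|CCACHE_*) continue ;;  # special-cased pass-throughs
  esac
  is_in_list "$VAR" "$WHITELIST" || unset "$VAR"
done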

I've got some half-finished work waiting to be checked in. A switch to busybox 1.17.1 would let me move to "make defconfig", but Denys has been waiting for me to finish debugging the CONFIG_FEATURE_EDITING_ASK_TERMINAL stuff first. And I've spent a few days being mugged by real life and not getting back to him...

Guess I should do that now.


July 19, 2010

The list of Unix clones a couple days back didn't include an entry from the FSF. There's a reason for that. Stallman's own Gnu/Hurd project failed to produce a usable clone. His project did result in some interesting development tools (such as emacs and gcc) and a license other people found useful, but Stallman's focus on politics and idealism over engineering meant his actual attempt at a usable operating system utterly failed. (A fact he's been in an almost psychotic level of denial about for almost 20 years now.)

In 1998 he switched to trying to claim credit for Linux, ignoring not just Linus's work but the contributions of Tanenbaum and the comp.os.minix community, the Berkeley developers, MIT's project Athena, and many others.

He's also tried to claim credit for the invention and preservation of "free software", which was the status quo before 1983 and continued to be so in many contexts, and which is similar to independent efforts such as Creative Commons (and Project Gutenberg, which predates the founding of the FSF). Before Apple changed copyright law in 1983, "free software" was the norm from DECUS to the 8-bit BBS community. After the NSF allowed unrestricted use of the (heavily subsidized) internet backbone in 1993, free downloads were the norm from Napster to bittorrent. There was only a 10 year window for "shrinkwrap" software to establish itself, and the WWIV mod community arose and flourished during that window without reference to any other group.

The initial success of the Free Software Foundation in the 1980's was aided by two things: the ftp site prep.ai.mit.edu and Sun's Vice President of Marketing (later head of the Software Division) Ed Zander.

The internet was started by the defense department, and later funded by the National Science Foundation, neither of which allowed commercial for-profit use of the network. To attach to the internet, you needed to either be a government entity or have a valid educational or research purpose. (Almost immediately the bulk of the traffic on the internet came to be dominated by non-scientific uses such as the sf-lovers mailing list, but since this attracted smart people to use the net it was considered a good tradeoff and the authorities looked the other way.)

In this context, having a high bandwidth FTP site (provided to Stallman by MIT) was an extremely attractive proposition. People signed their code over to the FSF to get internet distribution, back in the days when individuals unaffiliated with large institutions had few other options. They were the Sourceforge of their day, except that they essentially required you to sign a loyalty oath (in the form of a copyright assignment statement). Thus Larry Wall (who would go on to create the Perl programming language) assigned his "patch" program to the FSF, to get it up on their FTP site, even though he wasn't part of Stallman's crusade (and happily supported Perl on windows years later).

In 1992 Sunsite (now ibiblio) started up and undercut the FSF's power significantly, but the big change came when the NSF changed their Acceptable Use Policy in 1993 to allow for-profit use of the internet backbone. Suddenly, the internet hit the big time and home ISPs became commonplace. This meant jumping through hoops to get code on the FSF website was no longer necessary. The FSF's "killer app" was rendered irrelevant by Geocities.

The other big driver of the FSF's initial popularity was the short-sighted greed of Sun's Ed Zander, who started at Sun in 1987 and quickly came up with an idea to increase Sun's profitability: unbundling previously standard parts of the operating system (such as network support) and selling them as optional extras. When Zander tried to charge extra for the compiler, lots of users looked around for alternatives and found that gcc had supported the sun3 since its 1.0 release (also in 1987), which quickly made it the de-facto standard compiler of Solaris.

This gave gcc the critical mass of early users and motivated developers to push ahead of other free compilers (such as BSD's Portable C Compiler, and the Minix C Compiler). Similarly, instead of paying Sun extra for better userspace tools, Sun's customers installed the GNU tools from the FSF (and improved upon them), because they were already familiar with the compiler. Zander forcing them to find and improve gcc was the initial attraction for this crowd.

But these users had no interest in working on Stallman's new operating system or promoting his politics, they just wanted a freeware tools vendor for their Sun workstations. So the Hurd and the rest of the FSF languished in relative obscurity, and with the rise of other internet distribution channels (sunsite, geocities, sourceforge), and codebases (FreeBSD, Linux), they began to recede into obscurity.

What really did them in was the rise of superior internet-enabled development models. Stallman never quite "got" the internet, he sold printed manuals and physical copies of GNU software (on _tape_) as a fundraising effort well into the 2000's. The distributed collaboration that gave rise to Wikipedia and Project Gutenberg was anathema to him, he insisted on tight control of all projects. To this day, the FSF requires a physical copyright assignment statement be filed (on paper) with the FSF before accepting patches from a new developer. The centralized "Cathedral" in Eric Raymond's "The Cathedral and the Bazaar" was the FSF, and the paper contrasted that old model with the new internet-enabled Bazaar of Linux development.

By the late 90's, the FSF was all but forgotten. Even Cygnus (which took its name from "Cygnus, Your Gnu Support", another recursive acronym which apparently Stallman finds funny) forked gcc development away from them with the "EGCS" project, because mainline development had ground to a halt.

By the time the internet was piped into everyone's homes, internet-enabled collaboration became commonplace, so Linux and Mozilla and Apache were probably inevitable. Alas, as Clay Shirky's talk points out, the main purpose of institutions is to perpetuate themselves, and the FSF's cathedral was no exception.

Then in 1998, Linux grew "212%" because the Java developers switched over en masse when Netscape told them to. Netscape had led them into Java, and Netscape led them back out again to Linux. Most Java developers hadn't even moved from Java 1.0 to Java 1.1 until Netscape shipped it in their browser, and when Netscape released its source code and made Linux only its third "tier 1" platform (after Windows and MacOS), the Java developers immediately focused on Linux.

The "Anything But Microsoft" crowd had united behind Java a year or two earlier, after Windows 95 came out and made Windows tolerable enough that its userbase was no longer actively looking to replace it (like DOS and the 640k barrier before it). Technical superiority was no longer enough, if you wanted to avoid becoming a Windows programmer, you needed a development and deployment environment. Java was many things, but it never quite managed to be an operating system. Linux was, and it ran on cheap PC hardware, and because there was no way to choke off its development funding (as Microsoft had done for OS/2, MacOS, and BeOS), it wasn't going away.These developers didn't know anything about the history of Linux, they just saw it as an island of stability. Linux developers had wound up on Linux from such diverse places as the Amiga (Alan Cox's previous platform) and various 8-bit systems (Linus Torvalds himself went from a commodore Vic 20 to a Sinclair QL to a PC running DOS and Minix, to Linux). The flood of Java immigrants was welcome, but more than doubling the size of the community provided more than they could easily socialize.

The flood of new Linux developers didn't understand their own history, how Linus had inherited the original Unix flame (started by Ken Thompson and Dennis Ritchie) through Andrew Tanenbaum's Minix and the Sun Microsystems manuals in his university library. They didn't know that the Berkeley guys had continued BSD development and fought a legal challenge to keep their own version unencumbered, and that Linux had only beaten the first 386 version of FreeBSD to market by a few months (and then leapt past it because of Linus's visionary management of an internet development community, the flood of warm bodies from the comp.os.minix development community it had inherited, and because early 386 BSD releases wouldn't boot without an expensive floating point coprocessor and Linux was designed to run well on the cheap hardware available to a college student).

In 1998, the Linux developers had their hands full teaching the new Linux developers their technical knowledge and development methodology, they didn't have time to spend on history.

Stallman saw his opportunity then. If the flood of new Linux developers didn't know the history of Linux, he'd tell them the story of the FSF. He would claim credit for Linux. He responded to the failure of his life's work with a campaign of revisionist history and self-aggrandizement that continues to this day.

Building on this, the FSF moved to recapture gcc development, handing over the GCC name to the EGCS project in exchange for them setting up a "government board" with some FSF members on it. The FSF quickly co-opted this new bureaucracy, reabsorbing EGCS (now "gcc 2.95") and turning it into a slow, bloated, tangled mess. Much of this damage was done intentionally, because a clean modular design would more easily allow proprietary extensions. As always, Stallman's political goals trumped practical engineering considerations, but now that the engineering work was being done for him (essentially due to the rise of the internet), he could focus exclusively on evangelism. The FSF became an almost religious crusade, with the occasional nod to the "charity work" of software development, but primarily focused on multi-level marketing, winning converts to spread the faith.

Of course Stallman's inability to give up control undermined the multi-level aspect of the hoped-for pyramid scheme, the pragmatists continued to ignore the FSF, and its importance continued to recede. Red Hat bought Cygnus, employing the maintainers of projects such as glibc, who resented Stallman's interference ("The morale[sic] of this is that people will hopefully realize what a control freak and raging manic Stallman is. Don't trust him." - Ulrich Drepper, maintainer of glibc.) Even though Stallman had inserted himself in Linux's historical narrative (after first ignoring Linux, then insisting it would never amount to anything), he was quickly relegated to "crazy uncle" status, a relic of a bygone age which most people could safely ignore, somebody who spent all his time blowing his own horn and hadn't produced any useful code in many years.

Eventually Stallman (along with uber-lawyer Eben Moglen) turned to the last thing the FSF unequivocally owned, the GPL. They produced a new incompatible version, and used their cabinet full of copyright assignment statements to force all their projects to move to it, in a bid to regain relevance.

The problem is that GPLv3 wasn't actually needed. Linus himself (correctly predicting the FSF's actions) had explicitly removed the "lifeboat clause" from the Linux kernel by specifying version 2 only for his own code back in 2000, shortly before the 2.4.0 release. Six years later, the FSF played chicken with Linus, assuming he (and the other senior Linux developers) would fall in line with the FSF and bow to Stallman's demands. They didn't.

Thus the FSF's pointless self-aggrandizement split the development community that had formed around the GPL by triggering the "lifeboat clause" when the ship wasn't actually sinking. (Nothing had rendered GPLv2 invalid or unenforceable, it was still a perfectly good license. Stallman channeled Darth Vader's Cloud City speech, "I am altering the bargain, pray I don't alter it any further". The pragmatists didn't play along.)

From a pragmatic/engineering/software development perspective, this is probably the single most counterproductive thing Stallman could have done, but it achieved the goal of digging the FSF out of its well-earned irrelevance and dragging the spotlight back onto it. They could no longer be ignored, because of the damage they could do. The pragmatists had to pay attention and be prepared to defend themselves.

GPLv3 prompted the BSD guys to get together to produce an alternative compiler (PCC), and the apple guys to sponsor their own (LLVM/Clang), but as with Windows, an entrenched de-facto standard is difficult to unseat. Hopefully, gcc will be obsoleted before the FSF finishes rewriting it in C++.

The FSF started as a conservative reactionary organization, resisting changes that swept across the industry in 1983 and harking back to the glory days of the 1970's (when source code was so ubiquitous that computer magazines printed BASIC program listings in the back). The fact that the change was worth resisting doesn't mean Stallman _wasn't_ a short-sighted stick-in-the-mud balking like a mule at the new and unfamiliar, with the goal of cloning existing technologies like Unix and C compilers. (Even emacs seems to have been a group effort primarily designed by other people.)

That's not how they present themselves, but you can't be a computer historian without wincing at the FSF's revisionist history. The success of Linux was in _spite_ of the FSF, not because of it.


July 18, 2010

So, the Linux Foundation.

In 2006, the Free Standards Group merged with the Open Source Development Labs (OSDL), which had been created to provide Linus Torvalds a steady paycheck in a vendor-neutral manner when he left Transmeta. Linus had gone to work for Transmeta after turning down an offer from Red Hat, because he didn't want to play favorites within the Linux world. Transmeta was a CPU manufacturer that offered to let him spend 50% of his time on Linux, and the other 50% on Transmeta's own project (Linus worked on the "code morphing" layer, an emulator somewhere between QEMU and a Java JIT that allowed the Crusoe processor to run x86 code). As Linux grew, Linux development started to take up more than 50% of Linus's time. OSDL let Linus work full-time on just Linux, without working for any one Linux vendor.

The two came together to form The Linux Foundation, a giant Voltron of bureaucracy that's bigger and more complicated than either of its component organizations. As a result, the Linux Foundation seems to have largely forgotten what its two component organizations were formed to do (provide Linus a paycheck, run a website to host standards documents), but you can still dig the standards documents out of them if you know where to look. (Even though that page doesn't seem to be linked anywhere on the rest of their site.)

The Linux Foundation doesn't actually need that much money to perform those two functions. The interest from the amount it collected its first year would have kept a website running and Linus paid in perpetuity; even a $5 million endowment would probably be overkill for that. But that's not how they work. Remember, the first goal of any institution is to preserve itself.

The Linux Foundation is a bureaucracy, with a CEO and a dozen employees and offices and so on. The money they pay Linus each year is a small fraction of their budget, but they have to be big enough, with a sufficient aura of solidity and stability, for Fortune 500 companies to engage with. Because that's where the money comes from, and they need to get more money to keep paying for their staff and offices.

What Fortune 500 companies really want is a single point of contact for the Linux community. This is like asking for a single point of contact for the Internet. It doesn't work that way. There is no Head Blogger, no chief photographer for Flickr, no Youtube program coordinator, and no one place all the spam and viruses come from. Even Google just searches and hosts content; they don't create a notable portion of it. (You'd think Fortune 500 companies would understand this, since that's roughly how the stock exchanges work too. But those at least are set up so you can monitor their activity centrally, and there's somebody in _charge_ if not in control. The internet, and Linux development, provide no such reassurances.)

But the Linux Foundation perceived a demand companies would pay for, and has set itself up to provide a "face" for Linux, at least as far as Fortune 500 companies are concerned. Because they provide Linus's paycheck, they have a certain importance. Even though the whole _point_ of a vendor-neutral organization providing his paycheck is that he's a free agent (the Linux Foundation neither tells Linus what to do, nor does it speak for him), large pointy-haired bureaucracies don't grok that. They're not set up to consider individuals important anyway; they'd much rather speak to an organization.

Even Linus himself doesn't speak for the operating system. He only concerns himself with the kernel, which is the tip of the iceberg of Linux development. Projects like X11 and Apache and Python and VLC all have their own maintainers, with umbrella projects like Gnome or KDE, and then layers of aggregators on top of that putting together Linux distributions. And even within the kernel, Linus doesn't sponsor new filesystems or device drivers. He's the editor of a giant slush pile, rejecting the vast majority of submissions but picking a few of the best and stitching them together to form the next edition of his publication, perhaps polishing them up a bit if there's time. His only authority is veto power, and respect for his judgement as to what _is_ best is the source of that authority. Successfully influencing him would only undermine his position.

But Fortune 500 companies want to buy influence and access, and "it doesn't work that way" isn't the answer they want to hear. Fortune 500 companies want a single point of contact for Linux the way they wanted a single point of contact to negotiate with The Internet. The Linux Foundation has offered to be their AOL, not actually representing what it claims to, but close enough to provide the desperately clueless with a reassuring fiction.

This means the Linux Foundation has set itself up to translate between Fortune 500 companies and individual hobbyist developers. And it's very good at talking to Fortune 500 companies, because that's where its money comes from. But it's not very good at talking to hobbyists, because it's a giant bureaucracy, which is the antithesis of hobbyist "playing around with stuff".

The Linux Foundation tries to influence and monitor Linux development, to provide results for the corporations that fund it. It sponsors conferences, writes white papers, maintains a technical advisory board... It's a bit like a modern Usenix, really.

OSDL was already traveling down this path ("we don't just sponsor Linus, we represent Linux!") before they merged with FSG, and the re-org became an excuse to fluff up their portfolio. Being truly vendor-neutral lets them be an impartial standards organization, but potentially influencing the direction of Linux allows them to get money from sponsors. There's a built-in conflict there.

Since the Linux Foundation _is_ a large clueless bureaucracy, their involvement in a project is mostly taken as a sign of technical irrelevance. For example, Meego is the official embedded Linux distribution of the Linux Foundation, therefore no geek I'm aware of takes it seriously. We're waiting for it to go away. (In the meantime, it's pissing off Canonical by endorsing an RPM-based system incompatible with Ubuntu's dpkg-based repository. Their vendor-neutral standards function and their "take money from Intel, who is sponsoring this" function are in conflict.)

Recently they seem to be trying to achieve independent funding sources, to stop relying on Fortune 500 handouts. Thus they're going into business offering training, Linux Marketing Services, Linux Build Services and Support services, and so on. I even heard a rumor they're thinking of staffing up a general purpose Linux consulting arm, to bid on new architecture ports.

This would mean the Linux Foundation is turning into a Linux company, competing with companies like Free Electrons, Secret Lab, Code Sourcery, Linutronix, and so on. They have a Vice President of Business Development tasked with coming up with more ways to do so.

I'm not sure I'm comfortable with that.


July 17, 2010

So why is the ELF spec on an obscure Linux Foundation website that's not even linked to from the rest of linuxfoundation.org? It's story time!

In 1957, AT&T was subject to a Sherman antitrust action as a monopoly business. When Standard Oil or American Tobacco had undergone antitrust scrutiny, the government broke them up into smaller companies that competed with each other, the new companies (Texaco, Phillip Morris, and so on) thrived and made their owners rich. But AT&T worked out a consent decree to stay intact, and spent the next few decades stagnating.

One aspect of the consent decree was that AT&T couldn't expand into any non-telephone businesses. Another was that they had to license their non-telephone technology to anyone who asked, for a nominal fee. Thus the technologies Bell Labs developed to improve the phone system: the transistor (1947), the laser (1958), and the Unix operating system (1969) were each licensed to the rest of the world, which developed them into hugely profitable things. In 1969 the Department of Defense also forced AT&T to license phone lines to their Advanced Research Projects Agency (created in response to the Russian launch of Sputnik, also in 1957), for use in an experimental packet forwarding network called "arpanet" which might be able to survive a nuclear attack by routing around damage. (AT&T strenuously objected to this; it thought packet forwarding was a waste of time and hated the idea of anyone else touching its precious phone lines. As this experimental network grew, it was renamed "the internet".)

This was the context in which Unix was created, and licensed to Ken Thompson's alma mater (the University of California at Berkeley) when he took a year's sabbatical from Bell Labs to teach a graduate course in operating system design, and used his Unix project as the basis for the course. His students maintained and improved the project after he left, forming the Berkeley Software Distribution (BSD). In 1979, Berkeley participated in a DARPA contract to replace their first generation arpanet routers with DEC Vax machines running Unix, adding an internet capability to BSD that the base Unix from Bell Labs never had. This made Unix the default operating system of the rapidly growing internet.

In 1983, Apple sued Franklin (the company that later made the Spelling Ace handheld spellcheckers) over the "Franklin Ace", a clone of the Apple II that ran exactly the same software as Apple's machine, in part because it had copied Apple's ROM chips (containing a primitive operating system) verbatim, which wasn't actually illegal for them to do yet. Back then, copyright only covered source code, which was human readable text authored by humans. Copyright didn't cover binaries, because they were just big numbers.

(Here's a 1980 audio recording of Bill Gates railing about "his roms" being disassembled and annotated in a TRS-80 how-to book, and how he was lobbying congress in hopes of making that illegal: audio, transcript, context.)

Where Gates had tried (and failed) to lobby congress, Jobs sued to get a precedent out of the courts, and won. Franklin's copying was so obvious that the judges ruled in Apple's favor, that Franklin _had_ infringed Apple's copyrights, and therefore that copyrights must cover binary code and not just source code. This ruling created closed source software, and led to everything from IBM's termination of its long history of public domain mainframe programs with its "Object Code Only" announcement, to the binary-only Xerox device driver that triggered Richard Stallman into starting the GNU project. Before then, there was no such thing as "free software" or "open source" because _everything_ worked that way. Proprietary software was a fairly recent invention, and Unix was 15 years old when it showed up.

The booming computer industry was too much for AT&T, which realized it was stagnating, and in 1984 allowed itself to be broken up. Bell Labs had contributed fundamental components of the flourishing PC market, but as a regulated monopoly AT&T was barred from directly participating in it. The breakup ended the 1957 antitrust consent decree that had prevented AT&T from expanding into non-telephone businesses, spinning off chunks of the phone business into the "Baby Bells" (of which Verizon and such are modern descendants). The remaining core of AT&T then tried to commercialize Unix, and failed miserably.

One reason was that Unix had already been licensed to dozens of companies, which created their own versions of Unix. (Sun had hired away the senior Berkeley developers to start SunOS and later Solaris. IBM's AIX and RT-Unix, Novell's UnixWare, Microsoft's Xenix, and dozens of others). Many of these vendors licensed the AT&T version for propriety's sake, but didn't actually use the code, instead basing their derivatives on the superior Berkeley version, which included internet support. The Unix market was flourishing, and AT&T's offering was like IBM's attempts to sell its original (but obsolete) PC, AT, and PS/2 systems to compete with clones from Compaq and Dell and HP.

AT&T did pour a lot of fresh development effort into its Unix offering, called "System V". And it came up with interesting new technologies such as ELF, a new more flexible and scalable executable format. But the various companies that had licensed its technologies incorporated these improvements into their own versions, and layered their own improvements on top, leaving the AT&T version less interesting than the derivatives that came bundled with various vendor hardware.

Another reason was that one of the big advantages of Unix had always been portability, that it could be installed on many different types of hardware, and application source code recompiled to run on any Unix installation. Selling hardware with Unix preinstalled, and changing the license terms to forbid the various Unix licensees from distributing source code (as everybody was doing in the mid 80's), undermined these advantages, fragmenting the OS and applications of the existing Unix world into a bunch of incompatible binary-only versions that couldn't run each other's programs. AT&T's proprietary clampdown in the wake of the Apple vs Franklin verdict _shattered_ the Unix market, and by the time they understood the nature of the damage they were doing, it was too late.

A third reason was that Unix had been cloned, just like Compaq cloned the IBM PC. Ken Thompson's students at Berkeley had done so much work on their version that eventually none of the AT&T code was left (although it took a lawsuit to confirm this). When Unix was re-licensed out of reach as a teaching tool, Andrew Tanenbaum wrote a new version "Minix" from scratch for use in his classes. One student of Minix, Linus Torvalds, wrote his own new version Linux (also from scratch), which sucked away the hobbyist Minix developers who had never been able to contribute their own code upstream into Tanenbaum's version (because he wanted a simple teaching tool, not a usable real-world operating system, and because he licensed Minix to a textbook publisher and didn't reserve the right to redistribute it himself independently).

These clones could be distributed under more liberal licenses than the proprietary versions incorporating AT&T's copyrighted code, and all of them came with full source code which end users could modify and rebuild, just as the original Unix community had before AT&T's attempt at commercialization.

Faced with the failure of its initial commercialization efforts, AT&T decided that it must be too big to participate in the nimble fast moving PC market. It spun off Unix into its own subsidiary, Unix System Labs (USL), in partnership with one of its larger Unix licensees, Novell (which sold its own UnixWare version). Novell soon bought out AT&T's share in USL, so that System V (and the Unix licensing business) became the property of Novell. But the binary-only fragmentation AT&T had started continued to render Unix irrelevant in the face of PC operating systems like DOS that only had a single version instead of multiple incompatible versions that couldn't run each other's code. When the NCSA Telnet package and Trumpet Winsock added Internet support to DOS and Windows 3.1 (respectively), Unix's decline accelerated rapidly. The first version of Windows 95 didn't have it, but a "service pack" added internet support in 1996.

Novell squeezed one last round of license fees out of derivatives such as AIX by offering them a perpetual license to do whatever they wanted. Then around 1997, unable to compete against the rising de-facto standard clone Linux, Novell outsourced its Unix business to the Santa Cruz Operation (which had inherited Microsoft's Unix variant "Xenix" from another development partnership). SCO put two years of intense effort into trying to improve its product to compete with Linux, but when that failed to increase sales SCO got out of the Unix business too, selling its interest in Xenix and Unixware (and the vestigial System V from AT&T) to the Linux vendor Caldera.

Ironically, Caldera was the Linux company started by Novell founder Ray Noorda to cash in on the superior Linux technology when proprietary Unix proved too fragmented to remain viable. The open source movement offered a return to the heyday of Unix from the 1970's, the environment it had been designed to thrive in, and Noorda wanted in on that. But as with the PC clone market, the crowded field (Red Hat, Debian, Mandrake, SuSe, Slackware...) left many vendors with thin margins and intense competition, and Caldera had never managed to claw its way near the front of the pack.

Caldera's original plan for the SCO acquisition had been to use this sales channel to sell Linux to the existing Xenix and UnixWare customers, but it didn't work out. These were legacy customers resistant to change; the ones willing to switch to something else already had. What they wanted from Caldera was support and upgrades for their existing Xenix and UnixWare installations, and that's it.

Worse, the legacy Unix business SCO sold to Caldera was both enormous and doomed. It produced huge amounts of gross revenue from legacy customers (somewhere around $100 million/year), but consumed more money in support costs than it produced, resulting in a net loss. The sheer size of this acquired Unix business dwarfed the rest of Caldera, leading to a "tail wagging the dog" situation, so much so that Caldera eventually renamed itself "SCO". (The company that sold them the Unix business had stopped using the name, and now called itself Tarantella after the application software that was its new core business. The legacy Unix channel now _was_ Caldera's core business; repeated attempts to switch the horde of Unix customers it had acquired over to Linux couldn't budge them.)

With losses from the Unix business bigger than Caldera's Linux profits, Caldera's attention was wrenched away from Linux development as it tried to trim the expenses from the Unix business before the cash drain bled it dry. But trimming the expenses reduced the Unix revenue by a corresponding amount, and soon Caldera completely dismantled the Unix business it had acquired without ever achieving profitability. (If the Unix business could have been "fixed" that way, SCO wouldn't have sold it in the first place.)

By this time, Ray Noorda was dying (of a combination of Alzheimer's and heart disease), and management of his investments (the "Canopy Group") fell to a group of opportunist scumballs who tried to squeeze money out of Caldera with frivolous lawsuits against anyone and everyone. Having renamed the company "SCO", they pretended to _be_ the company they'd purchased several legacy Unix variants from, and not a Linux vendor at all. They claimed that the ancestral Unix code they'd inherited through the long chain of acquisitions somehow gave them control over the clones of Unix (even though AT&T had already sued the first clone BSD and lost, the same way IBM had sued the first clone Compaq and lost years earlier). Caldera shoveled all their remaining resources (plus large fresh investments from Microsoft and Sun, who were always happy to cause trouble for a competitor) into trolling the entire industry with frivolous lawsuits for as long as they could sustain them, which was about 7 years.

They'd expected large companies would settle rather than go through the expense of defending themselves in court, but IBM chose to step up and defend Linux instead, earning brownie points with the open source community by beating the SCO piñata until delicious chocolate came out. (Groklaw covered this in great detail.)

Meanwhile, several public standards documents dating back to System V (like the ELF spec) were still hosted on the sco.com website. These were decades-old published standards, freely available to the public for years, but with SCO claiming that anybody who had ever read them somehow owed it money, finding another place to host them seemed like a good idea.

And thus the Linux Standards Group was formed as a more stable place to host the old standards documents. In theory these could have been put into a subdirectory of kernel.org (which is hosted by Oregon State University Open Source Labs), but that's not what they did.

How did the Linux Standards Group wind up part of the Linux Foundation? I'll post about that tomorrow.


July 15, 2010

Day 3 of launching an elf binary, and we're out of the kernel. Today we look at the uClibc runtime dynamic linker.

The linker starts in a brittle mode where it can't touch any functions or global variables because it hasn't relocated itself yet. Even though it's statically linked, it's also PIC because not all architectures have enough relative jump range to span a whole program. Being PIC means all function calls have an extra layer of indirection, bouncing off a function pointer table called the Procedure Linkage Table (PLT), and their global variables bounce off another table called the Global Offset Table (GOT). Those tables have offsets for where to find locally available stuff, but need to have the base address of where the program is mapped in added to them (which varies each time the program is run).

So one of the first things the dynamic linker has to do is fix up its own symbols so it can find its own functions and global variables. And then _not_ fix those up twice. (The convention it uses is that stuff with a _dl_ prefix belongs to the dynamic linker, so it gets fixed up on the first pass when it's only got its own symbols to worry about, and then not fixed up on later passes as it maps in each library. Apparently POSIX covers this, and you're not allowed to declare functions that start with that prefix.)

The kernel calls the linker by passing control into function DL_START() in ldso/ldso/dl-startup.c. The argument to this function is an unsigned long provided by the kernel, which is actually a pointer to a blob of data. The linker washes this through a magic GET_ARGV() macro (out of ldso/$ARCH/dl-startup.h) which typecasts it to a pointer and adds 1 (which is actually sizeof(void *), except on sh and frv platforms, where it doesn't add 1; why this is repeated for each architecture, I have no idea). It then immediately _subtracts_ 1 (calling into question the point of adding 1) to get a pointer to argc.

That pointer is the start of a block of data that looks like this:

struct {
  long argc;
  long argv[argc];
  long null = NULL;
  long envp[...];
  long null = NULL;
  long aux_dat[...][2];
}

So it fetches argc, argv, envp, and aux_dat out of that.

Then it sorts the Auxiliary Vector Table, with a loop that converts aux_dat into auxvt[]. The kernel provides aux_dat as an array of pairs of longs (initialized in fs/binfmt_elf.c), where the first is the type (all those AT_BLAH values in /usr/include/elf.h) and the second is the associated value. The dynamic linker loops through those and populates an auxvt[AT_EGID+1] array indexed by the type and containing the values. (And discarding any value greater than AT_EGID.)

This array isn't initialized to zero, so any entry that isn't seen is presumably full of random data. And it uses _dl_memcpy(), which is an always_inline function from ldso/include/dl-string.h, to get around the "can't make function calls yet" thing. (It's assigning aligned "long" values on both 32-bit and 64-bit Linux, so the memcpy is pointless.)
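As an aside, if you want to see what the kernel actually stuffs into that table on a given system, glibc's dynamic linker will dump it for you (a glibc feature; I don't think uClibc's linker implements it):

LD_SHOW_AUXV=1 /bin/true

That prints one line per AT_* entry (AT_PHDR, AT_BASE, AT_PAGESZ, AT_ENTRY, and so on) with its value, which is handy for sanity checking what the code above should be seeing.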

Next it finds the ELF header mapped into memory, which should be in auxvt[AT_BASE] but if that's zero it calls elf_machine_load_address() out of ldso/ldso/$ARCH/dl-sysdep.h (which is generally a horrible assembly blob performing black magic). When is that needed, anyway? Under what circumstances (kernel versions? architectures?) doesn't the kernel tell it where it mmaped in the ELF header in the auxiliary vector table? It looks like the kernel is setting it to interp_load_addr fairly reliably. (Git annotate says the last time that line was changed was 2002.)

Got an answer to that question: you can call the dynamic linker from the command line (which glibc's ldd does but the uClibc one doesn't, and is also used to bypass the NOEXEC mount flag by people cracking into systems; it serves no other obvious purpose), and in that case the kernel setup wasn't done because it was called as a static binary, not as a dynamic linker. So it has to improvise.

Anyway, once we've got the base address, assign it to "header" and "load_addr". (Ignore DL_INIT_LOADADDR_BOOT(), it's just a simple assignment on everything but frv and bfin.)

Next we need to find two tables: the Global Offset Table (the indirection table PIC code accesses global variables through) and the Procedure Linkage Table (a list of function pointers mostly used for lazy binding). This is so we can relocate the linker's own symbols so it can call functions and access global variables.

Start with the GOT, which is the easy one because PIC programs need to know where that is in order to access global variables. Every object linked -fPIC has a GOT, and the kernel helpfully mapped the dynamic linker's in for us already. We can't use the values in that table to access globals yet because they haven't had the base address added to them yet, but if we _try_ to, the compiler and linker will happily generate the appropriate code to access the GOT, so all we have to do is beat that address out of them. We do this by calling the macro DL_BOOT_COMPUTE_GOT() which is an unnecessary wrapper around elf_machine_dynamic(), which contains an assembly blob in ldso/$ARCH/dl-sysdep.h that returns a pointer. (Usually it's just reading a register the kernel initialized before calling us.)

The result is fed to DL_BOOT_COMPUTE_DYN() which is an even more unnecessary wrapper around DL_RELOC_ADDR(), and doesn't even use the GOT and LOAD_ADDR arguments but instead hardwires the values passed in from the one and only call site. (This code could use a little cleanup.)

On everything but the two crazy platforms (bfin and frv), DL_RELOC_ADDR() is just adding together the base and the offset. So this whole rigamarole boils down to:

  load_addr = elf_machine_load_address();
  got = elf_machine_dynamic();  // A better name would be got_offset()
  dpnt = load_addr + got;

Why they couldn't just WRITE that is an open question. (And don't get me started on all this ElfW(blah) crap. The LP64 standard exists for a reason. It's a long. Deal with it.)

Note that the variable "got" is never used again. Also, "header" is the same as load_addr and is only used for sanity checking the ELF header.

Anyway, we went through all that to get dpnt, which is a pointer to the "dynamic section". That gives us the list of shared libraries this program needs to map in.
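You can peek at that dynamic section from userspace with readelf; the (NEEDED) entries are the libraries the linker is about to go hunt down:

readelf -d /bin/ls

(Plus a pile of other tags: SYMTAB, STRTAB, the PLT relocation info, and so on.)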


July 13, 2010

This is a stub for when I actually fill in a description of load_elf_interp() and create_elf_tables() from fs/binfmt_elf.c in the kernel source.

An ELF file is an archive format, storing chunks of machine code and various associated resources. The basic format dates back to AT&T's System V Unix, and these days it's hosted here.

Yesterday we walked through load_elf_binary() in fs/binfmt_elf.c. That calls two functions, load_elf_interp() and create_elf_tables().

...


July 12, 2010

I'm reading through the kernel's elf loader and uClibc dynamic linker and trying to learn how it works.

First the kernel, function load_elf_binary() in fs/binfmt_elf.c. Very simplified (skipping sanity and security checks, non-executable stack, alignment randomization, and so on): the kernel's loader reads in the ELF header (first 128 bytes) of the executable file, uses that to find the Program Header Table, and reads that in too. Then it iterates through the PHT entries to find the dynamic linker entry (if any, type PT_INTERP from elf.h), and reads in the ELF header of the dynamic linker.
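You can eyeball the same Program Header Table from userspace with readelf; the INTERP entry names the dynamic linker, and the LOAD entries are the segments that actually get mapped:

readelf -l /bin/ls

Handy for following along with the loop below.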

It then shoots the old process in the head (freeing its resources), and starts a fresh memory map. It iterates through the PHT entries to find all the PT_LOAD ones, figures out what virtual addresses to load them all at, where the code and data segments will start and end, and how big the bss and brk segments it needs to allocate are. At the end of the loop, it calls set_brk() to allocate memory for the bss and break segments.

Then if there's an elf interpreter, it calls load_elf_interp() and sets the entry point to where it loaded it + the e_entry field from the interpreter's ELF header. (Otherwise it sets the entry point to the e_entry field from the program's ELF header, since the PHT specified the virtual address at which it wanted each segment loaded.)

Note that when you try to execute a dynamic library, it does this weird load_bias thing when CONFIG_X86 isn't set, but doesn't apply that to the entry point. How that ever works off of x86, I have no idea.

It then calls create_elf_tables(), and sets up current->mm with the values it calculated by looping through the PHT earlier. It does ELF_PLAT_INIT() if a specific architecture has weird standards requiring strange things, and finally does start_thread feeding in a set of registers, the starting point, and the stack.


July 10, 2010

Stopped by San Japan, an anime con in San Antonio. Hoped to say hi to Randy Milholland, but he wasn't at his table. As for the con itself: meh. So crowded that getting from one end to the other takes half an hour. AMVs and such I could watch online, only two panel tracks mostly with uninteresting stuff, a dealer's room where I spent all the money I was willing to in five minutes, nice artist's alley but nobody I know... I suppose I could stop in Gaming but I'm not really in the mood. No video games that I've noticed. The hall costumes are nice but after about 20 minutes start to blend together, which leaves the masquerade out.

The first scheduled thing that interested me was the 5pm webcomics panel, so I hung around until then. Found out (after 20 minutes of watching a lame game show that just didn't end) that it had been bumped back half an hour. Gave up and went home.

I've been to anime cons I liked, this just isn't one.


July 7, 2010

So initramfs is broken in Aboriginal, not sure when that happened. The problem is that the kernel's new initramfs compression type selection logic means that it refuses to deal with a compressed initramfs unless it compresses it itself. You _must_ specify an "Initramfs source file(s)" in order for it to show the "Built-in initramfs compression mode" sub-menu. (And you have to tell it what kind of compression you used in the .config.)

The problem is, if you point it at a directory for it to tar up, it won't be able to create /dev/console unless there's a /dev/console in the source, meaning the source directory required root access to create. You can override this when creating a cpio image, but the kernel .config doesn't have any way to pass through that information.

But if you point it at a file, it wants to compress it. I.E. you can't point it at a cpio.gz (like you used to be able to). Now you have to point it at an uncompressed cpio file and let the kernel build compress it.

So they "made it more clever" in a way that reduced flexibility and broke stuff. As usual. Working around it...


July 8, 2010

So the way portage gets installed, Python can't find it unless you add /usr/lib/portage/pym to PYTHONPATH. It turns out Gentoo is patching their version of Python to include extra search paths when they build it, which is crazy because Python already has logic in site.py that looks for *.pth files to add extra search paths:

echo /usr/lib/portage/pym > /usr/lib/python2.6/site-packages/gentoo.pth

Why Gentoo didn't do that is an open question, but my gentoo-stage1.sh init file is doing that now.


July 7, 2010

Building an ancient toolchain (binutils 2.14 and gcc 3.4.6) requires HOST_EXTRA="lex yacc m4 perl makeinfo". You'd think ./configure's 8 gazillion weird little probes would notice these aren't in the $PATH and would skip them. (And newer versions get along without them just fine.) But ./configure in the old versions is even more useless than ./configure in the current versions. (Yes, this is why I added HOST_EXTRA.)

The rebuild support I've put into system-image.sh is a design problem. It's there because if you rebuild the squashfs a dozen times in a row, having to wait through a kernel rebuild each time is annoying. But it shouldn't be on by default: if you re-run any of the other build stages, it deletes any existing output and redoes it from scratch. (It's build.sh that checks to see if they're there; running the stages manually forces a rebuild.) Except that system-image.sh isn't working that way at the moment, which is inconsistent and therefore wrong.

I could add a REBUILD= config variable, but that implies that the rest of the stages should check "did I already install this package, if so skip it". Possibly some kind of setupfor hook. Which I did half a decade ago (yes, the project really is that old). And what I've learned along the way is that maintaining unnecessary complexity SUCKS. I have a black belt in "not going there" these days.

That makes me inclined to rip this back _out_ rather than let it spread. It's tempting to add this as a convenience for other people who might want to poke at the build scripts, but really the important thing is that they _work_. It's already got stage-level granularity, where if root-filesystem.sh is what screwed up you don't need to go back and rebuild the cross compiler, but should be able to rerun just that script.


July 6, 2010

There are times when git bisect is inexplicably useless.

Last night's bisect homed in on commit 7b6fd3bf82c, which is an unrelated build break. (Wheee.) So guessing the bisect the other way did find that 6234077d6bad4db25d boots to a shell prompt (good to know), but then says the first bad commit is e0aa51f54faa0, which is a merge commit. Difference between that and the ^1 commit is a 21 megabyte patch. That can't be right...

Sigh. Looks like the breakage is hidden in the range that didn't compile, so I've got to track down a fix for that build break. The break was introduced by f533c3d3405 which was an enormous complicated patch, and a reversion patch won't apply to the start of my replay log. Ok, try finding the first patch that fixed it, let's see, that means lying to bisect to call the build break "good"...
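For the record, the "lie to bisect" trick looks something like this (using the commit from above, and assuming the tip compiles again; at each step you try to build, mark it "good" if it still fails to compile and "bad" once it compiles, and bisect converges on the commit that fixed the build):

git bisect start
git bisect good f533c3d3405    # the build break: call it "good"
git bisect bad master          # somewhere later it compiles again: call it "bad"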

And it bisects to the build break being fixed by b8ff7357da45e025c which is a one line fix:

--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -320,7 +320,7 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file *filp,
                return -EFAULT;
 
        driver = bo->bdev->driver;
-       if (unlikely(driver->verify_access)) {
+       if (unlikely(!driver->verify_access)) {
                ret = -EPERM;
                goto out_unref;
        }

Yeah, that's not going to fix a build break.

Not sure what's up here. Time to start with the first known break (the "enormous complicated" patch above) and see if I can fix it by hand and then drill forward...


July 5, 2010

"Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction." - Albert Einstein

So, the last sh4 version that shipped was in 0.9.10, and that last worked in the qemu git version right before e1c09175bc00 (I.E. git checkout e1c09175bc00^1), and linux-2.6.32. So I need to get the current kernel to run under the old qemu, and/or see if I can get current qemu to run the old kernel, and then try new qemu and new kernel together.

I vaguely recall having trudged through this earlier, but I didn't take good notes in this blog. My March 14 blog entry says that the magic QEMU command line incantation for newer qemus needs to be "-nodefaults -nographic -serial null -serial stdio". It seems that if I put "-nodefaults" in front of "-serial null", the old kernel boots up and I get a shell prompt with current qemu. (It then refuses to shut down on exit, complaining "Unauthorized Access" and hanging, but I can track that down separately.)

However, if I use the _current_ kernel instead of the old known working one, QEMU dies with "sh_serial: unsupported read from 0x10". So, to track this down, start by cloning the git repo into the build/packages directory:

rm packages/alt-linux-0.tar.bz2
cd build/packages
git clone /path/to/linux/git alt-linux
cd ../..

And then in one window do a git bisect (good is v2.6.31, bad is master), and do a "git checkout -f" before specifying each new bad/good cycle on the bisect because the build is going to apply sources/patches/alt-linux-*.patch.

In the other window, each build/test cycle is: rm -f build/packages/alt-linux/sha1-for-source.txt && rm -rf build/system-image-sh4 && USE_UNSTABLE=linux ALLOW_PATCH_FAILURE=1 nice -n 20 ./system-image.sh sh4 && more/run-emulator-from-build.sh sh4

(Deleting the sha1-for-source.txt makes it re-patch the source, which we need in order to revert the inexcusable stupidity that made me stop following sh4 long enough for it to bit-rot this badly in the first place. I bumped into some CELF guys that convinced me to put it back on the todo list, but it took this long to work its way back up...)

So what do we get... The bug in kernel 337e4a1ab4d736b8c39a4c3a2 where it hangs after the line "sm501 sm501: SM501 At b3e00000: Version 050100a0, 8 Mb, IRQ 100" was something I fought with last year, already asked about on the list and got an answer on. (Yes, this worked fine in 0.9.31 but as bisect goes along it stops working, probably a config symbol name change, look at it later, let's find the serial console problem first.)

And I made it through a long run of consecutive "git bisect bad" versions (which always makes me nervous; the symptom of screwing up an earlier choice is that every single test for the rest of the run goes the same way, and then it homes in on a version that _also_ has that symptom). Then it made it to 9f815a1765b0ce766ab1d which has a build break, meaning remember that and guess (easier than fiddling with "git bisect skip", which tends to repeat dozens of times on the same build break because it tests adjacent commits).

So, "git bisect log > ../blah.txt" and be prepared to "git replay" later on...


July 4, 2010

Interesting research paper on Balance as Bias. Basically, if you've got reporters who think that "balanced reporting" is to give the opposing views equal weight (essentially split the difference between them), then all you have to do to game the system is move your own goalpost way out past the end of the stadium into crazytown. Then the halfway point between the two opposing camps is your original position, and thus lazy journalists who think the truth is always in the middle wind up taking your original position as the "center".

Hence groups like the Mad Hatter's Tea Party, which exist solely to re-center the debate. They know their position is crazy (or at least their backers do), but by linking hands and singing "clean cup, move down" for the cameras, lure lazy journalists into taking their _actual_ positions as some sort of middle ground between sane and crazy.

The original purpose of journalism wasn't just "we report, you decide", but involved actual research and fact-checking. When the choices are "This parrot is no more" vs "It's just resting", deciding that the bird must merely be unwell (due to pining for the fjords) is _not_ good journalism. Taking a position based on actual facts is not bias. You do the research to confirm or deny what was said, and you show your work. Sourcing a quote accurately is where this process _starts_, not where it ends.

Alas, the same deregulation that gutted the banks (from the Mortgage crisis to Goldman Sachs) allowed clowns like Rupert Murdoch to purchase media empires and re-invent Yellow Journalism. (Meanwhile the three original TV networks cost-cut themselves to death in response to cable TV, and newspapers similarly gutted themselves rather than face the internet. In both cases, the solution to competition was _not_ to lower the quality of your own product, or to stick with the tried and true of Garfield and Peanuts no matter how worn out and tired they got.)

That's why today, the best conventional journalism out there is from The Daily Show and Rolling Stone. (But when they actually do real journalism, the equivocators are shocked and dismayed, and actually seem to think reporting the truth is somehow a sin. They're terrified of taking a position, to the point where they're trivial to manipulate. Essentially, they've become useless.)

The internet is trying to catch up, but it's still got a lot of maturing to do. Sites like fivethirtyeight.com do a great job but the evidence they present doesn't add up to a narrative. You can't point somebody at a good summary of the past year's work they've done on a topic, you have to follow along and new posts bury old ones.

That's a problem because the defense that the Becks and Limbaughs raise is that refusing to compromise with their position means you're an extremist. These people's day job is to lie with an agenda, to ignore the truth in favor of spewing crazy propaganda to move public opinion, but they insist that if you don't equivocate to the "center" they're aiming you at, then you're an unreasonable extremist just like them, so everybody's an extremist and the truth remains a matter of opinion.

The way to fight this is to explain the trick. Give a reason _why_ they might be lying. That the crazy has a purpose: they're moving the goalposts in hopes you're dumb enough to place the 50 yard line in their end zone. The tobacco institute didn't have to convince you cigarettes were harmless, they just wanted to introduce doubt, and perhaps make you think they might be LESS dangerous. Reveal the sleight of hand they're trying to pull so people aren't fooled by it.

And then prove they have a _pattern_ of lying. Side with science. Truth can be proved with empirical evidence, having proof is not the same as being biased, and being able to establish the truth a year later does _not_ make it irrelevant. (Of course by the time the truth is common knowledge, a professional propagandist will have moved on to the next lie and never go back to comment on their old "Do it Live" Oxycontin days. But that's no reason _we_ can't document them, accumulate evidence, and write up the paper trail into a clear story showing that mouthpiece X is a professional propagandist with a history of knowingly lying for the purpose of tricking people into compromising with extremism, which lets them steer you to take any position they want you to.)

Showing that a mouthpiece has a history of lying means that when their new topic du jour comes up, the history of disproven claims makes them an unreliable source. When the victim responds "but why would they lie", explain about moving the goalposts so lazy people accept half-truths. They don't need you to believe them, they just want you to take what they say into account. They're triangulating to steer you, their crazy is calculated to move your thinking to a "compromise" position of their choosing. How safe is "drill baby drill"? The halfway point between "perfectly safe" and what actually happened was not a useful data point.

The long-term fix is to get real journalism back. I'm not sure how to do that.

One question I do have is that if you knowingly lie in a commercial, you've broken the law, so why is knowingly lying in a news program not similarly illegal? Yes I'm aware of freedom of speech, but it doesn't mean you can claim your product cures cancer when it doesn't. If you claim to be a professional news source, you should at the _very_ least be required to issue a prominent retraction when you were provably wrong. If you're a paid professional talking head who takes _money_ to lie to strangers, that's different from a blogger doing it for free. (And doesn't libel and slander work into this somewhere? Facts are not a matter of opinion.)

That's the other "defense" the Becks and Limbaughs raise is that preventing them from lying infringes upon their freedom of speech. But freedom of speech has never been the right to knowingly, provably lie without consequences. Especially not for money. Watergate was about proving the president of the united states was lying. The crime that freaked everybody out wasn't the break-in to tap his opponents' phones (El Shrubbo got away with plenty of warantless wiretaps). The crime people couldn't forgive was that he was covering up the truth. Woodward and Bernstein didn't infringe his freedom of speech by calling him out on the cover-up.

By the way, if you think this might be a coincidence, google the phrase "Overton Window". The first hit is Wikipedia explaining the concept that radical fringe groups can make less radical groups seem reasonable when they were previously dismissed as irrational. The second hit is a "political thriller" Glenn Beck wrote by that title. Yup, same guy. This is what they are doing, very much intentionally.


July 2, 2010

Ok, closing in on an Aboriginal Linux 1.0 release. I've got a 3 day weekend and a finite todo list. (Admittedly my todo lists are always fractal, spinning off new tangents as I try to process them. It's a bit like trying to read and close tabs on tvtropes, or catching up on twitter.)

However, my first in-progress thing is switching all targets over to using a baseconfig-linux. (Which I started doing because it makes it easier to switch them all to use devtmpfs, and helps the general "same behavior from all targets" goal anyway.) Right now, the kernel .configs are all bespoke things for individual targets, meaning beating common behavior out of them requires essentially redoing each config and then testing that it still works.

I switched over the ARM targets already, dropping a few symbols that the individual configs didn't need. (CONFIG_UNEVICTABLE_LRU was one, I should benchmark to see if it actually helps native builds under qemu.)

Now on x86_64 I'm dropping CONFIG_KALLSYMS_EXTRA_PASS=y, but keeping CONFIG_PM=y and CONFIG_ACPI=y (which are needed so the kernel can signal the board to power off or reboot, thus exiting the emulator). Dropping CONFIG_HT_IRQ=y. (Hyper-transport? How'd that get in there?)

The baseconfig has CONFIG_FW_LOADER which I don't think any of the emulated targets need.

I'm not quite sure how to deal with the ATA and SCSI drivers. Should the baseconfig have support for both, or should each target's LINUX_CONFIG switch this on? How generic is this? Alas, I haven't got one universal board, but how many variants are there really? Hmmm... I think CONFIG_IDE, CONFIG_IDE_GD, CONFIG_IDE_GD_ATA, and maybe CONFIG_BLK_DEV_IDECD are all baseconfig fodder. (Yes, the symbol names are wildly inconsistent here. Thanks for noticing.) But looking closer, CONFIG_IDE_GENERIC should probably be dropped.

Then the actual driver selection should probably be moved to the target config, so on ARM CONFIG_SCSI_SYM53C8XX_2, SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=0, and SCSI_SYM53C8XX_MMIO should all go into the sources/targets/arm*/settings files.
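Something like this fragment in the target's settings file, I think (illustrative, not tested):

# sources/targets/armv4tl/settings (hypothetical fragment)
LINUX_CONFIG="
CONFIG_SCSI_SYM53C8XX_2=y
CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=0
CONFIG_SCSI_SYM53C8XX_MMIO=y
"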

Is CONFIG_IDEPCI_PCIBUS_ORDER necessary for x86_64? [read read read...] No. It only applies if you have more than one type of controller so multiple drivers are making /dev/hdX devices show up. And in fact on powerpc I'm forcing driver load order to put the PPC devices in a usable order, so this _has_ to be switched off there.

Wanna know why I've held off on doing this conversion so long? This is why. It's not hard, it's just _fiddly_, I have to diff all these configs and understand what the symbols do.

Why does x86 need CONFIG_SERIO_SERPORT but arm needs CONFIG_SERIAL_NONSTANDARD? Define "standard"?

The baseconfig has CONFIG_RTC_CLASS and such, which x86 didn't have but it looks useful even there, so leave it in baseconfig...

Ok, that's x86_64 converted. Let's try rebuilding that, and rebuilding armv4tl which had stuff transplanted out of the baseconfig...

Ok, i686: CONFIG_MPENTIUMII=y obviously. Drop PREVENT_FIRMWARE_BUILD. Does uClibc still need COMPAT_VDSO? I should check, try dropping it for now... Drop LEGACY_PTYS already. (But am I enabling devpts in the baseconfig? No, apparently not. Ah, but it only acts as a choice with CONFIG_EMBEDDED, otherwise it's hardwired to y. So that's ok then.)

Back looking at SERIO_SERPORT because i686 isn't defining it. It's some weird input core thing, says to look at Documentation/input/input.txt which says that the 2.5 kernel is "future use". Uh-huh. Currently only used for USB. Apparently this only matters if I want a /dev/input/mice aggregator through _devfs_. *shudder*. Ok, I can rip that out of x86_64/settings.

I _think_ that CONFIG_RTC=y is no longer needed if CONFIG_RTC_CLASS=y. (It's one of them legacy thingies.) So that can go from i686...

Ooh, CONFIG_MAGIC_SYSRQ is good. Ah, baseconfig already has it (but the previous x86_64 config didn't.)

I know I experimented with CONFIG_VFAT_FS and qemu's ability to generate a read-only FAT filesystem on the fly from a host directory last year, but it was too slow to be useful. So rip out that config stuff for now, and the NLS crap it drags along.

And that's i686 converted. And i586 and i486 should be fairly straightforward diffs against i686.


June 29, 2010

So a website called h-online.com seems to do reasonable kernel traffic summaries, but unfortunately they don't have their own section. Is it daily, or weekly, or what? How do I tell if I've missed one? The only way to dig them up is to wade through every other article they do (their suggested search pulls up other extraneous crap), so it's probably not worth following. Which is sad, they seem to have put effort into it. Oh well.

Ah, instead of an rss feed they have twitter @kernellog2 which I suppose is reasonable. They never Horribly Retweet anything, so presumably I can follow them. (Yeah, I know there's a greasemonkey script. This helps on Chrome and on my phone how?)

Ah, the Horrible Retweets issue is tracked here and they estimate it will be fixed "soon", as of last month. I can only guess they've outsourced this issue to BP/Haliburton.

What I don't understand is how twitter can be so amazingly tone deaf about this feature. They have a user interface problem, and if they'd just fix the darn _display_ of the things the way the greasemonkey script does (which doesn't help in other browsers, or on my phone) most of us would be _happy_ about the new feature. Just give me the correct icon of the person actually responsible for this tweet showing up in my feed (and preferably have it _say_ @userifollow RT @useridon'tfollow at the start) and then I don't care how they record it in the database behind the scenes.

The problem is that the current UI is disruptive and confusing, and it feels like my feed is being spammed by people I don't follow. If I didn't authorize this person to post to my feed, it should not look like they're posting to my feed. Why is that hard for twitter to understand? Twitter being responsible for spamming my feed, and insisting I'll grow used to the spam, does not give me warm fuzzy feelings about twitter. It makes me think they're clueless and got where they are entirely by accident. (Which probably isn't the case, but you have to wonder when they take THIS LONG TO GET SOMETHING THIS SIMPLE.)


June 28, 2010

Horrible Retweets are still broken, and Grelber is still down. (Cathy says they'll get back from Michigan to fix it Wednesday.)

I am totally under the weather today. Headachey, wanna nap. Possibly another sinus infection, although seems like a fairly mild one if so. Or just "it's monday, my sleep schedule's got moved three hours up"...


June 27, 2010

So the new server (quadrolith) is not automatically re-associating with the access point after a disconnect, and since it's a headless box I can't talk to it when its network goes down. I have to power cycle the machine to get it to come back.

The other problem is that the wireless AP goes away for 30 seconds about once an hour, not sure why, but it's long enough for both my laptop and quadrolith to lose the connection. According to /var/log/messages, this is what's happening:

Jun 27 17:20:31 quadrolith kernel: [ 2128.091150] No probe response from AP 00:1a:70:80:4a:53 after 500ms, disconnecting.

In theory dhcpcd or similar should detect when the wireless comes back and re-associate. In practice, I gave the sucker a static IP, and the wireless driver is really stupid. Imagine if conventional ethernet drivers did this, so that when I unplugged a cat 5 cable (and the MII transceiver toggled the link status), the driver went "oh, eth0 no longer exists, let me remove that device", and then didn't put it _back_ when the cable got plugged back in.

My solution:

#!/bin/bash

while true
do
  ping -c 3 192.168.2.1 || /etc/init.d/net.wlan0 restart
  sleep 120
done


June 26, 2010

At Starbucks. Zombie Sinatra is once again playing. There's no internet here that I can access. (July 1st, Starbucks starts having free internet nationally. Until then, lack of internet access continues to be provided by AT&T.)

I miss the days when my phone could associate with my laptop. In theory I could get the nexus to do that (without even paying T-mobile $20/month extra for the privilege, which I think I'm still doing), but I'd have to essentially crack it and install a new bootloader on it, which just seems way too much like work.

I also miss the days when I could have one terminal window in a tiny font up in a corner, running a build or top or some such I wanted to track the progress of but not read very closely, while I did things in other full-sized terminal windows. OS/2 could do this in 1995. XFCE still can't today, changing the font size for one terminal window changes it for all of them (when it doesn't just crash them all).

Lateral progress. That's what really gets to me. The open source world is full of "upgrades" that lose abilities we used to have. (Other platforms have that too, but open source has no excuse for it other than the general "we really suck at user interface issues". We _fix_ this stuff for servers and build tools and such, but you can't run a regression test script on a GUI redesign. We suck at all things GUI.)

It's ubiquitous. For example, htop is in most ways superior to the normal top. But the original top shows the name of the binary each process is running, and htop shows the absolute path to the executable. I spent fifteen minutes looking, and couldn't find any way to get htop to show me "i686-gcc" instead of /home/landley/aboriginal/aboriginal/build/simple-cross-compiler/bin/i686-gcc (of which I see the first 18 characters on an 80 column text window. No, I'm not making the window wider.)

So to get all the other nice things htop does (bar graphs of CPU usage!), I have to accept this particular regression over "top". Ok, I can devote a couple hours to tracking down where to get the htop source, reproducing its build environment, tracking down where in the source this is done, modifying it, building/testing my own copy, tracking down where to submit the patch to, and then waiting 6 months for it to make it into my distro repository. Or I can just run "top" as well to see what CPU-eating name I need to feed to killall.

And yes, it's my fault for not "engaging with the development community". Open source means that everything that's wrong with it is somehow my fault, even when it didn't used to do that and I'm not the one who broke it. I spent most of a decade "engaging" with this kind of issue, but new releases have just as many regressions as the old ones. It never ends, and after a while it's just not worth it.

This is why I'm still on Ubuntu 9.04: upgrades break too much stuff and it takes months to get my system beaten back into submission again. Although if I haven't moved my laptop to Gentoo yet, I'll probably install the Don't Panic release of Ubuntu on general principles. (Just because of the Hitchhiker's Guide references.)

And landley.net is down again. There are downsides to having my website on a server in somebody's basement. I should take Mark up on his offer of hosting it on the machine that's hosting impactlinux.com...


June 25, 2010

Back on the morning schedule again. Yay!

McDonalds' wifi today isn't routing packets. The login screen happens, says you're connected, and attempts to contact any website anywhere time out. (Complete lack of internet access once again provided by AT&T, the company that Just Does Not Get The Internet (tm). The company that charges by the byte, but can't even deliver them.) Luckily, I can just barely reach neighboring free wifi from the popcorn place down the street...

So I'm trying to figure out how Gentoo puts portage in the $PYTHONPATH. When I chroot into a stage 3, "emerge" runs just fine, but the one I installed complains that it can't find _emerge.main, which lives in the /usr/lib/portage/pym directory. When I run gentoo stage 3's python from the command line, I can import _emerge.main just fine, but $PYTHONPATH isn't set, and I can't find anything special in /usr/lib/python2.6/site-packages.

Asked Solar: gentoo patches python to hardwire in a magic path.

Ok, that's too disgusting for words. I'll drop a symlink in site-packages or something, I'm not patching it.


June 23, 2010

At home, the upstairs floor is being replaced today, which means the cable modem and router got moved downstairs last night, which means I can't ssh into the new server (quadrolith, I started at monolith a few servers back) because the wireless doesn't seem to reassociate after a router reboot.

I'm not sure why. Maybe the dhcp daemon is exiting? The problem with a headless box is that when it's not talking to the net, you have to dredge out a monitor and keyboard to talk to it, and the wireless card isn't seated firmly enough in the PCI slot for me to want to move the box while it's on. (The little metal plate didn't fit in a case this size, so I took it off.)

If I reboot quadrolith, it happily reassociates with the router again. I should plug a cat 5 cable into the back of the thing and leave it there, so I can get in via wired connection next time I need to diagnose something like this...


June 20, 2010

Fairly productive weekend, both in the "packing up boxes of books so we can get the upstairs floors redone" and in the "banging on Aboriginal Linux" senses. I want a three day weekend, but alas it's not available just now.


June 18, 2010

Ok, layout of the new snapshots directory. The problem is, there's two ways to index this stuff. There's by date (snapshots/2010-06-18/alt-linux), and there's by category (snapshots/alt-linux/2010-06-18). The easiest way to expire snapshots is to have them sorted by snapshots/$DATE/stuff and then I can just rm -rf the snapshots/$DATE directory to expire all the categories at once.

I want to be able to expire old snapshots (say keeping only the last 30 builds) which leans towards snapshots/$DATE so everything I need to delete is under a single directory. But some people will want to bisect a bug, would "snapshots/alt-uClibc/$DATE" make that easier for them? (I could always add symlinks if so, but I'll wait for somebody to ask...)


June 16, 2010

June is the _6th_ month, not the 5th. Thus I just renumbered the #anchor tags on all the entries this month, probably confusing the rss feed to no end. (The downside of doing this by hand in vi...)

Got the new server set up downstairs, with a wireless card that's held in by wishful thinking and the friction of the PCI slot. (The little metal plate at the end was bent to fit in a much smaller case. Luckily, said plate unscrewed.) Alas, despite having an HDMI cable it doesn't seem to produce any output the TV recognizes. Probably have to install X and put it into graphics mode or something. Worry about that later.

I'm working on factoring out baseconfig-linux, and to start with I'm running:

sources/more/for-each-target.sh 'sources/more/migrate_kernel.sh $TARGET'

with packages/alt-linux-0.tar.bz2 symlinked to the stable version. (That pretty much just does an oldconfig and then squashes it back to a miniconfig.) I'm doing that because I'm pretty sure CONFIG_FILE_LOCKING went away, but it's showing up in my baseconfig diffs and a quick grep found it in 14 miniconfig-linux files. Might as well start with something consistently sorted and not including symbols that no longer matter.

The above invocation needs to go into the linux-git cron job, too. Half the new kernel failures were config things changing, and as long as I've got infrastructure to automatically cope I should plug it in.

I also have a few lines to check packages/MANIFEST and to skip rebuilding everything if that's identical, because there's no point in redoing the build if nothing got checked in today. That makes the "expire old versions" logic slightly more fiddly, but I can probably just drop a symlink to the previous directory when I skip a build, then count directory entries and keep the last 3 dozen or so, and zap symlinks that no longer point to anything. Name the directories in an order that trivially sorts (2010-06-16) and the script becomes fairly simple.

But I still have to write it...
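Roughly what I have in mind, untested (assumes GNU ls/head/find and the YYYY-MM-DD directory names):

#!/bin/bash
# Expire old snapshots: keep the newest 36 dated directories, delete the rest,
# then zap any symlinks left dangling by the deletions.
cd snapshots || exit 1
ls -d 20??-??-?? | head -n -36 | xargs -r rm -rf
find . -maxdepth 1 -xtype l -delete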


June 14, 2010

Broke up build.sh some more, now there's cross-compiler.sh (trivialish wrapper) and simple-root-filesystem.sh (so the logic to combine the native compiler and simple root filesystem can go into its own script and not in build).

Renamed STATIC_CC_HOST to CROSS_HOST_ARCH, and made more things listen to NO_NATIVE_COMPILER (including system-image.sh). So now I need to run a couple of test builds (one with CROSS_HOST_ARCH set to something other than i686, and one with NO_NATIVE_COMPILER set).

Meanwhile, I started working on a control image to run the busybox test suite. Oh _WOW_ the busybox test suite has bit-rotted horribly.

First of all, "make test" insists on recompiling busybox. Not only is there no way to tell it to run against the host tools (which is kind of necessary to regression test your test suite against non-busybox tools), but you can't even point it at a currently installed busybox binary. Even if you symlink "busybox" to the existing busybox binary, it tries to rebuilt it anyway. (I built support for testing against the host tools into the test suite code I wrote, and it's still there, it's just the makefile's assumptions hardwire around it.)

But it's worse than that: it doesn't work. I emailed Denys to take a compile error out of the hardwired "echo.c" to get it to run at all, and then found the result was swiss cheese. It's been broken for YEARS.

For example, commit 8d0a734d91ff197a removed the default value of "$bindir", meaning if you paid attention to testsuite/README and did "cd testsuite && ./runtest" like it said (which you _need_ to do to hardwire around at least some of the assumptions in the makefile), it no longer works because the option flags are never set.

Of course nobody noticed that the option skipping logic stopped working back around 2007 because they humored the Defective Annoying Shell. If you say #!/bin/bash it prints "SKIPPING" for those tests, if you use dash it doesn't print anything.

The code I wrote was always bash-specific, but back when I wrote it #!/bin/sh always gave you bash on every Linux distro ever. Rather than explicitly say #!/bin/bash when Ubuntu broke that, they made half-assed attempts at making it work with dash... but it doesn't. My way of cleaning it up is to make it properly specify its bash dependencies. (Building your own "echo" from source to make sure you have -n and -e is just silly. It's a bash builtin, and BusyBox already has an "echo.c" which this duplicates. Yes, the one that was broken until I asked Denys to fix it yesterday. Duplicating echo.c was not a reasonable approach to solving the problem.)

And while we're at it, testsuite/runtest:

# Set up option flags so tests can be selective.
export OPTIONFLAGS=:$(
        sed -nr 's/^CONFIG_//p' "$bindir/.config" |
        sed 's/=.*//' | xargs | sed 's/ /:/g'
        ):

What is xargs doing there? It's piping data through xargs... with no command. What? (Oh wow, susv4 says that if "utility" omitted the default is "echo". It's actually standard behavior. Weird.)
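For anyone else squinting at that, a quick demonstration of what the pipeline actually does to a .config:

printf 'CONFIG_ASH=y\n# CONFIG_DESKTOP is not set\nCONFIG_FEATURE_FANCY_ECHO=y\n' |
  sed -nr 's/^CONFIG_//p' | sed 's/=.*//' | xargs | sed 's/ /:/g'

That prints "ASH:FEATURE_FANCY_ECHO" (the bare xargs acting as echo is what joins the lines into one), and runtest wraps the result in leading and trailing colons to build OPTIONFLAGS.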

This is an ENORMOUS MESS...


June 13, 2010

Today Mark and I went to the Drafthouse's Back To the Future feast. (All three movies, six course meal.)

They had three DeLoreans parked out front, and a fourth DeLorean dropped off a surprise guest, Christopher Lloyd, who did a question and answer session after the first movie. (Yes, I'm aware he doesn't normally do personal appearances. The Drafthouse is awesome that way.)

Afterwards he did an impromptu meet and greet out front for ten minutes or so (ok, he got mobbed by fans, but extremely politely), and I asked him to sign the menu, which I gave to Mark. (Well Mark gave me his spatula from the UHF showing Weird Al was at, and Mark worships Christopher Lloyd the way I worship Weird Al, so it only made sense from a karmic balance perspective.)

Afterwards, one of Mark's friends (while agreeing to the general awesomeness of the event) expressed mild disappointment that Lloyd hadn't gone "and here's a surprise showing of a brand NEW Back to the Future movie for you to watch" the way Leonard Nimoy did at the Star Trek II showing last year.

It's possible the Drafthouse has set the bar just a _bit_ high.


June 12, 2010

The gentoo "wheel" thing can be removed by editing /etc/pam.d/su and commenting out the line "be_extremely_stupid" "auth required pam_wheel.so use_uid".

That makes a _much_ more usable system.


June 11, 2010

What are my current todo items...

Find out why Gentoo requires that stupid "wheel" group to let su work, and rip it out of the source code.

Clean up the Aboriginal kernel .configs to use a common baseconfig (the way uClibc does now). Use this to add devtmpfs to all targets.

The current portage tree tarball is 36 megabytes, with no releases (just snapshots). Do I really want that in the gentoo-stage1.hdc control image? Something that tracks how many _days_ out of date it is, when the hdc image goes months between updates?

What's a good alternative? Coming up with a sane subset of the portage tree that could be "emerge --sync"ed up to the full list later has the downside that it puts more load on the gentoo rsync servers, which are a lot more resource constrained than their wget mirrors serving compressed tarballs as static files. But if I defer the tarball download to runtime I can't even test portage.

Hmmm... Portage needs net access to run (it has to fetch the source tarballs), so needing net access to initialize the repository isn't a big deal. I should add some code to warn about a missing repository if you try to do anything with "emerge", and to fetch and extract the tarball before doing the rsync. (I'll just assume you have a writeable root filesystem because running portage on a read-only one is moderately pointless.)

Hmmm, maybe just an "emerge" wrapper with the real emerge in "emerge.real", and then once the tarball has been fetched and synced, "mv emerge.real emerge"... And the wrapper would exit with an error message if they ran anything other than "emerge --sync", and would download and extract the tarball and then do the sync...

Alright, that might be worth a try. Of course it adds rsync as a dependency. (I've been meaning to add rsync to busybox for years...)


June 9, 2010

So portage is more or less sort of together now, and now I need to download its database (the "portage tree"). This is a big tarball of text files, which you get a stripped down snapshot of during the base install (and extract into /usr/portage), and then update via rsync.

The problem is, the portage tree is a constantly changing thing. The snapshots are updated every couple of days, and the rsync is against a live set of servers that could change any minute.

This doesn't mesh well with the way download.sh works. It needs an sha1sum just to be sure it's correctly downloaded the entire file. (I have this thing for reproducibility. If the file changes randomly on the server, I want to know about it.)

The point is, this isn't just "the next file to grab and build". That's why I've been stuck on it for a few days. (Ok, that and not having any spare time.)

So, I can leave the SHA1 value for the portage tree snapshot blank in gentoo-stage1.sh, and grab a who-knows-what snapshot, which means running the same build twice can have different behavior. Or I can grab an arbitrary version and trust the mirror infrastructure to retain it, which is consistent but would go stale quickly.

Or I could just not install one at all, and require the target system to either fetch a snapshot tarball or do an "emerge --sync" before it can install anything else with portage.

I can also put together my own cheesy little portage tree describing just the stuff I'm building (in all its mutant glory), and then let "emerge --sync" blow it away later. I'm not sure how useful that is though, since the point of installing portage is to then be able to use it to build more packages, and I can't even test that without a portage tree describing packages it can build. What I really need to describe the stuff that's already installed is either a /var/db/portage or a package.provided file...

Need to do more design work before the implementation can continue...


June 8, 2010

The server takes about 2 hours to build all architectures, so building the stable set of packages, then building uClibc-git, then busybox-git, then linux-git, and then all unstable, is about a 10 hour job.

Then again, that's pretty much what the server's _for_.

Oddly, the OOM killer doesn't seem to trigger before the server locks up. Possibly this is a kernel .config issue? (Not driving the server out of memory is the easy fix...)


June 5, 2010

Finally picked up the new server from Fry's: 4 way SMP, 2.67ghz i5, 8 gigs of ram, 1.5 terabytes disk, and _quiet_. (Meant to spend around $500, wound up spending around $800. Oh well.) Attempting to inflict Gentoo upon it.

I bought the pile of parts back on memorial day. Assembled the pile of parts. The result did not light up the VGA. Took the pile of parts back to Fry's and waved money at them to diagnose the issue.

Rather a lot's changed since the last time I tried to assemble an x86 box. (I've been dealing either with preassembled laptops, or with preassembled server side hosted 1U and 2U boxes that are often actually a KVM share being emulated on something I don't have to care about. I've taken apart a lot of embedded systems, but that doesn't help me with "The CPU now has its own 4-pin power connector? When did that happen? How many different fans does the inside of this box actually need, anyway? This CPU box has a giant heat sink and fan in it but I can't find the actual _processor_. Do the screws come with the case or with the motherboard?" And so on...)

The Fry's professional Frown Expensively At Things department took a week to tell me that the 4-way CPU didn't work with the motherboard the salesguy recommended to go with it, because Intel graphics are now sort of but not exactly on-die with the CPU, but only in the 2-way processors hyper-threaded to 4-way, not with the actual 4xSMP i5 chip I bought. (Well, the i5 heat sink and fan I bought that comes with a free CPU.)

The new motherboard I needed has no onboard graphics (not an option with this CPU, apparently; they could pull this off with a 486 but not with an i5), so I had to add a graphics card, and swap out the power supply too.

Alas, the new motherboard doesn't have an IDE port either, and I don't have a non-IDE dvd drive (used to have a USB one, but I can't find it), so installing's a bit tricky. But it seems that Gentoo has a page on making a USB stick bootable, so I did that. Which SORT of worked.

Specifically, it booted fine the first time, and let me partition the disk, but I suspect this disk has the 4k alignment issue LWN was going on about months ago, so I'm not quite sure how I need to partition it, so I decided to run some tests. Meaning I wanted to interrupt the format, and after several minutes of formatting a partition ext3 with no end in sight (it turns out letting it finish takes almost 8 minutes), I ctrl-c'd the format process... and it refused to die. I ctrl-alt-deleted the system, which decided that process was hung in D state and shut down anyway... And then the system wouldn't boot up again.

After the Pointless Boot Graphic went away telling me how great the motherboard is for gaming, I got a blank VGA screen with a blinking cursor. Left it there for 5 minutes: nothing. Hard power cycled three times: nothing.

Now I know exactly what this stupid BIOS is doing: when the hard drive wasn't partitioned it ignored it and went to the next boot device, but now that it IS partitioned it's loading an MBR full of zeroes and jumping to it, and not falling back to the USB stick anymore. And the Pointless Boot Graphic hid the prompt telling me what keys to hit to get into the bios menu. (It's not escape, not F1, not F12, enter, space...) The instruction card just said "enter the bios menu" without saying HOW.

I had to go out to the car and dig the motherboard manual out of the motherboard box (thank goodness I haven't thrown out any packaging yet) and read to page 27 to find out I needed to hit "DEL", and do so at the right _time_ in the boot sequence, to get the BIOS menu. (The people who think this is good UI can die in a fire. They work for msi.com.tw.)

The BIOS didn't have an option to change the boot order either. Its menu had "boot device: SATA drive" which I could enable or disable, and "boot from other devices: yes/no". But telling it to disable sata and boot from other devices STILL tried to boot from the hard drive, and never fell back to USB.

However, the BIOS menu also had an option to disable the Pointless Boot Graphic, and when I did THAT it said that I could hit F11 for a boot menu. And THAT let me boot from the USB stick again. (No way to set it persistently that I can see, but maybe it magically remembers what you booted last. There's no way to know except to try. Once I get the hard drive working presumably I won't need to boot from USB anymore without manual intervention, but if I did want to this motherboard would be going back to the store for a refund.)

To sum up: MSI is insane, would not buy again.

So anyway, now I've got gentoo's install disk booted up again, and I'm trying to figure out how to tell if this disk needs the 4k alignment thing. This page talks about how important it is, but doesn't say how to _select_ the alignment. Some disks do a weird XP adjustment, others just insist on 4K alignment or they have performance problems (and potential data loss on sudden power failure, which I'm more worried about). Linux Weekly News has its usual excellent coverage but again, a HOWTO it ain't. I want to A) test my drive to see if I need this, B) figure out _which_ alignment I need, C) feed the right options to fdisk to make it set the drive up correctly.

But how do I do that? "We'll fix the kernel someday, then we'll magically adjust userspace, and it'll all just happen for you" is NOT what I'm looking for right now. (Especially since I might be the guy who has to fix busybox's fdisk to do this right.)

The fdisk expert menu has the "adjust starting position" option for that first cylinder offset for ancient DOS compatibility: is that relevant? Later partitions start at cylinder boundaries, so I think they just need the cylinder boundary to line up right...

Ted Ts'o's blog has some reasonably useful information, and I _think_ the -H 240 thing will do what I want. (The "255 hardwired in lots of places" comment is slightly worrisome, but since I dunno what he's talking about I might be able to get away with ignoring it.)
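So the experiment is something like this (device name and paths hypothetical; 240 heads times the default 63 sectors gives a cylinder size that's a multiple of 8 sectors, so cylinder-aligned partitions land on 4k boundaries):

# partition with 240 fake heads instead of 255
fdisk -H 240 /dev/sda

# then repeat the same workload on each layout and compare:
time sh -c 'tar xjpf stage3-*.tar.bz2 -C /mnt/gentoo && sync'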

So now the question is "do I need to do this?"

If I make 4 partitions with -H 240 (a 64 meg /boot to eat the initial alignment weirdness, 16 gig /, 2 gig swap, and the rest in /home), extracting the Gentoo stage 3 tarball into / followed by a sync takes 3 minutes, 50 seconds.

If I make 3 partitions with -H 255 it takes 5 minutes 31 seconds. Yeah, I need to do this.

Ah, _but_ I can make 3 partitions with -H 240 if the first is 2 gig swap, the second is 16 gig /, and the third is the rest in /home. Then extracting the Gentoo stage 3 tarball into / followed by a sync also takes 3 minutes, 50 seconds. (The swap partition should eat the initial alignment hiccup from skipping the first 63 sectors so as not to confuse DOS (and thus XP), and I don't care about performance or data reliability in the swap because the data in there won't survive an unclean shutdown anyway, and I have 8 gigs of ram and don't intend to hit swap. It's just there to kick out stuff that's essentially never used and free up working memory, and to give the OOM killer heuristics something to work with if they're ever needed. Figuring out that the system is overcommitted and about to go all thrashy when there's no swap to thrash is kind of hard. Detecting that we're currently swap thrashing and something needs to be done about it is easier. And the performance hit from swapping at all is already so awful I don't _care_ about making it worse.)
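For reference, a hedged sketch of the invocation and the arithmetic behind it: with -H 240 -S 63, a "cylinder" is 240*63 = 15120 sectors, which is divisible by 8, so anything that starts on a cylinder boundary is also 4KiB-aligned (device name illustrative):

fdisk -H 240 -S 63 /dev/sda    # then create partitions on cylinder boundaries as usual
fdisk -l -u /dev/sda           # list the partitions with start/end in 512-byte sectors
# A start sector divisible by 8 is 4KiB-aligned. (The first partition still starts
# at sector 63, which is why swap or /boot goes first to eat the misaligned bit.)
fdisk -l -u /dev/sda | awk '/^\/dev/ {s = ($2 == "*") ? $3 : $2; print $1, (s % 8) ? "NOT aligned" : "4KiB aligned"}'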

I'd rather not have a pointless /boot partition because ATA-1 LBA is 28 bits of sectors (128 gigs) and ATA-6 is 48 bits (128 petabytes), and either way my 16 gig / partition is covered. Splitting / and /home lets me reinstall the OS without spending all day copying a terabyte of data back on to the thing, so that's probably worth doing.


June 4, 2010

When you rebuild cross-compiler.sh and build/cross-compiler-$ARCH already exists (because it died building uClibc++ or some such after building $ARCH-cc), on the next run the include.sh logic detects that build/cross-compiler-$ARCH/$ARCH-cc exists and sets the path to point there. Then the build does a rm -rf on the cross compiler directory (to rebuild it), and dies because it hasn't got a cross compiler.

This means that the $PATH adjustment has to happen after blanking the $STAGE_DIR. At the moment, STAGE_DIR gets set in read_arch_dir (right before the $PATH is adjusted), but it gets blanked in check_for_base_arch(), which system-image.sh doesn't call, so it can rebuild the kernel image and squashfs separately while the other stages rebuild everything.

Alas, that's too subtle. Gotta move it and make it explicit.
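A hypothetical sketch of what "explicit" might look like (blank it first, then touch $PATH); this is not the actual sources/include.sh code, just the ordering:

# Blank the stage directory before $PATH ever points at it, so a stale
# half-built compiler can't be detected and then deleted out from under the build.
STAGE_DIR="$BUILD/cross-compiler-$ARCH"
rm -rf "$STAGE_DIR" && mkdir -p "$STAGE_DIR" || exit 1
PATH="$STAGE_DIR/bin:$PATH"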


June 02, 2010

So I bisected the mips64 thing to uClibc git 9c343fd4030dc and emailed the committer (Khem Raj) who had a patch for me the next morning, which fixed it. (Yay!) I vaguely recall other things are still wrong with it, but I'm calling it good for the moment and moving to other targets.

I poked at powerpc64 but I can't find how to enable that in uClibc. (I thought uClibc had support for this, but maybe it doesn't?)

The sh4 target is still horked, but now it seems horked by something new:

  AS lib/crt1.o
libc/sysdeps/linux/sh/crt1.S: Assembler messages:
libc/sysdeps/linux/sh/crt1.S:50: Error: offset out of range
make: *** [lib/crt1.o] Error 1

That's nice.

I vaguely recalled that the kernel was horked, but that's implying that uClibc is horked? Let's check the build script repo: hg 1020 built sh4 fine (but qemu has changed enough that the result doesn't run). 1060 built fine. 1080 built. 1090 built. 1097 (current tip) built.

Ok, something is weird here.


May 31, 2010

Trying to give Vladimir Dronnikov an account on securitybreach, I accidentally lobotomized it. (The ssh config was insanely locked down, with an explicit list of users allowed to log in, you could only get in via ssh key, not via password, RSA keys weren't allowed...) Attempting to tweak the config, I screwed up the sshd so it wouldn't let me log back in either. Sigh.

Since Fry's is having a memorial day sale, I thought I'd use this as an excuse to grab a cheap server. (Not as powerful as securitybreach, but less than a third what that cost and enough to run cron jobs on. Plus with a terabyte and a half disk I can back stuff up to, plus I won't be bogging down somebody else's net connection if I send a lot of network traffic its way.)

Unfortunately, motherboards seem to have grown a new connector since the last time I put one together (the CPU has a separate power connector), and if you don't plug that in it's like running a motherboard without a heat sink back before the K5 grew a thermal diode. (Might have destroyed a brand new CPU, which would suck.) Currently Fry's tech support is charging me money to look at the thing and confirm that. (Ok, technically they're charging me to have it sit in a week-long queue. Wheee.)

So, undefined reference to __getdents64 in readdir64.c... That means "good" is expressing that bug and "bad" is not expressing it (in this case, expressing some other random bug)... Oh, hey, it hit one with a bug I know how to fix. Patch that, and... Hey, commit 60972302801 is expressing the bug I'm looking for. Good to know. Ok, commit 7160571c5e3ed fixed the getdents thing. Ok, git bisect reset between ab600d2ad0327 and 60972302801. (Note how if I was using hg I could say "between 1027 and 1095". Use local repo commit numbers that _mean_ something. Oh well. Random arbitrary commits...)

Now the ldso build is dying with O_CREAT redefined and MAP_FAILED undeclared. So that becomes the new "good" while we look for the patch to fix _that_... Meaning udp_io.c dying with "invalid application of 'sizeof' to incomplete type 'struct in6_pktinfo'" is "bad" just because it's the wrong error. (Yes it's still such a crappy development tree that you can hit a half-dozen unrelated bugs trying to track down the one you want.)

Ooh, 997d8efec66ff513 works properly. (With several unrelated bugfix patches applied, anyway.) Ok, the bug I'm looking for was introduced after that, so start a new bisect from there...


May 30, 2010

Figuring out why mips64 dynamic linking stopped working is, in theory, a question of git bisecting the uClibc repository.

Unfortunately, the uClibc developers are a frustrating lot. They fork off a separate branch for each stable version, but don't tag the previous stable version in the history of that branch. The tag is applied within the new branch. Meaning I can't just bisect from 0_9_30 to 0_9_31, I have to figure out a common ancestor between them largely by guesswork.

The next problem is that git bisect's kryptonite is a repository that doesn't consistently build. Testing intermediate versions is useless if those intermediates regularly don't even _compile_, and if each version you test is broken for different reasons. You can change what you're bisecting for midstream (now I'm looking for the patch that fixed this error), but finding "the last version that died with this particular error" is often a case of finding not the fix for this problem but the point where a new bug was introduced that prevented it from getting as far as the error you were previously seeing.

Yes, this means you need to REGRESSION TEST YOUR DEVELOPMENT VERSIONS. Run the test suite against your patches and if it doesn't work, FIX IT. Don't let the tree sit there for dozens of versions with a show-stopper error preventing you from testing something while you go off making unrelated changes that introduce new bugs.

Grrrr.

By the way, git's good/bad terminology is stupid. When I'm looking for the last version that died with this error (so I can backport the patch and test _under_ the error), "good" means died with this error and "bad" means it didn't. Keeping "success==failure" straight in your head during an entire bisect with several minutes of build between each cycle is a bit hard, and if you get it wrong once your whole bisect is off. Especially when you've changed what you're bisecting for five times in a single session (because there's now another bug layered on top of the one you were hunting for)...

Let's just say that "git bisect log" and "git bisect replay" are your friends here. And this process is very, very time consuming.
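A sketch of that workflow (the commit placeholders are just that, placeholders; the commands are standard git):

cd uClibc
git bisect start
git bisect good <commit that still dies with the error I want>   # inverted sense, see above
git bisect bad <commit that fails some other way instead>
# ...build and test each version git checks out, answering good/bad...
git bisect log > ../bisect.log    # snapshot of every answer given so far
git bisect reset                  # realized an answer three builds back was backwards
git bisect replay ../bisect.log   # after deleting the wrong lines from the end of the log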

Oh, the other problem is that I have to bisect over such large areas, which magnifies the "low quality repo, frequent breakage" problem. The solution for that is time-based releases, and someday I hope the uClibc guys start doing that. Time-based releases mean the repository doesn't have a chance to diverge very far between "the last time it worked" and "the next time it's expected to work". The reason the Linux 2.5.x development cycle is not being repeated is it doesn't scale: if the codebase hasn't worked in two years tracking down any specific issue is virtually impossible, you have tens of thousands of changes to eliminate as the possible culprit for _this_ bug, and you're fighting off 11 other random behavior changes screwing up your test while you search. With regular stabilization points there are a lot fewer suspects when something goes wrong, and it's possible to isolate the one you're looking for.

So, undefined reference to __getdents64 in readdir64.c... That means "good" is expressing that bug and "bad" is not expressing it... Huh


May 29, 2010

Renamed my firmware/firmware directory to aboriginal/aboriginal. (The project rename's getting serious...)

I tend to have two-level work directories like that. The first one holds random project-associated files I've collected that aren't part of the actual repository, and the one under that is the repo. So for linux/linux I have 8 gazillion mbox files of lkml, various patches, a copy of "sparse", and so on in the upper of the two directories, and then linux/linux is the git repo (which I can "git clean -fdx && git checkout -f" without losing anything important).

It also means if I ever need to do a clean checkout of the whole project but don't want to zap my repo directory (with uncommitted changes), I can do so as aboriginal/temp or some such. (And generally "temp" as a directory name means I can zap it if I run across it later and don't remember what it was for.)

Sigh. Using Linux as a desktop continues to suck:

[502398.261995] iwl3945 0000:0b:00.0: PCI INT A disabled
[502402.892107] cfg80211: Calling CRDA to update world regulatory domain
[502403.109854] cfg80211: World regulatory domain updated:
[502403.109859] 	(start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[502403.109862] 	(2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[502403.109865] 	(2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[502403.109867] 	(2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[502403.109870] 	(5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[502403.109872] 	(5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[502403.185757] iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for Linux, 1.2.26ks

And my wireless card no longer works. I wonder if a reboot would fix it?

Yup, it did.

So armv4eb should work if the kernel lets you specify big-endian support, which means I need to add a patch so the versatile board sets ARCH_SUPPORTS_BIG_ENDIAN, and update the miniconfig-linux to select CONFIG_CPU_BIG_ENDIAN, and...

qemu: hardware error: pl011_read: Bad offset 6c8

CPU #0:
R00=00000000 R01=00000183 R02=00000100 R03=00000000
R04=101f36ca R05=00000000 R06=00000000 R07=00000000
R08=ffffffe4 R09=00000000 R10=00000000 R11=00000000
R12=00000000 R13=00000000 R14=00010088 R15=00010000
PSR=000001db ---- A und32
Aborted

Well, it's different. Time to bug the qemu mailing list...


May 27, 2010

So, Gentoo From Scratch:

Turning a "fluffy" host (which already has a full development environment) into a minimal Portage environment is basically just:

Then to do stage 2 (rebuild everything using portage), you more or less do:

for i in ncurses zlib sed bash wget baselayout-prefix xz-utils m4 flex bison perl coreutils findutils tar grep patch gawk make bzip2 python eselect-python eselect file pax-utils
do
  emerge --oneshot --nodeps $i
done

eselect python2.6
export PYTHON_ABI=2.6
FEATURES=-collision-protect emerge --oneshot portage

Not quite sure about that list: stage 3 has something like 80 packages in it, and I'm not sure how many of them are needed for it to rebuild itself versus how many are supplied for building future packages. The problem is, anything that's in stage 3 never has its prerequisites tested. If another package depends on a stage 3 package, its build won't fail even if it doesn't explicitly list its dependency, because you're always building on top of a fully populated stage 3. (Mark poked Solar about a few of these, and they got fixed. Dunno what's left...)

Stage 3 is doing "emerge world", under a portage-based system that was built using portage.

The problem is making this work on a system that ISN'T fluffy. Specifically, I want to replace most of the for loop list with busybox defconfig, and of course supply my own distcc toolchain with package.provided entries so it doesn't try to rebuild it. Plus the uClibc vs glibc stuff, but that's already got virtual/* hooks. There's no virtual for "sed", "tar", "grep", "patch", "gawk"...
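For the toolchain part, the mechanism portage provides is package.provided in the profile directory; a hedged sketch (the location is the standard one, the package names and versions are made up for illustration):

mkdir -p /etc/portage/profile
cat >> /etc/portage/profile/package.provided << 'EOF'
sys-devel/gcc-4.2.1
sys-devel/binutils-2.17
sys-kernel/linux-headers-2.6.32
EOF

Portage then treats those exact versions as already installed and won't try to rebuild them.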


May 25, 2010

"The enterprise is a lagging indicator." - Matt Assay. (Now _there's_ a truism.)

Sturgeon's Law: 90% of everything is crap.

Howard Tayler's corollary to Sturgeon's Law: The Internet has proven that Sturgeon was an optimist.

The spelling of the words "corollary" and "correlate" is designed to trip people up. English is kind of obnoxious in places.


May 24, 2010

Excuse me, all that oil in the gulf of Mexico? Why is it called a "spill"? Was it ever in a container other than the ground?

BP was looking for oil by drilling holes in the seabed. They found some. Oil and natural gas float. They were apparently surprised by this.

The oil is coming up through the hole they drilled in the seabed and floating to the surface. It's a "gusher", just like East Texas in the olden days. Except that pooled on the ground where you could scoop it up, and this stuff floats to the surface where wind and waves have fun with it.

On the surface people could walk right up to the gusher. (If they didn't mind getting soaked with oil, and they'd probably need to hold their breath, but historically an awful lot of people _did_ walk up to gushers and deal with them by hand. Of course this was assuming all that oil and natural gas hadn't caught fire yet, in which case you'd call a man named Red Adair who would put it out in exchange for a lot of money. But they knew how to do that. Heck, they got the whole of Kuwait put out in eight months and that involved finding and defusing land mines around the oil wells.) The point is that on the surface you can cap a gushing well with a standard construction crane, and you seldom have trouble SEEING it.

This hole is a mile underwater, where there's no light, and where the pressure would crush most manned submarines, let alone humans. Just getting a video feed of the thing involves special equipment that takes hours to lower into position once it's on site. Said camera is at the end of a MILE of cable, which is hard to make work on the surface.

Maybe they didn't expect a gusher? But every oil well underwater is at least potentially an artesian well, because the ocean acts as a naturally occurring water injection pump. And the deeper you go, the higher the pressure. So if it's not stopped, it'll continue to flow like that for years, until all the oil in the deposit has been squeezed out by the water on top of it, or replaced by seawater. Simple physics.

So, offshore drilling is dangerous because A) oil floats and tends to spread while doing so, B) the pressures down there are immense, C) we can't send people down to fix it when something goes wrong. This is not new information.

But with oil heading back towards $5/gallon it's also darn lucrative. We've already found all the oil on the surface (see "peak production"), and deep underwater is where the remaining oil _is_. The money is overcoming the difficulty, but not the danger.

The gusher's been going for a month now, and is bigger than they thought. (Don't large quantities of liquid flowing through holes tend to make them bigger in general? Or was the little dutch boy sticking his finger in the dike doing so for his own amusement?) BP tried dropping the Astrodome onto the hole to collect the oil as it floated up, but it clogged. They tried a smaller dome, but it turns out the ocean has currents in it (who knew?) that blew the thing off course or tipped it over or something. They're not being very forthcoming with details, because this is all proprietary information apparently.

They're also trying to drill _another_ hole, although I never got a clear explanation of how this was supposed to improve matters. Hair of the dog, I guess. (Drilling got us into this situation. Drilling is what they know how to do. Therefore, this is not a syllogism. Still, I imagine them going "no wait, I can do this" after it blew up in their faces, and desperately attempting to re-roll the crit fail after the fact.)

Their current plan is to scrape a whole lot of mud over the hole they drilled, mix in some concrete, and hope that plugs it up. They have a special name for this plan, a "top kill", designed to make it sound like they know what they're doing. I like how it's referred to as "a top kill" instead of "the top kill" or just "top kill". They're implying they do it all the time; this attempt is obviously one of many. (I guess it just took 'em a month to remember?)

The official Republican energy policy was, and still is, "drill baby drill". And of _course_ Halliburton (the company Vice Sith Dick Cheney used to run) was responsible for making sure this particular well didn't leak.

My objection to the Republican party isn't that they're greedy, self-centered bastards. It's that they're stupid. They're willfully ignorant, anti-science bigots on the wrong side of every significant technical issue, from "intelligent design" to global warming. Their tactical understanding gave us two ongoing wars essentially replaying Vietnam. Their understanding of economics gave us Herbert Hoover's original Great Depression, the Reagan/Bush "Savings and Loan Crisis", and El Shrubbo's 8 years of disasters from Enron/Worldcom to Bernie Madoff and the recent bank bailout.

I'm not a fan of the democrats. I'm just vehemently anti-republican. You shouldn't have your finger on the button if you can't pronounce "nuclear", and reading the description of actions leading up to the accident reminds me of similar descriptions of Chernobyl. Operated by people who never dream anything bad might ever possibly happen. It just never occurs to them.


May 23, 2010

Banging on mips64, which is a very screwed up platform. This is new:

what: symbol 'ntp_gettime': can't handle reloc type 0x3 in lib '/lib/libc.so.0'

I wonder what that means?

At least it's not segfaulting anymore. For some reason, it built busybox just fine, but ccwrap is segfaulting. And when I replaced ccwrap with hello world, that segfaulted too. But when I built it dynamic instead of static, I got that message.


May 22, 2010

I SAW WEIRD AL IN PERSON TODAY! With Fade and Mark and Camine. Said I could die happy. Fade advised against it.

(It was the Alamo Drafthouse's 3pm showing of UHF. They threw spatulas into the crowd. Mark got one, and gave it to me. I'm pondering having it bronzed.)

Sent Camine home earlier than expected because Fade is (understandably) burned out on socializing this week. My fault: Beth from Ohio was visiting all week (work-related training), and Camine's legal first name is "Beth", and I failed to communicate to Fade that these weren't the same person and/or event. (I also got the timing of things wrong; I need to get the calendar in my phone working.) So Fade's a bit burnt out entertaining just now. On the bright side, a quiet night at home watching some Netflix stuff means I get to program. :)

Finally checked in and uploaded the blob I've been (very very slowly) working on for the past week. Whipped up a uClibc patch to fix the x86-64 target, now I'm banging on the others: armv4eb, m68k, mips64el, sh4, and sparc.

My static redo broke sparc. Easy enough to fix: it's now BUILD_STATIC=all instead of =1.

Hmmm, the native builds are broken:

./configure: line 524: can't create Makefile: Read-only file system

Looks like the /home mount is screwed up, probably by the poking around in the init script trying to get the tmpfs not mounted when / is writeable...

You know, chrome crashes much faster than mozilla did. Mozilla would take a while to go down, and pop-up an error box you had to dismiss. Chrome is just "zap, it's gone", leaving you going "what?" at the keyboard as a window with 40 open tabs disappears.

I remember the comic book Scott McCloud did introducing chrome and its revolutionary new design of having each tab be a separate process that could crash independently (or be killed independently, presumably if there was some way to associate resource consumption with tabs). Too bad they didn't decide to actually implement that. No, instead they sucked in flash as a shared library, without a wrapper, so you can't even do the "killall npviewer.bin" thing you could with mozilla. Meaning Chrome is even MORE of a giant monolithic hairball than mozilla, they just advertise that they're less of one. (Nope, not a Microsoft product. Yeah, I thought they'd patented that technique too...)


May 21, 2010

Ooh, this video of Vint Cerf has some really nice internet history stuff around the 1 hour, 10 minute mark.


May 20, 2010

Saw Cory Doctorow speak at Bookpeople today. Man gives good speech. (And remains the best "Eddie" Penguicon has ever had.) Oddly, last time I bumped into him was randomly in the airport, while we were both waiting for planes. Seemed rude to interrupt then, and this time there was 30 seconds per book signed (which was Fade's, since the book was dedicated to her). He's at a thing down the street now, which Beth (of Ohio LinuxFest, who is in town this week for work-related training) is attending. I'm at the Popeye's down the street, not because I need more food but because I need a place to commune with my laptop. (I'm not up to a party right now, instead I want to finally FIX the darn hairball I've been trying to check in to Aboriginal all week.)

So, the uClibc build is going nuts, in a non-obvious way. Well, the result is obvious, but the cause isn't.

The configure is this:

make -j 1 V=1 CROSS=sparc- UCLIBC_LDSO_NAME=ld-uClibc KERNEL_HEADERS=/home/landley/firmware/firmware/build/native-compiler-sparc/include PREFIX=/home/landley/firmware/firmware/build/native-compiler-sparc/ RUNTIME_PREFIX=/ DEVEL_PREFIX=/ install

Which looks fine, and I'm pretty sure hasn't changed recently.

Sigh. I thought it was the FROM_ARCH redo not propagating the new values to all the places that need them (which is indeed an issue), but it turned out to be that ccwrap was built dynamically linked when it should have been static. Turns out uClibc's error checking is kind of iffy in places...


May 18, 2010

So, the Google Nexus.

It sucks as a podcast player. It's obviously quite capable of doing it, but the mp3 player won't background. When I switch focus to something else while listening to an mp3, the MP3 stops playing 5 seconds later. And I can't resume it either, I have to start over from the beginning. Seeking is broken: the forward and back buttons change the progress indicator, but not what part of the audio is playing. Tell it to jump ahead 10 minutes and it'll skip maybe 5 seconds, so now the progress indicator is all wonky and will continue advancing past 100%. The "notification" sound totally obscures the audio, so if anything notification-worthy happens the audio cuts out for five seconds while it plays the notification, and then resumes with a portion missed. And if it loses signal partway through and can't download the rest of the file, it just _aborts_ back to the web browser, losing all context. (And showing a gratuitous blank web browser page.)

There's a theme here: it doesn't want to pause and retain context. It either aborts or drops content if anything happens. The "we don't multitask" iphone at least has the idea of pausing tasks, saving state, and coming back to them.

Sometimes Nexus wants to download things instead of playing them streaming. There's no obvious way to delete these files. I cleared the download menu, but later downloaded a text editor which lets me navigate the filesystem... and it shows me the old 80 megabyte video I downloaded days ago. I could presumably save a text file over it, but that's about it.

There's no built-in way to navigate the filesystem. You can't browse the contents of your phone and delete files. It came with a couple dozen bad MP3s that I don't want to hear; I have no idea how to delete them. I have no idea how to put more onto it, either.

The built-in keyboard has a "contextual" button. In the web browser it's ".com", in google talk it's a smiley-face button (colon, dash, period). But you have to switch to the alternate keyboard if you want a comma.

The "developer" model has no shell prompt. I'm not sure what's developer about it.


May 17, 2010

Ok, disentangling the whole FROM_ARCH/FROM_HOST thing. It's been a mess all along, and previous attempts at beating reasonable behavior out of it didn't clean it up nearly enough.

First, FROM_ARCH becomes FROM_HOST. (Since host/target have established meanings.) Have native-compiler.sh set it, not sources/include.sh. That way it can stay blank if necessary.

Alas, the previous round of cleanups hasn't been checked in yet (still doesn't quite work in places), so this is probably getting mixed together into a single checkin.


May 15, 2010

I redid BUILD_STATIC to take a comma separated list of packages, which is nice except that ccwrap is sort of its own package, and the uClibc utils (ldd and such) should be built static even if uClibc itself isn't. And then there's the fun corner case that ldconfig is built by the main uClibc build, not by the utils build, but uClibc itself should have .so versions of the libraries it builds. (Oh, and a fun corner case: why is ldconfig not part of "make utils"? Emailed the list about that, UCLIBC_STATIC_LDCONFIG as a config symbol seems like the worst kind of special case...)

Making individual config symbols for uClibc-utils and ccwrap seems a bit fiddly. I could make a "misc" binary category, which is a bit non-obvious and non-orthogonal. Or I could slave these to gcc (um, gcc-core I guess). Or I could make them static all the time, at least when we're not doing a "simple" toolchain build (and thus potentially linking against glibc), which is probably the least objectionable option.

Came up with a fix for the uClibc 0.9.31 fcntl64 thing. The guard symbol needed is __LP64__ (because __SIZEOF_LONG__ and __SIZEOF_POINTER__ aren't available in gcc 4.2, but that one is). Yes, I waited over a month for the uClibc list to fix it. They never did. Sigh.
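A quick way to check which of those predefines a given compiler actually provides (this is just the standard preprocessor macro dump, nothing uClibc-specific):

gcc -dM -E - < /dev/null | grep -E '__LP64__|__SIZEOF_LONG__|__SIZEOF_POINTER__'
# gcc 4.2 only shows __LP64__ (on 64-bit targets); the __SIZEOF_*__ ones appear in 4.3 and later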

Hmmm, FROM_ARCH is a problem. When it's blank that means we're making a simple cross compiler, but when it's blank read_arch_dir sets it to a default value. So you can't externally force it blank in sources/more/test.sh.

Need to untangle that, track down every FROM_ARCH user. Also, there should be a cross-compiler.sh wrapper instead of build.sh doing magic...


May 13, 2010

Git has been grinding away for 20 minutes trying to pack my Linux repository, rendering my system essentially unusable due to constant disk access. I didn't ask it to, it just decided it had been too long, like Ubuntu's "I don't care that _you_ want to reboot, because _I_ wanna fsck" logic.

Auto packing your repository for optimum performance. You may also
run "git gc" manually. See "git help gc" for more information.

Mercurial never needs to pack its repository, because mercurial's file format isn't stupid. Mercurial's clone URL and the built-in human readable web repository are the same URL served by the same code. Mercurial gives me obvious repo-local linear enumeration of a given branch.

Alas, Linux used git, and everybody else copied Linux...


May 12, 2010

Portage (gentoo's package manager) is frustrating. Simple things are impossible to look up. How do I get a package to tell me its immediate dependencies? (The closest I've found is "emerge --pretend --emptytree package" which gives me every dependency all the way back to the C compiler.) How do I query a package to get its description? (Portage only seems to have one line of description for each package (the DESCRIPTION= field in the ebuild file), instead of the detailed descriptions dpkg and rpm give you. But how do I get the tools to show that to me without going and looking at the ebuild file with "vi" or "less"? Presumably there's a way, but the man page is written by people to whom this apparently never occurred.)

Alas, portage's documentation on "how to use it" and "how it's implemented" are both tangled together and split into a bunch of different places. There are separate man pages on portage, emerge, make.conf, and so on. The gentoo handbook tells you some of what you need to know, and there's a separate portage handbook. Then the gentoolkit package supplies equery (how that's not part of the base portage I don't understand)...
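For what it's worth, the closest things to answers I know of, assuming the gentoolkit package is installed (whether either counts as discoverable is another question):

emerge --search zlib    # prints a Description: line for each match, without opening the ebuild
equery depgraph zlib    # gentoolkit's dependency graph (the whole tree, not just one level)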


May 11, 2010

Open source development avoids Brooks' law scaling limits by being loosely coupled. The upside is we don't spend all our time giving status reports and synchronizing with each other, which can be a pain even on small proprietary projects.

The downside is the loose coupling inserts a lot of brownian-motion latency into the process (a priority for me isn't necessarily one for you), and it's inefficient in that a lot of the work we do has to be thrown away or redone when we get ready to merge it.

But these costs remain more or less constant. They aren't strongly affected by the size of the project, nor do they require strong coupling mechanisms such as having everybody on the project physically located in adjacent offices instead of living in different time zones and communicating via email.

We also have working conventions to mitigate open source's costs, which is where "release early, release often" comes in. Submitting prototype code (and checkpoint releases) as soon as possible detects collisions early (so you don't spend a lot of work reinventing the wheel), lets the code review portion of the merge costs be spread out over the project, and lets you incorporate feedback that would otherwise cause rewrites into your design phase so you don't waste time implementing and polishing a lot of code that you'll just have to do over to get it merged. It also mitigates the brownian motion latency a bit, because if you get diverted to work on something else for a while, someone else to whom it's a higher priority might volunteer to finish your unfinished code rather than waiting for you to get back to it.


May 10, 2010

So apparently gmail's spam filter is eating all copies of messages I send (in that it's not sending them back to _me_ when I've configured a mailing list to do so), meaning my mailing list threads are chopped up because I don't have copies of the messages being replied to in the archive. (And I don't see that messages actually went out unless I look at the web archive of the list.)

I've tried to log into the web interface to see if there was a way to make it stop this (or at least fish around in the spam folder and add that to my pop trawl), but I apparently don't remember the password.

The administrator password reset mechanism won't let me do a page on the website this time; it wants me to set up a new CNAME record. Last time that took over a week to propagate to the secondary server, which is why I'm only noticing now that I'm not getting these messages because the lists were still routing through my old server until this weekend.

This is why I avoided gmail all these years. Giving up control to a system where I can't get in touch with a human when (not if) something goes wrong really isn't my strong suit. Now if my email client stops having the pre-recorded password in it, I lose access to the account entirely, and I can't just log back in via ssh key to fix it.


May 8, 2010

Shuffled Aboriginal Linux code around so the native-compiler and root-filesystem stages are separate (and collated by build.sh). Bit of head scratching to figure out why this lobotomized busybox, but it turns out that "cp -a" will overwrite files rather than replace them, so all those symlinks to busybox provide a minefield. (The corruption is that sh and strings are conflicting, thus busybox gets partially overwritten with "strings", and amazingly enough the result doesn't segfault. But it's really, really confused.)

I can fix it by changing trimconfig-busybox not to build the conflicting commands, or by doing "yes 'n' | cp -i". But I'm not quite happy with the design yet.

The two stages should probably be collated by system-image.sh, because build.sh collating them is a violation of the design idea that build.sh never does anything on its own, but merely calls the other stages in order. Also it's still a layering violation (something other than root-filesystem.sh is writing to build/root-filesystem-$ARCH).

The problem with this is there are three different ways to do system imaging, so I'd potentially have to implement collating three times. (Plus who knows what future file formats, although if it grows much more I might split it into system-image-ext2.sh and system-image-squashfs.sh and so on...)

The other thing is that teaching system-image.sh to collate two sources of input isn't really _its_ job either. Hmmm... Really a question of which is less ugly. (Building the Linux kernel isn't really system-image.sh's job either, but it's got to go somewhere. Adding an extra stage in between to contain a new combined filesystem seems silly, and a gratuitous waste of disk space.)

Hmmm... Squashfs doesn't seem to have an opposite of --keep-as-directory. If you want it to collate two directories together, it won't. (And alas, dir1/. dir2/. doesn't fool it.) So it looks like the actual copy is required.

On an unrelated note, for a while now I've wanted to rename the "$ARCH" variable to "$TARGET". Not sure it's a good idea. $ARCH is consistent whether we're building a system image or running in the emulator, but $TARGET becomes $HOST after that context switch...


May 6, 2010

The android/nexus GUI is really starting to annoy me. There's some kind of system tray at the top of the screen with icons that seem to indicate running processes, but I can't click on them and I can't get it to tell me what they are. What's the big M? What's the triangle with the exclamation point in it? I can sort of go into settings->applications->running services (yeah, obvious place to look isn't it?) and get a list of running processes, and kill them from there, but that's not showing the same thing. It's the difference between the output of ps (or top) and the icons in your desktop's system tray.

I can't sign into Google talk anymore. It just sits there giving me an endless whirling progress indicator when I try. There's no explanation of why it might not be working. All I can do in the pull-up menu is "cancel signin". I can't even get it to tell me what kind of credentials it's logging in with, that's presumably somewhere else in the phone. (Perhaps under "settings" because the google account is _special_, and other accounts like twitter and youtube and IRC are all inferior other things not central to the phone's personality. Psychotic little...)

The thing will chirp loudly at me, but then the unlock screen gives no hint of _which_ application caused the chirp, and when I unlock it to the desktop it shows me the last application that had focus (or the desktop) again with no hint of why the chirp. I have to guess: was it messaging, or talk, or something else? Check 'em all and hope I find it. (I think the cause of the current chirps is google talk timing out, but it's not SAYING it's timing out. When I unlock the phone it's back at the desktop, and if I go back into google talk to see if it has an error message, it starts the login attempt over again, meaning it chirps at me again a minute later.)

Oh, if I have the volume turned all the way up it's annoyingly loud when sitting on my desk, but still too muffled to reliably hear it ring when it's in the android case in my pocket. (But if I don't put it in the case, won't the screen get scratched?)

It has a nice standard headphone jack, and in theory this thing is an mp3 player, and it has a "music" icon that lists lots of mp3s that come with it (none of which I've ever heard of, haven't bothered playing any yet, I want _my_ music on it). But how do I get mp3s onto and off of the device? The "music" icon doesn't have any import/export capabilities. There's no file browser icon I can see. I tried typing "file:/" into the browser bar and it GOOGLED IT. (Look, if I type something into the URL bar I don't want the search engine triggering. If I wanted to go to google, I'd go to google.)

I bought an A->B USB cable (actually bought a new one, can't find the one that came with it) to plug the phone into my laptop... and Ubuntu 9.04 doesn't seem to recognize the thing. I can't mount it, or ssh to it, or anything like that. It sees that it's there and gives me an ID string, but there's no obvious way to interact with it. (Do I have to download some package to do this, ala the iPod? One year old is too old for Linux to talk to this thing?)

Even though I'm paying the extra $20/month for laptop tethering (the ability to use my phone as a modem to connect my laptop to the internet), there's no obvious way to get this sucker to do that. (That one's really a T-mobile thing, and probably I have to visit them and see if they've got an app for this.)


May 5, 2010

Boingboing says anonymous comments might never get posted, so here's the reply I did to this, specifically the first comment about "And politicians really wonder why there is apathy amongst voters?".

There's only apathy until it tips over into rage.

Unfortunately, the lightning rod Mr. Potter's crowd is using (that guy who runs Faux News, whatsisface. Sits in a wheelchair, would have gotten away with bankrupting Bedford Falls' savings and loan if it weren't for those meddling kids pooling their money and taking collective communist action. No, not Limbaugh or Beck or any of the vaguely potato-shaped crowd. Their boss. More like a raisin. Yeah, that guy.)

The lightning rod they're using for rage is the "Tea Party" crowd. They get together and sing "a very merry unbirthday" and demand an end to "plate tektonix" and other things they can't spell. As far as I can tell, their purpose is to make voter rage look as uncool as humanly possible, so the rest of us aren't tempted to indulge in it. They're also carefully orchestrated to be damn ineffective so that if anybody else does decide to get off the couch, the context they have to operate in is as utterly useless as possible. (Don't make an appointment with your senator to try to speak to them in person on an issue, don't campaign for candidates you like during the primaries, don't pool your money and hire your own lobbyists... No, go out to some public space and hold a sign proclaiming the end of the world is "nye".)

I'm continually amazed that people can hear about the "Tea Party" without trying to work out who is the Mad Hatter, who's the March Hare, and who is the Dormouse in this scenario...


May 4, 2010

Happy Star Wars day everybody. (The original GPLv2 series, of course. The GPLv3 versions with Jar Jar binks were not worth celebrating.)

The pun is "may the fourth be with you".

This evening, it's Laundry and Aboriginal Linux. (Remember, Firmware Linux got renamed recently.)

I've been working on converting Gentoo From Scratch into an hdc build for a couple months now, but it kept spinning off into random unrelated infrastructure improvements. Well, tonight I finally checked in the first chunk. (Nothing that interesting, it builds zlib and ncurses. Next up is Python.)

One of the big things this script is doing is copying the root filesystem into a subdirectory to set up a chroot. It's easy to "find / -xdev" but it turns out that tar and cp and such really don't want to both create subdirectories and not recurse into subdirectories listed on the command line. The command that will actually do what it's _told_ is cpio. (Yes really. It's brainless enough to _obey_, which is useful.)
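A hedged sketch of the cpio invocation in question (the destination path is illustrative):

cd / && find . -xdev | cpio -pdmu /home/chroot
# -p = pass-through (copy) mode, -d = create leading directories, -m = preserve
# mtimes, -u = overwrite existing files; find's -xdev keeps it from wandering
# into /proc, /sys, or other mounted filesystems.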

Once that's working, the rest of Gentoo from Scratch isn't that hard to move over.

(I've also moved mklfs.sh out of the example source code directory; that should be an hdc build too.)


May 2, 2010

Camped at The Donald's for the first time since I got a new day job, and the internet is horked.

It's sad, really, that AT&T and Wayport, together, can't manage to deal with basic wireless infrastructure on a consistent basis. The kind you get by installing a cable modem and plugging a linksys router into it. The kind Chick-fil-a has been managing in-house for years now.

Currently, the login redirector is returning this:

ERROR
The requested URL could not be retrieved

While trying to retrieve the URL: http://localhost/ffp_redirect.adp?

The following error was encountered:

    * Connection to 127.0.0.1 Failed 

The system returned:

    (111) Connection refused

The remote host or network may be down. Please try the request again.

Your cache administrator is webmaster@wayport.net.
Generated Sun, 02 May 2010 20:56:13 GMT by nmd.mcd14552.aus.wayport.net (squid/2.7.STABLE3) 

The fun part is that my laptop does in fact have a webserver on 127.0.0.1, serving up a copy of my website. Last night I taught it that "aboriginal.localhost" should serve up that project's directory, because that's getting its own subdomain as part of the rename. So of course the first thing I needed to check was that the fiddling I did to my local config wasn't screwing it up.

But no, that message says that the redirector is trying to access its own localhost, and failing. It's a purely internal configuration error on the part of one of wayport's servers, which is cutting off all access to the internet from this McDonald's.

Dear McDonald's: Why did you partner with somebody incompetent? I'm guessing AT&T was the lowest bidder? (And they outsourced to an even lower one?)


May 1, 2010

I'm in the process of renaming Firmware Linux to Aboriginal Linux (for reasons discussed on the list), and there's lots of little details to get right:

What did I miss...


April 29, 2010

Wow, Gentoo Embedded has collapsed since the last time I paid attention to it. There used to be info there about building under uClibc, and a repository that listed the packages that built under uClibc. Now it's pretty much exclusively about cross compiling.

Some of the old information (from 2005) seems to be cached here. Compare that to the current handbook. Yeah.


April 28, 2010

Attempting to build strace on the current i686-system-image is complaining:

net.c: In function 'printsock':
net.c:976: error: field 'nl' has incomplete type

Almost certainly the uClibc 0.9.31 move, but didn't I test the native builds since then? Or did I not bother since the 64 bit targets aren't working yet...?

Quick blog entry to remind me to look at this tonight...

More stuff wrong with the system-image-i686: Why is df not finding / properly? (Neither busybox nor toybox.) And I need to fix /home mounting so umount doesn't say it's busy. And system-image is rebuilding the kernel every time even when it doesn't need to. (SYSIMAGE_TYPE=ext2.)


April 27, 2010

I'm adding support for a new target to strace, which involves a lot of steps. (Mostly you can look in the changelog and see what people did for other targets. In this case CRIS and AVR32 targets which were added in February 2009 provide a decent starting todo list.)

But they don't cover everything. Especially the minutia of how autoconf is a useless pile of suck.

The first thing you have to do (before you can even run ./configure) is teach config.guess and config.sub about the new target. There's absolutely no excuse for this, because the compiler defines macros for the target it's building for, so this is trivial to probe for at compile time (and thus set up an #ifdef __arm__ staircase to #include the right files or complain if nothing's recognized). But the FSF created autoconf and they've sold their lead-infused snake oil (now with extra arsenic for health!) far and wide. So this crap has to be block copied into each new project.

(FYI, to see all the compiler's predefined macros, do "cc -dM -E - < /dev/null | less".)

Note that config.guess doesn't really know anything about the target, you could pretty much replace the entire script with echo "$(uname -m)-unknown-linux" and that would be good for 90% of the cases (modulo ARM eabi and similar, which again there are compiler macros for).

In config.sub you have to add your target's uname -m output to the big long case statement checking "$basic_machine" against 8 gazillion possible names it might be, and complaining if it's not in the list. Again, this file could be replaced by:

echo "$1" | sed 's/-linux$/linux-gnu/' # dammit

Again, it just checks that it's recognized and fails if it isn't. It doesn't actually have (or need) any real knowledge about the target, it's a bouncer. "Is this guy on the list to get into our exclusive club?"

It's sad that the code that exists entirely for the purpose of build portability merely adds work when adding a new target. Self-defeat, thy name is FSF. Sigh.


April 24, 2010

Still sick. So tired of being sick.

On the bright side, I sound less like a muppet than I did yesterday. (When I _could_ speak, that is. After three or four sentences it was stage whisper time. By the way, having anything in your throat swell up to the point where it's noticeably restricting airflow is REALLY CREEPY. Actually having it feel like there's a golf ball stuck halfway down isn't that pleasant when you DON'T have to hold your head in certain positions to breathe easily. Oh, and the throat numbing stuff the doctor prescribed made it so I couldn't swallow without choking. Not really a win, there.)

Still hurts to swallow. This means I eat things like ice cream and stuff that gives me lots of food in a small space (ala the KFC Double Bypass and cookies). Probably not a net loss of calories (although nutrition's likely out the window), but I am fairly constantly dehydrated. Yup, hurts to swallow beverages too.

I have not really enjoyed this week. Seems like I'm on the downslope of this thing, but it's taking a while...


April 20, 2010

So apparently I'm sick with a virus going around. It got so bad when I went out to lunch I just didn't look forward to eating and instead went to the doctor, who said it's a virus going around and it's made one of my eustachian tubes swell up which is pretty much screwing up the entire right side of my head. (My ear, eye, and neck all hurt.) Took the second half of the day off from work and slept a lot.

This was after the epic struggle to find my car in the hospital parking lot, the epic struggle to get and eat food, and the epic struggle to go to the pharmacy and get my prescription for throat numbing gargle stuff so I had a chance to sleep. (When swallowing or turning your neck hurts, and you're dizzy on one side of your head, everything becomes an epic struggle. Texas sunlight in the early afternoon did not help this.)

Got a couple hours of sleep, now awake again. Sort of unhappy about it.

Lunch was the new KFC Double Bypass, the bacon sandwich using slabs of fried chicken as bread. The theory was "can't swallow, vaguely nauseous, but should eat _something_ since I skipped breakfast and it's now coming up on 2pm. Why not try that thing that made Howard Tayler do this and let curiosity overcome pain?" It was good, but I hope to try it again when swallowing isn't so amazingly painful. (I was taking mouthfuls of beverage and contemplating them for upwards of ten seconds before swallowing.)

A bacon sandwich made entirely out of meat is one indicator that I'm living in the future now. Another is the way my phone can take voice input for "Where is the travis county tax office in Austin" and pull up Google maps with the address. (Very useful while driving and unable to pay attention to the phone. The buttons are still too small and fiddly, though.)

My laptop's wireless internet died today. Inexplicably. I think it's an issue with the binary-only iwl3945 driver losing its marbles, but it's survived a reboot? Weird.

The symptom is I can't access any websites, and dmesg instantly fills up with spam:

[  662.366163] IN= OUT=wlan0 SRC=192.168.2.14 DST=192.168.2.1 LEN=63 TOS=0x00 PREC=0x00 TTL=64 ID=25055 DF PROTO=UDP SPT=59532 DPT=53 LEN=43 
[  662.366187] IN= OUT=wlan0 SRC=192.168.2.14 DST=209.18.47.61 LEN=63 TOS=0x00 PREC=0x00 TTL=64 ID=25055 DF PROTO=UDP SPT=45601 DPT=53 LEN=43 
[  662.366212] IN= OUT=wlan0 SRC=192.168.2.14 DST=209.18.47.62 LEN=63 TOS=0x00 PREC=0x00 TTL=64 ID=25055 DF PROTO=UDP SPT=43078 DPT=53 LEN=43

I can rmmod and insmod the iwl3945 driver, and it seems to be happy when I do that. It can scan for available networks and even associate with them, so apparently the physical hardware is working. I even get a list of nameservers in /etc/resolv.conf so it's passing data. But it just won't route packets at the IP level.

An extended bout of head scratching resulted in finding this:

$ sudo iptables-save
# Generated by iptables-save v1.4.1.1 on Tue Apr 20 20:50:37 2010
*mangle
:PREROUTING ACCEPT [3103:296971]
:INPUT ACCEPT [2885:226968]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [6168:1749459]
:POSTROUTING ACCEPT [2837:220813]
COMMIT
# Completed on Tue Apr 20 20:50:37 2010
# Generated by iptables-save v1.4.1.1 on Tue Apr 20 20:50:37 2010
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT DROP [0:0]
-A INPUT -i lo -j ACCEPT 
-A INPUT -s 127.0.0.0/8 -i ! lo -j LOG 
-A INPUT -s 127.0.0.0/8 -i ! lo -j DROP 
-A INPUT -d 224.0.0.1/32 -j DROP 
-A INPUT -j LOG 
-A INPUT -j DROP 
-A FORWARD -d 224.0.0.1/32 -j DROP 
-A FORWARD -j LOG 
-A FORWARD -j DROP 
-A OUTPUT -o lo -j ACCEPT 
-A OUTPUT -d 224.0.0.1/32 -j DROP 
-A OUTPUT -j LOG 
-A OUTPUT -j DROP 
COMMIT
# Completed on Tue Apr 20 20:50:37 2010
# Generated by iptables-save v1.4.1.1 on Tue Apr 20 20:50:37 2010
*nat
:PREROUTING ACCEPT [266:76158]
:POSTROUTING ACCEPT [1392:86974]
:OUTPUT ACCEPT [4723:1615620]
COMMIT
# Completed on Tue Apr 20 20:50:37 2010

I have no idea how that crap got in there. (I certainly didn't put it in there.) I have no idea why it's saving it between reboots. (Other than some misguided attempt by Ubuntu to save iptables state between reboots because obviously laptop == server.) But RIPPING IT OUT gave me twitter back. Figuring OUT how to rip it out (or in fact what was wrong) took most of an hour.
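For future reference, "ripping it out" amounts to roughly this; it's standard iptables usage, nothing Ubuntu-specific:

sudo iptables -P INPUT ACCEPT      # set the default policies back to accept
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -F                   # flush the filter table rules
sudo iptables -t mangle -F
sudo iptables -t nat -F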

Yet another reason why Linux is just a roaring success among nontechnical end-users. I have NO idea what triggered that. I haven't manually messed with the network stack in months (until trying to debug this), and the only thing I've installed on my laptop in a week is aranym (the m68k emulator, from the Ubuntu repository).


April 19, 2010

Starting a new job today, 6 month contract at Qualcomm. Looks like fun.

Sore throat prevented me from getting much sleep last night, dunno if it's allergies or if I picked up something in Chicago and/or San Francisco. (Tried sleeping downstairs for a bit but it didn't help. I lie down, I get a sore throat.)


April 18, 2010

I have adopted a new batch of technology. Expect complaints for a while. You have been warned.

Apparently, in order to get the "internet for my laptop via bluetooth" functionality back, or in order to install busybox on this thing, I have to crack my own iPhone. Even though it's an "unlocked developer model". Wheee... (That goes on the todo list for a while.)

Missed three phone calls so far today. I'm not sure my nexus is actually ringing. (I can phone out, but having the volume all the way up gives me a little beep I can barely hear standing on the sidewalk of a busy street when it's _not_ in its case in my pocket.)

I also confused the "sms" function with the "google talk" function for a while. (The green icon that says "messaging" is sms, the "talk" one is google's instant messaging.) Mark eventually corrected me. I want to delete that darn icon, but can't figure out how. (Hopefully the bill won't be too high. I've tried telling t-mobile to remove my account's ability to send and receive SMS, but it keeps respawning because it's such a huge profit center for them. It costs them nothing, but they get to charge per tweet. Yes, in 2010. Including when I _receive_ sms spam. Sad, isn't it?)

Ok, I think I've officially run out of patience with Thunderbird.

I told it not to download email unless I press the button. (Do not download mail on startup, do not download mail periodically.) It did it anyway, automatically behind my back, and popped up a large notification window (which I found out how to switch off, but I don't want it to DO that). I don't know how to make it stop.

I have to use the mouse to switch focus between the message list and the message window to switch messages _and_ scroll up and down to read a long message. In kmail, I didn't have to take my hands off the keyboard, cursor up and down scrolled the message, left and right switched messages. (I tried page up and page down and similar, couldn't find a way to switch messages with the message window selected.)

The message display has both a "From" field (useful) and a "Sender" field (which contains entries like qemu-devel-bounces+rob=landley.net@nongnu.org), which is eating significant screen real estate for no reason. So I go in and try to find out how to configure the headers display... There isn't one. Thinking maybe I could switch just that field off, I wandered into the advanced config menu (the mozilla about:config since this mail client was a project tumor forking off of Mozilla), and searched for "sender". It found Preference Name mailnews.headers.showSender, Status default, Type boolean, Value false... It's ALREADY OFF. But it's displaying. Brilliant.

The directory it keeps the inbox files in has a space in it (~/.mozilla-thunderbird/zuhli4ke.default/Mail/Local Folders/Inbox) by default. This is a Windows program. In order to copy stuff out back to kmail, I had to know the "find -print0 | xargs -0" trick.
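For reference, the trick in question; the destination directory is made up for illustration:

mkdir -p ~/mail-rescue
find ~/.mozilla-thunderbird -name 'Inbox*' -print0 | xargs -0 -I {} cp {} ~/mail-rescue/
# -print0/-0 pass the filenames NUL-delimited, so the space in "Local Folders"
# doesn't get split into two arguments.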

Time to look at other email clients, but for the moment back to Konqueror. It's at least a pile of _known_ bugs, and teaching it to use pop+ssl against the gmail servers is a lot easier than teaching Thunderbird to give me a usable interface...


April 17, 2010

Headed over to Mark's and we got my Sim card moved into the Nexus, and so he could show me how to do podcasts using Keynote. (I may have to borrow Fade's monolith for a few evenings.)

The Nexus makes Apple's device look open. It was impossible to configure the Nexus unless I had not only had a google account, but it was using gmail to handle my mail. It just got CONFUSED otherwise.

I broke down and let Mark switch my email over to gmail, since my Luddite tendencies were wearing a bit thin on him, it seems. (I prefer to learn things so thoroughly I can implement them, or not use them at all. The in-between state means I break stuff and can't fix it. This makes me somewhat resistant to picking up new tools when I can do the same thing with the ones I've already got. I _enjoy_ learning things like LUA, but I need to devote large blocks of time to exploring them very, very thoroughly to feel comfortable with them.)

That said, this was an excuse to finally get off of Kmail, which is bolted to KDE and currently trailing wires and sharp jagged bits of metal back towards KDE while running under XFCE. It boots all sorts of weird servers in the background. It pops up strange errors because services it expects aren't running. It goes into CPU-eating mode where I can type two sentences ahead of the composer window and then go get a beverage while it updates about one character per second... I'm familiar with it, but it's NOT optimal. They chose to bolt it to KDE, and thus it can go down with that ship. Konqueror at least got chopped out by Apple and polished into Webkit, which means that by using Chrome I'm once again using a descendant of Konqueror.

Now if only Chrome let me do something similar to "killall npviewer.bin", so that when I suspend and resume and the flash plugin loses its marbles (and its timing information, and its attachment to the sound card) I could restart just the flash plugin without having to restart the whole darn browser. You know the Scott McCloud thing about how each tab is independent and closing a tab and reopening it is the same as closing the whole program and reopening it? When it comes to the flash plugin, that's a total lie.

(Yes, I break everything. Including stuff I wrote.)


April 16, 2010

Fighting the uClibc build. The fcntl64 thing seems to be affecting all 64 bit targets (or at least mips64 and x86_64, which have nothing else in common that I can see).

Caught up on the busybox and uClibc mailing lists, and made a bug tracker account to poke at the bug I'm hitting. Also emailed the Xilinx toolchain guy, and poked at FWL documentation a bit.

Came up with a potential new name for FWL: Aboriginal Linux.

"Firmware Linux" is a bad name both because A) it's misleading (it doesn't really mean what it says), B) it's Google space is polluted (17 million hits; we're the first hit but the other 9 on the first page have nothing to do with us).

Alas, Impact Linux wasn't any better (27 million hits). And really all it's got going for it is a bad pun. (It's how you get embedded.)

I want something that says "native compiling", but "Native Linux" has 8.2 million Google hits. (And lots of thesaurus-suggested synonyms are equally useless: Fundamental Linux is 2.3 million, Essential Linux is 8.2 million, Primary Linux is 27 million, Original Linux is 28 million...)

Aboriginal is from the Latin "Ab Origine" (from the beginning) and more or less means native.

It also implies that later developments displace it (that connotation is how it differs from "native" or "indigenous"), which is accurate for this build system's intended purpose: we give you a build environment that you can just as easily replace via native compiling as by building upon it.

But it's an obscure enough word that putting it together with Linux comes to only 122k hits. Meaning I might be able to set up a Google alert for other mentions of "Aboriginal Linux" and possibly get something interesting instead of a constant stream of noise.

A similarly obscure synonym is "Indigenous Linux", but that doesn't have the "displaced" connotation (Aboriginal's definition is all about what used to be there, Indigenous implies it still is). Indigenous is also far easier to misspell (and thus harder to Google for: 158k for "Indigenous Linux", 23k hits for "Indiginous Linux", 24k for "Indiginus Linux", and strangely if you google for "Indigenus Linux" you get 1.7 million hits, but only 8k when you add "-indigenous" to that).

Either way, if I'm going to rename the project I should do so on or before the 1.0 release...


April 15, 2010

In the LAX airport, on the way back to Austin.

Quote from twitter:

I changed from an amateur to a professional... which is to write even when you don't want to, don't much like what you are writing, and aren't writing particularly well. -- Agatha Christie

The failure mode of Chrome when dealing with too many tabs is to thin them to spikes and then draw them off the right side of the window. (No scroll bars.) Oddly, when they're thin enough you can't quite select 'em all, and some of the ones you can select highlight while your cursor is over a different tab...


April 14, 2010

The highlights of today were "I gave my talk, decided to do podcasts, met lots of people, and got a free unlocked Nexus phone".

Bumped into Michael Opdenacker last night, who apologized for never responding to my email. He's the guy who does educational stuff for Linux via free-electrons.org; I was hoping to do the "native development with QEMU" thing as a training session through him, but it didn't happen. Oh well.

I was disappointed by my talk. It went ok, but early on I let myself get distracted by a guy essentially heckling my "cross compiling is hard" section right at the beginning. (He literally said "hire me to do it", since I was apparently so incompetent at it.)

I got distracted into explaining that obviously it could be _done_, but so could Cobol, Windows 3.1, and scaling Perl up to a million line codebase. That didn't make it a good idea, and what I was trying to offer was a better alternative, and that this whole section was just there to give you a frame of reference for the main body of the talk, which was about that alternative approach.

Alas, the above is what I _should_ have said; in reality what I did was go into more detail about the downsides of cross compiling, going through maybe a third of my slides on the topic. That meant I only got maybe one slide done per minute for the first half of my talk, and when you're less than 30 slides into a 260 slide deck at the halfway mark... Yeah.

People came up to say they enjoyed it, but this is the third or fourth time I've given talks based on a random subset of this material, and each time there was more I didn't cover. The only time I really gave a good "here's an overview, now go read the slides in detail" that actually introduced each major topic section was Ohio LinuxFest, the very first public presentation of the material (with Mark).

The main problem is the slides were for the 8 hour training session I didn't get to do via Michael. Cutting that down to 1/10th the time to fit in a 50 minute panel session is _hard_.

What I think I need to do is borrow Mark's video camera, take each individual topic section, and do a separate podcast on it. Have them be 5, 10, 15 minutes long each (however long each topic takes to do properly), and then post them individually to impactlinux.com.

Right before my talk, Tim Bird told me that it was _important_ I finish on time (sigh, ok) and that there were REASONS I wanted to attend Chris Dibona's keynote. My first guess was "He's going to pull an Oprah and give every member of the audience a Nexus, isn't he? Possibly sponsored by the Linux Foundation, which has already spent more money than that would take _feeding_ us all at this conference?" But I didn't want to suggest it to the people attending my panel because I didn't want them to be disappointed if I was wrong. I just passed on "Tim Bird tells me there are REASONS you want to attend the next keynote". (I may have made finger quotes around the word "REASONS".)

Turns out, I was right. I am now the proud owner of an unlocked Nexus, and so is Denys Vlasenko (the busybox maintainer, who I hung out with most of the conference). I suppose I need to figure out how to program for this sucker now. (Ok, the existing armv6l root filesystem would probably work fine as-is as a chroot, the question is how to get it on there, launch it, and pass I/O to it. Does unlocked mean there's an xterm or similar?)

(The ironic part is I start a contract at Qualcomm on monday, and they might have given me one anyway since their chip is in it. Dunno. But this one, I get to play with on my terms...)


April 13, 2010

Not impressed with this morning's keynote. I don't see how this talk differs from the same talk given 12 years ago. The same things are coming Real Soon Now (tm).

Back then Java was going to abstract away the need for a system to run the Java on. It didn't happen. Now your cell phone will somehow become a commodity app delivery platform running generic portable apps. Presumably people will stop trying to "differentiate" (I.E. corner at least part of the market), after that irrelevant iPhone is forgotten by history.

Also, "the cloud" will deliver content. I don't understand "the cloud". It seems to be the new name for "the web". This talk used iTunes as an example of a cloud app, meaning Napster was a cloud app circa 1999.

Part of this "cloud" enthusiasm sounded like Larry Ellison's network computing push from the late 90's, where you'll log into the same desktop from anywhere and the machine you're currently using would be irrelevant. Except you still need a machine, and you either own it or you don't. We had the technology to do this 20 years ago (from nfs to vnc, I used it at Rutgers in 1993), but what people seem to _want_ their machines to be special. I use my data on my machine. If I lose my phone/laptop/pc, that's bad. Your phone/laptop/pc is not a substitute for my phone/laptop/pc, even if I have everything backed up religiously.

I suppose one counter-argument is gmail, which exists on the web. But "if the net is down, I can't read my email" was already true, so "if the net is down, I can't read my old email" wasn't a big stretch.

Eh, I'm just not getting "the cloud".


April 12, 2010

Spent a rather panicked fifteen minutes this morning when I woke up thinking I'd slept through my alarm call, until I remembered the existence of time zones. (My watch is still on Texas time. My phone auto-adjusts.)

First day of CELF was a blast. Voice burnt out. My presentation is in the last slot of the last day. Voice condition could be an issue.

Got to hang out with Denys Vlasenko. His name is pronounced "Dennis". Good to know. David Mandala's here too, and Matt Mackall. Got to pester Grant Likely in depth about ARM device trees. It's unfortunately looking like I'll have to write part of what I want (teaching QEMU to emulate a board based on a device tree), because he's not doing it (he's teaching QEMU to generate a device tree based on the hardwired initializations the various boards-coded-in-C are already doing), and the guy Ubuntu hired is mostly focusing on the kernel side of things and real hardware.

Back at the hotel around 9:30 because of the 4 hours of sleep thing last night. Grabbed the ice bucket, went out to fill it. They say the ice machine's on the 5th floor. Up there is the "UCSF sleep center", and a sign saying whoever should be there is visiting with another patient, wait here, someone will be along to escort you to your room shortly. The plastic bucket rest of the ice machine is broken (missing) so I have to hold the bucket under the ice dispenser. The metal ice bucket rings like a gong as it is filled, echoing down the corridor. This seems like poor planning on somebody's part.

All I can think of in my sleep-deprived state is the Weird Al song (Everything You Know Is Wrong) with the line about getting the room next to the noisy ice machine. It hadn't seemed like a serious issue before...

Back in the room, turning on the TV, switch down three channels. There's a windows blue screen on channel 35 "VISATEL". (No, not Vistatel, although there was a doubletake there.) KMODE_EXCEPTION_NOT_HANDLED in ntoskrnl.exe. If rebooting doesn't fix it, make sure you haven't run out of disk space, check for driver updates, try changing video adapters, check with your hardware vendor for bios updates, try booting into safe mode, refer to your getting started manual for more information.

I've never actually had the chance to examine a windows bluescreen message in this much detail. It's _astoundingly_ useless, and sort of passive-aggressive about blaming vendors in hardwired boilerplate without the slightest clue what this particular problem was.


April 11, 2010

Off to CELF! I'm dressed for SF weather (I have my jacket), so naturally I have a layover in Phoenix. Beginning to recognize individual airports. (I've eaten at this Wendy's before...)

Plane out delayed enough that making the connecting flight doesn't look like an option. Then it's fixed but overbooked (did they collate two planes?) and they've asked for volunteers to get bumped and are offering enough travel credit that maybe Fade and I could go to Penguicon after all. Took 'em up on it; my new plane is 4 hours later, via Los Angeles and then San Francisco. The LA plane was delayed too, but the layover was a couple hours so plenty of time.

Arrived in Los Angeles, plane to SFO is now delayed too. Delayed by 4 hours. That's kind of epic, by Southwest standards. There are apparently thunderstorms. (Yes, in California.) This airport McDonald's (home of the $8 big mac combo) has cherry pies (not just apple), but the crusts are screwed up? Thought they were big into uniformity? Oh well...

After midnight texas time. It is monday, the first day of CELF. Therefore, I can have caffeine. Starting out easy with just a diet coke. It may help me get to my hotel. Realize I could have brought a tin of Penguin mints, but left it in Fade's office.

Plane got in to SF at 1:30 am. Contemplated late night public transit in a strange city for some time before breaking down and taking a cab. My watch says it's after 4am, Texas time. Very, very sleepy.

Arrive at Hotel at 2:30 am. Ask for 7:45 am wake up call, that's about 5 hours sleep, maybe?


April 10, 2010

Headed to Starbucks with Fade first thing in the morning to commune with my laptop for a bit, and found out via twitter that Texas LinuxFest was going on today. I hadn't planned on attending, and didn't have _time_ to do so today, but when I saw that David Mandala and Maddog would both be there I had to go.

The organizational meeting I attended last year convinced me the event was unlikely to happen, but they managed to pull it together anyway, and it was small but good. I missed the morning session on Linux on PowerPC, but caught David's Linux on Arm talk. (Not much I didn't already know, but it's good to see the progress they've made.) Alas, it meant I didn't get any documentation work in today.

At least I finally managed to finish the backup script for Terminal B, late in the evening while doing laundry so I have something to pack for the trip...


April 9, 2010

So a week or two back Eric Raymond hit an endianness bug in his gpsd project, and poked me for a big endian virtual development environment he could test a fix in. I tried to walk him through using FWL to set one up, but it turns out my documentation has bit-rotted horribly.

We're going through and fixing up the documentation now. This week is horribly overscheduled (I need to finish the backup script for terminal B and pack for my trip to CELF on Sunday), but I'm carving out some time to work on it.

I'm amused that Eric told me "you've built an F-16 and you're describing it as 'a way to go up in the air'."

Once again, this would be so much easier if this wasn't my week detoxing from caffeine. (Supposedly it takes about two weeks to get it out of your system. I have one. Presumably, this was the week I wouldn't be busy. And so, I've been busy...)


April 8, 2010

The central problem in the health insurance mess is denying people for preexisting conditions (and using that to find excuses to drop people when they do get sick). But the reason health insurance companies originally did that is otherwise people don't bother to get health insurance until they get sick. This is called the "free rider" problem. The preexisting conditions exclusion was introduced to address the free rider problem.

The Democrats' plan solves the free rider problem by requiring everybody to buy insurance ("mandates"), which lets it take away the preexisting condition workaround without reintroducing the free rider problem.

The Republic party has insisted they're going to re-fight Waterloo until they've retroactively repealed Obama's health care victory... but they're going to keep the part about covering preexisting conditions. But without the individual mandates. They're going to yank one (bad) solution to the free rider problem, and what do they plan to replace it with?

It turns out, they literally have no answer to that question.

The Democrats' answer to the free rider problem is an ugly solution, but not as ugly as the mess preexisting conditions have turned into over the past few decades. The preexisting condition exclusion is a hack which has been badly abused over the years by greedy profit-hungry corporations to royally screw over their customers. If you DO get a chronic condition while uninsured, you're uninsurable for the rest of your life. (And not just an actually chronic problem, but something like leukemia where even if they cure it the potential for recurrence can make your premiums over a million dollars a year. Yes really.) And the threat of any problem you _do_ develop being categorized as a "preexisting condition" (and thus your coverage dropped even though you paid the premiums) prevents people from shopping around for better insurance, because they're afraid of losing the history of continuous coverage that is their only defense against being dropped if they get sick. But if you can't shop around, the "free market" is useless. Plus, any slight imperfection in your history of coverage (such as an improperly filled out multi-page form years ago) can be used by greedy insurance companies to attack your history of coverage by dropping you just long enough to turn your newly manifested expensive disease into a "preexisting condition" when you try to reinstate your coverage. At the very least you wind up in court proving it's not, and the insurance companies have lots of lawyers and time on their side as you get sick and can't afford treatment.

So the preexisting condition exclusion has to go, but there was a _reason_ for it in the first place: solving the free rider problem. Mandating everybody buy insurance is another way to solve the free rider problem, allowing preexisting condition exclusions to be retired.

Once again the Republic party has no plan to address a pressing problem, they just object to the people who do. If their offices were burning down, they'd scream at the fire department for causing water damage and opening doors with axes. The closest they come to dealing with anything is throwing borrowed money at the problem. (Reagan gave us our national debt, the first Bush spent more money in 4 years than Reagan managed to in 8, and then duh-byuh racked up more debt than either managed. In between, Clinton balanced the budget and paid down the debt under _both_ Democratic and Republic congresses.)

P.S. If you wonder why I'm referring to the Democratic and Republic parties: the Democratic party has always been called "Democratic", as an adjective. Various GOP talking heads have recently started insisting they're the "Democrat" party, but that would logically make the GOP the "Republic" party (not an adjective either). On the theory that each party gets to name itself, that gives us the Democratic party and the Republic party.

(And for those who like lots of charts and graphs...)


April 6, 2010

Somebody's noticed that Java is the modern Cobol.

The loss of momentum of Java started in 1998, when the oft-quoted 212% growth of Linux came from all the Java developers switching over when Netscape released its source code and simultaneously elevated Linux to a Tier 1 platform (with Mac and Windows). Back when Netscape was _the_ browser it was the pied piper of Java, leading everybody into it and then leading everybody back out again to Linux.

Netscape introduced the world to Java when it added Java 1.0 support to its browser, and nobody moved to Java 1.1 until Netscape supported it. Then Netscape introduced the world to Linux, and explained that "write once run everywhere" was actually more effectively provided by Open Source than by binary-only bytecode on a closed source runtime you couldn't debug (hence "write once, debug everywhere" as Java's motto was commonly rendered by developers). From a technical perspective, open source provided everything Java ever promised, and more. Since Java needs an OS to run it on anyway, the Java developers moved to Linux en masse and waited for Sun to catch up.

Amazingly, this caught Sun by surprise. When the #1 bug on the Java Developers Connection became "There's no JDK for Linux", it stayed at the #1 position (with 5 times as many votes as any other bug) for an entire year, because Sun refused to officially respond to it. They eventually made the problem go away not by addressing the issue (even to officially say "no, we're not going to support it"), but by redesigning their website so that when you logged into developer.java.sun.com you no longer saw the top 10 bugs.

This alienated Java's developer base in a big way. The "anything but microsoft" crowd already had a massive persecution complex, having fled their individual platforms (from the Amiga to OS/2) and united behind Java after Windows 95 finally made their technical superiority arguments irrelevant. Windows 95 was the first version just barely good enough to actually use without wanting to kill somebody. Windows 3.1 crashed multiple times per day; it would literally crash if left sitting there doing nothing overnight. It was easy for OS/2 users to attract interest simply by providing a platform that DIDN'T DO THAT. But once Windows 95 came out, being technically better wasn't enough to get any new users, because the network effects of larger market share meant new software was written for the platform most users were on and new users bought what most software ran on, and then DIDN'T LEAVE anymore. The _developers_ hated Windows, but the users didn't care. Java offered a refuge for developers who didn't want to program specifically for Windows, but still wanted their software to be used. Hence the swarm of developers flooding _into_ Java in 1996 and 1997, fleeing their doomed also-ran platforms crushed by a Windows that actually sort of worked-ish.

When the Java developers saw Sun deliberately excluding a platform (because Linux outperformed Solaris on Sparc hardware and Sun felt threatened by that), those Java developers went "what if Sun starts to feel threatened by my platform?" They realized Sun didn't want to destroy Microsoft's monopoly, it wanted to capture it intact. They were fleeing the great Black Widow Microsoft, but now it looked like "out of the frying pan, into the fire".

In the late 90's the only Java available for Linux was a development effort called Blackdown, which had licensed the Java code from Sun, ported it themselves, and put binaries on their website for download. (They couldn't release source, but they could give away binaries, a bit like the flash plugin today.)

Sun completely screwed Blackdown over when it belatedly decided to support Linux, and although the mushroom cloud of flame-mail that descended upon Sun made it realize its mistake and apologize, it was too late. The ex-Java developers saw Sun as the enemy, a clueless pointy-haired Microsoft wannabe. These developers had already been writing native Linux code to pass the time while they waited for Java support to come to them. (Or learning scripting languages like Python which were still reasonably portable, by being inherently open source.) The Blackdown debacle made lots of them stop paying attention to Java at all, writing it off as Sun's proprietary language (a la Visual Basic). Sun spending years resisting standardization didn't help matters any.

How Java wound up as the new Cobol is due to two things: IBM and Y2K. IBM had dozens of platforms (several different mainframe variations, powerpc workstations, x86 PCs, and more) to juggle, and when Java came along it seemed like a godsend. IBM converted to Java as a religion in 1998.

IBM similarly embraced Linux in 1999, after the free DB2 beta for Linux (which IBM whipped up for educational institutions in response to Oracle's Linux version) yielded support queries from several of its largest customers, including more than one bank. But IBM didn't _stop_ using Java everywhere; Big Blue's plans have something like 5 years of latency to really get going, and they were still gearing up to deploy Java everywhere when Y2K hit.

The Year 2000 bug was dealt with the way everything else is: wait until it becomes a crisis and then throw money at the problem in a last-minute panic. And suddenly 50 years worth of accumulated Cobol code that was sacred and untouchable because it Just Worked was now open for change, and it was often easier to rewrite it than find Cobol programmers. The language du jour was Java; IBM was pushing it and HP and such were copying them. So millions of lines of Cobol got rewritten in Java during Y2K, for things like payroll systems that would never be changed again once they were working.

The geeks in charge didn't handle Y2K very well politically. Politically, you want to let the consequences hit home, _then_ fix it, otherwise they'll start thinking the problem wasn't really that important because nothing happened. These days Y2K is written off as a big worry over nothing, but at the time it was a big deal and the reason it wasn't a disaster is because the IT world dropped everything for a year to fix it all before bad things had a chance to happen.

But Y2K and IBM replaced Cobol with Java as the language of pointy-haired business software, and Sun's epic mishandling of Linux sunk Java's chances with hobbyist programmers, and natural momentum has done the rest.


April 5, 2010

Day two of caffeine detox. Many naps. Totally unenthused about doing anything. (I only have a week of downtime right now, so it's not really a thorough detox. But since I was up to two energy drinks a day _plus_ a constant stream of tea and soda, just resetting the levels a bit would be an improvement. Then I can start up again when I head out to CELF and be reasonably perky.)

Car broke down again yesterday. Radiator ran out of water, looks like the part I got replaced tore open again. Fixed the symptom but not the cause.

Biked up to the car place this morning to give 'em the keys so they can diagnose the thing. (They were closed when the tow truck dropped it off yesterday.) Bike broke down four times on the way there and the way back (chain coming off), took it to the bike shop where they fixed it for $5. Car isn't gonna be that cheap.

Mips64 is horked, beyond the distcc segfault. For one thing, "stat" is giving crazy data. When I do an ls -l it says every file was created on "January 0, 1900" (which is a strange failure mode because 0 in unix time is midnight January 1, 1970).

The "stat" function in uClibc is returning times like 0x4bb9956800000000, which is the correct result shifted 32 bits to the left. It's also claiming that the block size is 0xff (when it's actually 0x400), and that the number of blocks used is 0...

None of which explains why distcc was segfaulting, just why trying to build distcc natively got confused. (Make is not happy when every file has the same date and it can't be changed.)

Hmmm... Maybe I should try the uClibc 0.9.31 that claims to have been released three days ago but which still has no mention on the main page of the uClibc website...

(Yet another 4 hour nap later. I miss caffeine.)

So, I had to delete two uClibc patches, adjust two others, and the fifth one mysteriously still applied. That's actually the one I'm worried about. (So uClibc still hasn't got futimes?)

Alright, build it:

And the link dies with lots of:

libbb/lib.a(xfuncs.o): In function `close_on_exec_on':
xfuncs.c:(.text.close_on_exec_on+0xc): undefined reference to `fcntl64'

Sigh. This might have something to do with libc/sysdeps/linux/common/__syscall_fcntl64.c having a "libc_hidden_def(fcntl64)" in it. What does "hidden" mean in this context? This is one of the library's exports. It's SUPPOSED to export this symbol...
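
(A quick way to check what the linker actually sees, rather than guessing from the source: dump the symbol tables of the freshly built libraries. The paths below are placeholders for wherever the mips64 build put its uClibc; if fcntl64 shows up with HIDDEN visibility, or doesn't show up at all, that's the undefined reference right there:)

readelf -s path/to/libc.a | grep fcntl64
readelf -s path/to/libuClibc-0.9.31.so | grep fcntl64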

I'm trying to figure out if this would be easier if I wasn't detoxing on caffeine, or if I'd just be more verbosely and energetically frustrated.

Heard back about the car. This time the radiator proper tore open, but luckily they can replace it for about $500. Could be worse. I am at the point where car repair bills and monthly car payment for a new one are approaching parity. The main advantage of a new one would be avoiding unscheduled breakdowns.


April 3, 2010

So I've known since before the release that mips64's native toolchain was horked, but I didn't hold up the release fixing it (since that architecture wasn't _in_ the previous release, so it's not a regression).

The problem is that the compiler segfaults as soon as it's run. I just wasted half an hour adding various print statements to ccwrap trying to figure out where it was segfaulting, because if I call it as "/usr/bin/cc" it doesn't segfault, but if I call "cc" out of the $PATH it does, so obviously it's the find_in_path() logic, right?

Except that it's not. The reason it doesn't segfault if I call /usr/bin/cc is because it's segfaulting before it ever _gets_ there. It's segfaulting in the distcc wrapper. It's not _my_ code that's randomly failing when I go to a new architecture.
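
(In retrospect the half hour would have been better spent asking what "cc" in the $PATH actually _is_ before instrumenting my own wrapper. Assuming strace made it into the image, something like:)

which -a cc
ls -l "$(which cc)"
strace -f -e trace=execve cc --version 2>&1 | tail

(The last execve before the SIGSEGV names the binary that's actually dying.)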

Hadn't even considered that. Sigh.

Trying to figure out if I should dig into the guts of distcc and figure out how to fix this, or if I should just take this as the excuse to teach ccwrap how to distribute builds and ditch distcc altogether.

The reason I want to ditch distcc is it falls back to building locally when it shouldn't, it can't always break things like "gcc hello.c" down into separate compile and link phases, it won't always tell me when it's distributing and when it's silently building locally, and it's WAY WAY WAY more complicated than it actually needs to be.

On the other hand, it's been used (and thus debugged) by over 10,000 users, and probably used to build every package in the Debian repository.

Then again, the same argument could be made about ccwrap itself. If there's bits of the gcc command line I'm not parsing right, the way to find them is to build an entire distro repository, all 8 gazillion packages. Which is what securitybreach is for. :)


April 2, 2010

Fade and I took an early anniversary trip to Houston. Largely going to Martini's Empanada House for every meal (plus getting to-go empanadas when we go home). There was a museum too, which was ok except for the "Oil is just so awesome!" exhibit, which was literally sponsored by Enron. (There was a Ken Lay foundation at one point. Really.) Fade's in-town friends were sick, and we didn't really have plans beyond spending some time away from the cats in a place we wouldn't have to clean. (Yes, we drove to Houston for the Empanadas.)

We also saw "How to Train Your Dragon", which was the same kind of excellently done as "Cloudy with a Chance of Meatballs". It was funny, engaging, reasonably inspired, with a coherent plot. It didn't redefine what's possible with the medium the way "Wall-E" did, but it wasn't the paint-by-the-numbers kind of entertainment Monsters vs Aliens provided either. Worth a second watch, but probably not the kind of thing you'll still be talking about years later.

I've largely used it as an excuse not to fire up my laptop much. I'm reading Rosemary and Rue (by Seanan McGuire; the woman who did the excellent Flowers for Barry Ween is now a professionally published novelist). I'm waiting for the Velveteen stories to come out in book form so I can buy 'em.


March 31, 2010

Finished the memory management documentation I've been blocked on for the past couple weeks. 700 lines, over 5000 words.

I want to go do something else for a while.


March 29, 2010

FWL 0.9.11 is out. It's uploading, very slowly. (I found the --bwlimit option to rsync, now I don't have to bog down Mark's cable modem uploading stuff from securitybreach.)

Spending the day banging on memory docs, which I may or may not get to release publicly. I've actually been poking at this on and off for more than a week, but now I'm making _progress_.

I had writer's block for a week trying to write this up as a narrative, the way a class lecture or textbook would present it. The result was not only way too verbose and impenetrable, but had way too many forward references to stuff I hadn't explained yet, which no amount of resequencing seemed able to reduce. Then on Friday I figured out that if I presented it as a FAQ, it became much more concise and the forward references weren't nearly so bothersome, but at that point I was too burned out to work on it for a while, so I spent the weekend banging on FWL instead.

Trying _really_ hard to finish this today. Wanna get it over with...

Also pondering getting a new server, so Mark could shut down securitybreach and maybe ebay it if he didn't want to use it anymore. The 8-way is great from a performance perspective, but Mark's getting a bit sick of the noise and the electricity bill, and I don't have a place to put it over here.

Possibly I could spend about $500 on a mini-ATX case with a 4-way CPU and 8 gigs of ram, something low power with really quiet fans that could fit in the upstairs office without annoying Fade. Possibly I could keep costs down by nationalizing one of the existing terabyte SATA drives Mark has lying around, and maybe even filch some RAM from securitybreach if it's the right kind and doesn't have strange pairing requirements. (It's _got_ 32 gigs of the stuff, I don't think we've ever managed to use half of that.)


March 28, 2010

Got through the FWL buglist to the point where I was ok cutting a release (deferring armv4eb, whatever's up with the mips64 native toolchain, sh4 and m68k, and so on)...

And then as I'm uploading the gigabyte and a half of prebuilt binaries and writing up the release notes, I notice uClibc cut a new bugfix release while I wasn't looking. (Yay! Ok, that's 4 patches down. Can't in good conscience cut a release without that.) And then while I'm uploading _those_ binaries, busybox 1.16.1 drops.

Gonna be a long night...


March 27, 2010

Sigh. Is there a QEMU version that works for all targets? Due to qemu-system-ppc being broken in current -git, I backed up to the 0.12.3 release, in which qemu-system-ppc segfaults as soon as it tries to touch hdc. (This is a different segfault than the qemu-ppc one when trying to run static binaries, this is qemu-system-ppc segfaulting.)

I vaguely recall hitting this bug a while ago, and upgrading to a random svn snapshot to work around it. Unfortunately, that fix isn't in a stable release yet, and the current -git (which presumably the next release will be cut from) hasn't got it.

Ah, I can "git checkout origin/stable-0.12" and build that, and it has the fix. So 0.12.4 (when it ships) should theoretically fix this.

Going through David Seikel's bugs, trying to knock out a release this weekend.


March 26, 2010

At The Donalds to finish this darn memory management documentation once and for all.

The McDonalds' free wifi is impressively slow today. The pointless interception loading screen that AT&T does normally takes a minute or two to work, but today it's spent half an hour trying to load. It's currently on its third attempt. The second managed to load about 1k after 10 minutes or so, before hanging. But generally even the DNS redirect is taking 5 minutes... Ah, it gave up:

We're Sorry - we are unable to complete your transaction at this time. For further assistance, please contact Technical Support At 1‑877‑WAYPORT (1‑877‑929‑7678) with the following information:

* ErrorId: 508-6963-66

Phone tree. Can't really _hear_ the phone tree (I am in a McDonalds, I did not choose this location for its acoustics). Press zero a lot, try to get a human...

And they're aware of the issue. Ok, as long as it's fixed for next time.

Such a thrilling, _gripping_ life I lead, no?

Memory management docs! RIGHT... How do you coherently describe virtual memory without simplifying the hell out of it? Virtual vs physical addresses.

And I just rebooted my laptop because the X11 hang happened again. Yeah, that's about the normal frequency for it. I used to twitter every time it happened, but my friends got fed up with me spamming their feeds.

All those people who tell you how stable Linux is and how unstable Windows is? They're sitting on laurels that dried up and crumbled years ago. Chase at Terminal B says that Windows 7 hasn't bluescreened on him once since he installed it. I'm in double digit reboots this month. Except for a couple apps, Red Hat 9 was a better desktop than current Ubuntu.

And those apps? VLC to play video... which uses a boatload of binary-only Windows DLLs behind the scenes to actually _support_ half those formats. And the updated web browser is needed to view all the weird javascript modern web pages do, but it's both a memory and CPU hog, hangs regularly for 30 seconds at a time, and needs a binary-only flash plugin to interact with about 1/4 of the sites I use (and _still_ can't reliably watch The Daily Show now that Hulu's dropped it).

Oh well, at least we're still going strong on headless boxes. (What do servers and most embedded devices have in common? No GUI is expected out of 'em. That's where Linux shines: when the users can't actually tell it's Linux.)

Ahem. Memory documentation. Right.

(I'm grumpy today. I wonder if I'm coming down with Fade's cold?)

Ah! THAT'S what's wrong. This documentation really wants to be a FAQ, and I'm trying to make a narrative out of it like I'm teaching a class. Right. Complete from the top restructuring of all the data, but that organization makes more sense...

I suspect I should just cut the darn FWL release so it stops distracting me, and then get back to polishing the memory documentation. It's not the data (I gave 'em a big data dump in person), it's the editorial organization into something coherent when you can't interact with the author, that's the hard part...

The Netflix disk for the wii showed up! Yay! The phrase "X is streaming through my wii" sounds dirty no matter what X is. (At the moment, "Around the World in 80 Days, with Michael Palin". Yes, Michael Palin is... I'm going to plead Beethoven's fifth here.)


March 25, 2010

I got nothing done on the memory management writeup today, and I feel horrible about it. (It's like a looming homework assignment that's overdue. I've been authorized to spend 8 hours finishing it, of which I've managed about 4 so far. People in California are waiting for this.)

Unfortunately, I've progressed to full-blown writer's block, which means it's all tangled up in my head and I've got to do something else for a bit to let it settle down. So I'm banging on FWL for a while instead.

I _thought_ the powerpc thing was a kernel issue, but when I bisected the 2.6.32 through 2.6.33 kernels none of 'em worked, so I guess it is qemu. Apparently, the hard drive moved from IRQ 19 to IRQ 16, even though the first serial port is using IRQ 16. The qemu guys say it was an openbios change that did this, which sounds about right. I vaguely recall this having happened before. If I _never_ slept I'd restart QEMU Weekly News and catch it up to the present, so I'd have some way to look this stuff up rather than blindly googling for things. (The qemu mailing list is on savannah, and the savannah list archive apparently blocks Google from searching it in their robots.txt file. The FSF: managing to get the _easy_ stuff wrong since 1983!)

Tracked down the mips performance regression. The powerpc thing is qemu moving IRQ 19 to IRQ 16 (which is already used by the first serial port) due to an openbios update; 0.12.3 doesn't do that.

Wolfgang Denk got fed up when CROSS_SMOKE_TEST didn't work, and decided that FWL isn't ripe yet. Yeah, it's bit-rotted a bit. Breaking CROSS_SMOKE_TEST out into a separate sources/more/cross-smoke-test.sh and fixing it up.

And David Seikel sent me a bunch of bugs, gotta fix those. And Milton Miller sent one. And

Oh yeah, and Eric Raymond called me as I was out getting Fade some cold medicine at CVS (It's a drugstore! It's a source control system! It's very, very bad at one of these things!) and said he had a GPSD bug that hits big endian systems, and since I do emulation stuff can I set him up with one. Which I proceeded to do over the phone, hitting at least three major landmines along the way. (Sigh.) So this weekend, we try again and he takes extensive notes so he can help me redo the documentation. (Again.)

Yay momentum?

Need to get back to the memory documentation, though...


March 24, 2010

When I was in California at the start of the month I gave a quick presentation on Linux memory management, and I'm trying to turn that into proper documentation they can put in their wiki.

Unfortunately, this isn't as easy as it seems. How much background will the people reading it have? How much detail should I go into? What order do I put the concepts in when you really have to understand all of them to understand the others?

It's not _exactly_ writer's block, but it's darn fiddly. Hmmm... You've got to understand virtual vs physical addresses, page tables, memory mappings, memory management units, translation lookaside buffers and the dreaded TLB flush, handling page faults and using them to implement mmap and swap, page stealing, huge pages, high memory...

This would be easier if I could spend longer than a half hour at a time on it. Phone calls, tax appointment, bank, grocery store...

It's a day.

On a side note, the convenience store on the ground floor of that 21 Rio place has "Monster Chai energy drinks" on sale ($1 each) and they're actually not bad. (Monster's been doing coffee based energy drinks for a while, and apparently now they're branching out into tea.) It's way too sweet, but definitely "good tea" and not strongly carbonated. Possibly I could put milk in one without an explosion or tentacles manifesting or similar. I'll have to get a few to try it.

I'm still an adherent of Rockstar's "Punched" and Orange Mango Passion Fruit flavors, of course. But variety's good...

Fade has a bad cold. I wonder how long until I get it?


March 23, 2010

By the way, when my laptop's display hangs solid and forces me to reboot, if I notice before I've filled up the X event queue it might still listen to ctrl-alt-F1, in which case I can get a text console and see this in dmesg:

[57332.342290] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0

Having the most recent such hang happen right before I gave a presentation was better than it doing so _during_ my presentation, I suppose. Did noticeably shorten my slide prep time for the Prototype and the Fan Club talk, but it's my fault for waiting until the last minute to try to turn outline into slides. (I just presented from the outline.)

Most of the time, since a dozen seconds of lag processing the characters I type is _normal_ for pigs like firefox and kmail, I fill up the buffer before noticing that it's not a local hang in the one app but the entire desktop that's hung. And then ctrl-alt-F1 doesn't do me any good. (Oh, and either Ubuntu or x.org took out ctrl-alt-backslash. Somebody emailed me a supposed way to re-enable it, but it didn't work when I tried it...)

If I can switch to a text console and back, X11 resets itself. If I can't, I have to hold down the power button until it hard powers off. (Because of course just pressing it once brings up the popup about how "VLC once tried to play some media, and even though it's not running anymore we have three instances of this message queued up and we're going to display all three as an objection vetoing the action the user tried to perform, and yes we do this even when the action is triggered by low battery status.") So instead I ctrl-alt-F1 and "sync && echo disk > /sys/power/state". (In theory I could "echo mem > /sys/power/state", but for some reason when I moved to 9.04 it stopped debouncing the lid sensor, meaning it wakes itself up in my well-insulated backpack all the time and then stays on until the hardware does an emergency power down due to overheating. Maybe it's a hardware issue with the latch sensor wearing out, but the timing with the move from 8.10 is a heck of a coincidence.) Of course suspend to disk horks the console I suspend from until the next reboot, putting it in some weird half-graphics mode where the font is composed entirely of vertical lines. No, switching to another console and back doesn't fix it, nor does "reset", but "cursor up, enter" still works even if I can't see what it's doing on that console. So I devote the first text console to being broken by my workaround for the breakage of software suspend.

Linux on the Desktop: Smell the usability.


March 22, 2010

Hey, the health care thing passed. About flipping time. Maybe they can shut up about it now. (I'm happy it passed. I'm kind of annoyed they wasted an entire year trying to get bipartisan support for something they had to ram through on a strict party line vote anyway. They started with a sure thing, struggled through to a photo finish a year later, and it still isn't much to write home about.)

This guy has a plan that can be summed up in 3 words ("medicare for all") that would probably do more.

"One way of looking at this might be that for 42 years, I've been making small, regular deposits in this bank of experience: education and training. And on January 15, the balance was sufficient so that I could make a very large withdrawal." - Chesley Sullenberger


March 21, 2010

Airport. 5:30 am. Awake all night at the Flourish after party, which adjourned to the apartment of a nice Slovakian guy whose name I'm totally blanking on. (It wasn't him but he was there and would know the other guy's name. Update: Nikola.)

Yeah, brain shutting down about now.

Back in Austin now. Massively sleep deprived, on a bus home.

Amused by Neil Gaiman's daughter @hollyherself, who twittered: "I fancied myself a goth for about a fortnight before Dad sat me down and reminded me that it was about as far away from rebellion as possible." (Yes, I have just about enough coherence left to catch up on twitter.)

Flourish was fun. I sleep now.


March 20, 2010

So my attempts at a release before Flourish turned out to be a bit premature. There's a powerpc regression between the 2.6.32 and 2.6.33 kernels I need to bisect, and a qemu performance regression in current qemu-git that hits mips hard enough to trigger the 60 second inactivity timeout in the native dropbear/strace build.

I bisected the qemu thing to:

commit b6964827541afbf9c3aa246737ae3d17aaf19869
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Wed Mar 10 11:38:45 2010 +0100

    extract timer handling out of main_loop_wait

But reverting that patch didn't fix it? Hmmm... Have to re-bisect with a more stringent test...
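
(The re-bisect mechanics are the usual "git bisect run" against a script that rebuilds qemu and reproduces the slowdown. The script name here is made up, and the real test is whatever reliably tells fast from slow, e.g. timing the native build against that 60 second timeout:)

cd qemu
git bisect start master v0.12.3     # bad first, then the last known good tag or commit
git bisect run ../test-mips-speed.sh

(Where the script exits 0 for good, 125 to skip an unbuildable commit, and anything else for bad.)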

At the lightning talks. The pleasepirate.org lightning talk was _awesome_. And the www.pumpingstationone.org talk was great too, it's sad that organization isn't in Austin. (And criminal that a group of people who use the phrase "common goal of awesomeness" and who brought a full-scale police box they built to a Linux conference is _not_ already sponsored to come to Penguicon, although as far as I know Penguicon isn't even aware of things like this anymore. Sigh...)

Ooh, lock picking lightning talk! (He has a 3 hour version. "Pass that around. There's 54 picks in there. I counted them, you thieving bastards." Yeah, he's another Pumping Station One guy. So was the Please Pirate guy. They seem to be gang-rushing the lightning talks, and it's the best darn event of the entire conference!)

The lockpicking guy gave me lock picks! And he'll be doing a tutorial at notacon in cleveland, maybe I should try to make it to that...


March 18, 2010

Spent the past couple days hanging around with Camine (if Fade wants to be called Fade, Stu's daughter can call herself anything she likes; I'm better at email addresses anyway), who's cat-sitting and wandered by early because she lives up in Leander and doesn't own a car, and Cap Metro's promised light rail linking Leander to downtown was supposed to launch over a year ago and is now supposed to start ANY DAY NOW.

Overdue for a period, there.

(Aside, I note that the reason the citizens of Austin kept voting down funding for light rail over the past decade was the city government insisted on putting Cap Metro in charge of it for nepotism reasons, and when it finally DID pass it turned into an incompetent money pit just as predicted because hey, Cap Metro's in charge of it. Sigh.)

So I was a bit distracted and didn't get a FWL release together until last night. But I'm pretty happy with commit 1003, dialed up securitybreach (the 8-way server at Mark's place) to do a build, and... it's down again.

Sigh. Ran a build overnight on my laptop, but it wasn't done this morning and I had to go catch a bus so I could catch a plane to Chicago to speak at Flourish. Suspended the build so my laptop battery has a chance of surviving the plane flight. I guess I'll try to cut a release from Chicago using hotel or coffee shop wireless. (Some biggish uploads involved, though. Takes over an hour to do at home.)

On the plane now. Woman in row behind me has a pathological need to talk (switched seamlessly from cell phone conversation to bugging seat-mate, 35 minutes without drawing breath and counting)... I have earplugs _and_ headphones.

In Chicago. Google maps believes there is a Motel 6 downtown: it is hallucinating, according to both going there (that's a bridge, not a building) and Motel 6's 800 number. Took refuge in a McDonalds and then a Barnes and Noble while contemplating, then checked into the local youth hostel. (Hey, they worked in Canada, why not Chicago? It's cold, starts with a C, associated with Mounties or at least Due South...)

Very tired. I got maybe 3 hours of sleep last night depending on the definition of sleep. (Did I mention the FWL release was ready-ish around 5am? Ok, showing Camine several dozen youtube anime music videos made from Weird Al songs and such did not actually expedite matters, but still. The important thing is, orange mango passion fruit energy drinks are on sale for $1.50 each at the convenience store down the street. And it wore off at least 3 hours ago...)

Yay hostel. Lots of interesting-seeming people here, but I'm resuming the build, setting what's already built uploading overnight into a temporary directory (so I can collate stuff on the server when it's all ready and put it up at once), and passing out now...


March 17, 2010

Tomorrow I get on a plane to go to Flourish in Chicago, where I'll be giving the Prototype and the Fan Club talk. So far, I believe the only person I've actually given this talk to is Eric Raymond. (Yes, I'm aware that's backwards.) This should be interesting...

I'm also giving the developing for non-x86 platforms using qemu talk again. And I'm on a panel. And I'm trying to get an FWL release out before leaving, but it looks more likely I'll be trying to upload it during the convention. That may also be interesting...


March 14, 2010

Ok, what is my immediate todo stack...

Hmmm... I'm trying to refactor the run-emulator.sh script _yet_again_. (This is what, the fifth time? Not happy with it however it looks. I want it to be trivially simple _and_ to do some very complicated things. Yeah, I know.)

So, the current dev-environment.sh sets three environment variables: provide 256 megs of memory, use hdb.img as the hard drive image, and create that hdb as a 2 gigabyte file if it doesn't currently exist. Fairly straightforward.
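
(Conceptually that's all it is; modulo the exact variable names, which I'm not going to swear to from memory, dev-environment.sh boils down to something like:)

export QEMU_MEMORY=256   # megabytes of memory for the emulator
export HDB=hdb.img       # file to use as the second virtual hard drive
export HDBMEGS=2048      # size to create it at if it doesn't exist yet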

The run-emulator.sh script does 8 gazillion things, mostly to do with setting up distcc, and then actually running the emulator. I want to break the emulator-running part out into a separate file...

No, the correct thing to do is move distcc into dev-environment.sh. :)

Except there's one other little gotcha, which is that the distcc setup script needs to know $ARCH, because it needs to check for $ARCH-cc. Otherwise, it's completely generic and architecture agnostic. The run script obviously needs to be completely bespoke for each architecture. It would be nice if I could localize the ARCH= knowledge in the other script as well...

Ah, source it. Put the run_emulator() invocation in a function, and have some kind of flag to tell it not to actually call the function, but have it call the function when run by itself...
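
(A sketch of what I mean, with made-up variable names and the actual qemu invocation elided down to nothing:)

run_emulator()
{
  # All the per-target qemu-system-whatever arguments live here.
  qemu-system-"$QEMU_ARCH" -m "${QEMU_MEMORY:-256}" "$@"
}

# The idea being that dev-environment.sh sets NO_AUTORUN, sources this file,
# does its distcc and hdb setup, then calls run_emulator itself.  Run
# standalone, we just go:
[ -z "$NO_AUTORUN" ] && run_emulator "$@"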

Right...


March 13, 2010

Today's twitter breakage.

I suppose I should just give up on the web interface and get a twitter program. That way I can ignore their hideous attempts to constantly "improve" their interface.

Wolfgang Denk has been trying out Firmware Linux. Of course a half-dozen things have broken for him. (His environment has CROSS_COMPILE and ARCH set by default, and somehow CDPATH managed to mess up tar. His native toolchain doesn't have libc.a and can't build static binaries.)

I'm impressed. Somebody else breaks everything the same way I do! I'm trying to fix issues as fast as he finds them. (Cleaning out dangerous environment variables, check. Fixing BUILD_STATIC=none so it actually _works_, check.)


March 12, 2010

Ok, alpha, m68k, and sh4 don't build. Alpha is dead hardware, the config isn't even checked in, and the build break is an internal compiler error in the uClibc build. Not a regression from last time. The m68k target doesn't actually run because we haven't got an emulator for it. (There's aranym but it doesn't have quite all the hardware we need, and I haven't heard from Charles Stevens in months. I should email him...) I've been wrestling with sh4, and might do a bit more but considering that qemu is broken _also_, I'm not sure how much I care.

That's the build breaks. I'm not happy about them, but can live with them for this release. Next, what native builds finish?


March 11, 2010

Hmmm... I need fallback patches for bisectinate.sh.

One of the common failures in doing a bisect is that patches stop applying, but that doesn't mean the patch is no longer needed to fix some issue that hasn't gone upstream, just that the patch needs to be modified. For example, the patch to tell the Linux kernel that the arm versatile board can take a v6 processor (whether or not real hardware can, qemu does) keeps breaking every time the kernel adds yet another board to the list of boards that can use a v6 processor. This makes bisecting a real pain if 2.6.32 needs one patch and 2.6.33 needs another, but one of the two patches should always apply.

What I need to be able to do is say "try this patch, if it doesn't work try _this_ patch instead before failing". That means I need to teach the toybox patch command to support --dry-run (to avoid debris from partially applied patches messing stuff up), and figure out some way to name patches so the application logic knows they're related. (alt-thingy, alt2-thingy, alt3-thingy perhaps? Hmmm...)

And when a patch _has_ gone upstream, the fallback can be a zero byte file.
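
(Roughly the application logic I have in mind, with a made-up helper name and patch directory, and assuming patch learns that --dry-run:)

# Try the main patch, then alt- fallbacks.  A zero byte fallback means
# "went upstream, nothing to apply", which counts as success.
try_patch()
{
  for i in "$1" "alt-$1" "alt2-$1" "alt3-$1"
  do
    [ -f "$PATCHDIR/$i" ] || continue
    [ -s "$PATCHDIR/$i" ] || return 0
    if patch -p1 --dry-run -i "$PATCHDIR/$i" > /dev/null 2>&1
    then
      patch -p1 -i "$PATCHDIR/$i" && return 0
    fi
  done
  return 1
}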

This probably involves a general cleanup of the download/setupfor/extract cluster of functions, since they need to be broken out into their own file the way utility_functions.sh was.

This is not something I want to do right _before_ a release, but it's something I need to tackle to make the cron jobs actually maintainable...

Speaking of messing with toybox, I posted about pushing it upstream to the busybox mailing list and was met with chirping crickets. Not sure anybody even read the message...

Oh, and two option parsing glitches I need to fix, "toybox touch -l walroid" isn't giving me an error (it should, -l takes a number argument and touch needs <1 argument), and wc m+c was segfaulting. Again, todo items for later, after I get this release out. (All this came up trying to bisect why sh4 screwed up its assembly boot code yet again. I'm starting to suspect that the simple cross compilers and the cross-compiler cross compilers don't behave quite the same way. Sigh...)


March 10, 2010

Does anyone still doubt that Linux has permanently lost the desktop when this is the state of play in 2010:

Considering all the factors, the only workable solution looks like doing what Windows is doing. Hard drive and SSD vendors are focusing on compatibility and performance on recent Windows releases and are happy to do things which break the standard defined mechanism as shown by C-1, so parting away from what Windows does would be unnecessarily painful.

Oh, the mistake I made in the original paper is that for the second half of the table I only checked the high end. For the 70's and 80's I looked up both high-end and low-end prices in magazines, but for the 90's and 2000s I just confirmed the high end prices. When I went back and actually _looked_ at old Dell and HP pricing pages out of archive.org, I figured out that I'd missed the gap between high-end and low-end growing from a factor of 4 to a factor of 16 during the second half of the 1990's and early 2000s. (Even though I _knew_ it had happened; I'd lived through it, I remembered the first $999 pc and first under-$500 PC being big news at the time, and even the rise of the "used computer market" should have been a big neon sign reminding me... Sigh.)

The thing is, since the high end continued to march in lockstep with historical predictions, the new 64-bit hardware was introduced right on schedule, because hardware transitions are driven by the early adopters at the high end. Thus it seemed logical that the software API transition (driven by the volume of low-end purchasers switching over) would happen 3 years later, because it always had.

But in the 1990's the 56k modem gated internet performance, leading to the introduction of the Celeron to staunch the market vacuum which gave AMD enough money to develop the Athlon (introduced in 1999) and leapfrog Intel for a few years. Back around 1995 it didn't matter how fast your computer was as long as you were accessing the internet through a drinking straw. A faster processor meant nothing, despite Intel's "bunny suits" desperately trying to convince people that somehow a new processor would render web pages faster than your 56k modem could actually download them.

Moore's Law didn't stop (it didn't even slow down), but over the next 10 years or so, most consumers cashed in two iterations of Moore's law for cheaper prices instead of better performance. With the widespread adoption of broadband this finally seems to have worked its way through (plus the dominant cost of low-end things like netbooks isn't really the chips or software but the case, display, batteries, keyboard... You can sell a router for $50 but the OLPC couldn't get down to twice that despite years of trying).

This means that the low end's abandonment of 4 gigabyte memory spaces would happen FOUR iterations of Moore's law later, which is 6 years after 2005, which is 2011-ish. And it would happen more gradually, as if the browser-based "webtop" stuff wasn't confusing enough. (Sun crit-failed Java so badly they lost out to Adobe's Flash as the ubiquitous cross-platform browser language, but we do _have_ ubiquitous browser-based binaries anyway. From the McDonalds login screen to "Plants vs Zombies", Flash is everywhere at the moment. No idea if it'll last, but none of the Facebook games or this or this or this or this or this seem to be written in Javascript.)

But at this point, it's moot for Linux. Since the Linux community continued to shove crap like Gnome down people's throats (and gutted KDE with a disastrous 4.0 transition that rendered it so unusable even Linus Torvalds abandoned it), Linux is in a distant third place. The fight is entirely between Apple and Microsoft. Even if Apple wins, they're evil too. And competent.

I suppose our consolation prize is that we're only in _second_ place in the smart phone market. (Somebody needs to do one of these showing Mainframe -> minicomputer -> microcomputer/pc/laptop -> smart phone.)

But the #1 vendor has a national sales and support infrastructure in "apple stores" as well as AT&T providing places you can walk in to try it and walk out with a new phone. Whereas Android has dozens of competing (intentionally slightly incompatible) implementations and a pointy-haired development project that's not only widely criticized by experts but which has managed to alienate the Linux-kernel development community.

And that's our _success_ story when it comes to user interfaces. Which is not entirely surprising given open source's structural problems dealing with User Interface issues during the development process. It's not a coincidence our success stories (servers and embedded devices) are all headless boxes with little or no UI. Open Source development is a peer reviewed scientific discipline that works with objective metrics and automated regression testing. User Interface development is an artistic task similar to painting, storytelling, or composing music where it's often hard to say _why_ something works even when it clearly does. Both can be _creative_ (good programmers get writer's block), but one's got a science to fall back on that tends to produce at least reasonable results, and the other's still an art or nothing.


March 9, 2010

There is a Linucon Wikipedia page. *boggle*.

It's almost accurate, too. It mentions Stu Green but not Mark Miller: either I was chair or all three of us were equal, you can't mention Stu without mentioning Mark. The thing was my idea and I'm the one who invited the guests, scheduled programming, fronted the money for the first year, and so on, but it wouldn't have happened without Stu _or_ Mark. I was hoping that one of them would take over as year 2 con chair, but the strain burned them both out badly. I burned out too, and wanted greatly reduced responsibilities for year 2 (basically I wanted to run the con suite, although I wound up inviting guests, doing the programming schedule, and talking on several panels as well).

We had other department heads, but not con chair material. For example, our most energetic participant was a guy named Troy Belding, who was driving in from Houston and had enthusiasm and energy and recruited several other volunteers such as Kreely, the woman who made our caffeinated soap. But Troy was flaky, he kept ignoring the tasks we needed done (and which he'd volunteered to do) in favor of random other things we didn't need and that he couldn't pull off. Thus panel recording never happened because he spent his time building a Dance Dance Revolution pad out of plywood and plexiglass that you needed to stomp on with steel toed boots to get it to register anything. He was so proud of that thing, and it was so useless. (Yes, the rollable foam pads were already available mail-order for about $15. But more to the point, recording all the panels so we could put them up on the website was more important to the con as a whole than a DDR pad for a one-hour slot saturday night. I don't mind that it wasn't what excited him, but he should have told us he wouldn't come through so we could have reassigned the important stuff and let him concentrate on the unimportant stuff that interested him. Oh well.)

Chase Hoffman stepped in at the last minute to be Year 2 con chair (year 2 was organized in less time than year 1 because of this, but otherwise there wouldn't have _been_ a year 2). He was concom from Austin's then-big anime con Ushicon, who was moonlighting for Linucon the first year to run the Anime track. Ushicon's concom was burning out, and saw Linucon as an opportunity to groom a replacement con chair for Ushi. (Run this small event first and then graduate to their ~3000 person con.) Year one we'd had a little over 300 people, and year 2 we were hoping for somewhere in the 450-600 person range, if all else failed they believed the resources of Ushicon could run a 600 person event with one hand tied behind their back.

Unfortunately, Anime cons work differently than SF cons or Linux events. Ushi's first year had something like 1200 people based on word of mouth and we advertised like crazy to get the word out to 300. So Chase's crowd did essentially no advertising... and got half as many people the second year as we'd had the first. (Meaning they lost more money than we did, although Ushi covered it entirely from the travel/lodging refund from a single guest they'd planned to fly in from Japan who'd cancelled on them.)

By that point I was planning to move to Pittsburgh (which I did in January 2006, to take a job at TimeSys), and couldn't take the event back for year 3. And thus Linucon stalled. I got mail about our incorporation and 501c3 status expiring last year.

I've pondered starting up another con but Mark hadn't regained interest until recently and Stu wasn't in a good position to devote a lot of time or resources to it due to the recession, so we put it on indefinite hiatus. I flirted with joining Texas LinuxFest but they just didn't get it and starting over from scratch would have been _easier_ than trying to make that work.

Speaking of which, that's coming up soon and attendance is free. I should probably stop by and see how it worked out...


March 8, 2010

I twittered about this but apparently didn't mention it here. That's a linux kernel git archive that goes all the way from 0.0.1 to the present, seamlessly. If you pull it updates itself from Linus's current 2.6 repository, but its history doesn't stop at 2.6.12-rc2, which makes "git annotate" way more useful. :)

Doing some prep work for Flourish this weekend. I'm giving the prototype and the fan club talk. Gotta outline it properly and come up with some slides...

Update: Flourish is _next_ weekend.


March 7, 2010

At Starbucks with Fade, grinding through some of the programming things mentioned yesterday. Glad to be back in Texas. (I'm aware that there were Starbucks in Orange County, but this is _my_ Starbucks. I don't expect anyone else to understand the distinction. Don't care. And I'm aware I don't even drink coffee, but vanilla scones and hot chocolate count.)

As part of the cron job restart, I've been working on a ruminate.sh script to do similar sorts of optimizations to what I mentioned yesterday, but using a completely different mechanism. This script is about being able to do multiple bisectinate runs in parallel, by creating a build/bisectinate/$PID directory for each instance, so each ruminate run essentially has its own build directory. (Commits 984, 985, 986, and a few more since have been fallout from that.)

The irritating part there is similar to the parallel setupfor/extract stuff: the temp directories don't get reliably deleted if they don't have constant names. But once again, trap EXIT is close enough, and keeping it all under build means you can do an rm -rf build once in a while just to be safe.

The next optimization I want to do is teach bisectinate to clone the repository straight into the extracted tarballs directory, and do the bisect in there. Right now, each bisect step updates git's working store (which is lots of disk activity), then we extract a tarball and pipe it through bzip to write it into the packages/alt-$BLAH.tar.bz2 location and format setupfor expects (which is expensive), and then we rm -rf the old copy and re-extract and patch it (which is expensive).

Also, on my laptop, that disk activity paralyzes vi for a minute or more at a time and I've forgotten the "set stupid_sync_every_30_seconds=HELLNO" command that tells it to stop doing that. Doing a "sync" in another window helps a bit (probably working around this bug), but that means a 45 second lag becomes a 10 second lag, which still kinda sucks when you're TYPING.

The fiddly bit there is patches. I have to re-patch the source, then do a "git checkout -f && git clean -fdx" which isn't so expensive if it's all in cache but is a bit of pain if it isn't. Still, that cleanup's going to be a lot cheaper than the tarball extract, and sets up the cache for the bisect operation to follow.
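
The per-step reset would look something like this (sketch only, with invented paths; it assumes the clone is sitting where setupfor expects the extracted source to be):

  cd "$SRCDIR/alt-linux" || exit 1
  git checkout -f      # put tracked files back the way git wants them
  git clean -fdx       # remove build output and old patch debris
  for i in "$PATCHDIR"/linux-*.patch
  do
    patch -p1 -i "$i" || exit 1
  done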

Meanwhile, what I'm bisecting to test this is the mips problem, which also affects mips32 so it turns out it would have been easy enough to fix anyway. :)

It amazes me, sometimes, how horrible the user interface of Linux is. For example, if you accidentally hit both mouse buttons (and on my laptop they're right next to each other with no dividing space, and meant to be hit with a thumb), that "chords" together to count as a middle-click. If you middle click on a console tab, it closes that console with no warning. So I'm constantly accidentally (irrevocably) closing open sessions when I mean to switch to them, and they go away without even showing what was in them. Just lost another one, no idea what it was. I just remember that it was important.

I've groveled around in the "settings" menu, but there's no way to tell it to stop doing this. I looked in /etc/X11 where you used to be able to disable this in the config file, but they've redone all that and there isn't even an input section mentioning mice anymore. This didn't used to be a problem, and I used to know how to fix it even if it was, but the endless regressions of Linux's "Great March Sideways" mean the old bugs are replaced by new bugs, and the number never goes down, and we must sit down and shut up and accept what they think is good for us, and of course we can write our own from scratch in C if we disagree, so we have no right to complain.
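
For reference, the old-style fix looked something like this in /etc/X11/xorg.conf (assuming the chording is the server's middle button emulation, and assuming this version still honors an InputDevice section at all, which is the part I can't confirm anymore):

  Section "InputDevice"
          Identifier "Configured Mouse"
          Driver     "mouse"
          Option     "Emulate3Buttons" "off"
  EndSection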

It's the painful, _flamboyant_ incompetence that gets to me. Combined with the smug self-satisfied "we will inevitably dominate, it's just a matter of time". I hate to break this to you but it's been 20 years and you stopped actually getting any _better_ a decade ago. It's all lateral churn since then. You really, fundamentally, suck at user interface issues. You used to need a binary-only netscape, now you need a binary-only flash plugin. NOTHING HAS CHANGED. The hardware's orders of magnitude better (memory, CPU, disk space, screen resolution), and the kernel's just about managed to keep up with that. But userspace? If anything, it's gotten worse.

I still need to buy a mac.


March 6, 2010

Back in texas. Yay. Trying to get back on the strange "bed at 5pm, wake up at 3am" schedule where I have my quiet everybody's asleep work time at the start of the day instead of the end of it. (Strange, but it seems to work for me.)

So the move to 2.6.33 broke mips64. Kind of amusing, really:

  LD      vmlinuz
mips64-ld: invalid hex number `0x'

If that had happened before I got a known working version (modulo the ccwrap weirdness I still have to track down) I'd have to try to poke through and understand the linker script, which would have been a pain and probably eaten the day for a one line fix. But now I can just bisect to find the commit that broke it.

Of course this gets back to the fact that my bisectinate script is redoing the entire build (cross compilers and all) every step, and in this case it just needs to redo the system-image stage. Hmmm, how to autodetect that... (I.E. something _else_ eats my day...)

The problem is that kernel headers are used earlier, and it's possible that horked kernel headers could prevent the cross compiler from building anything, so it would die in the "hello world" test. Or screw up the root-filesystem stage. Either way, you'd need to rebuild everything. But if root-filesystem successfully built to the point of giving us a tarball, then it's probably ok. So when bisectinate runs, and UNSTABLE=linux, it can check for the existence of an appropriate root-filesystem tarball for the arch, and zap just the system-image tarball, which tells build.sh to rebuild that bit but leave the previous bits alone. (This is assuming that a build.sh of current happened before bisectinate ran. Otherwise it won't find the tarballs and will just do the full rebuild for every step.)

I can do something similar with UNSTABLE=busybox and simple-cross-compiler, since that only ever needs to rebuild the root filesystem and has no impact on the cross compiler.

Yeah, this is a bit of a fiddly optimization. But the cron jobs (when reinstated) should check linux, uClibc, busybox, and qemu upstream packages for regressions; these optimizations are targeted for the packages that get bisected often, even automatically. (I don't upgrade the FSF ones because I don't care: they went GPLv3, not interested in shipping that, much more interested in finding replacements for them.)
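
The check itself should be small, something like this (variable and tarball names invented, not actual bisectinate.sh code):

  if [ "$UNSTABLE" = linux ] && [ -e "build/root-filesystem-$ARCH.tar.bz2" ]
  then
    # The headers were good enough to build a root filesystem, so only
    # the kernel needs redoing: zap the system image and let build.sh
    # notice it's missing.
    rm -f "build/system-image-$ARCH.tar.bz2"
  else
    # Anything else gets the full rebuild.
    rm -rf build/*"$ARCH"*
  fi
  ./build.sh "$ARCH"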

I'm also working on a ruminate.sh script to do similar sorts of optimization, but using a completely different mechanism. That's about being able to do multiple bisectinate runs in parallel. The first way it does this is by creating a build/bisectinate/$PID directory for each instance, so each ruminate run essentially has its own build directory. (Commits 984, 985, 986, and a few more since have been fallout from that.)

The 8-way server is still down, need to poke Mark.


March 5, 2010

Sitting in the John Wayne International Pilgrim Airport, ready to head back to Austin in an hour or so, listening to overhead announcements that start "To enhance your safety..." and trying not to laugh.

Sigh. On the flight out, southwest had these marvelous little hybrid cookie cracker things. Now they have "cheese nips". I suppose it saves money by preventing people from asking for seconds, and thus makes us less likely to run afoul of the "I don't care if you can get the armrests down and your seat buckled, I don't like your movies" policy Kevin Smith ran into...

Hour long layover in denver. Vaguely curious what happened to my friend Kirstin who used to live in Greely, but she graduated and moved, and I lost touch. Oh well.

I can't log into securitybreach. That's odd. That's the 8-way server with buckets of memory and a gentoo installation, which I run test builds on to come up with release binaries. (And which I need to get the cron jobs working on again so we have nightly snapshot builds.) It's saying connection refused trying to talk to the ssh port. I wonder if dns has gone wonky again? Must poke mark...

Grinding through the lwn.net kernel articles since the last stable release. (I'm behind.) The article about converting printk(TYPE blah) to pr_TYPE(blah) is interesting because pr_TYPE() can be selectively disabled in the .config based on severity, and the printk(TYPE) ones can't, so the migration helps the embedded guys. That doesn't seem to have been raised during the discussion when developers started pushing back against it as pointless churn. Sigh...

Oh right, device tree for arm, I should look into the current status of that...

Now that u-boot's gone gplv3 (and hence is no longer interesting), I should take a closer look at its gplv2 replacement fork.

I'm still somewhat amused I missed the whole devtmpfs argument, yet the result seems about what I'd have argued for anyway. :)

Then again, lack of external participation is probably why it worked. Kay Sievers and Greg K-H aren't stupid, they just seem to have a nuclear "Not Invented Here" territorial mushroom cloud around bits of sysfs, and react poorly to anybody else trying to play with their toys. Oh well. (Impolitic to say it, of course. But now I have the luxury of not having to care, and intend to revel in it...)

March 4, 2010

Watching TV in the hotel room. They've moved from spending a week talking about a pretty white girl killed by a killer whale (SHOCK) to a pretty white girl killed by a known sex offender alone in a wooded area. (Because when you label every horny 13 year old who takes a picture of THEMSELVES as a sex offender for the rest of their lives, the real ones get lost in the noise. Idiots.)

I wonder what pretty white girl they were going on about before the whale?

Is it just me, or does Orange County come across as filled with trembling octogenarians terrified that the world has slipped beyond their control, and they're going to break a hip at any time and HOOLIGANS (kids these days, I.E. anyone under seventy) will break into their house at night and move their dentures to the other side of the bed so they'll NEVER NEVER FIND THEM, JUST LIKE LAST WEEK!

I mean honestly. Compare FDR's "we have nothing to fear but fear itself" with the current War On Terror. Talk about missing the point.

Unfortunately, I'm not that thrilled with the other side at the moment. Obama is not showing nearly as much spine as I expected. Kennedy gave the speech about "We choose to go to the moon and do these other things, not because they are easy, but because they are hard". Obama cancels our last gasp at a return trip attempt for budget reasons. Excuse me, NASA is such a tiny fraction of the total federal budget you can hardly even measure it; he's spending far more just on _escalating_ Afghanistan than the whole of NASA spends in multiple years.

And why spend a year "reaching out" to a unified block of opposition that's started calling _itself_ "The Party of No" when it's far more effective to just LET THEM FILIBUSTER. He's wimping out about the _threat_ of filibuster. Call their bluff. LET THEM DO IT. Tell the idiot in charge of the senate (who is a complete nonentity) what you want him to do. Publicly. Shame the bastard. Let the republic party (if it's "the democrat party" then it's "the republic party") physically exhaust itself. Let the news cover them reading the phone book into the congressional record. If something has to go on for eight months and block all senate business, LET IT BE AN ACTUAL FILIBUSTER AND NOT THE MERE THREAT OF ONE.

But no, that would be IMPOLITE or something.

So on the one side you have a party of spineless ineffectiveness that acts like the minority when it has a large majority. On the other side you have rich old men terrified of their own shadows hunkering down in bunkers at an undisclosed location that's pixelated on that newfangled Series of Tubes, and hiring mercenaries to act as bodyguards for the nation in places with names like Guantanamo, Abu Ghraib, and Fallujah.


March 2, 2010

Still in California, pondering a FWL release this weekend. It occurs to me that the prebuilt dropbear and strace binaries for lots of different targets, which I've been taking for granted on my laptop, weren't in the most recent release. That by itself is probably worth a new release. There's also the powerpc, sparc, and mips64 targets (and I'm poking at powerpc64).

And the new 2.6.33 kernel dropped last week. Testing it out on all the targets now. (Powerpc-32 is confused again.)


February 28, 2010

Lunch with Fade's sisters Lisa and Stine. Yay sociability. We went to some place snooty and extremely california. I had the world's most upscale macaroni and cheese.

Called t-mobile's tech support and they did something to get the speed of my connection back up to 40k/second (from the "slower than dialup, too slow to use twitter" state it had been in). It lasted about an hour, then started to slow down again. Sigh.

I suspect my phone is just lying about how much signal it has. Sometimes when it says it has 4 bars, the throughput is 200 bytes every 5 seconds. Sometimes when it says 1 bar I'm getting 20k/second. And moving it 1/4 inch can switch it between the two.

Hmmm... Might be a lot of unshielded electrical stuff in this hotel generating huge quantities of interference? Dunno...

The real problem is it doesn't TELL me anything. I either get throughput or I don't. I leave it downloading a large file (such as the Rachel Maddow podcasts so I have something to watch in my hotel room, or the existing 64-bit mips toolchain Mark dug up so I can compare against what I'm building), and it's getting the full 40k/second, then I come back five minutes later, none of the hardware's moved, but now it's stalled. Frustrating.

The fact wget --continue is corrupting files doesn't help matters. (I want to track down t-mobile's stupid caching proxy and smash it with a crowbar.) But _that_ one I might be able to work around with ssh, since if the signal's encrypted t-mobile's stupid caching can't screw with it. (Let's see, ssh to a server with a real connection and run the wget there: ok, about 2 minutes to download each file in that context. Now rsync that whole mess to my laptop via an ssh tunnel, which resumes _reliably_...)
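
Roughly, assuming a shell account on a box with a real connection (hostname and filename made up):

  ssh server.example.com 'wget -c http://example.com/maddow.mp4'
  rsync --partial --progress -e ssh server.example.com:maddow.mp4 .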

Well, it's doing something.

64 bit mips is mostly in now. It boots to a command line, and then when I tell it to natively compile hello world it gives me a segmentation fault in ccwrap. Wheee...

Patch:

I wasted about 2 hours figuring out why patches weren't applying again. (First hunk had a trailing space on the last line, the second hunk had version skew that meant the hunk started with a closed curly bracket instead of a blank line). Debian's patch implementation was of course doing various whitespace stripping and fuzz factor stuff to make both hunks apply magically. But I don't want to teach my patch to magically ignore problems, I want to teach it to report on _why_ the patch failed.

And thus for something like the fifth time, I inserted lots of printf()s into my patch implementation to see what it was actually doing. Rather than rip them out yet again, I added a -x command line option (guarded by TOYBOX_DEBUG in the config). Unfortunately, it's hard to come up with a good debug output format that makes sense to somebody who doesn't know the guts of the patch implementation. Have to think about it some more later.

Extract:

I'd like to make tarballs extract in parallel, I.E. "FORK=1 EXTRACT_ALL=1 ./download.sh". The way it works now is extract() creates $BUILD/temp, extracts the tarball into that, and then does a mv $BUILD/temp/* to the known destination location ($SRCTREE/$PACKAGE). That makes it independent of version number: as long as the tarball creates a single subdirectory under temp, then it gets renamed by the wildcard to a known value.

Of course if the tarball doesn't create a single subdirectory, this won't work. (Hence the need for --prefix with git archive.) And having all the extracts go into the same $BUILD/temp directory is why this doesn't work in parallel.

The problem with having each extract create a unique directory (easy enough to do, append the PID ala $BUILD/temp-$$) is cleaning up after it. If the extract gets interrupted, we don't want to accumulate megabytes of crud in $BUILD. With a single temp directory, it's easy enough to zap the old one next time you do an extract. But with multiple ones, you could accumulate an unbounded amount of debris over time.

I could look for $BUILD/temp-* directories that are "sufficiently old", but that's too fiddly and undefined for comfort. In theory 'trap "rm -rf $BUILD/temp-$$" EXIT' could do it (it triggers reliably for everything short of kill -9, and that's asking for it), but the shell's chronic inability to deal with whitespace once again adds a bit of subtle evil to the proceedings.

It's not enough to quote it, you need to use two different types of quoting. You have to prevent $BUILD from being evaluated before the trap triggers, because trap "rm -rf \"$BUILD\"" would cause problems if BUILD had a double quote in it (and would turn into rm -rf if BUILD='/";'). Yes I know that seems unlikely, but still possible. In-band signaling is not foolproof, the shell has to keep track of its own block aggregation and not re-evaluate stuff it's already evaluated, so you have to defer variable evaluation.

The reason this is non-trivial is you have to evaluate the $$ in the context the trap is set from to make sure the PID is correct. Thus I wound up with trap 'rm -rf "$BUILD/temp-'$$'"' EXIT.
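
So a stripped-down sketch of the parallel-safe extract (not the actual extract() function; it assumes each parallel extract runs as its own shell process, so $$ differs between them):

  # $$ is expanded when the trap is set, $BUILD when the trap fires.
  trap 'rm -rf "$BUILD/temp-'$$'"' EXIT
  mkdir -p "$BUILD/temp-$$" &&
  tar xjf "$TARBALL" -C "$BUILD/temp-$$" &&
  rm -rf "$SRCTREE/$PACKAGE" &&
  mv "$BUILD/temp-$$"/* "$SRCTREE/$PACKAGE"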


February 27, 2010

Wrestling with 64 bit mips, because it's there.

As usual, gcc is horked. It builds a -mabi=64 toolchain with an -mabi=n32 version of libgcc, and then the uClibc build dies trying to link the two conflicting abis together. Wheee.

It would be so nice if I could get these suckers to be orthogonal. Gimme a compiler, and then let me invoke a libgcc build with a specified compiler. If you need inline assembly and such, you can #ifdef __i386__ like every other program in the world. (I want to clean out autoconf with a flame thrower: 95% of what it's doing is a complete waste of time, and of the remainder 4% is actively obstructive. And it's all really stupid.)

So I'm doing --target=mips64-unknown-linux and I admit I've patched gcc to beat a libgcc_eh.a build out of it even in the --disable-shared case (although libgcc.a is what's receiving the damage). The problem is it builds a temporary compiler (xgcc), and it builds it as multiarch (bad gcc!), and then feeds it "-mips64 -mtune=mips32 -mabi=64" during the build. (Why is that middle one 32 bit? Dunno. It might be one of those "64 bit code is bigger but 32 bit code sometimes works in a certain context" things, ala the near/far pointers back in DOS.) No, the problem seems to be that when it calls xgcc it doesn't feed those overrides to it, and thus it gets the default output type. And the default is wrong.

So when I build with NO_CLEANUP and then do a dumpspecs on build/temp-mips64/build-gcc/gcc/xgcc (side note: gcc --dumpspecs does not work, but gcc -dumpspecs does, brittle piece of trash...), it says:

*asm_abi_default_spec:
-mabi=n32

Where does that come from? The ./configure stage says "Using the following target machine macro files" and lists a lot of stuff out of gcc-core/gcc/config such as mips.h. In mips.h there's a #define EXTRA_SPECS block that sets asm_abi_default_spec to MULTILIB_ABI_DEFAULT, and earlier in that file there's an #ifdef staircase with:

#if MIPS_ABI_DEFAULT == ABI_N32
#define MULTILIB_ABI_DEFAULT "mabi=n32"
#endif

But the default value MIPS_ABI_DEFAULT gets #defined to if it's blank is ABI_32 (not ABI_N32). So apparently it's not blank. So where does that get set? Let's look at the build output again, and...

make[1]: Entering directory `/home/landley/firmware/firmware/build/temp-mips64/build-gcc/gcc'
TARGET_CPU_DEFAULT="" \
        HEADERS="auto-host.h ansidecl.h" DEFINES="" \
        /bin/sh /home/landley/firmware/firmware/build/temp-mips64/gcc-core/gcc/mkconfig.sh config.h
TARGET_CPU_DEFAULT="(MASK_SPLIT_ADDRESSES)|MASK_EXPLICIT_RELOCS" \
        HEADERS="options.h config/dbxelf.h config/elfos.h config/svr4.h config/linux.h config/mips/mips.h config/mips/linux.h config/mips/linux64.h defaults.h" DEFINES="UCLIBC_DEFAULT=0 MIPS_ABI_DEFAULT=ABI_N32" \
        /bin/sh /home/landley/firmware/firmware/build/temp-mips64/gcc-core/gcc/mkconfig.sh tm.h

Well, there's MIPS_ABI_DEFAULT getting set to ABI_N32. The first time mkconfig.sh is getting called to generate config.h, the second time it's called to generate tm.h. What's tm.h? Let's open up build/temp-mips64/build-gcc/gcc/tm.h and...

#ifndef MIPS_ABI_DEFAULT
# define MIPS_ABI_DEFAULT ABI_N32
#endif

And a half-dozen lines later that #includes config/mips/mips.h. Ok, so if that script bit the makefile was calling is the source of this data, where's THAT getting it from? Let's look at gcc/Makefile and search for mkconfig.sh being called to generate tm.h...

cs-tm.h: Makefile
        TARGET_CPU_DEFAULT="$(target_cpu_default)" \
        HEADERS="$(tm_include_list)" DEFINES="$(tm_defines)" \
        $(SHELL) $(srcdir)/mkconfig.sh tm.h

Ok, the ABI_N32 thing was in the DEFINES= part, so what's tm_defines... That's earlier in the same file:

tm_defines= UCLIBC_DEFAULT=0 MIPS_ABI_DEFAULT=ABI_N32

Did I mention this Makefile is generated by ./configure running sed against Makefile.in in the actual source directory (gcc instead of build-gcc)? But Makefile.in just has "tm_defines=@tm_defines@" so let's look at gcc/gcc/configure and that has "s,@tm_defines@,$tm_defines,;t t" in the big sed invocation, so where is $tm_defines set in configure?

Wanna flamethrower. Wanna BIIIIG flamethrower.

Ok, configure is sourcing other files. (It does not use the "source" keyword, it uses the "." directive, which is REALLY HARD TO GREP FOR. Just sayin'. Wouldn't be so bad if it did it at the start of the file, but line 12187 of 17957? Yeah...) Ok, back up, go to the real source (not the build dir) and grep for tm_defines... not there. Ah, gcc subdir. Try again... and it's in config.gcc a lot. Ok, let's look at THAT file.

Ok. In gcc/gcc/config.gcc around line 1550 we have:

mips64*-*-linux*)
        tm_file="dbxelf.h elfos.h svr4.h linux.h ${tm_file} mips/linux.h mips/linux64.h"
        tmake_file="${tmake_file} mips/t-linux64"
        tm_defines="${tm_defines} MIPS_ABI_DEFAULT=ABI_N32"
        gnu_ld=yes
        gas=yes
        ;;

GOOD! Right! Makes sense. Why the heck is it setting ABI_N32 here as the default instead of ABI_64? Why is it doing it? Is there a reason? What's the difference between the five different ABIs it can define, anyway? (ABI_32, ABI_O64, ABI_N32, ABI_64, and ABI_EABI.) Is there something I should read somewhere?

Sigh. Anyway, I can change that ABI_N32 to ABI_64 and see what happens...
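
The change itself is a one-liner against the extracted source (it hits every ABI_N32 default in the file, which is fine for a quick test):

  sed -i 's/MIPS_ABI_DEFAULT=ABI_N32/MIPS_ABI_DEFAULT=ABI_64/' \
    gcc-core/gcc/config.gcc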

Great. So now asm_abi_default_spec is -mabi=64! (Yay!) But:

file build/temp-mips64/build-gcc/gcc/libgcc/_fixunstfdi.o
build/temp-mips64/build-gcc/gcc/libgcc/_fixunstfdi.o: ELF 32-bit MSB relocatable, MIPS, N32 MIPS-III version 1 (SYSV), not stripped

By default, xgcc is still producing 32 bit binary .o files. *headdesk*

I really hate gcc's configuration and build infrastructure.

RIGHT! So when uClibc is building its random .o files, the file command says those are:

file build/temp-mips64/uClibc/libc/string/wcscmp.o
build/temp-mips64/uClibc/libc/string/wcscmp.o: ELF 64-bit MSB relocatable, MIPS, MIPS64 version 1 (SYSV), not stripped

Which looks good. But the command line uClibc is using to compile that has "-mips64 -mtune=mips32 -mabi=64" in it, which is sort of crazy and seems like overkill somehow, and still has -mtune=mips32 which I do not understand... Still, I want to create a toolchain that will create the right output by default. What happens when I "./mips64-cc hello.c -c" with this compiler:

$ file hello.o
hello.o: ELF 32-bit MSB relocatable, MIPS, N32 MIPS-III version 1 (SYSV), not stripped

That's just not right. The defaults need a beating.

So what I did was just give up and replace every instance of "n32" in gcc/config/mips with a different string. Of course the build broke, but when it did it died with:

cc1: error: unrecognized command line option "-mabi=walrosity"

And _that_ was the string I put in linux64.h. Note that ./configure chose to #include linux.h instead, that didn't include linux64.h (I checked hours ago), but something somewhere apparently decided to. So, rm -rf the modified sources I've been fiddling with, do a fresh EXTRACT_ALL=1 ./download.sh and tweak that one instance (not even fiddling with the asm entry in the spec file I tracked down earlier), and...

Yay! Now it's breaking in uClibc because it can't find __getdents64(). Luckily, somebody else already hit this. (Ah, they're putting out a 0.9.30.3 soon, I should test the -rc to see what new breakage has been introduced and which of my patches need to go upstream...)

And we have a mips64 toolchain! That was kind of annoying...


February 26, 2010

Christopher Cahoon emailed me and told me how to fix the firefox thing: pull up "about:config", and change browser.fixup.alternate.enabled from true to false.

Busy few days consulting in California, now I have some weekend time alone in the hotel. (Went out to a restaurant called Spoons that offered free WiFI, but despite being just about empty at that time of day seated me next to a loud table that didn't pause for _breath_ talking for an hour. Sigh.)

It's coming up on 1.0 release time for FWL. I've been focusing on beating functionality out of various targets that have been near-working but not quite there yet. Next up after that is filling out the missing qemu targets (like all the missing 64 bit versions of arm, mips, sparc, and powerpc). There's also a lot of polishing work.

An example of polishing work is that I want to make the root filesystem packaging and the kernel building orthogonal, at least for repeated rebuilds. Too many times I've wanted to tweak the root filesystem packaging and had to wait for the kernel to rebuild, which takes ten times longer and drains my laptop battery a lot more.

But the reason they're together is for initramfs bundling, and because I don't want to increase the granularity of the project more than necessary. Conceptually, there's the "build a toolchain" stage, "make a root filesystem" stage, and the "make it bootable" stage. (Ok, toolchain gets repeated but it's just variations on "build a toolchain".)

I'm pondering adding a REBUILD=1 option that won't blank the output directories but will instead only rebuild stuff if it isn't already there. That way it builds everything the first time but you can selectively delete and rebuild just one thing next time around. (I don't want it to be too fiddly and hard to maintain or use, but I also don't want it to take 10 minutes to rebuild every time I tweak one config symbol.)

Making REBUILD a general feature would involve having the build scripts for all the packages know what they're installing. Right now lots of them just call "make install" and that knows what to install; the build script doesn't care. (I hate having the same information independently recorded in two places. It's an opportunity for things to get out of sync.) I suppose all that's really _bothering_ me at the moment is system-image.sh building both the kernel and the root filesystem image, so I'll add REBUILD support to system-image and worry about generalizing it later if it really seems like a win.
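
A rough sketch of what the REBUILD check might look like in system-image.sh (names invented, not actual code):

  if [ ! -z "$REBUILD" ] && [ -e "$STAGE_DIR/zImage-$ARCH" ]
  then
    echo "Keeping existing zImage-$ARCH"
  else
    build_kernel || exit 1
  fi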

This came up in the context of banging on arm big endian, by the way. The fundamental problem seems to be convincing the kernel that you can stick a big endian arm CPU into a versatile board. (You can with qemu.) It's a config thing, it only allows combinations it knows about even though you can theoretically swap hardware around since it's all modular. (Ok, this might require a soldering iron in the real world, but still.) I have a patch that already convinces the kernel you can stick an armv6 into a versatilepb, I need to extend that for this, but I encountered weirdness where the -git versions aren't building, apparently not because something important changed but because I'm applying the wrong set of patches in the -alt context and it's getting all huffy about the arm config being what it considers insane.

While I'm polishing the system image step, maybe I should rename some of the system image files. Calling the kernel file "zImage-$ARCH" is slightly misleading because that's a format name and half the time it's a vmlinux, or bzImage, or for a while powerpc was using prep... I also don't need the -$ARCH postfix since the directory name includes that. I should just call it "linux" or "kernel" or something. Also, the root filesystem image should be root-filesystem.ext2 (or .sqf) rather than "image-$ARCH.sqf".

One of them "change is churn" vs "this could be better" things...

I'm aware this is not the most coherent blog entry I've ever done, but that's a reflection of the fact I'm unsure which approaches are worth taking here. If I could explain it to myself clearly, I'd know what to do and what to skip. Try, try again...


February 25, 2010

Firefox is really stupid. For example, if it can't contact the URL I typed it rephrases it to stick a superfluous "www" on the front of it. This A) is deeply stupid in 2010 when "www" as a prefix to website names has been obsolete for years, B) often breaks the URL because there _is_ no www alias to the site so it becomes a domain not found. (For example, www.username.livejournal.com does not exist.)

But whenever there's a transient failure to access a website, it stupidly rephrases my URL, and as far as I can tell the only way to get it to STOP doing this is either A) modify the source code and rebuild it, B) find a real web browser.

I miss konqueror. Too bad it was bolted onto the side of KDE and KDE 4.0 was utterly unusable. And thus it goes down with the ship. RIP.

UPDATE: Christopher Cahoon emailed me and told me how to fix the firefox thing. Pull up "about:config", and change browser.fixup.alternate.enabled from true to false. (How you learn this stuff, I have no idea. Other people tell you about it, apparently.)


February 24, 2010

On a plane, heading for a consulting gig in California (trying to get an embedded arm system to stop triggering the OOM killer).

It's a Southwest flight and the seats are narrower than I remember, so I'm feeling a bit squashed in the center seat between two other people. I'm worried that if I really need more elbow room Southwest will force me to write a sequel to "Clerks", or some such...

Reading a library book called "confessions of a werewolf supermodel". (Really.) On top of that, it was shelved under "romance"...


February 22, 2010

So arm, mips, mipsel, ppc, sh4, sparc, x86, and x86_64 are all working. I need to tweak sh4 and sparc a bit, but they boot and compile hello world (if not static native dropbear).

The qemu-system-* variants I haven't done yet are m68k, cris, mips64, ppcemb, ppc64, sparc64, sh4eb, mipsel64, and microblaze. Not much point in dealing with big endian sh4 until I'm happy with little endian. The m68k variant is actually coldfire and I haven't done any mmu-less systems yet. I should probably tackle all the 64 bit ones. (Is there an arm64? Doesn't have a separate qemu binary if so, but arm handles both endiannesses so it's always possible it's unified.) Microblaze requires patching gcc 4.2 to add support (but I have the patch lying around somewhere). I've asked what the heck ppcemb is before, and never got a clear answer. (I'm not entirely sure the current qemu guys actually know, it seems to be a historical relic of some kind. It seems intended to emulate a ppc4xx, except it doesn't actually do it, and the stock ppc has most of the stuff for that anyway...)


February 20, 2010

So the qemu mailing list guys got me a URL to a debian sparc install CD that works properly under qemu-system-sparc. Yesterday I installed it and confirmed that the root-filesystem image seems to more or less work properly as a chroot under there. (Dynamic linking is still horked, but static linked stuff isn't giving me the strange memory errors and system call weirdness. I can mount a tmpfs and list its contents, which is progress.)

So I grabbed the kernel .config (debian installs it in the /boot directory along with the kernel), which was a couple years out of date, did the make oldconfig dance, squashed it down to a miniconfig, stripped out the modules, and now I'm trying to test boot my root filesystem image under it.

This involves switching back on serial console support, squashfs support, scsi support, and so on. Lots of diffing between my old miniconfig and this one. (The miniconfig format makes diffing two of 'em _so_ much easier, by the way. No wading through pages of unrelated fallout from the one or two symbols you want to tweak, every changed line is significant.)
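
(For the curious: expanding a miniconfig back into a full .config just uses the kernel's own kconfig plumbing, along the lines of

  make ARCH=sparc allnoconfig KCONFIG_ALLCONFIG=mini.config

Generating one in the first place is the fiddly part: the helper script tries deleting each line in turn and keeps the deletion if the expanded result doesn't change.)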

I am kind of impressed that kernel lets you switch off CONFIG_SCSI but still have CONFIG_SCSI_PROC_FS, CONFIG_SCSI_MULTI_LUN, and CONFIG_SCSI_LOWLEVEL switched on. How does that work, exactly? (Eh, maybe CONFIG_SCSI=m and it was one of the lines I stripped out...)

And it still can't mount a tmpfs on /tmp and list its contents. That's just sad.

Asked on the qemu list and they pointed out that 32-bit sparc isn't entirely maintained over on the kernel side, so it might be that 2.6.18 worked and 2.6.32 doesn't? Grab the old source, and...

Yup. Using the (adjusted) debian .config, building vanilla kernel source with the ~4 year old version debian is using, and it works. Figure out which network card it's using (lance) and it can wget a file. It can even compile thread-hello2.c (although if I say -static it gets confused, but since libc _is_ built static it builds a static version by default anyway).

Right. Command line history is a marvelous thing:

rm -rf linux.bak && mv linux linux.bak && (cd ~/linux/linux && git archive --prefix=linux/ v2.6.19 | tar xvC ~/firmware/firmware)
cd linux && cp ../linux.bak/.config . && make ARCH=sparc oldconfig
PATH=~/firmware/firmware/build/simple-cross-compiler-sparc/bin:$PATH make ARCH=sparc CROSS_COMPILE=sparc- -j 3
cd .. && qemu-system-sparc -kernel linux/arch/sparc/boot/image -hda build/system-image-sparc/image-sparc.ext2 -append "root=/dev/sda rw init=/usr/sbin/init.sh panic=1 PATH=/usr/bin console=ttyS0 host=sparc" -nographic -no-reboot

I.E. Fire up the git repository, jump around through the various release versions (running oldconfig each time and answering the questions it spits out). (Whole lotta cursor-up goin' on...) Start with 2.6.19...

Ok, worked fine through 2.6.28. Then 2.6.29 had the "could not allocate memory" problem...

Bisected it to git 085219f79cad. Deal with it in the morning...


February 17, 2010

Let's stop a moment and examine the futility of termcap in Linux.

Back in the 1970's Unix systems used to output to various hardware devices. First there were teletypes (most often the ASR-33, because they could be bought cheaply secondhand and refurbished by hobbyists). Teletypes were printers with keyboards attached, hooked up to a serial cable. Everything sent to the serial cable was printed with ink on paper, and the keys pressed by the user were sent the other way along the serial cable for the computer to read.

Note: we're not even talking dot-matrix printers here, the ink ribbon was generally struck by a daisy wheel or similar, so there was no possibility of bitmapped graphics. The characters were struck by good old metal type, dating back to Gutenberg.

This is what "tty" is an abbreviation for: teletype. The Unix console infrastructure still thinks in terms of serial ports connected to printers with keyboards attached. Newline and linefeed being separate characters, waiting until a full line of text is typed before processing it, inability to programmatically read back what was written. Even the ctrl-G "bell" character rang an actual metal BELL. These machines not only needed their ink ribons replaced, they needed to be periodically cleaned, oiled, and various pieces tightened and straightened because they rattled apart. It was almost steampunk.

Then in the 1970's "glass tty" devices were introduced, which connected the serial cable to a box with a CRT and keyboard, instead of printer and keyboard. This new style of terminal was a drop-in replacement for teletypes, more expensive to buy but cheaper to run in the long term because they didn't eat a box of paper every week, and you never had to change an ink ribbon. Mainframe giant IBM introduced the IBM 3270 in 1972, and DEC (the company that invented the minicomputer) had the VT52 and later the VT100. But there were dozens of competing models from smaller vendors, in fact Intel's second processor, the 8008, was created to power one of the more obscure ones.

The new electronic terminals could do things the old teletypes couldn't. You could backspace and ERASE things that had been written. You could cursor up, jump the cursor to any place on the screen, or blank the screen and start over from the top. Some let you delete from the current cursor position to the end of the line all in one go, some let you delete the current line and scroll everything under it up one line (inserting a blank line at the bottom of the screen, but leaving lines above the current one where they were)... The most expensive terminals even let you display different colors. All this behavior was hardwired into the output device, at the other end of the serial port.

But triggering these advanced features involved outputting special sequences of characters (often "escape sequences" starting with the escape key (ascii 27) followed by a command string). The 7 bit ASCII character set was standardized two years before Ken Thompson started work on Unix, and horrors like IBM's EBCDIC were largely ignored by the Unix world, but ASCII didn't cover cursor keys, function keys, home, end, page up, and so on. In the absence of a recognized standard, each different brand of terminal device used different escape sequences to represent each new function and each new key, and different brands offered different sets of functions and keys.

Different Unix installations bought different brands of terminals, but wanted to use the same software with them. The "solution" was a library called termcap (later replaced by terminfo, which these days is provided by ncurses). This hid the differences between terminals from software by collecting a database of known terminal types, and requiring the user to set the TERM environment variable to select which output device a given terminal was connected to, so it could output the right escape sequences.

As always, a system to manage complexity encouraged the accumulation of complexity. Hundreds of different terminal types emerged, and termcap happily bundled them all.

Of course some terminal types were more common than others. Unix started life on a DEC PDP-7 in the summer of 1969, which was then ported to the 16-bit PDP-11 by 1971, and eventually the 32-bit DEC VAX. Unix spread to a bunch of other machines as the years went on (due to being written in C, the portable assembly language), but the most common Unix machines continued to be DEC minicomputers until the rise of the microcomputer in the 1980's, when the Motorola 68000 became the chip of choice for a few years, until the Intel 80386 and its successors became ubiquitous.

This meant that DEC's brand of terminal was often at the other end of the serial cable, so VT100 escape sequences for things like cursor keys became the default assumption. In 1979 the American National Standards Institute produced a standard for terminal escape sequences (X3.64) which used a lot of the same escape codes as the VT100 for things like cursoring around and clearing the screen, so now there was an actual standard to follow for at least basic functionality.

By the 1980's, serial terminals such as the VT100 were being replaced by bit-mapped displays. Things like the IBM PC's VGA card provided a built-in graphical display which the computer could render its own fonts onto. The hardware usually provided a mode that behaved a bit like an old serial terminal, providing its own fonts, moving the cursor, and scrolling the screen automatically when given a string to display.

But by the 1990's most systems used the bitmapped graphics mode almost exclusively to provide a graphical desktop with software-rendered "console windows" displaying any text consoles. Unix developed its own standard for doing graphical desktops (X11) in 1987 (as part of MIT's Project Athena).

By the time Linux got started, external display hardware that spoke its own serial protocol had been gone for a decade. Computers displayed text consoles on their own built-in video cards. In Linux, "virtual terminals" (/dev/tty1 and friends, accessible via ctrl-alt-F1 through ctrl-alt-F6 from X11) used the card's built-in text mode, and xterm programs rendered a terminal window in graphics mode (handling font rendering, scrolling, and even drawing the cursor entirely in software).

This meant that terminfo and termcap no longer served any real purpose. At both ends of each terminal connection were software programs passing data back and forth, usually both running on the same machine. All they had to do was agree on a protocol to do it in. It didn't matter _which_ protocol, they just had to agree. And there had been an official standard protocol for the basics since the late 70's.

Of course since terminfo/termcap was already there, the easy thing to do was add new entries into the database for these programs, and set "TERM=linux" for the Linux kernel's built-in text mode terminal handling, and "TERM=xterm" for the X11 text window program, and add a "TERM=ansi" for the actual standard. And thus the pointless vestigial package was perpetuated, because nobody stopped to ask if it was still necessary.

These days, any terminal which CAN'T understand simple ANSI escape codes is too broken to use. Outputting them directly provides the ability to move the cursor, query the current screen size, and so on without the need for any other libraries. Unfortunately, relying on an actual standard was somehow considered "unclean" by hidebound old-timers stuck in the 1970's. (Luckily, most of them have now retired, but the damage persists.)

I mention this because I'm still vaguely poking at gdb, and it's too horrible for words. It unconditionally includes readline (has its own copy even), and thus it requires termcap, meaning these days you can't build gdb without ncurses.

It has a huge ./configure stage checking 8 gazillion things, but "lack of ncurses" is a build break. (So why bother testing? Why not just build break during the actual build?)

And as the above should make ABUNDANTLY clear, there is NO REASON for any software to still be using termcap anymore. Just output ANSI escape sequences and be done with it.
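
For example, with no library at all:

  printf '\033[2J\033[H'       # clear the screen, cursor to top left
  printf '\033[%d;%dH' 10 20   # move the cursor to row 10, column 20
  printf '\033[1;34m'          # bold blue text
  printf '\033[0m'             # back to normal attributes

And the current screen size comes back from the TIOCGWINSZ ioctl (or from the '\033[6n' cursor position report after parking the cursor at 999,999), no database required.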

Unnecessary layers. All FSF code is composed primarily of unnecessary layers... (And what the heck is "nonopt"? It's there in configure, it's totally undocumented, google is not forthcoming...)


February 16, 2010

Heh. Finally changed the title of the blog page to say 2010. (I start each new year's blog file by copying the previous year's, deleting the entries, tweaking the header stuff to link back one more year, and updating the index.html symlink to point to the new one.)

Still writing 2009 on my checks, though.

Yay, the powerpc register fix (to run static powerpc binaries under qemu application emulation) might be going in! The powerpc drive fix isn't, but I have a kernel patch workaround.

That puts powerpc in pretty good shape, so it's probably time to start thinking about what should go in the next release again.

And figure out what todo items are next. I need to redo the documentation, get the cron job back up and running, fix big endian arm and such, add s390 support, figure out what I want to do about gdb and screen...


February 15, 2010

The FSF continues to be crap at creating build systems.

For example, I'm trying to build gdb. On the host, gdb needs to be able to parse ELF files, disassemble machine language, and handle the gdbserver serial protocol. This has nothing whatsoever to do with the machine it's running on: I'm currently trying to get a gdb running on my x86-64 host to parse 32-bit powerpc data via a network link from a qemu instance running a powerpc version of gdbserver. (Once again, my docbook->pdf converter analogy.)

Unfortunately, gdb was written by the FSF, and thus thinks it's special. So when you try to tell it "build with this compiler, parse this output", you run into all SORTS of incestuous hardwired assumptions, and trying to beat orthogonal behavior out of it is an exercise in frustration.

In theory, there should be a way to tell it "just include support for interpreting every machine language type, and I'll select the one I want via 'set architecture'". In practice, it doesn't have any way to do that. So you specify "--target=powerpc-unknown-linux", and it builds a host binary that interprets target instructions. So far, so good.

But when I specify --host=powerpc-unknown-linux (adding gdb to the native compiler), it flips out because there's no "powerpc-unknown-linux-gcc" and friends. (Yup, hardwired assumptions about the cross compiler name, and no way to change that short of running "sed" against configure.) So I try "--host=powerpc", which works fine through the configure stage but when I do make it starts configuring in the subdirectories and libbfd dies that powerpc is an unsupported target.

Let's back up a moment. The bfd library is there to parse binaries for the TARGET. It shouldn't care what the HOST is set to, that should just control the toolchain used to build the thing. The --host value is an ugly and incestuous way to control what the resulting program RUNS on, the --target value is to control what type of data the program OPERATES on when it runs.

So back up to specifying --host=powerpc-unknown-linux (same as --target) and hit it with the CC_FOR_TARGET and LD_FOR_TARGET environment variables, and try again. Now the bfd configure dies with:

checking whether the C compiler works... configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.

How is this wrong, let me count the ways: I did specify host. The build shouldn't be trying to run programs built with the TARGET compiler, it should only try to run programs built with the HOST compiler. Since this configure crap is called for me by the makefile I can't control its arguments the way I could the first configure.

So let's do a "make distclean" and re-run configure specifying --host=powerpc-fruitbasket-linux and --target=powerpc-unknown-linux (on a guess that I need to bypass the evil host==target checks FSF configures are littered with)... And it complains the tuple has change! Yes, it has! Did I mention that configure caches crap so feeding it different arguments won't necessarily register unless you do a distclean? Well a new fun wrinkle is that apparently distclean doesn't zap caches in subdirectories, so you have to rm -rf the source directory and re-extract it.

Ok, after re-extracting and re-running... it's building with "gcc", not with the cross compiler. And half the build invocations are run through libtool (which when functioning properly is a NOP on Linux, but which breaks all the time because the inability to do _nothing_ without screwing something up is the hallmark of the FSF).

In theory, you'd think "make CC=powerpc-cc AS=powerpc-as LD=powerpc-ld AR=powerpc-ar" would do the trick... but they use recursive makefiles. So the makefile's ability to override an environment variable on the command line (making it read only so that attempts to set it to something else are silently ignored) is useless because they call other child instances of make, which defeats the purpose of make.

Sigh. I can add gdb to simple-cross-compiler fairly easily: "./configure --target=powerpc-unknown-linux && make LDFLAGS=--static && cp gdb $OUT" That's the case they tested. But trying to convince it something as simple as "no, your compiler _isn't_ called gcc, it's got another name"... They just can't comprehend something of that magnitude.

If I could get one big version of gdb that could parse all targets, I'd happily add that to host-tools.sh and call it a night. Then build gdbserver on each target natively. Don't TRY to cross compile this nightmare. But I can't get one big version, I can only get per-target versions, because the FSF can only see what it expects to see. They never step back and THINK about what they're doing.

The version I'm using was written when the FSF's gdb project was over 20 years old. Apparently, THEY NEVER LEARNED ANYTHING EVER about how to build software. It's sad, really. I've never seen a software development effort with its head so far up its own ass as the FSF. It's too bad that 20 year headstart made their mediocre crap be a category killer for Linux development tools. Why bother writing something better when gdb already exists?

Because it SUCKS, that's why. It's the Windows of the posix world, widespread and developed by an organization where actually developing software is a distant secondary goal. (In the case of Microsoft making money is their primary goal. In the case of the FSF, spreading their religious views is the primary goal. Actually producing usable software is incidental, it only has to be just good enough not to hold back the other goal.)

Grumble.

Ok, I can do this:

PATH=/path/to/crosscompiler:$PATH
./configure --host=powerpc-fruitbasket-linux --target=powerpc-unknown-linux
# Generate the per-directory makefiles without building anything yet.
make configure-host
# Then stomp the hardwired tool names in every generated makefile.
for i in $(find . -name Makefile)
do
  sed -i -e 's/AR = ar/AR = powerpc-ar/' -e 's/CC = gcc/CC = powerpc-cc/' $i
done
make

And then it dies with "unable to find libintl.h" because ./configure ran with the host compilers and checked the host headers and found random crap out of that which isn't there in uClibc.

Wanna flamethrower.


February 14, 2010

Threw out my broken noise cancelling headphones. I need to get new ones. Googling around a bit found cnet recommending the Koss Portapro as the "best under $50" option (and I'm not going to drop over $300 on something I expect to break in a year). But I can't find anywhere in Austin that might sell it, or confirm whether it supports bluetooth...

And I got one of the free tickets to the high tech happy hour Thursday at 5:30 pm at Buffalo Billiards on 6th Street, so I should remember to go to that.

Otherwise, poking at gdb. The last gplv2 release was 6.6, but there doesn't seem to be an easy way to get it to build one version that supports all architectures, so I have to put prefixed versions in the cross compilers instead of having one big one in host tools that lets you "set architecture" to each type. Alas.
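
(For my own reference, the loop ends up shaped something like this; the target list and $OUTDIR are placeholders rather than the real FWL variables, and the freshly built binary lands in the gdb/ subdirectory of the build tree.)

for TARGET in armv5l-unknown-linux powerpc-unknown-linux mips-unknown-linux
do
  rm -rf gdb-6.6 && tar xjf gdb-6.6.tar.bz2 || break
  (cd gdb-6.6 &&
   ./configure --target=$TARGET &&
   make LDFLAGS=--static &&
   cp gdb/gdb "$OUTDIR/${TARGET%%-*}-gdb") || break
done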

I need to build native gdbserver binaries on each target, but I can do that as a native build along with dropbear and strace.
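
(The gdbserver half should be comparatively painless, something along these lines run on the target itself; gdbserver has its own configure under gdb/gdbserver in the 6.6 tarball, and the prefix here is a guess.)

tar xjf gdb-6.6.tar.bz2 &&
cd gdb-6.6/gdb/gdbserver &&
./configure --prefix=/usr &&
make && make install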

Pondering adding gzip support to dropbear (I.E. build gzip and then tell the dropbear build -I /path/to/gzipheaders -L /path/to/gziplibs), and screen (which needs ncurses). I've wrestled with screen a bit already and have to patch the sucker at least 3 times to get it to build.

I've been following Harry Shearer's twitter for a while, and I'm finally listening to an episode (February 7th) of Le Show. So far, not very impressed. I agreed with him on the whole tritium in the water thing and he still managed to turn me off on the topic with his coverage of it. And apparently, comcast going digital is some kind of conspiracy, as is the fact they (gasp) compress their digital signals. Now another long pointless musical interlude, presumably where the radio station would cut to commercials...

This reminds me why I've lost interest in Countdown. Olbermann being factual, angry, even filled with righteous indignation, I'm fine with. But smug sarcasm is something you have to do exceptionally well in order to hold my interest. (The Daily Show does it exceptionally well. Rachel Maddow mostly avoids it.)

Le Show is constructed entirely from smug sarcasm, and it's not exceptionally good smug sarcasm. Eh, listen to the rest of the episode, but probably won't download more...


February 13, 2010

Last night I wrote up patches for both of the main approaches to fixing the qemu -hda=/dev/hdc thing, and posted the whole rock and a hard place mess to the qemu list. Naturally, they argued with it. *shrug*

Somebody named "Silvan" emailed to pont me at tmux which is apparently OpenBSD's answer to screen. It doesn't seem to be in the Ubuntu repository, so I can't test it out without building it from source first. Hmmm. (Seems more like a text mode VNC, from the docs...)

Working on a patch to make qemu-ppc run static binaries. The register layout uClibc expects is what the kernel is actually doing, not what the old IBM powerpc spec (for AIX) documents. The dynamic linker accepts both, and converts the kernel version to the AIX version, but static uClibc binaries don't have the conversion step (because they're lightweight, and that's never what the kernel actually does).

Unfortunately, what the kernel does is give you one register pointing to the start of a structure in memory. So it's not just a question of shuffling registers around, I have to create the in-memory structure with the right things in the right order. (argc, argv[], NULL, environment[], NULL, "auxiliary vector" whatever that is). QEMU is currently doing bits of this, but not all of it...

Let's see, in linux-user/linuxload.c, loader_build_argptr() builds a structure looking like:

  unsigned long argc
  unsigned long argv[]
  unsigned long NULL
  unsigned long envp[]
  unsigned long NULL

Which is what the previous link says the kernel is actually feeding into powerpc binaries as the blob the stack pointer points to. (Apparently, it's what Linux is _always_ feeding into binaries, not just on ppc.)

Ok, loader_build_argptr() is called from linux-user/elfload.c at the end of create_elf_tables(), which returns sp. That's called from load_elf_binary() and the sp gets saved into bprm->p and info->start_stack.

In linux-user/elfload.c, the last third of the TARGET_PPC version of init_thread() (line 530 or so) is doing the register reshuffling. And it's doing so right after this comment:

/* Note that isn't exactly what regular kernel does
 * but this is what the ABI wants and is needed to allow
 * execution of PPC BSD programs. */

I.E. they know it's not what linux is doing, they're doing it to run BSD binaries. So let's throw an #ifdef CONFIG_BSD around that...

And now the static binaries are running! And having a null pointer dereference on exit. (Why does that sound familiar...)

Is the static hello world segfaulting on exit under current system-image-powerpc? Hmmm... And it won't boot because of the hda/hdc thing. Argh. Ok, throw the kernel patch in there. (If they fix qemu to attach four drives to one controller the kernel change shouldn't hurt things, it's initialization order when straddling controllers that's the funky bit.)

No, it isn't. So why is it doing it here...

Huh, a "make clean && make" made it go away. Creepy. (Yeah, my fault for not thinking of that earlier. Never trust makefile dependencies, I should know better by now...)

So, gdb 6.6 looks like the last GPLv2 version. Possibly I should teach the FWL host-tools to build that. The gdb you install from most distro repos is only built to understand x86 instructions...


February 12, 2010

Judgement call time.

The system emulated by "qemu-system-ppc -M g3beige" has two different IDE controllers, driven by two different drivers in the Linux kernel. There's an apple-specific "heathrow" controller (CONFIG_BLK_DEV_IDE_PMAC), and a generic IDE controller based on the cmd646 chipset (CONFIG_BLK_DEV_CMD64X). When the drivers are statically linked into the kernel, the cmd64x driver gets initialized (and its devices probed) before the ide-pmac driver for the heathrow. This means that the devices attached to the cmd646 show up as hda and hdb, and the devices attached to the heathrow show up as hdc and hdd.

Unfortunately, the -hda and -hdb options to the qemu command line attach drives to the heathrow, and -hdc and -hdd attach drives to the cmd646. So the device I specify as -hda shows up as /dev/hdc.

This screws up FWL, which expects /dev/hda to be the root filesystem, /dev/hdb to be the writable space mounted on /home, and /dev/hdc to be optional automation logic mounted on /mnt which takes over from the boot script if /mnt/init exists (instead of launching an interactive command shell at the end of the boot script).

Note that neither CONFIG_BLK_DEV_IDE_PMAC_ATA100FIRST nor CONFIG_IDEPCI_PCIBUS_ORDER make any difference. The problem is these are two separate drivers, and one driver reliably gets initialized before the other. The first driver makes its devices /dev/hda and /dev/hdb, the second makes its devices /dev/hdc and /dev/hdd.

I can fix this in several ways:

One point I _could_ make to the kernel guys is that the kernel's CONFIG_IDEPCI_PCIBUS_ORDER should be smarter. But that's hard to implement, because it has to let all the IDE drivers probe before assigning any of the devices it discovers. That's an intrusive patch, which would take me a long time to come up with, and probably still wouldn't go upstream because the feature in question is deprecated. (See "udev udev uber alles", above.)

I suppose I can at least try to get a qemu patch in. They'll argue that -hda and -hdb are initializing the IDE controller in the first PCI bus slot, and -hdc and -hdd are initializing the one in the second slot...

Grrr. Fiddly problem. Giant jurisdictional dispute.


February 10, 2010

I continue to wonder how gnu screen ever builds for anybody. (Is there another screen implementation I should be looking at that's not contaminated by the FSF?)

It has a "sched.h" file in the source. It does "-I." (which had to be removed not just from Makefile and Makefile.in, but also from osdef.h and comm.sh before it stopped happening, and also "-I$(srcdir)" had to become -iquote "$(srcdir)" except they didn't quote it so I dunno if it would work given a path with spaces in it).

What -I does is add a directory to the _system_ search path (I.E. for <file.h> as well as "file.h" includes), and adds it before anything else. Meaning their sched.h was getting #included from /usr/include/pthread.h (replacing the /usr/include/sched.h the system was trying to #include) and the build was dying because A) it was replacing a system header file that actually did something, B) the new file being randomly #included expected struct timespec to already be defined, which only happens if you've already #included <time.h>.
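
(You can watch the difference with a throwaway test, nothing screen-specific about it:)

echo 'struct bogus_sched;' > sched.h
echo '#include <sched.h>' > test.c
gcc -I. -E test.c | grep 'sched\.h'        # the <> include resolves to ./sched.h
gcc -iquote . -E test.c | grep 'sched\.h'  # the <> include resolves to /usr/include/sched.h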

Did I mention that the FSF are the people who claim to have written gcc, the compiler that implements this -I? They're being screwed up by how their OWN SOFTWARE is DOCUMENTED TO BEHAVE. (What they presumably meant to do was "-iquote ." except THAT'S THE DEFAULT, so that makes no sense either.)

The actual fix, as with just about everything else the FSF has ever touched, is to REMOVE STUFF THEY SHOULDN'T BE DOING. Not replace it with anything, just chop it out and throw it away.

Oh, and it doesn't have a make distclean. (There was one, but it's commented out.) Instead it has "make realclean". Sigh.


February 9, 2010

So I'm not moving to Ireland. Finally got a decision from Google, seven months after applying to them, and they're not hiring me. (For reasons they didn't care to elaborate upon.)

I don't mind they said no, and I'm happy to get closure. But inviting me to apply again in a few months? Dude, that was just creepy. I did _not_ need the corporate equivalent of the "we can still be friends" speech.

I might write up the whole experience later. Right now I'm just happy it's over. (So is Fade, who can now start properly working out her proposed doctorate course schedule at UT. Go Fade!)

Meanwhile, I got powerpc to use a third drive, and thus do the automated build of dropbear and strace. I had to update to the git version of qemu because the most recent release was segfaulting halfway through the kernel's initialization. (No, not panicking, the emulator itself exited with a segfault.) And then once _that_ was working, the out of memory killer triggered halfway through the build. Although this one looks like it might actually be a legitimate out of memory situation. (It built with 512 megs, but not with 256. Yes, it's a 32-bit platform, and I know I've previously made those build with 128 megs. But something about powerpc seems hideously inefficient from a memory standpoint. Looking into why is on my TODO list.)

Debugging this is unpleasant enough that I'm experimenting with adding screen (which sadly requires curses) and maybe gdb to the automated build. Then I can run top during the compile and see which programs are using how much memory. (I could also do this with dropbear if I set up qemu to dial in, but Mark's gotten me a bit addicted to screen and I've been meaning to get gdb built for ages.)

I have no idea why gnu screen even bothers to have a ./configure stage. I had to put #ifdef guards around sched.h to prevent double #inclusion to get even the first file to build, and then a couple dozen files later it #included sys/stropts.h which is (drumroll please) streams support. Yup, streams, the abomination from AT&T that Linus refused to support back in the mid-90's. No, uClibc doesn't provide it, at least not in the configuration I'm using. The FSF continues to be unable to write software, film at 11.

In a way, my entire FWL project exists because "./configure; make; make install" is seldom that simple. If it was, I'd have been done in a weekend...


February 7, 2010

So, the dropbear and strace builds are working for all the arms (except armv4eb), i586 and i686, mips and mipsel, and x86_64. That means they're not working for armv4eb, m68k, powerpc/ppc440, sh4, and sparc.

Proper (non-coldfire) support for m68k still isn't in qemu, and I'm not convinced sh4 is even a real platform anymore. That leaves arm big endian, powerpc, and sparc.

The powerpc problem is that the qemu emulator I'm using only supports 2 drives, not 3, so when I feed in an hdc image it gets ignored. (Hmmm, actually the device tree is saying there's a third drive, but it's the same as the second drive. Under "/proc/device-tree/aliases", ide0, ide1, and ide2 are:

/pci/pci-ata/ata-1/cdrom
/pci/mac-io/ata-3/disk
/pci/mac-io/ata-3/disk

Huh. It seems to think the odd one out is a CDROM hanging off of a PCI bus? Do I have all the right drivers enabled? I think so, but it's not binding...)


February 4, 2010

The trick to getting Hulu working is to clear all cookies in the browser every time it starts complaining. (Doesn't fix things like dailyshow.com which simply _never_ work, but oh well.)

Now that The Daily Show is in HD, it takes up 1/4 of my laptop screen. Makes it a touch harder to watch it in the background while programming...

I need to fix the download.sh infrastructure in FWL, and I'm not sure how. It's a design problem.

Right now, there is a single download.sh script. That lists everything that gets downloaded, and it all goes into a single packages directory. The reason for this is so it can delete stale files out of that directory, because it gets a complete list of every package that should be updated and when it's done it can delete any file it hasn't seen yet. So any file that _isn't_ listed gets deleted.

This is why it's fiddly to have multiple download scripts, because either any files not listed in the current script get deleted, or else it leaks stuff as development goes on and you have to clean out the directory by hand.

This comes in when sources/more/native-static-build.sh needs to build dropbear and strace, two packages not used by the main build. They have to be downloaded somehow, and re-using the existing download infrastructure makes sense. But right now, those packages are in download.sh along with the stuff from the main build, due to the deletion stuff.

What this really means is you can't have multiple download scripts downloading into the same directory. I can have each one populate its own directory, but there's a clutter question. If I put it in build it gets blanked and re-downloaded, that stuff is transient. Having lots of top level directories is clutter. Moving the current "packages" to "packages/base" or something is a bit disruptive to people using the current workflow...

I suppose I could add a packages/native and then just have the cleanup logic look at files only (not directories). Hmmm, and it currently does.

Cool. Ok, a second script populating packages/native. That works.
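
(The shape of the idea, with a made up .manifest file and made up function names standing in for whatever download.sh actually does internally:)

SRCDIR=packages
> "$SRCDIR/.manifest"

download()
{
  FILE="$(basename "$1")"
  [ -f "$SRCDIR/$FILE" ] || wget -O "$SRCDIR/$FILE" "$1" ||
    { rm -f "$SRCDIR/$FILE"; return 1; }
  echo "$FILE" >> "$SRCDIR/.manifest"
}

cleanup()
{
  # Files only: subdirectories like packages/native get left alone.
  for i in "$SRCDIR"/*
  do
    [ -f "$i" ] && ! grep -qx "$(basename "$i")" "$SRCDIR/.manifest" && rm "$i"
  done
}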

Second problem: MIRROR_LIST is in download.sh. Hmmm... I guess that goes in sources/include.sh.

Next: the MANIFEST file is more or less hand rolled. (The list of packages is hardwired into do_readme, it vaguely seems generic at its call site but just plain isn't.) That one, I'm going to completely ignore right now.

Huh. Ok, clearing cookies, killall npviewer npviewer.bin, and the "clear private data" thing apparently aren't always enough to snap hulu out of whatever it's "this video is not currently available, and we will give you no HINT of a reason why" funk. Which is sad.

I'm in the process of ripping the native build stage of buildall.sh a new one...


February 3, 2010

I got a FWL release out yesterday, details on the FWL news page. Yeah, regressions. Fixing 'em.

I have been invited to speak at Flourish 2010 in Chicago, March 19 and 20th. (I feel I should track down a conference called "Blotts" and submit a paper to them, on general principles.)

I'm also speaking at CELF ELC, April 12-14 in San Francisco.


February 2, 2010

Fade got me an energy drink for my birthday. I am now about 2/3 of the way done with putting out a release (which is test building). These two facts are related.

The test build discovered that armv6 broke. (The armv5 target works, but armv6 breaks in uClibc with "libc/string/ffs.c:27:2: error: #error ffs needs rewriting!")

You know, I'm not holding up the release, I'm just LISTING REGRESSIONS IN THE RELEASE NOTES and I'll fix it up for next time.

The Drafthouse was showing Groundhog Day today, which I was seriously tempted to go to even though A) I bought the DVD a couple months ago, B) I've watched it ~3 times in the past month.

The consulting "feast or famine" thing continues strange. CELF proposals are coming in, I just bid on this one which I'm pretty sure I can do. Mark also put me in touch with a friend of his who has a line on a quick consulting trip to California, might last a month.

You know, if it wasn't for the job hunting I'd love the consulting lifestyle. If there was just some web page I could pick contracts off of like a menu... (I'd like two days of enhancing busybox, two days of tweaking a toolchain, then a couple weeks of board bringup, and for dessert a three month contract doing a new board support package.)

Alas, consulting is constant job hunting, hence the desire for a permanent job I can keep doing without any more job hunting.


February 1, 2010

Happy Birthday To Me. I am officially old.

For my birthday, I would like for either the ongoing flirtation with the Ireland job (the identity of which is not exactly a secret), or else for the interview I did last tuesday at IBM, to turn into real full-time stable employment.

What I _got_ is a gift certificate to Red Lobster from my grandparents (which was excellent, as always).

Oh, and Eric and Mark called this morning to let me know they'd been sent forms to fill out as references for me. In month 7 of this process, we've progressed from getting approval from committees by attending meetings, to having third parties fill out forms. Outstanding.

I suppose it's my fault that I stopped looking for other work when (I thought) they got serious at the end of October. As a consultant, if I'm not constantly looking for work I rapidly become underemployed, and for the past 3 months I've only taken ultra-short-term jobs that wouldn't interfere with my availability to move overseas on a week or two notice. (Actually I just handed off to Mark some of the work I _did_ line up, because I've just been too stressed out by months of "hurry up and wait" to give it the attention it deserves, and it makes a nice weekend project for him. My Firmware Linux project is a bit neglected too, although that requires less concentration from me than coming up to speed on something new, and poking listlessly at it is something I do to relieve stress.)

Meaning so far this month I've billed less than one full-time week's worth of work, and have more or less run through the "liquid cash" portion of my savings and will soon have to either decide what retirement assets to cash out or else Get a Real Job (tm). Despite that, I'm not currently sending out more resumes since the Ireland thing got itself unblocked again. (Yes, I'm a sap, but I really DO want to work there.) But if the IBM position comes through first I'm not turning down a bird in the hand to continue waiting for a job I applied for 7 months ago, no matter how nice that job would have been or how close it seems to be to finishing.

Speaking of close to finishing, it turns out I've been more stressed than I needed to be because I've apparently misunderstood the interview process all along. The deadline I was waiting for last month wasn't the yes/no on an actual hiring decision. (Where did I get that idea?) That was just whether or not I'd passed the "technical committee". Silly me, the next step after that was asking for references. (Because obviously you don't ask for references until two months _after_ you fly somebody in for an in-person interview.) I pointed them at the author of The Cathedral and the Bazaar, the manager of Ubuntu Mobile, the chair of Ohio LinuxFest, the other co-founder of Impact Linux, and the maintainer of uClibc++. All of whom I've worked with in a professional paid capacity, _and_ on open source projects or conferences.

Yeah yeah, almost there. Oncoming train at the end of the tunnel, better to light a flamethrower than to curse the darkness, time flies like an arrow and fruit flies like an apple, any fool can criticize and most do... But closure would have been a _marvelous_ birthday present.

Anyway, enough whining about that. I'm way overdue for an FWL release.

So I tracked down the sparc thing, and now I'm fiddling with the code to make the uClibc and linux config files show up in usr/src again. (They used to, that bit-rotted during the great refactoring.)

Almost ready to cut a release. Hopefully tomorrow.

By the way, last week's hospital trip was probably some kind of stomach flu. (Either that or I've managed to give myself food poisoning again twice since then. Yeah, that's helped with the whole "concentrating on technical projects" stuff too.) Feeling better today, at least.


January 31, 2010

I used to wonder where Mystery Science Theatre 3000 got all those truly horrible movies. After looking at the movies currently available to watch on hulu, I think they were going easy on us.

In fact, "The brain that wouldn't die" is one of Hulu's current offerings, and the hover-text for it _IS_ from the mst3k version (mentioning Tom Servo by name)... but the movie they have isn't. Right.

I boggle at The Suck. Four different "Benji" movies. The Captain America movie from 1990. They have movies called "Rabid Grannies", "Chicken Chronicles", "Love is a Gun", and "Violent Cop". They have the 1997 remake of McHale's Navy and the 1999 remake of The Mod Squad. They have Karate Kid III (but not I and II). They have both the 1962 and 1998 versions of "Carnival of Souls" (horror movie about an evil clown, apparently). "Mirror Wars: Reflection One" ("this film was a huge box office smash in Russia.") American Virgin (starring Bob Hoskins, the gumshoe character from Who Framed Roger Rabbit) and My Five Wives (starring Rodney Dangerfield, with the cover photo showing his face covered with lipstick kisses)...

I was tempted to watch "Pure Luck" _again_, even though I watched it off Hulu a month or two ago. Watched the first few minutes of the Mel Brooks disaster "Life Stinks", but when the flash plugin crashed again (I switched window focus, that kills it about 20% of the time, yes Linux on the desktop continues to suck) there was absolutely no reason to restart it. Wandered off to Youtube to see what they've got...

Wow. Lots of the same stuff. (National Lampoon Presents Cattle Call, Bad Girls from Mars, Hercules in New York, Cheerleader Ninjas... They're _taunting_ me with the Adam Sandler vehicle "Going Overboard" which is not the Goldie Hawn/Kurt Russell "Overboard" which I would happily watch right now. Maybe I should break down and watch Shirley Temple's "The Little Princess".)

Of course Youtube's "shows" tab has some new movies hulu didn't, if you can call "Slave of the Cannibal God", "Torture Chamber of Dr. Sadism", or "Rescue from Gilligan's Island" new.

Maybe I should try getting netflix "watch instantly" to work with Ubuntu again... Oh, right.

I need to buy a mac.


January 30, 2010

So, back to poking at a release.

The sparc and m68k architectures aren't building. (They haven't gotten much testing because they don't work under qemu, but they used to at least _build_.)

Hmmm, sparc got broken by the uClibc upgrade. Which is a pain to bisect because so many of the intermediate versions don't build, and the patches you need to beat sense out of it change as you go along. Sigh.


January 26, 2010

So the official position of the maintainer of the SuperH port of Linux is that any toolchain older than October 16th (the binutils 2.20 release) is too old to build the Linux kernel. With the corollary that in the 5 years since the PC went 64 bit, apparently nobody ever actually tried to build anything for SuperH.

I'm trying to figure out if I should bother to fix gas, revert the kernel commit that broke the build and call it good, or just drop sh4 support as "not a real platform".

The fun part is the bug they're freaking out about in gas only triggers when you feed it hand-coded assembly that asks the assembler to do something it can't do. The compiler produces assembly that doesn't hit this, so as far as I can tell you'll A) never see this from C code, B) have to choose to trigger this from hand coded assembly.


January 25, 2010

Poking at current -git for the linux kernel. Mips is broken. Chance to use bisectinate.sh.

Wanna hook up bisectinate.sh to the cron job. First question, should I hook it up to buildall.sh, or to cronjob.sh? The one that knows where the local copies of the various repositories are is cronjob.sh, but the one that's doing this kind of work is buildall.sh. Decisions, decisions...

There are a few design issues to automating this:

Meanwhile, the mips problem bisected down to:

commit 1b93b3c3e94be2605759735a89fc935ba5f58dcf
Author: Wu Zhangjin 
Date:   Wed Oct 14 18:12:16 2009 +0800

    MIPS: Add support for GZIP / BZIP2 / LZMA compressed kernel images

Which means it's probably "new config symbol showed up" and I need to run oldconfig. I have a script for that already, migrate_kernel.sh, but that's not hooked up to the automated builds either. So that would be yet another bullet point above.
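
(The usual quick way to do that by hand, assuming the target's existing config has already been copied to .config in the kernel tree:)

yes "" | make ARCH=mips oldconfig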

(On a side note, migrate_kernel.sh vs migrate-kernel.sh is one of those chronic issues with me. Yeah, I'm mostly standardized on dash instead of underline in FWL, but some things like x86_64 are forced by other contexts, and I occasionally forget.)

And Marc Andre Turner reminded me the project's overdue for a release. All of this mess can probably be deferred until after that. (And I need to finish converting the snapshots directory to Vladimir's new pretty version, right now it's on the floor in pieces.)


January 23, 2010

Went to the hospital shortly after midnight, with dizziness, sweating, and chest pains. As with the last time this happened, it wasn't a cardiac thing, but better safe than sorry when it comes to chest pains. (And hey, I actually had health insurance this time. Dunno if I still will once they see that I actually went to the hospital, but oh well.)

Last time it was a sinus infection causing swelling that pressed on a nerve. (Reeeeeeeally annoying.) It might be something like that again. (I note the cedar pollen season's starting up again, count's up around 300 or so already).

But it could also have been food poisoning, which it started presenting as _after_ I went to the hospital and sat in the little room hooked up to all the electrodes for an hour and had a chest x-ray. (This means mild food poisoning can be more annoying than full blown food poisoning, because it's harder to diagnose. Sigh.)

All this lasted until around 4am, so I'm on a night schedule again. May need an energy drink for my in-person job interview tuesday afternoon.


January 22, 2010

Went to Mr. Notebook and got a new power supply for my laptop. (They had to check five of them before they found one that works, apparently Dell power supplies are flaky in general.) This one's 90 watts instead of 65, so it charges the battery reasonably fast even when I'm using the sucker.

Heard back from a 1-year AIX contract I applied for. Phone interview today, in-person interview on Tuesday. (Haven't given up on the Ireland job, I'm just not waiting around for it either.)


January 21, 2010

Why I hate git, part gazillion:

I want to grab the description for a commit. I know the "git show COMMIT" command which appends a patch to the commit, but in this instance I don't want the patch. If I say "git log COMMIT" it doesn't show that specific commit, it shows a bunch of commits.

So I've already found two ways to get the data, but I have to chop out the subset of it I want with "sed". It's non-orthogonal crap that goes out of its way to be hard to use in a script.

I looked at the "git help show" (recycled man page), and it has the --pretty option, but none of those prevent it from appending a patch. I went through eight screens full of the "git help log" without finding anything useful there. From previous expeditions I know I can do "git log COMMIT^1..COMMIT", which is ugly but works.

I think my real problem with git is it's not a unix program. You have to go out of your way to beat _simple_ functionality out of it, by writing shell scripts to drive it and by screen-scraping the overcomplicated crap it spits out. Alas, that's what the kernel went with, and uClibc and busybox both did the mindless "me too" thing. Sigh.

There's probably some magic "do this" command line option...
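
For what it's worth, the two least-horrible incantations I've found so far (no sed required):

git log -1 COMMIT   # show exactly one commit, with no patch appended
git show -s COMMIT  # -s suppresses the patch that git show normally appends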

And hey, my laptop's power supply died. This is the third one for this laptop. It still provides _power_ just fine, but Dell's patented "YOU'RE PIRATING POWER SUPPLIES!!!!11!1!" circuitry is refusing to identify itself, meaning it won't charge the battery anymore, and the thing has clocked down to 800 mhz.

I remember my toshiba. The power cord it used was the same power cord the radio used, and an electric shaver I had, and just about every other device out there. It did not have any circuitry in it.

Note that it refuses to even charge the battery when the laptop is _off_. If it's worried about potentially not being able to draw enough power from the thing, why won't it charge when the laptop is POWERED DOWN? Only answer: Dell's being greedy and hoping to charge extra for its power adapters. (And prevent you from using third party ones.)

I'll have to see if Mr. Notebook has anything in the morning. They closed at 6...

Of course Ubuntu's taking this as an opportunity to act _extra_ nuts. (It's convinced that one of the processors is at 1.7 ghz and the other is at 800 mhz.) And the iwl3945 network card had firmware errors and aborted after thirty seconds _twice_, so I'm using my cell phone internet for the moment. If there was a way to just smack the Dell bios and go NO REALLY, IT'S A PERFECTLY USABLE AC ADAPTER YOU'VE BEEN USING FOR A YEAR NOW... Sigh.


January 20, 2010

So I wrote a git bisect script that rebuilds a given FWL target and runs the native build under it, and a bisect "bad" is when it doesn't successfully produce a dropbear binary from the native build. (This catches everything from patches failing to apply during the source tarball extract all the way through something subtle going wrong with the native compiler.) Hooking this up to the nightly cron job build should make that much easier to keep up with.
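
The test script itself boils down to something like this (the paths and the dropbear output location are from memory, so treat them as placeholders; exit 0 means good, nonzero means bad, and "git bisect run" passes along the extra argument):

#!/bin/sh
# Run from the kernel repository via: git bisect run /path/to/test-target.sh armv5l
# How the FWL build gets pointed at the checked-out kernel tree is elided here.
TARGET="$1"
cd /path/to/firmware || exit 1
./build.sh "$TARGET" || exit 1
[ -e "build/system-image-$TARGET/dropbear" ]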

Right now I'm trying to figure out two things:


January 19, 2010

Video interview with the guy in Zurich today, for the Ireland job. (I was 15 minutes late, couldn't find the building. Sigh. As the song goes, "Que sadilla, sadilla. It's mexican food, you see. Tortillas and refried beans, all covered with cheese...")

Since the six month anniversary of my application for the Ireland job was last week, I started sending out fresh resumes to positions that look interesting. (Mark pointed me at indeed.com, which seems to work a bit better than Dice. I still feel weird about applying for positions advertised on linkedin, since I don't use that for anything; I only have an account there because people keep trying to link to me.) (I have sent out exactly one link request, which I believe was ignored. And I felt dirty afterwards.)

One recruiter already got back to me about an AIX job. (Well, not the one advertised but a different one that's still within operational parameters, as it were. Working on a different Unix variant might make a nice change, and I'd rather be paid to work on honestly proprietary code than GPLv3.) Local position, easy commute, Fade could stay in ACC through her prerequisites for her doctorate and we wouldn't have to try to get this condo on the market before spring break this time around.

At this point, it's "whoever comes through first with something acceptable". I'm tired of waiting.


January 18, 2010

And uClibc 0.9.30.2 dropped! Yay, the project isn't entirely dead. (I prefer to see this as a Princess Bride "mostly dead" situation rather than a Monty Python "He says he's not dead" situation.)

It broke Mips, of course. Builds fine, but booting dies trying to launch init. The static hello world works. I should write the automatic git bisect stuff to track this down...

Catching up on Doctor Who episodes. Silence in the Library through Midnight. For some reason, this entire DVD seems to consist of "episodes that were running short so we needed to insert an extra 15 seconds of repetition, reaction shots, or random atmospheric stuff", over and over. Fade says it's intentional. *shrug*


January 17, 2010

So I'm reading The MPEG "System" layer standard (ISO 13818-1). The System layer is essentially an archive format that bundles together "elementary streams". An elementary stream contains audio, video, or timing information.

Elementary stream data is stored in "PES packets", and a Packetized Elementary Stream is a trivial system layer that contains a single elementary stream (with some minimal timestamp and rate data in the Elementary Stream Clock Reference (ESCR) and Elementary Stream Rate (ES_Rate) fields of the PES packet headers).

The two main MPEG System layer archive formats are Transport Streams (TS) and Program Streams (PS), both of which bundle together audio and video into a single file, and keep them synchronized. These also contain PES packets, but they can contain several different groups of them bundled together representing multiple elementary streams. (Or they can contain just one, you can use a TS or PS stream to store an MP3 if you want to.)

Transport Streams are more seekable. They're designed to resume from an arbitrary point in the stream without having seen what came before, which lets you recover from lost data by skipping the damaged bits, and also lets you change channels on a television and start playing at an arbitrary point without having seen the first half of the program. The format is implemented as a series of small constant-size packets (generally 188 bytes), each of which starts with the same byte (0x47). It's used for HDTV and video camera output and so on.

Program Streams offer better compression than Transport Streams, but you need access to the whole file in order to interpret it. This is generally what you find on DVDs and such, and it's common in downloadable video because the files are smaller.

Other complications are that each Transport stream can contain multiple "programs", meaning not only are you grabbing and synchronizing a video stream with an audio stream out of the passing data, but there could be more than one to choose from. Program Streams are designed for just one program each, although you can have different audio tracks to choose from. And in both types of system streams you can have other elementary streams than just audio and video. (Subtitles, for example.) You can convert a program between TS and PS, it looks a bit like converting between tar and zip.

What I'm trying to do is teach VLC to chop transport streams into different files every X seconds. The hard part is actually digging through all the weird VLC layering. (Note: VLC violates its own layering left and right, this design is kind of ugly. Plus it calls extensively to external libraries so you have to dig through three or four different projects to try to follow the control flow.) In theory, specifying the output (display, write it to a file, send it to a network) is orthogonal to what type of data is being generated. In practice, this is not the case, except when it is.

Tempted to just write a dumb little daemon to accept the network output and have _that_ write to the files, parsing the stream just enough to do the chopping...
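
(As a sanity check on the format, the dumb non-daemon version is nearly a one-liner, because any multiple of 188 bytes keeps every output file starting on a 0x47 sync byte. File name and chunk size are arbitrary here; the real thing would have to cut on time rather than byte count, and ideally right after a PAT/PMT pair so players can latch on immediately.)

split -d -a 3 -b $((188*32768)) recording.ts chunk-   # ~6 megabyte chunks, 32768 packets each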


January 16, 2010

If you log into justin.tv's chat thing with your twitter account, your chat comments are limited to 61 bytes. The reason for this is every chat comment you make gets posted to your twitter account with an enormous advertising tag. Yeah. Revoked its access, deleted the spurious posts, apologized for the noise.

Getting that login to work involved three attempts, and in the middle of that I had to delete cookies for that site because the site was just coming up as blank for me, not even any "view source" content. (But it worked fine via wget.)

I thought "hey, maybe that'll get The Daily Show displaying again", but alas: no. It still displays an ad, and then hangs waiting for 174.129.6.226. (Leave it there for half an hour, nothing will ever display.) I notice that all these websites that have problems with this (Hulu, blip.tv, etc) work just FINE to display the ads, and then hang before displaying their actual content. It's not displaying video that has a problem, it's confirming I'm not DOWNLOADING the video. Alas, this means that untangling the guts of what flash is doing and trying to download the video myself is the only way to view it. Kind of self-defeating, eh?)

Alas, it wasn't all these websites that changed. It was letting the ubuntu update manager apply its updates. It updated firefox, and now it doesn't work. And I don't have the option to downgrade it back down to the one that did work. And it's been weeks without a fix. (Last Daily Show episode I managed to watch was December 15th.) Linux: smell the usability.

Up most of the night playing Sims 3 on Fade's computer. My little ghost girl is stuck in the floating mode where they're on their knees hovering a little above the ground whenever they go anywhere, meaning they move VERY VERY SLOWLY. (Unless they get on a bike or something, in which case they move at the same speed as anyone else.) Tried several things to fix it, to no avail. Not sure how to Google for it because I dunno what keywords to use. Wound up sending her back to the netherworld, hopefully reviving her again will fix it. (If not, hit the darn sim with ambrosia, although since she was born a ghost this would just be "surrect" instead of resurrect. I think.)

Oh, and I was watching The Daily Show in another window, because it plays just _fine_ on Fade's Mac, but plays only a commercial and then hangs before displaying the actual program on Ubuntu. Same website, same network connection: Linux's flash support got subtly broken by one of the Ubuntu upgrades. Again.

(Yes, I spent the night fiddling with a game I broke while taking advantage of an environment in which a different bug wasn't manifesting. This is my experience with computers, always.)


January 13, 2010

What is it with me needing to address schedule slippage with cake?

I haven't uploaded my blog to the website recently because of a looming job interview that was stressing me out, and was thus all I really wanted to blog about. I didn't want to comment on it while it was in progress. (This also means my project repositories haven't been synced up to the website, since it's the same rsync script. This blog is a file I edit in vi, with an rss feed generated via python. So I've been writing stuff in it, I just haven't been sending it anywhere.)

But today is the six month anniversary of me applying for the job, which I went through three phone interviews for before being flown to a full day in-person interview, and after an entire month of repeatedly rescheduling the decision resulting from that... I now have a new interview with somebody else on tuesday, this one via teleconference. Wheee.

Since I can no longer use the word "progress" to describe what this process is in (not with a straight face, anyway), I might as well get on with my life and rsync the past week up to the server. (January 6 was the third deadline for a decision since the in-person interview. I was silly enough to think it _meant_ something and was holding off posting this until I had _something_to_say_, one way or the other. Ah, the naivete of youth...)

The cake comes in because Fade is going back to college for her doctorate (well, at the moment taking prerequisite courses for the program at the local community college), and this looming schedule uncertainty has now caused her to miss her class registration deadline, because she put off registering until she knew whether or not we were going to move soon. We still don't, but today was the last day, and she forgot and missed the deadline by about an hour. That led to this, which of course led to me baking a cake. (I mean, we had the mix and the pan was clean and everything. I did have to wash a bowl.)

The thing is, I sent two cakes to Erik Andersen to unblock uClibc releases. (I also once arranged for a delivery of Mangoes to Neil Gaiman's assistant Lorraine for reasons that require some context to explain, but that's another story.) Going by historical precedent, what I might need to do is send a cake to the guys in Ireland.

In any case, there comes a point where I have to get on with my life, or at least run the darn rsync script.


January 12, 2010

Flying back to Austin. Loving the new laptop battery. Recharging (even though there's over 2/3 of the battery left) in Vegas. Trying to figure out why the strace build is breaking in the current system images, but worked fine in the last release.

One fun trick you can do if you have files that differ in two contexts that make diffing them problematic (such as two qemu instances) is "head -n 80 filename | sha1sum" on both files, and then vary the number until you find the first changed line via binary search. In this case, it's that linux/netlink.h isn't being detected via ./configure. (Insert standard autoconf-is-horrible rant here.)
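
Concretely, run the same thing in each of the two qemu instances (file name hypothetical):

head -n 80 config.log | sha1sum
head -n 40 config.log | sha1sum
# If the sums match at 40 but differ at 80, the first difference is somewhere
# in lines 41-80, so try 60 next, and so on.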

The error autoconf is dying on is:

include/linux/netlink.h:35: error: expected specifier-qualifier-list before 'sa_family_t'

The problem turns out to be the linux kernel changed from 2.6.31 to 2.6.32 and that included git 9c501935a3cd. In the one that was working, linux/netlink.h includes linux/socket.h which defines sa_family_t. In the one that isn't working, linux/socket.h is largely an empty shell that had the relevant chunk removed by #unifdef because Ben Hutchings thought it was only still used by libc5.

Sigh. Email the kernel guys to ask why, patch strace to fix the immediate problem... If I wasn't on a plane right now I'd check to see if there was a current strace development project to send patches upstream to. (Last I checked, it was moribund.)


January 10, 2010

Still in Idaho, visiting relatives.

Belated christmas present opening yesterday. I got a new laptop battery! Woot!

Banging on FWL a bit. Alpha has the same missing inhibit_libc guards that sh4 did, and when you fix those the uClibc build dies with:

libm/e_exp.c: In function '__ieee754_exp':
libm/e_exp.c:144: internal compiler error: in emit_move_insn, at expr.c:3275

Isn't that special?

Also, the m68k and sparc builds were both broken by my futimes patch, because the last ever release of uClibc didn't include the INLINE_SYSCALL stuff that the other architectures I was actually testing did. (Neither of those worked properly under qemu, but they both _built_. The uneven nature of uClibc's support for them caused a build break on those architectures.) My resulting patch to fix this is AMAZINGLY ugly, but A) I'm fixing something that got fixed in the -devel branch months ago, B) what's the point of pushing things upstream if there's never going to be another uClibc release?

Since uClibc's so obviously moribund, I'm vaguely pondering poking at Android's "bionic" project instead, and maybe even their "toolbox" project. The first is their libc (uClibc replacement) and the second is their coreutils (busybox replacement). They wrote their own because they decided they want all the userspace stuff to be BSD licensed, no GPL in userspace. Before GPLv3 came out I would have objected to this, but now it seems kind of reasonable. (If the FSF is really _that_crazy_, then you don't want to get any of it on you.)

Yeah, traditionally BSD-licensed projects fork themselves to death. And I'm not even talking about free/net/open/dragonfly, I'm talking about developers being hired to work on propreitary forks. When BSD first worked up enough momentum to become interesting Sun hired away developers like Bill Joy to work on a proprietary fork in 1982, over the next decade it recovered enough for BSDI to hire the next generation of developers away to work on a different proprietary fork a decade later, and once it recovered from _that_ Apple hired Jordan Hubbard and friends away to work on MacOS X.

But Android is a corporate owned project like Darwin, Star Office, Mysql, QT, and so on. (And like Mozilla was before Netscape/AOL lost interest, and the whole first few years of the Fedora project before people stopped pretending it was anything other than Red Hat Enterprise Rawhide.) Corporate owned projects seldom get enough external contributions to be in much danger of forking, and never will. Hobbyists just aren't all that interested in something they can't affect the direction of, so just about all the project's developers will continue to work for the company that owns the copyright on the code. (Maybe it's only 1/10th as active as a real open source project, but that still puts it ahead of dying efforts like uClibc.)

Unfortunately, neither of Android's sub-projects have a mailing list. Instead they have a "google group", which (as I've mentioned before) is not a substitute for a mailing list. When I go to that Google Group, it starts with "Read the FAQs", which links to "What is Android", which says:

Android is a software stack for mobile devices that includes an operating system, middleware and key applications. The Android SDK provides the tools and APIs necessary to begin developing applications on the Android platform using the Java programming language.

That in a nutshell is why I've never had any real interest in Android. Java stopped being interesting as anything other than Cobol's successor back around 1999. It lost "the language of web browsers" to javascript (and then that lost about half of its share to flash's actionscript). The famous "212% growth" of Linux in 1998 was all the Java developers LEAVING JAVA, for a platform that had no real java support (and thanks to Sun screwing over blackdown didn't for the next 5 years). The vast majority of us are not going back.

Come on guys, _java_? It's what replaced Cobol during Y2K when all the old stuff from the 1960's was being forcibly rewritten: it was the flavor of the month now graven in stone to serve the pointy-haired on their steam-powered mainframes, and outside of that niche it has all the appeal of a Pat Boone concert.

Aside from that, every platform in the world "also does java". It's a programming language. Crippling your platform so it _can't_ run stuff written in a language like that would be the hard part. Saying "the way to write code on this platform is to do it in perl" (or in python, or in ruby, or any other specific language other than C, which all those others are written _in_) means "this is not a general purpose programming environment", and we _have_ one of those already, it's called the web browser and it uses a totally unrelated language called javascript (which used to be called livescript but was renamed during the height of Java's flash-in-the-pan celebrity).

The fact I'm hearing people talk about bionic and toolbox now means the embedded community is finally starting to pay attention to the ability to program these devices in _C_, but they don't make it easy finding out how.

Alright, bite the bullet, just skip their pathological web pages, ignore the lack of mailing list archive to browse, and google for "android sdk download", which gets me to here which lists a linux tgz, right click save as and... it's trying to download an html file? Darn it, does it have one of those stupid broken sourceforge mirroring setups?

Nope, worse: it's a license agreement you have to agree to before they let you download the SDK.

Sigh. This is _so_ not an open source development project. (The result may be open source, but the development project is considerably _less_ friendly to hobbyists than the "Microsoft Developer Network".)

And numerous sections in the license mean "this SDK ain't open source" either:

3.3 Except to the extent required by applicable third party licenses, you may not copy (except for backup purposes), modify, adapt, redistribute, decompile, reverse engineer, disassemble, or create derivative works of the SDK or any part of the SDK. Except to the extent required by applicable third party licenses, you may not load any part of the SDK onto a mobile handset or any other hardware device except a personal computer, combine any part of the SDK with other software, or distribute any software or device incorporating a part of the SDK.

See also 4.2, 4.3, 4.4... about half the license really.

The other half of the license is Android treating us like infants or perhaps merely morons:

4.5 You agree that you are solely responsible for (and that Google has no responsibility to you or to any third party for) any data, content, or resources that you create, transmit or display through the Android platform and/or applications for the Android platform, and for the consequences of your actions (including any loss or damage which Google may suffer) by doing so.

Guys? Could you inform your lawyers that even the GPL is _not_ based on contract law, and you went OUT OF YOUR WAY to select BSD licensing instead? You insist on X/MIT-style licensing for your userspace and then you pull the "you must be 18 years old to press this button but obviously we don't think you really are" crap. Having a click-through license AT ALL is unnecessary, but what you put IN it is just _stupid_. (Also, could you inform your lawyers that if you're not recording WHO agreed to the darn contract then there is no privity of contract and thus NO CONTRACT UNDER CONTRACT LAW? Last I checked you couldn't legally HAVE a contract if you DON'T KNOW WHO IT'S WITH.)

A quick search for git repositories for bionic finds here and here in the first screen of hits, and I don't have to agree to anything to download those. Did the people who did that violate the "contract" terms in putting the code up online?

Alas, I've lost interest in dealing with android again. So far I've seen more input from lawyers and marketers than from programmers on this thing, and apparently I'm not the only one. I've been paid to work on far worse, but why would I spend my _hobby_ time messing with this clueless morass of pointy hair, java, and confused legal boilerplate? I wonder if they actively don't want to be bothered by developers like me, or if it simply never occurred to them that hobbyists might have something to contribute? (Obviously the only worthwhile open source comes from places like IBM. Nobody does this in their spare time, or at least nobody does anything _interesting_. That's the vibe this project exudes with a flamethrower. I wonder if it's intentional?)


January 8, 2010

So I heard back from the Ireland job recruiter again this morning:

Hi Rob,

These discussions are still ongoing. There are differing thoughts on the
interview feedback and if/where we could position your profile.

I should have more early next week once these discussions have taken place.
Apologies for the continued delay but you're still in the mix...

I have no idea how to respond to that. (I have a profile in need of positioning? There's a mix?) Fade's sister Lisa works in hollywood, and I asked her if she could translate that for me, but it still boils down to the fact that hiring me is still an "if", and a decision has not been made.

I'm starting to desire closure more than anything else out of this process. I've lost count of how many times the actual decision part has been postponed, and "at least it's not a no" is starting to lose its appeal as a mantra.

Oh well. One more set of wednesdays...


January 7, 2010

Flying to Boise today, to visit Fade's parents (who moved there from California a few years ago) and siblings (who are converging on their parents' house).

Fade got a spare key made and I dropped it in the mail to Mark so he can feed the kiggies. (And we left them with five bowls of food and two bowls of water, so even if they don't get fed again before Tuesday they should be ok.)

The first leg of the flight was the most unpleasant experience I've had on Southwest in as long as I can remember. Every seat full: check. Crying toddler in his father's lap in the seat to my left: check. Guy who somehow managed to get on the wrong plane (thinking this one's going to El Paso) so we have to taxi back off the runway and drop him off at the terminal: check.

Still, it wound up way ahead of either of my recent experiences with Alaska Airlines. We made our flight and arrived in time to make our connecting flight (we would have just made it even if said connecting flight hadn't also been delayed an hour). The flight attendants gave me four snacks and three drink refills and attempted to placate the baby with cheez-its (which as far as I can tell were laced with horse tranquilizers), and when we arrived sang the plane a song apologizing for arriving late.

In Idaho now. Have yet to see any potatoes. Tired.

Still no word on the Ireland job.


January 6, 2010

Stressed. Today is apparently the day I learn whether or not I get a really cool job that would involve moving to Ireland.

It's been a six month interview process. I applied back in July, heard back in October, and got two phone interviews in november. That apparently convinced the recruiter I was talking to that he wanted to hire me, so then I had to convince the rest of the company. Thus they flew me to california for an in-person interview at the start of December. That was just over a month ago.

While I was there they said they review interview write-ups and make decisions about them on Wednesdays, but when I was there they said the results from a friday interview probably wouldn't be in by the following Wednesday (in this case the 9th), so they'd probably do the review on the 16th instead. On the 17th, I got email that "the feedback took a little longer than expected to collate so you have been put into next Wednesday's review", and it was bumped to the 23rd. Then on the 22nd I got a phone call that people on the review committee were on vacation, so they bumped it two weeks to get past the holidays. That would be today.

In the past few months I've been invited to speak at a conference in Chicago this spring, and couldn't give 'em a clear answer because I don't know where they'd be flying me in _from_. Fade's only registered for one spring class because she doesn't know if we'll be moving. The real estate market near UT is intensely seasonal, if I'm going to sell the condo this year I should really have it on the market by spring break...

I'm supposed to be working on a VLC modification for a guy in Florida, but I'm just too stressed to focus. (And securitybreach went down yesterday, not sure why. That doesn't help.)

I also haven't been able to focus on FWL, which is having the weirdest problems with libgcc_s.so, because on arm eabi '__aeabi_ldivmod' is magic and evil. The more I look at that code the worse an idea it seems; maybe I should just hardwire libgcc.a and _eh.a all the time...

When the host compiler from Ubuntu is building hello world, it's calling the linker via (lots of extraneous gorp removed):

ld crt1.o crti.o crtbegin.o hello.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed crtend.o crtn.o

The thing is, -lgcc asks for libgcc, and there's only libgcc.a (no .so). That means that libgcc_s.so can only replace libgcc_eh.a, which seems kind of pointless. But if I take out that -lgcc, then I get the __aeabi_ldivmod thing on armv5l, which is not straightforward.
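
If I have to stare at this again, a quick way to dump what the driver is actually handing to the linker (a sketch assuming a stock Ubuntu gcc; the exact paths and flags will vary) is:

echo 'int main(void){return 0;}' > hello.c
gcc -v -o hello hello.c 2>&1 | grep -E 'collect2|lgcc'   # the real link line
gcc -print-libgcc-file-name               # where libgcc.a lives
gcc -print-file-name=libgcc_s.so          # full path if found, bare name back if not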

I dunno, maybe this is a case where a new gcc release actually simplified something rather than making it worse. (There's a first time for everything.) But the new release is still GPLv3, so I still won't touch it.


January 5, 2010

I'm poking at toybox again. Added a trivial app (setsid) just to break the logjam, and also checked in some pending stuff I've had in my tree for... possibly a year?

I'm not sure what this really means, other than busybox is no fun for me to work on in its current state. I should look into what would be involved in porting largeish busybox applications to toybox commands. Bound to be easier than getting the busybox developers interested in redoing their infrastructure...


January 4, 2010

Learning about various things VLC can do. It's actually quite a powerful package, with documentation Google just can't find. Google finds the standard useless wiki, and some howtos with big warnings about how obsolete they are, which mix quite informative bits with long sections about a Windows-only wizard they apparently never bothered to implement on Linux.

But what Google _doesn't_ find is any information on the command line options.

It turns out, what you do is "vlc -H" (which is not the same as "vlc --help", or even "vlc --help --advanced"), and that shows you all the actual options you've got. Apparently not all versions of vlc have that compiled in, but when it does it's quite informative. Why it's not online anywhere, I dunno...
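
For future me: piping that through grep is the quickest way to find a specific knob. Something like this ("transcode" here is just an example search term, and the exact output format varies between vlc builds):

vlc -H 2>&1 | grep -i -A3 transcode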


January 3, 2010

Still weird to type 2010.

So I submitted another twitter help request for the same old problem. (Essentially boiling down to yet more whinging about Horrible Retweets being an abomination, but for a specific aspect of it that can be considered an actual bug rather than merely an epic design failure.)

This week's way to submit a twitter help request is to go here and fill out the form. As far as I can tell, that link is no longer linked from anywhere on twitter's site; I found it by guesswork. Then when you post the form, it goes directly to "solved" status. The difference between "solved" and "closed" is that you can still comment on it, and commenting reopens it and puts it in the actual queue for a human to look at, so you have to view your solved requests, click on your new request, and add a comment.

This post is pretty much to remind me how to do it. I'll have to update it when they add yet more hoops.


January 2, 2010

Blundering around like a bull in a china shop in my own code base. "This is broken, that's broken, that needs to go over there..." I'm back in "I can fix it!" mode, but I also should break this down into chunks and check it in in smallish pieces. (Or at least check it in soonish before it snowballs much more.)

(Several of these things are todo items, such as making ccwrap automatically detect whether or not libgcc_s.so is in the library path instead of hardwiring that in at compile time.)
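
One plausible way to do that detection, sketched as shell rather than the actual ccwrap C code ($REALCC and LINK_LIBS are made-up names for illustration), is to just ask the compiler where it would find the library:

# $REALCC is a stand-in for whatever compiler the wrapper execs.
GCC_S="$("$REALCC" -print-file-name=libgcc_s.so)"
if [ "$GCC_S" = "libgcc_s.so" ]
then
  # Got the bare name back, so it's not in the library path: use the static pieces.
  LINK_LIBS="-lgcc -lgcc_eh"
else
  LINK_LIBS="-lgcc_s"
fi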


January 1, 2010

BWAHAHAHAH!

It took two weeks of sitting around moping and having writer's block, either making changes I reverted at the end of the day because they just weren't improving anything, or else feeling like banging on FWL was just no fun at all (so I didn't do it)... And then today, while randomly poking at a reduced subset of the problem I'd been wrestling with to try to get SOMETHING done, I saw how to FIX IT.

And suddenly, it's fun again and I'm getting so much CRAP cleaned out of root-filesystem.sh.

The trick is to have root-filesystem.sh _never_ build a native toolchain. It should not be doing that at all. That incestuous breeding tangle of "this calls that but it's also called from over there AND called from over there too" that I just couldn't simplify? I don't need to do it in the first place.

Instead, it should check to see if build/native-toolchain-$ARCH exists, and if it does copy it verbatim into the new root filesystem. I was already building one, so just use _that_. (If it doesn't exist, then use path_search to copy the shared libraries out of the cross compiler that's building us, and just use those.)

I repeat: root-filesystem.sh should _never_ build its own C library. It should just use the one out of an existing target compiler. It seems counter-intuitive that we don't rebuild everything, but we already depend on the existence of a working binary C library for the target in the compiler or nothing works. So run with that.

I'll probably grab some variant of my old ldd root filesystem generation script I wrote for busybox's test suite years ago (see testing/testsuite.sh in the busybox source, functions mkchroot() and dochroot(); I'd link to the busybox source control system, but it appears horked at the moment, and I've submitted a trouble ticket to osuosl.org). Anyway, the point of using that is to grab just the shared libraries that are actually used by something, rather than all the shared libraries the compiler has access to. But that's an optimization for later.
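
(For reference, the rough shape of that ldd trick is something like the sketch below. It assumes binaries the host's ldd can actually inspect, i.e. the native busybox testsuite case rather than the cross-compiled one, and it skips the dynamic loader itself.)

# Collect every shared library the binaries in the new root actually link against,
# then copy just those into lib.
for FILE in "$ROOT_TOPDIR"/bin/*
do
  ldd "$FILE" 2>/dev/null | sed -n 's/.*=> \(\/[^ ]*\).*/\1/p'
done | sort -u | while read LIB
do
  cp -L "$LIB" "$ROOT_TOPDIR/lib/" || dienow
done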

The new code looks something like this (still debugging):

# If a native compiler exists for this target, use it.

if [ -d "$BUILD/native-compiler-$ARCH" ]
then
  cp -a "$BUILD/native-compiler-$ARCH/." "$ROOT_TOPDIR" || dienow
elif [ ! -z "$BUILD_STATIC" ]
then
  # No native compiler to copy: pull the shared libraries out of the cross compiler.
  mkdir -p "$ROOT_TOPDIR/lib" &&
  path_search "$("$ARCH-cc" --print-search-dirs | sed -n 's/^libraries: =*//p')" \
              "*.so*" 'cp -L "$DIR/$FILE" "$ROOT_TOPDIR/lib/$FILE"' &&
  # Since we're not installing a compiler, delete the example source code.
  rm -rf "$ROOT_TOPDIR"/src/*.c* || dienow
fi

And that replaces LOTS of old existing crap, _and_ it decouples this build from the toolchain (look, no more library path redirection). We're not entirely out of the woods on that front yet, because when the native toolchain _does_ exist it's supplying the shared libraries for the final system and there's currently nothing actually enforcing a match between them, but eh. Worry about that later. Possibly the else goes away and the cp should only happen if the target file doesn't exist yet, something like that. The ldd script from busybox above works in here somehow.

But I am unblocked! I am removing complexity! Ok, some of it's migrated into build.sh and that's bad and I still need to keep chasing it but PROGRESS. There is still mess but it's FLEXIBLE mess, it moves when I push on it, so I'll mold it into something reasonable eventually.

Sigh. Happy.


December 31, 2009

Happy new year. I haven't done anything worth blogging about in a week. (I've twittered, but that's a service _designed_ for having very little to say. Sort of the point of the thing, really.)

I have to move the blog file and start a new one tomorrow. (Actually what I have to move is the "notes.html" symlink to point to the new file. And possibly I should start making the suckers so the date is a link to the individual post...)

Or just use a real blogging service. Can't quite bring myself to make my old livejournal important again now that the russian mafia owns it. But the span tags I've been putting in the entries don't show up in the rss feed, and every time I ponder doing something about that it always brings up the fact there's no comments and no individual entry URLs (beyond the hash tags)...

I dunno, what blogging services out there these days are worth using?


December 28, 2009

Wow, am I an agnostic.

I just added somebody to my spam filter over an argument about a technical topic, and not even because I particularly disagree with him (the approach he takes works fine for his use case). It's that he was arguing that what he'd chosen to do was a One True Way, conflating a half-dozen different choices that are in reality orthogonal, insisting his personal preferences were universal truths... and I just don't want to argue with people like that anymore. It's no fun. Possibly I used up my current stock of patience in last week's LWN thread, dunno.

Christmas came and went, New Year's approaches, it's the holiday week between the two and I've been sleeping for 11 hours a night. Not quite sure why, just tired all the time. (Fade points out the time of year. If I do wind up in Dublin I'm going to need big fluorescent lights, right next to my desk. Yeah, that's still an if.)


December 25, 2009

Merry Christmas! Watched a festive holiday movie (Groundhog Day), opened presents, cooked an enormous roast...

I have been _massively_ slacking off due to the holidays. Haven't made any progress on any computer related things since last week. (In part because my current FWL thingy is trying to cut a gordian knot of complexity that I'm writing an explanation of for the mailing list, but mostly because it's christmas and I just want to relax and enjoy it.)

Wrote up a status report on FWL, and then moved it to the mailing list instead.


December 23, 2009

Sigh. Back in the 90's I found a marvelous orange bread recipe on Ye Olde Internet, which was accidentally vegan. (The whole "that carrot has been murdered, I must go straight edge on you now" aspect was incidental, I liked that it was actually _good_.) I dug the recipe up out of a box again years later and it was still marvelous, but alas it seems long lost since sometime before the move to Pittsburgh.

What I really remember about it is it was trivial to make. The main ingredients were flour and orange juice, with a bit of salt and baking powder/soda. (I still can't quite keep those straight. Possibly both.) Oh, and a teaspoon or so of lemon juice, optional. There might have been a little sugar, although the orange juice had rather a lot already. I do remember that the only liquid ingredient was the juice. (No eggs, butter, oil, shortening, or anything like that.) Also, no yeast. (It was important not to stir it too much, you just fold the mixture over once or twice until it's wet and then _stop_stirring_.)

Unfortunately, attempting to google for this recipe keeps bringing up pages and pages of irrelevant crap. No grated peel, no nuts, no bananas, no pumpkin, no cranberry. No milk, no chunks of fruit.

By the time I've put in all those filters, I think I broke Google's brain. I'm now getting a video recipe for "Frog's Eye Salad", a page about "Catsup Cake Flour" (?), a PDF from a state.tx.us page that claims to be a tax publication but apparently contains a grocery list, a recipe for "Orange Roughy Roll-ups with Creamy Dill Sauce" (if that has any relation to fruit roll-ups, I'm going to pass, thanks), fondue recipes, "Holiday Baking with Whole Grains", and a page from the North Dakota Wheat Commission. That's all in the first two pages of results.

I suspect my best option is to just fiddle around and try to recreate this stuff by trial and error. Except I don't currently have any orange juice.

Wow. I stopped directly replying to His Specialness over on lwn (because it just wasn't accomplishing anything), and then later stopped posting to the lwn.net thread at all (well, except for this) when Jonathan Corbet (editor of Linux Weekly News) asked us to stop. (Then I fell behind on email and twitter for a couple days because it's christmas and I was busy, so I'm just catching up now.)

But other people didn't stop posting, and I still get emailed the replies to replies because I checked the little boxes, and apparently His Specialness has decided that anyone who disagrees with him must really be me in disguise.

I find that hilarious. When have I ever been reluctant to insult him to his face? The only reason I've stopped using His Specialness's name is I think he's a glory hound on the order of those White House party crashers, balloon boy's parents, and that toupee that used to govern Illinois. Getting his name repeated seems to be what he wants, with the context in which it's repeated a very distant secondary consideration. I suspect the only reason he hasn't done one of those reality television shows where you eat a bug on camera is that they wouldn't take him.

LWN doesn't seem to have user profiles, so out of morbid curiosity I spent the 30 seconds to type "site:lwn.net syspig" into google, which found a comment from 2006 on its first page of hits. His Specialness apparently did not bother to do this.

I repeat my comment from the last big post I made to the thread before the moratorium: "I'm tired of arguing with people who don't bother to do their homework". (And I note yesterday's squirrel analogy on the difference between clever and smart.)


December 22, 2009

Yesterday at the mall there were tons of banners for this clear.com thing, and today I clicked on a web ad for 'em, because I'm all for any alternative to Time Warner (which needs to die on general principles because they keep trying to retroactively sneak usage limits into their contracts), and these guys look like they've got reasonable plans with no usage caps... if you can find them.

Unfortunately, their web site was designed by somebody who thinks that clicking through five pages to answer a simple question somehow improves matters, and that all pages should be generated by javascript based on cookies so that when you finally DO dredge an interesting piece of info out of their navigational morass, you can never actually get a URL directly to that page so you can send said URL to someone else. (Oh, and two of the pages I tried to visit had firefox bring up a "redirect loop detected" page, that was nice.)

It's a very _pretty_ website. Lovely plumage.

They had a "live chat" service which was answered by a very nice lady who was very polite, gave prompt replies... and didn't actually have any new information to give me. (She was looking stuff up on the website for me, and apparently she couldn't find some of it either.)

Hopefully if I come back in a month their teething troubles will have worked through, their support people will be more experienced, and maybe they'll have burned their existing website to the ground and replaced it with something that doesn't spend all its effort on being clever and none of it on being smart. (I can hope. When squirrels hide nuts so ingeniously they can't find them again, they're being clever but not smart. It's less cute in a large corporation.)


December 21, 2009

Ok, the FROM_ARCH, FROM_HOST, CROSS_HOST, and CROSS_TARGET stuff is once again TOO CONFUSING. It works, and it was worked out laboriously via extensive trial and error, but it's too nasty and tangled for me to keep it straight in my head, and I wrote it.

Let's see...

if CROSS_HOST is blank, it's $(uname -m)-walrus-linux
if CROSS_TARGET is blank, it's $ARCH-unknown-linux
  if CROSS_TARGET is set by an arch dir (such as powerpc-440fp),
  then FROM_HOST=$CROSS_TARGET
if FROM_ARCH is blank, FROM_ARCH=$ARCH
if FROM_HOST is blank, FROM_HOST=$FROM_ARCH-thingy-linux
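
(In actual shell, those defaults boil down to roughly this; a sketch of the idea, not the literal script.)

[ -z "$CROSS_HOST" ]   && CROSS_HOST="$(uname -m)-walrus-linux"
[ -z "$CROSS_TARGET" ] && CROSS_TARGET="$ARCH-unknown-linux"
# an arch dir like powerpc-440fp can pre-set CROSS_TARGET, and then FROM_HOST=$CROSS_TARGET
[ -z "$FROM_ARCH" ]    && FROM_ARCH="$ARCH"
[ -z "$FROM_HOST" ]    && FROM_HOST="$FROM_ARCH-thingy-linux"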

Ok, FROM_ARCH defaults equal to $ARCH. (Probably $ARCH should just be called $TARGET these days.) This is blanked for the binutils/gcc/ccwrap stages when we're building a simple cross compile. It's set to the host for static builds (currently hardwired to i686), and then set equal to the target arch for native builds.

That's overcomplicated because the toolchain build and root filesystem stuff is glued together. Setting it to the toolchain host for cross and native builds makes sense, but the "default equal to $ARCH, except when it's blank" bit is crazy. There should be some way to signal we're building a simple compiler, but overloading FROM_ARCH seems a bit silly.

FROM_HOST is derived from FROM_ARCH, and overridden by CROSS_TARGET under magic circumstances. That's a mess, and I believe it's a mess for binutils' sake. I vaguely recall much pain the last time I fiddled with this, and this is just where I left off because it was SUCH a mess.

CROSS_HOST exists solely to humor the binutils and gcc builds, because the people who wrote those are crazy. It's always `uname -m`-walrus-linux. It's proof positive that autoconf is useless, because it can't figure out what system it's currently running the build on and must be told, and if I can't figure out how to remove it I think I'm going to hardwire it into the call sites.
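
Hardwiring it into the call sites would mean each binutils/gcc configure line just says something like this (a sketch; the real invocations carry plenty of other flags):

./configure --host="$(uname -m)-walrus-linux" --target="$CROSS_TARGET"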

Sigh. Attempting to clean up the multi-variable mess while separating the toolchain build from root-filesystem.sh (and unify it with cross-compiler.sh) turned into a large mess. I need to do this in stages, but it's all tangled together. Where to start...


Back to 2009