Q: Where do I start?

The project provides development and test environments for lots of different hardware platforms, based on busybox and uClibc and configured to run under QEMU.

Most people want to do one of three things:

  • Download a prebuilt system image, boot it up under the emulator, and compile stuff natively for a target.

    Go to the downloads/binaries directory and grab the appropriate system-image-$TARGET.tar.bz2, extract it, cd into it, and run ./run-emulator.sh to boot it under QEMU.

    Alternately, you can run the script ./dev-environment.sh, a wrapper around run-emulator.sh that feeds QEMU extra options to add more memory (256 megs) and writeable disk space (a blank 2 gigabyte disk image mounted on /home), providing a more capable development environment.

    The system images contain native compiler toolchains. If you install distccd on the host and add the appropriate cross compiler to your host's $PATH, the ./run-emulator.sh script will detect this and set up the system image to call out to the cross compiler through distcc over the virtual network, speeding up native builds significantly.
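
    A rough sketch of that host-side setup (the distcc package name and the cross compiler tarball name are examples; adjust them for your distribution and target):

      # Install distcc on the host (provides the distccd server):
      sudo apt-get install distcc

      # Put the matching cross compiler in the host $PATH:
      tar xvjf cross-compiler-$TARGET.tar.bz2
      export PATH="$PWD/cross-compiler-$TARGET/bin:$PATH"

      # run-emulator.sh should now detect distccd and the cross compiler:
      ./run-emulator.sh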

  • Build your own cross compilers and system images from source, using the build scripts.

    Go to the downloads directory, grab the most recent release tarball, extract it, and run ./build.sh with no arguments to list the available targets. Then run ./build.sh $TARGET to compile the one you want. The results wind up in the "build" directory.

    The build scripts are written in bash, and fairly extensively commented. All the scripts at the top level are designed to be run directly, and build.sh is just a wrapper script that calls them in order. The less commonly used scripts in sources/more are also designed to be run directly.

    A large number of variables can be set to configure the build, either by modifying the file "config" (which documents them all) or by exporting them as environment variables.
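
    For example, to override the number of parallel make processes for a single build (CPUS is one of the variables documented in "config"):

      CPUS=1 ./build.sh $TARGET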

    To grab the latest development version of the build scripts out of the source control system, go to the mercurial archive. If you don't want to install mercurial, you can grab a tarball of the current code at any time.

  • If all else fails, look at the pretty screenshots.

    Q: What's all this source code for?

    A: The basic outline is:

    Q: Didn't this used to be called Firmware Linux?

    A: Yup. The name changed shortly before the 1.0 release in 2010.

    The name "Aboriginal Linux" is based on a synonym for "native", as in native compiling. It implies it's the first Linux on a new system, and also that it can be replaced. It turns a system into something you can do native development in, terraforming your environment so you can use it to natively build your deployment environment (which may be something else entirely).

    Aboriginal Linux is cross compiled, but after it boots you shouldn't need to do any more cross compiling. (Except optionally using the cross compiler as a native building accelerator via distcc.) Hence our motto, "We cross compile so you don't have to".

    The old name didn't describe the project very well. (It also had tens of millions of Google hits, most of which weren't this project.) If you're really bored, there's a page on the history of the project.

    Q: How do I add $PACKAGE to my system image's root filesystem?

    A: Use the rw-system-image tarball instead of the system-image one. This gives you a writeable root filesystem with enough extra space to install your package in.

    Aboriginal Linux builds squashfs images by default, and the prebuilt binary tarballs in the downloads/binaries directory are built with the default values. Squashfs is a read-only compressed filesystem, which means it's pretty durable (you never need to fsck it), but also a bit limiting. The dev-environment.sh script attaches a 2 gigabyte ext2 image to /dev/hdb (which is mounted on /home) so you always have writeable space to build stuff in, but that doesn't let you modify the root filesystem on /dev/hda: you can't install the packages you build into /bin and such on a read-only root filesystem.

    The "SYSIMAGE_TYPE" and "SYSIMAGE_HDA_MEGS" config entries let you change the default system image type generated by the system-image.sh script. You can edit the file "config" or specify them as environment variables, ala:

    SYSIMAGE_TYPE=ext2 SYSIMAGE_HDA_MEGS=2048 ./build.sh $TARGET
    

    That creates a 2 gigabyte ext2 image, which you can boot with the "./run-from-build.sh $TARGET" script and natively install packages into. If you've already built a system image, you can repackage the existing root filesystem by just re-running system-image.sh (instead of the whole build.sh). As always, your new system image is created in the "build" subdirectory.

    Note: since this is a writeable image, you'll have to fsck it occasionally. You can also use "tune2fs -j" to turn it into a journaled ext3 image.
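
    For example, from the host (the image filename here is an assumption; use whatever ext2 file system-image.sh actually produced under "build"):

      e2fsck -f image-$TARGET.ext2     # force a filesystem check
      tune2fs -j image-$TARGET.ext2    # add a journal, turning ext2 into ext3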

    Q: ./run-emulator.sh says qemu-system-$TARGET isn't found, but I installed the qemu package and the executable "qemu" is there. Why isn't this working?

    A: You're using Ubuntu, aren't you? You need to install "qemu-kvm-extras" to get the non-x86 targets.
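
    For example (Ubuntu package names of this era; they may have changed in later releases):

      sudo apt-get install qemu-kvm-extras
      which qemu-system-arm    # the non-x86 emulators should now be in your $PATH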

    The Ubuntu developers have packaged qemu in an actively misleading "interesting" way. They've confused the emulator QEMU with the virtualizer KVM.

    QEMU is an emulator that supports multiple hardware targets, translating the target code into host code a page at a time. KVM stands for Kernel-based Virtual Machine, a kernel module which allows newer x86 chips with support for the "VT" extension to run x86 code in a virtual container.

    The KVM project started life as a fork of QEMU (replacing QEMU's CPU emulation with a kernel module providing VT virtualization support, but using QEMU's device emulation for I/O), but KVM only ever offered a small subset of the functionality of QEMU, and current versions of QEMU have merged KVM support into the base package. (QEMU 0.11.0 can automatically detect and use the KVM module as an accelerator, where appropriate.)

    It's a bit like the X11 project providing a "drm" module (for 3D acceleration and such), which was integrated upstream into the Linux kernel. The Linux kernel was never part of the X11 project, and vice versa, and pretending the two projects were the same thing would be wrong.

    That said, on Ubuntu the "qemu" package is an alias for "qemu-kvm", a package which only supports i386 and x86_64 (because that's all KVM supports when running on an x86 PC). In order to install the rest of qemu (support for emulating arm, mips, powerpc, sh4, and so on), you need to install the "qemu-kvm-extras" package (which despite the name has nothing whatsoever to do with KVM).

    Support for non-x86 targets is part of the base package when you build QEMU from source. If you ignore Ubuntu's packaging insanity and build QEMU from source, you shouldn't have to worry about this strangely named artificial split.
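
    A from-source build is the conventional configure/make dance (a sketch, run from an unpacked QEMU source tree; all the system emulation targets are enabled by default):

      ./configure    # optionally --target-list=arm-softmmu,... to build fewer targets
      make
      sudo make install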

    Q: I added a uClibc patch to sources/patches but it didn't do anything, what's wrong?

    A: Linux filesystems are case sensitive, so the patch's filename has to start with "uClibc-", with a capital C.

    Q: Why did the $NAME package build die with a complaint that it couldn't find $PREREQUISITE, even though that's installed on the host? (For example, distcc and python.)

    A: Because you skipped the host-tools.sh step, and because installing a package on the host isn't the same as installing it on the target.

    Even though host-tools.sh is technically an optional step, your host has to be carefully set up to work without it.

    Not only does host-tools.sh add prerequisite packages your build requires, it _removes_ everything else from the $PATH that might change the behavior of the build. Without this, the ./configure stages of various packages will detect that libtool exists, or that the host has Python or Perl installed, and configure the packages to make use of things that the cross compiler's headers and libraries don't have, and that the target root filesystem may not have installed.
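
    If you're running the top level scripts by hand, make sure host-tools.sh runs before the target stages (a sketch of the order; ./build.sh does this for you):

      ./download.sh        # fetch the source tarballs
      ./host-tools.sh      # populate the sanitized $PATH in build/host
      ./build.sh $TARGET   # or the individual stage scripts build.sh wraps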

    Q: Why isn't the build listening to the environment variables I set?

    Quick answer: export NO_SANITIZE_ENVIRONMENT=1.

    Long answer: you probably deleted the commented-out variables from "config" and then tried to set them on the command line. The environment sanitizing logic has a whitelist of variables, but it also scans "config" for exported variables (whether they're commented out or not) and lets those through from the environment as well. If you remove them from "config", it stops letting them through from the environment.

    Debugging questions

    Q: How do I get better log output from the build?

    Get a verbose, single-processor log of the build output.

    When something goes wrong, re-run your build with a couple extra variables, and log the output with "tee":

    BUILD_VERBOSE=1 CPUS=1 ./build.sh $TARGET 2>&1 | tee out.txt

    The shell has a nice syntax for exporting variables just for a single command: put the command to run after the assignments. That doesn't pollute your environment by leaving CPUS or BUILD_VERBOSE exported; they're exported only for the new "build.sh" process it launches. And redirecting stderr to stdout and piping the result into "tee" captures the output so you can examine it with less or vi.

    BUILD_VERBOSE undoes the "pretty printing" of the Linux kernel and uClibc builds, and makes a few other build steps produce more explicit output.

    CPUS controls the number of tasks make runs in parallel. The default value is the number of processors on the system times 1.5 (so a 4 processor system runs 6 parallel tasks). Making the build single-processor gives you much more readable output: the build stops reliably at the point where it hit the problem, rather than at some random later point that forces you to scroll back quite a ways to find the error, and the output of multiple parallel commands doesn't get interleaved.

    Use the command logging wrapper

    If you need more logging detail, run more/record-commands.sh, then re-run the build and look at the output in build/logs.

    The record-commands script sets up a wrapper which logs every command (and all its arguments) run out of $PATH. It populates build/wrapdir with symlinks for every command name currently in $PATH, all pointing at the "wrappy" binary (built from sources/toys/wrappy.c). If you run record-commands before host-tools.sh it wraps the host $PATH; if you run it afterwards it wraps the sanitized $PATH in build/host.

    The wrappy binary depends on two environment variables (set up by sources/include.sh): $WRAPPY_LOGPATH is an absolute path to the current log file (updated by the "setupfor" function) and $OLDPATH is the $PATH to exec the real command out of after appending the current command line to the log.

    The script "more/report-recorded-commands.sh" prints out a list of all commands used by each build stage. (Comparing the host-tools version to a run without host-tools can be instructive; that's the extra stuff ./configure is picking up out of the host environment.)

    Q: How do I play around with package source code?

    The source code used by package builds lives in several directories, each with a different purpose:

    Downloading

    The list of source URLs is in the script download.sh, along with a list of mirrors to check if the original URL isn't available. Those URLs are the only place that specifies version numbers for packages, so if you want to switch versions, just point to a new URL and re-run download.sh. (You can leave SHA1= blank for the first download, and the script will output the sha1sum of the file it downloads. Cut and paste that into the download script and re-run it to confirm.)
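
    An entry in download.sh looks roughly like this (the URL is illustrative; see the real script for the exact format):

      URL=http://example.com/mypackage-1.2.3.tar.bz2 \
      SHA1= \
      download || dienow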

    Extracting and patching

    Each script that builds a package calls the shell function "setupfor" before building it, and "cleanup" afterwards. Conceptually, "setupfor" extracts a tarball (from the "packages" directory), patches it if necessary (applying all the files in "sources/patches" that start with that package's name, which come from the Aboriginal Linux repository), and cd's into the resulting directory. The function "cleanup" does an "rm -rf" on that directory when you're done.

    In practice, the infrastructure behind the scenes caches the extracted tarballs. This optimization saves disk space, CPU time, and I/O bandwidth, speeding up builds considerably (especially when you do a lot of them in parallel). This optimization is designed to be easily ignored, but understanding the infrastructure can be useful for debugging.

    There are two places to look for extracted source packages: the package cache and the working copy. The package cache (in "build/packages") contains clean copies of all the previously extracted source tarballs, with patches already applied. Each working copy (in an architecture's temporary directory, "build/temp-$ARCH") is a tree of hardlinks to the package cache that provides a directory in which to configure, build, and install that package for a specific target.

    The source in the package cache stays clean, can be re-used across multiple builds, and is only used to create working copies. Working copies fill up with temporary files from configure/make/install, and are normally deleted after each successful build. If you want to look at clean source, you want the package cache. If you want to look at the state of a failed build to see how it was configured or re-run portions of it, you want the working copy.
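
    Concretely, using the directory names above:

      ls build/packages      # the package cache: clean, patched source
      ls build/temp-$ARCH    # the working copies for one target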

    Q: What's the package cache for?

    The package cache contains clean architecture-independent source code, which you can edit, use to run modified builds and create patches, and easily revert to its original condition. The package cache avoids re-extracting the same tarballs over and over, but also provides a place you can make temporary modifications to that source behind the build system's back without having to mess around with tarballs or patch files.

    The setupfor function calls "extract_package" to populate the package cache. First extract_package checks for an existing copy of the appropriate source directory, and when it doesn't find one it extracts the source tarballs from the "packages" directory, applies the appropriate patches from "sources/patches/$PACKAGENAME-*.patch", and saves the results into its own directory (named after the package) under "build/packages". (USE_UNSTABLE packages work the same way, but insert an "alt-" prefix on the package name.)

    When the package cache has an existing copy of the package, extract_package checks the list of sha1sums in that copy's "sha1-for-source.txt" file against the sha1sums for the tarball and for each of the patch files it needs to apply. If the list matches, it uses the existing copy. If it doesn't match, it deletes the existing copy out of the package cache, re-extracts the tarball, and reapplies each patch to it.

    This means you can edit the source in the package cache all you like, and as long as you don't modify sha1-for-source.txt, don't replace the tarball, and don't add/remove/edit any of the patches that apply to it, subsequent builds will re-use your modified source. So go ahead and fill it full of printf()s and test code; when you want to go back to a clean copy, delete it from the build/packages directory (either one package or the whole thing) and let setupfor recreate it.
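
    For example, to throw away your modifications to a single package (the next build's setupfor re-extracts and re-patches it):

      rm -rf build/packages/$PACKAGE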

    If you come up with changes you want to keep, you can create a patch from the package cache this way:

      # Rename the modified package directory
    
      cd $TOP
      cd build/packages
      mv $PACKAGE $PACKAGE.bak
    
      # Extract a clean copy
    
      cd $TOP
      more/test.sh host extract_package $PACKAGE
    
      # Diff the two and write out the patch to sources/patches
    
      cd build/packages
      diff -ruN $PACKAGE $PACKAGE.bak > ../../sources/patches/$PACKAGE-$NAME.patch
      rm -rf $PACKAGE
    
      # Run a clean test build (setupfor re-extracts the package, picking up
      # the new patch from sources/patches)

      cd $TOP
      ./build.sh $ARCH
    

    Where $TOP is your top level Aboriginal Linux directory, $PACKAGE is the name of the package you're modifying, $NAME is some unique name for your patch, and $ARCH is the target you're rebuilding. Don't forget to delete the $PACKAGE.bak directory to reclaim its disk space once you're satisfied with your patch (or "rm -rf build/packages" to zap the entire package cache, or just "rm -rf build" to clean up all the temporary files).

    If the environment variable EXTRACT_ALL is set, download.sh will call extract_package on each package as soon as it confirms the tarball's sha1sum. (The environment variable FORK makes each package download happen in parallel, including the call to extract_package if any.) Prepopulating the package cache this way is useful before running different architecture builds in parallel, or when testing that new patches (added to the sources/patches directory) apply correctly to the relevant package(s).

    This means you can do the following to get a freshly extracted and patched clean copy of all packages:

      rm -rf build/packages
      EXTRACT_ALL=1 ./download.sh
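
    To also run the downloads (and the corresponding extracts) in parallel, add the FORK variable mentioned above:

      rm -rf build/packages
      FORK=1 EXTRACT_ALL=1 ./download.sh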
    

    Q: What are working copies for?

    Working copies are target-specific copies of package source where builds actually happen. The build scripts clone a fresh working copy for each build, then run configure, make, and install commands in the new copy. They leave the aftermath of failed builds lying around for analysis; to keep the working copies of successful builds around too, set the NO_CLEANUP environment variable. If you want to cd into a source directory and re-run bits of a previous build, use the working copy of a package's source. (You'll probably have to add the appropriate cross compiler's bin directory to your $PATH, but otherwise it'll usually just work.)

    Working copies of source packages are cloned from the package cache by the function "setupfor", which first calls extract_package to ensure the package cache is up to date, then creates a directory of hardlinks to the package cache via "cp -l" (or symlinks via "cp -s" if $SNAPSHOT_SYMLINK is set).

    The working copies use hardlinks to avoid creating redundant copies of the file contents, which would waste I/O bandwidth and eat lots of disk space and disk cache memory. Using hardlinks instead of symlinks for the working copies also saves inodes and dentry cache, since each symlink consumes an inode, but that optimization requires that the package cache and working copies be on the same filesystem.

    Linking to the package cache instead of copying it doesn't cause problems for most packages, because the methods of modifying files most package builds use break hardlinks and symlinks: they create a temporary copy with the modifications, then delete the original and move the copy into its place. (Modifying files tracked by source control in place would also create spurious noise for the package's developers.) Occasionally a package gets this wrong (such as zlib 1.2.5 shipping a Makefile which is generated by configure, and then modified in place), in which case the build has to break the link itself. (Note that editing the working copies of source files in build/temp-$ARCH can modify the cached copy if your editor isn't configured to break hardlinks. Usually you edit the package cache version and let setupfor create a new working copy.)

    If you want to search just the generated files and not the snapshot of the source, use "find $PACKAGE -links 1". If you want to search just the source files and not the generated files, that's what the package cache is for.
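
    For example, to see what a build created or changed in its working copy (a file with a link count of 1 is no longer hardlinked to the package cache):

      cd build/temp-$ARCH
      find $PACKAGE -links 1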

    TODO:
      - more/test.sh ARCH build_section thingy