Documentation for Firmware Linux

Note, this documentation is currently under construction. This is three files concatenated together, with their contents in the process of being reorganized and rewritten. Some of it's up to date, some isn't.


What is Firmware Linux?

Firmware Linux is an embedded Linux build system, designed to replace cross compiling with native compiling under emulation. It provides an easy way to get started with embedded development, building your own code against uClibc, and testing it on non-x86 hardware platforms.

This documentation uses the name "Firmware Linux" (or the abbreviation "FWL") to refer to the build system, and calls the output of the build a "system image". The build system is implemented as a series of bash scripts and configuration files which compile a Linux development system for the specified target and package it into a bootable binary image.

These system images provide a simple native Linux development environment for a target, built from seven source packages: busybox, uClibc, gcc, binutils, make, bash, and the Linux kernel. This is the smallest environment that can rebuild itself entirely from source code, and thus the minimum a host system must cross compile in order to create a fully independent native development environment for a target.

Booting a development system image under an emulator such as QEMU allows fully native builds for supported target platforms to be performed on cheap and powerful commodity PC hardware. Building and installing additional packages (zlib, bison, openssl...) within a system image can provide an arbitrarily complex native development environment, without resorting to any additional cross compiling.

FWL currently includes full support for arm, mips, powerpc, x86 and x86-64 targets, and partial support for sh4 and sparc. The goal for the FWL 1.0 release is to support every target QEMU can emulate in "system" mode.

Firmware Linux is licensed under GPL version 2. Its component packages are licensed under their respective licenses (mostly GPL and LGPL).

Optional extra complexity

Intermediate stages of the build (such as the cross compiler and the unpackaged root filesystem directory) may also be useful to Linux developers, so tarballs of them are saved during the build.

By default the build cross-compiles some optional extra packages (toybox, distcc, uClibc++) and preinstalls them into the target filesystem. This is just a convenience; these packages build and install natively within the minimal development system image just fine.


Using system images

If you want to jump straight to building your own software natively for embedded targets, you can download a prebuilt binary image instead of running the build scripts to produce your own.

Here are the different types of output produced by the build:

system-image-*.tar.bz2

System images boot a complete Linux system under an emulator. Each system-image tarball contains an ext2 root filesystem image, a Linux kernel configured to run under the emulator QEMU, and a run-emulator.sh script.

The steps to test boot a system image under qemu 0.9.1 are:
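In outline (the exact tarball and directory names depend on which target and release you grabbed; armv4l is used here as an example, and QEMU itself must already be installed on the host):

tar xvjf system-image-armv4l.tar.bz2
cd system-image-armv4l
./run-emulator.sh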

This boots the system image under the appropriate emulator, with the emulated Linux's /dev/console hooked to stdin and stdout of the emulator process. (I.E. the shell prompt the script gives you after the boot messages scroll past is for a shell running inside the emulator. This lets you pipe the output of other programs into the emulator, and capture the emulator's output.)

Type "cat /proc/cpuinfo" to confirm you're running in the emulator, then play around and have fun. Type "exit" when done.

Inside a system image, you generally wget source code from some URL and compile it. (For example, you can wget the FWL build, extract it, and run it inside one of its own system images to trivially prove it can rebuild itself.) If you run a web server on your host's loopback interface, you can access it inside QEMU using the special address "10.0.2.2". Example build scripts are available in the /usr/src directory.
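For example, a native build session inside the emulator might look something like this (a sketch; the package name and the web server on the host's loopback interface are placeholders):

# inside the emulator: fetch a tarball from a web server running on the host
wget http://10.0.2.2/hello-1.0.tar.gz
tar xvzf hello-1.0.tar.gz
cd hello-1.0
./configure && make && make install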

Extra space and speed

The system images by themselves are fairly small (64 megabytes), and don't have a lot of scratch space for building or installing other packages. If a file named "hdb.img" exists in the current directory, run-emulator.sh will automatically designate it as a second virtual hard drive and attempt to mount the whole unpartitioned device on /home inside the emulator.

Some optional command line arguments to run-emulator.sh provide extra space and extra speed for compiling more software:

Running an armv4l system image with the cross compiler installed in the user's home directory, using a hard drive image in the user's home directory (to be created with a size of 2 gigabytes if it doesn't already exist) might look like:

./run-emulator.sh --make-hdb 2048 --with-hdb ~/blah.img --with-distcc ~/cross-compiler-armv4l

mini-native-*.tar.bz2

These tarballs contain the same root filesystem as the corresponding system images, just in an archive instead of packaged into a filesystem image.

If you want to boot your own system image on real hardware instead of an emulator, the appropriate mini-native tarball is a good starting point. If all you want is a native uClibc development environment for your host, try:

chroot mini-native-x86_64 /usr/chroot-setup.sh

The boot script /usr/qemu-setup.sh or /usr/chroot-setup.sh performs minimal setup for the appropriate environment, mounting /proc and /sys and such. It starts a single shell prompt, and automatically cleans up when that process exits.

If you're interested in building a more complex development environment within this one (adding zlib and perl and such before building more complicated packages), the best way to learn how is to read Linux From Scratch.

Note that mini-native is just one potential filesystem layout; the FWL build scripts have several other configurations available when you build from source.

cross-compiler-*.tar.bz2

The cross compilers created during the FWL build are relocatable C compilers for each target platform. The primary reason for offering each cross compiler as a downloadable binary is to implement the distcc accelerator trick. Using them to cross compile additional software is supported, but not recommended.

If you'd like to use one for something other than distcc, this documentation mostly assumes you already know how. Briefly:
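Extract the tarball, add its bin subdirectory to your $PATH, and call the $TARGET-prefixed tools; for example (a sketch, using armv4l as the example target):

tar xvjf cross-compiler-armv4l.tar.bz2
export PATH="$(pwd)/cross-compiler-armv4l/bin:$PATH"
armv4l-gcc -static hello.c -o hello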

Also, stock up on aspirin and clear a space to beat your head against; you'll need both. See why cross compiling sucks for more details.

Note that although this cross compiler has g++, it doesn't have uClibc++ in its lib or include subdirectories, which is required to build most C++ programs. If you need extra libraries, it's up to you to cross-compile and install them into those directories.

How do I build my own customized system images from source code?

To build your own root filesystem and system images from source code, download and run the FWL build scripts. You'll probably want to start with the most recent release version, although once you've got the hang of it you might want to follow the development version.

For a quick start, download the tarball, extract it, cd into it, and run "./build.sh". This script takes one argument, which is the target to build for. Run it with no arguments to list available targets.

This should produce all the tarballs listed in the previous section in the "build" directory. To perform a clean build, "rm -rf build" and re-run build.sh.
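For example, from the top level directory of an extracted release (the target name here is just an example):

./build.sh          # with no arguments: list the available targets
./build.sh armv4l   # build the cross compiler, mini-native, and system image for armv4l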

How building from source works

The build system is a series of shell scripts which download, compile, install, and use the appropriate source packages to generate a system image. These shell scripts are designed to be easily read and modified, acting both as tools to perform a build and as documentation on how to build these packages.

The build.sh script is a simple wrapper which calls the following other scripts in sequence:

  1. download.sh
  2. host-tools.sh
  3. cross-compiler.sh $TARGET
  4. mini-native.sh $TARGET
  5. package-mini-native.sh $TARGET

In theory, the stages are orthogonal. If you have an existing cross compiler, you can add it to the $PATH and skip cross-compiler.sh. Or you can use _just_ cross-compiler.sh to create a cross compiler, and then go build something else with it. The host-tools.sh stage can often be skipped entirely.

Build stages

The following files control the individual stages of the build. Each may be called individually from the top level directory of FWL:

  • cross-compiler.sh - Build a cross compiler for the target, for use by mini-native.sh and the distcc accelerator.

    In order to build binaries for the target, the build must first create a cross compiler to build those target binaries with. This script creates that cross compiler. If you already have a cross compiler, you can supply it here (the easy way is to create a build/cross-compiler-$TARGET/bin directory and put "$TARGET-gcc" style symlinks in it) and skip this step.

    This script takes one argument: the architecture to build for. It produces a cross compiler that runs on the host system and produces binaries that run on the target system. This cross compiler is created using the source packages binutils, gcc, uClibc, the Linux kernel headers, and a compiler wrapper to make the compiler relocatable.

    The reason for the compiler wrapper is that by default, gcc hardwires lots of absolute paths into itself, and thus only runs properly in the directory it was built in. The compiler wrapper rewrites its command line to prevent gcc from using its built-in (broken) path logic.

    The build requires a cross-compiler even if the host and target system use the same processor because the host and target may use different C libraries. If the host has glibc and the target uses uClibc, then the (dynamically linked) target binaries the compiler produces won't run on the host. (Target binaries that won't run on the host are what distinguishes cross-compiling from native compiling. Different processors are just one reason for it: glibc vs uClibc is another, ELF vs binflat or a.out executable format is a third...)

    This script produces a working cross compiler in the build directory, and saves a tarball of it as "cross-compiler-$TARGET.tar.bz2" for use outside the build system. This cross compiler is fully relocatable (because of the compiler wrapper), so any normal user can extract it into their home directory, add cross-compiler-$TARGET/bin to their $PATH, and run $TARGET-gcc to create target binaries.

  • mini-native.sh - Use the cross compiler to create a minimal native build environment for the target platform.

    This script takes one argument: the architecture to build for.

    This script uses the cross compiler found at build/cross-compiler-$ARCH/bin (with $ARCH- prefixes) to build a root filesystem for the target, as well as a target Linux kernel configured for use with qemu. A usable cross compiler is left in the build directory by the cross-compiler.sh script, or you can install your own.

    The basic root filesystem consists of busybox and uClibc. If the configuration variable NATIVE_TOOLCHAIN is set (this is enabled by default), this script adds a native compiler to the target, consisting of linux kernel headers, gcc, binutils, make, and bash. It also adds distcc to potentially distribute work to cross compilers living outside the emulator. This provides a minimal native development environment, which may be expanded by building and installing more packages under the existing root filesystem.

  • package-mini-native.sh - Create an ext2 filesystem image of the native root filesystem.

    This script takes one argument: the architecture to package.

    This uses genext2fs to create an ext2 filesystem image from the build/mini-native-$ARCH directory left by running mini-native.sh, and creates a system-image tarball containing the result. It first compiles genext2fs and adds it to build/host if the host system hasn't already got a copy.

    This script also generates a run-emulator.sh script to call the appropriate emulator, using the architecture's configuration information.

  • run-from-build.sh - Runs a system image you compiled from source.

    Calls run-emulator.sh in the appropriate build/system-image-$TARGET directory, with a 2 gigabyte hdb.img for /home and distcc connected to build/cross-compiler-$TARGET. Between runs it calls e2fsck on the system image's root filesystem.

    This is not technically a build stage, as it isn't called from build.sh, but it's offered as a convenience for users. It uses the existing cross-compiler and system-image directories in build/ and doesn't mess with the tarballs that were created from them.

  • The following generally aren't called directly, but are used by the rest of the build.


    How is Firmware Linux implemented?

    Directory layout

    The top level directory of FWL contains the user interface of FWL (scripts the user is expected to call or edit), and the "sources" directory containing code that isn't expected to be directly called by end users.

    Important directories under sources include:

    Output files from running the build scripts, and all temporary files, go in the "build" subdirectory. This entire directory can be deleted between builds.

    Shared infrastructure

    The top level file for the behind-the-scenes plumbing is sources/include.sh. This script is not run directly, but is instead included from the other scripts. It does a bunch of things:

    It also reads sources/functions.sh, which provides shell functions used by the rest of the build, including:

    Most of what include.sh does is optional. A lot of it's there to speed up and simplify the rest of the build. (You don't really need to call "make -j" for parallel builds, and can re-extract and re-patch source code each time you need it rather than caching the extracted version.) Most of the rest is error checking, from "dienow" to the sha1sum checking in "download".

    None of this should be important to understanding how to build or install any of the actual packages. It just abstracts away repetitive, uninteresting bits.

    Downloading source code

    The FWL source distribution does not include any third party source tarballs. Instead, these are downloaded by running download.sh, which calls the shell function download, which calls wget as necessary. The download.sh script is just a series of calls to the download function.

    The first thing download.sh does is check for the --extract option and, if it's present, set the environment variable EXTRACT_ALL, which tells each call to the download function to call the extract shell function on the appropriate tarball to prepopulate the source cache. (See "Extracting source code", below.)

    Each call to the download function is controlled by the following environment variables:

    At the end of download.sh is a call to the shell function cleanup_oldfiles, which deletes unused files. include.sh snapshots the current time in the variable $START_TIME, and download calls "touch" to update the timestamp on each file whose sha1sum it verifies. Then cleanup_oldfiles deletes every file from sources/packages with a date older than $START_TIME.
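    The mechanism is roughly equivalent to the following sketch (a conceptual illustration, not the actual include.sh/functions.sh code):

    START_TIME=$(date +%s)                 # snapshotted once by include.sh
    touch "sources/packages/$FILENAME"     # download does this for each tarball it verifies
    # cleanup_oldfiles: delete anything not touched since the build started
    for i in sources/packages/*
    do
      [ "$(date -r "$i" +%s)" -lt "$START_TIME" ] && rm "$i"
    done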

    Note that download also updates the timestamp on a stable package when downloading the corresponding unstable package, so cleanup_oldfiles won't delete it. In this special case the stable tarball isn't considered an "unused file", but download won't verify its integrity or fetch it if it's not already there. If a package is not in the USE_UNSTABLE list, download.sh won't update the timestamp on its unstable source tarball, leaving it marked as unused and thus deleted by cleanup_oldfiles.

    Extracting source code

    The function "setupfor" extracts sources/packages/$PACKAGENAME-* tarballs. (If $PACKAGENAME is found in the comma separated $USE_UNSTABLE list, the build adds an "alt-" prefix to the package name.) This populates a corresponding directory under build/sources, and applies all the sources/patches/$PACKAGENAME-*.patch files in alphabetical order. (So if a package has multiple patches that need to be applied in a specific order, name them something like "bash-001-dothingy.patch", "bash-002-next.patch" to control this.)

    The trailing "-" before filename wildcards prevents collisions between things like "uClibc" and "uClibc++". Packages are allowed to contain dashes (such as gcc-core), but cannot have a digit immediately after the dash.

    FWL implements source caching. The first call to setupfor extracts the package into build/sources, and then creates a directory of hard links in the current target's build/temp-$TARGET directory with cp -lfR. Later setupfor calls just create the directory of hard links from the existing source tree. (This is a hybrid approach between building "out of tree" and building in-tree.)

    The ./download.sh --extract option prepopulates the source cache, extracting and patching each source tarball. This is useful for scripts such as sources/build-all-targets.sh which perform multiple builds in parallel.

    The reason for keeping extracted source tarballs around is that extracting and patching tarballs is a fairly expensive operation, which uses a significant amount of disk space and doesn't parallelize well. (It tends to be disk limited as much as CPU limited, so trying for more parallelism wouldn't necessarily help.) In addition, the same packages are repeatedly extracted: the cross-compiler and mini-native stages use many of the same packages, and some packages (such as the Linux kernel) are extracted and removed repeatedly to grab things like kernel headers separately from actually building a bootable kernel. (Also, different architectures build the exact same packages, with the same set of patches. Even patches to fix a bug on a single architecture are applied for all architectures; if this causes a problem, it's not a mergeable patch capable of going upstream.)

    Building host tools

    The host-tools.sh script sets up the host environment. Usually the host environment is already in a usable state, but this script explicitly enumerates exactly what we need to build, and provides our own (known) versions of everything except the host compiler toolchain in the directory build/host. Once we've finished, the $PATH can be set to just that directory.

    The build calls seven commands from the host compiler toolchain: ar, as, nm, cc, gcc, make, and ld. All of those have to be in the $PATH, so host-tools.sh creates symlinks to the versions found in the original $PATH.
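    Conceptually, that step amounts to something like the following (a simplified sketch, not the actual host-tools.sh code):

    mkdir -p build/host
    for i in ar as nm cc gcc make ld
    do
      ln -s "$(which $i)" build/host/$i
    done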

    Next host-tools.sh builds toybox for the "patch" command, because busybox patch can't handle offsets and is thus extremely brittle in the face of new package versions. (Offsets are different from "fuzz factor", which removes context lines to find a place to insert a patch, and tends to break a lot.) If USE_TOYBOX is enabled, a defconfig toybox is used and all of its commands are installed.

    Next host-tools builds a "defconfig" busybox and installs it into build/host. This provides all the other commands the build needs.

    What's the minimum the build actually needs?

    When building a new system, environmental dependencies are a big issue. Figuring out what package needs what, and what order to build things in, is the hardest part of putting together a system.

    Running the build without build/host calls lots of extra commands, including perl, pod2man, flex, bison, info, m4, and so on. This is because the ./configure stages of the various packages detect optional functionality, and use it. One big reason to limit the build environment is to consistently produce the same output files, no matter what's installed on the host.

    The minimal list of commands needed to build a working system image is 1) a working toolchain (ar, as, nm, cc, gcc, make, ld), 2) /bin/bash (and a symlink /bin/sh pointing to it), 3) the following command line utilities in the $PATH:

    awk basename bzip2 cat chmod chown cmp cp cut date dd diff dirname echo egrep env expr find grep gzip hostname id install ln ls mkdir mktemp mv od patch pwd readlink rm rmdir sed sha1sum sleep sort tail tar touch tr true uname uniq wc which whoami xargs yes

    These commands are supplied by current versions of busybox.

    Bash has been the standard Linux shell since before the 0.0.1 release in 1991, and is installed by default on all Linux systems. (Ubuntu broke its /bin/sh symlink to point to the Defective Annoying SHell, so many scripts call #!/bin/bash explicitly now rather than relying on a broken symlink.) We can't stop the build from relying on the host version of this tool; editing $PATH has no effect on the #!/bin/bash lines of shell scripts.

    The minimal set of commands necessary to build a system image was determined experimentally, by running a build with $RECORD_COMMANDS and then removing commands from the list and checking the effect this had on the build. (Note that the minimal set varies slightly from target to target.)

    $RECORD_COMMANDS tells host-tools.sh to set up a logging wrapper that intercepts each command line in the build and writes it to a log file, so you can see what the build actually uses. (Note that when host-tools.sh sets up build/wrapper, it doesn't set up build/host, so the build still uses the host system's original command line utilities instead of building busybox versions. If you'd like to record the build using build/host commands, run host-tools.sh without $RECORD_COMMANDS set and then run it again with $RECORD_COMMANDS to set up the logging wrapper pointing to the busybox tools.)

    The way $RECORD_COMMANDS works is by building a logging wrapper (sources/toys/wrappy.c) and populating a directory (build/wrapper) with symlinks to that logging wrapper for each command name in $PATH. When later build stages run commands, the wrapper appends the command line to the log file (specified in the environment variable $WRAPPY_LOGPATH, which host-tools.sh sets to "$BUILD/cmdlines.$STAGE_NAME.$PACKAGE_NAME"), recording each command run. The logging wrapper then searches $WRAPPY_REALPATH to find the actual command to hand its command line off to.
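    Using it might look something like this (a sketch; the exact log file names come from the stage and package being built):

    ./host-tools.sh                       # optional: build the busybox tools first
    RECORD_COMMANDS=1 ./host-tools.sh     # set up the logging wrapper
    RECORD_COMMANDS=1 ./build.sh armv4l   # run the build through the wrapper
    # list every distinct command name the build invoked
    awk '{print $1}' build/cmdlines.* | sort -u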

    Building a cross compiler

    We cross compile so you don't have to. The point of this project is to make cross compiling go away, but you need to do some to get past it. So let's get it over with.

    The cross-compiler.sh script builds a cross compiler. Its output goes into build/cross-compiler-$TARGET directory, which is deleted at the start of the build if it already exists, so re-running this script always does a clean build.

    Creating a cross compiler is a five step process:
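      1. binutils (cross assembler and linker)
      2. gcc (the cross compiler itself)
      3. the sanitized Linux kernel headers (via make headers_install)
      4. uClibc (built against those headers with the new compiler)
      5. the compiler wrapper (which makes the result relocatable)

    (This list is assembled from the package list in the cross-compiler.sh description above; the script itself is the authority on the exact order and configuration of each step.)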

    Afterwards the build strips some of the binaries, tars up the result, and performs some quick sanity tests (building dynamic and static versions of hello world). If the target configuration lists a version of QEMU to test individual binaries under on the host, it runs the static version to make sure it outputs "Hello world".


    Building a minimal native development environment for the target system

    The mini-native.sh script uses the cross compiler from the previous step to build a kernel and root filesystem for the target. The resulting system should boot and run under an emulator, or on real target hardware.

    If you really want to learn how to cross compile a target system, this is the script you want to read, and possibly append your own packages to. That said: please don't, and here's why:

    Because cross-compiling is persnickety and difficult, we do as little of it as possible. This script should perform all the cross compiling anyone ever needs to do. It uses the cross-compiler to generate the simplest possible native build environment for the target which is capable of rebuilding itself under itself.

    Anything else that needs to be built for the target can then be built natively, by running this kernel and root filesystem under an emulator and building new packages there, bootstrapping up to a full system if necessary. The emulator we use for this is QEMU. Producing a minimal build environment powerful enough to boot and compile a complete Linux system requires seven packages: the Linux kernel, binutils, gcc, uClibc, BusyBox, make, and bash. We build a few more than that, but those are optional extras.

    This root filesystem can also be packaged using the Linux From Scratch /tools directory approach, staying out of the way so the minimal build environment doesn't get mixed into the final system, by setting the $NATIVE_TOOLSDIR environment variable. If you don't know why you'd want to do that, you probably don't want to.

    In either configuration, the main target directory the build installs files into is held in the environment variable "$TOOLS". If $NATIVE_TOOLSDIR is set this will be "/tools" in the new root filesystem, otherwise it'll be "/usr".

    The steps the script goes through are:

    In theory, you can add more packages to mini-native.sh, or run another similar script to use the cross compiler to produce output into the mini-native directory. In practice, this is not recommended. Cross compiling is an endless sinkhole of frustration, and the easiest way to deal with it is not to go there.

    Packaging up a system image to run under emulation

    The package-mini-native.sh script packages a system image for use by QEMU. Its output goes into build/system-image-$TARGET directory, which is deleted at the start of the build if it already exists, so re-running this script always does a clean build.

    The steps here are:

    Running on real hardware

    To run a system on real hardware (not just under an emulator), you need to do several things. Dealing with myriad individual devices is beyond the scope of this project, but the general theory is:

    Speeding up emulated builds (the distcc accelerator trick)

    Cross compiling is fast but unreliable. The ./configure stage is designed wrong (it asks questions about the host system it's building on, and thinks the answers apply to the target binary it's creating).


    Why do things this way

    UNDER DEVELOPMENT

    This entire section is a dumping ground for historical information. It's incomplete, lots of it's out of date, and it hasn't been integrated into a coherent whole yet. What is here is in no obvious order.

    Why cross compiling sucks

    Cross compiling is fast but unreliable. Most builds go "./configure; make; make install", but the entire ./configure stage is designed wrong for cross compiling: it asks questions about the host system it's building on, and thinks the answers apply to the target binary it's creating.

    Build processes often create temporary binaries which run during the build (to generate header files, parse configuration information a la kconfig, various "makedep" style dependency generators...). These builds need two compilers, one for the host and one for the target, and need to keep straight when to use each one.

    Cross compilers leak host data, falling back to the host's headers and libraries if they can't find the target files they need.

    TODO: finish this.

    Speeding up emulated builds (the distcc accelerator trick)

    TODO: FILL THIS OUT

    The basic theory

    The Linux From Scratch approach is to build a minimal intermediate system with just enough packages to be able to compile stuff, chroot into that, and build the final system from there.

    This approach completely isolates the host from the target, which means you should be able to run the FWL build under a wide variety of Linux distributions, and since the final system is built with a known set of tools you should get a consistent result. It also means you could run a prebuilt system image under a different host operating system entirely (such as MacOS X, or an arm version of linux on an x86-64 host) as long as you have an appropriate emulator.

    A minimal build environment consists of a compiler, command line tools, and a C library. In theory you just need three packages:

    Unfortunately, that doesn't work yet.

    Some differences between theory and reality.

    We actually need seven packages (linux, uClibc, busybox, binutils, gcc, make, and bash) to create a working build environment. We also add an optional package for speed (distcc), and use two more (genext2fs and QEMU) to package and run the result.

    Environmental dependencies.

    Environmental dependencies are things that need to be installed before you can build or run a given package. Lots of packages depend on things like zlib, SDL, texinfo, and all sorts of other strange things. (The GnuCash project stalled years ago after it released a version with so many environmental dependencies it was virtually impossible to build or install. Environmental dependencies have a complexity cost, and are thus something to be minimized.)

    A good build system will scan its environment to figure out what it has available, and disable functionality that depends on anything that isn't. (This is generally done with autoconf, which is disgusting but suffers from a lack of alternatives.) That way, the complexity cost is optional: you can build a minimal version of the package if that's all you need.

    A really good build system can be told that the environment it's building in and the environment the result will run in are different, so just because it finds zlib on the build system doesn't mean that the target system will have zlib installed on it. (And even if it does, it may not be the same version. This is one of the big things that makes cross-compiling such a pain. One big reason for statically linking programs is to eliminate this kind of environmental dependency.)

    The Firmware Linux build process is structured the way it is to eliminate as many environmental dependencies as possible. Some are unavoidable (such as C libraries needing kernel headers or gcc needing binutils), but the intermediate system is the minimal fully functional Linux development environment we currently know how to build, and then we switch into that and work our way back up from there by building more packages in the new environment.

    Resolving environmental dependencies.

    To build uClibc you need kernel headers identifying the syscalls and such it can make to the OS. We get them from the Linux kernel source tarball, using the "make headers_install" infrastructure created by David Woodhouse. This runs various scripts against the Linux kernel source code to sanitize the kernel's own headers for use by userspace. (This was merged in 2.6.18-rc1, and was more or less debugged by 2.6.19.)
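    For reference, installing the sanitized headers from a kernel source tree looks roughly like this (the destination path is illustrative):

    make ARCH=$KARCH headers_install INSTALL_HDR_PATH=/path/to/staging/usr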

    We install bash because the busybox shell situation is a mess. Busybox has several different shell implementations which share little or no code. (It's better now than it was a few years ago, but thanks to Ubuntu breaking the #!/bin/sh symlink with the Defective Annoying SHell, many scripts point explicitly at #!/bin/bash and BusyBox can't use that name for any of its shells yet.)

    Most packages expect gcc. The gnu compiler "toolchain" actually consists of three packages (binutils, gcc, and make). (The split between binutils and gcc is for purely historical reasons, and you have to match the right versions with each other or things break.)

    Adding an SUSv3 make implementation to busybox or toybox isn't a major problem, but until a viable GCC replacement emerges there's not much point.

    None of the other compilers under development are a drop-in replacement for gcc yet, especially for building the Linux kernel (which makes extensive use of gcc extensions). Intel's C Compiler implemented the necessary gcc extensions to build the Linux kernel, but it's a closed source package only supporting x86 and x86-64 targets. Since the introduction of C99, the Linux kernel has replaced many of these gcc extensions with equivalent C99 idioms, so in theory building the Linux kernel with other compilers is now easier.

    With the introduction of GPLv3, the Free Software Foundation has pissed off enough people that work on an open source replacement for gcc is ongoing on several fronts. The most promising is probably PCC, which is supported by what's left of the BSD community. Apple sponsors another significant effort, LLVM/Clang. Both are worth watching.

    Several others (Such as TinyCC and Open Watcom) once showed promise but have been essentially moribund since about 2005, which is when compilers that only ran on 32 bit hosts and supported C89 stopped being interesting. (A significant amount of effort is required to retool an existing compiler to cleanly run on an x86-64 host and support the full C99 feature set, let alone produce output for the dozens of hardware platforms supported by Linux, or produce similarly optimized binaries.)

    Additional complications

    Cross-compiling and avoiding root access

    Any time you create target binaries that won't run on the host system, you're cross compiling. Even when both the host and target are on the same processor, if they're sufficiently different that one can't run the other's binaries, then you're cross-compiling. In our case, the host is usually running both a different C library and an older kernel version than the target, even when it's the same processor.

    We want to avoid requiring root access to build Firmware Linux. If the build can run as a normal user, it's a lot more portable and a lot less likely to muck up the host system if something goes wrong. This means we can't modify the host's / directory (making anything that requires absolute paths problematic). We also can't mknod, chown, chgrp, mount (for --bind, loopback, tmpfs)...

    In addition, the gnu toolchain (gcc/binutils) is chock-full of hardwired assumptions, such as what C library it's linking binaries against, where to look for #included headers, where to look for libraries, the absolute path the compiler is installed at... Silliest of all, it assumes that if the host and target use the same processor, you're not cross-compiling (even if they have a different C library and a different kernel, and even if you ./configure it for cross-compiling it switches that back off because it knows better than you do). This makes it very brittle, and it also tends to leak its assumptions into the programs it builds. New versions may someday fix this, but for now we have to hit it on the head repeatedly with a metal bar to get anything remotely useful out of it, and run it in a separate filesystem (chroot environment) so it can't reach out and grab the wrong headers or wrong libraries despite everything we've told it.

    The absolute paths problem affects target binaries because all dynamically linked apps expect their shared library loader to live at an absolute path (in this case /lib/ld-uClibc.so.0). This directory is only writeable by root, and even if we could install it there polluting the host like that is just ugly.

    The Firmware Linux build has to assume it's cross-compiling because the host is generally running glibc, and the target is running uClibc, so the libraries the target binaries need aren't installed on the host. Even if they're statically linked (which also mitigates the absolute paths problem somewhat), the target often has a newer kernel than the host, so the set of syscalls uClibc makes (thinking it's talking to the new kernel, since that's what the ABI the kernel headers it was built against describe) may not be entirely understood by the old kernel, leading to segfaults. (One of the reasons glibc is larger than uClibc is it checks the kernel to see if it supports things like long filenames or 32-bit device nodes before trying to use them. uClibc should always work on a newer kernel than the one it was built to expect, but not necessarily an older one.)

    Ways to make it all work

    Cross compiling vs native compiling under emulation

    Cross compiling is a pain. There are a lot of ways to get it to sort of kinda work for certain versions of certain packages built on certain versions of certain distributions. But making it reliable or generally applicable is hard to do.

    I wrote an introduction to cross-compiling which explains the terminology, plusses and minuses, and why you might want to do it. Keep in mind that I wrote that for a company that specializes in cross-compiling. Personally, I consider cross-compiling a necessary evil to be minimized, and that's how Firmware Linux is designed. We cross-compile just enough stuff to get a working native build environment for the new platform, which we then run under emulation.

    Which emulator?

    The emulator Firmware Linux 0.8x used was User Mode Linux (here's a UML mini-howto I wrote while getting this to work). Since we already need the linux-kernel source tarball anyway, building User Mode Linux from it was convenient and minimized the number of packages we needed to build the minimal system.

    The first stage of the build compiled a UML kernel and ran the rest of the build under that, using UML's hostfs to mount the parent's root filesystem as the root filesystem for the new UML kernel. This solved both the kernel version and the root access problems. The UML kernel was the new version, and supported all the new syscalls and ioctls and such that the uClibc was built to expect, translating them to calls to the host system's C library as necessary. Processes running under User Mode Linux had root access (at least as far as UML was concerned), and although they couldn't write to the hostfs mounted root partition, they could create an ext2 image file, loopback mount it, --bind mount in directories from the hostfs partition to get the apps they needed, and chroot into it. Which is what the build did.

    Current Firmware Linux has switched to a different emulator, QEMU, because as long as we're cross-compiling anyway we might as well have the ability to cross-compile for non-x86 targets. We still build a new kernel to run the uClibc binaries with the new kernel ABI, we just build a bootable kernel and run it under QEMU.

    The main difference with QEMU is a sharper dividing line between the host system and the emulated target. Under UML we could switch to the emulated system early and still run host binaries (via the hostfs mount). This meant we could be much more relaxed about cross compiling, because we had one environment that ran both types of binaries. But this doesn't work if we're building an ARM, PPC, or x86-64 system on an x86 host.

    Instead, we need to sequence more carefully. We build a cross-compiler, use that to cross-compile a minimal intermediate system from the seven packages listed earlier, and build a kernel and QEMU. Then we run the kernel under QEMU with the new intermediate system, and have it build the rest natively.

    It's possible to use other emulators instead of QEMU, and I have a todo item to look at armulator from uClinux. (I looked at another nommu system simulator at Ottawa Linux Symposium, but after resolving the third unnecessary environmental dependency and still not being able to get it to finish compiling yet, I gave up. Armulator may be a patch against an obsolete version of gdb, but I could at least get it to build.)

    Packaging

    Filesystem Layout

    Firmware Linux's directory hierarchy is a bit idiosyncratic: some redundant directories have been merged, with symlinks from the standard positions pointing to their new positions. On the bright side, this makes it easy to make the root partition read-only.

    Simplifying the $PATH.

    The set "bin->usr/bin, sbin->usr/sbin, lib->usr/lib" all serve to consolidate all the executables under /usr. This has a bunch of nice effects: making a a read-only run-from-CD filesystem easier to do, allowing du /usr to show the whole system size, allowing everything outside of there to be mounted noexec, and of course having just one place to look for everything. (Normal executables are in /usr/bin. Root only executables are in /usr/sbin. Libraries are in /usr/lib.)

    For those of you wondering why /bin and /usr/bin were split in the first place, the answer is it's because Ken Thompson and Dennis Ritchie ran out of space on the original 2.5 megabyte RK-05 disk pack their root partition lived on in 1971, and leaked the OS into their second RK-05 disk pack where the user home directories lived. When they got more disk space, they created a new directory (/home) and moved all the user home directories there.

    The real reason we kept it is tradition. The excuse is that the root partition contains early boot stuff and /usr may get mounted later, but these days we use initial ramdisks (initrd and initramfs) to handle that sort of thing. The version skew issues of actually trying to mix and match different versions of /lib/libc.so.* living on a local hard drive with a /usr/bin/* from the network mount are not pretty.

    I.E. The separation is just a historical relic, and I've consolidated it in the name of simplicity.

    On a related note, there's no reason for "/opt". After the original Unix leaked into /usr, Unix shipped out into the world in semi-standardized forms (Version 7, System III, the Berkeley Software Distribution...) and sites that installed these wanted places to add their own packages to the system without mixing their additions in with the base system. So they created "/usr/local" and created a third instance of bin/sbin/lib and so on under there. Then Linux distributors wanted a place to install optional packages, and they had /bin, /usr/bin, and /usr/local/bin to choose from, but the problem with each of those is that they were already in use and thus might be cluttered by who knows what. So a new directory was created, /opt, for "optional" packages like firefox or open office.

    It's only a matter of time before somebody suggests /opt/local, and I'm not humoring this. Executables for everybody go in /usr/bin, ones usable only by root go in /usr/sbin. There's no /usr/local or /opt. /bin and /sbin are symlinks to the corresponding /usr directories, but there's no reason to put them in the $PATH.

    Consolidating writeable directories.

    All the editable stuff has been moved under "var", starting with symlinking tmp->var/tmp. Although /tmp is much less useful these days than it used to be, some things (like X) still love to stick things like named pipes in there. Long ago in the days of little hard drive space and even less ram, people made extensive use of temporary files and they threw them in /tmp because ~home had an ironclad quota. These days, putting anything in /tmp with a predictable filename is a security issue (symlink attacks, you can be made to overwrite any arbitrary file you have access to). Most temporary files for things like the printer or email migrated to /var/spool (where there are persistent subdirectories with known ownership and permissions) or in the user's home directory under something like "~/.kde".

    The theoretical difference between /tmp and /var/tmp is that the contents of /tmp should be deleted by the system init scripts on every reboot, but the contents of /var/tmp may be preserved across reboots. Except there's no guarantee that the contents of any temp directory won't be deleted. So any program that actually depends on the contents of /var/tmp being preserved across a reboot is obviously broken, and there's no reason not to just symlink them together.

    (In case it hasn't become apparent yet, there's 30 years of accumulated cruft in the standards, covering a lot of cases that don't apply outside of supercomputing centers where 500 people share accounts on a mainframe that has a dedicated support staff. They serve no purpose on a laptop, let alone an embedded system.)

    The corner case is /etc, which can be writeable (we symlink it to var/etc) or a read-only part of the / partition. It's really a question of whether you want to update configuration information and user accounts in a running system, or whether that stuff should be fixed before deploying. We're doing some cleanup, but leaving /etc writeable (as a symlink to /var/etc). Firmware Linux symlinks /etc/mtab->/proc/mounts, which is required by modern stuff like shared subtrees. If you want a read-only /etc, use "find /etc -type f | xargs ls -lt" to see what gets updated on the live system. Some specific cases are that /etc/adjtime was moved to /var by LSB and /etc/resolv.conf should be a symlink somewhere writeable.
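    For example, the mtab arrangement (and a writeable resolv.conf) comes down to a couple of symlinks; the resolv.conf target location here is just an illustration:

    ln -sf /proc/mounts /etc/mtab
    ln -sf /var/resolv.conf /etc/resolv.conf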

    The resulting mount points

    The result of all this is that a running system can have / be mounted read only (with /usr living under that), /var can be ramfs or tmpfs with a tarball extracted to initialize it on boot, /dev can be ramfs/tmpfs managed by udev or mdev (with /dev/pts as devpts under that: note that /dev/shm naturally inherits /dev's tmpfs and some things like User Mode Linux get upset if /dev/shm is mounted noexec), /proc can be procfs, /sys can be sysfs. Optionally, /home can be an actual writeable filesystem on a hard drive or the network.

    Remember to put root's home directory somewhere writeable (I.E. move /root to either /var/root or /home/root, and change the passwd entry to match), and life is good.

    Firmware Linux is an embedded Linux distribution builder, which creates a bootable single file Linux system based on uClibc and BusyBox/toybox. It's basically a shell script that builds a complete Linux system from source code for an arbitrary target hardware platform.

    The FWL script starts by building a cross-compiler for the appropriate target. Then it cross-compiles a small Linux system for the target, which is capable of acting as a native development environment when run on the appropriate hardware (or under an emulator such as QEMU). Finally the build script creates an ext2 root filesystem image, and packages it with a kernel configured to boot under QEMU and shell scripts to invoke qemu appropriately.

    The FWL boot script for qemu (/tools/bin/qemu-setup.sh) populates /dev from sysfs, sets up an emulated (masquerading) network (so you can wget source packages or talk to distcc), and creates a few symlinks needed to test build normal software packages (such as making /lib point to /tools/lib). It also mounts /dev/hdb (or /dev/sdb) on /home if a second emulated drive is present.

    For most platforms, exiting the command shell will exit the emulator. (Some, such as powerpc, don't support this yet. For those you have to kill qemu from another window, or exit the xterm. I'm working on it.)

    To use this emulated system as a native build environment, see native compiling.

    Adding a new target platform

    The differences between platforms are confined to a single directory, sources/targets. Each subdirectory under that contains all the configuration information for a specific target platform FWL can produce system images for. The same scripts build the same packages for each platform, differing only in which configuration directory they pull data from.

    Each target configuration directory has three interesting files:
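    • details - sets environment variables describing the target (see "$ARCH/details" below).
    • miniconfig-linux - the miniconfig file for the target's Linux kernel (see "$ARCH/miniconfig-linux" below).
    • miniconfig-uClibc - the miniconfig file for the target's uClibc build (see "$ARCH/miniconfig-uClibc" below).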

    These configuration files are read and processed by the script include.sh.

    Target name.

    The name of the target directory is saved in the variable "$ARCH", and used to form a "tuple" for gcc and binutils by appending "-unknown-linux" to the directory name. So the first thing to do is find out what platform name gcc and binutils want for your target platform, and name your target directory appropriately.

    (Note: if your platform really can't use an "${ARCH}-unknown-linux" style tuple, and instead needs a tuple like "bfin-elf", you can set the variable CROSS_TARGET in the "details" file to override the default value and feed some other --target to gcc and binutils. You really shouldn't have to do this unless gcc doesn't yet fully support Linux on your platform, or unless you're doing multiple variants of the same target such as powerpc and ppc440. Try the default first, and fix it if necessary.)
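    For example, a target directory named "armv4l" gets the tuple "armv4l-unknown-linux" by default, while a hypothetical details file for a target needing a bare-metal style tuple might contain a line like:

    CROSS_TARGET=bfin-elf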

    The name of the target directory is also used in the name of the various directories generated during the build (temp-$ARCH, cross-compiler-$ARCH, and mini-native-$ARCH, all in the build/ directory), and as the prefix of the cross compiler binaries ($ARCH-gcc and friends).

    $ARCH/details

    The following environment variables may be set in the "details" file:

    Miniconfig files

    The expanded .config files used to build both Linux and uClibc are copied into the /usr/src directory of mini-native filesystems during the build, and kept for future reference.

    The Linux kernel and uClibc each need a configuration file to build. Firmware Linux uses the "miniconfig" file format, which contains only the configuration symbols a user would have to switch on in menuconfig if they started from allnoconfig.

    To generate a miniconfig, first configure your kernel with menuconfig, then copy the resulting .config file to a temporary filename (such as "tempfile"). Then run the miniconfig.sh script in the sources/toys directory with the temporary file name as your argument and with the environment variable ARCH set to the $KARCH value in your new config file (and exported if necessary). This should produce a new file, "mini.config", which is your .config file converted to miniconfig format.

    For example, to produce a miniconfig for a given platform:

    # in the Linux kernel (or uClibc) source directory:
    make ARCH=$KARCH menuconfig
    mv .config tempfile
    # run miniconfig.sh (from FWL's sources/toys directory) against the saved config
    ARCH=$KARCH miniconfig.sh tempfile
    ls -l mini.config
    

    To expand a mini.config back into a full .config file (to build a kernel by hand, or for further editing with menuconfig), you can go:

    make ARCH=$KARCH allnoconfig KCONFIG_ALLCONFIG=mini.config
    

    Remember to supply an actual value for $KARCH.

    $ARCH/miniconfig-linux

    This is the miniconfig file to build a Linux kernel for the appropriate target. This is usually aimed at booting under QEMU, but if you'd like to come up with your own configuration for actual target hardware, feel free.

    The starting point for kernel configs is generally one of the defconfig files from the Linux kernel source code, usually at "arch/$ARCH/configs/*_defconfig". Copy that to .config at the top of the kernel source, run menuconfig to edit it, then shrink it into a miniconfig.

    Kernels to run system images under qemu generally require the following hardware: serial port (for /dev/console), hard drive (for hda and hdb images), network card (for distcc), and a persistent realtime clock (make gets unhappy if source files are newer than the current time). The ability to address at least 512 megs of memory is also nice, although some targets (such as mips) are limited to less than that by the hardware. The "qemu-system-$ARCH -M ?" and "qemu-system-$ARCH -cpu ?" options may be informative here, also the QEMU System emulator for non PC targets documentation.

    $ARCH/miniconfig-uClibc

    Just like the Linux kernel, uClibc needs a .config file to build, and so the Firmware Linux configuration file supplies a miniconfig. Note that uClibc doesn't require an ARCH= value, because all its architecture information is stored in the config file. Otherwise the procedure for creating and using it is the same as for the Linux kernel, just with a different filename and contents.

    Most of each miniconfig-uClibc is identical from platform to platform. Usually only the "Target Architecture" changes (and occasionally an entry or two out of Target Architecture Features and Options). At some point in the future the rest of the uClibc configuration might be factored out into a common file, but so far removing the duplication hasn't been worth the extra complexity.