Cross Compiling Linux - 2 hour tutorial

Instructor: Rob Landley

This is a practical introduction to cross compiling, during which we'll build a working cross-compiler, use it to cross-compile a native uClibc-based Linux development environment, and boot this new environment under QEMU.

Attendees may choose arm, mips, x86, x86_64, sparc, or PPC as the platform they wish to build for. The author's Firmware Linux project (which already does all this) will be used as an example. Attendees should bring a reasonably fast laptop with net access and at least 256 megabytes of RAM.

General outline:

Terminology

Handout: Introduction to cross compiling for Linux covers the following:

A long rant about how gcc is broken.

GCC thinks it's "special". Perhaps it is in the sense of "very special shark-jumping episode", or the "special olympics", but as programs go, it's arrogant and chock-full of unnecessary assumptions.

Fundamentally, a compiler is no different from a docbook to pdf converter. It takes input files, processes them, and produces output files. The fact that sometimes these output files are executable files that will run in the current environment is irrelevant. It doesn't change the nature of the program.

A compiler has six total sources of input, most of which are implicit. In addition to the files listed on the command line, a compiler also needs system header files (such as libc's headers), compiler internal header files (such as varargs), compiler libraries (libgcc.a), system libraries (libc.so), and an executable search path (to find programs like ld; why not just use $PATH? Because it's "special"). But a docbook to pdf converter has implicit inputs too: it needs fonts, which aren't listed anywhere on the command line and are found in some other directory. Nobody makes you say "--host=i686" when compiling something like "xmlto" because of that, and a compiler is no different.

So why does gcc insist you tell it "--host" when compiling? Because it's "special".

How exactly is GCC arrogant and broken?

Arrogant: GCC thinks the only compiler in the world that can competently compile it is itself. So it builds itself three times. First, it builds a temporary version of itself called xgcc, using the existing host system's compiler. Then it builds a second version of itself with xgcc. Then it builds a _third_ version of itself with the second version, and compares the second and third versions to ensure they're identical. (Paranoid? Why yes!)

Note: this doesn't work when it's making a cross compiler. If it built any executables with xgcc, they wouldn't run on the host system. Luckily the gcc build system is just sane enough to realize this, and behaves completely differently for cross compiling than for native compiling. Its cross-compiling behavior is almost sane. Unfortunately, it falls back to the insane behavior at the drop of a hat. (Try to cross-compile from "i686 with glibc" to "i686 with uClibc" and it goes "the processor's the same, we must not be cross compiling", despite the fact that the result won't run on the host system. This is why we need the "walrus" trick.)

Broken: The six paths mentioned earlier aren't orthogonal. If you could specify them as colon-separated search paths, life would be good. But gcc won't let you control that explicitly. Instead it synthesizes big long search paths using data from lots of different sources. Some elements can be specified by passing arguments to ./configure. Some come from environment variables. Some are hardwired into the C source code. Whenever the previous layer of path logic is declared unsalvageable, they write another layer in front of it, but they don't remove the old layer. Instead they fall back to it when the new one fails to find the file it needs. So the gcc front-end gets bigger and bigger, and the old code keeps festering.

A few years ago the gcc developers invented "spec" files (which of course have more hardwired paths in them), but nobody other than the gcc developers ever showed any interest in learning the spec file format. Rather than ripping it out again, they sucked the spec files into the gcc executable as big hardwired strings which it parses at runtime. (Of course it still searches the filesystem for spec files to override the built-in ones.)

All these paths are absolute paths from the root directory. This means if you build gcc in one directory (such as your home directory), and then tar it up and copy it somewhere else: it won't work. (This is also why you have to tell it where you're going to install it at ./configure time instead of at install time, so it can hardwire all those absolute paths into the executable.)

In an attempt to be dynamic, for some (but not all) paths, gcc figures out where its executable is, constructs a path to a subdirectory under that, tacks "../../.." onto that, descends into other directories, and checks for files there. Unfortunately, it A) doesn't do this for all the paths it needs to, and B) gets it wrong when it does.

What all this means is that gcc often can't find files it installed. Even though these files came from gcc, and are still right where the gcc build put them, it has no idea where they are. (Don't run gcc under strace unless you have a really strong stomach. The amount of wasted effort looking for files _it_ installed is just infuriating.)
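If you do have the stomach for it, a rough measurement looks like the sketch below. It assumes strace is installed and counts every failed lookup in the trace, including ones made by the dynamic loader and the other programs gcc runs, so treat the number as an upper bound on gcc's own wasted searching rather than an exact figure.

```shell
#!/bin/sh
# Count failed file lookups while compiling an empty program.
# -f follows the child processes (cc1, as, collect2, ld);
# trace=file records every syscall that takes a filename.
CC="${CC:-gcc}"
tmp=$(mktemp -d)
printf 'int main(void) { return 0; }\n' > "$tmp/hello.c"

strace -f -e trace=file -o "$tmp/trace" "$CC" "$tmp/hello.c" -o "$tmp/hello"
grep -c ENOENT "$tmp/trace"    # lookups that found nothing

rm -rf "$tmp"
```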

Worse, if it can't find things it falls back to default locations like /usr/include and /usr/lib, so if you have a host compiler installed on the system, its files can leak into your cross compiler and produce incorrect results. You can't _stop_ it from falling back. This can result in misleading error messages, or it can quietly build binaries that don't work right.

Recently, gcc added yet another layer on top of all the other layers, where you can specify --sysroot (each time you invoke gcc) and tell gcc to act as if it's running in a chroot directory. It still stores everything as absolute paths, but at runtime it replaces the start of each absolute path with the --sysroot value. This doesn't make it much more likely to find the files than it was before (painting over the dry rot with yet another layer of gcc path logic does not inspire confidence), but at least it has the advantage that when it fails and falls back, the search paths it's using are so damaged that it's less likely to accidentally find the host compiler's files and leak those into your cross compile build.

Still, even with --sysroot it's searching for the same file in a half-dozen places, even though it was the one that installed that file. I'll trust them when they start _removing_ old code that doesn't work, and stop looking in lots of places the file isn't.

In short, gcc is broken because it's constructed almost entirely out of unnecessary assumptions.

GCC wants to know the details of the machine it's running on, but a compiler no more needs to care about the host system it's running on than "xmlto" does. The compiler that built it has to know how to produce executables for the host. The new compiler only needs to care about the target.

Implicit inputs don't mean you have to configure for the host. (Fonts? Search path.) A compiler has six sources of input: compiler headers, system headers, compiler libraries, system libraries, the executable search path (why not just $PATH? "special"), and the command line.

See handout "Introduction to cross compiling for Linux".



  Components: linux, binutils, gcc, and uClibc.

Using the cross compiler toolchain.

  Add it to the path, then build a statically linked version of "hello world".

    PATH=~/cross:$PATH prefix-gcc -static hello.c -o hello

  Note: We could build dynamic, but then we'd have to install the libraries
  into the host system to test it.

  Test-run the result with your emulator.

    qemu-arm ./hello
  

4) Making a native build environment (adding make, busybox, and bash).
5) Packaging disk images, booting, and running under QEMU.
6) Optimizations and alternatives.
   (distcc, armulator, boards/bootloaders, nfs, tsrpm)
7) Where to from here?  (LFS, gentoo, etc.)

Links:
  http://www.landley.net/writing/docs/cross-compiling.html
  http://www.landley.net/code/firmware/about.html
  http://www.landley.net/code/firmware/design.html
  http://cross-lfs.org/files/BOOK/1.0.0/
  http://www.gentoo.org/proj/en/base/embedded/index.xml
  http://gentoo-wiki.com/Embedded_Gentoo