changeset 584:26a5ac0c01ce

Random update to docs I did a while ago, need to do more.
author Rob Landley <rob@landley.net>
date Tue, 13 Jan 2009 18:32:38 -0600
parents 1bb180e6a4ba
children dd03aa5996e6
files www/documentation.html
diffstat 1 files changed, 152 insertions(+), 114 deletions(-) [+]
line wrap: on
line diff
--- a/www/documentation.html	Tue Jan 13 18:28:21 2009 -0600
+++ b/www/documentation.html	Tue Jan 13 18:32:38 2009 -0600
@@ -281,7 +281,7 @@
 <p>This script populates the <b>build/host</b> directory with
 host versions of the busybox and toybox command line tools (the same ones
 that the target's eventual root filesystem will contain), plus symlinks to the
-host's compiler toolchain.</p>
+host's compiler toolchain (i.e. compiler, linker, assembler, and so on).</p>
 
 <p>This allows the calling scripts to trim the $PATH to point to just this
 one directory, which serves several purposes:</p>
@@ -1257,50 +1257,91 @@
 (ext2, initramfs, jffs2).</p></li>
 </ul>
 
+<a name="distcc_trick"><h2>Speeding up emulated builds (the distcc accelerator trick)</h2></a>
+
+<p>Cross compiling is fast but unreliable.  The ./configure stage is
+designed wrong (it asks questions about the host system it's building
+on, and thinks the answers apply to the target binary it's creating).
+Building natively under emulation sidesteps that, but emulated compiles
+are slow; the distcc accelerator trick claws back most of the speed.</p>
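
+<p>The idea is to run the build inside the emulator (so ./configure probes
+the real target environment), but hand the actual compiles out through
+distcc to the cross compiler running on the host.  A rough sketch, assuming
+QEMU's user mode networking (where the host is reachable as 10.0.2.2) and
+placeholder paths and target tuple:</p>

+<blockquote><pre>
+# On the host: publish the cross compiler through distccd.  The wrapper
+# directory maps the name the target will ask for (gcc) onto the cross
+# toolchain.
+mkdir -p /tmp/distcc-wrap
+ln -s /path/to/cross-compiler/bin/armv5l-unknown-linux-gnu-gcc /tmp/distcc-wrap/gcc
+PATH=/tmp/distcc-wrap:$PATH distccd --daemon --listen 127.0.0.1 --allow 127.0.0.1
+
+# On the target, inside QEMU: point distcc back at the host and let make
+# fan the compiles out through it.
+export DISTCC_HOSTS=10.0.2.2
+export CC="distcc gcc"
+make -j 3
+</pre></blockquote>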
+
 <hr>
 
 <a name="why"><h1>Why do things this way</h1></a>
 
-<h1>NOTE: FROM HERE TO THE END OF THIS FILE IS OUT OF DATE</h1>
+<h1>UNDER DEVELOPMENT</h1>
+
+<p>This entire section is a dumping ground for historical information.
+It's incomplete, much of it is out of date, and it hasn't been integrated into
+a coherent whole yet.  What's here is in no particular order.</p>
+
+<a name="cross_compiling"><h2>Why cross compiling sucks</h2></a>
+
+<p>Cross compiling is fast but unreliable.  Most builds go "./configure; make;
+make install", but entire ./configure stage is designed wrong for cross
+compiling: it asks questions about the host system it's building
+on, and thinks the answers apply to the target binary it's creating.</p>
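
+<p>In practice that means spoon-feeding ./configure the answers it can't
+safely probe for.  A sketch, for a hypothetical autoconf-based package and
+an arbitrary target tuple:</p>

+<blockquote><pre>
+# Any test ./configure compiles and runs executes on the host, so the
+# target-specific answers have to be supplied up front as cache variables.
+./configure --host=armv5l-unknown-linux-gnu --prefix=/usr \
+    ac_cv_func_mmap_fixed_mapped=yes \
+    ac_cv_file__dev_zero=yes
+</pre></blockquote>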
 
-<p>This section has some historical information.  It describes the evolution
-of the firmware Linux build process, but hasn't been brought up to date
-in all the particulars.</p>
+<p>Build processes often create temporary binaries which run during the
+build (to generate header files, parse configuration information a la
+kconfig, various "makedep" style dependency generators...).  These builds
+need two compilers, one for the host and one for the target, and need to
+keep straight when to use each one.</p>
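
+<p>A sketch of keeping the two compilers straight in a build script (the
+file names and target tuple are placeholders):</p>

+<blockquote><pre>
+# Two compilers: $HOSTCC builds tools that run during the build, $CC builds
+# what actually ships on the target.
+HOSTCC=gcc
+CC=armv5l-unknown-linux-gnu-gcc
+
+# Temporary helper binary: built with the host compiler, runs on the build
+# machine to generate a header.
+$HOSTCC -o mkheaders mkheaders.c
+./mkheaders > generated.h
+
+# The real thing: cross compiled, ships in the target root filesystem.
+$CC -I. -o tool tool.c
+</pre></blockquote>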
+
+<p>Cross compilers leak host data, falling back to the host's headers and
+libraries if they can't find the target files they need.</p>
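
+<p>A quick way to catch that kind of leak (the target tuple is a
+placeholder) is to ask the toolchain where it's actually looking, and which
+header a trivial #include really resolved to:</p>

+<blockquote><pre>
+# Show the toolchain's search paths; the host's /usr/lib showing up here
+# for a cross compiler is a bad sign.
+armv5l-unknown-linux-gnu-gcc -print-search-dirs
+
+# Preprocess a one-liner and see which stdio.h the line markers point at.
+echo '#include &lt;stdio.h&gt;' | armv5l-unknown-linux-gnu-gcc -E -x c - | grep stdio.h
+</pre></blockquote>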
+
+<p>TODO: finish this.</p>
+
+<a name="distcc_trick"><h2>Speeding up emulated builds (the distcc accelerator trick)</h2></a>
+
+<p>TODO: FILL THIS OUT</p>
 
 <h2>The basic theory</h2>
 
 <p>The Linux From Scratch approach is to build a minimal intermediate system
 with just enough packages to be able to compile stuff, chroot into that, and
-build the final system from there.  This isolates the host from the target,
-which means you should be able to build under a wide variety of distributions.
-It also means the final system is built with a known set of tools, so you get
-a consistent result.</p>
+build the final system from there.</p>
 
-<p>A minimal build environment consists of a C library, a compiler, and BusyBox.
-So in theory you just need three packages:</p>
+<p>This approach completely isolates the host from the
+target, which means you should be able to run the FWL build under a wide
+variety of Linux distributions, and since the final system is built with a
+known set of tools you should get a consistent result.  It also means you
+could run a prebuilt system image under a different host operating system
+entirely (such as Mac OS X, or an ARM version of Linux on an x86-64 host)
+as long as you have an appropriate emulator.</p>
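
+<p>For instance (the machine type, image names, and console arguments are
+illustrative, not what the packaging scripts emit), booting an arm system
+image on an x86-64 host looks something like:</p>

+<blockquote><pre>
+# Boot the target kernel and root filesystem image under QEMU's arm system
+# emulation; everything runs as a normal user on the host.
+qemu-system-arm -M versatilepb -nographic \
+    -kernel zImage-armv5l -hda image-armv5l.ext2 \
+    -append "root=/dev/sda console=ttyAMA0"
+</pre></blockquote>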
+
+<p>A minimal build environment consists of a compiler, command line tools,
+and a C library.  In theory you just need three packages:</p>
 
 <ul>
+  <li>A C compiler</li>
+  <li>BusyBox</li>
   <li>A C library (uClibc)</li>
-  <li>A toolchain (pcc, or llvm/clang)</li>
-  <li>BusyBox</li>
 </ul>
 
 <p>Unfortunately, that doesn't work yet.</p>
 
 <h2>Some differences between theory and reality.</h2>
 
+<p>We actually need seven packages (linux, uClibc, busybox, binutils, gcc,
+make, and bash) to create a working build environment.  We also add an optional
+package for speed (distcc), and use two more (genext2fs and QEMU) to package
+and run the result.</p>
+
 <h3>Environmental dependencies.</h3>
 
 <p>Environmental dependencies are things that need to be installed before you
 can build or run a given package.  Lots of packages depend on things like zlib,
 SDL, texinfo, and all sorts of other strange things.  (The GnuCash project
 stalled years ago after it released a version with so many environmental
-dependencies it was impossible to build or install.  Environmental dependencies
-have a complexity cost, and are thus something to be minimized.)</p>
+dependencies it was virtually impossible to build or install.  Environmental
+dependencies have a complexity cost, and are thus something to be
+minimized.)</p>
 
 <p>A good build system will scan its environment to figure out what it has
-available, and disable functionality that depends on stuff that isn't
-available.  (This is generally done with autoconf, which is disgusting but
+available, and disable functionality that depends on anything that isn't.
+(This is generally done with autoconf, which is disgusting but
 suffers from a lack of alternatives.)  That way, the complexity cost is
 optional: you can build a minimal version of the package if that's all you
 need.</p>
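
+<p>For instance (a made-up package, not one of the ones this build uses),
+the optional features stay optional at configure time:</p>

+<blockquote><pre>
+# Build a stripped-down variant: anything the environment doesn't provide
+# gets switched off instead of becoming a hard requirement.
+./configure --disable-nls --without-x --without-zlib
+make
+</pre></blockquote>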
@@ -1317,83 +1358,78 @@
 as many environmental dependencies as possible.  Some are unavoidable (such as
 C libraries needing kernel headers or gcc needing binutils), but the
 intermediate system is the minimal fully functional Linux development
-environment I currently know how to build, and then we switch into that and
+environment we currently know how to build, and then we switch into that and
 work our way back up from there by building more packages in the new
 environment.</p>
 
 <h3>Resolving environmental dependencies.</h3>
 
 <p><b>To build uClibc you need kernel headers</b> identifying the syscalls and
-such it can make to the OS.  Way back when you could use the kernel headers
-straight out of the Linux kernel 2.4 tarball and they'd work fine, but sometime
-during 2.5 the kernel developers decided that exporting a sane API to userspace
-wasn't the kernel's job, and stopped doing it.</p>
-
-<p>The 0.8x series of Firmware Linux used
-<a href=http://ep09.pld-linux.org/~mmazur/linux-libc-headers/>kernel
-headers manually cleaned up by Mariusz Mazur</a>, but after the 2.6.12 kernel
-he had an attack of real life and fell too far behind to catch up again.</p>
-
-<p>The current practice is to use the Linux kernel's "make headers_install"
-target, created by David Woodhouse.  This runs various scripts against the
-kernel headers to sanitize them for use by userspace.  This was merged in
-2.6.18-rc1, and was more or less debugged by 2.6.19.  So can use the Linux
-Kernel tarball as a source of headers again.</p>
-
-<p>Another problem is that the busybox shell situation is a mess with four
-implementations that share little or no code (depending on how they're
-configured).  The first question when trying to fix them is "which of the four
-do you fix?", and I'm just not going there.  So until bbsh goes in we
-<b>substitute bash</b>.</p>
+such it can make to the OS.  We get them from the Linux kernel source tarball,
+using the "make headers_install" infrastructure created by David Woodhouse.
+This runs various scripts against the Linux kernel source code to sanitize
+the kernel's own headers for use by userspace.  (This was merged in 2.6.18-rc1,
+and was more or less debugged by 2.6.19.)</p>
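
+<p>A sketch of what that looks like for one target (the architecture and
+destination directory are placeholders, not the paths these scripts use):</p>

+<blockquote><pre>
+# Run from the top of an extracted Linux kernel source tree: sanitize the
+# exported headers and install them where the C library build can find them.
+make ARCH=arm headers_install INSTALL_HDR_PATH=/path/to/target/usr
+# The headers land under /path/to/target/usr/include .
+</pre></blockquote>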
 
-<p>Finally, <b>most packages expect gcc</b>.  None of the other compilers under
-development are a drop-in replacement for gcc yet, and none of them include
-a "make" program.  The tcc project once showed great promise, but 
-development stalled because Fabrice Bellard's other major project
-(qemu) is taking up all his time these days, and the developers he handed
-off to have chosen to stick with a 20 year old CVS repository format
-which hinders new development.  Back in 2004 Fabrice
-<a href=http://bellard.org/tcc/tccboot.html>built a modified Linux
-kernel with tcc</a>, and
-<a href=http://fabrice.bellard.free.fr/tcc/tccboot_readme.html>listed</a>
-what needed to be upgraded to build an unmodified kernel, but sometime
-around 2005 the project essentially died.  Since then, the BSD guys have
-made a serious effort at reviving pcc, and Apple has sponsored LLVM/clang.</p>
+<p><b>We install bash</b> because the busybox shell situation is a mess.
+Busybox has several different shell implementations which share little or no
+code.  (It's better now than it was a few years ago, but thanks to Ubuntu
+breaking the #!/bin/sh symlink with the Defective Annoying SHell, many
+scripts point explicitly at #!/bin/bash and BusyBox can't use that name for
+any of its shells yet.)</p>
 
-<p>At some point, either busybox or toybox will probably grow a "make"
-implementation (if for no other reason that I have vague plans to write
-one), but that's not very interesting until there's a viable alternative to
-the gnu toolchain.  In the meantime the only open source compiler that can
-build a complete Linux system is still GCC.</p>
-
-<p>The gnu compiler actually consists of three packages <b>(binutils, gcc, and
-make)</b>, which is why it's generally called the gnu "toolchain".  (The split
+<p><b>Most packages expect gcc</b>.  The gnu compiler "toolchain" actually
+consists of three packages <b>(binutils, gcc, and make)</b>.  (The split
 between binutils and gcc is for purely historical reasons, and you have
 to match the right versions with each other or things break.)</p>
 
-<p>This means that to compile a minimal build environment, you need seven
-packages, and to actually run the result we use an eighth package (QEMU).</p>
+<p>Adding an SUSv3
+<a href=http://www.opengroup.org/onlinepubs/009695399/utilities/make.html>make</a>
+implementation to busybox or toybox isn't a major problem, but until a viable
+GCC replacement emerges there's not much point.</p>
+
+<p>None of the other compilers under development are a drop-in replacement for
+gcc yet, especially for building the Linux kernel (which makes extensive use of
+gcc extensions).  <a href=http://www.intel.com/cd/software/products/asmo-na/eng/277618.htm>Intel's C Compiler</a>
+implemented the necessary gcc extensions to build the Linux kernel, but it's
+a closed source package only supporting x86 and x86-64 targets.  Since
+the introduction of C99, the Linux kernel has replaced many of these gcc
+extensions with equivalent C99 idioms, so in theory building the Linux kernel
+with other compilers is now easier.</p>
 
-<p>This can actually be made to work.  The next question is how?</p>
+<p>With the introduction of GPLv3, the Free Software Foundation has pissed off
+enough people that work on an open source replacement for gcc is ongoing on
+several fronts.  The most promising is probably
+<a href=http://pcc.ludd.ltu.se/>PCC</a>, which is
+supported by what's left of the BSD community.  Apple sponsors another
+significant effort, <a href=http://clang.llvm.org/>LLVM/Clang</a>.  Both are
+worth watching.</p>
+
+<p>Several others (such as TinyCC and Open Watcom) once showed promise but have
+been essentially moribund since about 2005, which is when compilers that only
+ran on 32 bit hosts and supported C89 stopped being interesting.  (A
+significant amount of effort is required to retool an existing compiler to
+cleanly run on an x86-64 host and support the full C99 feature set, let alone
+produce output for the dozens of hardware platforms supported by Linux, or
+produce similarly optimized binaries.)</p>
 
 <h2>Additional complications</h2>
 
 <h3>Cross-compiling and avoiding root access</h3>
 
-<p>The first problem is that we're cross-compiling.  We can't help it.
-You're cross-compiling any time you create target binaries that won't run on
-the host system.  Even when both the host and target are on the same processor,
+<p>Any time you create target binaries that won't run on the host system, you're
+cross compiling.  Even when both the host and target are on the same processor,
 if they're sufficiently different that one can't run the other's binaries, then
 you're cross-compiling.  In our case, the host is usually running both a
 different C library and an older kernel version than the target, even when
 it's the same processor.</p>
 
-<p>The second problem is that we want to avoid requiring root access to build
-Firmware Linux.  If the build can run as a normal user, it's a lot more
-portable and a lot less likely to muck up the host system if something goes
-wrong.  This means we can't modify the host's / directory (making anything
-that requires absolute paths problematic).  We also can't mknod, chown, chgrp,
-mount (for --bind, loopback, tmpfs)...</p>
+<p>We want to avoid requiring root access to build Firmware Linux.  If the
+build can run as a normal user, it's a lot more portable and a lot less likely
+to muck up the host system if something goes wrong.  This means we can't modify
+the host's / directory (making anything that requires absolute paths
+problematic).  We also can't mknod, chown, chgrp, mount (for --bind, loopback,
+tmpfs)...</p>
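
+<p>One way around that (and part of why genext2fs is on the package list) is
+to create the device nodes and ownership inside the generated filesystem
+image rather than on the host.  A sketch (the block count, directory, and
+device list are placeholders):</p>

+<blockquote><pre>
+# Build an ext2 root filesystem image as a normal user: the device table
+# tells genext2fs which device nodes to create inside the image, so no
+# mknod/chown/mount ever happens on the host.
+echo "/dev/console c 640 0 0 5 1 - - -" > devlist
+genext2fs -b 16384 -d build/root-filesystem -D devlist root.ext2
+</pre></blockquote>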
 
 <p>In addition, the gnu toolchain (gcc/binutils) is chock-full of hardwired
 assumptions, such as what C library it's linking binaries against, where to look
@@ -1490,48 +1526,7 @@
 yet, I gave up.  Armulator may be a patch against an obsolete version of gdb,
 but I could at least get it to build.)</p>
 
-<h1>Packaging</h1>
-
-<p>The single file packaging combines a linux kernel, initramfs, squashfs
-partition, and cryptographic signature.</p>
-
-<p>In Linux 2.6, the kernel and initramfs are already combined into a single
-file.  At the start of this file is either the obsolete floppy boot sector
-(just a stub in 2.6), or an ELF header which has 12 used bytes followed by 8
-unused bytes.  Either way, we can generally use the 4 bytes starting at offset
-12 to store the original length of the kernel image, then append a squashfs
-root partition to the file, followed by a whole-file cryptographic
-signature.</p>
-
-<p>Loading an ELF kernel (such as User Mode Linux or a non-x86 ELF kernel)
-is controlled by the ELF segments, so the appended data is ignored.
-(Note: don't strip the file or the appended data will be lost.)  Loading an x86
-bzImage kernel requires a modified boot loader that can be told the original
-size of the kernel, rather than querying the current file length (which would
-be too long).  Hence the patch to Lilo allowing a "length=xxx" argument in the
-config file.</p>
-
-<p>Upon boot, the kernel runs the initramfs code which finds the firmware
-file.  In the case of User Mode Linux, the symlink /proc/self/exe points
-to the path of the file.  A bootable kernel needs a command line argument
-of the form firmware=device:/path/to/file (it can lookup the device in
-/sys/block and create a temporary device node to mount it with; this is
-in expectation of dynamic major/minor happening sooner or later).
-Once the file is found, /dev/loop0 is bound to it with an offset (losetup -o,
-with a value extracted from the 4 bytes stored at offset 12 in the file), and
-the resulting squashfs is used as the new root partition.</p>
-
-<p>The cryptographic signature can be verified on boot, but more importantly
-it can be verified when upgrading the firmware.  New firmware images can
-be installed beside old firmware, and LILO can be updated with boot options
-for both firmware, with a default pointing to the _old_ firmware.  The
-lilo -R option sets the command line for the next boot only, and that can
-be used to boot into the new firmware.  The new firmware can run whatever
-self-diagnostic is desired before permanently changing the default.  If the
-new firmware doesn't boot (or fails its diagnostic), power cycle the machine
-and the old firmware comes up.  (Note that grub does not have an equivalent
-for LILO's -R option; which would mean that if the new firmware doesn't run,
-you have a brick.)</p>
+<h2>Packaging</h2>
 
 <h2>Filesystem Layout</h2>
 
@@ -1959,6 +1954,49 @@
 itself again with that temporary compiler, and then build itself a _third_
 time with the second compiler.
 
+<h1>Packaging</h1>
+
+<p>The single file packaging combines a linux kernel, initramfs, squashfs
+partition, and cryptographic signature.</p>
+
+<p>In Linux 2.6, the kernel and initramfs are already combined into a single
+file.  At the start of this file is either the obsolete floppy boot sector
+(just a stub in 2.6), or an ELF header which has 12 used bytes followed by 8
+unused bytes.  Either way, we can generally use the 4 bytes starting at offset
+12 to store the original length of the kernel image, then append a squashfs
+root partition to the file, followed by a whole-file cryptographic
+signature.</p>
+
+<p>Loading an ELF kernel (such as User Mode Linux or a non-x86 ELF kernel)
+is controlled by the ELF segments, so the appended data is ignored.
+(Note: don't strip the file or the appended data will be lost.)  Loading an x86
+bzImage kernel requires a modified boot loader that can be told the original
+size of the kernel, rather than querying the current file length (which would
+be too long).  Hence the patch to Lilo allowing a "length=xxx" argument in the
+config file.</p>
+
+<p>Upon boot, the kernel runs the initramfs code which finds the firmware
+file.  In the case of User Mode Linux, the symlink /proc/self/exe points
+to the path of the file.  A bootable kernel needs a command line argument
+of the form firmware=device:/path/to/file (it can look up the device in
+/sys/block and create a temporary device node to mount it with; this is
+in expectation of dynamic major/minor happening sooner or later).
+Once the file is found, /dev/loop0 is bound to it with an offset (losetup -o,
+with a value extracted from the 4 bytes stored at offset 12 in the file), and
+the resulting squashfs is used as the new root partition.</p>
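
+<p>A sketch of that sequence from the initramfs side (the device and paths
+are illustrative):</p>

+<blockquote><pre>
+# Pull the kernel length back out of the 4 bytes at offset 12, then loopback
+# mount the squashfs that was appended right after the kernel.
+OFFSET=$(od -An -t u4 -j 12 -N 4 /path/to/firmware.img | tr -d ' ')
+losetup -o $OFFSET /dev/loop0 /path/to/firmware.img
+mount -t squashfs /dev/loop0 /mnt
+</pre></blockquote>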
+
+<p>The cryptographic signature can be verified on boot, but more importantly
+it can be verified when upgrading the firmware.  New firmware images can
+be installed beside old firmware, and LILO can be updated with boot options
+for both firmware, with a default pointing to the _old_ firmware.  The
+lilo -R option sets the command line for the next boot only, and that can
+be used to boot into the new firmware.  The new firmware can run whatever
+self-diagnostic is desired before permanently changing the default.  If the
+new firmware doesn't boot (or fails its diagnostic), power cycle the machine
+and the old firmware comes up.  (Note that grub does not have an equivalent
+for LILO's -R option, which means that if the new firmware doesn't run,
+you have a brick.)</p>
+
 -->