view design.html @ 2:be48c60f9edb

Update it a bit. Talk about environmental dependencies, etc.
author landley@driftwood
date Sun, 13 Aug 2006 22:30:17 -0400
parents 9b6afefcc082
<title>Flimsy rationalizations for all of my design mistakes</title>
<h1>Build Process</h1>
<h2>Executive summary</h2>
<p>Cross-compile just enough to get a native compiler for the new environment,
and then emulate the new environment with QEMU to build the final system
natively.</p>
<p>The intermediate system is built and run using only the following eight
packages:</p>
<ul>
<li>linux-kernel</li>
<li>uclibc</li>
<li>busybox</li>
<li>binutils</li>
<li>gcc</li>
<li>make</li>
<li>bash</li>
<li>QEMU</li>
</ul>
<h2>The basic theory</h2>
<p>What we want to do is build a minimal intermediate system with just enough
packages to be able to compile stuff, chroot into that, and build the final
system from there. This isolates the host from the target, which means you
should be able to build under a wide variety of distributions. It also means
the final system is built with a known set of tools, so you get a consistent
result.</p>
<p>A minimal build environment consists of a C library, a compiler, and BusyBox.
So in theory you just need three packages:</p>
<ul>
<li>A C library (uClibc)</li>
<li>A toolchain (tcc)</li>
<li>BusyBox</li>
</ul>
<p>Unfortunately, that doesn't work yet.</p>
<h2>Some differences between theory and reality.</h2>
<h3>Environmental dependencies.</h3>
<p>Environmental dependencies are things that need to be installed before you
can build or run a given package. Lots of packages depend on things like zlib,
SDL, texinfo, and all sorts of other strange things. (The GnuCash project
stalled years ago after it released a version with so many environmental
dependencies it was impossible to build or install. Environmental dependencies
have a complexity cost, and are thus something to be minimized.)</p>
<p>A good build system will scan its environment to figure out what it has
available, and disable functionality that depends on stuff that isn't
available. (This is generally done with autoconf, which is disgusting but
suffers from a lack of alternatives.) That way, the complexity cost is
optional: you can build a minimal version of the package if that's all you
need.</p>
<p>A really good build system can be told that the environment
it's building in and the environment the result will run in are different,
so just because it finds zlib on the build system doesn't mean that the
target system will have zlib installed on it. (And even if it does, it may not
be the same version. This is one of the big things that makes cross-compiling
such a pain. One big reason for statically linking programs is to eliminate
this kind of environmental dependency.)</p>
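<p>As a rough illustration of probing the target rather than the host, here is
a minimal shell sketch. The helper name <code>have_header</code> and the
directory layout are assumptions made up for this example, not part of any
real build system: the point is only that the check looks under the target's
root directory and ignores whatever the build machine has installed.</p>

```shell
#!/bin/sh
# have_header TARGET_ROOT HEADER (hypothetical helper)
# Succeeds only if HEADER exists under the target's root, regardless of
# what happens to be installed on the machine doing the build.
have_header() {
    [ -f "$1/usr/include/$2" ]
}

# Demo: a throwaway directory stands in for the target root.
target_root=$(mktemp -d)
mkdir -p "$target_root/usr/include"

if have_header "$target_root" zlib.h; then
    echo "zlib: enabled"
else
    echo "zlib: disabled (not in target root)"
fi

rm -rf "$target_root"
```

<p>Since the demo target root contains no zlib.h, the feature is reported as
disabled even if the host has zlib installed.</p>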
<p>The Firmware Linux build process is structured the way it is to eliminate
environmental dependencies. Some are unavoidable (such as C libraries needing
kernel headers or gcc needing binutils), but the intermediate system is
the minimal fully functional Linux development environment I currently know
how to build, and then we chroot into that and work our way back up from there
by building more packages in the new environment.</p>
<h3>Resolving environmental dependencies.</h3>
<p><b>To build uClibc you need kernel headers</b> identifying the syscalls and
such it can make to the OS. Way back when, you could use the kernel headers
straight out of the Linux kernel 2.4 tarball and they'd work fine, but sometime
during 2.5 the kernel developers decided that exporting a sane API to userspace
wasn't the kernel's job, and stopped doing it.</p>
<p>The 0.8x series of Firmware Linux used
<a href=http://ep09.pld-linux.org/~mmazur/linux-libc-headers/>kernel
headers manually cleaned up by Mariusz Mazur</a>, but after the 2.6.12 kernel
he had an attack of real life and fell too far behind to catch up again.</p>
<p>The current practice is to use the 2.6.18 kernel's "make headers_install"
target, created by David Woodhouse. This runs various scripts against the
kernel headers to sanitize them for use by userspace. This was merged in
2.6.18-rc1, so as of 2.6.18 we can use the Linux kernel tarball as a source of
headers again.</p>
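<p>A minimal sketch of invoking that target from a build script follows. The
wrapper name <code>install_kernel_headers</code> and the example paths are
invented for illustration; it assumes the kernel's headers_install target
honors the ARCH and INSTALL_HDR_PATH make variables (with the sanitized
headers landing in an include directory under INSTALL_HDR_PATH), so check the
kernel version you actually use.</p>

```shell
#!/bin/sh
# install_kernel_headers KERNEL_SRC ARCH DEST (hypothetical wrapper)
# Runs the kernel's own headers_install target so that sanitized headers
# for ARCH are exported under DEST, without touching the host's headers.
install_kernel_headers() {
    make --no-print-directory -C "$1" headers_install \
        ARCH="$2" INSTALL_HDR_PATH="$3"
}

# Example call (paths are placeholders, not the real build's layout):
#   install_kernel_headers ./linux-2.6.18 arm "$PWD/cross-headers"
```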
<p>Another problem is that the busybox shell situation is a mess with four
implementations that share little or no code (depending on how they're
configured). The first question when trying to fix them is "which of the four
do you fix?", and I'm just not going there. So until bbsh goes in we
<b>substitute bash</b>.</p>
<p>Finally, <b>most packages expect gcc</b>. The tcc project isn't a drop-in
gcc replacement yet, and doesn't include a "make" program. Most importantly,
tcc development appears stalled because Fabrice Bellard's other major project
(qemu) is taking up all his time these days. In 2004 Fabrice
<a href=http://fabrice.bellard.free.fr/tcc/tccboot.html>built a modified Linux
kernel with tcc</a>, and
<a href=http://fabrice.bellard.free.fr/tcc/tccboot_readme.html>listed</a>
what needed to be upgraded in TCC to build an unmodified kernel, but
since then he hardly seems to have touched tcc. Hopefully, someday he'll get
back to it and put out a 1.0 release of tcc that's a drop-in gcc replacement.
(And if he does, I'll add a make implementation to BusyBox so we don't need
to use any of the gnu toolchain.) But in the meantime the only open source
compiler that can build a complete Linux system is still the gnu compiler.</p>
<p>The gnu compiler actually consists of three packages <b>(binutils, gcc, and
make)</b>, which is why it's generally called the gnu "toolchain". (The split
between binutils and gcc is for purely historical reasons, and you have
to match the right versions with each other or things break.)</p>
<p>This means that to compile a minimal build environment, you need seven
packages, and to actually run the result we use an eighth package (QEMU).</p>
<p>This can actually be made to work. The next question is how.</p>
<h2>Additional complications</h2>
<h3>Cross-compiling and avoiding root access</h3>
<p>The first problem is that we're cross-compiling. We can't help it.
You're cross-compiling any time you create target binaries that won't run on
the host system. Even when both the host and target are on the same processor,
if they're sufficiently different that one can't run the other's binaries, then
you're cross-compiling. In our case, the host is usually running both a
different C library and an older kernel version than the target, even when
it's the same processor.</p>
<p>The second problem is that we want to avoid requiring root access to build
Firmware Linux. If the build can run as a normal user, it's a lot more
portable and a lot less likely to muck up the host system if something goes
wrong. This means we can't modify the host's / directory (making anything
that requires absolute paths problematic). We also can't mknod, chown, chgrp,
mount (for --bind, loopback, tmpfs)...</p>
<p>In addition, the gnu toolchain (gcc/binutils) is chock-full of hardwired
assumptions, such as what C library it's linking binaries against, where to look
for #included headers, where to look for libraries, the absolute path the
compiler is installed at... Silliest of all, it assumes that if the host and
target use the same processor, you're not cross-compiling (even if they have
a different C library and a different kernel, and even if you ./configure it
for cross-compiling it switches that back off because it knows better than
you do). This makes it very brittle, and it also tends to leak its assumptions
into the programs it builds. New versions may someday fix this, but for now we
have to hit it on the head repeatedly with a metal bar to get anything remotely
useful out of it, and run it in a separate filesystem (chroot environment) so
it can't reach out and grab the wrong headers or wrong libraries despite
everything we've told it.</p>
<p>The absolute paths problem affects target binaries because all dynamically
linked apps expect their shared library loader to live at an absolute path
(in this case /lib/ld-uClibc.so.0). This directory is only writeable by root,
and even if we could install it there, polluting the host like that is just
ugly.</p>
<p>The Firmware Linux build has to assume it's cross-compiling because the host
is generally running glibc, and the target is running uClibc, so the libraries
the target binaries need aren't installed on the host. Even if they're
statically linked (which also mitigates the absolute paths problem somewhat),
the target often has a newer kernel than the host, so the set of syscalls
uClibc makes (thinking it's talking to the new kernel, since that's the ABI
described by the kernel headers it was built against) may not be entirely
understood by the old kernel, leading to segfaults. (One of the reasons glibc
is larger than uClibc is it checks the kernel to see if it supports things
like long filenames or 32-bit device nodes before trying to use them. uClibc
should always work on a newer kernel than the one it was built to expect, but
not necessarily an older one.)</p>
<h2>Ways to make it all work</h2>
<h3>Cross compiling vs native compiling under emulation</h3>
<p>Cross compiling is a pain. There are a lot of ways to get it to sort of
kinda work for certain versions of certain packages built on certain versions
of certain distributions. But making it reliable or generally applicable is
hard to do.</p>
<p>I wrote an
<a href=https://crossdev.timesys.com/documentation/introduction-to-cross-compiling-for-linux/>introduction
to cross-compiling</a> which explains the terminology, plusses and minuses,
and why you might want to do it. Keep in mind that I wrote that for a company
that specializes in cross-compiling. Personally, I consider cross-compiling
a necessary evil to be minimized, and that's how Firmware Linux is designed.
We cross-compile just enough stuff to get a working native compiler for the
new platform, which we then run under emulation.</p>
<h3>Which emulator?</h3>
<p>The emulator Firmware Linux 0.8x used was User Mode Linux (here's a
<a href=http://www.landley.net/code/UML.html>UML mini-howto</a> I wrote
while getting this to work). Since we already need the linux-kernel source
tarball anyway, building User Mode Linux from it was convenient and minimized
the number of packages we needed to build the minimal system.</p>
<p>The first stage of the build compiled a UML kernel and ran the rest of the
build under that, using UML's hostfs to mount the parent's root filesystem as
the root filesystem for the new UML kernel. This solved both the kernel
version and the root access problems. The UML kernel was the new version, and
supported all the new syscalls and ioctls and such that uClibc was built to
expect, translating them to calls to the host system's C library as necessary.
Processes running under User Mode Linux had root access (at least as far as UML
was concerned), and although they couldn't write to the hostfs mounted root
partition, they could create an ext2 image file, loopback mount it, --bind
mount in directories from the hostfs partition to get the apps they needed,
and chroot into it. Which is what the build did.</p>
<p>Current Firmware Linux has switched to a different emulator, QEMU, because
as long as we're cross-compiling anyway we might as well have the
ability to cross-compile for non-x86 targets. We still build a new kernel
to run the uClibc binaries with the new kernel ABI, we just build a bootable
kernel and run it under QEMU.</p>
<p>The main difference with QEMU is a sharper dividing line between the host
system and the emulated target. Under UML we could switch to the emulated
system early and still run host binaries (via the hostfs mount). This meant
we could be much more relaxed about cross compiling, because we had one
environment that ran both types of binaries. But this doesn't work if we're
building an ARM, PPC, or x86-64 system on an x86 host.</p>
<p>Instead, we need to sequence more carefully. We build a cross-compiler,
use that to cross-compile a minimal intermediate system from the seven packages
listed earlier, and build a kernel and QEMU. Then we run the kernel under QEMU
with the new intermediate system, and have it build the rest natively.</p>
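<p>The ordering described above can be sketched as a dry-run script. Every
stage here is a placeholder that only prints what it would do, and the names
<code>stage</code> and <code>build_all</code> are invented for this sketch
rather than taken from the real build scripts.</p>

```shell
#!/bin/sh
# stage NUMBER DESCRIPTION: placeholder that reports a step instead of doing it.
stage() {
    echo "stage $1: $2"
}

# build_all: the sequence from the paragraph above, in order.
build_all() {
    stage 1 "build a cross-compiler on the host"
    stage 2 "cross-compile the intermediate system (linux-kernel, uclibc, busybox, binutils, gcc, make, bash)"
    stage 3 "build a bootable target kernel and QEMU"
    stage 4 "boot that kernel under QEMU with the intermediate system"
    stage 5 "build the rest of the system natively inside the emulator"
}

build_all
```

<p>Only the first three stages run target-agnostic host tools; everything from
stage 4 on executes target binaries, which is the switchover point.</p>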
<p>It's possible to use other emulators instead of QEMU, and I have a todo
item to look at armulator. (I looked at another nommu system simulator at
Ottawa Linux Symposium, but after resolving the third unnecessary environmental
dependency and still not being able to get it to finish compiling, I
gave up. Armulator may be a patch against an obsolete version of gdb, but I
could at least get it to build.)</p>
<h3>Alternatives to emulation</h3>
<p>The main downsides of emulation are that it's slow, can use a lot of
memory, and can be tricky to debug if something goes wrong in the emulated
environment. Cross compiling is sufficiently harder than native compiling that
I consider it a good trade-off, but there are alternatives.</p>
<p>Some other build systems (such as uClibc's Buildroot) use a package called
<a href=http://freshmeat.net/projects/fakeroot/>fakeroot</a>, which is sort
of a halfway emulator. It creates an environment where binaries run as if
they had root access, but without being able to do anything that actually
requires root access. This is nice if you want to create tarballs with
device nodes and different ownership in them, but not so useful if you want
to actually use one of those device nodes, or twiddle mount points. Firmware
Linux doesn't use fakeroot (we use a real emulator instead), but it's
an option.</p>
<p>In theory, we could work around the "host hasn't got uClibc" problem by
statically linking our apps for the intermediate system, and work around the
"host kernel older than the kernel headers we're using" problem by either
building the intermediate version of uClibc with the host's kernel headers
or just linking against glibc instead of uClibc.</p>
<p>This has a number of
downsides: harvesting the host's kernel headers is distribution-specific, and
could easily leak bits of the host into the final system. Linking the host
tools against glibc (or a temporary version of uClibc built with different
kernel headers) doesn't give us as much evidence that the resulting system
will be able to rebuild itself under itself, and statically linking against
glibc wastes a regrettable amount of space. None of this works with real
cross-compiling between different processors (such as building an ARM system
from x86).</p>
<p>We'd still have to solve the other problems (such as gcc wanting absolute
paths) anyway; there just wouldn't be a switchover point where we could
run the binaries we were building and start native compiling. Instead we'd
have to keep cross-compiling all the way to the final system, and if anything's
wrong with it we wouldn't find out until we tried to run it. With the native
build, we've given the tools a bit of a workout during the build, so if the
build completes then the finished system shouldn't have anything too
fundamentally wrong with it.</p>
<p>(Note: QEMU can export a host directory to the target through the emulated
network card as an smb filesystem, but you don't want to run your root
filesystem on smb.)</p>
<h2>Filesystem Layout</h2>
<p>Firmware Linux's directory hierarchy is a bit idiosyncratic: some redundant
directories have been merged, with symlinks from the standard positions
pointing to their new positions. On the bright side, this makes it easy to
make the root partition read-only.</p>
<h3>Simplifying the $PATH.</h3>
<p>The symlinks "bin->usr/bin, sbin->usr/sbin, lib->usr/lib" all serve to
consolidate all the executables under /usr. This has a bunch of nice effects:
making a read-only run-from-CD filesystem easier to do, allowing du /usr to show
the whole system size, allowing everything outside of there to be mounted
noexec, and of course having just one place to look for everything. (Normal
executables are in /usr/bin. Root only executables are in /usr/sbin.
Libraries are in /usr/lib.)</p>
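<p>A minimal sketch of creating that layout in a staging directory, without
root access, might look like the following. The function name
<code>make_layout</code> is made up for this example; the real build may
arrange things differently.</p>

```shell
#!/bin/sh
# make_layout ROOT (hypothetical helper)
# Creates the real directories under ROOT/usr and points the traditional
# top-level names at them.
make_layout() {
    root=$1
    mkdir -p "$root/usr/bin" "$root/usr/sbin" "$root/usr/lib"
    # Relative link targets keep working wherever ROOT ends up mounted.
    ln -s usr/bin  "$root/bin"
    ln -s usr/sbin "$root/sbin"
    ln -s usr/lib  "$root/lib"
}

# Demo in a throwaway directory.
demo_root=$(mktemp -d)
make_layout "$demo_root"
ls -l "$demo_root"
rm -rf "$demo_root"
```

<p>Using relative targets (usr/bin rather than /usr/bin) means the links
resolve correctly both in the staging directory and after the tree becomes
the target's root filesystem.</p>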
<p>For those of you wondering why /bin and /usr/bin were split in the first
place, the answer is that Ken Thompson and Dennis Ritchie ran out
of space on the original 2.5 megabyte RK-05 disk pack their root partition
lived on in 1971, and leaked the OS into their second RK-05 disk pack where
the user home directories lived. When they got more disk space, they created
a new directory (/home) and moved all the user home directories there.</p>
<p>The real reason we kept it is tradition. The excuse is that the root
partition contains early boot stuff and /usr may get mounted later, but these
days we use initial ramdisks (initrd and initramfs) to handle that sort of
thing. The version skew issues of actually trying to mix and match different
versions of /lib/libc.so.* living on a local hard drive with a /usr/bin/*
from the network mount are not pretty.</p>
<p>In other words, the separation is just a historical relic, and I've
consolidated it in the name of simplicity.</p>
<p>The one bit where this can cause a problem is merging /lib with /usr/lib,
which means that the same library can show up in the search path twice, and
when that happens binutils gets confused and bloats the resulting executables.
(They become as big as statically linked, but still refuse to run without
opening the shared libraries.) This is really a bug in either binutils or
collect2, and has probably been fixed since I first noticed it. In any case,
the proper fix is to take /lib out of the binutils search path, which we do.
The symlink is left there in case somebody's using dlopen, and for "standards
compliance".</p>
<p>On a related note, there's no reason for "/opt". After the original Unix
leaked into /usr, Unix shipped out into the world in semi-standardized forms
(Version 7, System III, the Berkeley Software Distribution...) and sites that
installed these wanted places to add their own packages to the system without
mixing their additions in with the base system. So they created "/usr/local"
and created a third instance of bin/sbin/lib and so on under there. Then
Linux distributors wanted a place to install optional packages, and they had
/bin, /usr/bin, and /usr/local/bin to choose from, but the problem with each
of those is that they were already in use and thus might be cluttered by who
knows what. So a new directory was created, /opt, for "optional" packages
like Firefox or OpenOffice.</p>
<p>It's only a matter of time before somebody suggests /opt/local, and I'm
not humoring this. Executables for everybody go in /usr/bin, ones usable
only by root go in /usr/sbin. There's no /usr/local or /opt. /bin and
/sbin are symlinks to the corresponding /usr directories, but there's no
reason to put them in the $PATH.</p>
<h3>Consolidating writeable directories.</h3>
<p>All the editable stuff has been moved under "var", starting with symlinking
tmp->var/tmp. Although /tmp is much less useful these days than it used to
be, some things (like X) still love to stick things like named pipes in there.
Long ago in the days of little hard drive space and even less ram, people made
extensive use of temporary files and they threw them in /tmp because ~home
had an ironclad quota. These days, putting anything in /tmp with a predictable
filename is a security issue (symlink attacks: you can be made to overwrite
any arbitrary file you have access to). Most temporary files for things
like the printer or email migrated to /var/spool (where there are
persistent subdirectories with known ownership and permissions) or to the
user's home directory under something like "~/.kde".</p>
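<p>For scripts that still need a scratch file, the usual way around the
predictable-filename problem is to let mktemp pick the name and create the
file in one step. The wrapper below is a sketch; the name
<code>with_scratch_file</code> is invented for this example.</p>

```shell
#!/bin/sh
# with_scratch_file COMMAND [ARGS...] (hypothetical helper)
# Creates a scratch file with an unpredictable name, runs COMMAND with the
# file's path appended as the last argument, then removes the file.
with_scratch_file() {
    # mktemp both chooses the name and creates the file, so an attacker
    # cannot plant a symlink at a name guessed in advance.
    scratch=$(mktemp) || return 1
    "$@" "$scratch"
    status=$?
    rm -f "$scratch"
    return $status
}

# Demo: show the file that was created for us.
with_scratch_file ls -l
```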
<p>The theoretical difference between /tmp and /var/tmp is that the contents
of /tmp may be deleted by the system init scripts on every reboot, but the
contents of /var/tmp are supposed to be preserved across reboots. Except
deleting everything out of both during a reboot is a good idea anyway, and any
program that actually depends on temporary files being preserved across
a reboot is obviously broken, so there's no reason not to symlink them
together.</p>
<p>(In case it hasn't become apparent yet, there's 30 years of accumulated cruft
in the standards, covering a lot of cases that don't apply outside of
supercomputing centers where 500 people share accounts on a mainframe that
has a dedicated support staff. They serve no purpose on a laptop, let alone
an embedded system.)</p>
<p>The corner case is /etc, which can be writeable (we symlink it to
var/etc) or a read-only part of the / partition. It's really a question of
whether you want to update configuration information and user accounts in a
running system, or whether that stuff should be fixed before deploying.
We're doing some cleanup, but leaving /etc writeable (as a symlink to
/var/etc). Firmware Linux symlinks /etc/mtab->/proc/mounts, which
is required by modern stuff like shared subtrees. If you want a read-only
/etc, use "find /etc -type f | xargs ls -lt" to see what gets updated on the
live system. Some specific cases are that /etc/adjtime was moved to /var
by LSB and /etc/resolv.conf should be a symlink somewhere writeable.</p>
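<p>The writeable-directory symlinks described in this section can be sketched
the same way as the /usr ones. The function name
<code>make_writable_links</code> is hypothetical, and the 1777 mode on
var/tmp is the conventional sticky, world-writeable setting rather than
something this document specifies.</p>

```shell
#!/bin/sh
# make_writable_links ROOT (hypothetical helper)
# Puts the writeable directories under ROOT/var and leaves symlinks at the
# traditional locations.
make_writable_links() {
    root=$1
    mkdir -p "$root/var/tmp" "$root/var/etc"
    chmod 1777 "$root/var/tmp"          # sticky and world-writeable
    ln -s var/tmp "$root/tmp"
    ln -s var/etc "$root/etc"
    # mtab points at the kernel's own list of mounts.
    ln -s /proc/mounts "$root/var/etc/mtab"
}

# Demo in a throwaway directory.
demo_root=$(mktemp -d)
make_writable_links "$demo_root"
ls -l "$demo_root" "$demo_root/var/etc"
rm -rf "$demo_root"
```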
<h3>The resulting mount points</h3>
<p>The result of all this is that a running system can have / be mounted read
only (with /usr living under that), /var can be ramfs or tmpfs with a tarball
extracted to initialize it on boot, /dev can be ramfs/tmpfs managed by udev or
mdev (with /dev/pts as devpts under that: note that /dev/shm naturally inherits
/dev's tmpfs and some things like User Mode Linux get upset if /dev/shm is
mounted noexec), /proc can be procfs, /sys can be sysfs. Optionally, /home
can be an actual writeable filesystem on a hard drive or the network.</p>
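<p>As a rough sketch, an early boot script for this layout might issue
commands along the following lines. The function only prints them, since
actually mounting requires root; the tarball path /usr/share/var-skeleton.tar
is a placeholder invented for this example, and using mdev rather than udev
is just one of the two options mentioned above.</p>

```shell
#!/bin/sh
# print_mounts (hypothetical helper): emit the commands an init script
# might run to set up the mount points described above, without running them.
print_mounts() {
    cat <<'EOF'
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t tmpfs tmpfs /dev
mdev -s
mkdir -p /dev/pts
mount -t devpts devpts /dev/pts
mount -t tmpfs tmpfs /var
tar -xf /usr/share/var-skeleton.tar -C /var
EOF
}

print_mounts
```

<p>The root filesystem itself is not in the list because it is already
mounted read-only by the time such a script runs.</p>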
<p>Remember to
put root's home directory somewhere writeable (i.e. /root should move to
either /var/root or /home/root; change the passwd entry to do this), and life
is good.</p>