toybox

Toybox was just a personal project until it got relaunched in November 2011 with a new goal to make Android self-hosting. This involved me relicensing my own code, which made people who had never used or participated in the project loudly angry. The switch came after a lot of thinking about licenses and the transition to smartphones, which led to a 2013 talk laying out a strategy to make Android self-hosting using toybox. This helped bring it to Android's attention, and they merged it into Android M.

The unfixable problem with busybox was licensing: BusyBox predates Android by almost a decade, but Android still doesn't ship with it because GPLv3 came out around the same time Android did and caused many people to throw out the GPLv2 baby with the GPLv3 bathwater. Android explicitly discourages use of GPL and LGPL licenses in its products, and has gradually reimplemented historical GPL components (such as its bluetooth stack) under the Apache license. Apple's less subtle response was to freeze xcode at the last GPLv2 releases (GCC 4.2.1 with binutils 2.17) for over 5 years while sponsoring the development of new projects (clang/llvm/lld) to replace them, implementing a new SMB server from scratch to replace samba, replacing bash with zsh, and so on. Toybox itself exists because somebody in a legacy position just wouldn't shut up about GPLv3, otherwise I would probably still happily be maintaining BusyBox. (For more on how I wound up working on busybox in the first place, see here.)

Q: Do you capitalize toybox?

A: Only at the start of a sentence. The command name is all lower case so it seems silly to capitalize the project name, but not capitalizing the start of sentences is awkward, so... compromise. (It is _not_ "ToyBox".)

Q: Why a 7 year support horizon?

A: Our longstanding rule of thumb is to try to run and build on hardware and distributions released up to 7 years ago, and feel ok dropping support for stuff older than that. (This is a little longer than Ubuntu's Long Term Support, but not by much.)

My original theory was "4 to 5 of the 18-month cycles of moore's law should cover the vast majority of the installed base of PC hardware", loosely based on some research I did back in 2003 and updated in 2006 which said that low end systems were 2 iterations of moore's law below the high end systems, and that another 2-3 iterations should cover the useful lifetime of most systems no longer being sold but still in use and potentially being upgraded to new software releases.
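As a back-of-the-envelope check, here is the arithmetic behind that theory as a tiny shell sketch. The function name cycles_to_months is made up for this example, and the 18 is just the cycle length quoted above.

```sh
# Convert a count of 18-month Moore's law cycles into months.
# (cycles_to_months is a name invented for this example.)
cycles_to_months() {
  echo $(( $1 * 18 ))
}

for cycles in 4 5; do
  months=$(cycles_to_months "$cycles")
  # Shell arithmetic is integer-only, so report whole years plus leftover months.
  echo "$cycles cycles = $months months = $((months / 12)) years $((months % 12)) months"
done
```

That gives 6 years for 4 cycles and 7 years 6 months for 5 cycles, which is where a 7 year horizon falls out of "4 to 5 cycles".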

That analysis missed industry changes in the 1990's that stretched the gap from low end to high end from 2 cycles to 4 cycles, and ignored the switch from PC to smartphone cutting off the R&D air supply of the laptop market. Meanwhile the Moore's Law s-curve started bending back down (as they always do) back in 2000, and these days is pretty flat: the drive for faster clock speeds stumbled and died, with the subsequent drive to go "wide" maxing out for most applications around 4x SMP with maybe 2 megabyte caches. These days the switch from exponential to linear growth in hardware capabilities is common knowledge and widely accepted.

But the 7 year rule of thumb stuck around anyway: if a kernel or libc feature is less than 7 years old, I try to have a build-time configure test for it to let the functionality cleanly drop out. I also keep old Ubuntu images around in VMs to perform the occasional defconfig build there to see what breaks. (I'm not perfect about this, but I accept bug reports.)

Q: Why time based releases?

A: Toybox targets quarterly releases (a similar schedule to the Linux kernel) because Martin Michlmayr's excellent talk on the subject was convincing. This is actually two questions, "why have releases" and "why schedule them".

Releases provide synchronization points where the developers certify "it worked for me". Each release is a known version with predictable behavior, and right or wrong at least everyone should be seeing similar results so might be able to google an unexpected outcome. Releases focus end-user testing on specific versions where issues can be reproduced, diagnosed, and fixed. Releases also force the developers to do periodic tidying, packaging, documentation review, finish up partially implemented features languishing in their private trees, and give regular checkpoints to measure progress.

Changes accumulate over time: different feature sets, data formats, control knobs... Toybox's switch from "ls -q" to "ls -b" as the default output format was not-a-bug-it's-a "design improvement", but the difference is academic if the change breaks somebody's script. Releases give you the option to schedule upgrades as maintenance, not to rock the boat just now, and use a known working release version until later.

The counter-argument is that "continuous integration" can be made robust with sufficient automated testing. But like the waterfall method, this places insufficient emphasis on end-user feedback and learning from real world experience. Developer testing is either testing that the code does what the developers expect given known inputs running in an established environment, or it's regression testing against bugs previously found in the field. No plan survives contact with the enemy, and technology always breaks once it leaves the lab and encounters real world data and use cases in new runtime and build environments.

The best way to give new users a reasonable first experience is to point them at specific stable versions where development quiesced and extra testing occurred. There will still be teething troubles, but multiple people experiencing the _same_ teething troubles can potentially help each other out.

Releases on a schedule are better than releases "when it's ready" for the same reason a regularly scheduled bus beats one that leaves when it's "full enough": the schedule lets its users make plans. Even if the bus leaves empty you know when the next one arrives so missing this one isn't a disaster, and starting the engine to leave doesn't provoke a last-minute rush of nearby not-quite-ready passengers racing to catch it causing further delay and repeated start/stop cycles as it ALMOST leaves. (The video in the first paragraph goes into much greater detail.)

Q: Where do I start understanding the source code?

A: Toybox is written in C. There are longer writeups of the design ideas and a code walkthrough, and the about page summarizes what we're trying to accomplish, but here's a quick start:

Toybox uses the standard three stage configure/make/install build, in this case "make defconfig; make; make install". Type "make help" to see available make targets.

The configure stage is copied from the Linux kernel (in the "kconfig" directory), and saves your selections in the file ".config" at the top level. The "make defconfig" target selects the maximum sane configuration (enabling all the commands and features that aren't unfinished, or only intended as examples, or debug code...) and is probably what you want. You can use "make menuconfig" to manually select specific commands to include, through an interactive menu (cursor up and down, enter to descend into a sub-menu, space to select an entry, ? to see an entry's help text, esc to exit). The menuconfig help text is the same as the command's "--help" output.

The "make" stage creates a toybox binary (which is stripped, look in generated/unstripped for the debug versions), and "make install" adds a bunch of symlinks to toybox under the various command names. Toybox determines which command to run based on the filename, or you can use the "toybox" name in which case the first argument is the command to run (ala "toybox ls -l").

You can also build individual commands as standalone executables, ala "make sed cat ls". The "make change" target builds all of them, as in "change for a $20".

The main() function is in main.c at the top level, along with setup plumbing and selecting which command to run this time. The function toybox_main() in the same file implements the "toybox" multiplexer command that lists and selects the other commands.

The individual command implementations are under "toys", and are grouped into categories (mostly based on which standard they come from, posix, lsb, android...) The "pending" directory contains unfinished commands, and the "examples" directory contains example code (not really useful commands). Commands in those two directories are _not_ selected by defconfig. (Most of the files in the pending directory are third party submissions that have not yet undergone proper code review.)

Common infrastructure shared between commands is under "lib". Most commands call lib/args.c to parse their command line arguments before calling the command's own main() function, which uses the option string in the command's NEWTOY() macro. This is similar to the libc function getopt(), but more powerful, and is documented at the top of lib/args.c. A NULL option string prevents this code from being called for that command.

The build/install infrastructure is shell scripts under "scripts" (starting with scripts/make.sh and scripts/install.sh). These populate the "generated" directory with headers created from other files, which are described in the code walkthrough. All the build's temporary files live under generated, including the .o files built from the .c files (in generated/obj). The "make clean" target deletes that directory. ("make distclean" also deletes your .config and deletes the kconfig binaries that process .config.)

Each command's .c file contains all the information for that command, so adding a command to toybox means adding a single file under "toys". Usually you start a new command by copying an existing command file to a new filename (toys/examples/hello.c, toys/examples/skeleton.c, toys/posix/cat.c, and toys/posix/true.c have all been used for this purpose) and then replacing all instances of its old name with the new name (which should match the new filename), and modifying the help text, argument string, and what the code does. You might have to "make distclean" before your new command shows up in defconfig or menuconfig.
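The copy-and-rename step can be sketched as a small shell helper. This is an illustration, not a script that ships with toybox: clone_command is a hypothetical name, and renaming the upper case spelling too is an assumption based on command files also using the name in capitals (config symbols and macros).

```sh
# Hypothetical helper: create a new command file from an existing one.
# usage: clone_command TEMPLATE.c OLDNAME NEWNAME OUTPUT.c
# e.g.   clone_command toys/examples/hello.c hello mycmd toys/<category>/mycmd.c
clone_command() (
  old=$2 new=$3
  # Assumption: the name also shows up in upper case, so rename both spellings.
  OLD=$(echo "$old" | tr '[:lower:]' '[:upper:]')
  NEW=$(echo "$new" | tr '[:lower:]' '[:upper:]')
  sed -e "s/$old/$new/g" -e "s/$OLD/$NEW/g" "$1" > "$4"
)
```

A blind search and replace also hits substrings and comments, so read through the result, then fix up the help text, argument string, and code by hand as described above.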

The toybox test suite lives in the "tests" directory, and is driven by scripts/test.sh and scripts/runtest.sh. From the top level you can "make tests" to test everything, or "make test_sed" to test a single command's standalone version (which should behave identically, but that's why we test). You can set TEST_HOST=1 to test the host version instead of the toybox version (in theory they should work the same), and VERBOSE=all to see diffs of the expected and actual output for all failing tests. The default VERBOSE=fail stops at the first such failure.

Q: When were historical toybox versions released?

A: For vanilla releases, check the date on the commit tag or the example binaries against the output of "toybox --version". Between releases the --version information is in "git describe --tags" format with "tag-count-hash" showing the most recent commit tag, the number of commits since that tag, and the hash of the current commit.
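If you need to pick such a version string apart in a script, something like the following sketch works. The function name parse_describe and the sample version string are invented for illustration. "git describe" puts a "g" in front of the abbreviated hash, and since tags can contain dashes the string is split from the right.

```sh
# Split "tag-count-ghash" (git describe --tags format) into its three parts.
# parse_describe is a name invented for this sketch.
parse_describe() (
  v=$1
  hash=${v##*-}        # text after the last dash, e.g. g1a2b3c4
  rest=${v%-*}         # everything before the last dash
  count=${rest##*-}    # number of commits since the tag
  tag=${rest%-*}       # the tag itself (may contain dashes)
  case "$v" in
    *-*-g*) echo "tag=$tag count=$count hash=${hash#g}" ;;
    *)      echo "tag=$v count=0 hash=" ;;   # built exactly at a release tag
  esac
)

parse_describe 0.8.9-12-g1a2b3c4
```

A build made exactly at a release tag has no count or hash suffix, which the second case handles.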

Android makes its own releases on its own schedule using its own version tags, but lists corresponding upstream toybox release versions here. For more detail you can look up AOSP's git tags. (The Android Open Source Project is the "upstream" android vendors start from when making their own releases. Google's phones run AOSP versions verbatim, other vendors tend to take those releases as starting points to modify.)

If you want to find the vanilla toybox commit corresponding to an AOSP toybox version, find the most recent commit in the android log that isn't from a @google or @android address and search for it in the vanilla commit log. (The timestamp should match but the hash will differ, because each git hash includes the previous git hash in the data used to generate it so all later commits have a different hash if any of the tree's history differs; yes Linus Torvalds published 3 years before Satoshi Nakamoto.) Once you've identified the vanilla commit's hash, "git describe --tags $HASH" in the vanilla tree should give you the --version info for that one.
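The reason the hashes differ can be shown with a toy hash chain. This sketch is not git's actual object format: chain_hash is an invented name, the "commits" are just strings, and it assumes a sha1sum command is available.

```sh
# Toy hash chain: each "commit" hash covers its own content plus the parent hash.
# This only illustrates the chaining idea, it is not git's object format.
chain_hash() (
  parent=""
  for content in "$@"; do
    parent=$(printf '%s\n%s\n' "$parent" "$content" | sha1sum | cut -d' ' -f1)
  done
  echo "$parent"
)

# Same later commits, different first commit: the final hashes differ.
chain_hash "initial import" "fix ls" "fix sed"
chain_hash "android import" "fix ls" "fix sed"
```

Because every hash feeds into the next one, two trees whose histories differ anywhere never share later hashes, even when the later changes are identical. That's why you match on timestamp and author instead.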

Q: Where do I report bugs?

A: Ideally on the mailing list, although emailing the maintainer is a popular if slightly less reliable alternative. Issues submitted to codeberg are generally dealt with less promptly, but mostly get done eventually. AOSP has its own bug reporting mechanism (although for toybox they usually forward them to the mailing list) and Android vendors usually forward them to AOSP which forwards them to the list.

Note that if we can't reproduce a bug, we probably can't fix it. Not only does this mean providing enough information for us to see the behavior ourselves, but ideally doing so in a reasonably current version. The older it is the greater the chance somebody else found and fixed it already, so the more out of date the version you're reporting a bug against the less effort we're going to put into reproducing the problem.

Q: What are those /b/number bug report links in the git log?

A: It's a Google thing. Replace /b/$NUMBER with https://issuetracker.google.com/$NUMBER to read it outside the googleplex.
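To do that replacement in bulk (say, on saved git log output), a sed filter along these lines should work. The function name expand_bug_links and the bug number are made up, and handling an optional http:// in front of the /b/ is an assumption about how such links may be written.

```sh
# Rewrite Google-internal /b/NUMBER references into public issue tracker URLs.
# expand_bug_links is a name invented for this sketch; it filters stdin.
expand_bug_links() {
  # The optional "http:/" or "https:/" prefix is swallowed so "http://b/123"
  # doesn't turn into a mangled double URL.
  sed -E 's#(https?:/)?/b/([0-9]+)#https://issuetracker.google.com/\2#g'
}

echo "Bug: http://b/123456789" | expand_bug_links
```

For example "git log | expand_bug_links" would give you a log you can click through from outside.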

Q: What is the relationship between toybox and android?

A: The about page tries to explain that, and Linux Weekly News has covered toybox's history a little over the years.

Toybox is a traditional open source project created and maintained by hobbyist (volunteer) developers, originally for Linux but these days also running on Android, BSD, and MacOS. The project started in 2006 and its original author (Rob Landley) continues to maintain the open source project.

Android's base OS maintainer (Elliott Hughes, I.E. enh) ported toybox to Android in 2014, merged it into Android M (Marshmallow), and remains Android's toybox maintainer. (He explained it in his own words in this podcast, starting either 18 or 20 minutes in depending how much backstory you want.)

Android's policy for toybox development is to push patches to the open source project (submitting them via the mailing list) then "git pull" the public tree into Android's tree. To avoid merge conflicts, Android's tree doesn't change any of the existing toybox files but instead adds parallel build infrastructure off to one side. (Toybox uses a make wrapper around bash scripts, AOSP builds with soong/ninja instead and checks in a snapshot of the generated/ directory to avoid running kconfig each build). Android's changes to toybox going into the open source tree first and being pulled from there into Android keeps the two trees in sync, and makes sure each change undergoes full open source design review and discussion.

Rob acknowledges Android is by far the largest userbase for the project, but develops on a standard 64-bit Linux+glibc distro while building embedded 32-bit big-endian nommu musl systems requiring proper data alignment for work, and is not a Google employee so does not have access to the Google build cluster of powerful machines capable of running the full AOSP build in a reasonable amount of time. Rob is working to get android building under android (the list of toybox tools Android's build uses is here, and what else it needs from its build environment is here), and he hopes someday to not only make a usable development environment out of it but also nudge the base OS towards a more granular package management system allowing you to upgrade things like toybox without a complete reinstall and reboot, plus the introduction of a "posix container" (within which you can not only run builds, but selinux lets you run binaries you've just built). In the meantime, Rob tests static bionic builds via the Android NDK when he remembers, but has limited time to work on toybox because it's not his day job. (The products his company makes ship toybox and they do sponsor the project's development, but it's one of many responsibilities at work.)

Elliott is the Android base OS maintainer, in which role he manages a team of engineers. He also has limited time for toybox, both because it's one of many packages he's responsible for (he maintains bionic, used to maintain dalvik...) and because he allowed himself to be promoted into management and thus spends less time coding than he does sitting in meetings where testers talk to security people about vendor issues.

Android has many other coders and security people who submit the occasional toybox patch, but of the last 1000 commits at the time of writing this FAQ entry, Elliott submitted 276 and all other google.com or android.com addresses combined totaled 17. (Rob submitted 591, leaving 116 from other sources, but for both Rob and Elliott there's a lot of "somebody else pointed out an issue, and then we wrote a patch". A lot of patches from both "Author:" lines thank someone else for the suggestion in the commit comment.)

Q: Will you backport fixes to old versions?

A: Probably not. The easiest thing to do is get your issue fixed upstream in the current release, then get the newest version of the project built and running in the old environment.

Backporting fixes generally isn't something open source projects run by volunteer developers do because the goal of the project's development community is to extend and improve the project. We're happy to respond to our users' needs, but if you're coming to us for free tech support we're going to ask you to upgrade to a current version before we try to diagnose your problem.

The volunteers are happy to fix any bugs you point out in the current versions because doing so helps everybody and makes the project better. We want to make the current version work for you. But diagnosing, debugging, and backporting fixes to old versions doesn't help anybody but you, so isn't something we do for free. The cost of volunteer tech support is using a reasonably current version of the project.

If you're using an old version built with an old compiler on an old OS (kernel and libc), there's a fairly large chance whatever problem you're seeing already got fixed, and to get that fix all you have to do is upgrade to a newer version. Diagnosing a problem that wasn't our bug means we spent time that only helps you, without improving the project. If you don't at least _try_ a current version, you're asking us for free personalized tech support.

Reproducing bugs in current versions also makes our job easier. The further back in time you are, the more work it is for us digging back in the history to figure out what we hadn't done yet in your version. If you spot a problem in a git build pulled 3 days ago, it's obvious what changed and easy to fix or back out. If you ask about the current release version 3 months after it came out, we may have to think a while to remember what we did and there are a number of possible culprits, but it's still tractable. If you ask about 3 year old code, we have to reconstruct the history and the problem could be anything, there's a lot more ground to cover and we haven't seen it in a while.

As a rule of thumb, volunteers will generally answer polite questions about a given version for about three years after its release before it's so old we don't remember the answer off the top of our head. And if you want us to put any _effort_ into tracking it down, we want you to put in a little effort of your own by confirming it's still a problem with the current version (I.E. we didn't fix it already). It's also hard for us to fix a problem of yours if we can't reproduce it because we don't have any systems running an environment that old.

If you don't want to upgrade, you have the complete source code and thus the ability to fix it yourself, or can hire a consultant to do it for you. If you got your version from a vendor who still supports the older version, they can help you. But there are limits as to what volunteers will feel obliged to do for you.

Commercial companies have different incentives. Your OS vendor, or hardware vendor for preinstalled systems, may have their own bug reporting mechanism and update channel providing backported fixes. And a paid consultant will happily set up a special environment just to reproduce your problem.

Q: How do I install toybox?

A: Multicall binaries like toybox behave differently based on the filename used to call them, so if you "mv toybox ls; ./ls -l" it acts like ls. Creating symlinks or hardlinks and adding them to the $PATH lets you run the commands normally by name, so that's probably what you want to do.

If you already have a toybox binary you can install a tree of command symlinks to the standard path locations (export PATH=/bin:/usr/bin:/sbin:/usr/sbin) by doing:

for i in $(/bin/toybox --long); do ln -s /bin/toybox $i; done

Or you can install all the symlinks in the same directory as the toybox binary (export PATH="$PWD:$PATH") via:

for i in $(./toybox); do ln -s toybox $i; done

When building from source, use the "make install" and "make install_flat" targets with an appropriate PREFIX=/target/path either exported or on the make command line. When cross compiling, "make list" outputs the command names enabled by defconfig. For more information, see "make help".

The command name "toybox" takes the second argument as the name of the command to run, so "./toybox ls -l" also behaves like ls. The "toybox" name is special in that it can have a suffix (toybox-i686 or toybox-1.2.3) and still be recognized, so you can have multiple versions of toybox in the same directory.

When toybox doesn't recognize its filename as a command, it dereferences one level of symlink. So if your script needs "gsed" you can "ln -s sed gsed", then when you run "gsed" toybox knows how to be "sed".
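The name lookup described in this answer can be sketched in shell. This is only an illustration of the rules above, not toybox's code (the real logic is C, in main.c). KNOWN, is_known, and which_command are names invented here, and matching "toybox-*" for the suffix case is an assumption based on the two examples given.

```sh
# Stand-in for the list of commands compiled into the binary.
KNOWN="cat ls sed"

is_known() {
  case " $KNOWN " in *" $1 "*) return 0 ;; esac
  return 1
}

# usage: which_command INVOKED_PATH [ARGS...]
which_command() (
  path=$1; shift
  name=${path##*/}                       # strip leading directories
  case "$name" in
    toybox|toybox-*)                     # multiplexer: first argument names the command
      first=${1-}
      name=${first##*/} ;;
    *)
      if ! is_known "$name" && [ -L "$path" ]; then
        target=$(readlink "$path")       # follow one level of symlink
        name=${target##*/}
      fi ;;
  esac
  if is_known "$name"; then echo "$name"; else echo "unknown"; fi
)

which_command /bin/ls -l
which_command ./toybox-i686 sed
```

With a "gsed" symlink pointing at "sed", which_command reports sed, matching the "ln -s sed gsed" trick above.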

Q: What's this ./ on the front of commands in your examples?

A: When you don't give a path to a command's executable file, linux command shells search the directories listed in the $PATH environment variable (in order), which usually doesn't include the current directory for security reasons. The magic name "." indicates the current directory (the same way ".." means the parent directory and starting with "/" means the root directory) so "./file" gives a path to the executable file, and thus runs a command out of the current directory where just typing "file" won't find it. For historical reasons PATH is colon-separated, and treats an empty entry (including leading/trailing colon) as "check the current directory", so if you WANT to add the current directory to PATH you can PATH="$PATH:" but doing so is a TERRIBLE idea.
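Here's a rough sketch of that search, including the empty-entry rule. It's a simplification for illustration (real shells also cache results and check builtins first), and find_in_path is a name invented for this example.

```sh
# Sketch of how a shell walks a colon-separated PATH-style string.
# usage: find_in_path COMMAND PATHSTRING
find_in_path() (
  cmd=$1
  rest=$2:                      # add a trailing colon so the last entry gets processed too
  while [ -n "$rest" ]; do
    dir=${rest%%:*}             # first remaining entry
    rest=${rest#*:}             # everything after the first colon
    [ -z "$dir" ] && dir=.      # an empty entry means the current directory
    if [ -x "$dir/$cmd" ] && [ ! -d "$dir/$cmd" ]; then
      echo "$dir/$cmd"
      return 0
    fi
  done
  return 1
)
```

So find_in_path file "/bin:/usr/bin" won't find an executable called "file" sitting in the current directory, but find_in_path file "/bin:/usr/bin:" (note the trailing colon) reports "./file".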

Toybox's shell (toysh) checks for built-in commands before looking at the $PATH (using the standard "bash builtin" logic just with lots more builtins), so "ls" doesn't have to exist in your filesystem for toybox to find it. When you give a path to a command the shell won't run the built-in version but will run the file at that location. (But the multiplexer command won't: "toybox /bin/ls" runs the built-in ls, you can't point it at an arbitrary file out of the filesystem and have it run that. You could "toybox nice /bin/ls" though.)

Q: How do I make individual/standalone toybox command binaries?

A: You can use almost * any command name as a make target (ala "make sed") or test the standalone versions individually with the test_ prefix ("make test_sed"). You'll need to run the configure step first (generally "make defconfig") so the .config file exists for the build. For a list of currently available commands run "make list".

The "make change" target (as in change for a $20) builds every command standalone (in the "change" subdirectory). Note that this is collectively about 10 times as large as the all-in-one multiplexer version (in disk space, runtime memory, how long the build takes...)

As always, the Makefile is a thin wrapper around bash scripts actually doing the work, you can just call "scripts/single.sh cat ls mv" directly if you like.

* A few command names, like "help" and "test" have other meanings to the Makefile, and you have to use scripts/single.sh or "make change" to build them standalone.

Q: How do I build toybox on a system with a broken $PATH?

A: Toybox can provide its own build prerequisites (I.E. perform a "hermetic" build) via scripts/prereq/build.sh, which is a canned minimal toybox build script that basically does "cc *.c" against saved headers to build just the commands needed by the full toybox build scripts.

The wrapper scripts/prereq/use.sh calls that hermetic build.sh script to create a minimal toybox-prereq, then populates a directory of symlinks to those toybox commands plus the cc/ld/as/strip binaries out of the host $PATH, and finally runs scripts/genconfig.sh and scripts/make.sh to configure and build a full toybox binary. By default it builds defconfig, but can take as its first argument a miniconfig file.

So to build toybox on MacOS (without homebrew) run scripts/prereq/use.sh scripts/macos_miniconfig and it should create a toybox binary configured for mac using the host toolchain. The scripts directory also contains freebsd_miniconfig and android_miniconfig files enumerating the commands known to work under those operating systems.

Most of the files in the scripts/prereq directory were created by scripts/recreate-prereq.sh which records the commands used by a toybox build, harvests stripped down headers, and writes a build.sh to compile the appropriate source files. It's a couple dozen lines of bash if you're interested. (Yes we check in generated files, to break a chicken-and-egg cycle.)

At the moment toybox's full scripts/make.sh still requires bash (until toysh is finished and promoted out of pending). FreeBSD users can invoke "/opt/usr/local/bin/bash scripts/make.sh" or similar to work around their distro's policy insisting that /bin/env can be trusted to live at a specific path but /bin/bash can't. (On Android both env and sh live in /system/bin, which is at least internally consistent.)

Toybox does not yet provide "make" either. The Makefile is mostly convenience wrappers around shell scripts, so you can directly call "scripts/genconfig.sh -d" (defconfig), scripts/make.sh, scripts/install.sh, scripts/test.sh, scripts/single.sh and so on, all without needing make. The exception is menuconfig: most config functions have been migrated to scripts/genconfig.sh using scripts/kconfig.c, but for the moment menuconfig is still using the old kconfig/ infrastructure and thus still needs gmake.

Q: How do I cross compile toybox?

A: You need a compiler "toolchain" capable of producing binaries that run on your target. A toolchain is an integrated suite of compiler, assembler, and linker, plus the standard headers and libraries necessary to build C programs. (And a few miscellaneous binaries like nm and objdump that display info about ELF files.)

Toybox supports the standard $CROSS_COMPILE prefix environment variable, same as the Linux kernel build uses. This is used to prefix all the tools (target-cc, target-ld, target-strip) during the build, meaning the prefix usually ends with a "-" that's easy to forget but kind of important ("target-cc" and "targetcc" are not the same name).

You can either provide a full path in the CROSS_COMPILE string, or add the appropriate bin directory to your $PATH. I.E:

make LDFLAGS=--static CROSS_COMPILE=~/musl-cross-make/ccc/m68k-linux-musl-cross/bin/m68k-linux-musl- distclean defconfig toybox

Is equivalent to:

export PATH="$HOME/musl-cross-make/ccc/m68k-linux-musl-cross/bin:$PATH"
LDFLAGS=--static CROSS_COMPILE=m68k-linux-musl- make distclean defconfig toybox

Both of those examples use static linking so you can install just the single file to target, or test them with "qemu-m68k toybox". Feel free to dynamically link instead if you prefer, mkroot offers a "dynamic" add-on to copy the compiler's shared libraries into the new root filesystem.

Although you can individually override $CC and $STRIP and such, providing the prefix twice applies it twice, ala "CROSS_COMPILE=prefix- CC=prefix-cc" gives "prefix-prefix-cc".
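Since a forgotten trailing dash is the classic mistake here, a quick sanity check before a long build can save some head scratching. This is a hypothetical helper, not part of toybox's scripts: check_cross_prefix is an invented name, and it only looks for the "cc" tool.

```sh
# Check that PREFIX + "cc" names a compiler the shell can actually find.
# usage: check_cross_prefix "$CROSS_COMPILE"
check_cross_prefix() (
  prefix=$1
  if command -v "${prefix}cc" >/dev/null 2>&1; then
    echo "ok: ${prefix}cc"
  elif command -v "${prefix}-cc" >/dev/null 2>&1; then
    # The tools exist, but only with a dash between prefix and tool name.
    echo "missing trailing dash: try ${prefix}-"
  else
    echo "not found: ${prefix}cc"
  fi
)
```

This works with both styles shown above, because "command -v" accepts either a bare name to look up in $PATH or a full path.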

Toybox's system builder can use a simpler $CROSS variable to specify the target name(s) to build for if you've installed compatible cross compilers under the "ccc" directory. Behind the scenes this uses wildcard expansion to set $CROSS_COMPILE to an appropriate "path/prefix-".

Q: What architectures does toybox support?

A: Toybox runs on 64 bit and 32 bit processors, little endian and big endian, tries to respect alignment, and will enable nommu support when fork() is unavailable (or when TOYBOX_FORCE_NOMMU is enabled in the config to work around broken nommu toolchains), but otherwise tries to be processor agnostic (although some commands such as strace can't avoid a processor-specific if/else staircase).

Several commands (such as ps/top) are unavoidably full of Linux assumptions. Some subset of the commands have been made to run on BSD and MacOS X, and lib/portability.* and scripts/genconfig.sh exist to catch some known variations.

Each release gets tested against two compilers (llvm, gcc), three C libraries (bionic, musl, glibc), and a half-dozen different processor types, in the following combinations:

1) gcc+glibc = host toolchain

Most Linux distros come with that as a host compiler, which is used by default when you build normally (make distclean defconfig toybox, or make menuconfig followed by make).

You can use LDFLAGS=--static if you want static binaries, but static glibc is hugely inefficient ("hello world" is 810k on x86-64) and throws a zillion linker warnings because one of its previous maintainers was insane (which meant at the time he refused to fix obvious bugs), plus it uses dlopen() at runtime to implement basic things like DNS lookup (which is almost impossible to support properly from a static binary because you wind up with two instances of malloc() managing two heaps which corrupt as soon as a malloc() from one is free()d into the other, although glibc added improper support which still requires the shared libraries to be installed on the system alongside the static binary: in brief, avoid). These days glibc is maintained by a committee instead of a single maintainer, if that's an improvement. (As with Windows and Cobol, most people just try to get on with their lives.)

2) gcc+musl = musl-cross-make

These cross compilers come from the musl-libc maintainer's musl-cross-make project. Build them by running toybox's scripts/mcm-buildall.sh in that directory, then symlink the resulting "ccc" subdirectory into toybox where "make root CROSS=" can find it, ala:

cd ~
git clone https://codeberg.org/landley/toybox
git clone https://github.com/richfelker/musl-cross-make
cd musl-cross-make
../toybox/scripts/mcm-buildall.sh # this takes a while
ln -s $(realpath ccc) ../toybox/ccc

Since this takes a long time to run, and builds lots of targets (cross and native), we've uploaded the resulting binaries so you can wget and extract a tarball or two instead of compiling them all yourself. (See the README in that directory for details. Yes there's a big source tarball in there for license compliance reasons.)

Instead of CROSS= you can also specify a CROSS_COMPILE= prefix in the same format the Linux kernel build uses. You can either provide a full path in the CROSS_COMPILE string, or add the appropriate bin directory to your $PATH. I.E:

make LDFLAGS=--static CROSS_COMPILE=~/musl-cross-make/ccc/m68k-linux-musl-cross/bin/m68k-linux-musl- distclean defconfig toybox

Is equivalent to:

export PATH="$HOME/musl-cross-make/ccc/m68k-linux-musl-cross/bin:$PATH"
LDFLAGS=--static CROSS_COMPILE=m68k-linux-musl- make distclean defconfig toybox

Note: these examples use static linking because a dynamic musl binary won't run on your host unless you install musl's libc.so into the system libraries (which is an accident waiting to happen adding a second C library to most glibc linux distributions) or play with $LD_LIBRARY_PATH. (The dynamic package in mkroot copies the shared libraries out of the toolchain to create a dynamic linking environment in the root filesystem, but it's not nearly as well tested.)

3) llvm+bionic = Android NDK

The Android Native Development Kit provides an llvm toolchain with the bionic libc used by Android. To turn it into something toybox can use, you just have to add an appropriately prefixed "cc" symlink to the other prefixed tools, ala:

unzip android-ndk-r21b-linux-x86_64.zip
cd android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/bin
ln -s x86_64-linux-android29-clang x86_64-linux-android-cc
PATH="$PWD:$PATH"
cd ~/toybox
make distclean
make LDFLAGS=--static CROSS_COMPILE=x86_64-linux-android- defconfig toybox

Again, you need to static link unless you want to install bionic on your host. Binaries statically linked against bionic are almost as big as with glibc, but at least it doesn't have the dlopen() issues. (You still can't sanely use dlopen() from a static binary, but bionic doesn't use dlopen() internally to implement basic features.)

Note: although the resulting toybox will run in a standard Linux system, even "hello world" statically linked against bionic segfaults before calling main() when /dev/null isn't present. This presents mkroot with a chicken and egg problem for both chroot and qemu cases, because mkroot's init script has to mount devtmpfs on /dev to provide /dev/null before the shell binary can run mkroot's init script. Since mkroot runs as a normal user, we can't "mknod dev/null" at build time to create a "null" device in the filesystem we're packaging up so initramfs doesn't start with an empty /dev, and the kernel developers repeatedly rejected a patch to make the Linux kernel honor DEVTMPFS_MOUNT in initramfs. Teaching toybox cpio to accept synthetic filesystem metadata, presumably in get_init_cpio format, remains a todo item.

Q: What part of Linux/Android does toybox provide?

A: Toybox is one of three packages (linux, libc, command line) which together provide a bootable unix-style command line operating system. Toybox provides the "command line" part, with a bash compatible command line interpreter and over two hundred commands to call from it, as documented in posix, the Linux Standard Base, and the Linux Manual Pages.

Toybox is not by itself a complete operating system, it's a set of standard command line utilities that run in an operating system. Booting a simple system to a shell prompt requires a kernel to drive the hardware (such as Linux, or BSD with a Linux emulation layer), programs for the system to run (such as toybox's commands), and a C library ("libc") to connect them together.

Toybox has a policy of requiring no external dependencies other than the kernel and C library (at least for defconfig builds). Our "software bill of materials" (SBOM) defaults to just "the C library", both at build time and at runtime. You can optionally enable support for additional libraries in menuconfig (such as openssl, zlib, or selinux), but toybox either provides its own built-in versions of such functionality (which the libraries provide larger, more complex, often assembly optimized alternatives to), or allows things like selinux support to cleanly drop out.

Static linking (with the --static option) copies library contents into the resulting binary, creating larger but more portable programs which can run even if they're the only file in the filesystem. Otherwise, the "dynamically" linked programs require each shared library file to be present on the target system, either copied out of the toolchain or built again from source (with potential version skew if they don't match the toolchain versions exactly), plus a dynamic linker executable installed at a specific absolute path. See the ldd, ld.so, and libc man pages for details.

Most embedded systems will add another package to the kernel/libc/cmdline above containing the dedicated "application" that the embedded system exists to run, plus any other packages that application depends on. Build systems add a native version of the toolchain packages so they can compile additional software on the resulting system. Desktop systems add a GUI and additional application packages like web browsers and video players. A linux distro like Debian adds hundreds of packages. Android adds around a thousand.

But all of these systems conceptually sit on a common three-package "kernel/libc/cmdline" base (often inefficiently implemented and broken up into more packages), and toybox aims to provide a simple, reproducible, auditable version of the cmdline portion of that base.

Q: How do you build a working Linux system with toybox?

A: Toybox has a built-in system builder called "mkroot", with the Makefile target "make root". To enter the resulting root filesystem, "sudo chroot root/host/fs /init". Type "exit" to get back out.

Prebuilt binary versions of these system images, suitable for running under the emulator qemu, are uploaded to the website each release if you'd like to try before building from source.

You can cross compile simple three package (toybox+libc+linux) systems configured to boot to a shell prompt under qemu by setting CROSS_COMPILE= to a cross compiler prefix (or by installing cross compilers in the "ccc" subdirectory and specifying a target type with CROSS=) and also pointing the build at a Linux kernel source directory, ala:

make root CROSS=sh4 LINUX=~/linux

Then you can run root/sh4/run-qemu.sh to launch the emulator, which boots the new Linux system (kernel and root filesystem) on a simulated CPU with its own memory and I/O devices, connecting the virtual serial console to the emulator's stdin and stdout. You'll need the appropriate qemu-system-* emulator binary for the selected architecture in your $PATH. Type "exit" when done to shut down the emulator, similar to exiting the chroot version.

The build finds the three packages needed to produce this system because 1) you're in a toybox source directory, 2) your cross compiler has a libc built into it, 3) you tell it where to find a Linux kernel source directory with LINUX= on the command line. If you don't say LINUX=, it skips that part of the build and just produces a root filesystem directory (root/$CROSS/fs or root/host/fs if no $CROSS target specified), which you can chroot into if your architecture can run those binaries. (For PID other than 1, the /init script at the top of the directory sets up and cleans up the /proc mount points, so chroot root/i686/fs /init is a reasonable "poke around and look at things" smoketest.)
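As a small illustration of that output layout, here is a sketch of a helper that works out where the root filesystem directory lands for a given target. The rootfs_dir name is hypothetical (mkroot does not provide it); it just encodes the root/$CROSS/fs versus root/host/fs rule described above:

```shell
# Hypothetical helper: print the root filesystem directory for a target,
# falling back to "host" when no target name is given.
rootfs_dir() {
  echo "root/${1:-host}/fs"
}

rootfs_dir sh4   # prints root/sh4/fs
rootfs_dir       # prints root/host/fs
```

So the smoketest above could be spelled sudo chroot "$(rootfs_dir i686)" /init, assuming your host can run i686 binaries.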

The CROSS= shortcut expects a "ccc" symlink in the toybox source directory pointing at a directory full of cross compilers. The ones I test this with are built from the musl-libc maintainer's musl-cross-make project, built by running toybox's scripts/mcm-buildall.sh in a musl-cross-make checkout directory, and then symlinking the resulting "ccc" subdirectory into toybox where CROSS= can find them (see the git clone and ln -s commands in the gcc+musl section above).

If you don't want to do that, you can download prebuilt binary versions and extract them into a "ccc" subdirectory under the toybox source.

Once you've installed the cross compilers, "make root CROSS=help" should list all the available cross compilers it recognizes under ccc, something like:

aarch64 armv4l armv5l armv7l armv7m armv7r i486 i686 m68k microblaze mips mips64 mipsel or1k powerpc powerpc64 powerpc64le riscv32 riscv64 s390x sh2eb sh4 sh4eb x32 x86_64

(A long time ago I tried to explain what some of these architectures were.)

You can build all the targets at once, and can add additional packages to the build, by calling the script directly and listing packages on the command line:

mkroot/mkroot.sh CROSS=all LINUX=~/linux dropbear

An example package build script (building the dropbear ssh server, adding a port forward from 127.0.0.1:2222 to the qemu command line, and providing a ssh2dropbear.sh convenience script to the output directory) is provided in the mkroot/packages directory. If you add your own scripts elsewhere, just give a path to them on the command line. (No, I'm not merging more package build scripts, I learned that lesson long ago. But if you want to write your own, feel free.)

(Note: currently mkroot.sh cheats. If you don't have a .config it'll make defconfig and add CONFIG_SH and CONFIG_ROUTE to it, because the new root filesystem kinda needs those commands to function properly. If you already have a .config that _doesn't_ have CONFIG_SH in it, you won't get a shell prompt or be able to run the init script without a shell. This is currently a problem because sh and route are still in pending and thus not in defconfig, so "make root" cheats and adds them. I'm working on it. tl;dr if make root doesn't work "rm .config" and run it again, and all this should be fixed up in future when those two commands are promoted out of pending so "make defconfig" would have what you need anyway. It's designed to let you tweak your config, which is why it uses the .config that's there when there is one, but the default is currently wrong because it's not quite finished yet. All this should be cleaned up in a future release, before 1.0.)
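If you'd rather check an existing .config than delete it, a quick test along these lines may do. This is only a sketch: config_has is a made-up helper name, and it assumes enabled symbols show up in .config as kconfig-style CONFIG_NAME=y lines:

```shell
# Hypothetical helper: succeed if config file $1 enables symbol $2.
# Assumes kconfig-style "CONFIG_NAME=y" lines for enabled symbols.
config_has() {
  grep -q "^CONFIG_$2=y\$" "$1" 2>/dev/null
}

# Warn if there's a .config in the current directory without a shell in it.
if [ -e .config ] && ! config_has .config SH; then
  echo ".config lacks CONFIG_SH, so the init script can't run" >&2
fi
```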

Q: Why doesn't toybox have cttyhack?

A: Because it's unnecessary (it has "hack" in the name). Here's what mkroot does in its PID 1 init script instead (after mounting /sys and /dev):

trap '' CHLD
CONSOLE=$(sed '$s@.*/@@' /sys/class/tty/console/active)
: ${HANDOFF:=/bin/sh}
setsid -c <>/dev/$CONSOLE >&0 2>&1 $HANDOFF
reboot -f &
sleep 5

The "trap" tells the shell to accept and discard exiting child processes (so zombies don't accumulate). Child processes whose parents have already exited get reparented to init (I.E. pid 1) and the shell script is sticking around as PID 1. Setting SIGCHLD to SIG_IGN (which trap with an empty string does) prevents them from waiting around in Z state to deliver their exit status in case the parent ever gets around to calling wait().

$CONSOLE fishes the underlying console device behind /dev/console out of sysfs, because the linux kernel's /dev/console device can't act as a controlling tty (for some reason). Since there may be more than one, and it might or might not have a /dev/ prefix, we use sed to take the last entry and remove any path.

$HANDOFF is the child program to run, and the third line above gives it the default value of /bin/sh if it wasn't already set on the kernel command line. The bash ${NAME:=default value} syntax assigns a default value to blank environment variables (see the bash man page) and : is a synonym for the "true" command which ignores its arguments, so this combination is a quick way to assign default values to blank variables. You can set $HANDOFF on the kernel command line via "KARGS='HANDOFF=cal' ./run-qemu.sh" since the run-qemu.sh script appends $KARGS to the end of the kernel command line when launching QEMU, and unrecognized linux kernel command line arguments with an = in them are treated as variable assignments exported into PID 1's environment.
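To see the default assignment idiom in isolation, here is a small sketch you can paste into any POSIX-style shell (the values are just examples):

```shell
unset HANDOFF
: ${HANDOFF:=/bin/sh}   # unset, so the default gets assigned
echo "$HANDOFF"         # prints /bin/sh

HANDOFF=
: ${HANDOFF:=/bin/sh}   # set but blank, which := treats the same way
echo "$HANDOFF"         # prints /bin/sh

HANDOFF=cal
: ${HANDOFF:=/bin/sh}   # already has a value, so it is left alone
echo "$HANDOFF"         # prints cal
```

Without the leading : the shell would try to run the expanded value as a command, which is why the two are paired.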

The "setsid" command runs a command in a new session (see "man 7 credentials") and the -c option makes stdin the controlling TTY for the new session. The first redirect points stdin at the new console device (the <> redirect opens the file for both reading and writing at the same time) and the second and third redirects duplicate the stdin file descriptor to stdout and stderr. Redirects are guaranteed to be evaluated from left to right, and all redirects happen before launching the command, so -c grabs the new TTY device as the child's controlling tty.
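The redirect chain can be tried out without a console device. This sketch substitutes an ordinary scratch file for /dev/$CONSOLE (so there is no setsid or controlling tty involved, just the three redirects), and shows stdout and stderr both landing in the file that was opened as stdin:

```shell
# Stand-in for /dev/$CONSOLE so this runs anywhere: a plain scratch file.
tmp=$(mktemp)

# <> opens the file read/write as stdin, >&0 duplicates stdin onto stdout,
# and 2>&1 duplicates stdout onto stderr, evaluated left to right.
<>"$tmp" >&0 2>&1 sh -c 'echo to stdout; echo to stderr >&2'

cat "$tmp"
```

Both lines end up in the scratch file because all three descriptors refer to the same open file.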

When the child process setsid launched exits (usually by using the shell's builtin "exit" command) the PID 1 shell script resumes and calls "reboot" to exit qemu. Ordinarily the reboot command sends SIGTERM to PID 1, but that won't do anything useful here, so we give it the -f option to force it to call the reboot() syscall directly (see man 2 reboot). For some reason the Linux reboot() syscall exits the process instead of blocking, and if PID 1 exits the kernel panics, which aborts the reboot process, so we background the reboot request into a child process and sleep 5 to give the reboot time to finish.

Toybox also has a oneit command that can do all this, and has a -3 option which hands off daemon management to a child process by writing each exiting orphaned task's PID to the child's file descriptor 3 (the next available one after stdin, stdout, and stderr). It can also respawn its child (instead of halting or rebooting) when it exits, but you could add a loop to the shell script easily enough.
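A minimal sketch of what such a loop might look like follows. The respawn function is hypothetical (not something mkroot ships), and the count argument exists only so the example terminates; a real PID 1 script would presumably loop forever around the setsid line instead:

```shell
# Hypothetical respawn helper: rerun a command each time it exits.
# $1 is how many times to run it (only so this sketch stops), the rest
# of the arguments are the command line to run.
respawn() {
  count=$1
  shift
  while [ "$count" -gt 0 ]; do
    "$@"
    count=$((count-1))
  done
}

respawn 3 echo "child started"
```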