Goals and use cases

We have several potential use cases for a new set of command line utilities, and are using those to determine which commands to implement for Toybox's 1.0 release.

The most interesting standards are POSIX-2008 (also known as the Single Unix Specification version 4) and the Linux Standard Base (version 4.1). The main test harness is including toybox in Aboriginal Linux and, if that can build itself, using the result to build Linux From Scratch (version 6.8). We also aim to replace Android's Toolbox.

At a secondary level we'd like to meet other use cases. We've analyzed the commands provided by similar projects (klibc, sash, sbase, s6, embutils, nash, and beastiebox), along with various vendor configurations of busybox, and some end user requests.

Finally, we'd like to provide a good replacement for the Bash shell, which was the first program Linux ever ran and remains the standard shell of Linux no matter what Ubuntu says. This doesn't mean including the full set of Bash 4.x functionality, but does involve {various,features} beyond posix.

See the status page for the combined list and progress towards implementing it.


Use case: standards compliance.

POSIX-2008/SUSv4

The best standards are the kind that describe reality, rather than attempting to impose a new one. (I.E. a good standard should document, not legislate.)

The kind of standards which describe existing reality tend to be approved by more than one standards body, such as ANSI and ISO both approving C. That's why the IEEE POSIX committee's 2008 standard, the Single Unix Specification version 4, and the Open Group Base Specification edition 7 are all the same standard from three sources.

The "utilities" section of these standards is devoted to the unix command line, and is the best such standard for our purposes. (My earlier work on BusyBox was implemented with regard to SUSv3, an earlier version of this standard.)

Problems with the standard

Unfortunately, these standards describe a subset of reality, lacking any mention of commands such as init, login, or mount required to actually boot a system. They provide ipcrm and ipcs, but not ipcmk, so you can use System V IPC resources but not create them.

These standards also contain a large number of commands that are inappropriate for toybox to implement in its 1.0 release. (Perhaps some of these could be reintroduced in later releases, but not now.)

Starting with the full "utilities" list, we first remove generally obsolete commands (compress ed ex pr uncompress uucp uustat uux), commands for the pre-CVS "SCCS" source control system (admin delta get prs rmdel sact sccs unget val what), fortran support (asa fort77), and batch processing support (batch qalter qdel qhold qmove qmsg qrerun qrls qselect qsig qstat qsub).

Some commands are for a compiler toolchain (ar c99 cflow ctags cxref gencat iconv lex m4 make nm strings strip tsort yacc), which is outside of toybox's mandate and should be supplied externally. (Again, some of these may be revisited later, but not for toybox 1.0.)

Some commands are part of a command shell, and cannot be implemented as separate executables (alias bg cd command fc fg getopts hash jobs kill read type ulimit umask unalias wait). These may be revisited as part of a built-in toybox shell, but are not exported into $PATH via symlinks. (If you fork a child process and have it "cd" then exit, you've accomplished nothing.)
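
The point about "cd" is easy to demonstrate. The sketch below is illustrative only (the function name cd_in_child is made up for this example): it changes directory inside a subshell, which is a forked child process, and then shows that the parent's working directory has not moved.

```shell
#!/bin/sh
# Illustrative only: cd_in_child is a made-up name, not part of toybox.
# The parentheses run "cd" in a subshell (a forked child process), so the
# directory change dies with the child and the parent never sees it.
cd_in_child() {
  (cd / && pwd)   # the child reports its new directory
  pwd             # the parent is still wherever it started
}

cd_in_child
```

This is why such commands only make sense as builtins of the shell whose state they change.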

A few other commands are judgement calls, providing command-line internationalization support (iconv locale localedef), System V inter-process communication (ipcrm ipcs), and cross-tty communication from the minicomputer days (talk mesg write). The "pax" utility was supplanted by tar, "mailx" is a command line email client, and "lp" submits files for printing to... what exactly? (cups?) The standard defines crontab but not crond.

Removing all of that leaves the following commands, which toybox should implement:

at awk basename bc cal cat chgrp chmod chown cksum cmp comm cp csplit cut date dd df diff dirname du echo env expand expr false file find fold fuser getconf grep head id join kill link ln logger logname ls man mkdir mkfifo more mv newgrp nice nl nohup od paste patch pathchk printf ps pwd renice rm rmdir sed sh sleep sort split stty tabs tail tee test time touch tput tr true tty uname unexpand uniq unlink uudecode uuencode vi wc who xargs zcat

Linux Standard Base

One attempt to supplement POSIX towards an actual usable system was the Linux Standard Base. Unfortunately, the quality of this "standard" is fairly low.

POSIX allowed its standards process to be compromised by leaving things out, thus allowing IBM mainframes and Windows NT to drive a truck through the holes and declare themselves compliant. But it means what they DID standardize tends to be respected.

The Linux Standard Base's failure mode is different: they respond to pressure by including special-case crap, such as allowing Red Hat to shoehorn RPM into the standard even though all sorts of distros (Debian, Slackware, Arch, Gentoo) don't use it and probably never will. This means anything in the LSB is at best a suggestion: arbitrary portions of this standard are widely ignored.

The LSB does specify a list of command line utilities:

ar at awk batch bc chfn chsh col cpio crontab df dmesg du echo egrep fgrep file fuser gettext grep groupadd groupdel groupmod groups gunzip gzip hostname install install_initd ipcrm ipcs killall lpr ls lsb_release m4 md5sum mknod mktemp more mount msgfmt newgrp od passwd patch pidof remove_initd renice sed sendmail seq sh shutdown su sync tar umount useradd userdel usermod xargs zcat

Where posix specifies one of those commands, LSB's deltas tend to be accommodations for broken tool versions which aren't up to date with the standard yet. (See more and xargs for examples.)

Since we've already committed to using our own judgement to skip bits of POSIX, and LSB's "judgement" in this regard is purely bug workarounds to declare various legacy tool implementations "compliant", this means we're mostly interested in the set of tools that aren't specified in posix at all.

Of these, gettext and msgfmt are internationalization, install_initd and remove_initd aren't present on Ubuntu 10.04, lpr is out of scope, and lsb_release is a distro issue (it's a nice command, but the output of lsb_release -a is the name and version number of the Linux distro you're running, which toybox doesn't know).

This leaves:

chfn chsh dmesg egrep fgrep groupadd groupdel groupmod groups gunzip gzip hostname install killall md5sum mknod mktemp mount passwd pidof sendmail seq shutdown su sync tar umount useradd userdel usermod zcat

Use case: provide a self-hosting development environment

The following commands are enough to build the Aboriginal Linux development environment, boot it to a shell prompt, and build Linux From Scratch 6.8 under it. (Aboriginal Linux currently uses BusyBox for this, thus provides a drop-in test environment for toybox. We install both implementations side by side, redirecting the symlinks a command at a time until the older package is no longer used, and can be removed.)

This use case includes running init scripts and other shell scripts, running configure, make, and install in each package, and providing basic command line facilities such as a text editor. (It does not include a compiler toolchain or C library, those are outside the scope of this project.)

bzcat cat cp dirname echo env patch rmdir sha1sum sleep sort sync true uname wc which yes zcat awk basename bzip2 chmod chown cmp cut date dd diff egrep expr find grep gzip head hostname id install ln ls mkdir mktemp mv od readlink rm sed sh tail tar touch tr uniq wget whoami xargs chgrp comm gunzip less logname man split tee test time bunzip2 chgrp chroot comm cpio dmesg dnsdomainname ftpd ftpget ftpput gunzip ifconfig init less logname losetup man mdev mount mountpoint nc pgrep pkill pwd route split stat switch_root tac umount vi

Note: Aboriginal Linux installs bash 2.05b as #!/bin/sh and its scripts require bash extensions not present in shells such as busybox ash. This means that toysh needs to supply several bash extensions _and_ work when called under the name "bash".
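
The side-by-side install described above, with symlinks redirected one command at a time, can be sketched roughly as follows. The helper name switch_command and the scratch directory layout are assumptions made for illustration; this is not taken from the Aboriginal Linux build scripts.

```shell
#!/bin/sh
# Hypothetical sketch: repoint one command's symlink at a different
# multiplexer binary. The helper name and directory layout are assumptions.
switch_command() {   # usage: switch_command BINDIR COMMAND NEWTARGET
  ln -sf "$3" "$1/$2"
}

bindir=$(mktemp -d) || exit 1
ln -s busybox "$bindir/ls"          # start with everything on busybox
ln -s busybox "$bindir/cat"
switch_command "$bindir" ls toybox  # move just "ls" over
readlink "$bindir/ls"               # prints toybox
readlink "$bindir/cat"              # prints busybox
rm -rf "$bindir"
```

Once no symlink points at the older binary any more, that package can be dropped.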


Use case: Replacing Android Toolbox

Android has a policy against GPL in userspace, so even though BusyBox predates Android by many years, they couldn't use it. Instead they grabbed an old version of ash and implemented their own command line utility set called "toolbox".

Toolbox doesn't have its own repository; instead it's part of Android's system/core git repository (this analysis looked at commit 51ccef27cab58).

Toolbox commands:

According to core/toolbox/Android.mk the toolbox directory builds the following commands:

ls mount cat ps kill ln insmod rmmod lsmod ifconfig setconsole rm mkdir rmdir reboot getevent sendevent date wipe sync umount start stop notify cmp dmesg route hd dd df getprop setprop watchprops log sleep renice printenv smd chmod chown newfs_msdos netstat ioctl mv schedtop top iftop id uptime vmstat nandread ionice touch lsof md5 r cp du grep watchdogd

If selinux is enabled, you also get:

getenforce setenforce chcon restorecon runcon getsebool setsebool load_policy

The Android.mk file also refers to dynarray.c and toolbox.c as library code. This leaves the following apparently unused C files in toolbox/*.c, each of which has a command_main() function and seems to implement a standalone command:

alarm exists lsusb readtty rotatefb setkey syren

Command shell (ash)

The core/sh subdirectory contains a fork of ash 1.17, and sucks in liblinenoise to provide command line history/editing.

Other Android core commands

Other than the toolbox and sh directories, the currently interesting subdirectories in the core repository are fs_mgr, gpttool, init, logcat, logwrapper, mkbootimg, netcfg, run-as, and sdcard.

Almost all of these reinvent an existing wheel with less functionality and a different user interface. We may want to provide that interface, but implementing the full commands (mount, fdisk, init, ifconfig with dhcp, and sudo) comes first.

Although logcat/logwrapper also reinvent a wheel, Android did so in the kernel and these provide an interface to that.

Also, gpttool and mkbootimg are install tools, and sdcard looks like a testing tool. These aren't a priority if Android wants to use its own bespoke code to install itself.

Analysis

For reference, combining everything listed above, we get:

alarm ash cat chcon chmod chown cmp cp date dd df dmesg du exists fs_mgr getenforce getevent getprop getsebool gpttool grep hd id ifconfig iftop init insmod ioctl ionice kill ln load_policy log logcat logwrapper ls lsmod lsof lsusb md5 mkbootimg mkdir mount mv nandread netcfg netstat newfs_msdos notify printenv ps r readtty reboot renice restorecon rm rmdir rmmod rotatefb route run-as runcon schedtop sdcard sendevent setconsole setenforce setkey setprop setsebool sleep smd start stop sync syren top touch umount uptime vmstat watchdogd watchprops wipe
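
Combining several lists like this amounts to a sorted union with duplicates dropped. A minimal sketch, assuming space-separated lists as used throughout this document; the helper name list_union and the short sample lists are invented for illustration:

```shell
#!/bin/sh
# Hypothetical helper: merge any number of space-separated command lists
# into one sorted list with duplicates removed.
list_union() {
  printf '%s\n' $* | sort -u | xargs echo   # $* unquoted: split on spaces
}

# Sample data only, short excerpts standing in for the lists above.
list_union "ls mount cat" "getenforce chcon" "alarm cat"
# prints: alarm cat chcon getenforce ls mount
```

Because the arguments are deliberately left unquoted, a name containing shell glob characters (such as "[") would need extra care.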

We may eventually implement all of that, but for toybox 1.0 we need to focus a bit. For our first pass, let's ignore selinux, strip out the "unlisted" commands except lsusb, and grab just logcat and logwrapper from the "core" commands (since the rest have some full/standard version providing that functionality, which we can implement a shim interface for later).

This means toybox should implement:

cat chmod chown cmp cp date dd df dmesg du getevent getprop grep hd id ifconfig iftop insmod ioctl ionice kill ln log logcat logwrapper ls lsmod lsof lsusb md5 mkdir mount mv nandread netstat newfs_msdos notify printenv ps r reboot renice rm rmdir rmmod route schedtop sendevent setconsole setprop sleep smd start stop sync top touch umount uptime vmstat watchprops watchdogd wipe

The following Toolbox commands are already covered in previous sections of this analysis:

cat chmod chown cmp cp date dd df dmesg du grep id ifconfig insmod kill ln ls lsmod mkdir mount mv ps renice rm rmdir rmmod route sleep sync top touch umount

Which leaves the following commands as new from Toolbox:

getevent getprop hd iftop ioctl ionice log lsof nandread netstat newfs_msdos notify printenv r reboot schedtop sendevent setconsole setprop smd start stop top uptime vmstat watchprops watchdogd wipe
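
The subtraction used throughout this analysis (take a list, drop everything already accounted for) is a set difference, which can be sketched with sort and comm. The helper name list_minus and the short sample lists are invented for illustration; the real lists are the ones above.

```shell
#!/bin/sh
# Hypothetical helper: print the words in list $1 that are not in list $2.
# Both arguments are space-separated command names, as in this document.
list_minus() {
  a=$(mktemp) && b=$(mktemp) || return 1
  printf '%s\n' $1 | sort -u > "$a"   # unquoted on purpose: split on spaces
  printf '%s\n' $2 | sort -u > "$b"
  comm -23 "$a" "$b" | xargs echo     # keep lines unique to the first list
  rm -f "$a" "$b"
}

# Sample data only, a small excerpt rather than the full Toolbox list.
list_minus "cat getprop hd ls mount uptime" "cat ls mount"
# prints: getprop hd uptime
```

comm needs both inputs sorted the same way, which is why each list goes through sort first.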

klibc:

Long ago some kernel developers came up with a project called klibc. After a decade of development it still has no web page or HOWTO, and nobody's quite sure if the license is BSD or GPL. It inexplicably requires perl to build, and seems like an ideal candidate for replacement.

In addition to a C library even less capable than bionic (obsoleted by musl), klibc builds a random assortment of executables to run init scripts with. There's no multiplexer command, these are individual executables:

cat chroot cpio dd dmesg false fixdep fstype gunzip gzip halt ipconfig kill kinit ln losetup ls minips mkdir mkfifo mknodes mksyntax mount mv nfsmount nuke pivot_root poweroff readlink reboot resume run-init sh sha1hash sleep sync true umount uname zcat

To get that list, build klibc according to the instructions (I looked at version 2.0.2 and did cd klibc-*; ln -s /output/of/kernel/make/headers_install linux; make), then run echo $(for i in $(find . -type f); do file $i | grep -q executable && basename $i; done | grep -v '[.]g$' | sort -u) to find executables, and eliminate the *.so files and *.shared duplicates.

Some of those binaries are build-time tools that don't get installed, which removes mknodes, mksyntax, sha1hash, and fixdep from the list. (And sha1hash is just an unpolished sha1sum anyway.)

The run-init command is more commonly called switch_root, nuke is just "rm -rf -- $@", and minips is more commonly called "ps". I'm not doing aliases for the oddball names.
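
For reference, the "nuke" behavior described above fits in a one-line shell function. This is a sketch of the described behavior, not klibc's actual code, and it quotes "$@" so that names containing spaces survive intact.

```shell
#!/bin/sh
# Sketch of the klibc "nuke" behavior as described above; not klibc's code.
nuke() {
  rm -rf -- "$@"   # "--" ends option parsing, so names starting with "-" are safe
}

# Demonstration on a scratch directory created just for this example.
scratch=$(mktemp -d) || exit 1
mkdir -p "$scratch/a/b" && touch "$scratch/a/b/file"
nuke "$scratch"
[ -e "$scratch" ] || echo "gone"
```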

Yet more stale forks of dash and gzip are sucked in here (see "dubious license terms" above), adding nothing to the other projects we've looked at. But we still need sh, gunzip, gzip, and zcat to replace this package.

By the time I did the analysis toybox already had cat, chroot, dmesg, false, kill, ln, losetup, ls, mkdir, mkfifo, readlink, rm, switch_root, sleep, sync, true, and uname.

The low hanging fruit is cpio, dd, ps, mv, and pivot_root.

The "kinit" command is another gratuitous rename, it's init running as PID 1. The halt, poweroff, and reboot commands work with it.

I've got mount and umount queued up already, fstype and nfsmount go with those. (And probably smbmount and p9mount, but this hasn't got one. Those are all about querying for login credentials, probably workable into the base mount command.)

The ipconfig command here has a built in dhcp client, so it's ifconfig and dhcpcd and maybe some other stuff.

The resume command is... weird. It finds a swap partition and reads data from it into a /proc file, something the kernel is capable of doing itself. (Even though the klibc author attempted to remove that capability from the kernel, current kernel/power/hibernate.c still parses "resume=" on the command line.) And yet various distros seem to make use of klibc for this. Given the history of swsusp/hibernate (and TuxOnIce and kexec jump) I've lost track of the current state of the art here. Ah, Documentation/power/userland-swsusp.txt has the API docs, and here's a better tool...

So the list of things actually in klibc is:

cat chroot dmesg false kill ln losetup ls mkdir mkfifo readlink rm switch_root sleep sync true uname cpio dd ps mv pivot_root mount nfsmount fstype umount sh gunzip gzip zcat kinit halt poweroff reboot ipconfig resume

glibc

Rather a lot of command line utilities come bundled with glibc:

catchsegv getconf getent iconv iconvconfig ldconfig ldd locale localedef mtrace nscd rpcent rpcinfo tzselect zdump zic

Of those, musl libc only implements ldd.

catchsegv is a rudimentary debugger, probably out of scope for toybox.

iconv has been previously discussed.

iconvconfig is only relevant if iconv is user-configurable; musl uses a non-configurable iconv.

getconf is a posix utility which displays several variables from unistd.h; it probably belongs in the development toolchain.

getent handles retrieving entries from passwd-style databases (in a rather lame way) and is trivially replaceable by grep.

locale was discussed under posix. localedef compiles locale definitions, which musl currently does not use.

mtrace is a perl script to use the malloc debugging that glibc has built-in; this is not relevant for musl, and would necessarily vary with libc.

nscd is a name service caching daemon, which is not yet relevant for musl. rpcinfo and rpcent are related to rpc, which musl does not include.

The remaining commands involve glibc's bundled timezone database, which seems to be derived from the IANA timezone database. Unless we want to maintain our own fork of the standards body's database like glibc does, these are of no interest, but for completeness:

tzselect outputs a TZ variable corresponding to user input. The documentation does not indicate how to use it in a script, but it seems that Debian may have done so. zdump prints current time in each of several timezones, optionally outputting a great deal of extra information about each timezone. zic converts a description of a timezone to a file in tz format.

None of glibc's bundled commands are currently of interest to toybox.


Stand-Alone Shell

Wikipedia has a good summary of sash, with links. The original Stand-Alone Shell project reached a stopping point, and then "sash plus patches" extended it a bit further. The result is a megabyte executable that provides 40 commands.

Sash is a shell with built-in commands. It doesn't have a multiplexer command, meaning "sash ls -l" doesn't work (you have to go "sash -c 'ls -l'").
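
For contrast, here is roughly what a multiplexer command does: it picks the command to run from the name it was invoked under, or from its first argument when called under its own name. This is a hypothetical shell sketch; "multi" and its two applets are invented for illustration and are not toybox, BusyBox, or sash code.

```shell
#!/bin/sh
# Hypothetical sketch of multiplexer dispatch; "multi" and its two applets
# are invented for illustration.
applet_hello() { echo "hello $*"; }
applet_count() { echo "$#"; }

multi() {                 # usage: multi INVOKED_NAME [ARGS...]
  name=${1##*/}           # strip any leading path, like basename
  shift
  if [ "$name" = multi ] && [ $# -gt 0 ]; then
    name=$1               # called as "multi hello world":
    shift                 # the first argument names the applet
  fi
  case $name in
    hello) applet_hello "$@" ;;
    count) applet_count "$@" ;;
    *) echo "multi: unknown command $name" >&2; return 127 ;;
  esac
}

multi /usr/bin/multi hello world   # invoked under its own name
multi /usr/bin/hello world         # invoked through a symlink named "hello"
```

Both calls end up running the same applet, which is the behavior "sash ls -l" lacks.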

The list of commands can be obtained via building it and doing "echo help | ./sash | awk '{print $1}' | sed 's/^-//' | xargs echo", which gives us:

alias aliasall ar cd chattr chgrp chmod chown cmp cp chroot dd echo ed exec exit file find grep gunzip gzip help kill losetup ln ls lsattr mkdir mknod more mount mv pivot_root printenv prompt pwd quit rm rmdir setenv source sum sync tar touch umask umount unalias where

Plus sh because it's a shell. A dozen or so commands can only sanely be implemented as shell builtins (alias aliasall cd exec exit prompt quit setenv source umask unalias), "where" is an alias for "which", and at triage time toybox already has chgrp, chmod, chown, cmp, cp, chroot, echo, help, kill, losetup, ln, ls, mkdir, mknod, printenv, pwd, rm, rmdir, sync, and touch.

This leaves:

ar chattr dd ed file find grep gunzip gzip lsattr more mount mv pivot_root sh sum tar umount

(For once, this project doesn't include a fork of gzip, instead it sucks in -lz from the host.)


sbase:

It's on suckless. So far it's implemented:

basename cat chmod chown cksum cmp cp date dirname echo false fold grep head kill ln ls mc mkdir mkfifo mv nl nohup pwd rm seq sleep sort tail tee test touch true tty uname uniq wc yes

And has a TODO list:

cal chgrp chvt comm cut df diff du env expand expr id md5sum nice paste printenv printf readlink rmdir seq sha1sum split sync test tr unexpand unlink who

At triage time, of the first list I still need to do: fold grep mc mv nl. Of the second list: diff expr paste printf split test tr unexpand who.


s6

The website skarnet has a bunch of small utilities as part of something called "s6". This includes the s6-portable-utils and the s6-linux-utils.

Both packages rely on multiple bespoke external libraries without which they can't compile. The source is completely uncommented and doesn't wrap at 80 characters. Doing a find for *.c files brings up the following commands:

basename cat chmod chown chroot clock cut devd dirname echo env expr false format-filter freeramdisk grep halt head hiercopy hostname linkname ln logwatch ls maximumtime memoryhog mkdir mkfifo mount nice nuke pause pivotchroot poweroff printenv quote quote-filter reboot rename rmrf sleep sort swapoff swapon sync tail test touch true umount uniquename unquote unquote-filter update-symlinks

Triage: memoryhog isn't even listed on the website nor does it have a documentation file, clock seems like a subset of date, devd is some sort of netlink wrapper that spawns its command line every time it gets a message (maybe this is meant to implement part of udev/mdev?), format-filter is sort of awk's '{print $2}' function split out into its own command, hiercopy is a subset of "cp -r", maximumtime is something I implemented as a shell script (more/timeout.sh in Aboriginal Linux), nuke isn't the same as klibc's (this one's "kill SIG -1" only with hardwired SIG options), pause is a program that literally waits to be killed (I generally sleep 999999999 which is a little over 30 years), pivotchroot is a subset of switch_root, rmrf is rm -rf...

I see "nuke" resurface, and if "rmrf" wasn't also here I might think klibc had a point.

basename cat chmod chown chroot cut dirname echo env expr false freeramdisk grep halt head hostname linkname ln logwatch ls mkdir mkfifo mount nice pivotchroot poweroff printenv quote quote-filter reboot rename sleep sort swapoff swapon sync tail test touch true umount uniquename unquote unquote-filter update-symlinks

nash:

Red Hat's nash was part of its "mkinitrd" package, a replacement for a shell and utilities on the boot floppy back in the 1990's (the same general idea as BusyBox, developed independently). Red Hat discontinued nash development in 2010, replacing it with dracut (which collects together existing packages, including busybox).

I couldn't figure out how to beat source code out of Fedora's current git repository. The last release version that used it was Fedora Core 12 which has a source rpm that can be unwound with "rpm2cpio mkinitrd.src.rpm | cpio -i -d -H newc --no-absolute-filenames" and in there is a mkinitrd-6.0.93.tar.bz2 which has the source.

In addition to being a bit like a command shell, the nash man page lists the following commands:

access echo find losetup mkdevices mkdir mknod mkdmnod mkrootdev mount pivot_root readlink raidautorun setquiet showlabels sleep switchroot umount

Oddly, the only occurrence of the string pivot_root in the nash source code is in the man page, the command isn't there. (It seems to have been removed when the underscoreless switchroot went in.)

A more complete list seems to be the handlers[] array in nash.c:

access buildEnv cat cond cp daemonize dm echo exec exit find kernelopt loadDrivers loadpolicy mkchardevs mkblktab mkblkdevs mkdir mkdmnod mknod mkrootdev mount netname network null plymouth hotplug killplug losetup ln ls raidautorun readlink resume resolveDevice rmparts setDeviceEnv setquiet setuproot showelfinterp showlabels sleep stabilized status switchroot umount waitdev

This list is nuts: "plymouth" is an alias for "null" which is basically "true" (which the above list doesn't have). Things like buildEnv and loadDrivers are bespoke Red Hat behavior that might as well be hardwired in to nash's main() without being called.

Instead of eliminating items from the list with an explanation for each, I'm just going to cherry pick a few: the device mapper (dm, raidautorun) is probably interesting, hotplug may be obsolete due to kernel changes that now load firmware directly, and there's another "resume" ala klibc.

But mostly: I don't care about this one. And neither does Red Hat anymore.

Verdict: ignore


Beastiebox

Back in 2008, the BSD guys vented some busybox-envy on sourceforge. Then stopped. Their repository is still in CVS, hasn't been touched in years, and is a giant hairball of existing code sucked together. (The web page says the author is aware of crunchgen, but decided to do this by hand anyway. This is not a collection of new code, it's a katamari of existing code rolled up in a ball.)

Combining the set of commands listed on the web page with the set of man pages in the source gives us:

[ cat chmod cp csh date df disklabel dmesg echo ex fdisk fsck fsck_ffs getty halt hostname ifconfig init kill less lesskey ln login ls lv mksh more mount mount_ffs mv pfctl ping poweroff ps reboot rm route sed sh stty sysctl tar test traceroute umount vi wiconfig

Apparently lv is the missing link between ed and vi, copyright 1982-1997 (do not want), ex is another obsolete vi mode, lesskey is "used to specify a set of key bindings to be used with less", csh is a shell they sucked in, and [ is an alias for test. Several more bsd-isms that don't have Linux equivalents (even in the Ubuntu "install this package" search) are disklabel, fsck_ffs, mount_ffs, and pfctl. And wiconfig is a wavelan interface network card driver utility. Subtracting all that and the commands toybox already implements at triage time, we get:

fdisk fsck getty halt ifconfig init kill less mksh more mount mv ping poweroff ps reboot route sed sh stty sysctl tar test traceroute umount vi

Not a hugely interesting list, but eh.

Verdict: ignore


BsdBox

Somebody decided to do a multicall binary for freebsd.

They based it on crunchgen, a tool that glues existing programs together into an archive and uses the name to execute the right one. It has no simplification or code sharing benefits whatsoever, it's basically an archiver that produces executables.

That's about where I stopped reading.

Verdict: ignore.


OpenSolaris Busybox

Somebody wrote a wiki page saying that Busybox for OpenSolaris would be a good idea.

The corresponding "files" tab is an auto-generated stub. The project never even got as far as suggesting commands to include before Oracle discontinued OpenSolaris.

Verdict: ignore.


Requests:

The following additional commands have been requested by various users:

dig freeramdisk getty halt hexdump hwclock klogd modprobe ping ping6 pivot_root poweroff readahead rev sfdisk sudo syslogd taskset telnet telnetd tracepath traceroute unzip usleep vconfig zip free login modinfo unshare netcat help w ntpd iwconfig iwlist rdate dos2unix unix2dos catv clear pmap realpath setsid timeout truncate mkswap swapon swapoff count oneit fstype acpi blkid eject pwdx