# HG changeset patch
# User Rob Landley
# Date 1326699857 21600
# Node ID f6ffc6685a9e0c2fe43a9c3ae607048e63459f7a
# Parent 2551e517b80036c3fb42b98eddec7a147f6e3b75
Fluff out documentation and skeleton code.

diff -r 2551e517b800 -r f6ffc6685a9e toys/hello.c
--- a/toys/hello.c	Sat Jan 14 23:28:15 2012 -0600
+++ b/toys/hello.c	Mon Jan 16 01:44:17 2012 -0600
@@ -2,9 +2,9 @@
  *
  * hello.c - A hello world program.
  *
- * Copyright 2006 Rob Landley
+ * Copyright 2012 Rob Landley
  *
- * Not in SUSv3.
+ * Not in SUSv4.
  * See http://opengroup.org/onlinepubs/9699919799/utilities/
 
 USE_HELLO(NEWTOY(hello, "e@d*c#b:a", TOYFLAG_USR|TOYFLAG_BIN))
@@ -13,6 +13,8 @@
 	bool "hello"
 	default n
 	help
+	  usage: hello [-a] [-b string] [-c number] [-d list] [-e count] [...]
+
 	  A hello world program. You don't need this.
 
 	  Mostly used as an example/skeleton file for adding new commands,
@@ -34,7 +36,24 @@
 
 #define TT this.hello
 
+#define FLAG_a	1
+#define FLAG_b	2
+#define FLAG_c	4
+#define FLAG_d	8
+#define FLAG_e	16
+
 void hello_main(void)
 {
 	printf("Hello world\n");
+
+	if (toys.optflags & FLAG_a) printf("Saw a\n");
+	if (toys.optflags & FLAG_b) printf("b=%s\n", TT.b_string);
+	if (toys.optflags & FLAG_c) printf("c=%ld\n", TT.c_number);
+	while (TT.d_list) {
+		printf("d=%s\n", TT.d_list->arg);
+		TT.d_list = TT.d_list->next;
+	}
+	if (TT.e_count) printf("e was seen %ld times", TT.e_count);
+
+	while (*toys.optargs) printf("optarg=%s\n", *(toys.optargs++));
 }
diff -r 2551e517b800 -r f6ffc6685a9e www/code.html
--- a/www/code.html	Sat Jan 14 23:28:15 2012 -0600
+++ b/www/code.html	Mon Jan 16 01:44:17 2012 -0600
@@ -18,7 +18,7 @@

The primary goal of toybox is _simple_ code. Small is second, speed and lots of features come in somewhere after that. Note that environmental dependencies are a type of complexity, so needing other packages -to build or run is a downside. For example, don't use curses when you can +to build or run is a big downside. For example, don't use curses when you can output ansi escape sequences instead.

Infrastructure:

@@ -70,18 +70,27 @@
  • Change the copyright notice to your name, email, and the current year.

  • -
  • Give a URL to the relevant standards document, or say "Not in SUSv3" if +

  • Give a URL to the relevant standards document, or say "Not in SUSv4" if there is no relevant standard. (Currently both lines are there, delete -whichever is appropriate.) The existing link goes to the directory of SUSv3 +whichever is inappropriate.) The existing link goes to the directory of SUSv4 command line utility standards on the Open Group's website, where there's often a relevant commandname.html file. Feel free to link to other documentation or standards as appropriate.

  • -
  • Update the USE_YOURCOMMAND(NEWTOY(yourcommand,"blah",0)) line. The -arguments to newtoy are: 1) the name used to run your command, 2) -the command line arguments (NULL if none), and additional information such -as where your command should be installed on a running system. See [TODO] for -details.

  • +
  • Update the USE_YOURCOMMAND(NEWTOY(yourcommand,"blah",0)) line. +The NEWTOY macro fills out this command's toy_list +structure. The arguments to the NEWTOY macro are:


    1. the name used to run your command

    2. the command line argument option parsing string (NULL if none)

    3. a bitfield of TOYFLAG values +(defined in toys.h) providing additional information such as where your +command should be installed on a running system, whether to blank umask +before running, whether or not the command must run as root (and thus should +retain root access if installed SUID), and so on.

  • Change the kconfig data (from "config YOURCOMMAND" to the end of the comment block) to supply your command's configuration and help @@ -89,7 +98,16 @@ also what the CFG_ and USE_() macros are generated from (see [TODO]). The help information here is used by menuconfig, and also by the "help" command to describe your new command. (See [TODO] for details.) By convention, -unfinished commands default to "n" and finished commands default to "y".

  • +unfinished commands default to "n" and finished commands default to "y", +so "make defconfig" selects all finished commands. (Note, "finished" means +"ready to be used", not that it'll never change again.)

    + +

    Each help block should start with a "usage: yourcommand" line explaining +any command line arguments added by this config option. The "help" command +outputs this text, and scripts/config2help.c in the build infrastructure +collates these usage lines for commands with multiple configuration +options when producing generated/help.h.

    +
  • Update the DEFINE_GLOBALS() macro to contain your command's global variables, and also change the name "hello" in the #define TT line afterwards @@ -113,7 +131,28 @@

    Top level directory.

    -

    This directory contains global infrastructure. +

    This directory contains global infrastructure.

    + +

    toys.h

    +

    Each command #includes "toys.h" as part of its standard prolog.

    + +

    This file sucks in most of the commonly used standard #includes, so +individual files can just #include "toys.h" and not have to worry about +stdargs.h and so on. Individual commands still need to #include +special-purpose headers that may not be present on all systems (and thus would +prevent toybox from building that command on such a system with that command +enabled). Examples include regex support, any "linux/" or "asm/" headers, mtab +support (mntent.h and sys/mount.h), and so on.

    + +

    The toys.h header also defines structures for most of the global variables +provided to each command by toybox_main(). These are described in +detail in the description for main.c, where they are initialized.

    + +

    The global variables are grouped into structures (and a union) for space +savings, to more easily track the amount of memory consumed by them, +so that they may be automatically cleared/initialized as needed, and so +that access to global variables is more easily distinguished from access to +local variables.

    main.c

    Contains the main() function where execution starts, plus @@ -123,14 +162,16 @@

    Execution starts in main() which trims any path off of the first command name and calls toybox_main(), which calls toy_exec(), which calls toy_find() -and toy_init() before calling the appropriate command's function from toy_list. +and toy_init() before calling the appropriate command's function from +toy_list[] (via toys.which->toy_main()). If the command is "toybox", execution recurses into toybox_main(), otherwise the call goes to the appropriate commandname_main() from a C file in the toys directory.
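    The lookup-and-dispatch flow described above can be sketched roughly as follows. This is a simplified illustration, not the code in main.c: struct toy_entry, toy_table[], trim_path() and find_toy() are hypothetical names, and the command functions return int here (the real commandname_main() functions return void) only so the demo can check which one ran.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one toy_list[] entry. */
struct toy_entry {
  const char *name;
  int (*toy_main)(void);
};

/* Stand-in commands; the return values exist only for the demo below. */
static int hello_main(void) { return 1; }
static int true_main(void) { return 0; }

/* Illustrative table; assumed here, not the table the build generates. */
static struct toy_entry toy_table[] = {
  {"hello", hello_main},
  {"true", true_main},
};

/* Trim any leading path off the command name. */
const char *trim_path(const char *argv0)
{
  const char *slash = strrchr(argv0, '/');

  return slash ? slash + 1 : argv0;
}

/* Linear search by name; returns NULL when no command matches. */
struct toy_entry *find_toy(const char *argv0)
{
  const char *name = trim_path(argv0);
  size_t i;

  for (i = 0; i < sizeof(toy_table) / sizeof(*toy_table); i++)
    if (!strcmp(toy_table[i].name, name)) return &toy_table[i];

  return NULL;
}

/* Usage: look the command up, then call through the function pointer. */
void dispatch_demo(void)
{
  struct toy_entry *which = find_toy("/usr/bin/hello");

  assert(which && which->toy_main() == 1);
  assert(!find_toy("nosuchcommand"));
}
```

    Calling through a pointer stored in the table entry mirrors the toys.which->toy_main() call mentioned above.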

    The following global variables are defined in main.c:

    -
  • union toy_union this - Union of structures containing each +

  • union toy_union this - Union of structures containing each command's global variables.

    Global variables are useful: they reduce the overhead of passing extra @@ -224,19 +268,20 @@ running would be wasteful.

    Toybox handles this by encapsulating each command's global variables in -a structure, and declaring a union of those structures. The DECLARE_GLOBALS() -macro contains the global variables that should go in a command's global -structure. Each variable can then be accessed as "this.commandname.varname". +a structure, and declaring a union of those structures with a single global +instance (called "this"). The DEFINE_GLOBALS() macro contains the global +variables that should go in the current command's global structure. Each +variable can then be accessed as "this.commandname.varname". Generally, the macro TT is #defined to this.commandname so the variable -can then be accessed as "TT.variable".

    +can then be accessed as "TT.variable". See toys/hello.c for an example.

    -A command that needs global variables should declare a structure to +

    A command that needs global variables should declare a structure to contain them all, and add that structure to this union. A command should never declare global variables outside of this, because such global variables would allocate memory when running other commands that don't use those global variables.

    -

    The first few fields of this structure can be intialized by get_optargs(), +

    The first few fields of this structure can be initialized by get_optflags(), as specified by the options field of this command's toy_list entry. See the get_optflags() description in lib/args.c for details.

  • @@ -290,7 +335,7 @@ to make generated/config.h and determine which toys/*.c files to build.

    You can create a human readable "miniconfig" version of this file using -these +these instructions.

    @@ -333,12 +378,12 @@

    Each command has a configuration entry matching the command name (although configuration symbols are uppercase and command names are lower case). Options to commands start with the command name followed by an underscore and -the option name. Global options are attachd to the "toybox" command, +the option name. Global options are attached to the "toybox" command, and thus use the prefix "TOYBOX_". This organization is used by scripts/cfg2files to select which toys/*.c files to compile for a given .config.

    -

    A commands with multiple names (or multiple similar commands implemented in +

    A command with multiple names (or multiple similar commands implemented in the same .c file) should have config symbols prefixed with the name of their C file. I.E. config symbol prefixes are NEWTOY() names. If OLDTOY() names have config symbols they're options (symbols with an underscore and suffix) @@ -388,7 +433,203 @@ strlcpy(), xexec(), xopen()/xread(), xgetcwd(), xabspath(), find_in_path(), itoa().

    +

    lib/args.c

    +

    Toybox's main.c automatically parses command line options before calling the +command's main function. Option parsing starts in get_optflags(), which stores +results in the global structures "toys" (optflags and optargs) and "this".

    + +

    The option parsing infrastructure stores a bitfield in toys.optflags to +indicate which options the current command line contained. Arguments +attached to those options are saved into the command's global structure +("this"). Any remaining command line arguments are collected together into +the null-terminated array toys.optargs, with the length in toys.optc. (Note +that toys.optargs does not contain the current command name at position zero, +use "toys.which->name" for that.) The raw command line arguments get_optflags() +parsed are retained unmodified in toys.argv[].

    + +

    Toybox's option parsing logic is controlled by an "optflags" string, using +a format reminiscent of getopt's optstring but with several important differences. +Toybox does not use the getopt() +function out of the C library: get_optflags() is an independent implementation +which doesn't permute the original arguments (and thus doesn't change how the +command is displayed in ps and top), and has many features not present in +libc getopt() (such as the ability to describe long options in the same string +as normal options).

    + +

    Each command's NEWTOY() macro has an optflags string as its middle argument, +which sets toy_list.options for that command to tell get_optflags() what +command line arguments to look for, and what to do with them. +If a command has no option +definition string (I.E. the argument is NULL), option parsing is skipped +for that command, which must look at the raw data in toys.argv to parse its +own arguments. (If no currently enabled command uses option parsing, +get_optflags() is optimized out of the resulting binary by the compiler's +--gc-sections option.)

    + +

    You don't have to free the option strings, which point into the environment +space (I.E. the string data is not copied). A TOYFLAG_NOFORK command +that uses the linked list type "*" should free the list objects but not +the data they point to, via "llist_free(TT.mylist, NULL);". (If it's not +NOFORK, exit() will free all the malloced data anyway unless you want +to implement a CONFIG_TOYBOX_FREE cleanup for it.)
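    As a rough sketch of "free the list objects but not the data they point to": the node type below is hypothetical, modeled only on the ->arg and ->next fields that toys/hello.c uses, and push_arg() and free_nodes() are illustrative helpers rather than toybox library functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node; only the arg and next fields come from hello.c. */
struct arg_node {
  struct arg_node *next;
  char *arg;
};

/* Prepend a node that borrows the string (no copy is made).
 * Returns the new head, or the old head if allocation fails. */
struct arg_node *push_arg(struct arg_node *head, char *arg)
{
  struct arg_node *node = malloc(sizeof(*node));

  if (!node) return head;
  node->arg = arg;
  node->next = head;

  return node;
}

/* Free every node but leave the strings alone, since the list does not
 * own them. Returns the number of nodes freed. */
int free_nodes(struct arg_node *list)
{
  int count = 0;

  while (list) {
    struct arg_node *next = list->next;

    free(list);
    list = next;
    count++;
  }

  return count;
}

/* Usage: the strings stay valid after the list itself is gone. */
void free_nodes_demo(void)
{
  char one[] = "one", two[] = "two";
  struct arg_node *list = push_arg(push_arg(NULL, one), two);

  assert(list && free_nodes(list) == 2);
  assert(one[0] == 'o' && two[0] == 't');
}
```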

    + +

    Optflags format string

    + +

    Note: the optflags option description string format is much more +concisely described by a large comment at the top of lib/args.c.

    + +

    The general theory is that letters set optflags, and punctuation describes +other actions the option parsing logic should take.

    + +

    For example, suppose the command line command -b fruit -d walrus -a 42 +is parsed using the optflags string "a#b:c:d". (I.E. +toys.which->options="a#b:c:d" and argv = ["command", "-b", "fruit", "-d", +"walrus", "-a", "42"]). When get_optflags() returns, the following data is +available to command_main(): + +

    • In struct toys:

      • toys.optflags = 13; // -a = 8 | -b = 4 | -d = 1
      • toys.optargs[0] = "walrus"; // leftover argument
      • toys.optargs[1] = NULL; // end of list
      • toys.optc = 1; // there was 1 leftover argument
      • toys.argv[] = {"command", "-b", "fruit", "-d", "walrus", "-a", "42"}; // The original command line arguments

    • In union this (treated as long this[]):

      • this[0] = NULL; // -c didn't get an argument this time, so get_optflags() didn't change it (toy_init() zeroed "this" during setup)
      • this[1] = (long)"fruit"; // argument to -b
      • this[2] = 42; // argument to -a

    + +

    If the command's globals are:

    + +
    +DEFINE_GLOBALS(
    +	char *c;
    +	char *b;
    +	long a;
    +)
    +#define TT this.command
    +
    +

    That would mean TT.c == NULL, TT.b == "fruit", and TT.a == 42. (Remember, +each entry that receives an argument must be a long or pointer, to line up +with the array position. Right to left in the optflags string corresponds to +top to bottom in DEFINE_GLOBALS().)

    + +

    long toys.optflags

    + +

    Each option in the optflags string corresponds to a bit position in +toys.optflags, with the same value as a corresponding binary digit. The +rightmost argument is (1<<0), the next to last is (1<<1) and so on. If +the option isn't encountered while parsing argv[], its bit remains 0. +(Since toys.optflags is a long, it's only guaranteed to store 32 bits.) +For example, +the optflags string "abcd" would parse the command line argument "-c" to set +optflags to 2, "-a" would set optflags to 8, "-bd" would set optflags to +5 (I.E. 4|1), and "-a -c" would set optflags to 10 (2|8).

    + +

    Only letters are relevant to optflags, punctuation is skipped: in the +string "a*b:c#d", d=1, c=2, b=4, a=8. The punctuation after a letter +usually indicates that the option takes an argument.
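    The bit numbering rule can be sketched as a small helper. flag_bit() is a hypothetical function written for this illustration, not part of lib/args.c, and it only handles simple strings of option letters with trailing punctuation (no longopts and no letters used as operands of other control characters).

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: return the toys.optflags bit that option letter
 * "opt" would set for a simple optflags string, or 0 if it isn't there.
 * Letters are counted left to right and punctuation is skipped, so the
 * rightmost letter ends up with bit (1<<0). */
long flag_bit(const char *optflags, char opt)
{
  int letters = 0, position = -1;
  const char *s;

  for (s = optflags; *s; s++) {
    if (!isalpha((unsigned char)*s)) continue;
    if (*s == opt) position = letters;
    letters++;
  }
  if (position < 0) return 0;

  return 1L << (letters - 1 - position);
}

/* Usage: the values from the examples in the text. */
void flag_bit_demo(void)
{
  assert(flag_bit("abcd", 'c') == 2);
  assert((flag_bit("abcd", 'a') | flag_bit("abcd", 'c')) == 10);
  assert(flag_bit("a*b:c#d", 'b') == 4);
}
```

    A command would normally not compute these at run time; toys/hello.c simply #defines FLAG_a through FLAG_e with the matching constants.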

    + +

    Automatically setting global variables from arguments (union this)

    + +

    The following punctuation characters may be appended to an optflags +argument letter, indicating the option takes an additional argument:

    + +
    • : - plus a string argument, keep most recent if more than one.
    • * - plus a string argument, appended to a linked list.
    • # - plus a signed long argument. A {LOW,HIGH} range can also be appended to restrict allowed values of argument.
    • @ - plus an occurrence counter (stored in a long)
    + +

    Arguments may occur with or without a space (I.E. "-a 42" or "-a42"). +The command line argument "-abc" may be interpreted many different ways: +the optflags string "cba" sets toys.optflags = 7, "c:ba" also sets toys.optflags = 7 +and takes the argument to -c from the next command line argument, and "cb:a" sets optflags to 3 and saves +"c" as the argument to -b.

    + +

    Options which have an argument fill in the corresponding slot in the global +union "this" (see generated/globals.h), treating it as an array of longs +with the rightmost saved in this[0]. Again using "a*b:c#d", "-c 42" would set +this[0]=42; and "-b 42" would set this[1]="42"; each slot is left NULL if +the corresponding argument is not encountered.

    + +

    This behavior is useful because the LP64 standard ensures long and pointer +are the same size, and C99 guarantees structure members will occur in memory +in the +same order they're declared, and that padding won't be inserted between +consecutive variables of register size. Thus the first few entries can +be longs or pointers corresponding to the saved arguments.
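    A minimal sketch of that layout trick, assuming an LP64 platform and the optflags string "a#b:c:d" from the earlier example. The names command_globals, this_sketch and fill_slots() are hypothetical, and the union here has only one command in it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical globals for a command using the optflags string "a#b:c:d".
 * Top to bottom here is right to left in that string. */
struct command_globals {
  char *c; /* slot 0: argument to -c */
  char *b; /* slot 1: argument to -b */
  long a;  /* slot 2: argument to -a */
};

/* The same memory viewed either as named fields or as an array of longs. */
union this_sketch {
  struct command_globals command;
  long slot[3];
};

/* Fill slots the way the text describes for "-b fruit -a 42": zero
 * everything first, then store each argument by slot number. */
void fill_slots(union this_sketch *t)
{
  /* Assumption: long and pointer are the same size (true on LP64). */
  assert(sizeof(long) == sizeof(char *));

  memset(t, 0, sizeof(*t));
  t->slot[1] = (long)"fruit";
  t->slot[2] = 42;
}

/* Usage: the command then reads the values through the named fields. */
void fill_slots_demo(void)
{
  union this_sketch t;

  fill_slots(&t);
  assert(!t.command.c && !strcmp(t.command.b, "fruit") && t.command.a == 42);
}
```

    The assert on sizeof makes the size assumption explicit; on a platform where long is narrower than a pointer this sketch would not be valid.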

    + +

    char *toys.optargs[]

    + +

    Command line arguments in argv[] which are not consumed by option parsing +(I.E. not recognized either as -flags or arguments to -flags) will be copied +to toys.optargs[], with the length of that array in toys.optc. +(When toys.optc is 0, no unrecognized command line arguments remain.) +The order of entries is preserved, and as with argv[] this new array is also +terminated by a NULL entry.

    + +

    Option parsing can require a minimum or maximum number of optargs left +over, by adding "<1" (read "at least one") or ">9" ("at most nine") to the +start of the optflags string.

    + +

    The special argument "--" terminates option parsing, storing all remaining +arguments in optargs. The "--" itself is consumed.
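    The leftover argument handling can be sketched like this. collect_optargs() is a hypothetical, heavily simplified stand-in for what get_optflags() does: it assumes no option takes an argument, it treats a lone "-" as a leftover argument (an assumption, not something stated above), and the caller must supply an output array with room for every argument plus the terminating NULL.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: copy arguments that are not -flags from argv[1..]
 * into optargs[], NULL terminate it, and return the count (toys.optc in
 * the text). Order is preserved. */
int collect_optargs(char **argv, char **optargs)
{
  int optc = 0, parsing = 1;
  char **arg;

  for (arg = argv + 1; *arg; arg++) {
    if (parsing && !strcmp(*arg, "--")) {
      parsing = 0; /* "--" ends option parsing and is itself consumed */
      continue;
    }
    /* While still parsing, skip anything that looks like a -flag. */
    if (parsing && (*arg)[0] == '-' && (*arg)[1]) continue;
    optargs[optc++] = *arg;
  }
  optargs[optc] = NULL;

  return optc;
}

/* Usage: "-x" after "--" is kept as a plain argument. */
void collect_optargs_demo(void)
{
  char *argv[] = {"command", "-a", "file1", "--", "-x", NULL};
  char *optargs[5];

  assert(collect_optargs(argv, optargs) == 2);
  assert(!strcmp(optargs[0], "file1") && !strcmp(optargs[1], "-x"));
  assert(!optargs[2]);
}
```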

    + +

    Other optflags control characters

    + +

    The following characters may occur at the start of each command's +optflags string, before any options that would set a bit in toys.optflags:

    + +
    • ^ - stop at first nonoption argument (for nice, xargs...)
    • ? - allow unknown arguments (pass non-option arguments starting +with - through to optargs instead of erroring out).
    • & - the first argument has imaginary dash (ala tar/ps. If given twice, all arguments have imaginary dash.)
    • < - must be followed by a decimal digit indicating at least this many leftover arguments are needed in optargs (default 0)
    • > - must be followed by a decimal digit indicating at most this many leftover arguments allowed (default MAX_INT)
    + +

    The following characters may be appended to an option character, but do +not by themselves indicate an extra argument should be saved in this[]. +(Technically any character not recognized as a control character sets an +optflag, but letters are never control characters.)

    + +
    • ^ - stop parsing options after encountering this option, everything else goes into optargs.
    • | - this option is required. If more than one marked, only one is required.
    • +X enabling this option also enables option X (switch bit on).
    • ~X enabling this option disables option X (switch bit off).
    • !X this option cannot be used in combination with X (die with error).
    • [yz] this option requires at least one of y or z to also be enabled.
    + +

    --longopts

    + +

    The optflags string can contain long options, which are enclosed in +parentheses. They may be appended to an existing option character, in +which case the --longopt is a synonym for that option, ala "a:(--fred)" +which understands "-a blah" or "--fred blah" as synonyms.

    + +

    Longopts may also appear before any other options in the optflags string, +in which case they have no corresponding short argument, but instead set +their own bit based on position. So for "(walrus)#(blah)xy:z" "command +--walrus 42" would set toys.optflags = 16 (-z = 1, -y = 2, -x = 4, --blah = 8) +and would assign this[1] = 42;

    + +

    A short option may have multiple longopt synonyms, "a(one)(two)", but +each "bare longopt" (ala "(one)(two)abc" before any option characters) +always sets its own bit (although you can group them with +X).

    Directory scripts/

    @@ -404,4 +645,28 @@

    Menuconfig infrastructure copied from the Linux kernel. See the Linux kernel's Documentation/kbuild/kconfig-language.txt

    +
    +

    Directory generated/

    + +

    All the files in this directory except the README are generated by the +build. (See scripts/make.sh)

    + +
    + +

    Everything in this directory is a derivative file produced from something +else. The entire directory is deleted by "make distclean".

    diff -r 2551e517b800 -r f6ffc6685a9e www/design.html --- a/www/design.html Sat Jan 14 23:28:15 2012 -0600 +++ b/www/design.html Mon Jan 16 01:44:17 2012 -0600 @@ -158,8 +158,10 @@

    Simple

    -

    Complexity is a cost, just like code size or runtime speed. Treat it as -a cost, and spend your complexity budget wisely.

    +

    Complexity is a cost, just like code size or runtime speed. Treat it as +a cost, and spend your complexity budget wisely. (Sometimes this means you +can't afford a feature because it complicates the code too much to be +worth it.)

    Simplicity has lots of benefits. Simple code is easy to maintain, easy to port to new processors, easy to audit for security holes, and easy to