qcc

view todo/todo.txt @ 596:3cffd74ad346

Plan for relicensing under 2-clause BSD.
author Rob Landley <rob@landley.net>
date Thu, 14 Jun 2012 20:28:25 -0500
1 QCC - QEMU C Compiler.
3 Use QEMU's Tiny Code Generator as a backend for a compiler based on my old
4 fork of Fabrice Bellard's tinycc project.
6 Why?
8 QEMU's TCG provides support for many different targets (x86, x86-64, arm,
9 mips, ppc, sh4, sparc, alpha, m68k, cris). It has an active development
10 community upgrading and optimizing it.
12 QEMU application emulation also provides existing support for various ELF
13 executable and library formats, so linking logic can presumably be merged.
14 (See elf.h at the top of qemu.) QEMU is also likely to grow coff and pxe
15 support in future.
17 Building a self-bootstrapping system:
19 My Firmware Linux project builds the smallest self-bootstrapping system
20 I could come up with using the following existing packages:
22 gcc, binutils, make, bash, busybox, uClibc, linux
24 This new compiler should replace both binutils and gcc above. (As a smoke
25 test, the new system should still be able to build all seven packages.)
27 To build those packages, FWL needs the following commands from the host
28 toolchain. (It can build everything else from source, but building these
29 without already having them is a chicken and egg problem.)
31 ar as nm cc gcc make ld /bin/bash
33 The reason it needs "gcc" is that the linux and uClibc packages assume
34 their host compiler is named "gcc", and call that name instead of cc even
35 when it's not there. (You can mostly override this by specifying HOSTCC=$CC
36 on the make command line, although a few places need actual source patches.)
38 Ignoring gcc, make, and bash, this leaves "ar, as, nm, cc, and ld" as
39 commands qcc needs to provide for a minimal self-bootstrapping system.
41 Note that the above set of tools is specifically enough to build a fresh
42 compiler. When building a linux kernel, creating a bzImage requires objcopy,
43 building qemu requires strip, etc.
45 What commands does the current gcc/binutils combo provide?
47 gcc 4.1 provides the commands:
48 cc/gcc - C compiler
49 cpp - C preprocessor (equivalent to cc -E)
50 gcov - coverage tester (optional debugging tool)
52 Of these, cc is required, cpp is low hanging fruit, and gcov is probably
53 unnecessary.
55 Binutils provides:
56 ar - archiver, creates .a files.
57 ranlib - generate index to .a archive (equivalent to ar -s)
58 as - assembler
59 ld - linker
60 strip - discard symbols from object files (equivalent to ld -S)
61 nm - list symbols from ELF files.
62 size - show ELF section sizes
63 objdump - show contents of ELF files
64 objcopy - copy/translate ELF files
65 readelf - show contents of ELF files
66 addr2line - convert addresses to filename/line number (optional debug tool)
67 strings - show printable characters from binary file
68 gprof - profiling support (optional)
69 c++filt - C++ and Java, not C.
70 windmc, dlltool - Windows only (why is it installed on Linux?)
71 nlmconv - Novell Netware only (why is this installed on Linux?)
73 Of these, ar, as, ld, and nm are needed; ranlib, strip, addr2line, and
74 size are low hanging fruit; size, objdump, objcopy, and readelf are
75 variants of the same logic as nm; and gprof, c++filt, windmc, dlltool,
76 and nlmconv are probably unnecessary.
78 Standards:
80 The following utilities have SUSv4 pages describing their operation, at
81 http://www.opengroup.org/onlinepubs/9699919799/utilities
83 ar, c99, nm, strings
85 This means the following don't:
87 ld, cpp, as, ranlib, strip, size, readelf, objdump, objcopy, addr2line
89 (There isn't a "cc" standard, but you can probably use "c99" for that.)
91 Existing code:
93 multiplexer:
95 The compiler must provide several different names, yet the same
96 functionality must be callable from a single compiler executable,
97 assembling when it encounters embedded assembler, passing on linker
98 options via "-Wl," to the linking stage, and so on.
100 The easy way to do this is for the qcc executable to be a swiss-army-knife
101 executable, like busybox. It needs a command multiplexer which can figure
102 out which name it was called under and change behavior appropriately, to
103 act as a compiler, assembler, linker, and so on.
105 This multiplexer should accept arbitrary prefixes, so cross compiler names
106 such as "i686-cc" work. This means instead of matching entire known names,
107 the multiplexer should check that commands _end_ with recognized strings.
108 (This would not only allow it to be called as both "qcc" and "cc", but
109 would have the added bonus of making "gcc" work like "cc" as well.)
111 Both busybox and tinycc already handle this. Pretty straightforward.
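A minimal sketch of the suffix matching described above. This is not the busybox or tinycc code; the enum, the table, and the function names (qcc_pick_tool, ends_with) are hypothetical and only illustrate matching on how the command name ends.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tool identifiers, for illustration only. */
enum qcc_tool { TOOL_UNKNOWN, TOOL_CC, TOOL_CPP, TOOL_LD, TOOL_AS, TOOL_AR, TOOL_NM };

/* Return nonzero if name ends with suffix. */
static int ends_with(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);

    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Pick a tool from argv[0], ignoring the directory part and any prefix. */
static enum qcc_tool qcc_pick_tool(const char *argv0)
{
    /* "cc" also matches "qcc", "gcc", and "i686-cc". */
    static const struct { const char *suffix; enum qcc_tool tool; } table[] = {
        { "cc",  TOOL_CC  },
        { "c99", TOOL_CC  },
        { "cpp", TOOL_CPP },
        { "ld",  TOOL_LD  },
        { "as",  TOOL_AS  },
        { "ar",  TOOL_AR  },
        { "nm",  TOOL_NM  },
    };
    const char *base;
    size_t i;

    assert(argv0 != NULL);
    base = strrchr(argv0, '/');
    base = base ? base + 1 : argv0;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (ends_with(base, table[i].suffix))
            return table[i].tool;
    return TOOL_UNKNOWN;
}
```

A main() would call qcc_pick_tool(argv[0]) and dispatch on the result. Matching on suffixes is deliberately loose, so an unrelated name that happens to end in "as" or "ld" would also match.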
113 cc/c99 - front-end option parsing
115 Both tinycc's options.c and ccwrap.c (in FWL) handle command line option
116 parsing, in different ways. Both take as input the same command line
117 syntax as gcc, which is more or less the c99 command line syntax from
118 SUSv4:
120 http://www.opengroup.org/onlinepubs/9699919799/utilities/c99.html
122 What ccwrap.c does is rewrite a gcc command line to turn "cc hello.c"
123 into a big long command line with -L and -I entries, explicitly specifying
124 header and library paths, the need to link against standard libraries
125 such as libc, and to link against crt1.o and such as appropriate.
127 Such a front end option parser could perform such command line rewriting
128 and then call a "cc1" that contains no built-in knowledge about standard
129 paths or libraries. This would neatly centralize such behavior, and
130 if the rewritten command line could actually be extracted it could be
131 tested against other compilers (such as gcc) to help debugging.
133 Note that adding distcc or ccache support to such a wrapper is a fairly
134 straightforward item for future expansion.
136 The option parser needs to distinguish "compiling" from "linking".
138 When compiling, the option parser needs to specify two include paths;
139 one for the compiler (varargs.h, defaulting to ../qcc/include) and
140 one for the system (stdio.h, defaulting to ../include).
142 When linking, the option parser needs to specify the compiler library
143 path (where libqcc.a lives, defaulting to ../qcc/lib), the system
144 library path (where libc.a lives, defaulting to ../lib), and add
145 explicit calls to link in the standard libraries and the startup/exit
146 code. Currently, ccwrap.c does all this.
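The compile/link distinction and the rewriting could look roughly like the sketch below. It is not taken from ccwrap.c. The struct, the function names, and the choice of -c, -S, and -E as the "no link" flags are assumptions, and the startup/exit objects (crt1.o and friends) are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the four already-resolved search paths. */
struct qcc_paths {
    const char *cc_include;   /* compiler headers, e.g. varargs.h */
    const char *sys_include;  /* system headers, e.g. stdio.h */
    const char *cc_lib;       /* where libqcc.a lives */
    const char *sys_lib;      /* where libc.a lives */
};

/* Nonzero if the command line stops before the link stage. */
static int qcc_compile_only(int argc, char **argv)
{
    int i;

    for (i = 1; i < argc; i++)
        if (!strcmp(argv[i], "-c") || !strcmp(argv[i], "-S") ||
            !strcmp(argv[i], "-E"))
            return 1;
    return 0;
}

/* Copy argv into out[] and append explicit paths and libraries.
 * Returns the new argument count, or -1 if out[] is too small. */
static int qcc_rewrite(int argc, char **argv, const struct qcc_paths *p,
                       const char **out, int max)
{
    int i, n = 0;

    assert(argv != NULL && p != NULL && out != NULL);
    /* Worst case: the originals, 4 include arguments, 6 link arguments. */
    if (argc + 10 > max)
        return -1;

    for (i = 0; i < argc; i++)
        out[n++] = argv[i];

    /* Defaults go last so the user's own -I entries are searched first. */
    out[n++] = "-I"; out[n++] = p->cc_include;
    out[n++] = "-I"; out[n++] = p->sys_include;

    if (!qcc_compile_only(argc, argv)) {
        out[n++] = "-L"; out[n++] = p->cc_lib;
        out[n++] = "-L"; out[n++] = p->sys_lib;
        out[n++] = "-lc";
        out[n++] = "-lqcc";
    }
    return n;
}
```

Printing out[] instead of executing it is one way to extract the rewritten command line for testing against another compiler.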
148 Note that these default paths aren't relative to the current directory
149 (which can't change or files listed on the command line wouldn't be found),
150 but relative to the directory where the qcc executable lives. This allows
151 the compiler to be relocatable, and thus extracted into a user's home
152 directory and called from there. (The user's home directory name cannot
153 be known at compile time.) The defaults can also be specified as absolute
154 paths when the compiler is configured.
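A sketch of resolving one configured default against the executable's location. The function name qcc_resolve_path is hypothetical, and it assumes the caller already has a path to the qcc binary that contains a directory component; finding that path when qcc was located through $PATH is outside this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Resolve a configured default path.
 * Absolute defaults are copied unchanged; relative ones are taken
 * relative to the directory holding the executable, not the cwd.
 * Returns 0 on success, -1 if the result does not fit in out. */
static int qcc_resolve_path(const char *exe, const char *dflt,
                            char *out, size_t len)
{
    const char *dir, *slash;
    int dirlen, n;

    assert(exe != NULL && dflt != NULL && out != NULL);

    if (dflt[0] == '/') {
        n = snprintf(out, len, "%s", dflt);
    } else {
        slash = strrchr(exe, '/');
        if (slash) {
            dir = exe;
            dirlen = (int)(slash - exe);
        } else {
            /* No directory part: fall back to "." (an assumption). */
            dir = ".";
            dirlen = 1;
        }
        n = snprintf(out, len, "%.*s/%s", dirlen, dir, dflt);
    }
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With exe set to /home/u/tools/bin/qcc and the default ../qcc/include, this yields /home/u/tools/bin/../qcc/include, so the whole tree can be moved without reconfiguring.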
156 The current ccwrap.c also modifies the $PATH (so gcc's front-end can
157 shell out to tools such as its own "cc1" and "ld"), and supports C++.
158 Although qcc doesn't need either of these, both are useful for shelling
159 out to another compiler (such as gcc).
161 The wrapper can split "compiling and linking" lines into two commands,
162 either saving intermediate results in the /tmp directory or forking and
163 using pipes. (That way cc1 doesn't need to know anything about linking.)
164 Optionally, the compiler can initialize the same structures used by the
165 linker, but is the speed/complexity tradeoff here worth it?
167 Note that "-run" support is actually a property of the linker.
169 cpp - preprocessor
171 This performs macro substitution, like "qcc -E".
173 cc1 - compiler
175 This compiles C source code. Specifically, it converts one or more .c
176 files into a single .o file, for a specific target.
178 Generating assembly output is best done by running the binary tcg output
179 through a disassembler. Keep it orthogonal.
181 ld - linker
182 This needs to be able to read .o, .a, and .so files, and produce ELF
183 executables and .so files. It should also support linker scripts.
185 This needs to "#include <elf.h>", which non-linux hosts won't always have
186 but which qemu has its own copy of already.
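As an illustration of telling the three input types apart without relying on a host <elf.h>, the sketch below classifies a file by its leading bytes. The enum and the function name ld_classify are hypothetical. The constants are written out by hand here; a real linker would take them from its own copy of elf.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of linker inputs. */
enum ld_input { LD_UNKNOWN, LD_OBJECT, LD_SHARED, LD_ARCHIVE };

/* Classify a linker input from the first bytes of the file. */
static enum ld_input ld_classify(const unsigned char *buf, size_t len)
{
    unsigned type;

    assert(buf != NULL);

    /* .a files start with the ar magic string. */
    if (len >= 8 && !memcmp(buf, "!<arch>\n", 8))
        return LD_ARCHIVE;

    /* ELF files start with 0x7f 'E' 'L' 'F'; e_type sits at offset 16. */
    if (len < 18 || memcmp(buf, "\177ELF", 4))
        return LD_UNKNOWN;

    /* e_ident[5] (EI_DATA): 1 is little endian, 2 is big endian. */
    if (buf[5] == 1)
        type = buf[16] | (buf[17] << 8);
    else if (buf[5] == 2)
        type = (buf[16] << 8) | buf[17];
    else
        return LD_UNKNOWN;

    if (type == 1)          /* ET_REL: relocatable .o file */
        return LD_OBJECT;
    if (type == 3)          /* ET_DYN: shared object */
        return LD_SHARED;
    return LD_UNKNOWN;
}
```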
188 ar - library archiver
189 This is a wimpy archiver. It creates .a files from .o files
190 (and extracts .o files from .a files). It's a flat archive, with no
191 subdirectories.
193 Busybox has partial support for this (still read-only, last I checked).
195 The ranlib command indexes these archives.
197 SUSv4 has a standards document for this command:
199 http://www.opengroup.org/onlinepubs/9699919799/utilities/ar.html
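To show how flat the format is, here is a sketch that walks the members of an archive held in memory. It assumes the common Unix layout (an 8-byte "!<arch>\n" magic, then a 60-byte text header before each member, with member data padded to an even length); the SUSv4 page describes the command rather than this layout. The function name ar_next is hypothetical, and GNU long-name and symbol-table members are not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define AR_MAGIC     "!<arch>\n"
#define AR_MAGIC_LEN 8
#define AR_HDR_LEN   60   /* name 16, date 12, uid 6, gid 6, mode 8, size 10, end 2 */

/* Step to the next archive member.
 * *off must be 0 on the first call and is advanced on each success.
 * Returns 1 with name and size filled in, 0 at end of archive,
 * or -1 if the data is malformed. */
static int ar_next(const char *buf, size_t len, size_t *off,
                   char name[17], size_t *size)
{
    const char *h;
    char digits[11];
    size_t end;

    assert(buf != NULL && off != NULL && name != NULL && size != NULL);

    if (*off == 0) {
        if (len < AR_MAGIC_LEN || memcmp(buf, AR_MAGIC, AR_MAGIC_LEN))
            return -1;
        *off = AR_MAGIC_LEN;
    }
    if (*off >= len)
        return 0;
    if (len - *off < AR_HDR_LEN)
        return -1;

    h = buf + *off;
    if (h[58] != '`' || h[59] != '\n')   /* header terminator */
        return -1;

    /* Name field is space padded (GNU ar also appends a '/'). */
    memcpy(name, h, 16);
    name[16] = 0;
    for (end = 16; end > 0 && name[end - 1] == ' '; end--)
        name[end - 1] = 0;

    /* Size field is decimal text, space padded. */
    memcpy(digits, h + 48, 10);
    digits[10] = 0;
    *size = (size_t)strtoul(digits, NULL, 10);
    if (*size > len - *off - AR_HDR_LEN)
        return -1;

    /* Member data is padded to an even offset. */
    *off += AR_HDR_LEN + *size + (*size & 1);
    if (*off > len)
        *off = len;
    return 1;
}
```

Listing an archive is then a loop that calls ar_next() until it returns 0; extracting a member means copying *size bytes starting just after its header.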
201 as - assembler
202 Tinycc has an x86 assembler. It should be genericized.
204 nm - name list
206 For some reason, gcc won't build without this.
208 SUSv4 has a standards document for this command:
210 http://www.opengroup.org/onlinepubs/9699919799/utilities/nm.html