The ARM processor is popular in the embedded space because it has the best
power consumption to performance ratio, meaning it has the longest battery
life and smallest amount of heat generated for a given computing task.
It's the standard processor of smartphones. The 64 bit version (ARMv8)
was announced in 2011 with a 2014 ship date for volume silicon.
Although ARM hardware comes in many different processor designs with varying
clock speeds, cache sizes, and integrated peripherals, from a software
perspective what matters is the ARM architecture, meaning which instruction
set the compiler produces. The architecture names have a "v"
in them and the processor designs don't, so "ARM922T" is a hardware processor
design which implements the "ARMv4T" software instruction set.
The basic architectures are numbered: ARMv3, ARMv4, ARMv5, ARMv6, and
ARMv7. An ARMv5 capable processor can run ARMv4 binaries, ARMv6 can run ARMv5,
and so on. Each new architecture is a superset of the old ones, and the main
reason to compile for newer platforms is efficiency: faster speed and better
battery life. (I.e. they work about like i386, i486, i586, and i686 do in
the x86 world. Recompiling ARMv4 code to ARMv5 code provides about a 25%
speedup on the same hardware.)
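
As a minimal illustration (assuming a GCC or Clang cross compiler; the file
name and messages below are just for the example), the compiler predefines
macros describing the target architecture, so you can check what a given set
of flags actually selects:

    /* arch.c: print which ARM architecture this binary targets.
     * Build with something like "arm-linux-gnueabi-gcc -march=armv5te arch.c".
     * Newer GCC/Clang define __ARM_ARCH as an integer; older versions only
     * define per-version macros such as __ARM_ARCH_4T__. */
    #include <stdio.h>

    int main(void)
    {
    #if defined(__ARM_ARCH)
        printf("ARM architecture version: %d\n", __ARM_ARCH);
    #elif defined(__ARM_ARCH_4T__)
        printf("ARMv4T\n");
    #elif defined(__arm__)
        printf("ARM, but this compiler doesn't say which version\n");
    #else
        printf("not an ARM target\n");
    #endif
        return 0;
    }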
The oldest architecture this compatibility goes back to is ARMv3 (which
introduced 32-bit addressing), but that hardware is now obsolete. (Not just no
longer being sold, but mostly cycled out of the installed base.) The oldest
architecture still in use is "ARMv4", which should run on any ARM hardware
still in use today (except ARMv7M, which is ARM in name only: it only
implements the Thumb2 instruction set, not traditional ARM instructions).
ARM architectures can have several instruction set extensions, indicated
by letters after the ARMv# part. Some (such as the letter "J" denoting the
"Jazelle" bytecode interpreter, which provides hardware acceleration for
running Java bytecode) can safely be ignored if you're not using them, and
others are essentially always there in certain architectures (such as the DSP
extension signified by the letter "E", which always seems to be present in
ARMv5). But some are worth mentioning:
The "Thumb" extension (ARMv4T) adds
a smaller instruction set capable of fitting more code in a given amount
of memory. Unfortunately thumb instructions often run more slowly, and
the instruction set isn't complete enough to implement a kernel, so they
supplement rather than replace the conventional ARM instruction set. Note
that all ARMv5 and later processors include Thumb support by default, only
ARMv4T offers it as an extension. The newer "Thumb2" version fixes most of the
deficiencies of the original Thumb instruction set (you _can_ do a kernel
in that), and is part of the ARMv7 architecture. The ARMv7M (Mangled? Mutant?)
chip supports nothing _but_ Thumb2, abandoning backwards compatability with
any other ARM binaries.
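
If you want to check whether a given compilation is producing Thumb, Thumb2,
or traditional ARM code, GCC and Clang predefine __thumb__ and __thumb2__
when -mthumb is in effect; a small sketch (the file name and messages are
illustrative):

    /* thumb.c: report which instruction set this file is compiled as.
     * Try building it with and without -mthumb. */
    #include <stdio.h>

    int main(void)
    {
    #if defined(__thumb2__)
        printf("Thumb2 instructions\n");
    #elif defined(__thumb__)
        printf("original Thumb instructions\n");
    #elif defined(__arm__)
        printf("traditional ARM instructions\n");
    #else
        printf("not an ARM target\n");
    #endif
        return 0;
    }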
The VFP (Vector Floating Point) coprocessor provides hardware floating
point acceleration. There are some older hardware floating point options,
and some newer ones backwards compatible with VFP, but in general you can
treat a chip as either "software floating point" or "VFP".
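
Roughly the same trick distinguishes the two floating point cases: GCC
defines __SOFTFP__ when it's emitting software floating point, and __VFP_FP__
when the target uses the VFP number format. This is a sketch under those
assumptions; check your toolchain's documentation for the exact macros it
provides:

    /* float.c: software floating point or VFP? */
    #include <stdio.h>

    int main(void)
    {
    #if defined(__SOFTFP__)
        printf("software floating point (library/kernel emulation)\n");
    #elif defined(__VFP_FP__)
        printf("VFP floating point\n");
    #else
        printf("some other floating point setup\n");
    #endif
        return 0;
    }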
The other detail is "l" vs "b" for little-endian and big-endian.
In theory ARM can do both (this is a compiler distinction, not a hardware
distinction), but in practice little-endian is almost universally
used on ARM, and most boards are wired up to support little-endian only even if
the processor itself can theoretically handle both.
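
Endianness is easy to check both at compile time (GCC predefines __ARMEL__ or
__ARMEB__ on ARM targets) and at runtime, by looking at where the low byte of
an integer lands in memory; a quick sketch:

    /* endian.c: which byte order is this binary using? */
    #include <stdio.h>

    int main(void)
    {
        unsigned int one = 1;
        unsigned char *bytes = (unsigned char *)&one;

        printf("runtime check: %s-endian\n", bytes[0] ? "little" : "big");
    #if defined(__ARMEL__)
        printf("compiler says: little-endian ARM\n");
    #elif defined(__ARMEB__)
        printf("compiler says: big-endian ARM\n");
    #endif
        return 0;
    }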
So for example, "armv4tl" is ARMv4 with Thumb extensions, little-endian.
That's the minimum requirement to use EABI, the current binary interface
standard for ARM executables. (The older OABI is considered obsolete.)
So ARM has two ABIs that can run on the same hardware: the old one is
called OABI and the new one is EABI. (This is a bit like
the way BSD binaries won't run under Linux even though the hardware's the
same, or the long-ago switch from a.out to ELF executable
formats.)
The oldest hardware that can run EABI is ARMv4T, so ARMv4 hardware
without the Thumb extensions still has to use OABI, which is why you don't
see much of that anymore. The kernel, C
library, and compiler must all agree which ABI is in use or the binaries
won't run. The transition to EABI happened somewhere around 2005, and
these days everything is EABI.
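
As a concrete illustration of why everything has to agree, one difference
between the ABIs is structure layout: EABI aligns 64-bit types such as long
long to 8 bytes, while the old OABI aligned them to 4, so the same struct has
different field offsets under each. A minimal sketch, assuming GCC (which
predefines __ARM_EABI__ on EABI targets):

    /* abi.c: show one structure layout difference between OABI and EABI. */
    #include <stdio.h>
    #include <stddef.h>

    struct sample {
        char c;
        long long big;    /* EABI aligns this to 8 bytes, OABI to 4 */
    };

    int main(void)
    {
    #ifdef __ARM_EABI__
        printf("built for EABI\n");
    #else
        printf("not built for EABI\n");
    #endif
        printf("offset of 'big': %d\n", (int)offsetof(struct sample, big));
        printf("sizeof(struct sample): %d\n", (int)sizeof(struct sample));
        return 0;
    }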
This doesn't quite tell the full story: the x86 and powerpc directories
were created by merging together 32 bit and 64 bit versions of the same
architectures: the i386 and x86_64 directories for x86, and the ppc and ppc64
directories for powerpc. (A similar merge of arm and arm64 is expected when
ARMv8 support stabilizes.)
The resulting dates are when the merge happened;
the corresponding architectures were added much earlier.
Where did all these processors come from?
Before Linux
In the 1950's and 60's,
mainframe
and minicomputer processors took up an entire circuit board. Unix
started on these kinds of systems; few of them remain in use today.
In 1971 an engineer
at Intel invented the first microprocessor, the Intel 4004, by being the
first to squeeze all the functions of a CPU onto a single silicon "chip".
As transistor
budgets
increased
they upgraded the 4004's design into the 8008 and then the 8080, the chip
inside coin-operated Space Invaders machines and the MITS Altair. The Altair
was widely cloned to form the first family of microcomputers, which contained
(and were named after) the S-100 bus, were programmed in BASIC from
a startup called Micro-soft, and ran an OS called CP/M
from a startup called Digital Research.
One of the
Intel engineers left to form his own company that made an 8080
clone called the Z80. But the main alternative to the 8080 came from
some ex-Motorola engineers who left for MOS Technology, the company
that did the (much cheaper) 6502 processor, with its own instruction
set. Motorola sued the escaped engineers for being better at it than they were,
and in the end the engineers went back to Motorola and
Commodore bought the rest of the company to use these processors in the VIC-20
(the first computer to sell a million units) and its successor the Commodore
64 (even more popular). The 6502 also wound up running the Apple I and Apple II,
and the first big home game console (the Atari 2600).
The march of Moore's Law quickly drove the microcomputer world to
move beyond 8 bit processors. Moore's Law says that memory size doubles
every 18 months, which quickly became a self-fulfilling prophecy hardware
manufacturers used to manage inventory and schedule product releases.
The first widely cloned microcomputer (the MITS Altair) was introduced in
1975, and by the middle of the year Altair-compatible systems had established
a range of 1-4 kilobytes of installed memory. (The original Altairs
had only 256 bytes of memory, but that wasn't enough and 1k expansion boards
immediately became standard equipment. The first version of its de-facto
standard programming language, Micro-soft BASIC, required 7k of
memory to run, but that was too expensive so they trimmed it down to run in
4k.) From then on Moore's Law turned the old high end into the new low end
every 3 years. (In 1978: 4-16k. In 1981: 16-64k.)
These early microcomputers were called 8-bit machines because they had
8-bit registers, storing values from 0 to 255. But they used pairs of registers
to access memory, providing 16 bits (64k) of address space. That was enough for
four 18-month Moore's Law doublings before the high end of the microcomputer
address range hit the 64k address space limit in mid-1981.
The result was a switch to 16-bit systems. IBM introduced its PC in
August 1981, based on Intel's 8086 processor (actually a variant called
the 8088 that ran the same software but fit in cheaper 8-bit motherboards
and took twice as many clock cycles to do anything). True to form, it
offered anywhere from 16k to 64k of memory preinstalled.
The main competitor to the 8086 was Motorola's 32-bit 68000 line of
processors, used in just about everything except the PC (Macintosh, Amiga,
Sun workstations...). Just as Intel's 8086 was a sequel to the 8080,
Motorola's 68k was a sequel to its earlier 8-bit 6800 (the chip the 6502's
designers had started from).
Motorola skipped 16 bit registers and jumped straight to 32 bits, but back
when 64k was as much memory as most high-end users could afford (costing
hundreds of dollars) this provided no immediate market advantage. The 68k
powered Apple's Macintosh (and several incompatible designs such as Commodore's
Amiga and Sun's original unix workstations). But Apple successfully defended
its hardware design in court, while IBM lost its court case
against Compaq, spawning an army of clones that quickly marginalized the
proprietary hardware designs and rendered the PC the standard computer.
The 8086 also used pairs of registers to access memory, but it overlapped
their address ranges so instead of 16+16=32 bits of address space, each
"segment" started only 16 bytes after the previous one (with each 64k segment
mostly redundantly accessing the same memory as its neighbors), providing
16+4=20 bits of address space, for 1 megabyte. (The 640k limit of DOS was
because the top third of the address range was reserved for I/O memory,
starting with the video card's frame buffer.)
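
In other words, the 8086 computed a physical address by shifting the 16-bit
segment left 4 bits and adding the 16-bit offset, so many segment:offset
pairs name the same byte. A small sketch of the arithmetic (the file name and
example values are just illustrative):

    /* seg.c: the 8086's segment:offset to physical address calculation. */
    #include <stdio.h>

    static unsigned long phys(unsigned segment, unsigned offset)
    {
        return ((unsigned long)segment << 4) + offset;  /* 20 significant bits */
    }

    int main(void)
    {
        /* Two different segment:offset pairs hitting the same physical byte. */
        printf("1234:0005 -> %05lx\n", phys(0x1234, 0x0005));
        printf("1230:0045 -> %05lx\n", phys(0x1230, 0x0045));
        /* The top of the address space: just over 1 megabyte. */
        printf("ffff:ffff -> %05lx\n", phys(0xffff, 0xffff));
        return 0;
    }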
The continuing advance of Moore's Law meant high-end PCs would collide
with the 1 megabyte limit in 1987. To prepare for this, Intel introduced its
own 32-bit processor, the 80386, in 1985. Unfortunately IBM had bought the
entire first year's production run of Intel's previous processor (the 80286)
to keep it out of the hands of the PC cloners, and Moore's Law quickly left
IBM with a giant unsold pile of slow expensive processors. This delayed
its introduction of new 32-bit PCs until the original cloner introduced
the "Compaq Deskpro 386" and the rest of the clones followed that, leaving
IBM's PCs in obscurity.
32 bits and RISC
In the 1980's and 90's, a
new
technology called RISC led to a gold rush
of processors hoping to take market share away from Intel. These RISC designs
came from three sources: start-up companies producing new RISC processors,
surviving mainframe and minicomputer vendors redesigning their big iron to use
microprocessors, and m68k users giving up on that design once Intel's move
to 32 bits removed the only argument in its favor. (If m68k had already lost
out to 16-bit x86, it was clearly even less interesting after the 386 came out.)
RISC designs ("Reduced Instruction Set Computers") simplified processor
instruction sets down to fixed-length instructions that only took one clock
cycle. This led to more verbose machine code which took up more memory, but
also meant that since you didn't have to execute the previous instruction
to figure out where the next one started (because they're all the same size),
processors could contain a second "execution core" looking over the shoulder
of the first core to potentially execute the next instruction in the same
clock cycle (if it didn't use the same registers, memory locations, or depend
on processor flags set by the previous instruction). Once compilers
advanced to produce instructions in non-interfering pairs, RISC processors
added a third core to potentially execute a third instruction in the same
clock cycle (if that one didn't interfere with the first two).
Several processor manufacturers were convinced that RISC designs were
superior to the older designs (which they called "CISC", for Complex
Instruction Set Computers). The commodity PC had taken over the market,
running a large installed base of CISC software, but the RISC vendors were
sure that during the 16 to 32 bit transition they could capture the market
when everyone had to throw out their old software anyway.
MIPS was one early RISC startup, created and commercialized
by some university professors based on early RISC research. The ARM design
came from Acorn, a small company in the UK which had sold the (6502 based)
"BBC Micro" in conjunction with British Broadcasting Corporation educational
programming, and developed the "Acorn RISC Machine" as the processor for its
successors.
At the other end of things, IBM engineers produced the Power minicomputer
architecture, Hewlett Packard developed PA-RISC, Hitachi developed the
SuperH (more commonly known as sh4), and Digital Equipment
Corporation migrated its VAX minicomputers to its new Alpha processor
(one of the first 64-bit processors).
Sun Microsystems developed the SPARC processor to replace m68k, but the
official successor to m68k came from a partnership between Apple, Motorola,
and IBM, redesigning IBM's Power processor to produce PowerPC. (Apple's
Macintosh redesign brought its m68k vendor Motorola together with IBM,
which at the time had the world's leading microchip manufacturing facilities.
The 90's were a great decade for IBM's microprocessor manufacturing arm: first
to replace microchips' traditional aluminum wiring with copper, first to layer
"silicon on insulator" to improve power efficiency, first gigahertz
processor...)
When the PC market smoothly transitioned to the 80386,
the RISC proponents were sure that better technology would eventually
prevail over market forces. The 386 was an extension of the 8086 which was
an extension of the 8080 which was an extension of the 8008. And the 386 wasn't
just similar to previous designs to ease porting, it actually implemented a
compatibility mode to fully emulate the previous chip and run its software
unmodified! Surely this long chain of backwards compatibility that had
accumulated customers for 20 years until it snowballed into market dominance
had to collapse from accumulated complexity at some point?
Next Intel came out with the 486, which introduced on-chip
CPU cache, allowing for "clock doubling" and "clock tripling" to run the
processor faster than the rest of the motherboard and execute loops of code
in cache while the slower main memory delivered the next part of the program
and stored the cached results of earlier computations. But the RISC proponents
saw this as merely buying time, sure that RISC would still win the day.
Then Intel introduced the Pentium, which could run as fast as the RISC designs
(executing two instructions per clock cycle) while remaining compatible with
existing software, and whose successors went on to translate CISC instructions
into RISC-style micro-operations internally.
Mobile processors
Intel did hit a bit of a wall at 1 gigahertz, and produced
a horrible RISC-like design (Itanium) as its planned 64 bit transition,
but AMD filled the gap with other x86-compatible designs and a sane
64 bit extension of the x86 architecture (x86-64), and Intel's customers forced
it to adopt AMD's 64-bit design.
Intel and AMD competed on two metrics: absolute performance, and the
price to performance ratio. They tried to produce the fastest chips, and
the fastest chip for the money.
But in the late 90's laptop systems became a viable PC option, and
in 2005 laptops outsold desktops. These systems are battery powered, and
care about another metric: the power consumption to performance ratio, i.e.
the best performance for a given amount of battery. In 2000 a startup called
Transmeta (most widely known for employing Linus Torvalds in his first job out
of college) proved that a processor consuming just 1 watt could provide
reasonable performance. (Meanwhile Intel's Itanium and Pentium 4 processors
consumed around 90 watts, enough power to fry an egg on the heat sink.)
The processor with the best power consumption to performance ratio was
ARM, which came to dominate cell phones and handheld systems. When those
converged into smartphones, ARM retained its dominant position.
Recently introduced processor designs have attempted to compete with ARM, not
Intel: Coldfire was a stripped down version of m68k, and newer designs include
Blackfin, Hexagon, Tile, and so on. So far, these are doing about as well
against ARM as the RISC gold rush did against x86.