Everyone and their pet cats know about C, but relatively few people realize that C was indirectly derived from a language called BCPL, which stands for "Basic CPL"; CPL in turn stands for "Combined Programming Language".
I first encountered BCPL when I was at Cambridge University, the home of the language and its designer, Martin Richards. BCPL was the systems programming language of choice at Cambridge, because it generated code better than the IBM Fortran compilers while not requiring the agony of Assembler. In turn, Computer Science students were taught it as their first "serious" language (interestingly enough, not by Martin Richards). For many (myself not included) it was their first introduction to anything more advanced than Fortran 66.
After graduation, I went to work for a local manufacturer of Z80-based systems that did almost all of its systems programming in BCPL. Only after the move to Unix did we in turn move to C. As a result, I find myself with several years of practical BCPL programming experience.
When the topic came up once again on the net, I wrote this summary of the language for some readers who already know C. It is in no sense a comparison or a historical survey but simply a high-speed description. Feedback on this description is welcome; I have no doubt that there are parts which are still unduly terse and would benefit from expansion, but I'd like to know which !
BCPL implementations are available from Martin Richards, or from the old BCPL distribution (Australian and German mirrors).
Clive D.W. Feather
The memory of a BCPL program is divided into cells. All cells have the same size. A cell stores a single value which can be treated as any of:
Every cell has an address. Under certain circumstances, two or more cells are required to be adjacent in memory. In this case, their addresses, if treated as integers, are consecutive.
In addition, a group of adjacent cells may be treated as a character
buffer. There are at least
BYTESPERWORD
characters
for every cell (though possibly more).
Cells have a storage duration, which can be static, dynamic (associated with a scope), or heap.
Integer constants are decimal; prefix #
means octal,
and #x
hexadecimal.
TRUE
and FALSE
are reserved words.
?
is a constant with an indeterminate value (which
may be different each time it is evaluated); any constant expression
containing an evaluated ?
is itself a ?
.
Character constants are of the form 'a'
; there are
no multi-character constants.
String constants are of
the form "abc"
; they contain between
0 and 255 characters.
Escape sequences in both string and character constants are
*"
, *'
, **
(representing the second character) and
*N (newline),
*T (tab),
*S (space),
*B (backspace), and
*P (pagethrow).
In addition, a string can be extended over more than one line; the
sequence *<white space>* is ignored within a string.
A string constant is replaced by the address of a large enough number of contiguous cells initialized so that:
"abc" % 0 = 3 "abc" % 1 = 'a' "abc" % 2 = 'b' "abc" % 3 = 'c'The cells have static storage duration.
A constant expression may use any operator, any constants other than string constants, and any identifier declared as a manifest.
Floating-point constants are of the form 1.2
, 1.2e-3
,
or 1e+2
.
! (dyadic) % :: highest precedence @ ! (monadic) * / REM #* #/ + - #+ #- (dyadic and monadic) = ~= < <= > >= #= #~= #< #<= #> #>= << >> ~ & | EQV NEQV -> TABLE VALOF lowest precedence SLCT precedence not described in R&W-S
Operators associate left-to-right except for -> and TABLE.
Except where stated, an identifier on the left of an assignment means that the result of the right hand side is to be stored in the cell named by that identifier, while an identifier in any other location represents the content of the cell it names. Statements like "a is an address" or "the integer a" mean "the value of the expression a, now taken as an address" or "the value of the expression a, now taken as an integer".
!
b
!(
a+
b)
.
!
a
%
b
BITSPERBYTE
bits of the result of the right side. In all
other contexts, the value is that of the bth character
padded to the left with zeroes (that is, characters are unsigned).
@
a
@
may only be used in this way or in the following two special ways).
@!
a
@
a!
b
(
a+
b)
.
*
b and
a/
b,
a REM
b,
a+
b,
+
a,
a-
b,
-
a
#*
b and
a#/
b,
a#+
b,
#+
a,
a#-
b,
#-
a
=
b etc.
TRUE
or FALSE
.
<
b<=
c etc.
<
b
&
b<=c.
#=
b etc.
<<
b and
a>>
b
~
a
&
b
|
b
NEQV
b
EQV
b
~(
a NEQV
b)
.
,
c
SLCT
a and
SLCT
a:
b and
SLCT
a:
b:
c,
OF
b
(SLCT
x:
y)
OF
b
selects a bitfield from within !
b:
!
b to
the bottom x bits of the right hand
side. The bits set are bits y to
y+x-1, where bit 0 is the rightmost bit;
if x is 0, they are all bits except the rightmost y.
(!
b >>
y)
.
(SLCT
x:
y:
z)
OF
b is equivalent to
(SLCT
x:
y)
OF
(
b+
z)
.
TABLE
k1,k2,...kn, where all the ki are
constant expressions,
VALOF
c
RESULTIS
command is reached, that controls the
value of the expression. Otherwise the value is
unspecified.
Logical operations are evaluated according to one of two rules. If the result of the operator is in "truth context", then "truth rules" are used. Otherwise the operator takes bit-patterns as argument(s) and operates on each bit separately. Truth context is:
->
;
TRUE
or FALSE
. For dyadic
operators, the left operand is evaluated. If and only if this does
not determine the result, then the right operand is evaluated.
TRUE
and FALSE
have values such that logical
operations not using truth rules evaluate correctly to TRUE
or FALSE
.
e1 () e1 (e2, e3, ..., ex)and associates from left to right. e1 to ex are all evaluated, and e1 must result in a procedure designator; the procedure is called with the appropriate arguments. If the procedure is a function, the value of the call is that of the evaluated function body; if it is a routine, the value is
?
.
The number of arguments does not have to match the number
of parameters of the procedure.
Arguments are passed strictly by value.
$(
and $)
optionally followed by a
tag - which is a sequence of letters, digits, and dots (not
necessarily beginning with a letter) - without an intervening
space.
If the tag of the closing bracket does not match that of the
opening one, it also closes any intermediate open section.
For example, the following is valid BCPL:
$(1 some code $(1.1 some code $)1.1 $(1.2 some code $( some code $)1 // Also closes 1.2 and the untagged block
c REPEAT c REPEATWHILE e c REPEATUNTIL e IF e THEN c UNLESS e DO c TEST e THEN c1 ELSE c2 WHILE e DO c UNTIL e DO c FOR i = e1 TO e2 BY e3 DO c RESULTIS e SWITCHON e INTO c GOTO e FINISH RETURN BREAK LOOP ENDCASE
Assignments have the form:
a1, a2, ..., an := e1, e2, ..., enwhere the lists on the two sides must have the same length, and ai must be valid on the left side. The order of the assignments is unspecified; they may be interleaved. Most recent implementations allow the assignment symbol to be prefixed with any dyadic operator, as in:
a1, a2, a3 <<:= e1, e2, e3This is equivalent to:
a1, a2, a3 := a1 << e1, a2 << e2, a3 << e3except that the expressions
a1
, a2
,
a3
are evaluated only once, and the <<
has lower precedence than any operators in the six expressions.
A call has the same form as in an expression, except that any result is ignored.
The contained command of a REPEAT
command is the smallest
possible; in particular, in:
I := VALOF F () REPEATWHILE Jthe repeated command is the call to
F
. The
unconditional REPEAT
command repeats forever.
REPEAT
commands always execute the contained
command at least once.
In the FOR
statement,
the BY
e3 part is optional (equivalent
to BY
1); e3 must be a
constant expression.
The identifier i is declared as a new dynamic cell
whose scope is the contained command. The command
is executed zero or more times with the cell taking values
e1, e1+
e3,
e1+2*
e3, ... so long as
the value is less than (greater if e3 is negative)
e2. The expressions are evaluated before the
first iteration, and not re-evaluated each time.
In the SWITCHON
command, the contained command must be a
block which cannot contain declarations, but can contain case
labels. These have the form
CASE e: DEFAULT:where e is a constant expression. Note that case labels can only be in that block, and not in inner blocks (except for nested
SWITCHON
s).
FINISH
terminates the program; RETURN
terminates
the current procedure (with an undefined result if it is a function);
BREAK
terminates, and LOOP
ends the
current iteration of, the textually surrounding WHILE
,
UNTIL
, REPEAT
, or FOR
.
ENDCASE
similarly exits the textually
surrounding SWITCHON
. RESULTIS
cannot
terminate the current procedure.
Each declaration that creates a cell with dynamic storage duration causes a new cell to be created each time the declaration is executed; the cell is (notionally) destroyed when the block containing the declaration terminates. The order that the cells declared in the same simultaneous declaration are initialized is undefined; they may be interleaved.
There are three types of section declarations. In each case, vi stands for an identifier not declared at the current level, and ki for a constant expression.
MANIFEST
$(
v1 =
k1;
v2 =
k2;
v3 =
k3;
... ;
vn =
kn $)
STATIC $(
v1 =
k1;
v2 =
k2;
v3 =
k3;
... ;
vn =
kn $)
GLOBAL
$(
v1 :
k1;
v2 :
k2;
v3 :
k3;
... ;
vn :
kn $)
Simultaneous declarations take the form:
There are four types of declaration that can be used; they are shown here withLET
d1AND
d2AND ... AND
dn
LET
, but of course can also occur with
AND
. The types may be freely mixed within one
simultaneous declaration.
LET
v1,
v2,
v3, ...,
vn =
e1,
e2,
e3, ...,
en
LET
v = VEC
k
LET
v (
v1,
v2,
...,
vn) BE
commandLET
v (
v1,
v2,
...,
vn) =
e
GLOBAL
declaration for
v, then it continues to designate the cell of the global
vector. Otherwise it designates a cell with static storage
duration. Before the program starts, the cell
is initialized to a procedure designator that invokes
the appropriate body.
The first form generates a routine (which has no result), and the second a function (which does). In effect, the following are equivalent:
LET
v (
v1,
v2, ...,
vn) BE
command
LET
v (
v1,
v2,
...,
vn) =
VALOF $(
command ;
RESULTIS
?
$)
The first two forms may only occur within blocks.
This isn't a formal constraint written in the
BCPL book by Richards and
Whitby-Strevens, but I've never
seen file-scope LET
declarations of variables. In
particular, if I wanted a static or global array, I would always
declare the array within START
, and then assign it to
the static or global variable.
The parameters of a function or routine designate contiguous cells, with adjacent parameters being adjacent. It is implementation defined whether the first parameter has the lowest or highest address. It is implementation-defined whether the values of arguments in excess of the number of parameters are stored in further contiguous cells or are discarded.
Functions and routines may be declared inside blocks. A function or routine may not refer to an identifier which designates a cell with dynamic storage duration and is not declared in that function or routine.
GLOBAL
declaration of
the identifier is in scope, that cell is used) which is
initialized, before the program starts, to a jump target
referring to the location of the label. A label is
in scope within the smallest of:
A jump closure is a value which refers to
a particular invocation of a function or routine. The closure
of the current invocation can be determined by calling the library
routine LEVEL()
; this value is no longer valid when
the invocation terminates.
There are two ways to jump to a label: GOTO
t,
and calling the library routine
LONGJUMP(
c,
t)
;
t is an expression which evaluates to a jump target,
and c is an expression evaluating to a jump closure.
The jump must be made either from the scope of the target label,
or from a function or routine called (possibly indirectly) from
within that scope.
In the latter case, c must evaluate to the closure of
an invocation of the function containing the label and which is
still active.
GLOBAL
declarations in
the header LIBHDR
. The allocation of
routines and functions to particular cells is implementation
defined, but only cells less than FIRSTFREEGLOBAL
are used.
Note that input and output goes via the designators held in
the global vector cells designated by RDCH
and WRCH
.
Altering those cells will alter the behaviour of all
I/O operations.
LIBHDR
contains the manifests:
BITSPERBYTE BYTESPERWORD ENDSTREAMCH FIRSTFREEGLOBAL
START
(
implementation-defined parameters)
character
=
RDCH
()
ENDSTREAMCH
at end of file.
WRCH
(character)
SELECTINPUT
(stream.designator)
SELECTOUTPUT
(stream.designator)
stream.designator
=
FINDINPUT
(string)
stream.designator
=
FINDOUTPUT
(string)
stream.designator
=
FINDFILE
(string)
stream.designator
=
CURRENTINPUT
()
stream.designator
=
CURRENTOUTPUT
()
integer
=
READN
()
RDCH
and returns it.
WRITES
(string)
WRITEN
(integer)
WRITED
(integer,
minimum.width)
WRITEOCT
(integer,
minimum.width)
WRITEHEX
(integer,
minimum.width)
WRCH
.
Padding is done with spaces on the left.
WRITEF
(format,
args ...)
WRCH
.
The format is a string which can contain:
%% call WRCH('%') %S call WRITES with the next argument %C call WRCH with the next argument %O1 call WRITEOCT with the next argument %X1 call WRITEHEX with the next argument %I1 call WRITED with the next argument %N call WRITEN with the next argumentwhere
1
can be any single base-36 digit (i.e. A
means 10, B
means 11, and so on), and is passed as the
second argument of the call.
Other formats are implementation-defined. Note that these calls go
via the global vector (normally twice, since each
then calls WRCH
).
character
=
GETBYTE
(s,
i)
s%i
; used when %
is not supported.
PUTBYTE
(s,
i,
c)
s%i:=c
; used when %
is not supported.
integer
=
PACKSTRING
(v,
s)
FOR i
=
0 TO v!0 DO s%i
=
v!i
except that it returns the
offset from v
of the cell holding s%(v!0)
.
UNPACKSTRING
(s,
v)
FOR i
=
0 TO s%0 DO v!i
=
s%i
.
integer
=
MULDIV
(x,
y,
z)
(x*y/z)
evaluated without overflow if the final
result fits in a cell.
integer
=
RANDOM
(seed)
closure
=
LEVEL
()
LONGJUMP
(closure,
target)
value
=
APTOVEC
(function,
n)
address
=
GETBLK
(
n)
FREEBLK
(address)
//
or ||
or \\
to end of line, or from /*
or
|*
or \*
to the corresponding close symbol
(the two characters reversed);
comments do not nest, and all comment symbols other than the
required close symbol are ignored in a comment.
THEN
and DO
are
interchangeable in all circumstances, as are OR
and ELSE
.
THEN
/DO
may be omitted if followed
by another keyword or a block.
GET
followed by a string constant causes the
file indicated by the string to be textually included at
this point. The keyword and string (possibly
followed by a comment) should form a source line of their own.
The following parts of the language are not supported by all compilers:
%
operator
(but string constants are
part of the core language)
SLCT
and ::
operators
//
+:=