This document briefly explains how to implement the programs in the book Software Tools in Pascal[KP81] by Brian W. Kernighan and P. J. Plauger, a follow-up to their earlier Software Tools[KP76]. A Software Tools User Group also worked through the tools and exercises in Fortran, using C primitives. The modern counterpart of this is the Bell Labs portability kit, distributed as Plan 9 from User Space, which replaced the original research Unix kit. These effectively made the Software Tools User Group, and its software distribution, redundant.
I worked through the original Software Tools[KP76] some years ago using GNU
Fortran, and f2c with GNU C. I was able to compile the bootstrap Ratfor of the
tools tape on my custom GNU/Linux distribution, and more recently on the
Windows Subsystem for Linux. The only difficulty was needing to rename the
index function to hindex (the h being for Hollerith) to get it to compile with
a Fortran 77 compliant compiler (a compatibility mode in GCC). The C version of
Ratfor from research Unix might be tracked down from the Unix Heritage Society
site, as well as one written by Oz from Stanford.
However, it became clear to me that Extended Fortran (i.e., Fortran 90 and
later) was a far more robust language that no longer needed Ratfor, and that
referring to the original Software Tools while working through The C
Programming Language[KR78][KR88] was more worthwhile. Thanks to Professor
Kernighan for his help with my copy of his book, and for hinting at
improvements to Fortran in his correspondence. Unless one is specifically
interested in Fortran or PL/1, working through the original Software Tools was
no longer as educational as the elaborated and modernized Pascal version,
except perhaps for the crypt tool.
The Pascal language is a derivative of Wirth's work on Algol. Algol started in 1958 as the International Algebraic Language (IAL), with a preliminary report. In 1960 a report was released as the Algorithmic Language (ALGOL[N60]). It was first implemented by Edsger Dijkstra and Jaap Zonneveld on the Electrologica X1, whose assembly language had been the subject of Dijkstra's Ph.D. dissertation. Wirth was part of the Algol community, and did his Ph.D. dissertation at UC Berkeley on a dynamically typed version of Algol called Euler[WW66]. This was implemented in 1962-1965 on the IBM 704, then at Stanford on the Burroughs B5000, and later on the IBM 360/30 with an improved, stack-oriented interpreter.
In 1964, Wirth began work on a proposal to IFIP Working Group 2.1 for Algol X, which was rejected in favor of what became Algol-68. He continued work on that proposal as Algol-W[WH66] at Stanford University with Tony Hoare; it was implemented on the IBM 360 in a custom, high-level assembler, PL360[W68]. This work was later continued at Stanford by others.
Wirth moved on to a professorship at ETH Zuerich, where he developed Pascal in 1968 on the CDC 6400 (a predecessor of the Cray supercomputers) with the SCOPE-3.4 operating environment, as a language derived from Algol and Algol-W and intentionally kept similar to them. The introduction to Pascal began with The Programming Language Pascal[W70] (the first ETH technical report, ETH-1), in November 1970. A version of this was published in Acta Informatica[WSV71]. A minor revision was published in July 1971 as the Second Edition[W71]. The Revised Report[W72] was published in November 1972, and a further minor revision in July 1973[W73]. The 1973 revision adds the optional program header that is more recognizable to Pascal users outside ETH.
During this time, the Pascal-P compiler[AJNN74] was designed as a portability kit. The CDC compiler was later rewritten for the language of the Revised Report against the P-system, and version P4 was the final P-system release. Pascal-P is a small, minimal language intended for porting to other platforms through a novel approach: the compiler is written in Pascal and targets a stack machine whose interpreter is also written in Pascal, in such a way that it can be translated into a language for another platform. Pascal-P was used to build the UCSD system, which added its own non-standard extensions, some of which were picked up by Turbo Pascal. Both popularized Pascal, making this work by Wirth's students significant to Pascal's success.
Pascal was slowly tweaked through these ETH reports. Beginning with the book Systematisches Programmieren[Wir72] in 1972, and its English equivalent, Systematic Programming: An Introduction[W76] in 1976, these tweaks can be seen in between the reports. The first edition of Jensen and Wirth's PASCAL: User Manual and Report[JW74] was published in 1974, then a text on algorithms[W75] was published in German in the summer of 1975, and later in English as Algorithms + Data Structures = Programs[Wir76].
A subset language called Pascal-S[Wir75] (bigger than Pascal-P) was implemented by Wirth for teaching. Perhaps it was this language that led to the misunderstanding that Pascal was designed only as a teaching language, or perhaps it was the fact that Wirth published Pascal as a university professor, rather than as a student or committee member. Pascal, like its predecessors, was intended to be a full, complete language for programming.
In 1978, the second edition of the User Manual and Report[JW78] was published; it went through several printings, accumulating corrections. I have the first and fourth printings of this edition. Most notably, it makes the program header mandatory. Other papers relating to Pascal were also published at ETH. In 1980, work on a standard began at ANSI, and later ISO, and was completed in 1983. The University of Minnesota continued the CDC work on the Pascal compiler, and Mickel and Miner issued a third edition of the User Manual and Report in November 1984, and a final, fourth edition against the revised and final ANSI standard in February 1991. The CDC edition of Pascal gave way to Welsh and Hay's Model Implementation. There was also an Extended Pascal standard, with modules modeled after UCSD's units (and possibly Ada's packages).
From time to time, Software Tools in Pascal requires that primitives be built
for the programs to function as intended. These were provided on a tools tape,
now available from Kernighan's Princeton page. (See plan9.io for a partial
mirror of the old plan9/cm website.) This tape relies on the original forms of
the research Unix ar and nroff commands, so it is only partly useful in a
modern Unix or GNU/Linux environment.
The first exercise asks you to become familiar with your compiler
environment. It provides a complete program to compile, copyprog, which is
similar to the copytext program given as an example in the User Manual and
Report, Second Edition, pg. 164 (section 13 of the Report). This is provided on
the tools tape as wholecopy.p. I first tried the p2c package, as found on
Slackware, but was unable to get copyprog working without error using p2cc,
presumably due to bugs in its I/O handling.
Because it is mentioned in the Appendix section on the Whitesmiths primitives, here is an example with the Amsterdam Compiler Kit (ACK):
ack -o copyprog wholecopy.p
I generated an RPM for RHEL 7 from the ackit 6.1 alpha code tree. Installation
on RHEL requires correctly setting the ACKDIR, ACKM, and ACKFE variables. I
used the following shell profile configuration:
ACKDIR=/usr
ACKFE=/usr/share/ack/descr/fe
ACKM=linux386
export ACKDIR ACKFE ACKM
The Whitesmiths compilers are no longer available, having been bought by several companies in turn and finally buried. They came from Plauger's company, Whitesmiths, Ltd., which also produced the Unix-like Idris operating system. The Pascal version of the book was mainly written by Kernighan, who used Bill Joy's BSD interpreter (not the corresponding compiler) for the initial work. The P4-derived UCSD system, for the Z80, Intel 8080, and later the Intel 8088 (the IBM PC), also has primitives available, including a simple shell interpreter for handling files and redirection. It has been tempting to look at using the UCSD primitives with p5, as well as to try out the ISO-compatible Atari Pascal p-system.
There were several other compilers I found, but all except p5 (which carries
the P4 system forward against the INCITS/ISO/IEC standard, and provides an
alternative to the proprietary Model Implementation) were defunct. However,
Free Pascal was adding its ISO mode, and I found it suitable, and the most
modern implementation, though as of version 3.2 it still has some bugs. The
copyprog example built on all supported platforms with the following command:
fpc -Miso -Xst -v0 -l- -ocopyprog wholecopy.p
getc and putc
Several reasons are given for getc and putc. The first is to hide the details
of what is unique to any particular system: its input and output devices.
Hiding not only how the standard input and output devices are chosen, but also
how lines and files are handled in terms of markers and the functions that
identify them, is very useful. The general principle is first explained in the
authors' book The Elements of Programming Style[KP78]: isolate the details of
I/O, including differences in character sets, in one place that is recognized
as non-portable. This is explained with both PL/1 and Pascal.
Under Fortran 66, there is no character type (there is in Fortran 77), only
Hollerith data, so integers had to be passed around and converted to and from
Hollerith form. PL/1 had a character type, but one of the exercises asks why it
wasn't used, and both books say this will be explained (which they do, in
several places). Passing characters around as integers makes less sense at
first in Pascal, until a key underlying fact is explained: most of the
compilers available to Kernighan and Plauger were written in C. Whether C's
char type is signed or unsigned depends on the implementation, so to be
portable it needs to be abstracted, along with differences in character set.
This matters especially when a negative integer (e.g., -1) is used as an
end-of-file sentinel. There are low-level nuggets like this scattered through
both books. The same isolation helps with efficiency problems, which also merit
a separate abstraction. Compiler authors are dealing with complex software, and
writing for small systems often required tricks to make a program usable. With
modern computing, perhaps this is mostly unnecessary (except for HPC needs).
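The EOF point is easiest to see in C itself. The following is a minimal sketch, not the book's code: ENDFILE, xgetc, xputc and copy_stream are hypothetical names chosen for illustration. The primitive returns an int, so the -1 sentinel stays distinct from every byte, including 0xFF, whatever the signedness of char.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical end-of-file sentinel, standing in for the book's ENDFILE. */
#define ENDFILE (-1)

/* A getc-style primitive: returns a byte as an int in 0..255, or ENDFILE.
   Because the result is an int, ENDFILE can never be confused with a byte.
   Stored in a plain char instead, byte 0xFF would equal -1 where char is
   signed, and ENDFILE would never match where char is unsigned. */
int xgetc(FILE *fp)
{
    int c = getc(fp);            /* C's getc returns int for this reason */
    return (c == EOF) ? ENDFILE : c;
}

/* A putc-style primitive; callers must never pass ENDFILE. */
void xputc(int c, FILE *fp)
{
    assert(c != ENDFILE);
    putc(c, fp);
}

/* The copy program written against the primitives only.
   Returns the number of bytes copied. */
long copy_stream(FILE *in, FILE *out)
{
    long n = 0;
    int c;
    while ((c = xgetc(in)) != ENDFILE) {
        xputc(c, out);
        n++;
    }
    return n;
}
```

Called as copy_stream(stdin, stdout), this behaves like copyprog.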
charcount requires the putdec procedure (described on pages 57-58), but putdec
is not introduced in chapter 1. If fully supported, the standard procedure
write can be used until you reach the end of chapter 2, where putdec is
described:
{ putdec(nc, 1) }
write(nc:1)
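It may help to see what putdec actually does. In the book, putdec(n, w) writes n right-justified in a field at least w characters wide, padding with blanks on the left, which is why putdec(nc, 1) matches write(nc:1). Here is a rough C analogue; the FILE * argument and the returned character count are additions of this sketch, not part of the book's interface.

```c
#include <assert.h>
#include <stdio.h>

/* Write n in decimal to fp, right-justified in a field at least w
   characters wide (blank padded on the left), after the book's
   putdec(n, w). Returns the number of characters written. */
int putdec(int n, int w, FILE *fp)
{
    char buf[3 * sizeof(int) + 2];    /* room for the digits of any int and a sign */
    unsigned int u = (n < 0) ? 0u - (unsigned int)n : (unsigned int)n;
    int len = 0;
    int written = 0;

    assert(fp != NULL);
    do {                              /* collect digits, least significant first */
        buf[len++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (n < 0)
        buf[len++] = '-';

    for (int i = len; i < w; i++) {   /* pad on the left to the requested width */
        putc(' ', fp);
        written++;
    }
    while (len > 0) {                 /* emit most significant character first */
        putc(buf[--len], fp);
        written++;
    }
    return written;
}
```

putdec(nc, 1, stdout) then corresponds to the write(nc:1) line above.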
Kernighan was careful to use only a compatible subset of the definition in the 1974 (final) Report, the then-proposed ANSI/ISO standard, and existing implementations (see pp. 28-29). This approach of using a compatible subset also explains the primitives approach of both books, which at first seems redundant, but in practice proves to be the only way to handle portability between implementations, while also providing the opportunity to tune the efficiency of those primitives (sometimes to work around inadequacies of a compiler).
The include command is not provided until chapter 3, yet its use is introduced
in the last program of chapter 1 (detab), and implied with charcount (see the
wrapper on pg. 71). The book hints that Kernighan used #include with the Unix
C preprocessor.
The Free Pascal $include directive can be used instead (similar to the PL/1
example on pg. 75 of the original Software Tools), but I found this made
fixing mistakes harder, as the line numbers in error messages didn't match up.
Wirth's CDC compiler used external references to independently compiled
libraries (i.e., object files, such as those produced with the -c flag of the
c99 or gfortran commands).
This is consistent with the Whitesmiths and ACK examples in the appendix. Free
Pascal supports a similar external referencing feature if you write libraries
with another compiler (or use unit files). Using copyprog as a model, I found
it an efficient and useful exercise to assemble each procedure and function
manually into a single program file until I had the include command built.
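Until the include tool exists, a rough C sketch shows what it has to do: copy its input, replacing each line of the form #include "name" with the contents of that file, expanded recursively. The directive syntax accepted here, the MAXLINE and MAXDEPTH limits, and the helper parse_include are assumptions of this sketch; the book's include is written in Pascal against its own primitives.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXLINE  1024   /* assumed line limit; longer lines are copied in pieces */
#define MAXDEPTH 8      /* assumed nesting limit, guards against include loops */

/* If line looks like  #include "name"  copy name into buf and return 1. */
static int parse_include(const char *line, char *name, size_t size)
{
    const char *p = line;
    while (*p == ' ' || *p == '\t')
        p++;
    if (strncmp(p, "#include", 8) != 0)
        return 0;
    const char *q1 = strchr(p + 8, '"');      /* opening quote */
    if (q1 == NULL)
        return 0;
    const char *q2 = strchr(q1 + 1, '"');     /* closing quote */
    if (q2 == NULL)
        return 0;
    size_t len = (size_t)(q2 - q1 - 1);
    if (len == 0 || len >= size)
        return 0;
    memcpy(name, q1 + 1, len);
    name[len] = '\0';
    return 1;
}

/* Copy in to out, expanding include lines. Returns 0 on success,
   -1 if a file cannot be opened or nesting is too deep. */
int include(FILE *in, FILE *out, int depth)
{
    char line[MAXLINE];
    char name[MAXLINE];

    assert(depth >= 0);
    while (fgets(line, sizeof line, in) != NULL) {
        if (!parse_include(line, name, sizeof name)) {
            fputs(line, out);                 /* ordinary line: copy it */
            continue;
        }
        if (depth >= MAXDEPTH)
            return -1;
        FILE *fp = fopen(name, "r");
        if (fp == NULL)
            return -1;
        int status = include(fp, out, depth + 1);   /* expand nested includes */
        fclose(fp);
        if (status != 0)
            return status;
    }
    return 0;
}
```

include(stdin, stdout, 0) gives a filter in the spirit of the batch script's include <outer.p >copy.pas.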
The getarg function under Free Pascal required modifications to the UCB
example in the Appendix on page 331. Instead of argv and argc, paramstr and
paramcount can be used. Replace (n < argc) with (n <= paramcount), since
paramcount, unlike argc, does not count the program name, and replace
argv(n, arg) with arg := paramstr(n).
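The same check in C makes the off-by-one in the getarg change easier to see, because argv[0] is the program name. This is a sketch only: the saved globals g_argc and g_argv and the truncating copy are assumptions, not the UCB primitive.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical copies of main's arguments, saved by main at startup. */
static int g_argc;
static char **g_argv;

/* Copy command-line argument n into s (at most maxsize bytes including
   the terminating NUL). Argument 0 is the program name, so valid n
   satisfy n < argc, which is n <= paramcount in Free Pascal terms. */
bool getarg(int n, char *s, size_t maxsize)
{
    assert(maxsize > 0);
    if (n < 0 || n >= g_argc)
        return false;
    strncpy(s, g_argv[n], maxsize - 1);   /* truncate long arguments */
    s[maxsize - 1] = '\0';
    return true;
}
```

A program would set g_argc = argc and g_argv = argv at the top of main before calling getarg.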
See the UCB globdefs.p example in the Appendix for the string type.
Although a goto and label could be used in each program, or even, in a simpler
program, a branch to a writeln at the end, an error procedure is fairly simple
to write in Free Pascal. Free Pascal provides a halt procedure, like the one
described in the User Manual and Report, Second Edition. Using the Free Pascal
shortstring type, and writeln for directing output to STDERR, the macros
suggested by the book can be avoided:
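PROCEDURE message (CONST s: shortstring);
{ write s as a line to the STDERR entry of the primitives' file table }
BEGIN writeln(openlist[STDERR].filevar, s)
END;
PROCEDURE error (CONST s: shortstring);
{ report s, then stop with a nonzero exit status so callers see the failure }
BEGIN message(s); halt(1)
END;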
Free Pascal can also write to erroutput directly, instead of using the
STDIN/STDOUT/STDERR table initialized by the UCB primitives in the Appendix.
NOTE. In the crypt example from Why Pascal Is Not My Favorite
Language[Ker81], using the Free Pascal xor operator allowed me to build crypt.
However, instead of using halt, it made more sense to restructure the
conditional so that the rest of the program became an else branch at the end
of the program. (End of note.)
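The heart of crypt is a repeating-key XOR, which is why the note needed Free Pascal's xor operator. Here is a minimal C sketch of just that step; xor_crypt and its buffer-based interface are hypothetical, whereas the book's tool reads and writes a character stream.

```c
#include <assert.h>
#include <stddef.h>

/* XOR each byte of buf with the key, cycling through the key.
   Running it twice with the same key restores the original text,
   so the same routine both encrypts and decrypts. */
void xor_crypt(unsigned char *buf, size_t len,
               const unsigned char *key, size_t keylen)
{
    assert(keylen > 0);
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key[i % keylen];
}
```

Because XOR is its own inverse, running the tool twice with the same key restores the input. A repeating-key XOR is easy to break, so this is an illustration rather than real protection.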
Currently, compare0 is problematic with Free Pascal 3.2, as using files in the
program header does not allow the required type declaration; the ISO mode is
perhaps not yet bug-free enough for this program to work.
The getline function as provided in the Appendix opens up a whole can of
worms: it depends on other procedures that are not discussed until all of
chapter 3's primitives are complete.
The first thing needed for compare is the open primitive from UCB. This
primitive pads intname with blanks, which is not explained in the Appendix,
but which makes sense after consulting the BSD Unix manual. The for loop that
does this padding should be removed for Free Pascal.
The second thing is that Free Pascal uses the standard reset and rewrite
procedures, so the extended syntax of the Berkeley primitives, which passes the
file name as a second argument, cannot be used. Instead, use the Free Pascal
assign procedure:
assign(openlist[i].filevar, intname);
IF (mode = IOREAD) THEN RESET(openlist[i].filevar)
ELSE REWRITE(openlist[i].filevar);
To fix the return status deficiency of the UCB example, test ioresult at the
end of the procedure:
IF (ioresult <> 0) THEN open := IOERROR
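This requires that {$I-} precede the procedure, so that an I/O error sets
ioresult instead of stopping the program with a run-time error, and that
{$I+} follow it to restore checking. The directives must be written without a
space after the opening brace; { $i+ } is an ordinary comment.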
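The open procedure relies on initio from pg. 326. In Free Pascal on Unix and
GNU/Linux, instead of assigning /dev/tty to STDERR, use a blank string:
assign(openlist[STDERR].filevar, ''). Be aware that Free Pascal associates an
empty file name with standard output once the file is rewritten, so messages
sent this way go to stdout rather than stderr.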
The rest can be taken verbatim from the UCB primitives.
The file descriptors described here seem to confuse some readers, but the
approach is simple enough. Each open file is numbered by its slot in a table,
which simplifies assigning internal file names and makes the maximum number of
open files explicit. The MAXOPEN constant, the maximum number of open files,
is small, but a modern OS can easily handle thousands of files. On GNU/Linux,
the system-wide limit is in /proc/sys/fs/file-max, which can be printed with:
$ sysctl fs.file-max
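The descriptor table can be sketched in a few lines of C: each open file occupies a numbered slot, opening takes the lowest free slot, and MAXOPEN bounds the table. The table size of 10, the 0-based slots with the standard streams in 0 to 2, and the names xopen, IOREAD, IOWRITE and IOERROR as used here are assumptions of this sketch, not a transcription of the UCB primitives; xclose simply echoes the renaming the text discusses.

```c
#include <assert.h>
#include <stdio.h>

#define MAXOPEN 10      /* assumed small table size */
#define IOERROR (-1)    /* returned when no descriptor can be assigned */
#define IOREAD  0
#define IOWRITE 1

static FILE *openlist[MAXOPEN];   /* slot i holds the stream for descriptor i */

/* Put the standard streams in the first three slots; the rest are free. */
void initio(void)
{
    openlist[0] = stdin;
    openlist[1] = stdout;
    openlist[2] = stderr;
    for (int i = 3; i < MAXOPEN; i++)
        openlist[i] = NULL;
}

/* Open name for reading or writing in the lowest free slot.
   Returns the descriptor, or IOERROR if the table is full or
   the file cannot be opened. */
int xopen(const char *name, int mode)
{
    assert(mode == IOREAD || mode == IOWRITE);
    for (int fd = 0; fd < MAXOPEN; fd++) {
        if (openlist[fd] == NULL) {
            openlist[fd] = fopen(name, mode == IOREAD ? "r" : "w");
            return (openlist[fd] != NULL) ? fd : IOERROR;
        }
    }
    return IOERROR;                /* every slot is in use */
}

/* Close descriptor fd and free its slot; the standard streams stay open. */
void xclose(int fd)
{
    if (fd > 2 && fd < MAXOPEN && openlist[fd] != NULL) {
        fclose(openlist[fd]);
        openlist[fd] = NULL;
    }
}
```

Raising MAXOPEN is a one-line change; the limit the OS imposes per process is a separate bound, typically much larger.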
Free Pascal has a built-in close procedure, whose name collides with close.p
(first used in include.p), so the book's procedure must be renamed xclose
(otherwise it calls itself recursively). Until the macro tool of chapter 8 is
built, so that the #define shown on page 340 for the UCSD wrapper can be used
(using the macro or define syntax; see pg. 280 or 305), any program that uses
xclose will have to be edited manually so that calls to close become xclose.
Following are the affected files:
To assemble a program, follow the instructions in Chapter 3 and the appendix:
build the assembled primitive files in their particular directories (resulting
in the globdefs.p, prims.p, and utility.p primitive files described there; a
batch script similar to the one below could be used), then update the outer.p
file to include the program file (e.g., copy.p) and call the appropriate main
program. After that, assembling the programs should be fairly basic. On
Windows, I made the following batch script, for use with the Free Pascal x64
cross compiler, to assemble copy:
include <outer.p >copy.pas
ppcrossx64 -Miso copy.pas
del *.o
rem comment the below to debug:
del copy.pas
As the appendix notes, this is inefficient for small programs that don't use
all the primitives and utilities. Something similar to the Whitesmiths example
can also be done with Free Pascal's external directive in function
declarations. Though most modern command interpreters follow the full Unix
conventions described in the book, adapting the UCSD custom interpreter might
be a fruitful exercise.
Building and using the programs while reading this book was fun, and gave an in-depth view of the fundamentals of good CLI and small-tool programming. I had planned a second pass through the book to work on the exercises with Free Pascal, but the weaknesses of Pascal that Kernighan makes abundantly clear are addressed in its successor, Modula-2[W88], so I have continued working on the exercises in that language. Some good information on Pascal (as well as early Basic) programming and history can be found at Scott Moore's site.
©2016-2023 David Egan Evans.