Implementing Software Tools in Pascal

This document relates my experience implementing the programs from the books by Brian Kernighan and P. J. Plauger, primarily Software Tools in Pascal. The tools are also used in Kernighan's book (with Dennis Ritchie) The C Programming Language. A Bell Labs portability kit is distributed as Plan 9 from User Space. Finally, an implementation has been done in Haskell, http://www.crsr.net/Programming_Languages/SoftwareTools/, and I have also worked through the exercises myself in Modula-2, Pascal's direct successor.

The Amsterdam Compiler Kit (ACK) on Linux (see the ackit RPM I put together for Red Hat Enterprise Linux) is noted in the Appendix on primitives, and current sources remain effectively the same if the time is taken to build it. The Free Pascal compiler is far more portable and better maintained, and is a suitable implementation of standard Pascal to use with the book. The examples should work on Windows and Unix (e.g. macOS, Linux) systems.

The Pascal standard refers to Revised Pascal as published in the Second Edition of the Pascal User Manual and Report, printed in 1974, with a corrected edition printed in 1978. This is typically called J&W Pascal. I used the 1978 print as a reference, as well as having Mickel's and Miner's (Fourth Edition) rewrite on hand. Kernighan's book mentions ANSI Pascal, which was a draft standard at the time, later to become an IEEE standard, with a corresponding international standard ISO/IEC 7185:1983, updated in 1990 (as noted above), now INCITS/ISO/IEC 7185:1990[S2008], a stabilized standard. The IEEE has withdrawn theirs. The international standard is a strict interpretation and extension of J&W Pascal, and therefore is expected to be upwards compatible with the Report.

The character type

A common criticism of the primitives of Software Tools in Pascal is the use of the character type, requiring separate utilities to handle it (e.g. putdec instead of write). The original Ratfor version, using Fortran 66 (and PL/1), seems to imply this is because of the lack of a character type in Fortran 66, yet the mention of the character type of PL/1 and Fortran 77 (the latter not yet formally published as a standard when the book went to press) makes it clear this was not the primary reason.

Software Tools in Pascal does explain why we didn't just use read and write (pg. 8). Although the implementations and their explanations are shown at the end of the chapter, the real explanations are given as you go throughout the book, and are prefaced at the beginning: "Certainly, someone ultimately has to worry about the choice of character set, detecting end of line and end of input, efficiency and the like, but most people need not be concerned, because getc and putc conceal the details."

To elaborate, most Pascal systems require that, upon reaching the end of a line (detected with the eoln function), the readln procedure be called. On Windows, where the ASCII carriage return followed by line feed is used, NEWLINE is treated as one character (not two). This was expected by design (see pg. 14), and is subtly implied by the primitives' support for systems that use CRLF (e.g. CP/M, VMS). A byte counter is not the same as a character counter (though they are treated the same in the standard Unix wc command).

However, this discussion makes the point that varying character sets (e.g. EBCDIC, ISO-8859-1, ASCII) require changing the source to accommodate the difference. By burying these details within the primitives, getc can receive a Windows-1252 code point character, pass the internal representation on to putc, and only have to worry about ASCII inside the primitives themselves. I had considered using the EOT character under Unix, SUB under Windows (and in Modula-2, as suggested by PIM4, the FS character), but once the case is made for getc/putc, passing characters around as a signed INTEGER (i.e. the character subrange) makes their handling more explicit, as evidenced by the globdefs.p module text, and only requires a few extra primitives.
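As a rough illustration of the point, here is a minimal copy program built on getc and putc. It is only a sketch: the values chosen for ENDFILE and NEWLINE and the bounds of the character subrange are assumptions made for the example (the book's globdefs.p is the authority), and input is assumed to be plain ASCII.

```pascal
program copysketch(input, output);
{ Sketch only: the constant values below are assumed for illustration. }
const
  ENDFILE = -1;   { assumed: a value outside the character set }
  NEWLINE = 10;   { assumed: ASCII line feed }
type
  character = -1..127;   { characters passed around as small integers }
var
  c : character;

{ getc hides the eof/eoln/readln protocol from the caller }
function getc (var c : character) : character;
var
  ch : char;
begin
  if eof then
    c := ENDFILE
  else if eoln then begin
    readln;            { consume the line terminator, whatever its bytes }
    c := NEWLINE
  end
  else begin
    read(ch);
    c := ord(ch)       { assumes ASCII input, i.e. ord(ch) <= 127 }
  end;
  getc := c
end;

{ putc turns the internal NEWLINE back into the system's line ending }
procedure putc (c : character);
begin
  if c = NEWLINE then
    writeln
  else
    write(chr(c))
end;

begin
  while getc(c) <> ENDFILE do
    putc(c)
end.
```

The main loop never mentions eoln, readln, or the line terminator of the host system; only the two primitives would change if the character set or line convention did.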

The Appendix makes clear that the direct use of read and write, mapped to their C equivalents under Unix, was inefficient. This alone doesn't make the primitives necessary, but the fact that they can be optimized later without changing the program does.

Finally, having functions that return the character as their value makes for a simpler main procedure. Some Pascal users have criticized this as too much like C, and indeed some of the preprocessing and the passing of characters as integers are used there too, but the Pascal p-system itself and other examples of Wirth's use getch in a similar way.

Reading the text carefully, it becomes clear that Kernighan was thoughtful, consistent, and elegant in his approach to Pascal.

Getting Started

From time to time, Software Tools requires that primitives be built for the programs to function as intended. Software Tools in Pascal provides primitives for the UCB Pascal interpreter, Whitesmiths Pascal (including references to the UCB compiler for the DEC VAX, and ACK, though it is assumed that external primitives are built, perhaps from C, or are already available), and UCSD Pascal Version IV from SofTech. In most cases, the UCB Pascal primitives could be updated to work in an equivalent way with Free Pascal, but Kernighan's descriptions leave out some interesting compiler-specific assumptions (e.g. blank padding), and I had to track down a copy of the UCB manual to identify the details.

Software Tools and Software Tools in Pascal both had tapes, last available for download from the plan9/cm website, which is now defunct. (See plan9.io for a partial mirror.) I was able to obtain the Ratfor Fortran source in time, and built it with GNU Fortran for GNU/Linux. The C version of Ratfor might be tracked down from the Unix Heritage site, as well as one written by Oz from Stanford. (The 4.3BSD Tahoe release has the UCB Pascal interpreter code in a portable edition that might be buildable as well.) The Pascal tape is now available from Kernighan's Princeton mirror of the Bell Labs material. Having said that, in general, the books must be followed without prebuilt software being available, even though it seems intended that a professor will have provided everything necessary for a student to use. I have posted the book versions of necessary tools (e.g. ratfor, include, define) for Linux. As a note, the use of include has the benefit of always having the primitive and utility modules available, but at the cost that superfluous code is compiled, used or not, into every binary.

Using FPC's ISO mode, I found building the tools very easy. For instance, the program example for copyprog was buildable, on all supported platforms, with the following:

fpc -Miso -Xst -v0 -l- -ocopyprog wholecopy.p

This is essentially the copytext program provided as an example in the User Manual and Report, Second Edition, pg. 164 (1978 Springer Study Edition), or section 13 of the Report.

Under ACK, this can be done as follows:

ack -o copyprog wholecopy.p

This assumes the RPM I generated for RHEL 7 from the 6.1 alpha code tree, and the correct configuration of the ACKDIR, ACKM, and ACKFE variables. I used the following shell profile configuration:

ACKDIR=/usr
ACKFE=/usr/share/ack/descr/fe
ACKM=linux386
export ACKDIR ACKFE ACKM

The Pascal-p system can also be used, though its strict implementation makes it unsuitable for most of the file and argument primitives. With S. A. Moore's build, I was able to get the following to work:

pcom <wholecopy.p

Copy the resulting prr file to prd, then run the pint command to run the program.

Finally, the p2c package, such as is found on Slackware, is a conversion tool for translating Pascal (or Modula-2) to C. It seems more suitable for converting an existing project to C, with expected code changes, than for being used as an actual compiler. At the least, I was unable, presumably due to I/O bugginess using p2cc, to get copyprog working without error:

$ p2cc -o copy copyprog.p
copyprog..c:63:1: warning: return type defaults to 'int' [-Wimplicit-int]
 main(int argc, Char *argv[])
 ^
$ chmod a+rx copy
$ ./copy
hello, world
hello, world


^C
$ ./copy
hello, world
hello, world
Pascal system I/O error 30 (end-of-file)

The first run above was on Slackware. After typing hello, world, I typed Ctrl plus d, then Enter, which is the normal way to signal the end-of-file on Unix (use the cat command as an example), but it didn't work, so I had to type Ctrl plus c. After the second run, I typed Ctrl plus d twice, resulting in the error.

hello, world

Exercise 1-2 of Software Tools in Pascal asks about another beachhead. For those used to The C Programming Language, the simple hello, world is an obvious choice. Here are sources for Fortran, Pascal, Modula-2:

C     Fortran 66
      WRITE (6,7)
    7 FORMAT(13H HELLO, WORLD)
      STOP
      END

program hello
! Fortran 90 and above.
  print *, "hello, world"
end program hello

program hello(output);
{ J&W and INCITS/ISO/IEC 7185:1990[S2008] Pascal }
begin writeln('hello, world')
end.

Counting tools

charcount requires the putdec procedure, which is not introduced in chapter 1, but the standard procedure write, if it is fully supported, can be used, as described on page 57:

{ putdec(nc, 1) }
write(nc:1)

This can be done until you arrive at the end of chapter 2 where putdec is described. I found it best to move along with the book in progression, instead of building other components that hadn't been introduced. Use the example of copyprog to assemble each program into a single file.
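A minimal single-file sketch of charcount along these lines, with write standing in for putdec, might look as follows. The end-of-line handling is written inline here purely to keep the example self-contained; in the book's version it lives inside getc. The width of integer depends on the compiler and mode, so very large inputs may overflow the counter.

```pascal
program charcount(input, output);
{ Sketch: count characters on standard input, treating each
  end of line as one NEWLINE character. }
var
  nc : integer;
  ch : char;
begin
  nc := 0;
  while not eof do begin
    if eoln then
      readln        { an end of line counts as a single character }
    else
      read(ch);
    nc := nc + 1
  end;
  write(nc:1);      { stands in for putdec(nc, 1) }
  writeln
end.
```

Given the two newline-terminated lines ab and c as input, this prints 5: three letters plus two line ends.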

The putdec procedure keeps the use of the standard interface consistent with the character type. Pascal has often been provided in the form of the P2 subset interpreter, so compilers did not always provide the standard write procedure, which freely mixes integer and char (the standard get and put procedures needed to construct it are often unavailable as well). Since Free Pascal 2.6, an ISO mode has been available; it is maturing, and as of 3.1.1 is feature complete. Compilers based on INCITS/ISO/IEC 7185:1990[S2008], such as FPC's ISO mode, should be compatible with the definition of the 1978 (final) Report.

#include

The include command is not provided until chapter 3, yet its use is introduced in the last program of chapter 1 (detab). Its use is implied with charcount, and thereafter, with the assumption of the availability of putdec, such as through the inclusion of globdefs.p, prims.p, and utility.p, (see the UCB wrapper example on pg. 322). If the include command is available, I found it easiest to take this in steps, as with this command example:

<detab.p include >detab.pas

I then filled in the rest of the context for detab.pas. This is described in greater detail in section 3.3, on pg. 71. Of course, Free Pascal's $include can be used instead (similar to the PL/1 example on pg. 75 of the original Software Tools), but I found this made fixing mistakes harder, as the line numbers didn't match up in error messages. (Some early Pascal systems, such as that provided for the Atari 800 XL, had #include as a built-in.) Wirth's CDC compiler used external references to independently compiled libraries (i.e. object files, such as can be independently compiled with the -c flag of the c99 or gfortran commands). Free Pascal supports a similar external referencing feature if you write libraries with another compiler.

getarg

The original Software Tools recognized the common simplicity of standard input and output on computers. Often it was necessary to tie an input mechanism to a control card, or to change the output to a different teletype printer. Dumb terminals changed this to direct keyboard input and TV screen output. As the Appendix of the original Software Tools indicates, it may be necessary to use an input file to present the arguments for the program. The Revised Pascal of the Second Edition Manual and Report (1974, 1978) had a program header that defined the computer's standard input and output, as well as other files for input and output. Unextended, Pascal is thus restricted to an input file for arguments, around which getarg must be built. The influence of Unix on operating systems since the 1970s has been significant, as commonly seen from the late 80s onward. The use of Free Pascal here essentially assumes a Unix (e.g. macOS, Linux) or Windows cmd.exe command line environment, though other operating systems do exist (e.g. RISC OS). The UCSD example in Software Tools in Pascal shows how the getarg primitive implies the use of an argument parser, even if one has to be built.

The getarg function under Free Pascal required modifications to the UCB example in the Appendix on page 331. Instead of argv and argc, paramcount and paramstr must be used. Replace (n < argc) with (n <= paramcount), and argv(n, arg) with arg := paramstr(n).

See the UCB globdefs.p example in the Appendix for the string type.
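Put together, a getarg built on paramcount and paramstr could be sketched as below. This is not the Appendix code: the type is named xstring here (a hypothetical name, chosen to avoid clashing with Free Pascal's own string type), the values of ENDSTR and MAXSTR are assumed, and the arguments are assumed to be ASCII.

```pascal
program showargs(output);
{ Sketch only: names and constant values are assumptions. }
const
  ENDSTR = 0;      { assumed string terminator value }
  MAXSTR = 100;    { assumed maximum string size }
type
  character = -1..127;
  xstring = array [1..MAXSTR] of character;   { hypothetical name }
var
  arg : xstring;
  i, n : integer;

{ Copy argument n into s, ENDSTR-terminated; false if there is none }
function getarg (n : integer; var s : xstring; maxsize : integer) : boolean;
var
  p : shortstring;
  i, len : integer;
begin
  if (n >= 1) and (n <= paramcount) then begin
    p := paramstr(n);
    len := length(p);
    if len > maxsize - 1 then
      len := maxsize - 1;          { leave room for ENDSTR }
    for i := 1 to len do
      s[i] := ord(p[i]);           { assumes ASCII arguments }
    s[len + 1] := ENDSTR;
    getarg := true
  end
  else begin
    s[1] := ENDSTR;
    getarg := false
  end
end;

begin
  { echo each command line argument on its own line }
  n := 1;
  while getarg(n, arg, MAXSTR) do begin
    i := 1;
    while arg[i] <> ENDSTR do begin
      write(chr(arg[i]));
      i := i + 1
    end;
    writeln;
    n := n + 1
  end
end.
```

Run as showargs one two, it prints one and two on separate lines; with no arguments it prints nothing.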

error

Though a goto and label could be used for each specific program, or even a branch with a simpler program where a writeln is at the end (as I did with crypt), an error function is fairly simple in Free Pascal. First, Free Pascal provides a halt statement, the same as is described in the User Manual and Report, Second Edition. Using the Free Pascal shortstring type, and writeln for directing output to STDERR, the macro suggested by the book can be avoided:

  PROCEDURE error (CONST s: shortstring);
  BEGIN writeln(openlist[STDERR].filevar, s); halt
  END; 

Free Pascal can also write to erroutput (instead of the initialized STDIN/STDOUT/STDERR environment).

compare0

Currently, compare0 is not possible with Free Pascal 3.0.4, as the use of files in the program header does not allow the required type declaration. Program header functionality has been introduced in 3.1.1, but does not appear to be bug free.

The function getline as provided in the Appendix opens up a whole can of worms, since it depends on other procedures not yet discussed. Unless all of chapter 3's primitives are complete, it is best to follow the directions in the chapter and build this function with getc, although how to do so was not as clear to me as the book claims.

Finally, the file descriptors seem confusing to some, and I have seen arguments suggesting that they are bad practice. However, this seems like a simple approach. It allows the open files to be numbered, and makes it simple to assign the internal file name while keeping track of the maximum number of open files. The globdefs module uses 10 for MAXOPEN, but a modern OS can easily deal with thousands of files. On Linux, see the file /proc/sys/fs/file-max, printed using:

 sysctl fs.file-max

compare

The first thing needed for compare is the open primitive from UCB. This primitive pads intname with blanks, which is not explained in the Appendix, but which makes sense once compared against the UCB BSD Unix manual. The for loop should be removed for Free Pascal.

The second thing is that Free Pascal uses the standard reset and rewrite commands, so the extended syntax cannot be used. Instead, use the Free Pascal assign procedure:

  assign(openlist[i].filevar, intname);
  IF (mode = IOREAD) THEN RESET(openlist[i].filevar)
  ELSE REWRITE(openlist[i].filevar);

Finally, fix the return status deficiency of the UCB example, by testing against ioresult at the end of the procedure:

  IF (ioresult <> 0) THEN open := IOERROR

This requires that {$I-} precede, and {$I+} follow, the procedure, so that I/O errors are reported through ioresult instead of stopping the program with a run-time error.

The open procedure relies on initio from pg. 326, but in Free Pascal, instead of assigning /dev/tty to STDERR, use a blank string: assign(openlist[STDERR].filevar, '').
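The pieces described in this section can be sketched as one small program. This is an illustration under assumptions, not the UCB primitive: the constant values, the shortstring parameter, and the simplified initialization in the main program are mine, and the standard descriptors are merely skipped rather than set up as initio would do. I/O checking is switched off around the function so that ioresult can be tested.

```pascal
program opensketch(output);
{ Sketch only: constant values and the string parameter are assumptions. }
const
  IOERROR = 0;
  STDERR = 3;      { descriptors up to STDERR are reserved }
  MAXOPEN = 10;
  IOAVAIL = 1;
  IOREAD = 2;
  IOWRITE = 3;
type
  filedesc = IOERROR..MAXOPEN;
  ioblock = record
    filevar : text;
    mode : IOERROR..IOWRITE
  end;
var
  openlist : array [1..MAXOPEN] of ioblock;
  fd : filedesc;

{$I-}
{ Find a free slot, open the named file, and return its descriptor }
function open (const name : shortstring; mode : integer) : filedesc;
var
  i : integer;
  found : boolean;
begin
  open := IOERROR;
  i := STDERR + 1;
  found := false;
  while (i <= MAXOPEN) and not found do
    if openlist[i].mode = IOAVAIL then
      found := true
    else
      i := i + 1;
  if found then begin
    assign(openlist[i].filevar, name);
    if mode = IOREAD then
      reset(openlist[i].filevar)
    else
      rewrite(openlist[i].filevar);
    if ioresult = 0 then begin     { only claim the slot on success }
      openlist[i].mode := mode;
      open := i
    end
  end
end;
{$I+}

begin
  { simplified stand-in for initio: mark every slot as available }
  for fd := 1 to MAXOPEN do
    openlist[fd].mode := IOAVAIL;
  fd := open('no-such-file.txt', IOREAD);
  if fd = IOERROR then
    writeln('open failed')
  else
    writeln('opened as descriptor ', fd:1)
end.
```

With no file of that name in the current directory, the program reports the failure instead of aborting, which is the behaviour the ioresult test is meant to provide.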

The rest can be taken verbatim from the UCB primitives. message is the same as error, but without halt. In fact, error can be designed as a call to message, adding halt.
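The relationship between message and error can be sketched as follows. For the sake of a self-contained example this version writes to Free Pascal's erroutput rather than through openlist[STDERR], and it passes an exit code to halt; both choices are assumptions of the sketch rather than what the book's primitives do.

```pascal
program errsketch(output);
{ Sketch only: uses erroutput in place of openlist[STDERR].filevar. }

{ Print a diagnostic on standard error and carry on }
procedure message (const s : shortstring);
begin
  writeln(erroutput, s)
end;

{ Print a diagnostic and stop the program }
procedure error (const s : shortstring);
begin
  message(s);
  halt(1)     { assumed: a nonzero exit status to signal failure }
end;

begin
  message('warning: this goes to standard error');
  error('fatal: stopping here');
  writeln('never reached')
end.
```

Nothing appears on standard output, so redirecting it to a file leaves the diagnostics visible on the terminal.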

xclose

Free Pascal has a built-in close procedure. This creates a name collision with close.p, first encountered with include.p, requiring that the procedure be called xclose (otherwise it calls itself recursively). Until the macro tool of chapter 8 is built and the #define shown on page 340 for the UCSD wrapper can be used (with either the macro or define syntax; see pg. 280 or 305), any program that uses close will have to be manually edited so that uses of close are changed to xclose. These are the following files:

outer.p

Assembling the programs should be fairly easy if you follow the instructions in Chapter 3 and the Appendix: use the described outer.p template, assemble the primitive files in their particular directories (resulting in the described globdefs.p, prims.p, and utility.p primitive files), and update the outer.p file to include copy.p and call the appropriate main program. On Windows, I made the following batch script for use with the Free Pascal 3.1.1 x64 cross compiler:

include <outer.p >copy.pas
ppcrossx64 -Miso copy.pas
del *.o
rem comment the below to debug:
del copy.pas

©2016-2018 David Egan Evans.