Implementing Software Tools in Pascal

This document relates my experience implementing the software tools as written in the books by Brian Kernighan and P. J. Plauger, primarily Software Tools in Pascal. The Amsterdam Compiler Kit on Linux (32-bit; ackit RPM for RHEL 7) is noted in the Appendix on primitives, and the current sources remain essentially the same, if you take the time to build them. The Free Pascal compiler is far more portable and better maintained: the examples should work on Windows and Unix (e.g. macOS, Linux).

The Pascal standard refers to Revised Pascal, as published in the Second Edition of Pascal User Manual and Report, printed in 1974, with a corrected edition printed in 1978. This is typically called J&W Pascal (after Jensen and Wirth). I used the 1978 printing as my reference. Kernighan's book mentions ANSI Pascal, which was a draft standard at the time; it later became an ANSI and IEEE standard, with a corresponding international standard, ISO 7185:1983. An updated standard was published in 1990, and is now INCITS/ISO/IEC 7185:1990, a stabilized standard. The IEEE has withdrawn theirs. The international standard is a strict interpretation and extension of J&W Pascal, and is therefore expected to be upward compatible with the Report.

The character type

A common criticism of the primitives of Software Tools in Pascal is their use of the character type (an integer subrange), which requires separate utilities to handle it (e.g. putdec instead of write). The original Ratfor version, using Fortran 66 (and PL/1), seems to imply this is because Fortran 66 lacks a character type, yet its mention of the character types of PL/1 and Fortran 77 (the latter not yet formally published as a standard when the book went to press) makes it clear this was not the primary reason.

Software Tools in Pascal does explain why the authors didn't just use read and write. Some of that explanation comes at the end of the chapter, alongside the implementations, but the real reasons are given as you go through the book, prefaced at the beginning: "Certainly, someone ultimately has to worry about the choice of character set, detecting end of line and end of input, efficiency and the like, but most people need not be concerned, because getc and putc conceal the details." (In the book, "not" is emphasized in italics, which is not reproduced here.)

To elaborate, most Pascal systems of the day, and modern ones too, require that readln be called upon reaching the end of a line, as detected with the eoln function. On Windows, where the ASCII-standard carriage return followed by line feed (CRLF) is used, NEWLINE is treated as a single character instead of two. This was expected by design, as is subtly implied by the primitives' support for systems that use CRLF (e.g. CP/M, VMS). Though most C-based Unix programs have fixed what is perceived as a character miscount, I found that some programs failed if this was not handled properly (such as my translation of Software Tools crypt, where Free Pascal's built-in xor operator made completing the program easy).

However, the book's discussion also makes the point that varying character sets (e.g. EBCDIC, ISO-8859-1, ASCII) would otherwise require changing the source to accommodate the difference. By burying these details within the primitives, getc can receive a Windows-1252 character and pass its internal representation on to putc, and only the primitives themselves have to worry about ASCII. I had considered using a control character as the end-of-file marker (EOT under Unix, SUB under Windows, and, as PIM4 suggests for Modula-2, the FS character), but once the case is made for getc/putc, passing characters around as a signed INTEGER (i.e. the character subrange) makes their handling more explicit, as evidenced by the globdefs.p module text, and only requires a few extra primitives.
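A minimal sketch of that arrangement, assuming Free Pascal in ISO mode: characters travel as integers of a character subrange, getc turns each end of line into NEWLINE (calling readln so the platform's line terminator is consumed as one unit), and end of input is the out-of-band value ENDFILE. The values ENDFILE = -1 and NEWLINE = 10 are assumed to match globdefs.p, and the subrange is widened here to -1..255 (an assumption, not the book's declaration) so that 8-bit code pages such as Windows-1252 fit.

```pascal
{$mode iso}
program copyprims(input, output);
{ Sketch: getc/putc primitives that pass characters as integers.
  ENDFILE and NEWLINE values are assumed to match globdefs.p. }
const
  ENDFILE = -1;   { out-of-band value: no real character equals it }
  NEWLINE = 10;   { ASCII LF; stands for any end-of-line convention }
type
  character = -1..255;   { assumption: widened so 8-bit code pages fit }
var
  c: character;

{ getc: read one character from input, store it in c, and return it }
function getc(var c: character): character;
var
  ch: char;
begin
  if eof(input) then
    c := ENDFILE
  else if eoln(input) then begin
    readln(input);       { consume the line terminator as one unit }
    c := NEWLINE
  end
  else begin
    read(input, ch);
    c := ord(ch)
  end;
  getc := c
end;

{ putc: write one character; NEWLINE becomes the platform's line end }
procedure putc(c: character);
begin
  if c = NEWLINE then
    writeln(output)
  else
    write(output, chr(c))
end;

begin
  { the whole main program: copy input to output }
  while getc(c) <> ENDFILE do
    putc(c)
end.
```

Because ENDFILE lies outside the range of any real character, the main loop needs no separate eof test, and the same getc serves on Unix (LF) and Windows (CRLF).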

The Appendix makes clear that the direct use of read and write, mapped to their C equivalents under Unix, was inefficient. This alone doesn't make the primitives necessary, but the fact that they can be optimized later without changing the programs does.

Finally, having functions that pass the character as their return value makes for a simpler main procedure. Some Pascal users have criticized this as too much like C, and indeed C uses similar preprocessing and passes characters as integers, but the Pascal p-system itself and other examples of Wirth's use getch in a similar way.

Reading the text carefully, it becomes clear that Kernighan was thoughtful, consistent, and elegant in his approach to Pascal.

Getting Started

From time to time, Software Tools requires that primitives be built for the programs to function as intended. Software Tools in Pascal provides primitives for UCB Pascal, Whitesmiths Pascal, and UCSD Pascal Version IV from SofTech. In most cases, the UCB Pascal primitives could be adapted to work the same way with Free Pascal, but Kernighan's descriptions leave out some compiler-specific assumptions; to identify their details (e.g. blank padding), I had to track down a copy of the UCB manual.

Software Tools and Software Tools in Pascal both had distribution tapes that can be downloaded from the Plan 9 website, which is now partly defunct. (See plan9.io for a more complete mirror.) I was eventually able to obtain the Ratfor Fortran source, and built it with GNU Fortran for GNU/Linux. The C version of Ratfor might be tracked down from the Unix Heritage site, as well as one written by Oz from Stanford. Having said that, in general the books must be followed without prebuilt software, even though it seems intended that a professor would provide everything a student needs. I have posted the book versions of necessary tools (e.g. ratfor, include, define) for Linux.

Using FPC's ISO mode, I found building the tools very easy. For instance, the program example for copyprog was buildable, on all supported platforms, with the following:

 fpc -Miso -Xst -v0 -l- -ocopyprog wholecopy.p

This is essentially the copytext program provided as an example in the User Manual and Report, Second Edition, pg. 164 (1978 Springer Study Edition), or section 13 of the Report.

charcount requires the putdec procedure, which is not introduced in chapter 1, but the standard procedure write, if it is fully supported, can be used, as described on page 57:

  { putdec(nc, 1) }
  write(nc:1)

This can be done until you reach the end of chapter 2, where putdec is described. I found it best to move along with the book in order, instead of building components that hadn't been introduced yet. Use the example of copyprog to assemble each program into a single file. However, if a complete globdefs.p, prims.p, and utility.p are available, as well as the include and define commands, then putdec can be used as in the book, and its discussion deferred until later.
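Assembled into a single file in the manner of copyprog, a charcount that uses write in place of putdec might look like the following sketch. The constants are assumed to match globdefs.p, and the getc shown counts each end of line as one NEWLINE character.

```pascal
{$mode iso}
program charcount(input, output);
{ Sketch: count input characters, each line end counting as one
  NEWLINE, and print the total with write instead of putdec. }
const
  ENDFILE = -1;
  NEWLINE = 10;
type
  character = -1..255;
var
  c: character;
  nc: integer;

{ getc: next character, NEWLINE at end of line, ENDFILE at end of input }
function getc(var c: character): character;
var
  ch: char;
begin
  if eof(input) then
    c := ENDFILE
  else if eoln(input) then begin
    readln(input);
    c := NEWLINE
  end
  else begin
    read(input, ch);
    c := ord(ch)
  end;
  getc := c
end;

begin
  nc := 0;
  while getc(c) <> ENDFILE do
    nc := nc + 1;
  { putdec(nc, 1) }
  write(nc:1);
  writeln
end.
```

Given the two lines ab and cd, it should print 6 on Unix and Windows alike, since each line end is counted once whether it is stored as LF or CRLF.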

The putdec procedure keeps output consistent with the character-type interface of the primitives. Pascal was often supplied as a derivative of the Pascal-P2 subset interpreter, so compilers did not always provide the full standard write procedure, which freely mixes integers and chars (the standard get and put procedures needed to construct it were often unavailable as well). Free Pascal has provided an ISO mode since version 2.6; it has been maturing, and as of 3.2 is feature complete. Compilers based on INCITS/ISO/IEC 7185:1990, such as FPC's ISO mode, should be compatible with the definition of the 1978 (final) Report.
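For comparison, a sketch of putdec in the book's style, built on an integer-to-character conversion function, itoc, and a simplified putc. The names follow the book, but the bodies are a reconstruction under stated assumptions: ENDSTR = 0, BLANK = 32, and a type named xstring standing in for the book's string type.

```pascal
{$mode iso}
program putdecdemo(output);
{ Sketch of itoc and putdec; constant values are assumptions. }
const
  MAXSTR = 100;
  ENDSTR = 0;    { string terminator }
  BLANK = 32;    { ASCII space }
type
  character = -1..255;
  xstring = array [1..MAXSTR] of character;  { the book's string type }
var
  digits: xstring;
  nd: integer;

{ putc: simplified here to write one character directly }
procedure putc(c: character);
begin
  write(output, chr(c))
end;

{ itoc: store the decimal form of n in s starting at s[i];
  return the index of the ENDSTR placed after the last digit }
function itoc(n: integer; var s: xstring; i: integer): integer;
begin
  if n < 0 then begin
    s[i] := ord('-');
    itoc := itoc(-n, s, i + 1)
  end
  else begin
    if n >= 10 then
      i := itoc(n div 10, s, i);   { emit the leading digits first }
    s[i] := n mod 10 + ord('0');
    s[i + 1] := ENDSTR;
    itoc := i + 1
  end
end;

{ putdec: write n right-justified in a field at least w wide }
procedure putdec(n, w: integer);
var
  i, nd: integer;
  s: xstring;
begin
  nd := itoc(n, s, 1);      { s[1..nd-1] holds the sign and digits }
  for i := nd to w do
    putc(BLANK);
  for i := 1 to nd - 1 do
    putc(s[i])
end;

begin
  putdec(42, 5);                 { three blanks, then 42 }
  writeln(output);
  putdec(-7, 1);                 { a width of 1 never truncates: -7 }
  writeln(output);
  nd := itoc(1978, digits, 1);   { digits: '1' '9' '7' '8' ENDSTR }
  writeln(output, nd - 1:1)      { 4 characters were stored }
end.
```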

#include

The include command is not provided until chapter 3, yet its use is introduced in the last program of chapter 1 (detab). The simplest solution is to continue building the programs in a single text file. I found it easiest to take this in steps, as in this command example:

 <detab.p include >detab.pas

I then filled in the rest of the context for detab.pas. This is described in greater detail in section 3.3, on pg. 71. Of course, Free Pascal's {$include} directive can be used instead (similar to the PL/1 example), but I found this made fixing mistakes harder, as the line numbers in error messages didn't match up. (Some early Pascal systems, such as the one for the Atari 800 XL, had #include built in.) Wirth's CDC compiler supported external references to separately compiled libraries (object files, like those produced with the -c option of c99 or gfortran). Free Pascal supports a similar external declaration feature for libraries written with another compiler.

getarg

The getarg function under Free Pascal required modifications to the UCB example in the Appendix on page 331. Instead of argv and argc, paramcount and paramstr must be used. Replace (n < argc) with (n <= paramcount), and argv(n, arg) with arg := paramstr(n).

See the UCB globdefs.p example in the Appendix for the string type.
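Putting those substitutions together, a sketch of getarg for Free Pascal follows. It assumes the book's convention of an ENDSTR-terminated string of integer characters (ENDSTR = 0 here), with the type named xstring to avoid any clash with Free Pascal's own string; copying through a shortstring gives the argument's length directly, so no blank padding is involved.

```pascal
{$mode iso}
program getargdemo(output);
{ Sketch of getarg for Free Pascal: paramcount and paramstr replace
  the UCB argc and argv.  Constant values are assumptions. }
const
  MAXSTR = 100;
  ENDSTR = 0;
type
  character = -1..255;
  xstring = array [1..MAXSTR] of character;  { the book's string type }
var
  s: xstring;
  i, n: integer;

{ getarg: copy argument n (0 is the program name) into s as an
  ENDSTR-terminated string; return false if there is no such argument }
function getarg(n: integer; var s: xstring; maxsize: integer): boolean;
var
  arg: shortstring;
  i, len: integer;
begin
  if (n >= 0) and (n <= paramcount) then begin
    arg := paramstr(n);
    len := length(arg);
    if len > maxsize - 1 then
      len := maxsize - 1;          { leave room for ENDSTR }
    for i := 1 to len do
      s[i] := ord(arg[i]);
    s[len + 1] := ENDSTR;
    getarg := true
  end
  else
    getarg := false
end;

begin
  { print each command-line argument on its own line }
  n := 1;
  while getarg(n, s, MAXSTR) do begin
    i := 1;
    while s[i] <> ENDSTR do begin
      write(output, chr(s[i]));
      i := i + 1
    end;
    writeln(output);
    n := n + 1
  end
end.
```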

error

Though a goto and label could be used in each specific program, or even, in a simpler program, a branch to a writeln at the end (as I did with crypt), an error procedure is fairly simple in Free Pascal. First, Free Pascal provides a halt procedure, the same as is described in the User Manual and Report, Second Edition. Using the Free Pascal shortstring type, and writeln to direct output to STDERR, the macro suggested by the book can be avoided:

  PROCEDURE error (CONST s: shortstring);
  BEGIN writeln(openlist[STDERR].filevar, s); halt
  END; 

Free Pascal can also write to erroutput (instead of the initialized STDIN/STDOUT/STDERR environment).
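A minimal sketch of that alternative, assuming erroutput is available under ISO mode as described: message prints to standard error and returns, and error calls message and then halts. Passing 1 to halt is a choice made here so that the shell sees a failure status; the plain halt used above exits with status 0.

```pascal
{$mode iso}
program errordemo(output);
{ Sketch: message and error written to Free Pascal's erroutput,
  so no openlist entry for STDERR is needed. }

{ message: print s on standard error and continue }
procedure message(const s: shortstring);
begin
  writeln(erroutput, s)
end;

{ error: print s on standard error, then stop with a failure status }
procedure error(const s: shortstring);
begin
  message(s);
  halt(1)
end;

begin
  message('errordemo: this is only a warning');
  error('errordemo: this is fatal')
end.
```

Run as is, the demonstration prints both lines on standard error and exits with status 1.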

compare0

Currently, compare0 is not possible with Free Pascal 3.0.2, as the use of files in the program header does not allow the required type declaration. This has been fixed in 3.1.1.

The function getline as provided in the Appendix opens up a whole can of worms, depending on other procedures not yet discussed. Unless all of chapter 3's primitives are complete, it is best to follow the directions in the chapter and build this function with getc, though how to do so did not seem as clear to me as the book claims.
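One way to follow the chapter's directions is sketched below, assuming only getc is available: getline stores characters until it sees NEWLINE, ENDFILE, or a full buffer, then terminates the string with ENDSTR. This is a reconstruction rather than the book's text; the constants, the xstring type name, and the line-counting main program are illustrative assumptions.

```pascal
{$mode iso}
program linedemo(input, output);
{ Sketch: getline built only on getc.  Constant values are assumed. }
const
  ENDFILE = -1;
  ENDSTR = 0;
  NEWLINE = 10;
  MAXSTR = 100;
type
  character = -1..255;
  xstring = array [1..MAXSTR] of character;
var
  line: xstring;
  nlines: integer;

{ getc: next character, NEWLINE at end of line, ENDFILE at end of input }
function getc(var c: character): character;
var
  ch: char;
begin
  if eof(input) then
    c := ENDFILE
  else if eoln(input) then begin
    readln(input);
    c := NEWLINE
  end
  else begin
    read(input, ch);
    c := ord(ch)
  end;
  getc := c
end;

{ getline: read one line, including its NEWLINE, into s; return false
  at end of input.  Lines longer than maxsize - 1 are split. }
function getline(var s: xstring; maxsize: integer): boolean;
var
  i: integer;
  c: character;
begin
  i := 1;
  repeat
    s[i] := getc(c);
    i := i + 1
  until (c = ENDFILE) or (c = NEWLINE) or (i >= maxsize);
  if c = ENDFILE then
    i := i - 1;              { do not keep ENDFILE in the string }
  s[i] := ENDSTR;
  getline := (c <> ENDFILE)
end;

begin
  nlines := 0;
  while getline(line, MAXSTR) do
    nlines := nlines + 1;
  writeln(output, nlines:1)
end.
```

A last line without a trailing newline is stored but reported as end of input, so this main program does not count it.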

Finally, the file descriptors seem confusing to some, and I have seen arguments that they are bad practice. However, they seem to me a simple approach: open files are numbered, which makes it simple to assign the internal file names and to keep count against the maximum number of open files. The globdefs module uses 10 for MAXOPEN, but a modern OS can easily deal with thousands of files. On Linux, the system-wide limit is in the file /proc/sys/fs/file-max, which can be printed using:

 sysctl fs.file-max
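To make the descriptor idea concrete, here is a minimal sketch of the table behind it: descriptors index openlist, the first three are preassigned to standard input, output, and error, and MAXOPEN is the only limit. The constant values are assumed to follow globdefs.p, and findslot is a hypothetical name for the slot search that open performs.

```pascal
{$mode iso}
program fdtable(output);
{ Sketch of a file-descriptor table: descriptors are small integers
  indexing openlist.  Constant values are assumed from globdefs.p. }
const
  MAXOPEN = 10;
  IOERROR = 0;     { returned when no descriptor is free }
  STDIN = 1;
  STDOUT = 2;
  STDERR = 3;
  IOAVAIL = 1;     { modes: slot free, open for reading, for writing }
  IOREAD = 2;
  IOWRITE = 3;
type
  filedesc = IOERROR..MAXOPEN;
  ioblock = record
    filevar: text;
    mode: IOAVAIL..IOWRITE
  end;
var
  openlist: array [1..MAXOPEN] of ioblock;
  i: integer;

{ findslot (hypothetical name): claim the first free entry for mode and
  return its descriptor, or IOERROR if the table is full }
function findslot(mode: integer): filedesc;
var
  i: integer;
  found: boolean;
begin
  findslot := IOERROR;
  found := false;
  i := 1;
  while (i <= MAXOPEN) and (not found) do begin
    if openlist[i].mode = IOAVAIL then begin
      openlist[i].mode := mode;
      findslot := i;
      found := true
    end;
    i := i + 1
  end
end;

begin
  for i := 1 to MAXOPEN do
    openlist[i].mode := IOAVAIL;
  openlist[STDIN].mode := IOREAD;     { descriptors 1..3 are preassigned }
  openlist[STDOUT].mode := IOWRITE;
  openlist[STDERR].mode := IOWRITE;
  writeln(output, 'first free descriptor: ', findslot(IOREAD):1)
end.
```

Run as is, it should report descriptor 4, the first slot after the three standard ones.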

compare

The first thing needed for compare is the open primitive from UCB. This primitive pads intname with blanks, which is not explained in the Appendix, but which makes sense once compared against the UCB BSD Unix manual. For Free Pascal, the for loop that does this padding should be removed.

The second thing is that Free Pascal's reset and rewrite follow the standard, so the UCB extended syntax, which passes the file name as a second argument, cannot be used. Instead, use the Free Pascal assign procedure:

  assign(openlist[i].filevar, intname);
  IF (mode = IOREAD) THEN RESET(openlist[i].filevar)
  ELSE REWRITE(openlist[i].filevar);

Finally, fix the UCB example's failure to report open errors by testing ioresult at the end of the function:

  IF (ioresult <> 0) THEN open := IOERROR

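This requires that {$I-} precede the function and {$I+} follow it. With I/O checking off, a failed reset or rewrite sets ioresult instead of stopping the program with a run-time error.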

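The open function relies on initio from pg. 326, but in Free Pascal, instead of assigning /dev/tty to STDERR, use a blank string: assign(openlist[STDERR].filevar, ''). Note that Free Pascal treats an empty name as standard input or standard output, depending on whether the file is reset or rewritten, so messages sent this way go to standard output rather than the terminal; writing to erroutput avoids this.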
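Combining these changes, a sketch of open for Free Pascal might look like the following. It is not the book's code: the constant values are assumed to match globdefs.p, xstring stands in for the book's string type, setstr is a hypothetical helper for building a file name from a literal, and, unlike the UCB version, an openlist entry is claimed only after reset or rewrite succeeds.

```pascal
{$mode iso}
program opendemo(output);
{ Sketch of the UCB open primitive adapted to Free Pascal.
  Constant values are assumed to match globdefs.p. }
const
  MAXSTR = 100;
  ENDSTR = 0;
  MAXOPEN = 10;
  IOERROR = 0;
  IOAVAIL = 1;
  IOREAD = 2;
  IOWRITE = 3;
type
  character = -1..255;
  xstring = array [1..MAXSTR] of character;  { the book's string type }
  filedesc = IOERROR..MAXOPEN;
  ioblock = record
    filevar: text;
    mode: IOAVAIL..IOWRITE
  end;
var
  openlist: array [1..MAXOPEN] of ioblock;
  name: xstring;
  i: integer;

{ setstr: hypothetical helper that turns a literal into an xstring }
procedure setstr(const src: shortstring; var dst: xstring);
var
  i: integer;
begin
  for i := 1 to length(src) do
    dst[i] := ord(src[i]);
  dst[length(src) + 1] := ENDSTR
end;

{$I-}  { I/O errors now set ioresult instead of halting }
function open(var name: xstring; mode: integer): filedesc;
var
  intname: shortstring;
  i, len, slot: integer;
begin
  len := 0;                        { copy the name; no blank padding }
  while name[len + 1] <> ENDSTR do begin
    len := len + 1;
    intname[len] := chr(name[len])
  end;
  setlength(intname, len);
  slot := IOERROR;                 { find a free entry in openlist }
  i := 1;
  while (i <= MAXOPEN) and (slot = IOERROR) do begin
    if openlist[i].mode = IOAVAIL then
      slot := i;
    i := i + 1
  end;
  open := IOERROR;
  if slot <> IOERROR then begin
    assign(openlist[slot].filevar, intname);
    if mode = IOREAD then
      reset(openlist[slot].filevar)
    else
      rewrite(openlist[slot].filevar);
    if ioresult = 0 then begin
      openlist[slot].mode := mode; { claim the slot only on success }
      open := slot
    end
  end
end;
{$I+}  { restore I/O checking for the rest of the program }

begin
  for i := 1 to MAXOPEN do
    openlist[i].mode := IOAVAIL;
  setstr('no-such-dir/no-such-file', name);
  if open(name, IOREAD) = IOERROR then
    writeln(output, 'open returned IOERROR')
end.
```

With this ordering, a failed open leaves its openlist entry free, so repeated failures cannot exhaust the table.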

The rest can be taken verbatim from the UCB primitives. message is the same as error, but without halt; in fact, error can simply be a call to message followed by halt.

xclose

Free Pascal has a built-in close procedure. Its name collides with the close primitive in close.p, first encountered with include.p, so the primitive must be renamed xclose (otherwise it calls itself recursively). Until the macro tool of chapter 8 is built, and the #define shown on page 340 for the UCSD wrapper can be used, any program that uses xclose will have to be edited by hand so that uses of close are changed to xclose. These are the following files:

©2016-2017 David Egan Evans.