Implementing Software Tools in Modula-2

[Note]: This document is a work in progress. I began porting the tools from Pascal to Modula-2 in 2013, but progress is slow, as work and life constantly get in the way.

In a previous document, I related my experience of a first pass through Software Tools in Pascal [KP81], and to some degree Software Tools [KP76], using Free Pascal. This document follows the tools and exercises of Software Tools in Pascal, but in Modula-2. It assumes that Software Tools and Software Tools in Pascal have been read and the tools built in Pascal.

Modula-2 was chosen because it is the successor to Pascal, and fixes most of the problems of Pascal described by Kernighan in Why Pascal is Not My Favorite Programming Language [Ker81]. However, some of those problems remain for other reasons. Just as many Pascal compilers implemented the Pascal-p/s subsets, or added extensions incompatible with the complete language, Modula-2 compilers have subtle differences, as well as differing opinions and approaches on how to fix Wirth's language. MOD and DIV of the ISO/IEC standard are compatible with PIM4, but may not be with PIM3 compilers; the difference shows up with negative operands. Sadly, many PIM compilers stopped at PIM2 or PIM3 and then jumped to the ISO/IEC standard, instead of maintaining PIM4 support. Following Kernighan's Pascal approach of only using the parts of the language that are universal, i.e. consistent with the Pascal 1978 report and the ANSI 1983 standard (now INCITS/ISO/IEC 7185:1990[S2008]), to ensure the programs compile with existing compilers, a (much larger) subset of the PIM4 Modula-2 language can be used. However, I decided to draw the line at PIM4 and the ISO/IEC standard. I/O handling is arguably worse than in Pascal: everything is now in libraries, no two compilers have the same implementation, and Wirth's report only recommends standard modules (including SYSTEM).
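A quick way to see which convention a compiler follows is to print DIV and MOD for a negative dividend. The sketch below uses PIM InOut so that PIM3 compilers such as Mocka and ACK can build it too. Under the PIM4 and ISO/IEC rules, DIV rounds toward negative infinity and MOD is non-negative for a positive divisor, so it should print -4 and 1. A compiler that truncates toward zero, as some PIM3 compilers do, would print -3 and -1 instead.

```modula2
MODULE divmod;
  (* Print DIV and MOD of a negative dividend to see which convention
     the compiler follows. Variables are used so the compiler cannot
     simply fold constants at compile time. *)
  FROM InOut IMPORT WriteString, WriteInt, WriteLn;
  VAR x, y: INTEGER;
BEGIN
  x := -7; y := 2;
  WriteString("-7 DIV 2 = "); WriteInt(x DIV y, 1); WriteLn;
  WriteString("-7 MOD 2 = "); WriteInt(x MOD y, 1); WriteLn
END divmod.
```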

Because the Medos-2 libraries were unsuitable, an ETH project built the tools primitives on Medos-2 as a kind of portable Modula-2 library called HOST [KU87]. Like the UCSD example in Software Tools in Pascal, it required building a custom interpreter environment, and it took an interesting direction in the primitives it supplied. This was likely due to the simple Medos-2 command-line interface and its matching InOut module.

The ISO/IEC 10514-1:1996 standard has multiple, sometimes subtle, differences from PIM4. Because of these differences, extensions, and design decisions, the U.S. ANSI delegation rejected ISO/IEC 10514-1:1996. Programming in Modula-2, Fourth Edition (PIM4), published in English in 1988 and in German in 1992, is the last Modula-2 report, along with some curious extensions found in Algorithms and Data Structures (Modula-2 edition). Like the 1978 Pascal Manual and Report, it is the standard for Modula-2, but unlike with Pascal, a PIM4 program may in some ways not work correctly with an ISO/IEC 10514-1:1996 compiler, or even with existing PIM compilers. Only one current PIM4 compiler exists: GNU Modula-2 (gm2). A couple of years ago I began working with m2c, but found so many bugs in it that I spent more time patching it than doing anything productive. m2c is an incomplete implementation of Modula-2, and using it means working within further restrictions.

For now, I have compiled and packaged the Amsterdam Compiler Kit (ACK) for Red Hat Enterprise Linux 7, along with the Mocka compiler. Unfortunately, these are 32-bit compilers (lacking the 64-bit support Free Pascal has), and ACK does not have a separate compilation facility. Like Mocka, it also supports only PIM3. Among the RPMs I provide, the Mocka 1208m version is especially recommended. The m stands for Maurer's editions; Maurer also provides the Murus libraries. I decided not to adopt Benjamin Kowarsch's Mocka. Though it was originally expected to bring 64-bit support, PIM4 compatibility, and bug fixes, exploring the code on GitHub and asking him about his differing compilers have shown that he is more interested in M2R10, and in updating the ISO/IEC standard, than in implementing PIM4 as written. GNU Modula-2 (gm2) is a PIM4 compiler with 64-bit support, perhaps one of the few, but it is not yet part of mainline GCC. For macOS, the only compiler I know of is p1. For the original Macintosh, the MacMETH compiler is in a way the newest reference compiler from Wirth. For Windows, the ADW (previously Stony Brook) compiler is available, as is perhaps the now unsupported Logitech compiler.

Getting Started

Exercise 1-2 of Software Tools in Pascal is easy with the classic Kernighan hello, world in Modula-2, as given by the echo man page. The following are PIM, ISO/IEC 10514-1:1996, and Kernighan-primitive versions:

MODULE hello;
(* PIM/Medos-2/RT-11 standard library. *)
  FROM InOut IMPORT WriteString, WriteLn;
BEGIN
  WriteString("hello world!"); WriteLn
END hello.

MODULE hello;
(* ISO/IEC 10514-1:1996 standard library. *)
  FROM STextIO IMPORT WriteString, WriteLn;
BEGIN
  WriteString("hello world!");
  WriteLn
END hello.

MODULE hello;
(* Using Kernighan's message function. *)
  FROM Prims IMPORT message;
BEGIN message("hello world!")
END hello.

Mocka is a curious creature, depending on which version is used: 9905, 0605m, 0608m, or 1208m. With 1208m, extract the m2.tgz package as the root (ID 0) user to /usr/local/ or /opt/, and create a /etc/profile.d/m2.sh file that sets the correct variables. Mocka now uses .mod and .def instead of the previously required .mi and .md extensions. The first time you run the m2 command on a file, it creates a $HOME/m2/ directory with bin/, src/, and out/ subdirectories. Your binaries are placed in bin/, you author your source files directly in src/, and object code and symbol files go in out/.

For ACK, I put together an RPM, installed it on Red Hat Enterprise Linux 7, and created a $HOME/src/ack/ directory. In that directory, I copied my hello.m2 file as hello.mod and ran the following command:

$ ack -Rm2-3 -o bin/hello hello.mod

I found that the current 6.1pre1 release did not build for me, so I had to download the source from the git repository; the author graciously provided a pre-packaged tape archive for me to use. ACK also provides a Pascal compiler, which is mentioned in the Appendix to Software Tools in Pascal.

Finally, the ISO/IEC 10514-1:1996 hello source was compiled on Windows 10 Pro using the ADW (Stony Brook) IDE. I created a project called Tools, created a new program module, pasted in the text, and compiled and linked it. I pointed the project directory to C:\Users\me\M2-Tools\, and a hello.exe file was created in that directory. ADW is the primary compiler I'm using for porting the tools. Though I have tested with Mocka, ACK, and gm2 in places, there seems little point in porting the tools to compilers on Unix systems, including macOS and Linux: more sophisticated tools already exist there and in the Plan 9 from User Space kit.

Syntax changes in copyprog

The first obvious thing when moving Pascal programs to Modula-2 is case sensitivity. Following the Mesa language design, upper-case words such as BEGIN should only be used for the compiler's reserved words and standard identifiers. This avoids portability problems between compilers. It also means that begin and Begin are different from BEGIN. The second is that structured statements (IF, WHILE, FOR, CASE, LOOP, WITH) are closed with END; REPEAT is still closed with UNTIL. For instance, if an IF statement controls a single statement, END needs to be added (but no BEGIN). If it controls several statements, and thus in Pascal has begin and end, the begin needs to be removed. In Software Tools (both versions), upper-case symbolic constants are used so that they stand out. I decided to stay with this convention, since most of them are provided by globdefs.p (which I chose to call GlobDefs.def). By qualifying names with their module (e.g. GlobDefs.ENDFILE), collisions with reserved names can be avoided. Unfortunately, different compilers interpret the Modula-2 report differently as to how IMPORT works. I would argue that if FROM foo IMPORT bar is used, a separate IMPORT foo should not be needed to use foo.bar. In that case, at least documenting bar with a comment such as (* foo.bar *) should be sufficiently clear.
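As a small illustration, here is a hypothetical Pascal fragment (in the comment) translated to Modula-2. The compound statement loses its begin, END closes the IF, and InOut is imported qualified, so every use is spelled InOut.WriteString and cannot collide with another module's names.

```modula2
MODULE ifdemo;
  (* Pascal original, for comparison:
       if n > 0 then begin writeln('positive'); n := 0 end *)
  IMPORT InOut;  (* qualified import: every use is written InOut.X *)
  VAR n: INTEGER;
BEGIN
  n := 3;
  IF n > 0 THEN  (* no BEGIN; END closes the statement sequence *)
    InOut.WriteString("positive"); InOut.WriteLn;
    n := 0
  END
END ifdemo.
```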

Comments are now delimited only by (* and *); Pascal's curly-brace comments have to be converted.

Pascal's function and procedure have been combined: there is no FUNCTION, but instead a PROCEDURE can return a value with the RETURN statement (instead of assigning the return value to the function's name).
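For example, a hypothetical max function written in Pascal (shown in the comment) becomes a function procedure that returns its result with RETURN. Because Modula-2 is case sensitive, the lower-case name max does not clash with the standard function MAX.

```modula2
MODULE maxdemo;
  (* Pascal original, for comparison:
       function max(a, b: integer): integer;
       begin if a > b then max := a else max := b end; *)
  FROM InOut IMPORT WriteInt, WriteLn;

  (* max: a function procedure hands back its result with RETURN *)
  PROCEDURE max(a, b: INTEGER): INTEGER;
  BEGIN
    IF a > b THEN RETURN a ELSE RETURN b END
  END max;

BEGIN
  WriteInt(max(3, 7), 1); WriteLn  (* prints 7 *)
END maxdemo.
```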

However, the biggest difference to note is the loss of built-in I/O statements. Instead, I/O is provided by compiler-specific modules. The lack of built-in I/O statements and procedures is a drawback, but the benefit is that everything is in modules, so extensions are better contained and thus more easily portable. The ISO/IEC 10514-1:1996 modules STextIO and SIOResult can be used with compilers such as ADW, p1, and gm2, but taken as a whole, these modules become increasingly complex as more of the primitives are handled. Kernighan's primitives really come in handy here and show their merit in assisting portability. Note that Wirth's InOut module is designed to be interactive, i.e. to read a single command terminated by a space, as provided by Terminal. InOut is supposed to switch between Terminal and Streams, based on context. The modules described in PIM differ between the RT-11 and Medos-2 implementations. Specifically, PIM recommends Streams as the primary file interface, with Files as the intermediate library (which need not be standard). The book suggests combining them into FileSystem, as found with Medos-2, but in a later paper Wirth regretted the combination, and suggested that Streams, with something like Files, really was the best design. (LineDrawing is conspicuously absent from most Modula-2 compilers. The Oberon system used Texts and Files as abstractions, similar to Streams and Files. It was Reiser who introduced In and Out.) Building custom modules using SYSTEM might be effective. For more details, see K. N. King's Modula-2: A Complete Guide. As noted in PIM, Wirth anticipated that compiler authors would write modules adapted to their environments and machines.

Though InOut, RealInOut, LineDrawing, and MathLib0 (originally MathLib, and Math under Oberon) are considered standard, they are built on top of an abstraction following the model of Streams and Files, such as FileSystem in Medos-2. Ultimately, even these are system specific in their model, and were not as familiar to Unix or CP/M (or DOS) users of the era. Unfortunately, I was unable to identify a standard default input and output file that would make copyprog possible, though input and output files could be predefined.

One thought, for the sake of Pascal to Modula-2 portability, is to build a PascalIO module that implements Pascal's I/O procedures, similar to a compatibility library in ACK. That way, despite the sometimes drastic differences between Modula-2 implementations, the primitives would be more easily portable from Pascal (i.e. push get, put, read, write, reset, and rewrite down another level, and build the primitives on them). Ultimately, though, the primitives should be built against the most efficient and useful modules provided. I rejected the PascalIO approach as not taking advantage of Modula-2's conceptualization of text and file streams and types: Pascal's eof function is replaced in Wirth's library by the Done boolean, which the module sets depending on whether the Read procedures were able to get another CHAR (or INTEGER, or REAL, etc.). Also, other than the provided UCSD and Berkeley Pascal (interpreter) primitives, the book assumed that the needed primitives would be handled with something like the CDC compiler's extern statement: Fortran-based on the CDC, C-based in the context of the book's referenced compilers.

The last major difference from Pascal is that everything is a module: use MODULE instead of program. Program file parameters are no longer supported; files are provided through modules instead (e.g. Files). Many compilers default to standard input and output, like Pascal's input and output built-ins, though strictly speaking, InOut switches between Terminal and Streams. Because of Medos-2, and perhaps Algorithms and Data Structures, FileSystem is a more common library than Streams (and the Files module it relies on).

The initial result for the copy program is:

MODULE copy; (* DEE 2013-12-11 *)
  FROM GlobDefs IMPORT ENDFILE, character;
  FROM Prims IMPORT getc, putc;

  PROCEDURE copy;
    VAR c: character;
  BEGIN
    WHILE (getc(c) # ENDFILE) DO
      putc(c)
    END
  END copy;

BEGIN copy
END copy.
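The copy program relies on GlobDefs and Prims, which this page does not list. A minimal sketch of both follows, assuming PIM InOut is available and that character can simply be INTEGER, so that ENDFILE (-1) lies outside the range of any CHAR ordinal. These modules and values are illustrative assumptions, not the versions used for the ports here. Each module goes in its own file (GlobDefs.def, Prims.def, Prims.mod); some compilers may also want an empty IMPLEMENTATION MODULE GlobDefs. Note how InOut's Done flag takes the place of Pascal's eof.

```modula2
DEFINITION MODULE GlobDefs;
  (* Hypothetical minimal GlobDefs: just enough for copy. *)
  CONST
    ENDFILE = -1;  (* outside the range of any CHAR ordinal *)
    NEWLINE = 10;  (* ASCII line feed *)
  TYPE character = INTEGER;
END GlobDefs.

DEFINITION MODULE Prims;
  FROM GlobDefs IMPORT character;
  PROCEDURE getc(VAR c: character): character;
  PROCEDURE putc(c: character);
END Prims.

IMPLEMENTATION MODULE Prims;
  (* Sketch built on PIM InOut. *)
  FROM GlobDefs IMPORT ENDFILE, NEWLINE, character;
  IMPORT InOut;

  (* getc: read one character; return ENDFILE at end of input *)
  PROCEDURE getc(VAR c: character): character;
    VAR ch: CHAR;
  BEGIN
    InOut.Read(ch);
    IF ~InOut.Done THEN c := ENDFILE  (* Done replaces Pascal's eof *)
    ELSIF ch = InOut.EOL THEN c := NEWLINE  (* map the library's line end *)
    ELSE c := VAL(INTEGER, ORD(ch))
    END;
    RETURN c
  END getc;

  (* putc: write one character; NEWLINE becomes the library's line end *)
  PROCEDURE putc(c: character);
  BEGIN
    IF c = NEWLINE THEN InOut.WriteLn
    ELSE InOut.Write(CHR(VAL(CARDINAL, c)))
    END
  END putc;
END Prims.
```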

wordcount

Here are the charcount, linecount, and wordcount programs in Modula-2, assuming that the GlobDefs, Prims, and Utility modules, modeled after the book's UCB interpreter versions, are built against your compiler's SYSTEM and I/O libraries:

MODULE charcount; (*DEE 2015-11-25*)
  FROM GlobDefs IMPORT ENDFILE, NEWLINE, character;
  FROM Prims IMPORT getc, putc;
  FROM Utility IMPORT putdec;

  (* charcount: count characters in standard input *)
  PROCEDURE charcount;
    VAR nc: CARDINAL; c: character;
  BEGIN nc := 0;
      (* GlobDefs.ENDFILE *)
    WHILE (getc(c) # ENDFILE) DO INC(nc) END;
    putdec(nc, 1);
    putc(NEWLINE) (* GlobDefs.NEWLINE *)
  END charcount;

BEGIN charcount
END charcount.

MODULE linecount; (*DEE 2015-11-25*)
  FROM GlobDefs IMPORT ENDFILE, NEWLINE, character;
  FROM Prims IMPORT getc, putc;
  FROM Utility IMPORT putdec;

  (* linecount: count lines in standard input *)
  PROCEDURE linecount;
    VAR nl: CARDINAL; c: character;
  BEGIN nl := 0;
    WHILE (getc(c) # ENDFILE) DO (* GlobDefs.ENDFILE *)
      IF (c = NEWLINE) THEN (* GlobDefs.NEWLINE *)
        INC(nl)
      END
    END;
    putdec(nl, 1);
    putc(NEWLINE) (* GlobDefs.NEWLINE *)
  END linecount;

BEGIN linecount
END linecount.

MODULE wordcount; (*DEE 2015-11-25*)
  FROM GlobDefs IMPORT BLANK, ENDFILE, NEWLINE, TAB, character;
  FROM Prims IMPORT getc, putc;
  FROM Utility IMPORT putdec;

  (* wordcount: count words in standard input *)
  PROCEDURE wordcount;
    VAR nw: CARDINAL; c: character; inword: BOOLEAN;
  BEGIN nw := 0; inword := FALSE;
    WHILE (getc(c) # ENDFILE) DO (* GlobDefs.ENDFILE *)
        (* GlobDefs.BLANK, GlobDefs.NEWLINE, GlobDefs.TAB *)
      IF (c = BLANK) OR (c = NEWLINE) OR (c = TAB) THEN
        inword := FALSE
      ELSIF ~inword THEN
        inword := TRUE; INC(nw)
      END
    END;
    putdec(nw, 1);
    putc(NEWLINE) (* GlobDefs.NEWLINE *)
  END wordcount;

BEGIN wordcount
END wordcount.
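The counting programs print their totals with putdec from Utility, which is not listed here. The following standalone sketch shows one way putdec might work, following the book's approach of converting the number to digits and then padding to the field width. InOut.Write stands in for Prims.putc, and the 20-digit buffer is an assumption meant to cover both 32-bit and 64-bit INTEGER.

```modula2
MODULE decdemo;
  (* Standalone sketch of putdec: write n right-justified in at least
     w columns. InOut.Write stands in for Prims.putc. *)
  FROM InOut IMPORT Write, WriteLn;

  PROCEDURE putdec(n, w: INTEGER);
    VAR digits: ARRAY [1..20] OF CHAR;  (* least significant digit first *)
      m, nd: CARDINAL; total: INTEGER;
  BEGIN
    m := VAL(CARDINAL, ABS(n));  (* ABS(MIN(INTEGER)) overflows; not handled *)
    nd := 0;
    REPEAT  (* at least one digit, so 0 prints as "0" *)
      INC(nd);
      digits[nd] := CHR(ORD('0') + m MOD 10);
      m := m DIV 10
    UNTIL m = 0;
    total := VAL(INTEGER, nd);
    IF n < 0 THEN INC(total) END;  (* room for the sign *)
    WHILE w > total DO Write(' '); DEC(w) END;
    IF n < 0 THEN Write('-') END;
    WHILE nd > 0 DO Write(digits[nd]); DEC(nd) END
  END putdec;

BEGIN
  putdec(42, 8); WriteLn;  (* six blanks, then 42 *)
  putdec(-7, 1); WriteLn;  (* -7 *)
  putdec(0, 3); WriteLn    (* two blanks, then 0 *)
END decdemo.
```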

Exercise 1-4 asks for the three counting programs to be combined, and whether a single program or several is better. I decided that one program was better, as long as a tool such as the Unix cut command or Awk was available to separate the output. The following was the result:

MODULE wordcount; (*DEE 2015-11-25/2016-08-23*)
  (*See Exercise 1-4 in Software Tools.*)
  FROM GlobDefs IMPORT BLANK, ENDFILE, NEWLINE, TAB, character;
  FROM Prims IMPORT getc, putc;
  FROM Utility IMPORT putdec;

  PROCEDURE wordcount;
    VAR nc, nl, nw: INTEGER; c: character; inword: BOOLEAN;
  BEGIN nl := 0; nw := 0; nc := 0; inword := FALSE;
    WHILE (getc(c) # ENDFILE) DO (* GlobDefs.ENDFILE *)
        (* GlobDefs.BLANK, GlobDefs.NEWLINE, GlobDefs.TAB *)
      IF (c = BLANK) OR (c = NEWLINE) OR (c = TAB) THEN
        inword := FALSE;
        IF (c = NEWLINE) THEN INC(nl) END (* GlobDefs.NEWLINE *)
      ELSIF ~inword THEN inword := TRUE; INC(nw)
      END;
      INC(nc)
    END;
    putdec(nl,8); putdec(nw,8); putdec(nc,8);
    putc(NEWLINE) (* GlobDefs.NEWLINE *)
  END wordcount;

BEGIN wordcount
END wordcount.

detab, entab, and overstrike

The detab and entab commands are the first of the preprocessor tools that the book provides, along with include, define, and macro. Both Pascal and Modula-2 use the blank character as a symbol separator. Though most Pascal and Modula-2 compilers accept tabs as separators, some languages, strictly speaking, do not allow them (e.g. Fortran). The EBNF syntax in PIM makes it clear that blank refers to the blank character, not to formatting control characters. This is also demonstrated in the PL/0 parser and the Oberon-0 GetSym procedure. Though tabs are not prohibited, a Pascal, Modula-2, or Oberon compiler could decide to accept only the blank character.

detab is also the first tool to use the include preprocessor. It includes the tab routines nested within the detab procedure, so they are not declared globally. Since Modula-2's modules provide equivalent encapsulation, it seemed reasonable to build a module for this purpose, with the foresight that entab will also use it:

DEFINITION MODULE Tabs; (* DEE 2014-01-22 *)
  CONST MAXLINE = 1000; (* or whatever *)
  TYPE tabtype = ARRAY [1..MAXLINE] OF BOOLEAN;
  VAR tabstops: tabtype;

  PROCEDURE settabs (VAR tabstops: tabtype);
  PROCEDURE tabpos (col: INTEGER; VAR tabstops: tabtype): BOOLEAN;
END Tabs.

IMPLEMENTATION MODULE Tabs; (* DEE 2014-01-22 *)
  CONST TABSPACE = 4;

  (*Software Tools in Pascal, exercise 1-5*)
  PROCEDURE settabs (VAR tabstops: tabtype);
    VAR i: INTEGER;
  BEGIN
    FOR i := 1 TO MAXLINE DO
      tabstops[i] := (i MOD TABSPACE = 1)
    END
  END settabs;

  (*tabpos: return true if col is a tab stop.*)
  PROCEDURE tabpos (col: INTEGER; VAR tabstops: tabtype): BOOLEAN;
  BEGIN
    IF (col > MAXLINE) THEN RETURN TRUE
    ELSE RETURN tabstops[col]
    END
  END tabpos;
END Tabs.

Part of exercise 1-5 was performing boundary tests. When modifying detab for other exercises, it quickly became clear that it was important to preserve the existing behavior while making adjustments, even if the tests weren't perfect. I made test files with backspaces and other control characters, and compared the results with the Unix cmp and diff commands, or on Windows with the similar comp command in cmd.exe. Here's detab in Modula-2:

MODULE detab; (* DEE 2014-01-20 *)
(* detab from Software Tools in Pascal *)
  FROM GlobDefs IMPORT BLANK, ENDFILE, NEWLINE, TAB, character;
  FROM Prims IMPORT getc, putc;
  FROM Tabs IMPORT settabs, tabpos, tabstops;

  (* Convert tabs to equivalent number of blanks. *)
  PROCEDURE detab;
    VAR c: character; col: INTEGER;
  BEGIN settabs(tabstops); (* Set initial tab stops. *)
    col := 1;
    WHILE (getc(c) # ENDFILE) DO (* GlobDefs.ENDFILE *)
      IF (c = TAB) THEN (* GlobDefs.TAB *)
        REPEAT putc(BLANK); INC(col) (* GlobDefs.BLANK *)
        UNTIL (tabpos(col, tabstops))
      (* GlobDefs.NEWLINE *)
      ELSIF (c = NEWLINE) THEN putc(NEWLINE); col := 1
      ELSE putc(c); INC(col)
      END
    END
  END detab;

BEGIN detab
END detab.

Exercise 1-6 asks for the reasonable addition of handling the ASCII backspace character. This requires a backspace test after the test for NEWLINE (before the final ELSE), with BACKSPACE added to the GlobDefs import:

ELSIF (c = BACKSPACE) THEN (* GlobDefs.BACKSPACE *)
  putc(c); IF (col > 1) THEN DEC(col) END (* never back up past column 1 *)
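In context, the exercise 1-6 change can be seen in a standalone sketch of detab. To keep it self-contained, it uses PIM InOut in place of Prims (with InOut.EOL marking the end of a line) and a fixed TABSPACE in place of the Tabs module, so it approximates the Kernighan-primitive version rather than replacing it.

```modula2
MODULE detabbs;
  (* Standalone sketch of detab with the exercise 1-6 backspace test.
     PIM InOut stands in for Prims; a fixed TABSPACE replaces Tabs. *)
  FROM InOut IMPORT Read, Write, WriteLn, Done, EOL;
  CONST TABSPACE = 4; TAB = 11C; BACKSPACE = 10C; BLANK = ' ';
  VAR c: CHAR; col: CARDINAL;

  (* tabpos: TRUE if column n is a tab stop (1, 5, 9, ... as in Tabs) *)
  PROCEDURE tabpos(n: CARDINAL): BOOLEAN;
  BEGIN RETURN n MOD TABSPACE = 1
  END tabpos;

BEGIN col := 1;
  Read(c);
  WHILE Done DO
    IF c = TAB THEN
      REPEAT Write(BLANK); INC(col) UNTIL tabpos(col)
    ELSIF c = EOL THEN WriteLn; col := 1
    ELSIF c = BACKSPACE THEN
      Write(c);
      IF col > 1 THEN DEC(col) END  (* never back up past column 1 *)
    ELSE Write(c); INC(col)
    END;
    Read(c)
  END
END detabbs.
```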

Exercise 1-7a (misprinted as 1-7e in my January 2001 second printing) requires a code change to detab. The setTabPos function, added to the Tabs implementation module, combines the settabs and tabpos functions:

  (*Software Tools in Pascal, exercise 1-7a*)
  PROCEDURE setTabPos (col: INTEGER): BOOLEAN;
  BEGIN
    IF (col > MAXLINE) THEN RETURN TRUE
    ELSIF (col MOD TABSPACE = 1) THEN RETURN TRUE
    ELSE RETURN FALSE
    END
  END setTabPos;
END Tabs.
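Since detab imports setTabPos from Tabs, the procedure must also be declared in the definition module. A sketch of the updated definition, assuming the rest of Tabs is unchanged:

```modula2
DEFINITION MODULE Tabs; (* sketch: exercise 1-7a interface *)
  CONST MAXLINE = 1000; (* or whatever *)
  TYPE tabtype = ARRAY [1..MAXLINE] OF BOOLEAN;
  VAR tabstops: tabtype;

  PROCEDURE settabs (VAR tabstops: tabtype);
  PROCEDURE tabpos (col: INTEGER; VAR tabstops: tabtype): BOOLEAN;
  (* setTabPos: TRUE if col is a tab stop, computed instead of looked up *)
  PROCEDURE setTabPos (col: INTEGER): BOOLEAN;
END Tabs.
```

detab's import then becomes FROM Tabs IMPORT setTabPos. The body of setTabPos could also be reduced to a single RETURN (col > MAXLINE) OR (col MOD TABSPACE = 1).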

Here's the change to detab:

BEGIN col := 1;
  WHILE (getc(c) # ENDFILE) DO (* GlobDefs.ENDFILE *)
    IF (c = TAB) THEN (* GlobDefs.TAB *)
      REPEAT putc(BLANK); INC(col) (* GlobDefs.BLANK *)
      UNTIL (setTabPos(col))

This removes the call that sets the initial tab stops, and changes the REPEAT loop's UNTIL test to use setTabPos. The program is slightly simpler, and the resulting binary slightly smaller. My guess is that the purpose of the four changes in exercise 1-7 becomes more obvious in exercises 2-16 and 2-17, after the addition of arguments. Once that is determined, the desired approach can be added to the Tabs module in chapter 2. Here's entab:

MODULE entab; (*DEE 2014-01-20*)
  FROM GlobDefs IMPORT BLANK, ENDFILE, NEWLINE, TAB, character;
  FROM Prims IMPORT getc, putc;
  FROM Tabs IMPORT settabs, tabstops, tabpos;
  
  (* Software Tools in Pascal's entab. *)
  PROCEDURE entab; (* Replace blanks by tabs and blanks. *)
    VAR c: character; col, newcol: INTEGER;
  BEGIN settabs(tabstops); (* Set initial tab stops. *)
    col := 1;
    REPEAT newcol := col;
      WHILE (getc(c) = BLANK) DO (* GlobDefs.BLANK *)
        INC(newcol);
        IF (tabpos(newcol, tabstops)) THEN
          putc(TAB); (* GlobDefs.TAB *)
          col := newcol
        END
      END;
      WHILE (col < newcol) DO
        putc(BLANK); (* Output leftover blanks. GlobDefs.BLANK *)
        INC(col)
      END;
      IF (c # ENDFILE) THEN (* GlobDefs.ENDFILE *)
        putc(c);
        IF (c = NEWLINE) THEN (* GlobDefs.NEWLINE *)
          col := 1
        ELSE INC(col)
        END
      END
    UNTIL (c = ENDFILE) (* GlobDefs.ENDFILE *)
  END entab;

BEGIN entab
END entab.

As with detab, entab doesn't handle backspaces. This is fixed with a conditional before the ELSE of the IF (c = NEWLINE) statement at the end of the entab procedure (BACKSPACE must also be imported from GlobDefs):

ELSIF (c = BACKSPACE) THEN (* GlobDefs.BACKSPACE *)
  IF (col >= 2) THEN DEC(col) END

Exercise 2-2 also asks about tab characters. This is handled with a conditional in the middle of the REPEAT loop:

IF (c = TAB) THEN col := newcol END; (* GlobDefs.TAB *)

Ultimately, I decided not to include overstrike in the Modula-2 migration. Though ANSI carriage control is demonstrated in Wirth's texts on Pascal, and in his Pascal program examples, it is no longer relevant with modern computers, and is not mentioned in Wirth's Modula-2 texts. I have rewritten overstrike without Kernighan's primitives, taking some of the book's exercises into account. It can be downloaded from my website: /dee/overstrike/.

compress and expand

Since the putrep procedure is shared by both commands, a separate library module is needed. I named it CE, after the commands it supports:

DEFINITION MODULE CE; (*DEE 2014-08-06*)
(* Shared module of compress/expand from Software Tools. *)
  FROM GlobDefs IMPORT TILDE, character;
  CONST WARNING = TILDE;

  (* putrep: put out representation of run of n 'c's *)
  PROCEDURE putrep(n: INTEGER; c: character);
END CE.

IMPLEMENTATION MODULE CE; (*DEE 2014-08-06*)
  FROM GlobDefs IMPORT TILDE, character;
  FROM Prims IMPORT putc;
  FROM Utility IMPORT min;

  (* Software Tools in Pascal putrep. *)
  PROCEDURE putrep(n: INTEGER; c: character);
    VAR m, ord: INTEGER;
    CONST MAXREP = 26; (* Assuming ISO 646 'A'..'Z' *)
      THRESH = 4; WARNING = TILDE;
  BEGIN ord := ORD('A');
    WHILE (n >= THRESH) OR ((c = WARNING) & (n > 0)) DO
      putc(WARNING);
      putc(min(n, MAXREP) - 1 + ord); (* run length 1..26 as 'A'..'Z' *)
      putc(c); n := n - MAXREP
    END;
    FOR m := n TO 1 BY -1 DO putc(c) END
  END putrep;

END CE.

Here are ports of the compress and expand commands:

MODULE compress; (*DEE 2014-08-06*)
  FROM GlobDefs IMPORT ENDFILE, character;
  FROM Prims IMPORT getc, putc;
  FROM CE IMPORT WARNING, putrep;

  (* compress standard input. See expand. *)
  PROCEDURE compress;
    VAR c, lastc: character; n: INTEGER;
  BEGIN n := 1; lastc := getc(lastc);
    WHILE (lastc # ENDFILE) DO
      IF (getc(c) = ENDFILE) THEN
        IF (n > 1) OR (lastc = WARNING) THEN
          putrep(n, lastc)
        ELSE putc(lastc)
        END
      ELSIF (c = lastc) THEN INC(n)
      ELSIF (n > 1) OR (lastc = WARNING) THEN
        putrep(n, lastc); n := 1
      ELSE putc(lastc)
      END;
      lastc := c
    END
  END compress;

BEGIN compress
END compress.

MODULE expand; (*DEE 2014-08-06*)
  FROM GlobDefs IMPORT ENDFILE, character;
  FROM Prims IMPORT getc, putc;
  FROM Utility IMPORT isupper;
  FROM CE IMPORT WARNING;

  (* Uncompress standard input. See compress. *)
  PROCEDURE expand;
    VAR c: character; n, ord: INTEGER;
  BEGIN
    WHILE (getc(c) # ENDFILE) DO
      ord := ORD('A');
      IF (c # WARNING) THEN putc(c)
      ELSIF (isupper(getc(c))) THEN
        n := c - ord; INC(n);
        IF (getc(c) # ENDFILE) THEN
          FOR n := n TO 1 BY -1 DO putc(c) END
        ELSE putc(WARNING);
          putc(n - 1 + ord)
        END
      ELSE putc(WARNING);
        IF (c # ENDFILE) THEN putc(c) END
      END
    END
  END expand;

BEGIN expand
END expand.
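To make the encoding concrete: a run of n copies of c is written as WARNING, then the letter with ordinal ORD('A') + n - 1, then c, so ten x characters become ~Jx. The standalone sketch below mirrors putrep with the same MAXREP and THRESH values, using CHAR and InOut in place of GlobDefs.character and Prims, and prints a few runs.

```modula2
MODULE repdemo;
  (* Standalone sketch of putrep's encoding; CHAR and InOut stand in
     for GlobDefs.character and Prims.putc. *)
  FROM InOut IMPORT Write, WriteLn;
  CONST WARNING = '~'; MAXREP = 26; THRESH = 4;

  (* putrep: write a run of n copies of c, encoded when worthwhile *)
  PROCEDURE putrep(n: INTEGER; c: CHAR);
    VAR m: INTEGER;
  BEGIN
    WHILE (n >= THRESH) OR ((c = WARNING) & (n > 0)) DO
      IF n < MAXREP THEN m := n ELSE m := MAXREP END;  (* min(n, MAXREP) *)
      Write(WARNING);
      Write(CHR(ORD('A') + VAL(CARDINAL, m - 1)));  (* length 1..26 as A..Z *)
      Write(c);
      n := n - MAXREP
    END;
    WHILE n > 0 DO Write(c); DEC(n) END  (* short runs are copied as is *)
  END putrep;

BEGIN
  putrep(10, 'x'); WriteLn;  (* ~Jx *)
  putrep(30, 'y'); WriteLn;  (* ~Zy~Dy *)
  putrep(3, 'z'); WriteLn;   (* zzz *)
  putrep(1, '~'); WriteLn    (* ~A~ *)
END repdemo.
```

A run longer than MAXREP is split into several encoded pieces, and a lone WARNING character is always encoded so that expand never mistakes it for the start of a run.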

One approach to exercise 2-9 is to draw the run-length encoding from the whole range of printable characters. This requires changes to the CE library module, and to expand to accommodate them. The important change here is hiding the values behind the definition of CE, which is a significant improvement:

DEFINITION MODULE CE; (*DEE 2014-08-06/2015-12-02/2018-09-15*)
(* Shared module of compress/expand from Software Tools. *)
  FROM GlobDefs IMPORT TILDE, character;
  CONST WARNING = TILDE;
  VAR FIRST, LAST: INTEGER;

  PROCEDURE isrep(c: character): BOOLEAN;
  PROCEDURE putrep(n: INTEGER; c: character);
END CE.

IMPLEMENTATION MODULE CE; (*DEE 2014-08-06/2015-12-02/2018-09-15*)
  FROM GlobDefs IMPORT TILDE, character;
  FROM Prims IMPORT putc;
  FROM Utility IMPORT min;
  VAR ord: INTEGER;
  
  PROCEDURE isrep(c: character): BOOLEAN;
  BEGIN RETURN (c >= FIRST) AND (c <= LAST)
  END isrep;
  
  (* Software Tools in Pascal putrep. *)
  PROCEDURE putrep(n: INTEGER; c: character);
    VAR m: INTEGER;
    CONST MAXREP = 26; (* Assuming ISO 646 'A'..'Z' *)
      THRESH = 4; WARNING = TILDE;
  BEGIN
    WHILE (n >= THRESH) OR ((c = WARNING) & (n > 0)) DO
      putc(WARNING);
      putc(min(n, MAXREP) + ord); (* ord = FIRST - 1: length 1 maps to FIRST *)
      putc(c); n := n - MAXREP
    END;
    FOR m := n TO 1 BY -1 DO putc(c) END
  END putrep;

BEGIN FIRST := ORD('A'); LAST := ORD('Z');
  ord := FIRST - 1
END CE.

MODULE expand; (*DEE 2014-08-06/2015-12-02/2018-09-15*)
  FROM GlobDefs IMPORT ENDFILE, character;
  FROM Prims IMPORT getc, putc;
  FROM CE IMPORT FIRST, WARNING, isrep;

  (* Uncompress standard input. See compress. *)
  PROCEDURE expand;
    VAR c: character; n: INTEGER;
  BEGIN
    WHILE (getc(c) # ENDFILE) DO
      IF (c # WARNING) THEN putc(c)
      ELSIF (isrep(getc(c))) THEN
        n := c - FIRST; INC(n);
        IF (getc(c) # ENDFILE) THEN
          FOR n := n TO 1 BY -1 DO putc(c) END
        ELSE putc(WARNING);
          putc(n - 1 + FIRST)
        END
      ELSE putc(WARNING);
        IF (c # ENDFILE) THEN putc(c) END
      END
    END
  END expand;

BEGIN expand
END expand.

Now, to make the main change for exercise 2-9, initialize the FIRST and LAST variables in the body of CE to GlobDefs.EXCLAM and GlobDefs.RBRACE, and change the value of MAXREP to 93, the number of characters from '!' to '}' (125 - 33 + 1 = 93).

echo, crypt, and wordcount

echo is both universal and not. The POSIX standard defines the same behavior as Software Tools in Pascal, yet both Unix (e.g. -e, -n) and Windows (e.g. .\echo on) have exceptions. Here is echo using Kernighan's primitives. A version with the ISO/IEC 10514-1:1996 libraries can be downloaded from /dee/echo/. Note that in PIM4 Modula-2 (as opposed to previous versions), a string shorter than its array must be terminated by a null character.

MODULE echo; (*David Egan Evans 2014-10-31/2015-12-05*)
  FROM GlobDefs IMPORT BLANK, MAXSTR, NEWLINE, string;
  FROM Prims IMPORT getarg, putc;
  FROM Utility IMPORT length;

  PROCEDURE echo;
    VAR i, j: INTEGER; argstr: string;
  BEGIN i := 1;
    WHILE (getarg(i, argstr, MAXSTR)) DO
      IF (i > 1) THEN putc(BLANK) END;
      FOR j := 1 TO length(argstr) DO putc(argstr[j]) END;
      INC(i)
    END;
    IF (i > 1) THEN putc(NEWLINE) END
  END echo;

BEGIN echo
END echo.
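Because of PIM4's null termination, a length procedure for ARRAY OF CHAR must stop at the first 0C or at the end of the array, whichever comes first: a string that exactly fills its array has no terminator. A standalone sketch follows; the names lendemo and strlen are hypothetical. Kernighan's string type, which the book terminates with ENDSTR, is a separate matter handled by Utility.length.

```modula2
MODULE lendemo;
  FROM InOut IMPORT WriteCard, WriteLn;
  VAR s: ARRAY [0..15] OF CHAR; t: ARRAY [0..4] OF CHAR;

  (* strlen: characters before the first 0C, or the whole array if full *)
  PROCEDURE strlen(s: ARRAY OF CHAR): CARDINAL;
    VAR i: CARDINAL;
  BEGIN i := 0;
    WHILE (i <= HIGH(s)) & (s[i] # 0C) DO INC(i) END;  (* & short-circuits *)
    RETURN i
  END strlen;

BEGIN
  s := "hello";  (* shorter than the array, so terminated with 0C *)
  t := "hello";  (* exactly fills the array, so no terminator *)
  WriteCard(strlen(s), 1); WriteLn;  (* 5 *)
  WriteCard(strlen(t), 1); WriteLn   (* 5 *)
END lendemo.
```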

In Software Tools in Pascal, the echo example replaces crypt as shown in Software Tools, because the xor function that Kernighan uses is not available in standard Pascal. There are two ways to do this in Modula-2. The more efficient, though less portable, approach would be to use Modula-2's BITSET type and the VAL function (to convert between CHAR and INTEGER). If your compiler already has a built-in XOR function, that is likely even better. In the end, I settled on the following in standard (PIM4) Modula-2, which computes the exclusive-or arithmetically, one bit at a time:

MODULE crypt; (*DEE 2014-03-04/2015-12-05*)
  FROM GlobDefs IMPORT ENDFILE, MAXSTR, character, string;
  FROM Prims IMPORT error, getarg, getc, putc;
  FROM Utility IMPORT length;

  (* Thanks to Peter De Wachter. See Software Tools exercise 2-18. *)
  (* c := ((NOT b) AND a) OR ((NOT a) AND b); *)
  PROCEDURE xor(a, b: character): character;
    VAR c, k: CARDINAL;
  BEGIN c := 0; k := 1;
    WHILE (a # 0) OR (b # 0) DO
      IF (a MOD 2) # (b MOD 2) THEN
        c := c + k
      END;
      a := a DIV 2; b := b DIV 2; k := k * 2
    END;
    RETURN c
  END xor;

  (* Encrypt/decrypt using bitwise exclusive-or cipher.
    See Software Tools. *)
  PROCEDURE crypt;
    VAR c: character; key: string; i, keylen: CARDINAL;
  BEGIN
    (* Limit the key to MAXSTR so getarg stays within key; reject an
       empty key, which would make i MOD keylen divide by zero. *)
    IF getarg(1, key, MAXSTR) & (length(key) > 0) THEN
      keylen := length(key); i := 1;
      WHILE (getc(c) # ENDFILE) DO
        putc(xor(c, key[i]));
        i := (i MOD keylen) + 1
      END
    ELSE error('usage: crypt key')
    END
  END crypt;

BEGIN crypt
END crypt.
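For comparison, the BITSET approach mentioned above might look like the following sketch. It assumes a PIM compiler on which CARDINAL and BITSET occupy the same storage, so the type-transfer functions BITSET() and CARDINAL() are legal; ISO compilers would use SYSTEM.CAST instead. The set operator / is the symmetric difference, which is exactly bitwise exclusive-or.

```modula2
MODULE bitxor;
  (* Sketch: exclusive-or via BITSET symmetric difference. Assumes
     CARDINAL and BITSET have the same size on this compiler. *)
  FROM InOut IMPORT WriteCard, WriteLn;

  PROCEDURE xor(a, b: CARDINAL): CARDINAL;
  BEGIN
    RETURN CARDINAL(BITSET(a) / BITSET(b))  (* "/" is symmetric difference *)
  END xor;

BEGIN
  WriteCard(xor(12, 10), 1); WriteLn  (* 1100 xor 1010 = 0110, prints 6 *)
END bitxor.
```

For crypt, the characters would still need converting with ORD and CHR (or VAL) around the call, which is where the portability cost comes in.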

Following the general example of the translit command, and the example of research Unix (and Plan 9), adding command options and a file argument seemed useful (though concat, from a later chapter, can certainly read any files needed). At the very least, separate options for wordcount allow the charcount and linecount commands to be deprecated:

MODULE wordcount; (*DEE 2015-11-25/2016-08-23/2018-06-02*)
  (*See Exercise 1-4 in Software Tools.*)
  FROM GlobDefs IMPORT BLANK, ENDFILE, NEWLINE, MAXSTR, MINUS, TAB,
    character, string;
  FROM Prims IMPORT error, getarg, getc, putc; 
  FROM Utility IMPORT putdec;
  VAR cmd: string; (* command type *)
    nc, nl, nw: INTEGER;

  (* TODO: these should be able to be combined. *)
  PROCEDURE help;
  BEGIN error('usage: wordcount [ -l | -w | -c ]')
  END help;

  PROCEDURE wordcount;
    VAR c: character; inword: BOOLEAN;
  BEGIN nl := 0; nw := 0; nc := 0; inword := FALSE;
    WHILE (getc(c) # ENDFILE) DO (* GlobDefs.ENDFILE *)
      (* GlobDefs.BLANK, GlobDefs.NEWLINE, GlobDefs.TAB *)
      IF (c = BLANK) OR (c = NEWLINE) OR (c = TAB) THEN
        inword := FALSE;
        IF (c = NEWLINE) THEN INC(nl) END (* GlobDefs.NEWLINE *)
      ELSIF ~inword THEN inword := TRUE; INC(nw)
      END;
      INC(nc)
    END
  END wordcount;

BEGIN
  IF getarg(1, cmd, MAXSTR) THEN (* GlobDefs.MAXSTR *)
    IF (cmd[1] # MINUS) THEN help (* GlobDefs.MINUS *)
    (* PIM4 is superior: PIM3/ISO VAL nonsense. *)
    ELSIF (cmd[2] = VAL(INTEGER, ORD('l'))) THEN wordcount;
      putdec(nl,1)
    ELSIF (cmd[2] = VAL(INTEGER, ORD('w'))) THEN wordcount;
      putdec(nw, 1)
    ELSIF (cmd[2] = VAL(INTEGER, ORD('c'))) THEN wordcount;
      putdec(nc,1)
    ELSE help
    END
  ELSE wordcount; (* If options can be combined, default to 'nw, 1'. *)
    putdec(nl, 8); putdec(nw, 8); putdec(nc, 8)
  END;
  putc(NEWLINE) (* GlobDefs.NEWLINE *)
END wordcount.
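The TODO in wordcount asks for options that can be combined. A hypothetical sketch of that parsing follows, written against ARRAY OF CHAR and InOut rather than GlobDefs.string and getarg. It sets one flag per option letter, so -lw selects both lines and words; adapting it to wordcount would mean comparing against character codes, as the existing ELSIF chain does.

```modula2
MODULE wcopts;
  (* Hypothetical sketch: parse a combined option string such as "-lw"
     into one flag per letter. Not part of the wordcount above. *)
  FROM InOut IMPORT WriteString, WriteLn;

  (* parse: TRUE if opt is "-" followed only by the letters l, w, c *)
  PROCEDURE parse(opt: ARRAY OF CHAR; VAR lines, words, chars: BOOLEAN): BOOLEAN;
    VAR i: CARDINAL;
  BEGIN
    lines := FALSE; words := FALSE; chars := FALSE;
    IF (HIGH(opt) < 1) OR (opt[0] # '-') THEN RETURN FALSE END;
    i := 1;
    WHILE (i <= HIGH(opt)) & (opt[i] # 0C) DO
      CASE opt[i] OF
        'l': lines := TRUE
      | 'w': words := TRUE
      | 'c': chars := TRUE
      ELSE RETURN FALSE  (* unknown option letter *)
      END;
      INC(i)
    END;
    RETURN lines OR words OR chars  (* a bare "-" selects nothing *)
  END parse;

  (* show: print which counts an option string selects *)
  PROCEDURE show(opt: ARRAY OF CHAR);
    VAR l, w, c: BOOLEAN;
  BEGIN
    WriteString(opt); WriteString(": ");
    IF ~parse(opt, l, w, c) THEN WriteString("usage error")
    ELSE
      IF l THEN WriteString("lines ") END;
      IF w THEN WriteString("words ") END;
      IF c THEN WriteString("chars") END
    END;
    WriteLn
  END show;

BEGIN
  show("-lw");  (* -lw: lines words *)
  show("-c");   (* -c: chars *)
  show("-x")    (* -x: usage error *)
END wcopts.
```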

Unix, macOS, Linux, and Plan 9 have a wc command superior to the above, but on Windows such a tool is still useful. Though it might be better to port a C version, which is undoubtedly more feature complete (especially the UTF-8 Plan 9 version), I wanted to see what this would look like using the ISO/IEC 10514-1:1996[2008] libraries instead of Kernighan's primitives. That version can be found at http://oberon07.com/dee/wordcount/. I can't decide which version I like better. Both versions should also be compliant with PIM3 and PIM4, and were tested with GNU Modula-2. The version above also compiled and worked correctly with Mocka 1208m.

References

[Ker81]
B. W. Kernighan, Why Pascal is Not My Favorite Programming Language, AT&T Bell Laboratories, Computing Science Technical Report No. 100, 2 April 1981
[KP76]
B. W. Kernighan, P. J. Plauger, Software Tools, Addison-Wesley, 1976
[KP81]
B. W. Kernighan, P. J. Plauger, Software Tools in Pascal, Addison-Wesley, 1981
[KU87]
Michel Kiener, Alfred Ultsch, HOST: An Abstract Machine for Modula-2 Programs, ETH-3161-01, February 1987

©2017-2018 David Egan Evans.