Implementing Software Tools in Modula-2

[Note]: This document is a work in progress as I port the tools from Pascal to Modula-2.

In a previous document, I relate my experience of a first pass through Software Tools in Pascal [KP81], and to some degree Software Tools [KP76], initially using Free Pascal. This document follows my experience with the exercises of Software Tools in Pascal, but in Modula-2.

Modula-2 was chosen because it is the successor to Pascal and fixes most of Pascal's problems (as described by Kernighan in Why Pascal is Not My Favorite Programming Language [Ker81]). However, some of those problems remain, for other reasons. Just as many Pascal compilers implemented the Pascal-P or Pascal-S subsets and added extensions incompatible with the complete language, Modula-2 compilers differ subtly between the final language definition (PIM4) and the ISO/IEC 10514-1:1996 standard. Many stopped at PIM3 before moving to the ISO standard, so incompatibilities remain. MOD and DIV of the ISO/IEC standard are compatible with PIM4, but may or may not be with PIM3. Following Kernighan's Pascal approach of only using the parts of the language that are universal, i.e. consistent with the 1978 report and the ANSI 1983 standard (now INCITS/ISO/IEC 7185:1990[S2008]), a subset of the Modula-2 language must be used. This again limits the use of CASE (since ELSE, so much desired to fix the Pascal CASE statement, is required in the ISO/IEC language, and thus causes a new incompatibility). Finally, I/O handling is almost worse than in Pascal: everything is now in libraries, and no two compilers have the same implementation. The ISO/IEC standard made this worse with a baroque, complicated library implemented in an incompatible dialect of Modula-2. As with Pascal, trying to follow the original implementation is not practical with modern/common operating environments, and following the language requires careful use of the final version (PIM4) and compilers that mainly use the ISO/IEC standard.

For now, I have compiled and packaged the Amsterdam Compiler Kit (ACK) onto RHEL 7, along with the Mocka compiler. Unfortunately, these are 32-bit compilers (without support for 64-bit as Free Pascal has), and ACK does not have a separate compilation facility. GNU Modula-2 (gm2) provides a PIM4 compiler which does support 64-bit, perhaps one of the few, but it is not yet part of the mainstream GCC. For macOS, the only known compiler is p1. For the original Macintosh, the MacMETH compiler is in a way the reference compiler from Wirth. For Windows, the ADW (previously Stony Brook) compiler is excellent, or perhaps the now unsupported Logitech compiler.

As a note, an ETH project existed to build the tools primitives on MEDOS as a kind of portable library for Modula-2 called HOST [KU87]. Like the UCSD example in Software Tools in Pascal, it required building a custom interpreter environment, and it took an interesting direction in the primitives it supplied. This is likely due to the simplistic MEDOS command-line interface and matching InOut module.

The ISO/IEC 10514-1:1996 standard (a Modula-2 derived language) has multiple, sometimes subtle, differences from PIM3/4. In some ways, it is a seeming hybrid of both. Unlike INCITS/ISO/IEC 7185:1990[S2008] Pascal, ISO/IEC 10514 is not compatible with Wirth's definitions. Due to this, and other extensions and design decisions, the U.S. ANSI delegation rejected ISO/IEC 10514. Programming in Modula-2, Fourth Edition (PIM4), published in English in 1988, and in German in 1992, is the last Modula-2 report. Like the 1978 Pascal Manual and Report, it is the standard for Modula-2, but unlike Pascal, a Modula-2 program (any PIM version) will not always work correctly with an ISO/IEC 10514-1:1996 compiler. Only one PIM4 compiler exists that is current: GNU Modula-2 (gm2). I began a couple of years ago working against m2c, but found so many bugs in it that I spent more time patching it than doing anything productive. m2c is also an incomplete implementation of Modula-2. Using it requires working with further restrictions.

Getting Started

Exercise 1-2 of Software Tools in Pascal is easy with the classic Kernighan hello, world in Modula-2. That link leads to a directory containing a file hello.m2, with various versions that work with different compiler libraries. The following commands are for Mocka (1208m):

$ m2 -c hello; m2 -p hello

Mocka is a curious creature depending on the version used. I have provided Red Hat RPMs of Mocka 9905 and 0608m. (There's also a 0605m.) I used the successor to 0608m: 1208m. This requires that the m2.tgz package be extracted as the root (ID 0) user, and that an /etc/profile.d/ file be set up with the correct variables. Instead of the .mi and .md required extensions, .mod and .def are used. The first time you run the m2 command on a file, it will create a $HOME/m2/ directory with bin/, src/, and out/ directories. Your binaries will be placed in bin/. Author your source files directly in src/. Object code and symbol files will go in out/.

For ACK, I put together an RPM, installed it to Red Hat Enterprise Linux 7, and created a $HOME/src/ack/ directory. In that directory, I copied my hello.m2 file as hello.mod and ran the following command:

$ ack -Rm2-3 -o bin/hello hello.mod

I found that the current 6.1pre1 release did not build for me, so I had to download the source from the git repository. The author graciously provided a pre-packaged tape archive for me to use. A Pascal compiler is also provided, and is mentioned in the Appendix to Software Tools in Pascal.

Finally, the ISO/IEC 10514-1:1996 hello source was compiled on Windows 10 Pro using the ADW (Stony Brook) IDE. I created a project called Tools, created a new program module, and using the text, compiled it and linked it. I pointed the project directory to C:\Users\me\M2-Tools\. In that directory a hello.exe file was created.

Syntax changes in copyprog

The first obvious thing when moving Pascal programs to Modula-2 is case sensitivity. Following the Mesa language design, upper-case words, such as BEGIN, should only be used for reserved objects of the compiler. This avoids portability problems between compilers. It also means that begin and Begin are different from BEGIN. The second is that all structured statements close with END. For instance, if an IF statement controls a single statement, END needs to be added (but no BEGIN). If an IF statement controls multiple statements, and thus in Pascal has begin and end, the begin needs to be removed. In Software Tools (both versions), upper-case symbolic constants are used to stand out. I decided to stay with this convention, since most of them are provided by globdefs.p (which I chose to call Globals.def). By referring to them with qualified names (e.g. Globals.ENDFILE), collisions with reserved objects can be avoided.

Comments are now only (* and *): the curly brace comments have to be converted.

FUNCTION and PROCEDURE have been combined, i.e. there is no FUNCTION; instead, a PROCEDURE can declare a result type and return its value using the RETURN statement (instead of assigning the return value to its own name).
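
The changes so far (upper-case reserved words, END closing every structured statement, (* *) comments, and RETURN in a function procedure) can be seen together in a small sketch. The module name MaxDemo is made up for illustration, max is written here from scratch rather than taken from the book's listing, and a PIM-style InOut module is assumed; an ISO-only compiler would need a different output module.

```modula2
MODULE MaxDemo;
  (* Assumption: a PIM-style InOut module is available. *)
  FROM InOut IMPORT WriteInt, WriteLn;

  (* In Pascal this would be "function max(x, y: integer): integer",
     with the result assigned to the name max. *)
  PROCEDURE max(x, y: INTEGER): INTEGER;
  BEGIN
    IF x > y THEN      (* no BEGIN after THEN *)
      RETURN x
    ELSE
      RETURN y
    END                (* END closes the IF, even for one statement *)
  END max;

BEGIN
  (* Writes the larger of the two values, then a line end. *)
  WriteInt(max(3, 7), 0); WriteLn
END MaxDemo.
```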

However, the biggest difference to note is the loss of built-in I/O statements. Instead, I/O is provided by compiler-specific modules. The ISO/IEC 10514-1:1996 modules STextIO and SIOResult can be used with compilers such as ADW, p1, and gm2, but as a whole these modules become more and more complex as the primitives are being handled. Kernighan's primitives really come in handy here and show their merit in assisting portability. Note that Wirth's InOut module design is intended to be interactive, i.e. a single command terminated by a space as provided by Terminal. InOut is supposed to switch between the use of Terminal and Streams, based on context. The modules described in PIM differ between the RT-11 and MEDOS implementations. Specifically, Streams is recommended in PIM as the primary file interface, with Files being the intermediate library (not needing to be standard). The book suggests their combination into FileSystem, as found with MEDOS, but in a later paper, Wirth regretted the combination, and suggested that Streams, with something like Files, really was the best design. (LineDrawing is conspicuously absent from most Modula-2 compilers.) Building modules using SYSTEM might be effective. For more details, see K. N. King's Modula-2: A Complete Guide. As noted in PIM, Wirth anticipated that compiler authors would write modules adapted to their environments and machines.

One thought for the sake of Pascal to Modula-2 portability is to build a PascalIO module that implements Pascal's I/O functions. This is similar to a compatibility library in ACK. That way, despite the sometimes drastic differences between Modula-2 implementations, the primitives become more easily portable from Pascal (i.e. push get, put, read, write, reset, and rewrite down another level, and build the primitives against them). Ultimately, the primitives should be built against the most optimal and useful provided modules for efficiency. I rejected the PascalIO approach as not taking advantage of Modula-2's conceptualization of text and file streams and types: the eof function is replaced in Wirth's library with the Done boolean, which the module sets depending on whether the Read primitives were able to get another CHAR (or INTEGER, or REAL, etc.). Also, other than the provided UCSD and Berkeley Pascal (interpreter) primitives, the book assumed that the needed primitives would be handled with something like the CDC compiler's extern statement: Fortran-based on the CDC, C-based in the context of the book's referenced compilers.

The last primary difference with Pascal is everything is a module. Use MODULE instead of program. program file parameters are no longer supported, but are provided through modules (e.g. Files). The input and output built-ins of Pascal are assumed by many compilers, though strictly, InOut switches between Terminal and Streams (the latter of which provides the FILE and TEXT types). Due to MEDOS, and perhaps Algorithms & Data Structures, FileSystem is the more common library than Streams (and Files which it relies on).

The initial result of a Mocka library for copy is:

DEFINITION MODULE Globals; (* DEE 2016-08-23 *)
  CONST ENDFILE = -1; (* needed by copy; value as in the book's globdefs *)
  TYPE character = [-1..127];
END Globals.

DEFINITION MODULE Prims; (* DEE 2016-08-23 *)
  IMPORT Globals, TextIO;

  PROCEDURE getc(VAR c: Globals.character): Globals.character;
  PROCEDURE putc(c: Globals.character);
END Prims.

MODULE copy; (* DEE 2013-12-11/2016-08-23 *)
  IMPORT Globals;
  FROM Prims IMPORT getc, putc;

  PROCEDURE copy; (* copy input to output *)
    VAR c: Globals.character;
  BEGIN
    WHILE (getc(c) # Globals.ENDFILE) DO
      putc(c)
    END
  END copy;

BEGIN copy
END copy.
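
The Done convention described earlier could back these primitives roughly as follows. This is only a sketch, not the Mocka library referred to above: it assumes a PIM-style InOut module (Read, Write, and the Done variable) instead of TextIO, and it assumes Globals exports ENDFILE as the copy module expects. Line-end handling differs between InOut implementations and is left out.

```modula2
IMPLEMENTATION MODULE Prims;
  (* Sketch only: assumes a PIM-style InOut, not Mocka's TextIO. *)
  IMPORT Globals, InOut;

  PROCEDURE getc(VAR c: Globals.character): Globals.character;
    VAR ch: CHAR;
  BEGIN
    InOut.Read(ch);
    IF InOut.Done THEN
      (* Assumes 7-bit input; a character above 127 would be
         outside the range of Globals.character. *)
      c := ORD(ch)
    ELSE
      (* Done is FALSE when Read could not deliver a character;
         this takes the place of Pascal's eof test. *)
      c := Globals.ENDFILE
    END;
    RETURN c
  END getc;

  PROCEDURE putc(c: Globals.character);
    VAR n: CARDINAL;
  BEGIN
    IF c >= 0 THEN   (* ENDFILE is never written *)
      n := c;        (* go through CARDINAL before calling CHR *)
      InOut.Write(CHR(n))
    END
  END putc;

END Prims.
```

Going through a CARDINAL variable before CHR is a deliberately conservative choice, since compilers differ in how freely they mix INTEGER and CARDINAL in standard procedures.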

detab and entab

The detab and entab commands are the first of the preprocessor tools that the book provides, along with include, define, and macro. Both Pascal and Modula-2 use the blank character as symbol separator. Though most compilers allow tabs as separators, some languages, strictly speaking, do not allow them (e.g. Fortran). The EBNF scanner of PIM makes it clear that blank refers to a blank character, not formatting control characters. This is also demonstrated in the PL/0 parser and the Oberon/0 GetSym procedure. Though tabs are not prohibited, a Pascal, Modula-2, or Oberon compiler could decide to accept only the blank character.


Ultimately, I decided to not include overstrike in the Modula-2 migration. Though ANSI carriage control is demonstrated in Wirth's texts on Pascal, and in his Pascal program examples, it is no longer relevant with modern computers, and is not mentioned in Wirth's Modula-2 texts. I have rewritten overstrike without Kernighan's primitives, but considering some of the exercises from the book. It can be downloaded from my website: /dee/overstrike/.

I did work on converting overstrike to Modula-2, but only the version with exercises included, roughly equivalent to the Pascal version:

PROCEDURE overstrike;
  CONST
    SKIP = BLANK; (* Skip to the next line *)
    PGSKIP = 49; (* Page skip *)
  VAR col, i, newcol: INTEGER; c: character; fflag: BOOLEAN;
BEGIN col := 1; fflag := FALSE;
  REPEAT newcol := col;
    WHILE (getc(c) = BACKSPACE) DO (* Eat backspaces. *)
      newcol := max(newcol - 1, 1)
    END;
    IF (newcol < col) THEN
      putc(NEWLINE); (* Start overstrike line. *)
      FOR i := 1 TO newcol - 1 DO putc(BLANK) END;
      col := newcol
    ELSIF (col = 1) AND (c # EOF) AND ~fflag THEN
      putc(SKIP) (* normal line *)
    (* ELSE middle of line *)
    END;
    IF (c # EOF) THEN
      IF (c = FF) THEN
        putc(NEWLINE); (* Start page break line. *)
        putc(PGSKIP); col := 1; fflag := TRUE
      ELSE putc(c); (* normal character *)
        fflag := FALSE;
        IF (c = NEWLINE) THEN col := 1
        ELSE INC(col)
        END
      END
    END
  UNTIL (c = EOF)
END overstrike;

echo, crypt, and wordcount

At this point, echo is such a standard, cross platform tool, that I decided, similar to overstrike, to rewrite it without Kernighan's primitives. It can be downloaded from /dee/echo.

I moved the getarg and narg primitives into a module called Options (inspired by the MEDOS module), and have resurrected the crypt command from the original Software Tools. In the Utility module is a portable xor function to be used with it. A more efficient, though less portable, approach would be to use Modula-2's BITSET type and the VAL function (to cast between CHAR and INTEGER). If your compiler already has a built-in XOR function, that is likely even better, though perhaps xor would then be a primitive instead of a portable utility. Perhaps keep the portable version in Utility, and add the more efficient version to Prims.
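
To make the portable approach concrete, here is a minimal sketch of an exclusive-or built only from DIV and MOD on non-negative values. It is not the Utility module's actual code; the module name XorDemo and the test values are made up for illustration, and a PIM-style InOut is assumed for the output.

```modula2
MODULE XorDemo;
  (* Assumption: a PIM-style InOut module is available. *)
  FROM InOut IMPORT WriteInt, WriteLn;

  (* Exclusive-or of two non-negative values, one bit at a time.
     Only DIV and MOD on non-negative operands are used, so the
     PIM/ISO differences for negative operands do not matter. *)
  PROCEDURE xor(a, b: INTEGER): INTEGER;
    VAR result, bit: INTEGER;
  BEGIN
    result := 0; bit := 1;
    WHILE (a > 0) OR (b > 0) DO
      IF (a MOD 2) # (b MOD 2) THEN
        result := result + bit  (* the bits differ: set this bit *)
      END;
      a := a DIV 2; b := b DIV 2;
      bit := bit * 2
    END;
    RETURN result
  END xor;

BEGIN
  (* 65 is 1000001 and 32 is 0100000 in binary, so this
     writes 97 (1100001). *)
  WriteInt(xor(65, 32), 0); WriteLn
END XorDemo.
```

With BITSET, the symmetric set difference operator (/) does the same work on a whole word at once, but the type transfer between INTEGER and BITSET is spelled differently in PIM and ISO compilers, which is why the arithmetic version is the portable one.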

Here's what the Software Tools in Modula-2 version of echo might look like:

  PROCEDURE echo; (* echo command line arguments to output *)
    VAR i, j: INTEGER; argstr: string;
  BEGIN i := 1;
    WHILE (Options.getarg(i, argstr, MAXSTR)) DO
      IF (i > 1) THEN putc(BLANK) END;
      FOR j := 1 TO length(argstr) DO putc(argstr[j]) END;
      INC(i)
    END;
    IF (i > 1) THEN putc(NEWLINE) END
  END echo;

Once the Options module is available, the charcount, linecount, and wordcount commands can be combined into a single command. I started with a simple wordcount0 command that combines the three counts but can only print all of them together, and a wordcount that can print individual counts as well as the combined output. Either way, counting everything in one pass is more efficient than running the three commands separately. A tool like Awk or Unix cut could break up the output if options are not used. I rewrote this using the ISO standard primitives, while ensuring PIM4 compatibility:


[Ker81] B. W. Kernighan, Why Pascal is Not My Favorite Programming Language, AT&T Bell Laboratories, Computing Science Technical Report No. 100, 2 April 1981
[KP76] B. W. Kernighan, P. J. Plauger, Software Tools, Addison-Wesley, 1976
[KP81] B. W. Kernighan, P. J. Plauger, Software Tools in Pascal, Addison-Wesley, 1981
[KU87] Michel Kiener, Alfred Ultsch, HOST: An Abstract Machine for Modula-2 Programs, ETH-3161-01, February 1987

©2017-2018 David Egan Evans.