Implementing Software Tools in Modula-2

[Note]: This document is a work in progress as I port the tools from Pascal to Modula-2.

In a previous document, I relate my experience of a first pass through Software Tools in Pascal [KP81], and to some degree Software Tools [KP76], initially using Free Pascal. This document follows my experience with the exercises of Software Tools in Pascal, but in Modula-2.

Modula-2 was chosen because it is the successor to Pascal and fixes most of Pascal's problems (as described by Kernighan in Why Pascal is Not My Favorite Programming Language [Ker81]).

For now, I have compiled and packaged the Amsterdam Compiler Kit (ACK) onto RHEL 7, along with the Mocka compiler, though these are 32-bit compilers (without the 64-bit support that Free Pascal has). Unfortunately, ACK does not have a separate compilation facility. GNU Modula-2 (gm2) provides a PIM4 compiler that does support 64-bit, perhaps one of the few, but it is not yet part of the mainstream GCC.

As a note, an ETH project existed to build the tools primitives on MEDOS as a kind of portable library for Modula-2 called HOST [KU87]. Like the UCSD example in Software Tools in Pascal, it required building a custom interpreter environment, and it took an interesting direction in the primitives it supplied.

The ISO/IEC 10514-1:1996 standard for Modula-2 had multiple, subtle differences with PIM3 and PIM4, and is in some ways a hybrid of both. Unlike INCITS/ISO/IEC 7185:1990 and Pascal, the ISO standard is not compatible with Wirth's definitions. Due to this, and other extensions and design decisions, the U.S. ANSI delegation rejected the Modula-2 standard. I'll stick with Programming in Modula-2, Fourth Edition (PIM4), published in English in 1988, and in German in 1992. Only one PIM4 compiler exists that is current: GNU Modula-2 (gm2). I began a couple of years ago working against m2c, but found so many bugs in it that I spent more time patching it than doing anything productive. m2c is also an incomplete implementation of Modula-2; using it requires working within its restrictions.

Getting Started

Exercise 1-2 of Software Tools in Pascal is easy with the classic Kernighan hello, world in Modula-2. That link leads to a directory containing a file hello.m2, with various versions that work with different compiler libraries. The following commands are for Mocka Modula-2:

$ m2 -c hello; m2 -p hello

Mocka is a curious creature depending on the version used. I have provided Red Hat RPMs of Mocka 9905 and 0608m. (There's also a 0605m.) I used the successor to 0608m: 1208m. This requires that the m2.tgz package be extracted as the root (ID 0) user, and that an /etc/profile.d/m2.sh file be created with the correct variables set. Instead of the previously required .mi and .md extensions, .mod and .def are used. The first time you run the m2 command on a file, it will create a $HOME/m2/ directory with bin/, src/, and out/ directories. Your binaries will be placed in bin/. Author your source files directly in src/. Object code and symbol files will go in out/.

For ACK, I put together an RPM, installed it to Red Hat Enterprise Linux 7 and created a $HOME/src/ack/ directory. In that directory, I copied my hello.m2 file as hello.mod and ran the following command:

$ ack -Rm2-3 -o bin/hello hello.mod

I found that the current 6.1pre1 release did not build for me, so I had to download the source from the git repository. The author graciously provided a pre-packaged tape archive for me to use.

Finally, the ISO/IEC 10514-1:1996 source was compiled on Windows 10 Pro using the ADW IDE. I created a project called Tools, created a new program module, and using the text, compiled it and linked it. I pointed the project directory to C:\Users\me\M2-Tools\. In that directory a hello.exe file was created.

Syntax changes in copyprog

The first obvious thing when moving Pascal programs to Modula-2 is case sensitivity: BEGIN, begin, and Begin are different, and reserved words must be written in upper case. The second is that all structured statements close with END. For instance, if an IF statement controls a single statement, END needs to be added (but no BEGIN). If an IF statement controls multiple statements, and thus in Pascal has begin and end, the begin needs to be removed.
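
As a small illustration of the END rule, here is a sketch of two IF statements, with the Pascal form in the comment and the Modula-2 form in the body. The module name IfDemo and the variable n are made up for the example, and it assumes a PIM-style InOut module is available with your compiler.

```modula-2
MODULE IfDemo;
  (* Pascal original (hypothetical):
       if n < 0 then n := -n;
       if n > 9 then begin write('big'); writeln end
  *)
  FROM InOut IMPORT WriteString, WriteLn; (* assumes PIM-style InOut *)
  VAR n: INTEGER;
BEGIN
  n := -12;
  (* Single statement: no BEGIN, but END is required. *)
  IF n < 0 THEN n := -n END;
  (* Several statements: Pascal's begin is dropped, END stays. *)
  IF n > 9 THEN
    WriteString("big"); WriteLn
  END
END IfDemo.
```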

Comments are now only (* and *): the curly brace comments have to be converted.

FUNCTION and PROCEDURE have been combined, i.e. there is no FUNCTION; instead, a PROCEDURE can declare a result type and return its value using the RETURN statement (instead of assigning the return value to its own name).
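
A minimal sketch of the change, using a max function like the one the tools call. The module name MaxDemo is hypothetical, and WriteInt is assumed to come from a PIM-style InOut.

```modula-2
MODULE MaxDemo;
  FROM InOut IMPORT WriteInt, WriteLn; (* assumes PIM-style InOut *)

  (* Pascal: function max(x, y: integer): integer;
             begin if x > y then max := x else max := y end; *)
  PROCEDURE max(x, y: INTEGER): INTEGER;
  BEGIN
    (* The result is handed back with RETURN, not by assignment to max. *)
    IF x > y THEN RETURN x ELSE RETURN y END
  END max;

BEGIN
  WriteInt(max(3, 7), 0); WriteLn
END MaxDemo.
```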

However, the biggest difference to note is the loss of built-in I/O statements. Instead, I/O is provided by compiler-specific modules. The modules STextIO and SIOResult standardized by ISO/IEC 10514-1:1996 can be used with compilers such as ADW, p1, and gm2, but taken as a whole these modules become more and more complex as the primitives are implemented. Kernighan's primitives really come in handy here and show their merit in assisting portability. These can be used with GNU Modula-2.

Note that Wirth's InOut module is designed to be interactive, i.e. a single command terminated by a space, as provided by Terminal. InOut is supposed to switch between Terminal and Streams based on context. The modules described in PIM differ between the RT-11 and MEDOS implementations. Specifically, Streams is recommended in PIM as the primary file interface, with Files being the intermediate library (not needing to be standard). The book suggests their combination into FileSystem, as found with MEDOS, but in a later paper Wirth regretted the combination, and suggested that Streams with something like Files really was the best design. (LineDrawing is conspicuously absent from most Modula-2 compilers.)

One thought for the sake of Pascal to Modula-2 portability is to build a PascalIO module that implements Pascal's I/O functions. This is similar to a compatibility library in ACK. That way, despite the sometimes drastic differences between Modula-2 implementations, the primitives would be more easily portable initially (i.e. push get, put, read, write, reset, and rewrite down another level, and build the primitives against them). Ultimately, the primitives should be built against the most efficient and useful modules provided. I rejected the PascalIO approach because it does not take advantage of Modula's conceptualization of text and file streams and types: the eof function is replaced in Wirth's library with the Done boolean, which the module sets depending on whether the Read primitives were able to get another CHAR (or INTEGER, or REAL, etc.).
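
To make the Done idiom concrete, here is a minimal sketch of a copy loop that tests Done after each Read rather than calling eof before it. It assumes a PIM-style InOut exporting Read, Write, and Done; the module name CopyDone is hypothetical, and end-of-line handling differs between implementations.

```modula-2
MODULE CopyDone;
  FROM InOut IMPORT Read, Write, Done; (* assumes PIM-style InOut *)
  VAR ch: CHAR;
BEGIN
  Read(ch);
  WHILE Done DO   (* Done is FALSE once no further CHAR could be read *)
    Write(ch);
    Read(ch)
  END
END CopyDone.
```

Where Pascal asks "is there more input?" before reading, this style reads first and then asks whether the read succeeded.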

The last primary difference is that everything is a module. Use MODULE instead of program. Pascal's program file parameters are no longer supported; the equivalent is provided through modules (e.g. InOut). The input and output built-ins of Pascal are assumed by many compilers, though strictly, InOut switches between Terminal and Streams (the latter of which provides the FILE and TEXT types).

The result for Mocka is:

DEFINITION MODULE Prims; (* DEE 2016-08-23 *)
  IMPORT TextIO;
  CONST EOF = -1;

  TYPE character = [-1..127];

  PROCEDURE getc(VAR c: character): character;
  PROCEDURE putc(c: character);
END Prims.

MODULE copy; (*DEE 2013-12-11*)
  FROM Prims IMPORT EOF, character, getc, putc;
  (* Software Tools in Pascal, Exercise 1-1. *)

  PROCEDURE copy;
    VAR c: character;
  BEGIN
    WHILE (getc(c) # EOF) DO
      putc(c)
    END
  END copy;

BEGIN copy
END copy.

Code examples and primitives can be found at my website in the /dee/STiM/ directory.
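
For readers without the download, here is a rough sketch of what an implementation module behind that definition could look like. It is not the Mocka version: it assumes a PIM-style InOut in place of whatever library the real Prims uses, ignores any mapping between the library's end-of-line character and NEWLINE, and assumes input characters fit in 0..127. Some compilers may also want an explicit conversion around ORD or CHR.

```modula-2
IMPLEMENTATION MODULE Prims;
  (* Sketch only; assumes a PIM-style InOut, not Mocka's own libraries. *)
  IMPORT InOut;

  PROCEDURE getc(VAR c: character): character;
    VAR ch: CHAR;
  BEGIN
    InOut.Read(ch);
    IF InOut.Done THEN
      c := ORD(ch)   (* assumes the character code is in 0..127 *)
    ELSE
      c := EOF       (* no more input *)
    END;
    RETURN c
  END getc;

  PROCEDURE putc(c: character);
  BEGIN
    (* The caller is expected never to pass EOF here. *)
    InOut.Write(CHR(c))
  END putc;

END Prims.
```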

detab and entab

The detab and entab commands are the first of the preprocessor tools that the book provides, along with include, define, and macro. Both Pascal and Modula-2 use the blank character as a symbol separator. Though most compilers allow tabs as separators, some languages (e.g. Fortran) strictly speaking do not allow them. The EBNF scanner of PIM makes it clear that blank refers to a blank character, not formatting control characters. This is also demonstrated in the PL/0 parser and the Oberon/0 GetSym procedure. Though nothing requires it, a Pascal, Modula-2, or Oberon compiler could decide to accept only the blank character.

overstrike

Ultimately, I decided not to include overstrike in the Modula-2 migration. Though ANSI carriage control is demonstrated in Wirth's texts on Pascal, and in his Pascal program examples, it is no longer relevant with modern computers, and is not mentioned in Wirth's Modula-2 texts. I have rewritten overstrike without Kernighan's primitives, while taking some of the exercises from the book into account. It can be downloaded from my website: /dee/overstrike/.

I did work on converting overstrike to Modula-2, but only the version with the exercises included; it is roughly equivalent to the Pascal version:

PROCEDURE overstrike;
  CONST FF = 12; NOSKIP = PLUS;
    SKIP = BLANK; (* Skip to the next line *)
    PGSKIP = 49; (* Page skip *)
  VAR col, i, newcol: INTEGER; c: character; fflag: BOOLEAN;
BEGIN col := 1; fflag := FALSE;
  REPEAT newcol := col;
    WHILE (getc(c) = BACKSPACE) DO (* Eat backspaces. *)
      newcol := max(newcol - 1, 1)
    END;
    IF (newcol < col) THEN
      putc(NEWLINE); (* Start overstrike line. *)
      putc(NOSKIP);
      FOR i := 1 TO newcol - 1 DO putc(BLANK) END;
      col := newcol
    ELSIF (col = 1) AND (c # EOF) AND ~fflag THEN
      putc(SKIP) (* normal line *)
    (* ELSE middle of line *)
    END;
    IF (c # EOF) THEN
      IF (c = FF) THEN
        putc(NEWLINE); (*Start page break line.*)
        putc(PGSKIP); col := 1; fflag := TRUE
      ELSE putc(c); (* normal character *)
        fflag := FALSE;
        IF (c = NEWLINE) THEN col := 1
        ELSE INC(col)
        END;
      END
    END
  UNTIL (c = EOF)
END overstrike;

echo, crypt, and wordcount

At this point, echo is such a standard, cross-platform tool that I decided, as with overstrike, to rewrite it without Kernighan's primitives. It can be downloaded from /dee/echo.

I moved the getarg and narg primitives into a module called Options (inspired by the MEDOS module), and have resurrected the crypt command from the original Software Tools. In the Utility module is a portable xor function to be used with it. A more efficient, though less portable, approach would be to use Modula-2's BITSET type and the VAL function (to cast between CHAR and INTEGER). If your compiler already has a built-in XOR function, that is likely even better, though then xor would arguably be a primitive instead of a portable utility. Perhaps keep the portable version in Utility, and add the more efficient version to Prims.
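
For illustration, a portable xor can be written with nothing but arithmetic. The sketch below is my own guess at the idea, not the code from the Utility module; the module name XorDemo is hypothetical and the output call assumes a PIM-style InOut. It is meant for small values such as character codes, since doubling bit could overflow for operands that use the top bit of CARDINAL.

```modula-2
MODULE XorDemo;
  FROM InOut IMPORT WriteCard, WriteLn; (* assumes PIM-style InOut *)

  (* Bitwise exclusive or using only DIV, ODD, and addition. *)
  PROCEDURE xor(a, b: CARDINAL): CARDINAL;
    VAR result, bit: CARDINAL;
  BEGIN
    result := 0; bit := 1;
    WHILE (a # 0) OR (b # 0) DO
      (* The result bit is set when exactly one operand has its low bit set. *)
      IF ODD(a) # ODD(b) THEN result := result + bit END;
      a := a DIV 2; b := b DIV 2;
      bit := bit * 2
    END;
    RETURN result
  END xor;

BEGIN
  WriteCard(xor(5, 3), 0); WriteLn (* 5 xor 3 has the value 6 *)
END XorDemo.
```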

Here's what the Software Tools in Modula-2 version of echo might look like:

PROCEDURE echo;
  VAR i, j: INTEGER; argstr: string;
BEGIN i := 1;
  WHILE (Options.getarg(i, argstr, MAXSTR)) DO
    IF (i > 1) THEN putc(BLANK) END;
    FOR j := 1 TO length(argstr) DO putc(argstr[j]) END;
    INC(i)
  END;
  IF (i > 1) THEN putc(NEWLINE) END
END echo;

Once the Options module is available, the charcount, linecount, and wordcount commands can be combined into a single command. I started with a wordcount0 command, a simple procedure that always prints all three counts, and then wrote a wordcount that can provide individual counts as well as the combined output. Either way, counting in one pass is more efficient than running three separate commands. A tool like Awk or Unix cut could break up the pieces if options are not used. I rewrote this using the ISO standard primitives, while ensuring PIM 4 compatibility: http://oberon07.com/dee/wordcount.
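
As a rough sketch of the combined, no-options form, the loop below counts lines, words, and characters in one pass. It is not the downloadable wordcount: it assumes a PIM-style InOut (Read, Done, EOL, WriteCard) rather than the ISO primitives, treats only blank, tab, and end of line as word separators, and the module name WordCount0 is used here only for illustration.

```modula-2
MODULE WordCount0;
  FROM InOut IMPORT Read, Done, EOL, WriteCard, WriteLn; (* assumes PIM-style InOut *)
  CONST TAB = 11C; (* octal character code for horizontal tab *)
  VAR ch: CHAR; nc, nl, nw: CARDINAL; inword: BOOLEAN;
BEGIN
  nc := 0; nl := 0; nw := 0; inword := FALSE;
  Read(ch);
  WHILE Done DO
    INC(nc);                          (* every character read is counted *)
    IF ch = EOL THEN INC(nl) END;     (* one line per end-of-line character *)
    IF (ch = " ") OR (ch = TAB) OR (ch = EOL) THEN
      inword := FALSE
    ELSIF NOT inword THEN
      inword := TRUE; INC(nw)         (* first character of a new word *)
    END;
    Read(ch)
  END;
  WriteCard(nl, 8); WriteCard(nw, 8); WriteCard(nc, 8); WriteLn
END WordCount0.
```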

References

[Ker81]
B. W. Kernighan, Why Pascal is Not My Favorite Programming Language, AT&T Bell Laboratories, Computing Science Technical Report No. 100, 2 April 1981
[KP76]
B. W. Kernighan, P. J. Plauger, Software Tools, Addison-Wesley, 1976
[KP81]
B. W. Kernighan, P. J. Plauger, Software Tools in Pascal, Addison-Wesley, 1981
[KU87]
Michel Kiener, Alfred Ultsch, HOST: An Abstract Machine for Modula-2 Programs, ETH-3161-01, February 1987

©2017 David Egan Evans.