Software Tools in Modula-2

This document is a work in progress, started as a supplement to Software Tools in Free Pascal. It records my considerations in implementing the Software Tools in Modula-2.

A review of Modula-2

In considering Modula-2, it is useful to understand the history and motivations behind Wirth's move away from Pascal. Of possible interest are the experience with the VENUS Multi-Access system and its editor, and the issues of multiprogramming in that environment.

In 1973, the first Modula compiler was written on the ETH CDC machine with its Pascal compiler, then ported to a PDP-11/40 GT44. It appears to have been a small language (e.g. no floating-point handling), perhaps relying on the bootstrap for loading. Software was written to handle the different hardware and peripheral devices, and to run some basic programs demonstrating its multi-process handling. This was a case-insensitive language, similar to Pascal, but adding modules. Modula appears to have been implemented as a single-text language like Pascal, with local modules that replaced the Algol own declaration, and with device and interface (process) modules for specific, non-portable handling. At least one third-party Modula compiler provided independent or separate compilation.

After exposure to MESA, by 1978 the language had become case-sensitive and followed MESA's use of separate text imports for separate compilation. This was the beginning of Modula-2. Though EBNF was introduced in the Pascal Algorithms book, and a 1977 paper promoted the benefits of converting BNF to EBNF, the 1977 Modula report seems to be the first place it was used to describe a language. (I have not looked for differences between the 1977 ETH Modula report and the one published in Software - Practice and Experience, not having access to the latter.)

The first Modula-2 compiler was developed from the Modula compiler. The late 1978 report (#27) seems to add things back from Pascal to Modula after adopting changes observed in MESA. Some things stand out: the separate compilation of modules, upper-case reserved words and case-sensitivity, and the switch to co-routines, exemplified by the removal of the interface and device module syntax (though they are still described as such in the Typewriter module example). The addition of things from Pascal is not complete, however. One curious change is that the example in the procedures section now uses log2 with the new CARDINAL type instead of gcd, likely to give a cleaner example of the new type. The lineinput and trackreservation local module examples from Modula remain, modified to conform to the new syntax, yet clearly hinting at the direct relationship between the two language versions. ALLOCATE and DEALLOCATE are now standard procedures. The SYSTEM module is provided. The module syntax is still simple, adding only a DEFINITION module for libraries (i.e. there is no implementation module). I can't help but see here a form of the 1988 Oberon syntax, i.e. after 1987 but before the definition browser of 1989.

The ETH 36 report, in its first (March) and second (December) 1980 editions, was the first to provide a public report on the first compiler used by students at ETH on the DEC PDP-11/40, now implemented on RT-11. The language now looks about like what we're familiar with: the REAL type is added, and the IMPLEMENTATION module is now separate from the program or main module. The compiler implementation is now described in terms of its use on RT-11, and the InOut module is provided, built from SYSTEM. Wirth adds InOut, Streams, and ProcessScheduler as portable utility modules. (Jacobi wrote the device-specific modules TTIO and Files, and the stack interpreter Loader.)

The main changes to the second edition report seem related to modules and standard procedures. ADR is moved to SYSTEM and ASH is removed, as is ROUND. CHR and ORD are added. FLOAT is restricted to CARDINAL (but will be used with both in PIM4). VAL is now listed with the standard procedures. MathLib has no 0 (what was the 0 about?) and is added for the first time as a standard module.

With the introduction of PIM, the 1980 second edition report goes through what seem like big changes, but in reality only some interpretive clarifications (necessary for other compiler implementations) are given: the language remains effectively the same. The report is rewritten to separate some parts into specific descriptions as a separate manual, the compiler usage descriptions are removed (though the Lilith compiler port, effectively the same compiler, is described in the Lilith Handbook in a similar way), and the manual portion of PIM now includes differing Lilith module definitions along with the RT-11 ones. LineDrawing and WindowHandler modules are added for the Medos-2 system (see ETH paper #56). Some modules are renamed or split. A word-for-word comparison of PIM with PIM2 suggests that little but the fonts and layout have changed, though the Processes implementation module is fixed from its broken state in PIM, and PIM2 introduces text duplication and missing output from an example (most of this is fixed in PIM3, except that the windowing screenshot is moved out of context to the end of a later chapter). Several code bugs exist that are not fixed in PIM2, and only a couple are fixed in PIM3.

The changes in PIM3 are effectively described in the ETH paper #59, which also revisits the multiprogramming research of the original Modula but in context of coroutines. The compiler rewrite by Wirth as a single-pass compiler is described in ETH paper #64. A German translation is made of PIM3. The Pascal text of Algorithms + Structures = Programs is separated into two texts by Wirth (in both the German and English versions, the German being the originator). The German text goes through five editions until translated back into English, also apparently by Wirth himself (a sixth edition?). Compilerbau is the last chapter of the original Pascal edition turned into a small manual, which goes through four editions. There is no English translation, but the first English Oberon edition in 1995 is based on it, and perhaps Pascal-S, again by Wirth.

The Ceres-1, the first true 32-bit system, with a National Semiconductor chip, has the Medos-2 system and Modula-2 compiler ported to it. PIM4 is released describing a couple of changes and clarifications from that port, and is translated into German and printed in 1991. The final edition of MacMETH adopts these changes. The changes from PIM3 to PIM4 are as follows:

From this point, Modula-2 gives way to Oberon, and the controversial ISO/IEC standard is finally finished in 1996, based on PIM3 and picking and choosing from PIM4 changes. Some compilers had already switched to the ISO/IEC standard in 1993/4.

getc/putc

For a port of Software Tools in Pascal, and following Kernighan's approach and suggestions, a compatible subset of the language will be used, following Wirth's last report (PIM4, with a German second edition translation) and the ISO/IEC standard (10514-1[1996]). This paper takes a different approach from that of Software Tools in Free Pascal. There, the differences needed to make the existing software build with Free Pascal were described, and some technical detail was given. Here, the basics for getting started are documented, with a focus on how Modula-2 fixes the complaints about Pascal. Unlike with HOST, my goal is to learn from the text in Modula-2, not to build a new library for Medos-2.

Wirth's library and the Medos-2 command interface are very simple. Only a single command is launched at a time. The interface completes the command name (so the entire command need not be typed), but a space, newline, or null character ends the command. Options, which are switches introduced by a / (slash) character, must be written without a space. In some ways, this is perfect for the conditions of Kernighan's tools. However, for getc in the beginning, character buffering is required, and InOut (or FileSystem) can't be used. As the HOST paper [KU87] (eth-3161-01) indicates, building primitives from primitives through too many module layers may not be the best approach in the end. However, to get started, making the program work is the first priority. Optimizing, and perhaps making the primitives more portable, can be done later. Portability is difficult, because Modula-2 has no standard library. The PIM standard libraries (3/4 in English, 1st/2nd in German) are really no longer common. The ISO/IEC standard is in a way a dialect, not merely an interpreted implementation with extensions. Kernighan's libraries find their usefulness here as a portable collection.

Here's an example of a (definition) module called ST, with just enough for the copyprog command to work with the copy procedure:

DEFINITION MODULE ST;

    CONST
        (* Universal manifest constants. *)
        ENDFILE = -1;

    TYPE
        character = [-1..127]; (* Byte-sized. ASCII + other stuff. *)

    (* Primitives *)
    PROCEDURE getc(VAR c: character): character;
    PROCEDURE putc(c: character);
END ST.

In Modula-2, the getc function returns its result with the RETURN statement, instead of by an assignment to the function name as in Pascal or Modula(-1).
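
As a minimal sketch of that difference, the following self-contained program uses a stand-in getc that reads from an in-memory string rather than from standard input. The module name ReturnDemo, the text buffer, and the use of the PIM InOut module for output are assumptions made for illustration; they are not part of the ST library.

MODULE ReturnDemo;
FROM InOut IMPORT WriteInt, WriteLn;

CONST
    ENDFILE = -1;
    MAXTEXT = 7;                      (* last index of the stand-in buffer *)

TYPE
    character = [-1..127];

VAR
    text: ARRAY [0..MAXTEXT] OF CHAR; (* hypothetical stand-in for standard input *)
    pos: CARDINAL;
    c: character;

(* getc: return the next character of text, or ENDFILE when it is exhausted.
   In Pascal the last statement would be the assignment getc := c. *)
PROCEDURE getc(VAR c: character): character;
BEGIN
    IF (pos <= MAXTEXT) AND (text[pos] # 0C) THEN
        c := ORD(text[pos]);
        INC(pos)
    ELSE
        c := ENDFILE
    END;
    RETURN c
END getc;

BEGIN
    text := "abc";
    pos := 0;
    WHILE getc(c) # ENDFILE DO
        WriteInt(c, 4)                (* writes the ordinals 97, 98 and 99 *)
    END;
    WriteLn
END ReturnDemo.

Because RETURN both supplies the value and leaves the procedure, the result can be used directly in the WHILE condition while the VAR parameter still carries the character to the caller.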

charcount

Here is an example of charcount. Note the use of the INC standard procedure, instead of the typical statement nc := nc + 1. Also shown is Wirth's version of Algol 68's solution to the dangling IF and DO (e.g. do .. od, if .. fi): all conditionals and loops end with END. Statement sequences are assumed, instead of single statements as in Pascal, so a BEGIN is unnecessary (and tends to be found only in procedure and module bodies).

MODULE charcount;
    IMPORT ST;

    (* charcount: count characters in standard input *)
    PROCEDURE charcount;
    VAR
        nc: CARDINAL;
        c: ST.character;
    BEGIN
        nc := 0;
        WHILE (ST.getc(c) # ST.ENDFILE) DO
            INC(nc)
        END;
        ST.putdec(nc, 1);
        ST.putc(ST.NEWLINE)
    END charcount;

BEGIN charcount
END charcount.

ENDFILE and NEWLINE are qualified by the module they are imported from, which keeps capitalized identifiers from clashing with the compiler's reserved words (a Mesa convention). However, qualified names make the PROCEDURE less flexible to move between differing code bases. This is a design choice. Some compilers interpret IMPORT differently. Future code examples here will assume FROM.
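
The two import forms can be compared side by side in a small sketch. It uses the PIM InOut module only because the ST module is not listed in full here; the module name ImportDemo is hypothetical, and your compiler's library may name things differently.

MODULE ImportDemo;
IMPORT InOut;                  (* qualified: objects are written InOut.name *)
FROM InOut IMPORT WriteLn;     (* unqualified: WriteLn is visible directly *)

BEGIN
    InOut.WriteString("WriteString is reached through its module name");
    WriteLn;
    InOut.WriteString("WriteLn was imported with FROM");
    InOut.WriteLn              (* the qualified form still works as well *)
END ImportDemo.

With FROM, a procedure body copied from the book needs no module prefixes, at the cost of listing every imported name at the top of the module.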

putdec, as described in Software Tools in Pascal, relies on integer division, and the ISO standard does not treat DIV the same way as Pascal and Modula do. To keep it portable, I added a divide function to the primitives, and used it. This should be easily modifiable between / and DIV depending on the compiler. This keeps non-portable changes in the primitives, which are then the only code that has to change from compiler to compiler.
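
A minimal sketch of the idea follows. The signature of divide and the helper ndigits are assumptions for illustration, not the code at the URL below, and output again goes through the PIM InOut module.

MODULE DivideDemo;
FROM InOut IMPORT WriteInt, WriteLn;

(* divide: integer quotient. This is the single place to edit for a
   compiler that needs / instead of DIV. *)
PROCEDURE divide(a, b: INTEGER): INTEGER;
BEGIN
    RETURN a DIV b             (* or: RETURN a / b *)
END divide;

(* ndigits: number of decimal digits in n, for n >= 0 (hypothetical helper) *)
PROCEDURE ndigits(n: INTEGER): INTEGER;
VAR count: INTEGER;
BEGIN
    count := 1;
    WHILE n >= 10 DO
        n := divide(n, 10);
        INC(count)
    END;
    RETURN count
END ndigits;

BEGIN
    WriteInt(ndigits(1981), 1); (* writes 4 *)
    WriteLn
END DivideDemo.

Callers such as putdec never mention the operator itself, so switching compilers means editing one line in divide.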

The current code can be found at http://oberon07.com/dee/software/ST/, tested on the Windows ADW and GNU/Linux gm2 compilers.

wordcount

wordcount introduces the BLANK and TAB character objects. Though the NOT operator could be used, the ~ operator, introduced in PIM3, resembles a glyph used in earlier times to represent negation (¬). I decided to use it, as it is the only form used in the successor language Oberon, and it is not unusual in other programming languages.

One of the improvements to Pascal, originally explained in the Modula report, is reserving ELSE to the final catch-all of IF with the introduction of ELSIF.

(* wordcount: count words in standard input *)
PROCEDURE wordcount;
VAR
    nw: CARDINAL;
    c: character;
    inword: BOOLEAN;
BEGIN
    nw := 0;
    inword := FALSE;
    WHILE (getc(c) # ENDFILE) DO
        IF (c = BLANK) OR (c = NEWLINE) OR (c = TAB) THEN
            inword := FALSE
        ELSIF ~inword THEN
            inword := TRUE;
            INC(nw)
        END
    END;
    putdec(nw, 1);
    putc(NEWLINE)
END wordcount;

overstrike

Line printer handling had an ANSI standard, also found in Fortran. However, into the 80s, this became less common. Modula-2 had worked mostly with laser printers, which had their own language. Today most printers have their own, far more complicated, printing language, and the ANSI standard has been withdrawn.

overstrike had its simplest examples in Wirth's book Systematic Programming. I spent a bit of time with overstrike to make it more robust from the exercises, having a bit of fun, but it is essentially a relic of the past. The formfeed character constant FF was added to the standard environment, and is not found in Kernighan's book.

putrep

The FOR loop in Modula-2 differs from Pascal's, for instance in using BY instead of DOWNTO. It is first used in the settabs procedure in detab. However, the putrep procedure used by the compress and expand commands is a good example of this difference between Modula-2 and Pascal. (Modula(-1) had no for loop.)

FOR m := n TO 1 BY -1 DO
    putc(c)
END

isupper

The function isupper is discussed in terms of a set, but also of a range. Where the syntax of both is different than what is in the book, I found the following the simplest and most efficient:

PROCEDURE isupper(c: character): BOOLEAN;
VAR A, Z: INTEGER;
BEGIN
    A := ORD('A');
    Z := ORD('Z');
    RETURN (c >= A) AND (c <= Z)
END isupper;

crypt, xor, and strings

In Software Tools in Pascal, the echo example replaces crypt as shown in Software Tools. Kernighan indicated the language was incapable of making a portable xor. There are a couple of ways to do this in Modula-2. The more efficient, though less portable, approach would be to use Modula-2's BITSET type and the VAL function (to convert between CHAR and INTEGER). If your compiler already has a built-in XOR function, that is better. Perhaps there is a variation on Kernighan's c := ((NOT b) AND a) OR ((NOT a) AND b); approach, but in (PIM4) standard Modula-2.

PROCEDURE xor(x, y: INTEGER): INTEGER;
    VAR c: INTEGER;
  BEGIN c := x BXOR y; (* ADW specific function BXOR *)
    RETURN c
  END xor;
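
For compilers without such a built-in, one portable alternative is to compute the result bit by bit with DIV and MOD. This is a swapped-in technique, not Kernighan's expression and not the code used for the tools; it is slower than BXOR or a BITSET version, and the module name XorDemo and the use of InOut are assumptions for illustration.

MODULE XorDemo;
FROM InOut IMPORT WriteCard, WriteLn;

(* xor: bitwise exclusive or of a and b, built only from DIV and MOD *)
PROCEDURE xor(a, b: CARDINAL): CARDINAL;
VAR result, bit: CARDINAL;
BEGIN
    result := 0;
    bit := 1;                       (* weight of the bit being examined *)
    WHILE (a > 0) OR (b > 0) DO
        IF (a MOD 2) # (b MOD 2) THEN
            result := result + bit  (* the two low bits differ *)
        END;
        a := a DIV 2;
        b := b DIV 2;
        IF (a > 0) OR (b > 0) THEN
            bit := bit * 2          (* advance only while bits remain *)
        END
    END;
    RETURN result
END xor;

BEGIN
    WriteCard(xor(5, 3), 1);        (* 101 xor 011 = 110, writes 6 *)
    WriteLn
END XorDemo.

For crypt the operands are character ordinals, so the loop runs at most a handful of times per character.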

In the context of strings, Modula-2 adds a standard open array parameter for handling an ARRAY of any element type and size (still static). This allows procedures to be written that handle varying string lengths. It also allows the compiler to optimize resource size around the parameter being passed, effectively solving the Pascal problem. However, the discussion around how to define a string is still relevant for older versions of Modula. PIM4 expects null termination of a string; a single-character string with a null is assignment compatible with a single CHAR, which is a change from past versions of Modula. (The ISO standard appears to follow PIM3 in this respect.)
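
As a minimal sketch of an open array parameter, the length function below accepts character arrays of any declared size and stops at either a null character or the end of the array, so it does not depend on whether the compiler terminates string literals. The names LengthDemo, length, and name are hypothetical, and output uses the PIM InOut module.

MODULE LengthDemo;
FROM InOut IMPORT WriteCard, WriteLn;

VAR
    name: ARRAY [0..15] OF CHAR;

(* length: characters in s before the first null, or all of s if none *)
PROCEDURE length(s: ARRAY OF CHAR): CARDINAL;
VAR i: CARDINAL;
BEGIN
    i := 0;
    WHILE (i <= HIGH(s)) AND (s[i] # 0C) DO
        INC(i)
    END;
    RETURN i
END length;

BEGIN
    name := "tools";
    WriteCard(length(name), 1);        (* writes 5 *)
    WriteCard(length("software"), 3);  (* writes 8 *)
    WriteLn
END LengthDemo.

HIGH(s) gives the upper index of whatever array was passed, which is what lets one procedure serve both the 16-element variable and the literal.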

References

[KP76]
B. W. Kernighan, P. J. Plauger, Software Tools, Addison-Wesley, 1976
[KP81]
B. W. Kernighan, P. J. Plauger, Software Tools in Pascal, Addison-Wesley, 1981
[KR78]
B. W. Kernighan, D. M. Ritchie, The C Programming Language, Prentice-Hall, 1978
[W88]
N. E. Wirth, Programming in Modula-2, 4th Edition, Springer-Verlag, 1988
[KU87]
Michel Kiener, Alfred Ultsch, HOST: An Abstract Machine for Modula-2 Programs, Eidgenoessische Technische Hochschule Zuerich, Institut fuer Informatik, Report Nr. 73, February 1987

©2017-2020, 2022-2023 David Egan Evans.