Monday, September 24, 2012

sprawl -- or: oberon, forth, and stackwise refinement

Yesterday I started writing a parser.

Not for B4, of course: in forthlike languages, each word is executed directly, so you don't need a parser, just a tokenizer. (What's the difference? A tokenizer is a simple state machine, and needs only a fixed amount of memory to implement. A parser requires a stack.)
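
To make that concrete, here's a minimal sketch (in pascal, since that's what I've been working in) of a tokenizer as a state machine: the only thing it ever remembers is the word it's currently inside. No stack anywhere.

program tok;
var
  s, word : string;
  i : integer;
begin
  s := ': square dup * ;';
  word := '';
  { scan left to right; the only "state" is the current word }
  for i := 1 to length(s) + 1 do
    if (i <= length(s)) and (s[i] <> ' ') then
      word := word + s[i]   { state: inside a word, so keep accumulating }
    else if word <> '' then
      begin
        writeln(word);      { state: between words, so emit the token }
        word := ''
      end
end.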

The parser I wrote yesterday is unusual. I wrote it in pascal, using a top-down style. I intended to write it as a recursive descent parser. But then I started refactoring it, and kept abstracting away the part where it actually reads in any characters or builds any data structure.
I think in the end, it's going to wind up looking a lot like SAX - the simple API for XML, where the underlying code just gets a "start" event when the tag opens and an "end" event when the tag closes.
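
Something like this, say (a toy sketch, not the actual parser: the input "language" here is just nested parentheses, and all the names are made up):

program saxish;
type
  TNodeEvent = procedure(depth : integer);

procedure OnStart(depth : integer);
begin
  writeln('start, depth ', depth)
end;

procedure OnEnd(depth : integer);
begin
  writeln('end, depth ', depth)
end;

{ the "parser" never builds a tree: it just fires the callbacks }
procedure Parse(const src : string; start, finish : TNodeEvent);
var
  i, depth : integer;
begin
  depth := 0;
  for i := 1 to length(src) do
    case src[i] of
      '(' : begin inc(depth); start(depth) end;
      ')' : begin finish(depth); dec(depth) end;
    end;
end;

begin
  Parse('(a(bc)(d))', @OnStart, @OnEnd)
end.

The underlying code stays a dumb loop, and whoever supplies the callbacks decides whether to build a tree, count nodes, or ignore everything.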

The point though is that I look at it today, and it's a mess. Not because it's actually a mess, but because it's hard to see the mental image I had in my head yesterday of how everything fit together.

It's hard to get back into it, in other words.

This is true for all software, though. At least for me... Perhaps it's a byproduct of my ADHD, but I think maybe everybody deals with this.

I call it the gestalt problem. It's basically this: once you've created a bunch of code to support the main idea, how do you find the main idea in all the code?

Actually, the gestalt problem can be even worse than that: you can create a network of interdependent code so that the main idea is spread out over many objects, and there really is no way to see how it all fits together. Even if you start out coding that core concept, there's a good chance it will get refactored out of existence before too long.
That's great for you as the author, because you understand how it all works... But what happens when the next guy comes along and wants to understand your work?

Hence the word "gestalt"... It means an interconnected, synergistic whole that can't be understood by taking it apart and examining the pieces individually: you have to see it all working together to understand it.

The problem is that we rely on this medium called text to encode our ideas about software, and that text has two audiences: the humans who maintain the software, and the computers that execute it.

Sometimes, the two work together. For example, take the programming language Oberon. Oberon is a refinement and simplification of pascal (created by the same person, Dr. Niklaus Wirth).

One of the driving forces of Oberon is simplicity. Oberon was implemented as part of a project to build an operating system (also, confusingly, called Oberon) by Dr. Wirth and his colleague, Dr. Jürg Gutknecht. These two gentlemen were building an operating system from scratch in their spare time over the course of two years, and they just plain didn't have time for a lot of complexity.

So they took Modula 2 (another Pascal descendant, which had added the concept of modules ("units" in turbo pascal)), and stripped it down to the bare minimum feature set. There's no GOTO statement, no string type. No TRY..CATCH. No classes. At the time, there wasn't even a FOR loop.
Actually, it had three looping constructs: REPEAT..UNTIL, WHILE..DO..END, and LOOP..END with an EXIT somewhere in the middle (like BREAK)... It turns out LOOP is redundant, and the latest incarnations of the language get by with just the first two.

Oberon did add two things that pascal didn't have: type extension (subclassing) and garbage collection. That was basically it.

Anyway, Oberon is a sparse language, but the Oberon system is also one of the most readable code bases I've ever encountered. (The entire system is documented in a book that's now available for free download: ProjectOberon.PDF)

Now when you look at the code, you might not like it:

MODULE example;
  IMPORT Out;

  PROCEDURE count;
    VAR i : INTEGER;
  BEGIN
     i := 99;
     WHILE i > 1 DO
       Out.Int( i, 0 );
       Out.String( " bottles of beer on the wall. " );
       Out.Ln;
       DEC( i )  (* without this, the loop would never end *)
     END;
     Out.String( "1 bottle of beer on the wall." );
     Out.Ln;
     (* or for the full song, see :
        http://99-bottles-of-beer.net/language-oberon-504.html *)
  END count;

BEGIN
  Out.String( "hello world" );
  Out.Ln;
  count;
END example.

If you know pascal, you may be surprised that Oberon is case sensitive. (Pascal came from the days when computers used 6 or 7 bits in a byte and didn't necessarily have upper or lower case letters.) The keywords are always uppercase. The { braces } that pascal uses for comments have been re-purposed for set literals. (Sets are basically bitmaps. In Wirth's languages, you don't do c-style options & opt_a, you say opt_a IN options)... And unlike (turbo) pascal, you'll notice that you have to use the module name explicitly when referring to something in that module.
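
(About those sets: if you haven't seen pascal-family sets before, here's roughly what they look like. A little sketch in free pascal rather than Oberon, and the option names are made up:)

program setdemo;
type
  TOption  = (optA, optB, optC);
  TOptions = set of TOption;
var
  options : TOptions;
begin
  options := [optA, optC];         { a set literal; Oberon writes these with braces }
  if optA in options then          { membership test, instead of C's (options & OPT_A) }
    writeln('option A is enabled');
  options := options + [optB]      { set union adds an element }
end.

The compiler turns all of that into the same bitmask operations you'd write by hand in C.
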
Also, you'll notice that there's no pascal-style INTERFACE section (or header file, if you're a C person). However, like all of Wirth's languages, it's still possible to write a single-pass compiler, which means you're never allowed to refer to a symbol until after it's been defined.

This means, for example, that the following code is illegal:

MODULE fail;
  IMPORT Out;

  PROCEDURE wontwork;
  BEGIN
    notdefinedyet;
  END wontwork;

  PROCEDURE notdefinedyet;
  BEGIN
    Out.String("not going to happen.");
    Out.Ln;
  END notdefinedyet;

END fail.

Now, all of this was most likely done to simplify their work. If everything is defined in order, then the compiler can just go from the top of the text to the bottom and compile each item it sees in order. And since you have to specify the module name, there's never any need to search through all the imports to see what defines a symbol. So the compiler can be much simpler. If you see a word, you either know where it is because it has a module name in front of it, or you know you can read backward to see where it was defined.

As a consequence of this, the "imports" relationship between the modules necessarily forms a directed acyclic graph: you can never have one module depend on another module that depends on the first module, because the compiler couldn't handle the cycle.

You might consider all of these things to be limitations... But the result is that Oberon is one of the most readable languages I've ever encountered.

But: despite its strengths, Oberon code still suffers from burying the lede... In order to say "my house is on fire", you first have to define HOUSE, and then you have to define FIRE... And then just cross your fingers and hope the compiler already understands MY, IS, and ON.

G.K. Chesterton wrote of his rival, George Bernard Shaw:
Many people know Mr. Bernard Shaw chiefly as a man who would write a very long preface even to a very short play. And there is truth in the idea; he is indeed a very prefatory sort of person.
Well, Pascal and Oberon are very prefatory sorts of languages, which is perhaps why they suit me. (I seem to share Mr. Shaw's penchant for preface, if not his astounding talent for the written word. (I couldn't help but wonder why I should call him Mr. Shaw, for example, and not Sir Shaw, as seems fitting. A quick Wikipedia lookup later revealed that he was indeed offered a knighthood, and rejected it.))

But it is the very readability of Dr. Wirth's languages that give rise to the problem I see as I look back at the parser I wrote yesterday. As I said, you're pretty much forced to bury the lede.

What the heck is a lede? It's the fancy newspaper-jargony way of saying "lead" as in "leading paragraph"... That is, the big top part of an inverted pyramid, where all the important stuff is supposed to be when you write for other people.

I'm not writing for other people here. I just started this blog. I know it's public. In the back of my mind I'm already thinking I'll probably share it. I assume there's a you there that I'm writing to. But you should understand that I'm not writing this for you. If I were, I'd have already done you the courtesy of editing this, and have arranged it in a nice inverted pyramid style.

Am I burying the lede now, then? On the contrary. I'm digging for it. Sometimes I write just to get the ideas out of my head and into some sort of order. Because it's all tangled up in here, and the linear nature of language forces me to arrange it in some order, or there wouldn't be any sense to it. And hopefully, then, I find whatever idea or point it is I'm looking for.

To paraphrase M. Blaise Pascal himself, my posts are long because I haven't had the time to make them shorter.
So I ramble. And yet, often, when I'm writing code, that makes me very productive. I lost that for a while. Fifteen years or so. Because I switched programming languages, away from pascal.

I'm not sure this property is specific to pascal any more. But it's something perl and python don't have, because they defer symbol resolution until runtime, and it's something C doesn't have because it takes too long to compile. But back in the late 90's, when I learned to program, I used turbo pascal, and it had this property: it compiled very quickly, and then it showed you where your mistakes were.

That is, you'd press control-f9 to run your code in turbo pascal, and it would compile incredibly quickly even on an old 386 (because the language can be compiled in a single pass, remember?), and then either everything worked out okay and your program ran, or there was a problem and the IDE took you directly to it.

The only language/IDE pair I know of that works this way today is Visual C#. Even in the free IDE (Visual C# Express), you can write code that calls a function that doesn't (yet) exist, hit compile, and the IDE will take you directly to the piece of code with the problem. There's even a nice little feature that lets you generate stub code for the function directly. (PyCharm has something like this, but because of python's dynamic typing, you generally have to actually run the code and trigger a NameError, unless the system can infer the type to know where the method you're calling ought to be. Most likely, then, IntelliJ has the feature for java, but I don't know for sure.)

Anyway, with a feature like this, you can set up a feedback loop, where you write some code to get the basic idea out, and then press the compile or run button and it takes you back to where the problem is, and then you fix the problem, and then it takes you to the next thing.

This sort of feedback loop is incredibly productive. But it comes at a cost: you wind up with code that's completely out of order from the order in which it was built. (And it's not the only culprit: you have the same issue with object oriented programming.)

Now this is where pascal's picky parser comes in handy: in any pascal program, you can start at the bottom to see the main code, then read upward to see the code that it depends on, and so on. Borland's module system, where each UNIT has an INTERFACE section and then a separate IMPLEMENTATION section, sort of breaks this property. The original pascal had no modules. ISO pascal added them in 1993, but I'm not yet familiar with how they work. But Oberon's module system addresses it.

Even in Oberon, though, the code is upside down, and the unsuspecting reader has to read the preface first, and then go hunting for the lede. In other languages, the next logical step could be anywhere. Don't bother with a printout. You need a debugger to read the code. Even then, there's no guarantee that there's a "top" anymore. There certainly isn't one in a well-factored object oriented system. Which is what makes OO powerful, and also what causes the gestalt problem.

Donald Knuth (Dr., surely, but that title is oddly never attached to his name) solved this problem with literate programming many years ago. His WEB system, written in pascal, even cites the backward nature of pascal syntax as one of his main reasons for implementing it: he much preferred to write the code in a top-down style, and to present it that way as well.

Reading the oberon book, I often wished that Wirth and Gutknecht had adopted a more literate style. The entire source code of the operating system is presented, but in large swaths at the end of each chapter, rather than inline, the way Knuth approached the problem.

In fact, there is a section early on in Project Oberon that acknowledges the problem (indeed, seems to take it as a given). It sticks out in my mind because it rather irritated me when I first read it:
Implementation of a system proceeds bottom-up. Naturally, because modules on higher levels are clients of those on the lower levels and cannot function without the availability of their imports. Description of a system, on the other hand, is better ordered in the top-down direction. This is because a system is designed with its expected applications and functions in mind. Decomposition into a hierarchy of modules is justified by the use of auxiliary functions and abstractions and by postponing their more detailed explanation to a later time when their need has been fully motivated. For this reason, we will proceed essentially in the top-down direction.
Don't get me wrong. I think Project Oberon is an amazing book. The clarity of the ideas it presents is inspiring. It's 400 pages of everything-you-really-need in a computer system.

Actually, it has one other glaring omission: a few of the base modules are written in assembly language, but they are omitted because the machine they were assembled for was custom-built by Dr. Wirth's team. In recent years, Dr. Wirth implemented a new processor on a widely available $100 FPGA, and, last I heard, is currently updating the book to target it.

Still, I wish this inherent conflict between top-down design and bottom-up implementation didn't exist.
Why do I care? Because it introduces a disconnect between the design and the implementation. You can read Project Oberon, understand everything in the system, and walk away not knowing how to build an Oberon of your own. You have to go back and re-organize everything in your head so that you can implement it.

And here we have the gestalt problem.

Remember in the code above, how I used the Out module to print out text?

Yeah, the Out module isn't part of the Oberon system. It was added later as a concession to implementors of Oberon compilers on systems like DOS and Windows and Unix that still have a concept of terminals and command lines. The Oberon system has its own text-based user interface that did away with command prompts. It may not have a primitive String type, but it does have a Texts module that works quite a lot like a cross between a buffer in a text editor and the HTML DOM.

The Oberon system was written back in 1985. It expects you to have a three-button mouse, and even though the user interface is almost entirely textual, the mouse is the primary input device. Again, from the book:
The ease and flexibility in the way pieces of text can be selected, moved, copied, and designated as command and as command parameters, drastically reduces the need for typing. The mouse becomes the dominant input device: the keyboard merely serves to input textual data. This is accentuated by the use of so-called tool texts, compositions of frequently used commands, which are typically displayed in the narrower system track of viewers. One simply doesn't type commands! They are usually visible somewhere already. Typically, the user composes a tool text for every project pursued. Tool texts can be regarded as individually configurable private menus.
You would never write to standard output in the Oberon system, because there is no such thing. There are only Texts. (It's somewhat like emacs: everything is just a buffer. The Oberon system feels a lot like a powerful text editor in the same way the Emacs system feels a lot like an operating system.)

So here's this mouse-driven text interface that looks a lot like a GUI:

[screenshot: the tiled viewers of the Oberon display]

... But it isn't like any GUI you've ever seen or worked with. It's not just that the windows don't overlap. (Later versions added overlapping windows, which in my opinion was a mistake.)

The menus don't act like menus. They're more like hyperlinks. The scrollbars don't act like scrollbars.

The three mouse buttons don't act like the mouse buttons you're used to. (You combine them, for example. You can hold down one button and "interclick" another button, and that triggers a different action than just clicking either of the buttons. So it's like each button is also a modifier like shift/alt/control, and so you really have more like 12 buttons on your mouse.)

Again, it's like the first time you try emacs: every button you press does something unexpected, and the learning curve (or perhaps unlearning curve?) is rather steep.

My point is, Oberon is this amazing system, and it fits together like clockwork, and yet there's still a great deal of work involved in following in their footsteps.

Or perhaps what I'm saying is that you can't really follow along with them, because they didn't implement it the way they show it to you. You can see what they did, but you still have to read the whole book before you can even take the first step to implementing it yourself.

That's not a flaw, of course. These guys are professors, and so of course they were teaching students in an environment where they have a semester or more to explain everything. They had very strong pedagogical reasons to structure the book the way they did.

And yet, I'm not a student, and I'm not looking to rebuild Oberon. (Though I suspect whatever I do produce here is going to borrow heavily from it.) I'm looking to build my own thing, and I'm looking at their work for guidance. So I'm in a rather small demographic. I can't fault them for the incredible work they did, but given my circumstances and the specific problems I'm personally working on, I would have preferred to take a different approach.

For a long time, I thought the answer was in Dr. Knuth's literate programming. There's some logic to that. But literate programming, at least the way he envisioned it, describes a system the way it is.

To me, what's much more interesting is the question Daniel Quinn asks in his mind-blowing book, Ishmael: the question of how things came to be this way.

Once you understand how a system came to be the way it is, you no longer have a gestalt problem. In the software world, that means: don't just read the source code, read the history of commits in the source code repository, so you can watch the system take shape in front of you.

But even then, you often encounter fully formed systems springing into existence between commits, or more likely an "initial checkin" that contains the complete system from its pre-version-control existence. :)

So I came up with a couple versions of this thing called "trailblazer" over the years, which was basically a literate programming system that also took on the role of version control, so you could "tell the story" of your software as you went along, possibly rewriting the history (or retconning, as they say in the comics business) if you want to spare readers from the things you tried that didn't work.

But the trailblazer approach didn't work for me either, probably for the same reason this post is so long: my mind goes where it goes, and there isn't a swift feedback loop to keep it in check. I didn't pick the screen-name "tangentstorm" randomly, you know. :)

I have been struggling with this problem for a very long time... I kind of dropped off the face of the earth for most of a year, except for an occasional post to /r/learnprogramming on reddit.

And then one day, I was writing something like this to organize my ideas... Only it wasn't like this at all. There wasn't this mental sprawl. I wasn't all over the map.

I used a top-down approach, and I followed a few rules, and I basically created my own internal feedback loop.

I started with the main idea, and then I defined the first sub-idea, and then the first sub-sub idea, and so on, as if I were doing a depth-first walk of a tree.

It had this nice top-down, inverted-pyramid style to it, and it felt a lot like writing a program.

I was so amazed at what I'd done that I decided to "rejoin society," and posted a rather silly announcement that I was going to build a new programming language in time for an upcoming game jam.

Well, that was a month ago, and I've produced neither a game nor the language it was meant to be written in, so I suppose I (once again) look like a flake who bit off more than he could chew.

The trouble was, while the idea worked well in my head when I was writing, it still requires a whole set of tools if I want to apply the idea to programming. I had hoped to knock them out in a day or two, but that was naive.

It's not that they couldn't be built in a day. I just didn't have them clear in my mind, nor did I have all the knowledge I needed to build them. I'm still not sure that I do, but I'm close.

I think it's possible to implement a system that preserves the simplicity of Oberon, but also addresses the bottom-up / top-down issue.

First, the notion that code needs to be stored as text is obsolete. Once you move to a smalltalk-style system, where all the code lives in a database, you can arrange it any way you like.

Second, you don't really need a grammar or parser for your language. Forth proves that you can get by with nothing but a lexer (and in forth, that "lexer" is further simplified to breaking input into words whenever the space character is encountered).

Now, whether you might want text and grammar for other reasons is beside the point: you can always add those later. The point is, you can string together a pretty low level programming language that works like this very quickly. Or at least, once you've put the time into understanding it, there's not a lot of actual code involved.

The third step is stackwise refinement. The more familiar term is of course stepwise refinement, the top-down approach originating from that other luminary (you knew his name was going to show up here eventually), Dr. Edsger W. Dijkstra, whose work inspired Dr. Wirth to create Pascal and Dr. Knuth to create WEB. (Dr. Wirth may have coined the term "stepwise refinement" -- he certainly popularized it, and likely had a hand in the development of the concept, but even his famous article on the subject gives credit to Dijkstra.)

Anyway, stackwise refinement simply takes this same top-down algorithm all these super-smart professors were using, and hard-codes that thinking process into the language.

Which language? Well, the most bottom-up language on the planet, of course: forth.

Now, forth is pretty much a joke language, right? I mean it uses reverse polish notation, so it's basically just lisp without the parentheses, right?

That's kind of how I always thought of it, and pretty much why I never paid it much attention. But for some reason, not too long ago, I happened to read Programming a Problem-Oriented Language, by one Charles H. Moore. There aren't any fancy academic letters next to Chuck Moore's name, but he studied lisp at MIT under John McCarthy himself, and then went on to Stanford for a few years. I'm not sure why I'm talking about academic credentials today (I dropped out after my freshman year), but with all the "Dr Brilliant Professor" names I've been throwing around, I just wanted to point out that Chuck Moore is as smart as or smarter than any of them. He's also got an unusual way of thinking about things, and his ideas haven't always gotten the widespread attention they deserve.

Turns out forth isn't anything like a backwards lisp. And it doesn't really have "reverse polish notation". It's just one word after another.

In forth, each word is its own procedure. You don't pass arguments to the procedures, the way you do in most languages, though: it's got the same call stack that every other language has, but then it's got an additional stack for passing data around. So you can say:

  2 2 +

And that sure looks like reverse polish notation, but really it's just a program called "2" that pushes the number 2 onto a stack. (Well, okay, it's actually part of the lexer: whenever forth encounters a word it doesn't recognize, it attempts to interpret the word as a number, and if successful, it pushes that number onto the stack.) Then there's a program called "+" that pops two things off the stack, adds them, and pushes their sum.
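
Here's the whole trick, sketched in pascal. (A toy, obviously: a real forth keeps its dictionary as a searchable list of words, and the two hard-coded "words" below just stand in for that.)

program tinyforth;
uses
  sysutils, strutils;

var
  data : array[0..63] of longint;  { the data stack }
  sp   : integer = 0;              { stack pointer }

procedure push(n : longint);
begin
  data[sp] := n; inc(sp)
end;

function pop : longint;
begin
  dec(sp); pop := data[sp]
end;

procedure interpret(token : string);
var
  n : longint;
begin
  { a real forth would search its dictionary here }
  if token = '+' then push(pop + pop)
  else if token = '.' then writeln(pop)
  { unrecognized words fall through to the number parser }
  else if TryStrToInt(token, n) then push(n)
  else writeln('unknown word: ', token)
end;

var
  line : string;
  i    : integer;
begin
  line := '2 2 + .';  { prints 4 }
  { the entire "lexer": split on spaces, interpret each word in turn }
  for i := 1 to WordCount(line, [' ']) do
    interpret(ExtractWord(i, line, [' ']))
end.

There's no grammar anywhere in there: no precedence, no statements, no expressions. Adding new "syntax" to the language just means adding another word.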

The simplicity of forth makes it rather easy to roll your own. Many people have done so, usually starting from assembly language.

Now, you might think that starting from assembly language makes forth code rather unportable, but this is not the case at all. You write a couple simple routines to set up the dictionary of words and the two stacks, add some code to talk to the storage device (think blocks, not files), and then you make a little i/o loop. Now you've abstracted away your machine, and you just have forth, and anything else you build from that point on is portable to any other forth system that has the same dictionary.

Now, not all forths have the same dictionary. There's a saying: if you've seen one forth... you've seen one forth.

But there is an ANSI standard for forth, and there are quite a few old-time forth guys that get pretty annoyed with anyone who makes a non-standard forth. (Fortunately, Chuck Moore isn't one of those guys. He thinks ANS Forth is too complicated and has moved on to create colorforth, where you basically do your own syntax-highlighting to tell the system whether a word ought to be compiled, defined, or executed, making the "compiler" even simpler than it already was in forth. (I'm not even sure it still qualifies as a lexer anymore: it just executes each character directly, even for identifiers.))

And then there's retroforth. Now, retroforth is also built atop an assembly language, but that assembly language is for a virtual machine called ngaro, and ngaro is written in... Well, everything.

Retroforth started way back in 1998/1999 as a 16-bit x86 assembly language program. Somewhere along the line, leadership of the project passed to the guy that runs it today: Mr. Charles Childers, (though I can't help thinking of him as "@crc[x]", since that's his screen name everywhere). Back in 2006, @crc decided to port Retro to a virtual machine of his own invention, and by 2007, that idea had solidified into the ngaro vm.

Now, ngaro is a virtual computer built with forth in mind. It has no registers: only the two stacks. Each instruction is one integer. The first 33 are primitive opcodes, and any other number simply triggers a call to whatever subroutine lives at that particular offset in memory.
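
That design makes the inner loop almost embarrassingly small. Here's a sketch of the shape of it in pascal -- the opcode numbers and names are made up for illustration; see the actual ngaro documentation for the real instruction set:

program tinyvm;
const
  opNOP = 0; opLIT = 1; opADD = 2; opRET = 3;  { hypothetical opcodes }
var
  mem  : array[0..255] of longint;  { each cell holds one instruction }
  data : array[0..63] of longint;   { data stack }
  addr : array[0..63] of longint;   { address (return) stack }
  dsp, asp, ip : integer;
begin
  dsp := -1; asp := -1; ip := 0;
  while ip <= 255 do
  begin
    case mem[ip] of
      opNOP : ;
      opLIT : begin inc(ip); inc(dsp); data[dsp] := mem[ip] end;
      opADD : begin dec(dsp); data[dsp] := data[dsp] + data[dsp + 1] end;
      opRET : begin ip := addr[asp]; dec(asp) end;
    else
      { any number that isn't an opcode is a call to that address }
      inc(asp); addr[asp] := ip; ip := mem[ip] - 1
    end;
    inc(ip)
  end
end.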

But @crc didn't stop at creating an interesting virtual machine: as I said, he ported it to everything.

There are ngaro implementations in: C; C sharp; javascript; lisp; php; python; ruby; retroforth (of course); x86 assembly language; three different microcontrollers: avr (arduino!), pic, arm; plus partial implementations in fsharp, java, lua, scheme, and perl.

Someone else came along and ported it to go, adding concurrency support.

And of course, I hopped on the bandwagon and ported it to pascal.

Now, as I mentioned, I hadn't written a line of pascal code in about 15 years. Once upon a time, pascal was a contender. It was certainly the language of academia. They taught it in high schools. And for young hobby programmers, it was neck and neck with C, largely due to the incredible success of Borland's Turbo Pascal. The only other choice for a high school kid in those days was GW-BASIC, then later QBASIC, both of which shipped with MS-DOS, and had the advantage of being "free". I have fond memories of both, but Turbo Pascal was what shaped me as a programmer.

Turbo Pascal lives on today, but your machine may not run it. You can download Turbo Pascal 5.5 for free from embarcadero.com. It's a 16-bit DOS program, so unless you happen to be on a 32-bit version of Windows (XP or earlier), you'll need dosbox or a virtual machine running freedos to run it.

It's worth taking a look. Version 5.5 was not the last release of turbo pascal, and wasn't the best. Syntax highlighting didn't show up until version 7.0, and version 6.0 is the one where they introduced Turbo Vision, their text-based, object-oriented user interface framework.

Turbo pascal 5.5 had none of that stuff, and that's precisely why it mattered. It shipped with some examples, and 8 units: System, Graph, Crt, Printer, Graph3, Overlay, Turbo3, and Dos.

Only three of those mattered: CRT let you write to the (16-color, 80x25 character) screen, DOS was for getting at files and directories, and System had the basic pascal words for you. Graph was of course for graphics, but in those days, if you wanted to do graphics, you were better off writing your routines in assembly language (which could be freely embedded in and mixed with turbo pascal).

When I say turbo pascal 5.5 was the one that mattered, it's because it didn't come with any preconceptions. It did come with three books. One was a users guide. One was a reference which basically duplicated the help text. Those two big books pretty much came straight out of TP 5.0... And then, again, version 5.5 added the one that mattered: the comparatively tiny Turbo Pascal 5.5 Object Oriented Programming Guide (PDF).

See, turbo pascal 5.5 added objects... Which are really just normal records (structs if you're a C guy) with the added bonus of type extension (subtypes).
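
In case you never saw one, here's roughly what they looked like (my own toy example, not anything from the borland docs):

program oopdemo;
type
  TShape = object
    x, y : integer;
    procedure Draw;
  end;

  TCircle = object(TShape)  { type extension: a TCircle is a TShape plus more }
    radius : integer;
    procedure Draw;
  end;

procedure TShape.Draw;
begin
  writeln('shape at ', x, ',', y)
end;

procedure TCircle.Draw;
begin
  writeln('circle at ', x, ',', y, ', radius ', radius)
end;

var
  c : TCircle;
begin
  c.x := 1; c.y := 2; c.radius := 5;
  c.Draw  { prints: circle at 1,2, radius 5 }
end.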

So in a way, Turbo Pascal 5.5 was quite a bit like Oberon. It didn't have the automatic memory management stuff, but memory management in pascal is nowhere near as awful as memory management in C, and quite often you could get by without using pointers anyway.

This was all before the web took off, or even before internet service providers.

Instead of going to reddit or blogger or google, you'd dial a local number to ring up a Bulletin Board System - usually running on someone's home computer, hooked up to their second phone line. You could upload and download files from them, and also read and answer messages. (But you were tying up someone else's phone line, remember, so it was kinder if you downloaded the messages to read and reply to offline. Of course you were also tying up your own phone line, and if your parents or siblings picked up the phone, they'd get an earful of modem squawks and you'd get something that looked like perl all over your screen.)

Shareware spread that way: Games like DOOM, DESCENT, Commander Keen, Kingdom of Kroz, and ZZT. [ The developers of these classics, having created their masterpieces in the 90's, now tragically waste their talents on such no-name games as "Quake," "Unreal," "Max Payne," and "Prey"... DOOM, of course, quickly faded into obscurity. ;) ]

There were apps, too, of course, like DIRMAGIC (for browsing your files), BLUEWAVE (for reading your BBS messages), THEDRAW (for making ANSI art), RIPTerm (for dialing in - though I can't remember if this is actually the one I used). And if you wanted to run your own bulletin board, you could download WildCat! or Renegade, or a bunch of others.

If you had a sound card for your MS-DOS box (or if you were lucky enough to own an Amiga) then you could download a tracker and make your own music. Aside from the amiga, computers didn't have nice polyphonic onboard sound in those days. You know that beep you get when you turn on a PC? You can make it beep at any frequency you want. That was what DOS music sounded like. There were a few cool DOS apps like PIANOMAN that let you simulate multiple voices by switching between frequencies very quickly, but Amigas had onboard polyphonic sound. MOD files and trackers came out of the Amiga world. They let you program all the sound channels simultaneously (one per "track") in a music notation that looked quite a bit like assembly language.

Anyway, I didn't mean to start waxing all nostalgic there. But I'd guess at least half of those programs that you could download off of the bulletin boards were written in Turbo Pascal.

Sometimes, people would even post their code. Looking through my old stuff, I found some ARC and ARJ files (like ZIP files, before PKZIP took over the known universe) with code for programming the speaker and the sound blaster, plus assembly code for tweaking the DOS console by making the font a bit shorter and cramming an extra 3 lines onto the screen, or even replacing the font altogether. (I made my own font back then! I saw it for the first time in years a few days ago, thanks to the setfont utility that ships with debian. Maybe I'll get it running on a virtual machine and take a screenshot soon.)

Also, lots and lots of "demos". These were little animations that people made to show off their LEET SKILLZ. That's kind of a joke these days, maybe. "Script Kiddie" is sort of an insult you throw at guys that are just using canned tools to cause trouble. These guys really knew what they were doing. They'd disassemble your EXE, toss out your copy protection, and stick their own little message up in front of your title screen. (Yeah, you could download things that weren't exactly legal back then, and I can't honestly say I didn't know better.)

BUT: the demo scene sort of turned into its own little subculture, just like the ANSI art scene. Both of these scenes made an art form out of severely constrained working environments: the ansi guys made all kinds of amazing art with sixteen colors and 256 characters, and the demo scene guys were trying to cram these amazing procedurally-generated animations into tiny, tiny footprints. And sometimes they'd share their source code, or parts of it, and you could find it in the download area of a BBS you'd dial into. (Especially, once you found where FidoNet and the like had taken hold in your area code.)

So that's what the online world was like back in the late 90's -- at least for me as a teenager in north Texas -- and so that's how I learned to program: a little from the OOP guide, a little from code I found on the boards, a little from the few programming books I could get my hands on, but mostly from sitting at my computer writing code in turbo pascal, pressing control-f9 and letting the integrated debugger tell me what to fix.

I don't remember spending much time with Borland's example code. I think it was too far beyond my skill level at the time, and I didn't know anything about how to go about understanding other people's code.

So: I did my own thing. I did a lot of reinventing the wheel in those days, because there were a whole lot of wheels that I didn't know had already been invented.

There were no standard user interfaces back then (which is why it makes perfect sense for Oberon to be so weird). Each of those DOS apps, for example, had its own unique user interface. I'm sure some people read the manual, but what I'd do is log in and press all the keyboard buttons to see what happened.

Pretty much everything in the modern standard GUI user interface sprang from the mind of Douglas Engelbart, got redesigned and reimagined by Alan Kay in his Smalltalk system at Xerox PARC, where Dan Ingalls most likely implemented it and Adele Goldberg explained it to the rest of us... Then Steve Jobs showed up, and he handed those ideas to guys like Jef Raskin and Bruce Tognazzini, and those guys had this crazy idea for a standard user experience.

Now, of course, all the cool kids know that MACS SUCK AND DOS RULES 4EVER! But it turns out Bill Gates was secretly a lame Mac guy, and of course he came up with this crazy idea to make DOS look like a mac. (Except, you know, with color screens.)

Probably the writing was already on the wall for Turbo Pascal. As I mentioned, Borland put out TP6.0 with a standardized user interface toolkit called Turbo Vision. When TP7.0 rolled around, MS Windows was already starting to take hold... And by the time windows 95 came around, Borland had killed off Turbo Pascal, and left this windows-oriented thing called Delphi in its place.

At this point, Borland was essentially in the business of competing with Microsoft, when Microsoft was in the business of taking over the world. Borland took a beating, which is why you don't see borland today. If I recall, it became Inprise, then Borland again, then CodeGear, and now Embarcadero. Delphi lives on, but it's a high end, big dollar product, not the $35 awesomeware you could pick up at your local Babbage's. (Oh come on. You know the place I'm talking about: Babbage's.)

Anyhow, with Turbo Pascal 5.5, I had objects and an IDE with a rapid feedback cycle, a few books and tutorials, a whole lot of spare time, and direct access to the screen and the speaker (eventually the sound card). But I didn't have a big solid framework to learn from, so I did my own thing.

I read an article somewhere about how to use pointers to make linked lists, and linked lists to make a menu system. So I started using pointers and made a menu system.

One day I made a mistake and wrote some data to an un-initialized pointer, and some data went to the wrong place in memory, and it must have overwritten something important, because I pressed control-f9 and destroyed my hard drive and lost everything I had on my computer.
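
The shape of the bug was something like this (a hypothetical reconstruction; mercifully, the original is long gone):

program crashme;
type
  PNode = ^TNode;
  TNode = record
    caption : string[40];
    next    : PNode;
  end;
var
  node : PNode;  { points at whatever garbage happens to be in memory }
begin
  { the bug: no call to new(node) before writing through it }
  node^.caption := 'File';  { scribbles over some random address }
  node^.next := nil
end.

Under real-mode DOS there was no memory protection, so a write through a wild pointer could land anywhere -- including, apparently, in something the disk cared about.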

I knew a lot about recovering files from a hard drive or a corrupted floppy disk. But whatever I did that day, it was way beyond my ability to repair, and after many frustrating hours, I reluctantly gave up on the problem, and typed the saddest of all the DOS commands, FORMAT C:

I lost most of the novel I'd been writing since eighth grade. I lost my little side-scrolling space shooter, "MR.DOOM>" (the letters "MR.DOOM>" were a spaceship that flew around the screen and shot things. Basically defender.) I lost my epic video game, DOTH: Quest for an Empire (which was mostly a cool map I'd drawn, along with a bunch of graphic tiles that would actually pan around in a graphic window). I lost my cool little cryptogram program. I lost my fancy new menu system with its stupid un-initialized pointers. I'm sure there were all kinds of interesting little experiments on that box that I'd written as I learned, but of course I lost them too.

So that was the day I learned to make backups. :)

Still: good times. The year or so that I had to myself, with direct control over my computer with no steady guidance except the rules of structured programming hard-coded into the pascal grammar -- this was the part of my life where I became a good programmer.

I asked to be in the programming class in school. They said I had to take some other thing first. One day I showed DOTH: Quest for an Empire to my math teacher. Her reaction was something along the lines of: why didn't you tell anyone you could already do that? So my senior year, I got to take both of the programming classes simultaneously. I just sat in the BASIC class, did the QBASIC work with everybody else, and then did whatever the pascal work was by myself.

Now, if you haven't gotten the picture yet, I've always been kind of a loner. I'm still pretty quiet in person, but when I sit down to write, the words just come pouring out.

I didn't know I had ADHD back then. I hadn't been shown yet that I was kind of a decent teacher. And -- you can believe this or not -- I graduated high school having no earthly idea whatsoever that there were grown-ups who got paid to write software.

I always just kind of assumed I'd be a novelist.

Well, all of that was a long time ago, and even if I wasn't the most clued-in kid, I did come out of my shell a bit my senior year.

My junior year, before the hard drive crash, I got to be in this new class called "research and development", which was run by the old shop teacher, who now had a room full of macs, and this new guy they brought in, who had a room full of MS-DOS boxes. (This is how I learned that MACS SUCK AND DOS RULES 4EVER.)

Having taken the class my junior year, I was allowed back my senior year, and allowed to do pretty much my own thing most of the time. As far as I can recall, I never walked into the mac lab at all the second time around, and (after I showed the turbo pascal license to Mr. Burson) I was allowed to bring it into the class.

Now, my hard drive crash happened the summer in between, and I had the 3.5" floppy from junior year, so a handful of old copies of my stuff survived. So I was rebuilding.

Somehow, over the summer, I acquired a copy of Turbo Pascal 6.0.

I know exactly where I got TP5.5: it was at the computer store next to the BX on Carswell Air Force Base (now some joint military base whose name I forget). I'm pretty sure it was around Christmas time, and we were standing in what amounts to a room filled with video games, and I can only assume they were trying to figure out which one I wanted. I guess they caught on pretty quickly that it wasn't some adventure game about speedy french mathematicians, but I asked them for it, and they bought it for me, and that's how I got turbo pascal 5.5.

I don't have any idea where I got TP6.0. I'd like to think it was a birthday present, but it's just as plausible that I snagged it off a BBS. I didn't even remember that I'd used it, until I dug through my old disks and found it.

I didn't start keeping a written journal until shortly after school started senior year, and I didn't talk too much about programming, but after digging up and sorting through my old turbo pascal source code, I started to piece together what happened.

Clearly I destroyed my hard drive in the summer time, and returned to school already starting to rebuild. I had menus on my mind.

Since I had turbo pascal 6.0, I'd already encountered Turbo Vision, but I didn't understand it. TV was an event-based UI framework, just like every other UI framework that I've encountered... Except for two: the clunky one I made, and the awesome one Drs. Wirth and Gutknecht put together for Oberon.

Let me show you what I mean, because the idea that you can have a responsive widget-based UI system without events seems so alien to me today that it took me a long time to see it in Oberon, and an even longer time to see that I'd done something like it back in high school, when I rejected turbo vision and did my own thing, because I'd spent at least two years programming the way turbo pascal wanted me to program, and it sure as heck did not look anything like that.

Here is an example from pages 94-95 of the Project Oberon PDF. In the book, it's a block of code, followed by line numbers with explanations. Rather than quoting, I'm paraphrasing their explanations as oberon-like pseudocode just to show the outline of how it works. This is describing the code that powers the little text-editor style windows, along with their scrollbars:


(* This procedure is called because the mouse is over the widget.
 * It could have been passed in from an event loop, but read on
 * and you'll see that's not how Oberon works.
 *)
PROCEDURE Edit( 
  self : Frame;   (* the textarea / window thing *)
  x, y : INT;     (* mouse coordinates *)
  Keys : SET      (* they call mouse buttons "keys" *)
                  (* remember, sets are like bitmasks *)
); 

  (* those were parameters. now we're on to local variables: *)
  VAR
    keysum : SET; (* for detecting mouse button chords / interclicks *)

BEGIN 
  PollMouseHardware(x, y, Keys); (* procedures can update VAR parameters *)
  IF InsideScrollbar( x, y ) THEN
    IF (Right IN Keys) OR (Middle IN Keys) THEN 
      (* 
       * Okay, so we know one of those buttons is pressed while
       * the mouse is over the scroll bar.
       *
       * We'll keep polling the mouse until all buttons are released.
       * This is the only thread in the system, so everything else 
       * is blocked while you're dragging the scroll bar marker.
       * 
       * Unless, of course, there's a callback mechanism tucked
       * away in the polling procedure, and those callbacks run 
       * really really fast, like in a conventional event loop.
       * 
       * So... Is there??! It doesn't matter. There's no GOTO or 
       * CALL/CC, so we know with near certainty that we'll
       * wind up right back here when it's done.
       *
       * In other words, your code doesn't know and doesn't 
       * care what the mouse polling procedure does. You write
       * your code as a straightforward imperative loop and
       * there isn't an "event" object anywhere in sight.
       * 
       *)
      keysum := Keys;
      REPEAT
        PollMouseHardware(x, y, Keys);
        keysum := keysum + Keys;  (* set union. "|=" in c/c++ *)
      UNTIL Keys = { }; (* i.e., the set of pressed keys is empty. *)
      «check keysum to see which buttons were interclicked, and act on the combination»
    ELSIF (Left IN Keys) THEN
      «left-click next to a line: it scrolls to top»       
    END
  ELSE (* Mouse must be inside the edit area, not the scrollbar *)
    «select/cut/copy/paste stuff»     
    «hyperlink stuff»     
    «style/formatting stuff» 
  END;
END Edit;

Now, that right there is some alien-looking code if you know anything at all about event loops.

Dr. Wirth's programming style is a little more cryptic and mathematical than I like, but the way he takes the basic concepts of structured, imperative programming, and does things like this... It's astounding to me.

The Oberon language may not be your cup of tea. I've tried it, and I like it, but it's not the language I want to write my code in for the rest of my life.

When that game jam came along, a guy showed up in my IRC channel late on the first night, and I told him about retroforth, and then I went to bed. When I woke up, he was there, and he'd spent the night learning about retroforth.

I told him some of this stuff... Like about the gestalt problem and why trying to wrap your head around someone else's ideas is tough, because they're fully formed. (And hard to serialize, which is what causes rambling!)

I remember telling him that even though @crc wasn't the guy that started retroforth, there was no question it was his. Because retro, and the ngaro vm, and parable (a new forthlike he's building, also atop ngaro)... They all work the way that he sees the world.

Pascal works the way the young Niklaus Wirth saw the world. Modula and Modula 2 and the oberon of the book work the way he saw things back then. Now he's officially retired, but he's still writing code, and he doesn't need loops with tests in the middle anymore, or the friendlier object-oriented stuff that crept into Oberon-2.

They say that mathematics is a young man's game. All the great math papers seem to be written by guys in their twenties. But the great minds of computer science don't play by those rules. Somehow, they seem to just keep getting better.

We've lost some good ones. John Backus (Fortran, ALGOL, and the guy who kicked off functional programming) died in 2007. John McCarthy (LISP) died last October. Kenneth Iverson was already thinking about APL back in world war 2, long before he ever saw a computer - perhaps before a computer had ever been built. He was still reinventing it (as J) when he died in 2004. In 2003, we lost E.F. Codd, who showed us what a relational database was, and then kept on showing us when we used SQL instead.

Then there was this guy, Manfred von Thun, who you maybe haven't heard so much about. He was a professor of logic down in Australia who wound up teaching computing, and over the years his course evolved into a neat little book about symbolic programming in pascal, which is just a bunch of boring old parser-like stuff that everybody knows, right up until the part in the middle where you're reading about higher order recursion and wondering whether you ever knew anything about programming at all before.

He also invented a sort of functional analogue to forth, called joy... Which has since influenced languages like Factor, and retroforth, and you can even see them talking about his ideas over on the forth newsgroups, even if they have no idea who he is.

I decided to email Dr von Thun back in May, to ask about reviving his old Pascal book. So I looked around to see what he was up to these days. The answer kind of shook me, and not knowing what else to do, I wrote a small obituary for him instead.

That one still kind of gets to me, and I don't think I'd even heard of the guy a week or two before. Or perhaps that's why...

Anyway... :/

It's not so much that these guys I'm talking about (young, old, famous or not) made programming languages. That was just a byproduct of what I admire them for. I'd put Richard Stallman in the same box, for the GPL, even though I kind of go out of my way to avoid letting that license near my own code these days... Though I suppose Stallman also created emacs lisp, which (no matter what the lisp people tell you) is the best lisp simply by virtue of the fact that it's the only one with an emacs attached.

The thing these great minds had in common is they dared to be what George I-don't-need-no-stinking-knighthood Bernard Shaw called an unreasonable man:

“The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.”


Shaw of course needed no language but English to leave his mark on the world, but even he couldn't resist the urge to remake language. He was a longtime advocate of spelling reform and the creation of a phonetic alphabet for English. His will even provided funding for a contest to create a new alphabet.

(If you've taken the time to read this far, perhaps you've encountered the writings of another guy with the last name Shaw. I don't know much about Zed, but the world brings me to his website every once in a while, and I sure know what those symbols mean.)

Now, there are lots of problems for which reshaping language is a perfectly reasonable solution, especially when you're talking about computer languages - which needn't be programming languages: think of all the file formats for storing different kinds of data.

There are all kinds of tools to help you with such tasks. If you don't have one handy already, Terence Parr created an awesome meta-language called ANTLR for making parsers. Alessandro Warth made a programming language called OMeta that's basically all about making programming languages.

But make no mistake: these are also the tools of the unreasonable men (and women!) who won't settle for adapting themselves to the world.

The programmer who decides to make a new programming language is a programmer who has decided to make enemies.

Not only are you saying that what they're doing isn't good enough; your language is one more thing they may wind up having to learn to keep up with the job market.

Once they see how you've made their life easier, they may forgive you and even take your side. But right up until you're part of the solution, you're part of the problem, or at least, that's likely how it's going to feel when the angry internet mob shows up in your inbox.

And yet, I know all those guys with those crazy languages have something very few people have: a language that fits their brain exactly, controlling a machine that's capable of doing almost anything you ask of it, if only you can find the words.

So anyway.

Yesterday I started writing a parser.

I was going to tell you about it, but that was about sixteen hours ago, and it seems I decided to write some sort of manifesto / memoir / too-long-didn't-read lesson on the history of computer science as I see it instead.

I guess it turns out I'm kind of an unreasonable guy myself, and not all that great at adapting to the world around me anyway.

So tomorrow I'm going to get back to work on my parser, because that's one of the things you can do when you're being unreasonable and you're ready to start adapting the world to yourself.



Some Oberon Links:
  • Lukas Mathis wrote a nice article in 2009 about The Oberon User Interface, discussing its history and intellectual origins. (Hint: Wirth and Gutknecht had both spent a year at Xerox PARC.)
  • You can try it out yourself via Bluebottle/A2 - a modern GUI (or perhaps ZUI: it scales on an infinite canvas) operating system written in a dialect of Oberon the language, which contains a distribution of Oberon the system. (It'll run as an app on a host operating system in addition to bare hardware.)
  • Dr. Mike Spivey over at Oxford has implemented the open source and copyfree Oxford Oberon-2 Compiler, which provides the "In" and "Out" modules, if you want to try Oberon in a more traditional environment. (His compiler also allows lowercase keywords and relaxes the rules on the ordering of procedures.)
  • Dr. Wirth's home page contains a bunch of papers about Oberon and his RISC-based processor. Also his Compiler Construction book, in which he details a compiler written for a subset of Oberon that targets the RISC machine.

1 comment:

  1. Thanks, man! I've enjoyed reminiscing with you of the old days of Turbo Pascal and Turbo Vision, and admiring the Oberon system!
