Friday, October 12, 2012

never look back : forward only compilation in b4


What could society, backing up into its future, with eyes fixed only on the ever-receding and less adequate securities of yesterday, do to make this evolutionary process a gratifying rather than a painful experience? 
 -- Buckminster Fuller, Utopia or Oblivion 

Lately, I have been working on an assembler for a small programming language.

It's not meant to be a great language, or even a useful one for most tasks. It's only meant to be used for bootstrapping other systems, and to be as easy as possible to understand.

In particular, I want the code to be arranged very linearly, so that someone reading how it works could follow along and implement it for themselves in a single pass, without having to go back and change things they'd already done, or even to leave blanks that need to be filled in later.

Discussions in #LPMC ( reddit's LearnProgramming mentoring community on irc.freenode.net ) lead me to realize that the way my assembler works is quite like the way instructions work for knitting or crochet. ( Now that I think about it, music notation works the same way, as does the run-length encoding for GIF files )

As JFredett pointed out, these essentially consist of nothing but FOR loops: sections can repeat, but always a fixed number of times. There's never a WHILE x DO ... END or a REPEAT ... UNTIL x; loop.

FOR loops can be replaced just by cutting and pasting the code inside over and over ( at least, they can when the start and end points are constant, which is the case here ), but WHILE and REPEAT both requite a conditional jump.

The difference between WHILE and REPEAT is that with WHILE, the conditional jump is at the beginning, and with REPEAT, the conditional jump is at the end.

Interestingly, when a FOR loop's endpoints are constants, it can be implemented as a REPEAT loop, but when either endpoint is variable, it can only be implemented as a WHILE loop. This is because, in { FOR x := a TO b } you must account for the possibility that A >= B to begin with.

A second way to think about this is that WHILE loops happen 0 or more times and REPEAT loops happen 1 or more times.

A third way to think about this is that in a WHILE loop, you need two jumps: a conditional forward jump at the top, and an unconditional backward jump at the bottom. In a REPEAT loop, you only have the conditional backward jump to attend to.

The reason a constant FOR loop like { FOR x := 1 TO 10 } can be replaced entirely with copies of the loop body ( assuming the loop body doesn't modify x or contain a GOTO (not even BREAK/CONTINUE, which are weak GOTOs )) is because it maps to { x := 1 REPEAT loop_body; INC( x ); UNTIL x = 10; } and the condition is guaranteed to be true at some point. Thus the test could be done at compile time, and the compiler could simply lay out 10 copies of loop_body, eliminating the loop altogether.

All of the languages I've mentioned ( crochet, music notation, GIF files, and my assembler ) are purely sequential languages. They contain no recursion and no conditionals, thus they always terminate and always produce the same output, and therefore can't be used for computation.

Anyway, here is the question I've been thinking about : how can you assemble a program that contains conditions when the language you're writing it in is not capable of moving backward?

That is, in order for my assembler to tell a computer to jump forward, it needs to be able to specify a destination.

One trick is to say: Okay, I'm going to build a program for you that contains way fewer than N instructions. Then, in the very first instruction, I can say "GOTO N".

By the time I have finished assembling, I'll have a better idea where you should really start, so I will commit to leaving a second GOTO instruction at N, pointing you to the real starting point of the program.

If my assembler had access to a stack, I could use this trick for all forward jump instructions: the first jump target goes at N, the second jump target goes at N-W*2, where W is the size of a GOTO instruction. Then the third jump goes at N-2(W*2) and the fourth at N-3(W*2) ... Or in other words, at the J'th jump, you really jump to N-J*W*2, and it has the "proper" address.

I like this scheme. It's a perfectly good scheme for demonstrating what you can do when you have jumps and access to RAM but no conditionals. ( You can't have conditionals with this scheme, because that might allow the jumps to go in a different order. )

However, I don't want my assembler to have access to a stack, or to any memory at all. I'm willing to let it "remember" the one jump at the end, because that particular forward jump could be accomplished just by having the interpreter start at the end of RAM and work backwards until it encounters an instruction. In other words, it's just syntactic sugar.

I cannot think of any other way to do forward jumps without the ability to remember either the position of the jump ( so I can go back and fill it in ) or its order ( so I can leave the targets in order at the end ).

However, if the interpreter itself has access to one bit of internal state, it is possible to implement conditionals without a forward jump.

The bit acts like a switch. When the switch is on, the instruction pointer moves forward normally, executing each instruction. When the switch is off, the pointer moves forward, ignoring all instructions except for one: the instruction to turn the switch back on.

A second bit can be used to control the direction of the pointer, allowing backward jumps.

Since there's not really any use in execution while moving backwards, we can use the first switch to mean something else, giving us four states:

00 - execute forward
01 - ignore forward until marker
10 - ignore backward until marker
11 - ( something else )

If the interpreter is on a track and moves on its own, this is how you want to arrange things.

But suppose the interpreter isn't on a track.

So far, we have talked only about the first three of the four fundamental elements of computation:

  - sequence
  - recursion ( implied in the use of stacks + goto )
  - conditions

The fourth fundamental element of computation is concurrency.

Once there are other processors, no longer need the long track of instructions. You can just put all the processors next to each other, and let them pass messages back and forth.

Maybe some of them have long tapes of instructions inside. Maybe not. It doesn't matter. ( This is called encapsulation. )

But suppose some of the machines DO follow instructions taken from the inside... How will they know to stop doing that and check for messages?

Well, one way is to just assume they were all programmed that way before they were stuck together, so that at some point on their internal tape there's a recurring instruction that says "okay, stop and check for input".

This is fine, unless you're in a hurry, because you never know when a computer is going to stop.

( You've probably encountered this situation in real life on your own computer. Often when it's happening, the operating system makes your mouse cursor look like a little hourglass or a spinning color ball. )

So the other alternative is to have an execution state that means, "stop whatever you're doing, and do what I tell you instead."

To recap:

We can make our assembler purely sequential and forward moving by giving our interpreter(s) the ability to slide forward and backward without executing instructions.

If we also give them the ability to execute instructions directly from input, then we can reprogram them dynamically, in real time.

This implies that, provided they have some means of communication, they can reprogram each other dynamically, in real time... But that's a story for another day.






No comments:

Post a Comment