IF
statements.
This subject is dear to my heart, because it represents a turning point for me. I had been playing with the parsing of expressions, just as we have done in this series, but I still felt that I was a long way from being able to handle a complete language. After all, real languages have branches and loops and subroutines and all that. Perhaps you've shared some of the same thoughts. Awhile back, though, I had to produce control constructs for a structured assembly preprocessor I was writing. Imagine my surprise to discover that it was far easier than the expression parsing I had already been through. I remember thinking, "Hey! This is easy!" After we've finished this session, I'll bet you'll be thinking so, too.
IF
, 'w' for WHILE
,
etc. But it helps us get the concepts down pat without fussing
over lexical scanning. Fear not ... eventually we'll see
something looking like "real" code.
I also don't want to have us get bogged down in dealing with statements other than branches, such as the assignment statements we've been working on. We've already demonstrated that we can handle them, so there's no point carrying them around as excess baggage during this exercise. So what I'll do instead is to use an anonymous statement, "other", to take the place of the non-control statements and serve as a place-holder for them. We have to generate some kind of object code for them (we're back into compiling, not interpretation), so for want of anything else I'll just echo the character input.
OK, then, starting with yet another copy of the cradle, let's define the word:
-- ------------------------------------------------------------ -- Recognize and translate an "other" : other ( -- ) getname LOCAL s 'OF s 1 emitln ; -- ------------------------------------------------------------
Now include a call to it in the main program, thus:
-- ------------------------------------------------------------ : main ( -- ) init other ; -- ------------------------------------------------------------
Run the program and see what you get (chap5a.frt). Not very exciting, is it? But hang in there, it's a start, and things will get better.
The first thing we need is the ability to deal with more than one statement, since a single-line branch is pretty limited. We did that in the last session on interpreting, but this time let's get a little more formal. Consider the following BNF:
<program> ::= <block> END <block> ::= [ <statement> ]*
This says that, for our purposes here, a program is defined as a
block, followed by an END
statement. A block, in turn, consists
of zero or more statements. We only have one kind of statement,
so far.
What signals the end of a block? It's simply any construct that
isn't an "other" statement. For now, that means only the END
statement.
Armed with these ideas, we can proceed to build up our parser. The code for a program is:
-- ------------------------------------------------------------ -- Parse and translate a program : doprogram ( -- ) block Look 'e' <> IF S" End" expected ENDIF S" END" emitln ; -- ------------------------------------------------------------
Notice that I've arranged to emit an "END" command to the assembler, which sort of punctuates the output code, and makes sense considering that we're parsing a complete program here.
The code for Block
is:
-- ------------------------------------------------------------ -- Recognize and translate a statement block : block ( -- ) BEGIN Look 'e' <> WHILE other REPEAT ; -- ------------------------------------------------------------
(From the form of the word, you just know we're going to be adding to it in a bit!)
OK, enter these routines into your program. Replace the call to
Block
in the main word, by a call to DoProgram
. Now try it
and see how it works (chap5b.frt).
Well, it's still not much, but we're getting closer.
Pascal
or C
. For example, the Pascal
syntax for an IF
is:
IF <condition> THEN <statement>
(where the statement, of course, may be compound).
The C
version is similar:
IF ( <condition> ) <statement>Instead, I'll be using something that looks more like
Ada
:
IF <condition> <block> ENDIF
In other words, the IF
construct has a specific termination
symbol. This avoids the dangling-else of Pascal
and C
and also
precludes the need for the brackets {} or begin-end. The syntax
I'm showing you here, in fact, is that of the language KISS
that
I'll be detailing in later installments. The other constructs
will also be slightly different. That shouldn't be a real
problem for you. Once you see how it's done, you'll realize that
it really doesn't matter so much which specific syntax is
involved. Once the syntax is defined, turning it into code is
straightforward.
Now, all of the constructs we'll be dealing with here involve
transfer of control, which at the assembly-language level means
conditional and/or unconditional branches. For example, the
simple IF
statement
IF <condition> A ENDIF B ....must get translated into
Branch if NOT condition to L A L: B ...
It's clear, then, that we're going to need some more words
to help us deal with these branches. I've defined two of them
below. The word NewLabel
generates unique labels. This is done
via the simple expedient of calling every label '@@nn', where nn
is a label number starting from one. Procedure PostLabel
just
outputs the labels at the proper place.
Here are the two routines:
-- ------------------------------------------------------------ -- Generate a unique label : newlabel ( -- c-addr u ) S" @@" Lcount U>D <# #S #> $+ 1 +TO Lcount ; -- ------------------------------------------------------------ -- Post a label to output : postlabel ( c-addr u -- ) CR TYPE ':' EMIT ; -- ------------------------------------------------------------
Notice that we've added a new global variable, Lcount
, so you
need to change the declarations at the top of the program to
look like this:
0 VALUE Lcount -- label counter
Also, add the following extra initialization to Init
:
: init ( -- ) init CLEAR Lcount ;
(Don't forget that, or your labels can look really strange!)
At this point I'd also like to show you a new kind of notation.
If you compare the form of the IF
statement above with the assembly
code that must be produced, you can see that there are
certain actions associated with each of the keywords in the
statement:
IF: First, get the condition and issue the code for it. Then, create a unique label and emit a branch if false. ENDIF: Emit the label.
These actions can be shown very concisely if we write the syntax this way:
IF <condition> { Condition L = NewLabel Emit(Branch False to L) } <block> ENDIF { PostLabel(L) }
This is an example of syntax-directed translation. We've been doing it all along ... we've just never written it down this way before. The stuff in curly brackets represents the actions to be taken. The nice part about this representation is that it not only shows what we have to recognize, but also the actions we have to perform, and in which order. Once we have this syntax, the code almost writes itself.
About the only thing left to do is to be a bit more specific about what we mean by "Branch if false."
I'm assuming that there will be code executed for <condition> that will perform Boolean algebra and compute some result. It should also set the condition flags corresponding to that result. Now, the usual convention for a Boolean variable is to let 0 represent "false," and anything else (some use -1, some 1) represent "true."
On the x86 the condition flags are set whenever any data is calculated. If the data is a 0 (corresponding to a false condition, remember), the zero flag will be set. The code for "Branch on zero" is JE. So for our purposes here,
JE <=> Branch if false JNE <=> Branch if true
It's the nature of the beast that most of the branches we see will be JE's ... we'll be branching around the code that's supposed to be executed when the condition is true.
IF
-statement parser. In fact, we've almost
already done it! As usual, I'll be using our single-character
approach, with the character 'i' for IF
, and 'e' for ENDIF
(as
well as END
... that dual nature causes no confusion). I'll
also, for now, skip completely the character for the branch
condition, which we still have to define.
The code for DoIf
is:
-- ------------------------------------------------------------
-- Recognize and translate an IF
construct
DEFER block
: doIF ( -- )
'i' match
newlabel $TEMP L
condition
L S" offset NEAR je," $+ emitln
block
'e' match
L postlabel
L $FREE ;
-- ------------------------------------------------------------
Add this routine to your program, and change Block
to reference
it as follows:
-- ------------------------------------------------------------ -- Recognize and translate a statement block :NONAME ( -- ) BEGIN Look 'e' <> WHILE CASE Look 'i' OF doIF ENDOF other ENDCASE REPEAT ; IS block -- ------------------------------------------------------------
Notice the reference to word Condition
. Eventually, we'll
write a routine that can parse and translate any Boolean condition
we care to give it. But that's a whole installment by
itself (the next one, in fact). For now, let's just make it a
dummy that emits some text. Write the following routine:
-- ------------------------------------------------------------ -- Parse and translate a boolean condition -- This version is a dummy : condition ( -- ) S" <condition>" emitln ; -- ------------------------------------------------------------
Insert this word in your program just before DoIf
. Now run
the program (chap5c.frt). Try a string like
aibece
As you can see, the parser seems to recognize the construct and
inserts the object code at the right places. Now try a set of
nested IF
's, like
aibicedefe
It's starting to look real, eh?
Now that we have the general idea (and the tools such as the
notation and the words NewLabel
and PostLabel
), it's a piece
of cake to extend the parser to include other constructs. The
first (and also one of the trickiest) is to add the ELSE
clause
to IF
. The BNF is
IF <condition> <block> [ ELSE <block>] ENDIF
The tricky part arises simply because there is an optional part, which doesn't occur in the other constructs.
The corresponding output code should be
<condition> L1 offset NEAR je, <block> L2 offset NEAR jmp, L1: <block> L2: ...
This leads us to the following syntax-directed translation:
IF <condition> { L1 = NewLabel L2 = NewLabel Emit(L1 offset NEAR je,) } <block> ELSE { Emit(L2 offset NEAR jmp,) PostLabel(L1) } <block> ENDIF { PostLabel(L2) }
Comparing this with the case for an ELSE
-less IF
gives us a clue
as to how to handle both situations. The code below does it.
(Note that I use an 'l' for the ELSE
, since 'e' is otherwise
occupied):
-- ------------------------------------------------------------ -- Recognize and translate an IF construct : doIF ( -- ) 0 0 LOCALS| L2 L1 | 'i' match condition newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP L1 COUNT DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP L1 COUNT S" offset NEAR je," $+ emitln block Look 'l' = IF 'l' match L2 FREE ?ALLOCATE newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP L2 COUNT S" offset NEAR jmp," $+ emitln L1 COUNT postlabel block ENDIF 'e' match L2 COUNT postlabel L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- ------------------------------------------------------------
(In addition to this, modify block to not only exit-test for 'e', but also for 'l').
There you have it. A complete IF
parser/translator, in 19 lines
of code.
Give it a try now (chap5d.frt). Try something like
aiblcede
Did it work? Now, just to be sure we haven't broken the ELSE
-less case, try
aibece
Now try some nested IF
's. Try anything you like, including some
badly formed statements. Just remember that 'e' is not a legal
"other" statement.
WHILE
statement is
WHILE <condition> <block> ENDWHILE
I know, I know, we don't really need separate kinds of terminators
for each construct ... you can see that by the fact that
in our one-character version, 'e' is used for all of them. But I
also remember many debugging sessions in Pascal
, trying to track
down a wayward END
that the compiler obviously thought I meant to
put somewhere else. It's been my experience that specific and
unique keywords, although they add to the vocabulary of the
language, give a bit of error-checking that is worth the extra
work for the compiler writer.
Now, consider what the WHILE
should be translated into. It
should be:
L1: <condition> L2 offset NEAR je, <block> L1 offset NEAR jmp, L2:
As before, comparing the two representations gives us the actions needed at each point.
WHILE { L1 = NewLabel PostLabel(L1) } <condition> { Emit(L2 offset NEAR je,) } <block> ENDWHILE { Emit(L1 offset NEAR jmp,) PostLabel(L2) }
The code follows immediately from the syntax:
-- ------------------------------------------------------------ -- Parse and translate a while statement : doWHILE ( -- ) 0 0 LOCALS| L2 L1 | 'w' match newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP L1 COUNT postlabel condition L2 COUNT S" offset NEAR je," $+ emitln block 'e' match L1 COUNT S" offset NEAR jmp," $+ emitln L2 COUNT postlabel L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- ------------------------------------------------------------
Since we've got a new statement, we have to add a call to it
within the word Block
:
-- ------------------------------------------------------------ -- Recognize and translate a statement block : block ( -- ) BEGIN Look 'e' <> Look 'l' <> AND WHILE CASE Look 'i' OF doIF ENDOF 'w' OF doWHILE ENDOF other ENDCASE REPEAT ; -- ------------------------------------------------------------
No other changes are necessary.
OK, try the new program (chap5e.frt).
Note that this time, the <condition>
code is inside the upper label, which is just where we wanted it.
Try some nested loops. Try some loops within IF
's, and some IF
's
within loops. If you get a bit confused as to what you should
type, don't be discouraged: you write bugs in other languages,
too, don't you? It'll look a lot more meaningful when we get
full keywords.
I hope by now that you're beginning to get the idea that this really is easy. All we have to do to accomodate a new construct is to work out the syntax-directed translation of it. The code almost falls out from there, and it doesn't affect any of the other routines. Once you've gotten the feel of the thing, you'll see that you can add new constructs about as fast as you can dream them up.
IF
and the WHILE
, is sufficient to write structured
code. But we're on a roll now, so let's richen up the
repertoire a bit.
This construct is even easier, since it has no condition test at
all ... it's an infinite loop. What's the point of such a loop?
Not much, by itself, but later on we're going to add a BREAK
command, that will give us a way out. This makes the language
considerably richer than Pascal
, which has no break, and also
avoids the funny while(1)
or WHILE TRUE
of C
and Pascal
.
The syntax is simply
LOOP <block> ENDLOOP
and the syntax-directed translation is:
LOOP { L = NewLabel PostLabel(L) } <block> ENDLOOP { Emit(L offset NEAR jmp, }
The corresponding code is shown below. Since I've already used
'l' for the ELSE
, I've used the last letter, 'p', as the
"keyword" this time.
-- ------------------------------------------------------------ -- Parse and translate a loop statement : doLOOP ( -- ) 0 LOCAL L 'p' match newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L PACK DROP L COUNT postlabel block 'e' match L COUNT S" offset NEAR jmp," $+ emitln L FREE ?ALLOCATE ; -- ------------------------------------------------------------
When you insert this routine, don't forget to add a line in Block
to call it. (chap5f.frt)
Here's one construct that I lifted right from Pascal
. The syntax is
REPEAT <block> UNTIL <condition> ,
and the syntax-directed translation is:
REPEAT { L = NewLabel PostLabel(L) } <block> UNTIL <condition> { Emit(L offset NEAR je,) }
As usual, the code falls out pretty easily:
-- ------------------------------------------------------------ -- Parse and translate a REPEAT statement : doREPEAT ( -- ) 0 LOCAL L 'r' match newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L PACK DROP L COUNT postlabel block 'u' match condition L COUNT S" offset NEAR je," $+ emitln L FREE ?ALLOCATE ; -- ------------------------------------------------------------
As before, we have to add the call to DoRepeat
within Block
.
This time, there's a difference, though. I decided to use 'r'
for REPEAT
(naturally), but I also decided to use 'u' for UNTIL
.
This means that the 'u' must be added to the set of characters in
the while-test. These are the characters that signal an exit
from the current block ... the "follow" characters, in compiler
jargon.
-- ------------------------------------------------------------ -- Recognize and translate a statement block : block ( -- ) BEGIN Look 'e' <> Look 'l' <> AND Look 'u' <> AND WHILE CASE Look 'i' OF L doIF ENDOF 'w' OF doWHILE ENDOF 'p' OF doLOOP ENDOF 'r' OF doREPEAT ENDOF other ENDCASE REPEAT ; -- ------------------------------------------------------------
FOR
loop is a very handy one to have around, but it's a bear
to translate. That's not so much because the construct itself is
hard ... it's only a loop after all ... but simply because it's
hard to implement in assembly language. Once the code is
figured out, the translation is straightforward enough.
C
fans love the FOR
-loop of that language (and, in fact, it's
easier to code), but I've chosen instead a syntax very much like
the one from good ol' BASIC
:
FOR <ident> = <expr1> TO <expr2> <block> ENDFOR
The translation of a FOR
loop can be just about as difficult as
you choose to make it, depending upon the way you decide to
define the rules as to how to handle the limits. Does expr2 get
evaluated every time through the loop, for example, or is it
treated as a constant limit? Do you always go through the loop
at least once, as in FORTRAN
, or not? It gets simpler if you
adopt the point of view that the construct is equivalent to:
<ident> = <expr1> TEMP = <expr2> WHILE <ident> <= TEMP <block> ENDWHILE
Notice that with this definition of the loop, <block> will not be executed at all if <expr1> is initially larger than <expr2>.
The x86 code needed to do this is trickier than anything we've done so far. I had a couple of tries at it, putting both the counter and the upper limit on the stack, both in registers, etc. I finally arrived at a hybrid arrangement, in which the loop counter is in memory (so that it can be accessed within the loop), and the upper limit is on the stack. The translated code came out like this:
<ident> get name of loop counter <expr1> get initial value 1 b# -> eax sub, predecrement it eax -> ident dword-ptr mov, save it <expr2> get upper limit eax push, save it on stack L1: ident dword-ptr -> eax mov, address loop counter to EAX 1 b# -> eax add, bump the counter eax -> ident dword-ptr mov, save new value [esp] -> eax cmp, check for range L2 offset NEAR jg, skip out if D0 > (SP) <block> L1 offset NEAR jmp, loop for next pass L2: [esp 4 +] -> esp lea, clean up the stack
Wow! That seems like a lot of code ... the line containing <block> seems to almost get lost. But that's the best I could do with it. I guess it helps to keep in mind that it's really only sixteen words, after all. If anyone else can optimize this better, please let me know.
Still, the parser routine is pretty easy now that we have the code:
-- ------------------------------------------------------------ -- Parse and translate a for statement : doFOR ( -- ) 0 0 0 LOCALS| name L2 L1 | 'f' match getname TO name newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP '=' match expression S" 1 b# -> eax sub," emitln S" eax -> " name CHAR-APPEND S" dword-ptr mov," $+ emitln expression S" eax push," emitln L1 COUNT postlabel S" dword-ptr -> eax mov," name CHAR-PREPEND emitln S" 1 b# -> eax add, eax -> " name CHAR-APPEND S" dword-ptr mov," $+ emitln S" [esp] -> eax cmp," emitln L2 COUNT S" offset NEAR jg," $+ emitln block 'e' match L1 COUNT S" offset NEAR jmp," $+ emitln L2 COUNT postlabel S" [esp 4 +] -> esp lea," emitln L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- ------------------------------------------------------------
Since we don't have expressions in this parser, I used the same
trick as for Condition
, and wrote the routine
-- ------------------------------------------------------------ -- Parse and translate an expression -- This version is a dummy : expression ( -- ) S" <expr>" emitln ; -- ------------------------------------------------------------
Give it a try. Once again, don't forget to add the call in
Block
(chap5g.frt). Since we don't have any input for the dummy version of
Expression
, a typical input line would look something like
afi=bece
Well, it does generate a lot of code, doesn't it? But at least it's the right code.
FOR
loop. The
reason for all the code above is the need to have the loop
counter accessible as a variable within the loop. If all we need
is a counting loop to make us go through something a specified
number of times, but don't need access to the counter itself,
there is a much easier solution. The x86 has "decrement and
branch nonzero" instructions built in which are ideal for counting.
For good measure, let's add this construct, too. This will be
the last of our loop structures.
The syntax and its translation is:
DO <expr> { L = NewLabel PostLabel(L) Emit( eax push, ) } <block> ENDDO { Emit( eax pop, ) Emit( 1 b# -> eax sub, L offset NEAR jnz, ) }
That's quite a bit simpler! The loop will execute <expr> times. Here's the code:
-- ------------------------------------------------------------ -- Parse and Translate a DO Statement : doDO ( -- ) 0 LOCAL L 'd' match newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L PACK DROP expression L COUNT postlabel S" eax push," emitln block 'e' match S" eax pop," emitln S" 1 b# -> eax sub," emitln L COUNT S" offset NEAR jnz," $+ emitln L FREE ?ALLOCATE ; -- ------------------------------------------------------------
I think you'll have to agree, that's a whole lot simpler than the
classical FOR
. Still, each construct has its place.
BREAK
statement to accompany LOOP
. This
is one I'm sort of proud of. On the face of it a BREAK
seems
really tricky. My first approach was to just use it as an extra
terminator to Block
, and split all the loops into two parts, just
as I did with the ELSE
half of an IF
. That turns out not to
work, though, because the BREAK
statement is almost certainly not
going to show up at the same level as the loop itself. The most
likely place for a BREAK
is right after an IF
, which would cause
it to exit to the IF
construct, not the enclosing loop. Wrong.
The BREAK
has to exit the inner LOOP
, even if it's nested down
into several levels of IF
s.
My next thought was that I would just store away, in some global variable, the ending label of the innermost loop. That doesn't work either, because there may be a break from an inner loop followed by a break from an outer one. Storing the label for the inner loop would clobber the label for the outer one. So the global variable turned into a stack. Things were starting to get messy.
Then I decided to take my own advice. Remember in the last session when I pointed out how well the implicit stack of a recursive descent parser was serving our needs? I said that if you begin to see the need for an external stack you might be doing something wrong. Well, I was. It is indeed possible to let the recursion built into our parser take care of everything, and the solution is so simple that it's surprising.
The secret is to note that every BREAK
statement has to occur
within a block ... there's no place else for it to be. So all we
have to do is to pass into Block
the exit address of the
innermost loop. Then it can pass the address to the routine that
translates the break instruction. Since an IF
statement doesn't
change the loop level, word DoIf
doesn't need to do anything
except pass the label into its blocks (both of them).
Since loops do change the level, each loop construct simply
ignores whatever label is above it and passes its own exit label along.
All this is easier to show you than it is to describe. I'll
demonstrate with the easiest loop, which is LOOP
:
-- ------------------------------------------------------------ -- Parse and Translate a LOOP Statement : doLOOP ( -- ) 0 0 LOCALS| L2 L1 | 'p' match newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP L1 COUNT postlabel L2 block 'e' match L1 COUNT S" offset NEAR jmp," $+ emitln L2 COUNT postlabel L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- ------------------------------------------------------------
Notice that DoLoop
now has two labels, not just one.
The second is to give the BREAK
instruction a target to jump to. If there
is no BREAK
within the loop, we've wasted a label and cluttered
up things a bit, but there's no harm done.
Note also that Block
now has a parameter, which for loops will
always be the exit address. The new version of Block
is:
-- ------------------------------------------------------------ -- Recognize and Translate a Statement Block : block ( label -- ) LOCAL L BEGIN Look 'e' <> Look 'l' <> AND Look 'u' <> AND WHILE CASE Look 'i' OF L doIF ENDOF 'w' OF doWHILE ENDOF 'p' OF doLOOP ENDOF 'r' OF doREPEAT ENDOF 'f' OF doFOR ENDOF 'd' OF doDO ENDOF 'b' OF L doBREAK ENDOF other ENDCASE REPEAT ; -- ------------------------------------------------------------
Again, notice that all Block
does with the label is to pass it
into DoIf
and DoBreak
. The loop constructs don't need it,
because they are going to pass their own label anyway.
The new version of DoIf
is:
-- ------------------------------------------------------------ -- Recognize and translate an IF construct DEFER block ( L -- ) : doIF ( label -- ) 0 0 LOCALS| L2 L1 L | 'i' match condition newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP L1 COUNT DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP L1 COUNT S" offset NEAR je," $+ emitln L block Look 'l' = IF 'l' match L2 FREE ?ALLOCATE newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP L2 COUNT S" offset NEAR jmp," $+ emitln L1 COUNT postlabel L block ENDIF 'e' match L2 COUNT postlabel L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- ------------------------------------------------------------
Here, the only thing that changes is the addition of the
parameter to word Block
. An IF
statement doesn't change the
loop nesting level, so DoIf
just passes the label along. No
matter how many levels of IF
nesting we have, the same label will
be used.
Now, remember that DoProgram
also calls Block
, so it now needs to
pass it a label. An attempt to exit the outermost block is an
error, so DoProgram
passes a null label which is caught by
DoBreak
:
-- ------------------------------------------------------------ -- Recognize and translate a BREAK : doBREAK ( label -- ) 'b' match DUP C@ 0= IF S" No loop to break from" aborts ENDIF COUNT S" offset NEAR jmp," $+ emitln ; -- ------------------------------------------------------------ -- Parse and translate a program : doprogram ( -- ) C" " block Look 'e' <> IF S" End" expected EXIT ENDIF S" END" emitln ; -- ------------------------------------------------------------
That almost takes care of everything. Give it a try, see if you
can "break" it <pun>. Careful, though. By this time we've used
so many letters, it's hard to think of characters that aren't now
representing reserved words. Remember: before you try the
program, you're going to have to edit every occurence of Block
in
the other loop constructs to include the new parameter. Do it
just like I did for LOOP
.
I said almost above. There is one slight problem: if you take a
hard look at the code generated for DO
, you'll see that if you
break out of this loop, the value of the loop counter is still
left on the stack. We're going to have to fix that! A shame ...
that was one of our smaller routines, but it can't be helped.
Here's a version that doesn't have the problem:
-- ------------------------------------------------------------ -- Parse and translate a DO statement : doDO ( -- ) 0 0 LOCALS| L2 L1 | 'd' match newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP expression L1 COUNT postlabel S" eax push," emitln L2 block 'e' match S" eax pop," emitln S" 1 b# -> eax sub," emitln L1 COUNT S" offset NEAR jnz," $+ emitln S" [esp -4 +] -> esp lea," emitln L2 COUNT postlabel S" [esp 4 +] -> esp lea," emitln L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- ------------------------------------------------------------
The two extra instructions, the LEA -4 and LEA +4, take care of leaving the stack in the right shape.
FOR
loop, it was pretty
easy to do. Even that one was tricky only because it's tricky in
assembly language.
I'll conclude this session here. To wrap the thing up with a red
ribbon, we really should have a go at having real keywords
instead of these mickey-mouse single-character things. You've
already seen that the extension to multi-character words is not
difficult, but in this case it will make a big difference in the
appearance of our input code. I'll save that little bit for the
next installment. In that installment we'll also address Boolean
expressions, so we can get rid of the dummy version of Condition
that we've used here. See you then.
For reference purposes, here is the completed parser for this session:
-- Variable declarations -------------------------------------------------------------------------------------------------- 0 VALUE Look -- lookahead character 0 VALUE Lcount -- label counter -- Tools ------------------------------------------------------------------------------------------------------------------ : getchar ( -- ) EKEY TO Look ; -- read new character from input stream : error ( c-addr u -- ) CR ^G EMIT ." Error: " TYPE ." ." ; -- report an error : aborts ( c-addr u -- ) error ABORT ; -- report error and halt : expected ( c-addr u -- ) S" Expected" $+ aborts ; -- report what was expected : alpha? ( char -- tf ) >UPC 'A' 'Z' 1+ WITHIN ; -- recognize an alpha character : digit? ( char -- tf ) '0' '9' 1+ WITHIN ; -- recognize a decimal digit : emits ( c-addr u -- ) Tab EMIT TYPE ; -- output a string with tab : emitln ( c-addr u -- ) CR emits ; -- output a string with tab and crlf : init ( -- ) CLEAR Lcount CR getchar ; -- initialize -- Specifics -------------------------------------------------------------------------------------------------------------- -- match a specific input character : match ( char -- ) DUP Look = IF DROP getchar ELSE S" `" ROT CHAR-APPEND &' CHAR-APPEND expected ENDIF ; -- get an identifier : getname ( -- c-addr u ) Look alpha? 0= IF S" Name" expected ENDIF Look >UPC S" " ROT CHAR-APPEND getchar ; -- get a number : getnum ( -- char ) Look digit? 0= IF S" Integer" expected ENDIF Look getchar ; : other ( -- ) getname emitln ; -- generate a unique label : newlabel ( -- c-addr u ) S" @@" Lcount U>D <# #S #> $+ 1 +TO Lcount ; -- post a label to output : postlabel ( c-addr u -- ) CR TYPE ':' EMIT ; DEFER block -- parse and translate a boolean condition : condition ( -- ) S"" emitln ; -- parse and translate a math expression : expression ( -- ) S" " emitln ; -- recognize and translate an IF construct : doIF ( label -- ) 0 0 LOCALS| L2 L1 L | 'i' match condition newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP L1 COUNT DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP L1 COUNT S" offset NEAR je," $+ emitln L block Look 'l' = IF 'l' match L2 FREE ?ALLOCATE newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP L2 COUNT S" offset NEAR jmp," $+ emitln L1 COUNT postlabel L block ENDIF 'e' match L2 COUNT postlabel L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- recognize and translate a WHILE construct : doWHILE ( -- ) 0 0 LOCALS| L2 L1 | 'w' match newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP L1 COUNT postlabel condition L2 COUNT S" offset NEAR je," $+ emitln L2 block 'e' match L1 COUNT S" offset NEAR jmp," $+ emitln L2 COUNT postlabel L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- recognize and translate a LOOP construct : doLOOP ( -- ) 0 0 LOCALS| L2 L1 | 'p' match newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP L1 COUNT postlabel L2 block 'e' match L1 COUNT S" offset NEAR jmp," $+ emitln L2 COUNT postlabel L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- recognize and translate a FOR construct : doFOR ( -- ) 0 0 0 LOCALS| name L2 L1 | 'f' match getname TO name newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP '=' match expression S" 1 b# -> eax sub," emitln S" eax -> " name CHAR-APPEND S" dword-ptr mov," $+ emitln expression S" eax push," emitln L1 COUNT postlabel S" dword-ptr -> eax mov," name CHAR-PREPEND emitln S" 1 b# -> eax add, eax -> " name CHAR-APPEND S" dword-ptr mov," $+ emitln S" [esp] -> eax cmp," emitln L2 COUNT S" offset NEAR jg," $+ emitln L2 block 'e' match L1 COUNT S" offset NEAR jmp," $+ emitln L2 COUNT postlabel S" [esp 4 +] -> esp lea," emitln L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE name FREE ?ALLOCATE ; -- recognize and translate a REPEAT construct : doREPEAT ( -- ) 0 0 LOCALS| L2 L1 | 'r' match newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP L1 COUNT postlabel L2 block 'u' match condition L1 COUNT S" offset NEAR je," $+ emitln L2 COUNT postlabel L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- recognize and translate a DO construct : doDO ( -- ) 0 0 LOCALS| L2 L1 | 'd' match newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP expression L1 COUNT postlabel S" eax push," emitln L2 block 'e' match S" eax pop," emitln S" 1 b# -> eax sub," emitln L1 COUNT S" offset NEAR jnz," $+ emitln S" [esp -4 +] -> esp lea," emitln L2 COUNT postlabel S" [esp 4 +] -> esp lea," emitln L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- recognize and translate a BREAK : doBREAK ( label -- ) 'b' match DUP C@ 0= IF S" No loop to break from" aborts ENDIF COUNT S" offset NEAR jmp," $+ emitln ; -- recognize and translate a statement block :NONAME ( label -- ) LOCAL L BEGIN Look 'e' <> Look 'l' <> AND Look 'u' <> AND WHILE CASE Look 'i' OF L doIF ENDOF 'w' OF doWHILE ENDOF 'p' OF doLOOP ENDOF 'r' OF doREPEAT ENDOF 'f' OF doFOR ENDOF 'd' OF doDO ENDOF 'b' OF L doBREAK ENDOF other ENDCASE REPEAT ; IS block -- parse and translate a program : doprogram ( -- ) C" " block Look 'e' <> IF S" End" expected EXIT ENDIF S" END" emitln ; -- Main Program ----------------------------------------------------------------------------------------------------------- : main ( -- ) CR init doprogram ;
***************************************************************** * * * COPYRIGHT NOTICE * * * * Copyright (C) 1988 Jack W. Crenshaw. All rights reserved. * * * *****************************************************************