As we left the parser, though, there was one big hole in our
capabilities: we did not address the issue of the branch
condition. To fill the void, I introduced to you a dummy parse
routine called Condition
, which only served as a place-keeper for
the real thing.
One of the things we'll do in this session is to plug that hole
by expanding Condition
into a true parser/translator.
Forth
compiler, building
up the parsers from very rudimentary beginnings to their final
forms, without spending much time in planning beforehand. That's
called coding without specs, and it's usually frowned upon. We
could get away with it before because the rules of arithmetic are
pretty well established ... we know what a '+' sign is supposed
to mean without having to discuss it at length. The same is true
for branches and loops. But the ways in which programming
languages implement logic vary quite a bit from language to
language. So before we begin serious coding, we'd better first
make up our minds what it is we want. And the way to do that is
at the level of the BNF syntax rules (the grammar).
<expression> ::= <unary op> <term> [<addop> <term>]* <term> ::= <factor> [<mulop> factor]* <factor> ::= <integer> | <variable> | ( <expression> )
(Remember, the nice thing about this grammar is that it enforces the operator precedence hierarchy that we normally expect for algebra.)
Actually, while we're on the subject, I'd like to amend this grammar a bit right now. The way we've handled the unary minus is a bit awkward. I've found that it's better to write the grammar this way:
<expression> ::= <term> [<addop> <term>]* <term> ::= <signed factor> [<mulop> factor]* <signed factor> ::= [<addop>] <factor> <factor> ::= <integer> | <variable> | (<expression>)
This puts the job of handling the unary minus onto Factor
, which
is where it really belongs.
This doesn't mean that you have to go back and recode the programs you've already written, although you're free to do so if you like. But I will be using the new syntax from now on.
Now, it probably won't come as a shock to you to learn that we can define an analogous grammar for Boolean algebra. A typical set or rules is:
<b-expression>::= <b-term> [<orop> <b-term>]* <b-term> ::= <not-factor> [AND <not-factor>]* <not-factor> ::= [NOT] <b-factor> <b-factor> ::= <b-literal> | <b-variable> | (<b-expression>)
Notice that in this grammar, the operator AND
is analogous to
'*', and OR
(and exclusive OR
) to '+'. The NOT
operator is
analogous to a unary minus. This hierarchy is not absolutely
standard ... some languages, notably Ada
, treat all logical
operators as having the same precedence level ... but it seems
natural.
Notice also the slight difference between the way the NOT
and the
unary minus are handled. In algebra, the unary minus is
considered to go with the whole term, and so never appears but
once in a given term. So an expression like
a * -b
or worse yet,
a - -b
is not allowed. In Boolean algebra, though, the expression
a AND NOT b
makes perfect sense, and the syntax shown allows for that.
IF
. The predicate
is required to have a Boolean value; that is, it must evaluate to
either TRUE or FALSE. The branch is then taken or not taken,
depending on that value. What we expect to see going on in
the word Condition
, then, is the evaluation of a Boolean
expression.
But there's more to it than that. A pure Boolean expression can indeed be the predicate of a control statement ... things like
IF a AND NOT b THEN ....
But more often, we see Boolean algebra show up in such things as
IF (x >= 0) and (x <= 100) THEN ...
Here, the two terms in parens are Boolean expressions, but the individual terms being compared: x, 0, and 100, are numeric in nature. The relational operators >= and <= are the catalysts by which the Boolean and the arithmetic ingredients get merged together.
Now, in the example above, the terms being compared are just that: terms. However, in general each side can be a math expression. So we can define a relation to be:
<relation> ::= <expression> <relop> <expression> ,
where the expressions we're talking about here are the old numeric type, and the relops are any of the usual symbols
=, <> (or !=), <, >, <=, and >=
If you think about it a bit, you'll agree that, since this kind of predicate has a single Boolean value, TRUE or FALSE, as its result, it is really just another kind of factor. So we can expand the definition of a Boolean factor above to read:
<b-factor> ::= <b-literal> | <b-variable> | (<b-expression>) | <relation>
That's the connection! The relops and the relation they define serve to wed the two kinds of algebra. It is worth noting that this implies a hierarchy where the arithmetic expression has a higher precedence that a Boolean factor, and therefore than all the Boolean operators. If you write out the precedence levels for all the operators, you arrive at the following list:
Level Syntax Element Operator 0 factor literal, variable 1 signed factor unary minus 2 term *, / 3 expression +, - 4 b-factor literal, variable, relop 5 not-factor NOT 6 b-term AND 7 b-expression OR, XOR
If we're willing to accept that many precedence levels, this grammar seems reasonable. Unfortunately, it won't work! The grammar may be great in theory, but it's no good at all in the practice of a top-down parser. To see the problem, consider the code fragment:
IF ((((((A + B + C) < 0 ) AND ....
When the parser is parsing this code, it knows after it sees the
IF
token that a Boolean expression is supposed to be next. So it
can set up to begin evaluating such an expression. But the first
expression in the example is an arithmetic expression, A + B + C.
What's worse, at the point that the parser has read this much of
the input line:
IF
((((((A ,
it still has no way of knowing which kind of expression it's dealing with. That won't do, because we must have different recognizers for the two cases. The situation can be handled without changing any of our definitions, but only if we're willing to accept an arbitrary amount of backtracking to work our way out of bad guesses. No compiler writer in his right mind would agree to that.
What's going on here is that the beauty and elegance of BNF grammar has met face to face with the realities of compiler technology.
To deal with this situation, compiler writers have had to make compromises so that a single parser can handle the grammar without backtracking.
The solution is simple, although it ends up causing profound changes to our grammar. We can only allow parentheses in one kind of factor. The way to do that varies considerably from language to language. This is one place where there is no agreement or convention to help us.
When Niklaus Wirth designed Pascal
, the desire was to limit the
number of levels of precedence (fewer parse routines, after all).
So the OR
and exclusive OR
operators are treated just like an
Addop and processed at the level of a math expression.
Similarly, the AND
is treated like a Mulop and processed with
Term
. The precedence levels are
Level Syntax Element Operator 0 factor literal, variable 1 signed factor unary minus, NOT 2 term *, /, AND 3 expression +, -, OR
Notice that there is only one set of syntax rules, applying to both kinds of operators. According to this grammar, then, expressions like
x + (y AND NOT z) DIV 3
are perfectly legal. And, in fact, they are ... as far as the
parser is concerned. Pascal
doesn't allow the mixing of
arithmetic and Boolean variables, and things like this are caught
at the semantic level, when it comes time to generate code for
them, rather than at the syntax level.
The authors of C
took a diametrically opposite approach: they
treat the operators as different, and have something much more
akin to our seven levels of precedence. In fact, in C
there are
no fewer than 17 levels! That's because C
also has the operators
'=', '+=' and its kin, '<<', '>>', '++', '--', etc. Ironically,
although in C
the arithmetic and Boolean operators are treated
separately, the variables are not ... there are no Boolean or
logical variables in C
, so a Boolean test can be made on any
integer value.
We'll do something that's sort of in-between. I'm tempted to
stick mostly with the Pascal
approach, since that seems the
simplest from an implementation point of view, but it results in
some funnies that I never liked very much, such as the fact that,
in the expression
IF (c >= 'A') and (c <= 'Z') then ...
the parens above are required. I never understood why before, and neither my compiler nor any human ever explained it very well, either. But now, we can all see that the 'and' operator, having the precedence of a multiply, has a higher one than the relational operators, so without the parens the expression is equivalent to
IF c >= ('A' and c) <= 'Z' then
which doesn't make sense.
In any case, I've elected to separate the operators into
different levels, although not as many as in C
.
<b-expression> ::= <b-term> [<orop> <b-term>]* <b-term> ::= <not-factor> [AND <not-factor>]* <not-factor> ::= [NOT] <b-factor> <b-factor> ::= <b-literal> | <b-variable> | <relation> <relation> ::= | <expression> [<relop> <expression] <expression> ::= <term> [<addop> <term>]* <term> ::= <signed factor> [<mulop> factor]* <signed factor>::= [<addop>] <factor> <factor> ::= <integer> | <variable> | (<b-expression>)
This grammar results in the same set of seven levels that I showed earlier. Really, it's almost the same grammar ... I just removed the option of parenthesized b-expressions as a possible b-factor, and added the relation as a legal form of b-factor.
There is one subtle but crucial difference, which is what makes the whole thing work. Notice the square brackets in the definition of a relation. This means that the relop and the second expression are optional.
A strange consequence of this grammar (and one shared by C
) is
that every expression is potentially a Boolean expression. The
parser will always be looking for a Boolean expression, but will
"settle" for an arithmetic one. To be honest, that's going to
slow down the parser, because it has to wade through more layers
of words. That's one reason why Forth
compilers tend
to compile faster than C
compilers. If it's raw speed you want,
stick with the Forth
syntax.
We begin, as we did in the arithmetic case, by dealing only with Boolean literals rather than variables. This gives us a new kind of input token, so we're also going to need a new recognizer, and a new word to read instances of that token type. Let's start by defining the two new procedures:
-- ------------------------------------------------------------- -- recognize a boolean literal : boolean? ( char -- tf ) >UPC DUP 'T' = SWAP 'F' = OR ; -- ------------------------------------------------------------- -- get a boolean literal : getboolean ( -- flag ) Look boolean? 0= IF S" Boolean literal" expected ENDIF Look >UPC 'T' = getchar ; -- -------------------------------------------------------------
Type these routines into your program. You can test them by adding into the main word the print statement
GetBoolean .
OK, compile the program and test it (chap6a.frt). As usual, it's not very impressive so far, but it soon will be.
Now, when we were dealing with numeric data we had to arrange to
generate code to load the values into EAX. We need to do the same
for Boolean data. The usual way to encode Boolean variables is
to let 0 stand for FALSE, and some other value for TRUE. Many
languages, such as C
, use an integer 1 to represent it. But I
prefer FFFFFFFF hex (or -1), because a bitwise NOT
also becomes a
Boolean NOT
. So now we need to emit the right assembler code to
load those values. The first cut at the Boolean expression
parser (BoolExpression
, of course) is:
-- ------------------------------------------------------------- -- Parse and translate a boolean expression : boolexpression ( -- ) Look boolean? 0= IF S" Boolean literal" expected ENDIF getboolean IF S" -1 b# -> eax mov," ELSE S" eax -> eax xor," ENDIF emitln ; -- -------------------------------------------------------------
Add this word to your parser, and call it from the main program (replacing the print statement you had just put there). As you can see, we still don't have much of a parser, but the output code is starting to look more realistic (chap6b.frt).
Next, of course, we have to expand the definition of a Boolean expression. We already have the BNF rule:
<b-expression> ::= <b-term> [<orop> <b-term>]*
I prefer the Forth
versions of the "orops", OR
and XOR
. But
since we are keeping to single-character tokens here, I'll encode
those with '|' and '~'. The next version of BoolExpression
is
almost a direct copy of the arithmetic word Expression
:
-- ------------------------------------------------------------ -- Recognize and translate a boolean or : boolOR ( -- ) '|' match boolterm S" [esp] -> eax or, [esp 4 +] -> esp lea," emitln ; -- ------------------------------------------------------------ -- Recognize and translate an exclusive or : boolXOR ( -- ) '~' match boolterm S" [esp] -> eax xor, [esp 4 +] -> esp lea," emitln ; -- ------------------------------------------------------------ -- Parse and translate a boolean expression : boolexpression ( -- ) boolterm BEGIN Look orop? WHILE S" eax push," emitln CASE Look '|' OF boolOR ENDOF '~' OF boolXOR ENDOF ENDCASE REPEAT ; -- -------------------------------------------------------------
Note the new recognizer OrOp?
, which is also a copy, this time
of AddOp?
:
-- ------------------------------------------------------------ -- Recognize a boolean orop : orop? ( char -- tf ) DUP '|' = SWAP '~' = OR ; -- ------------------------------------------------------------
OK, rename the old version of BoolExpression
to BoolTerm
, then
enter the code above. Compile and test this version. At this
point, the output code is starting to look pretty good. Of
course, it doesn't make much sense to do a lot of Boolean algebra
on constant values, but we'll soon be expanding the types of
Booleans we deal with. (chap6c.frt).
You've probably already guessed what the next step is: The
Boolean version of Term
.
Rename the current procedure BoolTerm
to NotFactor
, and enter the
following new version of BoolTerm
. Note that is is much simpler
than the numeric version, since there is no equivalent of
division.
-- ------------------------------------------------------------ -- Parse and translate a boolean term : boolterm ( -- ) notfactor BEGIN Look '&' = WHILE S" eax push," emitln '&' match notfactor S" [esp] -> eax and, [esp 4 +] -> esp lea," emitln REPEAT ; -- ------------------------------------------------------------
Now, we're almost home. We are translating complex Boolean
expressions, although only for constant values. The next step is
to allow for the NOT
. Write the following procedure:
-- ------------------------------------------------------------ -- parse and translate a boolean factor with NOT : notfactor ( -- ) Look '!' <> IF boolfactor EXIT ENDIF '!' match boolfactor S" -1 b# -> eax xor," emitln ; -- ------------------------------------------------------------
And rename the earlier word to BoolFactor
. Now try that.
At this point the parser should be able to handle any Boolean
expression you care to throw at it. Does it? Does it trap badly
formed expressions? (chap6d.frt)
If you've been following what we did in the parser for math
expressions, you know that what we did next was to expand the
definition of a factor to include variables and parens. We don't
have to do that for the Boolean factor, because those little
items get taken care of by the next step. It takes just a one
line addition to BoolFactor
to take care of relations:
-- ------------------------------------------------------------ -- Parse and translate a boolean factor : boolfactor ( -- ) Look boolean? IF getboolean IF S" -1 b# -> eax mov," ELSE S" eax -> eax xor," ENDIF emitln ELSE relation ENDIF ; -- ------------------------------------------------------------
You might be wondering when I'm going to provide for Boolean
variables and parenthesized Boolean expressions. The answer is,
I'm not! Remember, we took those out of the grammar earlier.
Right now all I'm doing is encoding the grammar we've already
agreed upon. The compiler itself can't tell the difference
between a Boolean variable or expression and an arithmetic one
... all of those will be handled by Relation
, either way.
Of course, it would help to have some code for Relation
. I don't
feel comfortable, though, adding any more code without first
checking out what we already have. So for now let's just write a
dummy version of Relation
that does nothing except eat the
current character, and write a little message:
-- ------------------------------------------------------------- -- Parse and translate a relation : relation ( -- ) S" <Relation>" emitln getchar ; -- -------------------------------------------------------------
OK, key in this code and give it a try. All the old things
should still work ... you should be able to generate the code for
AND
s, OR
s, and NOT
s. In addition, if you type any alphabetic
character you should get a little <Relation> place-holder, where
a Boolean factor should be. Did you get that? Fine, then let's
move on to the full-blown version of Relation
(chap6e.frt).
To get that, though, there is a bit of groundwork that we must lay first. Recall that a relation has the form
<relation> ::= | <expression> [<relop> <expression]
Since we have a new kind of operator, we're also going to need a new Boolean function to recognize it. That function is shown below. Because of the single-character limitation, I'm sticking to the four operators that can be encoded with such a character (the "not equals" is encoded by '#').
-- ------------------------------------------------------------ -- Recognize a relop : relop? ( char -- tf ) DUP '=' = OVER '#' = OR OVER '<' = OR SWAP '>' = OR ; -- ------------------------------------------------------------
Now, recall that we're using a zero or a -1 in register EAX to represent a Boolean value, and also that the loop constructs expect the flags to be set to correspond. In implementing all this on the x86, things get a a little bit tricky.
Since the loop constructs operate only on the flags, it would be nice (and also quite efficient) just to set up those flags, and not load anything into EAX at all. This would be fine for the loops and branches, but remember that the relation can be used anywhere a Boolean factor could be used. We may be storing its result to a Boolean variable. Since we can't know at this point how the result is going to be used, we must allow for both cases.
Comparing numeric data is easy enough ... the x86 has an operation for that ... but it sets the flags, not a value. What's more, the flags will always be set the same (zero if equal, etc.), while we need the zero flag set differently for the each of the different relops.
The solution is found in the x86 instruction SETcc, which sets a byte value to 00 or FF (funny how that works!) depending upon the result of the specified condition. If we make the destination byte to be EAX, we get the Boolean value needed.
I might mention here that this area is, in my opinion, the one
that represents the biggest difference between the efficiency of
hand-coded assembly language and compiler-generated code. We
have seen already that we lose efficiency in arithmetic
operations, although later I plan to show you how to improve that
a bit. We've also seen that the control constructs themselves
can be done quite efficiently ... it's usually very difficult to
improve on the code generated for an IF
or a WHILE
. But
virtually every compiler I've ever seen generates terrible code,
compared to assembler, for the computation of a Boolean function,
and particularly for relations. The reason is just what I've
hinted at above. When I'm writing code in assembler, I go ahead
and perform the test the most convenient way I can, and then set
up the branch so that it goes the way it should. In effect, I
"tailor" every branch to the situation. The compiler can't do
that (practically), and it also can't know that we don't want to
store the result of the test as a Boolean variable. So it must
generate the code in a very strict order, and it often ends up
loading the result as a Boolean that never gets used for
anything.
In any case, we're now ready to look at the code for Relation
.
It's shown below with its companion procedures:
-- ------------------------------------------------------------------------------------ -- recognize and translate a relational "equals" : equals ( -- ) '=' match expression S" [esp] -> eax cmp, [esp 4 +] -> esp lea, al sete, al -> eax movsx," emitln ; -- ------------------------------------------------------------------------------------ -- recognize and translate a relational "not equals" : notequals ( -- ) '#' match expression S" [esp] -> eax cmp, [esp 4 +] -> esp lea, al setne, al -> eax movsx," emitln ; -- ------------------------------------------------------------------------------------ -- recognize and translate a relational "less than" : less ( -- ) '<' match expression S" [esp] -> eax cmp, [esp 4 +] -> esp lea, al setg, al -> eax movsx," emitln ; -- ----------------------------------------------------------------------------------------- -- recognize and translate a relational "greater than" : greater ( -- ) '>' match expression S" [esp] -> eax cmp, [esp 4 +] -> esp lea, al setl, al -> eax movsx," emitln ; -- ----------------------------------------------------------------------------------------- -- parse and translate a relation : relation ( -- ) expression Look relop? 0= ?EXIT S" eax push," emitln CASE Look '=' OF equals ENDOF '#' OF notequals ENDOF '<' OF less ENDOF '>' OF greater ENDOF ENDCASE ; -- -------------------------------------------------------------
Now, that call to Expression
looks familiar! Here is where the
editor of your system comes in handy. We have already generated
code for Expression
and its buddies in previous sessions. You
can copy them into your file now. Remember to use the single-character
versions. Just to be certain, I've duplicated the
arithmetic procedures below. If you're observant, you'll also
see that I've changed them a little to make them correspond to
the latest version of the syntax. This change is not necessary,
so you may prefer to hold off on that until you're sure
everything is working.
-- ------------------------------------------------------------- -- getname still returns single characters VARIABLE gn$ : getname ( -- c-addr u ) getname gn$ C! gn$ 1 ; -- ------------------------------------------------------------- -- Parse and translate an identifier : ident ( -- ) getname DUP 1+ ALLOCATE ?ALLOCATE LOCAL name name PACK DROP Look '(' = IF '(' match ')' match name COUNT S" offset NEAR call," ELSE name COUNT S" dword-ptr -> eax mov," ENDIF $+ emitln name FREE ?ALLOCATE ; -- ------------------------------------------------------------- DEFER expression -- ------------------------------------------------------------- -- parse and translate a math factor : factor ( -- ) Look '(' = IF '(' match expression ')' match EXIT ENDIF Look alpha? IF ident EXIT ENDIF getnum S" d# -> eax mov," $+ emitln ; -- ------------------------------------------------------------- -- parse and translate the first math factor : signedfactor ( -- ) Look '+' = IF getchar ENDIF Look '-' <> IF factor EXIT ENDIF getchar Look digit? IF S" #-" getnum $+ S" d# -> eax mov," $+ ELSE factor S" eax neg," ENDIF emitln ; -- ------------------------------------------------------------- -- recognize and translate multiply and divide : multiply ( -- ) '*' match factor S" [esp] dword mul, [esp 4 +] -> esp lea," emitln ; -- ------------------------------------------------------------- : divide ( -- ) '/' match factor S" ebx pop, ebx -> eax xchg, eax -> edx mov," emitln S" #31 b# -> edx sar, ebx idiv," emitln ; -- ------------------------------------------------------------- -- parse and translate a math term : term ( -- ) signedfactor BEGIN Look mulop? WHILE S" eax push," emitln CASE Look '*' OF multiply ENDOF '/' OF divide ENDOF S" Mulop" expected ENDCASE REPEAT ; -- ------------------------------------------------------------- -- recognize and translate add and subtract : add ( -- ) '+' match term S" [esp] -> eax add, [esp 4 +] -> esp lea," emitln ; -- ------------------------------------------------------------- : subtract ( -- ) '-' match term S" [esp] -> eax sub, [esp 4 +] -> esp lea," emitln S" eax neg," emitln ; -- ------------------------------------------------------------- -- parse and translate an expression :NONAME ( -- ) term BEGIN Look addop? WHILE S" eax push," emitln CASE Look '+' OF add ENDOF '-' OF subtract ENDOF S" Addop" expected ENDCASE REPEAT ; IS expression -- -------------------------------------------------------------
There you have it ... a parser that can handle both arithmetic and Boolean algebra, and things that combine the two through the use of relops. I suggest you file away a copy of this parser in a safe place for future reference, because in our next step we're going to be chopping it up.
Condition
and Expression
? Now you know what
goes in their places!
I warn you, you're going to have to do some creative editing
here, so take your time and get it right. What you need to do is
to copy all of the procedures from the logic parser, from Ident
through BoolExpression
, into the parser for control constructs.
Insert them at the current location of Condition
. Then delete
that word, as well as the dummy Expression
. Next, change
every call to Condition
to refer to BoolExpression
instead.
Finally, copy the procedures Mulop?
, OrOp?
, Relop?
, Boolean?
,
and GetBoolean
into place. That should do it.
Compile the resulting program and give it a try. Since we
haven't used this program in awhile, don't forget that we used
single-character tokens for IF
, WHILE
, etc. Also don't forget
that any character not a keyword just gets echoed as a block.
Try
ia=bxlye
which stands for "IF a=b X ELSE Y ENDIF".
What do you think? Did it work? Try some others.
We're soon going to find that the one-line "programs" that we're having to write here will really cramp our style. At the moment we have no cure for that, because our parser doesn't recognize the end-of-line characters, the carriage return (CR) and the line feed (LF). So before going any further let's plug that hole.
There are a couple of ways to deal with the CR/LFs. One (the
C
/Unix
approach) is just to treat them as additional white space
characters and ignore them. That's actually not such a bad
approach, but it does sort of produce funny results for our
parser as it stands now. If it were reading its input from a
source file as any self-respecting real compiler does, there
would be no problem. But we're reading input from the keyboard,
and we're sort of conditioned to expect something to happen when
we hit the return key. It won't, if we just skip over the CR and
LF (try it). So I'm going to use a different method here, which
is not necessarily the best approach in the long run. Consider
it a temporary kludge until we're further along.
Instead of skipping the CR/LF, We'll let the parser go ahead and
catch them, then introduce a special word, analogous to
SkipWhite
, that skips them only in specified "legal" spots.
Here's the word:
-- ------------------------------------------------------------ -- skip cr+lf : fin ( -- ) Look ^M = IF getchar ENDIF Look ^J = IF getchar ENDIF ; -- ------------------------------------------------------------
Now, add two calls to Fin
in word Block
, like this:
-- ------------------------------------------------------------ -- Recognize and translate a statement block :NONAME ( label -- ) LOCAL L BEGIN Look 'e' <> Look 'l' <> AND Look 'u' <> AND WHILE fin CASE Look 'i' OF L doIF ENDOF 'w' OF doWHILE ENDOF 'p' OF doLOOP ENDOF 'r' OF doREPEAT ENDOF 'f' OF doFOR ENDOF 'd' OF doDO ENDOF 'b' OF L doBREAK ENDOF other ENDCASE fin REPEAT ; IS block -- ------------------------------------------------------------
Now, you'll find that you can use multiple-line "programs." The
only restriction is that you can't separate an IF
or WHILE
token
from its predicate.
Now we're ready to include the assignment statements. Simply
change that call to Other
in word Block
to a call to
Assignment
, and add the following word, copied from one of
our earlier programs. Note that Assignment
now calls
BoolExpression
, so that we can assign Boolean variables.
-- ------------------------------------------------------------ -- Parse and translate an assignment statement : assignment ( -- ) getname DUP 1+ ALLOCATE ?ALLOCATE DUP LOCAL name PACK DROP '=' match boolexpression S" eax -> " name COUNT $+ S" dword-ptr mov," $+ emitln name FREE ?ALLOCATE ; -- ------------------------------------------------------------
With that change, you should now be able to write reasonably realistic-looking programs, subject only to our limitation on single-character tokens. My original intention was to get rid of that limitation for you, too. However, that's going to require a fairly major change to what we've done so far. We need a true lexical scanner, and that requires some structural changes. They are not big changes that require us to throw away all of what we've done so far ... with care, it can be done with very minimal changes, in fact. But it does require that care.
This installment has already gotten pretty long, and it contains some pretty heavy stuff, so I've decided to leave that step until next time, when you've had a little more time to digest what we've done and are ready to start fresh.
In the next installment, then, we'll build a lexical scanner and eliminate the single-character barrier once and for all. We'll also write our first complete compiler, based on what we've done in this session. See you then.
***************************************************************** * * * COPYRIGHT NOTICE * * * * Copyright (C) 1988 Jack W. Crenshaw. All rights reserved. * * * *****************************************************************