Recently, though, I've begun to have problems, and they're the kind that send a message that you might be doing something wrong.
The whole thing came to a head when I tried to address the issue of semicolons. Several people have asked me about them, and whether or not KISS will have them separating the statements. My intention has been not to use semicolons, simply because I don't like them and, as you can see, they have not proved necessary.
But I know that many of you, like me, have gotten used to them, and so I set out to write a short installment to show you how they could easily be added, if you were so inclined.
Well, it turned out that they weren't easy to add at all. In fact it was darned difficult.
I guess I should have realized that something was wrong, because
of the issue of newlines. In the last couple of installments
we've addressed that issue, and I've shown you how to deal with
newlines with a procedure called, appropriately enough, NewLine.
In TINY Version 1.0, I sprinkled calls to this procedure in
strategic spots in the code.
It seems that every time I've addressed the issue of newlines, though, I've found it to be tricky, and the resulting parser turned out to be quite fragile ... one addition or deletion here or there and things tended to go to pot. Looking back on it, I realize that there was a message in this that I just wasn't paying attention to.
When I tried to add semicolons on top of the newlines, that was the last straw. I ended up with much too complex a solution. I began to realize that something fundamental had to change.
So, in a way this installment will cause us to backtrack a bit and revisit the issue of scanning all over again. Sorry about that. That's the price you pay for watching me do this in real time. But the new version is definitely an improvement, and will serve us well for what is to come.
As I said, the scanner we used in Part X was about as simple as one can get. But anything can be improved. The new scanner is more like the classical scanner, and not as simple as before. But the overall compiler structure is even simpler than before. It's also more robust, and easier to add to and/or modify. I think that's worth the time spent in this digression. So in this installment, I'll be showing you the new structure. No doubt you'll be happy to know that, while the changes affect many procedures, they aren't very profound and so we lose very little of what's been done so far.
Ironically, the new scanner is much more conventional than the old one, and is very much like the more generic scanner I showed you earlier in Part VII. Then I started trying to get clever, and I almost clevered myself clean out of business. You'd think one day I'd learn: K-I-S-S!
The problem begins to show itself in the word Block, which I've
reproduced below:
-- ------------------------------------------------------------
-- Parse and translate a block of statements
:NONAME ( -- )
   scan
   BEGIN  token 'e' <>  token 'l' <>  AND
   WHILE  CASE token
             'i' OF doIF    ENDOF
             'w' OF doWHILE ENDOF
             'R' OF doread  ENDOF
             'W' OF dowrite ENDOF
             assignment
          ENDCASE
          scan
   REPEAT ; IS block
-- ------------------------------------------------------------
As you can see, Block is oriented to individual program
statements. At each pass through the loop, we know that we are
at the beginning of a statement. We exit the block when we have
scanned an END or an ELSE.
But suppose that we see a semicolon instead. The procedure as
it's shown above can't handle that, because the word Scan only
expects and can only accept tokens that begin with a letter.
I tinkered around for quite a while to come up with a fix. I found many possible approaches, but none were very satisfying. I finally figured out the reason.
Recall that when we started with our single-character parsers, we
adopted a convention that the lookahead character would always be
prefetched. That is, we would have the character that
corresponds to our current position in the input stream fetched
into the global character Look, so that we could examine it as
many times as needed. The rule we adopted was that every
recognizer, if it found its target token, would advance Look to
the next character in the input stream.
That simple and fixed convention served us very well when we had single-character tokens, and it still does. It would make a lot of sense to apply the same rule to multi-character tokens.
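Just to make that convention concrete, here's a minimal sketch, not part of TINY, of what a single-character recognizer looks like under the rule. It uses only words we already have (Look, getchar, the addop? test and expected from earlier installments); the name is made up for the illustration:

-- ------------------------------------------------------------
-- Illustration only: a recognizer obeying the prefetch rule
: addop-char ( -- char )
   Look DUP addop? 0= IF S" AddOp" expected ENDIF
   getchar ;          -- success: Look now holds the NEXT character
-- ------------------------------------------------------------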
But when we got into lexical scanning, I began to violate that simple rule. The scanner of Part X did indeed advance to the next token if it found an identifier or keyword, but it didn't do that if it found a carriage return, a whitespace character, or an operator.
Now, that sort of mixed-mode operation gets us into deep trouble
in the word Block, because whether or not the input stream has
been advanced depends upon the kind of token we encounter. If
it's a keyword or the target of an assignment statement, the
"cursor," as defined by the contents of Look, has been advanced
to the next token or to the beginning of whitespace. If, on the
other hand, the token is a semicolon, or if we have hit a
carriage return, the cursor has not advanced.
Needless to say, we can add enough logic to keep us on track. But it's tricky, and makes the whole parser very fragile.
There's a much better way, and that's just to adopt that same rule that's worked so well before, to apply to tokens as well as single characters. In other words, we'll prefetch tokens just as we've always done for characters. It seems so obvious once you think about it that way.
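To see why this matters for the problem that started all this, look at what an optional semicolon would cost us once the current token is always prefetched into Token. This is only a hypothetical sketch, using the Next we're about to write, and not something we'll actually add until the next installment:

-- ------------------------------------------------------------
-- Hypothetical sketch: skipping an optional semicolon when
-- tokens are prefetched
: semi ( -- )   Token ';' = IF next ENDIF ;
-- ------------------------------------------------------------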
Interestingly enough, if we do things this way the problem that we've had with newline characters goes away. We can just lump them in as whitespace characters, which means that the handling of newlines becomes very trivial, and much less prone to error than we've had to deal with in the past.
-- ------------------------------------------------------------
-- Get an identifier
: getname ( -- )
   skipwhite
   Look alpha? 0= IF S" Identifier" expected ENDIF
   'x' TO token  token$ C0!
   BEGIN  Look >UPC token$ char+!  getchar  Look alnum? 0= UNTIL ;

-- ------------------------------------------------------------
-- Get a number
: getnum ( -- )
   skipwhite
   Look digit? 0= IF S" Number" expected ENDIF
   '#' TO token  token$ C0!
   BEGIN  Look token$ char+!  getchar  Look digit? 0= UNTIL ;
-- ------------------------------------------------------------
These two words are functionally almost identical to the ones I
showed you in Part VII. They each fetch the current token,
either an identifier or a number, into the global string token$.
They also set the encoded version, Token, to the appropriate
code. The input stream is left with Look containing the first
character not part of the token.
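For illustration only, assume the input stream holds count1 = 12 (the text itself is made up, not part of TINY). Then:

-- ------------------------------------------------------------
-- Illustration only, for the assumed input  count1 = 12
--   getname leaves:  Token = 'x'   token$ = "COUNT1"   Look = BL
--   getnum, issued when the cursor reaches the 12, leaves:
--                    Token = '#'   token$ = "12"
-- ------------------------------------------------------------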
We can do the same thing for operators, even multi-character operators, with a procedure such as:
-- ------------------------------------------------------------
-- Get an operator
: getop ( -- )
   Look TO token  token$ C0!
   BEGIN  Look token$ char+!  getchar
          Look digit?  Look alpha? OR  Look white? OR
   UNTIL ;
-- ------------------------------------------------------------
Note that GetOp
returns, as its encoded token, the first
character of the operator. This is important, because it means
that we can now use that single character to drive the parser,
instead of the lookahead character.
We need to tie these procedures together into a single procedure that can handle all three cases. The following procedure will read any one of the token types and always leave the input stream advanced beyond it:
-- ------------------------------------------------------------
-- Get the next input token
: next ( -- )
   skipwhite
   Look alpha? IF getname EXIT ENDIF
   Look digit? IF getnum  EXIT ENDIF
   getop ;
-- ------------------------------------------------------------
Note that here I have put SkipWhite
before the calls rather
than after. This means that, in general, the variable Look
will
not have a meaningful value in it, and therefore we should not
use it as a test value for parsing, as we have been doing so far.
That's the big departure from our normal approach.
Now, remember that before I was careful not to treat the carriage
return (CR) and line feed (LF) characters as white space. This
was because, with SkipWhite
called as the last thing in the
scanner, the encounter with LF would trigger a read statement.
If we were on the last line of the program, we couldn't get out
until we input another line with a non-white character. That's
why I needed the second procedure, NewLine, to handle the CRLF's.
But now, with the call to SkipWhite
coming first, that's exactly
the behavior we want. The compiler must know there's another
token coming or it wouldn't be calling Next. In other words, it
hasn't found the terminating END
yet. So we're going to insist
on more data until we find something.
All this means that we can greatly simplify both the program and
the concepts, by treating CR and LF as whitespace characters, and
eliminating NewLine. You can do that simply by modifying the
function White?:
-- ------------------------------------------------------------
-- Recognize white space
: white? ( char -- tf )
   DUP BL =  OVER Tab = OR  OVER ^M = OR  SWAP ^J = OR ;
-- ------------------------------------------------------------
We've already tried similar routines in Part VII, but you might
as well try these new ones out. Add them to a copy of the Cradle
and call Next
with the following main program:
-- ------------------------------------------------------------
-- main program
init
BEGIN  next  CR Token EMIT SPACE token$ .$  Token '.' = UNTIL ;
-- ------------------------------------------------------------
Compile it and verify that you can separate a program into a series of tokens, and that you get the right encoding for each token.
This almost works, but not quite. There are two potential
problems: First, in KISS/TINY almost all of our operators are
single-character operators. The only exceptions are the relops
>=, <=, and <>.
It seems a shame to treat all operators as
strings and do a string compare, when only a single character
compare will almost always suffice. Second, and much more
important, the thing doesn't work when two operators appear
together, as in (a+b)*(c+d). Here the string following 'b' would
be interpreted as a single operator ")*(."
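To make that failure concrete, here is roughly what the test program above prints for that input (the input text and the exact spacing are just an illustration):

-- ------------------------------------------------------------
-- Assumed input:  (a+b)*(c+d)
--   ( (
--   x A
--   + +
--   x B
--   ) )*(      <- getop swallowed three characters as one "operator"
--   x C
--   ...
-- ------------------------------------------------------------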
It's possible to fix that problem. For example, we could just
give GetOp
a list of legal characters, and we could treat the
parentheses as different operator types than the others. But
this begins to get messy.
Fortunately, there's a better way that solves all the problems.
Since almost all the operators are single characters, let's just
treat them that way, and let GetOp
get only one character at a
time. This not only simplifies GetOp, but also speeds things up
quite a bit. We still have the problem of the relops, but we
were treating them as special cases anyway.
So here's the final version of GetOp:
-- ------------------------------------------------------------
-- Get an operator
: getop ( -- )
   skipwhite
   Look TO Token
   Look token$ char-place
   getchar ;
-- ------------------------------------------------------------
Note that I still give the string token$
a value. If you're truly
concerned about efficiency, you could leave this out. When we're
expecting an operator, we will only be testing Token
anyhow, so
the value of the string won't matter. But to me it seems to be
good practice to give the thing a value just in case.
Try this new version with some realistic-looking code. You should be able to separate any program into its individual tokens, with the caveat that the two-character relops will scan into two separate tokens. That's OK ... we'll parse them that way.
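For example, with the new GetOp the same test program should break an assumed input like (a+b)>=c. into the stream below, one token per line, with the relop arriving as two separate tokens:

-- ------------------------------------------------------------
-- Assumed input:  (a+b)>=c.
--   ( (
--   x A
--   + +
--   x B
--   ) )
--   > >
--   = =
--   x C
--   . .
-- ------------------------------------------------------------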
Now, in Part VII the function of Next was combined with procedure
Scan, which also checked every identifier against a list of
keywords and encoded each one that was found. As I mentioned at
the time, the last thing we would want to do is to use such a
procedure in places where keywords should not appear, such as in
expressions. If we did that, the keyword list would be scanned
for every identifier appearing in the code. Not good.
The right way to deal with that is to simply separate the
functions of fetching tokens and looking for keywords. The
version of Scan
shown below does nothing but check for keywords.
Notice that it operates on the current token and does not advance
the input stream.
-- ------------------------------------------------------------
-- Scan the current identifier for keywords
: scan ( -- )
   Token 'x' = IF token$ COUNT 0 KWlist lookup KW->token ENDIF ;
-- ------------------------------------------------------------
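As an illustration of what that buys us: suppose getname has just read the word ELSE. Then Token is 'x' and token$ holds ELSE; a call to scan finds ELSE in KWlist and re-encodes the token, while an ordinary identifier is left alone:

-- ------------------------------------------------------------
-- Illustration only: the effect of scan on the current token
--   after getname reads ELSE    ->  Token = 'x'   token$ = "ELSE"
--   after scan                  ->  Token = 'l'   token$ = "ELSE"
--   an identifier such as COUNT1 stays coded 'x' after scan
-- ------------------------------------------------------------

That 'l' is exactly the character Block tests for.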
There is one last detail. In the compiler there are a few places
that we must actually check the string value of the token.
Mainly, this is done to distinguish between the different END's,
but there are a couple of other places. (I should note in
passing that we could always eliminate the need for matching END
characters by encoding each one to a different character. Right
now we are definitely taking the lazy man's route.)
The following version of MatchString
takes the place of the
character-oriented Match. Note that, like Match, it does advance
the input stream.
-- ------------------------------------------------------------
-- Match a specific input string
: matchstring ( c-addr u -- )
   token$ COUNT 2OVER COMPARE
   IF    &` CHAR-PREPEND &' CHAR-APPEND expected
   ELSE  2DROP
   ENDIF next ;
-- ------------------------------------------------------------
First of all, the code for procedure Block
doesn't change, though
its function does:
-- ------------------------------------------------------------
-- Parse and translate a block of statements
:NONAME ( -- )
   scan
   BEGIN  token 'e' <>  token 'l' <>  AND
   WHILE  CASE token
             'i' OF doIF    ENDOF
             'w' OF doWHILE ENDOF
             'R' OF doread  ENDOF
             'W' OF dowrite ENDOF
             assignment
          ENDCASE
          scan
   REPEAT ; IS block
-- ------------------------------------------------------------
Remember that the new version of Scan
doesn't advance the input
stream; it only scans for keywords. The input stream must be
advanced by each procedure that Block
calls.
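To show that division of labor, here is a simplified sketch of Assignment under the new regime. The real version in the listing below heap-allocates a copy of the name; this sketch just parks it at PAD to keep the idea visible, since a scratch area like PAD could be overwritten by other string handling before we get around to using the name:

-- ------------------------------------------------------------
-- Sketch only: a statement parser that leaves the input advanced
: assignment-sketch ( -- )
   token> PAD PACK DROP        -- the target name is the CURRENT token
   next                        -- prefetch whatever follows the name
   S" =" matchstring           -- consume the "=" (matchstring calls next)
   boolexpression              -- value ends up in the primary register
   PAD COUNT storevariable ;   -- emit the store; next token already fetched
-- ------------------------------------------------------------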
In general, we have to replace every test on Look
with a similar
test on Token. For example:
-- ------------------------------------------------------------
-- Parse and translate a boolean expression
:NONAME ( -- )
   boolterm
   BEGIN  Token orop?
   WHILE  _push
          CASE Token
             '|' OF boolOR  ENDOF
             '~' OF boolXOR ENDOF
          ENDCASE
   REPEAT ; IS boolexpression
-- ------------------------------------------------------------
In procedures like Add, we don't have to use Match anymore. We
need only call Next to advance the input stream:
-- ------------------------------------------------------------
-- Recognize and translate an add
: add ( -- )   next term _popadd ;
-- ------------------------------------------------------------
Control structures are actually simpler. We just call Next to
advance over the control keywords:
-- ------------------------------------------------------------
-- Recognize and translate an if construct
DEFER block

: doIF ( -- )
   0 0 LOCALS| L2 L1 |
   next boolexpression
   newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP
   L1 COUNT DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP
   L1 COUNT _branch0
   block
   token 'l' = IF
      L2 FREE ?ALLOCATE next
      newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP
      L2 COUNT _branch
      L1 COUNT postlabel
      block
   ENDIF
   L2 COUNT postlabel
   S" ENDIF" matchstring
   L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ;
-- ------------------------------------------------------------
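Just to keep the picture in view, the code this produces for a full IF/ELSE/ENDIF has roughly the shape below. The @n labels stand for whatever newlabel hands out, and the exact assembler text comes from _branch0, _branch and postlabel in the listing:

-- ------------------------------------------------------------
--        <boolean expression>               -- condition in eax
--        eax -> eax or,  @1 offset NEAR je, -- _branch0: skip if false
--        <true branch statements>
--        @2 offset NEAR jmp,                -- _branch: jump past ELSE part
--   @1:
--        <false branch statements>
--   @2:
-- ------------------------------------------------------------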
That's about the extent of the required changes. In the listing of TINY Version 1.1 below, I've also made a number of other "improvements" that aren't really required. Let me explain them briefly:
- I've deleted the two procedures Prog and Main, and combined
  their functions into the main program. They didn't seem to
  add to program clarity ... in fact they seemed to just
  muddy things up a little.

- I've deleted the keywords PROGRAM and BEGIN from the
  keyword list. Each one only occurs in one place, so it's
  not necessary to search for it.

- I've added the error-checking words CheckTable and
  CheckDup, and replaced in-line code by calls to them.
  This cleans up a number of routines.

- I've removed the error checking from code generation routines
  like StoreVariable, and put it in the parser where it
  belongs. See Assignment, for example.

- I've cleaned up the code for the relational operators with the
  two new words CompareExpression and NextExpression.

Other than the removal of the keyword PROGRAM, it parses the same
language as before. It's just a bit cleaner, and more importantly
it's considerably more robust. I feel good about it.
The next installment will be another digression: the discussion of semicolons and such that got me into this mess in the first place. THEN we'll press on into procedures and types. Hang in there with me. The addition of those features will go a long way towards removing KISS from the "toy language" category. We're getting very close to being able to write a serious compiler.
-- ------------------------------------------------------------ -- Variable declarations -1 =: unknown 0 VALUE Lcount -- label counter 0 VALUE Look -- lookahead character 0 VALUE token -- encoded token #100 =: maxentry CREATE token$ 0 C, #256 CHARS ALLOT -- unencoded token CREATE prog$ 0 C, #256 CHARS ALLOT -- program name : char+! ( c addr -- ) DUP >S COUNT + C! 1 S> C+! ; : char-place ( c addr -- ) 1 OVER C! CHAR+ C! ; : token> ( -- c-addr u ) token$ COUNT ; : =token ( char -- tf ) token = ; : <>token ( char -- tf ) token <> ; -- Symbol tables and labels ----------------------------------- -- Helper to build symbol tables : $, ( c-addr u size -- ) >S S MIN >R R@ C, R@ 0 ?DO C@+ C, LOOP DROP S> R> - CHARS ALLOT ; -- Type declarations : SYMTAB ( size -- ) CREATE DUP 0< IF 0 , ABS DUP , 1+ maxentry * ALLOT ELSE HERE >R 0 , ( #items ) DUP >S , ( itemsize ) BEGIN BL <WORD> DUP WHILE 2DUP S" \" COMPARE 0= IF 2DROP REFILL DROP ELSE S $, 1 R@ +! ENDIF REPEAT 2DROP -R -S ENDIF DOES> CELL+ @+ 1+ ROT * + ( ix -- addr ) ; -- Definition of keywords and token types 8 CHARS =: /symbol /symbol SYMTAB KWlist IF ELSE ENDIF \ WHILE ENDWHILE \ DO ENDDO \ LOOP ENDLOOP \ REPEAT UNTIL \ FOR TO ENDFOR \ BREAK \ READ WRITE \ VAR END : KW->token ( index -- ) 2+ C" xilewedeLerufteBRWve" + C@ TO token ; /symbol NEGATE SYMTAB ST =CELL 1- NEGATE SYMTAB SType 0 ST 2 CELLS - =: [cnt]ST 0 SType 2 CELLS - =: [cnt]SType : lookup ( c-addr u 'table -- n2|unknown ) 0 0 LOCALS| /symbol n table sz addr | table 2 CELLS - @+ TO n @ TO /symbol n 0<= IF unknown EXIT ENDIF 0 n DO /symbol 1+ I * table + COUNT addr sz COMPARE 0= IF I UNLOOP EXIT ENDIF -1 +LOOP unknown ; -- Locate symbol in table : locate ( c-addr u -- ix ) 0 ST lookup ; -- Returns -1 | the index of the entry. : intable ( c-addr u -- tf ) 0 ST lookup 0>= ; -- Look for symbol in table -- Dump symbol table : .symbols ( -- ) [cnt]ST @ 0= IF CR ." No symbols defined." EXIT ENDIF CR ." -- type --.--- name ---" [cnt]ST @ 0 ?DO CR 5 HTAB I 1+ SType C@ EMIT #16 HTAB I 1+ ST .$ LOOP ; -- Tools ------------------------------------------------------ : getchar ( -- ) EKEY TO Look ; -- read new character from input stream : error ( c-addr u -- ) CR ^G EMIT ." Error: " TYPE ." ." ; -- report an error : aborts ( c-addr u -- ) error ABORT ; -- report error and halt : expected ( c-addr u -- ) S" expected" $+ aborts ; -- report what was expected : alpha? ( char -- tf ) >UPC 'A' 'Z' 1+ WITHIN ; -- recognize an alpha character : digit? ( char -- tf ) '0' '9' 1+ WITHIN ; -- recognize a decimal digit : alnum? ( char -- tf ) DUP alpha? SWAP digit? OR ; -- recognize alphanumeric : orop? ( char -- tf ) DUP '|' = SWAP '~' = OR ; -- recognize an OR operand : addop? ( char -- tf ) DUP '+' = SWAP '-' = OR ; -- test for AddOp : mulop? ( char -- tf ) DUP '*' = SWAP '/' = OR ; -- test for MulOp : emits ( c-addr u -- ) Tab EMIT TYPE ; -- output a string with tab : emitdln ( c-addr u -- ) CR TYPE ; -- output a string without tab, +crlf : emitln ( c-addr u -- ) CR emits ; -- output a string with tab and crlf -- Recognize white space : white? ( char -- tf ) DUP Tab = OVER BL = OR OVER ^M = OR SWAP ^J = OR ; -- Skip white space : skipwhite ( -- ) BEGIN Look white? 
WHILE getchar REPEAT ; -- Abort, reporting an offending item : name.aborts ( c-addr1 u1 c-addr2 u2 ) &' CHAR-APPEND S" `" 2SWAP $+ $+ aborts ; : .undefined ( c-addr u -- ) S" Undefined identifier" 2SWAP name.aborts ; : .duplicate ( c-addr u -- ) S" Duplicate identifier" 2SWAP name.aborts ; : checktable ( c-addr u -- ) 2DUP intable IF 2DROP EXIT ENDIF .undefined ; -- Is identifier in the symbol table? : checkdup ( c-addr u -- ) 2DUP intable 0= IF 2DROP EXIT ENDIF .duplicate ; -- Is identifier already in symbol table? : checkident ( -- ) 'x' <>token IF S" Identifier" expected ENDIF ; -- Is current token an identifier? -- Generate a unique label / post it to output : newlabel ( -- c-addr u ) S" @" Lcount U>D <# #S #> $+ 1 +TO Lcount ; : postlabel ( c-addr u -- ) CR TYPE ':' EMIT ; -- Recognize a relop : relop? ( char -- tf ) DUP '=' = OVER '#' = OR OVER '<' = OR SWAP '>' = OR ; -- Match a specific input character : match ( char -- ) DUP Look = IF DROP getchar skipwhite ELSE S" `" ROT CHAR-APPEND &' CHAR-APPEND expected ENDIF ; -- Get an identifier : getname ( -- ) skipwhite Look alpha? 0= IF S" Identifier" expected ENDIF 'x' TO token token$ C0! BEGIN Look >UPC token$ char+! getchar Look alnum? 0= UNTIL ; -- Get a number : getnum ( -- ) skipwhite Look digit? 0= IF S" Number" expected ENDIF '#' TO token token$ C0! BEGIN Look token$ char+! getchar Look digit? 0= UNTIL ; -- Get an operator : getop ( -- ) skipwhite Look TO token Look token$ char-place getchar ; -- Get next token of any type : next ( -- ) skipwhite Look alpha? IF getname EXIT ENDIF Look digit? IF getnum EXIT ENDIF getop ; -- Get an identifier and scan it for keywords : scan ( -- ) 'x' =token IF token> 0 KWlist lookup KW->token ENDIF ; -- Match a specific input string : matchstring ( c-addr u -- ) token> 2OVER COMPARE IF &` CHAR-PREPEND &' CHAR-APPEND expected ELSE 2DROP next ENDIF ; -- Add a new entry to symbol table : addentry ( c-addr u type -- ) [cnt]ST LOCALS| #entries T sz addr | addr sz checkdup #entries @ maxentry = IF S" Symbol table full" aborts ENDIF 1 #entries +! addr sz #entries @ ST PACK DROP T #entries @ SType C! 
; -- Code generation -------------------------------------------- -- Load primary register with a constant : loadconstant ( c-addr u -- ) S" d# -> eax mov," $+ emitln ; -- Load primary register from variable : loadvariable ( c-addr u -- ) 2DUP intable 0= IF .undefined ENDIF S" dword-ptr -> eax mov," $+ emitln ; -- Store primary register in variable : storevariable ( c-addr u -- ) 2DUP intable 0= IF .undefined ENDIF S" eax -> " 2SWAP $+ S" dword-ptr mov," $+ emitln ; -- Allocate storage for a static variable : allocatestorage ( $value $name -- ) S" CREATE " 2SWAP $+ BL CHAR-APPEND 2SWAP $+ S" , " $+ emitdln ; : _callw ( c-addr u -- ) S" offset NEAR call," $+ emitln ; -- call a word : _clear ( -- ) S" eax -> eax xor," emitln ; -- clear primary : _negate ( -- ) S" eax neg," emitln ; -- negate primary : _not ( -- ) S" eax not," emitln ; -- not primary : _push ( -- ) S" eax push," emitln ; -- push primary to stack : _pop ( -- ) S" eax pop," emitln ; -- pop primary from stack : _decr ( -- ) S" 1 b# -> eax sub," emitln ; -- decrement primary : _incr ( -- ) S" 1 b# -> eax add," emitln ; -- increment primary : _decSP ( -- ) S" [esp -4 +] -> esp lea," emitln ; -- drop 1 stack item : _incSP ( -- ) S" [esp 4 +] -> esp lea," emitln ; -- restore stack item : _popadd ( -- ) S" [esp] -> eax add, [esp 4 +] -> esp lea," emitln ; -- add TOS to primary : _popsub ( -- ) S" [esp] -> eax sub, [esp 4 +] -> esp lea," emitln _negate ; -- subtract TOS from primary : _popor ( -- ) S" [esp] -> eax or, [esp 4 +] -> esp lea," emitln ; -- or TOS to primary : _popxor ( -- ) S" [esp] -> eax xor, [esp 4 +] -> esp lea," emitln ; -- xor TOS to primary : _popand ( -- ) S" [esp] -> eax and, [esp 4 +] -> esp lea," emitln ; -- xor TOS to primary : _popcmp ( -- ) S" [esp] -> eax cmp, [esp 4 +] -> esp lea," emitln ; -- compare TOS to primary : _sete ( -- ) S" al sete, al -> eax movsx," emitln ; -- set primary if equal : _setne ( -- ) S" al setne, al -> eax movsx," emitln ; -- set primary if NOT equal : _setg ( -- ) S" al setl, al -> eax movsx," emitln ; -- set primary if greater : _setl ( -- ) S" al setg, al -> eax movsx," emitln ; -- set primary if less : _setge ( -- ) S" al setle, al -> eax movsx," emitln ; -- set primary if greater or equal : _setle ( -- ) S" al setge, al -> eax movsx," emitln ; -- set primary if less or equal -- A collection of jumps : _branch ( c-addr u -- ) S" offset NEAR jmp," $+ emitln ; : _pcmp+b0> ( c-addr u -- ) _popcmp S" offset NEAR jg," $+ emitln ; : _branch0 ( c-addr u -- ) S" eax -> eax or, " 2SWAP $+ S" offset NEAR je," $+ emitln ; : _branch<>0 ( c-addr u -- ) S" eax -> eax or, " 2SWAP $+ S" offset NEAR jnz," $+ emitln ; -- Multiply TOS and primary; divide primary by TOS : _popmul ( -- ) S" [esp] dword mul, [esp 4 +] -> esp lea," emitln ; : _popdiv ( -- ) S" ecx pop, ecx -> eax xchg, eax -> edx mov, #31 b# -> edx sar, ebx idiv," emitln ; -- Read to primary register and store in variable : readit ( c-addr u -- ) S" READ" _callw storevariable ; -- Write from primary register : writeit ( -- ) S" WRITE" _callw ; -- Write assembler header info : header ( -- ) S" -- DATA section --------" emitdln S" " emitdln ; -- Write assembler prologue : prolog ( -- ) S" " emitln S" -- CODE section --------" emitdln S" " emitdln S" CODE " prog$ COUNT $+ emitdln S" rpush," emitln newlabel postlabel ; -- Write assembler epilogue : epilog ( -- ) S" rpop, ebx jmp," emitln S" END-CODE" emitdln ; -- Expressions ------------------------------------------------ DEFER boolexpression -- Parse and translate a math factor : 
factor ( -- ) '(' =token IF next boolexpression S" )" matchstring EXIT ENDIF 'x' =token IF token> loadvariable next EXIT ENDIF '#' =token IF token> loadconstant next EXIT ENDIF S" Math factor" expected ; -- Recognize and translate multiply / divide : multiply ( -- ) next factor _popmul ; : divide ( -- ) next factor _popdiv ; -- Parse and translate a math term : term ( -- ) factor BEGIN token mulop? WHILE _push CASE token '*' OF multiply ENDOF '/' OF divide ENDOF ENDCASE REPEAT ; -- Recognize and translate add / subtract : add ( -- ) next term _popadd ; : subtract ( -- ) next term _popsub ; -- Parse and translate an expression : expression ( -- ) token addop? IF _clear ELSE term ENDIF BEGIN token addop? WHILE _push CASE token '+' OF add ENDOF '-' OF subtract ENDOF ENDCASE REPEAT ; -- Get another / next expression and compare : comparex ( -- ) expression _popcmp ; : nextx ( -- ) next comparex ; -- Recognize and translate a relational "equals" / "less or equal" / "not equals" : equals ( -- ) nextx _sete ; : lessorequal ( -- ) nextx _setle ; : notequals ( -- ) nextx _setne ; -- Recognize and translate a relational "less than" : less ( -- ) next CASE token '=' OF lessorequal ENDOF '>' OF notequals ENDOF comparex _setl ENDCASE ; -- Recognize and translate a relational "greater than" : greater ( -- ) next '=' =token IF nextx _setge EXIT ENDIF comparex _setg ; -- Parse and translate a relation : relation ( -- ) expression token relop? 0= ?EXIT _push CASE token '=' OF equals ENDOF '<' OF less ENDOF '>' OF greater ENDOF ENDCASE ; -- Parse and translate a boolean factor with NOT : notfactor ( -- ) '!' <>token IF relation EXIT ENDIF next relation _not ; -- Parse and translate a boolean term : boolterm ( -- ) notfactor BEGIN '&' =token WHILE _push next notfactor _popand REPEAT ; -- Recognize and translate a boolean OR / XOR : boolOR ( -- ) next boolterm _popor ; : boolXOR ( -- ) next boolterm _popxor ; -- Parse and translate a boolean expression :NONAME ( -- ) boolterm BEGIN token orop? 
WHILE _push CASE token '|' OF boolOR ENDOF '~' OF boolXOR ENDOF ENDCASE REPEAT ; IS boolexpression -- Parse and translate an assignment statement : assignment ( -- ) token> DUP 1+ ALLOCATE ?ALLOCATE DUP LOCAL name PACK DROP next S" =" matchstring boolexpression name COUNT storevariable name FREE ?ALLOCATE ; -- Block statements ------------------------------------------- DEFER block -- Recognize and translate an IF construct : doIF ( label -- ) 0 0 LOCALS| L2 L1 L | next boolexpression newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP L1 COUNT DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP L1 COUNT _branch0 L block 'l' =token IF L2 FREE ?ALLOCATE next newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP L2 COUNT _branch L1 COUNT postlabel L block ENDIF L2 COUNT postlabel S" ENDIF" matchstring L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- Recognize and translate a WHILE construct : doWHILE ( -- ) 0 0 LOCALS| L2 L1 | newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP next L1 COUNT postlabel boolexpression L2 COUNT _branch0 L2 block S" ENDWHILE" matchstring L1 COUNT _branch L2 COUNT postlabel L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- Parse and translate a LOOP statement : doLOOP ( -- ) 0 0 LOCALS| L2 L1 | newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP next L1 COUNT postlabel L2 block L1 COUNT _branch L2 COUNT postlabel S" ENDLOOP" matchstring L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- Parse and translate a REPEAT statement : doREPEAT ( -- ) 0 0 LOCALS| L2 L1 | newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP next L1 COUNT postlabel L2 block S" UNTIL" matchstring boolexpression L1 COUNT _branch0 L2 COUNT postlabel L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- Parse and translate a FOR statement : doFOR ( -- ) 0 0 0 LOCALS| name L2 L1 | next checkident token> checktable token> DUP 1+ ALLOCATE ?ALLOCATE DUP TO name PACK DROP newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP next S" =" matchstring expression _decr name COUNT storevariable S" TO" matchstring expression _push L1 COUNT postlabel name COUNT loadvariable _incr name COUNT storevariable L2 COUNT _pcmp+b0> L2 block L1 COUNT _branch L2 COUNT postlabel S" ENDFOR" matchstring _incSP name FREE ?ALLOCATE L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- Parse and Translate a DO Statement : doDO ( -- ) 0 0 LOCALS| L2 L1 | newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP next expression L1 COUNT postlabel _push L2 block _pop _decr L1 COUNT _branch<>0 _decSP L2 COUNT postlabel S" ENDDO" matchstring _incSP L1 FREE ?ALLOCATE L2 FREE ?ALLOCATE ; -- Recognize and translate a BREAK : doBREAK ( label -- ) DUP C@ 0= IF S" No loop to break from" aborts ENDIF COUNT _branch next ; -- Read a single variable : readvar ( -- ) checkident token> checktable token> readit next ; -- Process a read statement : doread ( -- ) next S" (" matchstring readvar BEGIN ',' =token WHILE next readvar REPEAT S" )" matchstring ; -- Process a write statement : dowrite ( -- ) next S" (" matchstring expression writeit BEGIN ',' =token WHILE next expression writeit REPEAT S" )" matchstring ; -- Recognize and translate a statement block :NONAME ( label -- ) LOCAL L scan BEGIN 'e' <>token 'l' <>token AND 'u' <>token AND WHILE CASE token 'i' OF L doIF ENDOF 'w' OF doWHILE ENDOF 'd' OF doDO ENDOF 'L' OF 
doLOOP ENDOF 'r' OF doREPEAT ENDOF 'f' OF doFOR ENDOF 'B' OF L doBREAK ENDOF 'R' OF doread ENDOF 'W' OF dowrite ENDOF assignment ENDCASE scan REPEAT ; IS block -- Declarations ----------------------------------------------- -- Allocate storage for a variable : alloc ( -- ) 0 LOCAL aname next token> DUP 1+ ALLOCATE ?ALLOCATE DUP TO aname PACK DROP 'x' <>token IF S" Variable name" expected ENDIF aname COUNT checkdup aname COUNT 'v' addentry next '=' =token IF S" =" matchstring '-' =token IF S" -" matchstring S" -" ELSE S" " ENDIF token> $+ next ELSE S" 0" ENDIF aname COUNT allocatestorage aname FREE ?ALLOCATE ; -- Parse and translate the global declarations : topdecls ( -- ) scan BEGIN 'v' =token WHILE alloc BEGIN ',' =token WHILE alloc REPEAT REPEAT ; -- Parse and translate a program ------------------------------ -- Initialize everything : init ( -- ) CLEAR Lcount [cnt]ST 0! [cnt]SType 0! CR getchar next ; : TINY11 ( -- ) init S" PROGRAM" matchstring token> prog$ PACK DROP next header topdecls S" BEGIN" matchstring prolog C" " block S" END" matchstring epilog ; -- ------------------------------------------------------------
*****************************************************************
*                                                               *
*                       COPYRIGHT NOTICE                        *
*                                                               *
*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *
*                                                               *
*****************************************************************