Recently, though, I've begun to have problems, and they're the kind that send a message that you might be doing something wrong.
The whole thing came to a head when I tried to address the issue of semicolons. Several people have asked me about them, and whether or not KISS will have them separating the statements. My intention has been not to use semicolons, simply because I don't like them and, as you can see, they have not proved necessary.
But I know that many of you, like me, have gotten used to them, and so I set out to write a short installment to show you how they could easily be added, if you were so inclined.
Well, it turned out that they weren't easy to add at all. In fact it was darned difficult.
I guess I should have realized that something was wrong, because
of the issue of newlines. In the last couple of installments
we've addressed that issue, and I've shown you how to deal with
newlines with a procedure called, appropriately enough, NewLine.
In TINY Version 1.0, I sprinkled calls to this procedure in
strategic spots in the code.
It seems that every time I've addressed the issue of newlines, though, I've found it to be tricky, and the resulting parser turned out to be quite fragile ... one addition or deletion here or there and things tended to go to pot. Looking back on it, I realize that there was a message in this that I just wasn't paying attention to.
When I tried to add semicolons on top of the newlines, that was the last straw. I ended up with much too complex a solution. I began to realize that something fundamental had to change.
So, in a way this installment will cause us to backtrack a bit and revisit the issue of scanning all over again. Sorry about that. That's the price you pay for watching me do this in real time. But the new version is definitely an improvement, and will serve us well for what is to come.
As I said, the scanner we used in Part X was about as simple as one can get. But anything can be improved. The new scanner is more like the classical scanner, and not as simple as before. But the overall compiler structure is even simpler than before. It's also more robust, and easier to add to and/or modify. I think that's worth the time spent in this digression. So in this installment, I'll be showing you the new structure. No doubt you'll be happy to know that, while the changes affect many procedures, they aren't very profound and so we lose very little of what's been done so far.
Ironically, the new scanner is much more conventional than the old one, and is very much like the more generic scanner I showed you earlier in Part VII. Then I started trying to get clever, and I almost clevered myself clean out of business. You'd think one day I'd learn: K-I-S-S!
The problem begins to show itself in the word Block, which I've
reproduced below:
-- ------------------------------------------------------------
-- Parse and translate a block of statements
:NONAME ( -- )
scan
BEGIN token 'e' <>
token 'l' <> AND
WHILE CASE token
'i' OF doIF ENDOF
'w' OF doWHILE ENDOF
'R' OF doread ENDOF
'W' OF dowrite ENDOF
assignment
ENDCASE
scan
REPEAT ; IS block
-- ------------------------------------------------------------
As you can see, Block is oriented to individual program
statements. At each pass through the loop, we know that we are
at the beginning of a statement. We exit the block when we have
scanned an END or an ELSE.
But suppose that we see a semicolon instead. The procedure as
it's shown above can't handle that, because the word Scan only
expects and can only accept tokens that begin with a letter.
I tinkered around for quite a while to come up with a fix. I found many possible approaches, but none were very satisfying. I finally figured out the reason.
Recall that when we started with our single-character parsers, we
adopted a convention that the lookahead character would always be
prefetched. That is, we would have the character that
corresponds to our current position in the input stream fetched
into the global character Look, so that we could examine it as
many times as needed. The rule we adopted was that every
recognizer, if it found its target token, would advance Look to
the next character in the input stream.
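As a reminder, here's a minimal sketch of a recognizer that obeys that rule: essentially the Cradle's Match, stripped of the whitespace skipping and the fancier error message. The name matchchar is mine, purely for illustration.
-- ------------------------------------------------------------
-- Sketch only: a single-character recognizer that follows the
-- prefetch convention. When it accepts its target, it advances
-- Look to the next character. (matchchar is a hypothetical
-- name, not part of TINY.)
: matchchar ( char -- )
   Look = IF getchar                    -- accept, prefetch next
          ELSE S" Character" expected   -- otherwise complain
          ENDIF ;
-- ------------------------------------------------------------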
That simple and fixed convention served us very well when we had single-character tokens, and it still does. It would make a lot of sense to apply the same rule to multi-character tokens.
But when we got into lexical scanning, I began to violate that simple rule. The scanner of Part X did indeed advance to the next token if it found an identifier or keyword, but it didn't do that if it found a carriage return, a whitespace character, or an operator.
Now, that sort of mixed-mode operation gets us into deep trouble
in the word Block, because whether or not the input stream has
been advanced depends upon the kind of token we encounter. If
it's a keyword or the target of an assignment statement, the
"cursor," as defined by the contents of Look, has been advanced
to the next token or to the beginning of whitespace. If, on the
other hand, the token is a semicolon, or if we have hit a
carriage return, the cursor has not advanced.
Needless to say, we can add enough logic to keep us on track. But it's tricky, and makes the whole parser very fragile.
There's a much better way, and that's just to adopt that same rule that's worked so well before, to apply to tokens as well as single characters. In other words, we'll prefetch tokens just as we've always done for characters. It seems so obvious once you think about it that way.
Interestingly enough, if we do things this way the problem that we've had with newline characters goes away. We can just lump them in as whitespace characters, which means that the handling of newlines becomes very trivial, and much less prone to error than we've had to deal with in the past.
-- ------------------------------------------------------------
-- Get an identifier
: getname ( -- )
skipwhite
Look alpha? 0= IF S" Identifier" expected ENDIF
'x' TO token
token$ C0!
BEGIN
Look >UPC token$ char+!
getchar
Look alnum? 0=
UNTIL ;
-- ------------------------------------------------------------
-- Get a number
: getnum ( -- )
skipwhite
Look digit? 0= IF S" Number" expected ENDIF
'#' TO token
token$ C0!
BEGIN
Look token$ char+!
getchar
Look digit? 0=
UNTIL ;
-- ------------------------------------------------------------
These two words are functionally almost identical to the
ones I showed you in Part VII. They each fetch the current
token, either an identifier or a number, into the global string
token$. They also set the encoded version, Token, to the
appropriate code. The input stream is left with Look containing
the first character not part of the token.
We can do the same thing for operators, even multi-character operators, with a procedure such as:
-- ------------------------------------------------------------
-- Get an operator
: getop ( -- )
Look TO token
token$ C0!
BEGIN
Look token$ char+!
getchar
Look digit? Look alpha? OR Look white? OR
UNTIL ;
-- ------------------------------------------------------------
Note that GetOp returns, as its encoded token, the first
character of the operator. This is important, because it means
that we can now use that single character to drive the parser,
instead of the lookahead character.
We need to tie these procedures together into a single procedure that can handle all three cases. The following procedure will read any one of the token types and always leave the input stream advanced beyond it:
-- ------------------------------------------------------------
-- Get the next input token
: next ( -- )
skipwhite
Look alpha? IF getname EXIT ENDIF
Look digit? IF getnum EXIT ENDIF
getop ;
-- ------------------------------------------------------------
Note that here I have put SkipWhite before the calls rather
than after. This means that, in general, the variable Look will
not have a meaningful value in it, and therefore we should not
use it as a test value for parsing, as we have been doing so far.
That's the big departure from our normal approach.
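To make the contrast concrete, here is the same kind of test written both ways; the little helper =token appears in the full listing below. This is an illustration of the style change, not a new piece of the compiler:
-- ------------------------------------------------------------
-- Illustration of the change in parsing style:
--   old:  Look '(' = IF ... ENDIF      -- test the character
--   new:  '(' =token IF ... ENDIF      -- test the token code
-- ------------------------------------------------------------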
Now, remember that before I was careful not to treat the carriage
return (CR) and line feed (LF) characters as white space. This
was because, with SkipWhite called as the last thing in the
scanner, the encounter with LF would trigger a read statement.
If we were on the last line of the program, we couldn't get out
until we input another line with a non-white character. That's
why I needed the second procedure, NewLine, to handle the CRLF's.
But now, with the call to SkipWhite coming first, that's exactly
the behavior we want. The compiler must know there's another
token coming or it wouldn't be calling Next. In other words, it
hasn't found the terminating END yet. So we're going to insist
on more data until we find something.
All this means that we can greatly simplify both the program and
the concepts, by treating CR and LF as whitespace characters, and
eliminating NewLine. You can do that simply by modifying the
function White?:
-- ------------------------------------------------------------
-- Recognize white space
: white? ( char -- tf )
DUP BL =
OVER Tab = OR
OVER ^M = OR
SWAP ^J = OR ;
-- ------------------------------------------------------------
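SkipWhite itself is unchanged; for reference, it's the same one-liner that appears in the full listing below. Note, though, that since White? now accepts CR and LF, this little loop is what keeps reading input until a real token shows up:
-- ------------------------------------------------------------
-- Skip over leading white space (unchanged from the Cradle)
: skipwhite ( -- ) BEGIN Look white? WHILE getchar REPEAT ;
-- ------------------------------------------------------------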
We've already tried similar routines in Part VII, but you might
as well try these new ones out. Add them to a copy of the Cradle
and call Next with the following main program:
-- ------------------------------------------------------------
-- main program (wrap the loop in a definition so that
-- BEGIN..UNTIL will compile; the name is arbitrary)
: test11 ( -- )
   init
   BEGIN next
         CR Token EMIT SPACE token$ .$
         Token '.' =
   UNTIL ;
-- ------------------------------------------------------------
Compile it and verify that you can separate a program into a series of tokens, and that you get the right encoding for each token.
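For example, if you type a line such as  x=x+3 .  you should see each token listed with its code, something like this:

   x X
   = =
   x X
   + +
   # 3
   . .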
This almost works, but not quite. There are two potential
problems: First, in KISS/TINY almost all of our operators are
single-character operators. The only exceptions are the relops
>=, <=, and <>.
It seems a shame to treat all operators as
strings and do a string compare, when only a single character
compare will almost always suffice. Second, and much more
important, the thing doesn't work when two operators appear
together, as in (a+b)*(c+d). Here the string following 'b' would
be interpreted as a single operator ")*(."
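To see the failure concretely, here's roughly how the scanner above carves up that expression:
-- ------------------------------------------------------------
-- Scanning (a+b)*(c+d) with the greedy getop above gives the
-- token stream:
--    (   A   +   B   )*(   C   +   D   )
-- The three characters between B and C come back as a single
-- "operator," which the parser has no way to deal with.
-- ------------------------------------------------------------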
It's possible to fix that problem. For example, we could just
give GetOp a list of legal characters, and we could treat the
parentheses as different operator types than the others. But
this begins to get messy.
Fortunately, there's a better way that solves all the problems.
Since almost all the operators are single characters, let's just
treat them that way, and let GetOp get only one character at a
time. This not only simplifies GetOp, but also speeds things up
quite a bit. We still have the problem of the relops, but we
were treating them as special cases anyway.
So here's the final version of GetOp:
-- ------------------------------------------------------------
-- Get an operator
: getop ( -- )
skipwhite
Look TO Token
Look token$ char-place
getchar ;
-- ------------------------------------------------------------
Note that I still give the string token$ a value. If you're truly
concerned about efficiency, you could leave this out. When we're
expecting an operator, we will only be testing Token anyhow, so
the value of the string won't matter. But to me it seems to be
good practice to give the thing a value just in case.
Try this new version with some realistic-looking code. You should be able to separate any program into its individual tokens, with the caveat that the two-character relops will scan into two separate tokens. That's OK ... we'll parse them that way.
Now, in Part VII the function of Next was combined with procedure
Scan, which also checked every identifier against a list of
keywords and encoded each one that was found. As I mentioned at
the time, the last thing we would want to do is to use such a
procedure in places where keywords should not appear, such as in
expressions. If we did that, the keyword list would be scanned
for every identifier appearing in the code. Not good.
The right way to deal with that is to simply separate the
functions of fetching tokens and looking for keywords. The
version of Scan shown below does nothing but check for keywords.
Notice that it operates on the current token and does not advance
the input stream.
-- ------------------------------------------------------------
-- Scan the current identifier for keywords
: scan ( -- )
Token 'x' = IF token$ COUNT 0 KWlist lookup
KW->token
ENDIF ;
-- ------------------------------------------------------------
There is one last detail. In the compiler there are a few places
that we must actually check the string value of the token.
Mainly, this is done to distinguish between the different END's,
but there are a couple of other places. (I should note in
passing that we could always eliminate the need for matching END
characters by encoding each one to a different character. Right
now we are definitely taking the lazy man's route.)
The following version of MatchString takes the place of the
character-oriented Match. Note that, like Match, it does advance
the input stream.
-- ------------------------------------------------------------
-- Match a specific input string
: matchstring ( c-addr u -- )
token$ COUNT 2OVER
COMPARE IF &` CHAR-PREPEND &' CHAR-APPEND expected
ELSE 2DROP
ENDIF
next ;
-- ------------------------------------------------------------
Armed with the new scanner words, we can now fix up the rest of the compiler to use them. The changes are all quite minor, but there are a lot of places where they're needed. Rather than walking you through every one, I'll give you the general idea and then just the finished product. First of all, the code for procedure Block doesn't change, though its function does:
-- ------------------------------------------------------------
-- Parse and translate a block of statements
:NONAME ( -- )
scan
BEGIN token 'e' <>
token 'l' <> AND
WHILE CASE token
'i' OF doIF ENDOF
'w' OF doWHILE ENDOF
'R' OF doread ENDOF
'W' OF dowrite ENDOF
assignment
ENDCASE
scan
REPEAT ; IS block
-- ------------------------------------------------------------
Remember that the new version of Scan doesn't advance the input
stream, it only scans for keywords. The input stream must be
advanced by each procedure that Block calls.
In general, we have to replace every test on Look with a similar
test on Token. For example:
-- -------------------------------------------------------------
-- Parse and translate a boolean expression
:NONAME ( -- )
boolterm
BEGIN Token orop?
WHILE _push
CASE Token
'|' OF boolOR ENDOF
'~' OF boolXOR ENDOF
ENDCASE
REPEAT ; IS boolexpression
-- ------------------------------------------------------------
In procedures like Add, we don't have to use Match anymore. We
need only call Next to advance the input stream:
-- ------------------------------------------------------------
-- Recognize and translate an add
: add ( -- ) next term _popadd ;
-- ------------------------------------------------------------
Control structures are actually simpler. We just call Next to
advance over the control keywords:
-- ------------------------------------------------------------
-- Recognize and translate an if construct
DEFER block
: doIF ( -- )
0 0 LOCALS| L2 L1 |
next
boolexpression
newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP
L1 COUNT DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP
L1 COUNT _branch0
block
token 'l' = IF L2 FREE ?ALLOCATE
next
newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP
L2 COUNT _branch
L1 COUNT postlabel
block
ENDIF
L2 COUNT postlabel
S" ENDIF" matchstring
L1 FREE ?ALLOCATE
L2 FREE ?ALLOCATE ;
-- ------------------------------------------------------------
That's about the extent of the required changes. In the listing of TINY Version 1.1 below, I've also made a number of other "improvements" that aren't really required. Let me explain them briefly:
 o I've deleted the words Prog and Main, and combined their
   functions into the main program. They didn't seem to add to
   program clarity ... in fact they seemed to just muddy things
   up a little.

 o I've deleted the keywords PROGRAM and BEGIN from the keyword
   list. Each one only occurs in one place, so it's not
   necessary to search for it.

 o I've added some error-checking words such as CheckTable
   and CheckDup, and replaced in-line code by calls to them.
   This cleans up a number of routines.

 o I've removed the error checking from code generation words
   such as StoreVariable, and put it in the parser where it
   belongs. See Assignment, for example.

 o I've cleaned up the handling of the relational operators by
   adding the two new words CompareExpression and
   NextExpression (comparex and nextx in the listing).

Aside from the way it treats the keyword PROGRAM, the new
version parses the same language as before. It's just a bit
cleaner, and more importantly it's considerably more robust. I
feel good about it.
The next installment will be another digression: the discussion of semicolons and such that got me into this mess in the first place. THEN we'll press on into procedures and types. Hang in there with me. The addition of those features will go a long way towards removing KISS from the "toy language" category. We're getting very close to being able to write a serious compiler.
-- ------------------------------------------------------------
-- Variable declarations
-1 =: unknown
0 VALUE Lcount -- label counter
0 VALUE Look -- lookahead character
0 VALUE token -- encoded token
#100 =: maxentry
CREATE token$ 0 C, #256 CHARS ALLOT -- unencoded token
CREATE prog$ 0 C, #256 CHARS ALLOT -- program name
: char+! ( c addr -- ) DUP >S COUNT + C! 1 S> C+! ;
: char-place ( c addr -- ) 1 OVER C! CHAR+ C! ;
: token> ( -- c-addr u ) token$ COUNT ;
: =token ( char -- tf ) token = ;
: <>token ( char -- tf ) token <> ;
-- Symbol tables and labels -----------------------------------
-- Helper to build symbol tables
: $, ( c-addr u size -- )
>S S MIN >R
R@ C, R@ 0 ?DO C@+ C, LOOP DROP
S> R> - CHARS ALLOT ;
-- Type declarations
: SYMTAB ( size -- )
CREATE DUP 0< IF 0 , ABS DUP ,
1+ maxentry * ALLOT
ELSE HERE >R
0 , ( #items )
DUP >S , ( itemsize )
BEGIN BL <WORD> DUP
WHILE 2DUP S" \" COMPARE
0= IF 2DROP REFILL DROP
ELSE S $, 1 R@ +!
ENDIF
REPEAT 2DROP -R -S
ENDIF
DOES> CELL+ @+ 1+ ROT * + ( ix -- addr ) ;
-- Definition of keywords and token types
8 CHARS =: /symbol
/symbol SYMTAB KWlist IF ELSE ENDIF \
WHILE ENDWHILE \
DO ENDDO \
LOOP ENDLOOP \
REPEAT UNTIL \
FOR TO ENDFOR \
BREAK \
READ WRITE \
VAR END
: KW->token ( index -- ) 2+ C" xilewedeLerufteBRWve" + C@ TO token ; -- map keyword index (unknown = -1) to a token code
/symbol NEGATE SYMTAB ST
=CELL 1- NEGATE SYMTAB SType
0 ST 2 CELLS - =: [cnt]ST
0 SType 2 CELLS - =: [cnt]SType
: lookup ( c-addr u 'table -- n2|unknown )
0 0 LOCALS| /symbol n table sz addr |
table 2 CELLS - @+ TO n @ TO /symbol
n 0<= IF unknown EXIT ENDIF
0 n DO
/symbol 1+ I * table + COUNT
addr sz COMPARE 0= IF I UNLOOP EXIT ENDIF
-1 +LOOP
unknown ;
-- Locate symbol in table
: locate ( c-addr u -- ix ) 0 ST lookup ; -- Returns -1 | the index of the entry.
: intable ( c-addr u -- tf ) 0 ST lookup 0>= ; -- Look for symbol in table
-- Dump symbol table
: .symbols ( -- )
[cnt]ST @ 0= IF CR ." No symbols defined." EXIT ENDIF
CR ." -- type --.--- name ---"
[cnt]ST @ 0 ?DO CR 5 HTAB I 1+ SType C@ EMIT
#16 HTAB I 1+ ST .$
LOOP ;
-- Tools ------------------------------------------------------
: getchar ( -- ) EKEY TO Look ; -- read new character from input stream
: error ( c-addr u -- ) CR ^G EMIT ." Error: " TYPE ." ." ; -- report an error
: aborts ( c-addr u -- ) error ABORT ; -- report error and halt
: expected ( c-addr u -- ) S"  expected" $+ aborts ; -- report what was expected
: alpha? ( char -- tf ) >UPC 'A' 'Z' 1+ WITHIN ; -- recognize an alpha character
: digit? ( char -- tf ) '0' '9' 1+ WITHIN ; -- recognize a decimal digit
: alnum? ( char -- tf ) DUP alpha? SWAP digit? OR ; -- recognize alphanumeric
: orop? ( char -- tf ) DUP '|' = SWAP '~' = OR ; -- test for OrOp
: addop? ( char -- tf ) DUP '+' = SWAP '-' = OR ; -- test for AddOp
: mulop? ( char -- tf ) DUP '*' = SWAP '/' = OR ; -- test for MulOp
: emits ( c-addr u -- ) Tab EMIT TYPE ; -- output a string with tab
: emitdln ( c-addr u -- ) CR TYPE ; -- output a string without tab, +crlf
: emitln ( c-addr u -- ) CR emits ; -- output a string with tab and crlf
-- Recognize white space
: white? ( char -- tf )
DUP Tab =
OVER BL = OR
OVER ^M = OR
SWAP ^J = OR ;
-- Skip white space
: skipwhite ( -- ) BEGIN Look white? WHILE getchar REPEAT ;
-- Abort, reporting an offending item
: name.aborts ( c-addr1 u1 c-addr2 u2 -- ) &' CHAR-APPEND S" `" 2SWAP $+ $+ aborts ;
: .undefined ( c-addr u -- ) S" Undefined identifier" 2SWAP name.aborts ;
: .duplicate ( c-addr u -- ) S" Duplicate identifier" 2SWAP name.aborts ;
: checktable ( c-addr u -- ) 2DUP intable IF 2DROP EXIT ENDIF .undefined ; -- Is identifier in the symbol table?
: checkdup ( c-addr u -- ) 2DUP intable 0= IF 2DROP EXIT ENDIF .duplicate ; -- Is identifier already in symbol table?
: checkident ( -- ) 'x' <>token IF S" Identifier" expected ENDIF ; -- Is current token an identifier?
-- Generate a unique label / post it to output
: newlabel ( -- c-addr u ) S" @" Lcount U>D <# #S #> $+ 1 +TO Lcount ;
: postlabel ( c-addr u -- ) CR TYPE ':' EMIT ;
-- Recognize a relop
: relop? ( char -- tf )
DUP '=' =
OVER '#' = OR
OVER '<' = OR
SWAP '>' = OR ;
-- Match a specific input character
: match ( char -- )
DUP Look = IF DROP getchar skipwhite
ELSE S" `" ROT CHAR-APPEND &' CHAR-APPEND expected
ENDIF ;
-- Get an identifier
: getname ( -- )
skipwhite
Look alpha? 0= IF S" Identifier" expected ENDIF
'x' TO token
token$ C0!
BEGIN
Look >UPC token$ char+! getchar
Look alnum? 0=
UNTIL ;
-- Get a number
: getnum ( -- )
skipwhite
Look digit? 0= IF S" Number" expected ENDIF
'#' TO token
token$ C0!
BEGIN
Look token$ char+!
getchar
Look digit? 0=
UNTIL ;
-- Get an operator
: getop ( -- )
skipwhite
Look TO token
Look token$ char-place
getchar ;
-- Get next token of any type
: next ( -- )
skipwhite
Look alpha? IF getname EXIT ENDIF
Look digit? IF getnum EXIT ENDIF
getop ;
-- Get an identifier and scan it for keywords
: scan ( -- ) 'x' =token IF token> 0 KWlist lookup KW->token ENDIF ;
-- Match a specific input string
: matchstring ( c-addr u -- )
token> 2OVER
COMPARE IF &` CHAR-PREPEND &' CHAR-APPEND expected
ELSE 2DROP next
ENDIF ;
-- Add a new entry to symbol table
: addentry ( c-addr u type -- )
[cnt]ST LOCALS| #entries T sz addr |
addr sz checkdup
#entries @ maxentry = IF S" Symbol table full" aborts ENDIF
1 #entries +!
addr sz #entries @ ST PACK DROP
T #entries @ SType C! ;
-- Code generation --------------------------------------------
-- Load primary register with a constant
: loadconstant ( c-addr u -- ) S"  d# -> eax mov," $+ emitln ;
-- Load primary register from variable
: loadvariable ( c-addr u -- )
2DUP intable 0= IF .undefined ENDIF
S" dword-ptr -> eax mov," $+ emitln ;
-- Store primary register in variable
: storevariable ( c-addr u -- )
2DUP intable 0= IF .undefined ENDIF
S" eax -> " 2SWAP $+ S" dword-ptr mov," $+ emitln ;
-- Allocate storage for a static variable
: allocatestorage ( $value $name -- )
S" CREATE " 2SWAP $+ BL CHAR-APPEND 2SWAP $+ S" , " $+ emitdln ;
: _callw ( c-addr u -- ) S"  offset NEAR call," $+ emitln ; -- call a word
: _clear ( -- ) S" eax -> eax xor," emitln ; -- clear primary
: _negate ( -- ) S" eax neg," emitln ; -- negate primary
: _not ( -- ) S" eax not," emitln ; -- not primary
: _push ( -- ) S" eax push," emitln ; -- push primary to stack
: _pop ( -- ) S" eax pop," emitln ; -- pop primary from stack
: _decr ( -- ) S" 1 b# -> eax sub," emitln ; -- decrement primary
: _incr ( -- ) S" 1 b# -> eax add," emitln ; -- increment primary
: _decSP ( -- ) S" [esp -4 +] -> esp lea," emitln ; -- reserve one stack slot (esp down by a cell)
: _incSP ( -- ) S" [esp 4 +] -> esp lea," emitln ; -- discard one stack slot (esp up by a cell)
: _popadd ( -- ) S" [esp] -> eax add, [esp 4 +] -> esp lea," emitln ; -- add TOS to primary
: _popsub ( -- ) S" [esp] -> eax sub, [esp 4 +] -> esp lea," emitln _negate ; -- subtract TOS from primary
: _popor ( -- ) S" [esp] -> eax or, [esp 4 +] -> esp lea," emitln ; -- or TOS to primary
: _popxor ( -- ) S" [esp] -> eax xor, [esp 4 +] -> esp lea," emitln ; -- xor TOS to primary
: _popand ( -- ) S" [esp] -> eax and, [esp 4 +] -> esp lea," emitln ; -- and TOS to primary
: _popcmp ( -- ) S" [esp] -> eax cmp, [esp 4 +] -> esp lea," emitln ; -- compare: flags get primary minus TOS (hence the reversed setcc's below)
: _sete ( -- ) S" al sete, al -> eax movsx," emitln ; -- set primary if equal
: _setne ( -- ) S" al setne, al -> eax movsx," emitln ; -- set primary if NOT equal
: _setg ( -- ) S" al setl, al -> eax movsx," emitln ; -- set primary if greater
: _setl ( -- ) S" al setg, al -> eax movsx," emitln ; -- set primary if less
: _setge ( -- ) S" al setle, al -> eax movsx," emitln ; -- set primary if greater or equal
: _setle ( -- ) S" al setge, al -> eax movsx," emitln ; -- set primary if less or equal
-- A collection of jumps
: _branch ( c-addr u -- ) S"  offset NEAR jmp," $+ emitln ;
: _pcmp+b0> ( c-addr u -- ) _popcmp S"  offset NEAR jg," $+ emitln ;
: _branch0 ( c-addr u -- ) S" eax -> eax or, " 2SWAP $+ S"  offset NEAR je," $+ emitln ;
: _branch<>0 ( c-addr u -- ) S" eax -> eax or, " 2SWAP $+ S"  offset NEAR jnz," $+ emitln ;
-- Multiply TOS and primary; divide primary by TOS
: _popmul ( -- ) S" [esp] dword mul, [esp 4 +] -> esp lea," emitln ;
: _popdiv ( -- ) S" ecx pop, ecx -> eax xchg, eax -> edx mov, #31 b# -> edx sar, ecx idiv," emitln ;
-- Read to primary register and store in variable
: readit ( c-addr u -- ) S" READ" _callw storevariable ;
-- Write from primary register
: writeit ( -- ) S" WRITE" _callw ;
-- Write assembler header info
: header ( -- )
S" -- DATA section --------" emitdln
S" " emitdln ;
-- Write assembler prologue
: prolog ( -- )
S" " emitln
S" -- CODE section --------" emitdln
S" " emitdln
S" CODE " prog$ COUNT $+ emitdln
S" rpush," emitln
newlabel postlabel ;
-- Write assembler epilogue
: epilog ( -- )
S" rpop, ebx jmp," emitln
S" END-CODE" emitdln ;
-- Expressions ------------------------------------------------
DEFER boolexpression
-- Parse and translate a math factor
: factor ( -- )
'(' =token IF next boolexpression S" )" matchstring EXIT ENDIF
'x' =token IF token> loadvariable next EXIT ENDIF
'#' =token IF token> loadconstant next EXIT ENDIF
S" Math factor" expected ;
-- Recognize and translate multiply / divide
: multiply ( -- ) next factor _popmul ;
: divide ( -- ) next factor _popdiv ;
-- Parse and translate a math term
: term ( -- )
factor
BEGIN token mulop?
WHILE _push
CASE token
'*' OF multiply ENDOF
'/' OF divide ENDOF
ENDCASE
REPEAT ;
-- Recognize and translate add / subtract
: add ( -- ) next term _popadd ;
: subtract ( -- ) next term _popsub ;
-- Parse and translate an expression
: expression ( -- )
token addop? IF _clear
ELSE term
ENDIF
BEGIN token addop?
WHILE _push
CASE token
'+' OF add ENDOF
'-' OF subtract ENDOF
ENDCASE
REPEAT ;
-- Get another / next expression and compare
: comparex ( -- ) expression _popcmp ;
: nextx ( -- ) next comparex ;
-- Recognize and translate a relational "equals" / "less or equal" / "not equals"
: equals ( -- ) nextx _sete ;
: lessorequal ( -- ) nextx _setle ;
: notequals ( -- ) nextx _setne ;
-- Recognize and translate a relational "less than"
: less ( -- )
next
CASE token
'=' OF lessorequal ENDOF
'>' OF notequals ENDOF
comparex _setl
ENDCASE ;
-- Recognize and translate a relational "greater than"
: greater ( -- )
next
'=' =token IF nextx _setge EXIT ENDIF
comparex _setg ;
-- Parse and translate a relation
: relation ( -- )
expression
token relop? 0= ?EXIT
_push
CASE token
'=' OF equals ENDOF
'<' OF less ENDOF
'>' OF greater ENDOF
ENDCASE ;
-- Parse and translate a boolean factor with NOT
: notfactor ( -- )
'!' <>token IF relation EXIT ENDIF
next relation _not ;
-- Parse and translate a boolean term
: boolterm ( -- )
notfactor
BEGIN '&' =token
WHILE _push next notfactor _popand
REPEAT ;
-- Recognize and translate a boolean OR / XOR
: boolOR ( -- ) next boolterm _popor ;
: boolXOR ( -- ) next boolterm _popxor ;
-- Parse and translate a boolean expression
:NONAME ( -- )
boolterm
BEGIN token orop?
WHILE _push
CASE token
'|' OF boolOR ENDOF
'~' OF boolXOR ENDOF
ENDCASE
REPEAT ; IS boolexpression
-- Parse and translate an assignment statement
: assignment ( -- )
token> DUP 1+ ALLOCATE ?ALLOCATE DUP LOCAL name PACK DROP
next
S" =" matchstring
boolexpression
name COUNT storevariable
name FREE ?ALLOCATE ;
-- Block statements -------------------------------------------
DEFER block
-- Recognize and translate an IF construct
: doIF ( label -- )
0 0 LOCALS| L2 L1 L |
next
boolexpression
newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP
L1 COUNT DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP
L1 COUNT _branch0
L block
'l' =token IF L2 FREE ?ALLOCATE
next
newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP
L2 COUNT _branch
L1 COUNT postlabel
L block
ENDIF
L2 COUNT postlabel
S" ENDIF" matchstring
L1 FREE ?ALLOCATE
L2 FREE ?ALLOCATE ;
-- Recognize and translate a WHILE construct
: doWHILE ( -- )
0 0 LOCALS| L2 L1 |
newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP
newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP
next
L1 COUNT postlabel
boolexpression
L2 COUNT _branch0
L2 block
S" ENDWHILE" matchstring
L1 COUNT _branch
L2 COUNT postlabel
L1 FREE ?ALLOCATE
L2 FREE ?ALLOCATE ;
-- Parse and translate a LOOP statement
: doLOOP ( -- )
0 0 LOCALS| L2 L1 |
newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP
newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP
next
L1 COUNT postlabel
L2 block
L1 COUNT _branch
L2 COUNT postlabel
S" ENDLOOP" matchstring
L1 FREE ?ALLOCATE
L2 FREE ?ALLOCATE ;
-- Parse and translate a REPEAT statement
: doREPEAT ( -- )
0 0 LOCALS| L2 L1 |
newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP
newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP
next
L1 COUNT postlabel
L2 block
S" UNTIL" matchstring
boolexpression
L1 COUNT _branch0
L2 COUNT postlabel
L1 FREE ?ALLOCATE
L2 FREE ?ALLOCATE ;
-- Parse and translate a FOR statement
: doFOR ( -- )
0 0 0 LOCALS| name L2 L1 |
next
checkident token> checktable
token> DUP 1+ ALLOCATE ?ALLOCATE DUP TO name PACK DROP
newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP
newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP
next S" =" matchstring expression
_decr
name COUNT storevariable
S" TO" matchstring
expression
_push
L1 COUNT postlabel
name COUNT loadvariable
_incr
name COUNT storevariable
L2 COUNT _pcmp+b0>
L2 block
L1 COUNT _branch
L2 COUNT postlabel
S" ENDFOR" matchstring
_incSP
name FREE ?ALLOCATE
L1 FREE ?ALLOCATE
L2 FREE ?ALLOCATE ;
-- Parse and Translate a DO Statement
: doDO ( -- )
0 0 LOCALS| L2 L1 |
newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L1 PACK DROP
newlabel DUP 1+ ALLOCATE ?ALLOCATE DUP TO L2 PACK DROP
next
expression
L1 COUNT postlabel
_push
L2 block
_pop
_decr
L1 COUNT _branch<>0
_decSP
L2 COUNT postlabel
S" ENDDO" matchstring
_incSP
L1 FREE ?ALLOCATE
L2 FREE ?ALLOCATE ;
-- Recognize and translate a BREAK
: doBREAK ( label -- )
DUP C@ 0= IF S" No loop to break from" aborts ENDIF
COUNT _branch
next ;
-- Read a single variable
: readvar ( -- ) checkident token> checktable token> readit next ;
-- Process a read statement
: doread ( -- )
next
S" (" matchstring
readvar
BEGIN ',' =token
WHILE next readvar
REPEAT
S" )" matchstring ;
-- Process a write statement
: dowrite ( -- )
next
S" (" matchstring
expression writeit
BEGIN ',' =token
WHILE next expression writeit
REPEAT
S" )" matchstring ;
-- Recognize and translate a statement block
:NONAME ( label -- )
LOCAL L
scan
BEGIN 'e' <>token
'l' <>token AND
'u' <>token AND
WHILE CASE token
'i' OF L doIF ENDOF
'w' OF doWHILE ENDOF
'd' OF doDO ENDOF
'L' OF doLOOP ENDOF
'r' OF doREPEAT ENDOF
'f' OF doFOR ENDOF
'B' OF L doBREAK ENDOF
'R' OF doread ENDOF
'W' OF dowrite ENDOF
assignment
ENDCASE
scan
REPEAT ; IS block
-- Declarations -----------------------------------------------
-- Allocate storage for a variable
: alloc ( -- )
0 LOCAL aname
next
token> DUP 1+ ALLOCATE ?ALLOCATE DUP TO aname PACK DROP
'x' <>token IF S" Variable name" expected ENDIF
aname COUNT checkdup
aname COUNT 'v' addentry
next
'=' =token IF S" =" matchstring
'-' =token IF S" -" matchstring
S" -"
ELSE S" "
ENDIF
token> $+
next
ELSE S" 0"
ENDIF
aname COUNT allocatestorage
aname FREE ?ALLOCATE ;
-- Parse and translate the global declarations
: topdecls ( -- )
scan
BEGIN 'v' =token
WHILE alloc
BEGIN ',' =token WHILE alloc REPEAT
REPEAT ;
-- Parse and translate a program ------------------------------
-- Initialize everything
: init ( -- )
CLEAR Lcount
[cnt]ST 0!
[cnt]SType 0!
CR getchar next ;
: TINY11 ( -- )
init
S" PROGRAM" matchstring token> prog$ PACK DROP next
header
topdecls
S" BEGIN" matchstring
prolog
C" " block
S" END" matchstring
epilog ;
-- ------------------------------------------------------------
*****************************************************************
*                                                               *
*                        COPYRIGHT NOTICE                       *
*                                                               *
*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *
*                                                               *
*****************************************************************