LET'S BUILD A COMPILER!

By

Jack W. Crenshaw, Ph.D.

27 August 1989

Part XIII: PROCEDURES

INTRODUCTION

At last we get to the good part!

At this point we've studied almost all the basic features of compilers and parsing. We have learned how to translate arithmetic expressions, Boolean expressions, control constructs, data declarations, and I/O statements. We have defined a language, TINY 1.3, that embodies all these features, and we have written a rudimentary compiler that can translate them. By adding some file I/O we could indeed have a working compiler that could produce executable object files from programs written in TINY. With such a compiler, we could write simple programs that could read integer data, perform calculations with it, and output the results.

That's nice, but what we have is still only a toy language. We can't read or write even a single character of text, and we still don't have procedures.

It's the features to be discussed in the next couple of installments that separate the men from the toys, so to speak. "Real" languages have more than one data type, and they support procedure calls. More than any others, it's these two features that give a language much of its character and personality. Once we have provided for them, our languages, TINY and its successors, will cease to become toys and will take on the character of real languages, suitable for serious programming jobs.

For several installments now, I've been promising you sessions on these two important subjects. Each time, other issues came up that required me to digress and deal with them. Finally, we've been able to put all those issues to rest and can get on with the mainstream of things. In this installment, I'll cover procedures. Next time, we'll talk about the basic data types.

ONE LAST DIGRESSION

This has been an extraordinarily difficult installment for me to write. The reason has nothing to do with the subject itself ... I've known what I wanted to say for some time, and in fact I presented most of this at Software Development '89, back in February. It has more to do with the approach. Let me explain.

When I first began this series, I told you that we would use several "tricks" to make things easy, and to let us learn the concepts without getting too bogged down in the details. Among these tricks was the idea of looking at individual pieces of a compiler at a time, i.e. performing experiments using the Cradle as a base. When we studied expressions, for example, we dealt with only that part of compiler theory. When we studied control structures, we wrote a different program, still based on the Cradle, to do that part. We only incorporated these concepts into a complete language fairly recently. These techniques have served us very well indeed, and led us to the development of a compiler for TINY version 1.3.

When I first began this session, I tried to build upon what we had already done, and just add the new features to the existing compiler. That turned out to be a little awkward and tricky ... much too much to suit me.

I finally figured out why. In this series of experiments, I had abandoned the very useful techniques that had allowed us to get here, and without meaning to I had switched over into a new method of working, that involved incremental changes to the full TINY compiler.

You need to understand that what we are doing here is a little unique. There have been a number of articles, such as the Small C articles by Cain and Hendrix, that presented finished compilers for one language or another. This is different. In this series of tutorials, you are watching me design and implement both a language and a compiler, in real time.

In the experiments that I've been doing in preparation for this article, I was trying to inject the changes into the TINY compiler in such a way that, at every step, we still had a real, working compiler. In other words, I was attempting an incremental enhancement of the language and its compiler, while at the same time explaining to you what I was doing.

That's a tough act to pull off! I finally realized that it was dumb to try. Having gotten this far using the idea of small experiments based on single-character tokens and simple, special-purpose programs, I had abandoned them in favor of working with the full compiler. It wasn't working.

So we're going to go back to our roots, so to speak. In this installment and the next, I'll be using single-character tokens again as we study the concepts of procedures, unfettered by the other baggage that we have accumulated in the previous sessions. As a matter of fact, I won't even attempt, at the end of this session, to merge the constructs into the TINY compiler. We'll save that for later.

After all this time, you don't need more buildup than that, so let's waste no more time and dive right in.

THE BASICS

All modern CPU's provide direct support for procedure calls, and the 80x86 is no exception. For the 80x86, the call is a CALL, and the return is RET. All we have to do is to arrange for the compiler to issue these commands at the proper place.

Actually, there are really three things we have to address. One of them is the call/return mechanism. The second is the mechanism for defining the procedure in the first place. And, finally, there is the issue of passing parameters to the called procedure. None of these things are really very difficult, and we can of course borrow heavily on what people have done in other languages ... there's no need to reinvent the wheel here. Of the three issues, that of parameter passing will occupy most of our attention, simply because there are so many options available.

A BASIS FOR EXPERIMENTS

As always, we will need some software to serve as a basis for what we are doing. We don't need the full TINY compiler, but we do need enough of a program so that some of the other constructs are present. Specifically, we need at least to be able to handle statements of some sort, and data declarations.

The program shown below is that basis. It's a vestigial form of TINY, with single-character tokens. It has data declarations, but only in their simplest form ... no lists or initializers. It has assignment statements, but only of the kind

     <ident> = <ident>

In other words, the only legal expression is a single variable name. There are no control constructs ... the only legal statement is the assignment.

Most of the program is just the standard Cradle routines. I've shown the whole thing here, just to make sure we're all starting from the same point (calls1.frt):

-- ------------------------------------------------------------
-- Variable declarations --------------------------------------

0 VALUE Look                            -- lookahead character 

CREATE ST #26 CHARS ALLOT               -- symbol table

-- Get type of symbol
: typeof ( char -- ) 'A' - ST + C@ ;

-- Look for symbol in table
: intable ( char -- ) typeof 0<> ;

-- Tools ------------------------------------------------------

: getchar  ( -- )          EKEY TO Look ;                       -- read new character from input stream 
: error    ( c-addr u -- ) CR ^G EMIT ." Error: " TYPE ." ." ;  -- report an error 
: aborts   ( c-addr u -- ) error ABORT ;                        -- report error and halt 
: expected ( c-addr u -- ) S"  expected" $+ aborts ;            -- report what was expected 
: alpha?   ( char -- tf )  >UPC 'A' 'Z' 1+ WITHIN ;             -- recognize an alpha character 
: digit?   ( char -- tf )  '0' '9' 1+ WITHIN ;                  -- recognize a decimal digit
: alnum?   ( char -- tf )  DUP alpha? SWAP digit? OR ;          -- recognize alphanumeric
: white?   ( char -- tf )  DUP  BL =  SWAP Tab =  OR ;          -- Recognize white space
: emits    ( c-addr u -- ) Tab EMIT TYPE ;                      -- output a string with tab 
: emitdln  ( c-addr u -- ) CR TYPE ;                            -- output a string with crlf 
: emitln   ( c-addr u -- ) CR emits ;                           -- output a string with tab and crlf 

-- Skip over EOL
: fin ( -- ) Look ^M = IF  getchar  Look ^J = IF  getchar  ENDIF ENDIF ;

-- Skip a comment field 
: skipcomment RECURSIVE ( -- )
        BEGIN   Look '}' <> 
        WHILE   getchar
                Look '{' = IF  skipcomment  ENDIF
        REPEAT  getchar ;

-- Skip white space
: skipwhite ( -- )
        BEGIN  Look white?
        WHILE  Look '{' = IF  skipcomment  ELSE  getchar  ENDIF
        REPEAT ;

-- Abort, reporting an offending item
: name.aborts ( c-addr1 u1 char ) 
        S" " ROT CHAR-APPEND &' CHAR-APPEND S"  `" 2SWAP $+ $+ aborts ;

: .undefined ( char -- ) >S S" Undefined identifier" S> name.aborts ;
: .duplicate ( char -- ) >S S" Duplicate identifier" S> name.aborts ;

-- Is identifier in the symbol table?
: checktable ( char -- ) 
        DUP intable    IF DROP EXIT ENDIF .undefined ; 

-- Is identifier already in symbol table?
: checkdup ( char -- ) 
        DUP intable 0= IF DROP EXIT ENDIF .duplicate ; 

-- Add a new entry to symbol table 
: addentry ( char type -- )
        OVER checkdup
        SWAP 'A' - ST + C! ;

: checkvar ( char -- ) 
        DUP checktable
        typeof 'v' <> IF  >S S" ' is not a variable" S" `" 
                          S> CHAR-APPEND 2SWAP $+ aborts  
                   ENDIF ;

-- Match a specific input character 
: match ( char -- )
        DUP Look = IF  DROP getchar skipwhite 
                 ELSE  S" `" ROT CHAR-APPEND &' CHAR-APPEND expected 
                ENDIF ;

-- Get an identifier 
: getname ( -- char )
        Look alpha? 0= IF  S" Identifier" expected  ENDIF
        Look >UPC  getchar skipwhite ;

-- Get a number 
: getnum ( -- num )
        Look digit? 0= IF  S" Number" expected  ENDIF
        Look  getchar skipwhite ;

-- Code generation --------------------------------------------

-- Load primary register from variable
: loadvariable  ( char -- ) 
        DUP checkvar
        S" " ROT CHAR-APPEND S"  dword-ptr -> eax  mov," $+ emitln ;

-- Store primary register in variable
: storevariable ( char -- ) 
        DUP checkvar
        S" eax -> " ROT CHAR-APPEND S"  dword-ptr  mov," $+ emitln ;

-- Expressions ------------------------------------------------

-- Parse and translate an expression
: expression ( -- )  getname loadvariable ;

-- Parse and translate an assignment statement
: assignment ( -- )  getname >R  '=' match  expression  R> storevariable ;

-- Block statements -------------------------------------------

: doblock    ( -- )  BEGIN  Look 'e' <>  WHILE  assignment fin  REPEAT ;
: beginblock ( -- )  'b' match fin doblock 'e' match fin ;

-- Declarations -----------------------------------------------

-- Allocate storage for a variable
: alloc ( char -- )
        >R R@ 'v' addentry 
        S" CREATE " R> CHAR-APPEND S"  0 ," $+ emitdln ;

: decl ( -- )  'v' match  getname alloc ;

-- Parse and translate the global declarations 
: topdecls ( -- )
        BEGIN   Look 'b' <>
        WHILE   CASE Look
                  'v' OF  decl  ENDOF
                          S" Unrecognized keyword" Look name.aborts
                ENDCASE
                fin
        REPEAT ;

-- Parse and translate a program ------------------------------

-- Initialize everything
: init  ( -- ) ST #26 ERASE  CR getchar skipwhite ; 
: calls ( -- ) init topdecls beginblock ;
-- ------------------------------------------------------------

Note that we do have a symbol table, and there is logic to check a variable name to make sure it's a legal one. It's also worth noting that I have included the code you've seen before to provide for white space and newlines. Finally, note that the main program is delimited, as usual, by BEGIN-END brackets.

Once you've copied the program to iForth, the first step is to compile it and make sure it works (calls1.frt). Give it a few declarations, and then a begin-block. Try something like:

     va             (for VAR A)
     vb             (for VAR B)
     vc             (for VAR C)
     b              (for BEGIN)
     a=b
     b=c
     e.             (for END.)

As usual, you should also make some deliberate errors, and verify that the program catches them correctly.

DECLARING A PROCEDURE

If you're satisfied that our little program works, then it's time to deal with the procedures. Since we haven't talked about parameters yet, we'll begin by considering only procedures that have no parameter lists.

As a start, let's consider a simple program with a procedure, and think about the code we'd like to see generated for it:

     PROGRAM FOO;
     .
     .
     PROCEDURE BAR;                     BAR:
     BEGIN                                   .
     .                                       .
     .                                       .
     END;                                    ret,

     BEGIN { MAIN PROGRAM }             MAIN:
     .                                       .
     .                                       .
     BAR;                                    BAR offset NEAR call,
     .                                       .
     .                                       .
     END.                                    END MAIN

Here I've shown the high-order language constructs on the left, and the desired assembler code on the right. The first thing to notice is that we certainly don't have much code to generate here! For the great bulk of both the procedure and the main program, our existing constructs take care of the code to be generated.

The key to dealing with the body of the procedure is to recognize that although a procedure may be quite long, declaring it is really no different than declaring a variable. It's just one more kind of declaration. We can write the BNF:

     <declaration> ::= <data decl> | <procedure>

This means that it should be easy to modify TopDecl to deal with procedures. What about the syntax of a procedure? Well, here's a suggested syntax, which is essentially that of Pascal:

     <procedure> ::= PROCEDURE <ident> <begin-block>

There is practically no code generation required, other than that generated within the begin-block. We need only emit a label at the beginning of the procedure, and a RET at the end.

Here's the required code:

-- ------------------------------------------------------------
-- Parse and translate a procedure declaration 
: doProc ( -- )
        'p' match
        getname >R fin 
        R@ 'p' addentry
        R> procheader
        beginblock
        _ret ;
-- ------------------------------------------------------------

Note that I've added a new code generation routine, _ret, which merely emits a RET instruction. The creation of that routine is "left as an exercise for the student."

To finish this version, add the following line within the Case statement in TopDecls:

            'p' OF doProc ENDOF

I should mention that this structure for declarations, and the BNF that drives it, differs from standard Pascal. In the Jensen & Wirth definition of Pascal, variable declarations, in fact all kinds of declarations, must appear in a specific sequence, i.e. labels, constants, types, variables, procedures, and main program. To follow such a scheme, we should separate the two declarations, and have code in the main program something like

     doVars 
     doProcs
     doMain

However, most implementations of Pascal don't require that order and let you freely mix up the various declarations, as long as you still don't try to refer to something before it's declared. Although it may be more aesthetically pleasing to declare all the global variables at the top of the program, it certainly doesn't do any harm to allow them to be sprinkled around. In fact, it may do some good, in the sense that it gives you the opportunity to do a little rudimentary information hiding. Variables that should be accessed only by the main program, for example, can be declared just before it and will thus be inaccessible by the procedures.

OK, try this new version out (calls2.frt). Note that we can declare as many procedures as we choose (as long as we don't run out of single-character names!), and the labels and RET's all come out in the right places.

It's worth noting here that I do not allow for nested procedures. In TINY, all procedures must be declared at the global level, the same as in C. There has been quite a discussion about this point in the Computer Language Forum of CompuServe. It turns out that there is a significant penalty in complexity that must be paid for the luxury of nested procedures. What's more, this penalty gets paid at run time, because extra code must be added and executed every time a procedure is called. I also happen to believe that nesting is not a good idea, simply on the grounds that I have seen too many abuses of the feature. Before going on to the next step, it's also worth noting that the "main program" as it stands is incomplete, since it doesn't have the label and END statement. Let's fix that little oversight:

-- ------------------------------------------------------------
-- Parse and translate a main program
: doMain ( -- )
        'b' match fin
        prolog  doBlock  epilog ;
-- ------------------------------------------------------------
.
.
.
-- ------------------------------------------------------------
-- Main Program 
: TINY ( -- ) init topdecls doMain ;
-- ------------------------------------------------------------

Note that DoProc and DoMain are not quite symmetrical. DoProc uses a call to BeginBlock, whereas DoMain cannot. That's because a procedure is signaled by the keyword PROCEDURE (abbreviated by a 'p' here), while the main program gets no keyword other than the BEGIN itself.

And that brings up an interesting question: why?

If we look at the structure of C programs, we find that all functions are treated just alike, except that the main program happens to be identified by its name, "main." Since C functions can appear in any order, the main program can also be anywhere in the compilation unit.

In Pascal, on the other hand, all variables and procedures must be declared before they're used, which means that there is no point putting anything after the main program ... it could never be accessed. The "main program" is not identified at all, other than being that part of the code that comes after the global BEGIN. In other words, if it ain't anything else, it must be the main program.

This causes no small amount of confusion for beginning programmers, and for big Pascal programs sometimes it's difficult to find the beginning of the main program at all. This leads to conventions such as identifying it in comments:

     BEGIN { of MAIN }

This has always seemed to me to be a bit of a kludge. The question comes up: Why should the main program be treated so much differently than a procedure? In fact, now that we've recognized that procedure declarations are just that ... part of the global declarations ... isn't the main program just one more declaration, also?

The answer is yes, and by treating it that way, we can simplify the code and make it considerably more orthogonal. I propose that we use an explicit keyword, PROGRAM, to identify the main program (Note that this means that we can't start the file with it, as in Pascal). In this case, our BNF becomes:

     <declaration> ::= <data decl> | <procedure> | <main program>


     <procedure> ::= PROCEDURE <ident> <begin-block>


     <main program> ::= PROGRAM <ident> <begin-block>

The code also looks much better, at least in the sense that DoMain and DoProc look more alike:

-- ------------------------------------------------------------
-- Parse and translate a main program 
: doMain ( -- )  
        'P' match  
        getname prog$ char-place fin
        prolog 
        beginblock ;
-- ------------------------------------------------------------
.
.
.
-- ------------------------------------------------------------
-- Parse and translate global declarations
: topdecls ( -- ) 
        BEGIN  Look '.' <>
        WHILE  CASE Look 
                'v' OF Decl   ENDOF
                'p' OF DoProc ENDOF
                'P' OF DoMain ENDOF
                    S" Unrecognized keyword' Look name.aborts
                ENDCASE
                fin
        REPEAT ;
-- ------------------------------------------------------------
-- Main Program
: TINY ( -- ) init topdecls epilog ;
-- ------------------------------------------------------------

Since the declaration of the main program is now within the loop of TopDecl, that does present some difficulties. How do we ensure that it's the last thing in the file? And how do we ever exit from the loop? My answer for the second question, as you can see, was to bring back our old friend the period. Once the parser sees that, we're done. (calls3.frt)

To answer the first question: it depends on how far we're willing to go to protect the programmer from dumb mistakes. In the code that I've shown, there's nothing to keep the programmer from adding code after the main program ... even another main program. The code will just not be accessible. However, we could access it via a FORWARD statement, which we'll be providing later. As a matter of fact, many assembler language programmers like to use the area just after the program to declare large, uninitialized data blocks, so there may indeed be some value in not requiring the main program to be last. We'll leave it as it is.

If we decide that we should give the programmer a little more help than that, it's pretty easy to add some logic to kick us out of the loop once the main program has been processed. Or we could at least flag an error if someone tries to include two mains.

CALLING THE PROCEDURE

If you're satisfied that things are working, let's address the second half of the equation ... the call.

Consider the BNF for a procedure call:

     <proc_call> ::= <identifier>

for an assignment statement, on the other hand, the BNF is:

     <assignment> ::= <identifier> '=' <expression>

At this point we seem to have a problem. The two BNF statements both begin on the right-hand side with the token <identifier>. How are we supposed to know, when we see the identifier, whether we have a procedure call or an assignment statement? This looks like a case where our parser ceases being predictive, and indeed that's exactly the case. However, it turns out to be an easy problem to fix, since all we have to do is to look at the type of the identifier, as recorded in the symbol table. As we've discovered before, a minor local violation of the predictive parsing rule can be easily handled as a special case.

Here's how to do it:

-- ------------------------------------------------------------
-- Parse and translate an assignment statement 
: assignment ( char -- )
        >R
          '=' match expression
        R> storevariable ;
-- ------------------------------------------------------------
-- Decide if a statement is an assignment or procedure call 
: assignorproc ( -- )
        getname >R
        CASE R@ typeof
           0  OF R> .undefined  ENDOF
          'v' OF R> assignment  ENDOF
          'p' OF R> callproc    ENDOF
                 S" Identifier " R> CHAR-APPEND 
                 S" cannot be used here" $+ aborts 
        ENDCASE ;
-- ------------------------------------------------------------
-- Parse and translate a block of statements 
: doBlock ( -- )
        BEGIN  Look 'e' <> 
        WHILE  assignorproc
               fin
        REPEAT ;
-- ------------------------------------------------------------

As you can see, procedure Block now calls AssignOrProc instead of Assignment. The function of this new procedure is to simply read the identifier, determine its type, and then call whichever procedure is appropriate for that type. Since the name has already been read, we must pass it to the two procedures, and modify Assignment to match. Procedure CallProc is a simple code generation routine:

-- --------------------------------------------------------------------
-- Call a procedure 
: callproc ( char -- ) S"  offset NEAR call, " ROT CHAR-PREPEND emitln ;
-- --------------------------------------------------------------------

Well, at this point we have a compiler that can deal with procedures. (calls4.frt) It's worth noting that procedures can call procedures to any depth. So even though we don't allow nested declarations, there is certainly nothing to keep us from nesting calls, just as we would expect to do in any language. We're getting there, and it wasn't too hard, was it?

Of course, so far we can only deal with procedures that have no parameters. The procedures can only operate on the global variables by their global names. So at this point we have the equivalent of BASIC's GOSUB construct. Not too bad ... after all lots of serious programs were written using GOSUBs, but we can do better, and we will. That's the next step.

PASSING PARAMETERS

Again, we all know the basic idea of passed parameters, but let's review them just to be safe.

In general the procedure is given a parameter list, for example

     PROCEDURE FOO(X, Y, Z)

In the declaration of a procedure, the parameters are called formal parameters, and may be referred to in the body of the procedure by those names. The names used for the formal parameters are really arbitrary. Only the position really counts. In the example above, the name 'X' simply means "the first parameter" wherever it is used.

When a procedure is called, the "actual parameters" passed to it are associated with the formal parameters, on a one-for-one basis.

The BNF for the syntax looks something like this:

     <procedure> ::= PROCEDURE <ident>
                    '(' <param-list> ')' <begin-block>


     <param_list> ::= <parameter> ( ',' <parameter> )* | null

Similarly, the procedure call looks like:

     <proc call> ::= <ident> '(' <param-list> ')'

Note that there is already an implicit decision built into this syntax. Some languages, such as Pascal and Ada, permit parameter lists to be optional. If there are no parameters, you simply leave off the parens completely. Other languages, like C and Modula 2, require the parens even if the list is empty. Clearly, the example we just finished corresponds to the former point of view. But to tell the truth I prefer the latter. For procedures alone, the decision would seem to favor the "listless" approach. The statement

     initialize; ,

standing alone, can only mean a procedure call. In the parsers we've been writing, we've made heavy use of parameterless procedures, and it would seem a shame to have to write an empty pair of parens for each case.

But later on we're going to be using functions, too. And since functions can appear in the same places as simple scalar identifiers, you can't tell the difference between the two. You have to go back to the declarations to find out. Some folks consider this to be an advantage. Their argument is that an identifier gets replaced by a value, and what do you care whether it's done by substitution or by a function? But we sometimes do care, because the function may be quite time-consuming. If, by writing a simple identifier into a given expression, we can incur a heavy run-time penalty, it seems to me we ought to be made aware of it.

Anyway, Niklaus Wirth designed both Pascal and Modula 2. I'll give him the benefit of the doubt and assume that he had a good reason for changing the rules the second time around!

Needless to say, it's an easy thing to accomodate either point of view as we design a language, so this one is strictly a matter of personal preference. Do it whichever way you like best.

Before we go any further, let's alter the translator to handle a (possibly empty) parameter list. For now we won't generate any extra code ... just parse the syntax. The code for processing the declaration has very much the same form we've seen before when dealing with VAR-lists:

-- ------------------------------------------------------------
-- Process the formal parameter list of a procedure 
: formallist ( -- )
        '(' match
        Look ')' <> 
           IF   formalparam
                BEGIN  Look ',' =  
                WHILE  ',' match formalparam 
                REPEAT
        ENDIF   
        ')' match ;
-- ------------------------------------------------------------

The word DoProc needs to have a line added to call FormalList:

-- ------------------------------------------------------------
-- Parse and translate a procedure declaration 
: doProc ( -- )
        'p' match
        getname >R
        formallist
        fin
        R@ 'p' addentry
        R> procheader
        beginblock
        _ret ;
-- ------------------------------------------------------------

For now, the code for FormalParam is just a dummy one that simply skips the parameter name:

-- ------------------------------------------------------------
-- Process a formal parameter 
: formalparam ( -- ) getname DROP ;
-- ------------------------------------------------------------

For the actual procedure call, there must be similar code to process the actual parameter list:

-- ------------------------------------------------------------
-- Process an actual parameter 
: param ( -- ) getname DROP ;
-- ------------------------------------------------------------
-- Process the actual parameter list for a procedure call 
: paramlist ( -- )
        '(' match
        Look ')' <> 
           IF   param
                BEGIN Look ',' =
                WHILE ',' match param
                REPEAT 
        ENDIF
        ')' match ;
-- ------------------------------------------------------------
-- Process a procedure call 
: callproc ( char -- )
        paramlist _call ;
-- ------------------------------------------------------------

Note here that CallProc is no longer just a simple code generation routine. It has some structure to it. To handle this, I've renamed the code generation routine to just _call, and called it from within CallProc.

OK, if you'll add all this code to your translator and try it out (calls5.frt), you'll find that you can indeed parse the syntax properly. I'll note in passing that there is no checking to make sure that the number (and, later, types) of formal and actual parameters match up. In a production compiler, we must of course do this. We'll ignore the issue now if for no other reason than that the structure of our symbol table doesn't currently give us a place to store the necessary information. Later on, we'll have a place for that data and we can deal with the issue then.

THE SEMANTICS OF PARAMETERS

So far we've dealt with the syntax of parameter passing, and we've got the parsing mechanisms in place to handle it. Next, we have to look at the semantics, i.e., the actions to be taken when we encounter parameters. This brings us square up against the issue of the different ways parameters can be passed.

There is more than one way to pass a parameter, and the way we do it can have a profound effect on the character of the language. So this is another of those areas where I can't just give you my solution. Rather, it's important that we spend some time looking at the alternatives so that you can go another route if you choose to.

There are two main ways parameters are passed:

By value
By reference (address)

The differences are best seen in the light of a little history.

The old FORTRAN compilers passed all parameters by reference. In other words, what was actually passed was the address of the parameter. This meant that the called subroutine was free to either read or write that parameter, as often as it chose to, just as though it were a global variable. This was actually quite an efficient way to do things, and it was pretty simple since the same mechanism was used in all cases, with one exception that I'll get to shortly.

There were problems, though. Many people felt that this method created entirely too much coupling between the called subroutine and its caller. In effect, it gave the subroutine complete access to all variables that appeared in the parameter list.

Many times, we didn't want to actually change a parameter, but only use it as an input. For example, we might pass an element count to a subroutine, and wish we could then use that count within a DO-loop. To avoid changing the value in the calling program, we had to make a local copy of the input parameter, and operate only on the copy. Some FORTRAN programmers, in fact, made it a practice to copy all parameters except those that were to be used as return values. Needless to say, all this copying defeated a good bit of the efficiency associated with the approach.

There was, however, an even more insidious problem, which was not really just the fault of the "pass by reference" convention, but a bad convergence of several implementation decisions.

Suppose we have a subroutine:

     SUBROUTINE FOO(X, Y, N)

where N is some kind of input count or flag. Many times, we'd like to be able to pass a literal or even an expression in place of a variable, such as:

     CALL FOO(A, B, J + 1)

Here the third parameter is not a variable, and so it has no address. The earliest FORTRAN compilers did not allow such things, so we had to resort to subterfuges like:

     K = J + 1
     CALL FOO(A, B, K)

Here again, there was copying required, and the burden was on the programmer to do it. Not good.

Later FORTRAN implementations got rid of this by allowing expressions as parameters. What they did was to assign a compiler-generated variable, store the value of the expression in the variable, and then pass the address of the expression.

So far, so good. Even if the subroutine mistakenly altered the anonymous variable, who was to know or care? On the next call, it would be recalculated anyway.

The problem arose when someone decided to make things more efficient. They reasoned, rightly enough, that the most common kind of "expression" was a single integer value, as in:

     CALL FOO(A, B, 4)

It seemed inefficient to go to the trouble of "computing" such an integer and storing it in a temporary variable, just to pass it through the calling list. Since we had to pass the address of the thing anyway, it seemed to make lots of sense to just pass the address of the literal integer, 4 in the example above.

To make matters more interesting, most compilers, then and now, identify all literals and store them separately in a "literal pool," so that we only have to store one value for each unique literal. That combination of design decisions: passing expressions, optimization for literals as a special case, and use of a literal pool, is what led to disaster.

To see how it works, imagine that we call subroutine FOO as in the example above, passing it a literal 4. Actually, what gets passed is the address of the literal 4, which is stored in the literal pool. This address corresponds to the formal parameter, K, in the subroutine itself.

Now suppose that, unbeknownst to the programmer, subroutine FOO actually modifies K to be, say, -7. Suddenly, that literal 4 in the literal pool gets changed, to a -7. From then on, every expression that uses a 4 and every subroutine that passes a 4 will be using the value of -7 instead! Needless to say, this can lead to some bizarre and difficult-to-find behavior. The whole thing gave the concept of pass-by-reference a bad name, although as we have seen, it was really a combination of design decisions that led to the problem.

In spite of the problem, the FORTRAN approach had its good points. Chief among them is the fact that we don't have to support multiple mechanisms. The same scheme, passing the address of the argument, works for every case, including arrays. So the size of the compiler can be reduced.

Partly because of the FORTRAN gotcha, and partly just because of the reduced coupling involved, modern languages like C, Pascal, Ada, and Modula 2 generally pass scalars by value.

This means that the value of the scalar is copied into a separate value used only for the call. Since the value passed is a copy, the called procedure can use it as a local variable and modify it any way it likes. The value in the caller will not be changed.

It may seem at first that this is a bit inefficient, because of the need to copy the parameter. But remember that we're going to have to fetch some value to pass anyway, whether it be the parameter itself or an address for it. Inside the subroutine, using pass-by-value is definitely more efficient, since we eliminate one level of indirection. Finally, we saw earlier that with FORTRAN, it was often necessary to make copies within the subroutine anyway, so pass-by-value reduces the number of local variables. All in all, pass-by-value is better.

Except for one small little detail: if all parameters are passed by value, there is no way for a called to procedure to return a result to its caller! The parameter passed is not altered in the caller, only in the called procedure. Clearly, that won't get the job done.

There have been two answers to this problem, which are equivalent. In Pascal, Wirth provides for VAR parameters, which are passed-by-reference. What a VAR parameter is, in fact, is none other than our old friend the FORTRAN parameter, with a new name and paint job for disguise. Wirth neatly gets around the "changing a literal" problem as well as the "address of an expression" problem, by the simple expedient of allowing only a variable to be the actual parameter. In other words, it's the same restriction that the earliest FORTRANs imposed.

C does the same thing, but explicitly. In C, all parameters are passed by value. One kind of variable that C supports, however, is the pointer. So by passing a pointer by value, you in effect pass what it points to by reference. In some ways this works even better yet, because even though you can change the variable pointed to all you like, you still can't change the pointer itself. In a function such as strcpy, for example, where the pointers are incremented as the string is copied, we are really only incrementing copies of the pointers, so the values of those pointers in the calling procedure still remain as they were. To modify a pointer, you must pass a pointer to the pointer.

Since we are simply performing experiments here, we'll look at both pass-by-value and pass-by-reference. That way, we'll be able to use either one as we need to. It's worth mentioning that it's going to be tough to use the C approach to pointers here, since a pointer is a different type and we haven't studied types yet!

PASS-BY-VALUE

Let's just try some simple-minded things and see where they lead us. Let's begin with the pass-by-value case. Consider the procedure call:

     FOO(X, Y)

Almost the only reasonable way to pass the data is through the CPU stack. So the code we'd like to see generated might look something like this:

     X dword-ptr  push,       ; Push X
     Y dword-ptr  push,       ; Push Y
     FOO offset NEAR  call,   ; Call FOO

That certainly doesn't seem too complex!

When the CALL is executed, the CPU pushes the return address onto the stack and jumps to FOO. At this point the stack will look like this (assuming a 32-bit machine):

          .
          .
          Value of X     (4 bytes)
          Value of Y     (4 bytes)
          return address (4 bytes)
ESP -->   [ ..... ]

So the values of the parameters have addresses that are fixed offsets from the stack pointer. In this example, the addresses are:

     X:  8(SP)
     Y:  4(SP)

Now consider what the called procedure might look like:

     PROCEDURE FOO(A, B)
     BEGIN
          A = B
     END

(Remember, the names of the formal parameters are arbitrary ... only the positions count.)

The desired output code might look like:

     FOO: [esp 4 +] -> eax  mov,
          eax -> [esp 8 +]  mov,
          ret,

Note that, in order to address the formal parameters, we're going to have to know which position they have in the parameter list. This means some changes to the symbol table stuff. In fact, for our single-character case it's best to just create a new symbol table for the formal parameters.

Let's begin by declaring a new table:

     CREATE params 26 CHARS ALLOT

We also will need to keep track of how many parameters a given procedure has:

     0 VALUE #params

And we need to initialize the new table. Now, remember that the formal parameter list will be different for each procedure that we process, so we'll need to initialize that table anew for each procedure. Here's the initializer:

-- ------------------------------------------------------------
-- Initialize parameter table to null
: clearparams ( -- )
        params 26 CHARS ERASE 
        CLEAR #params ;
-- ------------------------------------------------------------

We'll put a call to this procedure in Init, and also at the end of DoProc:

-- ------------------------------------------------------------
-- Initialize 
: init ( -- )
        CLEAR Lcount  
        clearST
        clearparams 
        CR getchar skipwhite ;
-- ------------------------------------------------------------
.
.
.
-- ------------------------------------------------------------
-- Parse and translate a procedure declaration 
: doProc ( -- )
        'p' match
        getname >R 
        formallist
        fin 
        R@ 'p' addentry
        R> procheader 
        beginblock
        _ret
        clearparams ;
-- ------------------------------------------------------------

Note that the call within DoProc ensures that the table will be clear when we're in the main program.

OK, now we need a few procedures to work with the table. The next few functions are essentially copies of InTable, TypeOf, etc.:

-- ------------------------------------------------------------
-- Find the parameter number 
: paramnumber ( char -- u ) 'A' - params + C@ ;
-- ------------------------------------------------------------
-- See if an identifier is a parameter 
: param? ( char -- tf ) 'A' - params + C@ 0<> ;
-- ------------------------------------------------------------
-- Add a new parameter to table 
: addparam ( char -- ) 
        DUP param? IF  .duplicate  ENDIF
        1 +TO #params
        #params SWAP 'A' - params + C! ;
-- ------------------------------------------------------------

Finally, we need some code generation routines:

-- ------------------------------------------------------------
-- Load a parameter to the primary register 
: loadparameter ( index -- )
        #params SWAP - 1+ CELLS ( offset ) >R
        S" [esp "  R> (.) $+  S"  +] -> eax  mov," $+ emitln ;
-- ------------------------------------------------------------
-- Store a parameter from the primary register 
: storeparameter ( index -- )
        #params SWAP - 1+ CELLS ( offset ) >R
        S" eax -> [esp "  R> (.) $+  S"  +]  mov," $+ emitln ;
-- ------------------------------------------------------------
-- Push the primary register to the stack 
: _push ( -- )  S" eax  push," emitln ;
-- ------------------------------------------------------------

( The last routine is one we've seen before, but it wasn't in this vestigial version of the program.)

With those preliminaries in place, we're ready to deal with the semantics of procedures with calling lists (remember, the code to deal with the syntax is already in place).

Let's begin by processing a formal parameter. All we have to do is to add each parameter to the parameter symbol table:

-- ------------------------------------------------------------
-- Process a formal parameter 
: formalparam ( -- ) getname addparam ;
-- ------------------------------------------------------------

Now, what about dealing with a formal parameter when it appears in the body of the procedure? That takes a little more work. We must first determine that it is a formal parameter. To do this, I've written a modified version of TypeOf:

-- ------------------------------------------------------------------
-- Get type of symbol
: typeof ( char -- ) DUP param? IF  DROP 'f'  ELSE  ST[n] C@  ENDIF ; 
-- ------------------------------------------------------------------

(Note that, since TypeOf now calls Param?, it may need to be relocated in your source.)

We also must modify AssignOrProc to deal with this new type:

-- ------------------------------------------------------------
-- Decide if a statement is an assignment or procedure call 
: assignorproc ( -- )
        getname >R
        CASE R@ typeof
           0  OF R> .undefined  ENDOF
          'v' OF R> assignment  ENDOF
          'f' OF R> assignment  ENDOF
          'p' OF R> callproc    ENDOF
              S" Identifier `" R> CHAR-APPEND 
              S" ' cannot be used here" $+ aborts 
        ENDCASE ;
-- ------------------------------------------------------------

Finally, the code to process an assignment statement and an expression must be extended:

-- ------------------------------------------------------------
-- Parse and translate an expression 
-- Vestigial version 
        getname 
        DUP param? IF  paramnumber loadparameter
                 ELSE  loadvariable 
                ENDIF ;
-- ------------------------------------------------------------
-- Parse and translate an assignment statement
: assignment ( char -- )
        >R  '=' match expression  R>
        DUP param? IF  paramnumber storeparameter  
                 ELSE  storevariable 
                ENDIF ;
-- ------------------------------------------------------------

As you can see, these procedures will treat every variable name encountered as either a formal parameter or a global variable, depending on whether or not it appears in the parameter symbol table. Remember that we are using only a vestigial form of Expression. In the final program, the change shown here will have to be added to Factor, not Expression.

The rest is easy. We need only add the semantics to the actual procedure call, which we can do with one new line of code:

-- ------------------------------------------------------------
-- Process an actual parameter 
: param ( -- ) expression _push ;
-- ------------------------------------------------------------

That's it. Add these changes to your program and give it a try. (calls6.frt) Try declaring one or two procedures, each with a formal parameter list. Then do some assignments, using combinations of global and formal parameters. You can call one procedure from within another, but you cannot declare a nested procedure. You can even pass formal parameters from one procedure to another. If we had the full syntax of the language here, you'd also be able to do things like read or write formal parameters or use them in complicated expressions.

WHAT'S WRONG?

At this point, you might be thinking: Surely there's more to this than a few pushes and pops. There must be more to passing parameters than this.

You'd be right. As a matter of fact, the code that we're generating here leaves a lot to be desired in several respects.

The most glaring oversight is that it's wrong! If you'll look back at the code for a procedure call, you'll see that the caller pushes each actual parameter onto the stack before it calls the procedure. The procedure uses that information, but it doesn't change the stack pointer. That means that the stuff is still there when we return. Somebody needs to clean up the stack, or we'll soon be in very hot water!

Fortunately, that's easily fixed. All we have to do is to increment the stack pointer when we're finished.

Should we do that in the calling program, or the called procedure? Some folks let the called procedure clean up the stack, since that requires less code to be generated per call, and since the procedure, after all, knows how many parameters it's got. But that means that it must do something with the return address so as not to lose it.

I prefer letting the caller clean up, so that the callee need only execute a return. Also, it seems a bit more balanced, since the caller is the one who "messed up" the stack in the first place. But that means that the caller must remember how many items it pushed. To make things easy, I've modified the procedure ParamList to be a function instead of a procedure, returning the number of bytes pushed:

-- ------------------------------------------------------------
-- Process the parameter list for a procedure  call
: paramlist ( -- #bytes )
        0 LOCAL N
        '(' match
        Look ')' <> 
           IF   param 1 +TO N
                BEGIN Look ',' =
                WHILE ',' match param 1 +TO N
                REPEAT 
        ENDIF
        ')' match 
        N CELLS ;
-- ------------------------------------------------------------

Procedure CallProc then uses this to clean up the stack:

-- ------------------------------------------------------------
-- Process a procedure call 
: callproc ( char -- ) paramlist >R _call R> _cleanstack ;
-- ------------------------------------------------------------

Here I've created yet another code generation procedure:

-- ------------------------------------------------------------
-- Adjust the stack pointer upwards by n bytes 
: _cleanstack ( #bytes -- ) 
        ?DUP IF   (.) S"  d# -> esp  add," $+ emitln  ENDIF ;
-- ------------------------------------------------------------

OK, if you'll add this code to your compiler, I think you'll find that the stack is now under control. (calls7.frt)

The next problem has to do with our way of addressing relative to the stack pointer. That works fine in our simple examples, since with our rudimentary form of expressions nobody else is messing with the stack. But consider a different example as simple as:

     PROCEDURE FOO(A, B)
     BEGIN
          A = A + B
     END

The code generated by a simple-minded parser might be:

     FOO: [esp 8 +] -> eax mov, ; fetch a
          eax push,             ; push it
          [esp 4 +] -> eax mov, ; fetch b
          [esp]     -> eax add, ; add a
          eax -> [esp 8 +] mov, ; store a
          ret,

This would be wrong. When we push the first argument onto the stack, the offsets for the two formal parameters are no longer 4 and 8, but are 8 and 12. So the second fetch would fetch A again, not B.

This is not the end of the world. I think you can see that all we really have to do is to alter the offset every time we do a push, and that in fact is what's done if the CPU has no support for other methods.

Fortunately, though, the x86 does have such support. Recognizing that this CPU would be used a lot with high-order language compilers, Intel decided to add direct support for this kind of thing.

The problem, as you can see, is that as the procedure executes, the stack pointer bounces up and down, and so it becomes an awkward thing to use as a reference to access the formal parameters. The solution is to define some other register, and use it instead. This register is typically set equal to the original stack pointer, and is called the frame pointer.

The x86 instruction set ENTER lets you declare such a frame pointer, and sets it equal to the stack pointer, all in one instruction. As a matter of fact, it does even more than that. Since this register (EBP) may have been in use for something else in the calling procedure, ENTER also pushes the current value of EBP onto the stack. It can also add a value to the stack pointer, to make room for local variables.

The complement of ENTER is LEAVE, which simply restores the stack pointer and pops the old value back into the register.

Using these two instructions, the code for the previous example becomes:

     FOO: 0 0 enter,
          [ebp 12 +] -> eax  mov, ; fetch a
          eax push,               ; push it
          [ebp  8 +] -> eax  mov, ; fetch b
          [esp] -> eax  add,      ; add a
          eax -> [ebp 12 +]  mov, ; store a
          leave,
          ret,

Fixing the compiler to generate this code is a lot easier than it is to explain it. All we need to do is to modify the code generation created by DoProc. Since that makes the code a little more than one line, I've created new procedures to deal with it, paralleling the Prolog and Epilog procedures called by DoMain:

-- ------------------------------------------------------------
-- Write the prolog for a procedure 
: procprolog ( char -- )
        S" SUBROUTINE " ROT CHAR-APPEND emitdln 
        S" 0 0  enter," emitln ;
-- ------------------------------------------------------------
-- Write the epilog for a procedure 
: procepilog ( -- )
        S" leave," emitln
        S" ret,"   emitln ;
-- ------------------------------------------------------------

The word DoProc now just calls these:

-- ------------------------------------------------------------
-- Parse and translate a procedure declaration
: doProc ( -- )
        'p' match
        getname >R 
        formallist
        fin 
        R@ 'p' addentry
        R> procprolog
        beginblock
        procepilog
        clearparams ;
-- ------------------------------------------------------------

Finally, we need to change the references to ESP in procedures LoadParameter and StoreParameter:

-- ------------------------------------------------------------
-- Load a parameter to the primary register 
: loadparameter ( index -- )
        #params SWAP - 2+ CELLS ( offset ) >R
        S" [ebp "  R> (.) $+  S"  +] -> eax  mov," $+ emitln ;
-- ------------------------------------------------------------
-- Store a parameter from the primary register 
: storeparameter ( index -- )
        #params SWAP - 2+ CELLS ( offset ) >R
        S" eax -> [ebp "  R> (.) $+  S"  +]  mov," $+ emitln ;
-- ------------------------------------------------------------

(Note that the offset computation changes to allow for the extra push of EBP.)

That's all it takes. Try this out and see how you like it. (calls8.frt)

At this point we are generating some relatively nice code for procedures and procedure calls. Within the limitation that there are no local variables (yet) and that no procedure nesting is allowed, this code is just what we need.

There is still just one little small problem remaining:

WE HAVE NO WAY TO RETURN RESULTS TO THE CALLER

But that, of course, is not a limitation of the code we're generating, but one inherent in the call-by-value protocol. Notice that we can use formal parameters in any way inside the procedure. We can calculate new values for them, use them as loop counters (if we had loops, that is!), etc. So the code is doing what it's supposed to. To get over this last problem, we need to look at the alternative protocol.

CALL-BY-REFERENCE

This one is easy, now that we have the mechanisms already in place. We only have to make a few changes to the code generation. Instead of pushing a value onto the stack, we must push an address. As it turns out, the x86 has an instruction, LEA, that does just that.

We'll be making a new version of the test program for this. Before we do anything else,

MAKE A COPY

of the program as it now stands, because we'll be needing it again later.

Let's begin by looking at the code we'd like to see generated for the new case. Using the same example as before, we need the call

     FOO(X, Y)

to be translated to:

     X d# push,             ; Push the address of X
     Y d# push,             ; Push the address of Y
     FOO offset NEAR  call, ; Call FOO

That's a simple matter of a slight change to Param:

-- ------------------------------------------------------------
-- Process an actual parameter 
: param ( -- ) S"  d# push," getname CHAR-PREPEND emitln ;
-- ------------------------------------------------------------

(Note that with pass-by-reference, we can't have expressions in the calling list, so Param can just read the name directly.)

At the other end, the references to the formal parameters must be given one level of indirection:

     FOO: 0 0 enter,
          [ebp 12 +] -> eax  mov, ; fetch address of A
          [eax] -> eax mov,       ; fetch A
          eax push,               ; push it
          [ebp  8 +] -> eax  mov, ; fetch address of B
          [eax] -> eax mov,       ; fetch B
          [esp] -> eax add,
          [ebp 12 +] -> ebx  mov, ; get address of A
          eax -> [ebx]       mov, ; store A
          leave,
          ret,

All of this can be handled by changes to LoadParameter and StoreParameter:

-- ------------------------------------------------------------
-- Load a parameter to the primary register
: loadparameter ( index -- )
        #params SWAP - 2+ CELLS ( offset ) >R
        S" [ebp "  R> (.) $+  S"  +] -> ebx  mov," $+ emitln 
        S" [ebx] -> eax  mov," emitln ;
-- ------------------------------------------------------------
-- Store a parameter from the primary register
: storeparameter ( index -- )
        #params SWAP - 2+ CELLS ( offset ) >R
        S" [ebp "  R> (.) $+  S"  +] -> ebx  mov," $+ emitln 
        S" eax -> [ebx]  mov," emitln ;
-- ------------------------------------------------------------

That should do it. Give it a try and see if it's generating reasonable-looking code (calls9.frt). As you will see, the code is hardly optimal, since we reload the address register every time a parameter is needed. But that's consistent with our KISS approach here, of just being sure to generate code that works. We'll just make a little note here, that here's yet another candidate for optimization, and press on.

Now we've learned to process parameters using pass-by-value and pass-by-reference. In the real world, of course, we'd like to be able to deal with both methods. We can't do that yet, though, because we have not yet had a session on types, and that has to come first.

If we can only have one method, then of course it has to be the good ol' FORTRAN method of pass-by-reference, since that's the only way procedures can ever return values to their caller.

This, in fact, will be one of the differences between TINY and KISS. In the next version of TINY, we'll use pass-by-reference for all parameters. KISS will support both methods.

LOCAL VARIABLES

So far, we've said nothing about local variables, and our definition of procedures doesn't allow for them. Needless to say, that's a big gap in our language, and one that needs to be corrected.

Here again we are faced with a choice: Static or dynamic storage?

In those old FORTRAN programs, local variables were given static storage just like global ones. That is, each local variable got a name and allocated address, like any other variable, and was referenced by that name.

That's easy for us to do, using the allocation mechanisms already in place. Remember, though, that local variables can have the same names as global ones. We need to somehow deal with that by assigning unique names for these variables.

The characteristic of static storage, of course, is that the data survives a procedure call and return. When the procedure is called again, the data will still be there. That can be an advantage in some applications. In the FORTRAN days we used to do tricks like initialize a flag, so that you could tell when you were entering a procedure for the first time and could do any one-time initialization that needed to be done.

Of course, the same "feature" is also what makes recursion impossible with static storage. Any new call to a procedure will overwrite the data already in the local variables.

The alternative is dynamic storage, in which storage is allocated on the stack just as for passed parameters. We also have the mechanisms already for doing this. In fact, the same routines that deal with passed (by value) parameters on the stack can easily deal with local variables as well ... the code to be generated is the same. The purpose of the offset in the x86 ENTER instruction is there just for that reason: we can use it to adjust the stack pointer to make room for locals. Dynamic storage, of course, inherently supports recursion.

When I first began planning TINY, I must admit to being prejudiced in favor of static storage. That's simply because those old FORTRAN programs were pretty darned efficient ... the early FORTRAN compilers produced a quality of code that's still rarely matched by modern compilers. Even today, a given program written in FORTRAN is likely to outperform the same program written in C or Pascal, sometimes by wide margins. (Whew! Am I going to hear about that statement!)

I've always supposed that the reason had to do with the two main differences between FORTRAN implementations and the others: static storage and pass-by-reference. I know that dynamic storage supports recursion, but it's always seemed to me a bit peculiar to be willing to accept slower code in the 95% of cases that don't need recursion, just to get that feature when you need it. The idea is that, with static storage, you can use absolute addressing rather than indirect addressing, which should result in faster code.

More recently, though, several folks have pointed out to me that there really is no performance penalty associated with dynamic storage. With the x86, for example, you shouldn't use absolute addressing anyway ... most operating systems require position independent code. And the x86 instruction

     [ebp 8 +] -> eax mov,

has exactly the same timing as

     X dword-ptr -> eax mov, .

So I'm convinced, now, that there is no good reason not to use dynamic storage.

Since this use of local variables fits so well into the scheme of pass-by-value parameters, we'll use that version of the translator to illustrate it. (I sure hope you kept a copy!)

The general idea is to keep track of how many local parameters there are. Then we use the integer in the ENTER instruction to adjust the stack pointer downward to make room for them. Formal parameters are addressed as positive offsets from the frame pointer, and locals as negative offsets. With a little bit of work, the same procedures we've already created can take care of the whole thing.

Let's start by creating a new variable, LBase:

     0 VALUE lbase

We'll use this variable, instead of #Params, to compute stack offsets. That means changing the two references to #Params in LoadParameter and StoreParameter:

-- ------------------------------------------------------------
-- Load a parameter to the primary register 
: loadparameter ( index -- )
        lbase SWAP - 2+ CELLS ( offset ) >R
        S" [ebp "  R> (.) $+  S"  +] -> eax  mov," $+ emitln ;
-- ------------------------------------------------------------
-- Store a parameter from the primary register  (call by value)
: storeparameter ( index -- )
        lbase SWAP - 2+ CELLS ( offset ) >R
        S" eax -> [ebp "  R> (.) $+  S"  +]  mov," $+ emitln ;
-- ------------------------------------------------------------

The idea is that the value of LBase will be frozen after we have processed the formal parameters, and won't increase further as the new, local variables, are inserted in the symbol table. This is taken care of at the end of FormalList:

-- ------------------------------------------------------------
-- Process the formal parameter list of a procedure 
: formallist ( -- )
        '(' match
        Look ')' <> 
           IF   formalparam
                BEGIN  Look ',' =  
                WHILE  ',' match formalparam 
                REPEAT
        ENDIF   
        ')' match fin 
        #params TO lbase 
        2 +TO #params ;
-- ------------------------------------------------------------

(We add two words to make allowances for the return address and old frame pointer, which end up between the formal parameters and the locals.)

About all we need to do next is to install the semantics for declaring local variables into the parser. The routines are very similar to Decl and TopDecls:

-- ------------------------------------------------------------
-- Parse and translate a local data declaration
: locdecl ( -- ) 'v' match  getname addparam  fin ;
-- ------------------------------------------------------------
-- Parse and translate local declarations
: locdecls ( -- n )
        0
        BEGIN  Look 'v' =
        WHILE  locdecl  1+ 
        REPEAT ;
-- ------------------------------------------------------------

Note that LocDecls returns the number of locals to DoProc.

Next, we modify DoProc to use this information:

-- ------------------------------------------------------------
-- Parse and translate a procedure declaration 
: doProc ( -- )
        'p' match
        getname >R 
        R@ 'p' addentry
        formallist
        locdecls ( #formals )
        R> SWAP ( name #formals ) procprolog
        beginblock
        procepilog
        clearparams ;
-- ------------------------------------------------------------

(I've made a couple of changes here that weren't really necessary. Aside from rearranging things a bit, I moved the call to Fin to within FormalList, and placed one inside LocDecls as well. Don't forget to put one at the end of FormalList, so that we're together here.)

Note the change in the call to ProcProlog. The new argument is the number of words (not bytes) to allocate space for. Here's the new version of ProcProlog:

-- ------------------------------------------------------------
-- Write the prolog for a procedure 
: procprolog ( char #formals -- )
        >R S" SUBROUTINE " ROT CHAR-APPEND emitdln 
        R> CELLS (.) S" 0  enter," $+ emitln ;
-- ------------------------------------------------------------

That should do it. Add these changes and see how they work (calls10.frt).

CONCLUSION

At this point you know how to compile procedure declarations and procedure calls, with parameters passed by reference and by value. You can also handle local variables. As you can see, the hard part is not in providing the mechanisms, but in deciding just which mechanisms to use. Once we make these decisions, the code to translate the constructs is really not that difficult. I didn't show you how to deal with the combination of local parameters and pass-by-reference parameters, but that's a straightforward extension to what you've already seen. It just gets a little more messy, that's all, since we need to support both mechanisms instead of just one at a time. I'd prefer to save that one until after we've dealt with ways to handle different variable types.

That will be the next installment, which will be coming soon to a Forum near you. See you then.

*****************************************************************
*                                                               *
*                        COPYRIGHT NOTICE                       *
*                                                               *
*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *
*                                                               *
*****************************************************************

LET'S BUILD A COMPILER! By Jack W. Crenshaw, Ph.D. 27 August 1989