Besides being a parallel Forth, another important feature of tForth is that it tracks the documents prepared by X3J14, an American National Standards Technical Committee (ANS TC) with the mission of standardizing the Forth language. ANS Forth materialized in 1994. Several important consequences for tForth's design are discussed.
The (occam-like) parallel constructs of tForth, its optimizing compiler and server/client concept are described in detail.
The tForth kernel, sans the parallel extensions, proves easy to port to other 32-bit CPU's. This process uses the same base set of source files plus a small hardware dependent kernel file. As a first experiment, a 32-bit protected mode Forth for the Intel 386+387 and 486 chips is produced: iForth. Helped by the ANS Forth word ENVIRONMENT?, both these Forths run our standard set of examples, benchmarks and utilities -- currently a collection of over two megabytes of text files. Some details of iForth's implementation are disclosed.
The best-known feature of a transputer is probably its link interface. The links allow networking of transputers by direct point-to-point connections without any external logic. The four T800 links support bit rates of 5, 10 and 20 Mbit/s, bidirectionally, and with full automatic DMA control. For the programmer there is no handshaking or protocol to take care of. Sending messages over a link can be done with a single instruction (plus a few to set up buffer pointers and counts).
Peeking inside the chip reveals that the transputer is a stack machine. It has two three-level stacks for integer and floating-point expression evaluation. Apart from these stacks, the transputer has a workspace pointer. This is a register that points to the base of a workspace where local variables are kept. When calling a subroutine, room is allocated at the bottom (lowest address) of the workspace and the integer stack plus the old workspace pointer are copied there.
Transputer multitasking is implemented with on-chip timer and microcoded scheduling hardware. The basic item to schedule is called a process. A process needs a (potentially very small) workspace to save its state and queueing information, and for its local variables.
At any time, a process is active: being executed or on a list waiting to be executed, or it is inactive: ready to input, ready to output, or waiting until a specified time. Inactive processes do not consume any time. Input and output is not necessarily done on a hardware link, there is something called a memory channel that behaves in exactly the same way as the hardware and is used for on-chip interprocess communication.
Active processes are held in two linked lists of process workspaces, one of high priority processes and one of low priority processes. A high priority process runs until completion, or until it becomes inactive (although one may assume that to be a programming error!). A low priority process is only permitted to run for two timeslices (about 2 ms), after that it is forcibly descheduled at the next available descheduling point.
Actually, our present peephole optimizer is not simple anymore. The first approach used a state-machine that combined the current stack depth with the stack requirements of the macro to be compiled, resulting in an optimized sequence of external d(f)push, d(f)pop instructions or equivalent internal stack shuffling. We found however that almost every primitive benefited from some kind of special optimization that was very hard to generalize (e.g addition is commutative, subtraction is not, ROT is faster using a local etc.). We ended up putting a full-scale stack optimizer in every primitive. The speedup is about 50% over the state machine approach. However, the transputer is extremely hostile to the stack concept (a dpush or dpop costs about 8 bytes!) and we suspect that for other CPU's these extreme measures will not be necessary. For iForth we do not keep the top of the stack in registers and (therefore) perform much less optimization, nevertheless iForth (on a 33 MHz '386) is about 3 times faster than tForth (on a 25 MHz T800).
A subroutine-threaded Forth without macro expansion is really not much faster than for instance a direct threaded system. But if words like DUP + 2+
are implemented as macro's, what do you do when the user accesses them in interpret mode: 13 DUP . .
? And what happens when a macro is ticked: ' DUP routine ! routine @ EXECUTE
? Although the first problem could be tackled by compiling input into a buffer and execute it there (watch out for nested interpreters!), the second problem cannot. Thus, we have decided to implement a shadow definition for every macro. The shadow is directly executable and can be ticked and thus indirectly compiled, although that will result in suboptimal code. Macro's are identified by a special bit in their header, which is used by ' and FIND to pick the right definition, depending on STATE. Words like COMPILE [COMPILE] POSTPONE and COMPILE, need to be really smart when a user specifies a macro word as their argument. The easy way out is to always compile the slow non-macro definition, but tForth does extra processing, again relying on STATE, to optimize these cases.
Considering the fact that a transputer context switch is in effect a change of the workspace, we decided to put all of the Forth administration in the workspace. For tForth this means all of its five stacks (data, return, system, floating-point and locals) plus the user table. The stack pointers are implemented as user variables and are thus also located in the workspace. This means that a switch between Forth processes has no additional overhead as it involves a simple swap of workspace pointers, which is accomplished by the scheduling hardware automatically.
A transputer call instruction changes the workspace pointer. When all Forth stacks and the user area are in the workspace, it is clearly unacceptable (and unnecessary) that the workspace moves around when Forth words call each other. We solved this by starting every single piece of code with an instruction that ``undoes'' the workspace adjustment caused by a call. This creates a small problem for the occasional Forth word that in a subroutine-threaded model effectively ``jumps'' into a piece of code (for instance EXECUTE). All affected words must simulate a workspace adjustment before they jump.
When discussing transputer hardware scheduling we mentioned that a low priority process is only permitted to run for two timeslices (about 2 ms) before it is forcibly descheduled at the next available descheduling point. This descheduling point is an almost Forth-like concept resembling PAUSE. A process can prevent descheduling if it does not use two special instructions, lend (loop end) and j (an unconditional branch). Of course it must also avoid becoming inactive. We kept this optionally non-cooperating feature in tForth by not using the special loop instruction (it is incompatible with +LOOP anyway) and by clearly documenting which tForth control flow words (AGAIN REPEAT, but not ELSE) use the special unconditional branch instruction. Also, we provide alternative words that perform the same action but cannot be descheduled. This gives the programmer full dynamic control over the priorities of Forth processes, using PAUSE-like words of his own design.
A transputer Forth conforming to the Forth 83 standard is a bad idea, as that explicitly requires the use of a 16-bit model [9]. The T4 and T8-type transputers can not access this data type directly; they only handle bytes and words efficiently (it is no problem for the 16-bit T2). The ANS Forth does not specify a virtual machine, and only requires that the size of a cell is an integral multiple of the size of a character, which should have a minimum of 8 bits (data-stack elements, return-stack elements, addresses and single-precision numbers are all one cell wide). A natural choice for the transputer is thus a byte-sized character and a 32-bit cell.
Of course, there is no such thing as a free lunch. The extra flexibility and speed is paid for with alignment problems. These have now become the responsibility of the programmer (if he cares about transportability, that is). To a transputer a cell is not equivalent to four bytes in a row, as a cell address should be a multiple of four. ANS Forth specifies the words CELL+ CELLS CHAR+ CHARS ALIGN and ALIGNED for use with , C, ALLOT etcetera. Furthermore CREATE must guarantee that aligned addresses are passed into DOES> code. Stack comments specifying addresses now mention if these addresses should be aligned.
To a user of tForth alignment problems are hardly visible. The only problem we frequently encountered is when using CREATE to build records of mixed datatypes and forgetting to use strategically placed ALIGN words (This happened often when transporting quick hacks from iForth to tForth. iForth uses 32-bit cells too, but the hardware is able to fetch a word from an unaligned address so a programming error goes unnoticed). Therefore special attention must be paid to the mixed use of C, and , and to the compilation of strings. Floating-point numbers must also be aligned (single and double-precision IEEE numbers plus the processor-specific internal format must be supported).
The problems become really interesting when the hardware does not support bytes and thus CHAR+ is not equivalent to 1+ anymore. We expect to find a lot of invalid assumptions in our metacompiler code once we get to build a TMS320C30 target (This DSP chip treats everything as a 32-bit item and so CHAR+ and CELL+ must be equivalent).
: / S" FM/MOD NIP " EVALUATE ; IMMEDIATE : MOD S" FM/MOD DROP " EVALUATE ; IMMEDIATEThe code shown will not obstruct the workings of the optimizer.
an extensible structure that contains definitions and associated data space. The form of this structure is not specified by the standard, but it may be described as consisting of three logical parts: ordered word list(s) that may be searched for word names, a code space where the actions of the definitions are stored, and a data space. Of these, only data space may be directly addressed by a standard program.
By this definition, words like PFA, NFA, CFA etcetera are no longer required, and an implementation can use any algorithm it sees fit to organize the word headers and the information stored therein. Furthermore, it means special threading techniques and in-line code generation are now legal. For a user, it means something like : FOO [ ' BAR , ] ;
is no longer considered portable code (if it ever was). Likewise, 3 CONSTANT FUBAR 5 FUBAR 2+ !
must be frowned upon. Regrettably, forbidding access to word headers and vocabulary internals makes illegal some elegant OOP techniques [10].
tForth uses the relaxed rules of ANS Forth to implement a subroutine-threaded Forth where some primitives are expanded in-line and optimized at macro edges for optimal stack access. The headers are separated from the code (this makes implementing locals much easier) and use a binned hashing technique with 256 threads for every vocabulary (or wordlist). The header table contains extra information to aid the optimizer (more flags) and contains an additional pointer to optional forget code that is executed when FORGET walks the complete set of wordlists and removes user definitions one by one.
Header: flags(word) forget(ptr) hash(ptr) code(ptr) link(ptr) name(token)
VOCABULARY one ALSO one DEFINITIONS : hello ." hello!" ; 1000 ALLOT PREVIOUS VOCABULARY two ALSO two DEFINITIONS : hello2 ." hello2!" ; 1000 ALLOT PREVIOUS ALSO one DEFINITIONS : hello3 ." hello3!" ; FORGET hello ( or even FORGET one . Try it on your system... )
COMPILE DUP
do not work, or have to be written [COMPILE] DUP
. POSTPONE has been invented to circumvent the IMMEDIATE problem, although there are still some pathological cases where it cannot do what you want. The famous example is COMPILE [COMPILE] IF
.
COMPILE, provides a portable alternative for the common coding practice: [ ' DUP ] LITERAL ,
. It should now be written ['] DUP COMPILE,
but again, it will only work if the programmer knows the ticked word is not IMMEDIATE.
Sequential files do not have this problem because most OS's give out unique fid's when a file is multiple accessed (it therefore is environmentally dependent what happens).
A channel in above description can be a hardware bi-directional link, of which there are a limited number (between 2 and 4), or a memory channel, of which there can be an almost unlimited number (they only need one word of memory). Of course, processes running on different transputers can only communicate using hardware links. Therefore the ideal, writing code to run on n processes without needing knowledge of the physical location of these n processes, is unattainable in practice. The new generation of transputers features an unlimited number of virtual channels, that is, the link messages are multiplexed and routed automatically [14]. This eliminates the differences between a memory channel and a hardware link, and thus the difference between local and remote processes.
The channels LINK0 LINK1 LINK2 LINK3 and EVENT are pre-defined. The first four refer to the hardware channels that connect the transputers to each other, opposed to the memory channels defined using CHANNEL. These hardware channels behave identically to those defined using CHANNEL down to the machine code level, as far as a single input or output action is concerned. The EVENT channel is special in that it is read only. Nothing but external circuitry is able to restart the process thus descheduled. The data read is garbage and should be thrown away. Note that, although a link is bi-directional and thus is equivalent to two separate memory channels, it is possible to refer to both with the same name. tForth's high-level channel i/o words sort out the hardware details for themselves. As an example, the following syntax is correct:
LINK1 CHANNEL-C@ 3 + LINK1 CHANNEL-C!This reads a byte of link1, adds three to it and sends it back over the same link.
The word CHANNEL-@ fetches a word from a channel, CHANNEL-! stores a word to a channel. They are similar to @ !. As already used in bove, CHANNEL-C@ CHANNEL-C! fetch and store bytes, similar to C@ C!. Finally we have CHANNEL-SEND CHANNEL-RECEIVE. Both accept a string address plus count and a channel. CHANNEL-SEND sends count bytes located at string address over the channel. CHANNEL-RECEIVE receives count bytes from the channel and stores them at string address. Note that the number of bytes have to agree. Normally they will be communicated by a separate CHANNEL-@ CHANNEL-! exchange.
In order to make an hardware link equivalent to a memory channel, it is sometimes useful to hide the fact that the name of a link corresponds to two memory addresses, one for input, the other for output. Note that use of the CHANNEL-xx words already guarantees this automatically, but in some situations it is better to have explicit conversion words. >INPUT-CHANNEL and >OUTPUT-CHANNEL convert a channel addres to the address needed for the transputer hardware in, and out, instructions.
SELECT [channel] GUARD [code] ENDGUARD ... [channel] [boolean] ?GUARD [code] ENDGUARD ... [end] [start] REPLICATE [channel] GUARD [code] ENDGUARD ... [end] [start] REPLICATE [channel] [boolean] ?GUARD [code] ENDGUARD ... ENDSELECTHere ``[channel]'' is used to describe an arbitrary series of tForth words that should result in a channel address put on the data stack, for use by GUARD.
The word ``[code]'' describes any tForth code. This [code] is executed on the condition that something can be read from the channel GUARDed.
Again, the ``[boolean]'' can be generated by any sequence of tForth words. ?GUARD needs a boolean true and a ready channel to activate its code.
The REPLICATE statement should be seen as DO LOOP in disguise. Between REPLICATE and GUARD, the loop index I can be used to compute channel addresses for GUARD. If one of these computed channels becomes ready, the corresponding index is pushed on the data stack and the [code] after GUARDis executed.
Time is cyclic. Whenever a clock register reaches the maximum positive number fitting in a CELL , it ``increments'' to MININT. Time values must be considered unsigned values and manipulated with unsigned operators like U< and U>. As tForth pays no attention whatsoever to the error flag, it is perfectly okay to use + to add time values together.
A disadvantage of the internal clocks is that their intervals depend on the process priority. Therefore tForth provides the words >MS and >TICKS that work independent of this priority. The words accept or output data in millisecond format, like the ANSI standard word MS. With >MS a count in ``timer ticks'' is translated to real elapsed time in milliseconds. With >TICKS, a real time in milliseconds is translated to timer ticks. Both words determine the priority of their callers to decide on the conversion factor.
SELECT [timeout] TIMEOUT [code] ENDGUARD ... [timeout] [boolean] ?TIMEOUT [code] ENDGUARD ... [end] [start] REPLICATE [timeout] TIMEOUT [code] ENDGUARD ... [end] [start] REPLICATE [timeout] [boolean] ?TIMEOUT [code] ENDGUARD ... ENDSELECTHere ``[timeout]'' is used to describe an arbitrary series of tForth words that should result in an unsigned number being put on the data stack, for use by TIMEOUT or ?TIMEOUT. This number describes a time interval in timer ticks.
The word ``[code]'' describes any tForth code. This [code] is executed on the condition that the specified time interval has elapsed.
Again, the ``[boolean]'' can be generated by any sequence of tForth words. ?TIMEOUT needs a boolean true and a counted down time interval to activate its code.
The REPLICATE statement was described with GUARD, above.
An example of the above follows
\ Needed now tForth REQUESTS for keys to be send. : KEY-REQUEST {{ ITERM! 0 BOOTLINK @ CHANNEL-C! }} ; : X6 CR ." Type any key to continue (you have a few seconds...)" KEY-REQUEST SELECT BOOTLINK @ GUARD BOOTLINK @ CHANNEL-C@ CR ." You typed '" EMIT ." '" ENDGUARD 100000 TIMEOUT CR ." Time out, press a key " BOOTLINK @ CHANNEL-C@ DROP ENDGUARD ENDSELECT ;An example of REPLICATE
: X5-CALCULATE-TIMEOUT 7 = IF 10000 ELSE 100000 THEN ; : X5 SELECT 10 0 REPLICATE I X5-CALCULATE-TIMEOUT TIMEOUT 7 <> ABORT" Failure in X5" ENDGUARD ENDSELECT ;
To simulate parallel tasks on a single-processor system, executing the instructions of one task is interleaved with running all other tasks. The transputer hardware handles this automatically. This ``slicing'' is completely transparent to the user: there are no special requirements for the use of data structures or program control statements; a task or process may be structured just like any other definition. However, special syntax exists to create, start and stop processes.
The transputer supports running a task at two levels of priority [3]. The hardware maintains four queues of active processes, two for high, and two for low priority processes. A process can be in one of four possible states: executing; waiting to execute, which implies that it is in one of the two active process queues; waiting for a timer event, which implies it is in one of the two timer queues, or waiting for a communication event, in which case it is in no queue. A high priority process will execute without interruption until it terminates, or waits for a timer or communication event to take place. In this case, if there are any further high priority processes waiting to proceed, then the process at the head of the high priority active queue will be scheduled. If there are no high priority processes waiting to execute, then the next waiting low priority process will be scheduled. Low priority processes may be pre-empted at any time by a high-priority process that becomes ready to run. Low priority processes are time-sliced; if a low priority process executes a j, or lend, instruction, and has been executing for more than its time-slice period, it is descheduled and placed at the back of the low priority active queue, with the process at the head of the queue commencing execution.
We already mentioned that tForth uses a combination of time-slicing and a method involving the word PAUSE. A process can immediately suspend its execution by calling PAUSE.
The necessary steps to be able to run a task concurrently are (1) create a workspace and (2) pass the wanted priority plus the address of the newly allocated workspace plus the machine address of the code to be run to the transputer hardware. As already mentioned, the code must contain an infinite loop or execute the STOP statement when it finishes. Because the machine executable address of a colon definition is not equivalent to its Forth execution token, the word RUN is available as a conversion operator.
We'll describe one of the two kinds of asynchronous concurrent processes supported by tForth. Methods are discussed to create, destroy, run and stop processes. Some simple examples will be given.
The word GETPRIORITY assesses the priority level of the calling task. Its dual, SETPRIORITY , changes the priority level of the caller, at the same time leaving the current priority level on the data stack. A guaranteed feature of this word is that it will not cause descheduling.
We needed SETPRIORITY for the tForth SEMAPHORE concept. It was very difficult to find a way to do priority switching. The transputer instruction set does not support it directly, probably because there is no use for it in an occam context.
There is no compile-time parameter to FORTH-PROCESS as the size of stacks and user area is fixed. A Forth task needs about 8 KByte of memory to run.
A feature of the word created with FORTH-PROCESS is that it not merely returns the address of the allocated workspace, but that it will actually start a concurrent Forth process when passed a priority and a machine executable address. When the process executes STOP the transputer hardware automatically writes data at negative offsets of the workspace pointer in order to allow a restart of the process. This information is used by RERUN to let the process continue just after the point where it left off.
An example:
FORTH-PROCESS process VARIABLE count : counter BEGIN \ begin ... again allows RERUN 1000 0 DO I count +! LOOP STOP \ do one iteration only AGAIN ; LO-PRIO ' counter RUN process count ? \ result of the first iteration ' process RERUN count ? \ result of the second iteration
FORTH-PROCESS process ' process ( cfa) 'PID ( PID) H. ( might print $80031B00)Do not confuse 'PID and PID. The tForth word PID returns the process identifier of the interpreter or main process. The server will try to put the workspace of the main process in transputer on-chip RAM, so PID will in most cases be very nearly equal to MININT.
Stopping a Forth process is accomplished by having it execute STOP or stopp,. A machine process can only stop by executing stopp,.
Killing a process is the operation of forcing a process to execute STOP or stopp,. It probably should be avoided, but it has its uses, especially when FORGETting code containing process definitions (tForth already uses KILL for this purpose internally).
KILL needs an execution token on the stack. It will try to stop the process that is identified by this token, removing the process from all the process queues. It works by redirecting the stored instruction pointer of the process to code that executes stopp,, and making sure the process runs at least one more time to execute that instruction. There is a check if the word corresponding to the execution token is created by a FORTH-PROCESS, but it can not be 100% fool-proof. If it fails, a random word is stored at a random memory address.
There are difficulties with killing processes:
Example usage: ' name KILL
In tForth's PAR construct all started concurrent processes are forced to wait until each and every process has finished and the main line of sequential processing can continue. The ``waiting to be finished'' bit, the synchronization, is the only feature discerning them from a set of asynchronous concurrent processes as described in the previous paragraphs. Of course there are superficial differences too, like that it is not necessary for the user to allocate named workspaces for each and every process needed in a PAR. Allocation and deallocation code is generated by the compiler automatically.
As a quick preview, inside a colon definition a standard PAR will look as follows:
VARIABLE gorilla 1 gorilla ! VARIABLE bananas 111 bananas ! : ZOO PAR STARTP -24 bananas +! ENDP STARTP 1 gorilla +! ENDP ENDPAR CR bananas ? gorilla ? ;Between the PAR and ENDPAR two processes are started in parallel. Their routines are specified between STARTP and ENDP . This example carefully avoids the problems that arise when in the course of their execution the processes need to modify the same variable.
Occam splits of n-1 tasks if asked to run n processes. The root process takes over one of the remaining parallel tasks and runs it in its own workspace. This would be wrong if one of the parallel tasks wants to start its own set of parallel jobs: then two or more processes (at different ``task depths'') will end up with the same workspace and overwrite each other's data!
tForth therefore starts up exactly n NEW processes for each n PAR . The root process executes endp, after it has allocated and started the n tasks.
To define a high-level parallel construct four words are available: PAR, STARTP, ENDP and ENDPAR.
PAR marks the start of a list of sub-processes. The routine definition for each sub-process is enclosed by STARTP and ENDP. These two words generate code that takes care of workspace allocation and deallocation. ENDPAR marks the end of a list of processes. It compiles code that waits until all of the processes that are in the list of processes have terminated and makes sure sequential processing proceeds with the same priority level in effect as before the PAR was started.
Example usage, a parallel multiplier:
\ First create a parameter area. Each parallel process can access this area. \ item#: 0 1 2 3 4 5 CREATE data 0 , 11 , 12 , 123 , 3311 , 0 , : TH data []CELL ; ( index -- address ) : ZOTZ PAR STARTP \ multiply items 1 and 2 and store at 0 1 TH @ \ get item 1 2 TH @ \ get item 2 * \ multiply 0 TH ! \ store at item 0 ENDP STARTP \ multiply items 3 and 4 and store at 5 3 TH @ \ get item 3 4 TH @ \ get item 4 * \ multiply 5 TH ! \ store at item 5 ENDP ENDPAR ; PREVIOUS ZOTZ CR 0 TH ? 5 TH ?
Example: : WOOF ( -- ) PAR 12 21 2 :I STARTP + {{ . SPACE }} ENDP 3e PI 2 :F STARTP F+ E. ENDP ENDPAR .S ; \ After WOOF , tForth prints: \ FORTH> WOOF 33 6.141593E0 \ Data: --- \ System: --- \ Float: --- ok \ FORTH>
[limit] [start] TIME makes sure the next code fragment that is inside a PAR ... ENDPAR construct is executed [limit] - [start] +1 times. TIME has the same functionality as DO. The next STARTP command will act as the corresponding LOOP statement. The loop index is available using I as with any other DO ... LOOP
command. You can pass items to the I-th sub-process using :I.
Example:
: FOO PAR 3 0 TIMES I 2* I 1+ I DUP * 3 :I STARTP + + . ENDP ENDPAR ; \ Execution of FOO generates the following output: \ FORTH> FOO 1 5 11 ok
In a parallel effort, we are working on kernel ports to the Intel '386+387 and '486 chips (Ed: now finished), the Motorola 68000 series and the Texas Instruments TMS320C30 (Ed: finished) and '40 chips. Porting problems with byte-order and character size will result in incremental touch-ups of our present metacompiler, trying to make it truly universal.
[1] IMS T800 Architecture; Technical Note 6. INMOS Ltd, Bristol, 1987. [2] Transputer Instruction Set: a compiler writer's guide. INMOS Limited, Prentice Hall International (UK) Ltd 1988. [3] Inside The Transputer. D.A.P. Mitchell, J.A. Thompson, G.A. Manson, G.R. Brooks, Blackwell Scientific Publications, 1990 [4] User Guide Transputer Education Kit; Theory of Operation, Installation, Schematics. Computer System Architects. [5] A tutorial introduction to OCCAM programming. Dick Pountain, INMOS Ltd. 1987 [6] draft proposed American National Standard for Information Systems - Programming Languages - Forth (X3J14 dpANS-2 - Aug 3, 1991) Secretariat Computer and Business Equipment Manufacturers Association, 311 First Street, N.W., Suite 500, Washington, DC 20001-2178 [7] Comments received during the first public review of BSR X3.215.199x, dpANS Forth. Secretariat CBEMA. [8] dpANS-3, Changed sections only, by John Rible. Secretariat CBEMA. [9] Forth-83 Standard. A publication of the Forth Standards Team, P.O. box 4545 Mountain View, CA 94040, USA, August 1983 [10] Object-Oriented Forth; Implementation of Data Structures. Dick Pountain, Academic Press. [11] Faster Forth; Reducing overhead in threaded interpretive languages. Ronald L. Greene, BYTE, June 1984. [12] A Fast Forth for the 68000, Lori Chavez. Dr. Dobb's Journal, October 1987. [13] Implementing Forth on the 80386. John E. Lecky, JFAR Vol 5, No 1, 1987 [14] The T9000 Transputer; Product Overview, INMOS Ltd, Bristol, 1991