Last update: Sat Jan 16, 1999, debugged networking code, realistic example.
Last update: Tue Dec 29, 1998, removed confusing co-operating processes stuff.
Last update: Mon Dec 28, 1998.
author: Marcel Hendrix, e-mail

Abstract

This paper discusses a parallel Forth implementation under Windows NT or Linux; iForth version 1.11 (not yet released.)

iForth is a very complete (almost) ANS Forth. ANS Forth materialized in 1994. Several important consequences to iForth's design are discussed.

The (occam-like) parallel constructs of iForth are described in detail.

You can look at the current implementation as packaged in a Forth text file. Users of iForth and other not too brain-damaged Forths can actually compile and run this file and its examples when they implement the extra SYSCALLs described here. Very old iForths and all other Forths must implement the code shown on the pipes and socks page first. SECURE must be OFF .

Here is the pipi example (parallel computation of extremely large numbers).

Introduction

iForth is heavily influenced by tForth, a parallel 32-bits Forth for the INMOS transputer. tForth was developed by five people over a period of two years. The tForth team called themselves the Dutch Transputer Forth Workshop, but later renamed to the The Dutch Forth Workshop. The project was started and partly sponsored by the Dutch FIG branch, at that time headed by Hans Nieuwenhuijsen.

tForth was used to metacompile the first iForth, a private project of one the tForth developers, around 1991. iForth was continually developed after that, with the purpose of creating a very fast, cross-platform, down-to-the-metal ANS Forth. Parallel processing was not on the wishlist originally.

When SGS-Thomson announced the official end of the transputer era to be December 1998 (last possibility to order, December 1999 is the real final date), it was decided to try to incorporate tForth's high-level constructs in iForth. This is partly a respectful salute to a great idea, partly a move to salvage an enormous amount of work, and partly a very good way to make available the power of the current dual and quad Pentium-II CPU boards in a high-level language.

When a program uses so-called "threads", Windows NT workstation will automatically distribute these over up to four CPUs. An important difference with tForth's transputer processes is that all threads run in the same shared memory space. As tForth was designed for networked CPUs it did not assume the shared memory feature and relied completely on C.A.R. Hoare's CSP concept. This means that iForth's parallel constructs only need a replacement for the transputer hardware "channels" to function under NT or Linux. Shared memory is a feature that can be used when a program does not need to run on networked CPUs.

The state of development is that

most tForth concepts run under Windows NT
:I , :F , TIMES and REPLICATE are not supported
transputer channels and links are emulated with sockets (no named pipes anymore)
transputer processes are emulated with NT threads

Parallel Forth

Links and channels

The parallel processes enabled by the transputer hardware communicated by the use of channels [2,3,5]. As the following description shows, these channels can be emulated rather easily with standard sockets or named pipes. The communication regulates data transfer as well as decides which process is to run. A communication proceeds along a channel that connects two processes, the output and the input process. The output process stops until it is able to hand its data to the input process. The input process stops until the process on the other side of the channel is ready to deliver its data. In principle, there is no limit to the number of channels connected to a process, but there is always just one input and one output process that relates to a channel.

For the transputer, a channel in above description could be a hardware bi-directional link, of which there are a limited number (between 2 and 4), or a memory channel, of which there can be an almost unlimited number (they only need one word of memory for the transputer, seven words for NT/Linux). Of course, processes running on different processors can only communicate using hardware links (on the transputer) or using TCP/IP over an ethernet connection (iForth). Therefore the ideal, writing code to run on n processes without needing knowledge of the physical location of these n processes, was unattainable with tForth. As iForth knows no links, and each memory channel is (potentially) a hardware connection to a remote location, the difference between local and remote processes is less. However, distributing parts of a big program over a network to separate processors and correctly wiring up all the input and output channels stays a non-trivial job. It is felt that in such a situation having a live interpreter on every computing node will be a big help and a great advantage of iForth over other solutions.

Channel Input and Output operations

A channel can be seen as an address representing an hardware link.

The word CHANNEL-@ fetches a word from a channel, CHANNEL-! stores a word to a channel. They are similar to @ and !. CHANNEL-C@ CHANNEL-C! fetch and store bytes, similar to C@ C!. Finally we have CHANNEL-SEND CHANNEL-RECEIVE. Both accept a string address plus count and a channel. CHANNEL-SEND sends count bytes located at string address over the channel. CHANNEL-RECEIVE receives count bytes from the channel and stores them at string address. Note that the number of bytes have to agree. Normally they will be communicated by a separate CHANNEL-@ CHANNEL-! exchange.

Language constructs using channels

The SELECT statement in iForth allows a process to make a choice over its future behavior dependent on the readiness of other concurrent processes to communicate with it over a channel. Several input sources are possible for this construct. We will concentrate here on those components directly using channels. An informal description follows:

 SELECT
	[channel] GUARD [code] ENDGUARD
	...
	[channel] [boolean] ?GUARD [code] ENDGUARD
 ENDSELECT

Here "[channel]" is used to describe an arbitrary series of iForth words that should result in a channel address put on the data stack, for use by GUARD.

The word "[code]" describes any iForth code. This [code] is executed on the condition that something can be read from the channel GUARDed.

Again, the "[boolean]" can be generated by any sequence of iForth words. ?GUARD needs a boolean true and a ready channel to activate its code.

Time

Time is not as fine-grained as on the transputer, both NT and Linux have a resolution of around 10 ms. Time starts when the OS boots and rolls over after about 500 days up-time.

Time is cyclic. Whenever the clock register reaches the maximum positive number fitting in a CELL , it "increments" to MININT. Time values must be considered unsigned values and manipulated with unsigned operators like U< and U>. As Forth pays no attention to overflow, it is perfectly okay to use + to add time values together.

All time intervals in iForth are expressed in MS (the transputer needed "TICKS").

Reading the time

The present value of the relevant timer register can be read with ?MS. The time since switch-on is returned in milliseconds (iForth does not attempt to provide a real time-of-day clock.)

Waiting

It is possible to halt a process for a specified number of milliseconds. The word to use is MS . Maybe we should stress that this word only delays the process calling it, not the CPU itself; other processes and channel i/o will keep on running. The maximum delay that can be specified is around 50 days.

Language constructs using TIME

The SELECT statement in iForth allows a process to make a choice over its future behavior dependent on signals from the system timer. We will concentrate here on those components directly using the timer. An informal description follows:

 SELECT
	[timeout] TIMEOUT [code] ENDGUARD
	...
	[timeout] [boolean] ?TIMEOUT [code] ENDGUARD
 ENDSELECT

Here "[timeout]" is used to describe an arbitrary series of iForth words that should result in an unsigned number being put on the data stack, for use by TIMEOUT or ?TIMEOUT. This number describes a time interval in timer ticks.

The word "[code]" describes any iForth code. This [code] is executed on the condition that the specified time interval has elapsed.

Again, the "[boolean]" can be generated by any sequence of iForth words. ?TIMEOUT needs a boolean true and a counted down time interval to activate its code.

An example of the above follows

0 VALUE stop?
CHANNEL comm

: X6	CLEAR stop?
	CR ." Type any key to continue (you have a few seconds...)"
	PAR
	    STARTP
		BEGIN KEY? stop? OR UNTIL
		stop? 0= IF KEY comm CHANNEL-C! ENDIF
	    ENDP
	    STARTP
		SELECT
		    comm GUARD
			comm CHANNEL-C@
			CR ." You typed '" EMIT ." '"
		    ENDGUARD
		    10000 TIMEOUT
			CR ." Time out" TRUE TO stop?
		    ENDGUARD
		ENDSELECT
	    ENDP
	ENDPAR ;

Concurrent Language Features

iForth supports true and apparent concurrency. In true concurrent systems each parallel activity is executed by a different physical processor. If parallel execution is simulated by a single-processor system it is said to use apparent concurrency. In iForth the distinction between both forms of concurrency is explicitly programmed by the user. Both forms have advantages. The same language constructs can be used for both. However, important differences exist that reflect the physical implementation.

To simulate parallel tasks on a single-processor system, executing the instructions of one task is interleaved with running all other tasks. The OS handles this automatically. This "slicing" is completely transparent to the user: there are no special requirements for the use of data structures or program control statements; a task or process may be structured just like any other definition. However, special syntax exists to create, start and stop processes.

Asynchronous processes

Asynchronous processes are not very interesting in iForth because they use the old and proven co-operating processes idea (PAUSE), not threading. Note that an asynchronous process has its own USER area, so it can do more than a thread. See the iForth user manual 2nd ed.

Synchronized concurrent processing

We will now describe one of the two iForth constructs that allow synchronized concurrent processing. Occam would call this a "PAR construct". The syntax is discussed and methods are shown to pass parameters to the separate processes. Simple examples are given throughout.

In iForth's PAR construct all started concurrent processes are forced to wait until each and every process has finished and the main line of sequential processing can continue. The "waiting to be finished" bit, the synchronization, is the only feature discerning them from a set of asynchronous concurrent processes as described in the previous paragraphs. Of course there are superficial differences too, like that it is not necessary for the user to allocate named workspaces for each and every process needed in a PAR. Allocation and deallocation code is generated by the compiler automatically.

As a quick preview, inside a colon definition a standard PAR will look as follows:

 VARIABLE gorilla	  1 gorilla !
 VARIABLE bananas	111 bananas !

  : ZOO
  	PAR
  	   STARTP  -24 bananas +!  ENDP
	   STARTP    1 gorilla +!  ENDP
	ENDPAR
	CR bananas ?  gorilla ? ;

Between the PAR and ENDPAR two processes are started in parallel. Their routines are specified between STARTP and ENDP . This example carefully avoids the problems that arise when in the course of their execution the processes need to modify the same variable.

How to define a high-level PAR

iForth's high-level PAR construct is designed to be used with dynamically allocated workspaces and routines written in high-level iForth. The workspace of a sub-process contains all 5 iForth stacks but not a copy of the user area of the main sequential process.

Occam splits of n-1 tasks if asked to run n processes. The root process takes over one of the remaining parallel tasks and runs it in its own workspace. This would be wrong if one of the parallel tasks wants to start its own set of parallel jobs: then two or more processes (at different "task depths") will end up with the same workspace and overwrite each other's data.

iForth therefore starts up exactly n NEW processes for each n PAR . The root process executes endp, after it has allocated and started the n tasks.

To define a high-level parallel construct four words are available: PAR, STARTP, ENDP and ENDPAR.

PAR marks the start of a list of sub-processes. The routine definition for each sub-process is enclosed by STARTP and ENDP. These two words generate code that takes care of workspace allocation and deallocation. ENDPAR marks the end of a list of processes. It compiles code that waits until all of the processes that are in the list of processes have terminated and makes sure sequential processing proceeds with the same priority level in effect as before the PAR was started.

Example usage, a parallel multiplier:

\ First create a parameter area. Each parallel process can access this area.
\              item#: 0   1    2    3     4      5
	CREATE data   0 , 11 , 12 , 123 , 3311 , 0 ,

: TH	data []CELL ;	( index -- address )

: ZOTZ
	PAR
	   STARTP	\ multiply items 1 and 2 and store at 0
		1 TH @	\ get item 1
		2 TH @	\ get item 2
		  *	\ multiply
		0 TH !	\ store at item 0
	   ENDP
	   STARTP 	\ multiply items 3 and 4 and store at 5
	 	3 TH @	\ get item 3
		4 TH @	\ get item 4
		  * 	\ multiply
		5 TH !	\ store at item 5
	   ENDP
	ENDPAR ;

	ZOTZ  CR 0 TH ?  5 TH ?

Parameter passing

No words are available to pass parameters to the high-level processes inside a PAR ENDPAR construct. tForth had the word :I to signal the following STARTP that it should generate code to transfer n words from the data stack of the main sequential process onto the data stack of this particular process. The word :F did the same thing, but for the floating-point items. After the execution of the code generated by STARTP , the relevant stack of the root process dropped the specified number of items. It remains to be seen if these two words will be sorely missed in iForth, or if they can be emulated with extra channel I/O.

Example:

 : WOOF ( -- )
 	PAR
  	   12 21  2 :I STARTP  + {{ . SPACE }} ENDP
	   3e PI  2 :F STARTP  F+ E.           ENDP
	ENDPAR
	.S ;

\ After WOOF , tForth prints:
\ FORTH> WOOF 33  6.141593E0
\   Data: ---
\ System: ---
\  Float: --- ok
\ FORTH>

Process replication

In tForth is is possible to start concurrent processes using a DO LOOP related construction. The word used is TIMES . iForth does not have this word.

[limit] [start] TIMES makes sure the next code fragment that is inside a PAR ... ENDPAR construct is executed [limit]-[start]+1 times. TIMES has the same functionality as DO. The next STARTP command will act as the corresponding LOOP statement. The loop index is available using I as with any other DO ... LOOP command. You can pass items to the I-th sub-process using :I.

Example:

: FOO
	PAR  3 0 TIMES
		   I 2*  I 1+  I DUP *   3 :I
		STARTP
		    + +  .
		ENDP
	ENDPAR ;

\ Execution of FOO generates the following output:
\ FORTH> FOO 1  5  11   ok

Where do we go from here?

This is version 1.0 of iForth with parallel extensions. Experience with a dual-CPU board will show if this is the right path to follow. But it definitely is exciting stuff!

literature

 [1] IMS T800 Architecture; Technical Note 6. INMOS Ltd, Bristol, 1987.
 [2] Transputer Instruction Set: a compiler writer's guide.
     INMOS Limited, Prentice Hall International (UK) Ltd 1988.
 [3] Inside The Transputer.
     D.A.P. Mitchell, J.A. Thompson, G.A. Manson, G.R. Brooks,
     Blackwell Scientific Publications, 1990
 [4] User Guide Transputer Education Kit; Theory of Operation, Installation,
     Schematics. Computer System Architects.
 [5] A tutorial introduction to OCCAM programming. Dick Pountain, INMOS Ltd.
     1987
 [6] draft proposed American National Standard for Information Systems -
     Programming Languages - Forth  (X3J14 dpANS-2 - Aug 3, 1991)
     Secretariat Computer and Business Equipment Manufacturers Association,
     311 First Street, N.W., Suite 500, Washington, DC 20001-2178
 [7] Comments received during the first public review of BSR X3.215.199x,
     dpANS Forth. Secretariat CBEMA.
 [8] dpANS-3, Changed sections only, by John Rible. Secretariat CBEMA.
 [9] Forth-83 Standard. A publication of the Forth Standards Team,
     P.O. box 4545 Mountain View, CA 94040, USA, August 1983
[10] Object-Oriented Forth; Implementation of Data Structures.
     Dick Pountain, Academic Press.
[11] Faster Forth; Reducing overhead in threaded interpretive languages.
     Ronald L. Greene, BYTE, June 1984.
[12] A Fast Forth for the 68000, Lori Chavez. Dr. Dobb's Journal, October 1987.
[13] Implementing Forth on the 80386. John E. Lecky, JFAR Vol 5, No 1, 1987
[14] The T9000 Transputer; Product Overview, INMOS Ltd, Bristol, 1991

free counter