mxForth

mxForth is generated with the META program, which is a metacompiler written in Forth. The source code of mxForth is loosely based on Bill Muench's (BiMu@aol.com) bForth. mxForth comes for free with iForth.

mxForth is an extremely efficient Forth compiler (presently there is no freely available Forth compiler generating faster code). It will generate applications in which the interpreter is still active. This is an option that costs real money (licensing!) in commercial compilers.
The downside of mxForth is that it is really slow at compiling big programs, and intrinsic functions (FIND WORD CMOVE etc.) are not optimized. This can be a problem sometimes. (If it is, you should probably be using iForth.) Of course, not everybody is a speed freak; some people really care about functionality, user interface building, library support... In that case you should look for a professional Forth package (e.g. ProForth for Windows from MPE or SwiftForth from FORTH Inc.).

There are presently versions of mxForth for Linux 2.0 and for Windows NT 4.0.

Linux

The package mentioned above will get you the source code for the metacompiler. This compiler is an iForth application and will not run without modification on just any Forth.
Once meta runs you can feed it mxforth.frt, the source code for a new Forth called mxForth. mxForth is able to run under Linux 2.0 or under Windows NT 4.0, depending on a switch in meta.
All I/O in mxForth is handled by a separate server program written in C. The source code for mxforth.c is included in the package. Users of iForth will notice that it is also possible to fool their iforth server into using the mxforth.img file, simply by renaming it to iforth.img.
Linux users who want to experiment with mxForth can use the precompiled mxforth.img in the package after compiling the server (mxforth.c). Note that you cannot rebuild and modify mxforth without a working meta. A smart programmer will be able to run meta on mxForth. You can call yourself a black-belt Forth guru if you succeed :-)

Windows NT 4.0

There is not much difference here from the Linux version.
The package mentioned above will get you the sources for the metacompiler. This compiler is a platform-independent iForth application, but it will not run without modification on just any Windows NT Forth.
Once meta runs you can feed it mxforth.frt, the source code for a new Forth called mxForth. mxForth is able to run under Linux 2.0 or under Windows NT 4.0, depending on a switch in meta. By default meta generates a Linux mxForth when run under Linux and an NT mxForth when run under NT.
All I/O in mxForth is handled by a separate server program written in C. The source code for mxforth.c is NOT included in the package, so what do you do?

Users of iForth can fool their iforth servers into using the mxforth.img file, simply by renaming it to iforth.img.
NT Forth users, e.g. Win32Forth enthusiasts, can try to get meta to work for their Forth. After mxforth.img is successfully assembled they will still need the free iForth C server to execute it.
NT users without access to any Forth can use the precompiled mxforth.img from the first package together with the free iForth server for NT (the second package mentioned above). The only thing you can do then is play with mxForth a bit (download the third package for some examples).

General info

These are culled from articles posted on comp.lang.forth.

Accompanying version 2.1

Recently some people have told me that they would like to build their own Forth systems. iForth does not allow rebuilding the basic system, for various reasons. But rebuilding iForth was not what these people were after. A simple, small and efficient Forth was all they needed. No complicated tools should be necessary, the source code and metacompiler had to be simple to understand and compact, and porting to an embedded system shouldn't be too much of a pain.

In the past I have written many meta and target compilers. Most of this vast array of source code is not usable anymore, mainly because it is written for 16-bit PC-Forths with very peculiar extensions. In order to save some of this tremendous amount of work from oblivion I decided to go against my principles and write a metacompiler in iForth (which is a full ANS Forth). For this I used the best ideas from the past but, because over-generality tends to lead to obesity, I made some clear design decisions from the start. META and mxForth will become part of the iForth distribution.

The first release of META is now ready. It fits in a 24 Kbyte text file. I've written an example application called mxForth that fits in 98 K of text. META can generate any kind of Forth for any kind of machine and any kind of operating system, but mxForth (which looks like an unoptimized evil brother of iForth) is a subroutine-threaded Forth for the Pentium meant to run under Linux. The OS-part of mxForth is split off to a C-server program that performs all I/O duties and loads mxForth into memory. This C-server program could just as well be written in any other language (assembly language for instance). mxForth has most ANS wordsets. Only the floating-point wordset is really missing. All the file words are there. Extras are some Linux-specific words to do timing, shell to the OS, change the working directory, etc. As a bonus one can write C-subroutines, add them to the server code and call them from Forth. It is not yet possible to let C call Forth. mxForth can be reduced to about 40 kB (it is presently 700 K); the C-server is 18 kB.

META itself is very nice, I think. Most standard Forth operators like @ ! MOVE DUMP SEE , DIS are available in a META wordlist where they operate transparently on the target memory space. To prevent me going crazy, the new words are prefixed by "T", so we have T@ T! TMOVE TDUMP TSEE TDIS TWORDS etc. The only thing you can't do is execute the new code. The metacompiler is multi-pass, so you can create very complicated forward references if need be (mxForth currently needs 4 passes to build). A symbol table is generated at the end of the compile to aid in debugging (I had to learn gdb to debug mxForth, which was a terrible experience).
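To make the idea of "T" words operating on a separate target memory space concrete, here is a minimal toy model in Python. It is not META's code: the class name, the method names and the flat byte-array image are assumptions made for illustration, and the 32-bit little-endian cell is assumed because the target is a Pentium.

```python
import struct

CELL = 4  # bytes per cell; assumes a 32-bit little-endian target (Pentium)


class TargetImage:
    """Hypothetical stand-in for the target memory that META builds."""

    def __init__(self, size):
        self.mem = bytearray(size)   # the image under construction

    def t_store(self, value, addr):
        """Model of T! : store one cell into target memory."""
        struct.pack_into("<I", self.mem, addr, value & 0xFFFFFFFF)

    def t_fetch(self, addr):
        """Model of T@ : fetch one cell from target memory."""
        return struct.unpack_from("<I", self.mem, addr)[0]


if __name__ == "__main__":
    image = TargetImage(16)
    image.t_store(0x12345678, CELL)      # host memory is never touched
    print(hex(image.t_fetch(CELL)))
```

The point of the separation is that the host Forth's own @ and ! keep working on host memory, while the prefixed versions only ever see the image being generated.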

As Anton Ertl has shown many times in the past, a subroutine-threaded Intel Forth is very inefficient, and direct-threaded seems to be best. Now, a direct-threaded Forth called eForth is already available. It must be assembled using MASM (which is enough to drive red-blooded Linuxers up the wall). So I decided to do it differently this time. As efficiency was no primary aim, I valued the fact that a sr-Forth is about as simple as one can imagine. But of course, I was curious about the performance anyway. To test that, I ran the four benchmarks in the [xxx] distribution. The results are shown below. You will notice that mxForth is about twice as fast as [xxx] 0.3.0 straight-out-of-the-box, on an Intel P5-166. OPTIMIZE is set to 4 (the default), which means that CONSTANT etc. are immediate, and that LITERAL and BRANCH etc. generate inline machine code (which is the natural thing to do for a sr-Forth). Note that OPTIMIZE is an mxForth word, not a metacompile option. Maybe I will add some MACRO words to mxForth to see how high one can tune the performance. Wil Baden's Pinhole optimizer cannot easily be added to a subroutine-threaded Forth. This is something I found out too late.

On getting meta to work

[..] A short hint: on a Linux system you may need to recompile the C-server with

   gcc -O4 mxforth.c -omxforth

Do not touch the mxforth.img file in any way. The META compiler will generate a new mxForth when iForth is invoked as follows:

   i4 in mxf.cmd

(Here I assume you have aliased the commands to start iForth with i4).

[LF is an editor written in ANS Forth by Leo Wong -mhx]

LF.FRT runs (using function keys, reverse video, bold, and dynamic memory). I also added the Laxen & Perry F83 block editor and the "report" writer from Starting Forth. mxForth does not have floating-point or locals. Dictionary manipulations are possible, but some words are missing [this has been corrected -mhx]. It has FORGET, but no MARKER (FORGET crashes if you nest VOCABULARY's) [MARKER is now in, but the vocabulary problem remains -mhx]. mxForth is case-sensitive [not anymore, it is now an option -mhx].

Performance revisited

On checking the benchmark files I used to compare mxForth to [xxx] (currently the fastest Linux Forth no money can buy) I found I had accidentally left in "speculative" optimizer code. Sequences like I C@ and I 2! were merged to IC@ and I2!. This made the code about 10 to 20% faster, but it is of course not fair to [xxx].
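The kind of merging described above can be pictured with a toy model in Python. Only the two pairs named in the text (I C@ and I 2!) come from the source; representing the program as a list of word names and the function name `fuse_pairs` are assumptions for illustration, and this says nothing about how the real optimizer was implemented.

```python
# The two pairs mentioned in the text; the lookup table itself is hypothetical.
FUSED = {("I", "C@"): "IC@", ("I", "2!"): "I2!"}


def fuse_pairs(tokens):
    """Replace each adjacent pair found in FUSED by its single merged word."""
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in FUSED:
            out.append(FUSED[pair])   # two words become one
            i += 2
        else:
            out.append(tokens[i])     # anything else is copied unchanged
            i += 1
    return out


if __name__ == "__main__":
    print(fuse_pairs("DO I C@ EMIT LOOP".split()))
```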

I had to put in a little extra work to ensure that the figures I gave stay true, even with the original benchmarks. The new meta.frt and mxforth.zip files are now available. mxForth has become about 30% faster than my latest figures; I seem to have overdone it. My apologies to those who have already downloaded the package.

The updated results are as follows:

 siev   bubble  matrix  fib     machine and configuration
--------------------------------------------------------------------------
 8.72    9.17    8.79   10.34   Intel Pentium 166MHz; 256K cache;
                                gcc-2.7.2 --enable-force-reg; [xxx]-0.3.0;
 4.03    5.12    3.71    4.39   Intel Pentium 166MHz; 256K cache;
                                mxForth 2.1 with OPTIMIZE=4

mxForth speedup relative to [xxx]:

 siev   bubble  matrix   fib
--------------------------------------------
 2.16    1.8     2.37    2.36   times faster
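The speedup row is simply the [xxx] time divided by the mxForth time for each benchmark. A short Python check, using only the timings from the table above (the variable and function names are mine):

```python
# Timings in seconds, copied from the table above (Pentium 166 MHz).
BENCHMARKS = ("siev", "bubble", "matrix", "fib")
baseline = (8.72, 9.17, 8.79, 10.34)   # [xxx] 0.3.0
mxforth = (4.03, 5.12, 3.71, 4.39)     # mxForth 2.1 with OPTIMIZE=4


def speedups(base, new):
    """Return base/new per benchmark: how many times faster `new` is."""
    return [b / n for b, n in zip(base, new)]


if __name__ == "__main__":
    for name, ratio in zip(BENCHMARKS, speedups(baseline, mxforth)):
        print(f"{name:8s}{ratio:5.2f} times faster")
```

To two decimals the bubble ratio comes out as 1.79, which the table rounds to 1.8.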

Yet another release...(2.2)

I did some more work on mxForth and bumped the version number to 2.2.

The main change is that I altered the model to have the top of stack in a register. mxForth is still subroutine-threaded. Furthermore I removed the numerous no-ops that 2.1 inserted to ensure alignment. The main drawback of this is that mxForth 2.2 has become very difficult to SEE.

To satisfy my curiosity I also added a switch to remove all optimizations from mxForth: OPTIMIZE OFF. In this case mxForth runs at about 0.75 times [xxx]'s speed (confirming bare subroutine-threading is very slow on Intel).

Below are the results from the standard benchmarks in the [xxx] distribution as run on mxForth 2.2 for Linux. mxForth is subroutine-threaded with TOS in a register. The results for [xxx] are shown as a base-line figure; [xxx] is currently the fastest freely available Linux Forth (it is also quite a lot faster than Win32Forth for NT on the same machine). For all the following timings I used an Intel Pentium 166MHz with 256K cache.

 siev   bubble  matrix  fib     configuration (times in seconds)
--------------------------------------------------------------------------
 8.72    9.17    8.79   10.34   gcc-2.7.2 --enable-force-reg; [xxx]-0.3.0;
15.19   17.05   11.98   16.79   mxForth 2.2 with OPTIMIZE=0
12.15   12.47   10.60   13.60   mxForth 2.2 with OPTIMIZE=4
 5.86    7.66    8.12    5.97   mxForth 2.2 with OPTIMIZE=7
10.30   14.10   16.25   18.04   mxForth 2.2 with OPTIMIZE=8
10.30   12.60    6.37    7.84   mxForth 2.2 with OPTIMIZE=10
 9.34    9.85    4.36    7.84   mxForth 2.2 with OPTIMIZE=11
 7.18    8.69    4.89   10.67   mxForth 2.2 with OPTIMIZE=12
 3.86    8.68    4.91    2.99   mxForth 2.2 with OPTIMIZE=14
 3.26    4.24    2.72    2.99   mxForth 2.2 with OPTIMIZE=15

What are the optimizations?

OPTIMIZE (bit# set)     optimization performed
----------------------------------------------------------
0                       smart VARIABLE and CONSTANT
1                       LITERAL compiles inline instead of subroutine
2                       LOOP, branch and ?branch compile jumps
3                       MACRO's are expanded inline

Individual bits in OPTIMIZE can be set to enable the different optimizers. The results show that simply inlining code (8) is not very effective as a first step; it may even make some algorithms slower. The best approach seems to be a smart CONSTANT, VARIABLE and LITERAL plus inlined jumps. Non-smart CONSTANT and VARIABLE are possible (14), but the code then runs about a factor 1.5 slower when a lot of memory referencing is going on.
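Since OPTIMIZE is a bit mask, each value in the 2.2 timing table can be decoded into the optimizations it enables. The following Python sketch does that; the bit numbers and descriptions are copied from the table above, while the function name is mine and the mapping is only claimed for the 2.2 table.

```python
# Bit numbers and descriptions copied from the optimization table above.
OPTIMIZATIONS = {
    0: "smart VARIABLE and CONSTANT",
    1: "LITERAL compiles inline instead of subroutine",
    2: "LOOP, branch and ?branch compile jumps",
    3: "MACRO's are expanded inline",
}


def decode_optimize(value):
    """List the optimizations enabled by an OPTIMIZE value (0..15)."""
    return [text for bit, text in OPTIMIZATIONS.items() if value & (1 << bit)]


if __name__ == "__main__":
    for value in (0, 4, 7, 8, 14, 15):
        enabled = decode_optimize(value) or ["(none)"]
        print(f"OPTIMIZE={value:2d}: " + "; ".join(enabled))
```

For example, 14 is binary 1110, so it enables everything except the smart VARIABLE and CONSTANT, and 8 enables macro inlining only.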

What are the limits? iForth is still about 2 times faster than mxForth. The RAFTS project promises Forth code that will be at least 1.3 times faster than iForth/bigFORTH (approaching C).

Accompanying version 2.4

Although most people are by now probably getting sick of it, I want to give a (hopefully final) status update on the latest mxForth developments.

Like Leo Wong with his fascination for LF/MF/HF, I just can't let go of mxForth.

We're skipping release 2.3 and going straight to 2.4. In 2.3 I added more macro words and made sure John Hayes's TESTER.FRT ran without errors. In 2.4 I succeeded in having mxForth run Anton Ertl's POSTPONE.FS. This is quite an achievement for an optimizing Forth (I think). I now have a somewhat generalized approach to defining optimizing macro words that doesn't need state smartness. The macros even succeed in generating optimized code when POSTPONEd or [COMPILE]d. There are *no* state-smart words in mxForth anymore, none. I will let this technique ripen some more before I backmerge it into iForth. I don't see yet how I can re-use the macros in the metacompiler to make sure the code for mxForth itself is optimized, but it must be in there somewhere.

Maybe interesting for some people: I've ported mxForth to Windows NT 4.0. It might even work for Windows '95 as long as SYSTEM is not used (I did not test this!). To be able to test-drive mxForth under Windows I have made a simple C server available (mxserver.exe) for download at http://www.IAEhv.nl/users/mhx. iForth users can use their iforth.exe C server by simply renaming the image file from mxforth.img to iforth.img. The documentation on my homepage has been expanded to reflect the latest changes.

Of course mxForth still has the facility to extend it through the C server. Just write a C function, put its address in the jump table, and mxForth can call it with SYSCALL. (This might be fun for fooling around with Windows DLL's interactively).
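The jump-table idea can be modelled in a few lines of Python. This is a toy model only: the real server functions are written in C, and everything here (the two example functions, storing an argument count next to each entry, and passing arguments on a list used as the data stack) is an assumption for illustration, not the actual SYSCALL calling convention.

```python
# Hypothetical server-side functions; in the real server these would be C.
def sys_add(a, b):
    return a + b


def sys_negate(a):
    return -a


# The "jump table": the index of an entry is its call number.
# The argument count stored with each entry is an assumption of this model.
JUMP_TABLE = [(sys_add, 2), (sys_negate, 1)]


def syscall(stack, number):
    """Pop the arguments, call table entry `number`, push the result."""
    func, nargs = JUMP_TABLE[number]
    args = [stack.pop() for _ in range(nargs)][::-1]  # restore push order
    stack.append(func(*args))


if __name__ == "__main__":
    stack = [3, 4]
    syscall(stack, 0)    # calls sys_add(3, 4)
    syscall(stack, 1)    # calls sys_negate(7)
    print(stack)
```

Adding a new function in this model means appending one entry to the table, which is the property the text describes for the C server.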

Although I did my best, I seem to have reached the limit of what can be done with the mxForth model (subroutine-threaded, TOS in register, data stack indexed by ebp, Forth return stack == hardware machine stack). In the end iForth seems to be at least 1.25 times faster than mxForth (iForth uses the hardware stack for the Forth data stack). I looked into CF32 by Tom Zimmer - Thom Almy. The programs I could get to run reliably on CF32 are 1.25 to 2 times faster than mxForth/iForth. The extra speed comes from the IN/OUT compiler hint CF32 allows the programmer to give to the compiler.

And now, the results for the standard benchmarks in the [xxx] distribution as run on mxForth 2.4 for NT (The Linux timings are exactly the same).

Note: The results for [xxx] are shown as a base-line figure. This is *not* to imply [xxx] or mxForth are the Alpha and Omega of Forthing and all other Forths are not worthy of your attention. [xxx]'s authors have expressed an interest in building an efficient Forth kernel and that's about all mxForth is good for at the moment. For building eye-popping, jaw-flapping applications under Win32s/Win95/WinNT use Win32Forth, for Linux use ... weell, who wants that stuff anyway :-)

For all the following timings I used an Intel Pentium 166MHz with 256K cache.

 siev   bubble  matrix  fib     configuration (times in seconds)
-------------------------------------------------------------------
 8.72    9.17    8.79   10.34   gcc-2.7.2 +force-reg; [xxx]-0.3.0;
 3.26    4.24    2.72    2.99   mxForth 2.2 with OPTIMIZE=15
 2.61    2.13    2.29    1.75   mxForth 2.4 with OPTIMIZE=6 (max)

Version 2.4 improves quite a lot on 2.2; OTOH, 2.4 was very difficult to get right.

-- Program output (NT 4.0) ----------------------------------------
[1] mxForth server 0.69 (console), Jul 30 1997, 16:49:11.
[2] Stuffed mxForth at 0041618A [entry: 0x420000]
[3] Current process priority is 32.
mxForth vsn 2.4

FORTH> cd mxf2/work c:\dfwforth\examples\mxf2\work ok
FORTH> include benches.frt
Running the Ertl Suite...
Sieving...2607 ms elapsed
 redefining list
Bubbling...2129 ms elapsed
Bubbling with flag...2454 ms elapsed
Matrix multiply in progress..2286 ms elapsed
Fibonacci (optimized)...9227465  1690 ms elapsed
Fibonacci (original)...9227465  1746 ms elapsed ok
