Win32Forth: rename timemod.f32 and fsl_util.f32 to timemod.f and fsl_util.f before use. Do not confuse with the two files for SwiftForth. Likewise: rename timemod.fs_4 to timemod.fs etc.
Sometimes the code is written in a way to expose gotchas of the Intel x86 line of CPUs. Also, the code might favor the AMD Athlon in some places (i.e. run slower than necessary on lesser CPUs). However, I feel the benchmarks are still interesting enough across the board.
I do not excuse myself for not writing portable code -- I do what is needed to get the job done. Hopefully this vast amount of sometimes tricky Forth code is just what the advanced student needs.
Of course I didn't write all these benchmarks myself. Many programs are taken from comp.lang.forth or the Forth archives at taygeta. Many come from magazine articles (Like Forth Dimensions). Many are ported from well-known 'C' or FORTRAN benchmarks. Some of the benchmarks are interesting by themselves (e.g. there is a program to compute all 64-bit Persistent Numbers).
These Forth benchmarks shows the quality of your compiler's integer, floating-point and I/O code as compared to what iForth would give you. Results are available in the monster.tbl file.
BENCHMARK t2 "tak.frt" The famous three-way recurse useless code.
********** BENCHMARK: "tak.frt" run at 00:14:41, April 16, 2001 1000 times TAK, 2.471 seconds elapsed. Result = 7 (7) 1000 times TAKK, 2.122 seconds elapsed. Result = 7 (7) 1000 times TACK, 1.716 seconds elapsed. Result = 7 (7) 1000 times TAKKK, 0.000 seconds elapsed. Result = 7 (7)
BENCHMARK t38 "sievef.frt" You know the Sieve of Eratosthenes? But did you ever try to run it on the FPU?
********** BENCHMARK: "sievef.frt" run at 00:16:35, April 16, 2001 1000 iterations. 1899.000000 primes found. 2.158 seconds elapsed.
BENCHMARK t13 "dsieve.frt" This is a parallel implementation of the Sieve of Eratosthenes. It runs faster even on a single CPU (portable only to Forths that can access OS threads).
********** BENCHMARK: "dsieve.frt" run at 00:14:56, April 16, 2001 1 iteration, non-threaded. 539776 primes found. 2.423 seconds elapsed. 1 iteration, threaded. 539776 primes found. 1.765 seconds elapsed. 1 iteration, threaded. 539776 primes found. 1.768 seconds elapsed.
BENCHMARK t12 "dhryston.frt" The Dhrystone benchmark.
********** BENCHMARK: "dhryston.frt" run at 00:14:55, April 16, 2001 Starting Dhrystone, 100000 loops ... 0.083 seconds elapsed. That is 1204819 Dhrystones/sec.
BENCHMARK t46 "whetvsn2.frt" The Whetstone (FP) benchmark.
********** BENCHMARK: "whetvsn2.frt" run at 00:16:47, April 16, 2001 Start .. 0.347 seconds elapsed... that makes 576,368 KWhet/sec.
BENCHMARK t18 "fastread.frt" Tests the speed of your Forth's file I/O.
********** BENCHMARK: "fastread.frt" run at 00:15:24, April 16, 2001 zygote 0.126 seconds elapsed. 128 (maxtest) zygote 0.022 seconds elapsed. 128 (test-mapversion) zygote 0.097 seconds elapsed. 128 (test-boyer-moore-version) 2.07 milliseconds, 7 (test-TAK) 260001 lines in 0.087 seconds elapsed. (lcount) 1.122 seconds elapsed. (test-invert) 0.090 seconds elapsed. (1024 read-files) 0.238 seconds elapsed. ( 10 read-files) 0.374 seconds elapsed. (1024 read-files) 0.375 seconds elapsed. ( 10 read-files) 0.479 seconds elapsed. ( 9 read-files) 0.094 seconds elapsed. (1100 read-files) 0.231 seconds elapsed. ( 11 read-files) 0.377 seconds elapsed. (1100 read-files) 0.375 seconds elapsed. ( 11 read-files) 0.375 seconds elapsed. ( 10 read-files)
BENCHMARK t19 "fd92.frt" This benchmark comes from Forth Dimensions 1992.
********** BENCHMARK: "fd92.frt" run at 00:15:29, April 16, 2001 *** Benchmark from FD March/April 1992, Guy Kelly *** Empty : 0.001 seconds elapsed. Thread : 0.025 seconds elapsed. Nest1 : 0.030 seconds elapsed. Nest2 : 0.004 seconds elapsed. Prims : 0.002 seconds elapsed. Sieve : 0.004 seconds elapsed. Loads : 0.005 seconds elapsed. Comp : 0.001 seconds elapsed. C prim : 0.062 seconds elapsed. C sec : 0.070 seconds elapsed. rd+wrd+fnd : 0.064 seconds elapsed. read + <word> : 0.052 seconds elapsed. <word> : 0.048 seconds elapsed. word : 0.030 seconds elapsed. refill : 0.192 seconds elapsed.
BENCHMARK t30 "many_sorts.frt" Tests different sorting algorithms. 'New' Forth.
********** BENCHMARK: "many_sorts.frt" run at 00:16:20, April 16, 2001 Sedge sort - 100000 elements, 68 ms (needs 14874 passes) Quicker sort - 100000 elements, 70 ms (needs 60746 passes) Quick sort - 100000 elements, 65 ms (needs 86326 passes) Comb sort - 100000 elements, 219 ms (needs 43 passes) Heap sort - 100000 elements, 191 ms (needs 1624893 passes) Shell sort - 100000 elements, 299 ms (needs 2793058 passes) SIS sort - 10000 elements, 787 ms (needs 25161320 passes) Bubble sort - 10000 elements, 3063 ms (needs 9888 passes)
BENCHMARK t41 "sortsc.frt" Tests different sorting algorithms. 'Old' Forth.
********** BENCHMARK: "sortsc.frt" run at 00:16:38, April 16, 2001 Evaluating NR-QUICK Test Dict RAM Fetches Stores Compares Time Score Maximum Average ----------------------------------------------------------------------------------- RAMP 498000 0 622602 40959 557070 0.05 3.66 3.66 3.66 SLOPE 498000 0 663579 81922 557083 0.05 3.89 3.89 3.77 WILD 498000 0 1047912 341600 678763 0.09 6.24 6.24 4.58 SHUFFLE 498000 0 1056320 343432 685038 0.09 6.28 6.28 5.01 BYTE 498000 0 994683 370544 603589 0.09 5.92 6.34 5.19 FLAT 498000 0 1138687 548866 565248 0.10 6.74 7.03 5.44 CHECKER 498000 0 1101497 518212 561480 0.09 6.52 7.03 5.59 HUMP 498000 0 1007353 391671 595322 0.09 5.99 7.03 5.64 Evaluating R-QUICK Test Dict RAM Fetches Stores Compares Time Score Maximum Average ----------------------------------------------------------------------------------- RAMP 498000 0 671761 49152 598033 0.05 3.93 3.93 3.93 SLOPE 498000 0 712737 90112 598048 0.06 4.17 4.17 4.04 WILD 498000 0 1183112 317790 829149 0.10 6.98 6.98 5.02 SHUFFLE 498000 0 1187685 312742 838541 0.10 7.02 7.03 5.51 BYTE 498000 0 1170198 430714 709739 0.10 6.90 7.31 5.79 FLAT 498000 0 1261567 614400 614400 0.10 7.39 7.94 6.05 CHECKER 498000 0 1233009 581124 622058 0.10 7.23 7.94 6.21 HUMP 498000 0 1156347 455738 670708 0.10 6.80 7.94 6.29 Evaluating HIBBARD Test Dict RAM Fetches Stores Compares Time Score Maximum Average ----------------------------------------------------------------------------------- RAMP 498000 0 1064996 0 532498 0.06 4.72 4.72 4.72 SLOPE 498000 0 2135532 576172 779680 0.15 10.34 10.34 7.53 WILD 498000 0 4394402 1688406 1352998 0.32 22.06 22.06 12.38 SHUFFLE 498000 0 4465416 1722632 1371392 0.32 22.41 22.41 14.88 BYTE 498000 0 2743320 862002 940659 0.19 13.49 22.41 14.60 FLAT 498000 0 1064996 0 532498 0.06 4.71 22.41 12.94 CHECKER 498000 0 1085478 20480 532499 0.06 4.83 22.41 11.78 HUMP 498000 0 2412754 696712 858021 0.17 11.77 22.41 11.78
BENCHMARK t37 "savage.frt" Bill Savage wrote this FPU test ages ago.
********** BENCHMARK: "savage.frt" run at 00:16:33, April 16, 2001 Test with MaxLoop = 2500 ... 0.002 seconds elapsed. Error = -8.69931682245806E-010 Test with MaxLoop = 25000 ... 0.017 seconds elapsed. Error = -1.04020728031173E-007 Test with MaxLoop = 250000 ... 0.162 seconds elapsed. Error = -1.24108046293259E-005 Test with MaxLoop = 2500000 ... 1.624 seconds elapsed. Error = -1.42431538552046E-003
BENCHMARK t47 "clinpack/4linpack.frt" The LINPACK suite in Forth.
********** BENCHMARK: "clinpack/4linpack.frt" run at 00:16:47, April 16, 2001 Unrolled DP precision LINPACK norm. resid resid machep x[0]-1 x[n-1]-1 1 8.21565038E-14 2.22044605E-16 -1.49880108E-14 -1.89848137E-14 times are reported for matrices of order 100 dgefa dgesl total kflops unit ratio times for array with leading dimension of 201 0.002 0.000 0.002 343333 0.006 0.036 0.002 0.000 0.002 343333 0.006 0.036 0.003 0.000 0.003 228888 0.009 0.054 0.002 8.000E-5 0.002 305184 0.007 0.040 times for array with leading dimension of 200 0.002 0.000 0.002 343333 0.006 0.036 0.002 0.000 0.002 343333 0.006 0.036 0.002 0.000 0.002 343333 0.006 0.036 0.002 8.000E-5 0.002 305184 0.006 0.038 Unrolled, DP Precision, 305185 Kflops; 100 Reps
BENCHMARK t53 "tfftdp/tfftdp.frt" A special FFT implementation. Very ugly Forth.
********** BENCHMARK: "fft/fft_c.frt" run at 00:16:48, April 16, 2001 FFT-Mayer: 04 Oct 1994 FFT size: 131072, run-time = 411 ms [1.000], sum of squared errors = 1.219198E-9 FFT size: 65536, run-time = 182 ms [0.940], sum of squared errors = 1.554338E-10 FFT size: 32768, run-time = 47 ms [0.518], sum of squared errors = 6.390517E-13 FFT size: 16384, run-time = 13 ms [0.307], sum of squared errors = 1.955693E-14 FFT size: 8192, run-time = 5 ms [0.254], sum of squared errors = 7.584755E-16 FFT size: 4096, run-time = 2 ms [0.220], sum of squared errors = 3.386117E-17 FFT size: 2048, run-time = 1 ms [0.240], sum of squared errors = 1.507175E-17 FFT size: 1024, run-time = 0 ms [0.000], sum of squared errors = 1.101705E-19 FFT size: 512, run-time = 0 ms [0.000], sum of squared errors = 9.511414E-22 FFT size: 256, run-time = 0 ms [0.000], sum of squared errors = 1.933452E-22 FFT size: 128, run-time = 0 ms [0.000], sum of squared errors = 3.258967E-23 FFT size: 64, run-time = 0 ms [0.000], sum of squared errors = 3.255156E-26 FFT size: 32, run-time = 0 ms [0.000], sum of squared errors = 9.150787E-29 FFT size: 16, run-time = 0 ms [0.000], sum of squared errors = 1.459393E-28 FFT size: 8, run-time = 0 ms [0.000], sum of squared errors = 0.000000 FFT size: 4, run-time = 0 ms [0.000], sum of squared errors = 0.000000
BENCHMARK t55 "MPEBench/benchm.frt" Not Really The MPE Benchmarks. Slight modifications to frustrate compilers that inline everything in sight. (Not so successful, VFX is still some 20% faster than iForth sometimes).
********** BENCHMARK: "MPEBench/benchm.frt" run at 00:18:02, April 16, 2001 This system's primitives using extensions Test time including overhead ms times ns (each) DO LOOP 7 1000000 7 + 8 1000000 8 M+ 15 1000000 15 * 8 1000000 8 / 55 1000000 55 M* 10 1000000 10 M/ 55 1000000 55 /MOD 57 1000000 57 */ 65 1000000 65 ARRAY fill 10 1000000 10 Total: 295 1 This system's application performance using extensions Test time including overhead ms times ns (each) Eratosthenes sieve 1899 Primes 451 8190000 55 Fibonacci recursion ( 34 -> 5702887 ) 471 5702853 82 Hoare's quick sort (reverse order) 267 1000000 267 Generate random numbers (1024 kb array) 274 262144 1045 LZ77 Comp. (100 kb Random Data Mem>Mem) 99 1 Dhrystone (integer) 855 1000000 855 1169590 Dhrystones/sec Total: 2422 1
BENCHMARK t8 "compile-rtime1.frt" Here the interpreter runs while a compiled definition is executing. This can be very efficient if the compiler is either very dumb or every smart.
********** BENCHMARK: "compile-rtime1.frt" run at 00:14:50, April 16, 2001 0.562 seconds elapsed.
BENCHMARK t56 "mmul/mm.frt" Matrix-multiplication benchmark. Your compiler will be 10 to 100 times slower. If not, we have to talk :-)
********** BENCHMARK: "mmul/mm.frt" run at 00:18:05, April 16, 2001 CLK 902 MHz 500x500 mm - normal algorithm 31.67 MFlops, 28.48 ticks/flop, 7.893 s 500x500 mm - blocking, factor of 20 191.43 MFlops, 4.71 ticks/flop, 1.305 s 500x500 mm - transposed B matrix 121.36 MFlops, 7.43 ticks/flop, 2.059 s 500x500 mm - transposed B matrix #2 123.49 MFlops, 7.30 ticks/flop, 2.024 s 500x500 mm - Robert's algorithm 121.93 MFlops, 7.39 ticks/flop, 2.050 s 500x500 mm - T. Maeno's algorithm, subarray 20x20 189.16 MFlops, 4.76 ticks/flop, 1.321 s 500x500 mm - D. Warner's algorithm, subarray 20x20 134.11 MFlops, 6.72 ticks/flop, 1.864 s 500x500 mm - generic mat* 456.26 MFlops, 1.97 ticks/flop, 0.547 s 500x500 mm - iForth DGEMM1 456.50 MFlops, 1.97 ticks/flop, 0.547 s CLK 897 MHz 120x120 mm - normal algorithm 271.84 MFlops, 3.29 ticks/flop, 12.713 ms 120x120 mm - blocking, factor of 20 215.91 MFlops, 4.15 ticks/flop, 16.006 ms 120x120 mm - transposed B matrix 336.24 MFlops, 2.66 ticks/flop, 10.278 ms 120x120 mm - transposed B matrix #2 388.64 MFlops, 2.30 ticks/flop, 8.892 ms 120x120 mm - Robert's algorithm 366.07 MFlops, 2.45 ticks/flop, 9.440 ms 120x120 mm - T. Maeno's algorithm, subarray 20x20 226.45 MFlops, 3.96 ticks/flop, 15.261 ms 120x120 mm - D. Warner's algorithm, subarray 20x20 158.53 MFlops, 5.65 ticks/flop, 21.799 ms 120x120 mm - generic mat* 685.61 MFlops, 1.30 ticks/flop, 5.040 ms 120x120 mm - iForth DGEMM1 685.52 MFlops, 1.30 ticks/flop, 5.041 ms CLK 895 MHz 60x60 mm - normal algorithm 345.33 MFlops, 2.59 ticks/flop, 1.250 ms 60x60 mm - blocking, factor of 20 220.67 MFlops, 4.05 ticks/flop, 1.957 ms 60x60 mm - transposed B matrix 321.31 MFlops, 2.78 ticks/flop, 1.344 ms 60x60 mm - transposed B matrix #2 385.03 MFlops, 2.32 ticks/flop, 1.121 ms 60x60 mm - Robert's algorithm 373.78 MFlops, 2.39 ticks/flop, 1.155 ms 60x60 mm - T. Maeno's algorithm, subarray 20x20 229.38 MFlops, 3.90 ticks/flop, 1.883 ms 60x60 mm - D. Warner's algorithm, subarray 20x20 158.54 MFlops, 5.64 ticks/flop, 2.724 ms 60x60 mm - generic mat* 691.80 MFlops, 1.29 ticks/flop, 0.624 ms 60x60 mm - iForth DGEMM1 691.30 MFlops, 1.29 ticks/flop, 0.624 ms 4x4 mm - generic mat* 216.64 MFlops, 4.13 ticks/flop, 0.000 us 5x5 mm - generic mat* 346.84 MFlops, 2.58 ticks/flop, 0.000 us 6x6 mm - generic mat* 484.69 MFlops, 1.84 ticks/flop, 0.000 us 7x7 mm - generic mat* 521.59 MFlops, 1.71 ticks/flop, 1.000 us 8x8 mm - generic mat* 628.28 MFlops, 1.42 ticks/flop, 1.000 us 9x9 mm - generic mat* 625.26 MFlops, 1.43 ticks/flop, 2.000 us 10x10 mm - generic mat* 694.48 MFlops, 1.28 ticks/flop, 2.000 us 11x11 mm - generic mat* 748.74 MFlops, 1.19 ticks/flop, 3.000 us 12x12 mm - generic mat* 782.20 MFlops, 1.14 ticks/flop, 4.000 us 13x13 mm - generic mat* 790.87 MFlops, 1.13 ticks/flop, 5.000 us 14x14 mm - generic mat* 831.73 MFlops, 1.07 ticks/flop, 6.000 us 15x15 mm - generic mat* 836.65 MFlops, 1.06 ticks/flop, 8.000 us 16x16 mm - generic mat* 867.92 MFlops, 1.03 ticks/flop, 9.000 us 17x17 mm - generic mat* 886.63 MFlops, 1.00 ticks/flop, 11.000 us 18x18 mm - generic mat* 886.42 MFlops, 1.00 ticks/flop, 13.000 us 19x19 mm - generic mat* 912.80 MFlops, 0.98 ticks/flop, 15.000 us 20x20 mm - generic mat* 924.73 MFlops, 0.96 ticks/flop, 17.000 us 21x21 mm - generic mat* 915.25 MFlops, 0.97 ticks/flop, 20.000 us 22x22 mm - generic mat* 914.60 MFlops, 0.97 ticks/flop, 23.000 us 23x23 mm - generic mat* 910.64 MFlops, 0.98 ticks/flop, 26.000 us 24x24 mm - generic mat* 919.00 MFlops, 0.97 ticks/flop, 30.000 us 25x25 mm - generic mat* 921.05 MFlops, 0.97 ticks/flop, 33.000 us 26x26 mm - generic mat* 866.73 MFlops, 1.03 ticks/flop, 40.000 us 27x27 mm - generic mat* 922.13 MFlops, 0.97 ticks/flop, 42.000 us 28x28 mm - generic mat* 925.65 MFlops, 0.96 ticks/flop, 47.000 us 29x29 mm - generic mat* 920.12 MFlops, 0.97 ticks/flop, 53.000 us 30x30 mm - generic mat* 898.29 MFlops, 0.99 ticks/flop, 60.000 us 31x31 mm - generic mat* 888.08 MFlops, 1.00 ticks/flop, 67.000 us 32x32 mm - generic mat* 915.42 MFlops, 0.97 ticks/flop, 71.000 us
BENCHMARK t57 "../misc/floyd.frt" This is Floyd's algorithm (combinatorial logic).
********** BENCHMARK: "../misc/floyd.frt" run at 00:18:51, April 16, 2001 20 microseconds per shuffle 23 microseconds per shuffle 4 microseconds per shuffle3 14 microseconds per shuffle4
BENCHMARK t60 "../misc/golygonc.frt" A geometric problem.
********** BENCHMARK: "../misc/golygonc.frt" run at 00:19:02, April 16, 2001 48-sided Golygon. 246803720 golygons 4.180 seconds.
BENCHMARK t62 "../misc/md5-nwr.frt" The MD5 protocol.
********** BENCHMARK: "../misc/md5-nwr.frt" run at 00:19:07, April 16, 2001 1000 times 10,000 spaces ... 77 us/iteration, result: f38898bb69bb02bccb9594dfe471c5c0
BENCHMARK t65 "../misc/my-sha.frt" The SHA protocol.
********** BENCHMARK: "../misc/my-sha.frt" run at 00:19:12, April 16, 2001 SHA-1 test for EX1 for 100000 loops, 1 microseconds/iteration SHA-1 test for EX2 for 100000 loops, 2 microseconds/iteration SHA-1 test for EX3 for 10 loops, 14100 microseconds/iteration, 70.921 MBytes/sec.
BENCHMARK t66 "../misc/pento2.frt" Another geometric problem, or better, puzzle.
********** BENCHMARK: "../misc/pento2.frt" run at 00:19:14, April 16, 2001 Solutions to the Pentomino Puzzle by Exhaustive Search Total solutions = 2339 Total pieces tried = 2455939, 0.964 seconds elapsed.
BENCHMARK t67 "mullen3.frt" This searches for 64-bit Persistent Numbers.
********** BENCHMARK: "mullen3.frt" run at 00:19:15, April 16, 2001 N = 2, MAX_L = 4, DISTINCT = 55 (0.000 seconds elapsed) N = 3, MAX_L = 5, DISTINCT = 220 (0.000 seconds elapsed) N = 4, MAX_L = 6, DISTINCT = 715 (0.001 seconds elapsed) N = 5, MAX_L = 7, DISTINCT = 2,002 (0.004 seconds elapsed) N = 6, MAX_L = 7, DISTINCT = 5,005 (0.012 seconds elapsed) N = 7, MAX_L = 8, DISTINCT = 11,440 (0.029 seconds elapsed) N = 8, MAX_L = 9, DISTINCT = 24,310 (0.069 seconds elapsed) N = 9, MAX_L = 9, DISTINCT = 48,620 (0.153 seconds elapsed) N = 10, MAX_L = 10, DISTINCT = 92,378 (0.313 seconds elapsed) N = 11, MAX_L = 10, DISTINCT = 167,960 (0.619 seconds elapsed) N = 12, MAX_L = 10, DISTINCT = 293,930 (1.155 seconds elapsed) N = 13, MAX_L = 10, DISTINCT = 497,420 (2.113 seconds elapsed) N = 14, MAX_L = 10, DISTINCT = 817,190 (3.635 seconds elapsed) N = 15, MAX_L = 11, DISTINCT = 1,307,504 (6.270 seconds elapsed) N = 16, MAX_L = 11, DISTINCT = 2,042,975 (10.157 seconds elapsed) MAX-LENGTH = 11, champion = 277777788888899free counter