The question I started out with: Is there a non-trivial difference between
The following tests are a mix of integer and floating-point code; I/O is eliminated as much as possible. The benchmark code is available from my homepage.
BENCHMARK                   486          Pentium      rel.
--------------------------------------------------------------------
Fibonacci...9227465         10.347 sec   2.052 sec    5.04
Sieving...1899              10.934 sec   2.192 sec    4.99
Matrix multiply...           8.537 sec   2.099 sec    4.07
bubbling...                 11.024 sec   2.778 sec    3.97

The average speedup of the Pentium over the i486 is a factor of 4.51, 1.8 times better than expected on the basis of the clock-speed difference.
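The "rel." column and the average can be re-derived from the two timing columns. Below is a minimal Python sketch using the timings from the table above. The expected factor of 2.5 is an assumption, inferred from the "better than expected" figure, because the clock speeds are not restated in this section.

```python
# Timings in seconds, copied from the table above: (486, Pentium).
timings = {
    "Fibonacci": (10.347, 2.052),
    "Sieving": (10.934, 2.192),
    "Matrix multiply": (8.537, 2.099),
    "bubbling": (11.024, 2.778),
}

# Assumed clock-speed ratio between the two machines (not stated here).
EXPECTED_SPEEDUP = 2.5


def speedup(t_486, t_pentium):
    """Relative speed: how many times faster the Pentium ran the test."""
    return t_486 / t_pentium


rel = {name: speedup(a, b) for name, (a, b) in timings.items()}
average = sum(rel.values()) / len(rel)

for name, r in rel.items():
    print(f"{name:<16} {r:5.2f}")
print(f"average {average:.2f}, "
      f"{average / EXPECTED_SPEEDUP:.1f} times better than expected")
```

Averaging the unrounded ratios lands within 0.01 of the figure quoted above; the small difference comes from rounding.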
BENCHMARK                      486          Pentium      rel.
--------------------------------------------------------------------
300,000-times push+pop         27.899 sec   3.646 sec    7.65
(push+pop expected              9.100 sec   3.600 sec    2.53)
300,000-times nop+nop           9.137 sec   1.837 sec    4.97
(nop+nop expected               9.100 sec   3.600 sec    2.53)

From the "expected" rows we see that the Pentium machine has a much better caching strategy / implementation. A mystery remains however, as the Pentium seems to use substantially less than 1 cycle to execute a NOP instruction! This is probably why everybody thinks that a Pentium is about twice as fast as a similarly clocked '486 (see also CONCLUSION). Overall, the mean speedup is a factor of 6.31, or 2.52 times better than expected.
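One way to read the "expected" rows is to divide each measured time by its expected time. Below is a small Python sketch with the numbers from the table. Reading the expected rows as a one-cycle-per-instruction estimate is an assumption, drawn from the remark about NOP; the table itself does not say so.

```python
# (measured, expected) times in seconds, from the table above.
rows = {
    "push+pop": {"486": (27.899, 9.100), "Pentium": (3.646, 3.600)},
    "nop+nop": {"486": (9.137, 9.100), "Pentium": (1.837, 3.600)},
}


def measured_over_expected(measured, expected):
    """Ratio > 1: slower than expected; ratio < 1: faster than expected."""
    return measured / expected


ratios = {
    test: {cpu: measured_over_expected(*pair) for cpu, pair in cpus.items()}
    for test, cpus in rows.items()
}

for test, cpus in ratios.items():
    for cpu, ratio in cpus.items():
        print(f"{test:<9} {cpu:<8} measured/expected = {ratio:.2f}")
```

On these figures the 486 needs roughly three times the expected time for push+pop while the Pentium matches its estimate. For nop+nop the Pentium needs only about half the expected time, which is the "less than 1 cycle" mystery mentioned above.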
BENCHMARK               486            Pentium        rel.   1/rel.
---------------------------------------------------------------------
no-overlap
  CMOVE     :            8.291 MB/s    48.859 MB/s    0.17   5.88
  CMOVE>    :            9.310 MB/s    42.253 MB/s    0.22   4.55
  MOVE 1->2 :            9.068 MB/s    50.335 MB/s    0.18   5.56
  MOVE 2<-1 :            9.085 MB/s    50.000 MB/s    0.18   5.56
overlap
  CMOVE     :            5.787 MB/s    44.776 MB/s    0.13   7.69
  CMOVE>    :            5.784 MB/s    38.363 MB/s    0.15   6.67
  MOVE 1<-2 :           10.570 MB/s    60.483 MB/s    0.17   5.88
  MOVE 2->1 :           10.691 MB/s    60.728 MB/s    0.18   5.68

Speedup is real good: 5.93 times, or 2.37 times better than expected.
Test                            486           Pentium       rel.
---------------------------------------------------------------------------
Testing DO LOOP       =         0.1380 us     0.0300 us     4.60
Testing *             =         0.7290 us     0.1440 us     5.06
Testing /             =         0.9320 us     0.3160 us     2.95
Testing +             =         0.2050 us     0.0540 us     3.80
Testing M*            =         0.7950 us     0.1730 us     4.60
Testing M/            =         0.9020 us     0.3230 us     2.79
Testing M+            =         0.4910 us     0.1200 us     4.09
Testing /MOD          =         0.9370 us     0.3830 us     2.45
Testing */            =         1.4290 us     0.4300 us     3.32
Testing Eratosthenes  =         2.7106 us     0.2564 us    10.57
Testing Hoare's qsort =        39.8000 us    11.1000 us     3.59
all tests             =         7.2160 sec    2.1160 sec    3.41

The average speedup is 4.26, 1.7 times better than expected.
BENCH              Pentium (sec)   486 (sec)   rel.   1/rel
--------------------------------------------------------------------
Empty         :    0.007           0.038       0.18   5.56
Thread        :    0.058           0.122       0.48   2.08
Nest1         :    0.044           0.268       0.16   6.25
Nest2         :    0.028           0.071       0.39   2.56
Prims         :    0.045           0.136       0.33   3.00
Sieve         :    0.050           0.152       0.33   3.00
Loads         :    0.017           0.062       0.27   3.70
Comp          :    0.018           0.069       0.26   3.85
C prim        :    0.161           0.689       0.23   4.35
C sec         :    0.170           0.688       0.25   4.00
rd+wrd+fnd    :    0.100           0.469       0.21   4.76
read + <word> :    0.050           0.147       0.34   2.94
<word>        :    0.253           0.817       0.31   3.22
word          :    0.239           0.982       0.24   4.17
refill        :    0.139           0.490       0.28   3.57

The average speedup is 3.80, or 1.52 times better than expected.
The "unnest/nest pairs" benchmark tests how long it takes to unwind from 32 million nested colon definitions (no loops!).
The SAVAGE benchmark is well known; Bill Savage wrote it.
FLOAT is a mix of fp operations.
FPMATH was translated by R. L. Smith from an article in Dr. Dobb's Journal, September 1988.
A long time ago I myself posted the Forth DHRYSTONE and WHETSTONE programs here.
BENCHMARK                        486           Pentium       rel.
-------------------------------------------------------------------
1mloop                           0.121 sec     0.035 sec     3.42
SuperSieve 1e9 -- 1.001e9        4.446 sec     0.693 sec     6.42
  (50181 primes found)
32 million nest/unnest pairs     9.434 sec     2.947 sec     3.20
Savage (floating point)          0.075 sec     0.018 sec     4.17
Float (mix of operations)        3.039 sec     0.737 sec     4.12
Fpmath                           2.074 sec     0.579 sec     3.58
Forth Dhrystone                  41631 D/s     166389 D/s    0.25
Forth Whetstone                  14495 KW/s    57208 KW/s    0.25

Average speedup (note the Dhrystone and Whetstone figures) is 4.11, or 1.65 times better than expected.
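The note about Dhrystone and Whetstone matters because those two rows report a rate (higher is better) while the others report a time (lower is better), so their "rel." of 0.25 has to be inverted before averaging. Here is a hedged Python sketch of that bookkeeping with the table's numbers; the `kind` labels are my own annotation, not part of the benchmark output.

```python
# (name, 486 value, Pentium value, kind) from the table above.
# kind "time": seconds, lower is better; kind "rate": D/s or KW/s, higher is better.
results = [
    ("1mloop", 0.121, 0.035, "time"),
    ("SuperSieve", 4.446, 0.693, "time"),
    ("nest/unnest pairs", 9.434, 2.947, "time"),
    ("Savage", 0.075, 0.018, "time"),
    ("Float", 3.039, 0.737, "time"),
    ("Fpmath", 2.074, 0.579, "time"),
    ("Forth Dhrystone", 41631, 166389, "rate"),
    ("Forth Whetstone", 14495, 57208, "rate"),
]


def speedup(v_486, v_pentium, kind):
    """Pentium speedup over the 486, whatever the unit direction."""
    if kind == "time":
        return v_486 / v_pentium   # less time means faster
    return v_pentium / v_486       # higher rate means faster


speedups = {name: speedup(a, b, kind) for name, a, b, kind in results}
average = sum(speedups.values()) / len(speedups)

for name, s in speedups.items():
    print(f"{name:<18} {s:5.2f}")
print(f"average speedup {average:.2f}")
```

With the two rate rows counted as roughly 4.0 each, the mean comes out at the 4.11 quoted above.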
I wonder if the future will bring us a Forth compiler that is truly optimized for the Pentium, and that will show higher speedups than say a factor of 2 over a similarly clocked '486.