[MLton] s->alignment considered harmful
Matthew Fluet
fluet at tti-c.org
Sat Nov 3 08:13:24 PST 2007
Here are the results of benchmarks investigating the impact of dynamic vs.
fixed alignment and division vs. bitop implementations of align.
MLton0 -- svn HEAD, -align 4
MLton1 -- svn HEAD, -align 8
MLton2 -- svn HEAD + align-h-opt.patch, -align 4
MLton3 -- svn HEAD + align-h-opt.patch, -align 8
MLton4 -- svn HEAD + sed s/s->alignment/4/, -align 4
MLton5 -- svn HEAD + sed s/s->alignment/4/ + align-h-opt.patch, -align 4
MLton6 -- svn HEAD + sed s/s->alignment/8/, -align 8
MLton7 -- svn HEAD + sed s/s->alignment/8/ + align-h-opt.patch, -align 8
On SHADOW (with 8GB physical memory), the benchmarks are compiled with
-runtime 'ram-slop 0.125', so that there is some GC pressure. That said, I
believe that for most benchmarks the default heuristics never try to grow
the heap to the point where ram-slop 0.125 behaves differently from the
default ram-slop 0.5.
I also added a thread-switch benchmark to the collection, since that was
the program where Florian saw an appreciable difference with the fixed
alignment.
The brief conclusion is that for the majority of benchmarks, neither
fixing an alignment nor using a bitop implementation of align makes a
difference. For the few benchmarks where there is a slight runtime
improvement, we see most of the improvement just with the bitop
implementation of align, and little to no additional improvement due to
fixing an alignment.
To focus on the behavior of the bitop implementation of align with a
dynamic s->alignment, I ran the benchmarks a second time for any test for
which |MLton0 - MLton2| > 0.03 or |MLton1 - MLton3| > 0.03 in run time
ratio on the first benchmark run. There was quite a bit of variability between the two
runs, but I think there is a positive effect of the bitop implementation
for lexgen and vliw (and possibly a negative effect on flat-array) on
SHADOW and a positive effect for vliw (and maybe md5) on FENRIR.
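
The rerun criterion can be sketched as a small predicate over the run time ratios of one benchmark. This is only an illustration: the function name and the choice to pass ratios in hundredths are assumptions, and the threshold is left as a parameter rather than fixed.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the benchmark scripts.
 * Ratios are passed in hundredths (1.05 -> 105) so the comparison is
 * exact; comparing doubles such as 1.00 - 0.97 against 0.03 can round
 * either way. r0..r3 are the ratios for MLton0..MLton3. */
bool needs_rerun(int r0, int r1, int r2, int r3, int threshold) {
  return abs(r0 - r2) > threshold || abs(r1 - r3) > threshold;
}
```

With a threshold of 3 hundredths, the SHADOW first-run row for boyer (1.00, 1.09, 1.05, 1.17) is selected, while barnes-hut (1.00, 0.86, 0.99, 0.83) is not.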
And, intuitively, it seems that the bitop implementation of align would be
more efficient than the division implementation. So, I'll commit that
patch shortly.
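
For readers unfamiliar with the two approaches, here is a minimal sketch of a division-based and a bit-operation-based round-up. It is not the contents of align-h-opt.patch; the names align_div and align_bitop are hypothetical, and the bitop variant assumes the alignment is a power of two (true of 4 and 8).

```c
#include <assert.h>
#include <stddef.h>

/* Division-based round-up: correct for any a > 0, but needs an integer
 * division and a multiplication when a is not known at compile time. */
size_t align_div(size_t v, size_t a) {
  assert(a > 0);
  return ((v + a - 1) / a) * a;
}

/* Bit-operation round-up: only valid when a is a power of two.
 * Adds a - 1 and then clears the low bits with a mask. */
size_t align_bitop(size_t v, size_t a) {
  assert(a > 0 && (a & (a - 1)) == 0);
  return (v + a - 1) & ~(a - 1);
}
```

Both functions agree on power-of-two alignments; for example, each maps 13 to 16 with an alignment of 8 and leaves 12 unchanged with an alignment of 4.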
As for thread-switch, there is a significant speedup with the bitop
implementation of align (and only a slight additional speedup due to a
fixed alignment). The explanation for this is not that thread-switch
does a lot of allocation or garbage collection (it does neither), but
rather the implementation of
<src>/runtime/gc/switch-thread.c:GC_switchToThread, which makes
multiple calls to <src>/runtime/gc/current.c:getThreadCurrent:
objptr getThreadCurrentObjptr (GC_state s) {
  return s->currentThread;
}

GC_thread getThreadCurrent (GC_state s) {
  pointer p = objptrToPointer(getThreadCurrentObjptr(s), s->heap.start);
  return (GC_thread)(p + offsetofThread (s));
}
which calls <src>/runtime/gc/thread.c:offsetofThread:
size_t sizeofThread (GC_state s) {
  size_t res;

  res = GC_NORMAL_HEADER_SIZE + sizeof (struct GC_thread);
  res = align (res, s->alignment);
  if (DEBUG) { ... }
  assert (isAligned (res, s->alignment));
  return res;
}

size_t offsetofThread (GC_state s) {
  return (sizeofThread (s)) - (GC_NORMAL_HEADER_SIZE + sizeof (struct GC_thread));
}
The offsetofThread function coordinates between the ML object pointer
(which points to the word immediately after the ML object header) and a
struct GC_thread pointer, which has a variable amount of padding for
alignment purposes; see <src>/runtime/gc/thread.h:
/*
* Thread objects are normal objects with the following layout:
*
* header ::
* padding ::
* bytesNeeded (size_t) ::
* exnStack (size_t) ::
* stack (object-pointer)
*
* There may be zero or more bytes of padding for alignment purposes.
...
*/
typedef struct GC_thread {
  size_t bytesNeeded;
  size_t exnStack;
  objptr stack;
} __attribute__ ((packed)) *GC_thread;
On a 32-bit platform, struct GC_thread is 12 bytes, and needs no bytes of
padding for -align 4 and 4 bytes of padding for -align 8. On a 64-bit
platform, struct GC_thread is 24 bytes, and needs no bytes of padding for
-align 4 or -align 8. Hence, we need to dynamically determine the padding
at runtime, when the alignment of the program is known.
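
The padding computation can be sketched in isolation as follows. This mirrors the shape of offsetofThread (padded size minus unpadded size) but takes the header size, payload size, and alignment as plain parameters; thread_padding is a hypothetical name, and the sizes in the usage note are illustrative inputs, not values taken from the MLton source.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of padding needed so that header + padding + payload is a
 * multiple of alignment. alignment must be a power of two. */
size_t thread_padding(size_t header_size, size_t payload_size,
                      size_t alignment) {
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  size_t unpadded = header_size + payload_size;
  size_t padded = (unpadded + alignment - 1) & ~(alignment - 1);
  return padded - unpadded;
}
```

For instance, with an assumed 8-byte header and a 24-byte payload the padding is 0 for both alignments, whereas an assumed 4-byte header with the same payload needs 0 bytes for an alignment of 4 and 4 bytes for an alignment of 8. The result depends on all three inputs, which is why it cannot be a single constant shared by every program.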
Note, this dynamic padding for struct GC_thread was introduced with the
64-bit port, so thread-switch did take a big performance hit with the
port.
For thread-switch, the runtime is dominated by the GC_switchToThread
calls, which do a number of align calls. As Florian observed, this ends
up stressing integer division. The thread-switch benchmark shows that
the bitop implementation of align is much faster than the division
implementation, though the other benchmarks show that align rarely
dominates a benchmark's runtime.
Interestingly, the align call in sizeofThread is:
res = GC_NORMAL_HEADER_SIZE + sizeof (struct GC_thread);
res = align (res, s->alignment);
where the value being aligned is a compile time constant. Nonetheless,
fixing an alignment (which would make sizeofThread and offsetofThread
evaluate to compile time constants) does not significantly improve the
performance of the thread-switch benchmark (over using the bitop
implementation of align with a dynamic alignment). So, I don't think that
there is much to be gained from compiling the runtime multiple times, once
for each fixed alignment.
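
To make the "compile time constant" point concrete, here is a hedged sketch of how a fixed alignment lets the compiler fold the whole computation, while a dynamic alignment leaves a small amount of work per call. HEADER_SIZE, struct thread_fields, ALIGN_UP, and FIXED_ALIGNMENT are all hypothetical stand-ins, not names from the MLton runtime.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the runtime's header size and thread struct. */
#define HEADER_SIZE ((size_t)8)
struct thread_fields { size_t bytesNeeded; size_t exnStack; size_t stack; };

/* Power-of-two round-up that is usable in constant expressions. */
#define ALIGN_UP(v, a) (((v) + (a) - 1) & ~((size_t)(a) - 1))

/* Fixed alignment: the size is an integer constant expression. */
#define FIXED_ALIGNMENT 8
enum {
  SIZEOF_THREAD_FIXED =
    ALIGN_UP(HEADER_SIZE + sizeof(struct thread_fields), FIXED_ALIGNMENT)
};
_Static_assert(SIZEOF_THREAD_FIXED % FIXED_ALIGNMENT == 0,
               "fixed-alignment size must be aligned");

/* Dynamic alignment: the value being aligned is still a constant, but
 * the add-and-mask has to run on every call. */
size_t sizeof_thread_dynamic(size_t alignment) {
  return ALIGN_UP(HEADER_SIZE + sizeof(struct thread_fields), alignment);
}
```

The per-call cost of the dynamic version is an add and a mask, which is consistent with the observation that fixing the alignment buys little once the division is gone.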
The benchmark runtime ratios are below; the full benchmark results are
attached.
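
The ratios appear to be each configuration's run time divided by the MLton0 run time for the same benchmark, rounded to two decimals. A minimal sketch of that arithmetic, with ratio_hundredths as a hypothetical helper name:

```c
/* Ratio of a run time to the baseline (MLton0) run time, expressed in
 * hundredths and rounded to nearest. Assumes both times are positive. */
long ratio_hundredths(double run_time, double baseline) {
  return (long)(100.0 * run_time / baseline + 0.5);
}
```

For example, the SHADOW thread-switch run times of 78.82 (MLton0) and 42.84 (MLton2) from the attached results give 54, matching the 0.54 in the ratio table.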
SHADOW (Dual-processor single-core AMD Opteron 2.00GHz, 8GB Memory, Fedora Core 7)
Linux shadow 2.6.23.1-10.fc7 #1 SMP Fri Oct 19 14:35:28 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-27)
MLton0 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton1 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton2 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton3 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton4 -- ~/devel/mlton/mlton.svn.trunk.align4/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton5 -- ~/devel/mlton/mlton.svn.trunk.align4-bitop/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton6 -- ~/devel/mlton/mlton.svn.trunk.align8/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton7 -- ~/devel/mlton/mlton.svn.trunk.align8-bitop/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
run time ratio (first run)
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut 1.00 0.86 0.99 0.83 0.99 0.99 0.82 0.83
boyer 1.00 1.09 1.05 1.17 1.01 1.01 1.08 1.08
checksum 1.00 1.00 1.00 1.00 1.01 1.00 1.00 1.00
count-graphs 1.00 0.70 0.99 0.69 0.99 0.98 0.69 0.69
DLXSimulator 1.00 1.24 1.07 1.24 1.08 1.00 1.23 1.24
fft 1.00 0.90 1.00 0.90 1.00 1.00 0.90 0.90
fib 1.00 0.92 1.00 0.90 0.91 1.00 0.92 0.92
flat-array 1.00 1.08 1.23 1.17 0.99 1.24 1.08 1.06
hamlet 1.00 0.91 1.02 0.93 0.99 1.00 0.94 0.96
imp-for 1.00 1.30 1.00 1.30 1.00 1.00 1.30 1.30
knuth-bendix 1.00 0.77 1.00 0.78 1.01 1.00 0.79 0.78
lexgen 1.00 0.79 0.90 0.73 0.91 0.92 0.72 0.74
life 1.00 0.92 0.99 0.91 0.99 0.99 0.90 0.90
logic 1.00 0.79 0.94 0.69 0.92 0.93 0.69 0.69
mandelbrot 1.00 0.98 0.98 1.00 0.98 0.98 0.98 0.98
matrix-multiply 1.00 0.98 1.00 0.99 1.00 1.01 1.02 0.98
md5 1.00 0.96 0.98 0.95 0.98 0.98 0.95 0.95
merge 1.00 0.99 1.01 0.99 1.00 1.00 1.06 0.99
mlyacc 1.00 0.91 0.98 0.89 0.98 0.97 0.89 0.90
model-elimination 1.00 0.88 0.97 0.86 0.97 0.97 0.87 0.86
mpuz 1.00 0.77 1.00 0.77 1.00 1.00 0.77 0.77
nucleic 1.00 1.03 1.00 1.02 1.01 1.00 1.03 1.03
output1 1.00 0.56 1.00 0.56 1.00 1.00 0.56 0.56
peek 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
psdes-random 1.00 0.88 1.00 0.89 1.00 1.00 0.93 0.93
ratio-regions 1.00 0.81 0.90 0.81 1.00 0.99 0.81 0.72
ray 1.00 0.87 1.00 0.86 1.00 1.00 0.86 0.85
raytrace 1.00 0.72 1.00 0.72 1.00 1.00 0.71 0.72
simple 1.00 0.92 1.00 0.91 1.00 1.00 0.92 0.92
smith-normal-form 1.00 1.01 1.00 1.00 0.99 0.99 1.00 1.01
tailfib 1.00 1.27 0.99 1.27 1.00 1.00 1.27 1.27
tak 1.00 0.88 1.00 0.88 1.01 1.00 0.88 0.88
tensor 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
thread-switch 1.00 0.88 0.54 0.42 0.50 0.51 0.37 0.37
tsp 1.00 1.01 1.00 1.04 1.03 1.03 1.04 1.01
tyan 1.00 0.87 1.00 0.87 0.99 0.99 0.85 0.86
vector-concat 1.00 0.98 1.00 1.00 1.00 1.01 0.99 0.99
vector-rev 1.00 0.80 1.00 0.80 1.03 1.06 0.80 0.80
vliw 1.00 0.77 0.80 0.67 0.80 0.78 0.67 0.66
wc-input1 1.00 0.80 1.00 0.80 1.00 1.00 0.80 0.80
wc-scanStream 1.00 0.97 0.98 0.96 0.99 0.99 0.97 0.97
zebra 1.00 0.65 0.99 0.64 0.99 0.99 0.64 0.64
zern 1.00 0.79 1.00 0.79 0.98 1.01 0.79 0.79
run time ratio (second run)
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
boyer 1.00 1.10 0.92 0.98 0.93 0.93 0.99 0.99
DLXSimulator 1.00 1.09 0.94 1.09 0.94 0.88 1.08 1.09
flat-array 1.00 1.00 1.00 1.00 1.00 1.17 1.12 1.00
lexgen 1.00 0.78 0.89 0.72 0.88 0.90 0.72 0.72
logic 1.00 0.77 1.01 0.75 1.00 1.00 0.77 0.75
ratio-regions 1.00 0.87 0.96 0.82 1.07 0.96 0.77 0.86
thread-switch 1.00 0.86 0.53 0.41 0.50 0.50 0.36 0.37
vliw 1.00 0.77 0.80 0.67 0.80 0.78 0.67 0.67
FENRIR (Dual-processor dual-core Intel Xeon 2.66GHz, 2GB Memory, Mac OS X 10.4)
Darwin fenrir.uchicago.edu 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
i686-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5367)
MLton0 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 4
MLton1 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 8
MLton2 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 4
MLton3 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 8
MLton4 -- ~/devel/mlton/mlton.svn.trunk.align4/build/bin/mlton -align 4
MLton5 -- ~/devel/mlton/mlton.svn.trunk.align4-bitop/build/bin/mlton -align 4
MLton6 -- ~/devel/mlton/mlton.svn.trunk.align8/build/bin/mlton -align 8
MLton7 -- ~/devel/mlton/mlton.svn.trunk.align8-bitop/build/bin/mlton -align 8
run time ratio (first run)
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut 1.00 0.89 1.00 0.89 1.00 1.00 0.89 0.89
boyer 1.00 1.24 1.01 1.24 1.01 1.01 1.25 1.25
checksum 1.00 1.00 1.00 1.00 1.00 0.98 1.00 1.00
count-graphs 1.00 1.08 1.00 1.04 0.99 1.00 1.04 1.04
DLXSimulator 1.00 1.26 1.01 1.27 1.00 1.00 1.26 1.27
fft 1.00 0.94 1.00 0.94 1.00 1.00 0.94 0.94
fib 1.00 0.98 1.00 0.97 1.00 1.02 0.98 0.97
flat-array 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
hamlet 1.00 1.19 1.01 1.21 1.03 1.01 1.20 1.21
imp-for 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
knuth-bendix 1.00 0.99 1.00 0.99 1.00 1.00 0.98 0.99
lexgen 1.00 1.06 0.97 1.04 0.94 0.93 1.00 1.00
life 1.00 1.13 1.00 1.13 1.00 1.00 1.13 1.13
logic 1.00 1.07 1.00 1.08 1.00 1.00 1.09 1.09
mandelbrot 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
matrix-multiply 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
md5 1.00 1.00 0.89 0.95 0.88 0.90 0.94 0.94
merge 1.00 1.36 1.00 1.37 1.00 1.00 1.37 1.37
mlyacc 1.00 1.21 0.99 1.21 0.99 0.99 1.20 1.20
model-elimination 1.00 1.04 1.00 1.05 1.00 1.00 1.03 1.04
mpuz 1.00 0.97 0.99 1.03 0.99 0.98 0.97 0.97
nucleic 1.00 0.86 1.00 0.86 1.00 1.00 0.86 0.86
output1 1.00 1.02 1.00 1.02 1.00 1.00 1.02 1.02
peek 1.00 1.00 0.99 1.00 0.98 0.97 1.00 1.00
psdes-random 1.00 1.01 1.00 1.00 1.00 1.00 1.00 1.00
ratio-regions 1.00 1.00 1.05 1.00 1.00 1.00 1.00 1.00
ray 1.00 0.96 1.02 0.98 1.02 0.99 0.98 0.96
raytrace 1.00 0.83 1.00 0.83 1.00 1.00 0.83 0.83
simple 1.00 1.03 1.00 1.02 1.00 1.00 1.02 1.04
smith-normal-form 1.00 0.99 1.00 0.99 1.00 1.00 0.99 1.00
tailfib 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
tak 1.00 0.99 0.98 0.96 0.98 0.99 0.95 1.02
tensor 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
thread-switch 1.00 1.06 0.81 0.87 0.80 0.80 0.84 0.85
tsp 1.00 0.97 1.00 0.97 1.00 1.00 1.00 0.97
tyan 1.00 1.10 1.01 1.12 1.01 1.01 1.10 1.11
vector-concat 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
vector-rev 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
vliw 1.00 1.12 0.91 1.07 0.91 0.90 1.01 1.02
wc-input1 1.00 1.02 1.01 1.01 1.00 1.00 1.02 1.01
wc-scanStream 1.00 1.01 1.00 1.01 1.00 1.00 1.01 1.00
zebra 1.00 0.98 0.99 0.97 0.99 0.99 0.97 0.97
zern 1.00 0.88 1.03 0.91 1.03 1.03 0.91 0.91
run time ratio (second run)
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
count-graphs 1.00 1.08 1.00 1.04 0.99 1.00 1.04 1.04
md5 1.00 1.02 0.96 1.02 0.95 0.97 1.02 1.01
mpuz 1.00 0.97 0.99 0.97 0.99 0.99 0.97 0.97
ratio-regions 1.00 1.00 1.03 1.00 1.00 1.00 1.00 1.00
thread-switch 1.00 1.06 0.81 0.87 0.80 0.80 0.84 0.85
vliw 1.00 1.12 0.91 1.06 0.91 0.90 1.01 1.01
-------------- next part --------------
SHADOW (Dual-processor single-core AMD Opteron 2.00GHz, 8GB Memory, Fedora Core 7)
Linux shadow 2.6.23.1-10.fc7 #1 SMP Fri Oct 19 14:35:28 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-27)
MLton0 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton1 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton2 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton3 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton4 -- ~/devel/mlton/mlton.svn.trunk.align4/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton5 -- ~/devel/mlton/mlton.svn.trunk.align4-bitop/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton6 -- ~/devel/mlton/mlton.svn.trunk.align8/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton7 -- ~/devel/mlton/mlton.svn.trunk.align8-bitop/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
size
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut 167,542 168,070 167,062 167,590 165,502 165,446 166,030 165,974
boyer 213,513 213,577 213,033 213,097 211,473 211,417 211,537 211,481
checksum 93,529 93,561 93,049 93,081 91,489 91,433 91,521 91,465
count-graphs 119,561 119,785 119,081 119,305 117,521 117,465 117,745 117,689
DLXSimulator 195,972 196,580 195,492 196,100 193,932 193,876 194,540 194,484
fft 117,319 117,335 116,839 116,855 115,279 115,223 115,295 115,239
fib 93,417 93,401 92,937 92,921 91,377 91,321 91,361 91,305
flat-array 92,953 92,985 92,473 92,505 90,913 90,857 90,945 90,889
hamlet 1,503,617 1,516,881 1,503,137 1,516,401 1,501,577 1,501,521 1,514,841 1,514,785
imp-for 93,241 93,257 92,761 92,777 91,201 91,145 91,217 91,161
knuth-bendix 171,716 172,532 171,236 172,052 169,676 169,620 170,492 170,436
lexgen 285,107 286,611 284,627 286,131 283,067 283,011 284,571 284,515
life 117,881 117,865 117,401 117,385 115,841 115,785 115,825 115,769
logic 177,673 177,561 177,193 177,081 175,633 175,577 175,521 175,465
mandelbrot 93,129 93,145 92,649 92,665 91,089 91,033 91,105 91,049
matrix-multiply 95,113 95,129 94,633 94,649 93,073 93,017 93,089 93,033
md5 126,804 127,364 126,324 126,884 124,764 124,708 125,324 125,268
merge 94,745 94,793 94,265 94,313 92,705 92,649 92,753 92,697
mlyacc 657,203 661,411 656,723 660,931 655,163 655,107 659,371 659,315
model-elimination 849,466 851,850 848,986 851,370 847,426 847,370 849,810 849,754
mpuz 99,353 99,417 98,873 98,937 97,313 97,257 97,377 97,321
nucleic 269,048 269,176 268,568 268,696 267,008 266,952 267,136 267,080
output1 136,184 136,824 135,704 136,344 134,144 134,088 134,784 134,728
peek 132,340 132,884 131,860 132,404 130,300 130,244 130,844 130,788
psdes-random 96,313 96,377 95,833 95,897 94,273 94,217 94,337 94,281
ratio-regions 120,857 120,873 120,377 120,393 118,817 118,761 118,833 118,777
ray 244,704 245,648 244,224 245,168 242,664 242,608 243,608 243,552
raytrace 372,714 373,738 372,234 373,258 370,674 370,618 371,698 371,642
simple 343,377 344,161 342,897 343,681 341,337 341,281 342,121 342,065
smith-normal-form 271,668 284,692 271,188 284,212 269,628 269,572 282,652 282,596
tailfib 92,985 93,001 92,505 92,521 90,945 90,889 90,961 90,905
tak 93,465 93,417 92,985 92,937 91,425 91,369 91,377 91,321
tensor 162,251 162,891 161,771 162,411 160,211 160,155 160,851 160,795
thread-switch 141,476 142,100 140,996 141,620 139,436 139,380 140,060 140,004
tsp 139,347 139,955 138,867 139,475 137,307 137,251 137,915 137,859
tyan 212,212 213,236 211,732 212,756 210,172 210,116 211,196 211,140
vector-concat 94,761 94,793 94,281 94,313 92,721 92,665 92,753 92,697
vector-rev 94,521 94,553 94,041 94,073 92,481 92,425 92,513 92,457
vliw 518,946 520,418 518,466 519,938 516,906 516,850 518,378 518,322
wc-input1 158,850 159,570 158,370 159,090 156,810 156,754 157,530 157,474
wc-scanStream 169,666 170,370 169,186 169,890 167,626 167,570 168,330 168,274
zebra 212,436 213,396 211,956 212,916 210,396 210,340 211,356 211,300
zern 132,174 132,190 131,694 131,710 130,134 130,078 130,150 130,094
compile time
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut 9.77 10.07 10.45 11.05 11.14 9.79 11.26 9.82
boyer 11.43 9.89 9.83 9.85 9.96 9.72 11.70 11.74
checksum 8.58 7.50 8.36 8.47 7.45 8.51 7.23 8.62
count-graphs 9.46 9.59 9.39 9.56 8.22 9.59 8.21 9.57
DLXSimulator 12.27 12.17 10.53 11.90 10.47 12.08 12.09 12.23
fft 9.15 7.91 7.87 7.75 7.89 9.37 9.14 7.82
fib 8.40 7.30 8.35 7.24 8.40 7.37 8.64 7.39
flat-array 8.42 7.32 8.37 8.48 8.54 7.35 7.25 8.70
hamlet 52.31 52.22 54.46 47.50 54.29 45.89 54.60 46.20
imp-for 8.72 8.72 8.50 8.65 7.38 8.90 7.36 8.88
knuth-bendix 10.78 10.80 9.12 9.26 9.18 10.63 9.02 8.93
lexgen 13.77 13.83 13.70 11.94 12.16 13.76 14.12 13.26
life 9.25 7.93 8.89 9.10 8.13 9.20 9.38 9.47
logic 10.50 11.04 10.65 10.65 10.56 9.27 10.69 9.25
mandelbrot 7.39 7.34 8.28 7.41 7.40 8.50 8.39 8.62
matrix-multiply 8.23 8.32 7.42 7.52 8.36 8.47 8.46 8.56
md5 9.25 9.28 9.16 9.28 8.28 8.33 8.17 9.59
merge 7.38 8.22 8.23 7.39 7.31 8.32 8.39 7.96
mlyacc 29.83 29.57 26.78 26.54 30.07 29.82 30.17 26.75
model-elimination 28.19 25.40 27.54 24.99 28.51 24.39 24.95 27.22
mpuz 8.45 7.47 8.41 8.41 7.45 7.45 7.34 7.43
nucleic 12.52 12.53 11.26 11.20 11.16 11.22 11.02 12.82
output1 9.47 9.49 8.39 9.39 9.34 8.25 9.52 8.24
peek 9.20 9.22 8.39 9.42 8.18 9.46 8.19 8.17
psdes-random 7.57 8.45 7.55 7.47 8.47 8.51 8.60 7.38
ratio-regions 9.83 9.82 9.76 8.69 9.93 8.78 8.73 10.09
ray 11.24 12.43 12.36 11.99 12.32 11.51 12.69 11.13
raytrace 14.87 14.95 14.48 15.45 15.41 16.39 16.75 14.36
simple 13.00 13.11 14.50 12.98 14.62 12.71 13.14 14.60
smith-normal-form 11.18 11.31 11.22 11.53 12.62 12.77 13.02 11.63
tailfib 8.34 7.57 7.34 8.38 7.32 7.32 8.37 8.59
tak 7.30 7.37 8.22 8.03 8.28 8.45 7.25 8.44
tensor 9.91 11.33 11.39 11.25 10.31 11.22 10.20 9.70
thread-switch 9.46 8.43 9.37 9.39 9.53 8.74 9.80 8.37
tsp 8.58 9.63 9.54 8.49 8.65 8.53 8.43 8.56
tyan 10.62 10.61 10.61 10.76 11.92 10.64 10.34 12.21
vector-concat 8.40 8.30 8.31 8.33 7.46 7.36 7.46 7.46
vector-rev 8.25 7.91 7.28 7.30 7.48 7.35 7.24 7.38
vliw 21.40 19.38 21.08 18.74 19.27 21.76 21.76 21.44
wc-input1 8.77 10.06 10.08 10.06 10.03 8.96 8.66 8.84
wc-scanStream 9.28 10.30 10.31 10.34 9.40 10.69 9.26 10.76
zebra 10.66 10.67 11.78 10.61 10.65 11.89 12.19 11.85
zern 9.40 9.36 8.40 8.28 9.35 9.63 8.14 8.30
run time
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut 15.80 13.62 15.66 13.16 15.69 15.64 13.03 13.18
boyer 38.36 41.92 40.14 44.90 38.73 38.80 41.33 41.55
checksum 18.49 18.54 18.55 18.54 18.66 18.54 18.55 18.54
count-graphs 32.82 23.09 32.39 22.55 32.36 32.32 22.53 22.53
DLXSimulator 27.28 33.74 29.31 33.74 29.38 27.27 33.67 33.82
fft 15.50 13.97 15.50 13.97 15.50 15.48 13.97 13.97
fib 41.14 37.81 41.14 37.22 37.41 41.15 37.80 37.81
flat-array 28.32 30.45 34.87 33.01 28.05 35.12 30.45 30.08
hamlet 42.23 38.26 43.23 39.46 41.97 42.21 39.54 40.65
imp-for 26.67 34.64 26.67 34.65 26.67 26.67 34.64 34.64
knuth-bendix 23.79 18.42 23.88 18.58 23.97 23.85 18.89 18.51
lexgen 24.42 19.21 22.01 17.78 22.26 22.46 17.69 18.12
life 19.29 17.72 19.08 17.55 19.10 19.07 17.40 17.41
logic 29.55 23.28 27.72 20.48 27.33 27.59 20.44 20.54
mandelbrot 21.18 20.78 20.74 21.18 20.74 20.74 20.74 20.74
matrix-multiply 27.35 26.92 27.29 26.94 27.28 27.70 27.96 26.83
md5 33.74 32.48 33.20 32.02 33.20 33.20 32.05 31.99
merge 52.00 51.39 52.38 51.25 52.10 52.00 55.28 51.28
mlyacc 25.85 23.52 25.24 23.08 25.28 25.13 23.08 23.18
model-elimination 36.54 32.18 35.31 31.59 35.55 35.49 31.65 31.49
mpuz 27.20 20.85 27.20 20.86 27.21 27.21 20.85 20.85
nucleic 15.44 15.93 15.43 15.82 15.60 15.39 15.87 15.90
output1 41.55 23.35 41.54 23.33 41.53 41.53 23.31 23.30
peek 34.89 34.89 34.90 34.89 34.91 34.89 34.89 34.89
psdes-random 18.01 15.91 18.01 15.99 18.01 18.01 16.72 16.72
ratio-regions 142.15 114.67 127.38 115.31 141.84 141.09 115.23 102.65
ray 16.99 14.73 16.94 14.66 17.02 16.96 14.55 14.49
raytrace 20.50 14.71 20.52 14.71 20.53 20.48 14.65 14.69
simple 23.57 21.57 23.49 21.55 23.59 23.53 21.63 21.59
smith-normal-form 8.38 8.43 8.35 8.39 8.33 8.33 8.39 8.43
tailfib 23.67 30.14 23.41 30.14 23.68 23.71 30.14 30.14
tak 31.81 28.14 31.82 28.15 32.03 31.82 28.14 28.15
tensor 22.70 22.70 22.70 22.70 22.69 22.70 22.70 22.70
thread-switch 78.82 69.11 42.84 33.20 39.22 40.23 29.29 29.40
tsp 21.78 21.91 21.77 22.76 22.35 22.35 22.73 21.90
tyan 26.91 23.41 26.81 23.34 26.68 26.75 22.92 23.27
vector-concat 28.64 28.18 28.69 28.61 28.78 28.80 28.29 28.31
vector-rev 45.20 36.31 45.26 36.31 46.77 47.73 36.17 36.29
vliw 32.21 24.65 25.68 21.67 25.83 25.21 21.66 21.34
wc-input1 34.72 27.85 34.60 27.68 34.72 34.76 27.67 27.78
wc-scanStream 28.80 27.89 28.27 27.73 28.66 28.47 27.91 27.87
zebra 40.92 26.56 40.60 26.21 40.61 40.61 26.21 26.22
zern 24.90 19.74 24.91 19.69 24.50 25.03 19.64 19.62
MLton0 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton1 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton2 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton3 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton4 -- ~/devel/mlton/mlton.svn.trunk.align4/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton5 -- ~/devel/mlton/mlton.svn.trunk.align4-bitop/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton6 -- ~/devel/mlton/mlton.svn.trunk.align8/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton7 -- ~/devel/mlton/mlton.svn.trunk.align8-bitop/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
size
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
boyer 213,513 213,577 213,033 213,097 211,473 211,417 211,537 211,481
DLXSimulator 195,972 196,580 195,492 196,100 193,932 193,876 194,540 194,484
flat-array 92,953 92,985 92,473 92,505 90,913 90,857 90,945 90,889
lexgen 285,107 286,611 284,627 286,131 283,067 283,011 284,571 284,515
logic 177,673 177,561 177,193 177,081 175,633 175,577 175,521 175,465
ratio-regions 120,857 120,873 120,377 120,393 118,817 118,761 118,833 118,777
thread-switch 141,476 142,100 140,996 141,620 139,436 139,380 140,060 140,004
vliw 518,946 520,418 518,466 519,938 516,906 516,850 518,378 518,322
compile time
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
boyer 10.42 10.66 9.85 11.31 9.85 9.85 11.47 11.48
DLXSimulator 12.20 10.57 10.39 11.85 10.39 12.15 11.98 10.36
flat-array 8.47 8.46 8.30 7.35 7.44 7.36 8.63 7.33
lexgen 11.79 11.85 13.71 13.74 13.04 11.62 12.08 11.59
logic 10.06 9.40 10.71 10.69 10.90 9.12 9.41 9.13
ratio-regions 10.21 8.79 8.68 10.08 10.14 8.92 10.43 10.43
thread-switch 8.42 8.43 8.34 8.33 9.72 9.81 10.00 9.98
vliw 22.18 19.82 18.56 21.47 18.98 19.07 22.06 21.77
run time
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
boyer 40.66 44.66 37.55 39.89 37.65 37.72 40.22 40.14
DLXSimulator 30.34 32.94 28.52 32.98 28.60 26.81 32.91 33.03
flat-array 25.07 25.16 25.08 25.17 25.07 29.24 28.12 25.17
lexgen 23.99 18.81 21.26 17.35 21.17 21.50 17.27 17.33
logic 27.69 21.26 27.87 20.71 27.62 27.62 21.25 20.72
ratio-regions 132.97 115.51 127.43 109.44 142.49 127.32 102.73 114.90
thread-switch 80.24 69.05 42.87 33.04 40.22 40.18 29.18 29.42
vliw 32.44 24.85 25.81 21.64 26.11 25.45 21.63 21.61
FENRIR (Dual-processor dual-core Intel Xeon 2.66GHz, 2GB Memory, Mac OS X 10.4)
Darwin fenrir.uchicago.edu 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
i686-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5367)
MLton0 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 4
MLton1 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 8
MLton2 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 4
MLton3 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 8
MLton4 -- ~/devel/mlton/mlton.svn.trunk.align4/build/bin/mlton -align 4
MLton5 -- ~/devel/mlton/mlton.svn.trunk.align4-bitop/build/bin/mlton -align 4
MLton6 -- ~/devel/mlton/mlton.svn.trunk.align8/build/bin/mlton -align 8
MLton7 -- ~/devel/mlton/mlton.svn.trunk.align8-bitop/build/bin/mlton -align 8
size
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut 167,936 167,936 163,840 167,936 163,840 163,840 163,840 163,840
boyer 204,800 212,992 204,800 208,896 200,704 200,704 208,896 208,896
checksum 106,496 106,496 106,496 106,496 102,400 102,400 102,400 102,400
count-graphs 126,976 126,976 126,976 126,976 122,880 122,880 122,880 122,880
DLXSimulator 196,608 196,608 192,512 196,608 192,512 192,512 196,608 196,608
fft 126,976 126,976 126,976 126,976 122,880 122,880 122,880 122,880
fib 106,496 106,496 102,400 102,400 102,400 102,400 102,400 102,400
flat-array 102,400 106,496 102,400 102,400 102,400 102,400 102,400 102,400
hamlet 1,335,296 1,363,968 1,335,296 1,359,872 1,335,296 1,335,296 1,359,872 1,359,872
imp-for 106,496 106,496 102,400 102,400 102,400 102,400 102,400 102,400
knuth-bendix 176,128 176,128 176,128 176,128 172,032 172,032 176,128 176,128
lexgen 274,432 278,528 274,432 274,432 270,336 270,336 274,432 274,432
life 126,976 131,072 126,976 126,976 126,976 126,976 126,976 126,976
logic 172,032 176,128 172,032 176,128 172,032 172,032 176,128 176,128
mandelbrot 106,496 106,496 102,400 102,400 102,400 102,400 102,400 102,400
matrix-multiply 106,496 106,496 102,400 102,400 102,400 102,400 102,400 102,400
md5 139,264 139,264 135,168 139,264 135,168 135,168 135,168 135,168
merge 106,496 106,496 102,400 102,400 102,400 102,400 102,400 102,400
mlyacc 606,208 614,400 606,208 614,400 602,112 602,112 610,304 610,304
model-elimination 729,088 741,376 729,088 741,376 729,088 724,992 741,376 741,376
mpuz 114,688 114,688 110,592 110,592 110,592 110,592 110,592 110,592
nucleic 278,528 282,624 278,528 282,624 278,528 278,528 282,624 282,624
output1 143,360 143,360 143,360 143,360 139,264 139,264 143,360 143,360
peek 143,360 143,360 139,264 143,360 139,264 139,264 139,264 139,264
psdes-random 106,496 106,496 106,496 106,496 102,400 102,400 102,400 102,400
ratio-regions 131,072 131,072 126,976 126,976 126,976 126,976 126,976 126,976
ray 237,568 237,568 233,472 237,568 233,472 233,472 237,568 237,568
raytrace 331,776 339,968 331,776 335,872 327,680 327,680 335,872 335,872
simple 307,200 311,296 303,104 311,296 303,104 303,104 307,200 307,200
smith-normal-form 262,144 278,528 262,144 274,432 262,144 262,144 274,432 274,432
tailfib 102,400 106,496 102,400 102,400 102,400 102,400 102,400 102,400
tak 106,496 106,496 102,400 102,400 102,400 102,400 102,400 102,400
tensor 167,936 172,032 167,936 167,936 163,840 163,840 167,936 167,936
thread-switch 151,552 155,648 151,552 151,552 151,552 151,552 151,552 151,552
tsp 143,360 147,456 143,360 143,360 143,360 143,360 143,360 143,360
tyan 208,896 212,992 208,896 208,896 204,800 204,800 208,896 208,896
vector-concat 106,496 106,496 102,400 102,400 102,400 102,400 102,400 102,400
vector-rev 106,496 106,496 102,400 102,400 102,400 102,400 102,400 102,400
vliw 466,944 475,136 462,848 471,040 462,848 462,848 471,040 471,040
wc-input1 167,936 167,936 163,840 167,936 163,840 163,840 163,840 163,840
wc-scanStream 176,128 176,128 172,032 176,128 172,032 172,032 172,032 172,032
zebra 212,992 212,992 208,896 212,992 208,896 208,896 208,896 208,896
zern 135,168 135,168 131,072 131,072 131,072 131,072 131,072 131,072
compile time
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut 5.32 5.20 5.38 5.25 5.39 5.38 5.40 5.38
boyer 5.31 5.38 5.37 5.41 5.37 5.38 5.44 5.43
checksum 4.22 4.22 4.24 4.23 4.26 4.26 4.26 4.25
count-graphs 4.56 4.57 4.58 4.58 4.59 4.57 4.59 4.58
DLXSimulator 5.62 5.61 5.66 5.67 5.67 5.66 5.75 5.67
fft 4.50 4.50 4.50 4.49 4.52 4.52 4.53 4.54
fib 4.24 4.24 4.25 4.25 4.27 4.27 4.27 4.27
flat-array 4.26 4.24 4.28 4.27 4.29 4.26 4.28 4.26
hamlet 21.60 21.72 23.09 22.62 21.73 23.14 21.89 23.30
imp-for 4.26 4.24 4.27 4.27 4.28 4.29 4.28 4.28
knuth-bendix 5.08 5.10 5.12 5.12 5.12 5.11 5.15 5.13
lexgen 6.41 6.42 6.47 6.46 6.50 6.47 6.50 6.49
life 4.53 4.53 4.53 4.53 4.55 4.57 4.56 4.56
logic 5.23 5.25 5.26 5.28 5.29 5.27 5.31 5.30
mandelbrot 4.30 4.31 4.31 4.32 4.32 4.31 4.33 4.30
matrix-multiply 4.31 4.31 4.34 4.32 4.35 4.34 4.36 4.34
md5 4.67 4.67 4.67 4.68 4.69 4.69 4.70 4.70
merge 4.29 4.28 4.31 4.29 4.39 4.32 4.33 4.31
mlyacc 14.03 14.20 14.35 14.38 14.30 14.36 14.37 14.41
model-elimination 12.74 12.81 12.62 12.66 12.93 12.42 13.02 12.52
mpuz 4.36 4.35 4.36 4.37 4.38 4.39 4.38 4.38
nucleic 6.36 6.40 6.40 6.44 6.39 6.40 6.47 6.46
output1 4.66 4.66 4.68 4.68 4.68 4.68 4.70 4.69
peek 4.68 4.67 4.67 4.69 4.69 4.70 4.70 4.69
psdes-random 4.34 4.32 4.36 4.34 4.35 4.34 4.35 4.35
ratio-regions 4.95 4.94 4.96 4.95 4.95 4.94 4.96 4.93
ray 6.09 6.12 6.14 6.15 6.18 6.16 6.19 6.16
raytrace 7.86 7.95 8.18 8.01 8.04 8.04 8.11 8.12
simple 6.78 6.81 6.84 6.84 6.86 6.84 6.88 6.85
smith-normal-form 5.95 6.07 6.02 6.13 6.07 6.03 6.19 6.13
tailfib 4.29 4.28 4.31 4.29 4.32 4.32 4.33 4.31
tak 4.29 4.28 4.28 4.28 4.30 4.31 4.31 4.32
tensor 5.50 5.49 5.54 5.54 5.55 5.56 5.56 5.55
thread-switch 4.76 4.75 4.76 4.77 4.79 4.76 4.78 4.78
tsp 4.86 4.87 4.86 4.86 4.89 4.88 4.89 4.90
tyan 5.81 5.82 5.89 5.89 5.87 5.89 5.90 5.91
vector-concat 4.33 4.33 4.36 4.34 4.35 4.37 4.35 4.36
vector-rev 4.33 4.30 4.35 4.32 4.34 4.32 4.34 4.32
vliw 9.74 9.77 10.11 10.12 9.90 9.87 9.93 9.92
wc-input1 5.00 5.00 5.02 5.04 5.03 5.02 5.04 5.02
wc-scanStream 5.14 5.12 5.16 5.15 5.16 5.14 5.17 5.15
zebra 6.02 5.98 6.03 6.04 6.03 6.03 6.04 6.04
zern 4.70 4.68 4.69 4.68 4.71 4.70 4.71 4.72
run time
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut 11.81 10.47 11.87 10.52 11.86 11.85 10.54 10.51
boyer 17.32 21.42 17.43 21.47 17.53 17.50 21.67 21.73
checksum 30.86 30.84 30.85 30.84 30.85 30.36 30.84 30.83
count-graphs 12.33 13.36 12.32 12.86 12.25 12.27 12.78 12.76
DLXSimulator 11.05 13.93 11.13 14.03 11.08 11.07 13.97 14.00
fft 12.40 11.69 12.40 11.70 12.41 12.41 11.70 11.70
fib 21.36 20.95 21.36 20.81 21.36 21.88 20.90 20.80
flat-array 13.88 13.86 13.87 13.86 13.85 13.86 13.87 13.85
hamlet 19.62 23.27 19.77 23.75 20.12 19.74 23.58 23.83
imp-for 13.37 13.37 13.37 13.37 13.37 13.37 13.37 13.37
knuth-bendix 11.89 11.72 11.92 11.77 11.93 11.86 11.71 11.79
lexgen 9.97 10.57 9.72 10.36 9.34 9.32 10.02 10.02
life 11.20 12.63 11.17 12.68 11.20 11.18 12.64 12.64
logic 11.53 12.31 11.59 12.40 11.55 11.53 12.53 12.52
mandelbrot 18.99 18.99 18.99 18.99 18.99 18.99 18.99 18.99
matrix-multiply 12.36 12.38 12.35 12.42 12.35 12.39 12.38 12.37
md5 20.48 20.51 18.24 19.41 18.10 18.35 19.31 19.22
merge 16.98 23.15 16.98 23.21 16.97 16.95 23.23 23.24
mlyacc 13.08 15.88 13.00 15.84 12.96 12.95 15.71 15.73
model-elimination 22.69 23.49 22.72 23.91 22.69 22.60 23.37 23.59
mpuz 13.34 12.95 13.17 13.69 13.17 13.13 12.95 12.95
nucleic 10.88 9.41 10.90 9.38 10.89 10.89 9.38 9.40
output1 14.15 14.44 14.15 14.43 14.15 14.16 14.43 14.43
peek 20.62 20.65 20.32 20.58 20.16 20.07 20.59 20.63
psdes-random 12.81 12.97 12.83 12.78 12.80 12.77 12.87 12.84
ratio-regions 47.96 47.90 50.25 47.93 48.03 48.09 47.91 47.94
ray 13.95 13.44 14.17 13.69 14.17 13.79 13.68 13.44
raytrace 11.47 9.53 11.49 9.56 11.48 11.49 9.50 9.53
simple 11.40 11.73 11.43 11.59 11.42 11.41 11.63 11.81
smith-normal-form 14.21 14.14 14.22 14.12 14.21 14.19 14.13 14.14
tailfib 13.62 13.62 13.62 13.62 13.62 13.62 13.62 13.62
tak 14.26 14.09 14.01 13.72 14.02 14.15 13.54 14.56
tensor 20.93 20.93 20.93 20.93 20.93 20.93 20.93 20.93
thread-switch 22.14 23.47 17.92 19.25 17.61 17.62 18.68 18.82
tsp 23.06 22.41 23.07 22.38 23.07 23.07 22.97 22.38
tyan 12.12 13.34 12.22 13.58 12.21 12.21 13.37 13.41
vector-concat 18.55 18.53 18.54 18.51 18.51 18.54 18.53 18.54
vector-rev 19.49 19.45 19.48 19.46 19.49 19.47 19.48 19.45
vliw 12.03 13.47 10.91 12.84 10.94 10.86 12.18 12.21
wc-input1 14.48 14.70 14.57 14.67 14.53 14.55 14.70 14.68
wc-scanStream 18.51 18.62 18.49 18.60 18.53 18.54 18.60 18.50
zebra 16.17 15.80 16.03 15.70 15.98 16.03 15.66 15.72
zern 14.56 12.81 15.02 13.26 15.01 14.99 13.29 13.30
MLton0 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 4
MLton1 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 8
MLton2 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 4
MLton3 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 8
MLton4 -- ~/devel/mlton/mlton.svn.trunk.align4/build/bin/mlton -align 4
MLton5 -- ~/devel/mlton/mlton.svn.trunk.align4-bitop/build/bin/mlton -align 4
MLton6 -- ~/devel/mlton/mlton.svn.trunk.align8/build/bin/mlton -align 8
MLton7 -- ~/devel/mlton/mlton.svn.trunk.align8-bitop/build/bin/mlton -align 8
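For readers unfamiliar with the two variants being compared: the "bitop" builds replace a division-based round-up with a mask-based one. The following is only a minimal sketch of that general technique, not the actual runtime code or the contents of align-h-opt.patch; the function names align_div and align_bitop are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to a multiple of a using division.
 * Works for any nonzero a, but costs an integer divide. */
static size_t align_div(size_t n, size_t a) {
    assert(a != 0);
    return ((n + a - 1) / a) * a;
}

/* Round n up to a multiple of a using bit operations.
 * Valid only when a is a power of two (e.g. 4 or 8). */
static size_t align_bitop(size_t n, size_t a) {
    assert(a != 0 && (a & (a - 1)) == 0);
    return (n + a - 1) & ~(a - 1);
}
```

For power-of-two alignments such as 4 and 8 the two functions return the same result. With a constant alignment a compiler can typically reduce the division form to the mask form on its own, which is presumably why the fixed-alignment builds differ so little between the bitop and non-bitop variants.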
run time ratio
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
count-graphs 1.00 1.08 1.00 1.04 0.99 1.00 1.04 1.04
md5 1.00 1.02 0.96 1.02 0.95 0.97 1.02 1.01
mpuz 1.00 0.97 0.99 0.97 0.99 0.99 0.97 0.97
ratio-regions 1.00 1.00 1.03 1.00 1.00 1.00 1.00 1.00
thread-switch 1.00 1.06 0.81 0.87 0.80 0.80 0.84 0.85
vliw 1.00 1.12 0.91 1.06 0.91 0.90 1.01 1.01
size
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
count-graphs 126,976 126,976 126,976 126,976 122,880 122,880 122,880 122,880
md5 139,264 139,264 135,168 139,264 135,168 135,168 135,168 135,168
mpuz 114,688 114,688 110,592 110,592 110,592 110,592 110,592 110,592
ratio-regions 131,072 131,072 126,976 126,976 126,976 126,976 126,976 126,976
thread-switch 151,552 155,648 151,552 151,552 151,552 151,552 151,552 151,552
vliw 466,944 475,136 462,848 471,040 462,848 462,848 471,040 471,040
compile time
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
count-graphs 4.78 4.64 4.79 4.68 4.80 4.81 4.83 4.82
md5 4.81 4.70 4.82 4.71 4.77 4.72 4.75 4.77
mpuz 4.41 4.38 4.41 4.38 4.41 4.41 4.43 4.41
ratio-regions 4.98 4.95 4.98 4.96 4.99 4.99 4.98 5.00
thread-switch 4.74 4.73 4.75 4.76 4.77 4.75 4.78 4.76
vliw 9.71 9.72 10.08 10.09 9.90 9.89 9.91 9.90
run time
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
count-graphs 12.33 13.36 12.32 12.86 12.25 12.27 12.78 12.76
md5 18.95 19.35 18.24 19.41 18.10 18.35 19.31 19.22
mpuz 13.33 12.95 13.17 12.95 13.17 13.13 12.96 12.95
ratio-regions 48.07 47.99 49.49 47.99 48.10 48.14 47.95 47.90
thread-switch 22.12 23.46 17.93 19.25 17.61 17.61 18.68 18.82
vliw 12.04 13.45 10.91 12.81 10.92 10.84 12.13 12.19
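The run time ratio table for the second run appears to be each run time divided by the MLton0 run time for the same benchmark. A minimal sketch of that calculation, assuming MLton0 is the baseline column; the function name runtime_ratios is hypothetical and is not part of the benchmark scripts.

```c
#include <assert.h>
#include <stddef.h>

/* Fill out[i] with times[i] / times[0], treating column 0
 * (MLton0 in the tables above) as the baseline. */
static void runtime_ratios(const double *times, size_t n, double *out) {
    assert(n > 0 && times[0] > 0.0);
    for (size_t i = 0; i < n; i++) {
        out[i] = times[i] / times[0];
    }
}
```

Applied to the thread-switch row (22.12, 23.46, 17.93, ...), this gives roughly 1.00, 1.06, 0.81, ..., in line with the ratios listed for that benchmark.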