[MLton] cvs commit: Improvements to SSA{,2} shrinker in the
presence of profiling
Matthew Fluet
fluet@cs.cornell.edu
Sun, 12 Jun 2005 08:57:57 -0400 (EDT)
> As stated above, this solves the performance problem with
> wc-scanStream. Unfortunately, it did not significantly affect any of
> the other benchmarks. The new outlier in the presence of profiling is
> checksum.
In an attempt to understand where other performance problems lie with
profiling, I ran the benchmarks with -profile drop, adding SSA, SSA2,
and RSSA passes that erase the profiling annotations at various points
in the compile:
MLton0 -- mlton -profile no
MLton1 -- mlton -profile drop  (drop profiling at start of SSA opts)
MLton2 -- mlton -profile drop  (drop profiling at end of SSA opts)
MLton3 -- mlton -profile drop  (drop profiling at start of SSA2 opts)
MLton4 -- mlton -profile drop  (drop profiling at end of SSA2 opts)
MLton5 -- mlton -profile drop  (drop profiling at start of RSSA opts)
MLton6 -- mlton -profile drop  (drop profiling at end of RSSA opts,
                                before implementProfiling)
MLton7 -- mlton -profile drop  (don't drop profiling from ILs, but
                                don't actually implement anything in
                                implementProfiling)
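For reference, the run-time ratios in the table below are simply each
configuration's wall-clock time divided by the -profile no (MLton0)
baseline. A minimal sketch of that computation; the benchmark names are
from the table, but the timings here are hypothetical:

```python
# Sketch: computing the "run time ratio" reported below.
# Times are hypothetical; the benchmark harness measures wall-clock
# time for each compiled configuration.

baseline = {"checksum": 2.00, "fib": 1.50}   # MLton0 (-profile no)
variant  = {"checksum": 3.30, "fib": 2.09}   # e.g. MLton7

# ratio > 1.00 means the variant is slower than the baseline
ratios = {b: variant[b] / baseline[b] for b in baseline}
for b in sorted(ratios):
    print(f"{b:<12} {ratios[b]:.2f}")
```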
run time ratio
benchmark MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut 1.00 1.04 1.04 1.04 1.03 1.04 1.04 1.04
boyer 1.00 1.02 1.03 1.02 1.02 1.02 1.02 1.04
checksum 1.00 1.00 1.04 1.00 1.02 1.03 1.00 1.65
count-graphs 1.00 1.08 1.03 1.04 1.00 1.01 1.02 1.08
DLXSimulator 1.00 1.00 1.08 0.97 0.97 0.97 0.98 0.97
fft 1.00 0.98 0.97 0.95 0.95 1.06 1.05 0.97
fib 1.00 1.10 1.10 1.10 1.10 1.10 1.10 1.39
flat-array 1.00 1.25 1.05 0.96 1.04 1.08 0.96 0.96
hamlet 1.00 1.12 1.04 1.04 1.04 1.04 1.04 1.08
imp-for 1.00 1.00 0.99 0.99 1.03 0.99 0.99 0.99
knuth-bendix 1.00 1.10 1.10 1.10 1.10 1.10 1.10 1.24
lexgen 1.00 0.98 1.07 0.98 1.03 1.02 0.98 1.02
life 1.00 1.03 1.08 1.06 1.03 1.04 1.09 1.01
logic 1.00 1.04 0.96 0.96 0.96 0.96 0.96 1.00
mandelbrot 1.00 0.99 1.01 1.03 0.99 0.99 0.99 0.99
matrix-multiply 1.00 1.00 0.99 1.00 1.06 1.08 1.01 0.99
md5 1.00 1.00 1.25 1.25 1.40 1.40 1.40 1.40
merge 1.00 1.00 1.00 1.00 1.00 1.00 1.02 1.16
mlyacc 1.00 1.11 1.14 1.05 1.07 1.01 1.01 1.02
model-elimination 1.00 0.92 0.91 0.91 0.92 0.99 0.91 1.03
mpuz 1.00 1.01 0.95 0.95 0.95 0.95 0.95 1.00
nucleic 1.00 0.92 0.92 0.92 0.92 0.92 0.92 0.92
output1 1.00 0.94 0.94 0.97 0.97 0.94 0.94 0.97
peek 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.25
psdes-random 1.00 1.10 1.07 1.13 1.07 1.07 1.11 1.11
ratio-regions 1.00 1.04 1.09 1.09 1.09 1.09 1.09 1.18
ray 1.00 1.00 1.01 1.07 1.01 1.01 1.01 1.03
raytrace 1.00 1.04 1.03 1.03 1.02 1.03 1.07 1.04
simple 1.00 1.06 1.07 1.00 0.99 0.99 0.99 1.15
smith-normal-form 1.00 0.97 1.07 1.02 0.97 0.97 0.97 0.97
tailfib 1.00 1.00 0.96 0.96 0.96 0.96 0.96 0.96
tak 1.00 1.34 1.35 1.31 1.31 1.31 1.31 1.36
tensor 1.00 1.01 0.81 0.81 0.81 0.81 0.84 0.96
tsp 1.00 1.00 1.00 1.03 1.04 1.00 1.00 1.01
tyan 1.00 1.06 1.19 1.13 1.06 1.10 1.18 1.08
vector-concat 1.00 1.02 1.01 1.00 1.00 1.00 1.00 0.99
vector-rev 1.00 1.09 0.99 1.11 0.99 0.99 0.99 0.99
vliw 1.00 0.98 0.99 1.07 1.01 0.99 0.98 1.03
wc-input1 1.00 1.02 1.02 1.02 1.02 1.05 1.06 0.99
wc-scanStream 1.00 1.02 1.01 1.01 1.01 1.01 1.01 0.95
zebra 1.00 0.98 0.99 1.02 0.98 0.99 0.98 0.97
zern 1.00 1.00 0.99 0.99 0.99 0.99 0.98 1.00
Of the benchmarks that have a ratio >= 1.2 between keeping profiling all
the way through (MLton7) and no profiling (MLton0), there is no single
culprit:
checksum 1.00 1.00 1.04 1.00 1.02 1.03 1.00 1.65
fib 1.00 1.10 1.10 1.10 1.10 1.10 1.10 1.39
knuth-bendix 1.00 1.10 1.10 1.10 1.10 1.10 1.10 1.24
md5 1.00 1.00 1.25 1.25 1.40 1.40 1.40 1.40
peek 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.25
tak 1.00 1.34 1.35 1.31 1.31 1.31 1.31 1.36
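The >= 1.2 cutoff used above can be expressed directly; a sketch over a
subset of the table, with the ratios copied from the MLton7 column:

```python
# Sketch: selecting benchmarks whose MLton7/MLton0 ratio is >= 1.2.
# Ratios are a subset of the MLton7 column from the table above.

mlton7 = {
    "barnes-hut": 1.04, "checksum": 1.65, "fib": 1.39,
    "knuth-bendix": 1.24, "md5": 1.40, "merge": 1.16,
    "peek": 1.25, "tak": 1.36,
}

outliers = sorted(b for b, r in mlton7.items() if r >= 1.2)
print(outliers)
```

Note that merge, at 1.16, falls just below the cutoff.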
fib, knuth-bendix, and tak suggest that there may be missed
simplifications before the SSA optimizations. md5 is missing something in
the SSA optimizations, and again in the SSA2 optimizations. checksum,
fib, knuth-bendix, and peek each seem to exhibit some cost incurred by
implementing profiling. (Though, with -profile drop, implementProfiling
should essentially just erase the profiling annotations.)
The way forward is clear -- investigate md5 to isolate the missing
optimizations, then investigate pre-SSA optimizations using tak, and
finally try to understand the cost of implementing profiling using
checksum. But I'm probably going to take a break from this; since all
the benchmarks above are essentially small loops, I'm hopeful that
profiling has minimal impact on real programs.