[MLton] Constant Folding of FP Operations
Matthew Fluet
fluet at tti-c.org
Sun Jun 1 18:03:26 PDT 2008
On Sun, 1 Jun 2008, Matthew Fluet wrote:
> On Fri, 23 May 2008, Vesa Karvonen wrote:
>> Attached is an experimental patch that improves MLton's constant folding
>> of floating point operations. The problem with constant folding FP
>> operations in SML is that the FP ops are subject to rounding mode
>> settings:
>>
>> http://www.standardml.org/Basis/ieee-float.html#SIG:IEEE_REAL.setRoundingMode:VAL
>>
>> The workaround used in the patch is to evaluate the operations in all
>> rounding modes (actually in only TO_NEGINF and TO_POSINF) and check that
>> the results agree. This ensures that constant folding is correct in all
>> rounding modes.
>
> Seems like a good optimization, and a sound one. Did you run the benchmarks
> and observe any speedups?
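
The agreement check described in the quoted text can be sketched as follows. This is only an illustration, not the code from the patch: Python cannot switch the hardware rounding mode, so the sketch emulates TO_NEGINF and TO_POSINF with exact rational arithmetic. The names round_down, round_up and try_fold are hypothetical, math.nextafter assumes Python 3.9 or later, and only finite, in-range operands are handled.

```python
from fractions import Fraction
import math
import operator


def round_down(exact):
    """Largest float <= exact (emulates TO_NEGINF for in-range values)."""
    nearest = float(exact)  # Fraction -> float rounds to nearest
    if Fraction(nearest) > exact:
        return math.nextafter(nearest, -math.inf)
    return nearest


def round_up(exact):
    """Smallest float >= exact (emulates TO_POSINF for in-range values)."""
    nearest = float(exact)
    if Fraction(nearest) < exact:
        return math.nextafter(nearest, math.inf)
    return nearest


def try_fold(op, a, b):
    """Return the folded constant, or None if folding is not safe.

    Folding is accepted only when both directed roundings agree.
    Assumption: a and b are finite floats and op does not divide by zero.
    """
    exact = op(Fraction(a), Fraction(b))
    if exact == 0:
        # Conservative: the sign of a zero result (e.g. x - x) can depend
        # on the rounding mode, so leave it to be computed at run time.
        return None
    lo, hi = round_down(exact), round_up(exact)
    return lo if lo == hi else None


print(try_fold(operator.add, 0.5, 0.25))     # exact result, folds
print(try_fold(operator.add, 0.1, 0.2))      # inexact result, not folded
print(try_fold(operator.truediv, 1.0, 3.0))  # inexact result, not folded
```

Under this emulation the two directed results agree exactly when the mathematical result is representable, so only exact operations are folded.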
On amd64-linux, I get the following:
MLton0 -- ~/devel/mlton/mlton.svn.trunk/build/bin/mlton -codegen amd64
MLton1 -- ~/devel/mlton/mlton.svn.trunk/build/bin/mlton -codegen c
MLton2 -- ~/devel/mlton/mlton.svn.trunk/build.real-cf/bin/mlton -codegen amd64
MLton3 -- ~/devel/mlton/mlton.svn.trunk/build.real-cf/bin/mlton -codegen c
run time ratio (relative to MLton0)
benchmark           MLton0  MLton1  MLton2  MLton3
barnes-hut            1.00    1.04    0.84    0.88
boyer                 1.00    1.12    1.03    1.12
checksum              1.00    5.72    1.01    5.72
count-graphs          1.00    0.92    1.01    0.92
DLXSimulator          1.00    1.08    0.98    1.06
fft                   1.00    1.05    1.00    1.06
fib                   1.00    1.40    1.04    1.40
flat-array            1.00    1.60    0.99    1.61
hamlet                1.00    1.65    1.00    1.68
imp-for               1.00    1.46    0.99    1.47
knuth-bendix          1.00    1.36    1.00    1.36
lexgen                1.00    1.05    1.00    1.04
life                  1.00    1.00    1.00    1.00
logic                 1.00    1.08    1.00    1.09
mandelbrot            1.00    1.36    0.96    1.35
matrix-multiply       1.00    0.91    1.00    0.91
md5                   1.00    6.94    1.00    6.94
merge                 1.00    1.05    1.00    1.05
mlyacc                1.00    1.08    1.00    1.08
model-elimination     1.00    1.25    0.99    1.26
mpuz                  1.00    1.93    1.00    1.93
nucleic               1.00    1.00    0.99    0.99
output1               1.00    1.23    1.00    1.24
peek                  1.00    0.91    1.05    0.92
psdes-random          1.00    0.83    0.99    0.83
ratio-regions         1.00    1.21    1.00    1.33
ray                   1.00    1.06    0.94    1.05
raytrace              1.00    1.13    0.96    1.05
simple                1.00    1.40    1.06    1.42
smith-normal-form     1.00    1.01    1.01    1.01
tailfib               1.00    1.87    1.06    1.87
tak                   1.00    1.14    1.00    1.13
tensor                1.00    1.57    1.01    1.57
tsp                   1.00    1.01    1.00    1.01
tyan                  1.00    1.18    1.02    1.18
vector-concat         1.00    1.08    1.00    1.07
vector-rev            1.00    1.40    1.00    1.47
vliw                  1.00    1.35    1.01    1.38
wc-input1             1.00    1.00    1.00    1.00
wc-scanStream         1.00    1.21    0.99    1.21
zebra                 1.00    0.79    1.00    0.79
zern                  1.00    1.62    1.00    1.64
This shows a speedup beyond the noise only on barnes-hut (and maybe ray).
I'd really like to know a good way of cutting down the noise in the
benchmarks; consider that fib and tailfib (which use no FP operations, and
so yield identical assembly code) show slowdowns of 1.04 and 1.06,
respectively, from MLton0 to MLton2.
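
As a worked check of how the ratios relate to the raw numbers, here is a small sketch that recomputes a few rows of the ratio table from the raw run times given later in this message. The dictionary holds only four benchmarks copied from the run time table; the helper name ratios is hypothetical and not part of the MLton benchmark script.

```python
# Raw run times (MLton0, MLton1, MLton2, MLton3), copied from the
# "run time" table in this message.
run_times = {
    "barnes-hut": (18.06, 18.74, 15.15, 15.94),
    "fib":        (37.68, 52.67, 39.24, 52.68),
    "tailfib":    (22.48, 41.96, 23.79, 41.95),
    "ray":        (14.96, 15.84, 14.09, 15.66),
}


def ratios(times):
    """Normalize each run time against the first (MLton0) column."""
    base = times[0]
    return tuple(round(t / base, 2) for t in times)


for name, times in run_times.items():
    print(name, ratios(times))
```

Since fib and tailfib compile to identical code under MLton0 and MLton2, their 1.04 and 1.06 ratios give a rough idea of the noise in these runs; the 0.94 for ray is of the same size as that noise, while the 0.84 for barnes-hut is well outside it.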
The rest of the benchmark data follows:
size (bytes)
benchmark              MLton0     MLton1     MLton2     MLton3
barnes-hut            165,614    170,237    167,663    172,286
boyer                 218,529    220,105    218,529    220,105
checksum               98,257    105,473     98,257    105,473
count-graphs          124,401    127,073    124,401    127,073
DLXSimulator          201,324    210,004    201,324    210,004
fft                   120,687    127,772    120,655    127,708
fib                    98,225     97,321     98,225     97,321
flat-array             97,681     96,913     97,681     96,913
hamlet              1,509,177  1,542,601  1,508,809  1,545,585
imp-for                97,969     97,105     97,969     97,105
knuth-bendix          177,004    186,044    177,004    186,044
lexgen                291,003    318,683    291,003    318,683
life                  122,257    118,777    122,257    118,777
logic                 182,497    182,665    182,497    182,665
mandelbrot             97,857    100,545     97,841    100,529
matrix-multiply        99,969    102,225     99,969    102,225
md5                   132,252    142,588    132,252    142,588
merge                  99,601    106,417     99,601    106,417
mlyacc                663,259    704,187    663,259    704,187
model-elimination     865,986    953,682    866,002    953,666
mpuz                  104,241    112,273    104,241    112,273
nucleic               273,760    256,196    273,760    256,196
output1               141,056    148,688    141,056    148,688
peek                  137,804    143,212    137,804    143,212
psdes-random          101,169     99,665    101,169     99,665
ratio-regions         125,649    135,905    125,649    135,905
ray                   249,400    257,839    248,728    258,223
raytrace              378,114    397,078    374,530    392,054
simple                347,593    377,012    346,777    376,548
smith-normal-form     276,332    292,884    276,332    292,884
tailfib                97,713     96,897     97,713     96,897
tak                    98,273     97,273     98,273     97,273
tensor                167,507    174,971    167,507    174,971
tsp                   144,827    151,658    144,363    151,354
tyan                  217,644    229,268    217,644    229,268
vector-concat          99,617     98,457     99,617     98,457
vector-rev             99,217     98,281     99,217     98,281
vliw                  528,426    616,042    526,874    614,490
wc-input1             164,522    169,826    164,522    169,826
wc-scanStream         175,258    184,914    175,258    184,914
zebra                 217,196    219,948    217,196    219,948
zern                  135,302    140,707    135,318    140,403
compile time (seconds)
benchmark           MLton0  MLton1  MLton2  MLton3
barnes-hut            9.78   12.15   10.89   12.87
boyer                10.80   23.17   10.32   22.40
checksum              7.70    7.94    7.66    8.25
count-graphs          8.29    9.43    8.93    9.58
DLXSimulator         10.81   15.09   10.92   15.61
fft                   8.19    9.04    8.53    9.21
fib                   7.74    7.89    8.22    8.56
flat-array            7.56    7.58    8.18    8.13
hamlet               44.43  111.25   48.09  112.88
imp-for               7.75    7.79    7.71    7.96
knuth-bendix          9.51   13.01    9.86   13.41
lexgen               12.25   18.86   12.61   19.02
life                  8.31    9.84    8.27    9.29
logic                10.09   13.89    9.90   14.41
mandelbrot            7.77    7.80    8.04    8.39
matrix-multiply       8.33    8.52    7.82    8.13
md5                   8.64   10.02    8.74   10.50
merge                 7.60    7.82    7.86    8.20
mlyacc               27.15   44.67   26.66   44.90
model-elimination    25.60   57.05   25.57   57.73
mpuz                  7.92    8.23    8.05    8.34
nucleic              11.75   23.32   11.74   23.16
output1               8.57   10.34    8.72   10.81
peek                  8.45   10.05    8.94   10.31
psdes-random          7.68    8.16    8.04    8.34
ratio-regions         8.97   10.33    9.42   10.98
ray                  11.82   17.10   11.64   17.38
raytrace             15.24   26.74   14.96   26.36
simple               13.48   22.12   13.04   21.70
smith-normal-form    11.55   54.59   12.06   52.01
tailfib               7.63    7.94    7.85    8.10
tak                   7.64    8.04    7.88    8.17
tensor               10.38   13.41   10.48   13.56
tsp                   9.25   10.70    9.55   11.00
tyan                 10.77   15.96   10.89   16.04
vector-concat         7.44    7.76    7.92    8.20
vector-rev            7.59    7.80    8.10    7.89
vliw                 19.42   35.39   19.30   35.81
wc-input1             9.85   12.02   10.13   12.06
wc-scanStream         9.60   12.95    9.90   12.64
zebra                11.04   15.44   11.02   15.17
zern                  8.28    9.75    8.85    9.81
run time (seconds)
benchmark           MLton0  MLton1  MLton2  MLton3
barnes-hut           18.06   18.74   15.15   15.94
boyer                53.59   59.96   55.25   60.11
checksum             18.73  107.23   18.83  107.20
count-graphs         26.00   24.04   26.24   24.05
DLXSimulator         28.01   30.34   27.39   29.75
fft                  14.61   15.34   14.54   15.45
fib                  37.68   52.67   39.24   52.68
flat-array           29.27   46.93   29.02   47.00
hamlet               50.41   82.93   50.26   84.74
imp-for              26.97   39.41   26.73   39.56
knuth-bendix         24.96   34.07   24.90   34.01
lexgen               22.16   23.22   22.11   23.01
life                 26.40   26.39   26.45   26.39
logic                23.22   25.07   23.31   25.41
mandelbrot           21.64   29.39   20.71   29.21
matrix-multiply      35.93   32.61   35.81   32.74
md5                  33.22  230.39   33.24  230.60
merge                48.85   51.19   49.00   51.41
mlyacc               26.21   28.28   26.19   28.23
model-elimination    38.80   48.56   38.28   48.76
mpuz                 23.48   45.41   23.44   45.35
nucleic              18.37   18.41   18.18   18.25
output1              37.22   45.94   37.34   46.09
peek                 21.91   19.99   23.06   20.05
psdes-random         16.05   13.26   15.88   13.28
ratio-regions       124.21  150.12  124.00  164.76
ray                  14.96   15.84   14.09   15.66
raytrace             17.23   19.49   16.49   18.15
simple               27.70   38.85   29.36   39.24
smith-normal-form     8.48    8.59    8.59    8.59
tailfib              22.48   41.96   23.79   41.95
tak                  32.33   36.78   32.45   36.44
tensor               22.72   35.64   22.98   35.59
tsp                  25.33   25.69   25.28   25.59
tyan                 27.55   32.38   28.06   32.51
vector-concat        27.96   30.27   27.97   29.98
vector-rev           37.32   52.38   37.26   54.74
vliw                 23.96   32.38   24.31   33.13
wc-input1            34.72   34.87   34.85   34.61
wc-scanStream        28.70   34.74   28.31   34.84
zebra                30.52   24.01   30.54   24.12
zern                 22.54   36.43   22.59   36.90