<div dir="ltr">On Mon, Jul 28, 2008 at 1:30 PM, Wesley W. Terpstra <span dir="ltr">&lt;<a href="mailto:terpstra@gmail.com">terpstra@gmail.com</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div dir="ltr"><div class="gmail_quote"><div>PIC code actually improves performance in all but one very rare case that I've made conditional based on if the output format is a library.</div></div></div></blockquote><div>
<br>Well, it seems to reduce executable size by about 2%, but has no measurable effect on performance. I've succeeded in running the entire regression suite and a full MLton build in both executable and library form, so I think the modified assembly is OK. For reference, here are the differences in file sizes and execution times for the benchmark tests. The tensor result was not reproducible.<br>
<br>MLton0 -- /usr/bin/mlton<br>MLton1 -- /home/terpstra/mlton/build/bin/mlton<br>run time ratio<br>benchmark MLton0 MLton1<br>barnes-hut 1.00 1.00<br>boyer 1.00 1.01<br>checksum 1.00 1.00<br>
count-graphs 1.00 1.00<br>DLXSimulator 1.00 1.00<br>fft 1.00 1.00<br>fib 1.00 0.99<br>flat-array 1.00 1.00<br>hamlet 1.00 1.00<br>imp-for 1.00 1.00<br>
knuth-bendix 1.00 1.01<br>lexgen 1.00 1.00<br>life 1.00 1.00<br>logic 1.00 1.00<br>mandelbrot 1.00 0.98<br>matrix-multiply 1.00 1.00<br>md5 1.00 1.00<br>
merge 1.00 0.99<br>mlyacc 1.00 1.00<br>model-elimination 1.00 1.00<br>mpuz 1.00 1.00<br>nucleic 1.00 1.00<br>output1 1.00 1.00<br>peek 1.00 1.05<br>
psdes-random 1.00 0.97<br>ratio-regions 1.00 1.00<br>ray 1.00 1.00<br>raytrace 1.00 1.00<br>simple 1.00 1.00<br>smith-normal-form 1.00 1.00<br>tailfib 1.00 1.00<br>
tak 1.00 1.00<br>tensor 1.00 1.14<br>tsp 1.00 0.99<br>tyan 1.00 1.00<br>vector-concat 1.00 0.99<br>vector-rev 1.00 1.00<br>vliw 1.00 1.00<br>
wc-input1 1.00 0.96<br>wc-scanStream 1.00 1.00<br>zebra 1.00 1.02<br>zern 1.00 1.00<br>size<br>benchmark MLton0 MLton1<br>barnes-hut 166,333 164,541<br>
boyer 218,916 214,324<br>checksum 98,692 98,372<br>count-graphs 124,724 123,716<br>DLXSimulator 201,687 199,191<br>fft 121,394 120,706<br>fib 98,628 98,324<br>
flat-array 98,084 97,812<br>hamlet 1,509,072 1,480,528<br>imp-for 98,372 98,068<br>knuth-bendix 177,383 174,743<br>lexgen 291,254 286,374<br>life 122,660 121,860<br>
logic 182,916 181,444<br>mandelbrot 98,244 97,972<br>matrix-multiply 100,388 99,940<br>md5 132,679 131,543<br>merge 100,036 99,716<br>mlyacc 663,302 654,902<br>
model-elimination 866,113 851,409<br>mpuz 104,660 104,244<br>nucleic 274,691 269,571<br>output1 141,379 139,795<br>peek 138,199 136,663<br>psdes-random 101,524 101,012<br>
ratio-regions 125,860 124,756<br>ray 249,951 246,015<br>raytrace 378,729 373,129<br>simple 348,096 341,680<br>smith-normal-form 276,727 272,359<br>tailfib 98,116 97,812<br>
tak 98,676 98,372<br>tensor 167,762 165,794<br>tsp 145,462 144,070<br>tyan 217,959 214,487<br>vector-concat 100,020 99,700<br>vector-rev 99,620 99,252<br>
vliw 528,453 520,453<br>wc-input1 169,601 167,185<br>wc-scanStream 175,633 173,201<br>zebra 217,591 216,199<br>zern 136,013 134,893<br>compile time<br>
benchmark MLton0 MLton1<br>barnes-hut 8.77 8.07<br>boyer 8.50 8.56<br>checksum 6.27 6.35<br>count-graphs 6.82 6.92<br>DLXSimulator 8.67 8.86<br>fft 6.66 6.69<br>
fib 6.25 6.26<br>flat-array 6.28 6.28<br>hamlet 35.31 37.89<br>imp-for 6.32 6.40<br>knuth-bendix 7.73 7.84<br>lexgen 10.12 10.08<br>life 6.79 6.84<br>
logic 8.06 8.09<br>mandelbrot 6.28 6.36<br>matrix-multiply 6.30 6.41<br>md5 7.02 7.00<br>merge 6.28 6.38<br>mlyacc 21.76 22.13<br>model-elimination 20.14 20.65<br>
mpuz 6.46 6.49<br>nucleic 9.58 9.44<br>output1 7.06 7.07<br>peek 7.08 7.08<br>psdes-random 6.28 6.38<br>ratio-regions 7.42 7.63<br>ray 9.43 9.41<br>
raytrace 12.71 12.33<br>simple 10.83 10.84<br>smith-normal-form 9.38 9.48<br>tailfib 6.33 6.34<br>tak 6.27 6.24<br>tensor 8.48 8.49<br>tsp 7.31 7.28<br>
tyan 8.99 9.08<br>vector-concat 6.20 6.34<br>vector-rev 6.29 6.31<br>vliw 15.92 16.05<br>wc-input1 7.62 7.84<br>wc-scanStream 7.79 8.02<br>zebra 9.19 9.06<br>
zern 7.02 7.10<br>run time<br>benchmark MLton0 MLton1<br>barnes-hut 14.83 14.85<br>boyer 44.68 44.98<br>checksum 14.96 14.96<br>count-graphs 22.08 21.99<br>
DLXSimulator 22.41 22.49<br>fft 12.69 12.74<br>fib 32.96 32.78<br>flat-array 21.55 21.56<br>hamlet 41.78 41.75<br>imp-for 22.39 22.35<br>knuth-bendix 20.71 20.96<br>
lexgen 18.20 18.24<br>life 22.89 22.89<br>logic 20.15 20.13<br>mandelbrot 17.45 17.06<br>matrix-multiply 32.28 32.36<br>md5 28.05 27.96<br>merge 37.35 36.98<br>
mlyacc 21.18 21.21<br>model-elimination 32.01 32.08<br>mpuz 19.66 19.68<br>nucleic 15.21 15.17<br>output1 30.85 30.93<br>peek 18.12 19.07<br>psdes-random 13.22 12.81<br>
ratio-regions 100.76 101.11<br>ray 12.23 12.27<br>raytrace 14.33 14.34<br>simple 22.80 22.75<br>smith-normal-form 4.95 4.95<br>tailfib 18.21 18.26<br>tak 27.08 27.08<br>
tensor 19.11 21.70<br>tsp 19.38 19.23<br>tyan 23.37 23.37<br>vector-concat 22.96 22.84<br>vector-rev 29.38 29.40<br>vliw 19.68 19.64<br>wc-input1 29.64 28.52<br>
wc-scanStream 23.43 23.36<br>zebra 25.37 25.82<br>zern 18.41 18.37<br><br></div></div></div>
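For anyone wanting to re-derive the summary figures, here is a minimal Python sketch. It assumes the "run time ratio" column is simply the MLton1 time divided by the MLton0 time, which agrees with the rows checked here. The helper names (`ratio`, `percent_reduction`, `summarize`) are made up for illustration and are not part of MLton's benchmark scripts; the numbers are copied from the "run time" and "size" tables above.

```python
# A few rows copied from the "run time" and "size" tables in this post.
# Each entry is (MLton0, MLton1).
RUN_TIME = {
    "tensor": (19.11, 21.70),
    "peek": (18.12, 19.07),
    "wc-input1": (29.64, 28.52),
}
SIZE = {
    "hamlet": (1_509_072, 1_480_528),
    "checksum": (98_692, 98_372),
}


def ratio(old, new):
    """New measurement relative to the baseline (1.00 means unchanged).

    Assumption: this is how the "run time ratio" column was computed.
    """
    return new / old


def percent_reduction(old, new):
    """Percentage by which `new` is smaller than `old`."""
    return 100.0 * (old - new) / old


def summarize():
    """Return one formatted line per sampled benchmark."""
    lines = []
    for name, (old, new) in RUN_TIME.items():
        lines.append(f"{name} run time ratio {ratio(old, new):.2f}")
    for name, (old, new) in SIZE.items():
        lines.append(f"{name} size reduction {percent_reduction(old, new):.2f}%")
    return lines


if __name__ == "__main__":
    print("\n".join(summarize()))
```

On the sampled rows the size reduction varies with program size: the near-minimal executables such as checksum shrink far less than a large program such as hamlet.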