local refs
Matthew Fluet
Matthew Fluet <fluet@CS.Cornell.EDU>
Fri, 30 Nov 2001 18:46:14 -0500 (EST)
> > Compile times still disappointing; run times unchanged.
>
> I don't see anything noticeably bad compile-time wise, and the
> runtimes look on the whole OK. Code sizes are even a percent or two
> better.
Yeah, I knew that top-level handler stuff was killing us. ;)
Here's how wc-input1 sped up:
[fluet@lennon wc-input1]$ mlprof -d 1 wc-input1.old mlmon.old.out
4.16 seconds of CPU time
main_0 79.57%
loop_21 51.66%
L_163 10.27%
L_164 7.25%
L_91 5.74%
L_92 5.14%
L_44 4.83%
L_162 4.53%
L_36 2.72%
L_47 2.42%
L_45 2.42%
L_41 2.11%
L_37 0.30%
loop_10 0.30%
L_42 0.30%
<unknown> 11.30%
Thread_atomicEnd (C) 6.97%
GC_doGC (C) 2.16%
L_165 ()
x_315 = Ref_ref (global_26)
x_316 = Ref_ref (global_26)
x_313 = Ref_ref (global_2)
x_312 = Ref_ref (global_2)
x_317 = Array_array (global_25)
x_309 = (x_317, x_316, x_315, x_314, x_313, x_312)
x_311 = Ref_deref (openIns_0)
x_310 = ::_5 (x_311, x_309)
Ref_assign (openIns_0, x_310)
x_233 = Ref_ref (x_309)
loop_21 (global_2)
loop_21 (x_152)
x_300 = Ref_deref (x_233)
x_162 = #6 x_300
x_160 = #5 x_300
x_157 = #1 x_300
x_303 = Ref_deref (x_160)
x_308 = Ref_deref (x_162)
x_307 = Int_lt (x_303, x_308)
case x_307 of
false => L_161 | true => L_164
[fluet@lennon wc-input1]$ mlprof -d 1 wc-input1.new mlmon.new.out
3.55 seconds of CPU time
main_0 75.77%
loop_21 31.23%
L_90 16.36%
L_159 16.36%
L_89 9.29%
L_160 7.43%
L_42 4.83%
L_34 4.09%
L_45 2.97%
L_43 2.97%
L_39 2.23%
loop_8 0.74%
L_161 0.37%
L_37 0.37%
loop_9 0.37%
L_44 0.37%
<unknown> 13.52%
Thread_atomicEnd (C) 9.30%
GC_doGC (C) 1.41%
L_161 ()
x_268 = Ref_ref (global_26)
x_197 = Ref_ref (global_26)
x_156 = Ref_ref (global_2)
x_158 = Ref_ref (global_2)
x_155 = Array_array (global_25)
x_296 = (x_155, x_197, x_268, x_220, x_156, x_158)
x_297 = Ref_deref (openIns_0)
x_295 = ::_5 (x_297, x_296)
Ref_assign (openIns_0, x_295)
loop_21 (global_2)
loop_21 (x_150)
x_290 = Ref_deref (x_156)
x_294 = Ref_deref (x_158)
x_293 = Int_lt (x_290, x_294)
case x_293 of
false => L_157 | true => L_160
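As a rough check on where the time went: the nested percentages under main_0 sum to about 100 in both runs, so they look like shares of main_0's time rather than of total CPU time. That is an inference from the numbers, not something the mlprof output states. Under that assumption, a small Python sketch (the names `runs` and `loop_seconds` are made up for illustration) converts the loop_21 percentages into seconds:

```python
# Figures copied from the two mlprof runs above.
runs = {
    "old": {"total_s": 4.16, "main_0_pct": 79.57, "loop_21_pct": 51.66},
    "new": {"total_s": 3.55, "main_0_pct": 75.77, "loop_21_pct": 31.23},
}

def loop_seconds(run):
    # Assumption: loop_21's percentage is a share of main_0's time,
    # and main_0's percentage is a share of total CPU time.
    main_s = run["total_s"] * run["main_0_pct"] / 100
    return main_s * run["loop_21_pct"] / 100

old_s = loop_seconds(runs["old"])
new_s = loop_seconds(runs["new"])
total_change = (runs["new"]["total_s"] / runs["old"]["total_s"] - 1) * 100

print(f"loop_21 old: {old_s:.2f}s  new: {new_s:.2f}s")
print(f"total CPU time change: {total_change:.1f}%")
```

If the assumption holds, loop_21 itself drops from roughly 1.7 seconds to roughly 0.8 seconds, which would account for more than the whole difference in total CPU time.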
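So, it wasn't an accumulator ref; with the local ref x_233 gone, loop_21
reads the known components of the tuple (x_156 and x_158) directly and
we avoid an extra level of indirection (the Ref_deref of x_233 plus the
#5 and #6 selects).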
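To make the difference between the two loop headers concrete, here is a toy Python model, not MLton code: the `Ref` class, its dereference counter, and the field values are all invented for illustration. It only mirrors the shape of the IL above, where the old loop dereferences the boxed tuple and then two field refs, while the new loop dereferences the two field refs directly.

```python
class Ref:
    """Toy stand-in for an ML ref cell that counts dereferences."""
    derefs = 0

    def __init__(self, value):
        self.value = value

    def get(self):
        Ref.derefs += 1
        return self.value

def make_stream():
    # Six fields, like the tuple in the IL; the contents are made up.
    pos, end = Ref(0), Ref(10)
    return ("buf", Ref(None), Ref(None), "x", pos, end)

def old_test(boxed):
    # Old shape: deref the outer ref, select fields #5 and #6, deref each.
    t = boxed.get()
    return t[4].get() < t[5].get()

def new_test(pos, end):
    # New shape: the field refs are known, so the outer deref disappears.
    return pos.get() < end.get()

stream = make_stream()
boxed = Ref(stream)

Ref.derefs = 0
old_result = old_test(boxed)
old_derefs = Ref.derefs

Ref.derefs = 0
new_result = new_test(stream[4], stream[5])
new_derefs = Ref.derefs

print(old_result, old_derefs, new_result, new_derefs)
```

Both versions compute the same comparison; the model just counts three dereferences per test in the old shape against two in the new one, matching the Ref_deref operations in the two loop_21 headers.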
Here's wc-scanStream:
[fluet@lennon wc-scanStream]$ mlprof -d 1 wc-scanStream.old mlmon.old.out
6.38 seconds of CPU time
main_0 81.19%
loop_40 25.48%
input1_0 24.90%
L_182 21.04%
L_186 5.60%
L_185 4.83%
L_184 4.63%
L_183 3.47%
L_44 2.70%
L_45 1.54%
L_36 1.35%
L_47 1.16%
L_42 0.97%
L_41 0.77%
L_46 0.58%
L_178 0.19%
loop_9 0.19%
loop_8 0.19%
L_43 0.19%
L_165 0.19%
<unknown> 12.54%
Thread_atomicEnd (C) 5.33%
GC_doGC (C) 0.94%
loop_40 (x_245, x_349, x_348, x_347, x_346, x_345)
input1_0 (x_349, x_348, x_347, x_346, x_345)
input1_0 (x_342, x_336, x_305, x_306, x_304)
x_344 = Array_length (x_336)
x_343 = Int_geu (x_342, x_344)
case x_343 of
false => L_186 | true => L_187
L_182 ()
loop_40 (x_245, x_337, x_336, x_305, x_306, x_304)
[fluet@lennon wc-scanStream]$ nm wc-scanStream.old | grep loop_40
0804d44a t MLtonProfile916$$0.main_0$$1.loop_40$$2.loop_40$$Begin
0804d44a t loop_40
[fluet@lennon wc-scanStream]$ nm wc-scanStream.old | grep input1_0
0804d486 t MLtonProfile917$$0.main_0$$1.input1_0$$2.input1_0$$Begin
0804d486 t input1_0
[fluet@lennon wc-scanStream]$ mlprof -d 1 wc-scanStream.new mlmon.new.out
7.23 seconds of CPU time
main_0 81.47%
loop_37 32.94%
L_165 22.75%
input1_0 22.41%
L_167 3.90%
L_168 3.57%
L_166 3.06%
L_169 2.72%
L_42 2.55%
L_34 1.53%
L_45 1.53%
L_39 1.36%
L_43 1.19%
L_36 0.17%
loop_9 0.17%
L_44 0.17%
<unknown> 13.00%
Thread_atomicEnd (C) 4.70%
GC_doGC (C) 0.69%
GC_gc (C) 0.14%
loop_37 (x_248, x_317, x_316, x_315, x_314, x_313)
input1_0 (x_317, x_316, x_315, x_314, x_313)
input1_0 (x_310, x_304, x_273, x_274, x_272)
x_312 = Array_length (x_304)
x_311 = Int_geu (x_310, x_312)
case x_311 of
false => L_169 | true => L_170
L_165 ()
loop_37 (x_248, x_305, x_304, x_273, x_274, x_272)
[fluet@lennon wc-scanStream]$ nm wc-scanStream.new | grep loop_37
0804d104 t MLtonProfile893$$0.main_0$$1.loop_37$$2.loop_37$$Begin
0804d104 t loop_37
[fluet@lennon wc-scanStream]$ nm wc-scanStream.new | grep input_1
[fluet@lennon wc-scanStream]$ nm wc-scanStream.new | grep input1_0
0804d140 t MLtonProfile894$$0.main_0$$1.input1_0$$2.input1_0$$Begin
0804d140 t input1_0
No idea why this one got slower (6.38 to 7.23 seconds); the old and new
code look pretty much the same, and new even looks like it has better
loop alignment.
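The alignment remark can be checked from the nm addresses above. A minimal Python sketch, assuming that "alignment" here means the largest power of two dividing the symbol's address (the `alignment` helper and the `symbols` table are illustrative names, and which entry points actually matter for performance is not established by this output):

```python
def alignment(addr):
    # Largest power of two that divides addr, i.e. its lowest set bit.
    return addr & -addr

# Addresses copied from the nm output above.
symbols = {
    "old loop_40": 0x0804D44A,
    "old input1_0": 0x0804D486,
    "new loop_37": 0x0804D104,
    "new input1_0": 0x0804D140,
}

for name, addr in symbols.items():
    print(f"{name:14s} {addr:#010x} aligned to {alignment(addr)} bytes")
```

By this measure both entry points are only 2-byte aligned in the old binary, while in the new one loop_37 is 4-byte aligned and input1_0 falls on a 64-byte boundary.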
I'm going to go ahead and check in what I have.