sigaltstack and cygwin
Matthew Fluet
Matthew Fluet <fluet@CS.Cornell.EDU>
Wed, 6 Mar 2002 12:03:06 -0500 (EST)
> OK. No problem. I just checked in the changes to put the assumes in
> and all the tests passed, with one exception. For testing, I turned
> on "reserveEsp" (defined in x86-codegen.fun) for all compiles instead
> of just for Cygwin programs that use signals. With that, the "slower"
> regression test gives a segfault. I'm not sure whether I missed an
> assume or there is a register allocator bug. Matthew, if you could
> take a look, that would be great. Just change reserveEsp in
> x86-codegen.fun to true.
Fixed. It was really easy and I should have remembered it earlier. There
is a hack in the register allocator to _not_ adjust c_stackP after the
last C call in a basic block. The reasoning is that, after the final C call,
popping all the arguments by adding to %esp would simply make %esp equal to
c_stackP, which we'll just reload in the next block that makes a C call.
So, I was saving one instruction per C call. In Cygwin (or with
reserveEsp = true) this is very bad, because we never refetch c_stackP
from memory; we just assume it's up to date in %esp. In slower.sml,
there is a loop that executes 4304967296 times, making two C calls each
time through -- the unpopped arguments pile up and we simply ran out of
stack space.
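
To make the failure mode concrete, here is a toy model of the bookkeeping described above. It is only a sketch, not MLton's register allocator: the names run_loop, C_STACK_P and ARG_BYTES and the numbers are made up for illustration, and it assumes both C calls sit in the same basic block.

```python
# Toy model of %esp bookkeeping; NOT MLton's code generator.
C_STACK_P = 1000   # hypothetical value of c_stackP (top of the C stack)
ARG_BYTES = 8      # hypothetical bytes of arguments pushed per C call


def run_loop(iterations, calls_per_block, pop_after_last_call, reload_esp):
    """Return the final %esp after running `iterations` basic blocks.

    pop_after_last_call=False models the hack (skip the final addl).
    reload_esp=False models reserveEsp = true (%esp is never refetched).
    """
    esp = C_STACK_P
    for _ in range(iterations):
        if reload_esp:
            esp = C_STACK_P            # refetch c_stackP from memory
        for call in range(calls_per_block):
            esp -= ARG_BYTES           # push the arguments
            # ... the C function would run here ...
            is_last = call == calls_per_block - 1
            if pop_after_last_call or not is_last:
                esp += ARG_BYTES       # addl $ARG_BYTES, %esp
    return esp


hack_with_reload = run_loop(100, 2, False, True)      # stays near the top
hack_reserved_esp = run_loop(100, 2, False, False)    # drifts every iteration
no_hack_reserved_esp = run_loop(100, 2, True, False)  # back at C_STACK_P
print(hack_with_reload, hack_reserved_esp, no_hack_reserved_esp)
```

With the reload, the skipped addl is harmless because %esp is reset at the start of the next block. Without it, every trip through the loop leaks ARG_BYTES of stack, which over billions of iterations exhausts any real stack.
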
Anyways, I eliminated that hack. Here are the benchmark results:
MLton0 -- mlton-stable
MLton1 -- mlton
MLton2 -- mlton -native-reserve-esp true
compile time
benchmark MLton0 MLton1 MLton2
barnes-hut 2.09 2.04 2.01
checksum 0.52 0.48 0.48
count-graphs 1.38 1.38 1.37
DLXSimulator 3.76 3.71 3.71
fft 1.04 1.03 1.05
fib 0.46 0.44 0.44
hamlet 45.11 44.86 44.75
imp-for 0.48 0.46 0.47
knuth-bendix 1.82 1.79 1.79
lexgen 4.85 4.83 4.83
life 1.05 1.02 1.02
logic 2.36 2.33 2.33
mandelbrot 0.49 0.47 0.46
matrix-multiply 0.54 0.52 0.53
md5 1.01 0.99 0.98
merge 0.49 0.49 0.48
mlyacc 18.20 18.15 18.15
mpuz 0.68 0.66 0.64
nucleic 2.26 2.21 2.21
peek 0.83 0.78 0.80
psdes-random 0.53 0.51 0.51
ratio-regions 1.98 1.95 1.98
ray 3.02 3.02 2.95
raytrace 9.07 9.03 9.02
simple 6.13 6.02 6.05
smith-normal-form 7.01 6.95 6.97
tailfib 0.44 0.45 0.45
tak 0.45 0.46 0.45
tensor 2.51 2.50 2.49
tsp 1.22 1.18 1.21
tyan 3.21 3.18 3.15
vector-concat 0.51 0.51 0.52
vector-rev 0.52 0.49 0.50
vliw 10.96 10.81 10.76
wc-input1 1.38 1.34 1.36
wc-scanStream 1.45 1.42 1.38
zebra 4.90 5.10 5.11
zern 0.90 0.85 0.87
run time
benchmark MLton0 MLton1 MLton2
barnes-hut 3.73 3.74 3.75
checksum 3.18 3.18 3.31
count-graphs 3.54 3.54 3.76
DLXSimulator 14.58 14.58 14.58
fft 8.76 8.77 8.82
fib 3.37 3.37 3.37
hamlet 7.20 7.15 7.20
imp-for 7.33 6.61 6.61
knuth-bendix 5.64 5.64 5.52
lexgen 9.33 9.35 9.37
life 5.04 5.11 4.84
logic 17.65 17.70 17.58
mandelbrot 6.06 6.06 6.06
matrix-multiply 2.42 2.42 2.40
md5 1.76 1.76 1.80
merge 48.12 48.12 48.31
mlyacc 8.65 8.65 8.72
mpuz 4.26 4.26 4.34
nucleic 8.00 8.00 8.00
peek 0.82 0.92 0.82
psdes-random 2.78 2.78 3.14
ratio-regions 8.13 8.12 8.19
ray 3.36 3.34 3.26
raytrace 4.86 4.86 4.90
simple 5.84 5.84 5.98
smith-normal-form 0.67 0.67 0.67
tailfib 10.96 10.96 10.95
tak 7.74 7.74 7.74
tsp 7.51 7.51 7.52
tyan 16.04 16.08 16.06
vector-concat 2.56 2.56 3.16
vector-rev 4.27 4.26 4.30
vliw 5.68 5.68 5.65
wc-input1 1.92 1.92 1.67
wc-scanStream 1.96 1.96 2.20
zebra 1.77 1.76 1.68
zern 32.07 32.10 32.01
run time ratio
benchmark MLton1 MLton2
barnes-hut 1.00 1.01
checksum 1.00 1.04
count-graphs 1.00 1.06
DLXSimulator 1.00 1.00
fft 1.00 1.01
fib 1.00 1.00
hamlet 0.99 1.00
imp-for 0.90 0.90
knuth-bendix 1.00 0.98
lexgen 1.00 1.00
life 1.01 0.96
logic 1.00 1.00
mandelbrot 1.00 1.00
matrix-multiply 1.00 0.99
md5 1.00 1.02
merge 1.00 1.00
mlyacc 1.00 1.01
mpuz 1.00 1.02
nucleic 1.00 1.00
peek 1.12 1.00
psdes-random 1.00 1.13
ratio-regions 1.00 1.01
ray 1.00 0.97
raytrace 1.00 1.01
simple 1.00 1.02
smith-normal-form 1.00 1.00
tailfib 1.00 1.00
tak 1.00 1.00
tsp 1.00 1.00
tyan 1.00 1.00
vector-concat 1.00 1.23
vector-rev 1.00 1.01
vliw 1.00 0.99
wc-input1 1.00 0.87
wc-scanStream 1.00 1.12
zebra 1.00 0.95
zern 1.00 1.00
size
benchmark MLton0 MLton1 MLton2
barnes-hut 57,275 57,499 55,195
checksum 23,537 23,569 23,505
count-graphs 45,009 45,073 44,081
DLXSimulator 88,569 88,697 87,193
fft 33,569 33,601 33,153
fib 23,569 23,569 23,505
hamlet 1,101,560 1,103,640 1,098,744
imp-for 23,569 23,569 23,505
knuth-bendix 64,994 65,122 63,586
lexgen 149,569 149,825 146,497
life 40,273 40,273 39,217
logic 80,657 80,657 80,049
mandelbrot 23,633 23,633 23,601
matrix-multiply 24,113 24,145 24,113
md5 33,218 33,346 32,930
merge 24,785 24,817 24,593
mlyacc 464,577 465,697 458,369
mpuz 28,145 28,145 27,953
nucleic 62,545 62,545 61,809
peek 32,194 32,290 31,778
psdes-random 25,009 25,041 24,977
ratio-regions 43,281 43,313 43,153
ray 84,312 84,632 81,688
raytrace 237,349 237,669 235,173
simple 180,537 180,825 178,233
smith-normal-form 138,667 138,731 136,363
tailfib 23,281 23,281 23,217
tak 23,697 23,697 23,633
tensor 56,970 57,002 56,074
tsp 38,594 38,690 38,466
tyan 85,666 85,954 83,266
vector-concat 24,497 24,497 24,369
vector-rev 24,465 24,497 24,337
vliw 295,665 296,945 286,769
wc-input1 48,666 48,730 47,386
wc-scanStream 49,370 49,434 48,090
zebra 110,178 110,242 106,242
zern 31,168 31,232 30,720
tensor is raising a runtime exception; is it a known problem?
mlton-stable is code from yesterday. mlton is the checked in code (i.e.,
with the %esp add hack removed). mlton -native-reserve-esp true is the
checked in code with reserveEsp forced to true in the codegen.
Results are mixed. Reserving %esp can either hurt or help: it hurts
through the added register pressure, and it can help when there are a lot
of C calls (wc-input1) and the cost of fetching c_stackP is the bottleneck.
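
For reference, the run time ratio table is each run time divided by the MLton0 run time for the same benchmark. A small sketch that recomputes three of the rows discussed here, with the numbers copied from the run time table (the helper name ratios is just for illustration):

```python
# Run times copied from the "run time" table: (MLton0, MLton1, MLton2).
run_time = {
    "peek": (0.82, 0.92, 0.82),
    "wc-input1": (1.92, 1.92, 1.67),
    "vector-concat": (2.56, 2.56, 3.16),
}


def ratios(times):
    """Ratios of MLton1 and MLton2 to the MLton0 baseline, to 2 decimals."""
    base, *others = times
    return tuple(round(t / base, 2) for t in others)


for name, times in run_time.items():
    print(name, *("%.2f" % r for r in ratios(times)))
```

A ratio below 1.00 means the build ran faster than mlton-stable; above 1.00 means it ran slower.
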
The esp hack seems not to have much effect, except on peek. Nothing
obvious is going on there; the assembly for mlton-stable and mlton is
identical except for 49 addl instructions that are omitted in
mlton-stable.
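
One way to check a claim like this is to diff the two assembly listings line by line and look only at the lines that one side adds. The sketch below uses Python's difflib on two tiny made-up listings; the instructions shown are hypothetical and are not taken from peek's actual output.

```python
import difflib


def extra_lines(old, new):
    """Lines present in `new` but not in `old`, per difflib's line diff."""
    return [d[2:] for d in difflib.ndiff(old, new) if d.startswith("+ ")]


# Hypothetical listings: `current` pops the arguments, `stable` does not.
stable = ["pushl %eax", "call foo", "movl %ebx,%ecx"]
current = ["pushl %eax", "call foo", "addl $4,%esp", "movl %ebx,%ecx"]

added = extra_lines(stable, current)
removed = extra_lines(current, stable)
print(len(added), "added:", added)
print(len(removed), "removed:", removed)
```

If every added line is an addl and nothing is removed, the two listings differ only by the dropped adjustments.
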