new liveness pass
Stephen Weeks
sweeks@intertrust.com
Fri, 1 Dec 2000 17:06:46 -0800 (PST)
Most excellent. The new liveness pass that I wrote sped up allocate registers
by quite a bit. For a self compile, it cut down the time from 84 seconds to 11
seconds. The pass is based on the liveness algorithm described in section 4.13,
page 132, of Morgan's "Building and Optimizing Compiler". BTW, the Dragon book
and Muchnick's book provided no help at all on speeding up liveness. They
suggest using bit-vectors, which is infeasible for MLton due to the large size
of CPS functions.
Anyways, here is a description of the new algorithm.
Walk over the whole program and
1. Build the predecessor graph of basic blocks. Each basic block records the
set of its predecessors and the set of variables live at the beginning of
the block.
2. For each variable record the block in which is defined and the list of
blocks where it is used.
Now, for each variable, propagate the liveness information backwards from uses
along basic blocks until the definition block is reached.
That's it. The reason why it's so fast is that it processes one variable at a
time, and hence the operation to determine if that variable is in the live list
for a particular block is constant time -- the variable is either at the head of
the list or it's not there.
Matthew, maybe you can use some variant of this algorithm, if we don't figure
out how to avoid doing liveness twice altogether.
Also, here is the log for a self compile with the new pass. The self compile
time is down to 765 seconds. This is the first time we're under 800! The funny
thing is, the last self compile was at 948 seconds, so the difference can not be
explained by this pass alone. The (unchanged) x86 codegen sped up by 59 seconds!
My only conjecture is that the backend is leaving around some property lists on
labels, which is slowing down the codegen. The new liveness pass doesn't
leave quite as many around, so maybe that was helping. I'll look into making
sure the backend cleans up after itself.
% cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 3
cpu MHz : 731.480093
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
bogomips : 729.09
Compiling mlton (takes a while)
time mlton -v -no-polyvariance mlton.cm
MLton 20001130 (built Fri Dec 1 15:47:44 2000 on starlinux.epr.com)
created this file on Fri Dec 1 15:49:53 2000.
Do not edit this file.
Flag settings:
aux: false
chunk: coalesce 2000
contify strategy: Both
defines: [NODEBUG,MLton_safe=TRUE,MLton_detectOverflow=TRUE]
fixed heap: None
indentation: 3
includes: [mlton.h]
inline: NonRecursive {product = 320,small = 60}
input file: mlton.cm
instrument: false
instrument Sxml: false
keep Cps: false
match: left to right
messages: true
native: true
native-commented: 0
native-copy-prop: true
native-ieee-fp: false
native-move-hoist: true
native-optimize: 1
native-split: Some(100000)
polyvariance: None
print at fun entry: false
profile: false
show types: false
compile starting
parse and elaborate starting
parse and elaborate finished in 54.500
core-ml size is 89,907,652 bytes
numPeeks = 14
average position in property list = 0.0
numPeeks = 2450808
average position in bucket = 0.221
dead starting
dead finished in 0.090
basis size is 867,352 bytes
numPeeks = 72743
average position in property list = 0.0
numPeeks = 2450808
average position in bucket = 0.221
size = 187411
gcc -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileuOZ0Ia /tmp/fileuiNTnF.c -L/home/sweeks/mlton/lib -lmlton -lm -lgmp
/tmp/fileuOZ0Ia /tmp/fileeaXAYV
infer starting
unification starting
unification finished in 1.570
finish infer starting
finish infer finished in 20.950
infer finished in 23.290
xml.unsimplified size is 37,952,500 bytes
numPeeks = 1077855
average position in property list = 0.000
numPeeks = 2589647
average position in bucket = 0.271
infer simplify starting
infer simplify finished in 3.510
xml size is 22,331,948 bytes
numPeeks = 4570806
average position in property list = 0.546
numPeeks = 2589647
average position in bucket = 0.271
size = 122292
num types in program = 20501
num types in table = 35249
hash table size is 0 bytes
mono starting
mono finished in 5.540
mono.unsimplified size is 42,350,988 bytes
numPeeks = 9074433
average position in property list = 0.275
numPeeks = 3306823
average position in bucket = 0.601
mono simplify starting
mono simplify finished in 5.740
mono size is 34,543,836 bytes
numPeeks = 13789800
average position in property list = 0.428
numPeeks = 3306823
average position in bucket = 0.601
size = 191119
num types in program = 12946
num types in table = 64373
hash table size is 0 bytes
implement exceptions starting
implement exceptions finished in 0.320
sxml.unsimplified size is 35,101,760 bytes
numPeeks = 14086561
average position in property list = 0.419
numPeeks = 3309224
average position in bucket = 0.602
implement exceptions simplify starting
implement exceptions simplify finished in 2.970
sxml size is 33,016,448 bytes
numPeeks = 17578537
average position in property list = 0.470
numPeeks = 3309224
average position in bucket = 0.602
polyvariance starting
polyvariance finished in 0.0
sxml.poly size is 33,016,448 bytes
numPeeks = 17578537
average position in property list = 0.470
numPeeks = 3309224
average position in bucket = 0.602
size = 178729
num types in program = 12513
num types in table = 64653
hash table size is 0 bytes
closure convert starting
flow analysis starting
flow analysis finished in 0.760
flow size is 4,252 bytes
numPeeks = 18621855
average position in property list = 0.443
numPeeks = 3325768
average position in bucket = 0.605
free variables starting
free variables finished in 0.380
globalize starting
globalize finished in 0.290
convert starting
convert finished in 32.310
closure convert finished in 34.180
cps.unsimplified size is 69,048,712 bytes
numPeeks = 25297174
average position in property list = 1.679
numPeeks = 3685461
average position in bucket = 0.672
closure convert simplify starting
simplify starting
num functions 12069
num local functions 138606
num primExps 156489
numPeeks = 25548537
average position in property list = 1.662
numPeeks = 3685461
average position in bucket = 0.672
remove-unused starting
remove-unused finished in 4.440
num functions 10422
num local functions 80482
num primExps 139488
numPeeks = 28693549
average position in property list = 1.483
numPeeks = 3685786
average position in bucket = 0.672
leaf-inline starting
inline starting
inline finished in 4.020
leaf-inline finished in 4.020
num functions 7864
num local functions 57906
num primExps 138322
numPeeks = 30304546
average position in property list = 1.411
numPeeks = 3685786
average position in bucket = 0.672
raise-to-jump starting
inferHandlers starting
inferHandlers finished in 0.160
raise-to-jump finished in 4.650
num functions 7864
num local functions 57524
num primExps 138295
numPeeks = 31724934
average position in property list = 1.357
numPeeks = 3685786
average position in bucket = 0.672
contify starting
contify finished in 3.110
num functions 3668
num local functions 54020
num primExps 130444
numPeeks = 32978353
average position in property list = 1.312
numPeeks = 3689431
average position in bucket = 0.674
constant propagation starting
inferHandlers starting
inferHandlers finished in 0.140
fixed point starting
fixed point finished in 3.620
constant propagation finished in 7.940
num functions 3668
num local functions 53397
num primExps 95704
numPeeks = 35346094
average position in property list = 1.244
numPeeks = 3823941
average position in bucket = 0.823
useless starting
analyze starting
analyze finished in 5.060
useless finished in 9.970
num functions 3668
num local functions 50915
num primExps 87502
numPeeks = 37873441
average position in property list = 1.186
numPeeks = 3961408
average position in bucket = 0.838
remove-unused starting
remove-unused finished in 2.550
num functions 3606
num local functions 49804
num primExps 85347
numPeeks = 39588129
average position in property list = 1.136
numPeeks = 3961500
average position in bucket = 0.838
simplify-types starting
fixed point starting
fixed point finished in 0.060
simplify-types finished in 2.950
num functions 3606
num local functions 41655
num primExps 82150
numPeeks = 41543485
average position in property list = 1.095
numPeeks = 3966951
average position in bucket = 0.838
poly-equal starting
poly-equal finished in 0.170
num functions 3618
num local functions 42299
num primExps 82663
numPeeks = 41667937
average position in property list = 1.091
numPeeks = 3967620
average position in bucket = 0.838
contify starting
contify finished in 2.140
num functions 3520
num local functions 42285
num primExps 82537
numPeeks = 42529380
average position in property list = 1.072
numPeeks = 3967718
average position in bucket = 0.838
inline starting
inline finished in 4.390
num functions 978
num local functions 67568
num primExps 134281
numPeeks = 44670916
average position in property list = 1.026
numPeeks = 3967718
average position in bucket = 0.838
remove-unused starting
remove-unused finished in 0.860
num functions 978
num local functions 64888
num primExps 133203
numPeeks = 47282825
average position in property list = 0.971
numPeeks = 3967731
average position in bucket = 0.838
raise-to-jump starting
inferHandlers starting
inferHandlers finished in 0.170
raise-to-jump finished in 3.440
num functions 978
num local functions 64832
num primExps 133178
numPeeks = 48835397
average position in property list = 0.944
numPeeks = 3967731
average position in bucket = 0.838
contify starting
contify finished in 3.080
num functions 977
num local functions 64830
num primExps 133176
numPeeks = 50122736
average position in property list = 0.923
numPeeks = 3967732
average position in bucket = 0.838
introduce-loops starting
introduce-loops finished in 0.050
num functions 977
num local functions 64854
num primExps 133176
numPeeks = 50297035
average position in property list = 0.920
numPeeks = 3967756
average position in bucket = 0.838
loop-invariant starting
loop-invariant finished in 3.010
num functions 977
num local functions 62335
num primExps 126505
numPeeks = 51676987
average position in property list = 0.899
numPeeks = 3967756
average position in bucket = 0.838
flatten starting
analyze starting
analyze finished in 0.150
flatten finished in 3.700
num functions 977
num local functions 62413
num primExps 84082
numPeeks = 53677400
average position in property list = 0.871
numPeeks = 3970447
average position in bucket = 0.838
redundant starting
redundant finished in 0.770
num functions 977
num local functions 62413
num primExps 84082
numPeeks = 54394802
average position in property list = 0.861
numPeeks = 3970447
average position in bucket = 0.838
remove-unused starting
remove-unused finished in 2.900
num functions 977
num local functions 62119
num primExps 82433
numPeeks = 56609014
average position in property list = 0.829
numPeeks = 3970460
average position in bucket = 0.838
simplify finished in 74.090
closure convert simplify finished in 74.090
cps size is 49,689,064 bytes
numPeeks = 56609014
average position in property list = 0.829
numPeeks = 3970460
average position in bucket = 0.838
backend starting
compute representations starting
compute representations finished in 0.030
inferHandlers starting
inferHandlers finished in 0.140
chunkify starting
chunkify finished in 2.810
allocate registers starting
allocate registers finished in 10.570
reg size is 4,420 bytes
numPeeks = 67004509
average position in property list = 0.916
numPeeks = 3970460
average position in bucket = 0.838
backend finished in 15.630
size is 59,136,316 bytes
numPeeks = 68683322
average position in property list = 0.923
numPeeks = 3986796
average position in bucket = 0.838
x86 code gen starting
outputC starting
outputC finished in 0.440
outputAssembly starting
translateChunk totals 18.250
simplify totals 103.800
verifyLiveInfo totals 31.100
computeJumpInfo totals 2.680
elimGoto totals 13.200
elimIff: 2
elimSwitch: 17
elimSimpleGoto totals 2.660
elimComplexGoto totals 3.980
verifyJumpInfo totals 1.350
peepholeBlock_pre totals 2.030
commuteBinALMD: 476
elimBinAL0L: 0
elimBinAL0R: 0
elimAddSub1: 1787
elimMDPow2: 183
toLivenessBlock totals 9.820
moveHoist totals 14.080
peepholeLivenessBlock totals 9.840
elimALCopy: 17039
elimFltACopy: 20
elimDeadDsts: 92
elimSelfMove: 1076
elimFltSelfMove: 0
commuteBinALMD: 1059
commuteFltBinA: 17
conditionalJump: 3249
copyPropagate totals 5.190
peepholeLivenessBlock_minor totals 4.920
elimDeadDsts_minor: 0
elimSelfMove_minor: 0
elimFltSelfMove_minor: 0
verifyLivenessBlock totals 0.0
toBlock totals 0.530
peepholeBlock_post totals 4.930
elimBinALMDouble: 33
elimFltBinADouble: 0
elimCMPTST: 0
generateTransfers totals 2.290
allocateRegisters totals 322.390
toLiveness totals 133.150
toNoLiveness totals 0.010
Assembly.allocateRegisters totals 188.560
Instruction.allocateRegisters totals 98.020
pre totals 22.920
post totals 34.130
allocateOperand totals 24.600
allocateFltOperand totals 0.0
allocateFltStackOperands totals 0.0
Directive.allocateRegisters totals 13.560
validate totals 0.0
outputAssembly finished in 452.100
x86 code gen finished in 496.480
numPeeks = 73619953
average position in property list = 0.925
numPeeks = 4048821
average position in bucket = 0.841
compile finished in 737.110
gcc -S -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filebOiKTi.s /tmp/fileW6a8tX.c
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileVuChSy.o /tmp/filebOiKTi.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filesavxTO.o /tmp/filemC95zY.9.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filexn25xG.o /tmp/fileBXQlhQ.8.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileNUFuxn.o /tmp/fileDYOgyZ.7.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filexEGwPJ.o /tmp/fileRJxRHU.6.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileWxcryO.o /tmp/fileDhBbVB.5.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileOq5dTS.o /tmp/filey4L0Da.4.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filebdLNzJ.o /tmp/filelifDPe.3.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileABQjvN.o /tmp/fileYyPBMd.2.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filer5RlPz.o /tmp/filemVttax.1.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileIn50AH.o /tmp/file4EVnXu.0.s
gcc -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o mlton /tmp/fileVuChSy.o /tmp/fileIn50AH.o /tmp/filer5RlPz.o /tmp/fileABQjvN.o /tmp/filebdLNzJ.o /tmp/fileOq5dTS.o /tmp/fileWxcryO.o /tmp/filexEGwPJ.o /tmp/fileNUFuxn.o /tmp/filexn25xG.o /tmp/filesavxTO.o -L/home/sweeks/mlton/lib -lmlton -lm -lgmp
size mlton
757.02user 7.94system 12:47.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (19269major+397606minor)pagefaults 0swaps
text data bss dec hex filename
3703961 646120 23272 4373353 42bb69 mlton