x86 update
Stephen Weeks
MLton@sourcelight.com
Tue, 12 Dec 2000 10:04:05 -0800 (PST)
> Here's my latest version of the x86 backend. Changelog looks like:
Integration was successful. I put the latest snapshot at
http://www.star-lab.com/sweeks/mlton.tgz
> - improved verifyLiveInfo pass;
> it's about 5X - 8X faster than before;
On the self compile, this sped up from about 36s (on 12/7) to 23s.
> - eliminated code related to supporting MachineOutput.Operand.Void variant
I actually managed to fix the backend so that Void is no longer in the datatype.
> - modified the signatures for outputC and outputS (file generators)
I went ahead and modified them a bit more: their results are now records.
> Steve, I think you should be able to connect the backend's Switch support
> to your revised MachineOutput Cases with a minimum of difficulty.
> Hopefully, you should just need to uncomment the appropriate lines in
> x86-translate.fun; look for
> (* differentiate between the types of MachineOutput.cases *)
> for the appropriate lines.
That went smoothly as well.
Here are the changes to the x86 stuff in the current snapshot.
* List.flattenX has been renamed List.concatX for consistency. All calls
should be changed accordingly.
* x86-codegen.{sig,fun}
record result types for outputC and outputS
* x86-translate
took out Void operands
took out Operand.toString (it now appears in MACHINE_OUTPUT)
* x86-allocate-registers
Gave one example of using folds instead of maps and flattens to save
allocation. Look for (* added by sweeks *).
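A minimal sketch of the map-and-flatten pattern, in Standard ML using only Basis functions (List.map, List.concat, List.foldr). The function names and the "n and n + 100" expansion are hypothetical and chosen only for illustration; this is not the code from x86-allocate-registers.

```sml
(* map + concat: List.map builds a list of two-element lists, and
   List.concat then copies every one of those small lists. *)
fun expandMapConcat (l : int list) : int list =
   List.concat (List.map (fn n => [n, n + 100]) l)

(* fold: conses both entries straight onto the accumulator, so neither
   the two-element lists nor the list of lists is ever allocated. *)
fun expandFold (l : int list) : int list =
   List.foldr (fn (n, acc) => n :: (n + 100) :: acc) [] l

val a = expandMapConcat [1, 2, 3]
val b = expandFold [1, 2, 3]
```

Both calls should produce [1, 101, 2, 102, 3, 103]. The fold version pays off most when the mapped function returns a small literal list, since the fold can cons those elements directly instead of allocating the small list and then copying it.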
Matthew, I noticed a lot of uses of List.concat and List.map in your code, and a
lot of other opportunities for "deforestation" or more efficient uses of list
operations. In general, List.fold should be used where possible, since it makes
a single pass over the list without building intermediate lists. For example,
in x86-peephole, there is a call
(List.concat l) @ l'
which could be more efficiently implemented as
List.fold (rev l, l', op @)
There are lots of other examples. It should be possible to cut down allocation
quite a bit in the backend by going through the code with an eye for such
things.
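To make the x86-peephole example concrete, here is a self-contained Standard ML sketch. MLton's own List.fold is not part of the Standard Basis, so the sketch defines a stand-in fold, assuming it is a left fold taking (list, initial value, step function); that is the reading under which List.fold (rev l, l', op @) equals (List.concat l) @ l'. The helper names are hypothetical.

```sml
(* Assumed stand-in for MLton's List.fold: a left fold taking
   (list, initial accumulator, step). Not a Standard Basis function. *)
fun fold (l : 'a list, b : 'b, f : 'a * 'b -> 'b) : 'b =
   List.foldl f b l

(* Original: List.concat copies every inner list into a fresh list,
   and @ then copies that whole result again to put l' at the end. *)
fun concatThenAppend (l : 'a list list, l' : 'a list) : 'a list =
   List.concat l @ l'

(* Suggested: walk the inner lists from last to first, prepending each
   onto an accumulator that starts as l', so each element of l is
   copied once (rev l allocates only the outer spine). *)
fun foldAppend (l : 'a list list, l' : 'a list) : 'a list =
   fold (rev l, l', op @)

(* Standard Basis equivalent without an explicit rev. *)
fun foldrAppend (l : 'a list list, l' : 'a list) : 'a list =
   List.foldr (op @) l' l

val l = [[1, 2], [3], [], [4, 5]]
val r1 = concatThenAppend (l, [6])
val r2 = foldAppend (l, [6])
val r3 = foldrAppend (l, [6])
```

All three should give [1, 2, 3, 4, 5, 6]. Whether List.foldr avoids allocating a reversed spine internally depends on the Basis implementation, so the explicit rev in the fold version is not necessarily more expensive.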
In any case, here is the latest self compile log. Overall, things seem to have
slowed down a bit, but nothing insane. I should also point out that we should
occasionally run self compiles without -v, because they will run quite a bit
faster (the calls to MLton.size will be avoided). For example, on the current
snapshot with -v the self compile time is 838s, but without -v it is 761s.
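As a quick check on those numbers: the 838s figure appears to be the user plus system time reported by time at the end of the log below, and dropping -v saves roughly 9%:

```latex
\begin{aligned}
t_{\text{with -v}} &= 829.63 + 8.44 = 838.07\ \text{s} \\
\frac{t_{\text{with -v}} - t_{\text{without -v}}}{t_{\text{with -v}}}
  &\approx \frac{838 - 761}{838} = \frac{77}{838} \approx 0.092
\end{aligned}
```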
--------------------------------------------------------------------------------
time mlton -v -no-polyvariance mlton.cm
MLton internal (built Tue Dec 12 09:17:44 2000 on starlinux.epr.com)
created this file on Tue Dec 12 09:19:55 2000.
Do not edit this file.
Flag settings:
aux: false
chunk: chunk per function
contify strategy: Both
defines: [NODEBUG,MLton_safe=TRUE,MLton_detectOverflow=TRUE]
fixed heap: None
indentation: 3
includes: [mlton.h]
inline: NonRecursive {product = 320,small = 60}
input file: mlton.cm
instrument: false
instrument Sxml: false
keep Cps: false
match: left to right
messages: true
native: true
native-commented: 0
native-copy-prop: true
native-future: 64
native-ieee-fp: false
native-move-hoist: true
native-optimize: 1
native-split: Some(100000)
polyvariance: None
print at fun entry: false
profile: false
show types: false
compile starting
parse and elaborate starting
parse and elaborate finished in 62.920
core-ml size is 89,950,008 bytes
numPeeks = 14
average position in property list = 0.0
numPeeks = 2441584
average position in bucket = 0.177
lexAndParse totals 11.400
elaborate totals 51.480
dead starting
dead finished in 0.090
basis size is 825,068 bytes
numPeeks = 73995
average position in property list = 0.0
numPeeks = 2441584
average position in bucket = 0.177
size = 189848
gcc -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileiZ0ADv /tmp/filetN3NUk.c -L/home/sweeks/mlton/lib -lmlton -lm -lgmp
/tmp/fileiZ0ADv /tmp/fileleCdKY
infer starting
unification starting
unification finished in 2.580
finish infer starting
finish infer finished in 19.460
infer finished in 22.380
xml.unsimplified size is 36,842,408 bytes
numPeeks = 1107165
average position in property list = 0.000
numPeeks = 2582402
average position in bucket = 0.231
infer simplify starting
infer simplify finished in 2.980
xml size is 20,490,044 bytes
numPeeks = 3350460
average position in property list = 0.100
numPeeks = 2582402
average position in bucket = 0.231
size = 122793
num types in program = 21354
num types in table = 36561
hash table size is 0 bytes
mono starting
mono finished in 6.810
mono.unsimplified size is 43,832,764 bytes
numPeeks = 8047380
average position in property list = 0.042
numPeeks = 3329919
average position in bucket = 0.635
mono simplify starting
mono simplify finished in 3.790
mono size is 35,129,612 bytes
numPeeks = 11257104
average position in property list = 0.079
numPeeks = 3329919
average position in bucket = 0.635
size = 198308
num types in program = 13484
num types in table = 67408
hash table size is 0 bytes
implement exceptions starting
implement exceptions finished in 0.330
sxml.unsimplified size is 35,692,220 bytes
numPeeks = 11564742
average position in property list = 0.077
numPeeks = 3331607
average position in bucket = 0.635
implement exceptions simplify starting
implement exceptions simplify finished in 4.290
sxml size is 33,512,940 bytes
numPeeks = 14032337
average position in property list = 0.088
numPeeks = 3331607
average position in bucket = 0.635
polyvariance starting
polyvariance finished in 0.0
sxml.poly size is 33,512,940 bytes
numPeeks = 14032337
average position in property list = 0.088
numPeeks = 3331607
average position in bucket = 0.635
size = 184786
num types in program = 13043
num types in table = 67690
hash table size is 0 bytes
closure convert starting
flow analysis starting
flow analysis finished in 2.630
flow size is 4,608 bytes
numPeeks = 15112408
average position in property list = 0.082
numPeeks = 3348816
average position in bucket = 0.638
free variables starting
free variables finished in 0.370
globalize starting
globalize finished in 0.330
convert starting
convert finished in 28.630
closure convert finished in 32.450
cps.unsimplified size is 69,407,292 bytes
numPeeks = 22040929
average position in property list = 1.730
numPeeks = 3395663
average position in bucket = 0.637
closure convert simplify starting
simplify starting
num functions 12564
num local functions 143841
num primExps 161589
removeUnused starting
removeUnused finished in 3.340
num functions 10831
num local functions 83706
num primExps 143790
leaf-inline starting
inline starting
inline finished in 4.010
leaf-inline finished in 4.010
num functions 8195
num local functions 59130
num primExps 141922
raise-to-jump starting
inferHandlers starting
inferHandlers finished in 0.170
raise-to-jump finished in 4.630
num functions 8195
num local functions 58747
num primExps 141896
contify starting
contify finished in 2.760
num functions 3742
num local functions 54823
num primExps 133167
constantPropagation starting
inferHandlers starting
inferHandlers finished in 0.140
fixed point starting
fixed point finished in 3.100
constantPropagation finished in 7.650
num functions 3742
num local functions 54243
num primExps 97735
useless starting
analyze starting
analyze finished in 4.860
useless finished in 9.960
num functions 3742
num local functions 51799
num primExps 89011
removeUnused starting
removeUnused finished in 0.610
num functions 3672
num local functions 50505
num primExps 86692
simplifyTypes starting
fixed point starting
fixed point finished in 0.050
simplifyTypes finished in 2.950
num functions 3672
num local functions 42038
num primExps 83443
poly-equal starting
poly-equal finished in 0.160
num functions 3684
num local functions 42675
num primExps 83942
contify starting
contify finished in 2.160
num functions 3584
num local functions 42657
num primExps 83822
inline starting
inline finished in 4.320
num functions 999
num local functions 67665
num primExps 136619
removeUnused starting
removeUnused finished in 4.540
num functions 999
num local functions 65381
num primExps 135578
raise-to-jump starting
inferHandlers starting
inferHandlers finished in 0.180
raise-to-jump finished in 1.510
num functions 999
num local functions 65325
num primExps 135553
contify starting
contify finished in 3.060
num functions 998
num local functions 65323
num primExps 135551
introduce-loops starting
introduce-loops finished in 0.050
num functions 998
num local functions 65349
num primExps 135551
loop-invariant starting
loop-invariant finished in 2.990
num functions 998
num local functions 62428
num primExps 127775
flatten starting
analyze starting
analyze finished in 0.160
flatten finished in 3.800
num functions 998
num local functions 62510
num primExps 86735
redundant starting
redundant finished in 2.950
num functions 998
num local functions 62510
num primExps 86735
removeUnused starting
removeUnused finished in 0.730
num functions 998
num local functions 62208
num primExps 85089
simplify finished in 75.430
closure convert simplify finished in 75.430
cps size is 50,671,716 bytes
numPeeks = 53906183
average position in property list = 0.792
numPeeks = 3662642
average position in bucket = 0.810
backend starting
compute representations starting
compute representations finished in 0.020
inferHandlers starting
inferHandlers finished in 0.160
chunkify starting
chunkify finished in 0.060
allocate registers starting
allocate registers finished in 8.690
backend finished in 10.550
size is 58,777,972 bytes
numPeeks = 62188646
average position in property list = 0.772
numPeeks = 3663640
average position in bucket = 0.810
x86 code gen starting
outputC starting
outputC finished in 0.370
outputAssembly starting
translateChunk totals 16.640
simplify totals 93.230
verifyLiveInfo totals 22.870
computeJumpInfo totals 1.190
elimGoto totals 6.490
elimIff: 3
elimSwitch: 37
elimSimpleGoto totals 0.990
elimComplexGoto totals 0.850
verifyJumpInfo totals 0.0
peepholeBlock_pre totals 3.840
commuteBinALMD: 508
elimAddSub1: 1790
elimMDPow2: 180
toLivenessBlock totals 17.750
moveHoist totals 12.390
peepholeLivenessBlock totals 8.330
elimALCopy: 17476
elimFltACopy: 23
elimDeadDsts: 102
elimSelfMove: 1072
elimFltSelfMove: 0
commuteBinALMD: 1037
commuteFltBinA: 17
conditionalJump: 2930
copyPropagate totals 8.360
peepholeLivenessBlock_minor totals 2.130
elimDeadDsts_minor: 0
elimSelfMove_minor: 0
elimFltSelfMove_minor: 0
verifyLivenessBlock totals 0.0
toBlock totals 0.560
peepholeBlock_post totals 5.090
elimBinALMDDouble: 33
elimFltBinADouble: 0
elimCMPTST: 0
generateTransfers totals 3.600
allocateRegisters totals 398.270
toLiveness totals 209.460
toNoLiveness totals 0.0
Assembly.allocateRegisters totals 187.900
Instruction.allocateRegisters totals 102.470
pre totals 23.720
post totals 35.890
allocateOperand totals 21.870
allocateFltOperand totals 0.0
allocateFltStackOperands totals 0.0
Directive.allocateRegisters totals 24.360
validate totals 0.0
outputAssembly finished in 522.050
x86 code gen finished in 563.870
numPeeks = 69437503
average position in property list = 0.906
numPeeks = 3741207
average position in bucket = 0.831
compile finished in 806.380
gcc -S -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileLvc9HL.s /tmp/fileOREyEX.c
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileXIzt7u.o /tmp/fileLvc9HL.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileY1iOe3.o /tmp/fileV704d8.9.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filebiDCQs.o /tmp/fileYbt8Uv.8.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileNUjWDs.o /tmp/fileFE54Yu.7.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filewXXwnS.o /tmp/filexqdwsb.6.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filefoLEfe.o /tmp/file46MCcC.5.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileWGGyfJ.o /tmp/fileV8si3W.4.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileYm50ZP.o /tmp/filejNaHtE.3.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileL3bKKZ.o /tmp/filemasFEK.2.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileQaVcxu.o /tmp/fileQjVzGR.1.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filerN3oiz.o /tmp/fileansbzc.0.S
gcc -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o mlton /tmp/fileXIzt7u.o /tmp/filerN3oiz.o /tmp/fileQaVcxu.o /tmp/fileL3bKKZ.o /tmp/fileYm50ZP.o /tmp/fileWGGyfJ.o /tmp/filefoLEfe.o /tmp/filewXXwnS.o /tmp/fileNUjWDs.o /tmp/filebiDCQs.o /tmp/fileY1iOe3.o -L/home/sweeks/mlton/lib -lmlton -lm -lgmp
max semispace size(bytes): 226,492,416
max stack size(bytes): 3,776,512
GC time(ms): 350,280 (43.4%)
maxPause(ms): 4,970
number of GCs: 242
bytes allocated: 40,433,081,240
bytes copied: 11,657,694,308
max bytes live: 153,467,768
size mlton
829.63user 8.44system 14:01.44elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (25522major+406338minor)pagefaults 0swaps
text data bss dec hex filename
3837534 547056 27336 4411926 435216 mlton