x86 Update
Matthew Fluet
fluet@CS.Cornell.EDU
Fri, 18 Aug 2000 17:08:24 -0400 (EDT)
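Here are the latest performance numbers for x86-G1 MLton, which now
uses the may-alias information that I outlined earlier. In each table,
C is the C code generator, x86 is the native x86 code generator, and
x86/C is the ratio of the two, so values below 1.00 favor the x86 code
generator.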
compile-time
benchmark                 C      x86  x86/C
checksum               3.19     3.14   0.98
count-graphs           9.68     7.28   0.75
fib                    2.84     2.88   1.01
knuth-bendix          14.36     9.68   0.67
life                   7.19     5.79   0.81
logic                 43.69    24.62   0.56
mlyacc               402.72   323.45   0.80
mpuz                   4.48     3.89   0.87
ratio-regions         16.73    12.31   0.74
smith-normal-form    220.72    88.72   0.40
tak                    2.89     2.88   1.00
wc                     7.29     6.01   0.82
executable-size (bytes)
benchmark                 C      x86  x86/C
checksum              33319    32791   0.98
count-graphs          54343    52207   0.96
fib                   33223    32559   0.98
knuth-bendix          82807    75623   0.91
life                  50887    47823   0.94
logic                175935   181559   1.03
mlyacc               627295   574463   0.92
mpuz                  38695    37719   0.97
ratio-regions         63511    67439   1.06
smith-normal-form    168974   161422   0.96
tak                   33247    32655   0.98
wc                    49543    47831   0.97
run-time
benchmark                 C      x86  x86/C
checksum              11.63    12.40   1.07
count-graphs          18.95    18.99   1.00
fib                   22.24    16.50   0.74
knuth-bendix          37.50    33.87   0.90
life                 103.32    96.83   0.94
logic                 92.15    70.00   0.76
mlyacc                41.18    30.46   0.74
mpuz                  76.54    74.63   0.98
ratio-regions         41.70    31.82   0.76
smith-normal-form      4.03     4.00   0.99
tak                   48.45    40.02   0.83
wc                    24.85    22.21   0.89
All in all, I think this phase has reached a successful conclusion.
Here's what's on my short-term/mid-term todo list:
1. Floating-point support
2. Full front-end support for the x86 codegen.
Steve and I spoke a little about this. It might make sense for each
backend to return to the front end either a list of object files or a
list of source files; the front end would then either link them all
together or compile and then link them. A couple of design decisions
remain: how the -S, -C, and -c options interact with the x86 backend,
how multiple assembly files should be handled, etc.
3. Inline frameSize and frameLayout pointers in the code segment.
(This follows a suggestion from Suresh: since we have the return
address for the frame we're interested in, we can place the relevant
size and pointer at negative offsets from the return address. This
gives constant-time lookup of these values, rather than the hash-table
technique I'm currently using. The hooks for adding pre-label assembly
are already in the simplifier, although the GC would need some minor
changes. It probably wouldn't be a win on the benchmarks, but I
imagine it might pay off for a self-compile, where GCs occur often.)
4. Use the liveness information to carry pseudo-regs across block
boundaries in registers.
5. Investigate jump tables for large switch transfers.
6. Consider additional peephole optimizations after register
allocation; this could clean up some spurious register-register
moves or multiple saves to the same address.
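Item 3 can be sketched in C. This is a minimal illustration under stated assumptions, not the actual MLton runtime code: `FrameInfo`, `emitReturnLabel`, and `frameInfoOf` are hypothetical names, and a byte array stands in for the code segment so the sketch runs on any host. In generated assembly the two fields would instead be emitted as data directly in front of each return label, and the emitter would have to make sure control never falls through into that data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-frame metadata. In real output it would sit just
   before a return label (e.g. as two .long directives on 32-bit x86);
   here a byte array simulates the code segment. */
typedef struct {
    uint32_t frameSize;          /* bytes of stack occupied by the frame */
    const uint16_t *frameLayout; /* describes which frame slots hold pointers */
} FrameInfo;

/* Store `info` so that it ends exactly at `labelOffset` in `code`, and
   return the address of the simulated return label. */
static const unsigned char *emitReturnLabel(unsigned char *code,
                                            size_t capacity,
                                            size_t labelOffset,
                                            FrameInfo info) {
    assert(labelOffset >= sizeof(FrameInfo) && labelOffset <= capacity);
    memcpy(code + labelOffset - sizeof(FrameInfo), &info, sizeof(FrameInfo));
    return code + labelOffset;
}

/* Constant-time lookup, as a GC would do while walking the stack: the
   metadata lives at a fixed negative offset from the return address,
   so no hash-table probe is needed. memcpy avoids unaligned reads. */
static FrameInfo frameInfoOf(const unsigned char *returnAddress) {
    FrameInfo info;
    memcpy(&info, returnAddress - sizeof(FrameInfo), sizeof(FrameInfo));
    return info;
}
```

The trade-off is a few extra bytes in front of every return label in exchange for a lookup that involves no hashing or probing.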
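The two clean-ups named in item 6 can be illustrated with a toy pass in C. The instruction form (`Insn`, `Op`) and the function names are hypothetical, not the codegen's actual representation. The pass deletes self-moves (`mov r, r`) and deletes a store when a later store to the same address overwrites it before anything could read it. Anything other than a register move or a store is conservatively treated as possibly reading memory; a real pass would presumably consult the may-alias information to be less conservative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy post-register-allocation instructions (hypothetical). */
typedef enum { MOV_RR, STORE, LOAD, OTHER } Op;

typedef struct {
    Op op;
    int dst; /* MOV_RR/LOAD: destination register; STORE: address id */
    int src; /* MOV_RR/STORE: source register; LOAD: address id */
} Insn;

/* True if the STORE at index i is overwritten by a later STORE to the
   same address before any LOAD or OTHER (which might read memory).
   Memory is assumed live at the end of the sequence. */
static bool storeIsDead(const Insn *code, size_t n, size_t i) {
    for (size_t j = i + 1; j < n; j++) {
        if (code[j].op == STORE && code[j].dst == code[i].dst) return true;
        if (code[j].op == LOAD || code[j].op == OTHER) return false;
    }
    return false;
}

/* Remove self-moves and dead stores in place; returns the new length.
   Writes go to index out <= i, so entries after i are still intact
   when storeIsDead scans them. */
static size_t peephole(Insn *code, size_t n) {
    assert(code != NULL || n == 0);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        bool drop = (code[i].op == MOV_RR && code[i].dst == code[i].src) ||
                    (code[i].op == STORE && storeIsDead(code, n, i));
        if (!drop) code[out++] = code[i];
    }
    return out;
}
```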
Once 1 and 2 are done, I think the backend will be robust enough for
us to live with for a few months. I'd like to add 3 and 6, because I
don't think they will be particularly difficult. On the other hand, 4
and 5 are probably going to take some work, especially when both are
in effect (i.e., all of the jump-table destinations must be
coordinated to have the same pseudo-reg-to-register mapping), but I
could see them paying off.
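The coordination problem with 4 and 5 together can be illustrated with a small C sketch. Everything here is hypothetical: the `Block` and `Case` types, the density heuristic, and the table-size cap are assumptions for illustration, not the codegen's data structures. On x86 the emitted code would bounds-check `value - min` and then jump indirectly through the table. That single indirect jump leaves no per-destination place for fix-up moves (short of adding a stub per entry), so the sketch requires every destination, including the default, to expect the same pseudo-reg-to-register mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_PSEUDOS 3 /* hypothetical: pseudo-regs live across the switch */
#define MAX_TABLE 256 /* hypothetical cap on jump-table size */

/* A destination block and the register each pseudo-reg must occupy on
   entry (-1 means "in memory"). */
typedef struct {
    int id;
    int regOf[NUM_PSEUDOS];
} Block;

typedef struct {
    int value; /* case values are assumed distinct */
    const Block *target;
} Case;

/* Illustrative heuristic: use a table only when there are several cases
   and at least half of the slots in [min, max] are real cases. */
static bool worthJumpTable(const Case *cases, size_t n, int *minOut,
                           size_t *lenOut) {
    if (n < 4) return false; /* few cases: a compare chain is fine */
    int min = cases[0].value, max = cases[0].value;
    for (size_t i = 1; i < n; i++) {
        if (cases[i].value < min) min = cases[i].value;
        if (cases[i].value > max) max = cases[i].value;
    }
    long long span = (long long)max - min + 1;
    if (span > MAX_TABLE || 2 * (long long)n < span) return false;
    *minOut = min;
    *lenOut = (size_t)span;
    return true;
}

/* table[value - min] = target; gaps in the range go to the default. */
static void buildJumpTable(const Case *cases, size_t n, int min,
                           const Block *dflt, const Block **table,
                           size_t len) {
    assert(len <= MAX_TABLE);
    for (size_t i = 0; i < len; i++) table[i] = dflt;
    for (size_t i = 0; i < n; i++)
        table[(size_t)((long long)cases[i].value - min)] = cases[i].target;
}

/* Every table entry must agree on where each pseudo-reg lives. */
static bool mappingsAgree(const Block *const *table, size_t len) {
    for (size_t i = 1; i < len; i++)
        for (int p = 0; p < NUM_PSEUDOS; p++)
            if (table[i]->regOf[p] != table[0]->regOf[p]) return false;
    return true;
}
```

If `mappingsAgree` fails, the codegen would either have to pick one mapping and insert moves at the start of the disagreeing destinations, or fall back to a compare chain for that switch.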