[MLton] latest benchmarks
Matthew Fluet
fluet at tti-c.org
Wed Jun 20 09:15:04 PDT 2007
I've merged the x86_64 branch into trunk. Since the previous
announcement of the experimental release, there were only two minor bugs
reported:
1) Bug with -align 8 on x86_64
2) Inconsistent behavior with -const 'MLton.detectOverflow false'
These have both been fixed, and I'm pretty happy with the state of the
x86_64 port.
I ran the benchmark suite to compare the last public release to the
current trunk. It is a bit of an apples-to-oranges comparison, since I
ran the benchmarks on an AMD Opteron (64-bit) system. So, the 20051205
compiler (and its resulting executables) are running in 32-bit mode,
while the trunk compiler (and its resulting executables) are running in
64-bit mode.
[BTW, it would be nice if someone could run a corresponding benchmark
suite on a 32-bit system, for a more apples-to-apples comparison.]
You can see all of the results at:
http://mlton.org/cgi-bin/viewsvn.cgi/*checkout*/mlton/trunk/doc/x86_64-port-notes/bench-20070619.txt?rev=5659
Some of the highlights:
* Benchmarks were run on a uni-core, dual-processor AMD Opteron 2.0GHz,
8GB Memory, Fedora Core 6 machine (with gcc version 4.1.1 and linux
version 2.6.20 (x86_64)).
* compile time and code size are up across the board on trunk vs
20051205. I suspect that part of the code size increase can be
attributed to the comparison of 32-bit executables to 64-bit
executables. Any 64-bit operation requires an additional 8-bit
instruction prefix (as do 32-bit ops that touch the extended register
set). Compile time is probably partly explained by the bigger Basis
Library implementation (increasing elaboration time and carrying more
code through early optimizations), and partly by the fact that the trunk
compiler is executing a little slower than the 20051205 compiler.
* recent versions of gcc are doing fairly well with the C code. (Note
that using -codegen c with 20051205 uses the version of gcc on the host
machine.) Indeed, the flat-array.sml benchmark needs to be revised, as
gcc recognizes that the inner loop is pure (Overflow exceptions are
handled within the loop) and unused. The SSA{,2} optimizer should also
discover that the loop can be optimized away, but that is another issue.
GCC also does fairly well on the checksum benchmark with 20051205,
though it does horribly on the checksum benchmark with trunk.
I suspect that the latter behavior is due to the fact that on x86_64,
sequences (arrays/vectors) are indexed by 64-bit integers in the
primitive operations (sub, update, etc), but indexed by 32-bit integers
in the user code (Array.sub, Array.update, etc. since Int.int
corresponds to Int32.int). Hence, there are quite a few 64/32
conversions going on.
* I note that with both native codegens and C codegens, and with both
20051205 and trunk, -align 8 often has a positive impact on
runtime, and rarely has a significant negative impact. This might be
due to the Opteron memory system. Aligned reads probably help most on
Real64 intensive benchmarks.
* The amd64 codegen is doing alright as compared to the x86 codegen. I
see at most a factor of 2 slowdown, and a few speedups. Again, I'm not
sure what real conclusions can be drawn. Some slowdowns are going to be
due to the changes to the runtime and Basis Library since 20051205; to
isolate those, I need a comparison of 20051205 to trunk on a 32-bit
system. Some slowdowns are probably going to be due to the sequence
indexing discussed above.
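
To make the sequence indexing point concrete, here is a rough C sketch of
the pattern I have in mind. It is not the code MLton generates; prim_sub,
user_sub, and checksum are hypothetical names chosen for illustration, and
the bounds check is my assumption about what a user-level subscript does.
The point is only that a 32-bit user-level index has to be widened to 64
bits on every access before the primitive operation can use it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a primitive subscript: the index is 64-bit. */
static int32_t prim_sub(const int32_t *seq, int64_t index) {
    return seq[index];
}

/* Hypothetical stand-in for a user-level subscript: the index is 32-bit.
   Returns 0 and stores the element in *out on success, -1 if the index
   is out of bounds. */
static int user_sub(const int32_t *seq, int64_t length,
                    int32_t index, int32_t *out) {
    assert(seq != NULL && out != NULL && length >= 0);
    int64_t wide = (int64_t)index;   /* 32 -> 64 sign extension, every access */
    if (wide < 0 || wide >= length) {
        return -1;
    }
    *out = prim_sub(seq, wide);
    return 0;
}

/* Checksum-style loop: the counter is 32-bit, so each iteration pays for
   one widening conversion before the 64-bit indexed load. */
static uint32_t checksum(const int32_t *seq, int32_t length) {
    uint32_t acc = 0;
    for (int32_t i = 0; i < length; i++) {
        int32_t elem;
        if (user_sub(seq, (int64_t)length, i, &elem) == 0) {
            acc += (uint32_t)elem;   /* unsigned add wraps modulo 2^32 */
        }
    }
    return acc;
}
```

In a 32-bit build the index and the primitive operation have the same
width, so the widening step disappears; that difference is one candidate
explanation for the checksum numbers, not a measured result.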