[MLton] commit r4241: First cut at fixing -codegen c -profile time
Matthew Fluet
fluet@mlton.org
Thu, 17 Nov 2005 19:06:17 -0800
A different approach to time profiling: maintain the current source
position via program operations. This has the advantage that it
significantly more portable than trying to inject just the right
information into C code in a manner that isn't corrupted by gcc
optimizations.
The high points:
* added volatile uint curSourceSeqsIndex; to struct GC_state;
* replaced PROFILE_TIME with PROFILE_TIME_FIELD and
PROFILE_TIME_LABEL, which distinguishes the two methods of time
profiling (the former uses a the GC_state field to compute the
sourceSeqsIndex, the later uses assembler labels).
* added -profile time-field and -profile time-label as expert
options; -profile time remains and is set according to the
codegen; you can use the expert options to override the codegen
default (use time-label with the C-codegen at your peril, use
time-field with the native codegen for benchmarking purposes).
* made _minimal_ changes to profile.fun; essentially, if we would
have put in a profile label, instead put in an explict move of
the sourceSeqsIndex to the GC_state field.
Note that this sets curSourceSeqsIndex more often than necessary.
Profile labels were inserted for code coverage properties; namely, at
the beginning of every basic block -- whether or not control could
have flowed from a block with a different souceSeqsIndex. Inspecting
the generated code, it is clear that we could insert fewer sets.
Here is a comparison of the results of using different combinations of
codegen and profiling methods:
[fluet@localhost native]$ mlton -codegen native -profile time hamlet.sml
[fluet@localhost native]$ ./hamlet > /dev/null
[fluet@localhost native]$ mlprof -thresh 2.0 hamlet mlmon.out
25.98 seconds of CPU time (4.54 seconds GC)
function cur
--------------------------- -----
<gc> 14.9%
Sequence.Slice.collate.loop 12.7%
BinaryMapFn.find.mem 11.0%
Integer.scan 8.6%
BinaryMapFn.insert 8.0%
Lab.compare 5.2%
Sequence.make2 4.4%
BinaryMapFn.T' 3.9%
Integer.scan.negate 3.0%
EvalCore.evalExp 2.6%
Integer.scan.num 2.2%
BinaryMapFn.foldli.fold 2.1%
Integer.scan.finishNum 2.1%
[fluet@localhost native]$ mlton -codegen native -profile time-field hamlet.sml
[fluet@localhost native]$ ./hamlet > /dev/null
[fluet@localhost native]$ mlprof -thresh 2.0 hamlet mlmon.out
29.31 seconds of CPU time (4.42 seconds GC)
function cur
--------------------------- -----
Sequence.Slice.collate.loop 18.4%
<gc> 13.1%
Integer.scan 9.3%
BinaryMapFn.insert 6.2%
BinaryMapFn.find.mem 6.0%
Sequence.make2 4.3%
Integer.scan.negate 3.8%
Lab.compare 3.6%
BinaryMapFn.T' 3.5%
wrapOverflow.fn 2.8%
Integer.scan.finishNum 2.7%
BinaryMapFn.N 2.6%
EvalCore.evalExp 2.5%
Integer.scan.num 2.1%
EvalCore.evalAtExp 2.0%
[fluet@localhost c]$ mlton -codegen c -profile time hamlet.sml
[fluet@localhost c]$ ./hamlet > /dev/null
[fluet@localhost c]$ mlprof -thresh 2.0 hamlet mlmon.out
60.78 seconds of CPU time (4.72 seconds GC)
function cur
--------------------------- -----
Sequence.Slice.collate.loop 18.7%
BinaryMapFn.find.mem 11.0%
BinaryMapFn.insert 10.2%
<gc> 7.2%
Integer.scan.negate 6.0%
Integer.scan 6.0%
Lab.compare 4.7%
Sequence.make2 3.8%
EvalCore.evalExp 3.5%
EvalCore.evalAtPat 3.3%
BinaryMapFn.T' 2.6%
EvalCore.evalPatRow 2.4%
Integer.scan.finishNum 2.4%
wrapOverflow.fn 2.1%
There is a slight slowdown of using the field method over using the
label method, but nothing like the slowdown of going to the C codegen.
Regardless of the running time of the program, you get essentially the
same profiling results (with the exception that <gc> time proportional
to total running time).
----------------------------------------------------------------------
U mlton/trunk/include/c-chunk.h
U mlton/trunk/mlton/backend/backend.fun
U mlton/trunk/mlton/backend/machine.fun
U mlton/trunk/mlton/backend/profile.fun
U mlton/trunk/mlton/backend/rep-type.fun
U mlton/trunk/mlton/backend/runtime.fun
U mlton/trunk/mlton/backend/runtime.sig
U mlton/trunk/mlton/codegen/c-codegen/c-codegen.fun
U mlton/trunk/mlton/control/control-flags.sig
U mlton/trunk/mlton/control/control-flags.sml
U mlton/trunk/mlton/main/compile.fun
U mlton/trunk/mlton/main/lookup-constant.fun
U mlton/trunk/mlton/main/main.fun
U mlton/trunk/runtime/gc.c
U mlton/trunk/runtime/gc.h