[MLton] experimental release 20051109
Matthew Fluet
fluet@cs.cornell.edu
Tue, 15 Nov 2005 09:52:26 -0500 (EST)
> Since MLton's time profiling works with gcc 2.9x and 3.x, which are
> the most common gcc's around, and {Declare,}ProfileLabel is very
> tricky and dependent on gcc version and platform, I'm reluctant to
> change them just before the release. Also, since gcc 4.x has a new
> requirement for aliasing and requires -fno-tree-ch to work (even with
> new definitions of {Declare,}ProfileLabel), it seems to me like the
> right solution for this release is to not change anything for all
> existing platform/gcc combinations, and to only introduce new
> behaviors when we encounter gcc 4.0.
>
> So, I propose the following two changes
>
> 1. We put tests in main.fun to add -fno-tree-ch if
> (a) we're using gcc 4.x, and
> (b) time profiling is on, and
> (c) the C codegen is being used
>
> 2. Put an #ifdef in c-chunk.h that uses the extern void definition
> of DeclareProfileLabel if gcc version is 4.x (assuming someone has
> verified that that works with gcc 4.x).
>
> Alternatively, we could change nothing, and simply document the
> -fno-tree-ch solution, since, at least for now, the current definition
> of {Declare,}ProfileLabel works with that (unless I've
> misunderstood).
>
> What do people think?
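To make change 2 concrete, here is a rough sketch of the kind of
gcc-version test it implies. The __GNUC__ macro is the standard way to
get gcc's major version. Everything else is an assumption for
illustration: the exact spelling of the "extern void" form, the
stand-in for the existing definition (which is not what c-chunk.h
actually contains), and the helper macro PROFILE_LABEL_IS_EXTERN_VOID.
Note that it tests for 4 or later rather than exactly 4.x.

```c
/* Sketch only: pick the declaration form by gcc major version. */
#if defined(__GNUC__) && (__GNUC__ >= 4)
#define PROFILE_LABEL_IS_EXTERN_VOID 1
/* The "extern void" form from the proposal; exact spelling is assumed. */
#define DeclareProfileLabel(l) extern void l(void)
#else
#define PROFILE_LABEL_IS_EXTERN_VOID 0
/* Stand-in for whatever c-chunk.h already uses; NOT the real definition. */
#define DeclareProfileLabel(l) extern char l
#endif

/* A declaration alone does not require the label to be defined. */
DeclareProfileLabel(exampleProfileLabel);
```

Whether the extern void form actually survives gcc 4.x optimizations is
exactly the thing someone would still need to verify.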
Just to throw some more fuel on the fire, I proposed a simplification of
time profiling back in June:
Finally, and much to the chagrin of those wanting to map source code
onto assembly code, I wonder if we can't simplify time profiling by
associating a currentSourceSeq field in the gc state and having
profile.fun explicitly change the field as appropriate when time
profiling. This would appear to simplify the time profiling, as we
wouldn't need to inspect the PC state, figure out how to grab begin and
end of text segment, etc. That ought to make it easier for new
platforms to support time profiling.
http://mlton.org/pipermail/mlton/2005-June/027131.html
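For anyone who doesn't want to reread that thread, the idea can be
sketched in a few lines of C. This is only an illustration, not code
from the runtime: the struct layout, the ticks array, NUM_SOURCE_SEQS,
and the names enterSourceSeq and profileTick are all made up here; only
the currentSourceSeq field comes from the proposal.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical: number of source sequences (nodes in the profile graph). */
enum { NUM_SOURCE_SEQS = 3 };

/* Hypothetical slice of the gc state; only the fields used here. */
struct GC_state {
    volatile sig_atomic_t currentSourceSeq;        /* set by the program */
    volatile unsigned long ticks[NUM_SOURCE_SEQS]; /* samples per sequence */
};

static struct GC_state gcState;

/* What profile.fun would arrange to run at each transition in the
   profile graph: one store of a constant into a known slot. */
static void enterSourceSeq(int seq) {
    assert(seq >= 0 && seq < NUM_SOURCE_SEQS);
    gcState.currentSourceSeq = seq;
}

/* Body of the SIGPROF handler: charge the tick to the current source
   sequence. No program counter or text-segment bounds are consulted. */
static void profileTick(int signum) {
    (void)signum;
    gcState.ticks[gcState.currentSourceSeq]++;
}
```

In a real runtime profileTick would be installed with sigaction for
SIGPROF and driven by setitimer(ITIMER_PROF, ...); calling it by hand
after enterSourceSeq(1) charges the tick to sequence 1. The cost that
Stephen objects to below is the store in enterSourceSeq, which would
happen at every transition.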
Stephen argued that this would be too intrusive:
> You believe that the move of a constant integer to a known slot in
> the gc state at transitions in the profile graph is too intrusive?
Yes. The point is that it happens all the time, not just at (SSA)
nontail calls, and not just at (SSA) basic block entries. Furthermore,
to implement this portably within MLton, the right place to put it is
at the Machine IL, which means it will interfere with codegen
optimizations too. I bet it'll hurt more than 50% on some benchmarks.
That's a lot of skew.
http://mlton.org/pipermail/mlton/2005-June/027135.html
But, it does have the advantage that it is significantly more portable than
trying to inject just the right information into C code in a manner that
isn't corrupted by gcc optimizations. (Also, note that passing
-fno-tree-ch violates Stephen's principle of not interfering with what you
are trying to measure, since we are specifically taking away a gcc
optimization that would otherwise be performed.)
The final benchmarks that I ran weren't great
http://mlton.org/pipermail/mlton/2005-June/027152.html
with more than 50% overhead on 9 benchmarks, though most of these were
very tight loop benchmarks -- which I don't think are the types of
programs that people are likely to be interested in profiling.