[MLton] experimental release 20051109

Matthew Fluet fluet@cs.cornell.edu
Tue, 15 Nov 2005 09:52:26 -0500 (EST)

> Since MLton's time profiling works with gcc 2.9x and 3.x, which are
> the most common gcc's around, and {Declare,}ProfileLabel is very
> tricky and dependent on gcc version and platform, I'm reluctant to
> change them just before the release.  Also, since gcc 4.x has a new
> requirement for aliasing and requires -fno-tree-ch to work (even with
> new definitions of {Declare,}ProfileLabel), it seems to me like the
> right solution for this release is to not change anything for all
> existing platform/gcc combinations, and to only introduce new
> behaviors when we encounter gcc 4.0.
> So, I propose the following two changes
> 1. We put tests in main.fun to add -fno-tree-ch if
>    (a) we're using gcc 4.x, and
>    (b) time profiling is on, and
>    (c) the C codegen is being used
> 2. Put an #ifdef in c-chunk.h that uses the extern void definition
>    of DeclareProfileLabel if gcc version is 4.x (assuming someone has
>    verified that that works with gcc 4.x).
> Alternatively, we could change nothing, and simply document the
> -fno-tree-ch solution, since, at least for now, the current definition
> of {Declare,}ProfileLabel works with that (unless I've
> misunderstood).
> What do people think?
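
To make the quoted proposal concrete, here is a minimal C sketch of the two checks. Every name in it (needs_no_tree_ch, enum codegen, USING_GCC4_PROFILE_LABELS) is hypothetical, the real test in item 1 would be written in SML in main.fun, and treating every gcc major version from 4 upward alike is an assumption, not something verified here.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this is not MLton's actual code. */
enum codegen { CODEGEN_C, CODEGEN_NATIVE };

/* Item 1: true when all three quoted conditions hold, i.e. when
   -fno-tree-ch would be added to the gcc command line.
   Assumption: versions above 4 are treated like 4.x. */
static bool needs_no_tree_ch(int gcc_major, bool time_profiling,
                             enum codegen cg)
{
    return gcc_major >= 4 && time_profiling && cg == CODEGEN_C;
}

/* Item 2: the compile-time counterpart a header could use to pick a
   gcc-4-specific definition. __GNUC__ is gcc's predefined major
   version macro; the flag name is made up for this sketch. */
#if defined(__GNUC__) && __GNUC__ >= 4
#define USING_GCC4_PROFILE_LABELS 1
#else
#define USING_GCC4_PROFILE_LABELS 0
#endif
```

With this shape, every existing platform/gcc combination takes the unchanged path, and only gcc 4.x with time profiling on the C codegen sees new behavior.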

Just to throw some more fuel on the fire, I proposed a simplification of 
time profiling back in June:

    Finally, and much to the chagrin of those wanting to map source code
    onto assembly code, I wonder if we can't simplify time profiling by
    associating a currentSourceSeq field in the gc state and having
    profile.fun explicitly change the field as appropriate when time
    profiling.  This would appear to simplify the time profiling, as we
    wouldn't need to inspect the PC state, figure out how to grab begin and
    end of text segment, etc.  That ought to make it easier for new
    platforms to support time profiling.
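
As a rough illustration of that idea, here is a C sketch under stated assumptions: the struct layout, the function names, and the fixed table size are all hypothetical, and only the field name currentSourceSeq comes from the proposal itself.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SOURCE_SEQS 4   /* hypothetical table size */

/* Hypothetical slice of the gc state: one slot holding the current
   source sequence index, plus one tick counter per sequence. */
struct gc_state {
    volatile size_t currentSourceSeq;
    unsigned long ticks[NUM_SOURCE_SEQS];
};

/* What instrumented code would do at a transition in the profile
   graph: store a constant index into a known slot. */
static void enter_source_seq(struct gc_state *s, size_t seq)
{
    assert(seq < NUM_SOURCE_SEQS);
    s->currentSourceSeq = seq;
}

/* What the timer handler would do on each tick: charge the tick to
   the recorded sequence, without inspecting the program counter. */
static void profile_tick(struct gc_state *s)
{
    s->ticks[s->currentSourceSeq]++;
}
```

The handler needs no knowledge of the text segment or of code labels, which is where the portability gain would come from; the cost is the extra store at every transition.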


Stephen argued that this would be too intrusive:

    > You believe that the move of a constant integer to a known slot in
    > the gc state at transitions in the profile graph is too intrusive?

    Yes.  The point is that it happens all the time, not just at (SSA)
    nontail calls, and not just at (SSA) basic block entries. Furthermore,
    to implement this portably within MLton, the right place to put it is
    at the Machine IL, which means it will interfere with codegen
    optimizations too.  I bet it'll hurt more than 50% on some benchmarks.
    That's a lot of skew.


But, it does have the advantage that it is significantly more portable than 
trying to inject just the right information into C code in a manner that 
isn't corrupted by gcc optimizations.  (Also, note that passing 
-fno-tree-ch violates Stephen's principle of not interfering with what you 
are trying to measure, since we are specifically taking away a gcc 
optimization that would otherwise be performed.)

The final benchmarks that I ran weren't great, with more than 50% overhead 
on 9 benchmarks, though most of these were very tight loop benchmarks -- 
which I don't think are the types of programs that people are likely to be 
interested in profiling.