[MLton] fixing -codegen c -profile time for the release

Wed, 16 Nov 2005 09:26:34 -0500 (EST)

> Our basic approach to profiling is to propagate annotations
> from the front end to the codegen, and ask the codegen to insert
> labels into the executable, so that the runtime can map IP to label
> and back to annotation.  This works flawlessly with the native codegen,
> because our native codegen never duplicates the labels.

This isn't quite true.  The native codegen is able to use the 
Machine.ProfileInfo.modify function to clone profile labels.  The native 
codegen wants the freedom to duplicate labels (I believe that the reason 
we do so is not for actual optimization, but for primitives that must be 
implemented by multiple basic blocks) and to drop labels (this is for 
optimization reasons, though the weak attribute on the C side appears to 
handle this situation well).
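
To make the duplicate/drop point concrete, here is a minimal sketch of the 
IP -> label -> annotation lookup, assuming the runtime holds the label 
addresses in a table sorted by address.  The names (label_entry, 
source_for_ip, NO_SOURCE) are made up for illustration and are not the 
ones in our runtime.  The only thing it is meant to show is that several 
labels may carry the same source index, so a cloned label costs nothing at 
lookup time, and a dropped label just means its code is charged to the 
preceding one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: the address of one profile label and the
 * index of the source annotation it stands for.  Several entries may
 * share a source_index, which is how duplicated labels are tolerated. */
struct label_entry {
    uintptr_t addr;
    size_t source_index;
};

/* Sentinel returned when the IP lies below every known label. */
#define NO_SOURCE ((size_t)-1)

/* Return the source index of the last label at or below ip, or
 * NO_SOURCE if ip precedes every label.  The table must be sorted by
 * addr in ascending order. */
size_t source_for_ip(const struct label_entry *table, size_t n,
                     uintptr_t ip)
{
    assert(table != NULL || n == 0);

    /* Binary search.  Invariant: every entry in [0, lo) has addr <= ip,
     * and every entry in [hi, n) has addr > ip. */
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].addr <= ip)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NO_SOURCE : table[lo - 1].source_index;
}
```

The lookup is only as good as the assumption that each label really sits 
at the start of the code it is supposed to cover, which is exactly what we 
can't count on from gcc.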

> This fails with gcc 4, which sometimes duplicates labels.  Furthermore, 
> it could have failed with gcc 2 or 3, at least according to the spec, 
> but hasn't because we've been lucky.

Agreed.  Also, it isn't entirely clear that, even in the absence of 
duplication, gcc won't munch the program in such a way that our volatile 
assembly labels no longer cover the right portions of code.

> I don't entirely understand Florian's separate-section
> solution, but I still see the volatile asm, so I guess it would have
> similar problems.

I think what Florian was attempting to do was to use local (anonymous) 
labels, which could be freely duplicated, because the assembler would 
assign each one a unique name.  The trick then is figuring out how to 
populate the sourceNames array with label/sourceSeqIndex pairs.  Florian's 
idea was to construct this array in a separate section, where it 
presumably would be the only thing, so somewhat easier to pull back into 
the C world.

However, looking at the assembler docs, I don't believe that local 
(anonymous) labels can span multiple sections.

> So, my conclusion is that we should go with Matthew's approach for the
> release.  But it should only be used when time profiling with the C
> codegen -- there's no reason to hurt profiling in other situations.
> The simplicity, robustness, and portability of the approach outweigh
> the performance impact in this one case.

I knew there was a reason I didn't blow away the directory where I was 
doing those experiments. ;-)