[MLton] fixing -codegen c -profile time for the release
Matthew Fluet
fluet@cs.cornell.edu
Wed, 16 Nov 2005 09:26:34 -0500 (EST)
> Our basic approach to profiling is to propagate annotations
> from the front end to the codegen, and ask the codegen to insert
> labels into the executable, so that the runtime can map IP to label
> and back to annotation. This works flawlessly with the native codegen,
> because our native codegen never duplicates the labels.
This isn't quite true. The native codegen is able to use the
Machine.ProfileInfo.modify function to clone profile labels. The native
codegen wants the freedom to duplicate labels (I believe that the reason
we do so is not for actual optimization, but for primitives that must be
implemented by multiple basic blocks) and to drop labels (this is for
optimization reasons, though the weak attribute on the C side appears to
handle this situation well).
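
For concreteness, here is a minimal sketch of the IP -> label -> annotation
lookup being discussed. It is not the MLton runtime code: the struct, the
field names, and lookup_source are hypothetical, and it assumes the label
addresses have already been collected into a table sorted by address. The
point is only that a duplicated label is harmless as long as each copy
resolves to its own address, since several addresses may map to the same
source index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: the address a profile label resolved to,
 * paired with the index of the source annotation it stands for. */
struct label_entry {
    uintptr_t addr;
    unsigned  source_idx;
};

/* Return the source index of the last label at or below ip, or -1 if ip
 * lies before every label. The table must be sorted ascending by addr.
 * Two entries may share a source_idx (a duplicated label). */
int lookup_source(const struct label_entry *tab, size_t n, uintptr_t ip)
{
    size_t lo = 0, hi = n;

    assert(tab != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= ip)
            lo = mid + 1;   /* label starts at or before ip: look right */
        else
            hi = mid;       /* label starts after ip: look left */
    }
    return lo == 0 ? -1 : (int)tab[lo - 1].source_idx;
}
```

What breaks with gcc 4 is the step before this one: if the compiler copies
a block containing a named label, the two copies no longer have distinct
names, so a table like this cannot be built in the first place.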
> This fails with gcc 4, which sometimes duplicates labels. Furthermore,
> it could have failed with gcc 2 or 3, at least according to the spec,
> but hasn't because we've been lucky.
Agreed. Also, even in the absence of duplication, it isn't entirely
clear that gcc won't munch the program in such a way that our volatile
assembly labels no longer cover the right portions of code.
> I don't entirely understand Florian's separate-section
> solution, but I still see the volatile asm, so I guess it would have
> similar problems.
I think what Florian was attempting to do was to use local (anonymous)
labels, which could be freely duplicated, because the assembler would
assign each one a unique name. The trick then is figuring out how to
populate the sourceNames array with label/sourceSeqIndex pairs. Florian's
idea was to construct this array in a separate section, where it
presumably would be the only contents, which should make it somewhat
easier to pull back into the C world.
However, looking at the assembler docs, I don't believe that local
(anonymous) labels can span multiple sections.
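
The "table in its own section" half of the idea can be illustrated in
plain C, independent of the local-label question. This is not Florian's
patch, only a sketch under assumptions: GCC-style attributes, and GNU ld
on an ELF target, where the linker defines __start_NAME and __stop_NAME
for a section whose name is a valid C identifier. The section name
profile_tab, the struct, the macro, and the functions are all
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical label/sourceSeqIndex pair. */
struct src_entry {
    unsigned label_id;
    unsigned source_seq_index;
};

/* Place one entry in the dedicated section. "used" keeps the compiler
 * from discarding an entry that nothing references by name. */
#define PROFILE_ENTRY(id, idx) \
    static const struct src_entry profile_entry_##id \
    __attribute__((section("profile_tab"), used)) = { (id), (idx) }

PROFILE_ENTRY(1, 3);
PROFILE_ENTRY(2, 7);

/* Section bounds, supplied by the linker (assumption: GNU ld, ELF). */
extern const struct src_entry __start_profile_tab[];
extern const struct src_entry __stop_profile_tab[];

/* Number of entries that ended up in the section. */
size_t profile_entry_count(void)
{
    return (size_t)(__stop_profile_tab - __start_profile_tab);
}

/* Linear search by label id; returns -1 if the id is not present.
 * Does not rely on the order in which entries were emitted. */
int profile_lookup(unsigned label_id)
{
    const struct src_entry *p;

    assert(__start_profile_tab <= __stop_profile_tab);
    for (p = __start_profile_tab; p < __stop_profile_tab; p++)
        if (p->label_id == label_id)
            return (int)p->source_seq_index;
    return -1;
}
```

Because the section holds nothing but these entries, the C side can treat
it as an array without knowing any of the entry names, which is what makes
the approach attractive when the labels themselves are anonymous.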
> So, my conclusion is that we should go with Matthew's approach for the
> release. But it should only be used when time profiling with the C
> codegen -- there's no reason to hurt profiling in other situations.
> The simplicity, robustness, and portability of the approach outweigh
> the performance impact in this one case.
I knew there was a reason I didn't blow away the directory where I was
doing those experiments. ;-)