separate assembly
Matthew Fluet
fluet@research.nj.nec.com
Mon, 14 Aug 2000 15:31:18 -0400 (EDT)
Well, after the whole discussion on separate assembly, I've switched the
output over so that an assembly file will never contain less than a whole
chunk.
On a related note, I've been using Control.traceCall to generate timing
information for various portions of the backend. Previously, my
translation went from MachineOutput.Program.t to a (large) list of basic
blocks. The simplification phase then worked on this list of blocks. The
jump analysis that I'm still working on needs multiple blocks, but never
more than are in one chunk, since chunk entries couldn't be eliminated
anyway; the peepholing needs only one block at a time. Finally, the
register allocation and validation phases worked on the list of blocks,
though again they really needed only one block at a time.
Now, I essentially map each of these phases over the list of chunks, in
order to keep the chunks separate until output. The minor problem is that
tracing these phases produces a separate timing for each chunk. For small
programs this isn't bad: there aren't many chunks, and it doesn't hurt to
carry the whole program around in assembly form between phases.
But I think this could be a problem for large programs (like a
self-compile). For example, a fully commented assembly file for a
self-compile was almost 100Meg (stripping comments brings it down to a
more reasonable 25Meg). Under the current backend, I carry this whole
program around for a few phases; it would be nicer to
translate->...->validate->output one chunk at a time, and hopefully cut
down on the intermediate memory needed.
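To make the two schedules concrete, here is a minimal sketch in Python (not SML, and not the actual MLton backend). The phase functions are hypothetical string-rewriting stand-ins for translate, simplify, register allocation, and validation. `phase_at_a_time` maps each phase over the whole list of chunks, so every chunk's intermediate form is live between phases. `chunk_at_a_time` pushes one chunk through the entire pipeline before starting the next, which is the "translate->...->validate->output a chunk at a time" idea.

```python
from typing import Callable, Iterable, Iterator, List

Phase = Callable[[str], str]


def stand_in(name: str) -> Phase:
    # Hypothetical placeholder phase: just records that it ran on a chunk.
    return lambda chunk: f"{name}({chunk})"


translate = stand_in("translate")
simplify = stand_in("simplify")
allocate_registers = stand_in("allocate_registers")
validate = stand_in("validate")


def phase_at_a_time(chunks: List[str], phases: List[Phase]) -> List[str]:
    # Current schedule: each phase is mapped over the whole list, so the
    # intermediate form of *every* chunk is held in memory between phases.
    for phase in phases:
        chunks = [phase(c) for c in chunks]
    return chunks


def chunk_at_a_time(chunks: Iterable[str], phases: List[Phase]) -> Iterator[str]:
    # Streaming schedule: one chunk runs through all phases and is yielded
    # (ready to be written out) before the next chunk is touched, so only
    # one chunk's intermediate form needs to be live at a time.
    for c in chunks:
        for phase in phases:
            c = phase(c)
        yield c
```

Both schedules produce the same output per chunk. The difference is only in how much intermediate state is alive at once. A consumer that writes each yielded chunk to its assembly file immediately never needs the whole program in assembly form.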
Bottom line: it is useful to have the timing of a pass over all the chunks,
in order to see which phases could be improved, but I don't know how to
get that. Looking through Control and Trace, I couldn't find a version
that accumulates the total time across all calls to a function and then
prints that total at the end.
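What is wanted is essentially a cumulative timer. Here is a minimal sketch of the idea in Python. It does not describe what MLton's Control or Trace structures actually provide, and the names `PhaseTimer`, `wrap`, and `report` are hypothetical. Each phase function is wrapped once. Every call adds its elapsed time to a per-name total, and the totals are printed once at the end instead of once per chunk.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class PhaseTimer:
    """Accumulates elapsed time and call counts per phase name."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)
        self.calls: Dict[str, int] = defaultdict(int)

    def wrap(self, name: str, f: Callable[[T], R]) -> Callable[[T], R]:
        # Returns a function that behaves like f but charges the time of
        # every call (including ones that raise) to `name`.
        def timed(x: T) -> R:
            start = time.perf_counter()
            try:
                return f(x)
            finally:
                self.totals[name] += time.perf_counter() - start
                self.calls[name] += 1

        return timed

    def report(self) -> str:
        # One line per phase: total over all calls, not per chunk.
        return "\n".join(
            f"{name}: {self.calls[name]} calls, {self.totals[name]:.3f} s"
            for name in self.totals
        )
```

Usage would be something like `simplify = timer.wrap("simplify", simplify)` before mapping over the chunks, then `print(timer.report())` once after output. This sketch measures wall-clock time with `time.perf_counter`. Swapping in `time.process_time` would charge CPU time instead, which is closer to what a per-phase profile usually wants.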
As a partial solution, I can compile small programs with chunkification
via OneChunk; then the timing information is what I want.