space profiling

Matthew Fluet Matthew Fluet <fluet@CS.Cornell.EDU>
Tue, 28 Aug 2001 09:16:22 -0400 (EDT)

> > One  problem  (perhaps not a big deal) with statically allocating the bins is
> > that you have to know the number of allocation points. 
> No problem.  I think it is easy for Matthew to recognize frontier increments.

Yes, that's no problem.  I now believe that we can do space profiling with
no impact on register allocation and without extraneous help from the
linker (at the expense of an insane number of symbols in the resulting
executable).  The trick is that the register allocation will build up a
global data section, to be emitted as a final .S file.

My proposal is to do the following.

We recognize when register allocation is processing an instruction of the
addl <size of allocation>,<frontier location>

This should be allocated as
addl <reg or immediate>,<reg>

Now, we gensym two labels: LabelPC and LabelBin

Now, we emit the allocated instruction as

	addl <reg or immediate>,<reg>
	addl <reg or immediate>,$(LabelBin+4)
	adcl $0,$(LabelBin+8)

And we add to the global data section

	.long $LabelPC, 0, 0

Once we're done register allocation for the whole program, the global data
section will look like:

	.long $LabelPCA, 0, 0
	.long $LabelPCB, 0, 0
	.long $LabelPCZ, 0, 0

We choose some common label known to the output routine and emit the
global data section as:

	.long <number of bins>
	.long $LabelPCA, 0, 0
	.long $LabelPCB, 0, 0
	.long $LabelPCZ, 0, 0

Now, on emission of space profiling info, the output routine should just
need to copy the right number of bins from LabelSpaceProfBins.  (Or, I
guess, just emit the non-zero bins, to use the sparce format.)

> > If the bins are statically allocated, then the code at an allocation point is
> > going to look like:
> > 
> >     save flags some where
> >     addl    <allocation size>, <low 32 bits of counter>
> >     adcl    $0, <high 32 bits of counter>
> >     restore flags from some where

So, this is similar to what I'm proposing above.  I don't quite know what
you mean by "save flags some where".  The previous instruction should be
the frontier increment, which will set the condition flags in some way,
but they are never used.  If you meant save registers some where, then the
proposal above eliminates the need.

> > Ok, I agree, the fixed bins is a win.

The only disadvantage is that the size of the executable will be much
larger, but if Steve is saying that there are only 60,000 allocation
points, then that shouldn't be a problem.  We might get some slow link
times as the linker attempts to resolve all 120,000 symbols.

O.k.  New proposal (although this puts more burden on the output routine).
Instead of gensyming new symbols, we just keep a counter of the number of
allocation points seen.  Each bin is now a constant offset from the start
of the bins.  We still need symbols to store the PC somewhere.  With this
method, I'd suggest two static arrays.  One of just bins and one of just
PC addresses.  (The one of just bins could be put in .bss section, which
saves a little space in the final executable.)  The output routine needs
to walk both arrays in tandem, but that shouldn't be a big problem.

> > Did you agree or disagree that for timing the bins should still be 96 bits on
> > output?   I  think  they should just so that the code in mlprof doesn't worry
> > about more than one kind of bin.  Going to the sparse variety  will  probably
> > save more space than this will cost us.
> I agree.  One format for both time and space, sparse, 96 bit.

Sounds good, although that means we need 64bit integer support.  I don't
think that will be too difficult.  I looked at what gcc does for long long
operations, and it's pretty straightforward.  Probably the biggest issue
will be redoing the C calling convention to support returns of 64 bit
values in %eax and %edx.