"root" of ChunkPerFunc chunk
Matthew Fluet
fluet@CS.Cornell.EDU
Wed, 1 Nov 2000 17:32:07 -0500 (EST)
I've added overflow checking primitives and the appropriate jumps.
Relatively painless and it seems to be working well. Here's how the
translation looks:
X = Int_addCheck(Y, Z)
...
movl Y,Temp
addl Z,Temp
jo OverflowLabel
jmp noOverflowLabel
}
{
noOverflowLabel:
movl Temp, X
...
I'm fairly sure that the Temp is necessary (in the sense that it will
force the destination of the add to be a register), because something like
SI(4) = Int_addCheck(Y, Z)
shouldn't write an overflow value to SI(4), because the original value of
SI(4) might be needed by the overflow handler. (Is this correct? I spent
some time trying to determine what optimizations could occur "through" an
overflow checking primitive, and I decided that there really aren't any,
at least that correspond to the optimizations already in the simplifier.)
Anyways, you see that the translation splits basic blocks at the check.
In theory, this shouldn't be a problem. When I generateTransfers (i.e.,
figure out what temporaries need be be committed to memory across
transfers, decide when unconditional jumps can be replaced by a
fall-through to the appropriate block), every noOverflowLabel is the
target of exactly one unconditional jump, we should always be able to
use a fall-through. (And we really want that for two reasons: 1.
no-overflow should be the common case, so the fall-through will be already
piplined; 2. when we fall-through, register allocation occurs over the big
not-so-basic block; the Temp will be allocated in a register, but when it
goes dead, there is a high probability that the register will just be
renamed to the destination; i.e., we shouldn't pay the penalty for
introducing the Temp except as a consequence of keeping the result in a
register.) Unfortunately, this wasn't happening in all cases. I'm fairly
certain that the problem arises from a fairly ad-hoc way of processing the
blocks in generateTransfers. I just walk through the list of all blocks,
if I haven't already generated a transfer for that block, do so (and
possibly process other blocks as a result of fall through), otherwise,
it's already been processed and I skip it. For the most part, the block
that ends
...
jo OverflowLabel
jmp noOverflowLabel
}
is processed before the block that begins
{
noOverflowLabel:
...
so the fall through occurs. But, sometimes as a result of other block
optimizations (e.g., eliminating jumps to jumps), the second block ends up
being processed first, so when we process the first one, we can't do the
fall through.
So, it would seem I need a better ordering for the processing of blocks.
Topological sort would seem to do the trick, but I need one (or more)
roots. And, after writing all this, I realized I do in fact have those
roots: they are the labels in the exports field of a Chunk.
So, probably largely ignore this, although I'd be interested in feedback
about what possible optimizations could be done to checking prims. I
don't think this far into the backend is the appropriate place to try to
determine when a checking prim can be replaced by a non-checking prim;
although I do one very special case:
Int_add(0, X), Int_add(X, 0), Int_sub(X, 0) -> drop the overflow check
Int_sub(0, X) -> Int_neg(X), but leave the overflow check
and I guess I could also do
Int_mul(1, X), Int_mul(X, 1), Int_mul(~1, X), Int_mul(X, ~1)