"root" of ChunkPerFunc chunk

Thu, 2 Nov 2000 09:38:17 -0500 (EST)

> > I spent
> > some time trying to determine what optimizations could occur "through" an
> > overflow checking primitive, and I decided that there really aren't any,
> > at least that correspond to the optimizations already in the simplifier.)
> 
> I'm not quite sure what you mean here.  The semantics of a checking primitive
> are to do the arithmetic op and to jump to the label immediately if there was an 
> overflow.  So, in the case of overflow, any side effects before the primitive
> must happen and any after must not happen.

Basically, I was trying to decide if any of the existing peephole
optimizations could be used with a checking prim.  Briefly, I considered
translating without spliting the basic blocks; instead, just have a jo
instruction in the middle of the block.  The advantage was larger basic
blocks.  Also, something like copy propagation (I'm only propagating moves
of the form  movl X, PseudoReg  , so there isn't any problem with a
side-effect not occuring) could proceed "through" the check.  However,
copy propagation doesn't occur between blocks, so the current translation
of checks prevent it.  (But, see my notes on the running times below.)

> > So, it would seem I need a better ordering for the processing of blocks.
> > Topological sort would seem to do the trick, but I need one (or more)
> > roots.  And, after writing all this, I realized I do in fact have those
> > roots: they are the labels in the exports field of a Chunk.

Using the exports list did the trick.  Processing the blocks is now done
with a queue.  Initally, all the export labels are in the queue.  After
processing a block, the targets of any conditional jumps are added to the
queue.    An inability to fall-through to another block might also enqueue
a target.  Now the output is exactly as I expected: you never see a label
corresponding to noOverflow and you never see spilling to the temporary.

> Note,  you  talked  about  simplifying  multiplies by -1, but these can
> still overflow.

Yes, they would be processed like Int_sub(0, X): turned into an Int_neg,
but leave the overflow check

> Did you do any runs to see how much the checking is costing?

Here are the results from the benchmarks I ran last night:

                   default    detectOverflow  default/detectOverflow
barnes-hut         15.27          X                  X
checksum           10.36      13.62               0.76
count-graphs       14.41      15.07               0.95
fft                73.79      72.48               1.02
fib                12.37      12.90               0.96
knuth-bendix       22.83      22.86               1.00
lexgen             40.61      40.11               1.01
life               75.49      80.20               0.94
logic              79.52      79.23               1.00
mandelbrot         20.31      24.76               0.82
matrix-multiply    14.44      13.00               1.11
merge             233.28     233.13               1.00
mpuz               50.79      48.26               1.05
nucleic            26.92      26.87               1.00
ratio-regions      27.04      27.65               0.98
simple             15.06      14.91               1.01
smith-normal-form   3.34       3.34               1.00
tak                30.84      31.94               0.97
tensor             14.91          X                  X
tsp                36.25          X                  X
vliw               23.04          X                  X
wc                  6.34       8.34               0.76
zern               13.49          X                  X

Benchmarks with X in detectOverflow had a compile error.  The problem was
that I was always assuming a destination for a checking prim; but, in
these cases I guess the destination was useless-ed away, but the overflow
check couldn't be eliminated.  It was a really simple fix.  But a good
reminder that some slowdown will arise not just from the actual
conditional jump, but also from CPS simplifications that couldn't happen
in the presence of the overflow check.

But, overall, the penalty doesn't seem to be that high.  checksum and
wc are a little disappointing, but everything else seems fine.  mandelbrot
is a little strange -- I really would have thought that the floating-point
computation would totally dominate that benchmark, but apparently counting
those iterations is costing a little.

The explaination for the cases where detectOverflow does better than the
default would appear to be related to the copyPropagation and moveHoist
optimizations.  All of the control flow with overflow checking breaks up
the blocks, so these optimizations don't have as much to work with.  mpuz
is a good example, because I know that it does better with copyPropagation
off than with it on.  The moral of the story being that I need to think
more about those optimizations, and if I can figure out what's really
going on, maybe get it to the point where the default is always better
than detectOverflow.