SSA simplify passes

Fri, 4 Jan 2002 20:00:23 -0500 (EST)

> With respect to the argument order reversal changing the run times, there are
> various hacks in the memory sub-system designed to improve sequential  memory
> access.   This  includes  read-ahead.   Thus, other things being equal, it is
> better to do things in increasing address order.  Of course it could just  be
> luck having to do with cache line alignment.

Hmm.  I don't know if the mem sub-system stuff really explains what I saw.
Although the argument order was reversed, the later compilation to
pseudo-regs and stack slots keeps them in order; so, it's not as though
filling in arguments would walk backwards.  On the other hand, accessing
arguments depends on the block or function.  I don't know if the order in
-loop-passes 1 or -loop-passes 2 is the same as the source program --
going with the idea that the programmer puts arguments in the order in
which they are used.  Of course, Steve's insistence that record arguments
appear in alphabetical order messes that up (btw, how does MLton map a
record to a tuple?).  Not to mention that we once again reorder arguments
at representation time for alignment reasons.  In any event, if that
really explains these seemingly random 10% speedups and slowdowns, then
maybe we should be a little more careful about when we reverse arguments.

But, your comment suggests that the naive translation of the following
might be less than optimal:
  L_3852 (x_6616, x_6620)
    x_6621 = #4 x_6620
    x_6622 = #3 x_6620
    x_6623 = #2 x_6620
    x_6615 = #1 x_6620
    x_6617 = (x_6615, global_4, x_6623, x_6622, x_6621)
In the codegen, I think I fix this up, in the sense that I'll essentially
implement it as
  L_3852 (x_6616, x_6620)
    x_6617 = (#1 x_6620, global_4, #2 x_6620, #3 x_6620, #4 x_6620)
because the single uses of the tuple components means that it's better to
direcly access them from memory; i.e., I'll get the address of x_6620 in a
register and then when doing the allocation pull the components as
indirect offsets from the register.

In any event, I suspect that "fixing" the SSA is simply a matter of
reversing some vector loop over tuple components.

Here are a couple of other random "cleanup"'s I'm trying to track down:

  L_890 ()
    case x_188 of
      Env_104 => L_508
      ...
    | Env_24 => L_889
  L_889 (x_1991: (((int * int) * real array) * 
                  ((int * int) * (int * int))))
    x_1993: ((int * int) * (int * int)) = #2 x_1991
    x_1994: ((int * int) * real array) = #1 x_1991
    env_30: (((int * int) * real array) * 
             ((int * int) * (int * int))) = (x_1994, x_1993)
    x_1992: lambdas_6 = Env_104 (x_1094, env_30)
    L_507 (x_184 (x_1992, global_52, global_7, global_34))

Pulling apart x_1991 just to put it back together again in env_30 seems
wasteful.  This happens all over the place in a self-compile.

This caught my eye in relation to Henry's comments:
  L_56 (x_346, x_343)
    x_376 = #4 x_343
    L_55 (x_35 (x_361, x_376)) None
  L_55 (x_373,
	x_372,
	x_371,
	x_370,
	x_369,
	x_368,
	x_367,
	x_366,
	x_365,
	x_364,
	x_363,
	x_362)
    x_375 = (x_362,
	     x_363,
	     x_364,
	     x_365,
	     x_366,
	     x_367,
	     x_368,
	     x_369,
	     x_370,
	     x_371,
	     x_372,
	     x_373)
    x_338 = (x_345, x_375, x_343)
    x_374 = #3 x_343
    L_54 (x_35 (x_361, x_374)) None

L_55's arguments will be accessed "backwards" when filling in the tuple,
so that seems bad; (Although, again, I need to see how this passes through
the backend which might reverse things again, in which case it would be
o.k.).

Another random thing I noticed: there seems to be a strange interplay
between the XML simplifier (particularly inlining) and polyvariance.  If
you look at the SSA of logic.sml you'll see 5 copies of a function
called oc_?, all of which are alpha-equivalent.  I think what happens is
that the XML inliner inlines the occurs_check function (and it's let bound
auxilary functions oc and ocs).  Polyvariance then duplicates the function
in which they were inlined (hence, they are duplicated).  After
closure conversion, these become SSA functions.  These auxiliary
functions are recursive, so they are never inlined back into their
respective XML enclosing functions.  We're left with 5 copies of identical
code.  I don't know enough about the XML IL and the simplifier and
polyvariance to know if there is an easy solution to this problem.