MLton's closure conversion
Stephen Weeks
MLton@sourcelight.com
Tue, 10 Oct 2000 15:57:01 -0700 (PDT)
> Interesting. Note, one thing we can do is to delay any selection (or some of
> the selections) until you get to a split in the control flow such that some
> of the environment is dead in one of the branches. Since everything was
> flattened out already in the F2 example (non-curried case) this wouldn't help
> there, I could imagine big improvements otherwise. I.e., in one branch of
> that all the environment is dead, and in the other all of it is alive: thus
> you would just delay all selects (eliminating them from the fast case and
> delaying further in the slow case).
I sort of see what you're saying. Now that I think about it, it would be nice
if we could keep the closure converter as is, and express what we want as a CPS
to CPS transformation -- something that delays selects as long as possible.
> The general problem of currying also is interesting. We tend not to use it,
> so it doesn't show up very much in our code. We could transform curried
> functions into curried calls to uncurried ones. I.e., replace
>
> fun f a b c =
> if a > 0
> then f (b + c) x (f y z w)
> else 17
>
> to
> fun f' (a, b, c) =
> if a > 0
> then f' (b + c, x, f' (y, z, w))
> else 17
>
> fun f a b c = f' (a, b, c)
>
> and then use the inliner to convert the f calls to f' in the case that it is
> completely called.
Yeah, I guess we really should put an uncurrier in. After all, it can't hurt,
can it? We know it doesn't help much on the standard benchmarks since they're
not very curried. But it certainly would have helped on that ported OCAML
code. Here's some old mail from Suresh about an uncurrier he wrote. Suresh,
could you send me the latest version of that uncurrier, and I'll look into
cleaning it up and putting it in.
> From: Suresh Jagannathan <suresh@research.nj.nec.com>
> To: MLton@research.nj.nec.com
> Subject: preliminary times w/uncurrier
> Date: Mon, 22 Nov 1999 20:37:17 -0500
>
>
> Here are times for representative small benchmarks gathered running on my local
> machine. Basically, (except for life) uncurrying doesn't buy a whole lot. The
> implementation I'm using is what Steven originally suggested: a bottom-up walk
> of the program replacing fully applied curried calls with their uncurried
> equivalent. Quickly glancing at the programs (in their uncurried and curried
> forms) reveals a number of places where curried functions are indeed replaced,
> but I suspect that by and large the benefit of replacing a function call or two
> is partially masked by the need to cons a new tuple. Of course, I haven't been
> able to get the kit to run yet. That's my next step, but unless something very
> unusual is taking place there, these numbers imply we're unlikely to see any
> significant improvement.
>
>
> Benchmark Time (uncurried) Time (curried)
> ===============================================
> barnes-hut 17.34 17.72
> count-graphs 19.37 20.33
> knuth-bendix 22.18 22.88
> lexgen 47.60 47.02
> life 73.02 79.17
> logic 79.17 78.12
> mandelbrot 19.20 19.12
> matrix-multiply 18.21 18.21
> nucleic 33.72 33.51
> simple 27.34 28.47
> tsp 38.46 38.41
> zern 67.08 67.47