[MLton-user] Tracking down allocations
Matthew Fluet
fluet at tti-c.org
Tue Jul 24 11:50:42 PDT 2007
On Fri, 20 Jul 2007, Joe Hurd wrote:
> Sorry for the delay in replying: I was at a conference at the
> beginning of the week.
>
>> I'd still be interested in seeing the source program, to see if there is a
>> clear reason why the tuple is being eliminated in one version but not the
>> other.
>
> I've uploaded the two versions of the program to the TemporaryUpload
> page of the MLton website, together with the results of profiling
> allocations. There are two functions that produce different results
> when manually lambda lifted: they can be found in the source by
> searching for the string MLTON DEBUGGING.
I took a look at the nested-functions.sml program. In addition to the
allocation Joe noted in the nested-functions.ssa file:
Enter MonteCarlo.placeStone.<branch> src/MonteCarlo.sml: 773
tuple_322 = (x_5086, x_6058, x_24097)
there are a bunch more which get charged to the
MonteCarlo.placeStone.<branch> source position:
tuple_348 = (x_6058, x_25012, x_24112, x_6060)
tuple_347 = (x_25012, x_6060, x_6058)
tuple_346 = (x_6058, x_6060, tuple_347)
tuple_345 = (tuple_346, x_6060, x_6058)
tuple_344 = (x_5085, tuple_348)
tuple_343 = (x_6058, x_5085, tuple_322, tuple_348)
tuple_342 = (x_6058, x_5085, tuple_322, tuple_348, x_6060)
tuple_340 = (tuple_347, x_5085, x_6058)
tuple_339 = (tuple_347, tuple_322, x_5085, x_6058)
tuple_338 = (x_6058, x_5085, tuple_322, tuple_347)
tuple_337 = (x_6058, x_5085, tuple_346)
tuple_336 = (x_6058, x_5085, tuple_322, tuple_346)
tuple_335 = (tuple_345, x_5085, x_6058)
tuple_334 = (tuple_343, tuple_342)
tuple_333 = (tuple_342, tuple_342, tuple_343)
tuple_332 = (tuple_338, tuple_339)
tuple_330 = (tuple_340, tuple_337)
tuple_329 = (tuple_336, tuple_339)
tuple_327 = (tuple_335, tuple_337, tuple_340)
These are all of the environments for the nested functions. I believe
that you are being hurt by MLton's flat-closure representation. In
particular, all of those nested functions, which make various calls to one
another, share a very common environment -- the one that you manually
created by lambda lifting. You can observe this in the above by noting
that many of those tuples share common values (the x_* variables); indeed,
expanding tuple_345, it looks like:
tuple_345 = ((x_6058, x_6060, (x_25012, x_6060, x_6058)), x_6060, x_6058)
Note that this duplicates x_6058 and x_6060 numerous times.
In the lambda-lifted.sml case, the sharing seems to be exposed enough for
MLton to eliminate it. What's interesting is that you lambda-lifted in a
curried way. While MLton's closure conversion would introduce tuples for
the environments of the implicit functions in the lambda-lifted ones, it
seems that the packing and unpacking of the environments exposes things
enough for MLton to eliminate it.
There is an old idea of Stephen's for an improved closure representation,
whose e-mail thread was duplicated a few years ago:
http://mlton.org/pipermail/mlton/2003-October/024570.html
I think it is trying to pick up exactly this case, where some sharing of
closures would be a win.
There is also the Shao/Appel closure representation analysis:
http://mlton.org/References#ShaoAppel94
In any case, I'm afraid that there isn't an immediate fix (though
Stephen's improved closure representation shouldn't be hard to implement).
I like the example, though, because I observe a 1.10X improvement in moves
per second using the lambda-lifted version, suggesting that there is some
real improvement to be had from an improved closure representation.
More information about the MLton-user
mailing list