arrays
Henry Cejtin
henry@sourcelight.com
Sun, 24 Dec 2000 16:28:04 -0600
I didn't see the assembler ever being called in the -v output. Clearly that
should be fixed. I would love for the current -v output to come only from
something like -vv, with a single -v just listing the various programs being
run (the MLton compiler, the assembler, the linker, gcc, etc.) but not the
detailed output of all the internal MLton compiler passes.
I did some quick tests of converting arrays with fat elements into parallel
arrays. The case I tried was an
(int * int * int) vector
vs.
int vector * int vector * int vector
and a routine which added up, over all the elements in the vector, either all
3 fields together or just the last field. (The idea is to pick which one on
the basis of command-line arguments so MLton won't optimize away any parts of
the arrays.)
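A minimal sketch of such a test in Standard ML, assuming the plain
(int * int * int) vector representation. The names make, sumAll and sumLast,
the vector size, and the "last" command-line argument are all hypothetical,
not the code actually used for the measurements. The size is kept small so the
default fixed-size int cannot overflow; a real timing run would use a larger
vector or repeat the loop.

```sml
(* Hypothetical benchmark sketch: array-of-tuples representation. *)

(* Element i is (i, 2 * i, 3 * i). *)
fun make (n : int) : (int * int * int) vector =
   Vector.tabulate (n, fn i => (i, 2 * i, 3 * i))

(* Sum all three fields of every element. *)
fun sumAll (v : (int * int * int) vector) : int =
   Vector.foldl (fn ((a, b, c), acc) => acc + a + b + c) 0 v

(* Sum only the last field of every element. *)
fun sumLast (v : (int * int * int) vector) : int =
   Vector.foldl (fn ((_, _, c), acc) => acc + c) 0 v

(* Choose the routine at run time so the compiler cannot treat
   any field as dead and drop it. *)
fun main () =
   let
      val v = make 10000
      val total =
         case CommandLine.arguments () of
            ["last"] => sumLast v
          | _ => sumAll v
   in
      print (Int.toString total ^ "\n")
   end

val () = main ()
```

Running the program with the argument last exercises only the third field;
any other invocation touches all three.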
The moral was that MLton did not flatten out the triple of vectors, so either
way the number of memory indirections was the same. For the normal version,
one indirection gets the vector element (the address of the tuple) and a
second gets the int. For the `improved' version, one indirection gets the
correct vector and one more gets the int. All of this was done by just
writing a new version of sub which indexes into each vector and then builds a
tuple, the idea being that the optimizer will clean this up in the case where
you are only using some of the slots.
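A minimal sketch of the `improved' representation and a replacement sub, in
Standard ML. The type name triples and the helpers sub, len, make, sumAll and
sumLast are hypothetical illustrations, not the code from the experiment. One
caveat worth hedging: Vector.sub can raise Subscript, so an optimizer can only
drop an unused lookup if it can show that lookup is in bounds; whether MLton
does so here is not established by this sketch.

```sml
(* Hypothetical parallel-vector representation: one vector per field. *)
type triples = int vector * int vector * int vector

(* Replacement for Vector.sub: index each field vector, then rebuild
   the tuple. Callers that ignore a component leave that lookup's
   result unused, which is what the optimizer would need to remove. *)
fun sub ((xs, ys, zs) : triples, i : int) : int * int * int =
   (Vector.sub (xs, i), Vector.sub (ys, i), Vector.sub (zs, i))

fun len ((xs, _, _) : triples) : int = Vector.length xs

(* Same contents as the array-of-tuples version: (i, 2 * i, 3 * i). *)
fun make (n : int) : triples =
   (Vector.tabulate (n, fn i => i),
    Vector.tabulate (n, fn i => 2 * i),
    Vector.tabulate (n, fn i => 3 * i))

(* Sum all three fields, going through sub. *)
fun sumAll (v : triples) : int =
   let
      fun loop (i, acc) =
         if i >= len v then acc
         else
            let val (a, b, c) = sub (v, i)
            in loop (i + 1, acc + a + b + c)
            end
   in
      loop (0, 0)
   end

(* Sum only the last field; the first two lookups are unused. *)
fun sumLast (v : triples) : int =
   let
      fun loop (i, acc) =
         if i >= len v then acc
         else
            let val (_, _, c) = sub (v, i)
            in loop (i + 1, acc + c)
            end
   in
      loop (0, 0)
   end
```

Because sub has the same shape as before, the loops are unchanged apart from
the representation, which keeps the comparison between the two versions fair.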
This saved a lot of space (20 bytes per element for the normal version and 12
bytes for the `improved' one). Interestingly, in the case where you couldn't
fit everything into the L1 cache it sped things up a lot, because the tuple of
vectors DID make it into the L1. It made things 2-3 times faster, depending on
whether or not you fit in the L2 cache.