[MLton] 64-bit pointers

Wed, 6 Oct 2004 17:16:18 -0700

I was thinking about porting MLton to a 64-bit machine (just thoughts,
no code!) and considered the following schemes for representing
pointers and mapping them to virtual memory addresses.

A. 32 bits, with bottom two bits zero.
B. 32 bits, with bottom bit zero, shift left by one.
C. 32 bits, with bottom bit zero, shift left by two.
D. 32 bits, shift left by two.
E. 32 bits, shift left by three.
F. 40 bits.
G. 64 bits.

These schemes vary in the number of bits to represent a pointer in an
object, the time to load a pointer from memory into a register, the
amount of addressable memory, and the object alignment.

	bits	time	mem(G)	align
A	32	fast	  4	4
B	32	slow	  8	4
C	32	slow	 16	8
D	32	slow	 16	4
E	32	slow	 32	8
F	40	slow	256	4
G	64	fast	 4G	8

Each of the (A-F) has a variant (A1-F1) in which pointers are added to
some constant base address.  This gives access to any region in the
virtual address space instead of just the low addresses.

I don't know that any of the thirteen schemes dominates another.  Here
are some thoughts.

(A) This is is what we have now, but is still useful on 64-bit
machines where the bottom 4G may be less cluttered than on a 32-bit
machine.

(A1) seems like a nice cost/benefit tradeoff for a program that only
needs 4G of memory, since the base can be used to find a contiguous 4G
somewhere in the address space.

(B) and (C) are similar, the tradeoff being to increase object
alignment requirements in order to allow more memory.  Importantly,
pointers having a bottom zero bit means that we can still set the
bottom bit to one to represent small values in sum types.

(D) and (E) are problematic because they leave no room to represent 
small objects in sum types with pointers.  I think that really rules
them out. 

(F) costs some in object alignment because a sequence of pointers in
an object may have to be padded to meet 4-byte alignment.  Loading a
pointer from memory into a register may be slightly faster than in
(B) or (C) because we don't have to shift, but I wonder if that
matters.

(G) costs the most in space, but has the fastest load time for
pointers of the schemes that allow access to 4G of memory.

A reasonable tradeoff in implementation complexity vs allowing our
users enough flexibility might be to provide:

	A, A1, B, B1, C, C1, G

After some experiments on those, we might be able to find a more
manageable set for users.

If anyone has pointers (:-) to papers on these schemes or others, or
has any other thoughts on pointer representation, please send them.