[MLton] bug with packed representations
Stephen Weeks
MLton@mlton.org
Fri, 23 Apr 2004 11:30:14 -0700
> The type-theory weenies, typically distinguish between "casting" and
> "coercing", where casting a value to another type is a nop with respect to
> representation, but a coercion may make a representation change. If the
> primop approach doesn't work (I imagine you need a family of primops to do
> this right). I'd suggest introducing a new constructor distinct from "Cast"
> and call it "Coerce".
I agree with Matthew -- our WordX_toWordY primitives handle coercions
nicely. It was simply a bug in the packed representation stuff that
led to the cast problem.
The Rssa and Machine ILs already have subtyping which lets one change
the type of a value in a safe way. For example, the type 0x1:32
(which is a singleton type whose only value is the 32-bit integer 1)
is a subtype of the sum type 0x0:32 + 0x1:32. We use casts to change
the type of a value (not its representation) when the type system
isn't good enough. Here are the casts we do.
* from a sum type to one of the summands. This is in general unsafe,
but the idea is that the representation pass introduces the casts
only immediately after a test on the tag bits. So, a little local
reasoning can prove them safe, and maybe even someday the type
checker will.
* from an IntInf to a vector or a word. These are unsafe casts that
are used in the implementation of IntInf. We rely on our SML code
in the basis implementation of IntInf to get these correct.
However, the new packed representation pass will now do exactly the
right thing for the type
datatype t = Big of word vector | Small of Int31.int
That is, it will represent IntInf exactly as we do now, with a
single tag bit, where 0 indicates a word vector or 1 indicates a
31-bit int. So, we could conceivably clean up our IntInf
implementation to use this.
* from a sum type to a word so that we can look at the tag bits. This
is safe.
* from an array to a vector. This one is weird because it is
connected with a change of the object's header. All quite unsafe.
We rely on the SML code in the basis library to get this right.
* from a heap pointer to an internal pointer. This is done for array
address computations which are unsafe.
* between words and ints. These should be totally safe.