[MLton] x86_64 port status

Matthew Fluet fluet@cs.cornell.edu
Sat, 6 May 2006 18:30:42 -0400 (EDT)


Notes on the status of the x86_64 port of MLton.
=======================================================================

Sources:

Work is progressing on the x86_64 branch; interested parties may check
out the latest revision with:

svn co svn://mlton.org/mlton/branches/on-20050822-x86_64-branch mlton.x86_64

and view the sources on the web at:

http://mlton.org/cgi-bin/viewsvn.cgi/mlton/branches/on-20050822-x86_64-branch/


Background:

(* Representing 64-bit pointers. *)
http://mlton.org/pipermail/mlton/2004-October/026162.html
(* MLton GC overview *)
http://mlton.org/pipermail/mlton/2005-July/027585.html
(* Runtime rewrite *)
http://mlton.org/pipermail/mlton/2005-December/028421.html


Summary:

Since the last summary, the Basis Library has been refactored so that
it is compile-time configurable on the following axes:

    OBJPTR -- size of an object pointer (32-bits or 64-bits)
    HEADER -- size of an object header (32-bits or 64-bits)
    SEQINDEX -- size of an array/vector length (32-bits or 64-bits)

    DEFAULT_CHAR -- size of Char.char (8-bits; no choice according to spec)
    DEFAULT_INT -- size of Int.int (32-bits, 64-bits, and IntInf.int)
    DEFAULT_REAL -- size of Real.real (32-bits, 64-bits)
    DEFAULT_WORD -- size of Word.word (32-bits, 64-bits)

    C_TYPES -- sizes of various primitive C types

The object pointer and object header are needed for the IntInf
implemention.  Configuring the default sizes support both adopting
64-bit integers and words as the default on 64-bit platforms, but also
supports retaining 32-bit integers and words as the default on 64-bit
platforms.  The sizes of primitive C types are determined by the
target architecture and operating system.  This ensures that the Basis
Library uses the right representation for file descriptors, etc., and
ensures that the implementation may be shared between 32-bit and
64-bit systems.


MLton.IntInf changes:

As noted above, the object pointer size is needed by the IntInf
implementation, which represents an IntInf.int either as a pointer to
a vector of GNU MP mp_limb_t objects or as the upper bits of a
pointer.  Since the representation of mp_limb_t changes from a 32-bit
system to a 64-bit system, and the size of an object pointer may be
compile-time configurable, we have changed the MLTON_INTINF signature
to have the following:

       structure BigWord : WORD
       structure SmallInt : INTEGER

       datatype rep =
          Big of BigWord.word vector
        | Small of SmallInt.int
       val rep: t -> rep


Technical Details:

The key techniques used in the refactoring of the Basis Library is
aggressive use of ML Basis path variables, successive rebindings of
structures, and special 'Choose' functors.  I'll describe each of
these a little below.

The Basis Library implementation is organized as a large ML Basis
project.  In order to establish the appropriate mappings between C
primitive types (int, long long int, etc.) and ML types (Int32.int,
Int64.int, etc), we use the $(TARGET_ARCH) and $(TARGET_OS) path
variables to elaborate a target specific c-types.sml file:

       <basis>/config/c/$(TARGET_ARCH)-$(TARGET_OS)/c-types.sml

The c-types.sml file is generated automatically for each target
system, using the runtime/gen/gen-types.c program, and looks something
like:

(* C *)
structure C_Char = struct open Int8 type t = int end
functor C_Char_ChooseIntN (A: CHOOSE_INTN_ARG) = ChooseIntN_Int8 (A)
structure C_SChar = struct open Int8 type t = int end
functor C_SChar_ChooseIntN (A: CHOOSE_INTN_ARG) = ChooseIntN_Int8 (A)
structure C_UChar = struct open Word8 type t = word end
...
structure C_Size = struct open Word32 type t = word end
functor C_Size_ChooseWordN (A: CHOOSE_WORDN_ARG) = ChooseWordN_Word32 (A)
...
structure C_Off = struct open Int64 type t = int end
functor C_Off_ChooseIntN (A: CHOOSE_INTN_ARG) = ChooseIntN_Int64 (A)
...
structure C_UId = struct open Word32 type t = word end
functor C_UId_ChooseWordN (A: CHOOSE_WORDN_ARG) = ChooseWordN_Word32 (A)
...
(* from "gmp.h" *)
structure C_MPLimb = struct open Word32 type t = word end
functor C_MPLimb_ChooseWordN (A: CHOOSE_WORDN_ARG) = ChooseWordN_Word32 (A)

Note that each C type has a corresponding structure which is bound to
an Int<N> or Word<N> structure of the appropriate signedness and size.
The extra binding of "type t = int" or "type t = word" ensures that
the Basis Library may refer to C_TYPE.t, rather than C_TYPE.int or
C_TYPE.word, for types whose signedness isn't specified by the
standard.  (For example, uid_t and gid_t are only required to be
integral types; in glibc, they happen to be unsigned.)

When elaborating the MLB file that implements the Basis Library, we
include

       <basis>/config/c/$(TARGET_ARCH)-$(TARGET_OS)/c-types.sml

multiple times, to rebind the C_TYPE structures to successively more
complete implementations of the ML structures.  (For example, we need
C_MPLimb to implement IntInf, but we need IntInf to implement
Word32.toLargeInt.  Hence, we first bind C_MPLimb to a minimal,
primitive structure, which provides enough to implement a little bit
of IntInf, which in turn provides enough to implement
Word32.toLargeInt, which we then rebind to C_MPLimb.)

In a similar manner, we successively bind the default Int structure
via:

       <basis>/config/default/$(DEFAULT_INT)

where the $(DEFAULT_INT) path variable denotes a file that looks
something like:

structure Int = Int32
type int = Int.int

functor Int_ChooseInt (A: CHOOSE_INT_ARG) :
    sig val f : Int.int A.t end =
    ChooseInt_Int32 (A)

The 'Choose' functors are the mechanism by which we ensure that the
majority of the Basis Library implemenation may be shared, while
remaining "parametric" in the primitive C types and the default ML
types.  Consider, for example, the INTEGER signature:

signature INTEGER =
    sig
      type int

      val fromInt: Int.int -> int
      val toInt: int -> Int.int

      ...
    end

How may we efficiently implement the Int8, Int16, Int32, and Int64
structures, when the bindings for Int<N>.{from,to}Int must be
different for the different choices of Int.int?  The solution adopted
is to ensure that each "pre-implementation" of Int<N> knows how to
convert to and from each possible choice of Int.int.  That is, we have

signature PRE_INTEGER =
    sig
      type int

      val fromInt8: Primitive.Int8.int -> int
      val fromInt16: Primitive.Int16.int -> int
      val fromInt32: Primitive.Int32.int -> int
      val fromInt64: Primitive.Int64.int -> int
      val fromIntInf: Primitive.IntInf.int -> int
      val toInt8: int -> Primitive.Int8.int
      val toInt16: int -> Primitive.Int16.int
      val toInt32: int -> Primitive.Int32.int
      val toInt64: int -> Primitive.Int64.int
      val toIntInf: int -> Primitive.IntInf.int

      ...
    end

We use a functor to convert each PRE_INTEGER to an INTEGER; within
this functor, we use the Int_ChooseInt functor to select the
appropriate conversion:

functor Int (structure I : PRE_INTEGER) : INTEGER =
    struct
      type int = I.int

      local
        structure S =
        Int_ChooseInt
        (type 'a = 'a -> int
         val fInt8 = I.fromInt8
         val fInt16 = I.fromInt16
         val fInt32 = I.fromInt32
         val fInt64 = I.fromInt64
         val fIntInf = I.fromIntInf)
      in
        val fromInt = S.f
      end

      local
        structure S =
        Int_ChooseInt
        (type 'a = int -> 'a
         val fInt8 = I.toInt8
         val fInt16 = I.toInt16
         val fInt32 = I.toInt32
         val fInt64 = I.toInt64
         val fIntInf = I.toIntInf)
      in
        val toInt = S.f
      end

      ...
end

The implementation of the 'Choose' functors is the obvious one:

signature CHOOSE_INT_ARG =
    sig
       type 'a t
       val fInt8: Int8.int t
       val fInt16: Int16.int t
       val fInt32: Int32.int t
       val fInt64: Int64.int t
       val fIntInf: IntInf.int t
    end

functor ChooseInt_Int8 (A : CHOOSE_INT_ARG) :
    sig val f : Int8.int A.t end =
    struct val f = A.fInt8 end
functor ChooseInt_Int16 (A : CHOOSE_INT_ARG) :
    sig val f : Int16.int A.t end =
    struct val f = A.fInt16 end
functor ChooseInt_Int32 (A : CHOOSE_INT_ARG) :
    sig val f : Int32.int A.t end =
    struct val f = A.fInt32 end
functor ChooseInt_Int64 (A : CHOOSE_INT_ARG) :
    sig val f : Int64.int A.t end =
    struct val f = A.fInt64 end
functor ChooseInt_IntInf (A : CHOOSE_INT_ARG) :
    sig val f : IntInf.int A.t end =
    struct val f = A.fIntInf end

As a convenience mechanism, the $(DEFAULT_CHAR), $(DEFAULT_INT),
$(DEFAULT_REAL), and $(DEFAULT_WORD) path variables are set by the
compiler, and may be controlled by a compiler flag:

   -default-type type
       Specify the default binding for a primitive type.  For example,
       '-default-type word64' causes the top-level type word and the
       top-level structure Word in the Basis Library to be equal to
       Word64.word and Word64:WORD, respectively.  Similarly,
       '-default-type intinf' causes the top-level type int and the
       top-level structure Int in the Basis Library to be equal to
       IntInf.int and IntInf:INTEGER, respectively.

As should be evident from the above, we only support power-of-two
sized defaults.  Also, the Basis Library specification doesn't allow
Char.char to be larger than 8bits, so '-default-type char8' is the
only option allowed for char.  While '-default-int int8' is allowed,
it probably isn't a good idea to set the default integer and word
sizes to less than 32-bits, but it ought to be useful to set integers
to IntInf.int.


Platform Porters/Maintainers:

Before merging the runtime and Basis Library changes in to HEAD, we
would like to ensure that things are too broken on other platforms;
I only have easy access to x86-linux and amd64-linux.

It would be very helpful if individuals on other platforms (BSD and
Darwin and Solaris particularly) could checkout the x86_64 branch and
try to compile the runtime:

   make runtime

I'm specifically interested in the files c-types.h and c-types.sml
(automatically copied to
basis-library/config/c/$(TARGET_ARCH)-$(TARGET_OS)/), where the sizes
and signedness of the C typedefs might be different from x86-linux.
Second, I'm interested in any constants that aren't present on
different platforms.  I've been following the Single UNIX
Specification (as a superset of Posix, XOpen, and other standards).
I'm guessing that we'll have to drop a few more things to get to the
intersection of our platforms.

Finally, the platform/* specific stuff will need to be ported.  Most
of that should be straightforward, following what I've done to linux;
essentially, changed some naming schemes, discharge all the gcc
warnings, etc.  Cygwin and MinGW will be the biggest challenges.