[MLton-commit] r6784

Wesley Terpstra wesley at mlton.org
Tue Aug 19 18:54:35 PDT 2008


Generate PIC amd64 code. This should work on elf targets and darwin-x86_64.
It will also mostly work on win64. As far as MinGW has always worked, anyway.

Unfortunately, I now understand that the windows concept of dynamic linking 
and the unix concept differ fairly significantly. Unix cares about if a
symbol can/can't be accessed by another shared object. If it is private and
never gets used elsewhere, then there is no need for the GOT. However, even
if you define it yourself, but it is exported, you must use the GOT.

Windows cares about where the definition comes from. If it's defined in the
same dll / executable, you MUST use the symbol by _name. However, if it is
defined in a DLL (even for normal linkage), you MUST use __imp__name. This
has actually been a lurking bug waiting to bite someone on mingw. If anyone
had tried to use _address on a DLL imported symbol, bad things could have
happened. Function calls worked because there are .a stub files one links
against which provide a _name that just 'jmp *__imp__name's. Of course, the
MLton _address of _name is thus not __imp__name. Calling the function via
_import * would still work, but pointer comparisons would fail. Also, the
_address of a non-function symbol would simply fail. It can't be wrapped.

It seems that symbols can be broken down into three categories:
  private  - defined in this module (executable/dll), never leaving it
  public   - defined in this module, but accessible outside it
  external - defined (as public) in some other module

At the C level, this corresponds directly to the following cases:
  INTERNAL extern int foo;
  EXPORTED extern int foo;
  IMPORTED extern int foo;
When defining a symbol, only public/private make sense:
  INTERNAL int foo;
  EXPORTED int foo;

I need to rework the SymbolScope to reflect these categories. One should
also check whether or not all _exports / _addresses / _etc agree for a
given symbol name. The following should cause an error:

val () = _export "foo" public: (int -> int) -> unit; (fn x => x + 1)
val x = _address "foo" private;  (* goes boom on ELF *)
val x = _address "foo" external; (* goes boom on windows *)
val x = _address "foo" public; (* no problem on either *)


----------------------------------------------------------------------

U   mlton/trunk/mlton/codegen/amd64-codegen/amd64-generate-transfers.fun
U   mlton/trunk/mlton/codegen/amd64-codegen/amd64-generate-transfers.sig
U   mlton/trunk/mlton/codegen/amd64-codegen/amd64-mlton.fun

----------------------------------------------------------------------

Modified: mlton/trunk/mlton/codegen/amd64-codegen/amd64-generate-transfers.fun
===================================================================
--- mlton/trunk/mlton/codegen/amd64-codegen/amd64-generate-transfers.fun	2008-08-20 00:44:21 UTC (rev 6783)
+++ mlton/trunk/mlton/codegen/amd64-codegen/amd64-generate-transfers.fun	2008-08-20 01:54:34 UTC (rev 6784)
@@ -503,42 +503,7 @@
               then AppendList.empty
            else AppendList.single (Assembly.directive_unreserve 
                                    {registers = [Register.rsp]})
-
-        val (mkCCallLabel, mkSymbolStubs) =
-           if !Control.Target.os = MLton.Platform.OS.Darwin
-              then 
-                 let
-                    val set: (word * String.t * Label.t) HashSet.t =
-                       HashSet.new {hash = #1}
-                    fun mkCCallLabel name =
-                       let
-                          val hash = String.hash name
-                       in
-                          (#3 o HashSet.lookupOrInsert)
-                          (set, hash,
-                           fn (hash', name', _) => hash = hash' andalso name = name',
-                           fn () => (hash, name, 
-                                     Label.newString (concat ["L_", name, "_stub"])))
-                       end
-                    fun mkSymbolStubs () =
-                       HashSet.fold
-                       (set, [], fn ((_, name, label), assembly) =>
-                        (Assembly.pseudoop_symbol_stub ()) ::
-                        (Assembly.label label) ::
-                        (Assembly.pseudoop_indirect_symbol (Label.fromString name)) ::
-                        (Assembly.instruction_hlt ()) ::
-                        (Assembly.instruction_hlt ()) ::
-                        (Assembly.instruction_hlt ()) ::
-                        (Assembly.instruction_hlt ()) ::
-                        (Assembly.instruction_hlt ()) ::
-                        assembly)
-                 in
-                    (mkCCallLabel, mkSymbolStubs)
-                 end
-              else
-                 (fn name => Label.fromString name,
-                  fn () => [])
-
+        
         datatype z = datatype Entry.t
         datatype z = datatype Transfer.t
         fun generateAll (gef as GEF {effect,...})
@@ -1182,12 +1147,14 @@
                 | CCall {args, frameInfo, func, return}
                 => let
                      datatype z = datatype CFunction.Convention.t
+                     datatype z = datatype CFunction.SymbolScope.t
                      datatype z = datatype CFunction.Target.t
                      val CFunction.T {convention=_,
                                       maySwitchThreads,
                                       modifiesFrontier,
                                       readsStackTop, 
                                       return = returnTy,
+                                      symbolScope,
                                       target,
                                       writesStackTop, ...} = func
                      val stackTopMinusWordDeref
@@ -1521,13 +1488,70 @@
                         case target of
                            Direct name =>
                               let
-                                 val target = mkCCallLabel name
+                                 datatype z = datatype MLton.Platform.OS.t
+                                 datatype z = datatype Control.Format.t
+                                 
+                                 val label = Label.fromString name
+                                 
+                                 (* how to access imported functions: *)
+                                 (* Windows rewrites the symbol __imp__name *)
+                                 val coff = Label.fromString ("_imp__" ^ name)
+                                 val macho = label (* @PLT is implicit *)
+                                 val elf = Label.fromString (name ^ "@PLT")
+                                 
+                                 val importLabel = 
+                                    case !Control.Target.os of
+                                       Cygwin => coff
+                                     | Darwin => macho
+                                     | MinGW => coff
+                                     |  _ => elf
+                                 
+                                 val direct =
+                                   AppendList.fromList
+                                   [Assembly.directive_ccall (),
+                                    Assembly.instruction_call
+                                    {target = Operand.label label,
+                                     absolute = false}]
+                                     
+                                 val plt =
+                                   AppendList.fromList
+                                   [Assembly.directive_ccall (),
+                                    Assembly.instruction_call
+                                    {target = Operand.label importLabel,
+                                     absolute = false}]
+                                
+                                 val indirect =
+                                   AppendList.fromList
+                                   [Assembly.directive_ccall (),
+                                    Assembly.instruction_call
+                                    {target = Operand.memloc_label importLabel,
+                                     absolute = true}]
                               in
-                                 AppendList.fromList
-                                 [Assembly.directive_ccall (),
-                                  Assembly.instruction_call
-                                  {target = Operand.label target,
-                                   absolute = false}]
+                                case (symbolScope, 
+                                      !Control.Target.os, 
+                                      !Control.format) of
+                                   (* Internal functions can be easily reached
+                                    * with a direct (rip-relative) call.
+                                    *)
+                                  (Internal, _, _) => direct
+                                   (* Windows always does indirect calls to
+                                    * imported functions. The importLabel has
+                                    * the function address written to it.
+                                    *)
+                                 | (External, MinGW, _) => indirect
+                                 | (External, Cygwin, _) => indirect
+                                   (* ELF systems (and darwin too) create
+                                    * procedure lookup tables (PLT) which 
+                                    * proxy the call to libraries. The PLT
+                                    * does not contain an address, but instead
+                                    * a stub function. Often the PLT is auto-
+                                    * matically created. This applies to all
+                                    * darwin-x86_64 function calls and calls
+                                    * made from an ELF executable.
+                                    *)
+                                 | (External, Darwin, _) => direct
+                                 | (External, _, Library) => plt
+                                 | _ => direct
                               end
                          | Indirect =>
                               AppendList.fromList
@@ -2168,16 +2192,11 @@
                       of [] => doit ()
                        | block => block::(doit ())))
         val assembly = doit ()
-        val symbol_stubs = mkSymbolStubs ()
         val _ = destLayoutInfo ()
         val _ = destProfileLabel ()
 
         val assembly = [Assembly.pseudoop_text ()]::assembly
         val assembly =
-           if List.isEmpty symbol_stubs
-              then assembly
-              else symbol_stubs :: assembly
-        val assembly =
            if List.isEmpty data
               then assembly
               else data::assembly

Modified: mlton/trunk/mlton/codegen/amd64-codegen/amd64-generate-transfers.sig
===================================================================
--- mlton/trunk/mlton/codegen/amd64-codegen/amd64-generate-transfers.sig	2008-08-20 00:44:21 UTC (rev 6783)
+++ mlton/trunk/mlton/codegen/amd64-codegen/amd64-generate-transfers.sig	2008-08-20 01:54:34 UTC (rev 6784)
@@ -7,7 +7,6 @@
  *)
 
 type int = Int.t
-type word = Word.t
 
 signature AMD64_GENERATE_TRANSFERS_STRUCTS =
   sig

Modified: mlton/trunk/mlton/codegen/amd64-codegen/amd64-mlton.fun
===================================================================
--- mlton/trunk/mlton/codegen/amd64-codegen/amd64-mlton.fun	2008-08-20 00:44:21 UTC (rev 6783)
+++ mlton/trunk/mlton/codegen/amd64-codegen/amd64-mlton.fun	2008-08-20 01:54:34 UTC (rev 6784)
@@ -660,19 +660,94 @@
              | CPointer_lt => cmp Instruction.B
              | CPointer_sub => binal Instruction.SUB
              | CPointer_toWord => mov ()
-             | FFI_Symbol {name, ...}
-             => let     
-                   val (dst,dstsize) = getDst1 ()
+             | FFI_Symbol {name, symbolScope, ...}
+             => let
+                   datatype z = datatype CFunction.SymbolScope.t
+                   datatype z = datatype Control.Format.t
+                   datatype z = datatype MLton.Platform.OS.t
+
+                   val (dst, dstsize) = getDst1 ()
+                   val label = Label.fromString name
+                   
+                   (* how to access an imported label's address *)
+                   (* windows coff will add another leading _ to label *)
+                   val coff = Label.fromString ("_imp__" ^ name)
+                   val macho = Label.fromString (name ^ "@GOTPCREL")
+                   val elf = Label.fromString (name ^ "@GOTPCREL")
+                   
+                   val importLabel = 
+                      case !Control.Target.os of
+                         Cygwin => coff
+                       | Darwin => macho
+                       | MinGW => coff
+                       | _ => elf
+                   
+                   (* It's direct, but still PIC *)
+                   val direct = 
+                      AppendList.fromList
+                      [Block.mkBlock'
+                       {entry = NONE,
+                        statements =
+                        [Assembly.instruction_lea
+                         {dst = dst,
+                          src = Operand.memloc_label label,
+                          size = dstsize}],
+                        transfer = NONE}]
+                   
+                   val indirect = 
+                      AppendList.fromList
+                      [Block.mkBlock'
+                       {entry = NONE,
+                        statements =
+                        [Assembly.instruction_mov
+                         {dst = dst,
+                          src = Operand.memloc_label importLabel,
+                          size = dstsize}],
+                        transfer = NONE}]
                 in
-                   AppendList.fromList
-                   [Block.mkBlock'
-                    {entry = NONE,
-                     statements =
-                     [Assembly.instruction_lea
-                      {dst = dst,
-                       src = Operand.memloc_label (Label.fromString name),
-                       size = dstsize}],
-                     transfer = NONE}]
+                   case (symbolScope, !Control.Target.os, !Control.format) of
+                    (* As long as the symbol is internal (this means it is not
+                     * exported to code outside this text segment), then 
+                     * RIP-relative addressing works on every OS/format. 
+                     *
+                     * WARNING: If the symbol >is< exported, even if defined
+                     * in this text segment, this technique can be dangerous.
+                     * 
+                     * C expects two pointers to the same symbol to be equal.
+                     * However, at least ELF&darwin relocate the address of
+                     * exported symbols to the executable. This is fatal for
+                     * _symbol alloc external, because the ML code would be
+                     * updating the wrong memory location, one unseen by the
+                     * executable. For functions it is less tragic, because
+                     * both addresses work, even if they don't compare to
+                     * equal under pointer arithmetic. Still wrong though.
+                     *)
+                      (Internal, _, _) => direct
+                    (* When compiling to a library, we need to access external
+                     * symbols via some address that is updated by the loader.
+                     * That address resides within our data segment, and can
+                     * be easily referenced using RIP-relative addressing.
+                     * This trick is used on every platform MLton supports.
+                     * Windows rewrites __imp__name symbols in our segment.
+                     * ELF and darwin-x86_64 rewrite name at GOTPCREL.
+                     *)
+                    | (External, _, Library) => indirect
+                    (* When linking an executable, ELF and darwin-x86_64 use 
+                     * a special trick to "simplify" the code. All exported
+                     * functions and symbols have pointers that correspond to
+                     * to the executable. Function pointers point to the 
+                     * automatically created PLT entry in the executable.
+                     * Variables are copied/relocated into the executable bss.
+                     * This means that direct access is fine for executable
+                     * and archive formats. (It also means direct access is
+                     * NOT fine for a library, even if it defines the symbol)
+                     * 
+                     * On windows, the address is the point of definition. So
+                     * we must use an indirect lookup even in executables.
+                     *)
+                    | (External, MinGW, _) => indirect
+                    | (External, Cygwin, _) => indirect
+                    | _ => direct
                 end
              | Real_Math_sqrt _ => sse_unas Instruction.SSE_SQRTS
              | Real_abs s =>




More information about the MLton-commit mailing list