[MLton] Re: MLton and shared libraries

Jens Axel Søgaard jensaxel@soegaard.net
Wed, 20 Apr 2005 00:46:00 +0200


Stephen Weeks wrote:

> I was thinking
> 
>   fn () => if Primitive.isSharedLibrary
>              then Primitive.Thread.returnToC ()
>            else MLtonProcess.exit MLtonProcess.Status.success
> 
> * I used Primitive.isSharedLibrary rather than Control.isSharedLibrary
>   because Primitive is for holding such controls in the basis library,
>   while Control is for holding such stuff in the compiler.

> * I called returnToC, because falling off the end of the program
>   without doing anything will cause a segfault, as the return will
>   have nothing to return to.

I have come a little closer, but need some help to figure out some
arguments for "main". But first things first. Here is a "progress
report" of what I tried so far.

Adding the command line option -shared-library {true|false} to
mlton/main/main.fun and changing setSuffix in
basis-library/mlton/mlton.sml to the code above was easy.

Then I spent some time foolishly editing include/c-main.h and friends
until I found out that build/bin/mlton uses the include files in
build/lib/include.

Then I added the following function to the end of c-main.h.

void                                                                    \
__attribute__((constructor))                                            \
init_function (int dummy)                                               \
{                                                                       \
    ...
}

When the shared library is opened in mzscheme, the function init_function
is now run automatically (as witnessed by fprintf statements to stderr).

The problem was now to figure out what to put in the init_function in
order to initialize the MLton system. My initial plan was simply to
call main() with bogus argc and argv. Something like:

    void *argv[2];
    argv[0] = "programname";
    argv[1] = NULL;
    main(1, argv);

To my surprise this led to this behavior:

bash# mzscheme -f test.ss
Welcome to MzScheme version 299.102, Copyright (c) 2004-2005 PLT Scheme, Inc.
 > (require (lib "foreign.ss"))
 > (unsafe!)
 > (define lib (get-ffi-obj "/usr/soegaard/ml/source/mlton-20041109/soegaard/test"))
Welcome to MzScheme version 299.102, Copyright (c) 2004-2005 PLT Scheme, Inc.
 >


Huh - where did the second welcome message come from? Ooh! Due to the
dynamic linking, the main called in the dynamic constructor is not the main
defined in the same file as init_function, but rather mzscheme's
main function.
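
One way to sidestep this kind of symbol clash is to give the startup code a
name with internal linkage, so the call from the constructor can only bind to
the function in the same file. The following is a minimal sketch, assuming GCC
or Clang; runtime_main, entry_call_count and the dummy arguments are
hypothetical names for illustration, not part of MLton.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

static int entry_calls = 0;                 /* hypothetical counter, illustration only */
static char prog_name[] = "programname";
static char *dummy_argv[] = { prog_name, NULL };

/* 'static' gives internal linkage: a call to runtime_main from this file
 * cannot be redirected to a function of the same name in the host program. */
static int runtime_main(int argc, char **argv)
{
    assert(argc >= 1 && argv[argc] == NULL);
    entry_calls++;
    fprintf(stderr, "runtime_main> argc=%d argv[0]=%s\n", argc, argv[0]);
    return 0;
}

/* Runs when the object is loaded (GCC/Clang extension). */
__attribute__((constructor))
static void init_function(void)
{
    runtime_main(1, dummy_argv);
}

/* Lets a caller check that the constructor reached runtime_main. */
int entry_call_count(void)
{
    return entry_calls;
}
```

Renaming is only one option; hidden symbol visibility or linker options that
bind references inside the library may achieve the same effect, but those were
not tried here.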

Due to the above I decided to copy the body of main into init_function
instead and put in a generous amount of print statements to see
exactly where the inevitable core dump occurred.

Since the Initialize macro uses the names argv and argc I put in
some hard coded values at the beginning of init_function.

void errPrintFlush(char *msg)                                           \
{                                                                       \
          fprintf(stderr, "%s", msg);                                    \
          fflush(stderr);                                                \
}                                                                       \
void                                                                    \
__attribute__((constructor))                                            \
init_function (int dummy)                                               \
{                                                                       \
         int argc=1;                                                     \
         void *argv[2];                                                  \
         struct cont cont;                                               \
         argv[0] = "bar";                                                \
         argv[1] = NULL;                                                 \
         Initialize (al, cs, mg, mfs, mmc, pk, ps);                      \
             errPrintFlush("init_function> before real_Init\n");         \
         real_Init();                                                    \
             errPrintFlush("init_function> after real_Init\n");          \
         PrepFarJump(mc, ml);                                            \
             errPrintFlush("init_function> before trampoline\n");        \
         /* Trampoline */                                                \
         while (1) {                                                     \
                     errPrintFlush("init_function> trampoline 1\n");     \
                 cont=(*(struct cont(*)(void))cont.nextChunk)();         \
                     errPrintFlush("init_function> trampoline 2\n");     \
                 cont=(*(struct cont(*)(void))cont.nextChunk)();         \
                     errPrintFlush("init_function> trampoline 3\n");     \
                 cont=(*(struct cont(*)(void))cont.nextChunk)();         \
                 cont=(*(struct cont(*)(void))cont.nextChunk)();         \
                 cont=(*(struct cont(*)(void))cont.nextChunk)();         \
                 cont=(*(struct cont(*)(void))cont.nextChunk)();         \
                 cont=(*(struct cont(*)(void))cont.nextChunk)();         \
                 cont=(*(struct cont(*)(void))cont.nextChunk)();         \
         }                                                               \
             errPrintFlush("init_function> After main\n");               \
}                                                                       \

This leads to:

bash-2.05b# mzscheme -f test.ss
Welcome to MzScheme version 299.102, Copyright (c) 2004-2005 PLT Scheme, Inc.
init_function> before real_Init
init_function> after real_Init
init_function> before trampoline
init_function> trampoline 1
Segmentation fault (core dumped)
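
The loop in init_function is a trampoline: each chunk returns a struct cont
naming the next chunk to call, and the loop keeps calling whatever it is
handed. A self-contained sketch of that pattern follows. The chunk functions,
the typed function pointer and the NULL sentinel are assumptions made for
illustration; the code above stores a void * and casts it, and loops forever.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* A continuation: just the next chunk to run (NULL means stop in this sketch). */
struct cont {
    struct cont (*nextChunk)(void);
};

/* Hypothetical final chunk: asks the trampoline to stop. */
static struct cont chunk_b(void)
{
    struct cont next = { NULL };
    fprintf(stderr, "chunk_b> done\n");
    return next;
}

/* Hypothetical first chunk: hands control to chunk_b. */
static struct cont chunk_a(void)
{
    struct cont next = { chunk_b };
    fprintf(stderr, "chunk_a> jumping to chunk_b\n");
    return next;
}

/* Calls chunks until one returns a NULL nextChunk; returns the number of calls. */
static int run_trampoline(struct cont (*first)(void))
{
    struct cont cont;
    int calls = 0;

    assert(first != NULL);
    cont.nextChunk = first;       /* must be valid before the first call */
    while (cont.nextChunk != NULL) {
        cont = cont.nextChunk();
        calls++;
    }
    return calls;
}
```

Calling run_trampoline(chunk_a) runs chunk_a and then chunk_b. The sketch also
shows why a crash inside the first call says little by itself: the fault may
lie in the chunk that was called or in the state it expects to find.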

At this point I wanted to make sure that my dummy definitions of
argc and argv weren't the cause of this. I therefore decided to
find where they were used. The Initialize macro initializes gcState
and then ends in a call to MLton_init(argc, argv, &gcState).

void MLton_init (int argc, char **argv, GC_state s) {
         int start;

         Posix_ProcEnv_environ = (CstringArray)environ;
         start = GC_init (s, argc, argv);
         /* Setup argv and argc that SML sees. */
         /* start is now the index of the first real arg. */
         CommandLine_commandName = (uint)(argv[0]);
         CommandLine_argc = argc - start;
         CommandLine_argv = (uint)(argv + start);
}

Looking at GC_init and MLton_init didn't make it clear to
me what kind of dummy arguments I need to use in
init_function.
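
One detail that can be read off MLton_init as quoted: it stores argv[0] and
argv + start in globals, so the argv array presumably has to stay valid after
the constructor returns. The sketch below mimics that pointer saving with
stand-in names (save_args and the saved_* variables are hypothetical) and keeps
the dummy arguments in static storage. Passing start = 1 is an assumption about
what GC_init returns when there are no runtime arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the globals that MLton_init assigns (hypothetical names). */
static char  *saved_command_name;
static char **saved_argv;
static int    saved_argc;

/* Static storage: still valid after the initializing function has returned. */
static char  prog_name[]  = "bar";
static char *dummy_argv[] = { prog_name, NULL };

/* Same shape as the assignments in MLton_init; 'start' is a parameter here
 * instead of the result of GC_init. */
static void save_args(int argc, char **argv, int start)
{
    assert(argc >= 1 && argv[argc] == NULL);
    saved_command_name = argv[0];
    saved_argc = argc - start;
    saved_argv = argv + start;
}

static void init_with_dummy_args(void)
{
    save_args(1, dummy_argv, 1);   /* start = 1 is an assumption */
}

/* Returns 1 if the saved pointers are usable after initialization returned. */
int saved_args_still_valid(void)
{
    init_with_dummy_args();
    return strcmp(saved_command_name, "bar") == 0
        && saved_argc == 0
        && saved_argv[0] == NULL;
}
```

With an argv declared as a local variable inside the constructor, the saved
pointers would refer to a dead stack frame once the constructor returned.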


The cautious reader might have noticed that the body of main
actually starts like this:

         struct cont cont;                                               \
         Initialize (al, cs, mg, mfs, mmc, pk, ps);                      \
         if (gcState.isOriginal) {                                       \
                 real_Init();                                            \
                 PrepFarJump(mc, ml);                                    \
         } else {                                                        \
                 /* Return to the saved world */                         \
                 nextFun = *(int*)(gcState.stackTop - WORD_SIZE);        \
                 cont.nextChunk = nextChunks[nextFun];                   \
         }                                                               \

The dummy arguments I used imply that the false branch is taken.
I figured that it should have been the true branch, so I removed
the if-statement, but this might be a mistake?

-- 
Jens Axel Søgaard