On Mon, Nov 23, 2009 at 8:39 PM, David Hansel <span dir="ltr"><<a href="mailto:hansel@reactive-systems.com">hansel@reactive-systems.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I was looking into what could be causing the problem and came across<br>
file MLton/lib/mlton/sml/mlnlffi-lib/memory/linkage-libdl.sml which<br>
is of course used by the FFI. I wasn't completely sure what the "era"<br>
deal in that code is, so I changed the body of function "get" to just<br>
"f()", resolving the FFI function's address before every call.</blockquote><div><br>The "era" is supposed to invalidated dynamically loaded library addresses if the executable is started up again after saving the world (MLton.World.save). Because it is a new executable invocation, the dynamically linked library needs to be reloaded and it might end up at a different address. This isn't actually done in the linkage-libdl.sml code; see the commented out "Cleaner.addNew" application. I don't recall why it is disabled. In any case, unless you are saving and loading worlds, it shouldn't affect your code.<br>
</div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> After<br>
that change, all crashes were gone. Furthermore, changing the body<br>
of "get" to just "a" does NOT fix the crashes. That looked good<br>
so I added some "print" statements in "get" to see whether there is<br>
a problem with the address not being resolved properly. Unfortunately,<br>
just adding the "print" statements also made the crashes go away. In<br>
fact, just adding 'print "";' at the beginning of "get" eliminates<br>
the crashes. Interestingly, this eliminates the crashes completely.<br>
With other changes in our code I was able to eliminate some instances<br>
of the crashes but new ones would pop up at other places. I suspect<br>
that the proximity of this code to the actual FFI calls might play<br>
a role in that.<br></blockquote><div><br>This, and your next email, suggest that it is a bug in the native codegen. The probable role of the nearby "print" call is that "print" invokes a C function call, which "resets" the register allocator. Without the "print" call, the register allocator has a wider scope to work over and, apparently, is mistakenly dropping a def.<br>
</div></div>