Here is how profiling works. When profiling is enabled, the front end (the elaborator) inserts Enter and Leave statements into the source program at function entry and exit. For example,

fun f n = if n = 0 then 0 else 1 + f (n - 1)

becomes

fun f n =
   let
      val () = Enter "f"
      val res = (if n = 0 then 0 else 1 + f (n - 1))
                handle e => (Leave "f"; raise e)
      val () = Leave "f"
   in
      res
   end

The Enter and Leave statements actually carry a bit more information than the source function name; they also record lexical nesting and file position.
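The transformation above can be tried out as ordinary SML with a small model. In this sketch, enter and leave are hypothetical stand-ins for the compiler-internal Enter and Leave statements; they only push and pop names on an explicit stack and are not part of MLton.

```sml
(* Hypothetical stand-ins for Enter/Leave: push and pop function names
   on an explicit stack, and remember the deepest stack seen. *)
val stack : string list ref = ref []
val maxDepth = ref 0

fun enter name =
   (stack := name :: !stack
    ; maxDepth := Int.max (!maxDepth, length (!stack)))

fun leave name =
   case !stack of
      top :: rest =>
         if top = name
            then stack := rest
         else raise Fail ("mismatched Leave " ^ name)
    | [] => raise Fail ("Leave " ^ name ^ " on empty stack")

(* The instrumented form of f from the text. *)
fun f n =
   let
      val () = enter "f"
      val res = (if n = 0 then 0 else 1 + f (n - 1))
                handle e => (leave "f"; raise e)
      val () = leave "f"
   in
      res
   end

val result = f 3   (* 3; the model stack is empty again afterwards *)
```

Because every path out of f, normal or exceptional, runs a leave, the model stack is balanced after the call, and maxDepth records the four nested activations of f.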

Most of the middle of the compiler ignores, but preserves, Enter and Leave. However, so that profiling preserves tail calls, the SSA shrinker has an optimization that notices when the only operations that cause a call to be a nontail call are profiling operations and, if so, moves them before the call, turning it into a tail call. If you observe a program in which a tail call appears to be turned into a nontail call when compiled with profiling, please report a bug.
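The effect of moving the profiling operations before the call can be illustrated at the source level. This is only an analogy: the real optimization works on the SSA IL, and the enter and leave functions here are hypothetical counters, with the exception handler omitted for brevity.

```sml
(* Hypothetical model: track the current and maximum profiling depth. *)
val depth = ref 0
val maxDepth = ref 0
fun enter () = (depth := !depth + 1; maxDepth := Int.max (!maxDepth, !depth))
fun leave () = depth := !depth - 1

(* Naive instrumentation: leave runs after the recursive call,
   so the call is no longer in tail position. *)
fun loopNaive n =
   let
      val () = enter ()
      val res = if n = 0 then 0 else loopNaive (n - 1)
      val () = leave ()
   in
      res
   end

(* leave moved before the call: the recursive call is a tail call again. *)
fun loopMoved n =
   (enter ()
    ; if n = 0
         then (leave (); 0)
      else (leave (); loopMoved (n - 1)))

val _ = loopNaive 1000
val naiveMax = !maxDepth   (* 1001 nested activations *)
val () = (depth := 0; maxDepth := 0)
val _ = loopMoved 1000
val movedMax = !maxDepth   (* 1: each activation leaves before calling *)
```

In the naive version the modelled depth grows with n, whereas in the moved version it stays at one, which mirrors why the shrinker moves the profiling operations rather than leaving them after the call.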

The checkProf function in type-check.fun checks that the Enter/Leave statements match up.
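As a rough idea of what "matching up" means, the following sketch checks a straight-line sequence of statements. It is not the actual checkProf, which must deal with the control-flow graph of the IL; the stmt datatype and the balanced function are assumptions made for illustration, including the requirement that no Enter remains open at the end.

```sml
(* Hypothetical statement type: only the profiling statements matter here. *)
datatype stmt = Enter of string | Leave of string | Other

(* Returns true iff every Leave matches the most recent unmatched Enter
   and no Enter is left open at the end of the sequence. *)
fun balanced (stmts : stmt list) : bool =
   let
      fun loop (ss, stack) =
         case (ss, stack) of
            ([], []) => true
          | ([], _ :: _) => false
          | (Enter f :: rest, _) => loop (rest, f :: stack)
          | (Leave f :: rest, top :: stack') =>
               f = top andalso loop (rest, stack')
          | (Leave _ :: _, []) => false
          | (Other :: rest, _) => loop (rest, stack)
   in
      loop (stmts, [])
   end

val ok = balanced [Enter "f", Other, Enter "g", Leave "g", Leave "f"]
val bad = balanced [Enter "f", Enter "g", Leave "f"]
```

Here ok is true, while bad is false because Leave "f" arrives while "g" is still the innermost open function.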

In the backend, just before translating to the Machine IL, the profiler uses the Enter/Leave statements to infer the "local" portion of the control stack at each program point. The profiler then removes the Enters/Leaves and inserts different information depending on which kind of profiling is happening. For time profiling (with the AMD64Codegen and X86Codegen), the profiler inserts labels that cover the code (i.e. each statement has a unique label in its basic block that prefixes it) and associates each label with the local control stack. For time profiling (with the CCodegen and LLVMCodegen), the profiler inserts code that sets a global field that records the local control stack. For allocation profiling, the profiler inserts calls to a C function that will maintain byte counts. With stack profiling, the profiler also inserts a call to a C function at each nontail call in order to maintain information at runtime about what SML functions are on the stack.
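The inference of the "local" control stack can be sketched on a straight-line sequence of statements. This is a simplified model under stated assumptions: the stmt datatype and annotate function are hypothetical, and the real profiler works over basic blocks rather than a flat list.

```sml
(* Hypothetical statement type; Other carries a name for the statement. *)
datatype stmt = Enter of string | Leave of string | Other of string

(* Drop Enter/Leave and pair each remaining statement with the stack of
   source functions active at that point (innermost first). *)
fun annotate (stmts : stmt list) : (string * string list) list =
   let
      fun pop stack =
         case stack of
            [] => []
          | _ :: rest => rest
      fun loop (ss, stack, acc) =
         case ss of
            [] => rev acc
          | Enter f :: rest => loop (rest, f :: stack, acc)
          | Leave _ :: rest => loop (rest, pop stack, acc)
          | Other s :: rest => loop (rest, stack, (s, stack) :: acc)
   in
      loop (stmts, [], [])
   end

val annotated =
   annotate [Enter "f", Other "a", Enter "g", Other "b", Leave "g",
             Other "c", Leave "f"]
(* [("a", ["f"]), ("b", ["g", "f"]), ("c", ["f"])] *)
```

Each surviving statement now carries the information that the profiler would attach to it, whether through a covering label, a store to a global field, or a call to a C function, depending on the kind of profiling.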

At run time, the profiler associates counters (either clock ticks or byte counts) with source functions. When the program finishes, the profiler writes the counts out to the mlmon.out file. Then, mlprof uses source information stored in the executable to associate the counts in the mlmon.out file with source functions.

For time profiling, the profiler catches the SIGPROF signal 100 times per second and increments the appropriate counter, determined by looking at the label prefixing the current program counter and mapping that to the current source function.
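The lookup performed on each tick can be modelled as follows. This is a sketch only: the real handler lives in the runtime system, and the label table, its addresses, and the functionAt, bump, and tick names are all hypothetical.

```sml
(* Hypothetical label table: (starting address, source function),
   sorted by increasing address. *)
val labels = [(0, "main"), (100, "f"), (250, "g")]

(* Find the source function of the last label at or before pc. *)
fun functionAt pc =
   let
      fun loop (ls, current) =
         case ls of
            [] => current
          | (addr, name) :: rest =>
               if addr <= pc then loop (rest, SOME name) else current
   in
      loop (labels, NONE)
   end

(* One counter per source function, kept as an association list. *)
fun bump (name, counts) =
   case counts of
      [] => [(name, 1)]
    | (n, c) :: rest =>
         if n = name then (n, c + 1) :: rest else (n, c) :: bump (name, rest)

(* Simulate one SIGPROF tick observed at program counter pc. *)
fun tick (pc, counts) =
   case functionAt pc of
      NONE => counts
    | SOME name => bump (name, counts)

val counts = foldl tick [] [120, 130, 10, 300, 249]
(* [("f", 3), ("main", 1), ("g", 1)] *)
```

At 100 ticks per second, a count of 3 for f in this model would correspond to roughly 0.03 seconds attributed to f.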

Caveats

There may be a few missed clock ticks or bytes allocated at the very end of the program after the data is written.

Profiling has not been tested with signals or threads. In particular, stack profiling may behave strangely.