profiling

Matthew Fluet Matthew Fluet <fluet@CS.Cornell.EDU>
Mon, 14 Jan 2002 18:12:07 -0500 (EST)


I checked in a small extension to the MLton structure, which adds a
structure Profile with the signature:

signature MLTON_PROFILE =
   sig
      (* a compile-time constant, like MLton.debug *)
      val profile: bool

      (* reset all the bin counters to 0 *)
      val reset: unit -> unit
      (* write out all of the bins to a mlmon.out format file,
       * with the given file name
       *)
      val write: string -> unit
   end

Now, I can write the following program:

val rec fib =
   fn 0 => 0
    | 1 => 1
    | n => fib (n - 1) + fib (n - 2)

val rec f =
   fn 0 => ()
    | n => (fib 38; f (n-1))

fun tak (x,y,z) =
   if not (y < x)
      then z
   else tak (tak (x - 1, y, z),
	     tak (y - 1, z, x),
	     tak (z - 1, x, y))

val rec g =
   fn 0 => ()
    | n => (tak (18,12,6); g (n-1))

val _ = MLton.Profile.write "mlmon.init.out"
val _ = MLton.Profile.reset ()
val _ = f 2
val _ = MLton.Profile.write "mlmon.fib.out"
val _ = MLton.Profile.reset ()
val _ = g 500
val _ = MLton.Profile.write "mlmon.tak.out"
val _ = MLton.Profile.reset ()

Compile it with -p (the implementations of MLton.Profile.{reset,write} are
predicated by MLton.Profile.profile, so when compiling without -p,
MLton.Profile.{reset,write} are nops), and run it.  Then I can do the
following:

[fluet@lennon w]$ mlprof -x -d 1 w mlmon.init.out
0.0 seconds of CPU time
[fluet@lennon w]$ mlprof -x -d 1 -t 5 w mlmon.fib.out
14.80 seconds of CPU time
fib_0                     100.00% (14.80s)
     L_83  24.93% (3.69s)                 
     L_85  20.00% (2.96s)                 
     L_78  15.00% (2.22s)                 
     fib_0  9.86% (1.46s)                 
     L_80   7.09% (1.05s)                 
     L_87   6.35% (0.94s)                 
     L_86   5.54% (0.82s)                 
     L_79   5.27% (0.78s)                 
[fluet@lennon w]$ mlprof -x -d 1 -t 5 w mlmon.tak.out
1.92 seconds of CPU time
tak_0                      100.00% (1.92s)
     L_75   12.50% (0.24s)                
     L_73   12.50% (0.24s)                
     L_71   11.98% (0.23s)                
     L_69   11.46% (0.22s)                
     loop_6 10.94% (0.21s)                
     L_68   10.42% (0.20s)                
     tak_0   9.38% (0.18s)                
     L_74    6.77% (0.13s)                
     L_76    5.73% (0.11s)                
     L_77    5.21% (0.10s)                
[fluet@lennon w]$ mlprof -x -d 1 -t 5 w mlmon.fib.out mlmon.tak.out 
16.72 seconds of CPU time
fib_0                      88.52% (14.80s)
     L_83   24.93% (3.69s)                
     L_85   20.00% (2.96s)                
     L_78   15.00% (2.22s)                
     fib_0   9.86% (1.46s)                
     L_80    7.09% (1.05s)                
     L_87    6.35% (0.94s)                
     L_86    5.54% (0.82s)                
     L_79    5.27% (0.78s)                
tak_0                       11.48% (1.92s)
     L_75   12.50% (0.24s)                
     L_73   12.50% (0.24s)                
     L_71   11.98% (0.23s)                
     L_69   11.46% (0.22s)                
     loop_6 10.94% (0.21s)                
     L_68   10.42% (0.20s)                
     tak_0   9.38% (0.18s)                
     L_74    6.77% (0.13s)                
     L_76    5.73% (0.11s)                
     L_77    5.21% (0.10s)                

Even though in this example the two profiling data are disjoint, there is
no problem combining mlmon files that include counts at the same address.

This is a prequel to getting specific profiling information on a self
compile to figure out why knownCase is > 6s on a self-compile.