[MLton] Question on profile.fun

Matthew Fluet fluet@cs.cornell.edu
Thu, 2 Jun 2005 20:45:55 -0400 (EDT)


> > You believe that the move of a constant integer to a known slot in
> > the gc state at transitions in the profile graph is too intrusive?
> 
> Yes.  The point is that it happens all the time, not just at (SSA)
> nontail calls, and not just at (SSA) basic block entries.
> Furthermore, to implement this portably within MLton, the right place
> to put it is at the Machine IL, which means it will interfere with
> codegen optimizations too.  I bet it'll hurt more than 50% on some
> benchmarks.  That's a lot of skew.  I'm already annoyed by the skew
> that we get with -profile time as it is, 20-30% on some benchmarks
> IIRC, although it would be worth rerunning to see where we are today.
> 
> In any case, you're welcome to try the experiment.  It would be
> interesting to know.

I tried the following experiment.  I added a flag
     -profile-dummy {false|true}
which instructs profile.fun (line 447) to insert, in addition to any other
profiling work, code that increments a dummy field in the gcState at
_every_ Profile statement in the RSSA IL program.  While this isn't
exactly the work required to modify time profiling as I described
previously, I figure that it is about on par with that work.  (Possibly
more, since with the -profile-include/exclude flags not every Profile
statement may need to modify the current state.)  Stephen is correct
that the difference between
     -profile mark -profile-dummy true
and
     -profile mark -profile-dummy false
can exceed 50% of the running time of the unprofiled program.
However, of the 42 benchmark programs (counts are cumulative):
  25 incur less than 10%
  30 incur less than 20%
  33 incur less than 30%
  36 incur less than 40%
  38 incur less than 50%
  and the last four are at 56%, 65%, 75%, and 133%.
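
For anyone who wants to re-derive these counts, here is a minimal Python
sketch (not part of the original experiment).  It takes the MLton1 and MLton2
columns of the "run time ratio" table in the complete data, converts the
differences to whole percentage points to avoid floating-point noise, and
counts the benchmarks under each threshold.  The function names are only
illustrative.

```python
# Run time ratios for MLton1 (-profile mark) and MLton2
# (-profile mark -profile-dummy true), copied from the "run time ratio" table.
RATIOS = """
barnes-hut 1.05 1.09
boyer 1.09 1.11
checksum 1.65 1.72
count-graphs 1.03 1.17
DLXSimulator 1.00 1.00
fft 0.97 1.00
fib 1.39 1.50
flat-array 1.10 1.16
hamlet 1.11 1.16
imp-for 0.99 2.32
knuth-bendix 1.24 1.33
lexgen 1.07 1.15
life 1.31 1.79
logic 1.08 1.11
mandelbrot 1.04 1.11
matrix-multiply 0.92 1.05
md5 1.45 1.82
merge 1.01 1.03
mlyacc 1.04 1.08
model-elimination 1.08 1.12
mpuz 1.02 1.33
nucleic 0.99 1.02
output1 0.97 1.18
peek 1.25 2.00
psdes-random 1.10 1.59
ratio-regions 1.16 1.30
ray 1.07 1.12
raytrace 1.05 1.12
simple 1.02 1.15
smith-normal-form 1.00 1.00
tailfib 0.96 1.52
tak 1.44 1.47
tensor 0.84 1.49
tsp 1.03 1.06
tyan 1.08 1.12
vector-concat 1.00 0.99
vector-rev 0.96 1.03
vliw 1.14 1.22
wc-input1 1.20 1.42
wc-scanStream 7.03 7.29
zebra 1.19 1.55
zern 1.01 1.10
"""

def dummy_overheads(text):
    """Map benchmark -> (MLton2 - MLton1), in percent of unprofiled run time."""
    result = {}
    for line in text.split("\n"):
        if line.strip():
            name, mark, dummy = line.split()
            # Work in whole percentage points so comparisons are exact.
            result[name] = round(float(dummy) * 100) - round(float(mark) * 100)
    return result

def cumulative_counts(overheads, thresholds=(10, 20, 30, 40, 50)):
    """Count the benchmarks whose overhead is below each threshold."""
    return [sum(1 for v in overheads.values() if v < t) for t in thresholds]

overheads = dummy_overheads(RATIOS)
print(cumulative_counts(overheads))                        # [25, 30, 33, 36, 38]
print(sorted(v for v in overheads.values() if v >= 50))    # [56, 65, 75, 133]
```

Swapping in another pair of columns from the table gives the other
comparisons discussed in this message.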

I went ahead and ran the benchmarks with the options
  mlton {-profile no,-profile {mark,time,alloc,count} {,-profile-dummy true}} 
to measure the current impact of profiling.  Note that this leaves the 
other profiling flags (-profile-{c,branch,exclude,include,stack,raise}) at 
their defaults; changing most of them would simply incur additional 
cost.
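
The brace pattern above expands to nine compiler configurations.  As a small
illustration (my own sketch, not the script used for the runs), the following
Python enumerates them in the order used for the MLton0 through MLton8 labels
in the complete data.  Note that -profile-dummy is the experimental flag
described in this message; the function name is hypothetical.

```python
from itertools import product

def profile_configs():
    """Expand the brace pattern into one flag string per benchmark build."""
    configs = ["-profile no"]
    kinds = ["mark", "time", "alloc", "count"]
    dummy_variants = ["", " -profile-dummy true"]
    # Each profiling kind is built once without and once with the dummy flag.
    for kind, dummy in product(kinds, dummy_variants):
        configs.append(f"-profile {kind}{dummy}")
    return configs

for index, flags in enumerate(profile_configs()):
    print(f"MLton{index} -- mlton {flags}")
```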

I note that the difference between
  -profile mark -profile-dummy false
and
  -profile no 
follows much the same pattern as above.  This difference measures the cost
of carrying profiling data through the ILs and optimizations without
actually doing any profiling work at runtime.  (That said, -profile mark
does insert the time profiling labels into the Machine IL, and I'm fairly
certain that cutting up blocks at the profiling labels interferes with
codegen optimizations.)  Again as cumulative counts:
  27 incur less than 10%
  34 incur less than 20%
  36 incur less than 30%
  38 incur less than 40%
  40 incur less than 50%
  and the last two are at 65% and 603%.

The difference between
  -profile time -profile-dummy false
and
  -profile no
(the measure Stephen was interested in), also follows essentially the same 
pattern:
  25 incur less than 10%
  31 incur less than 20%
  36 incur less than 30%
  38 incur less than 40%
  40 incur less than 50%
  and the last two are at 65% and 613%.

So, the impact of actually doing time profiling, over and above the impact 
of adding profiling data to the program, is virtually nil: only one 
benchmark (tsp) incurs an additional cost of more than 10%, and it is only 
24%.
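
To make that comparison concrete, here is a small Python sketch (again just
an illustration) that computes MLton3 minus MLton1 for a hand-picked sample
of rows from the "run time ratio" table.  In this sample tsp is the only
benchmark above 10%; negative values mean the -profile time build ran faster
than the -profile mark build.

```python
# (MLton1, MLton3) run time ratios for a hand-picked sample of benchmarks,
# copied from the "run time ratio" table.
SAMPLE = {
    "checksum": (1.65, 1.65),
    "mandelbrot": (1.04, 0.68),
    "tsp": (1.03, 1.27),
    "vliw": (1.14, 1.22),
    "wc-scanStream": (7.03, 7.13),
    "zebra": (1.19, 1.27),
}

def time_over_mark(ratios):
    """Extra cost of -profile time over -profile mark, in percent of the
    unprofiled run time, rounded to whole percentage points."""
    return {name: round((timed - mark) * 100)
            for name, (mark, timed) in ratios.items()}

extra = time_over_mark(SAMPLE)
over_ten = {name: pct for name, pct in extra.items() if pct > 10}
print(extra)
print(over_ten)   # {'tsp': 24}
```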


Anyway, here is the complete data.

MLton0 -- mlton -profile no
MLton1 -- mlton -profile mark 
MLton2 -- mlton -profile mark -profile-dummy true
MLton3 -- mlton -profile time 
MLton4 -- mlton -profile time -profile-dummy true
MLton5 -- mlton -profile alloc 
MLton6 -- mlton -profile alloc -profile-dummy true
MLton7 -- mlton -profile count 
MLton8 -- mlton -profile count -profile-dummy true

run time ratio (relative to MLton0)
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7 MLton8
barnes-hut          1.00   1.05   1.09   0.99   1.02   1.32   1.35   2.32   2.34
boyer               1.00   1.09   1.11   1.04   1.07   1.61   1.63   2.86   2.86
checksum            1.00   1.65   1.72   1.65   1.80   1.73   1.72   8.52   8.87
count-graphs        1.00   1.03   1.17   1.04   1.16   2.16   2.28  10.46  10.67
DLXSimulator        1.00   1.00   1.00   1.05   1.05   1.83   1.84   1.04   1.04
fft                 1.00   0.97   1.00   0.96   0.99   0.99   0.97   1.18   1.19
fib                 1.00   1.39   1.50   1.39   1.49   1.36   1.44   5.32   5.20
flat-array          1.00   1.10   1.16   1.10   1.17   1.10   1.16   8.03   8.07
hamlet              1.00   1.11   1.16   1.13   1.17   1.59   1.66   2.77   2.88
imp-for             1.00   0.99   2.32   0.99   2.32   1.00   2.32  47.49  48.98
knuth-bendix        1.00   1.24   1.33   1.25   1.34   1.45   1.56  10.22  10.38
lexgen              1.00   1.07   1.15   1.06   1.13   1.28   1.31   2.52   2.53
life                1.00   1.31   1.79   1.35   1.85   1.94   2.54  21.97  22.58
logic               1.00   1.08   1.11   1.11   1.14   1.40   1.44   3.33   3.28
mandelbrot          1.00   1.04   1.11   0.68   0.77   0.68   0.78   5.49   5.78
matrix-multiply     1.00   0.92   1.05   0.83   1.16   0.83   1.16   6.85   6.86
md5                 1.00   1.45   1.82   1.45   1.83   1.51   1.63  17.45  18.09
merge               1.00   1.01   1.03   0.99   1.05   1.32   1.33   1.38   1.43
mlyacc              1.00   1.04   1.08   1.05   1.08   1.73   1.79   2.43   2.46
model-elimination   1.00   1.08   1.12   1.09   1.13   1.61   1.66   3.07   3.11
mpuz                1.00   1.02   1.33   1.02   1.33   1.00   1.33  24.04  22.47
nucleic             1.00   0.99   1.02   0.99   1.02   1.27   1.30   1.77   1.80
output1             1.00   0.97   1.18   0.97   1.18   0.97   1.18   6.73   6.84
peek                1.00   1.25   2.00   1.25   2.00   1.25   2.00  81.34  82.03
psdes-random        1.00   1.10   1.59   1.10   1.62   1.10   1.63  25.47  26.09
ratio-regions       1.00   1.16   1.30   1.16   1.32   1.21   1.35   8.51   8.67
ray                 1.00   1.07   1.12   1.05   1.11   1.14   1.16   4.97   5.00
raytrace            1.00   1.05   1.12   1.06   1.13   1.13   1.19   7.99   8.10
simple              1.00   1.02   1.15   0.94   1.07   1.44   1.47   3.90   3.96
smith-normal-form   1.00   1.00   1.00   1.01   1.01   1.01   1.01   1.02   1.03
tailfib             1.00   0.96   1.52   0.96   1.52   0.96   1.52  20.50  20.79
tak                 1.00   1.44   1.47   1.43   1.47   1.36   1.37   4.45   4.50
tensor              1.00   0.84   1.49   0.84   1.49   0.84   1.49  64.94  66.26
tsp                 1.00   1.03   1.06   1.27   1.04   1.02   1.05   4.06   4.24
tyan                1.00   1.08   1.12   1.09   1.13   1.84   1.88   3.98   4.07
vector-concat       1.00   1.00   0.99   1.01   1.02   1.15   1.00   1.01   1.01
vector-rev          1.00   0.96   1.03   0.98   1.04   0.97   1.04   3.92   4.01
vliw                1.00   1.14   1.22   1.22   1.20   1.82   1.84   3.30   3.31
wc-input1           1.00   1.20   1.42   1.18   1.40   1.16   1.38  10.20  10.74
wc-scanStream       1.00   7.03   7.29   7.13   7.30  15.56  15.90  15.40  15.38
zebra               1.00   1.19   1.55   1.27   1.70   2.81   3.30  23.19  25.67
zern                1.00   1.01   1.10   1.01   1.10   1.00   1.10   7.12   7.36

size
benchmark            MLton0    MLton1    MLton2    MLton3    MLton4    MLton5    MLton6    MLton7    MLton8
barnes-hut           99,700   145,314   148,194   149,084   151,976   131,076   134,116   140,700   143,468
boyer               135,375   206,323   215,699   213,565   222,925   182,861   192,061   187,989   198,709
checksum             50,095    55,755    55,979    62,725    62,949    58,397    58,589    56,557    56,701
count-graphs         63,135    93,783    96,423   102,129   104,657    89,393    92,041    89,713    91,905
DLXSimulator        126,067   216,299   222,707   221,613   228,021   182,613   189,605   211,717   219,317
fft                  61,358    75,982    77,150    84,232    85,384    75,280    76,480    74,496    75,568
fib                  44,691    50,143    50,351    57,129    57,337    52,801    52,945    50,177    50,209
flat-array           44,715    50,143    50,303    57,113    57,273    52,801    52,945    50,513    50,577
hamlet            1,246,854 2,699,934 2,858,062 2,704,854 2,862,966 2,354,358 2,520,822 2,586,806 2,740,006
imp-for              44,547    52,551    53,031    59,457    59,937    53,481    53,913    55,249    55,601
knuth-bendix        105,907   165,595   170,571   172,517   177,637   149,637   154,357   178,669   184,941
lexgen              199,332   358,500   372,220   361,020   374,724   308,524   321,588   314,940   329,492
life                 62,059    85,407    86,879    91,801    93,305    82,073    83,809    87,185    89,089
logic               103,567   172,679   177,751   179,857   184,913   160,729   165,817   167,105   171,825
mandelbrot           44,643    50,215    50,439    57,153    57,393    52,793    53,033    51,065    51,209
matrix-multiply      46,294    54,414    54,846    61,200    61,632    55,368    55,784    55,880    56,264
md5                  74,531    97,431    98,151   103,953   104,721    92,505    93,225    89,385    89,833
merge                46,271    53,255    53,575    60,161    60,497    55,593    55,897    52,657    52,849
mlyacc              501,140   933,496   971,240   935,984   973,728   806,744   842,640   941,984   995,088
model-elimination   631,901 1,236,233 1,286,281 1,241,211 1,291,339 1,058,871 1,109,323 1,164,299 1,218,123
mpuz                 47,307    60,927    62,015    69,345    70,289    61,561    62,537    61,641    62,329
nucleic             196,246   221,650   223,026   228,420   229,812   220,484   222,004   233,108   235,060
output1              77,373    97,689    98,377    99,465   100,169    91,721    92,345    83,769    83,977
peek                 73,483    92,551    93,335    97,649    98,321    90,481    91,057    86,009    86,329
psdes-random         45,355    52,255    52,591    59,177    59,513    54,129    54,449    53,489    53,793
ratio-regions        70,387   126,567   132,791   135,497   141,769   110,721   116,897   139,945   146,425
ray                 178,284   291,556   296,644   298,444   303,532   256,732   261,340   265,868   273,868
raytrace            260,497   440,625   457,457   441,497   458,473   373,169   390,033   421,853   441,661
simple              219,103   379,215   394,463   383,857   399,249   359,777   375,569   402,561   420,609
smith-normal-form   178,867   220,459   223,275   225,973   228,837   206,117   208,965   209,509   212,005
tailfib              44,387    49,727    49,919    56,681    56,873    52,433    52,593    50,369    50,449
tak                  44,771    49,959    50,119    56,993    57,153    52,713    52,825    50,209    50,241
tensor               94,850   156,514   161,466   161,940   166,900   129,652   135,228   198,764   205,452
tsp                  79,059   105,999   107,679   112,377   114,121    98,385   100,097   104,225   106,305
tyan                132,123   232,427   240,331   238,141   246,101   199,773   207,853   209,421   217,549
vector-concat        45,971    52,047    52,191    59,001    59,145    54,529    54,641    51,777    51,841
vector-rev           45,199    51,311    51,503    58,313    58,505    53,633    53,809    51,713    51,873
vliw                387,187   916,471   955,039   919,071   957,623   750,415   787,719   834,191   871,487
wc-input1            99,071   138,827   140,027   140,963   142,163   127,219   128,371   111,667   111,859
wc-scanStream       106,215   151,215   152,495   153,415   154,695   138,771   140,003   119,379   119,667
zebra               121,515   264,243   277,699   270,589   284,125   187,853   204,925   318,441   337,465
zern                 85,796   106,444   107,724   115,222   116,534   102,734   103,934   112,046   113,262

compile time
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7 MLton8
barnes-hut          8.87   9.69   9.81   9.78  10.01   9.54   9.61   9.87   9.86
boyer               9.51  10.65  10.95  10.98  11.17  10.53  10.71  10.62  10.94
checksum            6.28   6.40   6.45   6.59   6.64   6.57   6.55   6.58   6.62
count-graphs        7.11   7.66   7.68   7.77   7.88   7.63   7.70   7.73   7.79
DLXSimulator        9.91  11.58  11.89  11.68  11.84  11.14  11.32  11.70  11.87
fft                 6.78   7.08   7.14   7.30   7.35   7.23   7.26   7.39   7.41
fib                 6.23   6.43   6.37   6.58   6.55   6.50   6.54   6.56   6.61
flat-array          6.24   6.40   6.47   6.59   6.61   6.58   6.56   6.59   6.73
hamlet             62.93  86.85  92.24  86.99  91.89  81.14  85.29  82.68  86.08
imp-for             6.35   6.50   6.53   6.67   6.70   6.60   6.60   6.73   6.72
knuth-bendix        8.20   9.16   9.30   9.38   9.51   9.15   9.22   9.49   9.59
lexgen             11.63  14.18  14.48  14.28  14.61  13.58  13.96  13.73  14.12
life                6.90   7.23   7.25   7.38   7.50   8.71   7.33   7.42   7.42
logic               8.36   9.41   9.66   9.66   9.95   9.38   9.51   9.45   9.47
mandelbrot          6.29   6.51   6.54   6.72   6.64   6.57   6.59   6.66   6.64
matrix-multiply     7.99   6.58   6.58   6.78   6.73   6.80   6.77   6.81   6.76
md5                 7.02   7.69   7.68   7.82   7.88   7.71   7.71   7.77   7.74
merge               6.33   6.51   6.46   6.63   6.67   6.70   6.63   6.72   6.68
mlyacc             27.13  34.82  35.98  34.84  36.07  32.96  34.16  33.36  34.76
model-elimination  29.42  40.24  41.53  40.32  41.72  37.63  38.88  36.46  38.03
mpuz                6.38   6.65   6.74   6.88   6.94   6.79   6.82   6.85   6.85
nucleic            13.81  14.09  14.19  14.54  14.64  14.36  14.30  14.58  14.61
output1             7.01   7.58   7.54   7.57   7.52   7.50   7.47   7.56   7.52
peek                6.93   7.45   7.50   7.57   7.55   7.50   7.54   7.65   7.66
psdes-random        6.25   6.53   6.52   6.71   6.76   6.64   6.63   6.69   6.71
ratio-regions       7.50   8.45   8.68   8.72   8.84   8.38   8.64   8.86   9.04
ray                10.32  12.57  12.73  12.78  12.95  12.17  12.35  12.62  12.82
raytrace           14.75  17.91  18.40  17.84  18.35  16.88  17.43  17.53  17.96
simple             12.22  14.37  14.95  14.48  14.97  13.98  14.65  14.68  15.27
smith-normal-form  10.51  11.57  11.60  11.59  11.73  11.41  11.47  11.70  11.79
tailfib             6.21   6.40   6.40   6.59   6.52   6.51   6.55   6.56   6.57
tak                 6.18   6.38   6.41   6.57   6.58   6.53   6.52   6.58   6.56
tensor              8.87  10.31  10.48  10.44  10.60  10.07  10.12  10.62  10.93
tsp                 7.36   8.08   8.14   9.71   8.27   8.04   8.14   8.36   8.37
tyan                9.72  11.63  11.89  11.83  12.06  11.24  11.50  11.54  11.77
vector-concat       6.24   6.47   6.53   6.67   6.70   7.20   6.62   6.66   6.68
vector-rev          6.21   7.34   6.39   6.68   6.64   6.62   6.61   6.69   6.65
vliw               20.30  30.19  31.32  38.42  31.35  28.04  29.16  27.53  28.74
wc-input1           7.85   8.81   8.82   8.85   8.89   8.62   8.69   8.74   8.69
wc-scanStream       8.17   9.14   9.25   9.22   9.23   9.02   9.03   9.03   9.04
zebra               9.29  11.24  11.64  11.39  11.82  10.78  11.07  12.38  12.76
zern                6.98   7.41   7.48   7.85   7.74   7.53   7.60   7.75   7.75

run time
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6  MLton7  MLton8
barnes-hut         50.57  53.24  55.01  49.92  51.37  66.92  68.46  117.11  118.22
boyer              53.95  58.84  60.10  56.34  57.66  86.89  88.07  154.22  154.15
checksum           97.36 160.35 167.09 160.89 175.53 168.39 167.05  829.31  863.90
count-graphs       41.38  42.76  48.26  42.97  48.08  89.45  94.32  432.76  441.54
DLXSimulator       90.58  90.61  90.76  95.37  95.55 165.44 166.24   94.01   93.97
fft                37.67  36.37  37.59  36.20  37.11  37.13  36.63   44.44   44.88
fib                70.29  97.56 105.30  97.52 104.97  95.66 100.98  373.84  365.49
flat-array         25.06  27.48  29.16  27.56  29.21  27.48  29.15  201.32  202.35
hamlet             53.08  59.16  61.40  59.72  62.08  84.56  88.18  146.84  153.01
imp-for            46.82  46.20 108.57  46.22 108.70  46.80 108.54 2223.65 2293.43
knuth-bendix       38.63  48.01  51.52  48.29  51.77  56.18  60.43  394.89  400.87
lexgen             44.42  47.73  51.02  46.95  50.10  56.65  58.40  111.86  112.36
life               14.42  18.94  25.81  19.45  26.63  27.93  36.60  316.88  325.70
logic              56.07  60.66  62.17  62.06  63.66  78.24  80.65  186.49  183.77
mandelbrot         82.46  85.93  91.28  55.98  63.40  55.98  63.96  452.52  476.85
matrix-multiply     8.20   7.52   8.60   6.83   9.54   6.84   9.49   56.16   56.30
md5                53.22  77.27  97.11  77.33  97.21  80.18  86.76  928.61  962.71
merge              87.86  88.44  90.62  87.31  92.63 115.78 116.82  121.43  125.70
mlyacc             41.03  42.77  44.30  43.08  44.45  71.10  73.40   99.64  101.11
model-elimination  79.62  85.98  88.96  86.45  90.02 127.97 132.24  244.67  247.38
mpuz               41.98  42.84  55.78  42.76  55.91  41.85  55.83 1008.87  943.37
nucleic            45.47  45.16  46.35  45.19  46.48  57.80  58.99   80.38   81.66
output1            15.24  14.83  17.95  14.85  17.95  14.86  17.93  102.54  104.18
peek               35.45  44.30  70.83  44.33  70.94  44.33  70.90 2882.97 2907.41
psdes-random       38.88  42.87  61.79  42.89  63.15  42.87  63.42  990.41 1014.38
ratio-regions      53.00  61.39  69.05  61.52  69.72  64.16  71.37  450.98  459.61
ray                30.73  32.84  34.31  32.37  34.23  35.13  35.70  152.67  153.65
raytrace           42.53  44.77  47.54  45.07  47.95  47.94  50.51  339.61  344.47
simple             60.43  61.72  69.39  56.99  64.63  87.20  88.59  235.90  239.03
smith-normal-form  36.94  36.97  37.09  37.13  37.16  37.16  37.19   37.83   37.88
tailfib            43.73  41.78  66.40  41.80  66.42  41.79  66.39  896.74  909.36
tak                27.60  39.72  40.43  39.38  40.43  37.51  37.90  122.74  124.18
tensor             58.97  49.31  88.00  49.35  88.07  49.28  88.05 3829.67 3907.20
tsp                64.40  66.41  68.30  81.56  67.17  65.55  67.37  261.32  272.98
tyan               58.11  62.71  64.83  63.19  65.45 106.68 109.43  231.13  236.41
vector-concat      92.48  92.40  91.95  93.55  94.28 106.68  92.91   92.99   93.77
vector-rev        119.16 114.60 122.84 116.38 124.34 115.85 124.12  466.79  478.11
vliw               51.63  59.10  62.99  62.90  61.76  93.84  94.80  170.32  170.97
wc-input1          38.72  46.45  55.15  45.73  54.38  44.84  53.34  394.99  415.97
wc-scanStream      34.30 241.25 250.02 244.38 250.43 533.79 545.33  528.20  527.37
zebra              40.69  48.31  63.23  51.66  69.29 114.39 134.11  943.36 1044.25
zern               41.11  41.53  45.34  41.52  45.22  41.31  45.17  292.87  302.44
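
As a cross-check on how the tables relate, the "run time ratio" entries are
each run time divided by the MLton0 run time for the same benchmark.  The
Python sketch below (an illustration only) recomputes a few ratios from the
raw run times above.  Because the printed times are themselves rounded, a
recomputed ratio may occasionally differ from the table in the last digit.

```python
# Raw run times for MLton0, MLton1 and MLton3, copied from the "run time"
# table for a few sample benchmarks.
RUN_TIMES = {
    "barnes-hut": (50.57, 53.24, 49.92),
    "checksum": (97.36, 160.35, 160.89),
    "imp-for": (46.82, 46.20, 46.22),
    "mandelbrot": (82.46, 85.93, 55.98),
    "tsp": (64.40, 66.41, 81.56),
}

def ratios(times):
    """Normalize a row of run times against its first entry (MLton0)."""
    base = times[0]
    return [f"{t / base:.2f}" for t in times]

for name, times in RUN_TIMES.items():
    print(f"{name:<12}", "  ".join(ratios(times)))
```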