profiling go
Henry Cejtin
henry@sourcelight.com
Sat, 9 Jun 2001 23:36:39 -0500
Sorry, yes, I had to convert the function from A-normal form so that I
could actually read it, and clearly I introduced a few typos here and there.
Here is the actual CPS, printed with the `-show-types true' flag.
fun loop_51 (x_449: word, x_448: int) =
   let
      val x_450: bool = MLton_eq(int) (x_448, x_447)
      fun L_377 () =
         let
            val x_451: bool = MLton_eq(word) (x_443, x_449)
            fun L_379 () =
               let
                  val x_452: option_1 = SOME_1 (x_445)
               in
                  x_452
               end
            fun L_380 () =
               raise (global_39)
         in
            case x_451 of
               false => L_380
             | true => L_379
         end
      fun L_378 () =
         let
            fun L_381 () =
               raise (global_15)
            val x_453: int = Int_addCheck (x_448, global_0) Overflow L_381
            val x_455: word8 = Vector_sub(word8) (x_445, x_448)
            val x_456: word = Word8_toLargeWord (x_455)
            val x_457: word = Word32_mul (global_70, x_449)
            val x_458: word = Word32_add (global_69, x_457)
            val x_454: word = Word32_add (x_456, x_458)
         in
            loop_51 (x_454, x_453)
         end
   in
      case x_450 of
         false => L_378
       | true => L_377
   end
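For anyone who doesn't read MLton's CPS, here is a rough C sketch of what loop_51 computes. It rests on assumptions: x_445 is the byte vector (it is indexed with Vector_sub(word8)), x_447 is its length, x_443 is the expected hash, global_0 is 1, and global_70 is the 0x63 multiplier. The value of global_69 isn't shown, so ADD is a hypothetical placeholder. Returning 1/0 instead of SOME/raise is purely illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CPS globals: MUL for global_70
   (assumed to be the 0x63 discussed below), ADD for global_69
   (value unknown, 0 is only a placeholder). */
#define MUL 0x63u
#define ADD 0u

/* Mirrors loop_51: h is x_449, i is x_448, n is x_447, s is x_445,
   expected is x_443.  Returns 1 where the CPS returns SOME s and 0
   where it raises global_39. */
static int loop_51(const uint8_t *s, size_t n, uint32_t expected,
                   uint32_t h, size_t i)
{
    while (i != n) {
        /* x_454 = byte + (ADD + MUL * h); uint32_t wraps like Word32. */
        h = (uint32_t)s[i] + (ADD + MUL * h);
        /* Int_addCheck(x_448, global_0); the overflow check is dropped
           because i counts up to n and cannot overflow here. */
        i++;
    }
    return h == expected;
}
```

Each iteration folds one byte into h using one multiply and two adds, which are the Word32_mul and the two Word32_add operations in L_378.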
No, the 0x63 really is 0x63, not 0x3F. I don't think the reloading was
caused by a limit check. As far as I can see the loop does no allocation,
although some unusual control flow could conceivably make an allocation
possible.
The instruction
    xorl %edx, %edx
is the standard way to clear a register (%edx in this case). Here it is
completely unnecessary: the mull instruction multiplies %eax by its operand
(%ebx here) and writes the 64-bit product to the register pair %edx:%eax,
with the high 32 bits in %edx and the low 32 bits in %eax. The old contents
of %edx are therefore overwritten without ever being read. It looks as if
Matthew thought that this register had to be cleared before the multiply.
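The point about %edx can be checked with a small C model of the instruction's documented semantics. This sketch models the hardware, not MLton's code generator. The regs struct and mull_ebx are hypothetical names.

```c
#include <stdint.h>

/* A tiny model of the three registers involved. */
typedef struct {
    uint32_t eax, ebx, edx;
} regs;

/* Models `mull %ebx`: EDX:EAX := EAX * EBX (unsigned).
   The previous EDX is never read, only overwritten, so clearing it
   first with `xorl %edx, %edx` has no effect on the result. */
static void mull_ebx(regs *r)
{
    uint64_t product = (uint64_t)r->eax * r->ebx;
    r->edx = (uint32_t)(product >> 32);  /* high 32 bits */
    r->eax = (uint32_t)product;          /* low 32 bits: the Word32_mul result */
}
```

Running mull_ebx on two register sets that differ only in %edx gives identical results. For example, 0x63 * 0xFFFFFFFF leaves %edx = 0x62 and %eax = 0xFFFFFF9D in both cases.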
This isn't the only hot loop in the code. Also, I don't mean to imply that
the current speed is unacceptable. For instance, if I change the array of
seek pointers from Position.int options to plain Position.ints (with -1
standing in for NONE), the code speeds up to the point where the C version
is only 1.95 times faster. I definitely do NOT intend to do this in the real
code; it is just too ugly.
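The two encodings of a seek-pointer slot can be sketched in C as follows. The type and function names are hypothetical, the 64-bit width is an arbitrary choice, and the flag-plus-value struct only stands in for an option. It is not a claim about how MLton actually lays out a Position.int option. The sentinel form fits each slot in a single integer. The price is reserving -1 and relying on every reader of the array to honor that convention.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option-style slot: a presence flag plus a value,
   standing in for a Position.int option. */
typedef struct {
    bool    is_some;
    int64_t offset;
} seek_option;

/* Sentinel-style slot: a plain offset where -1 plays the role of
   NONE.  This only works because valid seek offsets are >= 0. */
typedef int64_t seek_sentinel;
#define SEEK_NONE ((seek_sentinel)-1)

/* Convert an option-style slot to the sentinel encoding. */
static seek_sentinel to_sentinel(seek_option o)
{
    return o.is_some ? o.offset : SEEK_NONE;
}

/* Convert a sentinel-encoded slot back to the option-style form. */
static seek_option from_sentinel(seek_sentinel s)
{
    seek_option o = { s != SEEK_NONE, s };
    return o;
}
```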
The idea is just that this is a good opportunity to see places where MLton
should generate better code.