profiling go
Henry Cejtin
henry@sourcelight.com
Sat, 9 Jun 2001 01:22:22 -0500
Here is the loop that computes the checksum of a vector of bytes:
fun loop_55 (x_495, x_494) =
   if x_494 = x_493
      then if x_489 = x_495
              then SOME_1 x_491
              else raise BadChecksum
      else loop_55 (Word32.+ (Word32.tolargeWord (Vector.sub (x_491, x_494)),
                              Word32.+ (0w63,
                                        Word32.* (0wx1234567, x_495))),
                    x_494 + 1 (overflow => raise Overflow))
And here is the relevant part of the generated code:
loop_55:
        movl (48*1)(%edi),%eax
        cmpl (40*1)(%edi),%eax
        je L_423
        movl %eax,%ebx
        incl %ebx
        jo L_427
        movl (36*1)(%edi),%ecx
        movb (%ecx,%eax,1),%dl
        movl %ebx,(48*1)(%edi)
        movzbl %dl,%eax
        movl %eax,localuint
        movl (44*1)(%edi),%eax
        movl $0x1234567,%ebx
        xorl %edx,%edx
        mull %ebx
        addl $0x63,%eax
        addl localuint,%eax
        movl %eax,(44*1)(%edi)
        jmp loop_55
I'm confused by the reloading of %ecx (x_491 in the CPS code) on every
iteration, and also by the storing of %eax in localuint. Perhaps this kind
of loop-invariant code motion is too much to ask for right now, but I would
have thought it would have fallen out.
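
For comparison, here is a minimal C sketch of the same recurrence, with the
loop invariants (the vector pointer and its length) held in locals that a C
compiler would normally keep in registers for the whole loop. This is an
illustration only, not the C version timed below: the function name, the
signature, and the initial checksum value of 0 are assumptions, and the
additive constant follows the 0w63 in the ML listing (the assembly shows
$0x63).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the benchmark.
 * Computes sum' = byte + (63 + 0x1234567 * sum) over each byte,
 * with 32-bit wraparound arithmetic as in Word32. */
static uint32_t checksum(const unsigned char *bytes, size_t len)
{
    uint32_t sum = 0; /* assumption: the starting value is not shown above */
    size_t i;

    /* bytes and len never change inside the loop, so nothing forces
     * them to be reloaded from memory on each iteration. */
    for (i = 0; i < len; i++)
        sum = (uint32_t)bytes[i] + (63u + 0x1234567u * sum);

    return sum;
}
```

The point of the sketch is only that the loop body needs one load (the byte)
and no stores per iteration, whereas the generated code above does three
loads from the frame plus the stores of the index, the sum, and localuint.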
Anyway, the ML version now takes 48.82 CPU seconds and the C version takes
20.97, for a ratio of 2.33.