profiling go

Henry Cejtin henry@sourcelight.com
Sat, 9 Jun 2001 01:22:22 -0500


Here is the loop that computes the checksum of a vector of bytes:

fun loop_55 (x_495, x_494) =
       if x_494 = x_493
	  then if x_489 = x_495
		  then SOME_1 x_491
		  else raise BadChecksum
	  else loop_55 (Word32.+ (Word32.tolargeWord (Vector.sub (x_491,
								  x_494)),
				  Word32.+ (0w63,
					    Word32.* (0wx1234567, x_495))),
			x_494 + 1 (overflow => raise Overflow))

And here is the relevant part of the generated code:

loop_55:
	movl (48*1)(%edi),%eax
	cmpl (40*1)(%edi),%eax
	je L_423
	movl %eax,%ebx
	incl %ebx
	jo L_427
	movl (36*1)(%edi),%ecx
	movb (%ecx,%eax,1),%dl
	movl %ebx,(48*1)(%edi)
	movzbl %dl,%eax
	movl %eax,localuint
	movl (44*1)(%edi),%eax
	movl $0x1234567,%ebx
	xorl %edx,%edx
	mull %ebx
	addl $0x63,%eax
	addl localuint,%eax
	movl %eax,(44*1)(%edi)
	jmp loop_55

I'm confused by the reloading of %ecx (x_491 in the CPS code) on every
iteration, and by the store of %eax to localuint just to add it back in a
few instructions later.  Perhaps this kind of loop-invariant code motion is
too much to ask for right now, but I would have thought it would have fallen
out.
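
For comparison, here is a rough C sketch of the same loop with the invariant
values (vector base and length) and the running checksum held in locals
rather than reloaded from or stored to memory each time around.  This is not
the C version used for the timings.  The names checksum and checksum_ok and
the seed parameter are made up for illustration, and the two constants are
copied from the assembly listing above (0x1234567 and 0x63).

```c
#include <stddef.h>
#include <stdint.h>

/* Constants as they appear in the generated assembly (assumed, see above). */
#define CK_MUL UINT32_C(0x1234567)
#define CK_ADD UINT32_C(0x63)

/*
 * Hypothetical helper: fold the bytes into a 32-bit checksum, starting
 * from `seed`.  bytes and len are loop-invariant and acc is a plain local,
 * so a compiler is free to keep all three in registers for the whole loop.
 * Unsigned arithmetic wraps modulo 2^32, like Word32.
 */
static uint32_t checksum(const unsigned char *bytes, size_t len, uint32_t seed)
{
    uint32_t acc = seed;
    for (size_t i = 0; i < len; i++)
        acc = (uint32_t)bytes[i] + (CK_ADD + CK_MUL * acc);
    return acc;
}

/* Hypothetical helper: returns 1 if the checksum matches `expected`, else 0. */
static int checksum_ok(const unsigned char *bytes, size_t len,
                       uint32_t seed, uint32_t expected)
{
    return checksum(bytes, len, seed) == expected;
}
```

The point is only that nothing in the loop body needs to touch memory except
the byte load itself.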

Anyway, the ML version now takes 48.82 CPU seconds and the C version takes
20.97, for a ratio of 2.33.