x86 performance
Matthew Fluet
fluet@research.nj.nec.com
Mon, 7 Aug 2000 16:54:53 -0400 (EDT)
Here's a comparison of the c-codegen and the x86-codegen that has me
a little puzzled about where I'm losing performance. It's a very
simple, non-allocating loop, so I would have thought the two codegens
would be very similar.
The input is the standard even-odd recursion:
fun even 0 = true
  | even n = odd (n - 1)
and odd 1 = true
  | odd n = even (n - 1)
Letting that loop for 750 million iterations I got the following results:
C-codegen       x86-codegen
time: 16.99     time: 22.65
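For anyone who wants to try the same measurement, here is a minimal sketch of a driver. It is an assumption about how the loop could be run, not the harness actually used for the numbers above: the names `n`, `timer`, and `result` are illustrative, and timing with the Basis Library `Timer` structure is one possible choice. With these base cases the argument has to be even and non-negative, because an odd starting value never reaches `even 0` or `odd 1`.

```sml
(* The loop from the message, unchanged. *)
fun even 0 = true
  | even n = odd (n - 1)
and odd 1 = true
  | odd n = even (n - 1)

(* Hypothetical driver (not from the original message).
   n must be even and non-negative: only then does the
   recursion reach one of the base cases even 0 / odd 1. *)
val n = 750000000

val timer = Timer.startCPUTimer ()
val result = even n
(* The flexible record pattern also tolerates Basis versions
   whose result record carries additional fields. *)
val {usr, sys, ...} = Timer.checkCPUTimer timer

val () = print ("result: " ^ Bool.toString result ^ "\n")
val () = print ("time: " ^ Time.toString (Time.+ (usr, sys)) ^ "\n")
```

Compiling the same file once with each codegen and comparing the reported times should give a comparison of the same shape as the one above, though the absolute numbers will depend on the machine.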
And here's what the spy program showed for the loop. I've filled in
reasonable labels and marked conditional jumps which are not taken.
             C-codegen                    x86-codegen
             gcState.frontier->%esi       gcState.frontier->%esp
             gcState.stackTop->%ebx       gcState.stackTop->%ebp

even:        leal 0x18(%esi),%eax         movl %esp,%esi              1
                                          addl $0x18,%esi
             cmpl 0x8053888,%eax          cmpl 0x8054288,%esi
             jbe 0x804cd47                jle 0x804bef8
skip_GC:     cmpl $0x0,0x18(%ebx)         cmp $0x0,0x18(%ebp)
             jne 0x804cd94                *je 0x804bf28
                                          jmp 0x804bf00
even_n:      movl 0x18(%ebx),%ebp         movl 0x18(%ebp),%esi
             decl %ebp                    subl $0x1,%esi
             cmpl $0x1,%ebp               cmpl $0x1,%esi
                                          movl %esi,0x80541c4         2
             *je 0x804cd50                *je 0x804bf28
                                          jmp 0x804bf14
odd_n:                                    movl 0x80541c4,%esi         3
             decl %ebp                    subl $0x1,%esi
             movl %ebp,0x18(%ebx)         movl %esi,%edi              4
                                          movl %edi,0x18(%ebp)
             jmp 0x804ccd4                jmp 0x804be9c
1. calculate gcState.frontier + 24
2. %esi -> RI(1)
3. RI(1) -> %esi
4. SI(24) = RI(3)
Does the time difference between the programs seem reasonable? The
essential differences seem to be the two unconditional jumps and the
save and restore of RI(1).