good in general, but bad nested loops
Matthew Fluet
mfluet@intertrust.com
Mon, 9 Jul 2001 22:10:19 -0700 (PDT)
> I see that we are comparing registers against 0 instead of doing a testl of
> the register with itself. The compare is 3 bytes vs. 2 bytes for the testl
> (for the %ebp register), but it doesn't seem to make any speed difference in
> the tests I performed. Still, it is something to put in.
That's easy enough to add. It's a trivial peephole optimization to write;
or, if the majority of the cases are coming straight from Machine IL, I
can just emit the testl directly in the translation. I'll add it to my
todo list.
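For illustration, here is a minimal sketch of such a peephole pass in Python. It is not the code generator's actual implementation: the function name and the representation of instructions as AT&T-syntax strings are assumptions made for this example. It rewrites only a compare of a 32-bit register against an immediate zero; the two forms set the flags that conditional jumps read in the same way.

```python
import re

# Assumed input format: one AT&T-syntax instruction per string, e.g.
# "cmp $0x0,%esp". Only a 32-bit register compared against immediate 0 matches.
_CMP_ZERO = re.compile(r"^cmpl?\s+\$(?:0x0+|0+),\s*(%e[a-z]{2})$")


def peephole_cmp_zero(instructions):
    """Rewrite 'cmp $0,%reg' as 'testl %reg,%reg'; copy everything else."""
    out = []
    for insn in instructions:
        match = _CMP_ZERO.match(insn.strip())
        if match:
            reg = match.group(1)
            # testl of a register with itself sets ZF/SF the same way as
            # comparing it against 0, with a shorter encoding.
            out.append(f"testl {reg},{reg}")
        else:
            # Memory operands and non-zero immediates are left untouched.
            out.append(insn)
    return out
```

Applied to a sequence containing "cmp $0x0,%esp", only that line changes; the other instructions are copied through as-is.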
> I also saw that we were loading a register from memory, incrementing the
> register, and then storing it back into the same memory location. (This is
> with overflow detection off, otherwise there is a test for overflow between
> these two.)
I see the following loop with overflow detection off:
    0x804b48a: mov 0xdc(%edi),%esp
    0x804b490: cmp $0x0,%esp
    0x804b493: je 0x804b730
    0x804b499: dec %esp
    0x804b49a: mov %esp,0xdc(%edi)
    0x804b4a0: incl 0xd8(%edi)
    0x804b4a6: jmp 0x804b48a
This is what I expected -- the increment of 0xd8(%edi) should happen in
memory. The decrement of 0xdc(%edi) could happen in memory, since we
don't modify %esp. But since the value of that memory location is
already in a register, I do the dec there and then store it back.
> Perhaps for these cases it really is just a matter of not having the relevant
> variables in real registers. Or maybe it is the 2 adjacent stores into
> memory. (I seem to recall that this was bad for the CPU to schedule.)
I'm working on carrying stack slots around loops in registers. For that
tight loop (even with overflow checking), I'm hoping that no memory
accesses will be necessary. But I'm not quite there yet.
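To make the goal concrete, here is a rough Python model of the loop, first with both slots living in memory as in the disassembly, and then with the slots carried in registers for the duration of the loop. This is only a sketch under stated assumptions: the function names are hypothetical, the slot at 0xd8 is assumed to start at 0, and an in-memory incl is counted as one read plus one write.

```python
def run_loop_in_memory(count):
    """Model the disassembled loop: slots are touched in memory each iteration."""
    mem = {0xDC: count, 0xD8: 0}  # assumption: 0xd8 starts at 0
    accesses = 0
    while True:
        reg = mem[0xDC]           # mov 0xdc(%edi),%esp
        accesses += 1
        if reg == 0:              # cmp $0x0,%esp ; je
            break
        reg -= 1                  # dec %esp
        mem[0xDC] = reg           # mov %esp,0xdc(%edi)
        accesses += 1
        mem[0xD8] += 1            # incl 0xd8(%edi), counted as read + write
        accesses += 2
    return mem, accesses


def run_loop_in_registers(count):
    """Model the hoped-for code: load slots once, loop in registers, store once."""
    mem = {0xDC: count, 0xD8: 0}  # same assumed starting state
    r_dc = mem[0xDC]              # loads before the loop
    r_d8 = mem[0xD8]
    accesses = 2
    while r_dc != 0:              # no memory traffic inside the loop
        r_dc -= 1
        r_d8 += 1
    mem[0xDC] = r_dc              # stores after the loop
    mem[0xD8] = r_d8
    accesses += 2
    return mem, accesses
```

In this model both versions leave memory in the same final state. The in-memory version performs a number of accesses proportional to the iteration count, while the register version performs a fixed four.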