x86 performance
Matthew Fluet
fluet@research.nj.nec.com
Wed, 9 Aug 2000 12:57:59 -0400 (EDT)
> The C compiler uses leal as cheap 3-address arithmetic while the x86
> version uses a move followed by an add constant. Note, the C
> compiler way is 1 instruction, and it is only 3 bytes long. The x86
> version is 2 instructions and 5 bytes long. Also C code is
> absolutely filled with loads and stores at short offsets from a
> register (either because the register is a pointer to a struct or the
> stack) so I am sure that this addressing mode is very fast. This
> could be a big difference.
Turns out this is even trickier than one might first imagine. It was very
easy to set up the limit check points to use leal instead of movl/addl
when the number of requested bytes is a non-zero constant. (When the
constant is zero, we just compare the frontier and the limit with no
intermediate calculation. That raises an interesting point: in that
non-allocating loop, there is a check for 24 additional bytes at each
entry. It is probably for the continuation where I print out the result,
but it looks like it got pushed too far back into the loop.) Now the tops
of the loops look like:
x86-codegen:                             spy-ed .s
0x804be60: leal 0x18(%esp,1),%esi        leal (24*1)(%esp),%esi
0x804be64: cmpl 0x8054288,%esi           cmpl (gcState+8),%esi
c-codegen:
0x804ccd4: leal 0x18(%esi),%eax          leal 24(%esi),%eax
0x804ccd7: cmpl 0x8053888,%eax           cmpl gcState+8,%eax
Notice that the x86-codegen leal is 4 bytes and the c-codegen leal is 3
bytes. It appears (although I can't find this in the documentation for
either x86 addressing or the GNU assembler) that using %esp as the base
automatically incurs a scale value and an additional byte of instruction.
Sort of annoying, particularly since it's nice to be able to use %esp as a
general purpose register.
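
The size difference above is consistent with the standard IA-32 operand
encoding, where the r/m value 100 in the ModRM byte does not name %esp but
announces that a SIB (scale-index-base) byte follows, so any memory operand
based on %esp carries one extra byte. The following is only a rough sketch
of that rule, not code from either codegen: the helper name
encode_leal_disp8 is made up for illustration, and it assumes a
displacement that fits in a signed byte.

```python
# Register numbers as used in the ModRM and SIB fields (32-bit registers).
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
       "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_leal_disp8(disp, base, dst):
    """Encode `leal disp(%base),%dst` (hypothetical helper, disp8 only)."""
    if not -128 <= disp <= 127:
        raise ValueError("this sketch only handles 8-bit displacements")
    mod = 0b01                      # mod=01: base register plus disp8
    out = [0x8D]                    # opcode for LEA r32, m
    if base == "esp":
        # r/m=100 does not select %esp; it says "a SIB byte follows".
        out.append(mod << 6 | REG[dst] << 3 | 0b100)
        # SIB: scale=00 (x1), index=100 (no index), base=100 (%esp).
        out.append(0b00 << 6 | 0b100 << 3 | REG["esp"])
    else:
        # Any other base in this sketch goes directly in the r/m field.
        out.append(mod << 6 | REG[dst] << 3 | REG[base])
    out.append(disp & 0xFF)         # the 8-bit displacement
    return bytes(out)


x86_codegen = encode_leal_disp8(0x18, "esp", "esi")  # leal 0x18(%esp,1),%esi
c_codegen = encode_leal_disp8(0x18, "esi", "eax")    # leal 0x18(%esi),%eax
print(x86_codegen.hex(" "), len(x86_codegen))        # 8d 74 24 18 4
print(c_codegen.hex(" "), len(c_codegen))            # 8d 46 18 3
```

The two lengths match the address gaps in the listing (0x804be60 to
0x804be64, and 0x804ccd4 to 0x804ccd7). The (%esp,1) that the disassembler
prints would then just be its rendering of the SIB byte with no index
register.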