slow matrix multiply
Henry Cejtin
henry@sourcelight.com
Tue, 10 Jul 2001 20:24:59 -0500
What is going on with the register dance in MLton:
movl (188*1)(%edi),%edx # %edx = i
movl %edx,%ecx # %ecx = i
movl %ecx,%eax # %eax = i
movl $30,%ecx # %ecx = 30
The middle two moves are clearly silly given the last instruction: %ecx is
overwritten right away, so %edx could have been moved straight into %eax. Also,
if you look at the code, %edx is dead at this point. Thus the above could have
been just
movl (188*1)(%edi),%eax
movl $30,%ecx
Also, didn't we conclude that the cltd before each imull served no purpose?
More importantly, and this is probably only doable if overflow checking is
off, the conversion of multiplies to fancy lea instructions is a big big win
on Intel chips.
The last of these is a big deal, though it requires overflow detection to go
away. The shuffle, on the other hand, simply surprises me.
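
To make the lea point concrete, here is a rough sketch in C of how a multiply by a constant can be broken into lea-style steps. It assumes the constant is the 30 loaded into %ecx in the snippet above; the function names mul30_lea and mul30_imul are made up for illustration, and the assembly in the comments is one possible sequence, not what MLton emits. Each step has the shape x + x*scale with scale in {2, 4, 8}, which is what a single lea can compute. Since lea does not set the overflow flag, this only matches the imull version when overflow checking is off, so the sketch uses unsigned wraparound arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper: multiply by 30 using only lea-shaped steps.
 * 30 = 5 * 3 * 2, and each factor fits the base + index*scale form. */
uint32_t mul30_lea(uint32_t x)
{
    uint32_t x5  = x + (x << 2);    /* leal (%eax,%eax,4),%eax : x * 5  */
    uint32_t x15 = x5 + (x5 << 1);  /* leal (%eax,%eax,2),%eax : x * 15 */
    uint32_t x30 = x15 + x15;       /* leal (%eax,%eax,1),%eax : x * 30 */
    return x30;
}

/* Reference version: a plain multiply, as imull would compute it
 * when the overflow flag is ignored (result taken modulo 2^32). */
uint32_t mul30_imul(uint32_t x)
{
    return x * 30u;
}
```

For example, mul30_lea(7) and mul30_imul(7) both give 210, and the two agree on every 32-bit input because both wrap modulo 2^32.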