[MLton] The PackWord performance regression on the 64-bit branch
Matthew Fluet
fluet at cs.cornell.edu
Tue Nov 28 12:51:01 PST 2006
>> On the 64-bit branch MLton, the generated machine code (from the previous
>> SML code) is
>>
>> sall $2,%esi
>> addl (globalWord32+((3*4)+(0*1))),%esi
>> movzbl (0+((0x0*1)+(0*1)))(%esi),%edi
>> cmpl $0x0,%edi
>> jl L_1255
>> L_841:
>> incl %esi
>> shll $0x8,%edi
>> movzbl (0+((0x0*1)+(0*1)))(%esi),%edx
>> cmpl $0x0,%edx
>> jl L_1254
>> L_840:
>> orl %edi,%edx
>> incl %esi
>> shll $0x8,%edx
>> movzbl (0+((0x0*1)+(0*1)))(%esi),%edi
>> cmpl $0x0,%edi
>> jl L_1253
>> L_839:
>> orl %edi,%edx
>> incl %esi
>> shll $0x8,%edx
>> movzbl (0+((0x0*1)+(0*1)))(%esi),%edi
>> cmpl $0x0,%edi
>> jl L_1252
>> L_838:
>> orl %edx,%edi
>> movl %edi,(0+((64*1)+(0*1)))(%ebp)
>> movl $0x1,(0+((60*1)+(0*1)))(%ebp)
>> addl $60,%ebp
>> movl $L_837,(0+(-1*4))(%ebp)
>>
>> and it looks like a bug in the optimizer.
>
> Indeed; I would expect you to get code that is just like that in trunk.
I checked in a modification to the x86_64 branch basis library
implementation that avoids this redundant test. (In particular, we do not
need to check for Overflow when Word<N>.wordSize is less than
Int.precision.)
However, it would be nice to extend the optimizer to pick up these cases.
The redundantTests optimization is the right place to start, but it isn't
really doing complicated test propagation and test implication.
Essentially, redundantTests looks for Word_equal and Word_lt comparisons
that are dominated by tests that determine the outcome of the comparison,
in which case the comparison can be eliminated. The limitations of
redundantTests are two-fold:
1) Only explicit comparisons (i.e., Word_equal and Word_lt) are used to
insert facts into the dominator tree
2) A fact in the dominator tree may eliminate a comparison iff the
comparison (or its negation) is syntactically identical to a fact.
In the above situation, we have SSA code that looks like:
...
global_3: word32 = 0x0
...
x_212: word8 = Pointer_getWord8 (x_213, global_3)
x_211: word32 = WordU8_toWord32 (x_212)
x_210: bool = WordS32_lt (x_211, global_3)
case x_210 of
true => L_84 | false => L_83
...
The zero-extension (WordU8_toWord32) could introduce the facts:
0 <=U x_211, 0 <=S x_211, x_211 <=U 255, x_211 <=S 255
(and, possibly, x_211 <U 256, x_211 <S 256)
This would suffice to eliminate the WordS32_lt test, since in this
situation the corresponding negated fact (0 <=S x_211) is syntactically
equal to one of the facts introduced above.
However, it would be nice to eliminate tests which are not syntactically
equal, but are logically implied; that is, consider transitive
relationships and other implications (like x <=S 10 implies x <=S 20).
Similar in spirit to the zero- and sign-extension facts, we could also
introduce facts at Word_quot and Word_rem statements.
More information about the MLton
mailing list