[MLton] latest MLton segfault in gmp
Wesley W. Terpstra
wesley at terpstra.ca
Sat Oct 10 14:10:22 PDT 2009
On Sat, Oct 10, 2009 at 10:27 PM, Wesley W. Terpstra <wesley at terpstra.ca> wrote:
> I've tried compiling with -align 8 and then it works... I'm not sure this
> is a solution, though; it may have just masked the problem.
>
Found the smoking gun! Debian builds gmp with -O3 whereas I used -O2 for
MinGW32. If you look at the assembler output of mpz/mul_exp.c with the two
options you will notice a difference... the introduction of a 'movdqa'
instruction, which is an SSE2 instruction that expects 16-byte alignment.
From what I've read, an array of 64-bit words should be 64-bit aligned.
MLton IntInfs are such arrays and must thus be 8-byte aligned. They aren't.
Here's the problem vectorized assembler from gcc with -O3 (I've marked the
problem code):
.LVL16:
    andl $15, %eax
    shrq $3, %rax
^^^^^^^^^^^ This ignores the 4-byte alignment of the array, only caring
about its 8-byte alignment before it moves on to doing 16-byte aligned
moves.
    cmpq %r12, %rax
    cmova %r12, %rax
    testq %rax, %rax
    je .L10
.LBB2:
    cmpq %rax, %r12
    movq $0, (%r14)
    leaq 8(%r14), %rdi
    leaq -1(%r12), %rsi
    je .L8
.L10:
    movq %r12, %rbx
    subq %rax, %rbx
    movq %rbx, %rcx
    shrq %rcx
    movq %rcx, %r9
    addq %r9, %r9
    je .L16
    pxor %xmm0, %xmm0
    leaq (%r14,%rax,8), %r8
    xorl %edx, %edx
    .p2align 4,,10
    .p2align 3
.L12:
    .loc 1 64 0
    movq %rdx, %rax
    addq $1, %rdx
    salq $4, %rax
    cmpq %rcx, %rdx
    movdqa %xmm0, (%r8,%rax)
^^^^^^^^^^^^^^^^^^^^^^^^^ At this point the memory MUST be 16-byte aligned,
but it isn't if the input is only 4-byte aligned: an address that is 4 mod 8
stays 4 mod 8 after adding 8, so it can never be 0 mod 16 (e.g. 4 + 8 = 12).
This causes our segfault.
    jb .L12
    subq %r9, %rsi
    cmpq %r9, %rbx
    leaq (%rdi,%r9,8), %rdi
    je .L8
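The alignment arithmetic in the listing can be sketched in C. This is only a model of how the marked instructions appear to behave, not code from gmp or MLton: the function names are made up for illustration, and treating %r14 as the base address and %r12 as the limb count is an assumption read off the listing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of 8-byte limbs stored one at a time before the 16-byte loop.
   Models `andl $15, %eax; shrq $3, %rax`, capped at n like the
   cmpq/cmova pair. The shift discards the low 3 bits, so an address
   that is only 4-byte aligned is treated as if it were 8-byte aligned. */
static uint64_t peel_limbs(uint64_t addr, uint64_t n)
{
    uint64_t peel = (addr & 15) >> 3;   /* 0 or 1 */
    return peel < n ? peel : n;
}

/* Address of the first 16-byte store.
   Models `leaq (%r14,%rax,8), %r8` (assumed: %r14 = base address). */
static uint64_t first_vector_addr(uint64_t addr, uint64_t n)
{
    return addr + 8 * peel_limbs(addr, n);
}

/* True if addr is a multiple of alignment (alignment is a power of two). */
static bool is_aligned(uint64_t addr, uint64_t alignment)
{
    return (addr & (alignment - 1)) == 0;
}
```

In this model a base address of 0x1000 or 0x1008 reaches the vector loop 16-byte aligned, while 0x1004 and 0x100c both reach it at an address that is 4 mod 16, which is what movdqa cannot tolerate.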
What's the plan going forward? align(AMD64) == 8?