On Sat, Oct 10, 2009 at 10:27 PM, Wesley W. Terpstra <span dir="ltr"><<a href="mailto:wesley@terpstra.ca">wesley@terpstra.ca</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="gmail_quote">I've tried compiling with -align 8 and then it works... I'm not sure this is a solution, though; it may have just masked the problem. <br></div></blockquote><div><br>Found the smoking gun! Debian builds gmp with -O3 whereas I used -O2 for MinGW32. If you look at the assembler output of mpz/mul_exp.c with the two options you will notice a difference... the introduction of a 'movdqa' instruction, which is an SSE2 instruction that expects 16-byte alignment.<br>
<br>From what I've read, an array of 64-bit words should be 8-byte aligned. MLton IntInfs are such arrays and must thus be 8-byte aligned. They aren't; they are only 4-byte aligned.<br><br>Here's the problematic vectorized assembler output from gcc with -O3 (I've marked the problem code):<br>
<br>.LVL16:<br> andl $15, %eax<br> shrq $3, %rax<br>^^^^^^^^^^^ This ignores the possibility that the array is only 4-byte aligned; it considers only its 8-byte alignment before moving on to 16-byte-aligned moves.<br>
cmpq %r12, %rax<br> cmova %r12, %rax<br> testq %rax, %rax<br> je .L10<br>.LBB2:<br> cmpq %rax, %r12<br> movq $0, (%r14)<br> leaq 8(%r14), %rdi<br> leaq -1(%r12), %rsi<br>
je .L8<br>.L10:<br> movq %r12, %rbx<br> subq %rax, %rbx<br> movq %rbx, %rcx<br> shrq %rcx<br> movq %rcx, %r9<br> addq %r9, %r9<br> je .L16<br>
pxor %xmm0, %xmm0<br> leaq (%r14,%rax,8), %r8<br> xorl %edx, %edx<br> .p2align 4,,10<br> .p2align 3<br>.L12:<br> .loc 1 64 0<br> movq %rdx, %rax<br> addq $1, %rdx<br>
salq $4, %rax<br> cmpq %rcx, %rdx<br> movdqa %xmm0, (%r8,%rax)<br>^^^^^^^^^^^^^^^^^^^^^^^^^ At this point the address MUST be 16-byte aligned, but it isn't when the input is only 4-byte aligned: 4 + 8 = 12, which is not 0 mod 16. This causes our segfault.<br>
jb .L12<br> subq %r9, %rsi<br> cmpq %r9, %rbx<br> leaq (%rdi,%r9,8), %rdi<br> je .L8<br><br>What's the plan going forward? align(AMD64) == 8?<br><br></div></div>
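To make the alignment arithmetic above concrete, here is a minimal C sketch of what the marked prologue appears to compute. It is a model, not gcc's actual code: the names peel_words and offset_after_peel are hypothetical, and the negation of the address is an assumption, since the instruction before .LVL16 is not shown in the listing. The assumption is chosen so that the model matches the "4 + 8 = 12" case described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the loop prologue (andl $15 / shrq $3):
 * the number of 8-byte words to store one at a time before the
 * loop switches to 16-byte movdqa stores.
 * ASSUMPTION: the address is negated first; that instruction is
 * not visible in the listing. */
static unsigned peel_words(uintptr_t addr)
{
    return (unsigned)((((uintptr_t)0 - addr) & 15u) >> 3);
}

/* Address modulo 16 at the point where the movdqa loop starts.
 * movdqa requires this to be 0. */
static unsigned offset_after_peel(uintptr_t addr)
{
    return (unsigned)((addr + 8u * (uintptr_t)peel_words(addr)) & 15u);
}

/* Self-check covering the four possible 4-byte-aligned residues. */
static void check_model(void)
{
    assert(offset_after_peel(0)  == 0);   /* 16-byte aligned: fine       */
    assert(offset_after_peel(8)  == 0);   /* 8-byte aligned: one peel    */
    assert(offset_after_peel(4)  == 12);  /* 4-byte only: movdqa faults  */
    assert(offset_after_peel(12) == 12);  /* 4-byte only: movdqa faults  */
}
```

The shift by 3 discards the low three bits of the address, so under this model the prologue can only correct a misalignment that is a multiple of 8. Any array that is merely 4-byte aligned reaches the movdqa loop at an address that is not 0 mod 16.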