On Wed, Oct 14, 2009 at 11:56 PM, Matthew Fluet <span dir="ltr"><<a href="mailto:matthew.fluet@gmail.com">matthew.fluet@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div>I'm hardly an expert. I used the <a href="http://www.x86-64.org" target="_blank">www.x86-64.org</a> document to implement<br></div>
the C calling convention in the native codegen, but didn't peruse it<br>
much otherwise.<br></blockquote><div><br>Nice link, thanks.<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Searching for "align" in the document, though, reveals that on page<br>
12, it declares that {,signed,unsigned} {,long} long all have 8-byte<br>
alignment.</blockquote><div><br>OK, that table is pretty clear. The ABI requires that Word64s be 8-byte aligned. Therefore gcc was within its rights to assume that the pointer was 8-byte aligned, and the bug was ours.<br>
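<br>To make that concrete, here is a minimal sketch in C. The helper names <code>is_aligned</code> and <code>load_u64_any</code> are hypothetical (they come from neither MLton nor gcc). The first checks whether an address meets a given alignment; the second reads a 64-bit value through memcpy, which assumes nothing about alignment, whereas dereferencing a uint64_t pointer lets the compiler assume the ABI alignment.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if address p is a multiple of align, 0 otherwise.
 * align must be a nonzero power of two. */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Reads a 64-bit value from any address. memcpy makes no alignment
 * assumption, unlike a direct dereference of a uint64_t pointer. */
uint64_t load_u64_any(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

A runtime that cannot guarantee 8-byte alignment for a Word64 would have to go through something like load_u64_any before handing the value to C code; the simpler fix is to align the data in the first place.<br>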
</div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> However, on the next page it states: </blockquote><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
Like the Intel386 architecture, the AMD64 architecture in general<br>
does not require all data accesses to be properly aligned. Misaligned data<br>
accesses are slower than aligned accesses but otherwise behave identically.<br>
The only exceptions are<br>
that __m128 and __m256 must always be aligned properly.<br></blockquote><div><br>This is not a contradiction. Architecture != ABI. The machine can do it, but the ABI forbids it.<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
So, it isn't clear to me that one really needs to 8-align 64-bit integers.<br></blockquote><div><br>If we want to link with any other application code (libc, libgmp, FFI, ...), then it's 100% clear that we need 8-byte alignment. We have just been lucky that no other software actually relied on the 8-byte alignment guarantee until now, since the hardware itself rarely faults on an ABI violation.<br>
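<br>One common way to honour the guarantee is to round every offset at which a 64-bit value is placed up to a multiple of 8. The sketch below assumes a hypothetical <code>align_up</code> helper; it is not MLton's actual allocation code, only an illustration of the arithmetic.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Rounds offset up to the next multiple of align.
 * align must be a nonzero power of two. */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}
```

For example, a Word64 that would otherwise land at offset 13 is placed at align_up(13, 8), which is 16.<br>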
<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
In the next subsection (p. 13), on aggregates and unions, it states:<br>
<br>
An array uses the same alignment as its elements, except that a<br>
local or global array variable of length at least 16 bytes or a C99 variable-length<br>
array variable always has alignment of at least 16 bytes.<br></blockquote><div><br>I think by global/local arrays they mean arrays not in the heap but the data segment. (local = static int64_t foo[4];, global = extern int64_t foo[4];)<br>
<br>At any rate, this sounds like we don't need to worry because MLton only passes arrays as pointers (both FFI and GMP limb structure).<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I did want to also point out that there is a legacy issue, I would<br>
assume, on Debian. Since mlton-20070826 is dynamically linked against<br>
libgmp, isn't it just an incredible luck of the draw that a<br>
self-compile with mlton-20070826 didn't happen to produce a<br>
non-16-byte aligned IntInf array.<br>
</blockquote></div><br>Yes, I was surprised too. However, there are a couple of reasons this worked out. First, the only code gcc managed to vectorize in the gmp C code is the MPN_ZERO routine. Second, the only place MPN_ZERO gets called (for us) is to clear the low bits of a left-shifted intinf. Third, it won't use 16-byte writes unless there are 16 bytes to write, so it had to be a >=128-bit left shift. I wonder if these maybe didn't happen in 20070826?<br>
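<br>The arithmetic behind the third point, as a rough sketch. It assumes 64-bit limbs, and <code>low_limbs_cleared</code> and <code>zero_limbs</code> are hypothetical names: the loop is not GMP's MPN_ZERO, just a zeroing loop of the same general shape that a compiler may choose to vectorize.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIMB_BITS 64 /* assumption: 64-bit limbs, as on x86-64 */

/* Number of whole low limbs that become zero after a left shift
 * by shift_bits bits. */
size_t low_limbs_cleared(size_t shift_bits)
{
    return shift_bits / LIMB_BITS;
}

/* Plain zeroing loop; a vectorizing compiler may turn this into
 * 16-byte stores once there are at least two limbs to clear. */
void zero_limbs(uint64_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = 0;
}
```

A shift by 127 bits clears only one whole limb (8 bytes), while a shift by 128 bits clears two (16 bytes), which is the smallest case where a 16-byte store could be used.<br>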
<br>I imagine that as gcc gets smarter, vectorizing more code, this will become a more serious legacy issue.<br><br>