<div class="gmail_quote">On Fri, Oct 15, 2010 at 3:12 PM, David Hansel <span dir="ltr">&lt;<a href="mailto:hansel@reactive-systems.com">hansel@reactive-systems.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Thanks for that information. Just one more question<br>
(more out of curiosity than anything else): is there<br>
a technical reason that 64-bit MLton does NOT use the<br>
FPU?</blockquote><div><br>Well, I think Matthew considered the SSE2 instructions superior to the x87 instructions, and SSE2 is part of the baseline amd64 instruction set, so every 64-bit x86 machine has it.<br><br>You'd have to ask Matthew for the details, but my understanding is that a flat register file is easier to work with than a register stack: you can use traditional register allocation algorithms. The x87 FPU has 8 slots in its stack while SSE2 on amd64 has 16 XMM registers, so you also get double the "registers" plus "random access". On top of that, every operation is performed directly in 32-bit or 64-bit precision, so you don't need to worry about the x87's extra 80-bit intermediate precision.<br>
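<br>As a rough illustration of the extra-precision issue (a sketch in Python, not MLton code): assuming the x87 is running in extended-precision mode, a result is first rounded to a 64-bit significand and then rounded again to 53 bits when it is stored as a double, whereas SSE2 rounds once. The helper round_to_bits below is hypothetical and only emulates round-to-nearest-even on exact fractions; the input value is picked by hand to expose the difference.<br>

```python
from fractions import Fraction


def round_to_bits(x: Fraction, bits: int) -> Fraction:
    """Round a positive rational to `bits` significand bits, ties to even.

    Hypothetical helper for illustration only; it ignores exponent range,
    zero, negative values, infinities and NaNs.
    """
    # Find e with 2**e <= x and x below 2**(e+1), using exact integer arithmetic.
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    # Spacing between neighbouring representable values around x.
    ulp = Fraction(2) ** (e - bits + 1)
    # round() on a Fraction rounds exact ties to the even integer.
    return round(x / ulp) * ulp


# Hand-picked exact value: just above the midpoint between 1 and 1 + 2**-52.
x = 1 + Fraction(1, 2**53) + Fraction(1, 2**65)

# SSE2-style: a single rounding straight to the 53-bit double significand.
direct = round_to_bits(x, 53)

# x87-style: round to the 64-bit extended significand, then to 53 bits on store.
twice = round_to_bits(round_to_bits(x, 64), 53)

print("rounded once :", float(direct))
print("rounded twice:", float(twice))
print("same result? ", direct == twice)
```

With this input the two results differ in the last bit of the double, which is the kind of discrepancy SSE2 avoids by rounding only once.<br>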
<br>Personally, I'd love to see 128-bit floating point using SSE2 registers; then the x87 would be completely obsolete.<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Or is this just a case of "nobody has implemented<br>
that yet"? </blockquote><div><br></div></div>Well, the amd64 and x86 codegens are very similar. You could probably cut-and-paste most of the x87 FPU instruction handling out of the x86 codegen into the amd64 codegen with very little trouble.<br><br>
A perhaps better question might be: how hard would it be to port the SSE2 math to i386? ;) Most modern 32-bit x86 processors also have SSE2. Then you could have something like -ieee-fp sse2 on i386 to get the same effect as -ieee-fp true, but even faster than -ieee-fp false (at the cost of requiring SSE2 support, of course).<br>