discussion of X86 floating point on comp.lang.ml
Stephen Weeks
MLton@sourcelight.com
Thu, 19 Oct 2000 10:44:55 -0700 (PDT)
Matthew, I don't know if you follow comp.lang.ml, so just in case, you will
probably find the following article interesting.
http://x53.deja.com/threadmsg_md.xp?thitnum=1&mhitnum=4&CONTEXT=971977159.1300693019&new=1&AN=683376389.1&uniq=971977204.1300758563

Subject: Re: Team PLClub ICFP entry -- comparing the performance of OCAML and SML
Date: 10/19/2000
Author: Xavier Leroy <Xavier.Leroy@see.my.sig.for.address>
Allen Leung <leunga@cs.nyu.edu> writes:
> Actually, the SML/NJ backend currently uses the ``wrong'' framework for
> FP register allocation on the x86. Instead of using the FP stack registers
> as registers, it uses them only as temporaries for evaluating expressions.
> Virtual registers are actually placed on the (memory) stack.

OCaml does exactly the same, and I believe this is actually the
``right'' framework for FP on the x86 -- at least the one intended
by the designers of the Pentium and Pentium Pro/II/III. For
instance, loading from / storing to a register deep in the FP
register stack is nearly as expensive as loading from / storing to
the memory stack, provided it is in L1 cache.
I did various experiments with using the FP register stack as real
registers, and it did not improve performance w.r.t. the
simple-minded strategy you describe above.
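
To make the "simple-minded strategy" concrete, here is a minimal, hypothetical sketch written in Python. It is not OCaml's or SML/NJ's actual code generator. It assumes that every FP virtual register lives in a fixed slot of the memory stack frame. In the sketch, `frame` maps a name to a byte offset from `esp`, and that layout is invented. The x87 register stack is used only while one expression is evaluated. The names `Var`, `BinOp`, `depth`, `emit_expr` and `emit_assign` are made up for illustration. The output is Intel-syntax x87 mnemonics as strings.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

X87_DEPTH = 8  # the x87 register stack has eight entries, ST(0)..ST(7)


@dataclass(frozen=True)
class Var:
    name: str  # an FP virtual register; it always lives in a memory stack slot


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"


Expr = Union[Var, BinOp]

# Memory-operand forms: ST(0) := ST(0) op m64 (Intel syntax).
MEM_OP = {"+": "fadd", "-": "fsub", "*": "fmul", "/": "fdiv"}
# Register forms with pop: ST(1) := ST(1) op ST(0), then pop (Intel syntax).
POP_OP = {"+": "faddp", "-": "fsubp", "*": "fmulp", "/": "fdivp"}


def slot(name: str, frame: Dict[str, int]) -> str:
    """Memory operand for the (hypothetical) stack slot holding `name`."""
    return f"qword ptr [esp+{frame[name]}]"


def depth(e: Expr) -> int:
    """Peak number of x87 stack entries emit_expr uses for `e`."""
    if isinstance(e, Var):
        return 1
    if isinstance(e.right, Var):
        return depth(e.left)  # right operand is read directly from memory
    # the left result stays on the stack while the right subtree is evaluated
    return max(depth(e.left), 1 + depth(e.right))


def emit_expr(e: Expr, frame: Dict[str, int], out: List[str]) -> None:
    """Append code that leaves the value of `e` in ST(0)."""
    if isinstance(e, Var):
        out.append(f"fld {slot(e.name, frame)}")
        return
    emit_expr(e.left, frame, out)
    if isinstance(e.right, Var):
        out.append(f"{MEM_OP[e.op]} {slot(e.right.name, frame)}")
    else:
        emit_expr(e.right, frame, out)
        out.append(f"{POP_OP[e.op]} st(1), st(0)")


def emit_assign(dest: str, e: Expr, frame: Dict[str, int]) -> List[str]:
    """Compile `dest = e`; the x87 stack is empty before and after."""
    if depth(e) > X87_DEPTH:
        raise ValueError("expression too deep for the x87 stack; split it with temporaries")
    out: List[str] = []
    emit_expr(e, frame, out)
    out.append(f"fstp {slot(dest, frame)}")  # store the result and pop it
    return out
```

Take `t = a*b + (c - d)` with `a`, `b`, `c`, `d` and `t` at offsets 0, 8, 16, 24 and 32. For that input, `emit_assign` produces `fld qword ptr [esp+0]`, `fmul qword ptr [esp+8]`, `fld qword ptr [esp+16]`, `fsub qword ptr [esp+24]`, `faddp st(1), st(0)` and `fstp qword ptr [esp+32]`. Every value that survives past one statement goes back to memory. That is cheap only under the assumption stated above: the slot must be in L1 cache.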
But it is true that better FP performance will be obtained by using
the SSE2 extension (announced for the latest Pentiums and for
AMD's x86-64 processor), which at last provides "real"
floating-point registers.
> So there is a huge penalty with FP-intensive loops, compared to using the
> ``right'' framework. How many of these benchmarks are FP-intensive? The
> performance of SML/NJ may have something to do with the RA.

As I said, OCaml uses the same framework as SML/NJ here, and
this doesn't prevent it from outperforming SML/NJ by a good
factor on FP-intensive stuff. So, the explanation of SML/NJ's
performance is to be found elsewhere.
- Xavier Leroy