[MLton] Patch ready
Wesley W. Terpstra
wesley at terpstra.ca
Sat Feb 10 20:24:40 PST 2007
On Feb 10, 2007, at 1:48 AM, Wesley W. Terpstra wrote:
> Attached is a patch implementing Wide* for MLton. I've not yet
> written WideTextIO / WidePrimIO.
Another patch, this time without any regressions compared to svn/HEAD.
May I commit it to HEAD?
The new regression/widechar.sml reveals another bug in the compiler.
There's a type-check failure in the SSA pass:
> Type error: Ssa.TypeCheck.coerce
> {from = word32 vector, to = word8 vector}
> Type error: analyze raised exception unhandled exception: TypeError
>
> unhandled exception: TypeError
I have no idea where to begin to debug this.
> The first bug is that on a powerpc, the output reads:
>> \U22530000\U43120000X
> ... so the \u and \U parsing in MLton is endian backwards. I'm not
> sure where this code lives.
I've "fixed" this bug, I think. It worked fine on Intel machines; only
on PowerPC was the byte order wrong. The problem applied only to
strings; characters are fine. For example:
> val x : WideString.string = "X\103\290\u0067\u4312\U00000067\U00004312\U00104312"
> val y : WideChar.char = #"\U00000067"
> val z : WideString.string = WideString.str y
>
> val xs = WideString.toString x
> val ys = WideChar.toString y
> val zs = WideString.toString z
> val () = print (xs ^ " - " ^ ys ^ " - " ^ zs ^ "\n")
on PowerPC, this outputs:
\U58000000\U67000000\U22010000\U67000000\U12430000\U67000000\U12430000\U12431000 - g - g
on Intel, this outputs:
Xg\u0122g\u4312g\u4312\U00104312 - g - g
on Intel with the C codegen, this outputs:
Xg\u0122g\u4312g\u4312\U00104312 - g - g
Looking in the test.0.c file on PowerPC reveals:
Vector ("X\000\000\000g\000\000\000\"\001\000\000g\000\000\000\022C\000\000g\000\000\000\022C\000\000\022C\020\000", 4, 215, 8)
... which is clearly little-endian.
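To make the mismatch concrete, here is a small standalone sketch (Basis
Library only; the helper names fromLE and fromBE are made up for
illustration and are not part of MLton) that reads the first four bytes
of that C string as one 32-bit word in each byte order:

```sml
(* Hypothetical helpers, not MLton code: interpret a list of bytes as
   a single word, least significant byte first (LE) or most
   significant byte first (BE). *)
fun fromLE (bytes : int list) : IntInf.int =
   List.foldr (fn (b, acc) => acc * 256 + IntInf.fromInt b) 0 bytes

fun fromBE (bytes : int list) : IntInf.int =
   List.foldl (fn (b, acc) => acc * 256 + IntInf.fromInt b) 0 bytes

(* The first element emitted in test.0.c is "X\000\000\000". *)
val first = [0x58, 0, 0, 0]

val asLE = fromLE first   (* 0x58, i.e. #"X" *)
val asBE = fromBE first   (* 0x58000000 *)
```

Read big-endian, the same bytes give 0x58000000, which matches the
\U58000000 at the start of the PowerPC output above.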
I'm guessing from looking at c-codegen.fun that the problem is
WordXVector.toString not respecting the target's endianness. That led
me to
> (Vector.fold (elements, [], fn (w, ac) =>
>    let
>       fun loop (i, w, ac) =
>          if i = 0
>             then ac
>          else
>             let
>                val (q, r) = IntInf.quotRem (w, 0x100)
>             in
>                loop (i - 8, q, Char.fromInt (IntInf.toInt r) :: ac)
>             end
>    in
>       loop (n, WordX.toIntInf w, ac)
>    end)))
in atoms/word-x-vector.fun, which looks to be brainlessly
little-endian. I've changed it in the patch, but would really
appreciate it if someone looked this over! I don't know how widely
used this function is, and perhaps I'm now double-converting the
endianness of other vectors.
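For illustration only, a standalone sketch of the general idea: split
each word into bytes, then emit them in the order the target expects.
This is not the code from the patch and does not use MLton's internal
WordX / WordXVector modules; the endian datatype and the names bytesLE
and toBytes are hypothetical.

```sml
(* Illustrative only: Basis Library code with made-up names. *)
datatype endian = Big | Little

(* Split the low `bits` bits of w into bytes, least significant
   byte first. Assumes bits is a multiple of 8 and w >= 0. *)
fun bytesLE (bits : int, w : IntInf.int) : int list =
   if bits <= 0
      then []
   else
      let
         val (q, r) = IntInf.quotRem (w, 256)
      in
         IntInf.toInt r :: bytesLE (bits - 8, q)
      end

(* Emit the bytes in the order the chosen target expects. *)
fun toBytes (e : endian) (bits : int, w : IntInf.int) : int list =
   case e of
      Little => bytesLE (bits, w)
    | Big => List.rev (bytesLE (bits, w))

val le = toBytes Little (32, 0x4312)   (* [0x12, 0x43, 0, 0] *)
val be = toBytes Big (32, 0x4312)      (* [0, 0, 0x43, 0x12] *)
```

Selecting the order in exactly one place is one way to avoid the
double-conversion worry: every caller passes the byte order explicitly
and nobody swaps bytes a second time.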
> The other bug is that MLton is leaking polymorphism of string types.
> This bug is not specific to my changes; svn MLton does this too.
I'm using svn/HEAD now, and this still applies.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlton-unicode-v2.patch
Type: application/octet-stream
Size: 67541 bytes
Desc: not available
Url : http://mlton.org/pipermail/mlton/attachments/20070211/b1a8ed4e/mlton-unicode-v2-0001.obj