[MLton] Patch ready

Wesley W. Terpstra wesley at terpstra.ca
Sat Feb 10 20:24:40 PST 2007

On Feb 10, 2007, at 1:48 AM, Wesley W. Terpstra wrote:
> Attached is a patch implementing Wide* for MLton. I've not yet  
> written WideTextIO / WidePrimIO.

Another patch, this time without any regressions compared to svn/HEAD.
May I commit it to HEAD?

The new regression/widechar.sml reveals another bug in the compiler.  
There's a type-check failure in the SSA pass:
> Type error: Ssa.TypeCheck.coerce
> {from = word32 vector, to = word8 vector}
> Type error: analyze raised exception unhandled exception: TypeError
> unhandled exception: TypeError
I have no idea where to begin to debug this.

> The first bug is that on a powerpc, the output reads:
>> \U22530000\U43120000X
> ... so the \u and \U parsing in MLton is endian backwards. I'm not  
> sure where this code lives.

I've "fixed" this bug, I think. It worked fine on Intel machines; only
on PowerPC is the endianness wrong. The problem only applies to
STRINGS---characters are fine. E.g.:
> val x : WideString.string = "X\103\290\u0067\u4312\U00000067\U00004312\U00104312"
> val y : WideChar.char = #"\U00000067"
> val z : WideString.string = WideString.str y
> val xs = WideString.toString x
> val ys = WideChar.toString y
> val zs = WideString.toString z
> val () = print (xs ^ " - " ^ ys ^ " - " ^ zs ^ "\n")
on PowerPC, this outputs:
\U12431000 - g - g
on Intel, this outputs:
Xg\u0122g\u4312g\u4312\U00104312 - g - g
on Intel with the C codegen, this outputs:
Xg\u0122g\u4312g\u4312\U00104312 - g - g

Looking in the test.0.c file on PowerPC reveals:
Vector ("X\000\000\000g\000\000\000\"\001\000\000g\000\000\000\022C\000\000g\000\000\000\022C\000\000\022C\020\000", 4, 215, 8)
... which is clearly little-endian.
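
As a sanity check on that reading, here is a small sketch that decodes
the bytes of the literal both ways. The names (bytes, le32, be32,
group4) are made up for illustration and are not part of MLton; the
byte values are transcribed by hand from the literal above.

```sml
(* Bytes of the Vector literal above, in order. The C escapes are octal:
   \022 = 0x12, \020 = 0x10, \001 = 0x01. The printable characters are
   'X' = 0x58, 'g' = 0x67, '"' = 0x22, 'C' = 0x43. *)
val bytes : IntInf.int list =
   [0x58, 0, 0, 0,   0x67, 0, 0, 0,   0x22, 0x01, 0, 0,   0x67, 0, 0, 0,
    0x12, 0x43, 0, 0,   0x67, 0, 0, 0,   0x12, 0x43, 0, 0,
    0x12, 0x43, 0x10, 0]

(* Combine four bytes into one 32-bit value, first byte least significant. *)
fun le32 (b0, b1, b2, b3) : IntInf.int =
   b0 + 0x100 * (b1 + 0x100 * (b2 + 0x100 * b3))

(* Same, but with the first byte most significant. *)
fun be32 (b0, b1, b2, b3) : IntInf.int = le32 (b3, b2, b1, b0)

(* Split the byte list into consecutive groups of four. *)
fun group4 (b0 :: b1 :: b2 :: b3 :: rest) = (b0, b1, b2, b3) :: group4 rest
  | group4 [] = []
  | group4 _ = raise Fail "length is not a multiple of 4"

val hex = IntInf.fmt StringCvt.HEX

(* ["58", "67", "122", "67", "4312", "67", "4312", "104312"] *)
val asLittle = map (hex o le32) (group4 bytes)

(* The last element is "12431000". *)
val asBig = map (hex o be32) (group4 bytes)
```

Read little-endian, the eight words are exactly the code points of the
test string. Read big-endian, the last word comes out as 0x12431000,
the same digits as the \U12431000 in the PowerPC output, which is at
least consistent with little-endian data being read back big-endian.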

I'm guessing from looking at c-codegen.fun that the problem is
WordXVector.toString not respecting the endianness. That led me to
>        (Vector.fold (elements, [], fn (w, ac) =>
>                      let
>                         fun loop (i, w, ac) =
>                            if i = 0
>                               then ac
>                            else
>                               let
>                                  val (q, r) = IntInf.quotRem (w, 0x100)
>                               in
>                                  loop (i - 8, q,
>                                        Char.fromInt (IntInf.toInt r) :: ac)
>                               end
>                      in
>                         loop (n, WordX.toIntInf w, ac)
>                      end)))
in atoms/word-x-vector.fun, which looks to be unconditionally
little-endian. I've changed it in the patch, but would really
appreciate it if someone looked this over! I don't know how widely used
this function is, and perhaps I'm now swapping the byte order twice for
other vectors.
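
To make the intent concrete, here is a standalone sketch of an
endian-aware version of that byte splitting. It is not the code in the
patch: bytesLE, wordBytes, vectorToString and the {bigEndian, bits}
record are hypothetical names, and it works on a plain list of
IntInf.int rather than on MLton's WordXVector.

```sml
(* Split a non-negative integer into (bits div 8) bytes, least
   significant byte first. bits is assumed to be a multiple of 8. *)
fun bytesLE (bits : int, w : IntInf.int) : char list =
   if bits <= 0
      then []
   else
      let
         val (q, r) = IntInf.quotRem (w, 0x100)
      in
         Char.chr (IntInf.toInt r) :: bytesLE (bits - 8, q)
      end

(* Emit the bytes of one word in the requested byte order. *)
fun wordBytes {bigEndian : bool, bits : int} (w : IntInf.int) : char list =
   if bigEndian
      then rev (bytesLE (bits, w))
   else bytesLE (bits, w)

(* Serialize every word of the vector and concatenate the results. *)
fun vectorToString opts (ws : IntInf.int list) : string =
   String.implode (List.concat (map (wordBytes opts) ws))

(* 0x4312 as a 32-bit word: bytes 12 43 00 00 versus 00 00 43 12. *)
val little = vectorToString {bigEndian = false, bits = 32} [0x4312]
val big = vectorToString {bigEndian = true, bits = 32} [0x4312]
```

The byte order is an explicit argument here so that every caller has to
choose one, which is one way to avoid converting a vector twice by
accident.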

> The other bug is that MLton is leaking polymorphism of string types.
> This bug is not specific to my changes; svn MLton does this too.

I'm using svn/HEAD now, and this still applies.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlton-unicode-v2.patch
Type: application/octet-stream
Size: 67541 bytes
Desc: not available
Url : http://mlton.org/pipermail/mlton/attachments/20070211/b1a8ed4e/mlton-unicode-v2-0001.obj