[MLton] Patch ready + change list for basis library
Wesley W. Terpstra
wesley at terpstra.ca
Fri Feb 9 16:48:57 PST 2007
Attached is a patch implementing Wide* for MLton. I've not yet
written WideTextIO / WidePrimIO.
I made/assumed these changes to the basis definition:
1. CHAR.ord says "returns the (non-negative) integer code point of
the character c in Unicode."
2. CHAR now says "The Char structure provides characters taken from
the ISO-8859-1 repertoire and locale-independent operations on them"
3. In CHAR delete the sentence "In WideChar, the functions toLower,
toLower, isAlpha,..., isUpper and, in general, the definition of a
``letter'' are locale-dependent."
4. Overview: added WideTextIO :> TEXT_IO -- it was missing despite
the TEXT_IO signature requiring it
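As a rough sketch of what changes 1 and 2 mean in practice, assuming a compiler that provides WideChar (for example MLton with this patch applied): ord on a wide character returns its Unicode code point, while Char remains limited to the 256 ISO-8859-1 characters. The code point 0x5322 is simply the first one used in the test program further down.

```sml
(* Assumption: WideChar is available and covers at least the BMP. *)
val c : WideChar.char = WideChar.chr 0x5322

(* Change 1: ord is the Unicode code point, so it round-trips with chr. *)
val cp : int = WideChar.ord c

(* Change 2: Char stays ISO-8859-1, so its largest code is 255. *)
val narrowMax : int = Char.maxOrd
```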
I think WideTextIO is a bit pointless. I can't imagine that someone
really wants to write 4-byte characters out in host-specific endian
order. Whatever. I'll implement it after this patch is committed for
In preparing this patch I've discovered two other bugs, both
demonstrated by this program:
> val y : WideString.string = WideString.str (WideChar.chr 88)
> val x : WideString.string = WideString.^ ("\u5322\u1243", y)
> val s = WideString.toString x
> val () = print (s ^ "\n")
> val bug : WideChar.char vector = y
> val bug : Char.char vector = "asfasf"
The first bug is that on a powerpc, the output reads:
... so the \u and \U parsing in MLton is endian backwards. I'm not
sure where this code lives.
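To illustrate what "endian backwards" could look like, here is a minimal sketch that swaps the two bytes of a 16-bit value. It is only an illustration: swap16 is a hypothetical helper, and whether the actual fault is a 16-bit or a 32-bit swap is an assumption, since it depends on how MLton stores the escape internally.

```sml
(* swap16 is a hypothetical helper: exchange the two bytes of a 16-bit value. *)
fun swap16 (w : word) : word =
  Word.orb (Word.<< (Word.andb (w, 0wxFF), 0w8),
            Word.andb (Word.>> (w, 0w8), 0wxFF))

(* The two escapes from the test program, byte-swapped. *)
val a = swap16 0wx5322
val b = swap16 0wx1243
```

Under that assumption, \u5322 would come out as code point 0x2253 and \u1243 as 0x4312.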
The other bug is that MLton is leaking polymorphism of string types.
This bug is not specific to my changes; svn MLton does this too.
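For comparison, a minimal sketch of the explicit conversions that portable code would need if string and char vector were kept distinct, which is how the basis definition reads (CharVector is sealed opaquely, with vector = String.string). The names toVec and fromVec are made up for illustration and are not basis functions.

```sml
(* Hypothetical helpers; neither is part of the basis library. *)
fun toVec (s : string) : char vector =
  Vector.tabulate (size s, fn i => String.sub (s, i))

fun fromVec (v : char vector) : string =
  CharVector.tabulate (Vector.length v, fn i => Vector.sub (v, i))

(* Explicit round trip instead of relying on string = char vector. *)
val v = toVec "asfasf"
val s = fromVec v
```

With the types kept apart, the two "bug" bindings in the program above would be rejected by the type checker, and code would have to go through conversions like these.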
I'm still running the regressions, no problems so far, but my ppc is
Any improvements to the patch?
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 51335 bytes
Desc: not available
Url : http://mlton.org/pipermail/mlton/attachments/20070210/8659a8a1/mlton-unicode-v1-0001.obj