[MLton] Unicode / WideChar

Henry Cejtin henry.cejtin@sbcglobal.net
Mon, 21 Nov 2005 09:38:40 -0600


Ok, that definitely makes sense (although I think of UTF-8 as being, by
definition, the UTF-8 encoding of Unicode codepoints).
I am sure that you are correct that byte-wise sorting (unsigned bytes,
compared first byte first) of UTF-8 encoded strings gives the same result as
sorting the represented Unicode codepoint streams.  That is a nice
coincidence, but it is the latter that is the definition of string
comparison.  (I.e., that is the definition I want.)
That is what I desperately want String.<= etc. to reflect.
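
To make the claim concrete, here is a small sketch in Python (used only as a
stand-in, not as MLton or Basis Library code).  It relies on Python comparing
str values by code point and bytes values as unsigned bytes; the sample
strings are made up for illustration.

```python
# Illustrative sample: ASCII, a 2-byte, a 3-byte and a 4-byte UTF-8 character,
# plus an empty string and a prefix pair ("a" / "ab").
words = ["z", "\u00e9", "a", "\U0001F600", "\uFFFD", "ab", ""]

# Python compares str values code point by code point.
by_codepoint = sorted(words)

# Python compares bytes values lexicographically as unsigned bytes.
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# For this sample the two orders agree.
same_order = by_codepoint == by_utf8_bytes
```

The point of the sketch is only that the two sorts agree; which of the two is
taken as the definition is a separate question.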

If one instead wants to sort a bunch of bytes (a Word8.Vector.vector), then,
of course, if those bytes are an encoding of (Unicode) characters, then you
have to specify the encoding used to get the comparison to match string
comparison (UTF-8, UTF-16, whatever).
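
A rough illustration of why the encoding matters, again in Python as a
stand-in and with two code points picked for the purpose: U+FFFD sorts before
U+1F600 as a code point and as UTF-8 bytes, but not as UTF-16 (big-endian)
bytes, because U+1F600 becomes a surrogate pair starting with byte 0xD8.

```python
a = "\uFFFD"       # a BMP code point just below the supplementary planes
b = "\U0001F600"   # a supplementary-plane code point

# Code point comparison: U+FFFD < U+1F600.
codepoint_lt = a < b

# UTF-8 bytes: EF BF BD versus F0 9F 98 80, so the order is preserved.
utf8_lt = a.encode("utf-8") < b.encode("utf-8")

# UTF-16BE bytes: FF FD versus D8 3D DE 00, so the order is reversed.
utf16_lt = a.encode("utf-16-be") < b.encode("utf-16-be")
```

So a raw byte comparison matches the codepoint order only if one knows the
bytes are UTF-8.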

As you say, none of this connects to what order the locale would put things
in.  I don't (usually) care about this and don't want things like String.<=
to care either.