[MLton] Unicode / WideChar

Mon, 21 Nov 2005 16:35:21 +0100

* Henry Cejtin:

> I am confused: using UTF-8 or UTF-16, or whatever encoding for Unicode
> characters shouldn't affect the ordering of strings or characters at all.
> I.e., whatever encoding is used, characters (not bytes) are compared
> by comparing their Unicode value, right?

Most of the systems that use UTF-16 today were deployed during the
"64K is enough for everyone" period.  Back then, surrogates were not
really considered, and the comparison function was defined
accordingly, in terms of 16-bit units:

<http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#compareTo(java.lang.String)>

(charAt returns a single 16-bit code unit, not a full code point.)

Later on, this mistake couldn't be corrected without breaking
backwards compatibility, so this wart is still there.
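
To make the difference concrete, here is a minimal sketch in Java.  The
class name Utf16Order and the helper compareByCodePoint are made up for
illustration; they are not part of any library.  The helper relies on
String.codePointAt and Character.charCount, which only exist from
Java 5 on, so it would not compile against the 1.4.2 API linked above.
It compares U+FF5E (a single 16-bit unit) with U+10000 (the surrogate
pair D800 DC00):

```java
public class Utf16Order {
    // Hypothetical helper: orders strings by Unicode code point instead of
    // by 16-bit code unit. Assumes both strings are well-formed UTF-16.
    static int compareByCodePoint(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            // Advance by one or two code units, depending on the code point.
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // Equal up to the end of the shorter string: the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";          // U+FF5E, one code unit
        String supp = "\uD800\uDC00";   // U+10000, encoded as a surrogate pair

        // compareTo looks at code units: 0xFF5E > 0xD800, so bmp sorts AFTER supp.
        System.out.println(bmp.compareTo(supp) > 0);

        // By code point: 0xFF5E < 0x10000, so bmp sorts BEFORE supp.
        System.out.println(compareByCodePoint(bmp, supp) < 0);
    }
}
```

Both lines should print true, i.e. the two orderings disagree as soon
as one string contains a character outside the BMP and the other a BMP
character at or above U+E000.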