[MLton] Unicode / WideChar
Florian Weimer
fw@deneb.enyo.de
Mon, 21 Nov 2005 16:35:21 +0100
* Henry Cejtin:
> I am confused: using UTF-8 or UTF-16, or whatever encoding for Unicode
> characters shouldn't affect the ordering of strings or characters at all.
> I.e., whatever encoding is used, characters (not bytes) are compared
> by comparing their Unicode value, right?
Most of the systems which use UTF-16 today were deployed during the
"64K is enough for everyone" period. Back then, surrogates were not
really considered, and the comparison function was defined in terms
of 16-bit code units rather than code points:
<http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#compareTo(java.lang.String)>
(charAt returns a single 16-bit quantity.)
Later on, this mistake couldn't be corrected without breaking
backwards compatibility, so this wart is still there.
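
To make the difference concrete, here is a minimal sketch in Java. The
class name Utf16Order and the helper compareByCodePoint are my own
illustrative names, not part of any library, and the code assumes Java 7
or later (codePointAt and Character.charCount do not exist in 1.4.2).
It compares U+FF5E, which fits in a single 16-bit unit, against U+10000,
which is stored as the surrogate pair D800 DC00.

```java
public class Utf16Order {
    // Hypothetical helper: compare two strings by Unicode code point
    // instead of by 16-bit code unit, which is what compareTo does.
    static int compareByCodePoint(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            // Advance by one or two code units, depending on the code point.
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other; the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";          // U+FF5E, a single code unit
        String astral = "\uD800\uDC00"; // U+10000, a surrogate pair

        // Code unit order: 0xFF5E > 0xD800, so U+FF5E sorts after U+10000.
        System.out.println(bmp.compareTo(astral) > 0);

        // Code point order: 0xFF5E < 0x10000, so U+FF5E sorts first.
        System.out.println(compareByCodePoint(bmp, astral) < 0);
    }
}
```

Only code points in the range U+E000..U+FFFF are affected: in UTF-16
code unit order they sort after every character outside the BMP, because
the surrogate range D800..DFFF lies below them.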