[MLton] Unicode / WideChar
Florian Weimer
fw@deneb.enyo.de
Mon, 21 Nov 2005 01:03:16 +0100
* Henry Cejtin:
> Yes, I really DO mean collation order. Like I said, it makes sense (although
> it isn't very important to me) that locales modify the output of sorted
> lists which are intended for human consumption. On the other hand, I OFTEN
> want things sorted so I can do unions or intersections. For this, the
> important thing is agreement on the order. Currently, if I get output
> from someone else and want to operate on some data I have, I have to know
> what their locale is and set mine to the same. It is that extra dependency
> that is a horrible pain.
I share your pain. But it gets worse.
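On systems which are predominantly UTF-16 (Windows, Java, Mono),
the lexicographic order of strings compared by 16-bit code unit does
not even match the lexicographic order induced by the Unicode
codepoints. Characters above U+FFFF are encoded as surrogate pairs
(code units 0xD800-0xDFFF), which sort before the code units for
U+E000-U+FFFF.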
UTF-8 (which is the de facto standard on UNIX) preserves codepoint
order under bytewise comparison, so I don't have to deal with that,
but it's still an issue to think about.
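
To make the mismatch concrete, here is a minimal sketch in Python (not
MLton code; the helper names utf16_key, utf8_key and codepoint_key are
made up for illustration). It sorts the same two one-character strings
three ways: by codepoint, by UTF-8 bytes, and by UTF-16 code units.

```python
def codepoint_key(s):
    """Sort key: the sequence of Unicode codepoints."""
    return [ord(c) for c in s]


def utf8_key(s):
    """Sort key: the raw UTF-8 bytes (compared bytewise)."""
    return s.encode("utf-8")


def utf16_key(s):
    """Sort key: the sequence of 16-bit UTF-16 code units."""
    raw = s.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


# U+FF5E fits in a single UTF-16 code unit (0xFF5E).
a = "\uff5e"
# U+10000 needs a surrogate pair in UTF-16 (0xD800 0xDC00).
b = "\U00010000"

by_codepoint = sorted([b, a], key=codepoint_key)
by_utf8 = sorted([b, a], key=utf8_key)
by_utf16 = sorted([b, a], key=utf16_key)

print(by_codepoint == [a, b])  # codepoint order: U+FF5E before U+10000
print(by_utf8 == by_codepoint)  # UTF-8 byte order agrees with codepoints
print(by_utf16 == [b, a])       # UTF-16 code-unit order is reversed
```

All three comparisons hold, so two programs that each sort "the
obvious way" on their native string representation can disagree on
the order even with no locale involved.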