[MLton] Unicode / WideChar

Florian Weimer fw@deneb.enyo.de
Mon, 21 Nov 2005 01:03:16 +0100


* Henry Cejtin:

> Yes, I really DO mean collation order.  Like I said, it makes sense (although
> it isn't very important to me) that locales modify the output of sorted
> lists which are intended for human consumption.  On the other hand, I OFTEN
> want things sorted so I can do unions or intersections.  For this, the
> important thing is agreement on the order.  Currently, if I get output
> from someone else and want to operate on some data I have, I have to know
> what their locale is and set mine to the same.  It is that extra dependency
> that is a horrible pain.
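If both sides sort with the same locale-independent comparison, union and intersection become a single merge pass with no locale to agree on. A minimal sketch, assuming Python, whose built-in `str` comparison orders by code point regardless of locale; `sorted_union` and `sorted_intersection` are hypothetical names, and both inputs are assumed to be sorted and duplicate-free:

```python
def sorted_intersection(a, b):
    """Intersect two duplicate-free lists sorted in code point order."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:          # present in both: keep it once
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:         # str '<' compares code points, not locale
            i += 1
        else:
            j += 1
    return out

def sorted_union(a, b):
    """Merge two duplicate-free lists sorted in code point order."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])             # at most one of these is non-empty
    out.extend(b[j:])
    return out

mine = sorted(["apple", "Zebra", "\u00e9clair"])
theirs = sorted(["Zebra", "\u00e9clair", "banana"])
print(sorted_intersection(mine, theirs))  # ['Zebra', 'éclair']
print(sorted_union(mine, theirs))         # ['Zebra', 'apple', 'banana', 'éclair']
```

Code point order is not what a human reader expects ("Zebra" before "apple"), but it is the same on every machine, which is all a merge needs.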

I share your pain.  But it gets worse.

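On systems which are predominantly UTF-16 (Windows, Java, Mono),
lexicographic order over UTF-16 code units does not even match the
lexicographic order induced by the Unicode code points.  Characters
above U+FFFF are encoded as surrogate pairs (0xD800 to 0xDFFF), so
they sort before BMP characters in the range U+E000 to U+FFFF.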
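To illustrate the surrogate problem, here is a small Python sketch that compares the same two characters by code point and by UTF-16 code units. The helper `utf16_units` is hypothetical; comparing its output lists mimics what a naive comparison of UTF-16 char arrays does (Java's `String.compareTo`, for instance, compares `char` values):

```python
def utf16_units(s):
    """Return the UTF-16 code units of s (big-endian decoding, no BOM)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[k:k + 2], "big") for k in range(0, len(data), 2)]

a = "\uFF21"       # FULLWIDTH LATIN CAPITAL LETTER A, U+FF21 (one code unit)
b = "\U0001F600"   # GRINNING FACE, U+1F600 (needs a surrogate pair)

print([hex(u) for u in utf16_units(a)])  # ['0xff21']
print([hex(u) for u in utf16_units(b)])  # ['0xd83d', '0xde00']
print(a < b)                             # True: code point order
print(utf16_units(a) < utf16_units(b))   # False: UTF-16 code unit order
```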

UTF-8 (which is the de facto standard on UNIX) sorts by code points:
byte-wise comparison of UTF-8 strings yields code point order.  So
I don't have to deal with that, but it's still an issue to think about.
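The UTF-8 property can be checked the same way: sorting by UTF-8 bytes agrees with sorting by code point, while sorting by UTF-16 code units does not. A minimal Python sketch; the sample strings are arbitrary, and big-endian UTF-16 is used because its bytes compare in the same order as its code units:

```python
samples = ["a", "\u00e9", "\uFF21", "\U0001F600", "\u0100", "z"]

by_code_point = sorted(samples)                                # str order = code point order
by_utf8_bytes = sorted(samples, key=lambda s: s.encode("utf-8"))
by_utf16_units = sorted(samples, key=lambda s: s.encode("utf-16-be"))

print(by_code_point == by_utf8_bytes)   # True
print(by_code_point == by_utf16_units)  # False: U+1F600 moves ahead of U+FF21
```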