[MLton] Unicode / WideChar

Florian Weimer fw@deneb.enyo.de
Mon, 21 Nov 2005 09:19:05 +0100


> In particular UTF-8 will sort correctly with any 8 bit clean
> bytewise sort, as will UCS-2be and UCS-4be. Little endian
> representations need a word-size-aware sort.

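The trouble is that UCS-2 is virtually extinct by now.  UTF-16 is the
replacement, and sorting that representation lexicographically by
code unit (potentially after byte-swapping) does not yield code point
order: the surrogates (0xD800-0xDFFF) that encode code points above
U+FFFF compare lower than the BMP code units 0xE000-0xFFFF.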
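
To make the mismatch concrete, here is a small Python sketch (not
MLton code; the helper name utf16_units and the two sample characters
are my own picks for illustration).  It sorts the same two strings by
code point, by UTF-16 code units, and by UTF-8 bytes:

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


a = "\uFF5E"      # U+FF5E: a BMP character above the surrogate range
b = "\U00010000"  # U+10000: first code point that needs a surrogate pair

# Python compares str values by code point.
by_codepoint = sorted([b, a])

# Compare as sequences of 16-bit code units, as a word-wise UTF-16 sort would.
by_utf16 = sorted([b, a], key=utf16_units)

# Compare as raw UTF-8 bytes, as an 8-bit-clean bytewise sort would.
by_utf8 = sorted([b, a], key=lambda s: s.encode("utf-8"))

print([hex(u) for u in utf16_units(b)])  # the surrogate pair for U+10000
print(by_codepoint == by_utf8)           # UTF-8 agrees with code point order
print(by_codepoint == by_utf16)          # UTF-16 does not
```

U+10000 is encoded as the pair 0xD800 0xDC00, and 0xD800 is smaller
than 0xFF5E, so the UTF-16 sort puts U+10000 first even though it is
the larger code point.  The UTF-8 byte order agrees with code point
order for this pair.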

This means that your claim

> Only 3 sort algorithms are required to handle all cases:

is quite wrong.