[MLton] Unicode / WideChar
Florian Weimer
fw@deneb.enyo.de
Mon, 21 Nov 2005 09:19:05 +0100
> In particular UTF-8 will sort correctly with any 8 bit clean
> bytewise sort, as will UCS-2be and UCS-4be. Little endian
> representations needs a wordsize aware sort.
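The trouble is that UCS-2 is virtually extinct by now. UTF-16 is the
replacement, and sorting that representation lexicographically
(after byte-swapping where needed) does not result in code point
order: the surrogate code units 0xD800-0xDFFF, which encode code
points above U+FFFF, compare lower than the BMP code units
0xE000-0xFFFF.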
This means that your claim
> Only 3 sort algorithms are required to handle all cases:
is quite wrong.
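
To make the ordering problem concrete, here is a minimal Python sketch.
The two sample characters (U+FF5E and U+10000) and the helper names are
my own choices for illustration, not anything taken from the thread. It
sorts the same pair by code point, by UTF-16BE bytes, and by UTF-8 bytes.

```python
# Illustrative sample: one BMP code point above the surrogate range,
# and the first supplementary code point.
bmp = "\uff5e"        # U+FF5E,  UTF-16BE bytes: FF 5E
supp = "\U00010000"   # U+10000, UTF-16BE bytes: D8 00 DC 00 (surrogate pair)


def utf16be_key(s: str) -> bytes:
    """Sort key: the big-endian UTF-16 encoding, compared bytewise."""
    return s.encode("utf-16-be")


def utf8_key(s: str) -> bytes:
    """Sort key: the UTF-8 encoding, compared bytewise."""
    return s.encode("utf-8")


samples = [supp, bmp]

# Python compares str values by code point, so this is the reference order.
by_codepoint = sorted(samples)

# Bytewise order of the UTF-16BE encoding: 0xD8 < 0xFF, so the
# supplementary character sorts first, unlike in code point order.
by_utf16 = sorted(samples, key=utf16be_key)

# Bytewise order of the UTF-8 encoding: 0xEF < 0xF0, matching code point order.
by_utf8 = sorted(samples, key=utf8_key)

if __name__ == "__main__":
    print("UTF-16BE matches code point order:", by_utf16 == by_codepoint)
    print("UTF-8 matches code point order:", by_utf8 == by_codepoint)
```

The UTF-8 sort agrees with code point order for this pair, while the
UTF-16BE sort does not, so a plain bytewise or word-wise sort does not
cover UTF-16.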