[MLton] WideChar
Wesley W. Terpstra
terpstra@gkec.tu-darmstadt.de
Sat, 11 Dec 2004 02:05:23 +0100
On Fri, Dec 10, 2004 at 07:40:28PM -0500, Adam Goode wrote:
> Case conversion is not too bad (in the simple case), and the
> UnicodeData.txt file gives these "simple" case mappings. Most characters
> don't have a notion of a case mapping, including the characters with
> ord x < 128.
>
> Right now, toUpper and toLower return the character unchanged if it
> doesn't have a corresponding mapping. Shouldn't this just be the
> behavior for the WideChar functions?
I suppose that makes sense.
So, you would leave ß as ß when converting toUpper in WideChar?
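A minimal sketch of the "return the character unchanged when there is no simple mapping" rule. It works on plain `int` code points rather than any real `WideChar` type, and the tiny table is hypothetical. It covers only ASCII and the Latin-1 letters U+00E0..U+00FE, where a real table would be generated from the simple-uppercase field of UnicodeData.txt:

```sml
(* Hypothetical sketch: simple (1:1) upper-case mapping on int code points.
   Only two ranges are covered here; anything outside the table is
   returned unchanged, which is the fallback proposed above. *)
fun simpleToUpper (c : int) : int =
  if c >= 0x61 andalso c <= 0x7A then c - 0x20     (* a..z -> A..Z *)
  else if c >= 0xE0 andalso c <= 0xFE andalso c <> 0xF7
    then c - 0x20                                   (* U+00E0..U+00FE -> U+00C0..U+00DE, skipping U+00F7 *)
  else c                                            (* not in the table: unchanged *)

(* U+00DF (sharp s) has no simple uppercase mapping in UnicodeData.txt,
   so it comes back as-is; the full mapping to "SS" is in SpecialCasing.txt. *)
val sharpS = simpleToUpper 0xDF                     (* 0xDF *)
```

This is why `toUpper` on ß can only return ß: a `char -> char` function has no way to produce the two-character "SS".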
> In general, you don't worry that much about locales when you are working
> at the level of individual characters. But there are exceptions. For
> example, the letter I. In Turkish, U+0069 LATIN SMALL LETTER I (i) maps
> to U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE (İ), not to LATIN
> CAPITAL LETTER I (I). And U+0049 LATIN CAPITAL LETTER I (I) maps to
> U+0131 LATIN SMALL LETTER DOTLESS I (ı). Yikes.
That is nasty.
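A minimal sketch of how the Turkish i/I rule could be layered over a default simple mapping, again on plain `int` code points. The function names are hypothetical and the default mapping is cut down to ASCII plus the two Turkish letters, whose simple mappings UnicodeData.txt does define (U+0130 lowercases to U+0069, U+0131 uppercases to U+0049):

```sml
(* Hypothetical sketch: default simple mappings (ASCII only, plus the
   two Turkish letters that have Unicode simple mappings). *)
fun defaultUpper c =
  if c >= 0x61 andalso c <= 0x7A then c - 0x20
  else if c = 0x131 then 0x49        (* dotless i -> I *)
  else c

fun defaultLower c =
  if c >= 0x41 andalso c <= 0x5A then c + 0x20
  else if c = 0x130 then 0x69        (* I with dot above -> i *)
  else c

(* Turkish tailoring: i pairs with U+0130 and I pairs with U+0131,
   instead of i pairing with I. Everything else uses the defaults. *)
fun turkishUpper 0x69 = 0x130        (* i -> I with dot above *)
  | turkishUpper c = defaultUpper c

fun turkishLower 0x49 = 0x131        (* I -> dotless i *)
  | turkishLower c = defaultLower c
```

Only two clauses differ from the default, which suggests the locale layer can stay small for 1:1 tailorings like this one.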
So, why not put these things in the locale and provide the Unicode-defined
classes and conversions in WideChar.is*, WideChar.toUpper and WideChar.toLower?
Then you get Unicode behaviour in WideChar and Char, but if you want
locale-correct conversions for German or Turkish, you use:
signature LOCALE =
sig
...
val toLower: t -> char -> string
val isUpper: t -> char -> bool
end
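A minimal sketch of a structure in the spirit of this signature. It assumes code points are `int` and a "wide string" is an `int list`, standing in for whatever wide types MLton ends up with; `LOCALE_SKETCH`, `root` and `turkish` are hypothetical names, and only the i/I cases are handled. It shows why `toLower` returns a string: outside Turkish, SpecialCasing.txt lowercases U+0130 to two code points, i followed by U+0307 COMBINING DOT ABOVE:

```sml
(* Hypothetical sketch: ints for code points, int lists for wide strings. *)
signature LOCALE_SKETCH =
sig
  type t
  val root    : t
  val turkish : t
  val toLower : t -> int -> int list   (* may expand to several code points *)
  val isUpper : t -> int -> bool
end

structure LocaleSketch :> LOCALE_SKETCH =
struct
  datatype t = Root | Turkish
  val root = Root
  val turkish = Turkish

  (* Case classes happen to agree in both locales for these letters. *)
  fun isUpper _ c = (c >= 0x41 andalso c <= 0x5A) orelse c = 0x130

  (* Root follows SpecialCasing.txt: U+0130 -> U+0069 U+0307. *)
  fun toLower Root 0x130 = [0x69, 0x307]
    | toLower Turkish 0x130 = [0x69]     (* I with dot above -> i *)
    | toLower Turkish 0x49 = [0x131]     (* I -> dotless i *)
    | toLower _ c =
        if c >= 0x41 andalso c <= 0x5A then [c + 0x20] else [c]
end
```

Keeping the locale as an abstract `t` means new tailorings can be added as constructors without changing the signature.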
More suggestions?
--
Wesley W. Terpstra <wesley@terpstra.ca>