[MLton] WideChar

Wesley W. Terpstra terpstra@gkec.tu-darmstadt.de
Sat, 11 Dec 2004 02:05:23 +0100


On Fri, Dec 10, 2004 at 07:40:28PM -0500, Adam Goode wrote:
> Case conversion is not too bad (in the simple case), and the
> UnicodeData.txt file gives these "simple" case mappings. Most characters
> don't have a notion of a case mapping, including the characters with 
> ord x < 128. 
> 
> Right now, toUpper and toLower return the character unchanged if it
> doesn't have a corresponding mapping. Shouldn't this just be the
> behavior for the WideChar functions?

I suppose that makes sense.
So, you would leave ß as ß when converting toUpper in WideChar?

> In general, you don't worry that much about locales when you are working
> at the level of individual characters. But there are exceptions. For
> example, the letter I. In Turkish, U+0069 LATIN SMALL LETTER I (i) maps
> to U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE (İ), not to LATIN
> CAPITAL LETTER I (I). And U+0049 LATIN CAPITAL LETTER I (I) maps to
> U+0131 LATIN SMALL LETTER DOTLESS I (ı). Yikes.

That is nasty.
So, why not put these things in the locale and provide the Unicode-defined
classes and conversions in WideChar.is* and WideChar.toUpper/toLower?

So, you get Unicode behaviour in WideChar+Char, but if you want
locale-correct conversions for German or Turkish, you use:

signature LOCALE =
  sig
    ...
    val toLower: t -> char -> string
    val isUpper: t -> char -> bool
  end
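
A rough sketch of how a structure matching such a signature might look,
assuming 8-bit Latin-1 chars. The names Locale, default and german and the
toUpper entry are hypothetical, made up for illustration only. Turkish
dotted İ cannot be shown this way because it lies outside Latin-1.

```sml
(* Hypothetical sketch: Locale, default, german and toUpper are
   illustrative names, not an existing MLton or Basis API.
   Char is assumed to hold Latin-1 code points. *)
signature LOCALE =
  sig
    type t
    val default: t
    val german: t
    val toUpper: t -> char -> string
    val toLower: t -> char -> string
    val isUpper: t -> char -> bool
  end

structure Locale :> LOCALE =
  struct
    datatype t = Default | German

    val default = Default
    val german = German

    (* Returning a string lets one char map to several:
       German sharp s (code 223 in Latin-1) upper-cases to "SS". *)
    fun toUpper German #"\223" = "SS"
      | toUpper _ c = String.str (Char.toUpper c)

    (* No locale-specific exceptions in this sketch; fall back to Char. *)
    fun toLower _ c = String.str (Char.toLower c)

    fun isUpper _ c = Char.isUpper c
  end
```

With this, Locale.toUpper Locale.german applied to ß gives a two-character
string, while the default locale falls through to Char.toUpper.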

More suggestions?

-- 
Wesley W. Terpstra <wesley@terpstra.ca>