[MLton] Unicode... again

Mon Feb 12 12:52:04 PST 2007

Wesley W. Terpstra wrote:
> On Feb 12, 2007, at 8:20 PM, Matthew Fluet wrote:
>>> - For the time being I choose to ignore the basis' claim that "in 
>>> WideChar, the functions toLower, toUpper, isAlpha,..., isUpper and, 
>>> in general, the definition of a ``letter'' are locale-dependent" and 
>>> raise an Unimplemented exception for these methods. I think the 
>>> standard is dreadfully misguided in assuming a global locale, and I 
>>> defer what to do here till later as it is what blocked my progress 
>>> last time. (IMO these functions have only questionable use, anyway)
>>
>> Not to dismiss any of the thought and work already done, but I'm curious
>> why another 'obvious' interpretation of WideChar hasn't been explored. 
>> That is, why don't we take WideChar as an (admittedly brain-dead) 
>> wrapping of functions defined in <wchar.h> and <wctype.h>. The 
>> descriptions of these functions seem to match the Basis Library 
>> descriptions, in that they have a notion of the current locale. 
>> Admittedly, WideChar wouldn't provide access to changing the locale 
>> (the setlocale function), but this would seem consistent with other 
>> portions of the SML Basis Library that provide just a thin veneer over 
>> corresponding POSIX functions.
> 
> We could do that. My definitions of the is* methods are placeholders. I 
> consider these methods worse than useless; I'd rather they simply didn't 
> exist. Since they do exist, mapping them to iswalpha, iswalnum, etc. 
> might be OK... as long as these are portably available. 

Indeed; the idea is that while we all know these functions are broken, 
at least they are broken in the same way as they are in C.
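For concreteness, a mapping of this kind might look roughly like the C stubs below, which SML's WideChar.isAlpha and friends could call through MLton's FFI. This is a minimal sketch under stated assumptions: the stub names (WideChar_isAlpha, etc.) are hypothetical, and the code assumes a platform where wint_t values are Unicode code points (true on glibc, which defines __STDC_ISO_10646__, but not guaranteed by ISO C). Note that the wide-character classification functions themselves are declared in <wctype.h>.

```c
#include <stdint.h>
#include <wctype.h>

/* Hypothetical C stubs that SML's WideChar.isAlpha, isAlnum, isDigit and
   toUpper could be bound to via an FFI; the names are illustrative only.
   Assumes wint_t values are Unicode code points (e.g. on glibc).
   Every result depends on the process-wide LC_CTYPE locale. */
int WideChar_isAlpha(uint32_t c) { return iswalpha((wint_t)c) != 0; }
int WideChar_isAlnum(uint32_t c) { return iswalnum((wint_t)c) != 0; }
int WideChar_isDigit(uint32_t c) { return iswdigit((wint_t)c) != 0; }
uint32_t WideChar_toUpper(uint32_t c) { return (uint32_t)towupper((wint_t)c); }
```

A program that never calls setlocale runs in the "C" locale. One that first calls setlocale(LC_CTYPE, "") may get different answers for non-ASCII code points such as U+00E9, and that hidden global state is the objection Wesley raises.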

> Still, I'd
> rather try to use the locale-independent character classes specified by 
> Unicode. 

Right, and since that approach will involve many more functions than the 
CHAR.is* predicates, they will live in a structure of their own.
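In contrast, a locale-independent classification based on Unicode character properties reduces to a table lookup keyed only on the code point. The C sketch below shows one way such a lookup could work. It is an assumption, not the planned structure: the names are hypothetical, and the table is a hand-copied excerpt covering only the letters (general category L*) in U+0000..U+00FF. A real table would be generated from the Unicode Character Database.

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous run of code points sharing a property. */
struct range { uint32_t lo, hi; };

/* Illustrative excerpt only (hypothetical table): the letters among
   U+0000..U+00FF. A real table would be generated from UnicodeData.txt
   and cover the whole code space U+0000..U+10FFFF. */
static const struct range letter_ranges[] = {
    {0x0041, 0x005A}, /* A-Z */
    {0x0061, 0x007A}, /* a-z */
    {0x00AA, 0x00AA}, /* FEMININE ORDINAL INDICATOR */
    {0x00B5, 0x00B5}, /* MICRO SIGN */
    {0x00BA, 0x00BA}, /* MASCULINE ORDINAL INDICATOR */
    {0x00C0, 0x00D6}, /* skips U+00D7 MULTIPLICATION SIGN */
    {0x00D8, 0x00F6}, /* skips U+00F7 DIVISION SIGN */
    {0x00F8, 0x00FF},
};

/* Binary search over sorted, non-overlapping ranges. The answer depends
   only on the code point, never on setlocale(). */
static int in_ranges(const struct range *r, size_t n, uint32_t c) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c < r[mid].lo)
            hi = mid;
        else if (c > r[mid].hi)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}

/* Hypothetical locale-independent counterpart of isAlpha. */
int unicode_is_letter(uint32_t c) {
    return in_ranges(letter_ranges,
                     sizeof letter_ranges / sizeof letter_ranges[0], c);
}
```

Storing sorted ranges keeps the table compact and the lookup O(log n), and the same in_ranges helper would serve any other property table (digits, whitespace, and so on).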