[MLton] WideChar?
Wesley W. Terpstra
terpstra@gkec.tu-darmstadt.de
Thu, 9 Dec 2004 02:01:25 +0100
On Wed, Dec 08, 2004 at 06:15:25PM -0600, Henry Cejtin wrote:
> As to exceptions vs. returning NONE, I think that if I was going from,
> say, a UTF-8 file, then I REALLY want an exception if the bytes are not
> legal UTF-8. On the other hand, I really want NONE if I apply some
> Int.scan to a wide string.
Hrm. These are valid points.
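To make the two cases concrete, here is a rough sketch in Python rather than SML, purely as an illustration. The names decode_utf8_strict and scan_int are hypothetical and are not part of any Basis Library proposal. Decoding raises on malformed bytes, while scanning returns None when no integer is present.

```python
from typing import Optional, Tuple


def decode_utf8_strict(data: bytes) -> str:
    # Strict decoding: malformed input raises UnicodeDecodeError.
    return data.decode("utf-8", errors="strict")


def scan_int(text: str) -> Optional[Tuple[int, str]]:
    # Hypothetical analogue of a scan function: consume a leading decimal
    # integer and return (value, remaining text), or None if there is none.
    i = 0
    if i < len(text) and text[i] in "+-":
        i += 1
    start = i
    while i < len(text) and text[i] in "0123456789":
        i += 1
    if i == start:
        return None  # no digits found: an ordinary outcome, not an error
    return int(text[:i]), text[i:]
```

The split follows the argument quoted above: bad bytes in a file that claims to be UTF-8 are exceptional, while a string that does not start with a number is an expected result of scanning.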
> [ \u can't do != 4 nibbles ]
True for many many reasons.
> I still don't get the need for any thing other than 1 byte characters
> (ord 0-255) and 4 byte characters. I.e., we have ASCII/ISO-Latin-1 or
> else we have unicode.
Well, most useful Unicode fits into the BMP -- Basic Multilingual Plane.
Saving a factor of 2 in storage is sometimes important. WideChar = all Unicode.
There doesn't seem to me to be any harm in providing both, like Int11.
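A small sketch of the storage argument, in Python for illustration only. It assumes fixed-width 2-byte and 4-byte character units, approximated here by the utf-16-le and utf-32-le codecs on BMP-only text. The helper names is_bmp and storage_bytes are made up for this example.

```python
def is_bmp(text: str) -> bool:
    # True when every code point lies in the Basic Multilingual Plane.
    return all(ord(ch) <= 0xFFFF for ch in text)


def storage_bytes(text: str) -> tuple:
    # Bytes needed with 2-byte units versus 4-byte units.
    # For BMP-only text, utf-16-le uses exactly one 2-byte unit per character.
    return len(text.encode("utf-16-le")), len(text.encode("utf-32-le"))


sample = "Grüße, 世界"  # 9 characters, all inside the BMP
two_byte, four_byte = storage_bytes(sample)
```

For BMP-only text the 4-byte form is exactly twice the size of the 2-byte form, which is the saving being argued for.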
Keep in mind that Unicode only reaches to 10FFFF, i.e. 21 bits, less than 6 full nibbles.
It's really quite wasteful to throw 4 bytes at it. :-(
Actually, that suggests that maybe I should make WideChar = Word21. That
would be enough and might allow MLton to do cool representation tricks.
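The arithmetic behind Word21 can be checked with a short sketch, again in Python for illustration. The packing of three code points into one 64-bit word is only a hypothetical example of a representation trick; it is not a description of anything MLton does.

```python
MAX_CODE_POINT = 0x10FFFF
BITS = MAX_CODE_POINT.bit_length()  # bits needed for the largest code point
MASK = (1 << BITS) - 1


def pack3(a: int, b: int, c: int) -> int:
    # Hypothetical trick: three 21-bit code points fit in 63 bits.
    for cp in (a, b, c):
        if not 0 <= cp <= MAX_CODE_POINT:
            raise ValueError("not a Unicode code point")
    return a | (b << BITS) | (c << (2 * BITS))


def unpack3(word: int) -> tuple:
    # Inverse of pack3: recover the three code points.
    return word & MASK, (word >> BITS) & MASK, (word >> (2 * BITS)) & MASK
```

With 32-bit characters the same 64-bit word holds only two code points, so 11 of every 32 bits go unused.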
--
Wesley W. Terpstra <wesley@terpstra.ca>