[MLton] WideChar?

Wesley W. Terpstra terpstra@gkec.tu-darmstadt.de
Thu, 9 Dec 2004 02:01:25 +0100

On Wed, Dec 08, 2004 at 06:15:25PM -0600, Henry Cejtin wrote:
> As to exceptions vs. returning NONE, I think that if I was going from,
> say, a UTF-8 file, then I REALLY want an exception if the bytes are not
> legal UTF-8. On the other hand, I really want NONE if I apply some
> Int.scan to a wide string.
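A minimal sketch of the two failure modes Henry describes, written in Python only for illustration. `decode_utf8_strict` raises on bytes that are not legal UTF-8, which is the behaviour you want when reading a file. `scan_int` returns `None`, the analogue of SML's NONE, when a wide string simply isn't a number. Both names, and the `MalformedUTF8` exception, are hypothetical. They are not MLton or Basis Library APIs, and `scan_int` is much simpler than `Int.scan` (no whitespace skipping, no sign, no remainder).

```python
from typing import List, Optional


class MalformedUTF8(Exception):
    """Raised when a byte sequence is not legal UTF-8 (hypothetical name)."""


def decode_utf8_strict(data: bytes) -> List[int]:
    """Decode UTF-8 bytes into a list of code points, raising on bad input."""
    try:
        text = data.decode("utf-8")  # Python's "strict" error mode is the default
    except UnicodeDecodeError as e:
        raise MalformedUTF8(str(e)) from e
    return [ord(c) for c in text]


def scan_int(codepoints: List[int]) -> Optional[int]:
    """Parse a decimal integer from code points; return None if it doesn't parse.

    Simplified stand-in for Int.scan: only ASCII digits are accepted
    (str.isdigit would also accept other Unicode digits, so compare explicitly).
    """
    digits = "".join(chr(c) for c in codepoints)
    if digits and all("0" <= d <= "9" for d in digits):
        return int(digits)
    return None
```

The point of the split is that the caller picks the failure mode: corrupt input is an error worth an exception, while a string that isn't a number is an ordinary outcome.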

Hrm. These are valid points.

> [ \uXXXX escapes can't take anything other than exactly 4 hex digits ]

True, for many, many reasons.

> I still don't get the need for any thing other than 1 byte characters
> (ord 0-255) and 4 byte characters. I.e., we have ASCII/ISO-Latin-1 or
> else we have unicode.

Well, most commonly used Unicode fits into the BMP (Basic Multilingual
Plane). Halving the storage (2 bytes per character instead of 4) is
sometimes important. WideChar would cover all of Unicode. I don't see any
harm in providing both, like Int11.

Keep in mind that Unicode only goes up to U+10FFFF, i.e. 21 bits, which is
less than 6 nibbles. It's really quite wasteful to throw 4 bytes at it. :-(
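The arithmetic behind the "less than 6 nibbles" remark, as a quick Python check: U+10FFFF needs 21 bits and BMP code points need 16. The `storage_bytes` helper is hypothetical and assumes tight bit-packing. It only illustrates what a 21-bit or 16-bit character could save over 32 bits, not how MLton actually lays out data.

```python
MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point
BMP_MAX = 0xFFFF           # last code point of the Basic Multilingual Plane


def bits_needed(n: int) -> int:
    """Minimum number of bits needed to represent the non-negative integer n."""
    return n.bit_length()


def storage_bytes(n_chars: int, bits_per_char: int) -> int:
    """Bytes for n_chars characters packed at bits_per_char bits each, rounded up.

    Assumes tight bit-packing, purely for illustration.
    """
    return (n_chars * bits_per_char + 7) // 8


print(bits_needed(MAX_CODE_POINT))  # 21, i.e. fewer than 6 nibbles (24 bits)
print(bits_needed(BMP_MAX))         # 16
for bits in (32, 21, 16):
    print(bits, storage_bytes(1000, bits))  # 4000, 2625, 2000 bytes
```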

Actually, that suggests that maybe I should make WideChar = Word21. That
would be enough to hold every code point and might allow MLton to do cool
representation tricks.

Wesley W. Terpstra <wesley@terpstra.ca>