[MLton] WideChar?
Wesley W. Terpstra
terpstra@gkec.tu-darmstadt.de
Thu, 9 Dec 2004 16:31:57 +0100
On Thu, Dec 09, 2004 at 10:00:31AM -0500, Matthew Fluet wrote:
> > As you say, the fact that a unicode code needs more than 4 nibbles is
> > really a problem. You cannot make the number of hex characters in \u
> > variable because then it is ambiguous (because you can't tell where the
> > character code ends). Always requiring 8 hex digits would really be
> > even more onerous than just the fact that you need to use \u at all.
>
> Again, for expedience, one might (gasp) extend the lexical definition to
> allow \Uxxxxxxxx, which would let you write down any Unicode string.
> If you happen to fall in the low (plane / codepage / whatever terminology
> is correct), then you can use \uxxxx.
Sure, that's a good idea.
However, for the reasons I cited earlier, the \uxxxx and \Uxxxxxxxx methods
should only be used for non-typeable characters. Requiring people to look up
every letter of a string in a unicode table is not acceptable.
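To illustrate why fixed-width escapes avoid the ambiguity mentioned above, here
is a minimal sketch in Python (not SML, and not how MLton's lexer works). The
helper name expand_escapes is hypothetical. It assumes \u is always followed by
exactly 4 hex digits and \U by exactly 8, so the end of each escape is never in
doubt. It does not handle an escaped backslash before the u.

```python
import re

# Matches \uXXXX (exactly 4 hex digits) or \UXXXXXXXX (exactly 8 hex digits).
# Fixed widths mean the lexer always knows where the character code ends.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

def expand_escapes(text):
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the characters they name."""
    def replace(match):
        code = int(match.group(1) or match.group(2), 16)
        if code > 0x10FFFF:
            raise ValueError("code point out of Unicode range: %#x" % code)
        return chr(code)
    return _ESCAPE.sub(replace, text)

# Typeable characters stay as they are; only the escapes are expanded.
# In r"\u00e9a", the trailing "a" is a hex digit but is not absorbed,
# because the escape stops after exactly 4 digits.
result = expand_escapes(r"caf\u00e9a and \U0001D11E")
```

With a variable-width \u, the same input would be ambiguous: the lexer could
not tell whether the "a" belongs to the character code or to the text.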
PS. If I make WideChar = Word21, will/could MLton pack arrays so they only
need 3 or fewer bytes per character? If so, then I see no need for a 2-byte
version of Unicode in memory. It would provide a simpler API.
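To make the packing question concrete, here is a rough sketch in Python of what
3 bytes per character would look like. It says nothing about what MLton
actually does with a Word21 array; the function names and the big-endian byte
order are assumptions made only for illustration.

```python
def pack_code_points(code_points):
    """Store each 21-bit value in exactly 3 bytes (big-endian, an assumption)."""
    out = bytearray()
    for cp in code_points:
        if not 0 <= cp <= 0x1FFFFF:  # must fit in 21 bits
            raise ValueError("value does not fit in 21 bits: %#x" % cp)
        out += cp.to_bytes(3, "big")
    return bytes(out)

def unpack_code_point(packed, index):
    """Read the index-th character back; access stays constant-time."""
    start = 3 * index
    return int.from_bytes(packed[start:start + 3], "big")

# 'A', the euro sign, and a character outside the 16-bit range.
packed = pack_code_points([0x41, 0x20AC, 0x1D11E])
```

Every character costs 3 bytes regardless of its plane, and indexing is a single
multiplication, which is the simplicity argument for dropping a separate
2-byte representation.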
--
Wesley W. Terpstra