[MLton-user] SML unicode support
Alexandre
Xlex0x835@rambler.ru
Wed, 5 Jan 2005 22:39:35 +0300
OK, I understand my mistake now.
So, if I understand it correctly now, Unicode characters can be stored in
C/C++ wchar_t, can't they?
If so, would it be a problem to make the SML interpreter store characters in
a wchar_t-like container (I'm sorry if the question is too naive)?
Regards,
/Alexandre.
P.S. To the mailing list administrator: perhaps this is inconvenient only
for me, but usually (as far as I know) when I reply to a message, my
mailer uses the mailing list address (in this case, mlton-user@mlton.org)
rather than the address of the user who posted the message. Or did you set
it up this way on purpose, to "provoke" people into sending their replies
to both the mailing list and the user (via "reply all")?
On Jan 5, 2005, at 22:18, Henry Cejtin wrote:
> There is no way to casually handle UTF-8 (or even Unicode) characters
> in C. The encodings UTF-8 and UTF-16 do not store one character in 8
> or 16 bits. That would clearly not be possible because there are more
> than 256 and even more than 65,536 Unicode characters. UTF-8 and
> UTF-16 are ways of encoding characters as COLLECTIONS of 8-bit bytes
> or 16-bit chunks. Not all characters will take the same number of
> bytes/chunks. UTF-32 lets all characters be the same size (32-bits or
> 4 bytes) but no one stores them that way externally (in files) because
> of the large waste of space.
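
To make the variable-width point above concrete, here is a small sketch in Python (chosen only for brevity; it is not part of the original exchange). It counts how many 8-bit, 16-bit, and 32-bit units a few sample characters need in UTF-8, UTF-16, and UTF-32. The helper name `code_units` and the sample characters are illustrative assumptions.

```python
# Sample characters (chosen for illustration): U+0041, U+00E9, U+20AC, U+1D11E.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001d11e"]

def code_units(ch):
    """Return (UTF-8 bytes, UTF-16 16-bit units, UTF-32 32-bit units) for one character."""
    utf8 = len(ch.encode("utf-8"))
    # The "-le" variants are used so that no byte-order mark is added to the count.
    utf16 = len(ch.encode("utf-16-le")) // 2
    utf32 = len(ch.encode("utf-32-le")) // 4
    return utf8, utf16, utf32

for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", code_units(ch))
```

U+0041 needs one unit in every encoding, while U+1D11E needs four bytes in UTF-8, two 16-bit units in UTF-16, and a single 32-bit unit in UTF-32.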
>
> The expectation is that files will be in UTF-8 or UTF-16 and on
> reading them they will be converted to something more convenient.
> (Note, if you store a string in UTF-8 itself, then you can't go to the
> N-th character without walking through all the previous characters to
> see how long they are.)
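
The note about reaching the N-th character can be sketched as follows, again in Python and only as an illustration. The functions `utf8_char_len` and `nth_char` are hypothetical helpers, and the sketch assumes the input is valid UTF-8. It finds the N-th character by reading each lead byte in turn to learn how long that character is, which is why the cost grows with N.

```python
def utf8_char_len(lead):
    """Length in bytes of the UTF-8 sequence that starts with lead byte `lead`."""
    if lead < 0x80:
        return 1            # 0xxxxxxx: single-byte (ASCII) character
    if lead >> 5 == 0b110:
        return 2            # 110xxxxx: two-byte sequence
    if lead >> 4 == 0b1110:
        return 3            # 1110xxxx: three-byte sequence
    if lead >> 3 == 0b11110:
        return 4            # 11110xxx: four-byte sequence
    raise ValueError("not a UTF-8 lead byte")

def nth_char(data, n):
    """Return the n-th (0-based) character of UTF-8 bytes `data` by scanning from the start."""
    pos = 0
    for _ in range(n):
        if pos >= len(data):
            raise IndexError("character index out of range")
        pos += utf8_char_len(data[pos])  # skip one whole character
    if pos >= len(data):
        raise IndexError("character index out of range")
    return data[pos:pos + utf8_char_len(data[pos])].decode("utf-8")

data = "a\u20acb\U0001d11ec".encode("utf-8")
print(len(data), nth_char(data, 3))
```

In a fixed-width representation such as UTF-32, the same lookup is a single index computation, which is the convenience the quoted message refers to.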