[MLton-user] SML unicode support
Alexandre
Xlex0x835@rambler.ru
Wed, 5 Jan 2005 21:34:03 +0300
Probably...
But if so, I still have two questions:
-1. How else can ordinary Unix programs handle UTF-8 in a char (using the
locale hack)?
-2. Here (http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf), on page 18
(the physical page; page 27 in the printed numbering) I found the following passage:
"The Unicode Standard provides three distinct encoding forms for
Unicode characters, using 8-bit, 16-bit, and 32-bit units. These are
correspondingly named UTF-8, UTF-16, and UTF-32."
So, if I understand it right, it means that UTF-8 stores one character
using 8 bits, UTF-16 using 16 bits, and UTF-32 using 32 bits
(maximum). If so, the C char type is OK for that... Or am I really confused?
If so, excuse me, but we (Google and I) could not find enough information
about that... =/
Regards,
/Alexandre.
On Jan 5, 2005, at 21:20, Henry Cejtin wrote:
> You are confused. A C char certainly cannot hold an arbitrary UTF-8 encoded
> character. The reason that your file copy worked is because at each stage
> the char variable had some PART of a UTF-8 character.