[MLton] WideChar?
Henry Cejtin
henry@sourcelight.com
Wed, 8 Dec 2004 15:48:10 -0600
First, I must agree strongly with Wesley that the whole locale thing in C is
a complete disaster. It is certainly an incredible pain, but much worse than
that is the fact that whenever two processes communicate (either via a file
or over some pipe or socket), they can't count on sharing the same locale.
This means that they can't count on agreeing about any of the important
properties of the characters they exchange.
With regard to strings, I agree that the standard seems very explicit that
you can only include space through tilde in them, but that strikes me as a
real mistake. It means that programs that want to communicate in other
characters are forced to use only the unreadable hex codes for them. Even
disallowing them in variable names is rather strict, but more defensible.
Still, I guess that there isn't much that can be done without trying to get
people to agree to actually update the official standard.
Can't we just dump locales entirely and have narrow chars be ASCII (or
ISO-Latin-1) and wide chars be straight Unicode, with all external stuff in
UTF-8?
With regard to the question on Int.scan not needing to handle wide chars, it
seems to me that your suggestion would mean that I have to use
    <convert from char reader to LargeChar reader> (Int.scan StringCvt.DEC)
but that would raise an exception on funny characters (ones that don't fit
in a narrow char) instead of returning NONE.
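
To make the NONE behaviour concrete, here is a minimal sketch of one way
such a conversion could avoid the exception. It is only an illustration
under stated assumptions: wide characters are modelled as plain int code
points (no WideChar or LargeChar structure is assumed to exist), and the
names wchar, narrow, listReader and scanWide are hypothetical helpers made
up for this example. It adapts the reader handed to Int.scan, and it ends
the stream at the first code point that does not fit in a char rather than
calling Char.chr blindly and letting Chr escape.

```sml
(* Assumption: a wide character is just an int code point. *)
type wchar = int

(* Hypothetical helper: turn a wide-char reader into a char reader.
   A code point outside the char range ends the stream (NONE)
   instead of raising Chr. *)
fun narrow (getw : (wchar, 'a) StringCvt.reader) : (char, 'a) StringCvt.reader =
  fn s =>
    case getw s of
        NONE => NONE
      | SOME (w, s') =>
          if 0 <= w andalso w <= Char.maxOrd
          then SOME (Char.chr w, s')
          else NONE

(* A list of code points stands in for a wide-char stream. *)
fun listReader [] = NONE
  | listReader (w :: ws) = SOME (w, ws)

(* Int.scan itself is unchanged; only the reader it sees is adapted. *)
fun scanWide (ws : wchar list) : (int * wchar list) option =
  Int.scan StringCvt.DEC (narrow listReader) ws
```

With this, scanWide [52, 50] (the code points for "42") gives SOME (42, []),
while scanWide [945] (a Greek alpha) gives NONE rather than an exception.
Treating an unrepresentable character as end of input is a design choice
for this sketch, not something the Basis specifies.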