[MLton] WideChar?

Henry Cejtin henry@sourcelight.com
Wed, 8 Dec 2004 15:48:10 -0600


First, I must agree strongly with Wesley that the whole locale thing in C is
a complete disaster. It is certainly an incredible pain, but much worse than
that is the fact that whenever two processes communicate (either via a file
or over some pipe or socket), they can't count on sharing the same locale.
This means that they can't count on agreeing about any locale-dependent
property of the data they exchange.

With regard to strings, I agree that the standard seems very explicit that
you can only include space through tilde in them, but that strikes me as a
real mistake. It means that programs that want to communicate in other
characters are forced to use only the unreadable hex codes for them. Even
disallowing them in variable names is rather strict, but more defensible.
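
To illustrate the restriction, here is a minimal sketch of writing a Latin-1
e-acute in a string literal using a numeric escape. It uses the decimal
`\ddd` form; the Definition also has a hex `\uXXXX` form. The name `cafe` is
made up for the example.

```sml
(* "cafe" with a Latin-1 e-acute (code 233) as its last character.
   The accented letter is written as a decimal escape, not typed directly. *)
val cafe = "caf\233"

(* The escape contributes exactly one char to the string. *)
val lastCode = Char.ord (String.sub (cafe, 3))
```

Here `size cafe` is 4 and `lastCode` is 233, but nothing in the source text
tells a reader which letter was meant.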

Still,  I  guess that there isn't much that can be done without trying to get
people to agree to actually updating the official standard.

Can't we just dump locales entirely and have narrow chars be ASCII (or
ISO-Latin-1) and wide chars be straight Unicode with all external stuff in
UTF-8?
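
As a rough sketch of what "all external stuff in UTF-8" could look like with
no locale involved, the function below encodes one Unicode code point as a
UTF-8 byte string. It assumes code points are modelled as plain `int` values
(no WideChar structure is used). The name `encodeUtf8` is hypothetical, and
raising `Chr` on invalid input is just one possible choice.

```sml
(* Encode a single Unicode code point as UTF-8, returned as a string of bytes. *)
fun encodeUtf8 (cp : int) : string =
  let
    (* One byte as a one-character string. *)
    fun byte n = String.str (Char.chr n)
    (* Continuation byte 10xxxxxx carrying the low 6 bits of n. *)
    fun cont n = byte (128 + n mod 64)
  in
    if cp < 0 orelse cp > 0x10FFFF
       orelse (0xD800 <= cp andalso cp <= 0xDFFF)
    then raise Chr  (* out of range, or a surrogate: not encodable *)
    else if cp < 0x80 then byte cp
    else if cp < 0x800 then
      byte (192 + cp div 64) ^ cont cp
    else if cp < 0x10000 then
      byte (224 + cp div 4096) ^ cont (cp div 64) ^ cont cp
    else
      byte (240 + cp div 262144) ^ cont (cp div 4096)
        ^ cont (cp div 64) ^ cont cp
  end
```

For example, `encodeUtf8 0x41` is `"A"`, and `encodeUtf8 0x20AC` (the euro
sign) is the three bytes `"\226\130\172"`.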

With regard to the question on Int.scan not needing to handle wide chars, it
seems to me that your suggestion means I would have to use
    <convert from char reader to LargeChar reader> (Int.scan StringCvt.DEC)
but that would raise an exception on funny characters (those that don't fit
in a narrow char) instead of returning NONE.
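
To make the difference concrete, here is a minimal sketch under some
assumptions: wide characters are modelled as `int` code points, the input
stream is an `int list`, and the names `listReader`, `raisingReader` and
`narrowReader` are invented for the example. For simplicity it narrows the
reader and hands it to Int.scan, rather than lifting the scanner. Treating an
unrepresentable character as end of input is only one possible way to get
NONE rather than an exception.

```sml
(* A stream of code points, modelled as an int list. *)
fun listReader [] = NONE
  | listReader (x :: xs) = SOME (x, xs)

(* Naive conversion: Char.chr raises Chr for code points above Char.maxOrd. *)
fun raisingReader (rd : (int, 'a) StringCvt.reader)
    : (char, 'a) StringCvt.reader =
  fn s => Option.map (fn (cp, s') => (Char.chr cp, s')) (rd s)

(* Alternative: a code point that does not fit in a char ends the input. *)
fun narrowReader (rd : (int, 'a) StringCvt.reader)
    : (char, 'a) StringCvt.reader =
  fn s =>
    case rd s of
        SOME (cp, s') =>
          if 0 <= cp andalso cp <= Char.maxOrd
          then SOME (Char.chr cp, s')
          else NONE
      | NONE => NONE

val scanRaising = Int.scan StringCvt.DEC (raisingReader listReader)
val scanNarrow = Int.scan StringCvt.DEC (narrowReader listReader)
```

With a Greek alpha (0x3B1) at the front of the stream, `scanRaising [0x3B1]`
raises `Chr`, while `scanNarrow [0x3B1]` returns `NONE`. When digits come
first, `scanNarrow [0x34, 0x32, 0x3B1]` returns `SOME (42, [0x3B1])`, leaving
the wide character unread.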