[MLton] WideChar?

Stephen Weeks MLton@mlton.org
Wed, 8 Dec 2004 14:58:46 -0800


> With regards to strings, I agree that the standard seems very explicit that
> you can only include space through tilde in them, but that strikes me as a
> real mistake. It means that programs that want to communicate in other
> characters are forced to use only the unreadable hex codes for them. Even
> disallowing them in variable names is rather strict, but more defendable.
>
> Still, I guess that there isn't much that can be done without trying to get
> people to agree to actually updating the official standard.

As I mentioned, using \u escapes for large characters in strings will
break portability with all other SML implementations, so it doesn't
seem any worse (in practice) to allow UTF-8.  The same argument
applies to UTF-8 variable names, but isn't enough to convince me to
allow that, since there are even bigger drawbacks.

The Definition and Basis Library don't address internationalization,
so we need to allow ourselves more leeway in departing from them than
we normally do, just as we did with the FFI, MLBs, ...

> With regard to the question on Int.scan not needing to handle wide chars, it
> seems to me that your suggestion would be that I would have to use
>     <convert from char reader to LargeChar reader> (Int.scan StringCvt.DEC)
> but that would raise an exception on funny characters instead of returning
> NONE.

True.  One could have a converter that returned NONE instead of
raising an exception.  Perhaps that would be clearer.  My point is
that there is no need to add functionality to the existing scanners.
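
To make the NONE-returning converter concrete, here is a minimal sketch.
It is only an illustration, not a proposed interface: the names wchar,
narrow and listReader are made up for this example, and a wide character
is modeled as a plain int code point so that the code does not depend on
any WideChar structure. Only StringCvt.reader, Int.scan, Char.chr and
Char.maxOrd come from the Basis Library.

```sml
(* Assumption: a "wide character" is modeled as its integer code point.
 * This is a hypothetical stand-in, not a real WideChar type. *)
type wchar = int

(* narrow turns a reader of wide characters into a reader of ordinary
 * chars.  A character outside the Char range yields NONE, rather than
 * letting Char.chr raise Chr. *)
fun narrow (getw : (wchar, 'a) StringCvt.reader)
   : (char, 'a) StringCvt.reader =
   fn s =>
      case getw s of
         NONE => NONE
       | SOME (w, s') =>
            if 0 <= w andalso w <= Char.maxOrd
               then SOME (Char.chr w, s')
            else NONE

(* A trivial reader over a list, used here as the wide-character stream. *)
fun listReader [] = NONE
  | listReader (w :: ws) = SOME (w, ws)

(* The existing scanner is reused unchanged on top of the narrowed reader. *)
val scanInt = Int.scan StringCvt.DEC (narrow listReader)

val r1 = scanInt [0x31, 0x32, 0x33]    (* "123"               => SOME (123, []) *)
val r2 = scanInt [0x4e2d, 0x31]        (* wide char first     => NONE *)
val r3 = scanInt [0x34, 0x32, 0x4e2d]  (* "42" then wide char => SOME (42, [0x4e2d]) *)
```

With this choice a character that does not fit in Char looks like the end
of the input to the scanner, so Int.scan stops there: it returns NONE if
no digits were read, and otherwise returns the number together with the
rest of the stream, still starting at the wide character.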