[MLton] unicode
Matthew Fluet
fluet@cs.cornell.edu
Fri, 9 Sep 2005 13:16:09 -0400 (EDT)
> I've found this by accident:
>
> http://srfi.schemers.org/srfi-75/
>
> How is the state of the art for the support of Unicode in sml,
> especially mlton?
There is no real support for Unicode in the Definition of Standard ML;
there are a few throw-away sentences stating things along the lines of
"ASCII must be a subset of the character set in programs", but that hardly
constitutes support.
Neither is there real support for Unicode in the Standard ML Basis
Library. The general consensus (which includes the opinions of the
editors of the Basis Library) is that the LargeChar structure is
insufficient for the purposes of Unicode.
MLton has some preliminary support for 16- and 32-bit characters and
strings. It is even possible to include arbitrary Unicode characters in
32-bit strings using a \Uxxxxxxxx escape sequence. (This longer escape
sequence is a minor extension to the Definition, which only allows
\uxxxx.) This is by no means satisfactory support for Unicode, but it is
what is currently available.
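The longer escape matters because four hex digits can only name code points up to U+FFFF, while Unicode extends to U+10FFFF. The Python sketch below is a hypothetical, simplified model of the two escape forms only. It is not MLton's lexer, `decode_unicode_escapes` is an illustrative name, and all other SML escapes are ignored.

```python
import string


def decode_unicode_escapes(literal):
    """Return the code points of a string-literal body.

    Hypothetical, simplified model handling only two escapes:
      \\uXXXX      4 hex digits, as in the Definition (max U+FFFF)
      \\UXXXXXXXX  8 hex digits, the longer MLton extension
    Every other character is taken literally.
    """
    out = []
    i = 0
    while i < len(literal):
        ch = literal[i]
        if ch == "\\" and i + 1 < len(literal) and literal[i + 1] in "uU":
            # Lowercase u takes exactly 4 hex digits, uppercase U takes 8.
            width = 4 if literal[i + 1] == "u" else 8
            digits = literal[i + 2 : i + 2 + width]
            if len(digits) != width or not all(c in string.hexdigits for c in digits):
                raise ValueError(f"malformed escape at index {i}")
            out.append(int(digits, 16))
            i += 2 + width
        else:
            out.append(ord(ch))
            i += 1
    return out


print(decode_unicode_escapes(r"\U0001F600"))  # one code point: U+1F600
print(decode_unicode_escapes(r"\u1F600"))     # U+1F60 followed by '0'
```

With only the four-digit form, `\u1F600` is read as U+1F60 followed by the character `0`. A character such as U+1F600 therefore cannot be written with a single `\u` escape, which is the gap the eight-digit form fills.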
There are periodic flurries of questions and discussion about Unicode in
SML/MLton. The most recent, which did lead to some seemingly sound design
decisions, was last December (2004).
The discussion started at:
http://mlton.org/pipermail/mlton/2004-December/026396.html
Stephen posted a good summary of points at:
http://mlton.org/pipermail/mlton/2004-December/026440.html
and the discussion continued.