[MLton] Re: [Sml-basis-discuss] Unicode and WideChar support
Tue, 29 Nov 2005 09:39:42 -0600
> If you want to add Unicode support, then you have a working WideChar/
> String. Decoding UTF-8 into a WideChar is about 10-20 lines, so
> that's not much additional effort either. The real work is getting
> MLlex to support such a large character set. However, that's only
> needed for Unicode-enabled SML compilers.
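For a sense of the size of the decoding step mentioned above, here is a rough sketch in Python rather than SML, only so that it runs as-is. The function name decode_utf8 is made up for this illustration and is not part of any Basis Library proposal; a WideChar version would produce WideChar.char values instead of integers.

```python
def decode_utf8(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of integer code points (illustrative sketch)."""
    # Smallest code point that legitimately needs 0, 1, 2 or 3 continuation bytes.
    minimum = [0x0, 0x80, 0x800, 0x10000]
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            n, cp = 0, b                 # single-byte (ASCII) sequence
        elif 0xC2 <= b <= 0xDF:
            n, cp = 1, b & 0x1F          # two-byte sequence
        elif 0xE0 <= b <= 0xEF:
            n, cp = 2, b & 0x0F          # three-byte sequence
        elif 0xF0 <= b <= 0xF4:
            n, cp = 3, b & 0x07          # four-byte sequence
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        if i + n >= len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        for k in range(1, n + 1):
            c = data[i + k]
            if c & 0xC0 != 0x80:
                raise ValueError("invalid continuation byte at offset %d" % (i + k))
            cp = (cp << 6) | (c & 0x3F)
        # Reject overlong encodings, surrogates, and values beyond U+10FFFF.
        if cp < minimum[n] or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError("invalid code point at offset %d" % i)
        out.append(cp)
        i += n + 1
    return out
```

Calling decode_utf8("aé€".encode("utf-8")) yields the three code points for those characters; malformed input raises ValueError rather than being silently repaired.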
I have been working with John Reppy on a (largely)
backwards-compatible replacement for ML-lex. The new tool is based on
Brzozowski's notion of regular expression derivatives, making it
easy to support boolean operations on REs such as intersection and
negation. Code generation is not finalized, but will most likely be
control-flow-based (one function per state, with tail calls) rather
than table-driven.
We have designed the tool to support Unicode. I hope to have an
initial version out for testing some time next month -- please feel
free to send mail with suggestions or requests.
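To illustrate why derivatives make intersection and negation easy, here is a minimal sketch of a derivative-based matcher. It is written in Python only so that it runs as-is; it is not the tool's code, and every name in it (Empty, Eps, Chr, Cat, Alt, And, Not, Star, nullable, deriv, matches) is made up for this illustration. The derivative of an RE with respect to a character is the RE matching the remainders of the strings that start with that character; for And and Not the derivative simply distributes over the operator.

```python
from dataclasses import dataclass

# Regular-expression syntax tree (hypothetical names, for illustration only).
@dataclass(frozen=True)
class Empty:            # matches no string at all
    pass

@dataclass(frozen=True)
class Eps:              # matches only the empty string
    pass

@dataclass(frozen=True)
class Chr:              # matches one specific character
    c: str

@dataclass(frozen=True)
class Cat:              # concatenation
    a: object
    b: object

@dataclass(frozen=True)
class Alt:              # union
    a: object
    b: object

@dataclass(frozen=True)
class And:              # intersection
    a: object
    b: object

@dataclass(frozen=True)
class Not:              # negation (complement)
    a: object

@dataclass(frozen=True)
class Star:             # Kleene closure
    a: object

def nullable(r):
    """Return True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, (Cat, And)):
        return nullable(r.a) and nullable(r.b)
    if isinstance(r, Alt):
        return nullable(r.a) or nullable(r.b)
    if isinstance(r, Not):
        return not nullable(r.a)
    raise TypeError(r)

def deriv(r, ch):
    """Derivative of r with respect to the character ch."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == ch else Empty()
    if isinstance(r, Cat):
        left = Cat(deriv(r.a, ch), r.b)
        return Alt(left, deriv(r.b, ch)) if nullable(r.a) else left
    if isinstance(r, Alt):
        return Alt(deriv(r.a, ch), deriv(r.b, ch))
    if isinstance(r, And):              # intersection: derive both sides
        return And(deriv(r.a, ch), deriv(r.b, ch))
    if isinstance(r, Not):              # negation: derive underneath
        return Not(deriv(r.a, ch))
    if isinstance(r, Star):
        return Cat(deriv(r.a, ch), r)
    raise TypeError(r)

def matches(r, s):
    """Match s by taking one derivative per character, then testing nullability."""
    for ch in s:
        r = deriv(r, ch)
    return nullable(r)
```

For example, And(Star(Chr("a")), Not(Eps())) describes the non-empty strings of a's, so matches accepts "aaa" and rejects both "" and "ab". This sketch performs no simplification, so the terms grow with the input; a lexer generator would instead identify equivalent derivatives up to some normal form so that they become a finite set of DFA states. Because characters are compared only for equality here, nothing in the sketch depends on the size of the character set.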
Janusz A. Brzozowski. Derivatives of Regular Expressions. Journal
of the ACM, Volume 11, Issue 4, 1964.