[MLton] Re: [Sml-basis-discuss] Unicode and WideChar support
Aaron Turon
adrassi@gmail.com
Tue, 29 Nov 2005 09:39:42 -0600
> If you want to add Unicode support, then you have a working WideChar/
> String. Decoding UTF-8 into a WideChar is about 10-20 lines, so
> that's not much additional effort either. The real work is getting
> MLlex to support such a large character set. However, that's only
> needed for Unicode-enabled SML compilers.
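For reference, the UTF-8 decoding step mentioned above can be sketched roughly as follows. This is only an illustration, written in Python rather than SML for brevity; the function name `decode_utf8_char` and its (code point, next position) return convention are assumptions made for this sketch, not part of any Basis proposal.

```python
def decode_utf8_char(data: bytes, pos: int = 0):
    """Decode one code point starting at data[pos].

    Returns (code_point, next_pos). Raises ValueError on malformed input.
    Hypothetical helper, for illustration only.
    """
    b0 = data[pos]
    if b0 < 0x80:                      # 1-byte sequence (ASCII)
        return b0, pos + 1
    if 0xC2 <= b0 <= 0xDF:             # 2-byte sequence
        need, cp, lowest = 1, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:           # 3-byte sequence
        need, cp, lowest = 2, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 <= 0xF4:           # 4-byte sequence
        need, cp, lowest = 3, b0 & 0x07, 0x10000
    else:                              # stray continuation or invalid lead byte
        raise ValueError("invalid lead byte")
    if pos + need + 1 > len(data):
        raise ValueError("truncated sequence")
    for b in data[pos + 1 : pos + 1 + need]:
        if (b & 0xC0) != 0x80:         # continuation bytes look like 10xxxxxx
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    # Reject overlong encodings, surrogates, and values beyond U+10FFFF.
    if cp < lowest or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("invalid code point")
    return cp, pos + 1 + need
```

Calling `decode_utf8_char` repeatedly, feeding each returned position back in, walks a byte string one code point at a time.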
I have been working with John Reppy on a (largely)
backwards-compatible replacement for ML-lex. The new tool is based on
Brzozowski's notion of regular expression derivatives[1], making it
easy to support boolean operations on REs such as intersection and
negation. Code generation is not finalized, but will most likely be
control-flow-based (one function per state, with tail calls) rather
than table-based.
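To illustrate why derivatives make intersection and negation easy, here is a minimal sketch of a derivative-based matcher. It is not the tool's implementation: it is written in Python for brevity, all class and function names are made up for this example, and it omits the simplification of derivative terms that a real lexer generator would need in order to obtain a finite set of states.

```python
from dataclasses import dataclass

class Re:
    """Base class for regular-expression terms (hypothetical names)."""

@dataclass(frozen=True)
class Empty(Re):      # matches no string
    pass

@dataclass(frozen=True)
class Eps(Re):        # matches only the empty string
    pass

@dataclass(frozen=True)
class Chr(Re):        # matches the single character c
    c: str

@dataclass(frozen=True)
class Cat(Re):        # concatenation
    a: Re
    b: Re

@dataclass(frozen=True)
class Alt(Re):        # union
    a: Re
    b: Re

@dataclass(frozen=True)
class And(Re):        # intersection
    a: Re
    b: Re

@dataclass(frozen=True)
class Not(Re):        # complement
    a: Re

@dataclass(frozen=True)
class Star(Re):       # Kleene closure
    a: Re

def nullable(r: Re) -> bool:
    """True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, (Cat, And)):
        return nullable(r.a) and nullable(r.b)
    if isinstance(r, Alt):
        return nullable(r.a) or nullable(r.b)
    if isinstance(r, Not):
        return not nullable(r.a)
    raise TypeError(r)

def deriv(r: Re, c: str) -> Re:
    """Derivative of r with respect to c: what r must still match after c."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.a, c), deriv(r.b, c))
    if isinstance(r, And):            # intersection is pointwise
        return And(deriv(r.a, c), deriv(r.b, c))
    if isinstance(r, Not):            # negation commutes with the derivative
        return Not(deriv(r.a, c))
    if isinstance(r, Star):
        return Cat(deriv(r.a, c), r)
    if isinstance(r, Cat):
        left = Cat(deriv(r.a, c), r.b)
        return Alt(left, deriv(r.b, c)) if nullable(r.a) else left
    raise TypeError(r)

def matches(r: Re, s: str) -> bool:
    """Take one derivative per input character, then test nullability."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)
```

For example, with `any_ab = Star(Alt(Chr("a"), Chr("b")))`, the expression `Not(Cat(any_ab, Cat(Chr("a"), Cat(Chr("b"), any_ab))))` describes the strings that do not contain "ab", with no automaton complementation step required. Each distinct derivative term corresponds to one DFA state, which is what makes a one-function-per-state code generator a natural fit.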
We have designed the tool to support Unicode. I hope to have an
initial version out for testing sometime next month -- please feel
free to send mail with suggestions or requests.
Best,
Aaron
[1] Derivatives of Regular Expressions, Janusz A. Brzozowski, Journal
of the ACM, Volume 11, Issue 4, 1964.