[MLton] Re: [Sml-basis-discuss] Unicode and WideChar support
Tue, 29 Nov 2005 10:51:10 -0600
The lexer doesn't generate strings. The input is assumed to be 8-bit
(i.e., type char), and one can specify a 7-bit, 8-bit, or UTF-8 encoding
for the character stream (ML-lex only supports 7-bit and 8-bit).
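
To make "UTF-8 over an 8-bit input" concrete, here is a minimal sketch of
decoding a stream of 8-bit units into code points by hand. It is written in
Python rather than SML, the name decode_utf8 is made up for this note, and
it is not taken from the tool discussed in this thread. It deliberately
skips the checks a production decoder needs (overlong forms, surrogates).

```python
def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into Unicode code points (illustrative only)."""
    code_points = []
    i = 0
    while i < len(data):
        b = data[i]
        # The lead byte tells us how many continuation bytes follow.
        if b < 0x80:                # 0xxxxxxx: plain 7-bit character
            n, cp = 0, b
        elif b >> 5 == 0b110:       # 110xxxxx: 1 continuation byte
            n, cp = 1, b & 0x1F
        elif b >> 4 == 0b1110:      # 1110xxxx: 2 continuation bytes
            n, cp = 2, b & 0x0F
        elif b >> 3 == 0b11110:     # 11110xxx: 3 continuation bytes
            n, cp = 3, b & 0x07
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + n >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, n + 1):
            c = data[i + k]
            if c >> 6 != 0b10:      # continuation bytes look like 10xxxxxx
                raise ValueError(f"invalid continuation byte at offset {i + k}")
            cp = (cp << 6) | (c & 0x3F)
        code_points.append(cp)
        i += n + 1
    return code_points
```

The point is only that the lexer can keep reading type-char input while
matching on code points: a 7-bit or 8-bit setting treats each unit as one
symbol, and a UTF-8 setting groups one to four units per symbol.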
On Nov 29, 2005, at 10:30 AM, Geoffrey Alan Washburn wrote:
> Aaron Turon wrote:
>> I have been working with John Reppy on a (largely) backwards-
>> compatible replacement for ML-lex. The new tool is based on
>> Brzozowski's notion of regular expression derivatives, making
>> it easy to support boolean operations on REs such as intersection
>> and negation. Code generation is not finalized, but will most
>> likely be control-flow-based (one function per state, with tail
>> calls) rather than table-based. We have designed the tool to
>> support Unicode. I hope to have an initial version out for testing
>> some time next month -- please feel free to send mail with
>> suggestions or requests.
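
For readers unfamiliar with regular expression derivatives, the following is
a small sketch of why they make intersection and negation easy: the
derivative of each boolean operator is just that operator applied to the
derivatives of its arguments. The code is in Python rather than SML, all
names in it are invented for this note, and it matches strings directly
instead of building lexer states, so it should not be read as the design of
the tool described above.

```python
from dataclasses import dataclass

class Re:
    """Base class for regular expression terms."""

@dataclass(frozen=True)
class Empty(Re):        # matches no string at all
    pass

@dataclass(frozen=True)
class Eps(Re):          # matches only the empty string
    pass

@dataclass(frozen=True)
class Chr(Re):          # matches the single character c
    c: str

@dataclass(frozen=True)
class Cat(Re):          # concatenation
    l: Re
    r: Re

@dataclass(frozen=True)
class Alt(Re):          # union
    l: Re
    r: Re

@dataclass(frozen=True)
class And(Re):          # intersection
    l: Re
    r: Re

@dataclass(frozen=True)
class Not(Re):          # complement
    r: Re

@dataclass(frozen=True)
class Star(Re):         # Kleene star
    r: Re

def nullable(r: Re) -> bool:
    """True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, (Cat, And)):
        return nullable(r.l) and nullable(r.r)
    if isinstance(r, Alt):
        return nullable(r.l) or nullable(r.r)
    if isinstance(r, Not):
        return not nullable(r.r)
    raise TypeError(r)

def deriv(r: Re, c: str) -> Re:
    """Derivative of r with respect to c: what may follow after reading c."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Cat):
        first = Cat(deriv(r.l, c), r.r)
        return Alt(first, deriv(r.r, c)) if nullable(r.l) else first
    if isinstance(r, Alt):
        return Alt(deriv(r.l, c), deriv(r.r, c))
    if isinstance(r, And):
        return And(deriv(r.l, c), deriv(r.r, c))
    if isinstance(r, Not):
        return Not(deriv(r.r, c))
    if isinstance(r, Star):
        return Cat(deriv(r.r, c), r)
    raise TypeError(r)

def matches(r: Re, s: str) -> bool:
    """Take one derivative per input character, then test for nullability."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)

# Strings over {a, b} that do NOT contain "ab", built with And and Not.
any_ab = Star(Alt(Chr("a"), Chr("b")))
has_ab = Cat(any_ab, Cat(Chr("a"), Cat(Chr("b"), any_ab)))
no_ab = And(any_ab, Not(has_ab))
```

With these definitions, matches(no_ab, "bba") holds and
matches(no_ab, "bab") does not. A generator would additionally simplify and
canonicalize the derivative terms so that only finitely many distinct ones
arise, each becoming one state (and, in a control-flow-based back end, one
function).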
> This would be great. In the past, to handle some ad-hoc uses of
> UTF-8 in my parsers, I've had to build a custom
> version of ml-lex with CharSetSize >129.
> Though, given that there isn't yet an agreed-upon Basis module
> for Unicode, what does your lexer generate in terms of strings?
> -- [Geoff Washburn|email@example.com|http://www.cis.upenn.edu/