[MLton] Re: [Sml-basis-discuss] Unicode and WideChar support
John Reppy
jhr@cs.uchicago.edu
Tue, 29 Nov 2005 11:08:47 -0600
I think that we'll have
val yytext : unit -> substring
where UTF-8 is used to encode Unicode characters. We use substrings to
avoid unnecessary copying, and a function to be lazy about substring
creation (our assumption is that compilers are better at eliminating
unused local functions than unused calls to external functions that
happen to be pure).
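As a rough illustration of that shape (a minimal sketch only, not the
generator's actual output), the following uses the Basis Library's
Substring structure. The names buf and mkYYText and the offsets are
hypothetical; the point is that the lexeme is a view onto the 8-bit
input buffer, and that a non-ASCII character simply shows up as its
UTF-8 byte sequence.

```sml
(* Hypothetical input buffer of 8-bit chars. The two bytes \206\187
   (0xCE 0xBB) are the UTF-8 encoding of U+03BB (Greek small lambda). *)
val buf : string = "let \206\187 = 1"

(* Hypothetical helper: build a yytext thunk for the current match.
   No substring value is created unless the action calls the thunk,
   and even then the underlying characters are not copied. *)
fun mkYYText (buf : string, start : int, len : int) : unit -> substring =
  fn () => Substring.substring (buf, start, len)

(* Suppose the lexer matched the two-byte identifier at offset 4. *)
val yytext : unit -> substring = mkYYText (buf, 4, 2)

(* An action that needs the text forces the thunk; Substring.string
   copies the bytes out only at this point. *)
val lexeme : string = Substring.string (yytext ())

(* Sizes are counted in bytes (chars), not Unicode code points. *)
val nBytes : int = Substring.size (yytext ())
```

An action that never mentions yytext pays nothing for it, which is the
laziness described above.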
Note that Unicode support is not part of ml-lex compatibility mode.
- John
On Nov 29, 2005, at 10:56 AM, Geoffrey Alan Washburn wrote:
> John Reppy wrote:
>> The lexer doesn't generate strings. The input is assumed to be 8-bit
>> characters (i.e., type char) and one can specify 7-bit, 8-bit, and
>> UTF-8 interpretations of the character stream (ML-lex only supports
>> 7-bit and 8-bit).
> Okay, maybe I need to rephrase my question as: If you tell it you
> want to use UTF-8 for the input stream, what type does yytext (or the
> equivalent) have? Is it just string, possibly containing sequences of
> high-bit characters?
> -- [Geoff Washburn|geoffw@cis.upenn.edu|http://www.cis.upenn.edu/~geoffw/]