[MLton] Re: [Sml-basis-discuss] Unicode and WideChar support
Geoffrey Alan Washburn
geoffw@cis.upenn.edu
Tue, 29 Nov 2005 11:30:36 -0500
Aaron Turon wrote:
> I have been working with John Reppy on a (largely)
> backwards-compatible replacement for ML-lex. The new tool is based on
> Brzozowski's notion of regular expression derivatives[1], making it
> easy to support boolean operations on REs such as intersection and
> negation. Code generation is not finalized, but will most likely be
> control-flow-based (one function per state, with tail calls) rather
> than table-based.
>
> We have designed the tool to support unicode. I hope to have an
> initial version out for testing some time next month -- please feel
> free to send mail with suggestions or requests.
>
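For readers who have not met derivatives before, here is a minimal sketch of the idea. It is written in Python rather than SML, it is not taken from the tool described above, and every name in it is hypothetical. The derivative of an RE with respect to a character c is the RE matching whatever may follow c; intersection and negation then need only one extra case each.

```python
from dataclasses import dataclass


class Re:
    """Base class for regular expression nodes (hypothetical names)."""


@dataclass(frozen=True)
class Empty(Re):   # matches no string at all
    pass


@dataclass(frozen=True)
class Eps(Re):     # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Re):     # matches the single character c
    c: str


@dataclass(frozen=True)
class Cat(Re):     # concatenation: a followed by b
    a: Re
    b: Re


@dataclass(frozen=True)
class Alt(Re):     # union: a or b
    a: Re
    b: Re


@dataclass(frozen=True)
class And(Re):     # intersection: a and b
    a: Re
    b: Re


@dataclass(frozen=True)
class Not(Re):     # negation: anything a does not match
    a: Re


@dataclass(frozen=True)
class Star(Re):    # zero or more repetitions of a
    a: Re


def nullable(r: Re) -> bool:
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, (Cat, And)):
        return nullable(r.a) and nullable(r.b)
    if isinstance(r, Alt):
        return nullable(r.a) or nullable(r.b)
    if isinstance(r, Not):
        return not nullable(r.a)
    raise TypeError(r)


def deriv(r: Re, c: str) -> Re:
    """Derivative of r with respect to the character c."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.a, c), deriv(r.b, c))
    if isinstance(r, And):
        return And(deriv(r.a, c), deriv(r.b, c))
    if isinstance(r, Not):
        return Not(deriv(r.a, c))
    if isinstance(r, Star):
        return Cat(deriv(r.a, c), r)
    if isinstance(r, Cat):
        left = Cat(deriv(r.a, c), r.b)
        # If a can match "", c may also be consumed by b.
        return Alt(left, deriv(r.b, c)) if nullable(r.a) else left
    raise TypeError(r)


def matches(r: Re, s: str) -> bool:
    """Take one derivative per character, then test for nullability."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)


# Any string over {a, b} except "aa", written with And and Not.
any_ab = Star(Alt(Chr("a"), Chr("b")))
not_aa = And(any_ab, Not(Cat(Chr("a"), Chr("a"))))
```

A lexer generator would additionally simplify the derivatives and treat each distinct one as a DFA state; this sketch skips that step and only decides membership.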
This would be great. In the past, to handle some ad hoc uses of
UTF-8 in my parsers, I've had to build a custom version of ml-lex
with CharSetSize > 129.
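To illustrate why a larger character set is needed, here is a small Python sketch (the helper name is hypothetical, and it assumes the stock tables cover only 7-bit input): every byte of a non-ASCII character's UTF-8 encoding is 128 or above, so a byte-oriented lexer has to accept the full 8-bit range.

```python
def max_byte(s: str) -> int:
    """Largest byte value in the UTF-8 encoding of s (s must be non-empty)."""
    return max(s.encode("utf-8"))


ascii_only = "val x = 1"
with_lambda = "val \u03bb = 1"  # U+03BB GREEK SMALL LETTER LAMDA

# ASCII text stays within 0..127.
print(max_byte(ascii_only) < 128)           # True

# U+03BB encodes as the two bytes 0xCE 0xBB, both >= 128.
print(list("\u03bb".encode("utf-8")))       # [206, 187]
print(max_byte(with_lambda) >= 128)         # True
```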
Though, given that there isn't yet an agreed-upon Basis module for
Unicode, what does your lexer generate in terms of strings?
--
[Geoff Washburn|geoffw@cis.upenn.edu|http://www.cis.upenn.edu/~geoffw/]