[MLton] WideChar

Henry Cejtin henry@sourcelight.com
Fri, 10 Dec 2004 17:15:33 -0600


With regard to the exception vs. NONE, I guess I think of there being a
function, UTF8decode or something, which is given a string (really it should
be bytes, but this is a convenience) and returns a wide string. It should
raise an exception (probably a new one) when the input is not legal UTF-8.
Then the UTF-8 scanners are just the wide scanners composed with UTF8decode.
The logic for it being an exception is that the input is not just "not an
integer" (or whatever) but really nonsense. (Of course I understand that
here convenience is probably more important than any logical argument, but I
don't see them in conflict.)
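To make the proposed split concrete, here is a minimal sketch in Python (not SML, and not MLton's actual Basis code). All names are hypothetical: utf8_decode stands in for UTF8decode, UTF8DecodeError is the "new exception", scan_int_wide plays the role of a wide scanner that returns None (like NONE) for input that is merely not an integer, and compose_utf8 builds a UTF-8 scanner by composition. It leans on Python's strict UTF-8 codec instead of a hand-written decoder.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class UTF8DecodeError(Exception):
    """Hypothetical new exception: the input bytes are not legal UTF-8."""


def utf8_decode(data: bytes) -> str:
    """Hypothetical UTF8decode: bytes in, wide (Unicode) string out.

    Malformed input raises rather than returning None, since it is
    nonsense rather than merely "not an integer".
    """
    try:
        # Python's strict codec rejects overlong forms and encoded surrogates.
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as e:
        raise UTF8DecodeError(str(e)) from e


def scan_int_wide(s: str) -> Optional[int]:
    """Hypothetical wide scanner: None (like SML's NONE) if s is not an integer."""
    s = s.strip()
    # Accept SML-style "~" as well as "-" for negation.
    digits = s[1:] if s[:1] in ("+", "-", "~") else s
    # Only ASCII digits count; int() alone would also accept other Unicode digits.
    if not digits or not all("0" <= c <= "9" for c in digits):
        return None
    value = int(digits)
    return -value if s[0] in ("-", "~") else value


def compose_utf8(wide_scanner: Callable[[str], Optional[T]]) -> Callable[[bytes], Optional[T]]:
    """A UTF-8 scanner is just the wide scanner composed with utf8_decode."""
    return lambda data: wide_scanner(utf8_decode(data))


scan_int_utf8 = compose_utf8(scan_int_wide)
```

With this split, scan_int_utf8(b"42") gives 42, scan_int_utf8(b"4x2") gives None, and scan_int_utf8(b"\xff") raises UTF8DecodeError, so callers can tell "not a number" apart from "not UTF-8 at all".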

As to the tables, note that most characters NEVER occur in a lex spec, or
else occur only as part of some large range, so the initial transformation
(wide char to equivalence class) has a rather small range (dozens to about
100). I certainly never did this for Unicode, so I just used a flat array,
but I would think that some really simple compression would be enough here.
(I'm sure that we are saying the same thing.)
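As one illustration of "really simple compression", here is a hedged Python sketch; it is not the flat-array scheme described above, and the names, the 256-entry block size, and the range-list input format are all assumptions. Characters are grouped into equivalence classes by which of the spec's inclusive ranges they fall in, and the resulting 0x110000-entry flat array is split into fixed-size blocks, with identical blocks stored once (a two-level table).

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

MAX_CODE_POINT = 0x10FFFF
BLOCK_BITS = 8                      # assumed block size: 256 code points
BLOCK_SIZE = 1 << BLOCK_BITS


def equivalence_classes(ranges: Sequence[Tuple[int, int]]) -> List[int]:
    """Flat array mapping each code point to a class id.

    Two code points share a class iff they lie in exactly the same set of
    the inclusive (lo, hi) ranges mentioned by the (hypothetical) lex spec.
    """
    # Membership can only change at a range start or one past a range end.
    cuts = {0, MAX_CODE_POINT + 1}
    for lo, hi in ranges:
        cuts.add(lo)
        cuts.add(hi + 1)
    points = sorted(cuts)

    class_of_signature: Dict[FrozenSet[int], int] = {}
    flat: List[int] = []
    for start, end in zip(points, points[1:]):
        # Every code point in [start, end) is in the same ranges as start.
        sig = frozenset(i for i, (lo, hi) in enumerate(ranges) if lo <= start <= hi)
        cls = class_of_signature.setdefault(sig, len(class_of_signature))
        flat.extend([cls] * (end - start))
    return flat


def compress(flat: List[int]) -> Tuple[List[int], List[Tuple[int, ...]]]:
    """Two-level table: index[cp >> BLOCK_BITS] picks a shared block of classes."""
    block_id: Dict[Tuple[int, ...], int] = {}
    blocks: List[Tuple[int, ...]] = []
    index: List[int] = []
    for base in range(0, len(flat), BLOCK_SIZE):
        block = tuple(flat[base:base + BLOCK_SIZE])
        if block not in block_id:          # store each distinct block once
            block_id[block] = len(blocks)
            blocks.append(block)
        index.append(block_id[block])
    return index, blocks


def lookup(index: List[int], blocks: List[Tuple[int, ...]], cp: int) -> int:
    return blocks[index[cp >> BLOCK_BITS]][cp & (BLOCK_SIZE - 1)]


# Example spec: [a-z], [0-9], and one large range (CJK Unified Ideographs).
ranges = [(ord("a"), ord("z")), (ord("0"), ord("9")), (0x4E00, 0x9FFF)]
flat = equivalence_classes(ranges)
index, blocks = compress(flat)      # 4 classes, 3 distinct 256-entry blocks
```

For this example the two-level table holds 4352 index entries plus 3 blocks of 256, about 5K entries instead of roughly 1.1M in the flat array, because everything outside the few mentioned ranges collapses into one repeated all-same block.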