[MLton] WideChar
Henry Cejtin
henry@sourcelight.com
Fri, 10 Dec 2004 17:15:33 -0600
With regard to the exception vs. NONE, I guess that I think of there being a
function, UTF8decode or something, which is given a string (really it should
be bytes, but this is a convenience) and it returns a wide string. It should
have an exception (probably a new one) which is raised when the input is not
legal UTF8. Then the UTF8-scanners are just the wide scanner composed with
UTF8decode. The logic for making it an exception is that the input is not
merely a non-integer (or whatever) but really nonsense. (Of course I
understand that here convenience is probably more important than any logical
argument, but I don't see them in conflict.)
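
To make the composition concrete, here is a minimal sketch in Python rather than SML. The names MalformedUTF8, utf8_decode, scan_int_wide and scan_int_utf8 are made up for illustration and are not part of any MLton or Basis Library interface. The point is only that "not an integer" comes back as None while "not legal UTF-8" raises.

```python
class MalformedUTF8(Exception):
    """Hypothetical exception: the input bytes are not legal UTF-8."""


def utf8_decode(data: bytes) -> str:
    """Decode bytes to a wide string, raising MalformedUTF8 on bad input."""
    try:
        return data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as err:
        raise MalformedUTF8(str(err)) from err


ASCII_DIGITS = "0123456789"


def scan_int_wide(text: str):
    """Wide scanner: return (value, rest), or None if no integer prefix."""
    end = 0
    while end < len(text) and text[end] in ASCII_DIGITS:
        end += 1
    if end == 0:
        return None  # well-formed text that simply is not an integer
    return int(text[:end]), text[end:]


def scan_int_utf8(data: bytes):
    """UTF-8 scanner: just the wide scanner composed with utf8_decode."""
    return scan_int_wide(utf8_decode(data))
```

With this shape every UTF-8 scanner shares one decoding step, and a caller who prefers NONE for everything can catch the exception once at the outer level.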

As to the tables, note that most characters NEVER occur in a lex spec, or
else only occur in a large range, so the initial transformation (wide char to
equivalence class) has a rather small range (dozens to about 100 classes). I certainly
never did this for Unicode, so I just used a flat array, but I would think
that some really simple compression would be enough here. (I'm sure that we
are saying the same thing.)
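
One possible form of "really simple compression", sketched in Python under my own assumptions (this is not what any existing lexer generator does, and build_class_table and char_class are hypothetical names): store only the code points where the equivalence class changes, and binary-search that sorted list. Each character set from the spec is given as a list of inclusive (lo, hi) code point ranges.

```python
import bisect

MAX_CODE_POINT = 0x10FFFF


def build_class_table(char_sets):
    """Build (starts, classes): classes[i] applies from starts[i] up to
    the next start. Two code points share a class exactly when they
    belong to the same subset of the given character sets."""
    cuts = {0}
    for ranges in char_sets:
        for lo, hi in ranges:
            cuts.add(lo)
            if hi < MAX_CODE_POINT:
                cuts.add(hi + 1)
    starts, classes, ids = [], [], {}
    for start in sorted(cuts):
        # Membership is constant between cuts, so testing the first
        # code point of each interval is enough.
        signature = frozenset(
            i for i, ranges in enumerate(char_sets)
            if any(lo <= start <= hi for lo, hi in ranges)
        )
        cls = ids.setdefault(signature, len(ids))
        if not classes or classes[-1] != cls:  # merge equal neighbours
            starts.append(start)
            classes.append(cls)
    return starts, classes


def char_class(table, ch):
    """Map a wide character to its equivalence class by binary search."""
    starts, classes = table
    return classes[bisect.bisect_right(starts, ord(ch)) - 1]
```

For a spec that mentions only digits, ASCII letters and "anything non-ASCII", the table has a handful of entries even though it covers all of Unicode. A flat array indexed by code point would be faster per lookup, at the cost of about a million entries.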