[MLton] Parsing bug?

Matthew Fluet fluet@cs.cornell.edu
Mon, 6 Dec 2004 11:50:28 -0500 (EST)

> As far as I can see, the SML/NJ version gets it right. MLton doesn't.
> Even more disturbingly, MLton seems sensitive to how the input file is fed.
> If you run './train < train.input' you will get no output.
> If you run './train' and cut-and-paste in the input, you get:
> "klj"
> (5, 7)
> ... ie: it ignores the first line.
> Am I mistaken, or is this a bug?

It is a bug in the implementation of TextIO.getInstream.  For efficiency,
MLton implements an "imperative" ImperativeIO.  When realizing the
StreamIO.instream (as needed by scanStream), we empty the imperative
buffer into the StreamIO.instream, but we were forgetting to mark the
instream as a "stream" instream rather than an "imperative" instream.

If the StreamIO.instream was handed off to a scanner that succeeded with
an updated stream, we were fine, because we remembered the updated stream.
But if the scanner failed (as scanPairs does on your input), the buffered
input we had handed over was lost, and the next scan forced a fresh read.

This explains why there was different behavior with the input file.  When
reading from console stdin, the OS gives us a line at a time, so only the
first line of input was lost.  When reading from a file stdin, we take 4K
at a time, which was the whole of your input; so we lost everything.
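
To make the failure mode concrete, here is a toy model in Python.  It is
not MLton's code, only a sketch of the mechanism described above:
ToyInstream, scan_int, and the CONSOLE/FILE chunk lists are all made up
for illustration, and a "stream" is modelled as a plain string.  The
`buggy` flag stands in for forgetting to mark the instream as a "stream"
instream.

```python
class ToyInstream:
    """Toy model: an imperative buffer that can be handed out as a stream."""

    def __init__(self, chunks, buggy):
        self._source = list(chunks)  # one entry per read from the OS
        self._buffer = ""            # the "imperative" buffer
        self._stream = None          # stream contents, once in "stream" mode
        self._buggy = buggy

    def get_instream(self):
        if self._stream:             # stream mode with data still pending
            return self._stream
        if not self._buffer and self._source:
            self._buffer = self._source.pop(0)   # read one chunk
        data, self._buffer = self._buffer, ""    # empty the imperative buffer
        if not self._buggy:
            self._stream = data      # the fix: remember we are in stream mode
        return data

    def set_instream(self, rest):
        self._stream = rest

    def scan_stream(self, scanner):
        result = scanner(self.get_instream())
        if result is None:
            return None              # failed scan: set_instream is never called
        value, rest = result
        self.set_instream(rest)      # successful scan: updated stream is kept
        return value

    def input_line(self):
        line, newline, rest = self.get_instream().partition("\n")
        self.set_instream(rest)
        return line + newline        # "" models end of input


def scan_int(s):
    """Scan a leading decimal integer; return (value, rest) or None."""
    n = 0
    while n < len(s) and s[n] in "0123456789":
        n += 1
    return (int(s[:n]), s[n:]) if n else None


def line_after_failed_scan(chunks, buggy):
    """Run a scan that fails on the first line, then read one line."""
    strm = ToyInstream(chunks, buggy)
    assert strm.scan_stream(scan_int) is None    # "abc" is not an integer
    return strm.input_line()


CONSOLE = ["abc\n", "42\n"]   # the OS hands over one line per read
FILE = ["abc\n42\n"]          # the whole input fits in one 4K read
```

In this model, with buggy=True, line_after_failed_scan(CONSOLE, True)
skips "abc" and returns the second line, while
line_after_failed_scan(FILE, True) returns the empty string because the
single chunk held everything.  With buggy=False both return the first
line, since the failed scan leaves the stream where it was.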