[MLton] getting String.fromString to return NONE
Stephen Weeks
sweeks@sweeks.com
Sun, 30 Jul 2006 13:01:54 -0700
> mlton 20041109 doesn't seem to want to ever return NONE from this
> function. Rather it seems to be returning an acceptable prefix of the
> string I pass it. For example
>
> String.fromString "4\t"
>
> just returns SOME "4", and String.fromString "\n" returns SOME "".
>
> (As opposed to String.fromString "4\\t", which returns SOME "4\t", and
> String.fromString "\\n", which returns SOME "\n", as you might
> expect.)
>
> This return of a prefix doesn't seem called for by the Basis
> description, and it certainly makes it difficult to tell if the input
> is well-formed or not.
Short answer: don't use String.fromString. Long answer follows.
In my opinion, it is not a good idea to use String.fromString. Its
specification is unclear and has changed over the years, and there is
much disagreement among implementations.
There was a discussion in September 2003 on the
basis-library-discussion list where we attempted to clear up some of
the issues.
https://mailman.cs.uchicago.edu/mailman/private/sml-basis-discuss/2003-September/thread.html
(Unfortunately, the list archives are closed to nonmembers. I don't
think there's any good reason for this, and you are welcome to send
mail to the list administrator asking him to change the policy.)
On that thread, I pointed out the many differences between
implementations with the following table taking examples from the
spec (note that these were the implementations of almost three years
ago).
spec ML Kit MLton Mosml Poly/ML SML/NJ
---- ------ ----- ------ ------- ------
1. "\\q" NONE ""
2. "a\^D" "a" "a\^D" "a\^D" NONE
3. "a\\ \\\\q" "a" NONE NONE NONE
4. "\\ \\" "" NONE NONE NONE NONE NONE
5. "" ""
6. "\\ \\\^D" "" "\^D" NONE "\^D" NONE NONE
7. "\\ a" "" NONE NONE NONE NONE NONE
Michael, here are your examples, with today's implementations.
     input    spec    Hamlet  ML Kit  MLton   Mosml   SML/NJ
     -----    ----    ------  ------  -----   -----   ------
 8.  "4\t"    "4"     "4"     "4\t"   "4"     "4\t"   NONE
 9.  "\n"     ""      NONE    "\n"    ""      "\n"    NONE
10.  "4\\t"   "4\t"   "4\t"   "4\t"   "4\t"   "4\t"   "4\t"
11.  "\\n"    "\n"    "\n"    "\n"    "\n"    "\n"    "\n"
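To check rows 8-11 against whatever implementation is at hand, a small
probe along the following lines should be enough. This is only a sketch:
`show` and `inputs` are names made up here, and the output will differ
from one implementation to the next, which is the whole point.

```sml
(* Render an option result the way the table does: NONE, or the string
   re-escaped and wrapped in double quotes. *)
fun show NONE = "NONE"
  | show (SOME s) = "\"" ^ String.toString s ^ "\""

(* The four inputs from rows 8-11, written as SML string literals. *)
val inputs = ["4\t", "\n", "4\\t", "\\n"]

(* Print one line per input: the input, then what fromString made of it. *)
val () =
  List.app
    (fn s => print (show (SOME s) ^ " -> " ^ show (String.fromString s) ^ "\n"))
    inputs
```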
So, there are still a lot of differences. I closed the thread from a
few years ago with the following note.
Here is my current understanding of the proposed changes
and clarifications. I've added a couple of logical (I hope)
consequences regarding Char.scan.
1. Char.scan returns NONE upon double quote.
2. String.scan stops upon double quote, and returns the output to that
point.
3. String.scan consumes format sequences.
4. Char.scan consumes format sequences adjacent to a valid character.
I mention this because of pathological character constants like
#"a\ \", #"\ \a", #"\ \a\ \", and #"\ \\ \a".
5. Char.scan does not consume format sequences if it is unable to
produce a character. Consider the invalid character constant
#"\ \".
6. String.scan never returns NONE.
The logic behind all of this is that Char.scan scans the character
source as if the characters appeared following a #" in SML and that
String.scan scans the character source as if the characters appeared
following a " in SML. If indeed this is the idea, it would be good to
add something to the spec to this effect.
To me it all came down to this last paragraph, which gave a nice
intuition behind all the character and string scanning stuff.
However, that never made it into the spec, and there was no further
discussion. Nevertheless, I put my proposal into MLton's basis
library later in 2003.
According to my logic that String.fromString scans the characters as if
they appeared following a double quote in SML, it seems reasonable
that NONE would never be returned (the empty string is a valid prefix)
and that tabs and newlines would not be allowed, since they are not
allowed in SML strings (some implementations don't even agree on that:
Hamlet and the ML Kit allow tabs, and Hamlet allows newlines). The
basis spec even as it stands says that
The rule is that if any prefix of the input is successfully scanned,
including an escaped formatting sequence, the functions returns some
string. They only return NONE in the case where the prefix of the
input cannot be scanned at all.
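One way to see how much of the input was actually accepted is to call
String.scan directly with a substring reader and look at what is left
unread. The following is a minimal sketch, not anything taken from an
implementation: `scanWithRest` and `fromStringStrict` are hypothetical
names, and what counts as "left unread" for inputs such as "4\t" still
depends on the String.scan underneath.

```sml
(* scanWithRest is a hypothetical helper, not a Basis function.
   It runs String.scan over a substring reader and returns the scanned
   string together with whatever input was left unread. *)
fun scanWithRest (s : string) : (string * string) option =
  case String.scan Substring.getc (Substring.full s) of
      NONE => NONE
    | SOME (scanned, rest) => SOME (scanned, Substring.string rest)

(* Strict variant (also hypothetical): succeed only when the scan
   consumed the entire input, so a mere prefix counts as failure. *)
fun fromStringStrict (s : string) : string option =
  case scanWithRest s of
      SOME (scanned, "") => SOME scanned
    | _ => NONE
```

On an implementation that stops at a raw tab or newline, this should turn
the SOME "4" and SOME "" results into NONE, while leaving "4\\t" and
"\\n" alone.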
In any case, all this to say that you're much better off deciding what
you want and coding it yourself. You might even grab portions of the
basis library implementation from the implementation that is closest
to what you want.
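As an illustration of coding it yourself, here is a sketch of a decoder
with one deliberately narrow policy. Everything about it is an assumption
made for the example: the name `unescape`, the choice to accept only
printable characters plus the escapes \n, \t, \\ and \", and the choice
to reject the whole input (rather than return a prefix) on anything else.

```sml
(* unescape is a hypothetical function, not part of the Basis Library.
   Policy assumed for this sketch: printable characters other than an
   unescaped double quote are copied through; the escapes \n, \t, \\
   and \" are decoded; anything else makes the whole input invalid. *)
fun unescape (s : string) : string option =
  let
    (* acc holds the decoded characters in reverse order. *)
    fun loop ([], acc) = SOME (String.implode (rev acc))
      | loop (#"\\" :: c :: rest, acc) =
          (case c of
               #"n" => loop (rest, #"\n" :: acc)
             | #"t" => loop (rest, #"\t" :: acc)
             | #"\\" => loop (rest, #"\\" :: acc)
             | #"\"" => loop (rest, #"\"" :: acc)
             | _ => NONE)                  (* unknown escape *)
      | loop ([#"\\"], _) = NONE           (* trailing backslash *)
      | loop (c :: rest, acc) =
          if Char.isPrint c andalso c <> #"\""
          then loop (rest, c :: acc)
          else NONE                        (* raw tab, newline, quote, ... *)
  in
    loop (String.explode s, [])
  end
```

With this policy, ill-formed input is always reported as NONE, so the
caller never has to guess whether a result is only a prefix.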