[MLton] getting String.fromString to return NONE

Sun, 30 Jul 2006 13:01:54 -0700

> mlton 20041109 doesn't seem to want to ever return NONE from this
> function.  Rather it seems to be returning an acceptable prefix of the
> string I pass it.  For example
> 
>   String.fromString "4\t"
> 
> just returns SOME "4", and String.fromString "\n" returns SOME "". 
> 
> (As opposed to String.fromString "4\\t", which returns SOME "4\t", and
> String.fromString "\\n", which returns SOME "\n", as you might
> expect.)
>
> This return of a prefix doesn't seem called for by the Basis
> description, and it certainly makes it difficult to tell if the input
> is well-formed or not.  

Short answer: don't use String.fromString.  Long answer follows.

In my opinion, it is not a good idea to use String.fromString.  Its
specification is unclear and has changed over the years, and there is
much disagreement among implementations.

There was a discussion in September 2003 on the
basis-library-discussion list where we attempted to clear up some of
the issues.

  https://mailman.cs.uchicago.edu/mailman/private/sml-basis-discuss/2003-September/thread.html

  (Unfortunately, the list archives are closed to nonmembers.  I don't
   think there's any good reason for this, and you are welcome to send
   mail to the list administrator asking him to change the policy.)

On that thread, I pointed out the many differences between
implementations with the following table taking examples from the
spec (note that these were the implementations of almost three years
ago).

               spec           ML Kit  MLton  Mosml   Poly/ML  SML/NJ
               ----           ------  -----  ------  -------  ------
1. "\\q"       NONE                                  ""
2. "a\^D"      "a"            "a\^D"         "a\^D"           NONE
3. "a\\ \\\\q" "a"            NONE           NONE             NONE
4. "\\ \\"     ""             NONE    NONE   NONE    NONE     NONE
5. ""          ""                                    
6. "\\ \\\^D"  ""             "\^D"   NONE   "\^D"   NONE     NONE
7. "\\ a"      ""             NONE    NONE   NONE    NONE     NONE

Michael, here are your examples, with today's implementations.

                spec   Hamlet  ML Kit  MLton  Mosml  SML/NJ
                -----  ------  ------  -----  -----  ------
 8. "4\t"       "4"    "4"     "4\t"   "4"    "4\t"  NONE
 9. "\n"        ""     NONE    "\n"    ""     "\n"   NONE
10. "4\\t"      "4\t"  "4\t"   "4\t"   "4\t"  "4\t"  "4\t"
11. "\\n"       "\n"   "\n"    "\n"    "\n"   "\n"   "\n"

So, there are still a lot of differences.  I closed the thread from a
few years ago with the following note.

  Here is my current understanding of the proposed changes
  and clarifications.  I've added a couple of logical (I hope)
  consequences regarding Char.scan.

  1. Char.scan returns NONE upon double quote.
  2. String.scan stops upon double quote, and returns the output to that
     point.
  3. String.scan consumes format sequences.
  4. Char.scan consumes format sequences adjacent to a valid character.
     I mention this because of pathological character constants like
     #"a\ \", #"\ \a", #"\ \a\ \", and #"\ \\ \a".
  5. Char.scan does not consume format sequences if it is unable to
     produce a character.  Consider the invalid character constant 
     #"\ \".
  6. String.scan never returns NONE.

  The logic behind all of this is that Char.scan scans the character
  source as if the characters appeared following a #" in SML and that
  String.scan scans the character source as if the characters appeared
  following a " in SML.  If indeed this is the idea, it would be good to
  add something to the spec to this effect.

To me it all came down to this last paragraph, which gave a nice
intuition behind all the character and string scanning stuff.
However, that never made it into the spec, and there was no further
discussion.  Nevertheless, I put my proposal into MLton's basis
library later in 2003.
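
If you want to see what a given implementation does with a particular
input, a small probe along these lines may help.  This is only a
sketch: the name `probe` is made up for illustration, and it assumes
your Basis provides Substring.full (older libraries called it
Substring.all).  It runs String.scan over a substring reader and
returns both the scanned string and whatever was left unconsumed.

```sml
(* Hypothetical helper: scan a string and also report the unconsumed rest.
   String.scan takes a character reader; Substring.getc is such a reader. *)
fun probe (input : string) : (string * string) option =
  Option.map
    (fn (scanned, rest) => (scanned, Substring.string rest))
    (String.scan Substring.getc (Substring.full input))
```

The remainder is what tells you where scanning stopped, which is
exactly the information that String.fromString throws away.  What you
get back for inputs like "4\t" will of course vary by implementation,
as the tables above show.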

According to my logic that String.fromString scans the characters as if
they appeared following a double quote in SML, it seems reasonable
that NONE would never be returned (the empty string is a valid prefix)
and that tabs and newlines would not be allowed, since they are not
allowed in SML strings (some implementations don't even agree on that:
Hamlet and the ML Kit allow tabs, and Hamlet allows newlines).  The
basis spec even as it stands says that

  The rule is that if any prefix of the input is successfully scanned,
  including an escaped formatting sequence, the functions returns some
  string. They only return NONE in the case where the prefix of the
  input cannot be scanned at all.
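
Given that rule, one way to tell whether the whole input was
well-formed is to call String.scan yourself and check that nothing is
left over.  The following is a minimal sketch, not part of any
implementation's library; `fromStringStrict` is a name I am making up
here, and the result still depends on how your implementation's
String.scan treats the input.

```sml
(* Hypothetical wrapper: SOME s only when String.scan consumed all of the
   input; NONE if scanning failed or stopped before the end. *)
fun fromStringStrict (input : string) : string option =
  case String.scan Substring.getc (Substring.full input) of
    NONE => NONE
  | SOME (scanned, rest) =>
      if Substring.isEmpty rest then SOME scanned else NONE
```

On an implementation that stops at a raw tab, as MLton does, this
turns the silent prefix result into NONE.  It does not remove the
disagreements between implementations about what counts as scannable.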

In any case, all this is to say that you're much better off deciding what
you want and coding it yourself.  You might even grab portions of the
basis library implementation from the implementation that is closest
to what you want.
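
As a rough illustration of coding it yourself, here is a sketch of a
strict decoder.  The function name `unescape` and the set of accepted
escapes (\n, \t, \\ and \") are assumptions chosen for the example,
not anything taken from the Basis spec; it rejects raw control
characters, unescaped double quotes, and unknown escapes instead of
returning a prefix.

```sml
(* Hypothetical strict decoder.  Accepts printable characters and the
   escapes \n, \t, \\ and \"; returns NONE on anything else. *)
fun unescape (input : string) : string option =
  let
    (* acc holds the decoded characters in reverse order *)
    fun loop ([], acc) = SOME (String.implode (List.rev acc))
      | loop (#"\\" :: c :: rest, acc) =
          (case c of
             #"n" => loop (rest, #"\n" :: acc)
           | #"t" => loop (rest, #"\t" :: acc)
           | #"\\" => loop (rest, #"\\" :: acc)
           | #"\"" => loop (rest, #"\"" :: acc)
           | _ => NONE)                 (* unknown escape *)
      | loop ([#"\\"], _) = NONE        (* dangling backslash *)
      | loop (c :: rest, acc) =
          if Char.isPrint c andalso c <> #"\""
          then loop (rest, c :: acc)
          else NONE                     (* raw tab, newline, quote, etc. *)
  in
    loop (String.explode input, [])
  end
```

With this definition, unescape "4\\t" gives SOME "4\t", while
unescape "4\t" and unescape "\n" both give NONE, so ill-formed input
is always distinguishable from well-formed input.  Extending it to the
other SML escapes (\ddd, \uxxxx, \^c, format sequences) is a matter of
adding cases for whichever ones you decide you want.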