[MLton] Henry's comments on the User Guide

Tue, 10 Feb 2004 04:20:58 -0800

> In the `Installation' section on page 3 the missing leading `/'  in  the  file
>     names  is confusing.  I know that this is to indicate that it is relative
>     to where you install MLton, but it isn't clear.  Also the statement  that
>     it  installs  `in root' is confusing.  It installs things under root, but
>     not at the top level.

I changed the text to

	MLton runs on a variety of platforms and is distributed in
	both source and binary form. The format for the binary package
	depends on the platform. The binary package will install under
	/usr or /usr/local, depending on the platform. If you install
	MLton somewhere else, you must set the lib variable in the
	bin/mlton script to the directory that contains the libraries
	(/usr/lib/mlton by default).

> The example on  page  3  of  -link-opt  is  bad  because  unless  you  change
>     /etc/ld.so.conf, /usr/lib is looked in by default.

What example directory would you suggest?

> The `supports the full SML 97 language' paragraph at the top of page 5 should
>     mention (or at least refer to) the known deviations of MLton.

I added a reference to the "bugs" section.

> In the `complete basis library' I think that it is worth mentioning  that  we
>     track  the new spec and that we include many optional things like IntInf.
>     (Perhaps just my prejudice.)

Done.

> In the `excellent running times' section on page 5, I don't  think  that  the
>     shootout web page is alive any more.

Yeah, I dropped it.

> In  the  `unboxed  native  arrays'  section  on  page  5,  it  might be worth
>     mentioning here that monomorphic arrays are just arrays in MLton.

Done.

> In the `runtime system supports large arrays' on page 5, can we really handle
>     arrays  (of  1,  8  or  16  bit  objects  I  assume)  up to 2^31 - 1 now?
>     Excellent.

Yes, I added this a while back.  The only limitations now are RAM size
and address space fragmentation.  I would love to see some testing of
very large arrays.  Here's the extent of my testing :-)

open Array
val a = array (valOf Int.maxInt div 2, #"a")
val _ = update (a, 0, #"b")
val _ = if sub (a, 0) = sub (a, 1)
	   then raise Fail "bug"
	else ()

> In the `standalone executables' section on page 5, you want to say:
> 
>     You don't need anything except the a.out and `standard' shared libraries
>         by  default.   (Here standard means not coming from MLton and already
>         on most systems.)
> 
>     You can  get  away  with  just  the  a.out  because  MLton  can  generate
>         statically linked executables if desired.

I changed it to

	MLton generates standalone executables.  No additional code or
	libraries are necessary in order to run an executable, except
	for the standard shared libraries.  MLton can also generate
	statically linked executables.

> In  the  `signal handlers' section on page 6, the use of the word `thread' is
>     confusing.   It  isn't  something  like  a  POSIX  thread,  but  a  MLton
>     construction.   I  don't  know what exactly to say, but as is it makes it
>     seem that MLton has `real' threads, which it does not.

I changed it to say "MLton thread" instead of thread.

	MLton supports signal handlers written in SML.  Signal
	handlers run in a separate MLton thread, and have access to
	the thread that was interrupted by the signal.  Signal
	handlers can be used in conjunction with threads to implement
	preemptive multitasking.

I also changed the thread bullet point to read

	MLton has support for its own threads, upon which either
	preemptive or non-preemptive multitasking can be implemented.
	At some point in the future, MLton will support CML.

> In the first paragraph of `Compile-time options' on page 8, what about .a and
>     .so files?

These are not allowed as files on the command line.  You can use
-link-opt to link with such files.

> In  the  `-cc-opt option' section on page 8, doesn't the option also get used
>     for .c (and .s) files even in the native mode?

Only for .c files.  You're right about native.  I've changed the text
to:

	Pass the option to gcc when compiling C code.

> In the `-export-header' section on page 8 you should mention that setting  it
>     to true not only outputs a C header file, it also stops compilation after
>     doing so.  I.e., it does NOT do the compile. 

Done.

>     Actually that choice  seems
>     a  bit strange since if compile time is huge then you have to do it twice
>     (although I assume that the true case is quicker).

The true case only has to do elaboration, so it is fast enough.

> In the `-inline' section on page 8 you have to at least say  something  about
>     the  units, even if it is only that they are arbitrary.  At the moment it
>     doesn't even say that the threshold is a size threshold, or a very rough
>     estimate of that.

I changed the text to

	Set the inlining threshold used in the optimizer.  The
	threshold is an approximate measure of code size of a
	procedure.  The default is 320.

> In the `-runtime' section on page 9, you don't mention in discussing multiple
>     uses that it is the LAST value of any parameter  which dominates.

I added

	If the same runtime switch occurs more than once, then the
	last setting will take effect.

>     Also
>     you  have  to  say that command line arguments (via @MLton) are processed
>     AFTER -runtime ones so that the result of -runtime can be overridden.

That was already there.  The text says that the -runtime argument
"will be processed before other @MLton command line switches".

>     Speaking of these multiple things, it is completely wrong  and  bad  that
>     you can use multiple
>         @MLton ... --
>     in  a  single  run.  (Discussed on page 10 in section 4.2 `Runtime system
>     options'.)  That means that it is absolutely impossible to call  a  MLton
>     executable  with  a  first  argument  of `@MLton' and a later argument of
>     `--'.  In particular, it is not possible to pass  arguments  to  a  MLton
>     executable  unless  you KNOW that they are NOT going to contain `@MLton'.
>     If you only allow 0 or 2 `@MLton's then I can wrap a MLton executable  in
>     a shell script:
>         exec mlton-executable @MLton -- "$@"
>     which will guarantee that any command line arguments are simply passed to
>     the actual ML code and not eaten by the runtime system.

I see the problem.  I think the best solution is to add a runtime
switch, "stop", which causes the runtime to stop processing @MLton
arguments once it reaches the next "--".  That way, you can even
compile an executable with "-runtime stop" and the executable won't
process any @MLton arguments.  Or, if you want to do a shell script
you can do

	exec mlton-executable @MLton stop -- "$@"
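
To make the proposal concrete, here is a minimal SML sketch of the
intended behavior.  It is only a model, not the runtime's code: the
names scanGroup and programArgs are made up for illustration,
"some-switch" is a placeholder rather than a real runtime switch, and
the sketch assumes the runtime only consumes "@MLton ... --" groups at
the front of the argument list.

```sml
(* Hypothetical model of the proposed "stop" switch; not MLton's runtime code.
   The runtime consumes leading "@MLton ... --" groups from the argument list.
   Once a group contains "stop", no later group is consumed. *)

(* Scan one group up to its closing "--".
   Returns (sawStop, argsAfterGroup).  An unterminated group consumes
   everything that remains. *)
fun scanGroup (args: string list, sawStop: bool): bool * string list =
   case args of
      [] => (sawStop, [])
    | "--" :: rest => (sawStop, rest)
    | "stop" :: rest => scanGroup (rest, true)
    | _ :: rest => scanGroup (rest, sawStop)

(* Return the arguments that the SML program itself would see. *)
fun programArgs (args: string list): string list =
   case args of
      "@MLton" :: rest =>
         let
            val (sawStop, rest') = scanGroup (rest, false)
         in
            if sawStop then rest' else programArgs rest'
         end
    | _ => args

(* With the wrapper's "@MLton stop --" in front, a later group survives:
   wrapped = ["@MLton", "some-switch", "--", "x"] *)
val wrapped =
   programArgs ["@MLton", "stop", "--", "@MLton", "some-switch", "--", "x"]

(* Without "stop", every leading group is eaten: unwrapped = ["x"] *)
val unwrapped =
   programArgs ["@MLton", "--", "@MLton", "some-switch", "--", "x"]
```

The point of the model is the early return when sawStop is true: the
wrapper script's own group is the last one the runtime looks at, so
whatever "$@" expands to reaches the SML code untouched.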

> In  the  `-show-basis'  and  `-show-basis-used'  section  on  page  9,  change
>     `displays'  to  `prints  to standard output' to be more clear and in sync
>     with -export-header.

I made them all consistently use "print".

>  Also it is  rather  confusing  because  -show-basis
>     causes the types to be displayed (but NOT the basis library itself) while
>     -show-basis-used just lists the things used, but not their types.

I changed them to use the same layout routine, so that all now
display the types.

>     In connection with this, I would love to  have  an  option  which  caused
>     MLton  to  write out the types for my code (like -show-basis does for the
>     basis).  It would be a very convenient place to get  a  summary  of  some
>     code.

Done.  Try "mlton -show-basis true foo.sml".

>     Note,  for  both  of these options and also -export-header, it might make
>     more sense to have them accept a file name  instead  of  the  true/false.
>     This  would  allow  them  to be combined.  Right now it seems that if you
>     turn both -show-basis and -show-basis-used on then  only  the  latter  is
>     done with no error message.

It's a mistake to do two (or more) things that print to standard output.
So, I think they should stay true/false, but should report an error if
more than one is on.  I've added a check that at most one of the
following is defined.

	-export-header
	-show-basis
	-show-basis-used
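
Roughly, the check amounts to the following.  This is a sketch of the
idea rather than the code in the compiler: the function names, the
representation of the options as (name, value) pairs, and the error
message are all assumptions made for the example.

```sml
(* Hypothetical sketch, not MLton's source: reject a command line on which
   more than one stdout-printing option is set to true. *)

(* Names of the options whose value is true. *)
fun enabled (flags: (string * bool) list): string list =
   List.map (fn (name, _) => name)
            (List.filter (fn (_, isOn) => isOn) flags)

(* NONE means the combination is fine; SOME msg is the error to report. *)
fun checkPrintOptions (flags: (string * bool) list): string option =
   case enabled flags of
      names as (_ :: _ :: _) =>
         SOME ("at most one of these may be true: "
               ^ String.concatWith ", " names)
    | _ => NONE

(* Only -show-basis is on, so this is accepted: ok = NONE *)
val ok =
   checkPrintOptions [("-export-header", false),
                      ("-show-basis", true),
                      ("-show-basis-used", false)]
```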

> In  the  `-no-load-world'  section on page 11 it is worth mentioning that the
>     reason for this is just for set-uid programs.

Done.  I wonder if no-load-world is almost unneeded now that one can
use -runtime stop?

> In the `-ram-slop' section on page 11 it is worth mentioning  that  x  should
>     probably  be no more than 1 and that making it strictly less than 1 is to
>     account for space used by the OS and other programs running at  the  same
>     time.

Done.

> The  fact  that  _import  and _export introduce phrases which are expressions
>     makes the choice of a trailing semicolon very bad.  This must mean  that,
>     for example, in
>         val z = _import "foo": real * char -> int;
>     the  semicolon  is  part  of  the  expression, right? 

Yes.

>     I understand the need to have a terminator for these expressions  because
>     they would otherwise end with a type which would make parsing tricky,

Right.

>     but
>     it seems that some other item would be much better.  How about
>         _import "foo": [real * char -> int]
>     or use `{' and `}' or a trailing `end'.

Instead of [] or {}, how about requiring the type to be parenthesized?
That works, parsing wise.  I wouldn't really mind "end" either,
although I think I would prefer parens.

But, I don't really see the problem with the current approach.  ";" is
no less ambiguous than "end", or parens for that matter.  The drawback
of changing is that it breaks old code, and that we can't support both
old and new (since the parser can't handle it).  I don't see that the
benefit of the change outweighs the drawback.

>     All of this really became apparent in the example in the `Calling from  C
>     to SML' section on page 12:
> 
>         _export "foo": real * char -> int;
>            (fn (x, c) => 13 + Real.floor x + Char.ord c)
> 
>     Yikes.

Yeah, that was a bit too cute. I rewrote it the way I always write
exports.

val e = _export "foo": real * char -> int;
val _ = e (fn (x, c) => 13 + Real.floor x + Char.ord c)

Here are a few other changes I made in response to the marked-up
hardcopy that you sent.

* mlprof now only displays a call graph when called with -call-graph
  true

* Yes, LargeWord = Word64, not Word32.

* For #line directives, the file name can not contain *).  If it does,
  then MLton will (correctly) end the comment.  For example, the code

	(*line 13.1 "foo*)"*)

  will cause an unclosed string error, because the first *) ends the
  comment, causing the second " to start a string.  So, I don't think
  there is any incompatibility with the definition regarding #line
  directives.

* I changed the type of MLton.Exn.topLevelHandler to "exn -> 'a".

* I changed the text describing MLton.Pointer.getX (p, i) to read

	returns the object stored at index i of the array of X
	objects pointed to by p.  For example, getWord32 (p, 7)
	returns the 32-bit word stored 28 bytes beyond p.

* MLton.ProcEnv.setenv doesn't require its arguments to be null
  terminated.  I added this to the documentation.

* The MLton.Process.spawn{,e,p} functions were patterned on the
  corresponding Posix.Process.exec{,e,p} functions.  I did change the
  arguments to be named records instead of tuples, which may have been
  a mistake.  I could change it back so that they look like the exec
  counterparts.  However, the record field names were taken straight
  from the basis library documentation.  I don't think path is such a
  bad name for the argument to spawn or spawne, since it is given a
  full path to an executable.  The only reason I didn't add

    val MLton.Process.spawnpe: 
      {file: string, args: string list, env: string list} -> pid

  is that there is no corresponding Posix.Process.execpe.  I could add
  MLton.Process.{exec,spawn}pe if needed.

* I clarified the semantics of MLton.Profile when compiling with
  -profile no.

* I think it is better to return NONE than raise an exception when
  /dev/{,u}random can't be read from, as will always happen on
  Cygwin. That way, a programmer must explicitly decide what to do.

* I made lots of changes to MLton.Signal.  See the latest user guide
  and the latest MLton structure.
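
On the /dev/{,u}random point above, this is the kind of caller code
that an option result is meant to force.  It is a sketch only:
readSeed stands in for whichever seed function is being called, and
both it and seedOrElse are hypothetical names, not part of the MLton
structure.

```sml
(* Hypothetical helper: readSeed stands in for a function that reads
   /dev/random or /dev/urandom and returns NONE when the device cannot
   be read.  The caller must supply what to do in that case. *)
fun seedOrElse (readSeed: unit -> word option, fallback: unit -> word): word =
   case readSeed () of
      SOME w => w
    | NONE => fallback ()

(* Simulate a platform where the device is never readable. *)
val unavailable: unit -> word option = fn () => NONE

(* The fallback is a fixed constant only to keep the sketch deterministic;
   a real program would pick something more suitable. *)
val seed = seedOrElse (unavailable, fn () => 0wx2A)
```

With an exception instead of NONE, nothing in the types would remind
the caller that the read can fail.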