MLton 20241230

This page contains brief explanations of some recurring sources of confusion and problems that SML newbies encounter.

Many confusions about the syntax of SML seem to arise from the use of an interactive REPL (Read-Eval-Print Loop) while trying to learn the basics of the language. While writing your first SML programs, you should keep the source code of your programs in a form that is accepted by an SML compiler as a whole.

The and keyword

It is a common mistake to misuse the and keyword or to not know how to introduce mutually recursive definitions. The purpose of the and keyword is to introduce mutually recursive definitions of functions and datatypes. For example,

fun isEven 0w0 = true
  | isEven 0w1 = false
  | isEven n = isOdd (n-0w1)
and isOdd 0w0 = false
  | isOdd 0w1 = true
  | isOdd n = isEven (n-0w1)

and

datatype decl = VAL of id * pat * expr
           (* | ... *)
     and expr = LET of decl * expr
           (* | ... *)

You can also use and as a shorthand in a couple of other places, but it is not necessary.
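
As a small illustration of the shorthand use, and can join several val or type bindings into one declaration. This is only a sketch; the names x, y, point and name are made up for the example, and the same effect is obtained by writing each binding as its own declaration.

```sml
(* Shorthand: two value bindings in a single val declaration. *)
val x = 1 and y = 2

(* Shorthand: two type abbreviations in a single type declaration. *)
type point = int * int and name = string

(* The same bindings written without and would be:
     val x = 1
     val y = 2
     type point = int * int
     type name = string *)
val origin : point = (0, 0)
val sum = x + y
```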

Constructed patterns

It is a common mistake to forget to parenthesize constructed patterns in fun bindings. Consider the following invalid definition:

fun length nil = 0
  | length h :: t = 1 + length t

The pattern h :: t needs to be parenthesized:

fun length nil = 0
  | length (h :: t) = 1 + length t

The parentheses are needed, because a fun definition may have multiple consecutive constructed patterns through currying.
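
To make the currying point concrete, here is a sketch of a curried function whose clauses take two list patterns in a row. The function zipWith is a hypothetical helper written for this example; without the parentheses there would be no way to tell where one argument pattern ends and the next begins.

```sml
(* Three curried arguments: a function and two lists.
   Each constructed pattern is parenthesized separately. *)
fun zipWith f (x :: xs) (y :: ys) = f (x, y) :: zipWith f xs ys
  | zipWith _ _ _ = nil

val sums = zipWith (op +) [1, 2, 3] [10, 20, 30]
```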

The same applies to nonfix constructors. For example, the parentheses in

fun valOf NONE = raise Option
  | valOf (SOME x) = x

are required. However, the outermost constructed pattern in a fn or case expression need not be parenthesized, because in those cases there is always just one constructed pattern. So, both

val valOf = fn NONE => raise Option
             | SOME x => x

and

fun valOf x = case x of
                 NONE => raise Option
               | SOME x => x

are fine.

Declarations and expressions

It is a common mistake to confuse expressions and declarations. Normally an SML source file should only contain declarations. The following are declarations:

datatype dt = ...
fun f ... = ...
functor Fn (...) = ...
infix ...
infixr ...
local ... in ... end
nonfix ...
open ...
signature SIG = ...
structure Struct = ...
type t = ...
val v = ...

Note that

let ... in ... end

isn’t a declaration.
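
A short sketch of the difference, using made-up names (secret, getSecret, answer): local ... in ... end can stand on its own in a source file, whereas let ... in ... end is an expression and so has to appear inside a declaration such as a val binding.

```sml
(* local is a declaration: it can appear directly in a source file. *)
local
  val secret = 42
in
  fun getSecret () = secret
end

(* let is an expression: it must sit on the right-hand side of a declaration. *)
val answer =
  let
    val x = 20
  in
    x + 22
  end
```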

To specify a side-effecting computation in a source file, you can write:

val () = ...
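
For instance, the following sketch runs two side effects at the point where the declarations are evaluated. The name counter is an assumption made for the example; the pattern () only documents that the right-hand side returns unit.

```sml
val counter = ref 0

(* Each val () = ... declaration evaluates its right-hand side for effect. *)
val () = counter := !counter + 1
val () = print ("counter = " ^ Int.toString (!counter) ^ "\n")
```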

Equality types

SML has a fairly intricate built-in notion of equality. See EqualityType and EqualityTypeVariable for a thorough discussion.
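
As a brief illustration only (the pages mentioned above remain the place for details), the sketch below defines a hypothetical member function. Its equality type variable ''a restricts it to types that admit equality; real is not such a type in SML'97, so reals are compared with Real.== from the Basis Library instead.

```sml
(* ''a ranges only over types that admit equality. *)
fun member (x : ''a, xs : ''a list) : bool =
  List.exists (fn y => y = x) xs

val hasThree = member (3, [1, 2, 3])      (* int admits equality *)
val hasB = member ("b", ["a", "c"])       (* string admits equality *)

(* member (1.0, [1.0]) would be rejected: real is not an equality type. *)
val sameReal = Real.== (1.0, 1.0)
```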

Nested cases

It is a common mistake to write nested case expressions without the necessary parentheses. See UnresolvedBugs for a discussion.
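
The sketch below shows the parenthesized form, using a hypothetical function classify. A case expression extends as far to the right as possible, so without the parentheses around the first inner case, the outer | SOME _ clause would be read as one more clause of that inner case.

```sml
fun classify (x : int option, y : int option) : string =
  case x of
     NONE =>
       (case y of
           NONE => "neither"
         | SOME _ => "only y")
   | SOME _ =>
       (case y of
           NONE => "only x"
         | SOME _ => "both")
```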

(op *)

It used to be a common mistake to parenthesize op * as (op *). Before SML'97, *) was considered a comment terminator in SML and caused a syntax error. At the time of writing, SML/NJ still rejects the code. An extra space may be used for portability: (op * ). However, parenthesizing op is redundant, even though it is a widely used convention.
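
Both portable spellings are sketched below with made-up names. The first keeps the conventional parentheses and adds the extra space; the second drops the parentheses altogether, since op * is already an atomic expression.

```sml
(* Parenthesized, with a space so that "*)" never appears. *)
val productSpaced = foldl (op * ) 1 [1, 2, 3, 4]

(* Unparenthesized: op * can be passed directly as an argument. *)
val productBare = foldl op * 1 [1, 2, 3, 4]
```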

Overloading

A number of standard operators (+, -, ~, *, <, >, …) and numeric constants are overloaded for some of the numeric types (int, real, word). It is a common surprise that definitions using overloaded operators such as

fun min (x, y) = if y < x then y else x

are not overloaded themselves. SML doesn’t really support (user-defined) overloading or other forms of ad hoc polymorphism. In cases such as the above where the context doesn’t resolve the overloading, expressions using overloaded operators or constants get assigned a default type. The above definition gets the type

val min : int * int -> int

See Overloading and TypeIndexedValues for further discussion.
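
Two common workarounds are sketched here; the names minReal and minBy are assumptions made for the example. A type annotation resolves the overloading to a type other than the default, and passing the comparison as an argument gives one definition that works at any type.

```sml
(* The annotation on x resolves < to real, so this is real * real -> real. *)
fun minReal (x : real, y) = if y < x then y else x

(* The comparison is an explicit parameter, so no overloading is involved:
   ('a * 'a -> bool) -> 'a * 'a -> 'a *)
fun minBy lt (x, y) = if lt (y, x) then y else x

val smallestReal = minReal (2.5, 1.5)
val smallestWord = minBy Word.< (0w3, 0w7)
val smallestString = minBy String.< ("pear", "apple")
```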

Semicolons

It is a common mistake to use redundant semicolons in SML code. This is probably caused by the fact that in an SML REPL, a semicolon (and enter) is used to signal the REPL that it should evaluate the preceding chunk of code as a unit. In SML source files, semicolons are really needed in only two places. Namely, in expressions of the form

(exp ; ... ; exp)

and

let ... in exp ; ... ; exp end

Note that semicolons act as expression (or declaration) separators rather than as terminators.
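
The following sketch shows both forms with hypothetical names (logAndDouble, result). Note that the declarations themselves are not followed by semicolons, and that no semicolon follows the last expression of a sequence.

```sml
(* Parenthesized sequence: evaluate print for effect, then return n * 2. *)
fun logAndDouble (n : int) : int =
  (print ("doubling " ^ Int.toString n ^ "\n"); n * 2)

(* Sequence in the body of a let: the value is that of the last expression. *)
val result =
  let
    val x = logAndDouble 3
  in
    print "first\n";
    print "second\n";
    x + 1
  end
```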

Stale bindings

Unresolved records

Value restriction

Type Variable Scope