MLton 20241230

XML is an IntermediateLanguage, translated from CoreML by Defunctorize, optimized by XMLSimplify, and translated by Monomorphise to SXML.

Description

XML is polymorphic, higher-order, with flat patterns. Every XML expression is annotated with its type. Polymorphic generalization is made explicit through type variables annotating val and fun declarations. Polymorphic instantiation is made explicit by specifying type arguments at variable references. XML patterns can not be nested and can not contain wildcards, constraints, flexible records, or layering.

Implementation

Type Checking

XML also has a type checker, used for debugging. At present, the type checker is also the best specification of the type system of XML. If you need more details, the type checker (type-check.sig, type-check.fun), is pretty short.

Since the type checker does not affect the output of the compiler (unless it reports an error), it can be turned off. The type checker recursively descends the program, checking that the type annotating each node is the same as the type synthesized from the types of the expressions subnodes.

Details and Notes

XML uses the same atoms as CoreML, hence all identifiers (constructors, variables, etc.) are unique and can have properties attached to them. Finally, XML has a simplifier (XMLShrink), which implements a reduction system.

Types

XML types are either type variables or applications of n-ary type constructors. There are many utility functions for constructing and destructing types involving built-in type constructors.

A type scheme binds list of type variables in a type. The only interesting operation on type schemes is the application of a type scheme to a list of types, which performs a simultaneous substitution of the type arguments for the bound type variables of the scheme. For the purposes of type checking, it is necessary to know the type scheme of variables, constructors, and primitives. This is done by associating the scheme with the identifier using its property list. This approach is used instead of the more traditional environment approach for reasons of speed.

XmlTree

Before defining XML, the signature for language XML, we need to define an auxiliary signature XML_TREE, that contains the datatype declarations for the expression trees of XML. This is done solely for the purpose of modularity — it allows the simplifier and type checker to be defined by separate functors (which take a structure matching XML_TREE). Then, Xml is defined as the signature for a module containing the expression trees, the simplifier, and the type checker.

Both constructors and variables can have type schemes, hence both constructor and variable references specify the instance of the scheme at the point of references. An instance is specified with a vector of types, which corresponds to the type variables in the scheme.

XML patterns are flat (i.e. not nested). A pattern is a constructor with an optional argument variable. Patterns only occur in case expressions. To evaluate a case expression, compare the test value sequentially against each pattern. For the first pattern that matches, destruct the value if necessary to bind the pattern variables and evaluate the corresponding expression. If no pattern matches, evaluate the default. All patterns of a case statement are of the same variant of Pat.t, although this is not enforced by ML’s type system. The type checker, however, does enforce this. Because tuple patterns are irrefutable, there will only ever be one tuple pattern in a case expression and there will be no default.

XML contains value, exception, and mutually recursive function declarations. There are no free type variables in XML. All type variables are explicitly bound at either a value or function declaration. At some point in the future, exception declarations may go away, and exceptions may be represented with a single datatype containing a unit ref component to implement genericity.

XML expressions are like those of CoreML, with the following exceptions. There are no records expressions. After type inference, all records (some of which may have originally been tuples in the source) are converted to tuples, because once flexible record patterns have been resolved, tuple labels are superfluous. Tuple components are ordered based on the field ordering relation. XML eta expands primitives and constructors so that there are always fully applied. Hence, the only kind of value of arrow type is a lambda. This property is useful for flow analysis and later in code generation.

An XML program is a list of toplevel datatype declarations and a body expression. Because datatype declarations are not generative, the defunctorizer can safely move them to toplevel.