[MLton-user] Destructive update
Matthew Fluet
fluet@cs.cornell.edu
Wed, 8 Feb 2006 09:55:37 -0500 (EST)
> 1. I compile all my programs with the command
>
> mlton -basis 1997 -verbose 1 -runtime 'ram-slop 0.4'
>
> The 1997 basis is so I can also develop with Moscow ML, mainly for its
> read-eval-print loop. The ram-slop is so I can comfortably run 2
> programs compiled like this on my machine without thrashing. However,
> one of my programs needs a lot of RAM, and so I invoke it with
>
> program @MLton ram-slop 0.8 -- arguments
>
> However, I've noticed that it only ever uses 40% of the machine's RAM,
> even though I strongly suspect it would use more if it could. Am I
> correctly overriding the ram-slop parameter that I set during
> compilation?
That is the correct way of overriding the ram-slop parameter. You can
confirm this by running your program with "@MLton gc-messages --". The
first message should be of the form
total RAM = 4,188,676,096 RAM = 3,350,941,696
where total RAM is the machine memory and RAM is the maximum memory that
will be used by the heap.
Various other factors can limit the actual ammount of memory used by the
program. For one, MLton's garbage collector requires a contiguous heap,
so we're at the mercy of malloc and mmap. Also, there is a preference to
keep the live ratio within certain bounds, so if a lot of garbage is
recovered at each garbage collection, then the heap won't necessarily grow
very large.
You can always try running with "@MLton fixed-heap ??? --", in which case
the total heap will be exactly the specified size (either split into two
semi-spaces for copying GC or one large space for mark-compact GC).
This is probably fine for non-interactive programs, where the relatively
long pause time to GC a large heap isn't a problem.
> 2. My ML programming style, if I can call it that, makes intensive use
> of dictionaries, which have the following signature
>
> new : ('a * 'a -> order) -> ('a,'b) dict
> peek : ('a,'b) dict -> 'a -> 'b option
> insert : ('a,'b) dict -> 'a * 'b -> ('a,'b) dict
> foldl : ('a * 'b * 'c -> 'c) -> 'c -> ('a,'b) dict -> 'c
>
> I use them like arrays with a polymorphic key, and much of the time
> they could safely be destructively updated without breaking my
> programs.
That's a fairly common programming style, using purely functional data
structures. It does happen to play into the live ratio discussed above:
using such data structures tends to produce a lot of garbage, because only
the latest instance of the dictionary is live. (On the other hand, this
is exactly the scenario that supports the generational hypothesis.)
> a) Should I be using some other container type? How is this problem
> usually solved in the source code of the MLton compiler?
The imperative features of ML can be put to good use to destructively
update a data structure. One of the most commonly used data structures in
the MLton compiler is a mutable HashSet:
http://mlton.org/cgi-bin/viewsvn.cgi/mlton/trunk/lib/mlton/basic/hash-set.sig?rev=4014
> b) Have you thought of doing a compiler analysis to see what data
> structures can be destructively updated? It's not my field of
> expertise, but I believe Clean-style uniqueness types can be used to
> do this, and I'm sure there are other methods that I'm ignorant of.
> Two nice things about this analysis are that (i) the programmer
> doesn't need to know anything about it, and (ii) if the analysis
> returns "I don't know" the compiler can just do what it would normally
> do.
It is certainly something that we've thought about. I believe the
compiler ILs are expressive enough to handle the transformation from a
single-threaded functional data structure into a mutable data structure.
Its finding a sound analysis that scales to all of the IL features that
isn't entirely clear.