[MLton] RE: card/cross map in heap
Matthew Fluet
fluet at tti-c.org
Tue Aug 19 15:52:38 PDT 2008
On Thu, 17 Jul 2008, Matthew Fluet wrote:
> The ideal solution, especially for a situation like yours, where you are
> happy to use lots of memory on a dedicated machine, is to use
> @MLton fixed-heap 3.5G -- to grab as large a heap as you can (that
> comfortably fits in physical memory) at the beginning of the program and never
> bother resizing it. As I understand it, resizing is only to 'play nice' with
> other processes in the system.
>
> The problem with fixed-heap, though, is that the runtime starts off trying to
> use the Cheney-copy collector (so, it really grabs 1/2 * 3.5G) and it may be
> some time before it is forced to use the mark-compact collector, and it is
> only at that point that the runtime will try to grab the 3.5G. Since
> fixed-heap affects the desiredSize (but not the minSize), you really need to
> set fixed-heap to a size that can actually be allocated, so that
> desiredSize == currentSize, and no resizing occurs.
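To make the quoted interaction concrete, here is a rough sketch in C of
why a too-large fixed-heap keeps the runtime resizing. The names
(fixedHeapSize, currentSize, useMarkCompact) are hypothetical; this is
not the actual runtime code:

#include <stddef.h>
#include <stdbool.h>

size_t fixedHeapSize;   /* from @MLton fixed-heap ... -- */
size_t currentSize;     /* what mmap actually gave us */
bool useMarkCompact;    /* false until the copying collector is abandoned */

size_t desiredSize (void) {
        /* Cheney copy needs two semispaces, so only half of the
         * fixed heap is requested at first. */
        return useMarkCompact ? fixedHeapSize : fixedHeapSize / 2;
}

bool shouldResize (void) {
        /* If fixed-heap is set larger than mmap can deliver,
         * desiredSize never equals currentSize and the runtime keeps
         * trying to resize. */
        return desiredSize () != currentSize;
}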
Some thoughts that have occurred to me.
First, I remarked earlier that we would sometimes like to know "if I unmap
this memory, can I mmap this size?" While there is no general way of
answering this question, I believe that (for the tight memory situations
where it becomes an issue) we have some useful information around. In
particular, after paging the heap to disk, if the subsequent createHeap is
forced to back off, then we have a reasonable upper bound on the amount of
memory we should ever ask for in the future. Because we have paged the
heap to disk, we know that we have freed up as much memory as we
possibly can. If mmap can't satisfy our request in this situation,
then we might have exceeded the largest contiguous mapping available in
our address space. If that is the case, then there is no need to
subsequently
page the heap to disk and try to allocate a larger heap -- we've already
got as large a heap as we can get. (Given that, we may want createHeap to
use a finer-grained backoff when run after paging the heap to disk; that
would really find the largest possible heap.)
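Here is a rough sketch of what such a backoff loop might look like; the
names and step sizes are illustrative, not the actual createHeap code:

#include <stddef.h>
#include <sys/mman.h>

void *tryCreateHeap (size_t desired, size_t minSize, int afterPaging,
                     size_t *got) {
        /* Coarse steps normally; finer-grained steps after paging the
         * heap to disk, to find the largest mappable region. */
        size_t step = afterPaging ? desired / 64 : desired / 8;
        if (step == 0) step = 1;
        for (size_t size = desired; ; ) {
                void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (p != MAP_FAILED) {
                        *got = size;  /* how much we actually got */
                        return p;
                }
                if (size <= minSize)
                        break;
                size = (size - minSize > step) ? size - step : minSize;
        }
        return NULL;  /* even minSize failed */
}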
Of course, the other reason mmap may fail is that the operating system
virtual memory manager can't currently satisfy our request along with the
outstanding requests of other processes. That is, mmap may fail because
the memory manager won't overcommit the physical/swap pages. It is
possible that a
subsequent mmap will succeed, if in the intervening time, other processes
have given up memory.
That seems to suggest the following policy:
- record an mmapMaxHeap statistic.
This statistic is updated whenever createHeap is forced to back off when
no extra memory is being used (that is, at the initial createHeap and at
a createHeap after the heap has been paged to disk).
- take mmapMaxHeap into account when resizing the heap.
In particular, don't let the desiredSize exceed the mmapMaxHeap; it is
better to stick with a heap that is at mmapMaxHeap size than to try
paging to disk.
There is one exception to this policy. If minSize > mmapMaxHeap, then
we should allow the runtime to page the heap to disk and try to mmap
the desired memory. This handles the case in which we recorded
mmapMaxHeap only because of temporary memory pressure in the system.
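In code, the policy might look something like this (hypothetical names,
just to pin down the proposal):

#include <stddef.h>

static size_t mmapMaxHeap = (size_t)-1;  /* no bound observed yet */

/* Update the statistic whenever createHeap backs off while no extra
 * memory is being used (the initial createHeap, or a createHeap after
 * the heap has been paged to disk). */
void noteBackoff (size_t achievedSize) {
        if (achievedSize < mmapMaxHeap)
                mmapMaxHeap = achievedSize;
}

/* Clamp desiredSize at mmapMaxHeap: better to stick with a heap of
 * that size than to page to disk chasing a larger one.  Exception: if
 * minSize itself exceeds mmapMaxHeap, try anyway, in case the bound
 * was only due to memory pressure from other processes. */
size_t clampDesiredSize (size_t desiredSize, size_t minSize) {
        if (minSize > mmapMaxHeap)
                return desiredSize;
        return (desiredSize < mmapMaxHeap) ? desiredSize : mmapMaxHeap;
}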
Technically, it is possible that an inability of mmap to satisfy a minSize
request could be a temporary situation, due to memory pressure from other
processes. One could always wait and try again in this situation.
However, anyone running a program that needs >2.5G memory probably knows
better than to run other high-memory processes at the same time and/or
provides a decent swap file/partition. So, I suspect that it is fairly
safe to assume that mmap failing to satisfy a minSize request corresponds
to a hard limit in the virtual address space for a contiguous map, and
thus corresponds to a true out-of-memory situation.
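For completeness, the wait-and-retry option would be something like the
following (illustrative only; as argued above, I doubt it is worth
doing):

#include <stddef.h>
#include <unistd.h>
#include <sys/mman.h>

void *mmapWithRetry (size_t minSize, int attempts) {
        for (int i = 0; i < attempts; i++) {
                void *p = mmap (NULL, minSize, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (p != MAP_FAILED)
                        return p;
                sleep (1);  /* hope other processes give up memory */
        }
        return NULL;  /* treat as a true out-of-memory situation */
}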
The second thought is that I wonder if the heap would be easier to
predict/control if we used one contiguous heap at all times (that is, even
for major copying collections). In particular, it would be nice to have
the behavior described above for fixed-heap --- namely, that one could use
fixed-heap to grab a large block of memory at the beginning of the program
and there would be no subsequent resizing, even if the runtime switched
over from major copying collections to major mark-compact collections. I
note that the original Sansom91 paper
(http://mlton.org/References#Sansom91) works with a single contiguous
heap. I'm not sure that I understand the advantage of the current
implementation, where the secondaryHeap is managed separately, potentially
created and released a number of times during the execution of a program.
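For illustration, a single contiguous heap could carve both semispaces
out of one mapping, so that switching between copying and mark-compact
collection never remaps. This is purely a sketch of the idea, not how
the current runtime is organized:

#include <stddef.h>
#include <sys/mman.h>

typedef struct {
        char *start;       /* base of the single contiguous mapping */
        size_t totalSize;  /* fixed for the life of the program */
        char *fromSpace;   /* semispaces for copying collection */
        char *toSpace;
} Heap;

int heapInit (Heap *h, size_t totalSize) {
        h->start = mmap (NULL, totalSize, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (h->start == MAP_FAILED)
                return -1;
        h->totalSize = totalSize;
        h->fromSpace = h->start;
        h->toSpace = h->start + totalSize / 2;
        return 0;
}

/* Flip semispaces after a Cheney copy; mark-compact can instead use
 * the whole region.  Either way, nothing is ever created or released
 * separately, unlike the current secondaryHeap. */
void heapFlip (Heap *h) {
        char *tmp = h->fromSpace;
        h->fromSpace = h->toSpace;
        h->toSpace = tmp;
}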