serialization
Stephen Weeks
sweeks@wasabi.epr.com
Mon, 2 Aug 1999 01:30:10 -0700 (PDT)
This weekend, I implemented (de)serialization in MLton. Externally,
what's available is:
    val serialize: 'a -> Word8Vector.vector
    val deserialize: Word8Vector.vector -> 'a
The implementation is about 250 lines in src/runtime/gc.c.
Right now there are two problems:
  - It doesn't work if 'a is an arrow type.
  - It isn't safe, in that if you feed deserialize a bogus
    vector, unpredictable things may happen.
As to the arrow type problem, I propose to change the flow analysis as
follows:
  - have one set of lambdas for each arrow type that is serialized
    (recall that the flow analysis runs on SXML, so this
    is a known finite number of sets)
  - the result of deserialization to type t is the set for t
  - insert a coercion at calls to serialize from the argument
    set to the serialize set for that type
As to the safety problem, there are several possible solutions I have
thought of, none of which I am entirely happy with.
  1. Build a predicate for each type that checks if a
     Word8Vector.vector is a valid serialization of some
     object of that type. Deserialize calls the predicate
     before running.
  2. Statically choose a random number r_t (say 128 bits) for
     each type t. Prefix every serialized object of type t
     with r_t. Deserialize checks the prefix before running.
  3. Dynamically create serializer/deserializer pairs of
     functions with the random approach of (2).
Here are some of the tradeoffs.
(1) is completely safe, but doesn't seem very easy to implement. The
    predicates would have to be constructed (automatically) per
    program. I don't think it can be written in Cps, so it would
    have to be done at the Machine level. I am pretty sure the
    information is reasonably accessible to the backend.
(2) is reasonably safe against accidents (i.e. there is some very low
    probability that a bogus vector happens to carry the right
    prefix). It is however not safe with respect to malicious users
    who purposely feed bad Word8Vectors, since anyone who has seen
    one valid serialization of type t knows r_t.
(3) can be completely implemented in SML, given the primitives defined
    above. However, it has the extreme disadvantage that two
    MLton processes started separately, even from the same
    executable, cannot communicate, since the random numbers are
    chosen dynamically. This would seem to defeat one of the
    major uses of serialization.
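
To make the prefix check of (2) concrete, here is a minimal sketch in C
(the language of the runtime). It is not the code in gc.c: the names
type_tag, tag_wrap and tag_unwrap are hypothetical, and the payload is
treated as opaque bytes already produced by the serializer. It only
shows the tag being written in front of the payload and compared before
anything is deserialized.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAG_BYTES 16  /* 128 bits, the size suggested in option (2) */

/* Hypothetical per-type tag r_t. Under option (2) the compiler would
 * choose one random value per type t at compile time. */
typedef struct { uint8_t bytes[TAG_BYTES]; } type_tag;

/* Write the tag followed by the payload into out (capacity out_cap).
 * Returns the total number of bytes written, or 0 if out is too small. */
size_t tag_wrap(const type_tag *tag, const uint8_t *payload, size_t n,
                uint8_t *out, size_t out_cap)
{
    if (out_cap < TAG_BYTES || out_cap - TAG_BYTES < n)
        return 0;
    memcpy(out, tag->bytes, TAG_BYTES);
    if (n > 0)
        memcpy(out + TAG_BYTES, payload, n);
    return TAG_BYTES + n;
}

/* Check the prefix of buf against the expected tag. On a match, store
 * the payload length in *n_out and return a pointer to the payload;
 * otherwise return NULL and leave *n_out untouched. */
const uint8_t *tag_unwrap(const type_tag *tag, const uint8_t *buf,
                          size_t len, size_t *n_out)
{
    if (len < TAG_BYTES)
        return NULL;
    if (memcmp(buf, tag->bytes, TAG_BYTES) != 0)
        return NULL;
    *n_out = len - TAG_BYTES;
    return buf + TAG_BYTES;
}
```

The check costs one 16-byte comparison per deserialize call, and it
says nothing about whether the bytes after the prefix are well formed.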
Any ideas?
BTW, along the way, I also fixed MLton.size so that it runs in time
proportional to the number of pointers in the object instead of having
to do a full GC.
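
For illustration, a traversal with that cost could look roughly like
the sketch below. It is not the MLton.size code: the obj layout, the
marked field and the name reachable_size are all assumptions made up
for the example, and I am reading "the object" as everything reachable
from it. Each reachable object is marked when first discovered, so
every pointer is examined once, and the marks are cleared again before
returning.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical heap object: its own size, a mark bit, and its
 * outgoing pointers. A real runtime would decode these from a header. */
typedef struct obj {
    size_t bytes;        /* size of this object in bytes */
    int marked;          /* nonzero once the object has been discovered */
    size_t nptrs;        /* number of outgoing pointers */
    struct obj **ptrs;   /* outgoing pointers; entries may be NULL */
} obj;

/* Sum the sizes of all objects reachable from root. Returns
 * (size_t)-1 if the worklist cannot be allocated. */
size_t reachable_size(obj *root)
{
    if (root == NULL)
        return 0;

    size_t cap = 16, len = 0, next = 0, total = 0;
    obj **seen = malloc(cap * sizeof *seen);
    if (seen == NULL)
        return (size_t)-1;

    /* seen doubles as worklist (from next) and record of marked objects. */
    root->marked = 1;
    seen[len++] = root;
    while (next < len) {
        obj *o = seen[next++];
        total += o->bytes;
        for (size_t i = 0; i < o->nptrs; i++) {
            obj *child = o->ptrs[i];
            if (child == NULL || child->marked)
                continue;
            if (len == cap) {
                obj **grown = realloc(seen, 2 * cap * sizeof *seen);
                if (grown == NULL) {
                    total = (size_t)-1;
                    goto done;
                }
                seen = grown;
                cap *= 2;
            }
            child->marked = 1;
            seen[len++] = child;
        }
    }
done:
    /* Clear the marks so the heap is left as it was found. */
    for (size_t i = 0; i < len; i++)
        seen[i]->marked = 0;
    free(seen);
    return total;
}
```

Shared objects and cycles are counted once because of the mark, and
nothing outside the reachable part of the heap is touched.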