The new MLton sped things up by 17%. The C version (with the #defines to speed up getc/putc) is 3.4 times faster than the code from the new MLton and 4 times faster than the code from the old one. Sadly, there is still a long way to go. (Also, the new compiled code has 120K of instructions, while the old one has 78K.)
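The actual #defines are not shown in this message. As a rough sketch of what such macros often look like, the following assumes they remap getc/putc to the POSIX unlocked variants (getc_unlocked/putc_unlocked), which skip per-call stream locking. That remapping, and the helper name copy_stream, are assumptions for illustration, not the benchmark's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Assumption: the speed-up comes from bypassing per-call stream locking.
 * The real benchmark's #defines may differ. */
#undef getc
#undef putc
#define getc(f)    getc_unlocked(f)
#define putc(c, f) putc_unlocked((c), (f))

/* Hypothetical helper: copy `in` to `out` one byte at a time and
 * return the number of bytes successfully written. */
long copy_stream(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);

    long n = 0;
    int c;

    /* Take each stream lock once, so the unlocked calls are safe
     * even if other threads touch the same streams. */
    flockfile(in);
    flockfile(out);
    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            break;              /* stop on write error */
        n++;
    }
    funlockfile(out);
    funlockfile(in);
    return n;
}
```

Calling copy_stream(stdin, stdout) gives a byte-for-byte filter. Whether this accounts for much of the gap to the MLton-compiled code depends on how much of the run time is spent in character I/O.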