benchmarks for the release
Henry Cejtin
henry@sourcelight.com
Mon, 1 Apr 2002 02:01:00 -0600
In TextIO.inputLine, the following copies occur:
First the read system call copies the bytes into the buffer.
Second the inputLine code copies the next line in to a list.
Third the list is copied into an array (the final string).
I am not proposing changing the string representation, but note that the only
reason for the list is because you don't know how big the string is going to
be until you get to the newline. The alternative would be to keep a list of
the buffers which have the current line (usually just 1) and then copy
directly from them into the final string. This would require not re-using
buffers, but that is very cheap compared to copying the line into a list. In
the end it changes the number of copies from 3 to 2.
Note, an alternative would be to use some more efficient structure than a
list as the temporary space. Ignoring the horrible problems of thread
safety, you could have a fixed array which you copy bytes into until it
either overflows or you see the newline. This still costs you an extra copy,
but it isn't into a list. If the fixed array overflows then you go into
doubling, so the amortized cost is still only 1 extra copy