Matthew, I'm not sure whether you're already doing this optimization, but in case you aren't, I think you can generate better code for limit check loops in the special case where the number of bytes requested is 0.