> There seems to be a minor increase in code size, but I'm not
> precisely sure why.

Could it be because you used notInLoop instead of inLoop when deciding what to carry in registers? I think it wouldn't be too bad to change your code to do a tree fold to get inLoop.
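
A rough sketch of what such a tree fold might look like, written in Python purely for illustration. The `Forest` and `Loop` shapes (a list of blocks outside any loop at this level, plus nested loops with headers and a child forest) and the block labels are assumptions, not the actual data structure in the code under discussion. The idea is that only the top-level `not_in_loop` blocks are truly outside every loop, so inLoop is everything reachable under some loop.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Loop:
    headers: List[str]   # loop header blocks (hypothetical labels)
    child: "Forest"      # the rest of the loop body, possibly with nested loops


@dataclass
class Forest:
    not_in_loop: List[str] = field(default_factory=list)  # blocks in no loop at this level
    loops: List[Loop] = field(default_factory=list)


def fold_forest(forest: Forest, acc, f: Callable):
    """Fold f over every block in the forest, at any nesting depth."""
    for block in forest.not_in_loop:
        acc = f(block, acc)
    for loop in forest.loops:
        for header in loop.headers:
            acc = f(header, acc)
        acc = fold_forest(loop.child, acc, f)
    return acc


def in_loop(top: Forest) -> Set[str]:
    """Collect every block that lies inside at least one loop.

    The top-level not_in_loop blocks are skipped; everything under a
    top-level loop (headers plus the whole child forest) is included.
    """
    acc: Set[str] = set()
    for loop in top.loops:
        acc.update(loop.headers)
        acc = fold_forest(loop.child, acc, lambda block, s: s | {block})
    return acc
```

Note that a child forest's `not_in_loop` blocks do count as inLoop here, since they sit inside the enclosing loop even though they are outside any nested one.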