I have replaced the needlessly complex wordAlign() inline function with the trivial return ((p + 3) & ~3); statement. The difference in generated code is pretty huge, but it really is more readable this way too.
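
For illustration, here is a minimal sketch of how that expression rounds a value up to the next multiple of 4. The original signature of wordAlign() is not shown here, so the name word_align, the uintptr_t parameter type, and the self-check function are assumptions made only for this example, not the actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for wordAlign(); the parameter type is assumed. */
static inline uintptr_t word_align(uintptr_t p)
{
    /* Adding 3 pushes p up to or past the next multiple of 4 unless it is
       already aligned; clearing the two low bits then rounds back down to
       that multiple. The cast only makes the mask width explicit. */
    return (p + 3) & ~(uintptr_t)3;
}

/* Spot checks for the rounding behaviour (illustrative only). */
void word_align_selfcheck(void)
{
    assert(word_align(0) == 0);   /* already aligned, unchanged */
    assert(word_align(1) == 4);   /* rounded up to the next word */
    assert(word_align(4) == 4);   /* already aligned, unchanged */
    assert(word_align(5) == 8);
}
```

The same pattern works for any power-of-two alignment n by using (p + (n - 1)) & ~(n - 1); it does not work when n is not a power of two.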