Do you really have code where 32*32->64 would be a big win compared to the Appel hack, which avoids the expensive call to C? It can only be a win if you frequently multiply values that fit in 31 bits and the product often does NOT fit in 31 bits. Obviously one can construct bad cases, but I would expect them to be VERY rare in practice.
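To make the "fits in 31 bits" condition concrete, here is a minimal sketch of the 32*32->64 path. It assumes fixnums are signed 31-bit integers held in an `int32_t`. `FIXNUM_MIN`, `FIXNUM_MAX` and `fixnum_mul` are hypothetical names, and this is a generic widening-multiply range check, not the Appel hack itself. The multiply only pays off when the range check fails often, because a failure is exactly the case that would otherwise fall through to the slow path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixnum range: signed 31-bit integers. */
#define FIXNUM_MIN (-(INT32_C(1) << 30))
#define FIXNUM_MAX ((INT32_C(1) << 30) - 1)

/* Multiply two fixnums with a 32*32->64 widening multiply.
   Returns true and stores the product if it still fits in 31 bits.
   Returns false when the caller must take the slow (e.g. bignum) path. */
static bool fixnum_mul(int32_t a, int32_t b, int32_t *out) {
    /* Exact: |a|, |b| <= 2^30, so |a*b| <= 2^60 fits in int64_t. */
    int64_t p = (int64_t)a * (int64_t)b;
    if (p < FIXNUM_MIN || p > FIXNUM_MAX)
        return false;
    *out = (int32_t)p;
    return true;
}
```

For example, `1000 * 1000` stays on the fast path, while `FIXNUM_MAX * 2` or `FIXNUM_MIN * -1` overflows 31 bits and returns `false`.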