You could just use 64-bit bins, but then the space requirements double again (8 bytes of bin for every byte of object code), and the code generated at each allocation point becomes a 32-bit add into a 64-bit counter. That pushes even harder toward calling a procedure that does the add, and the linking, in one place to save space.
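To make the cost concrete, here is a minimal sketch of what such an out-of-line procedure might look like on a target with only 32-bit arithmetic. The names `bin64` and `bin_add` are hypothetical, the two-word layout is an assumption, and the linking step is left out because its details are not spelled out here.

```c
#include <stdint.h>

/* Hypothetical layout: one 64-bit bin held as two 32-bit words,
 * as it might be on a 32-bit target (an assumption, not from the post). */
struct bin64 {
    uint32_t lo;   /* low 32 bits of the counter  */
    uint32_t hi;   /* high 32 bits of the counter */
};

/* Out-of-line helper: add a 32-bit amount into a 64-bit bin.
 * Inlining this at every allocation point would repeat the add,
 * the compare, and the carry; calling it costs one call per site. */
static void bin_add(struct bin64 *b, uint32_t amount)
{
    uint32_t old = b->lo;

    b->lo = old + amount;   /* unsigned add wraps modulo 2^32 */
    if (b->lo < old) {      /* a wrap means a carry out of the low word */
        b->hi += 1;
    }
}
```

Each allocation point would then only need to load the bin address and the amount and call `bin_add`, rather than carrying its own copy of the add-with-carry sequence.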