Looking at the assembly, I see that if I don't declare GC_state volatile, then the code for

    gcState.canHandle++;
    if (FALSE and DEBUG)
        fprintf(stderr, "atomicBegin canHandle++ done\n");
    if (gcState.signalIsPending) {

is

    movl gcState+128, %eax
    movl gcState+264, %edx
    incl %eax
    testl %edx, %edx
    movl %eax, gcState+128

That is, the load for signalIsPending occurs before the increment and store of canHandle.
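For comparison, here is a minimal sketch of the volatile variant. The struct below is a hypothetical stand-in that models only the two fields mentioned above (the real GC_state has many more fields and a different layout), and atomicBegin/atomicEnd are illustrative names rather than the runtime's actual definitions. Because both accesses go through a volatile object, the compiler may not move the load of signalIsPending ahead of the store to canHandle. This constrains only the compiler's ordering, not the hardware's.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the runtime's GC state: only the two
 * fields from the snippet are modeled; the real layout differs. */
struct GC_state {
    int canHandle;        /* > 0 while inside an atomic section */
    int signalIsPending;  /* set asynchronously by a signal handler */
};

/* Declaring the object volatile makes every field access a volatile
 * access, which the compiler must perform in program order. */
static volatile struct GC_state gcState;

/* Enter an atomic section; returns 1 if a signal was already pending. */
static int atomicBegin(void) {
    gcState.canHandle++;            /* load, increment, store happen here */
    if (gcState.signalIsPending) {  /* this load cannot be hoisted above */
        return 1;
    }
    return 0;
}

/* Leave an atomic section (illustrative counterpart, an assumption). */
static void atomicEnd(void) {
    assert(gcState.canHandle > 0);
    gcState.canHandle--;
}
```

Compiling this with optimization and inspecting the output (for example with gcc -O2 -S) should show the store to canHandle before the load of signalIsPending, though the exact instructions depend on the compiler and target.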