[MLton] power pc "port"

Sun, 5 Sep 2004 14:16:07 -0700

> Well, here are more details according to the C standard.  Consider the
> Word8_neg function, which is defined as follows:
> 
> unsigned char Word8_neg(unsigned char w) {
> 	return (- w);
> }
> 
> The steps that happen are:
> 
> 1) w must be promoted for the negation to be performed
> 2) the negation is performed
> 3) the result of the negation must be converted for the return, which
>    requires an unsigned char type.
...

Your explanation of what the C standard says makes sense.  What I
don't understand is why it would lead to an unexpected result.
Suppose that w is of type Word8, i.e. unsigned char.  Your
explanation says that Word8_neg (w) returns

	(Word8)(0xFF & (- ((int)w)))

That seems correct to me.  Consider the following C program.

----------------------------------------------------------------------
#include <stdio.h>

typedef unsigned char Word8;

Word8 Word8_neg (Word8 w) {
	return -w;
}

int main () {
	int i;
	Word8 w1, w2, w3;

	for (i = 0; i <= 255; ++i) {
		w1 = i;
		w2 = (Word8)(0xFF & (- ((int)w1)));
		w3 = Word8_neg (w1);
		fprintf (stderr, "%d  %d  %d\n", (int)w1, (int)w2, (int)w3);
	}
	return 0;
}
----------------------------------------------------------------------

This program produces identical output on my x86, Sparc, and G5
machines.  And the output is exactly what I would expect.  What do you
see on your G4 machine?

> Overall, I think it would be a good idea to make the code that MLton
> generates mark things signed and unsigned more carefully.  Right
> now, the C variables that represent 8-bit words are always unsigned,
> even though the value they represent may be signed.  Same with
> 16-bit words.  I fear that this may cause other problems (either on
> PowerPC or on other platforms), since the C compiler does not have
> accurate information about whether or not these values should have a
> sign extension.

I strongly disagree with this conclusion.  There is no difference
between signed and unsigned words -- they are just blobs of bits -- the
difference is between signed and unsigned operators.  Of course, we
must get the operators right, and that requires doing appropriate
signed or unsigned extensions.  So, we need to distinguish between,
say, signed and unsigned multiply because those are different
functions.  But we don't need to distinguish between, say, signed and
unsigned add (or negate), because those are the *same* function.
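
To make the distinction concrete, here is a minimal sketch.  The
function names are made up for illustration and are not taken from
MLton's runtime, and it uses uint8_t/uint16_t from <stdint.h> rather
than the Word8 typedef above.  It assumes that "multiply" means a
widening multiply whose result keeps more bits than its operands, so
the operands have to be sign- or zero-extended first.  Under that
assumption, add and negate give the same 8-bit pattern either way,
while the 16-bit product does not.

```c
#include <stdint.h>

/* Hypothetical helper: read an 8-bit pattern as a signed value. */
static int as_signed(uint8_t w) {
    return (w < 128) ? (int)w : (int)w - 256;
}

/* Add, treating the operands as unsigned; keep the low 8 bits. */
uint8_t add_unsigned(uint8_t a, uint8_t b) {
    return (uint8_t)((unsigned)a + (unsigned)b);
}

/* Add, treating the operands as signed; keep the low 8 bits. */
uint8_t add_signed(uint8_t a, uint8_t b) {
    return (uint8_t)(as_signed(a) + as_signed(b));
}

/* Negate, treating the operand as unsigned; keep the low 8 bits. */
uint8_t neg_unsigned(uint8_t w) {
    return (uint8_t)(0u - (unsigned)w);
}

/* Negate, treating the operand as signed; keep the low 8 bits. */
uint8_t neg_signed(uint8_t w) {
    return (uint8_t)(-as_signed(w));
}

/* Widening multiply after zero extension; 16-bit result. */
uint16_t mul_wide_unsigned(uint8_t a, uint8_t b) {
    return (uint16_t)((unsigned)a * (unsigned)b);
}

/* Widening multiply after sign extension; 16-bit result. */
uint16_t mul_wide_signed(uint8_t a, uint8_t b) {
    return (uint16_t)(as_signed(a) * as_signed(b));
}

/* Returns 1 if signed and unsigned add and negate agree on every
   8-bit input, 0 otherwise. */
int add_and_neg_agree(void) {
    int a, b;
    for (a = 0; a <= 255; ++a) {
        if (neg_unsigned((uint8_t)a) != neg_signed((uint8_t)a))
            return 0;
        for (b = 0; b <= 255; ++b) {
            if (add_unsigned((uint8_t)a, (uint8_t)b)
                != add_signed((uint8_t)a, (uint8_t)b))
                return 0;
        }
    }
    return 1;
}
```

For instance, with both operands equal to 0xFF, mul_wide_unsigned
gives 0xFE01 (255 * 255) and mul_wide_signed gives 0x0001 (-1 * -1),
yet the low 8 bits of the two products are the same.  The signedness
only matters once the extension becomes visible in the result.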

So, I still don't understand what's going wrong.  I suspect that once
I understand the problem more clearly, the solution will become clear.