[MLton] Type of _address?

skaller skaller@users.sourceforge.net
Fri, 22 Jul 2005 06:35:47 +1000


On Thu, 2005-07-21 at 21:23 +0200, Wesley W. Terpstra wrote:

> There are lots of issues I also don't know in this area.
> I just know that the rule of thumb for bug free code is to avoid casts.

Say 'strictly conforming' not 'bug free'.

C code doesn't have to be portable -- the problem with casting
is that it hides architecture dependent assumptions. More
precisely, the problem is with aliasing in general:
unions can also be used to do a 'cast' without a cast,
and so can some implicit conversions, for example:

void *f = "Hello";

is allowed in C and C++, C even allows:

char *g = f;

which C++ does not: notice there is no cast, there is, however,
a conversion.
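
A minimal sketch of the two implicit conversions described above, written as C (it is not expected to compile as C++, which is the point). The function name and the local buffer are illustrative only, not from the original message:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the char* obtained from a void* without a cast sees the same string. */
int implicit_conversion_demo(void)
{
    char text[] = "Hello";
    void *f = text;   /* char* -> void*: implicit in both C and C++ */
    char *g = f;      /* void* -> char*: implicit in C only; C++ wants a cast */

    assert(g == text);               /* the conversion preserved the address */
    return strcmp(g, "Hello") == 0;
}
```

No cast appears anywhere, yet the pointer type changes twice.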

C is a really really bad language, it is a good idea to
always compile your C code as C++, it catches extra errors.

> I've heard of systems where pointers to data and code are distinct.

Sure .. the i386 family.

> ie: (int*)4 cannot be represented as a (void(*)(int)). Also, I know
> that alignment matters, so if you cast a (char*) to an (int*) and try
> to dereference it, you can segfault.

The rule is that if you cast, say, an int* to a char* and back,
you are guaranteed to get the same value (on all systems).

There is also a guarantee that IF an int is big enough,
you can cast a pointer to an int, and then the int back
to a pointer, and you have the same value again.

IF it is big enough :) And of course, if you do anything
but copy that int about, all bets are off.
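
A rough sketch of both round trips. It assumes a C99 compiler that provides the optional uintptr_t type as the "big enough" integer; that type and the function names are assumptions for illustration, not something the original message mentions:

```c
#include <assert.h>
#include <stdint.h>  /* uintptr_t: optional in C99, assumed available here */

/* Returns 1 if an int* survives a trip through char* and back. */
int roundtrip_via_char(void)
{
    int value = 42;
    int *p = &value;
    char *c = (char *)p;   /* int* -> char* */
    int *q = (int *)c;     /* char* -> int*: same pointer value again */

    assert(q == p);
    return *q == 42;
}

/* Returns 1 if an int* survives a trip through a wide enough integer. */
int roundtrip_via_integer(void)
{
    int value = 42;
    int *p = &value;
    uintptr_t n = (uintptr_t)(void *)p;  /* only copy n about, no arithmetic on it */
    int *q = (int *)(void *)n;           /* integer -> pointer: same value again */

    return q == p && *q == 42;
}
```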

> There's also a lot of leeway for
> compilers to pull the rug from under your feet, because casts to/from
> anything other than char are not guaranteed to even be reversible, afaik
> (not to mention anything you do has unspecified results).

*unsigned char* is what you mean. In particular given:

union X {
  T v;
  unsigned char a[sizeof(T)];
};

you may copy a value into v, move the contents of 'a'
elsewhere, copy it back, and 'v' is guaranteed to have
the same value: in C++ too provided v is a POD type.
(Plain Old Data type).
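
As a sketch of that guarantee, with T chosen arbitrarily as double (the typedef, the helper name and the scratch buffer are illustrative assumptions):

```c
#include <assert.h>
#include <string.h>

typedef double T;   /* illustrative choice; any plain data type would do */

union X {
    T v;
    unsigned char a[sizeof(T)];
};

/* Copies the bytes of a value out and back in; returns 1 if the value survives. */
int bytes_roundtrip(T input)
{
    union X x, y;
    unsigned char saved[sizeof(T)];

    assert(sizeof x.a == sizeof x.v);   /* the byte view covers the whole value */

    x.v = input;
    memcpy(saved, x.a, sizeof saved);   /* move the contents of 'a' elsewhere */
    memcpy(y.a, saved, sizeof saved);   /* copy them back into another object */
    return y.v == input;
}
```

Calling bytes_roundtrip(3.25) should return 1; the comparison is only meaningful for values that compare equal to themselves, so not for a NaN.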

But this is not the case with 'char'. Note the following
wording in my man page:

       int fputc(int c, FILE *stream);
       fputc() writes the character c, cast to an unsigned char,
       to stream.


So why does it say 'cast to an unsigned char'??

Because C89 is bugged .. (just about everything in C is wrong).
Without those words, there is no guarantee that getc/putc
are inverses: just think about writing out 0xFF, which is
(negative) zero in ones' complement, and reading back 0x00,
which is ALSO zero in ones' complement .. :)

[This cannot be so for unsigned char, there is only
one possible zero, namely 0x00]
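
A small model of the conversion the man page describes. The helper as_written is hypothetical (it is not part of stdio); it only mimics what fputc() stores for c and what fgetc() would later return for that byte:

```c
#include <assert.h>
#include <limits.h>  /* UCHAR_MAX, INT_MAX */

/* Hypothetical helper: the value stored for c, as an int in 0..UCHAR_MAX. */
int as_written(int c)
{
    unsigned char byte;

    assert(UCHAR_MAX <= INT_MAX);   /* assumed: int can hold every byte value */

    byte = (unsigned char)c;        /* conversion is modulo UCHAR_MAX + 1 */
    return (int)byte;               /* never negative, so 0xFF and 0x00 stay distinct */
}
```

Because the conversion to unsigned char is defined on values rather than on bit patterns, 0xFF and 0x00 map to different results whatever the representation of signed char.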

> It happens to work because the current gcc is very forgiving.

No, it works because C is not required to be portable.
Many constructions are perfectly legal C, they're just
not strictly conforming.

> C assumes that if two pointers point to different types then they cannot
> alias.

Fortran may assume that .. C doesn't .. that's one reason
Fortran is faster than C.

Aliasing can be perfectly OK in C, it all depends :)

Aliasing signed and unsigned types of the same denomination
is guaranteed to work. In particular, a non-negative value of a
signed type is guaranteed to have the same value when aliased to
the corresponding unsigned type:

	union X { int i; unsigned int j; } x;
	x.i = 1; assert (x.i == x.j);

A pointer to the first member of a structure, suitably converted,
is guaranteed to also point to the whole structure.
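
A sketch of the first-member rule; the struct names header and packet are made up for illustration:

```c
#include <assert.h>

struct header { int tag; };
struct packet { struct header head; int payload; };

/* Returns 1 if a pointer to the first member, converted back, reaches the whole struct. */
int first_member_points_to_struct(void)
{
    struct packet pkt = { { 7 }, 99 };
    struct header *h = &pkt.head;            /* pointer to the first member */
    struct packet *p = (struct packet *)h;   /* same address, viewed as the whole struct */

    assert((void *)p == (void *)&pkt);
    return h->tag == 7 && p->payload == 99;
}
```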

The general sense of what you say is correct .. but unfortunately
C isn't strict enough to actually deliver optimisations as one
would like.

> afaik this is not portable (even when sizeof(int) = sizeof(long)):
> 	int x = 5;
> 	long* y = (long*)&x;

I think you are right (even when the sizes are the same the
representations can be different).

The bottom line is: it is very hard to write low level
code which is also strictly conforming: there is no requirement
that C be strictly conforming. A program is a valid C program
if it is correct on even one C compiler.

I try to avoid the word 'portable' because as Bill Plauger
explained it, something like: portable doesn't mean
platform independent, on the contrary, it implies an active
effort to modify the code so it works on another platform:
portable means you can port some code, which implies it
needs to be ported, that is, to be modified, that is,
portable actually means exactly the opposite to what
you think: it means the code is NOT platform independent :))

Anyhow, for MLton C output: it is normal that the
code not be strictly conforming; this doesn't mean
it can't be improved to work on more platforms, but
suggests that you shouldn't worry TOO much about
abuses .. that's normal for C :)

Felix tries to output conforming ISO C++ code ..
but actually there are hidden assumptions, for
example:

	union X {
		struct Y { int a; int b; int c; } y;
		int z[3];
	} x;

I assume x.z+1 == &x.y.b, but there is no such assurance:
I'd be surprised if any real compiler swapped the order
of b and c around though. Of course I can FORCE it
to be strictly conforming like:

	struct Y { int a; struct YY { int b; int c; } yy; } y;

and now it is assured that a, b, and c are in order.
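
A hidden layout assumption of this kind can at least be checked rather than trusted. The sketch below uses offsetof from <stddef.h> on the first union; the function names are illustrative, and struct Y is declared outside the union only so the same text reads the same in C and C++:

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */

struct Y { int a; int b; int c; };

union X {
    struct Y y;
    int z[3];
};

/* Returns 1 if struct Y has no padding, so it overlays int[3] exactly. */
int layout_matches_array(void)
{
    return offsetof(struct Y, a) == 0
        && offsetof(struct Y, b) == 1 * sizeof(int)
        && offsetof(struct Y, c) == 2 * sizeof(int)
        && sizeof(struct Y) == 3 * sizeof(int);
}

/* Returns 1 if the assumption x.z+1 == &x.y.b (and likewise for c) holds here. */
int overlay_holds(void)
{
    union X x;

    assert(sizeof(union X) >= sizeof(struct Y));
    return x.z + 1 == &x.y.b && x.z + 2 == &x.y.c;
}
```

Both functions are expected to return 1 on common platforms, but that is an observation about those platforms, not an assurance.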

-- 
John Skaller <skaller at users dot sourceforge dot net>

