[MLton-user] SML unicode support
Alexandre
Xlex0x835@rambler.ru
Wed, 5 Jan 2005 21:14:37 +0300
Dear subscribers,
this message was moved here from comp.lang.ml, because messages
there are moderated slowly.
>The SML Basis library has an optional structure, WideChar, that
>supports Unicode. However, neither MLton nor SML/NJ implements
>WideChar. Also, neither compiler supports UTF-8 (or otherwise)
>encoded string constants.
And what's the problem with WideChar? Is it difficult to implement?
As far as I understand, I can store a UTF-8 character in a C char
variable. At least the following C example (written to test this idea)
works fine with mixed Russian/English UTF-8 text (it simply copies
test_file.in to test_file.out one char at a time; tested on Darwin 7.7):
===============================================
#include <stdio.h>
#include <stdlib.h>

int main (void) {
    FILE *testfile_in, *testfile_out;
    int ch1;               /* int, so EOF can be told apart from byte 0xFF */
    unsigned char ch2 = 0;
    printf("sizeof ch2 - %zu bytes\n", sizeof ch2);
    testfile_in = fopen("test_file.in", "rb");
    if (testfile_in == NULL) {
        fprintf(stderr, "Input file open error.\n");
        return EXIT_FAILURE;
    }/* if */
    testfile_out = fopen("test_file.out", "wb");
    if (testfile_out == NULL) {
        fprintf(stderr, "Output file open/creation error.\n");
        fclose(testfile_in);
        return EXIT_FAILURE;
    }/* if */
    /* getc reads one byte; a multi-byte UTF-8 character is
       simply copied as several consecutive bytes. */
    while ((ch1 = getc(testfile_in)) != EOF) {
        ch2 = (unsigned char) ch1; /* make a copy */
        putc(ch2, testfile_out);
    }/* while */
    fclose(testfile_in);
    fclose(testfile_out);
    return 0;
}/* main */
===============================================
So, judging from http://mlton.org/ForeignFunctionInterfaceTypes, I would
conclude that an SML char/string (which correspond to a C char/char*,
respectively) should be able to handle a UTF-8 character/string... Or am I
misunderstanding something?
>We are working on adding support for Unicode to MLton, and expect it
>to be in our next release.
That's nice! Actually, I'm amazed by the MLton team, and Stephen in
particular: they have managed to do so many things, and do them so well! ;)
>In the meantime, you might have a look at fxp, which implements
>Unicode encoders and decoders in SML without any compiler support.
>http://atseidl2.informatik.tu-muenchen.de/~berlea/Fxp/
Thanks for the link.
/Alexandre.