[MLton] Unicode... again

skaller skaller at users.sourceforge.net
Sat Feb 10 07:08:45 PST 2007


On Sat, 2007-02-10 at 13:51 +0100, Wesley W. Terpstra wrote:
> On Feb 10, 2007, at 3:24 AM, skaller wrote:
> > Be aware in these considerations that GNU gettext functionality
> > generally requires a prefix for dictionary lookup too, in C
> > you write
> >
> > 	_("....")
> >
> > I think, where macro _ specifies a catalogued message to be
> > translated to the user language. This would stack on top
> > of any string type indicators..
> 
> When we bind gettext, we will have lots of options. I'm not sure it  
> would be horrible if we required a WideString input to _. We  
> certainly won't be using macros, though.

No, of course not .. I'm just pointing out that however you
denote a string vs. wide-string literal, there is an additional
mark for gettext processing, which also needs to be very short
(otherwise the program becomes unreadable).

Just a reminder that gettext should be considered when
choosing the lexical syntax.
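For readers who have not used it, the C idiom mentioned above is a one-character macro wrapped around GNU gettext. This is a minimal sketch that assumes a glibc-based system, where <libintl.h> and gettext() come with the C library. The text domain "myapp" and the catalog directory are hypothetical. The sketch also shows why the mark interacts with literal syntax: gettext() takes a narrow char string, so it cannot wrap a wide literal directly.

```c
#include <libintl.h>   /* GNU gettext; part of glibc on Linux (assumption) */
#include <locale.h>

/* Conventional one-character marker: _("...") tags the literal for
   extraction by xgettext and looks it up in the message catalog at run time. */
#define _(String) gettext(String)

/* Hypothetical text domain and catalog directory, for illustration only. */
#define APP_DOMAIN    "myapp"
#define APP_LOCALEDIR "/usr/share/locale"

/* Select the user's locale and tell gettext which catalog to search. */
static void init_i18n(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain(APP_DOMAIN, APP_LOCALEDIR);
    textdomain(APP_DOMAIN);
}

/* gettext() takes a narrow (char *) string, so the mark wraps a plain
   literal; _(L"...") would not match gettext's parameter type. */
static const char *greeting(void)
{
    return _("Hello, world");
}
```

If no catalog is installed for the domain, gettext() returns its argument unchanged, so greeting() yields the English text.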

> > the problem with inference in general is bad error handling.
> > For strings you might get an error you couldn't even see:
> > a bad character code in a string is likely to be hard to
> > find if your text editor can't display it (which is possible
> > if it is a bad character ... :)
> 
> I'm confused. The compiler will give a compile-time error if there is  
> a character too big for the inferred type. You'll know the line and  
> column too, so that's not hard to fix?
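Wesley's point can be made concrete. Once type inference has fixed the literal's type from its context, checking the literal is a simple scan that also yields the position to report. The sketch below is hypothetical and is not MLton's implementation. It assumes that the literal has already been decoded into code points and that the candidate character types are 8, 16 or 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: after the literal's type (and so its character width
   in bits) is known, every code point must fit in that width. Returns the
   1-based index of the first character that does not fit, or 0 if all fit. */
static size_t first_unrepresentable(const uint32_t *cps, size_t n, unsigned bits)
{
    assert(bits == 8 || bits == 16 || bits == 32);
    /* Largest value a character of `bits` bits can hold; avoids 1 << 32. */
    uint32_t max = (bits == 32) ? UINT32_MAX : ((uint32_t)1 << bits) - 1;
    for (size_t i = 0; i < n; i++)
        if (cps[i] > max)
            return i + 1;
    return 0;
}
```

A compiler would map the returned index back to a source position, allowing for escape sequences, to give the line and column Wesley mentions.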

I had problems with this in a Python string .. couldn't figure
out what was going on. In this case I could see the bad character
in gvim, but couldn't figure out WHY it was bad .. the problem
being I could see the character but that didn't tell me the
encoding. Again .. only a small point here. 
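When an editor displays a character but not its encoding, dumping the raw code points answers the question. This is a minimal sketch that assumes the text is meant to be UTF-8. It prints each character as U+XXXX with its byte offset and flags any byte that is not well-formed UTF-8. The function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative helper (hypothetical name): decode one UTF-8 sequence from
   s[0..len). On success store the code point in *cp and return its length
   (1-4). Return 0 for a bad lead byte, a truncated or bad continuation,
   an overlong form, a surrogate, or a value above U+10FFFF. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    if (len == 0) return 0;
    unsigned char b = s[0];
    size_t n;
    uint32_t c, min;
    if (b < 0x80) { *cp = b; return 1; }
    else if ((b & 0xE0) == 0xC0) { n = 2; c = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { n = 3; c = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { n = 4; c = b & 0x07; min = 0x10000; }
    else return 0;                     /* stray continuation or invalid lead */
    if (len < n) return 0;             /* truncated sequence */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    *cp = c;
    return n;
}

/* Illustrative helper (hypothetical name): print each character as U+XXXX
   with its byte offset; return the number of invalid bytes found. */
static size_t dump_code_points(const char *text, size_t len, FILE *out)
{
    const unsigned char *s = (const unsigned char *)text;
    size_t i = 0, bad = 0;
    while (i < len) {
        uint32_t cp;
        size_t n = utf8_decode(s + i, len - i, &cp);
        if (n == 0) {
            fprintf(out, "byte %zu: invalid UTF-8 byte 0x%02X\n", i, s[i]);
            bad++;
            i++;
        } else {
            fprintf(out, "byte %zu: U+%04X\n", i, (unsigned)cp);
            i += n;
        }
    }
    return bad;
}
```

For example, `dump_code_points("caf\xE9", 4, stdout)` reports byte 3 as invalid. 0xE9 is é in Latin-1, but in UTF-8 it is only the lead byte of a three-byte sequence. The UTF-8 spelling `"caf\xC3\xA9"` decodes cleanly, with é as U+00E9. This is only an illustration and is not the character from the Python case above.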

I don't think I like the idea of 'inferring' a string or char
type. What is the type of:

	"A"

?? If it is 8 bit but

	"\u0123"

is 32 bit just because it happens to contain a character that
can't be represented as an ordinary Char, I'm not sure I like it.
I may want 

	"A"

to be 32 bit, and I may want some string to be forced to be
8 bit. I think I'm probably confused here ..
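For comparison, C settles exactly this question by prefix rather than by contents. "A" is an array of char and U"A" is an array of char32_t, whichever characters appear in them, so "A" can be forced wide and nothing is silently widened. What follows is a short C11 sketch; LITERAL_UNITS is a made-up helper. In a plain literal, \u0123 is encoded into the execution character set (UTF-8 by default with GCC), so it occupies several char units instead of changing the literal's type.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <uchar.h>    /* char32_t (C11) */

/* Made-up helper: code units in a string literal, excluding the final NUL. */
#define LITERAL_UNITS(lit) (sizeof (lit) / sizeof ((lit)[0]) - 1)

/* The element type is fixed by the prefix, never by the characters inside. */
static_assert(sizeof("A"[0]) == sizeof(char), "plain literal holds char");
static_assert(sizeof(U"A"[0]) == sizeof(char32_t), "U literal holds char32_t");
static_assert(sizeof(U"\u0123"[0]) == sizeof(char32_t),
              "a wider character does not change the type");
static_assert(LITERAL_UNITS(U"A") == 1 && LITERAL_UNITS(U"\u0123") == 1,
              "one code unit per character in a U literal");
```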

//////

Sometime or other MLton will get a 64-bit engine and I will
be able to use it .. :) So I'm flying blind, just trying
to provide as much input as possible. I'm not an i18n expert,
but as the representative of Australia, a multi-cultural society,
on an ISO committee, I had to learn a bit to ensure Australia's
needs were met. We have software for things like motor vehicle
driver's licence theory tests that presents the test in the
candidate's native language, so a single server has to operate
in multiple languages at once, and hence a single locale
is unacceptable.

I18n support in much software is poor or non-existent and
that includes programming languages .. so .. 

This thread started with you writing:

"Once again I find myself needing Unicode in MLton. I failed to  
implement this last time. I bit off more than I could chew in one  
bite. I propose to instead start with a minimal implementation that  
captures the most useful elements. "

and I think this is an excellent approach and highly worthy!

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


