[MLton] Unicode... again

Thu Feb 8 19:32:11 PST 2007

On Fri, 2007-02-09 at 09:59 +1100, Michael Norrish wrote:
> Wesley W. Terpstra wrote:

> I think I'm in total agreement with your vision. 

Yeah, for once I agree :)

In that spirit, I'd drop 16-bit support initially. Just provide:

* 32 bits 
* 8 bits
* UTF-8

>  Pragmatically, I
> wonder how important you think providing the 16 bit character type is.
> It seems a kind of optional extra for people who want space-efficient
> BMP.  Or do you imagine the vast majority of people will want to just
> use the BMP, and will therefore resent wasting 16 bits per char? 

That's the problem! The vast majority of people do indeed want
primarily the BMP. But they shouldn't be allowed easy access to it:
the whole point of a Standard is as a guide for what everyone 
SHOULD do to facilitate communication and interoperability, and
32 bits is the way to go here, not 16, which is a stupid compromise
made prematurely by greedy industrial powers.

I18N consensus is that 32 bits is the right compromise,
and that UTF-8 is the right space-efficient encoding if you're
willing to give up random access.
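
To make the trade-off concrete, here is a minimal sketch in Python (chosen
only for illustration; it says nothing about how MLton would implement
this). The sample characters and the helper name nth_code_point are my own
assumptions, not anything from the proposal.

```python
# Bytes needed for one code point: UTF-8 versus a fixed 32-bit encoding.
# Samples: ASCII, Latin-1 range, elsewhere in the BMP, outside the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]

# "utf-32-be" is used instead of "utf-32" so no byte-order mark is added.
utf8_sizes = [len(c.encode("utf-8")) for c in samples]
utf32_sizes = [len(c.encode("utf-32-be")) for c in samples]

def nth_code_point(raw32: bytes, n: int) -> str:
    """Random access: with 4-byte units, code point n starts at byte 4*n."""
    return raw32[4 * n:4 * n + 4].decode("utf-32-be")

raw = "a\u20acb".encode("utf-32-be")
second = nth_code_point(raw, 1)  # the euro sign, found without scanning
```

In UTF-8 the same lookup has to scan from the start of the string, because
each code point occupies between one and four bytes.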

16 bits is neither space-efficient nor gives random access to
the full standardised code point space (anything outside the BMP
needs a surrogate pair), though it's probably the representation
of choice for mobile phones.
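
A rough sketch of why 16-bit units do not give random access, again in
Python purely for illustration. The helper utf16_units and the sample
string are hypothetical names and data of mine.

```python
def utf16_units(s: str) -> list[int]:
    """Return the 16-bit code units of s in UTF-16 (big-endian, no BOM)."""
    raw = s.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

# A BMP character fits in one 16-bit unit.
bmp_units = utf16_units("\u20ac")

# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP and takes two
# units: a high surrogate followed by a low surrogate.
clef_units = utf16_units("\U0001D11E")

# Three code points, but four 16-bit units, so unit index 2 is the low
# surrogate rather than the third character.
text = "a\U0001D11Eb"
units = utf16_units(text)
index2_is_low_surrogate = 0xDC00 <= units[2] <= 0xDFFF
```

So a 16-bit char type indexes code units, not code points, as soon as any
non-BMP character appears in the string.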

>  (It
> certainly does seem as if there won't be much use of stuff outside
> BMP, but who can tell?)

I don't agree. If you are thinking mainly of basic characters
for spoken human languages, then yes, you're probably right.

But you can be sure the committees will move on to consider
other symbols and use up some of the space, and then people
will want access to it.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net