[MLton] Patch ready

Sun Feb 11 17:45:42 PST 2007

Thanks for the review!

On Feb 12, 2007, at 1:26 AM, Matthew Fluet wrote:
> Here are some notes on the WideChar patch.  There's nothing too  
> major, so feel free to commit after addressing the comments.

Committed.

> Index: regression/widechar.sml
> Looks like a port of bytechar.sml; shouldn't return "WRONG".

Yes. I couldn't test it before, but it's fixed now. I've also added a  
number of new tests to cover the character constants.

> I'm not sure that we'll ever really need to
> exercise the choice of WideChar binding (indeed, if we interpret it
> like LargeInt, then it is pretty much dicated to be Char32), but it is
> nice to have it.

If someone uses MLton on an embedded device (which, to my shame, I  
have) this is likely helpful.

> Index: mlton/atoms/word-x-vector.fun
> ===================================================================
> --- mlton/atoms/word-x-vector.fun	(revision 5163)
> +++ mlton/atoms/word-x-vector.fun	(working copy)
> @@ -39,7 +39,11 @@
>                                         Char.fromInt (IntInf.toInt  
> r) :: ac)
>                                end
>                       in
> -                        loop (n, WordX.toIntInf w, ac)
> +                        (* Control.targetIsBigEndian is not always  
> set, so
> +                         * only use it if we really need to know  
> the value. *)
> +                        if n > 8 andalso Control.targetIsBigEndian ()
> +                        then rev (loop (n, WordX.toIntInf w, []))  
> @ ac
> +                        else loop (n, WordX.toIntInf w, []) @ ac
>                       end)))
>     end
>
> This endian handling is probably best handled by the C-codegen.  As
> you point out, the target endianness isn't always set and it would be
> nice if the IL (and their printed representations) were target
> agnostic.  Also, turning a WordXVector.t into a string is used for
> hashing, so making it target endian dependent changes hash values.
> But, in the meantime, this seems expedient.

I'll defer this to someone more knowledgeable...
However, that method should probably raise an exception on non-byte  
arrays if you take the endian handling out.

> All of the sequence/slice/vector/vector-slice functions are there to
> implement String and Substring functions at a place where we're more
> free with unsafe subscript operations.  String.translate and
> Substring.translate don't need the extra polymorphism, so I'm guessing
> that you need it somewhere.  In any case, the extra polymorphism
> doesn't hurt.

Actually, I need the polymorphism for WideString.toString. The input  
is a WideString.string being converted through translate to a  
String.string.

> Index: basis-library/text/algo.sml
> Please move this to basis-library/util/.  Also, please give it a
> containing structure; in the Basis Library implementation, we try to
> avoid top-level values that are not part of the top-level environment
> of the Basis Library.  (Although that signature match with
> BASIS_{EXTRA,2002} generally culls any extra bindings, its a good
> convention to follow.)

Done.

> One thing I notice about the design is that you don't try to propagate
> CharX and StringX structures through everything; rather, after
> choosing Char and WideChar using
> basis/library/config/default/default-{charN,widecharN}.sml, everything
> is expressed in terms of Char and WideChar.  This seems good, and it
> is somewhat simpler than what needs to happen with IntN and WordN.

I know. The last attempts I made at this exposed more character  
types. To be honest, though, I think this was misguided. Either you  
want Unicode, or you don't. If you do, then pay the RAM for it. If  
you can't, then use a better compression scheme than just chopping  
off the high bits.

> Index: basis-library/text/string0.sml
> Give a PRE_STRING signature?  Also, PreWideString seems unused.
I've renamed it to String, just like with Char in char0.sml
I've commented out WideString; it may be needed in the future.
While I agree that a PRE_CHAR signature is useful, there was no  
PRE_STRING signature before, and I don't see what it adds?

> Index: basis-library/text/char0.sml
> Looks good.  You could give a PRE_CHAR signature.  You could also use
> a local .. in .. end to not expose the PreCharX structure.  (Nice use
> of the {Char,WideChar}_ChooseChar functors.)

PRE_CHAR added in char0.sig local added.

> Index: basis-library/text/string.sig
> ===================================================================
> --- basis-library/text/string.sig	(revision 5163)
> +++ basis-library/text/string.sig	(working copy)
> @@ -33,7 +33,7 @@
>        val isSuffix: string -> string -> bool
>        val map: (char -> char) -> string -> string
>        val maxSize: int
> -      val scan: (char, 'a) StringCvt.reader -> (string, 'a)  
> StringCvt.reader
> +      val scan: (Char.char, 'a) StringCvt.reader -> (string, 'a)  
> StringCvt.reader
>        val sub: string * int -> char
>        val toCString: string -> String.string
>        val toString: string -> String.string
>
> This seems to go against the Basis specification.  The STRING.scan
> function is supposed to be in terms of the STRING.char type.  But, it
> is an odd type given the type of the corresponding function in CHAR,
> so I think you are right in spirit.

Oh! You're right. I think that's a probably a typo in the basis  
specification. It's pretty clear that these methods are intended to  
be built upon the corresponding methods from CHAR from the relevant  
text. I guess this gets added as point #4 to my list of changes made  
to the basis library.

> Index: basis-library/text/char.sml
> Use 'lSPACE' for consistency with BANG, TIL, and DEL?

Done.

I'm so glad to finally have made some forward progress on this issue.  
I kept restarting, throwing it away, restarting, ... I'm pretty happy  
with how I see it shaping up now, though. If I'd implemented it two  
years ago, I'd have implemented it wrong.

I have a full regression running over-night for powerpc and intel.  
Hopefully, none of the above changes broke anything.