[MLton] Patch ready
Wesley W. Terpstra
wesley at terpstra.ca
Sun Feb 11 17:45:42 PST 2007
Thanks for the review!
On Feb 12, 2007, at 1:26 AM, Matthew Fluet wrote:
> Here are some notes on the WideChar patch. There's nothing too
> major, so feel free to commit after addressing the comments.
Committed.
> Index: regression/widechar.sml
> Looks like a port of bytechar.sml; shouldn't return "WRONG".
Yes. I couldn't test it before, but it's fixed now. I've also added a
number of new tests to cover the character constants.
> I'm not sure that we'll ever really need to
> exercise the choice of WideChar binding (indeed, if we interpret it
> like LargeInt, then it is pretty much dicated to be Char32), but it is
> nice to have it.
If someone uses MLton on an embedded device (which, to my shame, I
have) this is likely helpful.
> Index: mlton/atoms/word-x-vector.fun
> ===================================================================
> --- mlton/atoms/word-x-vector.fun (revision 5163)
> +++ mlton/atoms/word-x-vector.fun (working copy)
> @@ -39,7 +39,11 @@
> Char.fromInt (IntInf.toInt
> r) :: ac)
> end
> in
> - loop (n, WordX.toIntInf w, ac)
> + (* Control.targetIsBigEndian is not always
> set, so
> + * only use it if we really need to know
> the value. *)
> + if n > 8 andalso Control.targetIsBigEndian ()
> + then rev (loop (n, WordX.toIntInf w, []))
> @ ac
> + else loop (n, WordX.toIntInf w, []) @ ac
> end)))
> end
>
> This endian handling is probably best handled by the C-codegen. As
> you point out, the target endianness isn't always set and it would be
> nice if the IL (and their printed representations) were target
> agnostic. Also, turning a WordXVector.t into a string is used for
> hashing, so making it target endian dependent changes hash values.
> But, in the meantime, this seems expedient.
I'll defer this to someone more knowledgeable...
However, that method should probably raise an exception on non-byte
arrays if you take the endian handling out.
> All of the sequence/slice/vector/vector-slice functions are there to
> implement String and Substring functions at a place where we're more
> free with unsafe subscript operations. String.translate and
> Substring.translate don't need the extra polymorphism, so I'm guessing
> that you need it somewhere. In any case, the extra polymorphism
> doesn't hurt.
Actually, I need the polymorphism for WideString.toString. The input
is a WideString.string being converted through translate to a
String.string.
> Index: basis-library/text/algo.sml
> Please move this to basis-library/util/. Also, please give it a
> containing structure; in the Basis Library implementation, we try to
> avoid top-level values that are not part of the top-level environment
> of the Basis Library. (Although that signature match with
> BASIS_{EXTRA,2002} generally culls any extra bindings, its a good
> convention to follow.)
Done.
> One thing I notice about the design is that you don't try to propagate
> CharX and StringX structures through everything; rather, after
> choosing Char and WideChar using
> basis/library/config/default/default-{charN,widecharN}.sml, everything
> is expressed in terms of Char and WideChar. This seems good, and it
> is somewhat simpler than what needs to happen with IntN and WordN.
I know. The last attempts I made at this exposed more character
types. To be honest, though, I think this was misguided. Either you
want Unicode, or you don't. If you do, then pay the RAM for it. If
you can't, then use a better compression scheme than just chopping
off the high bits.
> Index: basis-library/text/string0.sml
> Give a PRE_STRING signature? Also, PreWideString seems unused.
I've renamed it to String, just like with Char in char0.sml
I've commented out WideString; it may be needed in the future.
While I agree that a PRE_CHAR signature is useful, there was no
PRE_STRING signature before, and I don't see what it adds?
> Index: basis-library/text/char0.sml
> Looks good. You could give a PRE_CHAR signature. You could also use
> a local .. in .. end to not expose the PreCharX structure. (Nice use
> of the {Char,WideChar}_ChooseChar functors.)
PRE_CHAR added in char0.sig local added.
> Index: basis-library/text/string.sig
> ===================================================================
> --- basis-library/text/string.sig (revision 5163)
> +++ basis-library/text/string.sig (working copy)
> @@ -33,7 +33,7 @@
> val isSuffix: string -> string -> bool
> val map: (char -> char) -> string -> string
> val maxSize: int
> - val scan: (char, 'a) StringCvt.reader -> (string, 'a)
> StringCvt.reader
> + val scan: (Char.char, 'a) StringCvt.reader -> (string, 'a)
> StringCvt.reader
> val sub: string * int -> char
> val toCString: string -> String.string
> val toString: string -> String.string
>
> This seems to go against the Basis specification. The STRING.scan
> function is supposed to be in terms of the STRING.char type. But, it
> is an odd type given the type of the corresponding function in CHAR,
> so I think you are right in spirit.
Oh! You're right. I think that's a probably a typo in the basis
specification. It's pretty clear that these methods are intended to
be built upon the corresponding methods from CHAR from the relevant
text. I guess this gets added as point #4 to my list of changes made
to the basis library.
> Index: basis-library/text/char.sml
> Use 'lSPACE' for consistency with BANG, TIL, and DEL?
Done.
I'm so glad to finally have made some forward progress on this issue.
I kept restarting, throwing it away, restarting, ... I'm pretty happy
with how I see it shaping up now, though. If I'd implemented it two
years ago, I'd have implemented it wrong.
I have a full regression running over-night for powerpc and intel.
Hopefully, none of the above changes broke anything.
More information about the MLton
mailing list