[MLton] Re: [MLton-commit] r6699
Matthew Fluet
fluet at tti-c.org
Mon Jun 15 16:13:18 PDT 2009
On Mon, 15 Jun 2009, Wesley W. Terpstra wrote:
> On Mon, Jun 15, 2009 at 5:21 PM, Matthew Fluet<fluet at tti-c.org> wrote:
>> It's about the Win32 spawn* functions (and possibly the CreateProcess
>> function), which provide fork/exec-like functionality.
>>
>> The issue (as I understand it) is that the char **argv argument passed to
>> spawnv{,p}{,e} becomes the const char **argv argument passed to main of
>> the created process. One doesn't expect the contents of those character
>> arrays to be changed from spawn{,p}{,e} to main (that is, one shouldn't need
>> to do any escaping at all and one certainly doesn't need to for the *nix
>> exec{,p}{,e} functions), but there is some evidence that MinGW does (or
>> un-does?) escaping of the arguments.
>
> The root problem is that windows does not have an **argv. That's a
> unix convention. Windows programs receive a single flat array (see
> CreateProcess). The crt has code which parses and splits this flat
> array to emulate argv functionality. exec() and spawn() functions have
> code which pastes the arguments together. Unfortunately, a
> long-standing bug in windows is that these pasting and parsing
> operations are NOT compatible.
>
> The MinGW (/ windows CRT) version of pasting is simply ("a", "b", "c")
> -> "a b c". Obviously this breaks for ("a b", "c") -> "a b c". That's
> why MinGW needs to escape arguments to spawn as well as CreateProcess.
> The escaping function in mlton/process.sml was hand-crafted to match
> the parsing done by the windows crt at program start-up. The
> launchWithCreate method similarly combines ("a b", "c") -> "a b c",
> but after it escapes its arguments the same as it would for spawn().
>
> Cygwin has to paste and parse arguments just as MinGW does, however,
> it's possible that the cygwin parsing/pasting actually matches (but I
> wouldn't bet on this). If they do match, then no escaping is needed
> for spawn. However, like MinGW, Cygwin sometimes calls CreateProcess.
> The arguments will need to be escaped and pasted together in whatever
> way matches the cygwin runtime. I don't know how the cygwin runtime
> parses its single argument, but what I read said:
>
> (* In cygwin, according to what I read, \ should always become \\.
> * Furthermore, more characters cause escaping as compared to MinGW.
> * From what I read, " should become "", not \", but I leave the old
> * behaviour alone until someone runs the spawn regression.
> *)
>
> However, I didn't (and don't) have a cygwin to poke for the parsing
> algorithm used.
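A minimal sketch of the pasting/parsing mismatch described in the quoted text, written in Python for brevity. It assumes the backslash and double-quote rules that Microsoft documents for C runtime command-line parsing; it is not the code in mlton/process.sml, it does not model Cygwin, and the function names (naive_paste, quote_arg, paste, parse) are hypothetical.

```python
def naive_paste(argv):
    """Join arguments with single spaces and no escaping at all."""
    return " ".join(argv)


def quote_arg(arg):
    """Quote one argument so that parse() below recovers it unchanged."""
    if arg and not any(c in arg for c in ' \t"'):
        return arg  # no whitespace or quotes: pass through untouched
    out = ['"']
    backslashes = 0  # length of the current run of backslashes
    for c in arg:
        if c == "\\":
            backslashes += 1
        elif c == '"':
            # Double the pending backslashes, then add \" for the quote.
            out.append("\\" * (2 * backslashes + 1) + '"')
            backslashes = 0
        else:
            out.append("\\" * backslashes + c)
            backslashes = 0
    # Backslashes just before the closing quote must be doubled.
    out.append("\\" * (2 * backslashes) + '"')
    return "".join(out)


def paste(argv):
    """Escape each argument, then join with spaces."""
    return " ".join(quote_arg(a) for a in argv)


def parse(cmdline):
    """Split a flat command line back into arguments.

    Simplified: ignores the special treatment of the program name and
    of a doubled quote inside a quoted string.
    """
    args, cur, in_quotes, have_arg = [], [], False, False
    i, n = 0, len(cmdline)
    while i < n:
        c = cmdline[i]
        if c == "\\":
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                # 2n backslashes + quote -> n backslashes, quote delimits;
                # 2n+1 backslashes + quote -> n backslashes, literal quote.
                cur.append("\\" * (count // 2))
                if count % 2:
                    cur.append('"')
                else:
                    in_quotes = not in_quotes
                i = j + 1
            else:
                cur.append("\\" * count)  # backslashes are literal here
                i = j
            have_arg = True
        elif c == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif c in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(cur))
                cur, have_arg = [], False
            i += 1
        else:
            cur.append(c)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(cur))
    return args
```

With these definitions, naive_paste(["a b", "c"]) and naive_paste(["a", "b", "c"]) produce the same string, so the receiving side cannot recover the original split, whereas parse(paste(argv)) returns argv for arguments containing spaces, quotes, or trailing backslashes.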
While I can understand the marshalling/unmarshalling of arguments through
a single string, what I'm unclear on is where Cygwin and MinGW interpose
their own conventions. That is, spawn{,p}{,e} and CreateProcess are Win32
functions (right?) --- yet Cygwin and MinGW interpose their own versions
that (may or may not) munge the arguments (before calling the "real"
spawn{,p}{,e} and CreateProcess)? Similarly, starting a program from the
console should begin execution at main; though, technically, it is
wherever the loader begins execution, so Cygwin and MinGW could provide
their own _start (or whatever symbol it is in Windows) that (may or may
not) unmunge the arguments before calling main.

Of course, when calling spawn{,p}{,e} or CreateProcess from a Cygwin or
MinGW program, it can't know whether the called executable is itself a
Cygwin, MinGW, or plain Windows program. Similarly, when starting up, a
Cygwin or MinGW executable can't know whether it was called via
spawn{,p}{,e} or CreateProcess by a Cygwin, MinGW, or plain Windows
(including CMD.exe) program. So, I don't see why it is sensible for
Cygwin or MinGW to munge/unmunge arguments at all, since it can't know
what was/will-be done on the other end.
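
To make the concern concrete, here is a small sketch in Python of two escaping conventions that are each self-consistent but do not interoperate. Both conventions are hypothetical stand-ins (one writes a quote as \" and the other as ""); neither is claimed to be what Cygwin, MinGW, or the Windows CRT actually does, and all function names are invented for illustration.

```python
def encode_backslash_style(arg):
    """Hypothetical convention A: \\ becomes \\\\ and " becomes \\"."""
    return '"' + arg.replace("\\", "\\\\").replace('"', '\\"') + '"'


def decode_backslash_style(quoted):
    """Inverse of convention A: a backslash escapes the next character."""
    body = quoted[1:-1]
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def encode_doubling_style(arg):
    """Hypothetical convention B: " becomes "", backslashes untouched."""
    return '"' + arg.replace('"', '""') + '"'


def decode_doubling_style(quoted):
    """Inverse of convention B."""
    return quoted[1:-1].replace('""', '"')


arg = 'say "hi" in C:\\tmp'

# Each convention round-trips with itself...
same_a = decode_backslash_style(encode_backslash_style(arg)) == arg
same_b = decode_doubling_style(encode_doubling_style(arg)) == arg

# ...but a sender using one and a receiver using the other disagree.
cross_ab = decode_doubling_style(encode_backslash_style(arg)) == arg
cross_ba = decode_backslash_style(encode_doubling_style(arg)) == arg
```

Here same_a and same_b are true while cross_ab and cross_ba are false, which is the situation a process faces when it cannot tell which convention the program on the other end follows.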