From fluet at tti-c.org Thu Jun 11 13:59:30 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Thu Jun 11 13:59:35 2009
Subject: [MLton] release and future devel
Message-ID:

I'd like to push out a mlton release this summer, probably within the
next three or four weeks, unless someone identifies a significant issue.
Although there are few features added since the last release (20070826),
there have been a number of bugs fixed. Furthermore, it would be good to
make a release at this time before undertaking more significant
developments.

I know of only three outstanding reported (but unverified) bugs:
  http://mlton.org/pipermail/mlton-user/2008-March/001358.html (Intel)
  http://mlton.org/pipermail/mlton/2008-September/030355.html (PolySpace)
  http://mlton.org/pipermail/mlton/2009-February/030513.html (PolySpace)
In each case, the company won't release source code to verify and
diagnose the bug.

I have begun updating the doc/changelog file with a summary (and
included some additional changelog entries), based on reviewing the
commit messages. Let me know if there are any other significant points.

I regularly self-compile on x86-darwin and amd64-linux, so I don't think
that there are any issues on those platforms. I should also be able to
check x86-cygwin and x86-linux. If you regularly use another platform,
it would be helpful to verify that things work as expected on that
platform; bootstrapping from mlton-20070826 and running the regression
suite is a good baseline test. Obviously, your favorite SML application
is another good test.

Another reason for releasing at this time is that I would like to
undertake a round of slash-and-burn to clear away some outdated and
un-maintained code and to simplify some other aspects. [I will be
starting a new faculty position at the Rochester Institute of Technology
in the fall. I will primarily be working with undergraduate and masters
students, and to make MLton an effective platform for research with such
students, it needs to be simplified.] This will include dropping some
previously supported "features" and will likely be destabilizing in the
short term (particularly for some of the more exotic platforms).

"Features" that I would like to drop:

 * Bytecode codegen --- the bytecode codegen has never gotten any
   significant use; it is not "portable" bytecode (which has confused
   some people); it is not well understood by any of the active
   developers.

 * Support for .cm files --- the ML Basis system provides much better
   infrastructure for "programming in the very large" than our
   rudimentary support for CM; in particular, we treat .cm files as a
   simple list of files (recursively expanding .cm files unless we've
   seen them before), with none of the scoping that CM and MLB provide.
   The cm2mlb tool can be used to convert CM projects to MLB projects,
   preserving the CM scoping.

 * time-label profiling --- this is only supported on the x86 and amd64
   codegens. Using assembly labels for code coverage has been a bit
   awkward; there were the hard-to-implement getText{Start,End}
   functions (which Wesley eliminated in favor of a binary search tree
   of just the profiling labels), it can be confusing as to whether
   other platforms support time profiling (using the C codegen and
   time-field profiling, they all should), and it complicates the native
   codegens that must introduce (and maintain) profiling labels for
   every basic block, which is a barrier to future codegens. In any
   case, I don't think the time-field technique has a prohibitive cost
   (and what cost it has could be lowered):
     http://mlton.org/pipermail/mlton-commit/2005-November/000238.html

I'll mention the deprecation of the bytecode codegen and support for .cm
files in the release summary notes; the elimination of the time-label
profiling won't be a visible change in functionality.
From henry.cejtin at sbcglobal.net Thu Jun 11 14:23:04 2009
From: henry.cejtin at sbcglobal.net (Henry Cejtin)
Date: Thu Jun 11 14:23:37 2009
Subject: [MLton] release and future devel
In-Reply-To:
References:
Message-ID: <203802.87535.qm@web82403.mail.mud.yahoo.com>

Oh happy days!!!

Would it be easy to have a version of the compiler for AMD64 (or any 64-bit
architecture I guess) where Int is Int64? In particular, where I could have
arrays and vectors with more than 2 billion elements?

From fluet at tti-c.org Thu Jun 11 15:13:02 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Thu Jun 11 15:13:05 2009
Subject: [MLton] release and future devel
In-Reply-To: <203802.87535.qm@web82403.mail.mud.yahoo.com>
References: <203802.87535.qm@web82403.mail.mud.yahoo.com>
Message-ID:

On Thu, 11 Jun 2009, Henry Cejtin wrote:
> Oh happy days!!!
>
> Would it be easy to have a version of the compiler for AMD64 (or any 64-bit
> architecture I guess) where Int is Int64? In particular, where I could have
> arrays and vectors with more than 2 billion elements?

You already have this with mlton-20070826, using '-default-type int64' or
'-default-type intinf':

[fluet@shadow tmp]$ cat z.sml
fun doit (n: int) : unit =
   let
      val v = Vector.tabulate (n, Word8.fromInt)
      val sum = Vector.foldr Word8.+ 0wx0 v
   in
      (print o concat)
      ["n = ", Int.toString n, "; sum = ", Word8.toString sum, "\n"]
   end

val n = Int32.toInt (valOf (Int32.maxInt))
val () = doit n
val () = doit (n + 1)
val () = doit (n + n)
[fluet@shadow tmp]$ ~/devel/mlton/mlton-20070826/build/bin/mlton z.sml
[fluet@shadow tmp]$ ./z
n = 2147483647; sum = 1
unhandled exception: Overflow
[fluet@shadow tmp]$ ~/devel/mlton/mlton-20070826/build/bin/mlton -default-type int64 z.sml
[fluet@shadow tmp]$ ./z
n = 2147483647; sum = 1
n = 2147483648; sum = 0
n = 4294967294; sum = 3
[fluet@shadow tmp]$ ~/devel/mlton/mlton-20070826/build/bin/mlton -default-type intinf z.sml
[fluet@shadow tmp]$ ./z
n = 2147483647; sum = 1
n = 2147483648; sum = 0
n = 4294967294; sum = 3

On a 64-bit platform, all array/vector sub/update operations are with
64-bit indices. The Basis Library just wraps them with coercions to/from
the default Int type.

From fluet at tti-c.org Thu Jun 11 15:18:44 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Thu Jun 11 15:18:47 2009
Subject: [MLton] library support
Message-ID:

The most invasive change since 20070826 has been the addition of support
for building stand-alone ML libraries. Honestly, I think this has gotten
less testing and less documentation than it needs, but as long as it
doesn't (seriously) disrupt the building of executables, it won't affect
most users.

The one place where it does affect existing users is the proliferation of
scoping attributes for FFI.
If I understand things correctly, the following summarizes the state of play:

 - Building an executable with only imports (_address/_symbol/_import)

   Either 'private' or 'public' should be used for imports from .o/.a files
   linked statically with the executable; technically, 'private' would be
   the correct choice on coff, while 'public' would be the correct choice on
   elf and macho --- unless the symbol was compiled with the fancy
   __attribute__((visibility("hidden"))) annotation, in which case 'private'
   would be the correct choice. 'external' should be used for imports from
   .dll/.dylib/.so files linked dynamically with the executable.

   + on elf (executables are non-PIC), 'private', 'public', and
     'external' are all treated the same; thus, no errors if you get things
     wrong. Indeed, the default ('external') will always work.
   + on macho (x86 executables are non-PIC), 'private' and 'public' are
     treated the same, but 'external' is treated differently. However, the
     linker will patch things up if you use 'external' where you should have
     used 'private' or 'public'; thus, no errors if you get things wrong.
     Indeed, the default ('external') will always work.
   + on coff (x86), 'private' and 'public' are treated the same, but
     'external' is treated differently; furthermore, linker reports errors if
     you get things wrong. This differs from previous versions of MLton.
     In previous versions of MLton, we compiled all imports as 'private'; I
     can't find the e-mail now, but I think I recall Wesley asserting that
     the linker would, as a convenience, automatically generate the
     appropriate 'external' stubs if a symbol couldn't be statically
     resolved. However, this only worked for function symbols, so _address
     of a C global variable exported from a DLL would not have worked (??).

 - Building an executable with exports (_symbol 'alloc'/_export)

   Either 'private' or 'public' should be used for exports to .o files
   linked statically with the executable.
   If the client .c file uses the .h file generated by -export-header,
   then you can't get things wrong.

Question: If I infer correctly regarding 'private' vs. 'public' in imports
above, then I am at a loss as to how one conveniently shares ML code between
target platforms --- which I thought was the whole reason for introducing
'target agnostic' attributes, rather than gcc's target specific attributes:
__declspec(...) vs __attribute__((visibility("..."))).

In any case, something is broken with library support on x86-darwin:

[fluet@fenrir library]$ ./library-all
+ ../../build/bin/mlton -default-ann 'allowFFI true' -link-opt -L. -debug true -format libarchive libm1.sml libm1.c
+ ../../build/bin/mlton -default-ann 'allowFFI true' -link-opt -L. -debug true -link-opt -lm1 -format library libm2.sml libm2.c
+ ../../build/bin/mlton -default-ann 'allowFFI true' -link-opt -L. -debug true -format libarchive libm3.sml libm3.c
+ ../../build/bin/mlton -default-ann 'allowFFI true' -link-opt -L. -debug true -link-opt -lm3 -link-opt -lm2 -format library libm4.sml libm4.c
+ ../../build/bin/mlton -default-ann 'allowFFI true' -link-opt -L. -debug true -format archive libm5.sml libm5.c
+ ../../build/bin/mlton -default-ann 'allowFFI true' -link-opt -L. -debug true -link-opt -lm5 -link-opt -lm4 -format executable -default-ann 'allowFFI true' -export-header check.h check.sml check.c
/usr/libexec/gcc/i686-apple-darwin8/4.0.1/ld: ./libm5.a(libm5.a.o) has external relocation entries in non-writable section (__TEXT,__text) for symbols:
_m4_close
collect2: ld returned 1 exit status
call to system failed with exit status 1:
gcc -o check -g /tmp/file6zUOi2.o /tmp/file1JPVB3.o /tmp/filegAcro6.o -L/Users/fluet/devel/mlton/mlton.git-svn.trunk/build/lib/self -lmlton-gdb -lgdtoa-gdb -lm -lgmp -L/usr/local/lib -L/opt/local/lib -m32 -L. -lm5 -lm4

From wesley at terpstra.ca Thu Jun 11 17:39:29 2009
From: wesley at terpstra.ca (Wesley W. Terpstra)
Date: Thu Jun 11 17:40:03 2009
Subject: [MLton] library support
In-Reply-To:
References:
Message-ID: <162de7480906111739w31b87d13x7cba5919b0a54b42@mail.gmail.com>

On Fri, Jun 12, 2009 at 12:18 AM, Matthew Fluet wrote:
> The most invasive change since 20070826 has been the addition of support for
> building stand-alone ML libraries.  Honestly, I think this has gotten less
> testing and less documentation than it needs, but as long as it doesn't
> (seriously) disrupt the building of executables, it won't affect most users.

It would be nice if some other people would try using it. "It works for me"!

>   + on elf (executables are non-PIC), 'private', 'public', and
>     'external' are all treated the same;  thus, no errors if you get things
>     wrong.  Indeed, the default ('external') will always work.

Yes.

>   + on macho (x86 executables are non-PIC), 'private' and 'public' are
>     treated the same, but 'external' is treated differently.  However, the
>     linker will patch things up if you use 'external' where you should have
>     used 'private' or 'public'; thus, no errors if you get things wrong.
>     Indeed, the default ('external') will always work.

Yes.

>   + on coff (x86), 'private' and 'public' are treated the same, but
>     'external' is treated differently; furthermore, linker reports errors if
>     you get things wrong.  This differs from previous versions of MLton.
>     In previous versions of MLton, we compiled all imports as 'private'; I
>     can't find the e-mail now, but I think I recall Wesley asserting that
>     the linker would, as a convenience, automatically generate the
>     appropriate 'external' stubs if a symbol couldn't be statically
>     resolved.  However, this only worked for function symbols, so _address
>     of a C global variable exported from a DLL would not have worked (??).

All correct.

>  - Building an executable with exports (_symbol 'alloc'/_export)
>
>    Either 'private' or 'public' should be used for exports to .o files
>    linked statically with the executable.  If the client .c file uses the .h
>    file generated by -export-header, then you can't get things wrong.

Yes.

> - Building an executable with only imports (_address/_symbol/_import)
> Either 'private' or 'public' should be used for imports from .o/.a files
> linked statically with the executable

So far so good ...

> technically, 'private' would be the correct choice on coff

I'm not sure why you say this. For an executable, public would be just as
good. If you are building a library, then you will need to match the
public/private exports between all uses with the library, just like on any
other platform.

> while 'public' would be the correct choice on
> elf and macho --- unless the symbol was compiled with the fancy
> __attribute__((visibility("hidden"))) annotation, in which case 'private'
> would be the correct choice.

This is all correct.

> 'external' should be used for imports from
> .dll/.dylib/.so files linked dynamically with the executable.

Yes.

> Question: If I infer correctly regarding 'private' vs. 'public' in imports
> above, then I am at a loss as to how one conveniently shares ML code between
> target platforms --- which I thought was the whole reason for introducing
> 'target agnostic' attributes, rather than gcc's target specific attributes:
> __declspec(...) vs __attribute__((visibility("..."))).

Where's the problem? AFAIK the rules are the same for all platforms:
Match private/public definitions with declarations (in both C and ML).
Use external for things imported from a dynamic library/dll.

> In any case, something is broken with library support on x86-darwin:

I assume you mean amd64-darwin? x86-darwin is not supported for
library generation. All "x86" machines from apple support amd64. If you
meant amd64, then something has broken since I last used it, because
that used to work.
From fluet at tti-c.org Thu Jun 11 19:17:36 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Thu Jun 11 19:17:40 2009
Subject: [MLton] library support
In-Reply-To: <162de7480906111739w31b87d13x7cba5919b0a54b42@mail.gmail.com>
References: <162de7480906111739w31b87d13x7cba5919b0a54b42@mail.gmail.com>
Message-ID:

On Fri, 12 Jun 2009, Wesley W. Terpstra wrote:
>> technically, 'private' would be the correct choice on coff
>
> I'm not sure why you say this. For an executable, public would be just as good.

I say this because an un-attributed C function declaration (i.e., one
without __declspec(dllexport)) would appear to denote a 'private' symbol.

>> while 'public' would be the correct choice on
>> elf and macho --- unless the symbol was compiled with the fancy
>> __attribute__((visibility("hidden"))) annotation, in which case 'private'
>> would be the correct choice.
>
> This is all correct.
>
>> 'external' should be used for imports from
>> .dll/.dylib/.so files linked dynamically with the executable.
>
> Yes.
>
>> Question: If I infer correctly regarding 'private' vs. 'public' in imports
>> above, then I am at a loss as to how one conveniently shares ML code between
>> target platforms --- which I thought was the whole reason for introducing
>> 'target agnostic' attributes, rather than gcc's target specific attributes:
>> __declspec(...) vs __attribute__((visibility("..."))).
>
> Where's the problem? AFAIK the rules are the same for all platforms:
> Match private/public definitions with declarations (in both C and ML).
> Use external for things imported from a dynamic library/dll.

I guess the problem is that the C-side defaults are not the same for coff
as for elf/macho. That is, if in a .c file I simply write:

  int foo(void) { return 1; }

then, as I understand it:

  * on coff, with no attribute, gcc will compile it as 'private';
  * on coff, I need __declspec(dllexport) to get gcc to compile it as
    'public'.
  * on elf/macho, with no attribute, gcc will compile it as 'public';
  * on elf/macho, I need __attribute__((visibility("hidden"))) to get gcc
    to compile it as 'private'.

I infer this from the macros in /runtime/export.h.

It is true that if I were to write:

  #include "export.h"
  PRIVATE int foo(void) { return 1; }
or
  #include "export.h"
  PUBLIC int foo(void) { return 1; }

then the macro expansion will insert the appropriate attribute.

But, this doesn't appear to help in the case that you are linking to a
static library that was compiled naively --- that is, with un-attributed
function declarations --- on both coff and elf/macho.

>> In any case, something is broken with library support on x86-darwin:
>
> I assume you mean amd64-darwin? x86-darwin is not supported for
> library generation.

No, I meant "x86-darwin". Extending mlton.org/LibrarySupport with the list
of known good and known bad platforms would be helpful.

> All "x86" machines from apple support amd64.

But, by default, gcc (and mlton) treat an Intel Mac as an x86 target.

From henry.cejtin at sbcglobal.net Thu Jun 11 16:06:05 2009
From: henry.cejtin at sbcglobal.net (Henry Cejtin)
Date: Thu Jun 11 19:17:57 2009
Subject: [MLton] release and future devel
In-Reply-To:
References: <203802.87535.qm@web82403.mail.mud.yahoo.com>
Message-ID: <914681.45045.qm@web82404.mail.mud.yahoo.com>

re Int64 being the same as Int: Wow, I remember some discussion of the
idea of setting things up so that one could set the type of int but I
didn't know that it had actually gone in including in the C code for the
GC. This is just what I wanted and there for a long time.

Thanks

From wesley at terpstra.ca Fri Jun 12 01:20:23 2009
From: wesley at terpstra.ca (Wesley W. Terpstra)
Date: Fri Jun 12 05:19:06 2009
Subject: [MLton] library support
In-Reply-To:
References: <162de7480906111739w31b87d13x7cba5919b0a54b42@mail.gmail.com>
Message-ID: <162de7480906120120n770d76f1kda6eb3db9156d9a9@mail.gmail.com>

On Fri, Jun 12, 2009 at 4:17 AM, Matthew Fluet wrote:
> On Fri, 12 Jun 2009, Wesley W. Terpstra wrote:
> I say this because an un-attributed C function declaration (i.e., one
> without __declspec(dllexport)) would appear to denote a 'private' symbol.

public/private don't differ when compiling to an executable.
__declspec(dllexport) has no effect then.

>> Where's the problem? AFAIK the rules are the same for all platforms:
>> Match private/public definitions with declarations (in both C and ML).
>> Use external for things imported from a dynamic library/dll.
>
> I guess the problem is that the C-side defaults are not the same for coff as
> for elf/macho.  That is, if in a .c file I simply write:
>
> int foo(void) { return 1; }
>
> then, as I understand it:
>
>  * on coff, with no attribute, gcc will compile it as 'private';
>  * on coff, I need __declspec(dllexport) to get gcc to compile it as
>    'public'.

The above is correct. As for the C-side defaults not being consistent,
there's not much I can do about this. When compiling to a library, you
should be tagging all your methods as public or private, so the default
doesn't matter. When compiling to an executable private/public are the
same, so the default doesn't matter.

>  * on elf/macho, with no attribute, gcc will compile it as 'public';
>  * on elf/macho, I need __attribute__((visibility("hidden"))) to get gcc
>    to compile it as 'private'.

Yes. For libraries this means you need to tag every function, but
properly managed libraries have always required that.

> It is true that if I were to write:
>
> #include "export.h"
> PRIVATE int foo(void) { return 1; }
> or
> #include "export.h"
> PUBLIC int foo(void) { return 1; }
>
> then the macro expansion will insert the appropriate attribute.
>
> But, this doesn't appear to help in the case that you are linking to a
> static library that was compiled naively --- that is, with un-attributed
> function declarations --- on both coff and elf/macho.

If you are statically linking, then you're talking about one of two cases:
1) you are creating an executable: private/public doesn't matter. just
   pick one.
2) you are creating a dynamic library: this has always been an area
   requiring special attention since the static library needs to have been
   compiled (and tagged) appropriately.

> No, I meant "x86-darwin".

I can't recall, but I think the c-codegen works on x86-darwin. That
script tries all codegens, however. Try just using the c-codegen?

> Extending mlton.org/LibrarySupport with the list
> of known good and known bad platforms would be helpful.

The native x86 codegen on darwin is the only case I know of that is
broken. I haven't tried it on the more exotic ports like solaris, but it
should (at least in principle) work since those use the c-codegen.

From wesley at terpstra.ca Sun Jun 14 13:25:49 2009
From: wesley at terpstra.ca (Wesley W. Terpstra)
Date: Sun Jun 14 13:26:23 2009
Subject: [MLton] Bug: -stop o -keep g
Message-ID: <162de7480906141325u5463d30aw3a6efc48a46e6b58@mail.gmail.com>

-stop o -keep g will cause every object file output to have the same
name. MLton seems to have had this problem forever.

From fluet at tti-c.org Sun Jun 14 13:41:45 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Sun Jun 14 13:41:49 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To:
References:
Message-ID:

On Mon, 11 Aug 2008, Wesley Terpstra wrote:
> As reported by Nicolas Bertolotti, the escape function for shell arguments was
> broken on MinGW. This patch corrects it. It might still be broken on cygwin.

It appears to be broken on cygwin.

> + (* In cygwin, according to what I read, \ should always become \\.
> +  * Furthermore, more characters cause escaping as compared to MinGW.
> +  * From what I read, " should become "", not \", but I leave the old
> +  * behaviour alone until someone runs the spawn regression.
> +  *)
> + fun cygwinEscape y =
> +    if not (strContains " \t\"\r\n\f'" y) andalso y<>"" then y else
> +    concat ["\"",
>             String.translate
>             (fn #"\"" => "\\\"" | #"\\" => "\\\\" | x => String.str x) y,
> -           dquote]
> +           "\""]

testing spawn
0a1,15
> FAIL: "hello\":"\"hello\\\""
> FAIL: :""
> FAIL: hi":"hi\""
> FAIL: evil
> arg:"evil
> arg"
> FAIL: evil arg:"evil arg"
> FAIL: evil arg:"evil arg"
> FAIL: evil^Marg:"evil^Marg"
> FAIL: evil^Larg:"evil^Larg"
> FAIL: "bar\:"\"bar\\"
> FAIL: bah \bar:"bah \\bar"
> FAIL: ba h\\:"ba h\\\\"
> FAIL: holy"smoke:"holy\"smoke"
> FAIL: holy "smoke:"holy \"smoke"
difference with -type-check true

From wesley at terpstra.ca Sun Jun 14 13:50:55 2009
From: wesley at terpstra.ca (Wesley W. Terpstra)
Date: Sun Jun 14 14:25:34 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To:
References:
Message-ID: <162de7480906141350o350faa94pc20e6b2bbb52014f@mail.gmail.com>

On Sun, Jun 14, 2009 at 10:41 PM, Matthew Fluet wrote:
> On Mon, 11 Aug 2008, Wesley Terpstra wrote:
>>
>> As reported by Nicolas Bertolotti, the escape function for shell arguments
>> was
>> broken on MinGW. This patch corrects it. It might still be broken on
>> cygwin.
>
> It appears to be broken on cygwin.

I don't have a cygwin build environment. This can't be a difficult
thing to fix though?
From fluet at tti-c.org Sun Jun 14 15:35:54 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Sun Jun 14 15:35:57 2009
Subject: [MLton] Bug: -stop o -keep g
In-Reply-To: <162de7480906141325u5463d30aw3a6efc48a46e6b58@mail.gmail.com>
References: <162de7480906141325u5463d30aw3a6efc48a46e6b58@mail.gmail.com>
Message-ID:

On Sun, 14 Jun 2009, Wesley W. Terpstra wrote:
> -stop o -keep g will cause every object file output to have the same
> name.

Fixed.

> MLton seems to have had this problem forever.

Hardly. A simple test with mlton-20070826 shows that it does not have any
problems with '-stop o -keep g'. A brief review of the history of
/mlton/main/main.fun reveals that the naming of the ".o" files was
changed in r6992 (in an effort to save ".o" files in the same directory
as the output executable); the quieting of the unused argument (in a
fairly trivial function) in r6997 should have suggested that the behavior
of the function had changed in a non-trivial manner.

From fluet at tti-c.org Sun Jun 14 15:41:05 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Sun Jun 14 15:41:09 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To: <162de7480906141350o350faa94pc20e6b2bbb52014f@mail.gmail.com>
References: <162de7480906141350o350faa94pc20e6b2bbb52014f@mail.gmail.com>
Message-ID:

On Sun, 14 Jun 2009, Wesley W. Terpstra wrote:
> On Sun, Jun 14, 2009 at 10:41 PM, Matthew Fluet wrote:
>> On Mon, 11 Aug 2008, Wesley Terpstra wrote:
>>>
>>> As reported by Nicolas Bertolotti, the escape function for shell arguments
>>> was
>>> broken on MinGW. This patch corrects it. It might still be broken on
>>> cygwin.
>>
>> It appears to be broken on cygwin.
>
> I don't have a cygwin build environment. This can't be a difficult
> thing to fix though?

The commit and comments therein are by you; I don't know how they are
meant to behave.

From wesley at terpstra.ca Sun Jun 14 15:58:49 2009
From: wesley at terpstra.ca (Wesley W. Terpstra)
Date: Sun Jun 14 15:59:23 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To:
References: <162de7480906141350o350faa94pc20e6b2bbb52014f@mail.gmail.com>
Message-ID: <162de7480906141558vaa15558m54e20607c137799e@mail.gmail.com>

On Mon, Jun 15, 2009 at 12:41 AM, Matthew Fluet wrote:
> The commit and comments therein are by you; I don't know how they are meant
> to behave.

I don't know either. Escaping arguments was broken as reported by
someone on the list. So I wrote that test-case to see if they are
preserved correctly (they weren't). Then I made it pass on MinGW. I
don't have a cygwin environment, so I tried to leave the original cygwin
functionality intact (which was almost surely also buggy) and made
reference to some comments I had read from someone who claimed to have
implemented escaping on cygwin.

To fix this, someone with cygwin needs to figure out what to do to
arguments to have them unmolested. Google probably knows. If I recall
correctly escaping for exec, spawn, and CreateProcess might not even all
be the same.

I can look into this if you like, but I would need to set up a
cygwin/mlton environment. Currently I'm trying to debug my LLVM codegen.
=D

From henry.cejtin at sbcglobal.net Sun Jun 14 17:42:04 2009
From: henry.cejtin at sbcglobal.net (Henry Cejtin)
Date: Sun Jun 14 17:42:37 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To: <162de7480906141558vaa15558m54e20607c137799e@mail.gmail.com>
References: <162de7480906141350o350faa94pc20e6b2bbb52014f@mail.gmail.com> <162de7480906141558vaa15558m54e20607c137799e@mail.gmail.com>
Message-ID: <61209.79605.qm@web82403.mail.mud.yahoo.com>

I don't know about Cygwin, but the standard Unix method to protect
strings from shell expansion is as follows:
    Replace every single quote in the string with the string '\''
    Put a single quote before the start and a single quote after the end.
The idea is that you enter single quote mode at the start and exit at the
end, but inside, any single quote is replaced by
    Single quote to exit single quote mode.
    \' to actually add a single quote, backslash protected.
    Single quote to re-enter single quote mode.

I would be very surprised if the Cygwin shell didn't handle this correctly.

From fw at deneb.enyo.de Sun Jun 14 22:27:55 2009
From: fw at deneb.enyo.de (Florian Weimer)
Date: Sun Jun 14 22:28:00 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To: <61209.79605.qm@web82403.mail.mud.yahoo.com> (Henry Cejtin's message of "Sun, 14 Jun 2009 17:42:04 -0700 (PDT)")
References: <162de7480906141350o350faa94pc20e6b2bbb52014f@mail.gmail.com> <162de7480906141558vaa15558m54e20607c137799e@mail.gmail.com> <61209.79605.qm@web82403.mail.mud.yahoo.com>
Message-ID: <874ouioylg.fsf@mid.deneb.enyo.de>

* Henry Cejtin:

> The idea is that you enter single quote mode at the start and exit at the
> end, but inside, any single quote is replaced by
>
> Single quote to exit single quote mode.
> \' to actually add a single quote, backslash protected.
> Single quote to re-enter single quote mode.
>
> I would be very surprised if the Cygwin shell didn't handle this correctly.

It's not about the shell, it's about other Windows applications
parsing the command line (I think).

From ville at laurikari.net Mon Jun 15 07:18:11 2009
From: ville at laurikari.net (Ville Laurikari)
Date: Mon Jun 15 07:18:16 2009
Subject: [MLton] Bug: -stop o -keep g
In-Reply-To:
References: <162de7480906141325u5463d30aw3a6efc48a46e6b58@mail.gmail.com>
Message-ID: <20090615141811.GA12711@laurikari.net>

On Sun, Jun 14, 2009 at 05:35:54PM -0500, Matthew Fluet wrote:
> /mlton/main/main.fun reveals that the naming of the ".o" files was
> changed in r6992 (in an effort to save ".o" files in the same directory
> as the output executable); the quieting of the unused argument (in an

Indeed, I broke it for the case when the output directory is not specified.
It appears I never even tested it that way; my makefiles always specify
the directory for the output. I'm sorry.

--
Ville

From fluet at tti-c.org Mon Jun 15 08:21:44 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Mon Jun 15 08:21:49 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To: <874ouioylg.fsf@mid.deneb.enyo.de>
References: <162de7480906141350o350faa94pc20e6b2bbb52014f@mail.gmail.com> <162de7480906141558vaa15558m54e20607c137799e@mail.gmail.com> <61209.79605.qm@web82403.mail.mud.yahoo.com> <874ouioylg.fsf@mid.deneb.enyo.de>
Message-ID:

On Mon, 15 Jun 2009, Florian Weimer wrote:
>> The idea is that you enter single quote mode at the start and exit at the
>> end, but inside, any single quote is replaced by
>>
>> Single quote to exit single quote mode.
>> \' to actually add a single quote, backslash protected.
>> Single quote to re-enter single quote mode.
>>
>> I would be very surprised if the Cygwin shell didn't handle this correctly.
>
> It's not about the shell, it's about other Windows applications
> parsing the command line (I think).

It's about the Win32 spawn* functions (and possibly the CreateProcess
function), which provide fork/exec-like functionality.

The issue (as I understand it) is that the char **argv argument passed to
spawnv{,p}{,e} becomes the const char **argv argument passed to main of
the created process. One doesn't expect the contents of those character
arrays to be changed from spawn{,p}{,e} to main (that is, one shouldn't
need to do any escaping at all and one certainly doesn't need to for the
*nix exec{,p}{,e} functions), but there is some evidence that MinGW does
(or un-does?) escaping of the arguments.

From wesley at terpstra.ca Mon Jun 15 10:03:10 2009
From: wesley at terpstra.ca (Wesley W. Terpstra)
Date: Mon Jun 15 10:03:43 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To:
References: <162de7480906141350o350faa94pc20e6b2bbb52014f@mail.gmail.com> <162de7480906141558vaa15558m54e20607c137799e@mail.gmail.com> <61209.79605.qm@web82403.mail.mud.yahoo.com> <874ouioylg.fsf@mid.deneb.enyo.de>
Message-ID: <162de7480906151003n6036521eqed699ff03ca114bf@mail.gmail.com>

On Mon, Jun 15, 2009 at 5:21 PM, Matthew Fluet wrote:
> It's about the Win32 spawn* functions (and possibly the CreateProcess
> function), which provide fork/exec-like functionality.
>
> The issue (as I understand it) is that the char **argv argument passed to
> spawnv{,p}{,e} becomes the const char **argv argument passed to main of
> the created process.  One doesn't expect the contents of those character
> arrays to be changed from spawn{,p}{,e} to main (that is, one shouldn't need
> to do any escaping at all and one certainly doesn't need to for the *nix
> exec{,p}{,e} functions), but there is some evidence that MinGW does (or
> un-does?) escaping of the arguments.

The root problem is that windows does not have an **argv. That's a
unix convention. Windows programs receive a single flat array (see
CreateProcess). The crt has code which parses and splits this flat
array to emulate argv functionality. exec() and spawn() functions have
code which pastes the arguments together. Unfortunately, a
long-standing bug in windows is that these pasting and parsing
operations are NOT compatible.

The MinGW (/ windows CRT) version of pasting is simply ("a", "b", "c")
-> "a b c". Obviously this breaks for ("a b", "c") -> "a b c". That's
why MinGW needs to escape arguments to spawn as well as CreateProcess.

The escaping function in mlton/process.sml was hand-crafted to match
the parsing done by the windows crt at program start-up. The
launchWithCreate method similarly combines ("a b", "c") -> "a b c",
but after it escapes its arguments the same as it would for spawn().

Cygwin has to paste and parse arguments just as MinGW does; however,
it's possible that the cygwin parsing/pasting actually matches (but I
wouldn't bet on this). If they do match, then no escaping is needed
for spawn. However, like MinGW, Cygwin sometimes calls CreateProcess.
The arguments will need to be escaped and pasted together in whatever
way matches the cygwin runtime. I don't know how the cygwin runtime
parses its single argument string, but what I read said:

(* In cygwin, according to what I read, \ should always become \\.
 * Furthermore, more characters cause escaping as compared to MinGW.
 * From what I read, " should become "", not \", but I leave the old
 * behaviour alone until someone runs the spawn regression.
 *)

However, I didn't (and don't) have a cygwin to poke for the parsing
algorithm used.

From fw at deneb.enyo.de Mon Jun 15 10:06:33 2009
From: fw at deneb.enyo.de (Florian Weimer)
Date: Mon Jun 15 10:06:39 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To: (Matthew Fluet's message of "Mon, 15 Jun 2009 10:21:44 -0500 (CDT)")
References: <162de7480906141350o350faa94pc20e6b2bbb52014f@mail.gmail.com> <162de7480906141558vaa15558m54e20607c137799e@mail.gmail.com> <61209.79605.qm@web82403.mail.mud.yahoo.com> <874ouioylg.fsf@mid.deneb.enyo.de>
Message-ID: <87eitlh1eu.fsf@mid.deneb.enyo.de>

* Matthew Fluet:

> The issue (as I understand it) is that the char **argv argument
> passed to spawnv{,p}{,e} becomes the const char **argv argument
> passed to main of the created process.

Windows hasn't got an argv array, there is just a single argument
string. The argv array needs to be serialized and deserialized, and
you need matching conventions for that. (Unfortunately, I don't know
what the conventions are; they might not even exist.)
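
[Illustration of the serialization step discussed above. For programs
linked against the Microsoft C runtime, the start-up parser treats
backslashes literally unless they precede a double quote; 2n backslashes
followed by a quote give n backslashes and toggle quoting, and 2n+1
backslashes followed by a quote give n backslashes and a literal quote.
What follows is only a minimal sketch of a pasting step that is the
inverse of those rules, assuming the child really parses its command line
that way; whether cygwin1.dll does is not established in this thread. The
names quote_arg and join_args are hypothetical and are not taken from
mlton/process.sml.]

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Quote one argument so that a parser following the Microsoft C runtime
 * rules (an assumption, see the note above) recovers it unchanged.
 * Returns a malloc'ed string, or NULL on allocation failure. */
char *quote_arg(const char *arg) {
    size_t len = strlen(arg);
    /* Worst case: every character doubled, plus two quotes and a NUL. */
    char *out = malloc(2 * len + 3);
    char *p = out;
    if (out == NULL) return NULL;

    /* Non-empty arguments without whitespace or quotes need no quoting. */
    if (len > 0 && strpbrk(arg, " \t\n\v\"") == NULL) {
        memcpy(out, arg, len + 1);
        return out;
    }
    *p++ = '"';
    for (const char *s = arg; ; s++) {
        size_t backslashes = 0;
        while (*s == '\\') { s++; backslashes++; }
        if (*s == '\0') {
            /* Trailing backslashes precede the closing quote: double them. */
            for (size_t i = 0; i < 2 * backslashes; i++) *p++ = '\\';
            break;
        } else if (*s == '"') {
            /* Double the backslashes, then escape the quote itself. */
            for (size_t i = 0; i < 2 * backslashes + 1; i++) *p++ = '\\';
            *p++ = '"';
        } else {
            /* Backslashes not followed by a quote stay literal. */
            for (size_t i = 0; i < backslashes; i++) *p++ = '\\';
            *p++ = *s;
        }
    }
    *p++ = '"';
    *p = '\0';
    return out;
}

/* Paste argv into one command line, quoting each argument. */
char *join_args(int argc, const char *const *argv) {
    size_t total = 1;
    for (int i = 0; i < argc; i++) total += 2 * strlen(argv[i]) + 3;
    char *line = malloc(total);
    if (line == NULL) return NULL;
    line[0] = '\0';
    for (int i = 0; i < argc; i++) {
        char *q = quote_arg(argv[i]);
        if (q == NULL) { free(line); return NULL; }
        if (i > 0) strcat(line, " ");
        strcat(line, q);
        free(q);
    }
    return line;
}
```

[With this sketch, pasting ("a b", "c") gives the string "a b" c, with
the quotes included, instead of the ambiguous a b c produced by plain
concatenation.]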

From fluet at tti-c.org Mon Jun 15 16:13:18 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Mon Jun 15 16:13:25 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To: <162de7480906151003n6036521eqed699ff03ca114bf@mail.gmail.com>
References: <162de7480906141350o350faa94pc20e6b2bbb52014f@mail.gmail.com> <162de7480906141558vaa15558m54e20607c137799e@mail.gmail.com> <61209.79605.qm@web82403.mail.mud.yahoo.com> <874ouioylg.fsf@mid.deneb.enyo.de> <162de7480906151003n6036521eqed699ff03ca114bf@mail.gmail.com>
Message-ID:

On Mon, 15 Jun 2009, Wesley W. Terpstra wrote:
> On Mon, Jun 15, 2009 at 5:21 PM, Matthew Fluet wrote:
>> It's about the Win32 spawn* functions (and possibly the CreateProcess
>> function), which provide fork/exec-like functionality.
>>
>> The issue (as I understand it) is that the char **argv argument passed to
>> spawnv{,p}{,e} becomes the const char **argv argument passed to main of
>> the created process. One doesn't expect the contents of those character
>> arrays to be changed from spawn{,p}{,e} to main (that is, one shouldn't need
>> to do any escaping at all and one certainly doesn't need to for the *nix
>> exec{,p}{,e} functions), but there is some evidence that MinGW does (or
>> un-does?) escaping of the arguments.
>
> The root problem is that windows does not have an **argv. That's a
> unix convention. Windows programs receive a single flat array (see
> CreateProcess). The crt has code which parses and splits this flat
> array to emulate argv functionality. exec() and spawn() functions have
> code which pastes the arguments together. Unfortunately, a
> long-standing bug in windows is that these pasting and parsing
> operations are NOT compatible.
>
> The MinGW (/ windows CRT) version of pasting is simply ("a", "b", "c")
> -> "a b c". Obviously this breaks for ("a b", "c") -> "a b c". That's
> why MinGW needs to escape arguments to spawn as well as CreateProcess.
> The escaping function in mlton/process.sml was hand-crafted to match
> the parsing done by the windows crt at program start-up. The
> launchWithCreate method similarly combines ("a b", "c") -> "a b c",
> but after it escapes its arguments the same as it would for spawn().
>
> Cygwin has to paste and parse arguments just as MinGW does; however,
> it's possible that the cygwin parsing/pasting actually matches (but I
> wouldn't bet on this). If they do match, then no escaping is needed
> for spawn. However, like MinGW, Cygwin sometimes calls CreateProcess.
> The arguments will need to be escaped and pasted together in whatever
> way matches the cygwin runtime. I don't know how the cygwin runtime
> parses its single argument string, but what I read said:
>
> (* In cygwin, according to what I read, \ should always become \\.
>  * Furthermore, more characters cause escaping as compared to MinGW.
>  * From what I read, " should become "", not \", but I leave the old
>  * behaviour alone until someone runs the spawn regression.
>  *)
>
> However, I didn't (and don't) have a cygwin to poke for the parsing
> algorithm used.

While I can understand the marshalling/unmarshalling of arguments through a
single string, what I'm unclear on is where Cygwin and MinGW interpose their
own conventions. That is, spawn{,p}{,e} and CreateProcess are Win32
functions (right?) --- yet Cygwin and MinGW interpose their own version that
(may or may not) munge the arguments (before calling the "real"
spawn{,p}{,e} and CreateProcess)? Similarly, starting a program from the
console should begin execution at main; though, technically, it is wherever
the loader begins execution, so Cygwin and MinGW could provide their own
_start (or whatever symbol it is in Windows) that (may or may not) unmunge
the arguments before calling main.

Of course, when calling spawn{,p}{,e} or CreateProcess from a Cygwin or
MinGW program, it can't know whether the called executable is itself a
Cygwin, MinGW, or plain Windows program.
Similarly, when starting up, a Cygwin or MinGW executable can't know
whether it was called via spawn{,p}{,e} or CreateProcess by a Cygwin,
MinGW, or plain Windows (including CMD.exe) program. So, I don't see why
it is sensible for Cygwin or MinGW to munge/unmunge arguments at all,
since it can't know what was/will-be done on the other end.

From wesley at terpstra.ca Mon Jun 15 19:06:45 2009
From: wesley at terpstra.ca (Wesley W. Terpstra)
Date: Mon Jun 15 19:07:19 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To:
References: <162de7480906141350o350faa94pc20e6b2bbb52014f@mail.gmail.com> <162de7480906141558vaa15558m54e20607c137799e@mail.gmail.com> <61209.79605.qm@web82403.mail.mud.yahoo.com> <874ouioylg.fsf@mid.deneb.enyo.de> <162de7480906151003n6036521eqed699ff03ca114bf@mail.gmail.com>
Message-ID: <162de7480906151906o48ae9189n265bcb4fbacbc1a0@mail.gmail.com>

On Tue, Jun 16, 2009 at 1:13 AM, Matthew Fluet wrote:
> While I can understand the marshalling/unmarshalling of arguments through a
> single string, what I'm unclear on is where Cygwin and MinGW interpose their
> own conventions. That is, spawn{,p}{,e} and CreateProcess are Win32
> functions (right?) --- yet Cygwin and MinGW interpose their own version that
> (may or may not) munge the arguments (before calling the "real"
> spawn{,p}{,e} and CreateProcess)?

There is a difference between a system library function and a kernel
call. Cygwin applications are not linked against the windows CRT. All
their calls go through cygwin1.dll. That means CreateProcess, spawn,
exec, ... everything is run out of the cygwin1.dll. I'm not even
certain that 'spawn' corresponds to a kernel call since you could
implement it using methods like CreateProcess.

> Similarly, starting a program from the
> console should begin execution at main; though, technically, it is wherever
> the loader begins execution, so Cygwin and MinGW could provide their own
> _start (or whatever symbol it is in Windows) that (may or may not) unmunge
> the arguments before calling main.

Correct. Main is not the start of a program, crt1.o is.

> Of course, when calling spawn{,p}{,e} or CreateProcess from a Cygwin or
> MinGW program, it can't know whether the called executable is itself a
> Cygwin, MinGW, or plain Windows program.

Actually, that's false. Cygwin programs recognize each other and do
special magic to communicate. For instance, there is no kill() call in
windows. Yet cygwin processes are able to kill their children and fire
off a signal handler. How? They secretly open a pipe between the
processes to carry signaling information. Similarly fork() does
extremely frightening voodoo where it copies memory from its "parent"
process to emulate unix.

> So, I don't see why it is sensible for Cygwin
> or MinGW to munge/unmunge arguments at all, since it can't know what
> was/will-be done on the other end.

Well, I won't debate whether it's sensible, but that is how it works.
It does seem likely that the string eventually delivered to a
CreateProcess kernel call is escaped similarly for both cygwin and
mingw (though as I mentioned, it is possible this is false).

I wouldn't be surprised if the spawn() function on cygwin1.dll
requires no escaping at all. They're in a position to fix this bug for
their applications. The CreateProcess will definitely need some sort
of escaping, though. It might be different than MinGW, because the
library re-munges the arguments. I don't know. Someone who has it has
to reverse engineer it or find example code to test via google.
From fluet at tti-c.org Tue Jun 16 08:36:33 2009 From: fluet at tti-c.org (Matthew Fluet) Date: Tue Jun 16 08:36:40 2009 Subject: [MLton] Re: [MLton-commit] r6699 In-Reply-To: <162de7480906151906o48ae9189n265bcb4fbacbc1a0@mail.gmail.com> References: <162de7480906141350o350faa94pc20e6b2bbb52014f@mail.gmail.com> <162de7480906141558vaa15558m54e20607c137799e@mail.gmail.com> <61209.79605.qm@web82403.mail.mud.yahoo.com> <874ouioylg.fsf@mid.deneb.enyo.de> <162de7480906151003n6036521eqed699ff03ca114bf@mail.gmail.com> <162de7480906151906o48ae9189n265bcb4fbacbc1a0@mail.gmail.com> Message-ID: On Tue, 16 Jun 2009, Wesley W. Terpstra wrote: > On Tue, Jun 16, 2009 at 1:13 AM, Matthew Fluet wrote: >> So, I don't see why it is sensible for Cygwin >> or MinGW to munge/unmunge arguments at all, since it can't know what >> was/will-be done on the other end. > > Well, I won't debate whether it's sensible, but that is how it works. > It does seem likely that the string eventually delivered to a > CreateProcess kernel call is escaped similarly for both cygwin and > mingw (though as I mentioned, it is possible this is false). O.k., then what is the rationale for the Basis Library implementation to add another level of munging on top of that done by Cygwin or MinGW? > I wouldn't be surprised if the spawn() function on cygwin1.dll > requires no escaping at all. They're in a position to fix this bug for > their applications. My testing seems to support this claim. The /regression/args-spawn.sml test (renamed from spawn.sml and using CommandLine.name() to fetch the executable name) works fine, including when run from a path with spaces and when invoked with a full path (with spaces). Of course, this is using a Cygwin program to spawn a Cygwin program (that happens to be itself); I'm not sure whether this configuration is meant to support a Cygwin program spawning a Windows CRT program? 
It seems to: [fluet@winxp-cygwin tmp]$ cat args-spawn.sml val cmd = CommandLine.name () val _ = print (concat ["cmd: ", cmd, "\n"]) val args = CommandLine.arguments () val _ = foldl (fn (arg,()) => print (concat ["arg: ", arg, "\n"])) () args open Posix.Process open MLton.Process val () = let val pid = spawn {path = hd args, args = args} val status = waitpid (W_CHILD pid, []) in () end [fluet@winxp-cygwin tmp]$ ./args-spawn C:\\Documents\ and\ Settings\\fluet\\My\ Documents\\My\ Programs\\finger.exe -l fluet@ttic.uchicago.edu cmd: ./args-spawn arg: C:\Documents and Settings\fluet\My Documents\My Programs\finger.exe arg: -l arg: fluet@ttic.uchicago.edu [nagoya.uchicago.edu] > Finger: connect::Connection refused [fluet@winxp-cygwin tmp]$ ./args-spawn /cygdrive/c/Documents\ and\ Settings/fluet/My\ Documents/My\ Programs/finger.exe -l fluet@ttic.uchicago.edu cmd: ./args-spawn arg: /cygdrive/c/Documents and Settings/fluet/My Documents/My Programs/finger.exe arg: -l arg: fluet@ttic.uchicago.edu [nagoya.uchicago.edu] > Finger: connect::Connection refused [fluet@winxp-cygwin tmp]$ ./args-spawn /cygdrive/c/Documents\ and\ Settings/fluet/My\ Documents/My\ Programs/finger.exe '-l fluet@ttic.uchicago.edu' cmd: ./args-spawn arg: /cygdrive/c/Documents and Settings/fluet/My Documents/My Programs/finger.exe arg: -l fluet@ttic.uchicago.edu Displays information about a user on a specified system running the Finger service. Output varies based on the remote system. FINGER [-l] [user]@host [...] -l Displays information in long list format. user Specifies the user you want information about. Omit the user parameter to display information about all users on the specifed host. @host Specifies the server on the remote system whose users you want information about. Note that in the last case, the single argument with an embedded space is delivered whole to the program, prompting the help listing that is given on any invalid argument. 

I also note that the invoked program executes whether I use finger.exe or
finger. This would appear to be handled by the cygwin1.dll implementation
of spawne, since the ML string is passed unmodified through to the spawne
function. This seems like reasonable behavior for
MLton.Process.spawn{,e,p}.

> The CreateProcess will definitely need some sort
> of escaping, though.

Ok, because the CreateProcess function takes a single string for the
arguments, so the munging needs to be done on the ML side. And, the current
claim is that the "default" munging for Cygwin, MinGW, and (normal) Windows
CRT are all different? Or, rather, that there is no "standard" munging?

In any case, there seems to be a problem with CreateProcess, possibly
independent of the argument munging. The child process seems to be
created:

[fluet@winxp-cygwin tmp]$ cat args-create.sml
val cmd = CommandLine.name ()
val _ = print (concat ["cmd: ", cmd, "\n"])
val args = CommandLine.arguments ()
val _ = foldl (fn (arg,()) => print (concat ["arg: ", arg, "\n"])) () args

open MLton.Process
val () =
   let
      val pid =
         create {args = tl args,
                 env = NONE,
                 path = hd args,
                 stderr = Param.self,
                 stdin = Param.self,
                 stdout = Param.self}
      val status = reap pid
   in
      ()
   end
[fluet@winxp-cygwin tmp]$ ./args-create 'C:\WINDOWS\system32\finger.exe'
cmd: ./args-create
arg: C:\WINDOWS\system32\finger.exe
unhandled exception: SysErr: No child processes [child]

From wesley at terpstra.ca Wed Jun 17 01:56:33 2009
From: wesley at terpstra.ca (Wesley W.
Terpstra)
Date: Wed Jun 17 01:57:09 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To: <162de7480906170156l6cdeeb9n8e52ffd7310d9d29@mail.gmail.com>
References: <162de7480906141558vaa15558m54e20607c137799e@mail.gmail.com> <61209.79605.qm@web82403.mail.mud.yahoo.com> <874ouioylg.fsf@mid.deneb.enyo.de> <162de7480906151003n6036521eqed699ff03ca114bf@mail.gmail.com> <162de7480906151906o48ae9189n265bcb4fbacbc1a0@mail.gmail.com> <162de7480906170156l6cdeeb9n8e52ffd7310d9d29@mail.gmail.com>
Message-ID: <162de7480906170156s4626365clb76f1fd074329b72@mail.gmail.com>

On Tue, Jun 16, 2009 at 5:36 PM, Matthew Fluet wrote:
>
> O.k., then what is the rationale for the Basis Library implementation to add another level of munging on top of that done by Cygwin or MinGW?

The extra munging for spawn() is to work around the windows CRT bug
for MinGW. This way you get the usual unix behaviour where the
arguments you pass come back unmolested as CommandLine.arguments ().
The CreateProcess munging serves a similar purpose.

>> I wouldn't be surprised if the spawn() function on cygwin1.dll
>> requires no escaping at all. They're in a position to fix this bug for
>> their applications.
>
> My testing seems to support this claim. The /regression/args-spawn.sml test (renamed from spawn.sml and using CommandLine.name() to fetch the executable name) works fine, including when run from a path with spaces and when invoked with a full path (with spaces).

Excellent, so the cygwin1.dll spawn() and exec() work. Just remove the
munging of arguments to spawn/exec for cygwin. The MinGW munging will
need to stay, of course.

>
> Of course, this is using a Cygwin program to spawn a Cygwin program (that happens to be itself); I'm not sure whether this configuration is meant to support a Cygwin program spawning a Windows CRT program? It seems to

That suggests that both the windows CRT and cygwin1.dll parse their
single argument string the same way.
>
> Ok, because the CreateProcess function takes a single string for the arguments, so the munging needs to be done on the ML side. And, the current claim is that the "default" munging for Cygwin, MinGW, and (normal) Windows CRT are all different? Or, rather, that there is no "standard" munging?

The current claim is that they *might* be different, but are probably
the same. Your testing of spawn() on finger certainly points to this.

>
> In any case, there seems to be a problem with CreateProcess, possibly independent of the argument munging. The child process seems to be created:
>
> [fluet@winxp-cygwin tmp]$ cat args-create.sml
> val cmd = CommandLine.name ()
> val _ = print (concat ["cmd: ", cmd, "\n"])
> val args = CommandLine.arguments ()
> val _ = foldl (fn (arg,()) => print (concat ["arg: ", arg, "\n"])) () args
>
> open MLton.Process
> val () =
>    let
>       val pid =
>          create {args = tl args,
>                  env = NONE,
>                  path = hd args,
>                  stderr = Param.self,
>                  stdin = Param.self,
>                  stdout = Param.self}
>       val status = reap pid
>    in
>       ()
>    end
> [fluet@winxp-cygwin tmp]$ ./args-create 'C:\WINDOWS\system32\finger.exe'
> cmd: ./args-create
> arg: C:\WINDOWS\system32\finger.exe
> unhandled exception: SysErr: No child processes [child]

I suspect this is why:

> /* 20070822, fluet: The following 'pure win32' implementation of cwait
>  * no longer works on recent Cygwin versions. It always takes the
>  * {errno = ECHILD; return -1} branch, even when the child process
>  * exists.
>  */
>
> /* Cygwin replaces cwait with a call to waitpid.
>  * waitpid only works when the process was created by cygwin and there
>  * is a secret magical pipe for sending signals and exit statuses over.
>  * Screw that. We implement our own cwait using pure win32.
>  */
> /* C_Errno_t(C_PId_t) MLton_Process_cwait(C_PId_t pid, Ref(C_Status_t) status) {

The code you removed was designed to work when reaping child cygwin
processes in addition to native applications (like finger). Perhaps
restore this function?

From fluet at tti-c.org Wed Jun 17 05:25:36 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Wed Jun 17 05:25:39 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To: <162de7480906170156s4626365clb76f1fd074329b72@mail.gmail.com>
References: <162de7480906141558vaa15558m54e20607c137799e@mail.gmail.com> <61209.79605.qm@web82403.mail.mud.yahoo.com> <874ouioylg.fsf@mid.deneb.enyo.de> <162de7480906151003n6036521eqed699ff03ca114bf@mail.gmail.com> <162de7480906151906o48ae9189n265bcb4fbacbc1a0@mail.gmail.com> <162de7480906170156l6cdeeb9n8e52ffd7310d9d29@mail.gmail.com> <162de7480906170156s4626365clb76f1fd074329b72@mail.gmail.com>
Message-ID:

On Wed, 17 Jun 2009, Wesley W. Terpstra wrote:
> On Tue, Jun 16, 2009 at 5:36 PM, Matthew Fluet wrote:
>> In any case, there seems to be a problem with CreateProcess, possibly independent of the argument munging. The child process seems to be created:
>>
>> [fluet@winxp-cygwin tmp]$ ./args-create 'C:\WINDOWS\system32\finger.exe'
>> cmd: ./args-create
>> arg: C:\WINDOWS\system32\finger.exe
>> unhandled exception: SysErr: No child processes [child]
>
> I suspect this is why:
>
>> /* 20070822, fluet: The following 'pure win32' implementation of cwait
>>  * no longer works on recent Cygwin versions. It always takes the
>>  * {errno = ECHILD; return -1} branch, even when the child process
>>  * exists.
>>  */
>>
>> /* Cygwin replaces cwait with a call to waitpid.
>>  * waitpid only works when the process was created by cygwin and there
>>  * is a secret magical pipe for sending signals and exit statuses over.
>>  * Screw that. We implement our own cwait using pure win32.
>>  */
>> /* C_Errno_t(C_PId_t) MLton_Process_cwait(C_PId_t pid, Ref(C_Status_t) status) {
>
> The code you removed was designed to work when reaping child cygwin
> processes in addition to native applications (like finger). Perhaps
> restore this function?

It seems that there are multiple, independent(?) issues. Certainly one is
that a Cygwin program doesn't like to be started via CreateProcess with a
non-NULL lpEnvironment argument; I can't find any documentation for this,
but I can definitely demonstrate it with some short C programs. The
/runtime/platform/windows.c Windows_Process_create function always uses a
non-NULL lpEnvironment argument; in the case that the child is meant to
inherit its parent's environment, the whole environment is copied via
Posix.ProcEnv.environ.

The other issue is that I don't believe that the int/pid_t returned by
spawn{,p}{,e} is "the same" as the PROCESS_INFORMATION.hProcess returned by
CreateProcess. I don't believe that the old code really works when reaping
child processes referenced by their spawn{,p}{,e} pid_t (whether or not they
are cygwin processes). This is consistent with my comments made at the time
of the last release: at that time, there were no regressions that tested
MLton.Process.{spawn{,e,p},create} but the compiler itself used
MLton.Process.spawn to invoke gcc and MLton_Process_cwait for the
implementation of Posix.Process.waitpid to synchronize on the termination of
gcc. This is also consistent with my current experiments, where finger.exe
launched with MLton.Process.spawn could be synchronized on via
Posix.Process.waitpid.

On the other hand, we probably do need GetExitCodeProcess to synchronize on
the termination of a program launched via CreateProcess (for which we have
only the PROCESS_INFORMATION.hProcess handle). Again, this is consistent
with my experiments above, where sending the PROCESS_INFORMATION.hProcess
handle to MLton_Process_cwait yields the "no child process [child]" error.

It turns out that the MLton.Process.create result doesn't actually provide a
means to extract the process id, although MLton.Process.Child.getPid would
be trivial. This is a good thing, because it means that MLton.Process.reap
can always use the old MLton_Process_cwait (with GetExitCodeProcess),
because it will necessarily be a PROCESS_INFORMATION.hProcess handle, while
Posix.Process.waitpid can use the new MLton_Process_cwait, because it will
necessarily be a spawn{,e}{,p} pid_t. (Actually, I think the Cygwin waitpid
does work fine with a spawn{,e}{,p} pid_t, so there doesn't need to be a
special case.)

From wesley at terpstra.ca Wed Jun 17 06:48:13 2009
From: wesley at terpstra.ca (Wesley W. Terpstra)
Date: Wed Jun 17 06:48:47 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To:
References: <874ouioylg.fsf@mid.deneb.enyo.de> <162de7480906151003n6036521eqed699ff03ca114bf@mail.gmail.com> <162de7480906151906o48ae9189n265bcb4fbacbc1a0@mail.gmail.com> <162de7480906170156l6cdeeb9n8e52ffd7310d9d29@mail.gmail.com> <162de7480906170156s4626365clb76f1fd074329b72@mail.gmail.com>
Message-ID: <162de7480906170648u60f51716s95e6a5f893c5a9ac@mail.gmail.com>

On Wed, Jun 17, 2009 at 2:25 PM, Matthew Fluet wrote:
> It seems that there are multiple, independent(?) issues. Certainly one is
> that a Cygwin program doesn't like to be started via CreateProcess with a
> non-NULL lpEnvironment argument; I can't find any documentation for this,
> but I can definitely demonstrate it with some short C programs.

... great. Is this a cygwin bug? This seems a pretty serious issue.

> The other issue is that I don't believe that the int/pid_t returned by
> spawn{,p}{,e} is "the same" as the PROCESS_INFORMATION.hProcess returned by
> CreateProcess. I don't believe that the old code really works when reaping
> child processes referenced by their spawn{,p}{,e} pid_t (whether or not they
> are cygwin processes).

I believe the reason this used to work is the same reason that you
outline below. MLton_Process_cwait was used only to match
CreateProcess calls. The cygwin waitpid was used to match calls to
spawn/fork. We had implicitly overloaded the C type, but due to the
uses being disjoint there was no problem.

> It turns out that the MLton.Process.create result doesn't actually provide a
> means to extract the process id, although MLton.Process.Child.getPid would
> be trivial.

I believe this was intentional.

At any rate, the solution seems clear except that lpEnvironment =>
cygwin implosion. One possible work-around would be to temporarily
change the calling process' environment before invoking CreateProcess.
I don't find anyone else reporting this problem on google, though. Are
you sure?

From fluet at tti-c.org Wed Jun 17 08:45:20 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Wed Jun 17 08:45:24 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To: <162de7480906170648u60f51716s95e6a5f893c5a9ac@mail.gmail.com>
References: <874ouioylg.fsf@mid.deneb.enyo.de> <162de7480906151003n6036521eqed699ff03ca114bf@mail.gmail.com> <162de7480906151906o48ae9189n265bcb4fbacbc1a0@mail.gmail.com> <162de7480906170156l6cdeeb9n8e52ffd7310d9d29@mail.gmail.com> <162de7480906170156s4626365clb76f1fd074329b72@mail.gmail.com> <162de7480906170648u60f51716s95e6a5f893c5a9ac@mail.gmail.com>
Message-ID:

On Wed, 17 Jun 2009, Wesley W. Terpstra wrote:
> On Wed, Jun 17, 2009 at 2:25 PM, Matthew Fluet wrote:
>> It seems that there are multiple, independent(?) issues. Certainly one is
>> that a Cygwin program doesn't like to be started via CreateProcess with a
>> non-NULL lpEnvironment argument; I can't find any documentation for this,
>> but I can definitely demonstrate it with some short C programs.
>
> ... great. Is this a cygwin bug? This seems a pretty serious issue.
>
> At any rate, the solution seems clear except that lpEnvironment =>
> cygwin implosion.
> One possible work-around would be to temporarily
> change the calling process' environment before invoking CreateProcess.
> I don't find anyone else reporting this problem on google, though. Are
> you sure?

Here's my testing:

[fluet@winxp-cygwin tmp]$ cat child.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, const char* argv[]) {
  fprintf(stderr, "HW! [stderr]\n");
  fprintf(stdout, "HW! [stdout]\n");
  exit (5);
}
[fluet@winxp-cygwin tmp]$ gcc -Wall -o child.exe child.c
[fluet@winxp-cygwin tmp]$ cat parent.c
#include <windows.h>
#include <stdio.h>

#define BUFSIZE 4096

int main (int argc, const char* argv[]) {

  LPCTSTR modnamep = NULL;
  TCHAR modname[BUFSIZE];
  if (argc > 1) {
    lstrcpy(modname, argv[1]);
    printf("modname = %s\n", modname);
    modnamep = modname;
  } else {
    return 0;
  }

  LPTSTR cmdlinep = NULL;
  TCHAR cmdline[BUFSIZE];
  if (argc > 2) {
    lstrcpy(cmdline, argv[2]);
    printf("cmdline = %s\n", cmdline);
    cmdlinep = cmdline;
  }

  LPTSTR envp = NULL;
  TCHAR env[BUFSIZE];
  if (argc > 3) {
    envp = env;
    int i = 3;
    LPSTR curp = env;
    while (argc > i) {
      lstrcpy (curp, argv[i]);
      printf("env[%i] = %s\n", i - 3, argv[i]);
      curp += lstrlen(curp) + 1;
      i++;
    }
    *curp = (TCHAR)0;
  }

  STARTUPINFO siStartInfo;
  ZeroMemory(&siStartInfo, sizeof(STARTUPINFO));
  siStartInfo.cb = sizeof(STARTUPINFO);

  PROCESS_INFORMATION piProcInfo;
  ZeroMemory(&piProcInfo, sizeof(PROCESS_INFORMATION));

  BOOL bSuccess = FALSE;

  printf("CreateProcess\n");
  bSuccess = CreateProcess(modnamep,      // module name
                           cmdlinep,      // command line
                           NULL,          // process security attributes
                           NULL,          // primary thread security attributes
                           TRUE,          // handles are inherited
                           0,             // creation flags
                           envp,          // use parent's environment
                           NULL,          // use parent's current directory
                           &siStartInfo,  // STARTUPINFO pointer
                           &piProcInfo);  // receives PROCESS_INFORMATION

  if ( ! bSuccess ) {
    printf("CreateProcess failed (%ld).\n", (long int)(GetLastError()));
    return 0;
  }

  printf("piProcInfo.hProcess = %ld\n", (long int)(piProcInfo.hProcess));

  DWORD status = 0;
  WaitForSingleObject(piProcInfo.hProcess, INFINITE);
  GetExitCodeProcess(piProcInfo.hProcess, &status);
  printf("status: %ld\n", status);

  CloseHandle(piProcInfo.hProcess);
  CloseHandle(piProcInfo.hThread);

  return 1;
}
[fluet@winxp-cygwin tmp]$ gcc -Wall -o parent.exe parent.c

[fluet@winxp-cygwin tmp]$ ./parent.exe 'C:\WINDOWS\system32\finger.exe' 'C:\WINDOWS\system32\finger.exe -l fluet@mlton.org'
modname = C:\WINDOWS\system32\finger.exe
cmdline = C:\WINDOWS\system32\finger.exe -l fluet@mlton.org
CreateProcess
piProcInfo.hProcess = 1856
[mlton.org]
> Finger: connect::Connection refused
status: 0
[fluet@winxp-cygwin tmp]$ ./parent.exe 'C:\WINDOWS\system32\finger.exe' 'C:\WINDOWS\system32\finger.exe -l fluet@mlton.org' 'FOO=bar'
modname = C:\WINDOWS\system32\finger.exe
cmdline = C:\WINDOWS\system32\finger.exe -l fluet@mlton.org
env[0] = FOO=bar
CreateProcess
piProcInfo.hProcess = 1856
Unknown host: mlton.org
status: 0

So, finger.exe is unhappy about not having the full environment (or,
rather, the DNS resolver is unhappy), but the command runs as expected.

[fluet@winxp-cygwin tmp]$ ./parent.exe 'C:\WINDOWS\system32\cmd.exe' 'C:\WINDOWS\system32\cmd.exe /c set'
modname = C:\WINDOWS\system32\cmd.exe
cmdline = C:\WINDOWS\system32\cmd.exe /c set
CreateProcess
piProcInfo.hProcess = 1856
COMSPEC=C:\WINDOWS\system32\cmd.exe
PATH=C:\cygwin\home\fluet\bin;...elided...;c:\WINDOWS\system32\WindowsPowerShell\v1.0;C:\cygwin\bin
PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.JS;.WS
PROMPT=$P$G
SYSTEMDRIVE=C:
SYSTEMROOT=C:\WINDOWS
WINDIR=C:\WINDOWS
status: 0
[fluet@winxp-cygwin tmp]$ ./parent.exe 'C:\WINDOWS\system32\cmd.exe' 'C:\WINDOWS\system32\cmd.exe /c set' 'FOO=bar'
modname = C:\WINDOWS\system32\cmd.exe
cmdline = C:\WINDOWS\system32\cmd.exe /c set
env[0] = FOO=bar
CreateProcess
piProcInfo.hProcess = 1856
COMSPEC=C:\WINDOWS\system32\cmd.exe
FOO=bar
PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.JS;.WS
PROMPT=$P$G
status: 0

So, cmd.exe sees the variables named in an explicit lpEnvironment (and
loses the environment of the parent; I suspect that COMSPEC, PATHEXT, and
PROMPT are baked into cmd.exe).

On the other hand:

[fluet@winxp-cygwin tmp]$ ./parent.exe 'C:\cygwin\home\fluet\tmp\child.exe' 'C:\cygwin\home\fluet\tmp\child.exe 1 2 3'
modname = C:\cygwin\home\fluet\tmp\child.exe
cmdline = C:\cygwin\home\fluet\tmp\child.exe 1 2 3
CreateProcess
piProcInfo.hProcess = 1848
HW! [stderr]
HW!
[stdout] status: 5 [fluet@winxp-cygwin tmp]$ ./parent.exe 'C:\cygwin\home\fluet\tmp\child.exe' 'C:\cygwin\home\fluet\tmp\child.exe 1 2 3' 'FOO=bar' modname = C:\cygwin\home\fluet\tmp\child.exe cmdline = C:\cygwin\home\fluet\tmp\child.exe 1 2 3 env[0] = FOO=bar CreateProcess piProcInfo.hProcess = 1848 status: -1073741515 A little more investigation reveals that a cygwin program is unhappy unless C:\cygwin\bin is in the PATH environment variable: [fluet@winxp-cygwin tmp]$ ./parent.exe 'C:\cygwin\home\fluet\tmp\child.exe' 'C:\cygwin\home\fluet\tmp\child.exe 1 2 3' 'PATH=C:\cygwin\bin' modname = C:\cygwin\home\fluet\tmp\child.exe cmdline = C:\cygwin\home\fluet\tmp\child.exe 1 2 3 env[0] = PATH=C:\cygwin\bin CreateProcess piProcInfo.hProcess = 1844 HW! [stderr] HW! [stdout] status: 5 [fluet@winxp-cygwin tmp]$ ./parent.exe 'C:\cygwin\home\fluet\tmp\child.exe' 'C:\cygwin\home\fluet\tmp\child.exe 1 2 3' 'PATH=C:\WINDOWS\system32' modname = C:\cygwin\home\fluet\tmp\child.exe cmdline = C:\cygwin\home\fluet\tmp\child.exe 1 2 3 env[0] = PATH=C:\WINDOWS\system32 CreateProcess piProcInfo.hProcess = 1848 status: -1073741515 And, with the exception of needing C:\cygwin\bin in the PATH, we are able to manage the environment: [fluet@winxp-cygwin tmp]$ cat environ.sml val () = List.app (fn s => (print (String.toString s); print "\n")) (Posix.ProcEnv.environ ()) [fluet@winxp-cygwin tmp]$ mlton -output environ.exe environ.sml [fluet@winxp-cygwin tmp]$ ./parent.exe 'C:\cygwin\home\fluet\tmp\environ.exe' 'C:\cygwin\home\fluet\tmp\environ.exe' modname = C:\cygwin\home\fluet\tmp\environ.exe cmdline = C:\cygwin\home\fluet\tmp\environ.exe CreateProcess piProcInfo.hProcess = 1848 PATH=/home/fluet/bin:...elided...:/cygdrive/c/WINDOWS/system32/WindowsPowerShell/v1.0:/usr/bin SYSTEMDRIVE=C: SYSTEMROOT=C:\\WINDOWS WINDIR=C:\\WINDOWS TERM=cygwin HOME=/home/fluet status: 0 [fluet@winxp-cygwin tmp]$ ./parent.exe 'C:\cygwin\home\fluet\tmp\environ.exe' 'C:\cygwin\home\fluet\tmp\environ.exe' 
'PATH=C:\cygwin\bin' 'FOO=bar' 'BAR=foo' modname = C:\cygwin\home\fluet\tmp\environ.exe cmdline = C:\cygwin\home\fluet\tmp\environ.exe env[0] = PATH=C:\cygwin\bin env[1] = FOO=bar env[2] = BAR=foo CreateProcess piProcInfo.hProcess = 1848 PATH=/usr/bin FOO=bar BAR=foo TERM=cygwin HOME=/home/fluet status: 0 This also explains why the current implementation of create, that copies the whole environment via Posix.ProcEnv.environ (which would appear to have a PATH entry with the appropriate directory) fails. The 'problem' is that the PATH environment variable returned by Posix.ProcEnv.environ has been cygwinified: it has '/usr/bin', not 'C:\cygwin\bin'. Of course, '/usr/bin' means nothing to CreateProcess/Windows, which I'm guessing is using the PATH to find cygwin1.dll (in C:\cygwin\bin) for a cygwin program. So, I suspect that the exit status -1073741515 corresponds to a dll load failure. finger.exe and cmd.exe are only dependent upon core system dlls, which presumably can be found even in the absence of a PATH. All in all, I think the way forward is clear: * create {env = NONE, ...} should result in calling CreateProcess with a NULL lpEnvironment; this is, by far, the most common case I would think. * create {env = SOME [...], ...} should behave as currently implemented; the user is responsible for establishing the correct environment (it is just the case that the correct environment for a Cygwin program is subtle) From wesley at terpstra.ca Wed Jun 17 09:48:04 2009 From: wesley at terpstra.ca (Wesley W. 
    Terpstra)
Date: Wed Jun 17 09:48:45 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To:
References:
 <162de7480906151003n6036521eqed699ff03ca114bf@mail.gmail.com>
 <162de7480906151906o48ae9189n265bcb4fbacbc1a0@mail.gmail.com>
 <162de7480906170156l6cdeeb9n8e52ffd7310d9d29@mail.gmail.com>
 <162de7480906170156s4626365clb76f1fd074329b72@mail.gmail.com>
 <162de7480906170648u60f51716s95e6a5f893c5a9ac@mail.gmail.com>
Message-ID: <162de7480906170948y33dd6c57uad4a8860669a7c8@mail.gmail.com>

On Wed, Jun 17, 2009 at 5:45 PM, Matthew Fluet wrote:
> This also explains why the current implementation of create, that copies the
> whole environment via Posix.ProcEnv.environ (which would appear to have a
> PATH entry with the appropriate directory) fails.  The 'problem' is that the
> PATH environment variable returned by Posix.ProcEnv.environ has been
> cygwinified: it has '/usr/bin', not 'C:\cygwin\bin'.

This seems a problem.

> Of course, '/usr/bin'
> means nothing to CreateProcess/Windows, which I'm guessing is using the PATH
> to find cygwin1.dll (in C:\cygwin\bin) for a cygwin program.

Ahh... Of course!

> All in all, I think the way forward is clear:
>
>  * create {env = NONE, ...} should result in calling CreateProcess with a
>    NULL lpEnvironment; this is, by far, the most common case I would
>    think.

Why does this work? I don't understand how a cygwin program can call
another cygwin program with env = NONE. I would have expected this
means PATH still has /usr/bin. Unless the environment provided to
MLton from cygwin is a lie that has been translated from the real
environment and the real environment is what gets copied by
CreateProcess. If you set an environment variable in the calling
cygwin process, does the CreateProcess cygwin child see it?

>  * create {env = SOME [...], ...} should behave as currently implemented;
>    the user is responsible for establishing the correct environment (it is
>    just the case that the correct environment for a Cygwin program is
>    subtle)

Indeed.

From fluet at tti-c.org Wed Jun 17 12:47:33 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Wed Jun 17 12:47:39 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To: <162de7480906170948y33dd6c57uad4a8860669a7c8@mail.gmail.com>
References:
 <162de7480906151003n6036521eqed699ff03ca114bf@mail.gmail.com>
 <162de7480906151906o48ae9189n265bcb4fbacbc1a0@mail.gmail.com>
 <162de7480906170156l6cdeeb9n8e52ffd7310d9d29@mail.gmail.com>
 <162de7480906170156s4626365clb76f1fd074329b72@mail.gmail.com>
 <162de7480906170648u60f51716s95e6a5f893c5a9ac@mail.gmail.com>
 <162de7480906170948y33dd6c57uad4a8860669a7c8@mail.gmail.com>
Message-ID:

On Wed, 17 Jun 2009, Wesley W. Terpstra wrote:
> On Wed, Jun 17, 2009 at 5:45 PM, Matthew Fluet wrote:
>>  * create {env = NONE, ...} should result in calling CreateProcess with a
>>    NULL lpEnvironment; this is, by far, the most common case I would
>>    think.
>
> Why does this work? I don't understand how a cygwin program can call
> another cygwin program with env = NONE. I would have expected this
> means PATH still has /usr/bin. Unless the environment provided to
> MLton from cygwin is a lie that has been translated from the real
> environment and the real environment is what gets copied by
> CreateProcess.

Cygwin most definitely "lies" when providing the result for
Posix.ProcEnv.environ.  There seem to be a number of environment variables
that Cygwin translates between Windows paths and Cygwin paths:

[fluet@winxp-cygwin tmp]$ ./environ.exe | grep TMP
TMP=/cygdrive/c/WINDOWS/TEMP
[fluet@winxp-cygwin tmp]$ cmd.exe /c set | grep TMP
TMP=c:\WINDOWS\TEMP

It also seems that the vast number of environment variables set by Cygwin
are not replicated into the real environment that is copied by
CreateProcess with a NULL lpEnvironment.
That is, recall:

[fluet@winxp-cygwin tmp]$ ./parent.exe 'C:\WINDOWS\system32\cmd.exe' 'C:\WINDOWS\system32\cmd.exe /c set'
modname = C:\WINDOWS\system32\cmd.exe
cmdline = C:\WINDOWS\system32\cmd.exe /c set
piProcInfo.hProcess = 1856
COMSPEC=C:\WINDOWS\system32\cmd.exe
PATH=C:\cygwin\home\fluet\bin;...elided...
PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.JS;.WS
PROMPT=$P$G
SYSTEMDRIVE=C:
SYSTEMROOT=C:\WINDOWS
WINDIR=C:\WINDOWS
status: 0

and contrast to:

[fluet@winxp-cygwin tmp]$ /cygdrive/c/WINDOWS/system32/cmd.exe /c set
ALLUSERSPROFILE=C:\Documents and Settings\All Users
COMMONPROGRAMFILES=C:\Program Files\Common Files
COMPUTERNAME=WINXP-CYGWIN
...elided...
PATH=C:\cygwin\home\fluet\bin;...elided...
...elided...
TEXMFCONFIG=/home/fluet/share/texmf-config
TEXMFHOME=/home/fluet/share/texmf-local
TMP=c:\WINDOWS\TEMP
...elided...
WINDIR=C:\WINDOWS
WINDOW=4
_=/cygdrive/c/WINDOWS/system32/cmd.exe

The Cygwin spawnv{,p} functions (and presumably the execv{,p} functions)
seem to replicate the environment variables set by Cygwin (and uncygwinify
the paths of select variables) and construct an explicit environment (that
mimics inheritance of the parent environment) for use as lpEnvironment for
the underlying CreateProcess call.  Note that invoking 'cmd.exe /c set'
from the bash shell has loads of variables, and the TEXMFCONFIG and
TEXMFHOME paths have been left alone but PATH and TMP are uncygwinified.

The Cygwin spawnv{,p}e functions only replicate the given environment
variables (but do uncygwinify the paths of select variables) and construct
an explicit environment for use as lpEnvironment for the underlying
CreateProcess call.  Hence, there is the same PATH/cygwin1.dll problem when
spawnv{,p}e-ing a Cygwin program --- except that here you need '/usr/bin'
in PATH (to be uncygwinified to 'C:\cygwin\bin')!

> If you set an environment variable in the calling
> cygwin process, does the CreateProcess cygwin child see it?

No.
(Unless you set one of the "special" environment variables: PATH,
SYSTEMDRIVE, SYSTEMROOT, WINDIR --- and with the exception of PATH, I would
be wary of mucking with any of the others.)

Another option would be to mimic Cygwin's implementation of spawnv{,p} and
uncygwinify the paths of select variables from Posix.ProcEnv.environ;
it's just not documented which variables.

>>  * create {env = SOME [...], ...} should behave as currently implemented;
>>    the user is responsible for establishing the correct environment (it is
>>    just the case that the correct environment for a Cygwin program is
>>    subtle)
>
> Indeed.

From wesley at terpstra.ca Wed Jun 17 17:52:22 2009
From: wesley at terpstra.ca (Wesley W. Terpstra)
Date: Wed Jun 17 17:52:57 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To:
References:
 <162de7480906151906o48ae9189n265bcb4fbacbc1a0@mail.gmail.com>
 <162de7480906170156l6cdeeb9n8e52ffd7310d9d29@mail.gmail.com>
 <162de7480906170156s4626365clb76f1fd074329b72@mail.gmail.com>
 <162de7480906170648u60f51716s95e6a5f893c5a9ac@mail.gmail.com>
 <162de7480906170948y33dd6c57uad4a8860669a7c8@mail.gmail.com>
Message-ID: <162de7480906171752o563ecf7coe63cfb004030e70d@mail.gmail.com>

On Wed, Jun 17, 2009 at 9:47 PM, Matthew Fluet wrote:
> Cygwin most definitely "lies" when providing the result for
> Posix.ProcEnv.environ.  There seem to be a number of environment variables
> that Cygwin translates between Windows paths and Cygwin paths:

You make it sound like these cygwinify and uncygwinify functions are
easy to use. Maybe we should consider uncygwinifying the name of the
executable passed to CreateProcess? At the moment everything else in
MLton (file/dir open, exec/spawn, ...) uses cygwin paths, but
Process.create does not.

> The Cygwin spawnv{,p}e functions only replicate the given environment
> variables (but do uncygwinify the paths of select variables) and construct
> an explicit environment for use as lpEnvironment for the underlying
> CreateProcess call.
>
> Another option would be to mimic Cygwin's implementation of spawnv{,p} and
> uncygwinify the paths of select variables from Posix.ProcEnv.environ;

This sounds like a good idea to me.

> it's just not documented which variables.

The cygwin installer can also grab source code for packages. Whichever
package builds cygwin1.dll must mention these special paths.
Alternatively, run 'strings' on cygwin1.dll and look for paths you
know are translated and see which other strings are nearby?

From fluet at tti-c.org Wed Jun 17 19:34:20 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Wed Jun 17 19:34:22 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To: <162de7480906171752o563ecf7coe63cfb004030e70d@mail.gmail.com>
References:
 <162de7480906151906o48ae9189n265bcb4fbacbc1a0@mail.gmail.com>
 <162de7480906170156l6cdeeb9n8e52ffd7310d9d29@mail.gmail.com>
 <162de7480906170156s4626365clb76f1fd074329b72@mail.gmail.com>
 <162de7480906170648u60f51716s95e6a5f893c5a9ac@mail.gmail.com>
 <162de7480906170948y33dd6c57uad4a8860669a7c8@mail.gmail.com>
 <162de7480906171752o563ecf7coe63cfb004030e70d@mail.gmail.com>
Message-ID:

On Thu, 18 Jun 2009, Wesley W. Terpstra wrote:
> On Wed, Jun 17, 2009 at 9:47 PM, Matthew Fluet wrote:
>> Cygwin most definitely "lies" when providing the result for
>> Posix.ProcEnv.environ.  There seem to be a number of environment variables
>> that Cygwin translates between Windows paths and Cygwin paths:
>
> You make it sound like these cygwinify and uncygwinify functions are
> easy to use.

Aren't the cygwinify functions simply:
  cygwin_conv_to_full_posix_path
  cygwin_conv_to_posix_path
  cygwin_win32_to_posix_path_list
  cygwin_win32_to_posix_path_list_buf_size
and the uncygwinify functions simply:
  cygwin_conv_to_full_win32_path
  cygwin_conv_to_win32_path
  cygwin_posix_to_win32_path_list
  cygwin_posix_to_win32_path_list_buf_size
from http://cygwin.com/cygwin-api/cygwin-functions.html?

> Maybe we should consider uncygwinifying the name of the
> executable passed to CreateProcess? At the moment everything else in
> MLton (file/dir open, exec/spawn, ...) uses cygwin paths, but
> Process.create does not.

??Confused??  MLton.Process.create (on Cygwin) has used
cygwin_conv_to_full_win32_path since your 20041202 patch
(http://mlton.org/pipermail/mlton/2004-December/026368.html), slightly
modified by Stephen (moving the cygwin_conv_to_full_win32_path call out of
MLton_Process_create and performing the conversion on the SML side) when
applied as r3662 (http://mlton.org/cgi-bin/viewsvn.cgi?view=rev&rev=3662).

>> The Cygwin spawnv{,p}e functions only replicate the given environment
>> variables (but do uncygwinify the paths of select variables) and construct
>> an explicit environment for use as lpEnvironment for the underlying
>> CreateProcess call.
>>
>> Another option would be to mimic Cygwin's implementation of spawnv{,p} and
>> uncygwinify the paths of select variables from Posix.ProcEnv.environ;
>
> This sounds like a good idea to me.
>
>> it's just not documented which variables.
>
> The cygwin installer can also grab source code for packages. Whichever
> package builds cygwin1.dll must mention these special paths.
> Alternatively, run 'strings' on cygwin1.dll and look for paths you
> know are translated and see which other strings are nearby?

It was fairly easy to find in the cygwin sources:
  http://cygwin.com/cgi-bin/cvsweb.cgi/~checkout~/src/winsup/cygwin/environ.cc?content-type=text/plain&cvsroot=src
Look for conv_envvars.

But, I'm not convinced that it is worth it.  Reading the archives from
200411/200412, it is clear that Cygwin+CreateProcess has never really been
tested.  Indeed, trying to redirect the child's stdout to the parent's
stdout using Param.self doesn't work at the terminal (though it works when
the parent's stdout is redirected to a file).

From wesley at terpstra.ca Thu Jun 18 02:43:58 2009
From: wesley at terpstra.ca (Wesley W. Terpstra)
Date: Thu Jun 18 03:59:14 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To:
References:
 <162de7480906170156l6cdeeb9n8e52ffd7310d9d29@mail.gmail.com>
 <162de7480906170156s4626365clb76f1fd074329b72@mail.gmail.com>
 <162de7480906170648u60f51716s95e6a5f893c5a9ac@mail.gmail.com>
 <162de7480906170948y33dd6c57uad4a8860669a7c8@mail.gmail.com>
 <162de7480906171752o563ecf7coe63cfb004030e70d@mail.gmail.com>
Message-ID: <162de7480906180243h29ce1bc4oa25192cfb961ec63@mail.gmail.com>

On Thu, Jun 18, 2009 at 4:34 AM, Matthew Fluet wrote:
> MLton.Process.create (on Cygwin) has used cygwin_conv_to_full_win32_path
> since your 20041202 patch

Oh. :) From your examples I thought it was using windows paths, not cygwin.

> But, I'm not convinced that it is worth it.

That's a reasonable position. The whole useWindowsProcess was implemented
AFAIK because we were using VirtualAlloc instead of mmap. At this point is
there still any advantage to using VirtualAlloc? Use-mmap seems to work
quite reliably. It might be reasonable to just use fork()/exec() for
Process.create and eliminate the native memory management for cygwin.

From fluet at tti-c.org Thu Jun 18 04:58:57 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Thu Jun 18 04:59:01 2009
Subject: [MLton] Re: [MLton-commit] r6699
In-Reply-To: <162de7480906180243h29ce1bc4oa25192cfb961ec63@mail.gmail.com>
References:
 <162de7480906170156l6cdeeb9n8e52ffd7310d9d29@mail.gmail.com>
 <162de7480906170156s4626365clb76f1fd074329b72@mail.gmail.com>
 <162de7480906170648u60f51716s95e6a5f893c5a9ac@mail.gmail.com>
 <162de7480906170948y33dd6c57uad4a8860669a7c8@mail.gmail.com>
 <162de7480906171752o563ecf7coe63cfb004030e70d@mail.gmail.com>
 <162de7480906180243h29ce1bc4oa25192cfb961ec63@mail.gmail.com>
Message-ID:

On Thu, 18 Jun 2009, Wesley W. Terpstra wrote:
> On Thu, Jun 18, 2009 at 4:34 AM, Matthew Fluet wrote:
>
>> MLton.Process.create (on Cygwin) has used cygwin_conv_to_full_win32_path
>> since your 20041202 patch
>
> Oh. :) From your examples I thought it was using windows paths, not cygwin.

True, my examples were using a simple C program that didn't bother with the
conversion, so I used windows paths.  Also, it seems that
cygwin_conv_to_full_win32_path is a nop (or acts simply as a conversion to
an absolute path) if the input looks like a windows path; so
MLton.Process.create would actually accept an explicit windows path.

>> But, I'm not convinced that it is worth it.
>
> That's a reasonable position. The whole useWindowsProcess was implemented
> AFAIK because we were using VirtualAlloc instead of mmap. At this point is
> there still any advantage to using VirtualAlloc? Use-mmap seems to work
> quite reliably. It might be reasonable to just use fork()/exec() for
> Process.create and eliminate the native memory management for cygwin.

I might investigate defaulting to use-mmap.

From fluet at tti-c.org Fri Jun 19 13:07:44 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Fri Jun 19 13:07:47 2009
Subject: [MLton] cygwin and mmap
Message-ID:

Although we've now gotten most things squared away with x86-cygwin and the
MLton.Process.* functions, I'm tempted to switch the
MLton_Platform_CygwinUseMmap default from FALSE to TRUE.

I did a bit of archeology and discovered that the default was TRUE in the
20051206 release (established by r4104).  The limitation at the time was
that a Cygwin-hosted self-compile wasn't possible:
  http://mlton.org/pipermail/mlton/2005-October/028101.html
  http://mlton.org/pipermail/mlton/2005-November/028182.html
Another limitation at the time was that while there was a use-mmap runtime
option, there was no dont-use-mmap runtime option.  That is, it wasn't
possible with the 20051206 release to disable the use of mmap.

On the other hand, the default was FALSE in the 20070826 release.  The
reason for the revert is that the x86_64 branch had been created on
20050822 and the runtime system was getting a major rewrite.  I was
cherry-picking commits from trunk and r4104 was one that was never picked
up.  So the old default got propagated.  Furthermore, at the time of the
20070826 release, my only Cygwin machine was a 512MB laptop --- not
suitable for a self-compile.

While there have been a few reported issues with the 20070826 release on
x86-cygwin (for many of which, use-mmap was the suggested fix), it isn't
clear that anyone has been self-compiling on x86-cygwin, although there
have been some experimental packages.

For the present work, I've been running Cygwin/Windows-XP under VMWare
Server 2.0, which has worked fairly well.  I've been able to perform
multiple rounds of self-compiles, both with CygwinUseMmap defaulting to
FALSE and to TRUE.
Furthermore, with CygwinUseMmap defaulting to TRUE, I am able to pass *all*
the regressions (except textio.2.sml, which is due to CR/LF conversions,
and socket.sml, which might be due to Windows firewall).  Obviously, with
CygwinUseMmap defaulting to FALSE, none of the regressions that rely on
fork() pass.  In any case, I've changed the use-mmap runtime option to take
a boolean value.

One caveat is that, thus far, the virtual machine has been configured with 4G
memory (although Windows only reports seeing 3G), so the garbage-collector
and virtual-memory systems are not being stressed terribly.  I'll try
configuring down to 2G (and maybe down to 1G) and see if things remain
stable.

From fluet at tti-c.org Fri Jun 19 13:26:05 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Fri Jun 19 13:26:08 2009
Subject: [MLton] release and future devel
In-Reply-To:
References:
Message-ID:

On Thu, 11 Jun 2009, Matthew Fluet wrote:
> I regularly self-compile on x86-darwin and amd64-linux, so I don't think that
> there are any issues on those platforms.  I should also be able to check
> x86-cygwin and x86-linux.  If you regularly use another platform, it would be
> helpful to verify that things work as expected on that platform;
> bootstrapping from mlton-20070826 and running the regression suite is a good
> baseline test.  Obviously, your favorite SML application is another good
> test.

I've updated most of the documentation to reflect the current state of
affairs (as I understand them).  Check out http://mlton.org/Credits and
update if I've overlooked something.

I've also been conservative about mentioning newly supported platforms.
There have been some commits referencing IA64 and PowerPC64, but I'm not
sure of the extent of the support; seems only for HPUX and AIX
(respectively).

From fluet at tti-c.org Fri Jun 19 18:41:06 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Fri Jun 19 18:41:10 2009
Subject: [MLton] cygwin and mmap
In-Reply-To:
References:
Message-ID:

On Fri, 19 Jun 2009, Matthew Fluet wrote:
> One caveat is that, thus far, the virtual machine has been configured with 4G
> memory (although Windows only reports seeing 3G), so the garbage-collector
> and virtual-memory systems are not being stressed terribly.  I'll try
> configuring down to 2G (and maybe down to 1G) and see if things remain
> stable.

I was able to self-compile with both 'use-mmap true' and 'use-mmap false'
with 1G memory.  Here are the GC summaries:

[fluet@winxp-cygwin mlton]$ cat use-mmap-true.1g.log
GC type       time ms  number           bytes       bytes/sec
------------- ------- ------- --------------- ---------------
copying         7,079      13     591,616,636      83,573,476
mark-compact   82,454      13   3,080,548,832      37,360,817
minor         116,210     673   5,799,126,212      49,902,125
total time: 612,953 ms
total GC time: 212,214 ms (34.6%)
max pause time: 14,500 ms
total bytes allocated: 66,623,505,512 bytes
max bytes live: 656,260,600 bytes
max heap size: 745,406,464 bytes
max stack size: 14,499,840 bytes
num cards marked: 22,228,445
bytes scanned: 7,418,450,504 bytes
bytes hash consed: 0 bytes

[fluet@winxp-cygwin mlton]$ cat use-mmap-false.1g.log
GC type       time ms  number           bytes       bytes/sec
------------- ------- ------- --------------- ---------------
copying         7,110      13     590,547,480      83,058,713
mark-compact   78,845      13   2,834,232,564      35,946,890
minor         108,872     596   5,517,453,328      50,678,349
total time: 523,937 ms
total GC time: 201,145 ms (38.4%)
max pause time: 14,657 ms
total bytes allocated: 66,624,783,712 bytes
max bytes live: 656,260,600 bytes
max heap size: 745,406,464 bytes
max stack size: 14,090,240 bytes
num cards marked: 19,761,168
bytes scanned: 6,590,745,560 bytes
bytes hash consed: 0 bytes

Both memory managers were able to achieve the same maximum heap size.
Somewhere along the way, the garbage collections got out of sync, probably
due to some heap allocation or resizing request failing and requiring a
backoff.  Although the 'use-mmap true' run is a little slower (that might
simply be due to the fact that I ran it immediately before the 'use-mmap
false' run, so it incurred the cost of paging everybody else out), since it
provides more functionality, I will make it the new default.

For the record, here are the corresponding runs with 4G:

[fluet@winxp-cygwin mlton]$ cat use-mmap-true.4g.log
GC type       time ms  number           bytes       bytes/sec
------------- ------- ------- --------------- ---------------
copying         3,981      10     284,825,700      71,546,268
mark-compact   48,126       9   1,312,785,960      27,278,101
minor          99,795     300   5,320,801,552      53,317,318
total time: 508,968 ms
total GC time: 165,061 ms (32.4%)
max pause time: 8,891 ms
total bytes allocated: 66,610,123,092 bytes
max bytes live: 271,678,984 bytes
max heap size: 1,172,111,360 bytes
max stack size: 10,960,896 bytes
num cards marked: 16,782,779
bytes scanned: 5,121,248,816 bytes
bytes hash consed: 0 bytes

[fluet@winxp-cygwin mlton]$ cat use-mmap-false.4g.log
MLton finished in 371.52 + 174.24 (32% GC)
GC type       time ms  number           bytes       bytes/sec
------------- ------- ------- --------------- ---------------
copying         2,813       9     208,347,896      74,066,087
mark-compact   52,954      10   1,502,771,236      28,378,804
minor         108,915     325   5,805,274,292      53,300,960
total time: 506,203 ms
total GC time: 174,272 ms (34.4%)
max pause time: 8,719 ms
total bytes allocated: 66,619,945,340 bytes
max bytes live: 269,871,128 bytes
max heap size: 1,172,111,360 bytes
max stack size: 10,960,896 bytes
num cards marked: 17,969,572
bytes scanned: 5,412,110,084 bytes
bytes hash consed: 0 bytes

Again, same maximum heap size (and negligible difference in running time).
Also, the run with fewer minor gcs is flipped, so I don't think there is a
significant difference between the two.

From ville at laurikari.net Fri Jun 19 23:38:50 2009
From: ville at laurikari.net (Ville Laurikari)
Date: Fri Jun 19 23:38:56 2009
Subject: [MLton] release and future devel
In-Reply-To:
References:
Message-ID: <20090620063850.GA10108@laurikari.net>

On Fri, Jun 19, 2009 at 03:26:05PM -0500, Matthew Fluet wrote:
> I've also been conservative about mentioning newly supported platforms.
> There have been some commits referencing IA64 and PowerPC64, but I'm not
> sure of the extent of the support; seems only for HPUX and AIX
> (respectively).

We use hpux-ia64 and aix-powerpc64 in production, so I'd count them as
stable enough to call them supported platforms.

I can provide builds for hpux-hppa, hpux-ia64, aix-powerpc64,
solaris-sparc, and solaris-amd64.  These are all platforms we use in
production.  We no longer use aix-powerpc.  Builds for other platforms
(linux, mingw, cygwin, darwin) can be provided by other people, I
believe.

--
Ville

From fluet at tti-c.org Sat Jun 20 05:46:04 2009
From: fluet at tti-c.org (Matthew Fluet)
Date: Sat Jun 20 05:46:10 2009
Subject: [MLton] release and future devel
In-Reply-To: <20090620063850.GA10108@laurikari.net>
References: <20090620063850.GA10108@laurikari.net>
Message-ID:

On Sat, 20 Jun 2009, Ville Laurikari wrote:
> On Fri, Jun 19, 2009 at 03:26:05PM -0500, Matthew Fluet wrote:
>> I've also been conservative about mentioning newly supported platforms.
>> There have been some commits referencing IA64 and PowerPC64, but I'm not
>> sure of the extent of the support; seems only for HPUX and AIX
>> (respectively).
>
> We use hpux-ia64 and aix-powerpc64 in production, so I'd count them as
> stable enough to call them supported platforms.

Great.

> I can provide builds for hpux-hppa, hpux-ia64, aix-powerpc64,
> solaris-sparc, and solaris-amd64.  These are all platforms we use in
> production.

Thanks.

> We no longer use aix-powerpc.  Builds for other platforms
> (linux, mingw, cygwin, darwin) can be provided by other people, I
> believe.

Certainly.

From wesley at terpstra.ca Wed Jun 24 10:03:57 2009
From: wesley at terpstra.ca (Wesley W. Terpstra)
Date: Wed Jun 24 10:04:30 2009
Subject: [MLton] Bug? print statement makes program terminate / run-forever
Message-ID: <162de7480906241003o5bbab5a4hdd038fa4a484eea7@mail.gmail.com>

Depending on whether the 'print' statement is present, the program either
loops forever or terminates rapidly.

val x = 5.0

fun sqrtx y =
   let
      val y' = ((x / y + y) / 2.0)
      val () = print (Real.toString y ^ " ...\n")
   in
      if Real.== (y', y)
         then y
         else sqrtx y'
   end

val () = print (Real.toString (sqrtx 2.0) ^ "\n")

From wesley at terpstra.ca Wed Jun 24 10:11:01 2009
From: wesley at terpstra.ca (Wesley W. Terpstra)
Date: Wed Jun 24 10:11:35 2009
Subject: [MLton] Re: Bug? print statement makes program terminate / run-forever
In-Reply-To: <162de7480906241003o5bbab5a4hdd038fa4a484eea7@mail.gmail.com>
References: <162de7480906241003o5bbab5a4hdd038fa4a484eea7@mail.gmail.com>
Message-ID: <162de7480906241011w2557433fof94a60e3a4574a53@mail.gmail.com>

On Wed, Jun 24, 2009 at 7:03 PM, Wesley W. Terpstra wrote:
> Depending on if the 'print' statement is present, the program either
> loops forever or terminates rapidly.

MLton 20061107 works. So does 20070826.

I suspect these recent floating point optimizations are to blame?

From vesa.a.j.k at gmail.com Wed Jun 24 12:06:40 2009
From: vesa.a.j.k at gmail.com (Vesa Karvonen)
Date: Wed Jun 24 12:06:43 2009
Subject: [MLton] Re: Bug?
    print statement makes program terminate / run-forever
In-Reply-To: <162de7480906241011w2557433fof94a60e3a4574a53@mail.gmail.com>
References:
 <162de7480906241003o5bbab5a4hdd038fa4a484eea7@mail.gmail.com>
 <162de7480906241011w2557433fof94a60e3a4574a53@mail.gmail.com>
Message-ID: <9e43b9a0906241206q67af8aa1y9db53b553acbf624@mail.gmail.com>

On Wed, Jun 24, 2009 at 8:11 PM, Wesley W. Terpstra wrote:
> On Wed, Jun 24, 2009 at 7:03 PM, Wesley W. Terpstra
> wrote:
>>
>> Depending on if the 'print' statement is present, the program either
>> loops forever or terminates rapidly.
>
> MLton 20061107 works. So does 20070826.
>
> I suspect these recent floating point optimizations are to blame?

Doesn't look like it as the problem doesn't seem to reproduce on
amd64 (and there doesn't seem to be much scope for constant folding).
(I'll try shortly with x86.)

Did you try this on an x86 platform?  If so, did you try with -ieee-fp
true?  Here is my theory.  On x86, the print in the loop has the effect of
forcing the y' value out of FP registers.  This allows the loop to
terminate.  Otherwise, with the 80-bit registers of the x86, and the
80-bit value of y' in a register at the time of the comparison, the loop
will never terminate as the 64-bit value of y and the 80-bit value of y'
will never be equal.

-Vesa Karvonen

From wesley at terpstra.ca Wed Jun 24 12:16:24 2009
From: wesley at terpstra.ca (Wesley W. Terpstra)
Date: Wed Jun 24 12:16:57 2009
Subject: [MLton] Re: Bug? print statement makes program terminate / run-forever
In-Reply-To: <9e43b9a0906241206q67af8aa1y9db53b553acbf624@mail.gmail.com>
References:
 <162de7480906241003o5bbab5a4hdd038fa4a484eea7@mail.gmail.com>
 <162de7480906241011w2557433fof94a60e3a4574a53@mail.gmail.com>
 <9e43b9a0906241206q67af8aa1y9db53b553acbf624@mail.gmail.com>
Message-ID: <162de7480906241216w4169b479w6801bcde260c0357@mail.gmail.com>

On Wed, Jun 24, 2009 at 9:06 PM, Vesa Karvonen wrote:
> > MLton 20061107 works. So does 20070826.

Strange that this didn't use to happen.

> Did you try this on an x86 platform?

This was only on x86, yes.

> If so, did you try with -ieee-fp true?

With this flag it terminates.

> Here is my theory.  On x86, the
> print in the loop has the effect of forcing the y' value out of FP
> registers.  This allows the loop to terminate.  Otherwise, with the
> 80-bit registers of the x86, and the 80-bit value of y' in a register
> at the time of the comparison, the loop will never terminate as the
> 64-bit value of y and the 80-bit value of y' will never be equal.

Actually it seems the two 80-bit values never become equal:

loop_12:
        fldL (globalReal64+0x8)
        fldL 0x30(%ebp)
        fdivr %st, %st(1)
        fadd %st, %st(1)
        fxch %st(1)
        fdivL (globalReal64+0x0)
        fxch %st(1)
        fucomp %st(1)
        fnstsw %ax
        andw $0x4500,%ax
        cmpw $0x4000,%ax
        je L_1331
L_151:
        fstpL 0x30(%ebp)
        jmp loop_12

The comparison is indeed happening in a floating point register, which
means it might behave differently than when truncated in/out of memory for
the C call.