Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

on Windows, waitForProcess hangs forever on certain git commands #273

Closed
JakobBruenker opened this issue Jan 18, 2023 · 13 comments · Fixed by #277
Closed

on Windows, waitForProcess hangs forever on certain git commands #273

JakobBruenker opened this issue Jan 18, 2023 · 13 comments · Fixed by #277

Comments

@JakobBruenker
Copy link

JakobBruenker commented Jan 18, 2023

Encountered in this cabal issue: haskell/cabal#8688

Running the following script will successfully execute the first process, but then will get stuck indefinitely while waiting for the second process to finish. The program cannot be killed with Ctr+C, but can be killed from the task manager.

If the process is killed and the git fetch command is run manually, the program will succeed if it's run again.

In WSL, the script ran successfully.

Reproduction steps:

Run the following script with cabal run <filename>:

{- cabal:
build-depends: base >= 4.16 && < 5
             , process == 1.6.16.0
-}
import Data.Foldable
import System.Process

processes :: [CreateProcess]
processes =
  [ CreateProcess {cmdspec = RawCommand "git" ["clone","--no-checkout","https://github.com/rtyley/small-test-repo"], cwd = Nothing, env = Nothing, std_in = Inherit, std_out = Inherit, std_err = Inherit, close_fds = False, create_group = False, delegate_ctlc = True, detach_console = False, create_new_console = False, new_session = False, child_group = Nothing, child_user = Nothing, use_process_jobs = True}
  , CreateProcess {cmdspec = RawCommand "git" ["fetch"], cwd = Just "./small-test-repo", env = Nothing, std_in = Inherit, std_out = Inherit, std_err = Inherit, close_fds = False, create_group = False, delegate_ctlc = True, detach_console = False, create_new_console = False, new_session = False, child_group = Nothing, child_user = Nothing, use_process_jobs = True}
  ]

main :: IO ()
main = do
  for_ processes $ \cp -> withCreateProcess cp $ \_ _ _ p -> do
    waitForProcess p >>= print

Environment:

  • Windows 11
  • ghc 9.4.4
  • git version 2.38.1.windows.1
  • process 1.6.16.0
@snoyberg
Copy link
Collaborator

Pinging @Mistuke if you have any thoughts on this.

@Mistuke
Copy link
Contributor

Mistuke commented Jan 23, 2023

Sure I can take a look. Just to check, is this msys2 git or native windows git?

I'm currently looking at a similar issue in cabal where it's looking like the finalizers aren't run, so the resource is never released and the child process becomes zombie.

I have been unable to reproduce it myself so hopefully I can with this.

@JakobBruenker
Copy link
Author

@Mistuke I think native? But I'm not sure what the proper way to find out is

@JakobBruenker
Copy link
Author

If you can't reproduce it and I can help in any other way, let me know

@Mistuke
Copy link
Contributor

Mistuke commented Jan 23, 2023

@Mistuke I think native? But I'm not sure what the proper way to find out is

Ah basically did you install it through pacman or the git website, or some other means?

@JakobBruenker
Copy link
Author

@Mistuke I appear to have installed it via https://gitforwindows.org/

@Mistuke
Copy link
Contributor

Mistuke commented Jan 23, 2023

Ah great, that's native then. I'll dig in in a bit, will let you know if I need a system call trace. :)

@Mistuke
Copy link
Contributor

Mistuke commented Jan 29, 2023

Ok, I have been able to reproduce it, and I think that

main :: IO ()
main = do
  for_ processes $ \cp -> withCreateProcess cp $ \_ _ _ p -> do
    waitForProcess p >>= print

is... an interesting dichotomy.. when using use_process_jobs = True and calling waitForProcess inside withCreateProcess we wait while distinguishing between the process "exited" and the kernel having released all resources https://github.com/haskell/process/blob/v1.6.16.0/cbits/win32/runProcess.c#L620 The reason for this is that we want to know that the kernel has already flushed all the I/O data we might want to read back in. This avoid the racy conditions where the program hasn't flushed the file yet when we try to read it...

However the NT kernel only releases the process when the last HANDLE to it has been closed, but we're holding on to the last handle in p. This results in a race condition. If the Kernel has released the object, we're OK, if it hasn't, we'll wait indefinitely on https://github.com/haskell/process/blob/v1.6.16.0/cbits/win32/runProcess.c#L635 as we're waiting on ourselves.

However this use-case is not unreasonable, so question is how to support it. @snoyberg would you agree that the semantics of waitForProcess doesn't guarantee that the handle is still usable after the function returns?

That is to say, I think I'm allowed to close the process HANDLE in this call, as if you've waited for the process to terminate there can be no expectation that the handle is valid past this point.. Currently we close them when the finalizers run on GC or termination.

@snoyberg
Copy link
Collaborator

I don't think that's true in general. I'm not as familiar with the Windows process API (though I'm not a stranger to it), but it's a pretty common thing in my experience to, for example:

  • Start a process
  • Fork a background thread to read data from the process and do something with it
  • In the main thread, wait for the process to exit

@Mistuke
Copy link
Contributor

Mistuke commented Jan 29, 2023

That scenario should be fine though, what I'm asking is essentially, that after the process has exited, so after the main thread returns from waiting for the process exit, whether there's any expectation that the PID is still valid. I would have expected no since the program terminated. Of course you can still drain the data handles, but with the PID only getting the exit code seems valid.

i.e. pidfd_open is invalid after that point no?

@snoyberg
Copy link
Collaborator

Then I misunderstood, sorry. I believe what you're saying is correct. My understanding of how the package operates is that once it sees that a PID has closed, it never uses it again. This is based on the Unix contracts around waitforpid, but I think the same logic applies on the Windows side.

@Mistuke
Copy link
Contributor

Mistuke commented Feb 9, 2023

Sorry I have not forgotten about this. I have a fix for it but wanted to do some more verification in GHC before posting it. I've been busy with work but have set aside some time Saturday to finish checking and post the patch.

@Mistuke
Copy link
Contributor

Mistuke commented Mar 5, 2023

#277 Fixes this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants