Interrupting cabal-3.8.0.0.20220526 and master with ctrl+c makes it hang on windows #8208

Open
jneira opened this issue Jun 11, 2022 · 107 comments · Fixed by haskell/process#277

Comments

@jneira
Member

jneira commented Jun 11, 2022

I am experiencing some weird behaviour when interrupting a cabal RC run with ctrl+c on Windows: pressing ctrl+c during the Configuring component step for my package, which spawns ghc-pkg subprocesses, hangs the program with rc1.

I can't reproduce it with 3.6.0.0.

@jneira jneira added type: bug platform: windows regression on master Regression that is unreleased and needs to be fixed before release labels Jun 11, 2022
@Mikolaj
Member

Mikolaj commented Jun 11, 2022

Possibly introduced by #7921, even though it says "on unix systems".

Could you try building cabal with process 1.6.14 or later? Building with GHC 9.2 may include that by default.
I don't remember which GHC GitLab CI uses, and so with which GHC the cabal binaries have been built, nor which GHC ghcup uses to build the RC1 cabal.
Anyway, that's a long shot: process < 1.6.14 had some faults, but possibly not as grave as hanging.

@jneira
Member Author

jneira commented Jun 11, 2022

cabal in gitlab is built with ghc-8.10.7 iirc

@jneira
Member Author

jneira commented Jun 11, 2022

I checked that PR and did not get any hang, though my tests could be wrong:
#7921 (comment)

Confirmation of the bug by another Windows user would be great; I will try again with other projects myself.

@Mikolaj
Member

Mikolaj commented Jun 13, 2022

Yes, dear Windows users, please reproduce.

@robx
Collaborator

robx commented Jun 14, 2022

The "on unix systems" is just to say that the change isn't expected to necessarily fix things on Windows. It definitely does touch Windows too, and there's a decent chance of something like this slipping through...

One factor that might come into play is that GHC has two different Windows I/O managers, which behave differently with respect to types and sub-processes, see e.g. here haskell/process#235. It might be worth checking if that is involved here. (I don't really know which GHC version uses which I/O manager by default and/or whether they're switchable...)

@Mikolaj Mikolaj modified the milestones: 3.9, Considered for 3.8 Jun 14, 2022
@jneira
Member Author

jneira commented Jun 14, 2022

In my test I used ghc-8.10.7, which does not have the new I/O manager.
The cabal release itself has been built with ghc-8.10.7.
AFAIR the new I/O manager is still not the default in ghc-9.2.

@Mikolaj
Member

Mikolaj commented Jul 5, 2022

FYI: I'm getting "waitForProcess: does not exist (No child process)" when interrupting long running cabal test, on an ancient Ubuntu. I don't remember getting that before 3.8, but my memory is unreliable. However, this is obviously better than accumulating half-dead processes.

We still have to decide whether to revert for 3.8, e.g., until the new GHC I/O fixes the problem, until somebody fixes it differently, or until enough Windows users fail to reproduce it that a fluke on a single Windows system becomes a plausible explanation.

An argument for not reverting is that we only got one report of the problem for the pre-release. However, not many Windows users are early adopters so that may be a sampling bias.

@jneira
Member Author

jneira commented Jul 5, 2022

We are not even sure what the cause is (or whether it is reproducible at all!). That is too little evidence to take any action or delay the release, IMHO.
I am afraid only the release will give us more feedback 🤷

@robx
Collaborator

robx commented Jul 6, 2022

It's a bit tangential to this particular issue, but how about generally building releases with GHC 9.2? That would address those "no child process" errors for the released cabal versions at least.

Regarding Windows issues in general, it'd be great if there were some easy way for non-Windows devs to get access to a Windows dev environment. Is there some organisation that could e.g. provide access to Windows VMs to contributors of core Haskell tooling?

@Mikolaj
Member

Mikolaj commented Jul 6, 2022

@hasufell: in your experience, is GHC 9.2.3 stable enough to build cabal 3.8.1 with it (that ghcup would later distribute)? I'm in favour, to avoid both of stale processes and "waitForProcess: does not exist (No child process)". I suppose changing GHC_VERSION in .gitlab-ci.yml would be enough to effect the change?

If anybody has a clue about free Windows VMs for cabal devs, please let us know. @bgamari: do you think HF or GHC HQ may have any?

@hasufell
Member

hasufell commented Jul 6, 2022

in your experience, is GHC 9.2.3 stable enough to build cabal 3.8.1

I don't use 9.2.3 actively, so I don't know.

@Mikolaj
Member

Mikolaj commented Jul 6, 2022

But don't your users complain to you about particular GHCs and demand their money back?

@hasufell
Member

hasufell commented Jul 6, 2022

But don't your users complain to you about particular GHCs and demand their money back?

Constantly.


@Mikolaj
Member

Mikolaj commented Jul 6, 2022

I take it as GHC 9.2.3 not being particularly maligned by the users, so let's try #8271.

@Mikolaj
Member

Mikolaj commented Jul 9, 2022

@jneira: I've compiled branch 3.8 with GHC 9.2.3. Could you try if your problem persists? https://gitlab.haskell.org/haskell/cabal/-/pipelines/54283

@robx: I'm still getting "waitForProcess: does not exist (No child process)" [edit: most of the time] with branch 3.8 cabal compiled on GHC 9.2.3 (https://gitlab.haskell.org/haskell/cabal/-/jobs/1106009) and I'm not ever getting that with cabal 3.6.2. Is that just my ancient Ubuntu acting up? Could you repeat your tests with this version?

Edit: and both 3.6.2 from ghcup (which doesn't show the waitForProcess message for me) and the newly compiled cabal are said by ldd to be static exes, so probably both come from the gitlab-ci job build-x86_64-linux-alpine, so that's probably not the cause of the differences.

@robx
Collaborator

robx commented Jul 9, 2022

@robx: I'm still getting "waitForProcess: does not exist (No child process)" [edit: most of the time] with branch 3.8 cabal compiled on GHC 9.2.3 (https://gitlab.haskell.org/haskell/cabal/-/jobs/1106009) and I'm not ever getting that with cabal 3.6.2. Is that just my ancient Ubuntu acting up? Could you repeat your tests with this version?

I was under the mistaken impression that GHC 9.2.3 shipped with the fixed version of the process package. Unfortunately, that turns out not to be the case: It ships with process-1.6.13.2 while the fix is in process-1.6.14.0. Sorry about that :/ (though it's probably still a good change to build with GHC 9.2).

@Mikolaj
Member

Mikolaj commented Jul 9, 2022

though it's probably still a good change to build with GHC 9.2

Yes, I think so.

@hasufell
Member

hasufell commented Jul 9, 2022

It's trivial to build against a newer core library. You do not need to update GHC for that.

@jneira
Member Author

jneira commented Jul 9, 2022

Regarding Windows issues in general, it'd be great if there were some easy way for non-Windows devs to get access to a Windows dev environment. Is there some organisation that could e.g. provide access to Windows VMs to contributors of core Haskell tooling?

Well, at least we have free Windows machines in CI (GitHub and GitLab).
But it is hard to reproduce the hang programmatically; maybe by killing long-running cabal processes from the command line at random points?
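A minimal sketch of such a harness, written in Python only for brevity: it starts a command in its own process group, waits a random delay, sends an interrupt, and reports whether the process exited within a grace period. This is an assumption-laden stand-in, not cabal's own tooling: the function name `interrupt_and_check` and its timing parameters are hypothetical, and on Windows it sends `CTRL_BREAK_EVENT`, which is the closest thing Python's `subprocess` can deliver to a child process group but is not identical to pressing ctrl+c in the console.

```python
import os
import random
import signal
import subprocess
import time


def interrupt_and_check(cmd, min_delay=1.0, max_delay=10.0, grace=15.0):
    """Start cmd, interrupt it after a random delay, report whether it exited.

    Returns True if the process exited (before or within `grace` seconds
    after the interrupt), False if it appeared to hang and had to be killed.
    """
    if os.name == "nt":
        # A new process group lets CTRL_BREAK_EVENT target only the child tree.
        proc = subprocess.Popen(
            cmd, creationflags=subprocess.CREATE_NEW_PROCESS_GROUP)
    else:
        # On POSIX, a new session also makes the child a process-group leader.
        proc = subprocess.Popen(cmd, start_new_session=True)

    time.sleep(random.uniform(min_delay, max_delay))
    if proc.poll() is not None:
        return True  # finished before we got to interrupt it

    if os.name == "nt":
        # Assumption: ctrl+break approximates the interactive ctrl+c.
        proc.send_signal(signal.CTRL_BREAK_EVENT)
    else:
        try:
            os.killpg(proc.pid, signal.SIGINT)  # what ctrl+c sends on POSIX
        except ProcessLookupError:
            pass  # the group disappeared between poll() and killpg()

    try:
        proc.wait(timeout=grace)
        return True
    except subprocess.TimeoutExpired:
        # Treat it as a hang and force-kill so a loop of trials can continue.
        if os.name == "nt":
            proc.kill()  # only the direct child; grandchildren may survive
        else:
            os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return False
```

For example, calling `interrupt_and_check(["cabal", "build"])` repeatedly from inside a test project and counting the `False` results would give a rough hang rate. On Windows, any leftover ghc-pkg processes from a killed run may need to be cleaned up by hand between trials.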

@jneira
Member Author

jneira commented Jan 12, 2023

sorry, no time to do more tracing till (maybe) the weekend 🙃

@Mikolaj
Member

Mikolaj commented Jan 13, 2023

Happy Birthday @Mistuke! :D

@Mistuke
Collaborator

Mistuke commented Jan 14, 2023

Happy Birthday @Mistuke! :D

Thank you :D

@jneira
Member Author

jneira commented Jan 15, 2023

yeah, will check master again to confirm it has the same behaviour

I am afraid the bug still reproduces for me at bcfc79c.

@Mistuke
Collaborator

Mistuke commented Jan 19, 2023 via email

@Mikolaj Mikolaj removed the regression on master Regression that is unreleased and needs to be fixed before release label Feb 9, 2023
@Mikolaj
Member

Mikolaj commented Feb 9, 2023

This is going to be a regression in cabal 3.10 now, not on master. However, only @jneira can reproduce it so far and, unfortunately, he is too busy, so let's wait for wider feedback with 3.10.

@Mistuke: thank you for spending the time and confirming it doesn't look as immediately and universally disastrous as I feared. Perhaps it's only a hang on ctrl-c after all and not a symptom of some more general and dangerous flaw.

@Mistuke
Collaborator

Mistuke commented Feb 9, 2023 via email

@Mistuke
Collaborator

Mistuke commented Mar 5, 2023

haskell/process#277 should fix this.

@ulysses4ever
Collaborator

Mmm, isn't it a bit premature to close it? Has anyone actually tried it on cabal?

@Mistuke
Copy link
Collaborator

Mistuke commented Mar 12, 2023

GitHub closed it because it was linked.

@Mistuke Mistuke reopened this Mar 12, 2023
@jneira
Member Author

jneira commented Mar 19, 2023

Hi, I have tried building cabal against a process checkout that includes the mentioned patch, and I am afraid I am still experiencing it:

PS D:\dev\ws\haskell\cabal> git rev-parse HEAD
6c95f3fee3cdee859704b6476646cefd4628a850
PS D:\dev\ws\haskell\cabal> cat .\cabal.project | grep "process"
packages: ../process
PS D:\dev\ws\haskell\cabal> cd ../process
PS D:\dev\ws\haskell\process> git rev-parse HEAD
9dbb520d711b59f6ccaf32980fb794a369a3e9ed
PS D:\dev\ws\haskell\process> cd ../cabal
PS D:\dev\ws\haskell\cabal> cabal build cabal-install --disable-tests --disable-benchmarks
Resolving dependencies...
Up to date
PS D:\dev\ws\haskell\cabal> cabal list-bin cabal-install
D:\dev\ws\haskell\cabal\dist-newstyle\build\x86_64-windows\ghc-8.10.7\cabal-install-3.10.1.0\x\cabal\build\cabal\cabal.exe
PS D:\dev\ws\haskell\cabal> cd ../cabal-test
PS D:\dev\ws\haskell\cabal-test> D:\dev\ws\haskell\cabal\dist-newstyle\build\x86_64-windows\ghc-8.10.7\cabal-install-3.10.1.0\x\cabal\build\cabal\cabal.exe build
Warning: this is a debug build of cabal-install with assertions enabled.
Build profile: -w ghc-8.10.7 -O1
In order, the following will be built (use -v for more details):
 - hsc2hs-0.68.9 (exe:hsc2hs) (requires build)
 - network-3.1.2.8 (lib:network) (requires build)
 - cabal-test-0.1.0.0 (lib) (first run)
 - cabal-test-0.1.0.0 (exe:cabal-test) (first run)
Configuring executable 'hsc2hs' for hsc2hs-0.68.9..

On the last line I pressed ctrl+c and the execution was blocked 😞

@Mikolaj
Member

Mikolaj commented Mar 20, 2023

I wonder what version of process this binary (cabal-install-3.10.1.0 for Windows) has been built with... If that's the standard binary, coming from https://gitlab.haskell.org/haskell/cabal/-/pipelines/64225, we use quite an old GHC there, and also a needlessly old minor version of that GHC (TODO, PR welcome).

@Mistuke
Collaborator

Mistuke commented Mar 25, 2023

Hi, I have tried building cabal against a process checkout that includes the mentioned patch, and I am afraid I am still experiencing it:
...


On the last line I pressed ctrl+c and the execution was blocked 😞

That's weird; let's first confirm it's stuck on the same issue :). Could you upload the binary that's failing again, and could you also do a new trace with process spy as before?

The new code in process shouldn't be able to deadlock unless the child process never actually exits.

Second question: could you link me to the cabal code that handles ctrl+c?

@Mistuke
Collaborator

Mistuke commented Mar 25, 2023

Also, it would be very handy if you could replicate the issue with a standalone reproducer like the one in haskell/process#273; this would likely increase my chances of reproducing the issue, as it removes some timing variability.

@jasagredo
Collaborator

I don't seem to get this behavior anymore with newer cabal versions; it works as expected. Unless a clear reproducer is provided, I guess we can close this issue.
