Crash with GHCRTS=--io-manager=native (version: git HEAD) #5851

Open
ruifengx opened this issue Sep 4, 2022 · 9 comments

@ruifengx

ruifengx commented Sep 4, 2022

Note: the latest version (2.7.5) of stack on Windows was compiled using GHC 8.10.4, which did not have the new native WinIO manager. This error is produced with a manually compiled stack, built from the source in this GitHub repo at commit 13ba67a (yesterday). I followed the procedure in the GitHub CI file to compile the executable, namely a release.hs check followed by release.hs build.

General summary/comments (optional)

Steps to reproduce

  1. Install above-mentioned manually compiled stack.
  2. cd into any stack project.
  3. Set environment variable GHCRTS=--io-manager=native.
  4. Run command stack build.
  5. stack reports the following error:
    Encountered error while migrating Pantry database:
        \\.\NUL: hDuplicateTo: illegal operation (handles are incompatible)
    Please report this on https://github.com/commercialhaskell/stack/issues
    As a workaround you may delete Pantry database in C:\sr\pantry\pantry.sqlite3 triggering its recreation.
    
    where C:\sr is my STACK_ROOT.
  6. Remove $STACK_ROOT/pantry/pantry.sqlite3 as instructed, and rerun stack build.
  7. Same error reported.

Note: No error if we unset the GHCRTS environment variable.

Expected

No error.

Actual

Error reported, and the build process did not start.

If you suspect that a Stack command misbehaved, please include the output of
that command in --verbose mode:

Version manual-build, Git revision 13ba67aee29189338b9e6c07cfaeb58fd12ff2d9 RELEASE-CANDIDATE x86_64 hpack-0.35.0
2022-09-04 13:53:02.806765: [debug] Checking for project config at: path\to\stack.yaml
2022-09-04 13:53:02.806765: [debug] Loading project config file stack.yaml
2022-09-04 13:53:02.806765: [error] Encountered error while migrating Pantry database:
    \\.\NUL: hDuplicateTo: illegal operation (handles are incompatible)
Please report this on https://github.com/commercialhaskell/stack/issues
As a workaround you may delete Pantry database in C:\sr\pantry\pantry.sqlite3 triggering its recreation.

Stack version

As explained above, a manual build. Work tree unchanged (therefore using almost the same environment as CI builds).

stack --version
Version manual-build, Git revision 13ba67aee29189338b9e6c07cfaeb58fd12ff2d9 RELEASE-CANDIDATE x86_64 hpack-0.35.0

Method of installation

Other: manual build as explained above. I can provide the executable if needed.

@mpilgrem
Member

mpilgrem commented Sep 5, 2022

Thank you. I am wondering if this is an upstream issue with GHC.

I am using Stack installed with stack --stack-yaml stack-ghc-942.yaml install, and this test project (using snapshot nightly-2022-09-05):

module Main (main) where

import GHC.IO.SubSystem (isWindowsNativeIO)

main :: IO ()
main = print isWindowsNativeIO

It works fine with this package.yaml (extracts) - that is, setting RTS options at compile time:

executables:
  ioTest-exe:
    main:                Main.hs
    source-dirs:         app
    ghc-options:
    - -threaded
    - -rtsopts
    - -with-rtsopts=-N
    - -with-rtsopts=--io-manager=native

It also works fine with this package.yaml (extracts) and stack exec -- ioTest-exe +RTS --io-manager=native -RTS - that is, setting RTS options at the command line.

executables:
  ioTest-exe:
    main:                Main.hs
    source-dirs:         app
    ghc-options:
    - -threaded
    - -rtsopts
    - -with-rtsopts=-N

As you experienced, it is only when setting RTS options using an environment variable $Env:GHCRTS="--io-manager=native" that it fails.
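
To tell the three ways of passing the RTS option apart, the test program can report both the GHCRTS value the process sees and the I/O subsystem the RTS selected. The following is a minimal sketch, assuming a base version that ships GHC.IO.SubSystem; describe is a hypothetical helper and not part of any library.

```haskell
import GHC.IO.SubSystem (IoSubSystem (..), ioSubSystem)
import System.Environment (lookupEnv)

-- | Human-readable name for the active I/O subsystem (hypothetical helper).
describe :: IoSubSystem -> String
describe IoPOSIX  = "posix (MIO)"
describe IoNative = "native (WinIO)"

main :: IO ()
main = do
  -- GHCRTS is read by the RTS at startup; here it is only echoed back.
  rts <- lookupEnv "GHCRTS"
  putStrLn ("GHCRTS        = " ++ maybe "<unset>" show rts)
  putStrLn ("I/O subsystem = " ++ describe ioSubSystem)
```

Running it once with -with-rtsopts, once with +RTS ... -RTS and once with GHCRTS set should show whether the RTS picked up the option in each case. On non-Windows platforms the subsystem is always the POSIX one.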

@mpilgrem
Member

mpilgrem commented Sep 5, 2022

I've raised this at the GHC level, in case it is a GHC bug (see GHC issue #22146).

@ruifengx
Author

ruifengx commented Sep 6, 2022

It works fine with this package.yaml (extracts) - that is, setting RTS options at compile time:

executables:
  ioTest-exe:
    main:                Main.hs
    source-dirs:         app
    ghc-options:
    - -threaded
    - -rtsopts
    - -with-rtsopts=-N
    - -with-rtsopts=--io-manager=native

Sorry, but I am a bit unsure whether this configuration is testing the same crash as I reported. In my case, the crash is never from the test program itself, but from stack (when running stack build, for example).

As you experienced, it is only when setting RTS options using an environment variable $Env:GHCRTS="--io-manager=native" that it fails.

I am confused now, because setting GHCRTS like this and manually running the test program works correctly for me. On the contrary, it is stack itself that crashes upon seeing $env:GHCRTS="--io-manager=native"; specifically, from the error message, I suspect it has something to do with Pantry or the file system API used in Pantry (which probably comes from GHC, so it could indeed still be an upstream bug).

@mpilgrem
Member

mpilgrem commented Sep 6, 2022

@ruifengx, you are, of course, correct. I was not thinking clearly. What I should have been comparing was using Stack with the RTS option at the command line:

stack build +RTS --io-manager=native -RTS

which yields the same error. The exception itself is defined in the pantry package, in module Pantry.Types, where PantryException is made an instance of Display - the relevant constructor is MigrationFailure. I suspect the exception is thrown by Pantry.SQLite.initStorage. I continue to investigate.
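
The shape of the message suggests that an arbitrary inner exception is being rewrapped as a migration failure, which would explain why deleting the database does not help. A minimal sketch of that wrapping pattern, using only base: MigrationFailureSketch, render and wrapMigration are hypothetical stand-ins, not pantry's actual types or functions, and the field layout is an assumption.

```haskell
import Control.Exception

-- | Hypothetical, simplified stand-in for a "migration failure" exception.
data MigrationFailureSketch = MigrationFailureSketch
  String        -- description of the database, e.g. "Pantry"
  FilePath      -- location of the database file
  SomeException -- the underlying exception, whatever its cause

instance Show MigrationFailureSketch where
  show = render

instance Exception MigrationFailureSketch

-- | Render the wrapper in the same general shape as the reported message.
render :: MigrationFailureSketch -> String
render (MigrationFailureSketch desc path inner) = unlines
  [ "Encountered error while migrating " ++ desc ++ " database:"
  , "    " ++ show inner
  , "As a workaround you may delete " ++ desc ++ " database in " ++ path
  ]

-- | Run an action and rewrap any exception it throws. Sketch only: this also
-- catches asynchronous exceptions, which real code would want to avoid.
wrapMigration :: String -> FilePath -> IO a -> IO a
wrapMigration desc path act =
  act `catch` \e -> throwIO (MigrationFailureSketch desc path (e :: SomeException))

main :: IO ()
main = do
  -- The inner failure here is unrelated to the database contents.
  r <- try (wrapMigration "Pantry" "pantry.sqlite3"
              (ioError (userError "handles are incompatible")))
  case r of
    Left e   -> putStr (render e)
    Right () -> putStrLn "migration succeeded"
```

Under this assumption, the wrapper reports every failure inside the wrapped action as a migration problem, including a handle error that has nothing to do with the SQLite file.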

@mpilgrem
Member

mpilgrem commented Sep 6, 2022

I do think this is an upstream issue. See commercialhaskell/pantry#59 and GHC issue #22146.

@ruifengx
Author

According to the latest comment in the linked GHC issue, I do not believe we will get a fix from GHC soon (if ever), so I have just submitted hspec/silently#26 to address this problem. If it is merged there, I hope there can be a new stack patch release that includes this change, so that we can finally get rid of the annoying commitAndReleaseBuffer: invalid argument (invalid character) error by globally adopting the new IO manager (currently, in my own workflow, the only blocker to a global GHCRTS=--io-manager=native is stack).
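
The error text (\\.\NUL: hDuplicateTo) points at code that redirects a standard handle to the null device. Below is a minimal sketch of that redirect pattern, assuming this is roughly what the silencing code does. The names nullDevice and silenced are hypothetical, and this is neither the actual silently implementation nor the proposed patch.

```haskell
import Control.Exception (bracket)
import GHC.IO.Handle (hDuplicate, hDuplicateTo)
import System.Info (os)
import System.IO

-- | Path of the null device (assumption: "mingw32" identifies Windows).
nullDevice :: FilePath
nullDevice = if os == "mingw32" then "\\\\.\\NUL" else "/dev/null"

-- | Run an action while the given handle is redirected to the null device,
-- restoring the original target afterwards.
silenced :: Handle -> IO a -> IO a
silenced h action =
  bracket (openFile nullDevice AppendMode) hClose $ \nul ->
    bracket (redirect nul) restore (const action)
 where
  redirect nul = do
    hFlush h                -- do not lose output written before the redirect
    old <- hDuplicate h     -- keep a copy of the original target
    hDuplicateTo nul h      -- the call named in the reported error
    pure old
  restore old = do
    hDuplicateTo old h      -- point the handle back at its original target
    hClose old

main :: IO ()
main = do
  silenced stdout (putStrLn "this line is discarded")
  putStrLn "stdout restored"
```

If the assumption holds, running a program like this on Windows with and without +RTS --io-manager=native -RTS would be a smaller reproduction than a full stack build.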

@ruifengx
Author

ruifengx commented Sep 23, 2022

So I just built stack with my patched silently, by cloning v2.9.1 and adding the following to stack.yaml:

extra-deps:
- git: https://github.com/ruifengx/silently.git
  commit: 0718fc3d1f2b31b66757f51f7c8dbbc0d85179b7

Now (with --io-manager=native) it does not crash any more, but instead it gets stuck waiting for the output of ghc-pkg. I suspect this is also an upstream issue, likely related to how exec works in the new I/O manager. I remember reading an issue mentioning that exec on Windows is inherently broken: the implementation cannot execute the new program in the same process (that is impossible on Windows), so it simply kills the current process and starts a new one. GHC therefore uses a best-effort heuristic to work around this problem, but I guess the heuristic just fails here.


EDIT: here is part of the output with --verbose:

2022-09-23 11:56:57.464002: [debug] Loaded compiler information from cache
2022-09-23 11:56:57.464974: [debug] Asking for a supported GHC version
2022-09-23 11:56:57.464974: [debug] Resolving package entries
2022-09-23 11:56:57.464974: [debug] Parsing the targets
2022-09-23 11:56:57.467979: [debug] Checking flags
2022-09-23 11:56:57.468980: [debug] SourceMap constructed
2022-09-23 11:56:57.474987: [debug] Starting to execute command inside EnvConfig
2022-09-23 11:56:57.476979: [debug] Finding out which packages are already installed
2022-09-23 11:56:57.477974: [debug] Run process: C:\Users\krant\AppData\Local\Programs\stack\x86_64-windows\ghc-9.2.4\bin\ghc-pkg-9.2.4.exe --global --no-user-package-db dump --expand-pkgroot

It hangs indefinitely after this, waiting for ghc-pkg to exit. (Task manager shows ghc-pkg does not exit and waits for nothing.)

@mpilgrem
Member

@ruifengx, thanks for investigating further. To help me follow along, when you refer to exec, are you referring to the rio package's RIO.Process.exec?

@ruifengx
Author

No, actually I am not particularly familiar with RIO. I was referring to the simulated POSIX interface exec(2). I recall seeing a long discussion in the GitLab GHC repo, but in the past 20 minutes or so my searching failed to locate that specific post. Anyway, exec is only my guess; I mentioned it only in case it would ring a bell for you or some GHC developers.

I did some digging into the stack codebase, and here is what I found (I hope it is helpful):

  • Searching the command-line parameters, we see it is produced in Stack.PackageDump.ghcPkgCmdArgs
  • It calls Stack.Prelude.sinkProcessStderrStdout
  • Which in turn calls RIO.Process.proc and RIO.Process.withProcessWait_

I highly suspect the problem is rooted within RIO.Process.withProcessWait_.
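
One way to check whether the hang is specific to RIO.Process would be to run the same ghc-pkg command through the plain process package with a timeout. A minimal sketch under stated assumptions: ghc-pkg is on the PATH (the log above uses a full path to ghc-pkg-9.2.4.exe), the 30-second limit is arbitrary, and runWithTimeout and ghcPkgDumpArgs are hypothetical names.

```haskell
import System.Exit (ExitCode)
import System.Process (readProcessWithExitCode)
import System.Timeout (timeout)

-- | Arguments taken from the "Run process" line in the verbose log.
ghcPkgDumpArgs :: [String]
ghcPkgDumpArgs = ["--global", "--no-user-package-db", "dump", "--expand-pkgroot"]

-- | Run a command with empty stdin, giving up after the given number of seconds.
-- Returns Nothing if the command did not finish in time.
runWithTimeout :: Int -> FilePath -> [String] -> IO (Maybe (ExitCode, String, String))
runWithTimeout seconds exe args =
  timeout (seconds * 1000000) (readProcessWithExitCode exe args "")

main :: IO ()
main = do
  result <- runWithTimeout 30 "ghc-pkg" ghcPkgDumpArgs
  case result of
    Nothing ->
      putStrLn "ghc-pkg did not exit within 30 seconds (possible hang)"
    Just (code, out, _err) ->
      putStrLn ("ghc-pkg exited with " ++ show code
                ++ " after " ++ show (length out) ++ " characters of output")
```

If this also hangs under +RTS --io-manager=native -RTS, the problem is likely below RIO.Process, in the process package or the RTS. Whether the timeout can interrupt the wait may depend on building with -threaded.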

mpilgrem added a commit that referenced this issue Nov 19, 2022
Also does not suggest an inappropriate workaround if the cause is the upstream bug discussed at #5851.

Also updates the error documentation, generally.

Also conforms the Haddock documentation of pretty exceptions in various modules.

Also refactors some import lists, in passing, with a longer term view to a more consistent approach.