-
-
Notifications
You must be signed in to change notification settings - Fork 388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Characterize eval plugin race leading to GetLinkable errors #4185
Conversation
I am wondering under what circumstances we are using |
No, it's only used in one place, in this bracket_: haskell-language-server/plugins/hls-eval-plugin/src/Ide/Plugin/Eval/CodeLens.hs Lines 213 to 218 in 9593d04
|
@jhrcek I don't quite understand why we have the above error if we have writeCoreFileIfNeeded :: ShakeExtras -> HscEnv -> Maybe LinkableType -> Action (IdeResult ModGuts) -> TcModuleResult -> Action (IdeResult HiFileResult)
writeCoreFileIfNeeded _ hsc Nothing _ tmr = do
incrementRebuildCount
res <- liftIO $ mkHiFileResultNoCompile hsc tmr
pure ([], Just $! res)
....
getLinkableRule :: Recorder (WithPriority Log) -> Rules ()
getLinkableRule recorder =
defineEarlyCutoff (cmapWithPrio LogShake recorder) $ Rule $ \GetLinkable f -> do
ModSummaryResult{msrModSummary = ms} <- use_ GetModSummary f
HiFileResult{hirModIface, hirModDetails, hirCoreFp} <- use_ GetModIface f
let obj_file = ml_obj_file (ms_location ms)
core_file = ml_core_file (ms_location ms)
case hirCoreFp of
Nothing -> error $ "called GetLinkable for a file without a linkable: " ++ show f
...
getModIfaceRule :: Recorder (WithPriority Log) -> Rules ()
getModIfaceRule recorder = defineEarlyCutoff (cmapWithPrio LogShake recorder) $ Rule $ \GetModIface f -> do
fileOfInterest <- use_ IsFileOfInterest f
res <- case fileOfInterest of
IsFOI status -> do
-- Never load from disk for files of interest
tmr <- use_ TypeCheck f
linkableType <- getLinkableType f
hsc <- hscEnv <$> use_ GhcSessionDeps f
let compile = fmap ([],) $ use GenerateCore f
se <- getShakeExtras
(diags, !mbHiFile) <- writeCoreFileIfNeeded se hsc linkableType compile tmr
let fp = hiFileFingerPrint <$> mbHiFile
hiDiags <- case mbHiFile of
Just hiFile
| OnDisk <- status
, not (tmrDeferredError tmr) -> liftIO $ writeHiFile se hsc hiFile
_ -> pure []
return (fp, (diags++hiDiags, mbHiFile))
.... |
I'm not sure I follow your reasoning. I don't understand what you mean by
I think the problem is that when you call
|
This seems to be a different error than the one in #4093 (comment) 'called GetLinkable for a file without a linkable' versus 'called GetLinkable for a file which doesn't need compilation: ' |
Ah, you're right. We end up in the first error branch because But I still don't get why you think we should end up calling |
Sorry that I am not stating it clear. |
But as you can see in the screenshot of failing test case linkableType=Nothing (printed by this trace) |
Right, It confused me :( |
@soulomoon explanation in PR description corrected based on your pointers. |
I still don't understand why isEvaluating returns false after we queue a file for evaluation, because |
I didn't check this deeply yet, so this is just me theorizing now. But if you invalidate IsEvaluating, the NeedsCompilation rule might already be in flight (e.g in the True if branch here) and might have already decided not to (re)generate core file - I think this is what we see in the failed test screenshot. I mean the race is that (1) and (2) are swapped sometimes.
|
I believe that any on going compile should have been cancelled by the restart and the NeedsCompilation rule should be re-evaluated. |
@@ -56,13 +56,13 @@ instance IsIdeGlobal EvaluatingVar | |||
queueForEvaluation :: IdeState -> NormalizedFilePath -> IO () | |||
queueForEvaluation ide nfp = do | |||
EvaluatingVar var <- getIdeGlobalState ide | |||
atomicModifyIORef' var (\fs -> (Set.insert nfp fs, ())) | |||
atomicModifyIORef' var (\fs -> (trace ("TRACE: queueForEvaluation: " <> show nfp ) $ Set.insert nfp fs, ())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is odd that we are still seeing IsEvaluating: False, inbetween queueForEvaluation
and unqueueForEvaluation
Maybe we can add the trace to runEvalCmd
to make sure the restart session take effect.
(do queueForEvaluation st nfp
setSomethingModified VFSUnmodified st [toKey IsEvaluating nfp] "Eval"
(trace ("TRACE: queueForEvaluation done")
Some reproduction in
It is odd that we are not seeing the recomputation of |
Indeed, the "Timed out waiting to receive a message from the server." is the standard way this test fails. |
Right, I accidentally added a line break after After queueForEvaluation, I know why, we need to add trace right after the old session stop but before setting up a new one. Indeed even after the session restart, we have
|
I see we're already logging shake session restarts, so here's the screenshot including those restarts within the failing tests in the context of other messages. haskell-language-server/plugins/hls-eval-plugin/src/Ide/Plugin/Eval/CodeLens.hs Lines 213 to 218 in 9b0699d
As you can see there doesn't seem to be any retriggering of eval stuff within the two restarts - just two messages that look like finishing rules that started before triggering session restart. The restart works a bit differently than I imagined in the sense that the "currently running build actions" are simply cancelled (unless they are enqueued in which case they are requeued for later execution). haskell-language-server/ghcide/src/Development/IDE/Core/Shake.hs Lines 759 to 761 in 9b0699d
|
We should be seeing the
and |
Here is the full log of a single failing test run if you want to verify some things about it: https://gist.github.com/jhrcek/74fbd87d3d34190aef2ce0de4e99801f |
I am giving 500 runs on it now. |
I got a test failure with your PR branch after 95 iterations. |
But thanx for the test logs.
Some new finding, we are not seeing
This might be deleted right before the call to restartShakeSession in |
For the above change, after doing 500 runs, the error does not appear locally. i think marking the keys as dirty while the session still running in general would have this race condition. In broader scale, we maintain dirtiness both in shakeExtra and the shakeDB, and race conditions grows from the gap. I am thinking about refactor this part of the code to prevent such problem once and for all. |
e426e76
to
6216417
Compare
Updated based on lateset findings from @soulomoon |
…nd rule values [flaky test #4185 #4093] (#4190) The main problem is the out of sync state in the build system. Several states involved, the shake database running result state for key. shake extra's dirty state for key and shake extra's rule values state. To stablize the build system. This PR force some of the updates of these state into a single STM block. 1. collect dirtykeys and ship it to session restart directly. No more invalid removal before session restart. Fixing Flaky test failure result in error of GetLinkable #4093 2. move the dirtykey removal and rules values updating to hls-graph by adding a call back fucntion to RunResult. Properly handle the dirtykeys and rule value state after session start and closely followed by another session restart Fixing ghcide-tests' addDependentFile test #4194 3. properly handle clean up by wrapping the asyncWithCleanUp to refreshDeps --------- Co-authored-by: wz1000 <zubin.duggal@gmail.com> Co-authored-by: Jan Hrcek <2716069+jhrcek@users.noreply.github.com> Co-authored-by: Michael Peyton Jones <me@michaelpj.com>
No longer needed as the issue has been fixed in #4190. |
Here's the full characterization of the race that leads to occasional GetLinkable errors that are still reproducible on master (see steps to reproduce)
Here's my best attempt to describe the race condition succinctly:
So the question is why does getLinkableType sometimes return Nothing (instead of
Just BCOLinkable
which would be required for writing core file)?NeedsCompilation
rule - so it returnslinkableType == Just BCOLinkable
when the file is evaluatingIsEvaluating
rule to decide if given file is evaluating. This rule consults a global variable that stores "set of files being evaluated"EDIT: @soulomoon pointed out mistake in my explanation - the error we see in the tests is a different than the original explanation claimed. Explanation corrected slightly, but it doesn't change what the actual race condition is.
Here is the corrected explanation in a video form walking through the code
Explanation.webm
The race condition is that IsEvaluating returns False (and thus linkableType ends up as Nothing, so .core file is not generated) in case that queueForEvaluation is called AFTER the isEvaluating check is done or if unqueueForEvaluation is called BEFORE isEvaluating check is done.
Here are 2 examples of one successful test and one failing test.


The thing that makes the difference is whether IsEvaluating rule is triggered WHILE given file is in the "set of files being evaluated".