Description
Hello folks,
This is a meta ticket, feel free to close it if you think this is not the appropriate place to discuss the subject.
Summary: GHC changed how diagnostics are emitted, HLS can benefit from this, help needed to integrate the work and feedback welcome to improve GHC to best serve HLS.
Well-Typed has been working together with @goldfirere for the past few months on what used to be GHC proposal 306, i.e. refactor GHC so that it would emit richer Haskell types instead of SDoc
s for its diagnostic messages. Since the original GHC ticket (cfr. ghc#18516) and the inception of our work, we tried to do our best to keep in mind HLS and Haskell IDEs in general as the primary consumers/beneficiaries of this work. The Wiki page goes a bit deeper into the original design work, if people are interested.
I think we are now at a stage where it would be nice to start thinking how we can collectively integrate this into HLS and what we can learn from this exercise so that we can improve what we have already in GHC before 9.4 (that would be nice but not strictly required, I guess).
30 seconds overview
To give you a 30 seconds overview of GHC's HEAD
, we now have a typeclass:
class Diagnostic a where
diagnosticMessage :: a -> DecoratedSDoc
diagnosticReason :: a -> DiagnosticReason
diagnosticHints :: a -> [GhcHint]
And a bunch of types that implement instances for it, which forms a hierarchy:
data GhcMessage where
-- | A message from the parsing phase.
GhcPsMessage :: PsMessage -> GhcMessage
-- | A message from typecheck/renaming phase.
GhcTcRnMessage :: TcRnMessage -> GhcMessage
-- | A message from the desugaring (HsToCore) phase.
GhcDsMessage :: DsMessage -> GhcMessage
-- | A message from the driver.
GhcDriverMessage :: DriverMessage -> GhcMessage
-- | An \"escape\" hatch which can be used when we don't know the source of
-- the message or if the message is not one of the typed ones.
GhcUnknownMessage :: forall a. (Diagnostic a, Typeable a) => a -> GhcMessage
GHC now emits these types (each individually wrapped in a MsgEnvelope
which gives you Severity
and a SrcSpan
amongst other things) which can be "queried" to get not just the pretty-printed message but also the reason why it arose (it will include aWarningFlag
if this is a warning) and a list of GhcHint
. Furthermore, both these types and GhcHint
will include proper information about the diagnostic: for example, if this is a message coming from the type checker about unused pattern binds, GHC will emit a TcRnUnusedPatternBinds :: HsBind GhcRn -> TcRnMessage
that will give the caller the actual binds, not just a list of SDoc
. The GhcHint
type follows a similar logic, for example GHC might emit [SuggestExtension LangExt.BangPatterns]
.
Little disclaimer
Currently not all diagnostics have been ported within GHC (due to their sheer amount). For some of those, we still have the old SDoc
wrapped into a constructor (specific to the type of message we want to emit), like, for example a TcRnUnknownMessage :: SDoc -> TcRnMessage
. Similarly, not all diagnostics have a proper list of hints attached to them, for some not-yet-ported or not-fully-ported, we still have the hints bundled with the pretty-printed message (i.e. the diagnosticMessage
). We are actively working on this.
Where to go from here
The idea would be for HLS to use this information directly instead of parsing and using regexes on the raw output of the compiler. In a GHC meta-ticket about the "remaining work" (see "Little Disclaimer" section) @wz1000 and @bgamari suggested to look at the error parsing logic in ghcide
to see if we can get rid of all those "matchRegex" calls and just use the new infrastructure.
I wanted to have some fun taking a crack at it (as a proof-of-concept, mostly), but it revealed to be harder than I was expecting, mostly due to the perfect storm arising from the changes between GHC 9.2 and 9.3/HEAD (including Alan's work on exactprint) and the changes to Data.List
and its import which broke virtually all the packages I have tried to build. Despite I have used the patched versions of @fendor I have found online, having a working version of ghcide
built with HEAD
was too hard and I have abandoned ship, but I am happy to be told I was doing it all wrong and there is a branch that magically compiles :)
Having said that, I've started taking a cursory look at the CodeAction.hs
module, and the first thing that stood out was that currently ghcide
is running regexes on the _message
field on the Diagnostic
type, which comes from lsp
:
data Diagnostic =
Diagnostic
{ _range :: Range
, _severity :: Maybe DiagnosticSeverity
, _code :: Maybe (Int |? Text)
, _source :: Maybe DiagnosticSource
, _message :: Text
, _tags :: Maybe (List DiagnosticTag)
, _relatedInformation :: Maybe (List DiagnosticRelatedInformation)
} deriving (Show, Read, Eq, Ord, Generic)
Here _message
is Text
whereas we would need to be given something more structured. @wz1000 pointed out (in the same GHC 19905 issue) that we can change this, but rather than keeping this conversation tucked into the GHC issue tracker, I wanted to share this on this repo to collect feedback and useful pointers on how we could push the work forward. Again, from a super quick look at ghcide
codebase it seems like the IdeResult
type could be changed to return not just a Diagnostic
but a GhcMessage
, which we would get (almost) from free from GHC's tcRnModule
(in HEAD
, that is):
tcRnModule :: HscEnv
-> ModSummary
-> Bool
-> HsParsedModule
-> IO (Messages TcRnMessage, Maybe TcGblEnv) -- We would need to `fmap` GhcTcRnMessage over this.
Therefore, concretely, I guess what I am asking is the following:
-
What would be the least painful way to try out whether GHC is solving HLS/ghcide's actual problems? Is
there some magical fork targeting GHCHEAD
somewhere? -
What would be the required steps to feed
ghcide
with structured types instead of a textual_message
? -
Is there any other place apart form the
CodeAction.hs
we should focus or lens on to understand if the GHC work is going into the right direction?
Thanks in advance! 😉