Skip to content

Integrate GHC's new diagnostic infrastructure #2014

Open
@adinapoli

Description

@adinapoli

Hello folks,

This is a meta ticket, feel free to close it if you think this is not the appropriate place to discuss the subject.

Summary: GHC changed how diagnostics are emitted, HLS can benefit from this, help needed to integrate the work and feedback welcome to improve GHC to best serve HLS.

Well-Typed has been working together with @goldfirere for the past few months on what used to be GHC proposal 306, i.e. refactor GHC so that it would emit richer Haskell types instead of SDocs for its diagnostic messages. Since the original GHC ticket (cfr. ghc#18516) and the inception of our work, we tried to do our best to keep in mind HLS and Haskell IDEs in general as the primary consumers/beneficiaries of this work. The Wiki page goes a bit deeper into the original design work, if people are interested.

I think we are now at a stage where it would be nice to start thinking how we can collectively integrate this into HLS and what we can learn from this exercise so that we can improve what we have already in GHC before 9.4 (that would be nice but not strictly required, I guess).

30 seconds overview

To give you a 30 seconds overview of GHC's HEAD, we now have a typeclass:

class Diagnostic a where
  diagnosticMessage :: a -> DecoratedSDoc
  diagnosticReason  :: a -> DiagnosticReason
  diagnosticHints   :: a -> [GhcHint]

And a bunch of types that implement instances for it, which forms a hierarchy:

data GhcMessage where
  -- | A message from the parsing phase.
  GhcPsMessage      :: PsMessage -> GhcMessage
  -- | A message from typecheck/renaming phase.
  GhcTcRnMessage    :: TcRnMessage -> GhcMessage
  -- | A message from the desugaring (HsToCore) phase.
  GhcDsMessage      :: DsMessage -> GhcMessage
  -- | A message from the driver.
  GhcDriverMessage  :: DriverMessage -> GhcMessage
  -- | An \"escape\" hatch which can be used when we don't know the source of
  -- the message or if the message is not one of the typed ones. 
  GhcUnknownMessage :: forall a. (Diagnostic a, Typeable a) => a -> GhcMessage

GHC now emits these types (each individually wrapped in a MsgEnvelope which gives you Severity and a SrcSpan amongst other things) which can be "queried" to get not just the pretty-printed message but also the reason why it arose (it will include aWarningFlag if this is a warning) and a list of GhcHint. Furthermore, both these types and GhcHint will include proper information about the diagnostic: for example, if this is a message coming from the type checker about unused pattern binds, GHC will emit a TcRnUnusedPatternBinds :: HsBind GhcRn -> TcRnMessage that will give the caller the actual binds, not just a list of SDoc. The GhcHint type follows a similar logic, for example GHC might emit [SuggestExtension LangExt.BangPatterns].

Little disclaimer

Currently not all diagnostics have been ported within GHC (due to their sheer amount). For some of those, we still have the old SDoc wrapped into a constructor (specific to the type of message we want to emit), like, for example a TcRnUnknownMessage :: SDoc -> TcRnMessage. Similarly, not all diagnostics have a proper list of hints attached to them, for some not-yet-ported or not-fully-ported, we still have the hints bundled with the pretty-printed message (i.e. the diagnosticMessage). We are actively working on this.

Where to go from here

The idea would be for HLS to use this information directly instead of parsing and using regexes on the raw output of the compiler. In a GHC meta-ticket about the "remaining work" (see "Little Disclaimer" section) @wz1000 and @bgamari suggested to look at the error parsing logic in ghcide to see if we can get rid of all those "matchRegex" calls and just use the new infrastructure.

I wanted to have some fun taking a crack at it (as a proof-of-concept, mostly), but it revealed to be harder than I was expecting, mostly due to the perfect storm arising from the changes between GHC 9.2 and 9.3/HEAD (including Alan's work on exactprint) and the changes to Data.List and its import which broke virtually all the packages I have tried to build. Despite I have used the patched versions of @fendor I have found online, having a working version of ghcide built with HEAD was too hard and I have abandoned ship, but I am happy to be told I was doing it all wrong and there is a branch that magically compiles :)

Having said that, I've started taking a cursory look at the CodeAction.hs module, and the first thing that stood out was that currently ghcide is running regexes on the _message field on the Diagnostic type, which comes from lsp:

data Diagnostic =
  Diagnostic
    { _range              :: Range
    , _severity           :: Maybe DiagnosticSeverity
    , _code               :: Maybe (Int |? Text)
    , _source             :: Maybe DiagnosticSource
    , _message            :: Text
    , _tags               :: Maybe (List DiagnosticTag)
    , _relatedInformation :: Maybe (List DiagnosticRelatedInformation)
    } deriving (Show, Read, Eq, Ord, Generic)

Here _message is Text whereas we would need to be given something more structured. @wz1000 pointed out (in the same GHC 19905 issue) that we can change this, but rather than keeping this conversation tucked into the GHC issue tracker, I wanted to share this on this repo to collect feedback and useful pointers on how we could push the work forward. Again, from a super quick look at ghcide codebase it seems like the IdeResult type could be changed to return not just a Diagnostic but a GhcMessage, which we would get (almost) from free from GHC's tcRnModule (in HEAD, that is):

tcRnModule :: HscEnv
           -> ModSummary
           -> Bool
           -> HsParsedModule
           -> IO (Messages TcRnMessage, Maybe TcGblEnv) -- We would need to `fmap` GhcTcRnMessage over this.

Therefore, concretely, I guess what I am asking is the following:

  1. What would be the least painful way to try out whether GHC is solving HLS/ghcide's actual problems? Is
    there some magical fork targeting GHC HEAD somewhere?

  2. What would be the required steps to feed ghcide with structured types instead of a textual _message?

  3. Is there any other place apart form the CodeAction.hs we should focus or lens on to understand if the GHC work is going into the right direction?

Thanks in advance! 😉

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions