-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Improve error recovery for unclosed use
tree lists.
#4680
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Previously, given an unclosed `use name::{…}` would result in the entire rest of the file being parsed as part of the `use_tree_list`, generating errors on virtually every token in the file. This spew of errors is unhelpful, can slow down editors, and in the case of emacs lsp-mode + flycheck (my environment), will cause the editor to stop displaying errors entirely under the assumption that the checker must be broken. We solve this by bailing out of the `use_tree_list` as soon as we encounter a token that isn't a `,` or `}` between or after items. This has the downside that we now do a slightly worse job of recovering from syntax errors _inside_ of a balanced `use_tree_list`, for instance, in `use std::{thread, , mem}`. Previously, we would report one syntax error and then correct emit a `use_tree` for `mem` and close the list; now we'll bail early and emit errors until the semicolon, at which point we resync. This represents a fundamental tradeoff (as best I can tell) in doing error-recover without lookahead; when we encounter that first parse error, we have to either guess that we should continue the `use_tree_list`, or that we should bail. I believe that bailing is superior because it avoids the unbounded desync, and because it vastly improves the common case of adding a new `use` clause from scratch near the top of an existing file. We could potentially improve the other case slightly by discarding tokens until we find something in `ITEM_RECOVERY_SET`, which would reduce the number of errors we spew, but that didn't seem to be a pre-existing idiom in the code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some inline notes on the diff. None of them seemed appropriate for comments in the code, but I'm happy to add them if you think they'd be useful there!
/// Note that this is called both by `use_item` and `use_tree_list`, | ||
/// so handles both `some::path::{inner::path}` and `inner::path` in | ||
/// `use some::path::{inner::path};` | ||
fn use_tree(p: &mut Parser, top_level: bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This substantially reverts #1866; The new behavior also fixes the infinite loop and results in a better error-recovery on the provided test case.
"n next(", | ||
9, | ||
); | ||
do_check(r"use a::b::{foo,<|>,bar<|>};", "baz", 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is no longer eligible for block-based reparsing because the new error-recovery behavior no longe results in a single parse tree covering the {…}
block.
ERROR@35..36 | ||
SEMICOLON@35..36 ";" | ||
SEMICOLON@22..23 ";" | ||
WHITESPACE@23..24 "\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of the new behavior being superior on a pre-existing test case; we emit many fewer spurious errors, and manage to recover in time to capture the use
tree on line 2 properly.
@@ -0,0 +1,3 @@ | |||
use std::{thread:, mem}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case demonstrates the case in which the new behavior is slightly worse and emits more spurious errors, and drops the std::mem
use
.
@@ -0,0 +1 @@ | |||
use ra_syntax::{TextSize, TextRange,}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While testing this, I accidentally broke the behavior of a use
tree with a trailing comma. The bug was caught in the IDE tests, but there was no pre-exising parser test so I added one.
Sadly, I still don't have a lot of time to look into this, but let me quickly write down some notes, without looking into the code.
So, to unblock progress here we need:
|
Urgs, sorry for still not getting back to this. This is super-important, but sadly I don't seem able to carve out the time to cleanup the parser. I'll close this PR, but I definetelly won't forget about it, and come back once I start with fixing the paresr :( |
Previously, given an unclosed
use name::{…}
would result in theentire rest of the file being parsed as part of the
use_tree_list
,generating errors on virtually every token in the file. This spew of
errors is unhelpful, can slow down editors, and in the case of emacs
lsp-mode + flycheck (my environment), will cause the editor to stop
displaying errors entirely under the assumption that the checker must
be broken.
We solve this by bailing out of the
use_tree_list
as soon as weencounter a token that isn't a
,
or}
between or after items.This has the downside that we now do a slightly worse job of
recovering from syntax errors inside of a balanced
use_tree_list
,for instance, in
use std::{thread, , mem}
. Previously, we wouldreport one syntax error and then correct emit a
use_tree
formem
and close the list; now we'll bail early and emit errors until the
semicolon, at which point we resync.
This represents a fundamental tradeoff (as best I can tell) in doing
error-recover without lookahead; when we encounter that first parse
error, we have to either guess that we should continue the
use_tree_list
, or that we should bail. I believe that bailing issuperior because it avoids the unbounded desync, and because it vastly
improves the common case of adding a new
use
clause from scratchnear the top of an existing file.
We could potentially improve the other case slightly by discarding
tokens until we find something in
ITEM_RECOVERY_SET
, which wouldreduce the number of errors we spew, but that didn't seem to be a
pre-existing idiom in the code base.