Skip to content

Conversation

nelhage
Copy link
Contributor

@nelhage nelhage commented Jun 1, 2020

Previously, given an unclosed use name::{…} would result in the
entire rest of the file being parsed as part of the use_tree_list,
generating errors on virtually every token in the file. This spew of
errors is unhelpful, can slow down editors, and in the case of emacs
lsp-mode + flycheck (my environment), will cause the editor to stop
displaying errors entirely under the assumption that the checker must
be broken.

We solve this by bailing out of the use_tree_list as soon as we
encounter a token that isn't a , or } between or after items.

This has the downside that we now do a slightly worse job of
recovering from syntax errors inside of a balanced use_tree_list,
for instance, in use std::{thread, , mem}. Previously, we would
report one syntax error and then correct emit a use_tree for mem
and close the list; now we'll bail early and emit errors until the
semicolon, at which point we resync.

This represents a fundamental tradeoff (as best I can tell) in doing
error-recover without lookahead; when we encounter that first parse
error, we have to either guess that we should continue the
use_tree_list, or that we should bail. I believe that bailing is
superior because it avoids the unbounded desync, and because it vastly
improves the common case of adding a new use clause from scratch
near the top of an existing file.

We could potentially improve the other case slightly by discarding
tokens until we find something in ITEM_RECOVERY_SET, which would
reduce the number of errors we spew, but that didn't seem to be a
pre-existing idiom in the code base.

Previously, given an unclosed `use name::{…}` would result in the
entire rest of the file being parsed as part of the `use_tree_list`,
generating errors on virtually every token in the file. This spew of
errors is unhelpful, can slow down editors, and in the case of emacs
lsp-mode + flycheck (my environment), will cause the editor to stop
displaying errors entirely under the assumption that the checker must
be broken.

We solve this by bailing out of the `use_tree_list` as soon as we
encounter a token that isn't a `,` or `}` between or after items.

This has the downside that we now do a slightly worse job of
recovering from syntax errors _inside_ of a balanced `use_tree_list`,
for instance, in `use std::{thread, , mem}`. Previously, we would
report one syntax error and then correct emit a `use_tree` for `mem`
and close the list; now we'll bail early and emit errors until the
semicolon, at which point we resync.

This represents a fundamental tradeoff (as best I can tell) in doing
error-recover without lookahead; when we encounter that first parse
error, we have to either guess that we should continue the
`use_tree_list`, or that we should bail. I believe that bailing is
superior because it avoids the unbounded desync, and because it vastly
improves the common case of adding a new `use` clause from scratch
near the top of an existing file.

We could potentially improve the other case slightly by discarding
tokens until we find something in `ITEM_RECOVERY_SET`, which would
reduce the number of errors we spew, but that didn't seem to be a
pre-existing idiom in the code base.
Copy link
Contributor Author

@nelhage nelhage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some inline notes on the diff. None of them seemed appropriate for comments in the code, but I'm happy to add them if you think they'd be useful there!

/// Note that this is called both by `use_item` and `use_tree_list`,
/// so handles both `some::path::{inner::path}` and `inner::path` in
/// `use some::path::{inner::path};`
fn use_tree(p: &mut Parser, top_level: bool) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This substantially reverts #1866; The new behavior also fixes the infinite loop and results in a better error-recovery on the provided test case.

"n next(",
9,
);
do_check(r"use a::b::{foo,<|>,bar<|>};", "baz", 10);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is no longer eligible for block-based reparsing because the new error-recovery behavior no longe results in a single parse tree covering the {…} block.

ERROR@35..36
SEMICOLON@35..36 ";"
SEMICOLON@22..23 ";"
WHITESPACE@23..24 "\n"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example of the new behavior being superior on a pre-existing test case; we emit many fewer spurious errors, and manage to recover in time to capture the use tree on line 2 properly.

@@ -0,0 +1,3 @@
use std::{thread:, mem};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case demonstrates the case in which the new behavior is slightly worse and emits more spurious errors, and drops the std::mem use.

@@ -0,0 +1 @@
use ra_syntax::{TextSize, TextRange,};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While testing this, I accidentally broke the behavior of a use tree with a trailing comma. The bug was caught in the IDE tests, but there was no pre-exising parser test so I added one.

@matklad
Copy link
Contributor

matklad commented Jun 9, 2020

Sadly, I still don't have a lot of time to look into this, but let me quickly write down some notes, without looking into the code.

  • we try to maintain an invariant that for any parse tree {}pairs form the right parse sequence. That means that { and the corrsponding } always are first and last child of the same parent.
  • the reason we want this invariant is to enable simple heuristic for incremetal reparsing -- as long as the user typed anything which has {} balanced, it's always safe to reparse only the parent {} block
  • we however have troubles with enforcing this invariant -- we have another pair of delimiters, which is even more strong than {} -- invisible parenthesis (L_DOLLAR & R_DOLLAR) from macro expansion. We def hit bugs where dollars and {} conflicted with each other, but I don't remember details now.
  • I am actually not sure that, for incremental reparsing, the said invariant is required/sufficient

So, to unblock progress here we need:

  • to figure out what invariants we actually want
  • document them
  • add required asserts for checking them
  • ideally, add fuzzy/property tests for them

@matklad
Copy link
Contributor

matklad commented Aug 30, 2021

Urgs, sorry for still not getting back to this. This is super-important, but sadly I don't seem able to carve out the time to cleanup the parser. I'll close this PR, but I definetelly won't forget about it, and come back once I start with fixing the paresr :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants