Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve leading vert when pretty-printing patterns #76188

Closed
wants to merge 1 commit into from

Conversation

Aaron1011
Copy link
Member

Fixes #76182

Previously, we did not preserve the precense of a leading vert ('|')
when parsing a pattern. This lead to an instance of #43081 when invoking
a proc-macro, since the pretty-printed tokens would be missing the '|'
present in the captured tokens.

@rust-highfive
Copy link
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 1, 2020
@@ -550,6 +550,8 @@ pub struct Pat {
pub id: NodeId,
pub kind: PatKind,
pub span: Span,
/// Whether or not this pattern starts with a leading `|`
pub leading_vert: bool,
Copy link
Member

@matklad matklad Sep 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is vert terminology used elsewhere? Looking at the two TokenKinds (one in ast, one in rustc_lexer), the name for the thing is Or, so maybe stick with leading_or?

OTOH, I don't really enjoy and / or names for & and | symbols, so we might want to rename those (obviously, in a separate PR). In rust-analyzer, those are called amp and pipe, but vert or vbar also would work for | I think (although i do find vert specifically confusing a bit).

EDIT: to clarify on the naming thing: I don't like and / or because they name operations that the symbols somtimes represent, and not the symbols themselves. In contrast, ampersand & vertical bar are the names of the glyphs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vert terminology is used in parse_top_pat, which is why I picked it.

@matklad
Copy link
Member

matklad commented Sep 1, 2020

I wonder if this should be

r? @petrochenkov

?

Fixes rust-lang#76182

Previously, we did not preserve the precense of a leading vert ('|')
when parsing a pattern. This lead to an instance of rust-lang#43081 when invoking
a proc-macro, since the pretty-printed tokens would be missing the '|'
present in the captured tokens.
@petrochenkov
Copy link
Contributor

Perhaps it's possible to fixup this in tokenstream_probably_equal_for_proc_macro instead?
That's what is done for all other trailing separators, and that's a less invasive fix short term. (And pretty-printing / re-parsing won't hopefully be necessary long term.)

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 1, 2020
@Aaron1011
Copy link
Member Author

Aaron1011 commented Sep 1, 2020

Perhaps it's possible to fixup this in tokenstream_probably_equal_for_proc_macro instead?

Unfortunately, this is currently done by completely ignoring those tokens, regardless of the context.

// When checking for `probably_eq`, we ignore certain tokens that aren't
// preserved in the AST. Because they are not preserved, the pretty
// printer arbitrarily adds or removes them when printing as token
// streams, making a comparison between a token stream generated from an
// AST and a token stream which was parsed into an AST more reliable.
fn semantic_tree(tree: &TokenTree) -> bool {
if let TokenTree::Token(token) = tree {
if let
// The pretty printer tends to add trailing commas to
// everything, and in particular, after struct fields.
| token::Comma
// The pretty printer emits `NoDelim` as whitespace.
| token::OpenDelim(DelimToken::NoDelim)
| token::CloseDelim(DelimToken::NoDelim)
// The pretty printer collapses many semicolons into one.
| token::Semi
// The pretty printer collapses whitespace arbitrarily and can
// introduce whitespace from `NoDelim`.
| token::Whitespace
// The pretty printer can turn `$crate` into `::crate_name`
| token::ModSep = token.kind {
return false;
}
}
true

While I don't think there's any special danger of losing | tokens during AST manipulation/token capturing, I'm worried about growing the list of tokens that are completely ignored when checking for 'probable equality'. Ideally, we would just ignore a leading | in a pattern, but I don't think there's a simple way to distinguish something like | pat_binding from int_val1 | int_val2

@Aaron1011
Copy link
Member Author

Aaron1011 commented Sep 1, 2020

As a side node, I'm not very familiar with how the pretty-printer is used (outside of the 'probable equality' check). Is preserving these kinds of syntactic details generally considered desirable, or is removing things like leading |s considered part of the 'pretty' in 'pretty-printing'?

@petrochenkov
Copy link
Contributor

I think the primary requirement is "the output of --pretty=expanded looks readable", because it's used for debugging.
In particular, it should be rustfmtable (#38016).

The second requirement is that the pretty-printed code builds (if it doesn't use hygiene), because some people apparently parse pretty-printed code (#57155 (comment)).

The third (temporary) requirement is that it should be good enough for the proc macro pretty-print / reparse hack.

Only the last requirement requires tokens to be printed precisely.

@Aaron1011
Copy link
Member Author

@petrochenkov: Do you still prefer changing the 'probably equality' check to ignore vertical separators?

@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 10, 2020
@petrochenkov
Copy link
Contributor

@Aaron1011
Yes, since trailing/leading separators in AST are only necessary for the print-reparse hack, if we move further with #43081 we'll need to remove them again.

@petrochenkov
Copy link
Contributor

Also, + can be added to the list as well because trailing pluses in Bound1 + Bound2 + Bound3 + are not preserved during pretty-printing.

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 10, 2020
Aaron1011 added a commit to Aaron1011/rust that referenced this pull request Sep 10, 2020
Fixes rust-lang#76182

This is an alternative to PR rust-lang#76188

These tokens are not preserved in the AST in certain cases
(e.g. a leading `|` in a pattern or a trailing `+` in a trait bound).

This PR ignores them entirely during the pretty-print/reparse check
to avoid spuriously using the re-parsed tokenstream.
@Aaron1011
Copy link
Member Author

@petrochenkov: I opened #76585, which ignores | and +.

@Aaron1011 Aaron1011 closed this Sep 10, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Sep 12, 2020
…chenkov

Ignore `|` and `+` tokens during proc-macro pretty-print check

Fixes rust-lang#76182

This is an alternative to PR rust-lang#76188

These tokens are not preserved in the AST in certain cases
(e.g. a leading `|` in a pattern or a trailing `+` in a trait bound).

This PR ignores them entirely during the pretty-print/reparse check
to avoid spuriously using the re-parsed tokenstream.
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Sep 12, 2020
…chenkov

Ignore `|` and `+` tokens during proc-macro pretty-print check

Fixes rust-lang#76182

This is an alternative to PR rust-lang#76188

These tokens are not preserved in the AST in certain cases
(e.g. a leading `|` in a pattern or a trailing `+` in a trait bound).

This PR ignores them entirely during the pretty-print/reparse check
to avoid spuriously using the re-parsed tokenstream.
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Sep 12, 2020
…chenkov

Ignore `|` and `+` tokens during proc-macro pretty-print check

Fixes rust-lang#76182

This is an alternative to PR rust-lang#76188

These tokens are not preserved in the AST in certain cases
(e.g. a leading `|` in a pattern or a trailing `+` in a trait bound).

This PR ignores them entirely during the pretty-print/reparse check
to avoid spuriously using the re-parsed tokenstream.
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 13, 2020
…enkov

Ignore `|` and `+` tokens during proc-macro pretty-print check

Fixes rust-lang#76182

This is an alternative to PR rust-lang#76188

These tokens are not preserved in the AST in certain cases
(e.g. a leading `|` in a pattern or a trailing `+` in a trait bound).

This PR ignores them entirely during the pretty-print/reparse check
to avoid spuriously using the re-parsed tokenstream.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Leading pipes in patterns need to be preserved in the AST
5 participants