Handle str literals written with ' lexed as lifetime #122217

Merged
merged 6 commits into from
Mar 24, 2024
2 changes: 1 addition & 1 deletion compiler/rustc_infer/messages.ftl
@@ -169,7 +169,7 @@ infer_lifetime_param_suggestion_elided = each elided lifetime in input position

infer_meant_byte_literal = if you meant to write a byte literal, prefix with `b`
infer_meant_char_literal = if you meant to write a `char` literal, use single quotes
infer_meant_str_literal = if you meant to write a `str` literal, use double quotes
infer_meant_str_literal = if you meant to write a string literal, use double quotes
infer_mismatched_static_lifetime = incompatible lifetime on type
infer_more_targeted = {$has_param_name ->
[true] `{$param_name}`
13 changes: 5 additions & 8 deletions compiler/rustc_infer/src/errors/mod.rs
@@ -1339,15 +1339,12 @@ pub enum TypeErrorAdditionalDiags {
span: Span,
code: String,
},
#[suggestion(
infer_meant_str_literal,
code = "\"{code}\"",
applicability = "machine-applicable"
)]
#[multipart_suggestion(infer_meant_str_literal, applicability = "machine-applicable")]
MeantStrLiteral {
#[primary_span]
span: Span,
code: String,
#[suggestion_part(code = "\"")]
start: Span,
#[suggestion_part(code = "\"")]
end: Span,
},
#[suggestion(
infer_consider_specifying_length,
14 changes: 4 additions & 10 deletions compiler/rustc_infer/src/infer/error_reporting/mod.rs
@@ -2078,16 +2078,10 @@ impl<'tcx> TypeErrCtxt<'_, 'tcx> {
// If a string was expected and the found expression is a character literal,
// perhaps the user meant to write `"s"` to specify a string literal.
(ty::Ref(_, r, _), ty::Char) if r.is_str() => {
if let Ok(code) = self.tcx.sess().source_map().span_to_snippet(span) {
if let Some(code) =
code.strip_prefix('\'').and_then(|s| s.strip_suffix('\''))
{
suggestions.push(TypeErrorAdditionalDiags::MeantStrLiteral {
span,
code: escape_literal(code),
})
}
}
suggestions.push(TypeErrorAdditionalDiags::MeantStrLiteral {
start: span.with_hi(span.lo() + BytePos(1)),
end: span.with_lo(span.hi() - BytePos(1)),
})
}
// For code `if Some(..) = expr `, the type mismatch may be expected `bool` but found `()`,
// we try to suggest to add the missing `let` for `if let Some(..) = expr`
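
Note: the hunk above replaces the snippet-based suggestion with two one-byte spans, one per quote. A minimal sketch of that arithmetic, assuming a toy `ByteSpan` type in place of rustc's `Span` and a hypothetical `apply_double_quotes` helper (the compiler passes the parts to its diagnostics machinery rather than editing the source itself), and assuming the literal is at least two bytes long:

```rust
/// Byte range `[lo, hi)` into a source string; a toy stand-in for rustc's `Span`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ByteSpan {
    lo: usize,
    hi: usize,
}

/// Splits the span of a quoted literal into the one-byte spans of its
/// opening and closing quote, mirroring the `with_hi` / `with_lo` arithmetic.
fn quote_spans(lit: ByteSpan) -> (ByteSpan, ByteSpan) {
    let start = ByteSpan { lo: lit.lo, hi: lit.lo + 1 };
    let end = ByteSpan { lo: lit.hi - 1, hi: lit.hi };
    (start, end)
}

/// Hypothetical helper: applies the two-part suggestion by replacing
/// both quote spans with `"` and leaving the contents untouched.
fn apply_double_quotes(src: &str, lit: ByteSpan) -> String {
    let (start, end) = quote_spans(lit);
    let mut out = String::with_capacity(src.len());
    out.push_str(&src[..start.lo]);
    out.push('"');
    out.push_str(&src[start.hi..end.lo]);
    out.push('"');
    out.push_str(&src[end.hi..]);
    out
}
```

Because only the delimiters are replaced, the literal's contents never need to be re-escaped, which is why the `code` field and the `escape_literal` call could be dropped.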
11 changes: 10 additions & 1 deletion compiler/rustc_lexer/src/cursor.rs
@@ -46,7 +46,7 @@ impl<'a> Cursor<'a> {
/// If requested position doesn't exist, `EOF_CHAR` is returned.
/// However, getting `EOF_CHAR` doesn't always mean actual end of file,
/// it should be checked with `is_eof` method.
pub(crate) fn first(&self) -> char {
pub fn first(&self) -> char {
// `.next()` optimizes better than `.nth(0)`
self.chars.clone().next().unwrap_or(EOF_CHAR)
}
@@ -59,6 +59,15 @@ impl<'a> Cursor<'a> {
iter.next().unwrap_or(EOF_CHAR)
}

/// Peeks the third symbol from the input stream without consuming it.
pub fn third(&self) -> char {
Review comment (Member):

Since `second` and `third` are used exclusively in the error path, if I'm not mistaken (I haven't double-checked), perf shouldn't matter that much, so maybe we can avoid introducing those helpers? Do we have access to `self.chars` in the parser? If so, we could just use `.clone().nth(N)`, right?

Reply (Contributor Author):

`self.chars` is private to the lexer. I initially made it public, but given that it is used in only one place, it felt better to introduce the method. We could instead have a single `nth` method that expands to `.clone().nth(N)`.

// `.next()` optimizes better than `.nth(1)`
let mut iter = self.chars.clone();
iter.next();
iter.next();
iter.next().unwrap_or(EOF_CHAR)
}

/// Checks if there is nothing more to consume.
pub(crate) fn is_eof(&self) -> bool {
self.chars.as_str().is_empty()
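
Note: the review thread above floats a single `nth` helper built on `.clone().nth(N)`. A minimal sketch of what that could look like, assuming a toy `Cursor` that models only the peeking logic; the `nth` method is hypothetical and is not what the PR merged:

```rust
use std::str::Chars;

/// Sentinel returned when peeking past the end of input,
/// standing in for the lexer's `EOF_CHAR`.
const EOF_CHAR: char = '\0';

/// Toy stand-in for the lexer cursor; only peeking is modeled.
struct Cursor<'a> {
    chars: Chars<'a>,
}

impl<'a> Cursor<'a> {
    fn new(input: &'a str) -> Self {
        Cursor { chars: input.chars() }
    }

    /// Hypothetical generic helper: peeks the symbol `n` positions ahead
    /// (0-based). Nothing is consumed because it advances a clone of the
    /// iterator rather than the iterator itself.
    fn nth(&self, n: usize) -> char {
        self.chars.clone().nth(n).unwrap_or(EOF_CHAR)
    }

    /// Equivalent of `first`, expressed through the generic helper.
    fn first(&self) -> char {
        self.nth(0)
    }

    /// Equivalent of `third`, expressed through the generic helper.
    fn third(&self) -> char {
        self.nth(2)
    }
}
```

With the input `'ab`, `first()` yields `'` and `third()` yields `b`; peeking past the end yields `EOF_CHAR`. The trade-off raised in the thread is that the hand-unrolled `.next()` calls may optimize better, which matters little on an error path.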
3 changes: 2 additions & 1 deletion compiler/rustc_parse/messages.ftl
@@ -570,7 +570,7 @@ parse_more_than_one_char = character literal may only contain one codepoint
.remove_non = consider removing the non-printing characters
.use_double_quotes = if you meant to write a {$is_byte ->
[true] byte string
*[false] `str`
*[false] string
} literal, use double quotes

parse_multiple_skipped_lines = multiple lines skipped by escaped newline
@@ -835,6 +835,7 @@ parse_unknown_prefix = prefix `{$prefix}` is unknown
.label = unknown prefix
.note = prefixed identifiers and literals are reserved since Rust 2021
.suggestion_br = use `br` for a raw byte string
.suggestion_str = if you meant to write a string literal, use double quotes
.suggestion_whitespace = consider inserting whitespace here

parse_unknown_start_of_token = unknown start of token: {$escaped}
22 changes: 21 additions & 1 deletion compiler/rustc_parse/src/errors.rs
@@ -1994,6 +1994,17 @@ pub enum UnknownPrefixSugg {
style = "verbose"
)]
Whitespace(#[primary_span] Span),
#[multipart_suggestion(
parse_suggestion_str,
applicability = "maybe-incorrect",
style = "verbose"
)]
MeantStr {
#[suggestion_part(code = "\"")]
start: Span,
#[suggestion_part(code = "\"")]
end: Span,
},
}

#[derive(Diagnostic)]
@@ -2205,12 +2216,21 @@ pub enum MoreThanOneCharSugg {
ch: String,
},
#[suggestion(parse_use_double_quotes, code = "{sugg}", applicability = "machine-applicable")]
Quotes {
QuotesFull {
#[primary_span]
span: Span,
is_byte: bool,
sugg: String,
},
#[multipart_suggestion(parse_use_double_quotes, applicability = "machine-applicable")]
Quotes {
#[suggestion_part(code = "{prefix}\"")]
start: Span,
#[suggestion_part(code = "\"")]
end: Span,
is_byte: bool,
prefix: &'static str,
},
}

#[derive(Subdiagnostic)]
57 changes: 52 additions & 5 deletions compiler/rustc_parse/src/lexer/mod.rs
@@ -63,6 +63,7 @@ pub(crate) fn parse_token_trees<'psess, 'src>(
cursor,
override_span,
nbsp_is_whitespace: false,
last_lifetime: None,
};
let (stream, res, unmatched_delims) =
tokentrees::TokenTreesReader::parse_all_token_trees(string_reader);
@@ -105,6 +106,10 @@ struct StringReader<'psess, 'src> {
/// in this file, it's safe to treat further occurrences of the non-breaking
/// space character as whitespace.
nbsp_is_whitespace: bool,

/// Track the `Span` for the leading `'` of the last lifetime. Used for
/// diagnostics to detect possible typo where `"` was meant.
last_lifetime: Option<Span>,
}

impl<'psess, 'src> StringReader<'psess, 'src> {
@@ -130,6 +135,18 @@ impl<'psess, 'src> StringReader<'psess, 'src> {

debug!("next_token: {:?}({:?})", token.kind, self.str_from(start));

if let rustc_lexer::TokenKind::Semi
| rustc_lexer::TokenKind::LineComment { .. }
| rustc_lexer::TokenKind::BlockComment { .. }
| rustc_lexer::TokenKind::CloseParen
| rustc_lexer::TokenKind::CloseBrace
| rustc_lexer::TokenKind::CloseBracket = token.kind
{
// Heuristic: we assume that it is unlikely we're dealing with an unterminated
// string surrounded by single quotes.
self.last_lifetime = None;
}

// Now "cook" the token, converting the simple `rustc_lexer::TokenKind` enum into a
// rich `rustc_ast::TokenKind`. This turns strings into interned symbols and runs
// additional validation.
@@ -247,6 +264,7 @@ impl<'psess, 'src> StringReader<'psess, 'src> {
// expansion purposes. See #12512 for the gory details of why
// this is necessary.
let lifetime_name = self.str_from(start);
self.last_lifetime = Some(self.mk_sp(start, start + BytePos(1)));
if starts_with_number {
let span = self.mk_sp(start, self.pos);
self.dcx().struct_err("lifetimes cannot start with a number")
@@ -395,10 +413,21 @@ impl<'psess, 'src> StringReader<'psess, 'src> {
match kind {
rustc_lexer::LiteralKind::Char { terminated } => {
if !terminated {
self.dcx()
let mut err = self
.dcx()
.struct_span_fatal(self.mk_sp(start, end), "unterminated character literal")
.with_code(E0762)
.emit()
.with_code(E0762);
if let Some(lt_sp) = self.last_lifetime {
err.multipart_suggestion(
"if you meant to write a string literal, use double quotes",
vec![
(lt_sp, "\"".to_string()),
(self.mk_sp(start, start + BytePos(1)), "\"".to_string()),
],
Applicability::MaybeIncorrect,
);
}
err.emit()
}
self.cook_unicode(token::Char, Mode::Char, start, end, 1, 1) // ' '
}
@@ -669,15 +698,33 @@ impl<'psess, 'src> StringReader<'psess, 'src> {
let expn_data = prefix_span.ctxt().outer_expn_data();

if expn_data.edition >= Edition::Edition2021 {
let mut silence = false;
// In Rust 2021, this is a hard error.
let sugg = if prefix == "rb" {
Some(errors::UnknownPrefixSugg::UseBr(prefix_span))
} else if expn_data.is_root() {
Some(errors::UnknownPrefixSugg::Whitespace(prefix_span.shrink_to_hi()))
if self.cursor.first() == '\''
&& let Some(start) = self.last_lifetime
&& self.cursor.third() != '\''
{
// An "unclosed `char`" error will be emitted already, silence redundant error.
silence = true;
Some(errors::UnknownPrefixSugg::MeantStr {
start,
end: self.mk_sp(self.pos, self.pos + BytePos(1)),
})
} else {
Some(errors::UnknownPrefixSugg::Whitespace(prefix_span.shrink_to_hi()))
}
} else {
None
};
self.dcx().emit_err(errors::UnknownPrefix { span: prefix_span, prefix, sugg });
let err = errors::UnknownPrefix { span: prefix_span, prefix, sugg };
if silence {
self.dcx().create_err(err).delay_as_bug();
} else {
self.dcx().emit_err(err);
}
Review comment (Member), on lines +722 to +727:

I would probably structure it a bit differently, namely `let err = create_err(); if silence { err.delay_as_bug() } else { err.emit() }`, but that's just a difference in taste. Definitely not blocking.

} else {
// Before Rust 2021, only emit a lint for migration.
self.psess.buffer_lint_with_diagnostic(
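
Note: the `last_lifetime` bookkeeping in this file can be read as a small state machine: remember the leading `'` of the most recent lifetime, forget it at `;`, comments, and closing delimiters, and pair it with the `'` of an unterminated character literal. A minimal sketch under those assumptions; the `Tok` enum and `find_quote_fix` are hypothetical simplifications and not the lexer's real types:

```rust
/// Simplified token kinds; only those the heuristic cares about are modeled.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok {
    /// A lifetime such as `'hello`, carrying the byte offset of its leading `'`.
    Lifetime(usize),
    /// An unterminated character literal, carrying the byte offset of its `'`.
    UnterminatedChar(usize),
    /// `;`, a comment, or a closing delimiter: resets the tracked lifetime.
    Boundary,
    /// Any other token.
    Other,
}

/// Returns the byte offsets of the two `'` that the suggestion would turn
/// into `"`, or `None` when no lifetime is being tracked.
fn find_quote_fix(tokens: &[Tok]) -> Option<(usize, usize)> {
    let mut last_lifetime = None;
    for &tok in tokens {
        match tok {
            Tok::Lifetime(pos) => last_lifetime = Some(pos),
            // Heuristic from the diff: past one of these tokens, an
            // unterminated single-quoted string is considered unlikely.
            Tok::Boundary => last_lifetime = None,
            Tok::UnterminatedChar(pos) => {
                // Pair the remembered lifetime quote with the quote of the
                // unterminated literal; the real error is fatal, so stop here.
                return last_lifetime.map(|lt| (lt, pos));
            }
            Tok::Other => {}
        }
    }
    None
}
```

For an input shaped like `'hello world'`, the leading `'hello` is treated as a lifetime at offset 0 and the trailing `'` as an unterminated character literal at offset 12, so the sketch pairs offsets 0 and 12.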
20 changes: 15 additions & 5 deletions compiler/rustc_parse/src/lexer/unescape_error_reporting.rs
@@ -95,11 +95,21 @@ pub(crate) fn emit_unescape_error(
}
escaped.push(c);
}
let sugg = format!("{prefix}\"{escaped}\"");
MoreThanOneCharSugg::Quotes {
span: full_lit_span,
is_byte: mode == Mode::Byte,
sugg,
if escaped.len() != lit.len() || full_lit_span.is_empty() {
Review comment (Member):

I understand that this prevents the negative overflow, and that makes sense, but why wasn't this an issue before? I don't have the time to dig deep into the code myself.

Reply (Contributor Author):

Before, we were replacing the full span with the new string (instead of doing `BytePos` math). We would just end up with an invalid suggestion that wouldn't be displayed.

let sugg = format!("{prefix}\"{escaped}\"");
MoreThanOneCharSugg::QuotesFull {
span: full_lit_span,
is_byte: mode == Mode::Byte,
sugg,
}
} else {
MoreThanOneCharSugg::Quotes {
start: full_lit_span
.with_hi(full_lit_span.lo() + BytePos((prefix.len() + 1) as u32)),
end: full_lit_span.with_lo(full_lit_span.hi() - BytePos(1)),
is_byte: mode == Mode::Byte,
prefix,
}
}
});
dcx.emit_err(UnescapeError::MoreThanOneChar {
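
Note: the branch above picks between rewriting the whole literal (`QuotesFull`) and swapping only the delimiters (`Quotes`). A minimal sketch of that decision, assuming a hypothetical `QuoteSugg` enum and an illustrative escaping routine that is not taken from the diff; the `full_lit_span.is_empty()` guard is left out because the sketch has no spans:

```rust
/// Hypothetical stand-in for the two suggestion shapes in the diff.
#[derive(Debug, PartialEq)]
enum QuoteSugg {
    /// Replace the whole literal with a freshly built string.
    Full(String),
    /// Only swap the delimiters; the contents stay untouched.
    DelimitersOnly { open: String, close: String },
}

/// Illustrative escaping (an assumption, not the PR's code): puts a
/// backslash before every `"` that is not already escaped.
fn escape_double_quotes(lit: &str) -> String {
    let mut escaped = String::with_capacity(lit.len());
    let mut in_escape = false;
    for c in lit.chars() {
        match c {
            '\\' => in_escape = !in_escape,
            '"' if !in_escape => escaped.push('\\'),
            _ => in_escape = false,
        }
        escaped.push(c);
    }
    escaped
}

/// Chooses the suggestion shape: if escaping changed the contents, the
/// delimiters alone are not enough and the full literal must be replaced.
fn suggest(prefix: &str, lit: &str) -> QuoteSugg {
    let escaped = escape_double_quotes(lit);
    if escaped.len() != lit.len() {
        QuoteSugg::Full(format!("{prefix}\"{escaped}\""))
    } else {
        QuoteSugg::DelimitersOnly {
            open: format!("{prefix}\""),
            close: "\"".to_string(),
        }
    }
}
```

This matches the shape of the updated test output: a literal such as `'nope'` only needs its two quotes changed, while `'"""'` needs its contents escaped and therefore gets a full replacement.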
12 changes: 6 additions & 6 deletions tests/ui/inference/str-as-char.stderr
@@ -4,7 +4,7 @@ error: character literal may only contain one codepoint
LL | let _: &str = '"""';
| ^^^^^
|
help: if you meant to write a `str` literal, use double quotes
help: if you meant to write a string literal, use double quotes
|
LL | let _: &str = "\"\"\"";
| ~~~~~~~~
@@ -15,18 +15,18 @@ error: character literal may only contain one codepoint
LL | let _: &str = '\"\"\"';
| ^^^^^^^^
|
help: if you meant to write a `str` literal, use double quotes
help: if you meant to write a string literal, use double quotes
|
LL | let _: &str = "\"\"\"";
| ~~~~~~~~
| ~ ~

error: character literal may only contain one codepoint
--> $DIR/str-as-char.rs:10:19
|
LL | let _: &str = '"\"\"\\"\\"';
| ^^^^^^^^^^^^^^^^^
|
help: if you meant to write a `str` literal, use double quotes
help: if you meant to write a string literal, use double quotes
|
LL | let _: &str = "\"\"\\"\\"\\\"";
| ~~~~~~~~~~~~~~~~~~~~
@@ -39,10 +39,10 @@ LL | let _: &str = 'a';
| |
| expected due to this
|
help: if you meant to write a `str` literal, use double quotes
help: if you meant to write a string literal, use double quotes
|
LL | let _: &str = "a";
| ~~~
| ~ ~

error: aborting due to 4 previous errors

4 changes: 2 additions & 2 deletions tests/ui/issues/issue-23589.stderr
@@ -15,10 +15,10 @@ error[E0308]: mismatched types
LL | let v: Vec(&str) = vec!['1', '2'];
| ^^^ expected `&str`, found `char`
|
help: if you meant to write a `str` literal, use double quotes
help: if you meant to write a string literal, use double quotes
|
LL | let v: Vec(&str) = vec!["1", '2'];
| ~~~
| ~ ~

error: aborting due to 2 previous errors

4 changes: 2 additions & 2 deletions tests/ui/lexer/lex-bad-char-literals-2.stderr
@@ -4,10 +4,10 @@ error: character literal may only contain one codepoint
LL | 'nope'
| ^^^^^^
|
help: if you meant to write a `str` literal, use double quotes
help: if you meant to write a string literal, use double quotes
|
LL | "nope"
| ~~~~~~
| ~ ~

error: aborting due to 1 previous error

8 changes: 4 additions & 4 deletions tests/ui/lexer/lex-bad-char-literals-3.stderr
@@ -4,21 +4,21 @@ error: character literal may only contain one codepoint
LL | static c: char = '●●';
| ^^^^
|
help: if you meant to write a `str` literal, use double quotes
help: if you meant to write a string literal, use double quotes
|
LL | static c: char = "●●";
| ~~~~
| ~ ~

error: character literal may only contain one codepoint
--> $DIR/lex-bad-char-literals-3.rs:5:20
|
LL | let ch: &str = '●●';
| ^^^^
|
help: if you meant to write a `str` literal, use double quotes
help: if you meant to write a string literal, use double quotes
|
LL | let ch: &str = "●●";
| ~~~~
| ~ ~

error: aborting due to 2 previous errors

8 changes: 4 additions & 4 deletions tests/ui/lexer/lex-bad-char-literals-5.stderr
@@ -4,21 +4,21 @@ error: character literal may only contain one codepoint
LL | static c: char = '\x10\x10';
| ^^^^^^^^^^
|
help: if you meant to write a `str` literal, use double quotes
help: if you meant to write a string literal, use double quotes
|
LL | static c: char = "\x10\x10";
| ~~~~~~~~~~
| ~ ~

error: character literal may only contain one codepoint
--> $DIR/lex-bad-char-literals-5.rs:5:20
|
LL | let ch: &str = '\x10\x10';
| ^^^^^^^^^^
|
help: if you meant to write a `str` literal, use double quotes
help: if you meant to write a string literal, use double quotes
|
LL | let ch: &str = "\x10\x10";
| ~~~~~~~~~~
| ~ ~

error: aborting due to 2 previous errors
