-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[ruff] Fix syntax error introduced for an empty string followed by a u-prefixed string (UP025)
#18899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ntBre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I had a suggestion for a slightly simpler version of the current implementation and also a question that I'm not sure about either.
| let first_char = checker | ||
| .locator() | ||
| .slice(TextRange::at(string.start(), TextSize::new(1))); | ||
| let u_position = u32::from(!(first_char == "u" || first_char == "U")); | ||
| let prefix_range = | ||
| TextRange::at(string.start() + TextSize::new(u_position), TextSize::new(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this arithmetic is necessary. The is_unicode check above confirms that the first character will be a u, so this just boils down to what was inside Edit::range_deletion originally.
| let first_char = checker | |
| .locator() | |
| .slice(TextRange::at(string.start(), TextSize::new(1))); | |
| let u_position = u32::from(!(first_char == "u" || first_char == "U")); | |
| let prefix_range = | |
| TextRange::at(string.start() + TextSize::new(u_position), TextSize::new(1)); | |
| let prefix_range = TextRange::at(string.start(), TextSize::new(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I think something like this might be a simpler version of the current implementation:
let prefix_range = TextRange::at(string.start(), TextSize::new(1));
let locator = checker.locator();
let content = locator
.slice(TextRange::new(prefix_range.end(), string.end()))
.to_owned();
// If the preceding character is equivalent to the quote character, insert a space to avoid a
// syntax error. For example, when removing the `u` prefix in `""u""`, rewrite to `"" ""`
// instead of `""""`.
// see https://github.com/astral-sh/ruff/issues/18895
let edit = if locator
.slice(TextRange::up_to(prefix_range.start()))
.chars()
.last()
.is_some_and(|char| content.starts_with(char))
{
Edit::range_replacement(" ".to_string(), prefix_range)
} else {
Edit::range_deletion(prefix_range)
};
diagnostic.set_fix(Fix::safe_edit(edit));This just gives either a deletion like before or simply replaces the u with a space, instead of having to replace the whole string.
| 26 26 | | ||
| 27 |-f"foo"u"bar" # OK | ||
| 27 |+f"foo""bar" # OK | ||
| 27 |+f"foo" "bar" # OK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slightly concerned that we're now adding spaces where they aren't totally necessary. The issue indicated a few more constraints on the problem:
introduce a syntax error when the prefixed string follows a non-triple-quoted empty string with the same kind of quotation marks without an intervening space
Cases like this ("foo""bar" and even "foo""") are okay, as are empty triple-quoted strings like """""""" (8 double quotes):
$ python -m tokenize <<<'""""""""'
1,0-1,6: STRING '""""""'
1,6-1,8: STRING '""'
1,8-1,9: NEWLINE '\n'
2,0-2,0: ENDMARKER ''I think we could check the kind of the preceding token to see if it's a triple-quoted string, but I don't think we can perform an is_empty check on the tokens.
@MichaReiser what do you think? Is it okay to add spaces where they aren't strictly necessary, or do you know of a better way to check if the previous string part is non-triple-quoted and empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the easiest would be to make this a rule that operates on StringLike and then iterate over all parts. This also ensures that, when removing multiple parts, that the spacing is only inserted when necessary.
I think we could check the kind of the preceding token to see if it's a triple-quoted string, but I don't think we can perform an is_empty check on the tokens.
I think this is correct because of line continuations where subtracting the prefix from the range doesn't yield the correct result:
a= "\
" u"c"s it okay to add spaces where they aren't strictly necessary, or do you know of a better way to check if the previous string part is non-triple-quoted and empty?
I think it's fine for as long as it only pads at most one space (never results in two consecutive spaces)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going back to this problem, I don't really understand the next steps. Here's the current implementation
/// UP025
pub(crate) fn unicode_kind_prefix(checker: &Checker, string: &StringLiteral) {
if string.flags.prefix().is_unicode() {
let mut diagnostic = checker.report_diagnostic(UnicodeKindPrefix, string.range);
let prefix_range = TextRange::at(string.start(), TextSize::new(1));
let locator = checker.locator();
let content = locator
.slice(TextRange::new(prefix_range.end(), string.end()))
.to_owned();
// If the preceding character is equivalent to the quote character, insert a space to avoid a
// syntax error. For example, when removing the `u` prefix in `""u""`, rewrite to `"" ""`
// instead of `""""`.
// see https://github.com/astral-sh/ruff/issues/18895
let edit = if locator
.slice(TextRange::up_to(prefix_range.start()))
.chars()
.last()
.is_some_and(|char| content.starts_with(char))
{
Edit::range_replacement(" ".to_string(), prefix_range)
} else {
Edit::range_deletion(prefix_range)
};
diagnostic.set_fix(Fix::safe_edit(edit));
}
}As far as I understand, it correctly handles the following cases:
before fix:
a = "\
" u"c"
f"foo"u"bar"
f"foo" u"bar"
""u""after fix:
a = "\
" "c"
f"foo" "bar"
f"foo" "bar"
"" ""In that case, I don't understand the drawbacks of the current simple implementation. Are there any cases that are currently being handled differently than desired?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My argument was that a case like this:
f"foo"u"bar"is not being handled correctly because we're adding a space that isn't necessary. In other words, this fix would be fine:
f"foo""bar"but it's being fixed to
f"foo" "bar"Micha said that could be okay as long as we never add a second space, but I don't entirely follow the line-continuation example. It does seem to be handled properly in the current version, as you said.
As Micha pointed out, another approach would be to iterate over the parts of the string instead of looking at a single part at a time. Then we could check the conditions mentioned in the issue that actually require a space to be inserted: the preceding string part is not triple quoted and not empty.
To be more concrete, we would pass in value here (or a transformed version of it to allow passing both f-strings and regular strings) instead of passing a single string_part at a time:
ruff/crates/ruff_linter/src/checkers/ast/analyze/expression.rs
Lines 1575 to 1579 in 10a14a9
| if checker.is_rule_enabled(Rule::UnicodeKindPrefix) { | |
| for string_part in value { | |
| pyupgrade::rules::unicode_kind_prefix(checker, string_part); | |
| } | |
| } |
With all that being said, maybe it's not worth worrying about and we can just move ahead with this version. I think the space looks better anyway, and I can't come up with a counter-example that adds a second space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the simpler the implementation (and it's quite simple now) the better, and if it works correctly, why not keep it as it is? but it might be worth asking someone else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anyway, if you need to do it differently, I'll do it, just tell me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right, let's move ahead with the simple fix for now. Thank you!
…ix.rs Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>
ruff] Fix introduces a syntax error for an empty string followed by a u-prefixed string (UP025)ruff] Fix syntax error introduced for an empty string followed by a u-prefixed string (UP025)
Summary
/closes #18895
Test Plan