Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions crates/ruff_linter/resources/test/fixtures/pyupgrade/UP025.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,3 +26,9 @@ def hello():

f"foo"u"bar" # OK
f"foo" u"bar" # OK

# https://github.com/astral-sh/ruff/issues/18895
""u""
""u"hi"
""""""""""""""""""""u"hi"
""U"helloooo"
Original file line number Diff line number Diff line change
Expand Up @@ -42,9 +42,28 @@ impl AlwaysFixableViolation for UnicodeKindPrefix {
pub(crate) fn unicode_kind_prefix(checker: &Checker, string: &StringLiteral) {
if string.flags.prefix().is_unicode() {
let mut diagnostic = checker.report_diagnostic(UnicodeKindPrefix, string.range);
diagnostic.set_fix(Fix::safe_edit(Edit::range_deletion(TextRange::at(
string.start(),
TextSize::from(1),
))));

let prefix_range = TextRange::at(string.start(), TextSize::new(1));
let locator = checker.locator();
let content = locator
.slice(TextRange::new(prefix_range.end(), string.end()))
.to_owned();

// If the preceding character is equivalent to the quote character, insert a space to avoid a
// syntax error. For example, when removing the `u` prefix in `""u""`, rewrite to `"" ""`
// instead of `""""`.
// see https://github.com/astral-sh/ruff/issues/18895
let edit = if locator
.slice(TextRange::up_to(prefix_range.start()))
.chars()
.last()
.is_some_and(|char| content.starts_with(char))
{
Edit::range_replacement(" ".to_string(), prefix_range)
} else {
Edit::range_deletion(prefix_range)
};

diagnostic.set_fix(Fix::safe_edit(edit));
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -281,14 +281,18 @@ UP025.py:27:7: UP025 [*] Remove unicode literals from strings
25 25 | return"Hello" # OK
26 26 |
27 |-f"foo"u"bar" # OK
27 |+f"foo""bar" # OK
27 |+f"foo" "bar" # OK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly concerned that we're now adding spaces where they aren't totally necessary. The issue indicated a few more constraints on the problem:

introduce a syntax error when the prefixed string follows a non-triple-quoted empty string with the same kind of quotation marks without an intervening space

Cases like this ("foo""bar" and even "foo""") are okay, as are empty triple-quoted strings like """""""" (8 double quotes):

$ python -m tokenize <<<'""""""""'
1,0-1,6:            STRING         '""""""'
1,6-1,8:            STRING         '""'
1,8-1,9:            NEWLINE        '\n'
2,0-2,0:            ENDMARKER      ''

I think we could check the kind of the preceding token to see if it's a triple-quoted string, but I don't think we can perform an is_empty check on the tokens.

@MichaReiser what do you think? Is it okay to add spaces where they aren't strictly necessary, or do you know of a better way to check if the previous string part is non-triple-quoted and empty?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the easiest would be to make this a rule that operates on StringLike and then iterate over all parts. This also ensures that, when removing multiple parts, that the spacing is only inserted when necessary.

I think we could check the kind of the preceding token to see if it's a triple-quoted string, but I don't think we can perform an is_empty check on the tokens.

I think this is correct because of line continuations where subtracting the prefix from the range doesn't yield the correct result:

a= "\
" u"c"

s it okay to add spaces where they aren't strictly necessary, or do you know of a better way to check if the previous string part is non-triple-quoted and empty?

I think it's fine for as long as it only pads at most one space (never results in two consecutive spaces)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going back to this problem, I don't really understand the next steps. Here's the current implementation

/// UP025
pub(crate) fn unicode_kind_prefix(checker: &Checker, string: &StringLiteral) {
    if string.flags.prefix().is_unicode() {
        let mut diagnostic = checker.report_diagnostic(UnicodeKindPrefix, string.range);

        let prefix_range = TextRange::at(string.start(), TextSize::new(1));
        let locator = checker.locator();
        let content = locator
            .slice(TextRange::new(prefix_range.end(), string.end()))
            .to_owned();

        // If the preceding character is equivalent to the quote character, insert a space to avoid a
        // syntax error. For example, when removing the `u` prefix in `""u""`, rewrite to `"" ""`
        // instead of `""""`.
        // see https://github.com/astral-sh/ruff/issues/18895
        let edit = if locator
            .slice(TextRange::up_to(prefix_range.start()))
            .chars()
            .last()
            .is_some_and(|char| content.starts_with(char))
        {
            Edit::range_replacement(" ".to_string(), prefix_range)
        } else {
            Edit::range_deletion(prefix_range)
        };

        diagnostic.set_fix(Fix::safe_edit(edit));
    }
}

As far as I understand, it correctly handles the following cases:
before fix:

a = "\
" u"c"

f"foo"u"bar"

f"foo" u"bar"

""u""

after fix:

a = "\
" "c"

f"foo" "bar"

f"foo" "bar"

"" ""

In that case, I don't understand the drawbacks of the current simple implementation. Are there any cases that are currently being handled differently than desired?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My argument was that a case like this:

f"foo"u"bar"

is not being handled correctly because we're adding a space that isn't necessary. In other words, this fix would be fine:

f"foo""bar"

but it's being fixed to

f"foo" "bar"

Micha said that could be okay as long as we never add a second space, but I don't entirely follow the line-continuation example. It does seem to be handled properly in the current version, as you said.

As Micha pointed out, another approach would be to iterate over the parts of the string instead of looking at a single part at a time. Then we could check the conditions mentioned in the issue that actually require a space to be inserted: the preceding string part is not triple quoted and not empty.

To be more concrete, we would pass in value here (or a transformed version of it to allow passing both f-strings and regular strings) instead of passing a single string_part at a time:

if checker.is_rule_enabled(Rule::UnicodeKindPrefix) {
for string_part in value {
pyupgrade::rules::unicode_kind_prefix(checker, string_part);
}
}

With all that being said, maybe it's not worth worrying about and we can just move ahead with this version. I think the space looks better anyway, and I can't come up with a counter-example that adds a second space.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the simpler the implementation (and it's quite simple now) the better, and if it works correctly, why not keep it as it is? but it might be worth asking someone else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anyway, if you need to do it differently, I'll do it, just tell me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right, let's move ahead with the simple fix for now. Thank you!

28 28 | f"foo" u"bar" # OK
29 29 |
30 30 | # https://github.com/astral-sh/ruff/issues/18895

UP025.py:28:8: UP025 [*] Remove unicode literals from strings
|
27 | f"foo"u"bar" # OK
28 | f"foo" u"bar" # OK
| ^^^^^^ UP025
29 |
30 | # https://github.com/astral-sh/ruff/issues/18895
|
= help: Remove unicode prefix

Expand All @@ -298,3 +302,80 @@ UP025.py:28:8: UP025 [*] Remove unicode literals from strings
27 27 | f"foo"u"bar" # OK
28 |-f"foo" u"bar" # OK
28 |+f"foo" "bar" # OK
29 29 |
30 30 | # https://github.com/astral-sh/ruff/issues/18895
31 31 | ""u""

UP025.py:31:3: UP025 [*] Remove unicode literals from strings
|
30 | # https://github.com/astral-sh/ruff/issues/18895
31 | ""u""
| ^^^ UP025
32 | ""u"hi"
33 | """"""""""""""""""""u"hi"
|
= help: Remove unicode prefix

ℹ Safe fix
28 28 | f"foo" u"bar" # OK
29 29 |
30 30 | # https://github.com/astral-sh/ruff/issues/18895
31 |-""u""
31 |+"" ""
32 32 | ""u"hi"
33 33 | """"""""""""""""""""u"hi"
34 34 | ""U"helloooo"

UP025.py:32:3: UP025 [*] Remove unicode literals from strings
|
30 | # https://github.com/astral-sh/ruff/issues/18895
31 | ""u""
32 | ""u"hi"
| ^^^^^ UP025
33 | """"""""""""""""""""u"hi"
34 | ""U"helloooo"
|
= help: Remove unicode prefix

ℹ Safe fix
29 29 |
30 30 | # https://github.com/astral-sh/ruff/issues/18895
31 31 | ""u""
32 |-""u"hi"
32 |+"" "hi"
33 33 | """"""""""""""""""""u"hi"
34 34 | ""U"helloooo"

UP025.py:33:21: UP025 [*] Remove unicode literals from strings
|
31 | ""u""
32 | ""u"hi"
33 | """"""""""""""""""""u"hi"
| ^^^^^ UP025
34 | ""U"helloooo"
|
= help: Remove unicode prefix

ℹ Safe fix
30 30 | # https://github.com/astral-sh/ruff/issues/18895
31 31 | ""u""
32 32 | ""u"hi"
33 |-""""""""""""""""""""u"hi"
33 |+"""""""""""""""""""" "hi"
34 34 | ""U"helloooo"

UP025.py:34:3: UP025 [*] Remove unicode literals from strings
|
32 | ""u"hi"
33 | """"""""""""""""""""u"hi"
34 | ""U"helloooo"
| ^^^^^^^^^^^ UP025
|
= help: Remove unicode prefix

ℹ Safe fix
31 31 | ""u""
32 32 | ""u"hi"
33 33 | """"""""""""""""""""u"hi"
34 |-""U"helloooo"
34 |+"" "helloooo"
Loading