Skip to content

Conversation

@lubaskinc0de
Copy link
Contributor

Summary

/closes #18895

Test Plan

@ntBre ntBre added bug Something isn't working fixes Related to suggested fixes for violations labels Jun 23, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Jun 23, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Copy link
Contributor

@ntBre ntBre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I had a suggestion for a slightly simpler version of the current implementation and also a question that I'm not sure about either.

Comment on lines 46 to 51
let first_char = checker
.locator()
.slice(TextRange::at(string.start(), TextSize::new(1)));
let u_position = u32::from(!(first_char == "u" || first_char == "U"));
let prefix_range =
TextRange::at(string.start() + TextSize::new(u_position), TextSize::new(1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this arithmetic is necessary. The is_unicode check above confirms that the first character will be a u, so this just boils down to what was inside Edit::range_deletion originally.

Suggested change
let first_char = checker
.locator()
.slice(TextRange::at(string.start(), TextSize::new(1)));
let u_position = u32::from(!(first_char == "u" || first_char == "U"));
let prefix_range =
TextRange::at(string.start() + TextSize::new(u_position), TextSize::new(1));
let prefix_range = TextRange::at(string.start(), TextSize::new(1));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think something like this might be a simpler version of the current implementation:

        let prefix_range = TextRange::at(string.start(), TextSize::new(1));

        let locator = checker.locator();
        let content = locator
            .slice(TextRange::new(prefix_range.end(), string.end()))
            .to_owned();
        // If the preceding character is equivalent to the quote character, insert a space to avoid a
        // syntax error. For example, when removing the `u` prefix in `""u""`, rewrite to `"" ""`
        // instead of `""""`.
        // see https://github.com/astral-sh/ruff/issues/18895
        let edit = if locator
            .slice(TextRange::up_to(prefix_range.start()))
            .chars()
            .last()
            .is_some_and(|char| content.starts_with(char))
        {
            Edit::range_replacement(" ".to_string(), prefix_range)
        } else {
            Edit::range_deletion(prefix_range)
        };

        diagnostic.set_fix(Fix::safe_edit(edit));

This just gives either a deletion like before or simply replaces the u with a space, instead of having to replace the whole string.

26 26 |
27 |-f"foo"u"bar" # OK
27 |+f"foo""bar" # OK
27 |+f"foo" "bar" # OK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly concerned that we're now adding spaces where they aren't totally necessary. The issue indicated a few more constraints on the problem:

introduce a syntax error when the prefixed string follows a non-triple-quoted empty string with the same kind of quotation marks without an intervening space

Cases like this ("foo""bar" and even "foo""") are okay, as are empty triple-quoted strings like """""""" (8 double quotes):

$ python -m tokenize <<<'""""""""'
1,0-1,6:            STRING         '""""""'
1,6-1,8:            STRING         '""'
1,8-1,9:            NEWLINE        '\n'
2,0-2,0:            ENDMARKER      ''

I think we could check the kind of the preceding token to see if it's a triple-quoted string, but I don't think we can perform an is_empty check on the tokens.

@MichaReiser what do you think? Is it okay to add spaces where they aren't strictly necessary, or do you know of a better way to check if the previous string part is non-triple-quoted and empty?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the easiest would be to make this a rule that operates on StringLike and then iterate over all parts. This also ensures that, when removing multiple parts, that the spacing is only inserted when necessary.

I think we could check the kind of the preceding token to see if it's a triple-quoted string, but I don't think we can perform an is_empty check on the tokens.

I think this is correct because of line continuations where subtracting the prefix from the range doesn't yield the correct result:

a= "\
" u"c"

s it okay to add spaces where they aren't strictly necessary, or do you know of a better way to check if the previous string part is non-triple-quoted and empty?

I think it's fine for as long as it only pads at most one space (never results in two consecutive spaces)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going back to this problem, I don't really understand the next steps. Here's the current implementation

/// UP025
pub(crate) fn unicode_kind_prefix(checker: &Checker, string: &StringLiteral) {
    if string.flags.prefix().is_unicode() {
        let mut diagnostic = checker.report_diagnostic(UnicodeKindPrefix, string.range);

        let prefix_range = TextRange::at(string.start(), TextSize::new(1));
        let locator = checker.locator();
        let content = locator
            .slice(TextRange::new(prefix_range.end(), string.end()))
            .to_owned();

        // If the preceding character is equivalent to the quote character, insert a space to avoid a
        // syntax error. For example, when removing the `u` prefix in `""u""`, rewrite to `"" ""`
        // instead of `""""`.
        // see https://github.com/astral-sh/ruff/issues/18895
        let edit = if locator
            .slice(TextRange::up_to(prefix_range.start()))
            .chars()
            .last()
            .is_some_and(|char| content.starts_with(char))
        {
            Edit::range_replacement(" ".to_string(), prefix_range)
        } else {
            Edit::range_deletion(prefix_range)
        };

        diagnostic.set_fix(Fix::safe_edit(edit));
    }
}

As far as I understand, it correctly handles the following cases:
before fix:

a = "\
" u"c"

f"foo"u"bar"

f"foo" u"bar"

""u""

after fix:

a = "\
" "c"

f"foo" "bar"

f"foo" "bar"

"" ""

In that case, I don't understand the drawbacks of the current simple implementation. Are there any cases that are currently being handled differently than desired?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My argument was that a case like this:

f"foo"u"bar"

is not being handled correctly because we're adding a space that isn't necessary. In other words, this fix would be fine:

f"foo""bar"

but it's being fixed to

f"foo" "bar"

Micha said that could be okay as long as we never add a second space, but I don't entirely follow the line-continuation example. It does seem to be handled properly in the current version, as you said.

As Micha pointed out, another approach would be to iterate over the parts of the string instead of looking at a single part at a time. Then we could check the conditions mentioned in the issue that actually require a space to be inserted: the preceding string part is not triple quoted and not empty.

To be more concrete, we would pass in value here (or a transformed version of it to allow passing both f-strings and regular strings) instead of passing a single string_part at a time:

if checker.is_rule_enabled(Rule::UnicodeKindPrefix) {
for string_part in value {
pyupgrade::rules::unicode_kind_prefix(checker, string_part);
}
}

With all that being said, maybe it's not worth worrying about and we can just move ahead with this version. I think the space looks better anyway, and I can't come up with a counter-example that adds a second space.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the simpler the implementation (and it's quite simple now) the better, and if it works correctly, why not keep it as it is? but it might be worth asking someone else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anyway, if you need to do it differently, I'll do it, just tell me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right, let's move ahead with the simple fix for now. Thank you!

lubaskinc0de and others added 2 commits June 23, 2025 22:57
…ix.rs

Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>
@ntBre ntBre changed the title [ruff] Fix introduces a syntax error for an empty string followed by a u-prefixed string (UP025) [ruff] Fix syntax error introduced for an empty string followed by a u-prefixed string (UP025) Jul 1, 2025
@ntBre ntBre merged commit 667dc62 into astral-sh:main Jul 1, 2025
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working fixes Related to suggested fixes for violations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UP025 fix introduces a syntax error for an empty string followed by a u-prefixed string

3 participants