Skip to content

Conversation

@robsdedude
Copy link
Contributor

Summary

Expand cases in which ruff can offer a fix for RUF039 (some of which are unsafe).

While turning "\n" (== \n) into r"\n" (== \\n) is not equivalent at run-time, it's still functionally equivalent to do so in the context of regex patterns as they themselves interpret the escape sequence. Therefore, an unsafe fix can be offered.

Further, this PR also makes ruff offer fixes for byte string literals, not only strings literals as before.

Test Plan

Tests for all escape sequences have been added.

Related

Closes: #16713

@github-actions
Copy link
Contributor

github-actions bot commented Jul 1, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+0 -0 violations, +2 -0 fixes in 1 projects; 54 projects unchanged)

pytest-dev/pytest (+0 -0 violations, +2 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

- testing/python/metafunc.py:432:45: RUF039 First argument to `re.compile()` is not raw bytes literal
+ testing/python/metafunc.py:432:45: RUF039 [*] First argument to `re.compile()` is not raw bytes literal

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
RUF039 2 0 0 2 0

@robsdedude robsdedude force-pushed the feat/ruf039-better-fixes branch from 2293a91 to a7bfccb Compare July 1, 2025 08:27
@robsdedude robsdedude marked this pull request as ready for review July 1, 2025 08:40
@MichaReiser MichaReiser requested a review from ntBre July 7, 2025 12:57
@MichaReiser MichaReiser added the fixes Related to suggested fixes for violations label Jul 7, 2025
@ntBre ntBre added the preview Related to preview mode features label Jul 8, 2025
Copy link
Contributor

@ntBre ntBre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I just had a couple of questions/suggestions around the allowed escape characters and an idea for the docs.

return None;
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is some related code that I wrote for another rule:

for (c, next) in lit.value.chars().tuple_windows() {
// `\0` (or any other ASCII digit) and `\g` have special meaning in `repl` strings.
// Meanwhile, nearly all other escapes of ASCII letters in a `repl` string causes
// `re.PatternError` to be raised at runtime.
//
// If we see that the escaped character is an alphanumeric ASCII character,
// we should only emit a diagnostic suggesting to replace the `re.sub()` call with
// `str.replace`if we can detect that the escaped character is one that is both
// valid in a `repl` string *and* does not have any special meaning in a REPL string.
//
// It's out of scope for this rule to change invalid `re.sub()` calls into something
// that would not raise an exception at runtime. They should be left as-is.
if c == '\\' && next.is_ascii_alphanumeric() {
if "abfnrtv".contains(next) {
fixable = false;
} else {
return None;
}
}
}

It looks like we're handling a slightly different set of characters there: abfnrtv. b is present there but not here and x is here but not there, for example. Is that expected, or should the two be the same?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of like this str::contains check more than having to pass a closure. Then we could avoid calling the checker.target_version() repeatedly too, not that it's too expensive. For example:

let allowed = if checker.target_version() >= PythonVersion::PY38 {
    "afnrtuUvxN"
} else {
    "afnrtuUvx"
};

and for bytes it would just be "afnrtvx".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also this section in the docs (just above the linked node):

Octal escapes are included in a limited form. If the first digit is a 0, or if there are three octal digits, it is considered an octal escape. Otherwise, it is a group reference. As for string literals, octal escapes are always at most three digits in length.

This might be too niche to worry about, if it's relevant at all. I see there's a test case with an octal escape, so this seems intentional.

Copy link
Contributor Author

@robsdedude robsdedude Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of like this str::contains check more

If you prefer the str::contains approach for readability, I'll follow whatever you prefer as this is your code base to maintain. If you worry about performance: match is often faster. Pulling the checker.target_version() out of the closure is still possible and since Rust will monomorphize raw_applicability with the closure/anonymous function the compiler can optimize things to its heart's content (as opposed to receiving an opaque str and calling contains on it (see godbolt example).

Copy link
Contributor Author

@robsdedude robsdedude Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that expected, or should the two be the same?

I'm unsure. The linked code sure is different. If I'm interpreting it right, it's trying to go from re.sub to str.replace. While in this PR we're trying to go from re.some_func("some str") to re.some_func(r"some str"). So here we're always staying in the regex world. It's just a question of how many \es the string will end up having. My gut feeling is that those cases are not the same. But as said, unsure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be too niche to worry about

Yeah... I didn't want to go there 😅 I mean it wouldn't be hard to implement, but probably slightly less efficient because the search window would grow from 2 to 4 characters.

I'll leave it up to you to decide if you want me to go for it or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've convinced me :) let's stick with this version.

robsdedude and others added 3 commits July 22, 2025 18:33
Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>
Copy link
Contributor

@ntBre ntBre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@ntBre ntBre merged commit 1d21816 into astral-sh:main Jul 24, 2025
35 checks passed
@robsdedude robsdedude deleted the feat/ruf039-better-fixes branch July 24, 2025 18:33
UnboundVariable pushed a commit to UnboundVariable/ruff that referenced this pull request Jul 24, 2025
…hlight

* 'main' of https://github.com/astral-sh/ruff:
  [ty] Minor: fix incomplete docstring (astral-sh#19534)
  [ty] Move server tests as integration tests (astral-sh#19522)
  [`ruff`] Offer fixes for `RUF039` in more cases (astral-sh#19065)
  [ty] Support `dataclasses.InitVar` (astral-sh#19527)
  [`ruff`] Fix `RUF033` breaking with named default expressions (astral-sh#19115)
  Update pre-commit hook name (astral-sh#19530)
  Bump 0.12.5 (astral-sh#19528)
  [ty] Rename type_api => ty_extensions (astral-sh#19523)
  [ty] Added support for "go to references" in ty playground. (astral-sh#19516)
  [ty] Return a tuple spec from the iterator protocol (astral-sh#19496)
  [ty] Exhaustiveness checking & reachability for `match` statements (astral-sh#19508)
  [ty] Fix narrowing and reachability of class patterns with arguments (astral-sh#19512)
UnboundVariable pushed a commit to UnboundVariable/ruff that referenced this pull request Jul 24, 2025
* main:
  [ty] Added support for "document highlights" language server feature. (astral-sh#19515)
  Add support for specifying minimum dots in detected string imports (astral-sh#19538)
  [ty] Minor: fix incomplete docstring (astral-sh#19534)
  [ty] Move server tests as integration tests (astral-sh#19522)
  [`ruff`] Offer fixes for `RUF039` in more cases (astral-sh#19065)
  [ty] Support `dataclasses.InitVar` (astral-sh#19527)
  [`ruff`] Fix `RUF033` breaking with named default expressions (astral-sh#19115)
  Update pre-commit hook name (astral-sh#19530)
  Bump 0.12.5 (astral-sh#19528)
  [ty] Rename type_api => ty_extensions (astral-sh#19523)
  [ty] Added support for "go to references" in ty playground. (astral-sh#19516)

# Conflicts:
#	crates/ty_server/src/server/api/requests.rs
#	crates/ty_server/tests/e2e/snapshots/e2e__initialize__initialization.snap
#	crates/ty_server/tests/e2e/snapshots/e2e__initialize__initialization_with_workspace.snap
AlexWaygood pushed a commit that referenced this pull request Jul 25, 2025
## Summary
Expand cases in which ruff can offer a fix for `RUF039` (some of which
are unsafe).

While turning `"\n"` (== `\n`) into `r"\n"` (== `\\n`) is not equivalent
at run-time, it's still functionally equivalent to do so in the context
of [regex
patterns](https://docs.python.org/3/library/re.html#regular-expression-syntax)
as they themselves interpret the escape sequence. Therefore, an unsafe
fix can be offered.

Further, this PR also makes ruff offer fixes for byte string literals,
not only strings literals as before.

## Test Plan
Tests for all escape sequences have been added.

## Related
Closes: #16713

---------

Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fixes Related to suggested fixes for violations preview Related to preview mode features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Better diagnostics for RUF039

3 participants