[`ruff`] Offer fixes for `RUF039` in more cases #19065

robsdedude · 2025-07-01T07:58:44Z

Summary

Expand cases in which ruff can offer a fix for RUF039 (some of which are unsafe).

While turning "\n" (== \n) into r"\n" (== \\n) is not equivalent at run-time, it's still functionally equivalent to do so in the context of regex patterns as they themselves interpret the escape sequence. Therefore, an unsafe fix can be offered.

Further, this PR also makes ruff offer fixes for byte string literals, not only strings literals as before.

Test Plan

Tests for all escape sequences have been added.

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+0 -0 violations, +2 -0 fixes in 1 projects; 54 projects unchanged)

pytest-dev/pytest (+0 -0 violations, +2 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

- testing/python/metafunc.py:432:45: RUF039 First argument to `re.compile()` is not raw bytes literal
+ testing/python/metafunc.py:432:45: RUF039 [*] First argument to `re.compile()` is not raw bytes literal

Changes by rule (1 rules affected)

code	total	+ violation	- violation	+ fix	- fix
RUF039	2	0	0	2	0

ntBre

Thanks! I just had a couple of questions/suggestions around the allowed escape characters and an idea for the docs.

crates/ruff_linter/src/rules/ruff/rules/unraw_re_pattern.rs

ntBre · 2025-07-08T14:20:38Z

crates/ruff_linter/src/rules/ruff/rules/unraw_re_pattern.rs

+                return None;
+            }
+        }
+    }


This is some related code that I wrote for another rule:

ruff/crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs

Lines 191 to 210 in 2f8bc9a

for (c, next) in lit.value.chars().tuple_windows() {

// `\0` (or any other ASCII digit) and `\g` have special meaning in `repl` strings.

// Meanwhile, nearly all other escapes of ASCII letters in a `repl` string causes

// `re.PatternError` to be raised at runtime.

//

// If we see that the escaped character is an alphanumeric ASCII character,

// we should only emit a diagnostic suggesting to replace the `re.sub()` call with

// `str.replace`if we can detect that the escaped character is one that is both

// valid in a `repl` string *and* does not have any special meaning in a REPL string.

//

// It's out of scope for this rule to change invalid `re.sub()` calls into something

// that would not raise an exception at runtime. They should be left as-is.

if c == '\\' && next.is_ascii_alphanumeric() {

if "abfnrtv".contains(next) {

fixable = false;

} else {

return None;

}

}

}

It looks like we're handling a slightly different set of characters there: abfnrtv. b is present there but not here and x is here but not there, for example. Is that expected, or should the two be the same?

I kind of like this str::contains check more than having to pass a closure. Then we could avoid calling the checker.target_version() repeatedly too, not that it's too expensive. For example:

let allowed = if checker.target_version() >= PythonVersion::PY38 { "afnrtuUvxN" } else { "afnrtuUvx" };

and for bytes it would just be "afnrtvx".

There's also this section in the docs (just above the linked node):

Octal escapes are included in a limited form. If the first digit is a 0, or if there are three octal digits, it is considered an octal escape. Otherwise, it is a group reference. As for string literals, octal escapes are always at most three digits in length.

This might be too niche to worry about, if it's relevant at all. I see there's a test case with an octal escape, so this seems intentional.

I kind of like this str::contains check more

If you prefer the str::contains approach for readability, I'll follow whatever you prefer as this is your code base to maintain. If you worry about performance: match is often faster. Pulling the checker.target_version() out of the closure is still possible and since Rust will monomorphize raw_applicability with the closure/anonymous function the compiler can optimize things to its heart's content (as opposed to receiving an opaque str and calling contains on it (see godbolt example).

Is that expected, or should the two be the same?

I'm unsure. The linked code sure is different. If I'm interpreting it right, it's trying to go from re.sub to str.replace. While in this PR we're trying to go from re.some_func("some str") to re.some_func(r"some str"). So here we're always staying in the regex world. It's just a question of how many \es the string will end up having. My gut feeling is that those cases are not the same. But as said, unsure.

This might be too niche to worry about

Yeah... I didn't want to go there 😅 I mean it wouldn't be hard to implement, but probably slightly less efficient because the search window would grow from 2 to 4 characters.

I'll leave it up to you to decide if you want me to go for it or not.

You've convinced me :) let's stick with this version.

crates/ruff_linter/src/rules/ruff/rules/unraw_re_pattern.rs

Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>

ntBre

Thanks!

…hlight * 'main' of https://github.com/astral-sh/ruff: [ty] Minor: fix incomplete docstring (astral-sh#19534) [ty] Move server tests as integration tests (astral-sh#19522) [`ruff`] Offer fixes for `RUF039` in more cases (astral-sh#19065) [ty] Support `dataclasses.InitVar` (astral-sh#19527) [`ruff`] Fix `RUF033` breaking with named default expressions (astral-sh#19115) Update pre-commit hook name (astral-sh#19530) Bump 0.12.5 (astral-sh#19528) [ty] Rename type_api => ty_extensions (astral-sh#19523) [ty] Added support for "go to references" in ty playground. (astral-sh#19516) [ty] Return a tuple spec from the iterator protocol (astral-sh#19496) [ty] Exhaustiveness checking & reachability for `match` statements (astral-sh#19508) [ty] Fix narrowing and reachability of class patterns with arguments (astral-sh#19512)

* main: [ty] Added support for "document highlights" language server feature. (astral-sh#19515) Add support for specifying minimum dots in detected string imports (astral-sh#19538) [ty] Minor: fix incomplete docstring (astral-sh#19534) [ty] Move server tests as integration tests (astral-sh#19522) [`ruff`] Offer fixes for `RUF039` in more cases (astral-sh#19065) [ty] Support `dataclasses.InitVar` (astral-sh#19527) [`ruff`] Fix `RUF033` breaking with named default expressions (astral-sh#19115) Update pre-commit hook name (astral-sh#19530) Bump 0.12.5 (astral-sh#19528) [ty] Rename type_api => ty_extensions (astral-sh#19523) [ty] Added support for "go to references" in ty playground. (astral-sh#19516) # Conflicts: # crates/ty_server/src/server/api/requests.rs # crates/ty_server/tests/e2e/snapshots/e2e__initialize__initialization.snap # crates/ty_server/tests/e2e/snapshots/e2e__initialize__initialization_with_workspace.snap

## Summary Expand cases in which ruff can offer a fix for `RUF039` (some of which are unsafe). While turning `"\n"` (== `\n`) into `r"\n"` (== `\\n`) is not equivalent at run-time, it's still functionally equivalent to do so in the context of [regex patterns](https://docs.python.org/3/library/re.html#regular-expression-syntax) as they themselves interpret the escape sequence. Therefore, an unsafe fix can be offered. Further, this PR also makes ruff offer fixes for byte string literals, not only strings literals as before. ## Test Plan Tests for all escape sequences have been added. ## Related Closes: #16713 --------- Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>

[ruff] Offer fixes for RUF039 in more cases

a7bfccb

robsdedude force-pushed the feat/ruf039-better-fixes branch from 2293a91 to a7bfccb Compare July 1, 2025 08:27

robsdedude marked this pull request as ready for review July 1, 2025 08:40

MichaReiser requested a review from ntBre July 7, 2025 12:57

MichaReiser added the fixes Related to suggested fixes for violations label Jul 7, 2025

ntBre added the preview Related to preview mode features label Jul 8, 2025

ntBre reviewed Jul 8, 2025

View reviewed changes

robsdedude and others added 3 commits July 22, 2025 18:33

Docs: add fix availability section

b22f01f

Pull target_version out of the loop

e899ffb

Fix typos

7b35d04

Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>

ntBre approved these changes Jul 24, 2025

View reviewed changes

ntBre merged commit 1d21816 into astral-sh:main Jul 24, 2025
35 checks passed

ntBre mentioned this pull request Jul 24, 2025

Better diagnostics for RUF039 #16713

Closed

robsdedude deleted the feat/ruf039-better-fixes branch July 24, 2025 18:33

BrewTestBot mentioned this pull request Jul 29, 2025

ruff 0.12.7 Homebrew/homebrew-core#231689

Merged

	for (c, next) in lit.value.chars().tuple_windows() {
	// `\0` (or any other ASCII digit) and `\g` have special meaning in `repl` strings.
	// Meanwhile, nearly all other escapes of ASCII letters in a `repl` string causes
	// `re.PatternError` to be raised at runtime.
	//
	// If we see that the escaped character is an alphanumeric ASCII character,
	// we should only emit a diagnostic suggesting to replace the `re.sub()` call with
	// `str.replace`if we can detect that the escaped character is one that is both
	// valid in a `repl` string and does not have any special meaning in a REPL string.
	//
	// It's out of scope for this rule to change invalid `re.sub()` calls into something
	// that would not raise an exception at runtime. They should be left as-is.
	if c == '\\' && next.is_ascii_alphanumeric() {
	if "abfnrtv".contains(next) {
	fixable = false;
	} else {
	return None;
	}
	}
	}

[ruff] Offer fixes for RUF039 in more cases #19065

[ruff] Offer fixes for RUF039 in more cases #19065

Uh oh!

Conversation

robsdedude commented Jul 1, 2025

Summary

Test Plan

Related

Uh oh!

github-actions bot commented Jul 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ruff-ecosystem results

Linter (stable)

Linter (preview)

Uh oh!

ntBre left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ntBre Jul 8, 2025

Choose a reason for hiding this comment

Uh oh!

ntBre Jul 8, 2025

Choose a reason for hiding this comment

Uh oh!

ntBre Jul 8, 2025

Choose a reason for hiding this comment

Uh oh!

robsdedude Jul 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

robsdedude Jul 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

robsdedude Jul 22, 2025

Choose a reason for hiding this comment

Uh oh!

ntBre Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ntBre left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[`ruff`] Offer fixes for `RUF039` in more cases #19065

[`ruff`] Offer fixes for `RUF039` in more cases #19065

github-actions bot commented Jul 1, 2025 •

edited

Loading

`ruff-ecosystem` results

robsdedude Jul 22, 2025 •

edited

Loading

robsdedude Jul 22, 2025 •

edited

Loading