-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[ruff] Offer fixes for RUF039 in more cases
#19065
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
| code | total | + violation | - violation | + fix | - fix |
|---|---|---|---|---|---|
| RUF039 | 2 | 0 | 0 | 2 | 0 |
2293a91 to
a7bfccb
Compare
ntBre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I just had a couple of questions/suggestions around the allowed escape characters and an idea for the docs.
| return None; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is some related code that I wrote for another rule:
ruff/crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Lines 191 to 210 in 2f8bc9a
| for (c, next) in lit.value.chars().tuple_windows() { | |
| // `\0` (or any other ASCII digit) and `\g` have special meaning in `repl` strings. | |
| // Meanwhile, nearly all other escapes of ASCII letters in a `repl` string causes | |
| // `re.PatternError` to be raised at runtime. | |
| // | |
| // If we see that the escaped character is an alphanumeric ASCII character, | |
| // we should only emit a diagnostic suggesting to replace the `re.sub()` call with | |
| // `str.replace`if we can detect that the escaped character is one that is both | |
| // valid in a `repl` string *and* does not have any special meaning in a REPL string. | |
| // | |
| // It's out of scope for this rule to change invalid `re.sub()` calls into something | |
| // that would not raise an exception at runtime. They should be left as-is. | |
| if c == '\\' && next.is_ascii_alphanumeric() { | |
| if "abfnrtv".contains(next) { | |
| fixable = false; | |
| } else { | |
| return None; | |
| } | |
| } | |
| } |
It looks like we're handling a slightly different set of characters there: abfnrtv. b is present there but not here and x is here but not there, for example. Is that expected, or should the two be the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of like this str::contains check more than having to pass a closure. Then we could avoid calling the checker.target_version() repeatedly too, not that it's too expensive. For example:
let allowed = if checker.target_version() >= PythonVersion::PY38 {
"afnrtuUvxN"
} else {
"afnrtuUvx"
};and for bytes it would just be "afnrtvx".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also this section in the docs (just above the linked node):
Octal escapes are included in a limited form. If the first digit is a 0, or if there are three octal digits, it is considered an octal escape. Otherwise, it is a group reference. As for string literals, octal escapes are always at most three digits in length.
This might be too niche to worry about, if it's relevant at all. I see there's a test case with an octal escape, so this seems intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of like this
str::containscheck more
If you prefer the str::contains approach for readability, I'll follow whatever you prefer as this is your code base to maintain. If you worry about performance: match is often faster. Pulling the checker.target_version() out of the closure is still possible and since Rust will monomorphize raw_applicability with the closure/anonymous function the compiler can optimize things to its heart's content (as opposed to receiving an opaque str and calling contains on it (see godbolt example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that expected, or should the two be the same?
I'm unsure. The linked code sure is different. If I'm interpreting it right, it's trying to go from re.sub to str.replace. While in this PR we're trying to go from re.some_func("some str") to re.some_func(r"some str"). So here we're always staying in the regex world. It's just a question of how many \es the string will end up having. My gut feeling is that those cases are not the same. But as said, unsure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be too niche to worry about
Yeah... I didn't want to go there 😅 I mean it wouldn't be hard to implement, but probably slightly less efficient because the search window would grow from 2 to 4 characters.
I'll leave it up to you to decide if you want me to go for it or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've convinced me :) let's stick with this version.
ntBre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
…hlight * 'main' of https://github.com/astral-sh/ruff: [ty] Minor: fix incomplete docstring (astral-sh#19534) [ty] Move server tests as integration tests (astral-sh#19522) [`ruff`] Offer fixes for `RUF039` in more cases (astral-sh#19065) [ty] Support `dataclasses.InitVar` (astral-sh#19527) [`ruff`] Fix `RUF033` breaking with named default expressions (astral-sh#19115) Update pre-commit hook name (astral-sh#19530) Bump 0.12.5 (astral-sh#19528) [ty] Rename type_api => ty_extensions (astral-sh#19523) [ty] Added support for "go to references" in ty playground. (astral-sh#19516) [ty] Return a tuple spec from the iterator protocol (astral-sh#19496) [ty] Exhaustiveness checking & reachability for `match` statements (astral-sh#19508) [ty] Fix narrowing and reachability of class patterns with arguments (astral-sh#19512)
* main: [ty] Added support for "document highlights" language server feature. (astral-sh#19515) Add support for specifying minimum dots in detected string imports (astral-sh#19538) [ty] Minor: fix incomplete docstring (astral-sh#19534) [ty] Move server tests as integration tests (astral-sh#19522) [`ruff`] Offer fixes for `RUF039` in more cases (astral-sh#19065) [ty] Support `dataclasses.InitVar` (astral-sh#19527) [`ruff`] Fix `RUF033` breaking with named default expressions (astral-sh#19115) Update pre-commit hook name (astral-sh#19530) Bump 0.12.5 (astral-sh#19528) [ty] Rename type_api => ty_extensions (astral-sh#19523) [ty] Added support for "go to references" in ty playground. (astral-sh#19516) # Conflicts: # crates/ty_server/src/server/api/requests.rs # crates/ty_server/tests/e2e/snapshots/e2e__initialize__initialization.snap # crates/ty_server/tests/e2e/snapshots/e2e__initialize__initialization_with_workspace.snap
## Summary Expand cases in which ruff can offer a fix for `RUF039` (some of which are unsafe). While turning `"\n"` (== `\n`) into `r"\n"` (== `\\n`) is not equivalent at run-time, it's still functionally equivalent to do so in the context of [regex patterns](https://docs.python.org/3/library/re.html#regular-expression-syntax) as they themselves interpret the escape sequence. Therefore, an unsafe fix can be offered. Further, this PR also makes ruff offer fixes for byte string literals, not only strings literals as before. ## Test Plan Tests for all escape sequences have been added. ## Related Closes: #16713 --------- Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>
Summary
Expand cases in which ruff can offer a fix for
RUF039(some of which are unsafe).While turning
"\n"(==\n) intor"\n"(==\\n) is not equivalent at run-time, it's still functionally equivalent to do so in the context of regex patterns as they themselves interpret the escape sequence. Therefore, an unsafe fix can be offered.Further, this PR also makes ruff offer fixes for byte string literals, not only strings literals as before.
Test Plan
Tests for all escape sequences have been added.
Related
Closes: #16713