Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ForVariableUsedAfterBlock (based on wemake_python_styleguide ControlVarUsedAfterBlockViolation WPS441) #11769

Open
wants to merge 38 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 29 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
c1091a1
Rough start
MadLittleMods May 28, 2024
261fbc3
Rough run-able
MadLittleMods May 28, 2024
8bed302
Lookup by reference
MadLittleMods May 28, 2024
e85fe87
Add todo
MadLittleMods May 28, 2024
eecc2af
Checkpoint working with Olivier
MadLittleMods May 29, 2024
3661066
Work in all scopes
MadLittleMods Jun 5, 2024
59a2e3b
Clean up code
MadLittleMods Jun 5, 2024
0e491b3
Adjust some wording
MadLittleMods Jun 5, 2024
9ca72a6
Move to wemake_python_styleguide ruleset
MadLittleMods Jun 5, 2024
0ad0df0
Commit test snapshot
MadLittleMods Jun 5, 2024
eaf2a35
Fix shadowed cases
MadLittleMods Jun 6, 2024
710d68c
Clean up debug
MadLittleMods Jun 6, 2024
a7de1cb
Fix references before
MadLittleMods Jun 6, 2024
f2a62bb
Adjust wording
MadLittleMods Jun 6, 2024
97b165c
Update snapshot
MadLittleMods Jun 6, 2024
7ceb306
Move to Ruff (RUF) ruleset
MadLittleMods Jul 30, 2024
acc5b2a
Merge branch 'main' into madlittlemods/control-var-used-after-block
MadLittleMods Jul 30, 2024
f5aa633
Merge branch 'main' into madlittlemods/control-var-used-after-block
MadLittleMods Jul 30, 2024
026a48f
Use preview group
MadLittleMods Aug 2, 2024
544bbf5
Separate out `in` expression
MadLittleMods Aug 2, 2024
2d13a62
Better comment to explain why source-order
MadLittleMods Aug 2, 2024
1d18c66
No need to `collect()`
MadLittleMods Aug 2, 2024
7edd539
Rename to `ForVariableUsedAfterBlock`
MadLittleMods Aug 2, 2024
5351952
Remove whitespace change
MadLittleMods Aug 2, 2024
19a0ab9
Nit: Use labeled loops
MadLittleMods Aug 2, 2024
9e2ece4
No need for `BlockKind` anymore
MadLittleMods Aug 2, 2024
d675a03
Use `statement_id(...)` inside `statement(...)`
MadLittleMods Aug 2, 2024
71ca5e1
Fix lint warnings about unused imports
MadLittleMods Aug 2, 2024
8a11073
Add nested function/class example
MadLittleMods Aug 2, 2024
e59efa2
Remove `known_good_reference_node_ids` and iterating bindings in reverse
MadLittleMods Aug 9, 2024
5a76166
Get `binding_statement` in smarter way that avoids `unwrap()`
MadLittleMods Aug 9, 2024
106e78a
Better document why when `binding.statement(...)` can be `None`
MadLittleMods Aug 9, 2024
8b3e4de
Test augmented assignment
MadLittleMods Aug 9, 2024
59427f1
Reword
MadLittleMods Aug 9, 2024
63ecbfc
Explain more `unwrap()`
MadLittleMods Aug 9, 2024
27dad4f
Update snapshot
MadLittleMods Aug 9, 2024
e7653df
Merge branch 'main' into madlittlemods/control-var-used-after-block
MadLittleMods Aug 9, 2024
9bcc4fe
Fix snapshot after bumping rule number
MadLittleMods Aug 9, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
for global_var in []:
_ = global_var
pass

# ❌
_ = global_var

def foo():
# For control var used outside block
for event in []:
_ = event
pass

# ❌
_ = event

for x in []:
_ = x
pass

# Using the same control variable name in a different loop is ok
for x in []:
_ = x
pass

for y in []:
# ❌ x is used outside of the loop it was defined in (meant to use y)
if x == 5:
pass

# Assign a variable before the loop
room_id = 3
_ = room_id
# Use the same variable name in a loop
for room_id in []:
_ = room_id
pass

# ❌ After the loop is not ok because the value is probably not what you expect
_ = room_id

# Tuple destructuring
for a, b, c in []:
_ = a
_ = b
_ = c
pass

# ❌
_ = a
_ = b
_ = c

# Array destructuring
for [d, e, f] in []:
_ = d
_ = e
_ = f
pass

# ❌
_ = d
_ = e
_ = f

# Nested function and class definitions are fine
for potential_power in []:
def action():
print(potential_power)

class Animal:
power_level = potential_power

Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ pub(crate) fn deferred_scopes(checker: &mut Checker) {
Rule::AsyncioDanglingTask,
Rule::BadStaticmethodArgument,
Rule::BuiltinAttributeShadowing,
Rule::ForVariableUsedAfterBlock,
Rule::GlobalVariableNotAssigned,
Rule::ImportPrivateName,
Rule::ImportShadowedByLoopVar,
Expand Down Expand Up @@ -86,6 +87,10 @@ pub(crate) fn deferred_scopes(checker: &mut Checker) {
for scope_id in checker.analyze.scopes.iter().rev().copied() {
let scope = &checker.semantic.scopes[scope_id];

if checker.enabled(Rule::ForVariableUsedAfterBlock) {
ruff::rules::for_variable_used_after_block(checker, scope, &mut diagnostics);
}

if checker.enabled(Rule::UndefinedLocal) {
pyflakes::rules::undefined_local(checker, scope_id, scope, &mut diagnostics);
}
Expand Down
1 change: 1 addition & 0 deletions crates/ruff_linter/src/codes.rs
Original file line number Diff line number Diff line change
Expand Up @@ -955,6 +955,7 @@ pub fn code_to_rule(linter: Linter, code: &str) -> Option<(RuleGroup, Rule)> {
(Ruff, "028") => (RuleGroup::Preview, rules::ruff::rules::InvalidFormatterSuppressionComment),
(Ruff, "029") => (RuleGroup::Preview, rules::ruff::rules::UnusedAsync),
(Ruff, "030") => (RuleGroup::Preview, rules::ruff::rules::AssertWithPrintMessage),
(Ruff, "031") => (RuleGroup::Preview, rules::ruff::rules::ForVariableUsedAfterBlock),
(Ruff, "100") => (RuleGroup::Stable, rules::ruff::rules::UnusedNOQA),
(Ruff, "101") => (RuleGroup::Preview, rules::ruff::rules::RedirectedNOQA),

Expand Down
4 changes: 4 additions & 0 deletions crates/ruff_linter/src/rules/ruff/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,10 @@ mod tests {

#[test_case(Rule::AsyncioDanglingTask, Path::new("RUF006.py"))]
#[test_case(Rule::CollectionLiteralConcatenation, Path::new("RUF005.py"))]
#[test_case(
Rule::ForVariableUsedAfterBlock,
Path::new("for_variable_used_after_block.py")
)]
#[test_case(Rule::ExplicitFStringTypeConversion, Path::new("RUF010.py"))]
#[test_case(Rule::FunctionCallInDataclassDefaultArgument, Path::new("RUF009.py"))]
#[test_case(Rule::ImplicitOptional, Path::new("RUF013_0.py"))]
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
use ruff_diagnostics::{Diagnostic, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_semantic::{BindingId, NodeId, Scope};
use ruff_text_size::Ranged;

use crate::checkers::ast::Checker;

/// ## What it does
/// Checks for variables defined in `for` loops that are used outside of their
/// respective blocks.
///
/// ## Why is this bad?
/// Usage of of a control variable outside of the block they're defined in will probably
/// lead to flawed logic in a way that will likely cause bugs. The variable might not
/// contain what you expect.
///
/// In Python, unlike many other languages, `for` loops don't define their own scopes.
/// Therefore, usage of the control variables outside of the block will be the the value
/// from the last iteration until re-assigned.
///
/// While this mistake is easy to spot in small examples, it can be hidden in larger
/// blocks of code, where the the loop and downstream usage may not be visible at the
/// same time.
///
/// ## Example
/// ```python
/// for x in range(10):
/// pass
///
/// print(x) # prints 9
/// ```
#[violation]
pub struct ForVariableUsedAfterBlock {
control_var_name: String,
}

impl Violation for ForVariableUsedAfterBlock {
#[derive_message_formats]
fn message(&self) -> String {
let ForVariableUsedAfterBlock { control_var_name } = self;

format!("`for` loop variable {control_var_name} is used outside of block")
}
}

/// Based on wemake-python-styleguide (WPS441) to forbid control variables after the block body.
pub(crate) fn for_variable_used_after_block(
checker: &Checker,
scope: &Scope,
diagnostics: &mut Vec<Diagnostic>,
) {
// Keep track of the node_ids of variable references that have already been
// accounted for as part of a different variable binding. This helps us avoid
// complaining when people use the same variable name in multiple blocks under the
// same scope.
let mut known_good_reference_node_ids: Vec<NodeId> = Vec::new();

let all_bindings: Vec<(&str, BindingId)> = scope.all_bindings().collect();
// We want to reverse the bindings so that we iterate in source order and shadowed
// bindings come first. This way we can gather `known_good_reference_node_ids` and
// mark off things as we go which will allow us to ignore later bindings trying to
// reference the same variable.
let reversed_bindings = all_bindings.iter().rev();
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved

// Find for-loop variable bindings
let loop_var_bindings = reversed_bindings
.map(|(name, binding_id)| (name, checker.semantic().binding(*binding_id)))
.filter_map(|(name, binding)| binding.kind.is_loop_var().then_some((name, binding)));

for (&name, binding) in loop_var_bindings {
let binding_statement = binding.statement(checker.semantic()).unwrap();
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
let binding_source_node_id = binding.source.unwrap();
// The node_id of the for-loop that contains the binding
let binding_statement_id = checker.semantic().statement_id(binding_source_node_id);

MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
// Loop over the references of those bindings to see if they're in the same block-scope
'references: for reference in binding.references() {
let reference = checker.semantic().reference(reference);
let reference_node_id = reference.expression_id().unwrap();
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved

// Skip any reference that come before the control var binding in the source
// order, skip it because people can assign and use the same variable name
// above the block.
if reference.range().end() < binding_statement.range().start() {
continue;
}

// Traverse the hierarchy and look for a block match
let statement_hierarchy = checker.semantic().parent_statement_ids(reference_node_id);

for ancestor_node_id in statement_hierarchy {
if binding_statement_id == ancestor_node_id {
known_good_reference_node_ids.push(reference_node_id);
continue 'references;
}
}

// If the reference wasn't used in the same block, report a violation/diagnostic
if !known_good_reference_node_ids.contains(&reference_node_id) {
diagnostics.push(Diagnostic::new(
ForVariableUsedAfterBlock {
control_var_name: name.to_owned(),
},
reference.range(),
));
}
}
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
}
}
2 changes: 2 additions & 0 deletions crates/ruff_linter/src/rules/ruff/rules/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ pub(crate) use asyncio_dangling_task::*;
pub(crate) use collection_literal_concatenation::*;
pub(crate) use default_factory_kwarg::*;
pub(crate) use explicit_f_string_type_conversion::*;
pub(crate) use for_variable_used_after_block::*;
pub(crate) use function_call_in_dataclass_default::*;
pub(crate) use implicit_optional::*;
pub(crate) use invalid_formatter_suppression_comment::*;
Expand Down Expand Up @@ -38,6 +39,7 @@ mod collection_literal_concatenation;
mod confusables;
mod default_factory_kwarg;
mod explicit_f_string_type_conversion;
mod for_variable_used_after_block;
mod function_call_in_dataclass_default;
mod helpers;
mod implicit_optional;
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
---
source: crates/ruff_linter/src/rules/ruff/mod.rs
---
for_variable_used_after_block.py:6:5: RUF031 `for` loop variable global_var is used outside of block
|
5 | # ❌
6 | _ = global_var
| ^^^^^^^^^^ RUF031
7 |
8 | def foo():
|

for_variable_used_after_block.py:15:9: RUF031 `for` loop variable event is used outside of block
|
14 | # ❌
15 | _ = event
| ^^^^^ RUF031
16 |
17 | for x in []:
|

for_variable_used_after_block.py:28:12: RUF031 `for` loop variable x is used outside of block
|
26 | for y in []:
27 | # ❌ x is used outside of the loop it was defined in (meant to use y)
28 | if x == 5:
| ^ RUF031
29 | pass
|

for_variable_used_after_block.py:40:9: RUF031 `for` loop variable room_id is used outside of block
|
39 | # ❌ After the loop is not ok because the value is probably not what you expect
40 | _ = room_id
| ^^^^^^^ RUF031
41 |
42 | # Tuple destructuring
|

for_variable_used_after_block.py:50:9: RUF031 `for` loop variable a is used outside of block
|
49 | # ❌
50 | _ = a
| ^ RUF031
51 | _ = b
52 | _ = c
|

for_variable_used_after_block.py:51:9: RUF031 `for` loop variable b is used outside of block
|
49 | # ❌
50 | _ = a
51 | _ = b
| ^ RUF031
52 | _ = c
|

for_variable_used_after_block.py:52:9: RUF031 `for` loop variable c is used outside of block
|
50 | _ = a
51 | _ = b
52 | _ = c
| ^ RUF031
53 |
54 | # Array destructuring
|

for_variable_used_after_block.py:62:9: RUF031 `for` loop variable d is used outside of block
|
61 | # ❌
62 | _ = d
| ^ RUF031
63 | _ = e
64 | _ = f
|

for_variable_used_after_block.py:63:9: RUF031 `for` loop variable e is used outside of block
|
61 | # ❌
62 | _ = d
63 | _ = e
| ^ RUF031
64 | _ = f
|

for_variable_used_after_block.py:64:9: RUF031 `for` loop variable f is used outside of block
|
62 | _ = d
63 | _ = e
64 | _ = f
| ^ RUF031
65 |
66 | # Nested function and class definitions are fine
|
17 changes: 16 additions & 1 deletion crates/ruff_python_semantic/src/model.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1201,9 +1201,17 @@ impl<'a> SemanticModel<'a> {
/// Return the [`Stmt`] corresponding to the given [`NodeId`].
#[inline]
pub fn statement(&self, node_id: NodeId) -> &'a Stmt {
self.nodes[self.statement_id(node_id)]
.as_statement()
.expect("No statement found")
}

/// Return the statement [`NodeId`] corresponding to the given [`NodeId`].
#[inline]
pub fn statement_id(&self, node_id: NodeId) -> NodeId {
self.nodes
.ancestor_ids(node_id)
.find_map(|id| self.nodes[id].as_statement())
.find(|id| self.nodes[*id].is_statement())
.expect("No statement found")
}

Expand Down Expand Up @@ -1232,6 +1240,13 @@ impl<'a> SemanticModel<'a> {
.nth(1)
}

/// Given a [`NodeId`], return the [`NodeId`] of the parent statement, if any.
pub fn parent_statement_ids(&self, node_id: NodeId) -> impl Iterator<Item = NodeId> + '_ {
self.nodes
.ancestor_ids(node_id)
.filter(|id| self.nodes[*id].is_statement())
}

/// Return the [`Expr`] corresponding to the given [`NodeId`].
#[inline]
pub fn expression(&self, node_id: NodeId) -> Option<&'a Expr> {
Expand Down
1 change: 1 addition & 0 deletions ruff.schema.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading