Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ForVariableUsedAfterBlock (based on wemake_python_styleguide ControlVarUsedAfterBlockViolation WPS441) #11769

Open
wants to merge 38 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
c1091a1
Rough start
MadLittleMods May 28, 2024
261fbc3
Rough run-able
MadLittleMods May 28, 2024
8bed302
Lookup by reference
MadLittleMods May 28, 2024
e85fe87
Add todo
MadLittleMods May 28, 2024
eecc2af
Checkpoint working with Olivier
MadLittleMods May 29, 2024
3661066
Work in all scopes
MadLittleMods Jun 5, 2024
59a2e3b
Clean up code
MadLittleMods Jun 5, 2024
0e491b3
Adjust some wording
MadLittleMods Jun 5, 2024
9ca72a6
Move to wemake_python_styleguide ruleset
MadLittleMods Jun 5, 2024
0ad0df0
Commit test snapshot
MadLittleMods Jun 5, 2024
eaf2a35
Fix shadowed cases
MadLittleMods Jun 6, 2024
710d68c
Clean up debug
MadLittleMods Jun 6, 2024
a7de1cb
Fix references before
MadLittleMods Jun 6, 2024
f2a62bb
Adjust wording
MadLittleMods Jun 6, 2024
97b165c
Update snapshot
MadLittleMods Jun 6, 2024
7ceb306
Move to Ruff (RUF) ruleset
MadLittleMods Jul 30, 2024
acc5b2a
Merge branch 'main' into madlittlemods/control-var-used-after-block
MadLittleMods Jul 30, 2024
f5aa633
Merge branch 'main' into madlittlemods/control-var-used-after-block
MadLittleMods Jul 30, 2024
026a48f
Use preview group
MadLittleMods Aug 2, 2024
544bbf5
Separate out `in` expression
MadLittleMods Aug 2, 2024
2d13a62
Better comment to explain why source-order
MadLittleMods Aug 2, 2024
1d18c66
No need to `collect()`
MadLittleMods Aug 2, 2024
7edd539
Rename to `ForVariableUsedAfterBlock`
MadLittleMods Aug 2, 2024
5351952
Remove whitespace change
MadLittleMods Aug 2, 2024
19a0ab9
Nit: Use labeled loops
MadLittleMods Aug 2, 2024
9e2ece4
No need for `BlockKind` anymore
MadLittleMods Aug 2, 2024
d675a03
Use `statement_id(...)` inside `statement(...)`
MadLittleMods Aug 2, 2024
71ca5e1
Fix lint warnings about unused imports
MadLittleMods Aug 2, 2024
8a11073
Add nested function/class example
MadLittleMods Aug 2, 2024
e59efa2
Remove `known_good_reference_node_ids` and iterating bindings in reverse
MadLittleMods Aug 9, 2024
5a76166
Get `binding_statement` in smarter way that avoids `unwrap()`
MadLittleMods Aug 9, 2024
106e78a
Better document why when `binding.statement(...)` can be `None`
MadLittleMods Aug 9, 2024
8b3e4de
Test augmented assignment
MadLittleMods Aug 9, 2024
59427f1
Reword
MadLittleMods Aug 9, 2024
63ecbfc
Explain more `unwrap()`
MadLittleMods Aug 9, 2024
27dad4f
Update snapshot
MadLittleMods Aug 9, 2024
e7653df
Merge branch 'main' into madlittlemods/control-var-used-after-block
MadLittleMods Aug 9, 2024
9bcc4fe
Fix snapshot after bumping rule number
MadLittleMods Aug 9, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
for global_var in []:
_ = global_var
pass

# ❌
_ = global_var

def foo():
# For control var used outside block
for event in []:
_ = event
pass

# ❌
_ = event

for x in []:
_ = x
pass

# Using the same control variable name in a different loop is ok
for x in []:
_ = x
pass

for y in []:
# ❌ x is used outside of the loop it was defined in (meant to use y)
if x == 5:
pass

# Assign a variable before the loop
room_id = 3
_ = room_id
# Use the same variable name in a loop
for room_id in []:
_ = room_id
pass

# ❌ After the loop is not ok because the value is probably not what you expect
_ = room_id

# Tuple destructuring
for a, b, c in []:
_ = a
_ = b
_ = c
pass

# ❌
_ = a
_ = b
_ = c

# Array destructuring
for [d, e, f] in []:
_ = d
_ = e
_ = f
pass

# ❌
_ = d
_ = e
_ = f


Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ pub(crate) fn deferred_scopes(checker: &mut Checker) {
Rule::AsyncioDanglingTask,
Rule::BadStaticmethodArgument,
Rule::BuiltinAttributeShadowing,
Rule::ControlVarUsedAfterBlock,
Rule::GlobalVariableNotAssigned,
Rule::ImportPrivateName,
Rule::ImportShadowedByLoopVar,
Expand Down Expand Up @@ -86,6 +87,10 @@ pub(crate) fn deferred_scopes(checker: &mut Checker) {
for scope_id in checker.analyze.scopes.iter().rev().copied() {
let scope = &checker.semantic.scopes[scope_id];

if checker.enabled(Rule::ControlVarUsedAfterBlock) {
ruff::rules::control_var_used_after_block(checker, scope, &mut diagnostics);
}

if checker.enabled(Rule::UndefinedLocal) {
pyflakes::rules::undefined_local(checker, scope_id, scope, &mut diagnostics);
}
Expand Down
3 changes: 2 additions & 1 deletion crates/ruff_linter/src/codes.rs
Original file line number Diff line number Diff line change
Expand Up @@ -955,6 +955,7 @@ pub fn code_to_rule(linter: Linter, code: &str) -> Option<(RuleGroup, Rule)> {
(Ruff, "028") => (RuleGroup::Preview, rules::ruff::rules::InvalidFormatterSuppressionComment),
(Ruff, "029") => (RuleGroup::Preview, rules::ruff::rules::UnusedAsync),
(Ruff, "030") => (RuleGroup::Preview, rules::ruff::rules::AssertWithPrintMessage),
(Ruff, "031") => (RuleGroup::Stable, rules::ruff::rules::ControlVarUsedAfterBlock),
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
(Ruff, "100") => (RuleGroup::Stable, rules::ruff::rules::UnusedNOQA),
(Ruff, "101") => (RuleGroup::Preview, rules::ruff::rules::RedirectedNOQA),

Expand Down Expand Up @@ -1067,7 +1068,7 @@ pub fn code_to_rule(linter: Linter, code: &str) -> Option<(RuleGroup, Rule)> {
(Flake8Logging, "002") => (RuleGroup::Stable, rules::flake8_logging::rules::InvalidGetLoggerArgument),
(Flake8Logging, "007") => (RuleGroup::Stable, rules::flake8_logging::rules::ExceptionWithoutExcInfo),
(Flake8Logging, "009") => (RuleGroup::Stable, rules::flake8_logging::rules::UndocumentedWarn),

_ => return None,
})
}
4 changes: 4 additions & 0 deletions crates/ruff_linter/src/rules/ruff/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,10 @@ mod tests {

#[test_case(Rule::AsyncioDanglingTask, Path::new("RUF006.py"))]
#[test_case(Rule::CollectionLiteralConcatenation, Path::new("RUF005.py"))]
#[test_case(
Rule::ControlVarUsedAfterBlock,
Path::new("control_var_used_after_block.py")
)]
#[test_case(Rule::ExplicitFStringTypeConversion, Path::new("RUF010.py"))]
#[test_case(Rule::FunctionCallInDataclassDefaultArgument, Path::new("RUF009.py"))]
#[test_case(Rule::ImplicitOptional, Path::new("RUF013_0.py"))]
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,143 @@
use std::fmt;

use ruff_diagnostics::{Diagnostic, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::Stmt;
use ruff_python_semantic::{BindingId, NodeId, Scope};
use ruff_text_size::Ranged;

use crate::checkers::ast::Checker;

/// ## What it does
/// Checks for variables defined in `for` loops that are used outside of their
/// respective blocks.
///
/// ## Why is this bad?
/// Usage of of a control variable outside of the block they're defined in will probably
/// lead to flawed logic in a way that will likely cause bugs. The variable might not
/// contain what you expect.
///
/// In Python, unlike many other languages, `for` loops don't define their own scopes.
/// Therefore, usage of the control variables outside of the block will be the the value
/// from the last iteration until re-assigned.
///
/// While this mistake is easy to spot in small examples, it can be hidden in larger
/// blocks of code, where the the loop and downstream usage may not be visible at the
/// same time.
///
/// ## Example
/// ```python
/// for x in range(10):
/// pass
///
/// print(x) # prints 9
/// ```
#[violation]
pub struct ControlVarUsedAfterBlock {
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
control_var_name: String,
block_kind: BlockKind,
}

impl Violation for ControlVarUsedAfterBlock {
#[derive_message_formats]
fn message(&self) -> String {
let ControlVarUsedAfterBlock {
control_var_name,
block_kind,
} = self;

format!("{block_kind} variable {control_var_name} is used outside of block")
}
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum BlockKind {
For,
}
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved

impl fmt::Display for BlockKind {
fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
match self {
BlockKind::For => fmt.write_str("`for` loop"),
}
}
}

/// Based on wemake-python-styleguide (WPS441) to forbid control variables after the block body.
pub(crate) fn control_var_used_after_block(
checker: &Checker,
scope: &Scope,
diagnostics: &mut Vec<Diagnostic>,
) {
// Keep track of node_ids we know are used in the same block. This helps us when
// people use the same variable name in multiple blocks.
let mut known_good_reference_node_ids: Vec<NodeId> = Vec::new();

let all_bindings: Vec<(&str, BindingId)> = scope.all_bindings().collect();
MichaReiser marked this conversation as resolved.
Show resolved Hide resolved
// We want to reverse the bindings so that we iterate in source order and shadowed
// bindings come first.
let reversed_bindings = all_bindings.iter().rev();

// Find for-loop variable bindings
for (&name, binding) in reversed_bindings
.map(|(name, binding_id)| (name, checker.semantic().binding(*binding_id)))
.filter_map(|(name, binding)| {
if binding.kind.is_loop_var() {
return Some((name, binding));
}

None
})
{
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
let binding_statement = binding.statement(checker.semantic()).unwrap();
let binding_source_node_id = binding.source.unwrap();
// The node_id of the for-loop that contains the binding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit scared about the many unwrap calls in this block. Why is it safe to call unwrap for all these? Could we add a comment explaining the safety constraints?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I might need some help on these.

1. binding.statement(checker.semantic()).unwrap();

What scenarios would a variable binding.statement be None? Feel like this might be a situation where it's a problem with ruff itself and we should give up. How do I represent that?

2. binding.source.unwrap();

I think we're safe here as how can you bind a variable without the source? This again feels it would be a problem with ruff itself and we should give up. How do I represent that?

Looking wherever we create a new Binding { ... }, it seems like source is None with builtin's.

I don't think it's possible for us to deal with variable bindings for a builtin? (please double-check with your own knowledge)

Although I guess the following example works in the Python interpreter but it doesn't seem to pop up as a variable binding at all in ruff:

import builtins

for builtins.x in [1,2,3]:
    _ = x
    pass

# 3
print(x)

3. reference.expression_id().unwrap()

This one says:

reference.rs#L16-L18

/// The expression that the reference occurs in. `None` if the reference is a global
/// reference or a reference via an augmented assignment.

I'm not sure what it means by "global reference" as it seems to work fine with our global_var example in the fixture file.

I don't think we can run into an augmented assignment (x += 1) in the for-loop binding syntax? (please double-check with your own knowledge)

It seems like this can only be None if the SemanticModel.node_id is None which happens on on new(...) and gets filled in as soon as we push_node something onto the stack. If we're iterating over any source code, it feels like something would be pushed onto the stack.

Copy link
Author

@MadLittleMods MadLittleMods Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1. binding.statement(checker.semantic()).unwrap();

Answered in a discussion below:

The statement can be None if the binding comes from a built in. Let's add a comment explaining that calling unwrap is ok, because the rule only looks at for loop bindings.

-- #11769 (comment)

Thankfully, we can avoid this unwrap() altogether if we grab the binding_statement a different way (see #11769 (comment))

2. binding.source.unwrap();

(no discussion yet)

3. reference.expression_id().unwrap()

Answered in a discussion below which seems in line with what I was finding.

According to the documentation, the expression can be None when

/// The expression that the reference occurs in. `None` if the reference is a global
/// reference or a reference via an augmented assignment.

As far as I can tell, this doesn't apply here. It would be good to add a comment here explaining why calling unwrap is safe.

-- #11769 (comment)

let binding_statement_id = checker.semantic().statement_id(binding_source_node_id);

// Loop over the references of those bindings to see if they're in the same block-scope
for reference in binding.references() {
let reference = checker.semantic().reference(reference);
let reference_node_id = reference.expression_id().unwrap();

// Skip any reference that come before the control var binding in the source
// order, skip it because people can assign and use the same variable name
// above the block.
if reference.range().end() < binding_statement.range().start() {
continue;
}

// Traverse the hierarchy and look for a block match
let statement_hierarchy: Vec<NodeId> = checker
.semantic()
.parent_statement_ids(reference_node_id)
.collect();
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved

let mut is_used_in_block = false;
for ancestor_node_id in statement_hierarchy {
if binding_statement_id == ancestor_node_id {
is_used_in_block = true;
known_good_reference_node_ids.push(reference_node_id);
break;
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
}
}
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved

// If the reference wasn't used in the same block, report a violation/diagnostic
if !is_used_in_block && !known_good_reference_node_ids.contains(&reference_node_id) {
let block_kind = match binding_statement {
Stmt::For(_) => BlockKind::For,
_ => {
panic!("Unexpected block item. This is a problem with ruff itself. Fix the `filter_map` above.")
}
};

diagnostics.push(Diagnostic::new(
ControlVarUsedAfterBlock {
control_var_name: name.to_owned(),
block_kind,
},
reference.range(),
));
}
}
}
}
2 changes: 2 additions & 0 deletions crates/ruff_linter/src/rules/ruff/rules/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ pub(crate) use assert_with_print_message::*;
pub(crate) use assignment_in_assert::*;
pub(crate) use asyncio_dangling_task::*;
pub(crate) use collection_literal_concatenation::*;
pub(crate) use control_var_used_after_block::*;
pub(crate) use default_factory_kwarg::*;
pub(crate) use explicit_f_string_type_conversion::*;
pub(crate) use function_call_in_dataclass_default::*;
Expand Down Expand Up @@ -36,6 +37,7 @@ mod assignment_in_assert;
mod asyncio_dangling_task;
mod collection_literal_concatenation;
mod confusables;
mod control_var_used_after_block;
mod default_factory_kwarg;
mod explicit_f_string_type_conversion;
mod function_call_in_dataclass_default;
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
---
source: crates/ruff_linter/src/rules/ruff/mod.rs
---
control_var_used_after_block.py:6:5: RUF030 `for` loop variable global_var is used outside of block
|
5 | # ❌
6 | _ = global_var
| ^^^^^^^^^^ RUF030
7 |
8 | def foo():
|

control_var_used_after_block.py:15:9: RUF030 `for` loop variable event is used outside of block
|
14 | # ❌
15 | _ = event
| ^^^^^ RUF030
16 |
17 | for x in []:
|

control_var_used_after_block.py:28:12: RUF030 `for` loop variable x is used outside of block
|
26 | for y in []:
27 | # ❌ x is used outside of the loop it was defined in (meant to use y)
28 | if x == 5:
| ^ RUF030
29 | pass
|

control_var_used_after_block.py:40:9: RUF030 `for` loop variable room_id is used outside of block
|
39 | # ❌ After the loop is not ok because the value is probably not what you expect
40 | _ = room_id
| ^^^^^^^ RUF030
41 |
42 | # Tuple destructuring
|

control_var_used_after_block.py:50:9: RUF030 `for` loop variable a is used outside of block
|
49 | # ❌
50 | _ = a
| ^ RUF030
51 | _ = b
52 | _ = c
|

control_var_used_after_block.py:51:9: RUF030 `for` loop variable b is used outside of block
|
49 | # ❌
50 | _ = a
51 | _ = b
| ^ RUF030
52 | _ = c
|

control_var_used_after_block.py:52:9: RUF030 `for` loop variable c is used outside of block
|
50 | _ = a
51 | _ = b
52 | _ = c
| ^ RUF030
53 |
54 | # Array destructuring
|

control_var_used_after_block.py:62:9: RUF030 `for` loop variable d is used outside of block
|
61 | # ❌
62 | _ = d
| ^ RUF030
63 | _ = e
64 | _ = f
|

control_var_used_after_block.py:63:9: RUF030 `for` loop variable e is used outside of block
|
61 | # ❌
62 | _ = d
63 | _ = e
| ^ RUF030
64 | _ = f
|

control_var_used_after_block.py:64:9: RUF030 `for` loop variable f is used outside of block
|
62 | _ = d
63 | _ = e
64 | _ = f
| ^ RUF030
|
16 changes: 16 additions & 0 deletions crates/ruff_python_semantic/src/model.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1207,6 +1207,15 @@ impl<'a> SemanticModel<'a> {
.expect("No statement found")
}

/// Return the [`Stmt`] corresponding to the given [`NodeId`].
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
#[inline]
pub fn statement_id(&self, node_id: NodeId) -> NodeId {
self.nodes
.ancestor_ids(node_id)
.find(|id| self.nodes[*id].is_statement())
.expect("No statement found")
}

/// Returns an [`Iterator`] over the statements, starting from the given [`NodeId`].
/// through to any parents.
pub fn statements(&self, node_id: NodeId) -> impl Iterator<Item = &'a Stmt> + '_ {
Expand All @@ -1232,6 +1241,13 @@ impl<'a> SemanticModel<'a> {
.nth(1)
}

/// Given a [`NodeId`], return the [`NodeId`] of the parent statement, if any.
pub fn parent_statement_ids(&self, node_id: NodeId) -> impl Iterator<Item = NodeId> + '_ {
self.nodes
.ancestor_ids(node_id)
.filter(|id| self.nodes[*id].is_statement())
}

/// Return the [`Expr`] corresponding to the given [`NodeId`].
#[inline]
pub fn expression(&self, node_id: NodeId) -> Option<&'a Expr> {
Expand Down
Loading