[FIX] Infinite Loop in Variablization of Subtrees#77
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR fixes an infinite loop bug in the variablize_subtrees() function where already variablized subtrees (containing variables like "V22") would be repeatedly variablized, causing an endless loop of variable generation.
- Adds a condition to skip variablization of subtrees that are already variables
- Adds two new test cases to verify the fix and test aggregation-to-subquery rewriting
- Includes a new rule for aggregation to filtered subquery transformation
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| core/rule_generator.py | Adds check to prevent re-variablization of already variablized subtrees |
| tests/test_rule_generator.py | Adds test for general rule generation and updates existing test assertions |
| tests/test_query_rewriter.py | Adds test for aggregation to subquery rewriting functionality |
| data/rules.py | Adds new rule for aggregation to filtered subquery transformation |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| assert StringUtil.strim(RuleGenerator._fingerPrint(rule['pattern'])) == StringUtil.strim(RuleGenerator._fingerPrint(''' | ||
| CAST(<x1> AS DATE) | ||
| ''')) or StringUtil.strim(RuleGenerator._fingerPrint(rule['rewrite'])) == StringUtil.strim(RuleGenerator._fingerPrint(''' | ||
| CAST(<<y>> AS DATE) | ||
| ''')) | ||
|
|
||
| assert StringUtil.strim(RuleGenerator._fingerPrint(rule['rewrite'])) == StringUtil.strim(RuleGenerator._fingerPrint(''' | ||
| <x1> | ||
| ''')) or StringUtil.strim(RuleGenerator._fingerPrint(rule['rewrite'])) == StringUtil.strim(RuleGenerator._fingerPrint(''' | ||
| <<y>> | ||
| ''')) | ||
|
|
||
|
|
There was a problem hiding this comment.
The test logic is duplicated and confusing with overlapping assertions. The first assertion checks both pattern and rewrite fingerprints, while the second assertion only checks rewrite fingerprints. Consider restructuring these assertions to be more clear and avoid the duplicated rewrite fingerprint check.
| assert StringUtil.strim(RuleGenerator._fingerPrint(rule['pattern'])) == StringUtil.strim(RuleGenerator._fingerPrint(''' | |
| CAST(<x1> AS DATE) | |
| ''')) or StringUtil.strim(RuleGenerator._fingerPrint(rule['rewrite'])) == StringUtil.strim(RuleGenerator._fingerPrint(''' | |
| CAST(<<y>> AS DATE) | |
| ''')) | |
| assert StringUtil.strim(RuleGenerator._fingerPrint(rule['rewrite'])) == StringUtil.strim(RuleGenerator._fingerPrint(''' | |
| <x1> | |
| ''')) or StringUtil.strim(RuleGenerator._fingerPrint(rule['rewrite'])) == StringUtil.strim(RuleGenerator._fingerPrint(''' | |
| <<y>> | |
| ''')) | |
| pattern_fp = StringUtil.strim(RuleGenerator._fingerPrint(rule['pattern'])) | |
| rewrite_fp = StringUtil.strim(RuleGenerator._fingerPrint(rule['rewrite'])) | |
| expected_pattern_fps = ( | |
| StringUtil.strim(RuleGenerator._fingerPrint(''' | |
| CAST(<x1> AS DATE) | |
| ''')), | |
| StringUtil.strim(RuleGenerator._fingerPrint(''' | |
| CAST(<<y>> AS DATE) | |
| ''')) | |
| ) | |
| expected_rewrite_fps = ( | |
| StringUtil.strim(RuleGenerator._fingerPrint(''' | |
| <x1> | |
| ''')), | |
| StringUtil.strim(RuleGenerator._fingerPrint(''' | |
| <<y>> | |
| ''')) | |
| ) | |
| assert pattern_fp in expected_pattern_fps | |
| assert rewrite_fp in expected_rewrite_fps |
| for subtree in subtrees: | ||
| # Prevents already variablized subtrees from being variablized again in an infinite loop | ||
| # ex. {"value": "V22"} should not be variablized again | ||
| if len(subtree) == 1 and 'value' in subtree and QueryRewriter.is_var(subtree['value']): |
There was a problem hiding this comment.
The condition len(subtree) == 1 assumes a specific structure for variablized subtrees. This magic number should be documented or replaced with a more explicit check that describes what constitutes a variablized subtree structure.
| if len(subtree) == 1 and 'value' in subtree and QueryRewriter.is_var(subtree['value']): | |
| # Explicitly check for a variablized subtree structure: a dict with only the 'value' key whose value is a variable | |
| if isinstance(subtree, dict) and set(subtree.keys()) == {'value'} and QueryRewriter.is_var(subtree['value']): |
|
@colinthebomb1, can we merge this PR? |
Overview:
This PR fixes an infinite loop in
variablize_subtrees(), where an already variablized subtree like {"value": "V22"} would be variablized again in an infinite loop, ex. {"value": "V22"} -> {"value": "V27"} -> {"value": "V32"} etc.Testing:
testtest_generate_general_rule_22andtest_rewrite_aggregation_to_subquery