Skip to content

[SPARK-48843] Prevent infinite loop with BindParameters #47271

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

nemanja-boric-databricks
Copy link
Contributor

What changes were proposed in this pull request?

In order to resolve the named parameters on the subtree, BindParameters recurses into the subtrees and tries to match the pattern with the named parameters. If there's no named parameter in the current level, the rule tries to return the unchanged plan. However, instead of returning the current plan object, the rule always returns the captured root plan node, leading into the infinite recursion.

Why are the changes needed?

Infinite recursion with the named parameters and the global limit.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

In order to resolve the named parameters on the subtree, BindParameters
recurses into the subtrees and tries to match the pattern with the
named parameters. If there's no named parameter in the current
level, the rule tries to return the unchanged plan. However, instead
of returning the current plan object, the rule always returns the captured
root plan node, leading into the infinite recursion.
Copy link
Contributor

@grundprinzip grundprinzip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to test this with a unit test in catalyst directly?

@cloud-fan or @hvanhovell ?

@cloud-fan
Copy link
Contributor

good catch!

@cloud-fan
Copy link
Contributor

we can put test in ParametersSuite

@nemanja-boric-databricks
Copy link
Contributor Author

nemanja-boric-databricks commented Jul 10, 2024

The trick is that I cannot get this plan to be generated just with SQL without connect and df.first():

GlobalLimit 1
+- 'LocalLimit 1
   +- 'NameParameterizedQuery [val], [1]
      +- 'Project [*]
         +- 'Filter ('val = namedparameter(val))
            +- 'SubqueryAlias tab
               +- 'UnresolvedInlineTable [date, val], [[cast(2022-12-25 10:30:00 as timestamp), 1]]

All my tries generate the plan that doesn't expose this bug, where the NameParametizedQuery is over the Limit node:

df = spark.sql("select pickup_zip from nyctaxi.trips where fare_amount > :amount limit 1", args = {"amount": 100})

generates

== Parsed Logical Plan ==
'NameParameterizedQuery [amount], [100]
+- 'GlobalLimit 1
   +- 'LocalLimit 1
      +- 'Project ['pickup_zip]
         +- 'Filter ('fare_amount > namedparameter(amount))
            +- 'UnresolvedRelation [nyctaxi, trips], [], false

@cloud-fan
Copy link
Contributor

I see, let's keep the current test as it is.

@cloud-fan
Copy link
Contributor

cloud-fan commented Jul 10, 2024

thanks, merging to master/3.5!

@cloud-fan cloud-fan closed this in a39f70d Jul 10, 2024
cloud-fan added a commit that referenced this pull request Jul 10, 2024
### What changes were proposed in this pull request?

In order to resolve the named parameters on the subtree, BindParameters recurses into the subtrees and tries to match the pattern with the named parameters. If there's no named parameter in the current level, the rule tries to return the unchanged plan. However, instead of returning the current plan object, the rule always returns the captured root plan node, leading into the infinite recursion.

### Why are the changes needed?

Infinite recursion with the named parameters and the global limit.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47271 from nemanja-boric-databricks/fix-bind.

Lead-authored-by: Nemanja Boric <nemanja.boric@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit a39f70d)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
biruktesf-db pushed a commit to biruktesf-db/spark that referenced this pull request Jul 11, 2024
### What changes were proposed in this pull request?

In order to resolve the named parameters on the subtree, BindParameters recurses into the subtrees and tries to match the pattern with the named parameters. If there's no named parameter in the current level, the rule tries to return the unchanged plan. However, instead of returning the current plan object, the rule always returns the captured root plan node, leading into the infinite recursion.

### Why are the changes needed?

Infinite recursion with the named parameters and the global limit.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47271 from nemanja-boric-databricks/fix-bind.

Lead-authored-by: Nemanja Boric <nemanja.boric@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
### What changes were proposed in this pull request?

In order to resolve the named parameters on the subtree, BindParameters recurses into the subtrees and tries to match the pattern with the named parameters. If there's no named parameter in the current level, the rule tries to return the unchanged plan. However, instead of returning the current plan object, the rule always returns the captured root plan node, leading into the infinite recursion.

### Why are the changes needed?

Infinite recursion with the named parameters and the global limit.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47271 from nemanja-boric-databricks/fix-bind.

Lead-authored-by: Nemanja Boric <nemanja.boric@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
nemanjapetr-db added a commit to nemanjapetr-db/spark that referenced this pull request Aug 6, 2024
…ache#47271 not been sumitted. Prevents regression. Hardens previous tests to catch infinite loop caused by head() and first() in Connect. Adds a test that would have caused infinite loop with BindParameters in Classic.
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?

In order to resolve the named parameters on the subtree, BindParameters recurses into the subtrees and tries to match the pattern with the named parameters. If there's no named parameter in the current level, the rule tries to return the unchanged plan. However, instead of returning the current plan object, the rule always returns the captured root plan node, leading into the infinite recursion.

### Why are the changes needed?

Infinite recursion with the named parameters and the global limit.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47271 from nemanja-boric-databricks/fix-bind.

Lead-authored-by: Nemanja Boric <nemanja.boric@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
### What changes were proposed in this pull request?

In order to resolve the named parameters on the subtree, BindParameters recurses into the subtrees and tries to match the pattern with the named parameters. If there's no named parameter in the current level, the rule tries to return the unchanged plan. However, instead of returning the current plan object, the rule always returns the captured root plan node, leading into the infinite recursion.

### Why are the changes needed?

Infinite recursion with the named parameters and the global limit.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47271 from nemanja-boric-databricks/fix-bind.

Lead-authored-by: Nemanja Boric <nemanja.boric@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants