[SPARK-48871] Fix INVALID_NON_DETERMINISTIC_EXPRESSIONS validation in… #47304

zhipengmao-db
Contributor

… CheckAnalysis

What changes were proposed in this pull request?

This PR adds a trait that logical plans can extend. The trait declares a method that decides whether the operator may contain non-deterministic expressions, and checkAnalysis consults this method.
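
To make the mechanism concrete, here is a minimal toy model of the opt-in check. It is a sketch only and does not use Spark's classes: `Expr`, `Plan`, `StrictOp`, `CustomOp`, and `check` are hypothetical stand-ins, and the trait and method names (`AllowsNonDeterministic`, `allowsNonDeterministicExpression`) are assumptions for illustration rather than the names used in the patch. It also ignores any exemptions the real rule may apply to built-in operators.

```scala
// Toy model only: every type here is a hypothetical stand-in, not a Spark class.
object NonDeterministicCheckSketch {

  // Stand-in for a Catalyst expression; it only tracks determinism.
  final case class Expr(name: String, deterministic: Boolean)

  // Stand-in for a logical plan node.
  trait Plan {
    def expressions: Seq[Expr]
    def children: Seq[Plan]
  }

  // Assumed shape of the opt-in trait: an operator declares whether
  // it tolerates non-deterministic expressions.
  trait AllowsNonDeterministic extends Plan {
    def allowsNonDeterministicExpression: Boolean
  }

  // A leaf with no expressions and no children.
  case object Leaf extends Plan {
    val expressions: Seq[Expr] = Nil
    val children: Seq[Plan] = Nil
  }

  // An operator that does not opt in, so it is always checked strictly.
  final case class StrictOp(expressions: Seq[Expr], child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }

  // A custom operator that opts in through the trait.
  final case class CustomOp(
      expressions: Seq[Expr],
      child: Plan,
      allowsNonDeterministicExpression: Boolean)
    extends AllowsNonDeterministic {
    def children: Seq[Plan] = Seq(child)
  }

  // Walks the plan top-down. Returns Left(message) for the first operator
  // that holds a non-deterministic expression without allowing it,
  // otherwise Right(()).
  def check(plan: Plan): Either[String, Unit] = {
    val offending = plan.expressions.filterNot(_.deterministic)
    val allowed = plan match {
      case p: AllowsNonDeterministic => p.allowsNonDeterministicExpression
      case _ => false
    }
    if (offending.nonEmpty && !allowed) {
      Left("INVALID_NON_DETERMINISTIC_EXPRESSIONS: " + offending.map(_.name).mkString(", "))
    } else {
      plan.children.foldLeft[Either[String, Unit]](Right(())) { (acc, child) =>
        acc.flatMap(_ => check(child))
      }
    }
  }
}
```

In this model, a `StrictOp` holding a non-deterministic `Expr` fails the check, while a `CustomOp` with `allowsNonDeterministicExpression = true` passes. Making the decision a method rather than a marker trait lets an operator answer per instance.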

Why are the changes needed?

I encountered the INVALID_NON_DETERMINISTIC_EXPRESSIONS exception when attempting to use a non-deterministic UDF in my query. The non-deterministic expression can safely be allowed for my custom LogicalPlan, but the checkAnalysis phase rejects it. The CheckAnalysis rule is so strict that it also rejects reasonable uses of non-deterministic expressions.

Does this PR introduce any user-facing change?

No

How was this patch tested?

The test case "SPARK-48871: AllowsNonDeterministicExpression allow lists non-deterministic expressions" is added.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Jul 11, 2024
@zhipengmao-db zhipengmao-db requested a review from cloud-fan July 11, 2024 16:32
@cloud-fan
Contributor

The CI failure is unrelated. I'm merging this to master as well as 3.5, so that Spark plugins targeting 3.5 can work with non-deterministic expressions.

@cloud-fan cloud-fan closed this in 9cbd5dd Jul 12, 2024
cloud-fan added a commit that referenced this pull request Jul 12, 2024
Closes #47304 from zhipengmao-db/zhipengmao-db/SPARK-48871-check-analysis.

Lead-authored-by: zhipeng.mao <zhipeng.mao@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 9cbd5dd)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024