Skip to content

[SPARK-51172][SS] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan #49897

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

HyukjinKwon
Copy link
Member

What changes were proposed in this pull request?

This PR is a followup of #48149 that proposes to rename spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan

Why are the changes needed?

For consistent naming.

Does this PR introduce any user-facing change?

No, the main PR has not been released yet.

How was this patch tested?

CI should test it out.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Feb 12, 2025
@@ -4110,7 +4110,7 @@ object SQLConf {
.createWithDefault(ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH)

val PRUNE_FILTERS_CAN_PRUNE_STREAMING_SUBPLAN =
buildConf("spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan")
buildConf("spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @HyukjinKwon .

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wait, @HyukjinKwon .

It seems to be too late as a follow-up because it's released already.

Screenshot 2025-02-11 at 17 51 18

@dongjoon-hyun
Copy link
Member

Could you file a new JIRA issue for this because this should have a fix version 3.5.5?

@HyukjinKwon
Copy link
Member Author

uhoh

@HyukjinKwon
Copy link
Member Author

sure, will do

@HyukjinKwon HyukjinKwon changed the title [SPARK-49699][SS][FOLLOW-UP] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan [SPARK-51172][SS] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan Feb 12, 2025
@HyukjinKwon
Copy link
Member Author

Hmmmm .. @HeartSaVioR WDYT? should we only make this change in master branch alone?

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM again.

BTW, I made a new scalastyle rule to prevent this mistake because we missed this mistake completely at 3.5.4 and almost in 4.0.0, @HyukjinKwon .

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Feb 12, 2025

Let's fix in master/4.0 first to avoid making more releases shipping this. We should probably think through how to not break 3.5 itself + upgrading from 3.5 to 4.0+ when fixing in 3.5. If we weren't released this in 3.5 that'd be ideal but...

@HyukjinKwon
Copy link
Member Author

Merged to master and branch-4.0.

HyukjinKwon added a commit that referenced this pull request Feb 12, 2025
…treamingSubplan

### What changes were proposed in this pull request?

This PR is a followup of #48149 that proposes to rename `spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan` to `spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan`

### Why are the changes needed?

For consistent naming.

### Does this PR introduce _any_ user-facing change?

No, the main PR has not been released yet.

### How was this patch tested?

CI should test it out.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49897 from HyukjinKwon/SPARK-49699.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit decb677)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
dongjoon-hyun added a commit that referenced this pull request Feb 12, 2025
### What changes were proposed in this pull request?

This PR aims to add `configName` Scalastyle rule to prevent invalid config names.

### Why are the changes needed?

To prevent repetitive mistake pattern
- #45649
- #48149
- #49121

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Currently, this PR will fail at Scalastyle test because the `master` branch is broken. We can merge this after the following PR.
- #49897

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49900 from dongjoon-hyun/SPARK-51173.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Feb 12, 2025
### What changes were proposed in this pull request?

This PR aims to add `configName` Scalastyle rule to prevent invalid config names.

### Why are the changes needed?

To prevent repetitive mistake pattern
- #45649
- #48149
- #49121

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Currently, this PR will fail at Scalastyle test because the `master` branch is broken. We can merge this after the following PR.
- #49897

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49900 from dongjoon-hyun/SPARK-51173.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 274dc5e)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Feb 12, 2025

There are two cases:

  1. The streaming query has started from Spark 3.5.4
  2. The streaming query has started before Spark 3.5.4, and had migrated to Spark 3.5.4

1>
When they start the new query in Spark 3.5.4, there is no offset log to read the static config back from, so the value of config spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan will follow the default value, true.

This will be written back to offset log to ensure this value to be kept on streaming query lifecycle.

When they upgrade to the Spark version which we renamed the config, there is offset log to read the static config back, and there is no entry for spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan, hence we enable backward compatibility mode and put the value false for the config spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan.

This could break the query if the rule impacts the query, because the effectiveness of the fix is flipped.

2>
When they upgrade their existing streaming query from older version to Spark 3.5.4, there is offset log to read the static config back from, and there is no entry for spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan, hence we enable backward compatibility mode and put the value false for the config spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan. So the fix is disabled.

When they further upgrade this query to the Spark version which we renamed the config, same thing applies and the fix is disabled. So no change.

I feel like this is a sort of "dead end" because if we don't want to make the case 1 to break, the only way is to make an alias of config, which enforces us to keep this problematic config name much longer (probably forever, as technically they can jump from Spark 3.5.4 to any future version). The chance to hit this case is not very high, but when they encounter this, they'd need to drop checkpoint and start from the scratch, which is unpleasant.

Not sure how we could inform Spark 3.5.4 users about this - maybe should we mention about this in release note for 3.5.5 / 4.0.0?

Anyway, we should apply the same fix in Spark 3.5 branch line. I can't imagine there is other way around.

HeartSaVioR pushed a commit to HeartSaVioR/spark that referenced this pull request Feb 12, 2025
…treamingSubplan

This PR is a followup of apache#48149 that proposes to rename `spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan` to `spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan`

For consistent naming.

No, the main PR has not been released yet.

CI should test it out.

No.

Closes apache#49897 from HyukjinKwon/SPARK-49699.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@WweiL
Copy link
Contributor

WweiL commented Feb 12, 2025

Also a tiny fix if folks can merge
@HeartSaVioR @HyukjinKwon @dongjoon-hyun
#49911

dongjoon-hyun added a commit that referenced this pull request Feb 23, 2025
This PR aims to add `configName` Scalastyle rule to prevent invalid config names.

To prevent repetitive mistake pattern
- #45649
- #48149
- #49121

No.

Currently, this PR will fail at Scalastyle test because the `master` branch is broken. We can merge this after the following PR.
- #49897

No.

Closes #49900 from dongjoon-hyun/SPARK-51173.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 274dc5e)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Mar 17, 2025

According to https://lists.apache.org/thread/25xct2zv31kh1plpy766crs3bd6go5s6, I'm -0.99 on this PR, and I'm willing to cast -1 on this PR anytime soon, because my comment remains unaddressed for more than 1 month from merging this PR.

#49897 (comment)

Let's fix in master/4.0 first to avoid making more releases shipping this. We should probably think through how to not break 3.5 itself + upgrading from 3.5 to 4.0+ when fixing in 3.5. If we weren't released this in 3.5 that'd be ideal but...

I got thumbs up, so this is not something I only think this is critical. There are argument that "users can upgrade to Spark 3.5.5 before Spark 4.0.0", which I do not believe I agree with. Since both are casting -1 for each other's proposal, I think my comment is NOT addressed.

I'm waiting for feedback in dev@, but if we fail to make consensus on the approach on migration, I really have to cast -1 and revert this PR in master/4.0.

@HeartSaVioR
Copy link
Contributor

Here the meaning of "my comment is NOT addressed" is,

  1. My proposal is stuck on VETO.
  2. Dongjoon's proposal is never discussed in public.

So we never have a way to address my question "We should probably think through how to not break 3.5 itself + upgrading from 3.5 to 4.0+ when fixing in 3.5." which got community's consensus.

This is precondition of merging this PR and I hope we quickly address the issue, hence the migration logic came up, but...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants