[SPARK-51172][SS][3.5] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan #49905
Conversation
…treamingSubplan

This PR is a followup of apache#48149 that proposes to rename `spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan` to `spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan`, for consistent naming.

No, the main PR has not been released yet. CI should test it out. No.

Closes apache#49897 from HyukjinKwon/SPARK-49699.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@dongjoon-hyun @HyukjinKwon |
Although this is an internal configuration, it seems that we need to follow a deprecation procedure because it is already exposed to users.
It means we still need the following in branch-3.5:
.withAlternative("spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan")
And the following:
DeprecatedConfig(PRUNE_FILTERS_CAN_PRUNE_STREAMING_SUBPLAN.head, ...
WDYT, @HeartSaVioR and @HyukjinKwon ?
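For readers following along, here is a minimal sketch of what that deprecation wiring could look like inside SQLConf.scala (so that `buildConf` and `DeprecatedConfig` are in scope); the doc text and the "3.5.5" deprecation version are illustrative assumptions, not the actual patch:

```scala
// Sketch only: keep the mistakenly released key working as an alternative in branch-3.5.
// The builder chain follows the usual SQLConf pattern; doc/version strings are illustrative.
val PRUNE_FILTERS_CAN_PRUNE_STREAMING_SUBPLAN =
  buildConf("spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan")
    .internal()
    .doc("Whether PruneFilters is allowed to prune a streaming subplan. (doc text illustrative)")
    .version("3.5.4")
    // The old key released in 3.5.4 stays resolvable as an alternative.
    .withAlternative("spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan")
    .booleanConf
    .createWithDefault(true)

// ...and register the old key as deprecated, e.g. as an entry of SQLConf.deprecatedSQLConfigs,
// so users setting it get a warning pointing at the new name:
DeprecatedConfig(
  "spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan",
  "3.5.5",
  "Use spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan instead.")
```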
This config is not a user-facing one - this is a very advanced topic and I don't expect any user to try configuring it by themselves. Technically we should have just fixed the issue, but the config is there to avoid breaking existing queries. |
What do we get from that deprecation procedure when we no longer have the config in the very next (technically) minor release? If you think we need to follow that deprecation, the config has to stay for a couple of minor releases; otherwise it sounds to me like just adding more work without outcome. Users do not upgrade to every single release. |
Are you aware of the meaning of |
What I meant is, don't we just want to remove the config ASAP? That was my impression from the previous discussion thread. If I was wrong, I feel like we overreacted. We should have just filed a JIRA ticket marked as a blocker and followed the full process; even in master and 4.0 we should not just remove it. If I wasn't wrong, I believe this is the fastest way with very small impact. We rename the config, release the new version, and guide users on how to mitigate if they hit the combination of "started the query on 3.5.4, upgraded to this version, and somehow hit some weird issue". This is just the same as where we were in Spark 3.5.3 and earlier; the whole effect is that they go back in time to when there was no fix. |
Apache Spark 4.0.0 RC1 is scheduled in three days (2025-02-15). So, we are on the same page on
For this part, you are more accurate with the rule.
You can block the Apache Spark 4.0.0 release with this, if you want, and then do the Apache Spark 3.5.5 release. I'm with you, @HeartSaVioR. |
cc @cloud-fan |
Technically, my aim is not to block the 4.0.0 release, and to do the 3.5.5 release immediately after 4.0.0. That was my best compromise idea. But you are right, @HeartSaVioR. |
I could actually promise that the chance of case 1 is very small, but if we want to follow the practice regardless, I'll file one. I am just seeking a pragmatic way forward if everyone is OK with it. |
Please note that following the full deprecation procedure is not just adding an alternative; it means we are committing to support both case 1 and case 2, and I need to come up with testing. Please understand that this could delay the first RC of 4.0.0. I'm trying to prioritize, but it's not something I have full control over. |
I actually also prefer to just remove this .... |
I'm technically not okay with covering up our mistake silently in the community. At the same time, I agree with you guys that I really don't want to block the Apache Spark 4.0.0 release. For the record, we have well-known internal configs like the following:
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (lines 1080 to 1081 in f749742)
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (lines 1639 to 1640 in f749742)
The bottom line is how we can handle this properly in the Apache way. Since this is our mistake, we should be responsible for correcting it instead of ignoring any possibility. As I mentioned earlier, we cannot delete the exposed config in Apache Spark 3.5.x. We can delete it in Apache Spark 4.0.0+ only. |
You don't seem to understand my explanation clearly. If we want to avoid breakage in case 1, we can never delete the config in 4.0 - probably a couple of minor releases later, maybe 4.2, or even 5.0? There must be no case where a user upgrades Spark from 3.5.4 directly to the version where we remove the config. Please let me know if we are OK with this. I'm ready to file a blocker ticket. |
https://issues.apache.org/jira/browse/SPARK-51187 Here it is. I can't guarantee that this can be done before the proposed date of the first RC. If we want to make a graceful fix, this ticket is definitely a blocker for Spark 4.0.0. |
To avoid trouble, we can always call |
That's not the case, because the streaming engine has the ability to persist the value of the config and set it back into the session config when we restart the query. The incorrect config name is already in the offset log if they have ever run the query on Spark 3.5.4. It's not a single-line change if we really want to fix this without any trouble. |
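To illustrate the mechanism described here, below is a minimal, self-contained Scala sketch of how a streaming engine can snapshot a config value into its offset metadata when a query starts and replay it on restart; `OffsetMetadata`, `persist`, and `restore` are hypothetical names, not Spark's actual classes:

```scala
// Hypothetical sketch of the persist-and-restore behavior described above.
case class OffsetMetadata(conf: Map[String, String])

object ConfRoundTrip {
  val Key = "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"

  // On the first batch of a new query: snapshot the session value into the offset log.
  def persist(sessionConf: Map[String, String]): OffsetMetadata =
    OffsetMetadata(Map(Key -> sessionConf.getOrElse(Key, "true")))

  // On restart: the persisted value wins over whatever the session currently says,
  // so the query keeps the semantics it started with, even across upgrades.
  def restore(metadata: OffsetMetadata, sessionConf: Map[String, String]): Map[String, String] =
    sessionConf ++ metadata.conf

  def main(args: Array[String]): Unit = {
    val meta = persist(Map.empty)                 // new query: default "true" is captured
    println(restore(meta, Map(Key -> "false")))   // restart: persisted "true" overrides the session
  }
}
```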
I'm thinking about the fix. I'm trying to prioritize, but I might not come up with the fix this weekend. Could you give me a bit more time to address this properly? Probably early next week. |
To @HeartSaVioR, @HyukjinKwon, @cloud-fan,
According to the vote result,
I believe we can close this PR in favor of
I'll address the recent discussion in the other PRs. Closing this. |
Thank you for closing this. |
Credit to @HyukjinKwon. This PR is just to ensure I've gotten sign-off from other committers before proceeding.
What changes were proposed in this pull request?
This PR is a followup of #48149 that proposes to rename
spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan
to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan
Why are the changes needed?
For consistent naming.
Does this PR introduce any user-facing change?
Yes, as the config was released in Spark 3.5.4 and we are making a change to the config.
The config itself is internal and users are not expected to deal with it. The problem comes from the fact that the config is coupled with how we avoid making a breaking change against "the streaming query which had run with prior Spark versions".
There are two cases:
1>
When they start a new query in Spark 3.5.4, there is no offset log to read the static config back from, so the value of the config spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan will follow the default value, true.
This value will be written to the offset log to ensure it is kept for the lifecycle of the streaming query.
When they upgrade to the Spark version in which we renamed the config, there is an offset log to read the static config back from, but it has no entry for spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan, hence we enable backward compatibility mode and set the value false for the config spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan.
This could break the query if the rule impacts it, because the effectiveness of the fix is flipped.
2>
When they upgrade their existing streaming query from an older version to Spark 3.5.4, there is an offset log to read the static config back from, but it has no entry for spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan, hence we enable backward compatibility mode and set the value false for the config spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan. So the fix is disabled.
When they further upgrade this query to the Spark version in which we renamed the config, the same thing applies and the fix remains disabled. So no change.
Ideally we should take care of case 1, but the only way we could ensure this is to make an alias so both configs work the same, and this would force us to keep the problematic config forever. (3.5.4 users don't only bump to the next 3.5.x; they can jump straight to 4.0.0, hence we would have to keep the alias for 4.x as well. That's against the rationale for fixing this.)
So the only option seems to be to tolerate case 1. Hopefully the failure from that rule does not happen frequently, and users also have to have started their query on 3.5.4 (exactly this version) and then upgraded, so there is less and less chance to hit it.
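As a purely hypothetical sketch of the two cases above (the object and method names below are illustrative, not Spark's actual restart code): when the key that a given Spark version looks up is missing from the offset log, the engine falls back to the backward-compatible default false, which is what flips the fix in case 1 and leaves it disabled in case 2.

```scala
// Illustrative only: how the backward-compatibility fallback plays out for the two cases.
object RenameCases {
  val OldKey = "spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
  val NewKey = "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"

  // If the key this Spark version looks for is absent from the offset log,
  // assume the query predates the fix and disable it (compat default "false").
  def effectiveValue(offsetLogConf: Map[String, String], lookupKey: String): Boolean =
    offsetLogConf.get(lookupKey).map(_.toBoolean).getOrElse(false)

  def main(args: Array[String]): Unit = {
    // Case 1: a query started on 3.5.4 wrote the OLD key with the default "true";
    // after upgrading to the renamed version, the NEW key is looked up -> flips to false.
    println(effectiveValue(Map(OldKey -> "true"), NewKey))   // false: fix effectively disabled

    // Case 2: a query started before 3.5.4 has no entry at all; both 3.5.4 and the
    // renamed version fall back to false -> behavior unchanged across the upgrade.
    println(effectiveValue(Map.empty, OldKey))               // false
    println(effectiveValue(Map.empty, NewKey))               // false
  }
}
```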
How was this patch tested?
CI should test it out.
Was this patch authored or co-authored using generative AI tooling?
No.