Skip to content

[SPARK-51187][SQL][SS][3.5] Implement the graceful deprecation of incorrect config introduced in SPARK-49699 #49985

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

HeartSaVioR
Copy link
Contributor

What changes were proposed in this pull request?

This PR proposes to implement the graceful deprecation of incorrect config introduced in SPARK-49699.

SPARK-49699 was included in Spark 3.5.4, hence we can't simply rename to fix the issue.

Also, since the incorrect config is logged in offset log in streaming query, the fix isn't just easy like adding withAlternative and done. We need to manually handle the case where offset log contains the incorrect config, and set the value of incorrect config in the offset log into the new config. Once a single microbatch has planned after the restart (hence the above logic is applied), offset log will contain the "new" config and it will no longer refer to the incorrect config.

That said, we can remove the incorrect config in the Spark version which we are confident that there will be no case users will upgrade from Spark 3.5.4 to that version.

Why are the changes needed?

We released an incorrect config and we want to rename it properly. While renaming, we don't also want to have any breakage on the existing streaming query.

Does this PR introduce any user-facing change?

No. That is what this PR is aiming for.

How was this patch tested?

New UT.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun
Copy link
Member

Thank you for making this PR, @HeartSaVioR .

I sent an email for further discussion.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM for Apache Spark 3.5.5.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, could you the old config to deprecatedSQLConfigs, @HeartSaVioR , please?

val deprecatedSQLConfigs: Map[String, DeprecatedConfig] = {

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vote passed. Could you address the above review comment, @HeartSaVioR ?

@HeartSaVioR
Copy link
Contributor Author

@dongjoon-hyun I've pushed the commit 673f789 to reflect your feedback. Would you mind having a review again? Thanks!

@dongjoon-hyun
Copy link
Member

Thank you! Sure, @HeartSaVioR .

@@ -4480,7 +4481,9 @@ object SQLConf {
DeprecatedConfig(LEGACY_REPLACE_DATABRICKS_SPARK_AVRO_ENABLED.key, "3.2",
"""Use `.format("avro")` in `DataFrameWriter` or `DataFrameReader` instead."""),
DeprecatedConfig(COALESCE_PARTITIONS_MIN_PARTITION_NUM.key, "3.2",
s"Use '${COALESCE_PARTITIONS_MIN_PARTITION_SIZE.key}' instead.")
s"Use '${COALESCE_PARTITIONS_MIN_PARTITION_SIZE.key}' instead."),
DeprecatedConfig(PRUNE_FILTERS_CAN_PRUNE_STREAMING_SUBPLAN.alternatives.head, "3.5.4",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the definition, this should be 3.5.5.

Version of Spark where key was deprecated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry was confused. Nice finding!

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, it looks good to me. While waiting for CIs, let me verify it manually too.

@dongjoon-hyun
Copy link
Member

I verified the deprecation warning messages. Thank you again.

scala> spark.range(1).show()
25/02/22 19:22:41 WARN SQLConf: The SQL config 'spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan' has been deprecated in Spark v3.5.5 and may be removed in the future. Use 'spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan' instead.
25/02/22 19:22:41 WARN SQLConf: The SQL config 'spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan' has been deprecated in Spark v3.5.5 and may be removed in the future. Use 'spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan' instead.
25/02/22 19:22:41 WARN SQLConf: The SQL config 'spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan' has been deprecated in Spark v3.5.5 and may be removed in the future. Use 'spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan' instead.
+---+
| id|
+---+
|  0|
+---+

@dongjoon-hyun
Copy link
Member

The very previous commit passed the CI and the last commit is only to change the version number string to 3.5.5 which is verified by manually.

Let me merge this.

dongjoon-hyun pushed a commit that referenced this pull request Feb 23, 2025
…orrect config introduced in SPARK-49699

### What changes were proposed in this pull request?

This PR proposes to implement the graceful deprecation of incorrect config introduced in SPARK-49699.

SPARK-49699 was included in Spark 3.5.4, hence we can't simply rename to fix the issue.

Also, since the incorrect config is logged in offset log in streaming query, the fix isn't just easy like adding withAlternative and done. We need to manually handle the case where offset log contains the incorrect config, and set the value of incorrect config in the offset log into the new config. Once a single microbatch has planned after the restart (hence the above logic is applied), offset log will contain the "new" config and it will no longer refer to the incorrect config.

That said, we can remove the incorrect config in the Spark version which we are confident that there will be no case users will upgrade from Spark 3.5.4 to that version.

### Why are the changes needed?

We released an incorrect config and we want to rename it properly. While renaming, we don't also want to have any breakage on the existing streaming query.

### Does this PR introduce _any_ user-facing change?

No. That is what this PR is aiming for.

### How was this patch tested?

New UT.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49985 from HeartSaVioR/SPARK-51187-3.5.

Authored-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Copy link
Member

Merged to branch-3.5.

@dongjoon-hyun
Copy link
Member

Thank you, @HeartSaVioR .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants