forked from apache/spark
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-40925][SQL][SS] Fix stateful operator late record filtering
### What changes were proposed in this pull request?

This PR fixes the late-record filtering that stateful operators apply to their input, so that stateful operators can be chained. Currently, stateful operators are initialized with the current micro-batch watermark and use that same value both for late-record filtering and for state eviction (e.g. emitting aggregations). State evicted (or aggregates emitted) because the watermark advanced lies behind that watermark and is therefore effectively late: if a subsequent stateful operator consumes the output of the previous one, those input records are filtered out as late.

This PR provides two watermark values to stateful operators: the watermark from the previous micro-batch, used for late-record filtering, and the watermark from the current micro-batch (as in the existing code), used for state eviction. This fixes the broken late-record filtering described above.

Note that this PR still does not solve the issue of the time-interval stream join producing records that are delayed relative to the watermark. Therefore a time-interval streaming join followed by stateful operators is still not supported. That will be fixed in a follow-up PR (and a SPIP), effectively replacing the single global watermark with, conceptually, per-operator watermarks.

Also, the stateful operator chains unblocked by this PR (e.g. a chain of window aggregations) are still blocked by the unsupported operations checker. The new test suite for these scenarios, MultiStatefulOperatorsSuite, has to explicitly disable the unsupported operations check. This, too, will be fixed in a follow-up PR.

### Why are the changes needed?

The PR allows Spark Structured Streaming to support chaining of stateful operators, e.g. chaining of time window aggregations, which is a meaningful streaming scenario.

### Does this PR introduce _any_ user-facing change?

With this PR, chains of stateful operators will be supported in Spark Structured Streaming.

### How was this patch tested?

Added a new test suite, MultiStatefulOperatorsSuite.

Closes apache#38405 from alex-balikov/multiple_stateful-ops-base.

Lead-authored-by: Alex Balikov <91913242+alex-balikov@users.noreply.github.com>
Co-authored-by: Alex Balikov <alex.balikov@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
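To illustrate why filtering with the previous micro-batch's watermark lets a downstream operator accept upstream output, here is a minimal toy model in Scala. It is not Spark's implementation: `Rec`, `ToyWindowCount`, and `ChainDemo` are hypothetical names, per-batch watermarks are supplied directly instead of being computed from event times and a delay threshold, and a record counts as late when its event time is strictly below the filtering watermark (Spark's exact boundary conventions may differ). Output records are stamped with window end minus one, similar in spirit to Spark's `window_time()`.

```scala
import scala.collection.mutable

// Hypothetical record for this sketch: an event time (ms) and a count.
final case class Rec(eventTimeMs: Long, count: Long)

// Toy tumbling-window count operator (NOT Spark's StateStore-based code). It only
// separates the watermark used for late filtering from the one used for eviction.
final class ToyWindowCount(windowMs: Long) {
  // Window start -> accumulated count.
  private val state = mutable.Map.empty[Long, Long]

  def run(input: Seq[Rec], lateFilterWm: Long, evictionWm: Long): Seq[Rec] = {
    // 1. Late-record filtering: drop inputs whose event time is below lateFilterWm.
    input.filter(_.eventTimeMs >= lateFilterWm).foreach { r =>
      val start = r.eventTimeMs - Math.floorMod(r.eventTimeMs, windowMs)
      state(start) = state.getOrElse(start, 0L) + r.count
    }
    // 2. Eviction: emit every window whose end is at or before evictionWm.
    val closed = state.keys.filter(_ + windowMs <= evictionWm).toList.sorted
    closed.map { start =>
      val c = state.remove(start).get
      // Stamp the output with window end - 1, so it sits behind evictionWm.
      Rec(start + windowMs - 1, c)
    }
  }
}

object ChainDemo {
  // Chains a 10 ms window count into a 20 ms window count and returns the total
  // count emitted by the second operator across all micro-batches.
  def runChain(usePreviousWmForFiltering: Boolean): Long = {
    val first = new ToyWindowCount(10)
    val second = new ToyWindowCount(20)
    // (batch input, watermark for this batch), assumed rather than derived.
    val batches: Seq[(Seq[Rec], Long)] = Seq(
      (Seq(Rec(1, 1), Rec(5, 1), Rec(12, 1)), 0L), // batch 0, watermark 0
      (Seq.empty[Rec], 20L)                        // batch 1, watermark 20
    )
    var prevWm = 0L
    var emitted = 0L
    for ((input, curWm) <- batches) {
      // New behavior: filter with the previous batch's watermark.
      // Old behavior: filter with the current watermark.
      val lateWm = if (usePreviousWmForFiltering) prevWm else curWm
      val mid = first.run(input, lateWm, curWm)
      val out = second.run(mid, lateWm, curWm)
      emitted += out.map(_.count).sum
      prevWm = curWm
    }
    emitted
  }

  def main(args: Array[String]): Unit = {
    println(s"previous-batch watermark for filtering: ${runChain(true)}")  // 3
    println(s"current watermark for filtering:        ${runChain(false)}") // 0
  }
}
```

In batch 1 the first operator evicts windows ending at 10 and 20 and stamps them at 9 and 19. With the current watermark (20) as the filter, the second operator drops both as late and emits nothing; with the previous watermark (0), it aggregates them and emits a count of 3.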
1 parent 58a527b · commit 242675a

19 changed files with 661 additions and 81 deletions.