
Spark 3.2: Commit consumed offsets to the checkpoint location #4473

Conversation


@SreeramGarlapati SreeramGarlapati commented Apr 2, 2022

This is a port of a bug fix from our internal fork. Without it, our table maintenance expired the initial snapshot recorded in the stream's checkpoint, which in turn left our streaming readers in an unrecoverable state.
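
For context, a minimal, hedged sketch of where "committing consumed offsets" hooks into Spark 3.2's DataSource V2 streaming API; the class and the persistConsumedOffset helper below are illustrative assumptions, not the actual patch:

import org.apache.spark.sql.connector.read.streaming.MicroBatchStream;
import org.apache.spark.sql.connector.read.streaming.Offset;

// Hedged sketch, not the actual patch: Spark calls commit(end) once a
// micro-batch's end offset has been durably recorded in the query's
// checkpoint, so a source can persist "consumed up to here" and a restarted
// query no longer has to fall back to the table's initial snapshot (which
// table maintenance may already have expired).
abstract class OffsetCommittingStream implements MicroBatchStream {

  @Override
  public void commit(Offset end) {
    // Record the last consumed offset, e.g. as a small file under the
    // stream's checkpoint location.
    persistConsumedOffset(end);
  }

  // Hypothetical helper; the mechanism discussed later in this thread is
  // writing the offset file with createOrOverwrite().
  abstract void persistConsumedOffset(Offset end);
}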

cc: @rajarshisarkar, @daksha121, @rdblue, @aokolnychyi, @RussellSpitzer

@github-actions github-actions bot added the spark label Apr 2, 2022
@SreeramGarlapati SreeramGarlapati changed the title from "Spark3: implement micro-batch streaming read - commitOffset" to "Spark3.2: implement micro-batch streaming read - commitOffset" Apr 2, 2022

@RussellSpitzer RussellSpitzer left a comment

There should be a test with this patch that illustrates the issue being fixed, if possible.


SreeramGarlapati commented Apr 3, 2022

@RussellSpitzer - thanks a lot for your review.

The challenge with unit testing this is that the SparkSession holds on to the stream's state internally, which makes the issue non-reproducible with a single SparkSession. We saw this issue when our streams on one EMR cluster had to be brought down and restarted on another EMR cluster, and we were able to verify the fix using the same approach.
So, all in all, I would need a unit test that spans two SparkSessions. The way the tests are written needs a bit of refactoring to use two Spark sessions, and even then I am unsure whether it will reproduce the issue (given that the sessions share a process). Is there any precedent in the codebase for this pattern that you could kindly point me to?

I have put the fix up before having that unit test, for visibility, so people can port it as needed. This bug is essentially a time bomb in the code: it goes off the moment a stream moves across clusters for anyone using readStream on an Iceberg table whose first snapshot has expired, which is almost everyone who uses this!

Let me try that two-SparkSession unit test and get back to you.

Putting the problem explained above aside, the code itself is covered by the existing unit test testResumingStreamReadFromCheckpoint, which I used to debug through and fix the code; the fix is to use file.createOrOverwrite() instead of the existing file.create(), as sketched below.
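
For illustration, a minimal sketch of that change, assuming Iceberg's OutputFile API; the wrapper class and error message here are placeholders, not the exact method body in this PR:

import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

import org.apache.iceberg.io.OutputFile;

// Sketch only: write the serialized offset with createOrOverwrite() so that a
// stream restarted elsewhere (e.g. on another cluster) can rewrite the offset
// file instead of failing because create() refuses to replace an existing file.
final class OffsetFileWriter {
  private OffsetFileWriter() {}

  static void writeOffset(String offsetJson, OutputFile file) {
    try (OutputStream outputStream = file.createOrOverwrite()) {
      outputStream.write(offsetJson.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException("Failed to write offset to " + file.location(), e);
    }
  }
}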

cc: @rajarshisarkar


rdblue commented Apr 3, 2022

@SreeramGarlapati can you please update the description with a clear statement of the bug and how this fixes it?

@rdblue rdblue changed the title from "Spark3.2: implement micro-batch streaming read - commitOffset" to "Spark 3.2: Commit consumed offsets to the checkpoint location" Apr 3, 2022

rdblue commented Apr 3, 2022

I agree with @RussellSpitzer. This should have a test to verify the behavior.

Code under review (diff excerpt, removed lines first, added lines after):
-  private void writeOffset(StreamingOffset offset, OutputFile file) {
-    try (OutputStream outputStream = file.create()) {
+  private void writeOffset(StreamingOffset offset) {
+    OutputFile file = io.newOutputFile(initialOffsetLocation);

@singhpk234 singhpk234 Apr 21, 2022

Thanks @SreeramGarlapati !!!

[question] I have a doubt: can multiple streams writing to the same checkpoint location result in a nondeterministic state?

Let's say the initial snapshot id was snapshot(1), and the table state eventually becomes snapshot(1) -> snapshot(2) -> snapshot(3) -> snapshot(4).
Now stream 1 starts in cluster 1 and reads/commits up to snapshot(2). At the same time, stream 2 starts in cluster 2 from snapshot(2); before it can commit snapshot(3), stream 1 commits snapshot(3) and snapshot(4). When stream 2 then tries to commit snapshot(3), it overwrites the state of stream 1 (or vice versa), which makes the starting state of a new stream, say stream 3 in cluster 3, nondeterministic, since one stream is running ahead of the other (stream 1 / stream 2).

Also, earlier, if two streams had started at the same time (with no previous checkpoint / offset file), one would have failed because we did a file.create(); now they can co-exist. Your thoughts?

@SreeramGarlapati (Collaborator, Author) replied:

@singhpk234 - multiple Spark streaming clusters cannot run against the same checkpoint location.

@ajantha-bhat

Closing this as Spark 3.2 is no longer supported in the latest version.

If the issue exists on other Spark versions, please handle it there.
