[SPARK-47568][SS] Fix race condition between maintenance thread and load/commit for snapshot files. #45724

sahnib · 2024-03-26T14:59:57Z

What changes were proposed in this pull request?

This PR fixes a race condition between the maintenance thread and the task thread when change-log checkpointing is enabled, and ensures that all snapshots are valid.

1. The maintenance thread currently relies on the class variable `lastSnapshot` to find the latest checkpoint, which it then uploads to DFS. The task thread can modify this checkpoint at commit time if it creates a new snapshot.
2. The task thread was not resetting `lastSnapshot` at load time. If an older version is loaded, snapshots taken for newer versions can still be considered valid and uploaded to DFS, which results in `VersionIdMismatch` errors.
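A minimal Scala sketch of the two fixes, under stated assumptions: `SnapshotStoreSketch` and `LocalSnapshot` are hypothetical names, the DFS upload is modeled as appending a version number, and a single `ReentrantLock` stands in for the state store lock. It follows the description (maintenance reads `lastSnapshot` while holding the lock that `load()`/`commit()` use; `load()` discards a pending snapshot newer than the loaded version), but it is not the code from this PR.

```scala
import java.util.concurrent.locks.ReentrantLock
import scala.collection.mutable.ArrayBuffer

// Hypothetical model of a local snapshot (checkpoint directory) for one version.
final case class LocalSnapshot(version: Long)

// Hypothetical sketch of the two fixes; this is NOT Spark's RocksDB class,
// and the upload to DFS is modeled by appending to `uploaded`.
final class SnapshotStoreSketch {
  private val lock = new ReentrantLock()
  private var loadedVersion = 0L
  // Pending snapshot shared by the task thread and the maintenance thread.
  private var lastSnapshot: Option[LocalSnapshot] = None
  // Versions "uploaded to DFS"; only mutated while holding `lock`.
  val uploaded: ArrayBuffer[Long] = ArrayBuffer.empty[Long]

  private def withLock[T](body: => T): T = {
    lock.lock()
    try body
    finally lock.unlock()
  }

  // Fix 2: when a version is (re)loaded, drop any pending snapshot taken for a
  // newer version; otherwise it could be uploaded as if it were still valid.
  def load(version: Long): Unit = withLock {
    lastSnapshot = lastSnapshot.filter(_.version <= version)
    loadedVersion = version
  }

  // Commit produces version loadedVersion + 1 and may replace the pending snapshot.
  def commit(takeSnapshot: Boolean): Long = withLock {
    loadedVersion += 1
    if (takeSnapshot) lastSnapshot = Some(LocalSnapshot(loadedVersion))
    loadedVersion
  }

  // Fix 1: read and "upload" lastSnapshot under the same lock that load() and
  // commit() take, so the task thread cannot swap it out mid-upload.
  def doMaintenance(): Unit = withLock {
    lastSnapshot.foreach(s => uploaded += s.version)
    lastSnapshot = None
  }
}
```

For example, committing versions 1 and 2 with snapshots and then reloading version 1 before maintenance runs leaves nothing to upload, because `load(1)` drops the version-2 snapshot.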

Why are the changes needed?

These are logic bugs that can cause `VersionIdMismatch` errors, forcing the user to discard the snapshot and restart the query.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit test cases.

Was this patch authored or co-authored using generative AI tooling?

No

sahnib · 2024-03-26T15:03:01Z

@HeartSaVioR @anishshri-db PTAL, thanks!

sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala

anishshri-db

lgtm pending nit

HeartSaVioR

only nits

sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala

HeartSaVioR · 2024-03-27T05:35:35Z

sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala

+      // do maintenance - upload any latest snapshots so far
+      // would fail to acquire lock and no snapshots would be uploaded
+      db.doMaintenance()
+      db.commit()


Could we think of a way to verify this? Or is it not feasible because it's a race condition?

sahnib · Mar 29, 2024

Verify that maintenance actually fails here?

HeartSaVioR · Mar 29, 2024

That no snapshot is being uploaded at this moment. But it's OK to skip if it's bound to the race condition.
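On the question of verifying that maintenance really fails to acquire the lock: one way to make such a check deterministic rather than timing-dependent is to hold the lock on the test thread and run maintenance on a second thread that is joined before asserting. The sketch below assumes maintenance uses a timed `tryLock`; `TimedMaintenanceSketch` and `MaintenanceHarness` are hypothetical, and Spark's actual `RocksDB` lock acquisition may behave differently (for example, in how it reports a timeout or handles re-entry from the same thread).

```scala
import java.util.concurrent.TimeUnit
import java.util.concurrent.locks.ReentrantLock
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: maintenance that gives up if it cannot acquire the
// state store lock within `timeoutMs`. Not Spark's RocksDB lock semantics.
final class TimedMaintenanceSketch(timeoutMs: Long) {
  val lock = new ReentrantLock()
  @volatile var pendingSnapshot: Option[Long] = None
  val uploaded: ArrayBuffer[Long] = ArrayBuffer.empty[Long] // guarded by `lock`

  // Returns true only if the lock was acquired and maintenance actually ran.
  def doMaintenance(): Boolean =
    if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) false
    else {
      try {
        pendingSnapshot.foreach(v => uploaded += v)
        pendingSnapshot = None
        true
      } finally lock.unlock()
    }
}

object MaintenanceHarness {
  // Runs doMaintenance() on a separate thread and waits for it, so a lock held
  // by the calling (test) thread really blocks it; join() makes the result visible.
  def runOnOtherThread(m: TimedMaintenanceSketch): Boolean = {
    var ran = false
    val t = new Thread(() => { ran = m.doMaintenance() })
    t.start()
    t.join()
    ran
  }
}
```

Because the second thread is joined before any assertion, the outcome does not depend on scheduling; the only timing parameter is how long `tryLock` waits before giving up.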

HeartSaVioR

+1 pending CI

HeartSaVioR · 2024-03-29T04:22:05Z

Thanks! Merging to master/3.5.

HeartSaVioR · 2024-03-29T04:24:06Z

@sahnib Could you please file a new PR for 3.5? Looks like there is a merge conflict. Thanks in advance!

…oad/commit for snapshot files

### What changes were proposed in this pull request?

This PR fixes a race condition between the maintenance thread and the task thread when change-log checkpointing is enabled, and ensures that all snapshots are valid.

1. The maintenance thread currently relies on the class variable lastSnapshot to find the latest checkpoint, which it then uploads to DFS. The task thread can modify this checkpoint at commit time if it creates a new snapshot.
2. The task thread was not resetting lastSnapshot at load time, which can result in newer snapshots (if an old version is loaded) being considered valid and uploaded to DFS. This results in VersionIdMismatch errors.

### Why are the changes needed?

These are logic bugs that can cause `VersionIdMismatch` errors, forcing the user to discard the snapshot and restart the query.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added unit test cases.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#45724 from sahnib/rocks-db-fix.

Authored-by: Bhuwan Sahni <bhuwan.sahni@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>

sahnib · 2024-04-04T16:53:58Z

> @sahnib Could you please file a new PR for 3.5? Looks like there is a merge conflict. Thanks in advance!

Created backport PR #45881

…and load/commit for snapshot files

Backports #45724 to 3.5

### What changes were proposed in this pull request?

This PR fixes a race condition between the maintenance thread and the task thread when change-log checkpointing is enabled, and ensures that all snapshots are valid.

1. The maintenance thread currently relies on the class variable lastSnapshot to find the latest checkpoint, which it then uploads to DFS. The task thread can modify this checkpoint at commit time if it creates a new snapshot.
2. The task thread was not resetting lastSnapshot at load time, which can result in newer snapshots (if an old version is loaded) being considered valid and uploaded to DFS. This results in VersionIdMismatch errors.

### Why are the changes needed?

These are logic bugs that can cause `VersionIdMismatch` errors, forcing the user to discard the snapshot and restart the query.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added unit test cases.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45881 from sahnib/rocks-db-fix-3.5.

Authored-by: Bhuwan Sahni <bhuwan.sahni@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

…g a deep copy of file mappings in RocksDBFileManager in load()

### What changes were proposed in this pull request?

When change log checkpointing is enabled, the lock of the **RocksDB** state store is acquired when uploading the snapshot inside maintenance tasks, which causes lock contention between query processing tasks and the state maintenance thread. This PR fixes the lock contention issue introduced by #45724.

The changes include:
1. Removing lock acquisition in `doMaintenance()`
2. Adding a `copyFileMappings()` method to **RocksDBFileManager**, and using this method to deep copy the file manager state, specifically the file mappings `versionToRocksDBFiles` and `localFilesToDfsFiles`, in `load()`
3. Capturing the reference to the file mappings in `commit()`.

### Why are the changes needed?

We want to eliminate lock contention to decrease the latency of streaming queries, so lock acquisition inside maintenance tasks should be avoided. Removing the lock, however, can introduce race conditions between task and maintenance threads. By making a deep copy of `versionToRocksDBFiles` and `localFilesToDfsFiles` in **RocksDBFileManager**, we can ensure that the file manager state is not updated by the task thread when background snapshot uploading tasks attempt to upload a snapshot.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added unit test cases.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46942 from riyaverm-db/remove-lock-contention-between-maintenance-and-task.

Authored-by: Riya Verma <riya.verma@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
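The SPARK-48586 approach (deep copy the mappings in `load()`, capture a reference in `commit()`) amounts to copy-on-load: once `commit()` hands a mappings object to the background upload, the task thread never writes to that object again, so the upload needs no lock. A minimal sketch under assumptions: `FileMappingsSketch`, `FileManagerSketch`, and `DfsFileRef` are hypothetical; only the field names `versionToRocksDBFiles` and `localFilesToDfsFiles` come from the description, and `deepCopy()` stands in for the described `copyFileMappings()` without claiming its signature.

```scala
import scala.collection.mutable

// Hypothetical immutable file record; not Spark's RocksDB file classes.
final case class DfsFileRef(localName: String, dfsName: String)

// Hypothetical holder for the two mappings named in the SPARK-48586 description.
final class FileMappingsSketch(
    val versionToRocksDBFiles: mutable.Map[Long, Seq[DfsFileRef]],
    val localFilesToDfsFiles: mutable.Map[String, DfsFileRef]) {

  // New map instances; the values are immutable, so sharing them is safe and
  // later writes to the copy never show up in this object (or vice versa).
  def deepCopy(): FileMappingsSketch = new FileMappingsSketch(
    mutable.Map.empty[Long, Seq[DfsFileRef]] ++= versionToRocksDBFiles,
    mutable.Map.empty[String, DfsFileRef] ++= localFilesToDfsFiles)
}

// Hypothetical file manager; only the task thread calls load/recordFiles/commit.
final class FileManagerSketch {
  @volatile private var current = new FileMappingsSketch(
    mutable.Map.empty[Long, Seq[DfsFileRef]], mutable.Map.empty[String, DfsFileRef])

  // load(): the task thread continues from a private deep copy, so mappings an
  // earlier commit() handed to the maintenance thread are never mutated again.
  def load(): Unit = { current = current.deepCopy() }

  // Task-thread bookkeeping while producing a version (simplified).
  def recordFiles(version: Long, files: Seq[DfsFileRef]): Unit = {
    current.versionToRocksDBFiles(version) = files
    files.foreach(f => current.localFilesToDfsFiles(f.localName) = f)
  }

  // commit(): capture a reference to the current mappings; a background
  // snapshot upload can read this object without taking the state store lock.
  def commit(): FileMappingsSketch = current
}
```

The design trades one map copy per `load()` for lock-free reads by the maintenance thread. The sketch assumes the map values are immutable, which is what makes copying the maps themselves sufficient.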

…making a deep copy of file mappings in RocksDBFileManager in load()

Backports apache#46942 to 3.5

When change log checkpointing is enabled, the lock of the **RocksDB** state store is acquired when uploading the snapshot inside maintenance tasks, which causes lock contention between query processing tasks and the state maintenance thread. This PR fixes the lock contention issue introduced by apache#45724.

The changes include:
1. Removing lock acquisition in `doMaintenance()`
2. Adding a `copyFileMappings()` method to **RocksDBFileManager**, and using this method to deep copy the file manager state, specifically the file mappings `versionToRocksDBFiles` and `localFilesToDfsFiles`, in `load()`
3. Capturing the reference to the file mappings in `commit()`.

We want to eliminate lock contention to decrease the latency of streaming queries, so lock acquisition inside maintenance tasks should be avoided. Removing the lock, however, can introduce race conditions between task and maintenance threads. By making a deep copy of `versionToRocksDBFiles` and `localFilesToDfsFiles` in **RocksDBFileManager**, we can ensure that the file manager state is not updated by the task thread when background snapshot uploading tasks attempt to upload a snapshot.

No

Added unit test cases.

No

Closes apache#46942 from riyaverm-db/remove-lock-contention-between-maintenance-and-task.

Authored-by: Riya Verma <riya.verma@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>

…making a deep copy of file mappings in RocksDBFileManager in load()

Backports #46942 to 3.5

### What changes were proposed in this pull request?

When change log checkpointing is enabled, the lock of the **RocksDB** state store is acquired when uploading the snapshot inside maintenance tasks, which causes lock contention between query processing tasks and the state maintenance thread. This PR fixes the lock contention issue introduced by #45724.

The changes include:
1. Removing lock acquisition in `doMaintenance()`
2. Adding a `copyFileMappings()` method to **RocksDBFileManager**, and using this method to deep copy the file manager state, specifically the file mappings `versionToRocksDBFiles` and `localFilesToDfsFiles`, in `load()`
3. Capturing the reference to the file mappings in `commit()`.

### Why are the changes needed?

We want to eliminate lock contention to decrease the latency of streaming queries, so lock acquisition inside maintenance tasks should be avoided. Removing the lock, however, can introduce race conditions between task and maintenance threads. By making a deep copy of `versionToRocksDBFiles` and `localFilesToDfsFiles` in **RocksDBFileManager**, we can ensure that the file manager state is not updated by the task thread when background snapshot uploading tasks attempt to upload a snapshot.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added unit test cases.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #47130 from riyaverm-db/remove-lock-contention-between-maintenance-and-task-3.5.

Lead-authored-by: Riya Verma <riya.verma@databricks.com>
Co-authored-by: Riya Verma <170376104+riyaverm-db@users.noreply.github.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>

github-actions bot added SQL STRUCTURED STREAMING labels Mar 26, 2024

sahnib force-pushed the rocks-db-fix branch from 1290e84 to 5b9a10f March 26, 2024 15:01

sahnib marked this pull request as ready for review March 26, 2024 15:02

anishshri-db reviewed Mar 26, 2024

sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala (outdated)

anishshri-db approved these changes Mar 26, 2024

sahnib force-pushed the rocks-db-fix branch from 5b9a10f to 7a0345e March 27, 2024 02:56

sahnib changed the title ~~[SPARK-47568][SS]Fix race condition between maintenance thread and load/commit for snapshot files.~~ [SPARK-47568][SS] Fix race condition between maintenance thread and load/commit for snapshot files. Mar 27, 2024

HeartSaVioR reviewed Mar 27, 2024

Fix race condition between maintenance thread and load/commit for snapshot files (47fd2ed)

sahnib force-pushed the rocks-db-fix branch from 7a0345e to 47fd2ed March 29, 2024 00:26

HeartSaVioR approved these changes Mar 29, 2024

HeartSaVioR closed this in 0b844e5 Mar 29, 2024

sahnib mentioned this pull request Apr 4, 2024

[SPARK-47568][SS][3.5] Fix race condition between maintenance thread and load/commit for snapshot files #45881

Closed

riyaverm-db mentioned this pull request Jun 18, 2024

[SPARK-48586][SS] Remove lock acquisition in doMaintenance() by making a deep copy of file mappings in RocksDBFileManager in load() #46942

Closed

riyaverm-db mentioned this pull request Jun 27, 2024

[SPARK-48586][SS][3.5] Remove lock acquisition in doMaintenance() by making a deep copy of file mappings in RocksDBFileManager in load() #47130

Closed
