From 885c3fac724611ca59add984eb0629d32644b56f Mon Sep 17 00:00:00 2001
From: Anish Shrigondekar
Date: Mon, 30 Sep 2024 15:02:40 +0900
Subject: [PATCH] [SPARK-49823][SS] Avoid flush during shutdown in rocksdb close path

### What changes were proposed in this pull request?
Avoid flushing during shutdown in the RocksDB close path by setting `avoid_flush_during_shutdown` to true on the RocksDB options.

### Why are the changes needed?
Without this change, `cancelAllBackgroundWork` sometimes hangs when there are memtables that still need to be flushed. A flush is also unnecessary in this path, because a synchronous flush is only assumed to be required in the commit path.

```
at app//org.rocksdb.RocksDB.cancelAllBackgroundWork(Native Method)
at app//org.rocksdb.RocksDB.cancelAllBackgroundWork(RocksDB.java:4053)
at app//org.apache.spark.sql.execution.streaming.state.RocksDB.closeDB(RocksDB.scala:1406)
at app//org.apache.spark.sql.execution.streaming.state.RocksDB.load(RocksDB.scala:383)
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually verified in the logs that the config is passed, and ran the existing unit tests.

Before:
```
sql/core/target/unit-tests.log:141:18:20:06.223 pool-1-thread-1-ScalaTest-running-RocksDBSuite INFO RocksDB [Thread-17]: [NativeRocksDB-1] Options.avoid_flush_during_shutdown: 0
sql/core/target/unit-tests.log:776:18:20:06.871 pool-1-thread-1-ScalaTest-running-RocksDBSuite INFO RocksDB [Thread-17]: [NativeRocksDB-1] Options.avoid_flush_during_shutdown: 0
sql/core/target/unit-tests.log:1096:18:20:07.129 pool-1-thread-1-ScalaTest-running-RocksDBSuite INFO RocksDB [Thread-17]: [NativeRocksDB-1] Options.avoid_flush_during_shutdown: 0
```

After:
```
sql/core/target/unit-tests.log:6561:18:17:42.723 pool-1-thread-1-ScalaTest-running-RocksDBSuite INFO RocksDB [Thread-17]: [NativeRocksDB-1] Options.avoid_flush_during_shutdown: 1
sql/core/target/unit-tests.log:6947:18:17:43.035 pool-1-thread-1-ScalaTest-running-RocksDBSuite INFO RocksDB [Thread-17]: [NativeRocksDB-1] Options.avoid_flush_during_shutdown: 1
sql/core/target/unit-tests.log:7344:18:17:43.313 pool-1-thread-1-ScalaTest-running-RocksDBSuite INFO RocksDB [Thread-17]: [NativeRocksDB-1] Options.avoid_flush_during_shutdown: 1
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #48292 from anishshri-db/task/SPARK-49823.

Authored-by: Anish Shrigondekar
Signed-off-by: Jungtaek Lim
---
 .../org/apache/spark/sql/execution/streaming/state/RocksDB.scala | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
index f8d0c8722c3f5..c7f8434e5345b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
@@ -134,6 +134,7 @@ class RocksDB(
   rocksDbOptions.setTableFormatConfig(tableFormatConfig)
   rocksDbOptions.setMaxOpenFiles(conf.maxOpenFiles)
   rocksDbOptions.setAllowFAllocate(conf.allowFAllocate)
+  rocksDbOptions.setAvoidFlushDuringShutdown(true)
   rocksDbOptions.setMergeOperator(new StringAppendOperator())
 
   if (conf.boundedMemoryUsage) {