
Commit 9cfc628

LuciferYang authored and dongjoon-hyun committed
[SPARK-50855][CONNECT][TESTS][FOLLOWUP] Refactor TransformWithStateConnectSuite to run DROP TABLE IF EXISTS my_sink in beforeAll/afterEach
### What changes were proposed in this pull request?

This PR refactors `TransformWithStateConnectSuite`:

1. Overrides `beforeAll` to execute `spark.sql("DROP TABLE IF EXISTS my_sink")`, ensuring that no table named `my_sink` exists before any test case in `TransformWithStateConnectSuite` runs.
2. Overrides `afterEach` to execute `spark.sql("DROP TABLE IF EXISTS my_sink")`, ensuring that any leftover `my_sink` table is cleaned up after each test case.
3. Removes the `spark.sql("DROP TABLE IF EXISTS my_sink")` calls from the individual test cases.

### Why are the changes needed?

apache#49488 introduced `TransformWithStateConnectSuite`, in which the test case `transformWithState - batch query` did not drop the table `my_sink` after execution. Additionally, because the suite inherits from `RemoteSparkSession`, its test cases share a Connect server, which caused the test case `Table APIs` in `CatalogSuite` to fail during the Maven daily test:

- https://github.com/apache/spark/actions/runs/13654375212/job/38169921062

![image](https://github.com/user-attachments/assets/bcefe19e-7668-4092-a0e8-955d61bc28e2)

This PR therefore refactors `TransformWithStateConnectSuite` so that the table `my_sink` does not exist before or after the execution of its tests.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Pass GitHub Actions
- Manual check:

```
build/mvn -DskipTests -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pjvm-profiler -Pspark-ganglia-lgpl -Pkinesis-asl clean install
build/mvn test -pl sql/connect/client/jvm -fae
```

Before:

```
CatalogSuite:
- Database APIs
- CatalogMetadata APIs
- Table APIs *** FAILED ***
  Array(Table[name='my_sink', catalog='spark_catalog', database='default', tableType='MANAGED', isTemporary='false']) was not empty (CatalogSuite.scala:91)
Run completed in 3 minutes, 20 seconds.
Total number of tests run: 1474
Suites: completed 36, aborted 0
Tests: succeeded 1473, failed 1, canceled 0, ignored 6, pending 0
*** 1 TEST FAILED ***
```

After:

```
Run completed in 3 minutes, 22 seconds.
Total number of tests run: 1474
Suites: completed 36, aborted 0
Tests: succeeded 1474, failed 0, canceled 0, ignored 6, pending 0
All tests passed.
```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#50155 from LuciferYang/SPARK-50855-FOLLOWUP.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
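As a standalone illustration of the lifecycle pattern this PR adopts, the sketch below shows `beforeAll`/`afterEach` cleanup in a plain ScalaTest suite. It is a minimal, hypothetical example, not code from this PR: the suite name and the `dropSink` helper are stand-ins, and the real suite gets `beforeAll` from `RemoteSparkSession` and issues the actual `DROP TABLE` via `spark.sql`.

```scala
import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
import org.scalatest.funsuite.AnyFunSuite

// Minimal, hypothetical sketch of the cleanup pattern in this PR; the real
// change mixes BeforeAndAfterEach into TransformWithStateConnectSuite and
// calls spark.sql("DROP TABLE IF EXISTS my_sink") instead of dropSink().
class SinkCleanupSketchSuite
    extends AnyFunSuite
    with BeforeAndAfterAll
    with BeforeAndAfterEach {

  // Stand-in for spark.sql("DROP TABLE IF EXISTS my_sink").
  private def dropSink(): Unit = println("DROP TABLE IF EXISTS my_sink")

  // Runs once before any test: guarantees no leftover table from an
  // earlier suite that shared the same server.
  override protected def beforeAll(): Unit = {
    super.beforeAll()
    dropSink()
  }

  // Runs after every test. Cleanup goes in `try` and the parent teardown
  // in `finally`, so super.afterEach() runs even if the cleanup throws.
  override protected def afterEach(): Unit = {
    try {
      dropSink()
    } finally {
      super.afterEach()
    }
  }

  test("a test that writes to my_sink") {
    // Test body; no per-test DROP TABLE needed anymore.
  }
}
```

Putting `super.afterEach()` in the `finally` block mirrors the diff below: the suite's own cleanup is attempted first, but the parent's teardown is never skipped.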
1 parent eb71443 · commit 9cfc628

File tree

1 file changed: 19 additions, 9 deletions


sql/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/streaming/TransformWithStateConnectSuite.scala

Lines changed: 19 additions & 9 deletions
```diff
@@ -21,6 +21,7 @@ import java.io.{BufferedWriter, File, FileWriter}
 import java.nio.file.Paths
 import java.sql.Timestamp
 
+import org.scalatest.BeforeAndAfterEach
 import org.scalatest.concurrent.Eventually.eventually
 import org.scalatest.concurrent.Futures.timeout
 import org.scalatest.time.SpanSugar._
@@ -188,7 +189,11 @@ class TTLTestStatefulProcessor
   }
 }
 
-class TransformWithStateConnectSuite extends QueryTest with RemoteSparkSession with Logging {
+class TransformWithStateConnectSuite
+    extends QueryTest
+    with RemoteSparkSession
+    with Logging
+    with BeforeAndAfterEach {
   val testData: Seq[(String, String)] = Seq(("a", "1"), ("b", "1"), ("a", "2"))
   val twsAdditionalSQLConf = Seq(
     "spark.sql.streaming.stateStore.providerClass" ->
@@ -197,13 +202,24 @@ class TransformWithStateConnectSuite extends QueryTest with RemoteSparkSession w
     "spark.sql.session.timeZone" -> "UTC",
     "spark.sql.streaming.noDataMicroBatches.enabled" -> "false")
 
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    spark.sql("DROP TABLE IF EXISTS my_sink")
+  }
+
+  override protected def afterEach(): Unit = {
+    try {
+      spark.sql("DROP TABLE IF EXISTS my_sink")
+    } finally {
+      super.afterEach()
+    }
+  }
+
   test("transformWithState - streaming with state variable, case class type") {
     withSQLConf(twsAdditionalSQLConf: _*) {
       val session: SparkSession = spark
       import session.implicits._
 
-      spark.sql("DROP TABLE IF EXISTS my_sink")
-
       withTempPath { dir =>
         val path = dir.getCanonicalPath
         testData
@@ -242,7 +258,6 @@ class TransformWithStateConnectSuite extends QueryTest with RemoteSparkSession w
         }
       } finally {
         q.stop()
-        spark.sql("DROP TABLE IF EXISTS my_sink")
       }
     }
   }
@@ -253,8 +268,6 @@ class TransformWithStateConnectSuite extends QueryTest with RemoteSparkSession w
       val session: SparkSession = spark
       import session.implicits._
 
-      spark.sql("DROP TABLE IF EXISTS my_sink")
-
      withTempPath { dir =>
        val path = dir.getCanonicalPath
        testData
@@ -299,7 +312,6 @@ class TransformWithStateConnectSuite extends QueryTest with RemoteSparkSession w
         }
       } finally {
         q.stop()
-        spark.sql("DROP TABLE IF EXISTS my_sink")
       }
     }
   }
@@ -444,8 +456,6 @@ class TransformWithStateConnectSuite extends QueryTest with RemoteSparkSession w
       val session: SparkSession = spark
       import session.implicits._
 
-      spark.sql("DROP TABLE IF EXISTS my_sink")
-
       withTempPath { dir =>
         val path = dir.getCanonicalPath
         testData
```
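For context on the failure this prevents: the `Table APIs` test in `CatalogSuite` expects the shared session to start with no tables, so a `my_sink` leaked by an earlier suite on the same Connect server makes its emptiness assertion fail. The sketch below approximates that kind of check; it is a simplified stand-in (hypothetical object name, plain local session instead of the shared Connect server), not the actual code at CatalogSuite.scala:91.

```scala
import org.apache.spark.sql.SparkSession

// Simplified sketch of the emptiness precondition that a leaked table breaks;
// the real assertion lives in CatalogSuite and runs against a shared
// Spark Connect server rather than a fresh local session.
object ListTablesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("list-tables-sketch")
      .master("local[1]")
      .getOrCreate()
    try {
      // With a leaked `my_sink`, this array contains
      // Table[name='my_sink', ...] and the assertion fails, matching the
      // "was not empty" message in the Before output above.
      val tables = spark.catalog.listTables().collect()
      assert(tables.isEmpty, s"expected no tables, found: ${tables.mkString(", ")}")
    } finally {
      spark.stop()
    }
  }
}
```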
