[SPARK-41708][SQL][FOLLOWUP] Add a new replaceAll to SQLQueryTestHelper#replaceNotIncludedMsg to remove @hashCode
#39598
Conversation
Changed the title from "equals and hashCode to make CatalogFileIndex print the same results when using Scala 2.12 and 2.13" to "equals and hashCode of TableIdentifier to make CatalogFileIndex print the same results when using Scala 2.12 and 2.13"
    def this(table: String) = this(table, None, None)
    def this(table: String, database: Option[String]) = this(table, database, None)

    override def equals(obj: Any): Boolean = obj match {
cc @cloud-fan @HyukjinKwon @dongjoon-hyun @ulysses-you
The daily GitHub Actions build Master Scala 2.13 + Hadoop 3 + JDK 8 failed after #39468 was merged:
- https://github.com/apache/spark/actions/runs/3924877161
- https://github.com/apache/spark/actions/runs/3919886270
- https://github.com/apache/spark/actions/runs/3914162890
- https://github.com/apache/spark/actions/runs/3905252989
- https://github.com/apache/spark/actions/runs/3895875210
Are there any suggestions for a better solution? For example, overriding toString of CatalogFileIndex? What would be better to print?
Currently it prints org.apache.spark.sql.execution.datasources.CatalogFileIndex@hashCode
A better solution would be to normalize the output by removing the CatalogFileIndex@xxx part.
just print org.apache.spark.sql.execution.datasources.CatalogFileIndex?
We can replace CatalogFileIndex@xxx with CatalogFileIndex in SQLQueryTestHelper#replaceNotIncludedMsg to recover the test.
It had already been used.
dongjoon-hyun left a comment
Thank you for making a PR, @LuciferYang .
Could you make a separate test PR to scrub the hashCode instead of this follow-up, @LuciferYang ?
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
Lines 843 to 845 in 2062c74
    private def scrubOutIds(string: String): String =
      string.replaceAll("#\\d+", "#x")
        .replaceAll("operator id = \\d+", "operator id = #x")
I believe you can do that here in SQLQueryTestSuite.
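The scrub helper quoted above is small enough to check standalone. A minimal reproduction of what that normalization does (the plan string below is illustrative, not taken from Spark's golden files):

```scala
// Standalone sketch of the scrubOutIds idea quoted above: normalize
// auto-generated expression IDs and operator IDs so that plan strings
// compare stably across test runs.
def scrubOutIds(string: String): String =
  string.replaceAll("#\\d+", "#x")
    .replaceAll("operator id = \\d+", "operator id = #x")

val plan = "Project [key#42, val#137], operator id = 3"
println(scrubOutIds(plan)) // Project [key#x, val#x], operator id = #x
```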
ok, let me try this

== Analyzed Logical Plan ==
InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndex@7d811218, [key, val]
InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndex@5c30f6b5, [key, val]
let's not include the hashCode in the golden files.
Changed the title from "equals and hashCode of TableIdentifier to make CatalogFileIndex print the same results when using Scala 2.12 and 2.13" to "SQLQueryTestHelper#replaceNotIncludedMsg to remove hashCode"
Changed the title from "SQLQueryTestHelper#replaceNotIncludedMsg to remove hashCode" to "replaceAll to SQLQueryTestHelper#replaceNotIncludedMsg to remove hashCode"
    .replaceAll("Last Access.*", s"Last Access $notIncludedMsg")
    .replaceAll("Partition Statistics\t\\d+", s"Partition Statistics\t$notIncludedMsg")
    .replaceAll("\\*\\(\\d+\\) ", "*") // remove the WholeStageCodegen codegenStageIds
    .replaceAll("@[0-9a-z]+,", "@x,") // remove hashCode
Added the trailing , to avoid userinfo@spark.apache.org being replaced by mistake:
    userinfo@spark.apache.org
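The effect of the trailing comma can be checked in isolation. A sketch with illustrative strings: the hashCode in plan output is followed by a comma, while the e-mail address's [0-9a-z]+ run is stopped by a dot before any comma appears:

```scala
// "@hex," in plan output is rewritten, but the e-mail address is not:
// after "@spark" comes ".", which is neither [0-9a-z] nor ",".
val pattern = "@[0-9a-z]+,"

val plan  = "org.apache.spark.sql.execution.datasources.CatalogFileIndex@7d811218, [key, val]"
val email = "Report issues to userinfo@spark.apache.org, please"

println(plan.replaceAll(pattern, "@x,"))  // ...CatalogFileIndex@x, [key, val]
println(email.replaceAll(pattern, "@x,")) // unchanged
```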
@cloud-fan @dongjoon-hyun @ulysses-you
Changed to add a new replaceAll action to SQLQueryTestHelper#replaceNotIncludedMsg that replaces @hashCode with @x.
Or maybe just remove it entirely? @x is useless anyway.
Changed this one to remove it entirely.
== Analyzed Logical Plan ==
InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndex@7d811218, [key, val]
InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndex@x, [key, val]
Shall we simply override toString in FileIndex? One idea is to produce className(rootPaths.mkString(","))
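The idea can be sketched with stand-in types; FileIndexLike and CatalogFileIndexStub below are illustrative names, not Spark's real FileIndex API:

```scala
// Illustrative stand-ins for the suggestion above: a toString that prints
// the class name plus the root paths, instead of the default Object@hashCode.
trait FileIndexLike {
  def rootPaths: Seq[String]
  override def toString: String =
    s"${getClass.getSimpleName}(${rootPaths.mkString(",")})"
}

final class CatalogFileIndexStub(val rootPaths: Seq[String]) extends FileIndexLike

val idx = new CatalogFileIndexStub(Seq("file:/warehouse/explain_temp5"))
println(idx) // CatalogFileIndexStub(file:/warehouse/explain_temp5)
```

Note that the printed root path still depends on the local environment, which is the objection raised below in the thread.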
-InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndex@7d811218, [key, val]
+InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5), [key, val]

After overriding toString in FileIndex, "Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndex@7d811218" disappeared from the result.
So, it means we cannot do that. Did I understand correctly? Given that, I'm +1 with the AS-IS PR.
@cloud-fan className(rootPaths.mkString(",")) will print out a path tied to the running environment, like
org.apache.spark.sql.execution.datasources.CatalogFileIndex(file:/Users/${userName}/spark-source/sql/core/spark-warehouse/org.apache.spark.sql.SQLQueryTestSuite/explain_temp5)
That does not seem friendly to developers.
So, it means we cannot do that. Did I understand correctly? Given that, I'm +1 with the AS-IS PR.
Only overriding the toString method cannot achieve the goal, and, as I said above, rootPaths.mkString(",") needs further cleaning to avoid differences between test execution paths, so I prefer the current approach.
Changed the title from "replaceAll to SQLQueryTestHelper#replaceNotIncludedMsg to remove hashCode" to "replaceAll to SQLQueryTestHelper#replaceNotIncludedMsg to remove @hashCode"
Since the PR builder only tests Scala 2.12, I tested manually with Scala 2.13. Merged to master to recover Scala 2.13 test coverage. Thank you, @LuciferYang, @HyukjinKwon, @ulysses-you, @cloud-fan.
    .replaceAll("Last Access.*", s"Last Access $notIncludedMsg")
    .replaceAll("Partition Statistics\t\\d+", s"Partition Statistics\t$notIncludedMsg")
    .replaceAll("\\*\\(\\d+\\) ", "*") // remove the WholeStageCodegen codegenStageIds
    .replaceAll("@[0-9a-z]+,", ",") // remove hashCode
My point is we will change it very soon. It's meaningless to put a default object string in the EXPLAIN result, as it's not user-facing. Having a path string is fine, and if you look at this method, we already normalized location strings for other places.
Oh, sorry. It seems that I merged too early. Could you make a PR if this is not enough?
@ulysses-you can you help to fix this EXPLAIN issue?
    .replaceAll(s"file:.*$clsName", s"Location $notIncludedMsg/{warehouse_dir}")

It seems that the problem is here. The current replaceAll matches greedily (the longest possible string), as in the following:
== Analyzed Logical Plan ==
InsertIntoHadoopFsRelationCommand file:/Users/yangjie01/SourceCode/git/spark-mine-12/spark-warehouse/org.apache.spark.sql.SQLQuerySuite/explain_temp5, false, [val#x], Parquet, [path=file:/Users/yangjie01/SourceCode/git/spark-mine-12/spark-warehouse/org.apache.spark.sql.SQLQuerySuite/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndex(file:/Users/yangjie01/SourceCode/git/spark-mine-12/spark-warehouse/org.apache.spark.sql.SQLQuerySuite/explain_temp5), [key, val]
+- Project [key#x, val#x]
+- SubqueryAlias spark_catalog.default.explain_temp4
+- Relation spark_catalog.default.explain_temp4[key#x,val#x] parquet
After the replaceAll, it becomes
== Analyzed Logical Plan ==
InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5), [key, val]
+- Project [key#x, val#x]
+- SubqueryAlias spark_catalog.default.explain_temp4
+- Relation spark_catalog.default.explain_temp4[key#x,val#x] parquet
instead of
== Analyzed Logical Plan ==
InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5, false, [val#x], Parquet, [path=Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndex(fLocation [not included in comparison]/{warehouse_dir}/explain_temp5), [key, val]
+- Project [key#x, val#x]
+- SubqueryAlias spark_catalog.default.explain_temp4
+- Relation spark_catalog.default.explain_temp4[key#x,val#x] parquet
Seems we can use non-greedy mode, by using file:.*?$clsName
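The greedy vs. non-greedy difference is easy to demonstrate; the paths and suite name below are illustrative:

```scala
// A greedy .* backtracks from the end of the string, so the match runs to
// the LAST occurrence of the class name, swallowing everything between two
// locations. The reluctant .*? stops at the first occurrence instead.
val clsName = "SQLQuerySuite"
val input   = s"file:/a/$clsName/t5, file:/b/$clsName/t5"

println(input.replaceAll(s"file:.*$clsName", "LOC"))  // LOC/t5
println(input.replaceAll(s"file:.*?$clsName", "LOC")) // LOC/t5, LOC/t5
```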
Seems OK. Opened another PR: #39610
@dongjoon-hyun Very sorry for giving misleading suggestions
### What changes were proposed in this pull request?

The main change of this pr is to fix suggestions of #39598 (comment):
- revert the change of #39598
- override `toString` method of `FileIndex` to print className with `rootPaths`
- change the replacement rule of `SQLQueryTestHelper#replaceNotIncludedMsg` method for `file://xxx/clsName/` path to replace one by one instead of replacing the longest match

### Why are the changes needed?

Fix suggestions of #39598 (comment)

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Pass GitHub Actions
- Manually test `build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" -Pscala-2.13`, run successful

Closes #39610 from LuciferYang/SPARK-41708-FOLLOWUP-n.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?

The daily build GA Master Scala 2.13 + Hadoop 3 + JDK 8 failed after #39468 was merged; the failed tests and the failure error message are similar to the following:

The reason for the failure is that the result of CatalogFileIndex printed when using Scala 2.12 is different from Scala 2.13:
- When using Scala 2.12, the hashCode of CatalogFileIndex is 7d811218
- When using Scala 2.13, the hashCode of CatalogFileIndex is cdfa8472

So this adds a new replaceAll action to SQLQueryTestHelper#replaceNotIncludedMsg to remove @hashCode, to make CatalogFileIndex print the same results when using Scala 2.12 and 2.13.

Why are the changes needed?

Make the daily build GA Master Scala 2.13 + Hadoop 3 + JDK 8 run successfully.

Does this PR introduce any user-facing change?

No

How was this patch tested?