Skip to content

[SPARK-43945][SQL][TESTS] Fix bug for SQLQueryTestSuite when run on local env #41431

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Jun 2, 2023

What changes were proposed in this pull request?

The pr aims to fix bug for SQLQueryTestSuite when run on local env.
When SQLQueryTestSuite runs identifier-clause.sql in various different environments, the statement DESCRIBE SCHEMA IDENTIFIER('id' || 'ent') will generate different Owner values. Currently, this value(runner) is OK when runs on GA, but it will fail on local. In order to be compatible with the results of various different environments, the Owner value is removed for comparison.

Why are the changes needed?

Fix bugs and compatibility.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Manually test.
  • Pass GA.

@github-actions github-actions bot added the SQL label Jun 2, 2023
@panbingkun
Copy link
Contributor Author

[info] - identifier-clause.sql *** FAILED *** (809 milliseconds)
[info] identifier-clause.sql
[info] Expected "...ce Name ident
[info] Owner [runner]", but got "...ce Name ident
[info] Owner [panbingkun]" Result did not match for query #61
[info] DESCRIBE SCHEMA IDENTIFIER('id' || 'ent') (SQLQueryTestSuite.scala:842)
[info] org.scalatest.exceptions.TestFailedException:
[info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info] at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
[info] at org.scalatest.Assertions.assertResult(Assertions.scala:847)
[info] at org.scalatest.Assertions.assertResult$(Assertions.scala:842)
[info] at org.scalatest.funsuite.AnyFunSuite.assertResult(AnyFunSuite.scala:1564)
[info] at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$readGoldenFileAndCompareResults$3(SQLQueryTestSuite.scala:842)
[info] at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563)
[info] at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561)
[info] at scala.collection.AbstractIterable.foreach(Iterable.scala:926)
[info] at org.apache.spark.sql.SQLQueryTestSuite.readGoldenFileAndCompareResults(SQLQueryTestSuite.scala:833)
[info] at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$runQueries$10(SQLQueryTestSuite.scala:586)
[info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[info] at org.scalatest.Assertions.withClue(Assertions.scala:1065)
[info] at org.scalatest.Assertions.withClue$(Assertions.scala:1052)
[info] at org.scalatest.funsuite.AnyFunSuite.withClue(AnyFunSuite.scala:1564)
[info] at org.apache.spark.sql.SQLQueryTestSuite.runQueries(SQLQueryTestSuite.scala:582)
[info] at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$runSqlTestCase$35(SQLQueryTestSuite.scala:424)
[info] at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$runSqlTestCase$35$adapted(SQLQueryTestSuite.scala:422)
[info] at scala.collection.immutable.List.foreach(List.scala:333)
[info] at org.apache.spark.sql.SQLQueryTestSuite.runSqlTestCase(SQLQueryTestSuite.scala:422)
[info] at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$createScalaTestCase$6(SQLQueryTestSuite.scala:329)
[info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info] at org.scalatest.Transformer.apply(Transformer.scala:22)
[info] at org.scalatest.Transformer.apply(Transformer.scala:20)
[info] at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
[info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:221)
[info] at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
[info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
[info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
[info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
[info] at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:67)
[info] at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
[info] at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
[info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:67)
[info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
[info] at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
[info] at scala.collection.immutable.List.foreach(List.scala:333)
[info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
[info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
[info] at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
[info] at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
[info] at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
[info] at org.scalatest.Suite.run(Suite.scala:1114)
[info] at org.scalatest.Suite.run$(Suite.scala:1096)
[info] at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
[info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
[info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
[info] at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
[info] at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
[info] at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:67)
[info] at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
[info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info] at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:67)
[info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info] at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
[info] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info] at java.lang.Thread.run(Thread.java:750)

@HyukjinKwon
Copy link
Member

What's the bug? Mind elabourating it in the PR description?

@LuciferYang
Copy link
Contributor

@panbingkun Can we just enhance SQLQueryTestHelper#replaceNotIncludedMsg, replace Owner.* to s"Owner $notIncludedMsg"?

protected def replaceNotIncludedMsg(line: String): String = {
line.replaceAll("#\\d+", "#x")
.replaceAll("plan_id=\\d+", "plan_id=x")
.replaceAll(
s"Location.*$clsName/",
s"Location $notIncludedMsg/{warehouse_dir}/")
.replaceAll(s"file:[^\\s,]*$clsName", s"file:$notIncludedMsg/{warehouse_dir}")
.replaceAll("Created By.*", s"Created By $notIncludedMsg")
.replaceAll("Created Time.*", s"Created Time $notIncludedMsg")
.replaceAll("Last Access.*", s"Last Access $notIncludedMsg")
.replaceAll("Partition Statistics\t\\d+", s"Partition Statistics\t$notIncludedMsg")
.replaceAll("CTERelationDef \\d+,", s"CTERelationDef xxxx,")
.replaceAll("CTERelationRef \\d+,", s"CTERelationRef xxxx,")
.replaceAll("@\\w*,", s"@xxxxxxxx,")
.replaceAll("\\*\\(\\d+\\) ", "*") // remove the WholeStageCodegen codegenStageIds
}

And I think this one worth file a new Jira

@beliefer
Copy link
Contributor

beliefer commented Jun 2, 2023

@panbingkun Thank you for the fix. It seems it's not the issue of SQLQueryTestSuite. Could we find out the root cause ?

@panbingkun
Copy link
Contributor Author

What's the bug? Mind elabourating it in the PR description?

Ok, Let me do it.

@panbingkun
Copy link
Contributor Author

@panbingkun Can we just enhance SQLQueryTestHelper#replaceNotIncludedMsg, replace Owner.* to s"Owner $notIncludedMsg"?

protected def replaceNotIncludedMsg(line: String): String = {
line.replaceAll("#\\d+", "#x")
.replaceAll("plan_id=\\d+", "plan_id=x")
.replaceAll(
s"Location.*$clsName/",
s"Location $notIncludedMsg/{warehouse_dir}/")
.replaceAll(s"file:[^\\s,]*$clsName", s"file:$notIncludedMsg/{warehouse_dir}")
.replaceAll("Created By.*", s"Created By $notIncludedMsg")
.replaceAll("Created Time.*", s"Created Time $notIncludedMsg")
.replaceAll("Last Access.*", s"Last Access $notIncludedMsg")
.replaceAll("Partition Statistics\t\\d+", s"Partition Statistics\t$notIncludedMsg")
.replaceAll("CTERelationDef \\d+,", s"CTERelationDef xxxx,")
.replaceAll("CTERelationRef \\d+,", s"CTERelationRef xxxx,")
.replaceAll("@\\w*,", s"@xxxxxxxx,")
.replaceAll("\\*\\(\\d+\\) ", "*") // remove the WholeStageCodegen codegenStageIds
}

And I think this one worth file a new Jira

Ok, Very good proposal, let me do it.

@LuciferYang
Copy link
Contributor

@panbingkun Thank you for the fix. It seems it's not the issue of SQLQueryTestSuite. Could we find out the root cause ?

owner is namespace reserved properties when spark.sql.legacy.notReserveProperties is false, it always use SPARK_USER or UserGroupInformation.getCurrentUser().getShortUserName() and it cannot be modified

@beliefer
Copy link
Contributor

beliefer commented Jun 2, 2023

SPARK_USER

Thank you for your investigation. If so, it seems just set the SPARK_USER or CurrentUser in beforeAll.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks worthy for a JIRA. please file a JIRA for this stuff for trace-ability, @panbingkun .

@panbingkun
Copy link
Contributor Author

It looks worthy for a JIRA. please file a JIRA for this stuff for trace-ability, @panbingkun .

OK.

@beliefer
Copy link
Contributor

beliefer commented Jun 2, 2023

@panbingkun I tested with the below command and it works fine.
SPARK_GENERATE_GOLDEN_FILES=1 SPARK_USER="runner" build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z identifier-clause.sql"

I think we should just update the document of org.apache.spark.sql.SQLQueryTestSuite and add SPARK_USER="runner" in usage section.

s" for query #$i\n${expected.sql}") {
output.output
}
}
}

private def normalizeOwner(plan: String): String = {
plan.replaceAll("""Owner\t\{runner}""", "Owner\t" + Utils.getCurrentUserName)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this work is OK. But it introduce the extra complexity and readability.

@LuciferYang
Copy link
Contributor

@panbingkun I tested with the below command and it works fine. SPARK_GENERATE_GOLDEN_FILES=1 SPARK_USER="runner" build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z identifier-clause.sql"

I think we should just update the document of org.apache.spark.sql.SQLQueryTestSuite and add SPARK_USER="runner" in usage section.

Export SPARK_USER="runner" in build/mvn and build/sbt as default ? This is easier for developers

@panbingkun panbingkun changed the title [MINOR][TESTS] Fix bug for SQLQueryTestSuite when run on local env [SPARK-43945][SQL][TESTS] Fix bug for SQLQueryTestSuite when run on local env Jun 2, 2023
@panbingkun
Copy link
Contributor Author

This is all done! @HyukjinKwon @dongjoon-hyun @LuciferYang @beliefer

@beliefer
Copy link
Contributor

beliefer commented Jun 2, 2023

Export SPARK_USER="runner" in build/mvn and build/sbt as default ? This is easier for developers

Seems fine.

@LuciferYang
Copy link
Contributor

@panbingkun I tested with the below command and it works fine. SPARK_GENERATE_GOLDEN_FILES=1 SPARK_USER="runner" build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z identifier-clause.sql"
I think we should just update the document of org.apache.spark.sql.SQLQueryTestSuite and add SPARK_USER="runner" in usage section.

Export SPARK_USER="runner" in build/mvn and build/sbt as default ? This is easier for developers

@panbingkun can you test this change? May need additional check for Maven's testing

@@ -49,6 +49,7 @@ trait SQLQueryTestHelper extends Logging {
.replaceAll("Created By.*", s"Created By $notIncludedMsg")
.replaceAll("Created Time.*", s"Created Time $notIncludedMsg")
.replaceAll("Last Access.*", s"Last Access $notIncludedMsg")
.replaceAll("Owner.*", s"Owner $notIncludedMsg")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems good to me too. @panbingkun Thank you!

@LuciferYang
Copy link
Contributor

Let's waiting ci ~

@MaxGekk
Copy link
Member

MaxGekk commented Jun 2, 2023

+1, LGTM. Merging to master.
Thank you, @panbingkun and @dongjoon-hyun @beliefer @HyukjinKwon @LuciferYang for review.

@MaxGekk MaxGekk closed this in 18b9bd9 Jun 2, 2023
@panbingkun panbingkun deleted the fix_SQLQueryTestSuite_run_on_local branch June 3, 2023 13:02
czxm pushed a commit to czxm/spark that referenced this pull request Jun 12, 2023
… local env

### What changes were proposed in this pull request?
The pr aims to fix bug for SQLQueryTestSuite when run on local env.
When SQLQueryTestSuite runs `identifier-clause.sql` in various different environments, the statement `DESCRIBE SCHEMA IDENTIFIER('id' || 'ent')` will generate different `Owner` values. Currently, this value(`runner`) is OK when runs on GA, but it will fail on local. In order to be compatible with the results of various different environments, the `Owner` value is removed for comparison.

### Why are the changes needed?
Fix bugs and compatibility.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manually test.
- Pass GA.

Closes apache#41431 from panbingkun/fix_SQLQueryTestSuite_run_on_local.

Authored-by: panbingkun <pbk1982@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants