
[SPARK-48394][3.5][CORE] Cleanup mapIdToMapIndex on mapoutput unregister #46768

Closed
wants to merge 1 commit into apache:branch-3.5 from Ngone51:SPARK-48394-3.5

Conversation

Ngone51
Member

@Ngone51 Ngone51 commented May 28, 2024

This PR backports #46706 to branch 3.5.

What changes were proposed in this pull request?

This PR cleans up `mapIdToMapIndex` when the corresponding mapstatus is unregistered in three places:

  • `removeMapOutput`
  • `removeOutputsByFilter`
  • `addMapOutput` (old mapstatus overwritten)

Why are the changes needed?

There is only one valid mapstatus for the same `mapIndex` at the same time in Spark. `mapIdToMapIndex` should follow the same rule to avoid inconsistency.
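The invariant above can be illustrated with a simplified model. Spark's real implementation is Scala (the `ShuffleStatus` bookkeeping in `MapOutputTracker.scala`); the Python class below is only a hypothetical sketch of the two structures being kept in sync, not the actual API:

```python
# Simplified, hypothetical model of the bookkeeping this PR fixes.
# map_statuses is the authoritative store (one status per mapIndex);
# map_id_to_map_index is the auxiliary index that must stay in sync.
class ShuffleStatusModel:
    def __init__(self, num_partitions):
        self.map_statuses = [None] * num_partitions   # mapIndex -> mapId
        self.map_id_to_map_index = {}                 # mapId -> mapIndex

    def add_map_output(self, map_index, map_id):
        old = self.map_statuses[map_index]
        if old is not None:
            # Old mapstatus is overwritten (e.g. a task re-run): drop its
            # now-stale mapId entry rather than leaving it behind.
            self.map_id_to_map_index.pop(old, None)
        self.map_statuses[map_index] = map_id
        self.map_id_to_map_index[map_id] = map_index

    def remove_map_output(self, map_index):
        map_id = self.map_statuses[map_index]
        if map_id is not None:
            self.map_statuses[map_index] = None
            self.map_id_to_map_index.pop(map_id, None)
```

Without the `pop` calls, a removed or overwritten mapstatus would leave a dangling `mapId -> mapIndex` entry, so two map IDs could resolve to the same index at once.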

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@LuciferYang
Contributor

Traceback (most recent call last):
  File "/home/runner/work/spark/spark/./dev/run-tests.py", line 674, in <module>
    main()
  File "/home/runner/work/spark/spark/./dev/run-tests.py", line 547, in main
    changed_files = identify_changed_files_from_git_commits(
  File "/home/runner/work/spark/spark/dev/sparktestsupport/utils.py", line 86, in identify_changed_files_from_git_commits
    raw_output = subprocess.check_output(
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['git', 'diff', '--name-only', '07db4e5871cc083cd0178f5772b6884fe1b0dc04', 'c9d94ef8e7c7d35e3f2995ffb63596a993a766c8']' returned non-zero exit status 128.

It seems `git diff` failed to execute.
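For context on the traceback above: `subprocess.check_output` raises `CalledProcessError` whenever the child process exits non-zero (git uses status 128 for fatal errors, e.g. a commit that is not present in the local clone). A minimal reproduction of the failure mode, using a stand-in child process instead of the actual `git diff` call:

```python
import subprocess
import sys

# check_output runs the command with check=True semantics, so any
# non-zero exit status raises CalledProcessError. The stand-in child
# below exits with status 42 to mimic the failing git diff call.
try:
    subprocess.check_output([sys.executable, "-c", "raise SystemExit(42)"])
except subprocess.CalledProcessError as e:
    print(e.returncode)  # -> 42
```

In `run-tests.py` this exception is unhandled, which is why the whole CI step aborts as soon as the `git diff` between the two commits fails.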

@Ngone51
Member Author

Ngone51 commented May 31, 2024

OracleIntegrationSuite seems to be broken in branch-3.5. @yaooqinn do you have any insight into this?

[info] OracleIntegrationSuite:
[info] org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite *** ABORTED *** (49 seconds, 935 milliseconds)
[info]   java.sql.SQLSyntaxErrorException: ORA-00933: SQL command not properly ended
[info]   at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:702)
[info]   at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:608)
[info]   at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1248)
[info]   at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:1041)
[info]   at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:443)
[info]   at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:518)
[info]   at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:251)
[info]   at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:1181)
[info]   at oracle.jdbc.driver.OracleStatement.executeSQLStatement(OracleStatement.java:1571)
[info]   at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1345)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3728)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeLargeUpdate(OraclePreparedStatement.java:3905)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:3880)
[info]   at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeUpdate(OraclePreparedStatementWrapper.java:993)
[info]   at org.apache.spark.sql.jdbc.v2.DockerJDBCIntegrationV2Suite.dataPreparation(DockerJDBCIntegrationV2Suite.scala:43)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.$anonfun$beforeAll$1(DockerJDBCIntegrationSuite.scala:171)
[info]   at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled(DockerIntegrationFunSuite.scala:49)
[info]   at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled$(DockerIntegrationFunSuite.scala:47)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.runIfTestsEnabled(DockerJDBCIntegrationSuite.scala:95)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.beforeAll(DockerJDBCIntegrationSuite.scala:118)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:69)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:750)
[info]   Cause: oracle.jdbc.OracleDatabaseException: ORA-00933: SQL command not properly ended
[info]   at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:710)
[info]   at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:608)
[info]   at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1248)
[info]   at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:1041)
[info]   at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:443)
[info]   at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:518)
[info]   at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:251)
[info]   at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:1181)
[info]   at oracle.jdbc.driver.OracleStatement.executeSQLStatement(OracleStatement.java:1571)
[info]   at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1345)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3728)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeLargeUpdate(OraclePreparedStatement.java:3905)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:3880)
[info]   at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeUpdate(OraclePreparedStatementWrapper.java:993)
[info]   at org.apache.spark.sql.jdbc.v2.DockerJDBCIntegrationV2Suite.dataPreparation(DockerJDBCIntegrationV2Suite.scala:43)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.$anonfun$beforeAll$1(DockerJDBCIntegrationSuite.scala:171)
[info]   at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled(DockerIntegrationFunSuite.scala:49)
[info]   at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled$(DockerIntegrationFunSuite.scala:47)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.runIfTestsEnabled(DockerJDBCIntegrationSuite.scala:95)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.beforeAll(DockerJDBCIntegrationSuite.scala:118)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:69)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:750)

@yaooqinn
Member

Hi @Ngone51, #46807 should fix this.

Closes apache#46706 from Ngone51/SPARK-43043-followup.

Lead-authored-by: Yi Wu <yi.wu@databricks.com>
Co-authored-by: wuyi <yi.wu@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Contributor

@mridulm mridulm left a comment


LGTM

@yaooqinn
Member

yaooqinn commented Jun 3, 2024

Merged to 3.5, thank you all

yaooqinn pushed a commit that referenced this pull request Jun 3, 2024
Closes #46768 from Ngone51/SPARK-48394-3.5.

Authored-by: Yi Wu <yi.wu@databricks.com>
Signed-off-by: Kent Yao <yao@apache.org>
@yaooqinn yaooqinn closed this Jun 3, 2024
@Ngone51
Member Author

Ngone51 commented Jun 3, 2024

Thanks all!
