
[SPARK-48394][3.5][CORE] Cleanup mapIdToMapIndex on mapoutput unregister #46768

Closed
wants to merge 1 commit into apache:branch-3.5 from Ngone51:SPARK-48394-3.5

Conversation

Ngone51
Member

@Ngone51 Ngone51 commented May 28, 2024

This PR backports #46706 to branch 3.5.

What changes were proposed in this pull request?

This PR cleans up `mapIdToMapIndex` when the corresponding mapstatus is unregistered in three places:

  • `removeMapOutput`
  • `removeOutputsByFilter`
  • `addMapOutput` (old mapstatus overwritten)

Why are the changes needed?

There is only one valid mapstatus for the same `mapIndex` at the same time in Spark. `mapIdToMapIndex` should follow the same rule to avoid inconsistency.
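The invariant above can be illustrated with a simplified model. Spark's real implementation is Scala (the `ShuffleStatus` bookkeeping in `MapOutputTracker.scala`); the Python class below is only a hypothetical sketch of the two structures being kept in sync, not the actual API:

```python
# Simplified, hypothetical model of the bookkeeping this PR fixes.
# map_statuses is the authoritative store (one status per mapIndex);
# map_id_to_map_index is the auxiliary index that must stay in sync.
class ShuffleStatusModel:
    def __init__(self, num_partitions):
        self.map_statuses = [None] * num_partitions   # mapIndex -> mapId
        self.map_id_to_map_index = {}                 # mapId -> mapIndex

    def add_map_output(self, map_index, map_id):
        old = self.map_statuses[map_index]
        if old is not None:
            # Old mapstatus is overwritten (e.g. a task re-run): drop its
            # now-stale mapId entry rather than leaving it behind.
            self.map_id_to_map_index.pop(old, None)
        self.map_statuses[map_index] = map_id
        self.map_id_to_map_index[map_id] = map_index

    def remove_map_output(self, map_index):
        map_id = self.map_statuses[map_index]
        if map_id is not None:
            self.map_statuses[map_index] = None
            self.map_id_to_map_index.pop(map_id, None)
```

Without the `pop` calls, a removed or overwritten mapstatus would leave a dangling `mapId -> mapIndex` entry, so two map IDs could resolve to the same index at once.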

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@LuciferYang
Contributor

Traceback (most recent call last):
  File "/home/runner/work/spark/spark/./dev/run-tests.py", line 674, in <module>
    main()
  File "/home/runner/work/spark/spark/./dev/run-tests.py", line 547, in main
    changed_files = identify_changed_files_from_git_commits(
  File "/home/runner/work/spark/spark/dev/sparktestsupport/utils.py", line 86, in identify_changed_files_from_git_commits
    raw_output = subprocess.check_output(
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['git', 'diff', '--name-only', '07db4e5871cc083cd0178f5772b6884fe1b0dc04', 'c9d94ef8e7c7d35e3f2995ffb63596a993a766c8']' returned non-zero exit status 128.

It seems `git diff` failed to execute.
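For context on the traceback above: `subprocess.check_output` raises `CalledProcessError` whenever the child process exits non-zero (git uses status 128 for fatal errors, e.g. a commit that is not present in the local clone). A minimal reproduction of the failure mode, using a stand-in child process instead of the actual `git diff` call:

```python
import subprocess
import sys

# check_output runs the command with check=True semantics, so any
# non-zero exit status raises CalledProcessError. The stand-in child
# below exits with status 42 to mimic the failing git diff call.
try:
    subprocess.check_output([sys.executable, "-c", "raise SystemExit(42)"])
except subprocess.CalledProcessError as e:
    print(e.returncode)  # -> 42
```

In `run-tests.py` this exception is unhandled, which is why the whole CI step aborts as soon as the `git diff` between the two commits fails.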

@Ngone51
Member Author

Ngone51 commented May 31, 2024

OracleIntegrationSuite seems to be broken in branch-3.5. @yaooqinn do you have any insight into this?

[info] OracleIntegrationSuite:
[info] org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite *** ABORTED *** (49 seconds, 935 milliseconds)
[info]   java.sql.SQLSyntaxErrorException: ORA-00933: SQL command not properly ended
[info]   at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:702)
[info]   at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:608)
[info]   at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1248)
[info]   at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:1041)
[info]   at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:443)
[info]   at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:518)
[info]   at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:251)
[info]   at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:1181)
[info]   at oracle.jdbc.driver.OracleStatement.executeSQLStatement(OracleStatement.java:1571)
[info]   at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1345)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3728)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeLargeUpdate(OraclePreparedStatement.java:3905)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:3880)
[info]   at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeUpdate(OraclePreparedStatementWrapper.java:993)
[info]   at org.apache.spark.sql.jdbc.v2.DockerJDBCIntegrationV2Suite.dataPreparation(DockerJDBCIntegrationV2Suite.scala:43)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.$anonfun$beforeAll$1(DockerJDBCIntegrationSuite.scala:171)
[info]   at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled(DockerIntegrationFunSuite.scala:49)
[info]   at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled$(DockerIntegrationFunSuite.scala:47)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.runIfTestsEnabled(DockerJDBCIntegrationSuite.scala:95)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.beforeAll(DockerJDBCIntegrationSuite.scala:118)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:69)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:750)
[info]   Cause: oracle.jdbc.OracleDatabaseException: ORA-00933: SQL command not properly ended
[info]   at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:710)
[info]   at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:608)
[info]   at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1248)
[info]   at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:1041)
[info]   at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:443)
[info]   at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:518)
[info]   at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:251)
[info]   at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:1181)
[info]   at oracle.jdbc.driver.OracleStatement.executeSQLStatement(OracleStatement.java:1571)
[info]   at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1345)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3728)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeLargeUpdate(OraclePreparedStatement.java:3905)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:3880)
[info]   at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeUpdate(OraclePreparedStatementWrapper.java:993)
[info]   at org.apache.spark.sql.jdbc.v2.DockerJDBCIntegrationV2Suite.dataPreparation(DockerJDBCIntegrationV2Suite.scala:43)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.$anonfun$beforeAll$1(DockerJDBCIntegrationSuite.scala:171)
[info]   at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled(DockerIntegrationFunSuite.scala:49)
[info]   at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled$(DockerIntegrationFunSuite.scala:47)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.runIfTestsEnabled(DockerJDBCIntegrationSuite.scala:95)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.beforeAll(DockerJDBCIntegrationSuite.scala:118)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:69)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:750)

@yaooqinn
Member

Hi @Ngone51, #46807 should fix this.

Closes apache#46706 from Ngone51/SPARK-43043-followup.

Lead-authored-by: Yi Wu <yi.wu@databricks.com>
Co-authored-by: wuyi <yi.wu@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Contributor

@mridulm mridulm left a comment


LGTM

@yaooqinn
Member

yaooqinn commented Jun 3, 2024

Merged to 3.5, thank you all

yaooqinn pushed a commit that referenced this pull request Jun 3, 2024
Closes #46768 from Ngone51/SPARK-48394-3.5.

Authored-by: Yi Wu <yi.wu@databricks.com>
Signed-off-by: Kent Yao <yao@apache.org>
@yaooqinn yaooqinn closed this Jun 3, 2024
@Ngone51
Member Author

Ngone51 commented Jun 3, 2024

Thanks all!
