Skip to content

[SPARK-25139][SPARK-18406][CORE][2.4] Avoid NonFatals to kill the Executor in PythonRunner #24552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jiangxb1987
Copy link
Contributor

What changes were proposed in this pull request?

Backport #24542 to 2.4.

How was this patch tested?

existing tests

… in PythonRunner

## What changes were proposed in this pull request?

Python uses a prefetch approach to read the result from upstream and serve them in another thread, thus it's possible that if the children operator doesn't consume all the data then the Task cleanup may happen before Python side read process finishes, this in turn create a race condition that the block read locks are freed during Task cleanup and then the reader try to release the read lock it holds and find it has been released, in this case we shall hit a AssertionError.

We shall catch the AssertionError in PythonRunner and prevent this kill the Executor.

## How was this patch tested?

Hard to write a unit test case for this case, manually verified with failed job.

Closes apache#24542 from jiangxb1987/pyError.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@SparkQA
Copy link

SparkQA commented May 8, 2019

Test build #105246 has finished for PR 24552 at commit c6a18ac.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented May 8, 2019

Test build #105252 has finished for PR 24552 at commit c6a18ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. This is a clean backport. Thank you, @jiangxb1987 and @HyukjinKwon !

Merged to branch-2.4.

dongjoon-hyun pushed a commit that referenced this pull request May 8, 2019
…cutor in PythonRunner

## What changes were proposed in this pull request?

Backport #24542 to 2.4.

## How was this patch tested?

existing tests

Closes #24552 from jiangxb1987/SPARK-25139-2.4.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@jiangxb1987 jiangxb1987 deleted the SPARK-25139-2.4 branch May 8, 2019 18:50
rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019
…cutor in PythonRunner

## What changes were proposed in this pull request?

Backport apache#24542 to 2.4.

## How was this patch tested?

existing tests

Closes apache#24552 from jiangxb1987/SPARK-25139-2.4.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Sep 26, 2019
…cutor in PythonRunner

## What changes were proposed in this pull request?

Backport apache#24542 to 2.4.

## How was this patch tested?

existing tests

Closes apache#24552 from jiangxb1987/SPARK-25139-2.4.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
XUJiahua pushed a commit to XUJiahua/spark that referenced this pull request Apr 9, 2020
…cutor in PythonRunner

Backport apache#24542 to 2.4.

existing tests

Closes apache#24552 from jiangxb1987/SPARK-25139-2.4.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 2f16255)
(cherry picked from commit d6cd2eb4bdac48c7d2f11faca11f53d901f6737b)
(cherry picked from commit 6d23c7a5f28cb988a3b3a820480453a6eafb0cf7)

Change-Id: I4401ec663789bcc63c17babe3fe90c3f7bb53364
(cherry picked from commit ac367b740b4e6cecf563146b189d8153106d208d)
(cherry picked from commit 3813df0ab9d48d6b1f988ac165f1cb0e00f2e4c0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants