[CELEBORN-1498] Decide whether to reuse the shuffle id based on the appShuffle's numAvailableOutputs #2611

Closed
wants to merge 1 commit into from

jiang13021
Contributor

What changes were proposed in this pull request?

Decide whether to reuse the shuffle id based on the appShuffle's numAvailableOutputs instead of the deterministic level.

Why are the changes needed?

In Spark, the DAGScheduler decides whether to reuse task output based on multiple factors. The deterministic level is one of them, but not the only one. I constructed a case in the test where a shuffleId should be regenerated instead of reused. IMO, Celeborn should decide whether to reuse the shuffle id based on whether the map output is empty, which delegates the decision to reuse results from a previous attempt to the DAGScheduler.
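
A minimal sketch of the proposed rule, as one possible reading of the description above rather than the actual patch. The class `ShuffleIdReuseSketch`, the method `resolveShuffleId`, and the id counter are hypothetical names introduced for illustration; only `numAvailableOutputs` and the app shuffle id come from the PR text. The assumption is that an empty map output (`numAvailableOutputs == 0`) means the DAGScheduler will recompute the whole stage, so a fresh shuffle id is generated, while any available output means the previous id is reused.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names are taken from the Celeborn codebase.
class ShuffleIdReuseSketch {
  // Latest Celeborn-side shuffle id handed out for each Spark app shuffle id.
  private final ConcurrentHashMap<Integer, Integer> latestShuffleId = new ConcurrentHashMap<>();
  // Source of fresh shuffle ids (assumed to be a simple counter here).
  private final AtomicInteger nextShuffleId = new AtomicInteger(0);

  // Returns the shuffle id a writer should use for the given app shuffle.
  // numAvailableOutputs is the number of map outputs Spark still considers available.
  int resolveShuffleId(int appShuffleId, int numAvailableOutputs) {
    return latestShuffleId.compute(appShuffleId, (key, current) -> {
      // No id yet, or the map output is empty: generate a new shuffle id.
      if (current == null || numAvailableOutputs == 0) {
        return nextShuffleId.getAndIncrement();
      }
      // Some map output is still available: reuse the existing shuffle id.
      return current;
    });
  }
}
```

In this sketch the deterministic level is never consulted; the caller only passes in what Spark reports as available, which is the sense in which the decision is left to the DAGScheduler.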

Does this PR introduce any user-facing change?

No

How was this patch tested?

Add a test: org.apache.celeborn.tests.spark.CelebornFetchFailureSuite#test("celeborn spark integration test - resubmit a barrier stage and do not reuse the shuffle id")

@@ -20,6 +20,8 @@
import java.util.HashSet;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.classification.VisibleForTesting;
Member

should be

import com.google.common.annotations.VisibleForTesting;

or just

 // Visible for testing

Contributor Author

Fixed, thank you.

This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale label Jul 31, 2024
This issue was closed because it has been staled for 10 days with no activity.

@github-actions github-actions bot closed this Aug 11, 2024
@turboFei
Member

cc @AngersZhuuuu

@turboFei turboFei reopened this Aug 20, 2024
@mridulm
Contributor

mridulm commented Aug 20, 2024

Once #2609 is merged, is this still relevant?

@turboFei
Member

Thanks for the reminder, @mridulm. It seems this is no longer needed.

@turboFei turboFei closed this Aug 20, 2024
3 participants