[CELEBORN-1498] Decide whether to reuse the shuffle id based on the appShuffle's numAvailableOutputs #2611

jiang13021 · 2024-07-10T13:37:41Z

What changes were proposed in this pull request?

Decide whether to reuse the shuffle id based on the appShuffle's numAvailableOutputs instead of the deterministic level.

Why are the changes needed?

In Spark, the DAGScheduler decides whether to reuse task output based on several factors; the deterministic level is only one of them. I constructed a case in the test where a shuffleId should be regenerated rather than reused. IMO, Celeborn should decide whether to reuse the shuffle id based on whether the map output is empty, which delegates the decision to reuse a previous attempt's results to the DAGScheduler.
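To illustrate the idea, here is a minimal sketch, not the code in this patch. It assumes the rule reads as "reuse the existing Celeborn shuffle id only if Spark still tracks at least one map output for the app shuffle". The class `ShuffleIdReuseSketch`, the method `resolveShuffleId`, and both fields are hypothetical names chosen for illustration, and `numAvailableOutputs` is passed in as a plain integer rather than read from Spark.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; these names are not Celeborn's actual API. */
public class ShuffleIdReuseSketch {
  // Latest Celeborn shuffle id handed out for each Spark (app) shuffle id.
  private final Map<Integer, Integer> latestCelebornShuffleId = new HashMap<>();
  private int nextCelebornShuffleId = 0;

  /**
   * Returns the Celeborn shuffle id a writer should use.
   *
   * @param appShuffleId the shuffle id assigned by Spark
   * @param numAvailableOutputs how many map outputs Spark still tracks for that shuffle
   *     (assumed to be supplied by the caller)
   */
  public int resolveShuffleId(int appShuffleId, int numAvailableOutputs) {
    Integer existing = latestCelebornShuffleId.get(appShuffleId);
    if (existing != null && numAvailableOutputs > 0) {
      // Spark kept some map output, so the previous attempt's results are reused.
      return existing;
    }
    // No usable map output (or first request): generate a fresh shuffle id.
    int fresh = nextCelebornShuffleId++;
    latestCelebornShuffleId.put(appShuffleId, fresh);
    return fresh;
  }

  public static void main(String[] args) {
    ShuffleIdReuseSketch sketch = new ShuffleIdReuseSketch();
    int first = sketch.resolveShuffleId(7, 0);    // first request: new id
    int reused = sketch.resolveShuffleId(7, 3);   // outputs available: same id
    int regenerated = sketch.resolveShuffleId(7, 0); // outputs empty: new id
    System.out.println(first + " " + reused + " " + regenerated);
  }
}
```

In this sketch the deterministic level never appears: the only input is whether Spark has discarded the map output, which is how the decision ends up delegated to the DAGScheduler.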

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a test: org.apache.celeborn.tests.spark.CelebornFetchFailureSuite#test("celeborn spark integration test - resubmit a barrier stage and do not reuse the shuffle id")

turboFei · 2024-07-11T06:28:41Z

...t-spark/common/src/main/java/org/apache/spark/shuffle/celeborn/ExecutorShuffleIdTracker.java

@@ -20,6 +20,8 @@
 import java.util.HashSet;
 import java.util.concurrent.ConcurrentHashMap;

+import org.apache.hadoop.classification.VisibleForTesting;


should be

import com.google.common.annotations.VisibleForTesting;

or just

// Visible for testing

jiang13021 · 2024-07-11

Fixed, thank you.


github-actions · 2024-07-31T09:13:02Z

This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions · 2024-08-11T08:31:33Z

This issue was closed because it has been staled for 10 days with no activity.

turboFei · 2024-08-20T03:40:33Z

cc @AngersZhuuuu

mridulm · 2024-08-20T05:26:00Z

Once #2609 is merged, is this still relevant?

turboFei · 2024-08-20T05:37:13Z

Thanks for the reminder @mridulm, seems not needed

jiang13021 mentioned this pull request Jul 10, 2024

[CELEBORN-1496] Differentiate map results with only different stageAttemptId #2609

Closed

turboFei reviewed Jul 11, 2024


Commit 31b8d34: [CELEBORN-1498] Decide whether to reuse the shuffle id based on the appShuffle's numAvailableOutputs

jiang13021 force-pushed the celeborn-1498 branch from 2b898da to 31b8d34 Compare July 11, 2024 06:34

github-actions bot added the stale label Jul 31, 2024

github-actions bot closed this Aug 11, 2024

turboFei reopened this Aug 20, 2024

turboFei closed this Aug 20, 2024
