Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#731] feat(spark): Make blockid layout configurable for Spark clients #1528

Merged
merged 29 commits into from
Mar 7, 2024

Conversation

EnricoMi
Copy link
Contributor

@EnricoMi EnricoMi commented Feb 14, 2024

What changes were proposed in this pull request?

Make bit-lengths in block id (block id layout) configurable through dynamic client config from coordinator or client config. Block id layout can be created from RssConf, or where that is not available, is being passed around.

  • Adds new options (defaults are equivalent to current values in Constants):
    • rss.client.blockId.sequenceNoBits
    • rss.client.blockId.partitionIdBits
    • rss.client.blockId.taskAttemptIdBits
  • Adds block id layout to two requests (default is layout with current values in Constants).

Default values have moved from Constants into BlockIdLayout. The following replacements exist:

  • PARTITION_ID_MAX_LENGTH: BlockIdLayout.DEFAULT.partitionIdBits
  • TASK_ATTEMPT_ID_MAX_LENGTH: BlockIdLayout.DEFAULT.taskAttemptIdBits
  • ATOMIC_INT_MAX_LENGTH: BlockIdLayout.DEFAULT.sequenceNoBits

Why are the changes needed?

The bit-lengths of sequence number, partition id and task attempt id in block id are defined in common/src/main/java/org/apache/uniffle/common/util/Constants.java. Changing these requires recompiling and redeploying the project. Making this configurable in coordinator.conf, server.conf or client-side would very useful.

Also see #1512, #749.

Fixes #731.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Tests.

Copy link

github-actions bot commented Feb 14, 2024

Test Results

 2 326 files  + 14  2 326 suites  +14   4h 29m 36s ⏱️ - 6m 0s
   851 tests + 28    849 ✅ + 27   2 💤 +1  0 ❌ ±0 
10 018 runs  +321  9 995 ✅ +312  23 💤 +9  0 ❌ ±0 

Results for commit bf0902d. ± Comparison against base commit d8aedf3.

♻️ This comment has been updated with latest results.

@codecov-commenter
Copy link

codecov-commenter commented Feb 20, 2024

Codecov Report

Attention: Patch coverage is 67.87879% with 53 lines in your changes are missing coverage. Please review.

Project coverage is 54.60%. Comparing base (9f4ccde) to head (bf0902d).
Report is 1 commits behind head on master.

Files Patch % Lines
...pache/uniffle/server/ShuffleServerGrpcService.java 0.00% 17 Missing ⚠️
...ffle/client/impl/grpc/ShuffleServerGrpcClient.java 0.00% 12 Missing ⚠️
.../org/apache/uniffle/common/util/BlockIdLayout.java 79.59% 7 Missing and 3 partials ⚠️
...n/java/org/apache/uniffle/common/util/BlockId.java 55.55% 2 Missing and 2 partials ⚠️
...che/uniffle/client/impl/ShuffleReadClientImpl.java 85.00% 1 Missing and 2 partials ⚠️
...equest/RssGetShuffleResultForMultiPartRequest.java 0.00% 3 Missing ⚠️
...fle/client/request/RssGetShuffleResultRequest.java 0.00% 3 Missing ⚠️
.../org/apache/uniffle/server/ShuffleTaskManager.java 66.66% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1528      +/-   ##
============================================
+ Coverage     53.63%   54.60%   +0.96%     
- Complexity     2822     2837      +15     
============================================
  Files           436      417      -19     
  Lines         24549    22295    -2254     
  Branches       2080     2094      +14     
============================================
- Hits          13167    12174     -993     
+ Misses        10562     9361    -1201     
+ Partials        820      760      -60     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@EnricoMi EnricoMi marked this pull request as ready for review February 26, 2024 20:02
@EnricoMi EnricoMi force-pushed the blockid-layout branch 2 times, most recently from 9fafb5d to e499311 Compare February 26, 2024 20:54
@EnricoMi
Copy link
Contributor Author

CC @zuston @jerqi @xianjingfeng

@EnricoMi
Copy link
Contributor Author

Should I make Spark2, TEZ and MR clients configurable in this PR as well, our should that better be done in a separate PR?

@jerqi
Copy link
Contributor

jerqi commented Feb 27, 2024

Should I make Spark2, TEZ and MR clients configurable in this PR as well, our should that better be done in a separate PR?

We can do it in an another pr.

@EnricoMi EnricoMi changed the title [#731] feat(conf): Make blockid layout configurable [#731] feat(conf): Make blockid layout configurable for Spark3 Feb 27, 2024
@jerqi jerqi changed the title [#731] feat(conf): Make blockid layout configurable for Spark3 [#731] feat(spark3): Make blockid layout configurable for Spark3 Feb 28, 2024
@EnricoMi EnricoMi force-pushed the blockid-layout branch 2 times, most recently from 1f3e6c7 to 6bd203b Compare February 29, 2024 06:56
@jerqi jerqi closed this Feb 29, 2024
@jerqi jerqi reopened this Feb 29, 2024
@EnricoMi EnricoMi force-pushed the blockid-layout branch 2 times, most recently from 3883937 to 262c343 Compare February 29, 2024 13:50
@EnricoMi
Copy link
Contributor Author

EnricoMi commented Mar 1, 2024

This is ready for review.

@@ -53,17 +53,17 @@
import org.apache.uniffle.client.factory.ShuffleClientFactory;
import org.apache.uniffle.common.ShuffleServerInfo;
import org.apache.uniffle.common.exception.RssException;
import org.apache.uniffle.common.util.BlockId;
import org.apache.uniffle.common.util.BlockIdLayout;
import org.apache.uniffle.common.util.Constants;

public class RssTezUtils {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add more cases for Tez?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic is unchanged and layout is not configurable for TEZ, so there is nothing new that needs to be tested.

If existing tests are incomplete, I'd suggest someone who has a deeper understanding of the internals of TEZ task attempt ids than I have adds more coverage in a separate PR. I am happy to adjust them afterwards in this PR. Alternatively, the coverage could be increased when adding configurable block id layout for TEZ in a follow up PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic is unchanged and layout is not configurable for TEZ, so there is nothing new that needs to be tested.

If existing tests are incomplete, I'd suggest someone who has a deeper understanding of the internals of TEZ task attempt ids than I have adds more coverage in a separate PR. I am happy to adjust them afterwards in this PR. Alternatively, the coverage could be increased when adding configurable block id layout for TEZ in a follow up PR.

OK, I got it . I just feel that you changed the logic of Tez from the title.

@@ -528,6 +529,11 @@ public void getShuffleResult(
String appId = request.getAppId();
int shuffleId = request.getShuffleId();
int partitionId = request.getPartitionId();
BlockIdLayout blockIdLayout =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you consider some compatible issues? If older client connects a new server, the blockIdLayout may be null.

Copy link
Contributor Author

@EnricoMi EnricoMi Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could fall back to BlockIdLayout.DEFAULT here, but that would only work for clients that use the original values in Constants.java. For cluster setups where Constants.java has been changed, there is no way to pick the right block id layout here.

Copy link
Contributor

@jerqi jerqi Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

request.getBlockIdLayout may return null if we use an old client to connect the shuffle server. The code will throw a NullPointerException.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what do you suggest? Use BlockIdLayout.DEFAULT if request.getBlockIdLayout == null and mark this a breaking change for users that deployed a cluster with modified bit lengths in Constants.java? How can we make this compatible for those users?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use BlockIdLayout.DEFAULT if layout is null. We can't guarantee the compatibility if users modify our code in the project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, done!

+ "This may happen when the data is flushing, please ignore.",
totalLength,
dataFileLen,
BlockId.getPartitionId(blockId));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we remove partitionId here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual block id layout is not available here, and wiring it through from the client would mean to add it to many method signature calls. All this only for logging.

At least we should log the long block id here so it can be de-cyphered manually if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

@jerqi jerqi changed the title [#731] feat(Spark): Make blockid layout configurable for Spark clients [#731] feat(spark): Make blockid layout configurable for Spark clients Mar 4, 2024
@jerqi
Copy link
Contributor

jerqi commented Mar 5, 2024

LGTM, @zuston Could you take an another look?

Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, left a comment for the compatible problem.

And for the further optimization, the blockIds stored on the shuffle-server for me is not a good design, which will bring too much burden for shuffle-server, especially for many spark jobs, I have raise another issue to track this. #1538

Anyway, the above suggestion is not related with this PR. Thanks for your effort

Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for your contribution! @EnricoMi

@zuston zuston merged commit dc890ff into apache:master Mar 7, 2024
41 checks passed
@EnricoMi
Copy link
Contributor Author

EnricoMi commented Mar 7, 2024

What about the SHUFFLE_SERVER_VERSION?

@zuston
Copy link
Member

zuston commented Mar 8, 2024

What about the SHUFFLE_SERVER_VERSION?

Make sense for me. WDYT? @jerqi

@jerqi
Copy link
Contributor

jerqi commented Mar 8, 2024

What about the SHUFFLE_SERVER_VERSION?

Make sense for me. WDYT? @jerqi

OK for me.

@EnricoMi
Copy link
Contributor Author

EnricoMi commented Mar 8, 2024

Let me do that in #1566.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Adaptive length of different parts of blockId
4 participants