Add support of Apache Uniffle for remote shuffle service #796
Merged
richox merged 3 commits into apache:master on Feb 5, 2025
Member
Really looking forward to this feature. Thank you for your contribution.
merrily01 reviewed on Feb 5, 2025
spark-extension-shims-spark3/src/main/scala/org/apache/spark/sql/blaze/ShimsImpl.scala
richox approved these changes on Feb 5, 2025
spark-extension-shims-spark3/src/main/scala/org/apache/spark/sql/blaze/ShimsImpl.scala
Which issue does this PR close?
Closes #.
Rationale for this change
Uniffle is a high-performance, general-purpose remote shuffle service for distributed computing engines. It pushes shuffle data into a centralized storage service, changing the shuffle style from "local file pull-like style" to "remote block push-like style". This brings several advantages, such as support for disaggregated storage deployments, very large shuffle jobs, and high elasticity. It currently supports Apache Spark, Apache Hadoop MapReduce, and Apache Tez.
Because of these advantages, Uniffle is already used by several commercial companies. After integrating with Blaze, users' Spark jobs will benefit greatly from both storage-computation separation and vectorized execution.
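To illustrate the "remote block push-like style" described above, here is a minimal, self-contained sketch. It is a toy in-memory model only: `ToyShuffleServer`, `Block`, `writeMapOutput`, and `readPartition` are hypothetical names made up for this illustration and are not part of the Uniffle or Blaze APIs, and the code is not taken from this PR.

```scala
import scala.collection.mutable

// Hypothetical toy model; none of these names come from Uniffle or Blaze.
final case class Block(mapId: Int, partitionId: Int, records: Seq[String])

// Stands in for a remote shuffle server: pushed blocks are grouped by reduce partition.
final class ToyShuffleServer {
  private val blocks = mutable.Map.empty[Int, mutable.ArrayBuffer[Block]]

  // Map tasks push blocks here instead of writing local shuffle files.
  def push(block: Block): Unit =
    blocks.getOrElseUpdate(block.partitionId, mutable.ArrayBuffer.empty[Block]) += block

  // Reduce tasks fetch every block of one partition from this single place.
  def fetch(partitionId: Int): Seq[Block] =
    blocks.get(partitionId).map(_.toSeq).getOrElse(Seq.empty)
}

object PushShuffleSketch {
  // Map side ("writer"): split records by hash partition and push one block per partition.
  def writeMapOutput(server: ToyShuffleServer, mapId: Int,
                     records: Seq[String], numPartitions: Int): Unit =
    records
      .groupBy(r => Math.floorMod(r.hashCode, numPartitions))
      .foreach { case (pid, recs) => server.push(Block(mapId, pid, recs)) }

  // Reduce side ("reader"): read a partition from the server, not from each map task's disk.
  def readPartition(server: ToyShuffleServer, partitionId: Int): Seq[String] =
    server.fetch(partitionId).flatMap(_.records)

  def main(args: Array[String]): Unit = {
    val server = new ToyShuffleServer
    writeMapOutput(server, mapId = 0, records = Seq("a", "b", "c"), numPartitions = 2)
    writeMapOutput(server, mapId = 1, records = Seq("a", "d"), numPartitions = 2)
    val all = (0 until 2).flatMap(readPartition(server, _))
    println(all.sorted.mkString(",")) // prints "a,a,b,c,d"
  }
}
```

In a pull-style shuffle, each reducer would instead contact every map task's host to read its local output files. In the sketch, the map side no longer needs to keep its output around after pushing, which is the property that allows storage and compute to be separated.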
What changes are included in this PR?
Implements the shuffle writer and reader for Uniffle, following Blaze's existing RSS shuffle manager design.
Are there any user-facing changes?
Yes.