This repository has been archived by the owner on Dec 20, 2022. It is now read-only.
# SparkRDMA Shuffle Manager Plugin
The SparkRDMA plugin is a high-performance shuffle manager for Apache Spark that uses RDMA (instead of TCP) to transfer data during the shuffle phase of a Spark job.

This open-source project is developed, maintained and supported by [Mellanox Technologies](http://www.mellanox.com).

## Performance results
Example performance speedup for HiBench TeraSort:

Running TeraSort with SparkRDMA is 1.41x faster than with standard Spark (measured by runtime in seconds).

Testbed:

* 175GB workload
* 15 Workers, each with 2x Intel Xeon E5-2697 v3 @ 2.60GHz (28 cores per Worker), 256GB RAM and non-flash storage (HDD)
* Mellanox ConnectX-4 network adapter with a 100GbE RoCE fabric, connected through a Mellanox Spectrum switch

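As a worked reading of the 1.41 speedup figure, assuming it is the ratio of the standard Spark runtime to the SparkRDMA runtime on the same 175GB TeraSort workload, the speedup corresponds to a runtime reduction of roughly 29%:

```math
S = \frac{T_{\text{Spark}}}{T_{\text{SparkRDMA}}} = 1.41
\quad\Rightarrow\quad
\frac{T_{\text{SparkRDMA}}}{T_{\text{Spark}}} = \frac{1}{1.41} \approx 0.709
\quad\Rightarrow\quad
1 - 0.709 \approx 0.291
```

In other words, on this testbed SparkRDMA would finish the job in about 71% of the standard Spark runtime.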
## Runtime requirements
* Apache Spark 2.0.0 (support for more versions is planned)
* Java 8
* libdisni 1.2
* An RDMA-capable network, e.g. RoCE or InfiniBand

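Before building, it can help to check a node against these requirements. The script below is a minimal sketch and not part of SparkRDMA: it assumes a typical Linux setup where `ibv_devices` (from the rdma-core / libibverbs utilities) is installed alongside the RDMA stack and libdisni is registered in the dynamic linker cache. It only checks that the tools and library are present, not that the RDMA fabric actually works.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the SparkRDMA runtime requirements.
# Assumes ibv_devices (rdma-core / libibverbs utilities) indicates RDMA tooling
# and that libdisni is registered in the dynamic linker cache.

# check_cmd NAME: report whether a command is available on the PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# check_libdisni: report whether the linker cache (ldconfig -p) lists libdisni.
check_libdisni() {
  if ldconfig -p 2>/dev/null | grep -q libdisni; then
    echo "ok: libdisni"
  else
    echo "missing: libdisni"
  fi
}

check_cmd java         # Java 8 is required; check the version with "java -version"
check_cmd ibv_devices  # lists RDMA devices when run
check_libdisni
```

Run it on the Spark Master and on every Worker, since libdisni must be present on each node. If libdisni was installed with `make install` but is reported as missing, refreshing the linker cache by running `ldconfig` as root may be needed.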
## Build

* Building the SparkRDMA plugin requires [Apache Maven](http://maven.apache.org/) and Java 8.

1. Obtain a clone of [SparkRDMA](https://github.com/Mellanox/SparkRDMA).

2. Build the plugin:
   ```
   mvn -DskipTests package
   ```

3. Obtain a clone of [DiSNI](https://github.com/zrlio/disni) to build libdisni 1.2:

   ```
   git clone https://github.com/zrlio/disni.git
   cd disni
   git checkout -b v1.2 247fe8abe54c90b450d2a4b0679e59cfa83205f6
   ```

4. Compile and install only libdisni (the DiSNI jars are already included in the SparkRDMA plugin):

   ```
   cd libdisni
   sh autoprepare.sh
   ./configure --with-jdk=/path/to/java8/jdk
   make
   make install  # usually requires root privileges
   ```
5. libdisni **must** be installed on every Spark Master and Worker node.

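Steps 1-4 can also be collected into a single script. This is a sketch, assuming it is started from an empty working directory and that step 1 is done with `git clone` of the GitHub URL given there; `DRY_RUN` and `JDK8_HOME` are hypothetical variables introduced here. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of build steps 1-4 as a single script. DRY_RUN and JDK8_HOME are
# hypothetical variables: by default (DRY_RUN=1) the commands are only printed.
set -e

DRY_RUN="${DRY_RUN:-1}"
JDK8_HOME="${JDK8_HOME:-/path/to/java8/jdk}"

# Print a command in dry-run mode; otherwise execute it in the current shell,
# so that "run cd DIR" really changes directory.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Steps 1-2: clone SparkRDMA and build the plugin jar with Maven.
build_plugin() {
  run git clone https://github.com/Mellanox/SparkRDMA.git
  run cd SparkRDMA
  run mvn -DskipTests package
  run cd ..
}

# Steps 3-4: clone DiSNI at the pinned v1.2 commit and build only libdisni.
build_libdisni() {
  run git clone https://github.com/zrlio/disni.git
  run cd disni
  run git checkout -b v1.2 247fe8abe54c90b450d2a4b0679e59cfa83205f6
  run cd libdisni
  run sh autoprepare.sh
  run ./configure --with-jdk="$JDK8_HOME"
  run make
  run make install  # usually requires root privileges
}

build_plugin
build_libdisni
```

With `DRY_RUN=0 JDK8_HOME=/your/jdk8 sh build.sh` the commands are executed; step 5 (installing libdisni on every Master and Worker) still has to be repeated on each node.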
## Configuration

* Provide Spark with the location of the SparkRDMA plugin jar using the `spark.driver.extraClassPath` and `spark.executor.extraClassPath` options. In standalone mode these can be added to either spark-defaults.conf or any runtime configuration file; in client mode they **must** be added to spark-defaults.conf.

```
spark.driver.extraClassPath /path/to/SparkRDMA/target/spark-rdma-1.0-jar-with-dependencies.jar
spark.executor.extraClassPath /path/to/SparkRDMA/target/spark-rdma-1.0-jar-with-dependencies.jar
```

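To apply these settings from a script, the sketch below appends both entries to a Spark properties file only when the key is not already set. `CONF_FILE` and `SPARKRDMA_JAR` are hypothetical variables; the default jar path is the one shown above. An existing extraClassPath value is left untouched rather than merged.

```shell
#!/bin/sh
# Sketch: add the SparkRDMA jar to the driver and executor classpaths in a
# Spark properties file. CONF_FILE and SPARKRDMA_JAR are hypothetical variables.
SPARKRDMA_JAR="${SPARKRDMA_JAR:-/path/to/SparkRDMA/target/spark-rdma-1.0-jar-with-dependencies.jar}"

# add_conf FILE KEY VALUE: append "KEY VALUE" unless KEY is already set in FILE
# (separated from its value by whitespace, "=" or ":").
# An existing value is kept as-is; merge classpaths by hand if needed.
add_conf() {
  if ! grep -q "^[[:space:]]*$2[[:space:]=:]" "$1" 2>/dev/null; then
    printf '%s %s\n' "$2" "$3" >> "$1"
  fi
}

if [ -n "${CONF_FILE:-}" ]; then
  add_conf "$CONF_FILE" spark.driver.extraClassPath "$SPARKRDMA_JAR"
  add_conf "$CONF_FILE" spark.executor.extraClassPath "$SPARKRDMA_JAR"
fi
```

For example, `CONF_FILE="$SPARK_HOME/conf/spark-defaults.conf" sh add-sparkrdma-conf.sh` updates the default properties file; running it twice does not duplicate the entries.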
## Running

* To enable the SparkRDMA Shuffle Manager plugin, add the following line to either spark-defaults.conf or any runtime configuration file:

```
spark.shuffle.manager org.apache.spark.shuffle.rdma.RdmaShuffleManager
```

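The same `spark.shuffle.manager` property can also be passed for a single job on the `spark-submit` command line with `--conf`, leaving the configuration files unchanged for other jobs. The sketch below only prints the command unless `DRY_RUN=0`; `APP_CLASS`, `APP_JAR` and their defaults (`com.example.MyApp`, `/path/to/my-app.jar`) are hypothetical placeholders for your own application, and the plugin jar still has to be on the classpath as described under Configuration.

```shell
#!/bin/sh
# Sketch: launch one job with the RDMA shuffle manager enabled via --conf.
# APP_CLASS and APP_JAR are hypothetical placeholders; DRY_RUN=1 (the default)
# prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
RDMA_SHUFFLE_MANAGER=org.apache.spark.shuffle.rdma.RdmaShuffleManager

# submit_with_rdma CLASS JAR [APP_ARGS...]
submit_with_rdma() {
  app_class="$1"
  app_jar="$2"
  shift 2
  # Rebuild the positional parameters as the full spark-submit command line.
  set -- spark-submit \
    --conf "spark.shuffle.manager=$RDMA_SHUFFLE_MANAGER" \
    --class "$app_class" \
    "$app_jar" "$@"
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

submit_with_rdma "${APP_CLASS:-com.example.MyApp}" "${APP_JAR:-/path/to/my-app.jar}"
```

Any arguments after the jar are forwarded to the application unchanged.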
## Contributions

Pull request (PR) submissions are welcome.