This repository has been archived by the owner on Dec 20, 2022. It is now read-only.
Peter Rudenko edited this page Mar 30, 2018
Troubleshooting
If you encounter Spark job failures or inconsistent performance when using the SparkRDMA plugin, check the job logs first to identify potential issues.
$ grep Rdma <your log file>
The output is verbose, and not every line indicates an actual error. A common performance issue is oversubscription of a queue pair (QP). If you see the following warning, follow its recommendation and increase the spark.shuffle.io.rdmaSendDepth parameter.
17/08/14 14:33:38 WARN RdmaChannel: RDMA channel org.apache.spark.shuffle.rdma.RdmaChannel@7608ffc9 oversubscription detected. RDMA send queue depth is too small. To improve performance, please set set spark.shuffle.io.rdmaSendDepth to a higher value (current depth: 1024
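To check how often this warning occurs and which depth values it reports, the log can be filtered further. The following is a minimal sketch, not part of SparkRDMA: the function names and the default log file name `spark-job.log` are hypothetical, and the patterns assume the warning text matches the sample above.

```shell
#!/bin/sh
# Hypothetical default log file name; replace with your actual Spark log.
LOG_FILE="${LOG_FILE:-spark-job.log}"

# Print the number of log lines that carry the RdmaChannel
# oversubscription warning (prints 0 if there are none).
count_oversubscription() {
  grep -c 'RdmaChannel.*oversubscription detected' "$1"
}

# Print each distinct "current depth: N" value found in the log,
# one per line.
reported_depths() {
  grep -o 'current depth: [0-9]*' "$1" | sort -u
}
```

Calling `count_oversubscription "$LOG_FILE"` and `reported_depths "$LOG_FILE"` shows whether the warning is frequent and what depth is currently in effect. If it is, one way to raise the value is to pass it at submit time, for example `--conf spark.shuffle.io.rdmaSendDepth=2048` on the `spark-submit` command line; the value 2048 is only illustrative and should be tuned for your workload.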
If you see the error "Failed to bind. Make sure your NIC supports RDMA.", add the following to spark-env.sh: