[SPARK-17984][YARN][Mesos][Deploy][WIP] Add support for NUMA aware feature #15524
What changes were proposed in this pull request?
Add NUMA-aware support for the YARN-based deployment mode.
This patch optimizes memory allocation: executors on a worker node are bound to NUMA nodes in round-robin fashion, so that each memory allocation tries the local NUMA node first and falls back to remote nodes only when the local node does not have enough free memory.
Before this patch, Spark was NUMA-unaware: many allocations landed on remote NUMA nodes, and the resulting remote memory accesses hurt performance considerably. We observed a significant performance improvement while evaluating the NUMA-aware patch.
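A minimal sketch of the round-robin binding scheme described above, assuming `numactl` is installed on every worker node. `NumaBinding`, `bind`, and the per-worker executor counter are illustrative names, not the patch's actual code:

```scala
import java.util.concurrent.atomic.AtomicInteger

// Sketch: bind each newly launched executor to one NUMA node,
// cycling through the worker's nodes in round-robin order.
object NumaBinding {
  // Hypothetical per-worker counter of executors launched so far.
  private val nextExecutorId = new AtomicInteger(0)

  /** Prefix an executor launch command so its threads run on, and its
   *  memory is preferentially allocated from, one NUMA node chosen
   *  round-robin across the executors started on this worker. */
  def bind(launchCommand: Seq[String], numaNodeCount: Int): Seq[String] = {
    val node = nextExecutorId.getAndIncrement() % numaNodeCount
    // --cpunodebind pins the executor's threads to the node's cores;
    // --preferred allocates from that node first and falls back to
    // remote nodes only when it runs out of free memory, matching the
    // local-first behavior described above.
    Seq("numactl", s"--cpunodebind=$node", s"--preferred=$node") ++ launchCommand
  }
}
```

For example, on a two-node worker, `NumaBinding.bind(Seq("java", "-Xmx8g", "org.apache.spark.executor.CoarseGrainedExecutorBackend"), numaNodeCount = 2)` would prefix successive executor commands with bindings to node 0, node 1, node 0, and so on.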
To Do:
How was this patch tested?
We observed a significant performance improvement during evaluation with BigBench. Evaluation is still in progress, and more detailed results will be posted as they become available.
Setup:
Cluster topology: 1 master + 4 slaves (Spark on YARN)
CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz (72 cores)
Memory: 128GB (2 NUMA nodes)
NIC: 1 x 10Gb/sec
Disk: write 1.5GB/sec, read 5GB/sec
SW version: Hadoop-5.7.0 + Spark-2.0.0
NUMA Introduction
In the UMA (Uniform Memory Access) model, all processors share one bus, and bus contention becomes heavy as the processor count scales up. A NUMA (Non-Uniform Memory Access) system scales better by dividing processors and memory into nodes, with the nodes interconnected by an additional bus.
Under NUMA, accessing memory on a remote node is much slower than accessing local memory, whereas under UMA, access latency is uniform across all memory.
For more NUMA information, please refer to https://en.wikipedia.org/wiki/Non-uniform_memory_access.
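The per-node layout of a worker can be verified with `numactl --hardware`; a minimal sketch that shells out to it, assuming the `numactl` package is installed:

```scala
import scala.sys.process._

// Print this machine's NUMA layout: node count, CPU-to-node mapping,
// per-node memory size, and the inter-node distance matrix.
object ShowNumaTopology extends App {
  println(Seq("numactl", "--hardware").!!)
}
```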