
[SPARK-17984][YARN][Mesos][Deploy][WIP] Add support for NUMA aware feature #15524


Closed
wants to merge 2 commits

Conversation


@quanfuw quanfuw commented Oct 18, 2016

What changes were proposed in this pull request?

Add NUMA aware support for the YARN based deployment mode.
This patch optimizes memory allocation: on each worker node, executors are bound to NUMA nodes in round-robin order, so that memory allocation tries the local NUMA node first and falls back to remote nodes only when the local node does not have enough memory.
Before this patch, Spark was NUMA unaware: many remote memory allocations happened, and the resulting remote memory accesses hurt performance considerably. We observed significant performance improvement while evaluating the NUMA aware patch.
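
As a rough illustration of the idea (a hypothetical sketch, not the code in this patch; `NumaBinder`, `nextNode`, and `bind` are invented names), each executor launch command could be prefixed with `numactl` so that successive executors land on successive NUMA nodes:

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch of the round-robin binding described above.
// Assumes numactl is installed on the worker node.
object NumaBinder {
  private val nextIndex = new AtomicInteger(0)

  /** Pick a NUMA node for the next executor in round-robin order. */
  def nextNode(numaNodeCount: Int): Int =
    nextIndex.getAndIncrement() % numaNodeCount

  /**
   * Prefix an executor launch command with numactl: pin the executor's CPUs
   * to one node and prefer that node's memory, so allocation falls back to
   * remote nodes only when the local node runs out.
   */
  def bind(executorCommand: Seq[String], numaNodeCount: Int): Seq[String] = {
    val node = nextNode(numaNodeCount)
    Seq("numactl", s"--cpunodebind=$node", s"--preferred=$node") ++ executorCommand
  }
}
```

With 2 NUMA nodes, for example, the first executor on a host would be bound to node 0, the second to node 1, the third to node 0 again, and so on.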

To Do:

  1. Add support for configuring the number of NUMA nodes, and test it (see the detection sketch after this list).
  2. Add NUMA aware support for the Mesos based deployment mode, and test it.
  3. Add NUMA aware support for the Standalone deployment mode, and test it.
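
For item 1, one possible default when no node count is configured is to detect it from sysfs; the following is a minimal sketch, assuming a Linux worker node (`numaNodeCount` is a hypothetical helper, not part of this patch):

```scala
import java.io.File

// Minimal sketch (assumption: a Linux host) of auto-detecting the NUMA node
// count. The kernel exposes one directory per NUMA node under
// /sys/devices/system/node (node0, node1, ...).
def numaNodeCount(): Int = {
  val nodeDir = new File("/sys/devices/system/node")
  Option(nodeDir.listFiles())
    .map(_.count(f => f.isDirectory && f.getName.matches("node\\d+")))
    .filter(_ > 0)
    .getOrElse(1) // treat UMA machines (or non-Linux hosts) as one node
}
```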

How was this patch tested?

We observed significant performance improvement during evaluation with BigBench. Evaluation is still in progress, and more detailed results will be posted as they become available.

Setup:
Cluster topology: 1 master + 4 slaves (Spark on YARN)
CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz (72 cores)
Memory: 128 GB (2 NUMA nodes)
NIC: 1 x 10 Gb/s
Disk: write 1.5 GB/s, read 5 GB/s
SW version: Hadoop-5.7.0 + Spark-2.0.0

NUMA Introduction

As the diagram below depicts, in the UMA (Uniform Memory Access) model all processors share one bus, and contention on that bus becomes severe as the number of processors scales up. A NUMA (Non-Uniform Memory Access) system scales better by dividing processors and memory into nodes that are interconnected by an additional bus.
Under NUMA, accessing memory on a remote node is much slower than accessing local memory, whereas under UMA, access latency to any memory is uniform.

[Diagram: UMA with a single shared bus vs. NUMA with per-node memory and an interconnect]

For more information on NUMA, see https://en.wikipedia.org/wiki/Non-uniform_memory_access.

@AmplabJenkins

Can one of the admins verify this patch?

@quanfuw quanfuw changed the title add numa aware support(WIP, not ready for review) [Spark][JIRA: SPARK-17984][YARN, Mesos, Deploy][WIP] add support for numa aware feature Oct 18, 2016
@quanfuw quanfuw changed the title [Spark][JIRA: SPARK-17984][YARN, Mesos, Deploy][WIP] add support for numa aware feature [SPARK-17984][YARN][Mesos][Deploy][WIP] add support for numa aware feature Oct 18, 2016
@quanfuw quanfuw changed the title [SPARK-17984][YARN][Mesos][Deploy][WIP] add support for numa aware feature [SPARK-17984][YARN][Mesos][Deploy][WIP] Add support for numa aware feature Oct 18, 2016
@quanfuw quanfuw changed the title [SPARK-17984][YARN][Mesos][Deploy][WIP] Add support for numa aware feature [SPARK-17984][YARN][Mesos][Deploy][WIP] Add support for NUMA aware feature Oct 18, 2016
@srowen
Member

srowen commented Oct 25, 2016

This should be closed in favor of #15579 at least

srowen added a commit to srowen/spark that referenced this pull request Oct 31, 2016
@asfgit asfgit closed this in 26b07f1 Oct 31, 2016