GPU-centric Data Placement And Zero-Copy Data Access

Quiver-Feature is a RDMA-based high performance distributed feature collection component for training GNN models on extremely large graphs, It is built on Quiver and has several novel features:

High Performance: Quiver-Feature has 5-10x throughput performance over feature collection solutions in existing GNN systems such as DGL and PyG.
Maximum Hardware Resource Utilization Efficiency: Quiver-Feature has minimal CPU usage and minimal memory bus traffic, leaving much of the CPU and memory resource to tasks like graph sampling and model training.
Easy to use: To use Quiver-Feature, developers only need to add a few lines of code in existing PyG/DGL programs. Quiver-Feature is thus easy to be adopted by PyG/DGL users and deployed in production clusters.

GPU-centric Data Placement And Zero-Copy Data Access

GPU-centric data placement and Zero-Copy data access method are two keys behind Quiver-Feature's high performance.

GPU-Centric Data Placement: Quiver-Feature has a unified view of memories across heterogeneous devices and machines. It classifies these memories into 4 memory spaces under a GPU-centric view: Local HBM(Current GPU's Memory),Neighbor HBM, Local DRAM(Current machines's CPU memory) and Remote DRAM(Remote CPU's memory). These 4 memory spaces have connections with each other using PCIe, NVLink and RDMA etc.

Accessing different memory spaces from GPU has unbalanced performance. Considering that feature data access frequency during GNN training is also unbalanced, Quiver-Feature uses an application-aware and GPU-Centric data palcement algorithm to takes full advantage of the GPU-centric multi-level memory layers.

Zero-Copy Data Access: Feature collection in GNN training involves massive data movement across network, DRAM, PCIe and NVLink and any extra memory copy hurts the e2e performance. Quiver-Feature uses one-sided commnunication methods such as UVA for local memory spaces access(Local HBM, Local DRAM, Neighbor HBM) and RDMA READ for remote memory space access(Remote DRAM), achiving zero-copy and minimum CPU intervention.(You can refer to this document for more RDMA details)

DistTensorPGAS: Above those memory spaces, Quiver-Feature adopts PGAS memory model and implements a 2-dimension distributed tensor abstraction which is called DistTensorPGAS. Users can use DistTensorPGAS just like a local torch.Tensor, such as querying shape and performing slicing operation etc.

Performance Benchmark

As far as we know, there's no public GNN system directly supports using RDMA for feature collection. DGL uses TensorPipe as its rpc backend, TensorPipe itself supports RDMA but DGL has not integrated this feature. Since TensorPipe is also the official rpc backend of Pytorch, we compare the feature collection performance betweenQuiver-Feature with Pytorch-RPC Based Solution.

We have 2 machines and 100Gbps IB networks between them. We partition the data uniformly and start M GPU training processes on each machine(which we will refer as 2 Machines 2M GPUs in the following result chart). we benchmark feature collection performance of Quiver-Feature and Pytorch-RPC Based Solution and we can see that Quiver-Feature is 5x better over Pytorch-RPC Based Solution in all settings.

Install

Install From Source(Recommended For Now)

Install Quiver.

Install Quiver-Feature from source

 $ git clone git@github.com:quiver-team/quiver-feature
 $ cd quiver-feature/
 $ pip install .

Pip Install

Install Quiver.
Install the Quiver-Feature pip package.
```
 $ pip install quiver-feature
```

We have tested Quiver with the following setup:

OS: Ubuntu 18.04, Ubuntu 20.04
CUDA: 10.2, 11.1
GPU: Nvidia P100, V100, Titan X, A6000

Test Install

You can download Quiver-Feature's examples to test installation:

    $ git clone git@github.com:quiver-team/quiver-feature.git
    $ cd quiver-feature/examples/reddit
    $ python3 distribute_training.py

A successful run should contain the following line:

Starting Server With: xxxx

Quick Start

To use Quiver-Feature, you need to replace PyG's feature tensors with quiver_feature.DistTensorPGAS,this usually requires only a few changes in existing PyG programs with following 4 steps on each machine:

Load feature partition and meta data which belongs to the current machine.
Exchange feature partition meta data with other processes using quiver_feature.DistHelper.
Create a quiver_feature.DistTensorPGAS from local feature partition and meta data.
Pass the quiver_feature.DistTensorPGAS built above as parameter to each training process for feature collection.

Here is a simple example for using Quiver-Feature in a PyG's program. You can check the original scripts for more details.

    
    def train_process(rank, dist_tensor):
        ...
        for batch_size, n_id, adjs in train_loader:
                ...
                # Using DistTensorPGAS Just Like A torch.Tensor
                collected_feature = dist_tensor[n_id]
                ...

    if __name__ == "__main__":

        # Step 1: Load Local data partition
        local_tensor, cached_range, local_range = load_partitioned_data(...)

        # Step 2: Exchange TensorPoints Information
        dist_helper = DistHelper(...)
        tensor_endpoints = dist_helper.exchange_tensor_endpoints_info()

        
        # Step 3:  Build DistTensorPGAS from local feature partition
        dist_tensor = DistTensorPGAS(...)


        # Step 4: Spawn Training Processes Using DistTensor as Parameter
        mp.spawn(
                train_process,
                args=(..., dist_tensor, ...),
                nprocs=args.device_per_node,
                join=True
        )
        ...

License

Quiver-Feature is licensed under the Apache License, Version 2.0

Citation

If you use Quiver-Feature in your publication,please cite it by using the following BibTeX entry.

@Misc{Quiver-Feature,
    institution = {Quiver Team},
    title =  {Quiver-Feature:A High Performance Feature Collection Component For Training GNN On Extremely Large Graphs},
    howpublished = {\url{https://github.com/quiver-team/quiver-feature}},
    year = {2022}
}

Name		Name	Last commit message	Last commit date
Latest commit History 169 Commits
csrc		csrc
docs		docs
examples		examples
quiver_feature		quiver_feature
tests		tests
.clang-format		.clang-format
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
build.sh		build.sh
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GPU-centric Data Placement And Zero-Copy Data Access

Performance Benchmark

Install

Install From Source(Recommended For Now)

Pip Install

Test Install

Quick Start

License

Citation

About

Releases

Packages

Languages

License

eedalong/quiver-feature

Folders and files

Latest commit

History

Repository files navigation

GPU-centric Data Placement And Zero-Copy Data Access

Performance Benchmark

Install

Install From Source(Recommended For Now)

Pip Install

Test Install

Quick Start

License

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages