DOCA stage split: source + convert #1617

Merged · 48 commits · May 15, 2024
Changes from 1 commit
Commits
716559c
Doca stage split: source + convert
eagonv Apr 11, 2024
9e7a163
Fix code style issues
eagonv Apr 11, 2024
d266ca1
1 thread only for DOCA Source Stage
eagonv Apr 11, 2024
63ea2ad
Progress in column creation
eagonv Apr 12, 2024
1573152
vdb_realtime works
eagonv Apr 13, 2024
537e5ba
VDB works + example directory
eagonv Apr 16, 2024
c456546
Add nemollm to pip dependencies in dev yaml
eagonv Apr 16, 2024
5af237f
Update readme
eagonv Apr 16, 2024
2da4b8c
Syntax fix
eagonv Apr 17, 2024
84cdffa
Merge branch 'branch-24.06' into doca-split
e-ago Apr 17, 2024
8b399e5
Minor update to documentation
eagonv Apr 17, 2024
0b1c263
Merge branch 'doca-split' of github.com:e-ago/MorpheusDoca into doca-…
eagonv Apr 17, 2024
e9e4295
Merge branch 'branch-24.06' into doca-split
e-ago Apr 17, 2024
deb2937
Fix vdb header, fix DOCA cleanup
eagonv Apr 18, 2024
9a2a071
Increase to 2 rx queues
eagonv Apr 18, 2024
c40f048
More fixes
eagonv Apr 19, 2024
b0ed2e5
DocaConvert stage allows dynamic number of packets + fix to python pi…
eagonv Apr 22, 2024
42676be
Fix code style
eagonv Apr 22, 2024
b9534ba
Minor fix
eagonv Apr 22, 2024
86977c2
Upgrade to DOCA 2.7 + minor fixes
eagonv May 9, 2024
000cf18
More improvements and codestyle fix
eagonv May 9, 2024
79dbca3
Doca stage split: source + convert
eagonv Apr 11, 2024
1884553
Fix code style issues
eagonv Apr 11, 2024
acff3b8
1 thread only for DOCA Source Stage
eagonv Apr 11, 2024
1ef468d
Progress in column creation
eagonv Apr 12, 2024
5c83c37
vdb_realtime works
eagonv Apr 13, 2024
a1e7e2d
VDB works + example directory
eagonv Apr 16, 2024
9a565be
Add nemollm to pip dependencies in dev yaml
eagonv Apr 16, 2024
438acb2
Update readme
eagonv Apr 16, 2024
ebbd17a
Syntax fix
eagonv Apr 17, 2024
061d0c0
Minor update to documentation
eagonv Apr 17, 2024
85bf6ee
Fix vdb header, fix DOCA cleanup
eagonv Apr 18, 2024
d4d8ec0
Increase to 2 rx queues
eagonv Apr 18, 2024
a9b63f8
More fixes
eagonv Apr 19, 2024
af24e93
DocaConvert stage allows dynamic number of packets + fix to python pi…
eagonv Apr 22, 2024
5639f28
Fix code style
eagonv Apr 22, 2024
2805669
Minor fix
eagonv Apr 22, 2024
3c0ccee
Upgrade to DOCA 2.7 + minor fixes
eagonv May 9, 2024
9960f44
More improvements and codestyle fix
eagonv May 9, 2024
d85d7e5
Merge branch 'doca-split' of github.com:e-ago/MorpheusDoca into doca-…
eagonv May 9, 2024
a59b53d
Merge remote-tracking branch 'upstream/branch-24.06' into doca-split
mdemoret-nv May 13, 2024
6109b83
Merging and removing commented code.
mdemoret-nv May 13, 2024
b96cdaf
Style cleanup
mdemoret-nv May 13, 2024
101991c
Passed linting.
mdemoret-nv May 14, 2024
427af1d
Resolving comments from feedback
mdemoret-nv May 14, 2024
eeea4a6
Merge pull request #2 from mdemoret-nv/mdd_fix-doca-split
e-ago May 14, 2024
b402693
Style cleanup
mdemoret-nv May 14, 2024
6aa8c70
Style cleanup
mdemoret-nv May 15, 2024
VDB works + example directory
eagonv committed Apr 16, 2024
commit 537e5ba6201ee53f2611aba321bf308540960ed1
125 changes: 125 additions & 0 deletions examples/doca/vdb_realtime/README.md
@@ -0,0 +1,125 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2023-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# DOCA Sensitive Information Detection Example

## Run Milvus

Download the Milvus docker-compose file from the [Milvus GitHub repository](https://github.com/milvus-io/milvus):

```bash
mkdir milvus
cd milvus
wget https://github.com/milvus-io/milvus/releases/download/v2.3.3/milvus-standalone-docker-compose-gpu.yml -O docker-compose.yml
```

Start Milvus:

```bash
sudo docker-compose up -d
```

## Launch Triton Inference Server

To serve the embedding model, we will use Triton:

```bash
cd ${MORPHEUS_ROOT}
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:24.01-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model all-MiniLM-L6-v2
```
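
Before launching the pipeline, you can optionally confirm that Triton is up and the embedding model is loaded. A minimal readiness check with the Triton HTTP client, assuming the `tritonclient[http]` package is installed:

```python
# Optional readiness check for Triton (assumes `pip install tritonclient[http]`).
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
assert client.is_server_ready(), "Triton server is not ready"
assert client.is_model_ready("all-MiniLM-L6-v2"), "Embedding model is not loaded"
print("Triton is serving all-MiniLM-L6-v2")
```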

## Populate the Milvus database

```bash
cd ${MORPHEUS_ROOT}

python examples/doca/vdb_realtime/vdb.py --nic_addr=ca:00.0 --gpu_addr=17:00.0
```

## Send data to the NIC to be indexed

On another machine, run the following command:

```bash
sudo python3 examples/doca/vdb_realtime/sender/send.py
```
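
`send.py` handles this step; purely as an illustration, a sender along these lines could be sketched with `scapy` (the destination address, port, and payload framing here are hypothetical and may not match what `send.py` actually does):

```python
# Hypothetical sender sketch -- NOT the real send.py. Requires scapy
# (`pip install scapy`) and root privileges to emit raw packets.
from scapy.all import IP, UDP, Raw, send

DST_IP = "192.168.2.27"  # hypothetical IP routed to the DOCA-managed NIC
DST_PORT = 5001          # hypothetical UDP port matched by the receive queue

with open("examples/doca/vdb_realtime/sender/dataset/doca_overview.txt") as f:
    for line in f:
        if line.strip():
            # One UDP packet per non-empty line of the dataset
            send(IP(dst=DST_IP) / UDP(dport=DST_PORT) / Raw(load=line.encode()))
```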

On the original machine, wait for the "Upload rate" to match the "DOCA GPUNetIO Source rate", then press `Ctrl+C` to stop the script. The output should look like the following:

```
====Building Segment Complete!====
Accumulated 1 rows for collection: vdb_doca
Accumulated 2 rows for collection: vdb_doca
Accumulated 3 rows for collection: vdb_doca
Accumulated 1 rows for collection: vdb_doca
Accumulated 2 rows for collection: vdb_doca
Accumulated 3 rows for collection: vdb_doca
Stopping pipeline. Please wait... Press Ctrl+C again to kill.
====Stopping Pipeline====
====Pipeline Stopped====
DOCA GPUNetIO Source rate[Complete]: 229 pkts [04:29, 1.18s/ pkts]
Embedding rate[Complete]: 229 pkts [05:51, 1.53s/ pkts]
====Pipeline Complete====
```
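
After the pipeline stops, you can confirm that the rows actually landed in Milvus with a short `pymilvus` check (assuming Milvus is reachable on `localhost:19530`):

```python
# Quick row-count check (assumes `pip install pymilvus` and Milvus on localhost).
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
collection = Collection("vdb_doca")
collection.flush()  # make recently inserted rows visible to the count
print(f"vdb_doca contains {collection.num_entities} rows")
```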

## Query the Milvus database

First, set the NeMo LLM API Key:

```bash
export NGC_API_KEY="<YOUR_NGC_API>"
```

Then install basic requirements:
```bash
pip install langchain
pip install sentence-transformers
conda env update --solver=libmamba -n morpheus --file conda/environments/dev_cuda-121_arch-x86_64.yaml --prune
```

Run the RAG example to query the Milvus database:

```bash
cd ${MORPHEUS_ROOT}
python examples/llm/main.py --use_cpp=True --log_level=DEBUG rag pipeline --vdb_resource_name=vdb_doca --question="What is DOCA SDK?","What is DOCA GPUNetIO?","What does DOCA GPUNetIO do to remove the CPU from the critical path?"
```
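
To sanity-check retrieval independently of the LLM step, you can also search the collection directly with `pymilvus`, embedding the question with the same model Triton serves. The `embedding` and `text` field names below are assumptions and may differ from the schema `vdb.py` creates:

```python
# Direct vector search against Milvus, bypassing the RAG pipeline.
# Field names ("embedding", "text") are assumed, not taken from vdb.py.
from pymilvus import Collection, connections
from sentence_transformers import SentenceTransformer

connections.connect(host="localhost", port="19530")
collection = Collection("vdb_doca")
collection.load()

model = SentenceTransformer("all-MiniLM-L6-v2")
query_vector = model.encode("What is DOCA GPUNetIO?").tolist()

results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=3,
    output_fields=["text"],
)
for hit in results[0]:
    print(hit.entity.get("text"))
```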

You should see the answer to the query in the output:

```
Pipeline complete. Received 3 responses
Question:
What is DOCA?
Response:
DOCA is a library that provides a set of APIs for creating and managing network devices on GPUs.
Question:
What is the DOCA SDK?
Response:
The DOCA Software Development Kit (SDK) is a software development kit that provides a set of libraries, tools, and documentation to help developers create and deploy network applications on Mellanox network adapters.
Question:
What does DOCA GPUNetIO do to remove the CPU from the critical path?
Response:
DOCA GPUNetIO enables GPU-centric solutions that remove the CPU from the critical path by providing the following features:
GPUDirect Async Kernel-Initiated Network (GDAKIN) communications – a CUDA kernel can invoke GPUNetIO device functions to receive or send, directly interacting with the NIC
CPU intervention is not needed in the application critical path
GPUDirect RDMA – receive packets directly into a contiguous GPU memory​ area
Semaphores – provide a standardized I/O communication protocol between the receiving entity and the CUDA kernel real-time packet processing​
Smart memory allocation – allocate aligned GPU memory buffers exposing them to direct CPU access
Combination of CUDA and DPDK gpudev library (with the DOCA GPUNetIO shared library is doca-gpu.pc. However, there is no pkgconfig file for the DOCA GPUNetIO CUDA device's static library /opt/mellanox/d
Total time: 10.61 sec
Pipeline runtime: 4.12 sec
```
40 changes: 40 additions & 0 deletions examples/doca/vdb_realtime/sender/dataset/doca_overview.txt
@@ -0,0 +1,40 @@
This is an overview of the structure of NVIDIA DOCA documentation. It walks you through DOCA's developer zone portal which contains all the information about the DOCA toolkit from NVIDIA, providing everything you need to develop BlueField-accelerated applications.

The NVIDIA DOCA SDK enables developers to rapidly create applications and services on top of NVIDIA® BlueField® networking platform, leveraging industry-standard APIs. With DOCA, developers can deliver breakthrough networking, security, and storage performance by harnessing the power of NVIDIA's BlueField data-processing units (DPUs) and SuperNICs.

Installation

DOCA provides a runtime and development environment, both for the host and as part of the BlueField device image. The full installation instructions for both can be found in the NVIDIA DOCA Installation Guide for Linux.
Whether DOCA has been installed on the host or on the BlueField networking platform, one can find the different DOCA components under the /opt/mellanox/doca directory. These include the traditional SDK-related components (libraries, header files, etc.) as well as the DOCA samples, applications, tools and more, as described in this document.

API

The DOCA SDK is built around the different DOCA libraries designed to leverage the capabilities of BlueField. Under the Programming Guides section, one can find a detailed description of each DOCA library, its goals, and API. These guides document DOCA's API, aiming to help developers wishing to develop DOCA-based programs.
The API References section holds the Doxygen-generated documentation of DOCA's official API. See NVIDIA DOCA Library APIs.
Please note that, as explained in the NVIDIA DOCA gRPC Infrastructure User Guide, some of DOCA's libraries also support a gRPC-based API. More information about these extended programming interfaces can be found in detail in the programming guides of the respective libraries.
Programming Guides
DOCA programming guides provide the full picture of DOCA libraries and their APIs. Each guide includes an introduction, architecture, API overview, and other library-specific information.
Each library's programming guide includes code snippets for achieving basic DOCA-based tasks. It is recommended to review these samples while going over the programming guide of the relevant DOCA library to learn about its API. The samples provide an implementation example of a single feature of a given DOCA library.
For a more detailed reference of full DOCA-based programs that make use of multiple DOCA libraries, please refer to the Reference Applications.

Applications

Applications are a higher-level reference code than the samples and demonstrate how a full DOCA-based program can be built. In addition to the supplied source code and compilation definitions, the applications are also shipped in their compiled binary form. This is to allow users an out-of-the-box interaction with DOCA-based programs without the hassle of a developer-oriented compilation process.
Many DOCA applications combine the functionality of more than one DOCA library and offer an example implementation for common scenarios of interest to users such as application recognition according to incoming/outgoing traffic, scanning files using the hardware RegEx acceleration, and much more.
For more information about DOCA applications, refer to DOCA Applications.

Tools

Some of the DOCA libraries are shipped alongside helper tools for both runtime and development. These tools are often an extension to the library's own API and bridge the gap between the library's expected input format and the input available to the users.
An example of one such DOCA tool is doca_dpi_compiler, responsible for converting Suricata-based rules to their matching .cdo definition files, which are then used by the DOCA DPI library.
For more information about DOCA tools, refer to DOCA Tools.

Services

DOCA services are containerized DOCA-based programs that provide an end-to-end solution for a given use case. DOCA services are accessible as part of NVIDIA's container catalog (NGC) from which they can be easily deployed directly to BlueField, and sometimes also to the host.
For more information about container-based deployment to the BlueField DPU or SmartNIC, refer to the NVIDIA BlueField DPU Container Deployment Guide.
For more information about DOCA services, refer to DOCA Services.

Note

For questions, comments, and feedback, please contact us at DOCA-Feedback@exchange.nvidia.com
@@ -0,0 +1,75 @@
A growing number of network applications need to exercise GPU real-time packet processing in order to implement high data rate solutions: data filtering, data placement, network analysis, sensors’ signal processing, and more.

One primary motivation is the high degree of parallelism the GPU enables: it can process many packets in parallel while offering scalability and programmability.

For an overview of the basic concepts of these techniques and an initial solution based on the DPDK gpudev library, see Boosting Inline Packet Processing Using DPDK and GPUdev with GPUs.

This post explains how the new NVIDIA DOCA GPUNetIO Library can overcome some of the limitations found in the previous DPDK solution, moving a step closer to GPU-centric packet processing applications.
Introduction

Real-time GPU processing of network packets is a technique useful to several different application domains, including signal processing, network security, information gathering, and input reconstruction. The goal of these applications is to realize an inline packet processing pipeline to receive packets in GPU memory (without staging copies through CPU memory); process them in parallel with one or more CUDA kernels; and then run inference, evaluate, or send over the network the result of the calculation.

Typically, in this pipeline, the CPU is the intermediary: it has to synchronize network card (NIC) receive activity with the GPU processing, waking up the CUDA kernel as soon as new packets have been received in GPU memory. Similar considerations can be applied to the send side of the pipeline.
Figure 1. CPU-centric application with the CPU orchestrating the GPU and network card work

The Data Plane Development Kit (DPDK) framework introduced the gpudev library to provide a solution for this kind of application: receive or send using GPU memory (GPUDirect RDMA technology) in combination with low-latency CPU synchronization. For more information about different approaches to coordinating CPU and GPU activity, see Boosting Inline Packet Processing Using DPDK and GPUdev with GPUs.
GPUDirect Async Kernel-Initiated Network communications

Looking at Figure 1, it is clear that the CPU is the main bottleneck. It has too many responsibilities in synchronizing NIC and GPU tasks and managing multiple network queues. As an example, consider an application with many receive queues and incoming traffic of 100 Gbps. A CPU-centric solution would have:

CPU invoking the network function on each receive queue to receive packets in GPU memory using one or multiple CPU cores
CPU collecting packets’ info (packet addresses, count)
CPU notifying the GPU about new received packets
GPU processing the packets

This CPU-centric approach is:

Resource consuming: To deal with high-rate network throughput (100 Gbps or more) the application may have to dedicate an entire CPU physical core to receive or send packets.
Not scalable: To receive or send in parallel with different queues, the application may have to use multiple CPU cores, even on systems where the total number of CPU cores may be limited to a low number (depending on the platform).
Platform-dependent: The same application running on a low-power CPU sees reduced performance.

The next natural step for GPU inline packet processing applications is to remove the CPU from the critical path. Moving to a GPU-centric solution, the GPU can directly interact with the NIC to receive packets so the processing can start as soon as packets arrive in GPU memory. The same considerations can be applied to the send operation.

The capability of a GPU to control the NIC activity from a CUDA kernel is called GPUDirect Async Kernel-Initiated Network (GDAKIN) communications. Assuming the use of an NVIDIA GPU and an NVIDIA NIC, it is possible to expose the NIC registers to the direct access of the GPU. In this way, a CUDA kernel can directly configure and update these registers to orchestrate a send or a receive network operation without the intervention of the CPU.
Figure 2. GPU-centric application with the GPU controlling the network card and packet processing without the need of the CPU

DPDK is, by definition, a CPU framework. To enable GDAKIN communications, the whole control path would have to move to the GPU, which is not feasible. For this reason, this feature is instead enabled by a new NVIDIA DOCA library.
NVIDIA DOCA GPUNetIO Library

NVIDIA DOCA SDK is the new NVIDIA framework composed of drivers, libraries, tools, documentation, and example applications. These resources let applications leverage the network, security, and computation features that NVIDIA hardware exposes on host systems and DPUs.

NVIDIA DOCA GPUNetIO is a new library developed on top of the NVIDIA DOCA 1.5 release to introduce the notion of a GPU device in the DOCA ecosystem (Figure 3). To facilitate the creation of a DOCA GPU-centric real-time packet processing application, DOCA GPUNetIO combines GPUDirect RDMA for data-path acceleration, smart GPU memory management, low-latency message passing techniques between CPU and GPU (through GDRCopy features) and GDAKIN communications.

This enables a CUDA kernel to directly control an NVIDIA ConnectX network card. To maximize performance, the DOCA GPUNetIO Library must be used on platforms considered GPUDirect-friendly, where the GPU and the network card are directly connected through a dedicated PCIe bridge. The DPU converged card is an example, but the same topology can be realized on host systems as well.

DOCA GPUNetIO targets GPU packet-processing network applications that use the Ethernet protocol to exchange packets. With these applications, there is no need for a pre-synchronization phase across peers through an out-of-band (OOB) mechanism, as is required for RDMA-based applications. There is also no need to assume other peers use DOCA GPUNetIO to communicate, and no need to be topology-aware. In future releases, the RDMA option will be enabled to cover more use cases.

Here are the DOCA GPUNetIO features enabled in the current release:

GDAKIN communications: A CUDA kernel can invoke the CUDA device functions in the DOCA GPUNetIO Library to instruct the network card to send or receive packets.
Accurate Send Scheduling: It is possible to schedule packets’ transmission in the future according to a user-provided timestamp.
GPUDirect RDMA: Receive or send packets in contiguous fixed-size GPU memory strides without CPU memory staging copies.
Semaphores: Provide a standardized low-latency message passing protocol between CPU and GPU or between different GPU CUDA kernels.
CPU direct access to GPU memory: CPU can modify GPU memory buffers without using the CUDA memory API.

Figure 3. NVIDIA DOCA GPUNetIO is a new DOCA library requiring a GPU and CUDA drivers and libraries installed on the same platform

As shown in Figure 4, the typical DOCA GPUNetIO application steps are:

Initial configuration phase on CPU
Use DOCA to identify and initialize a GPU device and a network device
Use DOCA GPUNetIO to create receive or send queues manageable from a CUDA kernel
Use DOCA Flow to determine which type of packet should land in each receive queue (for example, subset of IP addresses, TCP or UDP protocol, and so on)
Launch one or more CUDA kernels (to execute packet processing/filtering/analysis)
Runtime control and data path on GPU within CUDA kernel
Use DOCA GPUNetIO CUDA device functions to send or receive packets
Use DOCA GPUNetIO CUDA device functions to interact with the semaphores to synchronize the work with other CUDA kernels or with the CPU

Figure 4. Generic GPU packet processing pipeline data flow composed of several building blocks

The following sections present an overview of possible GPU packet processing pipeline application layouts combining DOCA GPUNetIO building blocks.
