DOCA stage split: source + convert #1617

Merged · 48 commits · May 15, 2024
Changes from 1 commit
Commits
716559c
Doca stage split: source + convert
eagonv Apr 11, 2024
9e7a163
Fix code style issues
eagonv Apr 11, 2024
d266ca1
1 thread only for DOCA Source Stage
eagonv Apr 11, 2024
63ea2ad
Progress in column creation
eagonv Apr 12, 2024
1573152
vdb_realtime works
eagonv Apr 13, 2024
537e5ba
VDB works + example directory
eagonv Apr 16, 2024
c456546
Add nemollm to pip dependencies in dev yaml
eagonv Apr 16, 2024
5af237f
Update readme
eagonv Apr 16, 2024
2da4b8c
Syntax fix
eagonv Apr 17, 2024
84cdffa
Merge branch 'branch-24.06' into doca-split
e-ago Apr 17, 2024
8b399e5
Minor update to documentation
eagonv Apr 17, 2024
0b1c263
Merge branch 'doca-split' of github.com:e-ago/MorpheusDoca into doca-…
eagonv Apr 17, 2024
e9e4295
Merge branch 'branch-24.06' into doca-split
e-ago Apr 17, 2024
deb2937
Fix vdb header, fix DOCA cleanup
eagonv Apr 18, 2024
9a2a071
Increase to 2 rx queues
eagonv Apr 18, 2024
c40f048
More fixes
eagonv Apr 19, 2024
b0ed2e5
DocaConvert stage allows dynamic number of packets + fix to python pi…
eagonv Apr 22, 2024
42676be
Fix code style
eagonv Apr 22, 2024
b9534ba
Minor fix
eagonv Apr 22, 2024
86977c2
Upgrade to DOCA 2.7 + minor fixes
eagonv May 9, 2024
000cf18
More improvements and codestyle fix
eagonv May 9, 2024
79dbca3
Doca stage split: source + convert
eagonv Apr 11, 2024
1884553
Fix code style issues
eagonv Apr 11, 2024
acff3b8
1 thread only for DOCA Source Stage
eagonv Apr 11, 2024
1ef468d
Progress in column creation
eagonv Apr 12, 2024
5c83c37
vdb_realtime works
eagonv Apr 13, 2024
a1e7e2d
VDB works + example directory
eagonv Apr 16, 2024
9a565be
Add nemollm to pip dependencies in dev yaml
eagonv Apr 16, 2024
438acb2
Update readme
eagonv Apr 16, 2024
ebbd17a
Syntax fix
eagonv Apr 17, 2024
061d0c0
Minor update to documentation
eagonv Apr 17, 2024
85bf6ee
Fix vdb header, fix DOCA cleanup
eagonv Apr 18, 2024
d4d8ec0
Increase to 2 rx queues
eagonv Apr 18, 2024
a9b63f8
More fixes
eagonv Apr 19, 2024
af24e93
DocaConvert stage allows dynamic number of packets + fix to python pi…
eagonv Apr 22, 2024
5639f28
Fix code style
eagonv Apr 22, 2024
2805669
Minor fix
eagonv Apr 22, 2024
3c0ccee
Upgrade to DOCA 2.7 + minor fixes
eagonv May 9, 2024
9960f44
More improvements and codestyle fix
eagonv May 9, 2024
d85d7e5
Merge branch 'doca-split' of github.com:e-ago/MorpheusDoca into doca-…
eagonv May 9, 2024
a59b53d
Merge remote-tracking branch 'upstream/branch-24.06' into doca-split
mdemoret-nv May 13, 2024
6109b83
Merging and removing commented code.
mdemoret-nv May 13, 2024
b96cdaf
Style cleanup
mdemoret-nv May 13, 2024
101991c
Passed linting.
mdemoret-nv May 14, 2024
427af1d
Resolving comments from feedback
mdemoret-nv May 14, 2024
eeea4a6
Merge pull request #2 from mdemoret-nv/mdd_fix-doca-split
e-ago May 14, 2024
b402693
Style cleanup
mdemoret-nv May 14, 2024
6aa8c70
Style cleanup
mdemoret-nv May 15, 2024
VDB works + example directory
eagonv committed Apr 16, 2024
commit 537e5ba6201ee53f2611aba321bf308540960ed1
125 changes: 125 additions & 0 deletions examples/doca/vdb_realtime/README.md
@@ -0,0 +1,125 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2023-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# DOCA Sensitive Information Detection Example

## Run Milvus

Download the Milvus docker-compose file from the [Milvus GitHub repository](https://github.com/milvus-io/milvus):

```bash
mkdir milvus
cd milvus
wget https://github.com/milvus-io/milvus/releases/download/v2.3.3/milvus-standalone-docker-compose-gpu.yml -O docker-compose.yml
```

Start Milvus:

```bash
sudo docker-compose up -d
```

## Launch Triton Inference Server

To serve the embedding model, we will use Triton:

```bash
cd ${MORPHEUS_ROOT}
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:24.01-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model all-MiniLM-L6-v2
```
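
Before launching the pipeline, you can optionally confirm that Triton is up and the embedding model is loaded. A minimal readiness check with the Triton HTTP client, assuming the `tritonclient[http]` package is installed:

```python
# Optional readiness check for Triton (assumes `pip install tritonclient[http]`).
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
assert client.is_server_ready(), "Triton server is not ready"
assert client.is_model_ready("all-MiniLM-L6-v2"), "Embedding model is not loaded"
print("Triton is serving all-MiniLM-L6-v2")
```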

## Populate the Milvus database

```bash
cd ${MORPHEUS_ROOT}

python examples/doca/vdb_realtime/vdb.py --nic_addr=ca:00.0 --gpu_addr=17:00.0
```

## Send data to the NIC to be indexed

On another machine, run the following command:

```bash
sudo python3 examples/doca/vdb_realtime/sender/send.py
```
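
`send.py` handles this step; purely as an illustration, a sender along these lines could be sketched with `scapy` (the destination address, port, and payload framing here are hypothetical and may not match what `send.py` actually does):

```python
# Hypothetical sender sketch -- NOT the real send.py. Requires scapy
# (`pip install scapy`) and root privileges to emit raw packets.
from scapy.all import IP, UDP, Raw, send

DST_IP = "192.168.2.27"  # hypothetical IP routed to the DOCA-managed NIC
DST_PORT = 5001          # hypothetical UDP port matched by the receive queue

with open("examples/doca/vdb_realtime/sender/dataset/doca_overview.txt") as f:
    for line in f:
        if line.strip():
            # One UDP packet per non-empty line of the dataset
            send(IP(dst=DST_IP) / UDP(dport=DST_PORT) / Raw(load=line.encode()))
```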

On the original machine, wait for the "Upload rate" to match the "DOCA GPUNetIO Source rate", then press `Ctrl+C` to stop the script. The output should look like the following:

```
====Building Segment Complete!====
Accumulated 1 rows for collection: vdb_doca
Accumulated 2 rows for collection: vdb_doca
Accumulated 3 rows for collection: vdb_doca
Accumulated 1 rows for collection: vdb_doca
Accumulated 2 rows for collection: vdb_doca
Accumulated 3 rows for collection: vdb_doca
Stopping pipeline. Please wait... Press Ctrl+C again to kill.
====Stopping Pipeline====
====Pipeline Stopped====
DOCA GPUNetIO Source rate[Complete]: 229 pkts [04:29, 1.18s/ pkts]
Embedding rate[Complete]: 229 pkts [05:51, 1.53s/ pkts]
====Pipeline Complete====
```
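
After the pipeline stops, you can confirm that the rows actually landed in Milvus with a short `pymilvus` check (assuming Milvus is reachable on `localhost:19530`):

```python
# Quick row-count check (assumes `pip install pymilvus` and Milvus on localhost).
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
collection = Collection("vdb_doca")
collection.flush()  # make recently inserted rows visible to the count
print(f"vdb_doca contains {collection.num_entities} rows")
```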

## Query the Milvus database

First, set the NeMo LLM API Key:

```bash
export NGC_API_KEY="<YOUR_NGC_API>"
```

Then install basic requirements:
```bash
pip install langchain
pip install sentence-transformers
conda env update --solver=libmamba -n morpheus --file conda/environments/dev_cuda-121_arch-x86_64.yaml --prune
```

Run the RAG example to query the Milvus database:

```bash
cd ${MORPHEUS_ROOT}
python examples/llm/main.py --use_cpp=True --log_level=DEBUG rag pipeline --vdb_resource_name=vdb_doca --question="What is DOCA SDK?","What is DOCA GPUNetIO?","What does DOCA GPUNetIO do to remove the CPU from the critical path?"
```
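
To sanity-check retrieval independently of the LLM step, you can also search the collection directly with `pymilvus`, embedding the question with the same model Triton serves. The `embedding` and `text` field names below are assumptions and may differ from the schema `vdb.py` creates:

```python
# Direct vector search against Milvus, bypassing the RAG pipeline.
# Field names ("embedding", "text") are assumed, not taken from vdb.py.
from pymilvus import Collection, connections
from sentence_transformers import SentenceTransformer

connections.connect(host="localhost", port="19530")
collection = Collection("vdb_doca")
collection.load()

model = SentenceTransformer("all-MiniLM-L6-v2")
query_vector = model.encode("What is DOCA GPUNetIO?").tolist()

results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=3,
    output_fields=["text"],
)
for hit in results[0]:
    print(hit.entity.get("text"))
```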

You should see the answer to the query in the output:

```
Pipeline complete. Received 3 responses
Question:
What is DOCA?
Response:
DOCA is a library that provides a set of APIs for creating and managing network devices on GPUs.
Question:
What is the DOCA SDK?
Response:
The DOCA Software Development Kit (SDK) is a software development kit that provides a set of libraries, tools, and documentation to help developers create and deploy network applications on Mellanox network adapters.
Question:
What does DOCA GPUNetIO do to remove the CPU from the critical path?
Response:
DOCA GPUNetIO enables GPU-centric solutions that remove the CPU from the critical path by providing the following features:
GPUDirect Async Kernel-Initiated Network (GDAKIN) communications – a CUDA kernel can invoke GPUNetIO device functions to receive or send, directly interacting with the NIC
CPU intervention is not needed in the application critical path
GPUDirect RDMA – receive packets directly into a contiguous GPU memory​ area
Semaphores – provide a standardized I/O communication protocol between the receiving entity and the CUDA kernel real-time packet processing​
Smart memory allocation – allocate aligned GPU memory buffers exposing them to direct CPU access
Combination of CUDA and DPDK gpudev library (with the DOCA GPUNetIO shared library is doca-gpu.pc. However, there is no pkgconfig file for the DOCA GPUNetIO CUDA device's static library /opt/mellanox/d
Total time: 10.61 sec
Pipeline runtime: 4.12 sec
```
40 changes: 40 additions & 0 deletions examples/doca/vdb_realtime/sender/dataset/doca_overview.txt
@@ -0,0 +1,40 @@
This is an overview of the structure of NVIDIA DOCA documentation. It walks you through DOCA's developer zone portal which contains all the information about the DOCA toolkit from NVIDIA, providing everything you need to develop BlueField-accelerated applications.

The NVIDIA DOCA SDK enables developers to rapidly create applications and services on top of NVIDIA® BlueField® networking platform, leveraging industry-standard APIs. With DOCA, developers can deliver breakthrough networking, security, and storage performance by harnessing the power of NVIDIA's BlueField data-processing units (DPUs) and SuperNICs.

Installation

DOCA provides a runtime and development environment, both for the host and as part of the BlueField device image. The full installation instructions for both can be found in the NVIDIA DOCA Installation Guide for Linux.
Whether DOCA has been installed on the host or on the BlueField networking platform, one can find the different DOCA components under the /opt/mellanox/doca directory. These include the traditional SDK-related components (libraries, header files, etc.) as well as the DOCA samples, applications, tools and more, as described in this document.

API

The DOCA SDK is built around the different DOCA libraries designed to leverage the capabilities of BlueField. Under the Programming Guides section, one can find a detailed description of each DOCA library, its goals, and API. These guides document DOCA's API, aiming to help developers wishing to develop DOCA-based programs.
The API References section holds the Doxygen-generated documentation of DOCA's official API. See NVIDIA DOCA Library APIs.
Please note that, as explained in the NVIDIA DOCA gRPC Infrastructure User Guide, some of DOCA's libraries also support a gRPC-based API. More information about these extended programming interfaces can be found in detail in the programming guides of the respective libraries.
Programming Guides
DOCA programming guides provide the full picture of DOCA libraries and their APIs. Each guide includes an introduction, architecture, API overview, and other library-specific information.
Each library's programming guide includes code snippets for achieving basic DOCA-based tasks. It is recommended to review these samples while going over the programming guide of the relevant DOCA library to learn about its API. The samples provide an implementation example of a single feature of a given DOCA library.
For a more detailed reference of full DOCA-based programs that make use of multiple DOCA libraries, please refer to the Reference Applications.

Applications

Applications are a higher-level reference code than the samples and demonstrate how a full DOCA-based program can be built. In addition to the supplied source code and compilation definitions, the applications are also shipped in their compiled binary form. This is to allow users an out-of-the-box interaction with DOCA-based programs without the hassle of a developer-oriented compilation process.
Many DOCA applications combine the functionality of more than one DOCA library and offer an example implementation for common scenarios of interest to users such as application recognition according to incoming/outgoing traffic, scanning files using the hardware RegEx acceleration, and much more.
For more information about DOCA applications, refer to DOCA Applications.

Tools

Some of the DOCA libraries are shipped alongside helper tools for both runtime and development. These tools are often an extension to the library's own API and bridge the gap between the library's expected input format and the input available to the users.
An example of one such DOCA tool is doca_dpi_compiler, responsible for converting Suricata-based rules to their matching .cdo definition files, which are then used by the DOCA DPI library.
For more information about DOCA tools, refer to DOCA Tools.

Services

DOCA services are containerized DOCA-based programs that provide an end-to-end solution for a given use case. DOCA services are accessible as part of NVIDIA's container catalog (NGC) from which they can be easily deployed directly to BlueField, and sometimes also to the host.
For more information about container-based deployment to the BlueField DPU or SmartNIC, refer to the NVIDIA BlueField DPU Container Deployment Guide.
For more information about DOCA services, refer to DOCA Services.

Note

For questions, comments, and feedback, please contact us at DOCA-Feedback@exchange.nvidia.com
@@ -0,0 +1,75 @@
A growing number of network applications need to exercise GPU real-time packet processing in order to implement high data rate solutions: data filtering, data placement, network analysis, sensors’ signal processing, and more.

One primary motivation is the high degree of parallelism the GPU enables: it can process many packets in parallel while offering scalability and programmability.

For an overview of the basic concepts of these techniques and an initial solution based on the DPDK gpudev library, see Boosting Inline Packet Processing Using DPDK and GPUdev with GPUs.

This post explains how the new NVIDIA DOCA GPUNetIO Library can overcome some of the limitations found in the previous DPDK solution, moving a step closer to GPU-centric packet processing applications.
Introduction

Real-time GPU processing of network packets is a technique useful to several different application domains, including signal processing, network security, information gathering, and input reconstruction. The goal of these applications is to realize an inline packet processing pipeline to receive packets in GPU memory (without staging copies through CPU memory); process them in parallel with one or more CUDA kernels; and then run inference, evaluate, or send over the network the result of the calculation.

Typically, in this pipeline, the CPU is the intermediary: it has to synchronize network card (NIC) receive activity with the GPU processing, waking up the CUDA kernel as soon as new packets have been received in GPU memory. Similar considerations can be applied to the send side of the pipeline.
Figure 1. CPU-centric application with the CPU orchestrating the GPU and network card work

The Data Plane Development Kit (DPDK) framework introduced the gpudev library to provide a solution for this kind of application: receive or send using GPU memory (GPUDirect RDMA technology) in combination with low-latency CPU synchronization. For more information about different approaches to coordinating CPU and GPU activity, see Boosting Inline Packet Processing Using DPDK and GPUdev with GPUs.
GPUDirect Async Kernel-Initiated Network communications

Looking at Figure 1, it is clear that the CPU is the main bottleneck. It has too many responsibilities in synchronizing NIC and GPU tasks and managing multiple network queues. As an example, consider an application with many receive queues and incoming traffic of 100 Gbps. A CPU-centric solution would have:

CPU invoking the network function on each receive queue to receive packets in GPU memory using one or multiple CPU cores
CPU collecting packets’ info (packet addresses, count)
CPU notifying the GPU about new received packets
GPU processing the packets

This CPU-centric approach is:

Resource consuming: To deal with high-rate network throughput (100 Gbps or more) the application may have to dedicate an entire CPU physical core to receive or send packets.
Not scalable: To receive or send in parallel with different queues, the application may have to use multiple CPU cores, even on systems where the total number of CPU cores may be limited to a low number (depending on the platform).
Platform-dependent: The same application running on a low-power CPU sees reduced performance.

The next natural step for GPU inline packet processing applications is to remove the CPU from the critical path. Moving to a GPU-centric solution, the GPU can directly interact with the NIC to receive packets so the processing can start as soon as packets arrive in GPU memory. The same considerations can be applied to the send operation.

The capability of a GPU to control the NIC activity from a CUDA kernel is called GPUDirect Async Kernel-Initiated Network (GDAKIN) communications. Assuming the use of an NVIDIA GPU and an NVIDIA NIC, it is possible to expose the NIC registers to the direct access of the GPU. In this way, a CUDA kernel can directly configure and update these registers to orchestrate a send or a receive network operation without the intervention of the CPU.
Figure 2. GPU-centric application with the GPU controlling the network card and packet processing without the need of the CPU

DPDK is, by definition, a CPU framework. To enable GDAKIN communications, the whole control path would have to move to the GPU, which is not feasible. For this reason, this feature is instead enabled by a new NVIDIA DOCA library.
NVIDIA DOCA GPUNetIO Library

NVIDIA DOCA SDK is the new NVIDIA framework composed of drivers, libraries, tools, documentation, and example applications. These resources let applications leverage the network, security, and computation features that NVIDIA hardware exposes on host systems and DPUs.

NVIDIA DOCA GPUNetIO is a new library developed on top of the NVIDIA DOCA 1.5 release to introduce the notion of a GPU device in the DOCA ecosystem (Figure 3). To facilitate the creation of a DOCA GPU-centric real-time packet processing application, DOCA GPUNetIO combines GPUDirect RDMA for data-path acceleration, smart GPU memory management, low-latency message passing techniques between CPU and GPU (through GDRCopy features) and GDAKIN communications.

This enables a CUDA kernel to directly control an NVIDIA ConnectX network card. To maximize performance, the DOCA GPUNetIO Library must be used on platforms considered GPUDirect-friendly, where the GPU and the network card are directly connected through a dedicated PCIe bridge. The DPU converged card is an example, but the same topology can be realized on host systems as well.

DOCA GPUNetIO targets GPU packet-processing network applications that use the Ethernet protocol to exchange packets. With these applications, there is no need for a pre-synchronization phase across peers through an out-of-band (OOB) mechanism, as is required for RDMA-based applications. There is also no need to assume other peers use DOCA GPUNetIO to communicate, and no need to be topology-aware. In future releases, the RDMA option will be enabled to cover more use cases.

Here are the DOCA GPUNetIO features enabled in the current release:

GDAKIN communications: A CUDA kernel can invoke the CUDA device functions in the DOCA GPUNetIO Library to instruct the network card to send or receive packets.
Accurate Send Scheduling: It is possible to schedule packets’ transmission in the future according to a user-provided timestamp.
GPUDirect RDMA: Receive or send packets in contiguous fixed-size GPU memory strides without CPU memory staging copies.
Semaphores: Provide a standardized low-latency message passing protocol between CPU and GPU or between different GPU CUDA kernels.
CPU direct access to GPU memory: CPU can modify GPU memory buffers without using the CUDA memory API.

Figure 3. NVIDIA DOCA GPUNetIO is a new DOCA library requiring a GPU and CUDA drivers and libraries installed on the same platform

As shown in Figure 4, the typical DOCA GPUNetIO application steps are:

Initial configuration phase on CPU
Use DOCA to identify and initialize a GPU device and a network device
Use DOCA GPUNetIO to create receive or send queues manageable from a CUDA kernel
Use DOCA Flow to determine which type of packet should land in each receive queue (for example, subset of IP addresses, TCP or UDP protocol, and so on)
Launch one or more CUDA kernels (to execute packet processing/filtering/analysis)
Runtime control and data path on GPU within CUDA kernel
Use DOCA GPUNetIO CUDA device functions to send or receive packets
Use DOCA GPUNetIO CUDA device functions to interact with the semaphores to synchronize the work with other CUDA kernels or with the CPU

Figure 4. Generic GPU packet processing pipeline data flow composed of several building blocks

The following sections present an overview of possible GPU packet processing pipeline application layouts combining DOCA GPUNetIO building blocks.
