Add support for Host-backed GPU maps #522
base: master
Conversation
- Introduced two new map types: `BPF_MAP_TYPE_PERGPUTD_ARRAY_HOST_MAP` and `BPF_MAP_TYPE_GPU_ARRAY_HOST_MAP` for Tegra platforms without CUDA IPC.
- Updated `default_trampoline.cu` to handle the new host-based map types in the BPF helper functions.
- Created `host_map_test.bpf.c` and `host_map_test.c` to demonstrate usage of the new host-backed maps, including per-thread and shared storage.
- Enhanced the build system with a Makefile and README for the new example, detailing usage and requirements.

This change improves memory management and flexibility for applications running on platforms lacking CUDA IPC support, enabling efficient data sharing between CPU and GPU.

Test Method and Cases

Prerequisites

Test Case 1: Basic Functionality

Purpose: Verify both map types work correctly

Expected Result:
- `shared_counter` shows values for keys 0-9 (thread IDs mod 10)
- `perthread_counter` shows per-thread call counts, execution times, and thread IDs
- `thread_timestamp` shows the active thread count

Signed-off-by: jingxuanxie <jingxuanxie@deeproute.ai>
Thanks!
Pull request overview
This PR adds support for host-backed GPU map types designed for Tegra platforms that lack CUDA IPC support. The implementation introduces two new map types that store data in host memory (accessible via boost::interprocess shared memory + cudaHostRegister) rather than GPU device memory (cuMemAlloc + CUDA IPC), enabling efficient CPU-GPU data sharing on platforms without IPC capabilities.
Key Changes:
- Added `BPF_MAP_TYPE_PERGPUTD_ARRAY_HOST_MAP` (1512) for per-GPU-thread storage and `BPF_MAP_TYPE_GPU_ARRAY_HOST_MAP` (1513) for shared storage, both backed by host memory
- Implemented memory synchronization using `std::atomic_thread_fence` on the CPU side and `membar.sys` on the GPU side for proper visibility guarantees
- Included a comprehensive test example with a BPF program, a userspace monitor, and a CUDA application
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| runtime/include/bpftime_shm.hpp | Added new map type enumerations for host-backed GPU maps (1512, 1513) with clear documentation distinguishing IPC vs host-backed implementations |
| runtime/src/handler/map_handler.cpp | Extended map handler with lookup, update, delete, and iteration support for both new host-backed map types, including userspace value size calculations |
| runtime/src/bpf_map/gpu/nv_gpu_array_host_map.{hpp,cpp} | Implemented per-thread host-backed map with boost::interprocess shared memory and proper memory barriers for CPU-GPU synchronization |
| runtime/src/bpf_map/gpu/nv_gpu_shared_array_host_map.{hpp,cpp} | Implemented shared host-backed map with single-copy storage accessible by all GPU threads |
| runtime/src/bpf_map/gpu/nv_gpu_{shared_array,per_thread_array,ringbuf}_map.cpp | Added memory barriers to existing GPU maps for consistency with new host-backed implementations |
| runtime/CMakeLists.txt | Added new source files to build system with clear comments distinguishing IPC-based vs host-based implementations |
| attach/nv_attach_impl/trampoline/default_trampoline.cu | Extended BPF helper functions (map_lookup_elem, map_update_elem) to handle host-backed map types with appropriate memory barriers |
| example/gpu/host_map_test/host_map_test.bpf.c | BPF program demonstrating per-thread and shared host-backed maps with kprobe/kretprobe on CUDA kernel |
| example/gpu/host_map_test/host_map_test.c | Userspace program that reads and displays statistics from host-backed maps using dynamic key iteration |
| example/gpu/host_map_test/vec_add.cu | Simple CUDA vector addition application for triggering BPF probes and testing map functionality |
| example/gpu/host_map_test/Makefile | Build system for example with support for libbpf, bpftool, and CUDA compilation |
| example/gpu/host_map_test/README.md | Comprehensive documentation covering map types, use cases, building, running, and troubleshooting |
Comments suppressed due to low confidence (4)
attach/nv_attach_impl/trampoline/default_trampoline.cu:224

`real_key` is used without bounds checking to compute a per-thread offset, which can cause an out-of-bounds read/write in host memory. A malicious or buggy BPF program can pass a large key to this helper and obtain a pointer past the allocated map region, leading to host memory corruption when used. Add an explicit check against `map_info.max_entries` (and reject/return 0 on failure) before computing the offset:

```cpp
auto real_key = *(uint32_t *)(uintptr_t)key;
if ((uint64_t)real_key >= (uint64_t)map_info.max_entries) {
	return 0; // or error code
}
auto offset = array_map_offset(real_key, map_info, map);
```
attach/nv_attach_impl/trampoline/default_trampoline.cu:276

`real_key` is used to compute a destination pointer without validating it against `map_info.max_entries`, enabling out-of-bounds writes to host memory. An attacker controlling the map key can write past the allocated buffer via `simple_memcpy`, corrupting adjacent host memory. Validate the key before computing the offset or performing the copy:

```cpp
auto real_key = *(uint32_t *)(uintptr_t)key;
if ((uint64_t)real_key >= (uint64_t)map_info.max_entries) {
	return (uint64_t)-1; // or appropriate error handling
}
auto offset = array_map_offset(real_key, map_info, map);
```
attach/nv_attach_impl/trampoline/default_trampoline.cu:233

Out-of-bounds access risk: `real_key` is used directly in `base + real_key * map_info.value_size` with no check against `map_info.max_entries`. A crafted key can cause the helper to return a pointer outside the map buffer, leading to OOB reads/writes by callers. Add a bounds check before the pointer arithmetic and return 0 on invalid keys:

```cpp
auto real_key = *(uint32_t *)(uintptr_t)key;
if ((uint64_t)real_key >= (uint64_t)map_info.max_entries) {
	return 0;
}
auto base = (char *)map_info.extra_buffer;
return (uint64_t)(uintptr_t)(base + (uint64_t)real_key * map_info.value_size);
```
attach/nv_attach_impl/trampoline/default_trampoline.cu:286

The destination pointer `dst` is derived from `base + real_key * map_info.value_size` without validating `real_key`, allowing out-of-bounds writes into host memory. An attacker can pass an oversized key to corrupt memory adjacent to the map buffer via `simple_memcpy`. Guard this path by checking the key against `map_info.max_entries` before computing `dst`:

```cpp
auto real_key = *(uint32_t *)(uintptr_t)key;
if ((uint64_t)real_key >= (uint64_t)map_info.max_entries) {
	return (uint64_t)-1; // or set an error result
}
auto base = (char *)map_info.extra_buffer;
auto dst = (void *)(uintptr_t)(base + (uint64_t)real_key * map_info.value_size);
```
Does this implementation also work on general Tesla/GeForce platforms, or is it specific to Tegra? If it is specific, I guess it would be better to distinguish them in the compilation configuration.
This is a universal solution that can be used on platforms that do not support CUDA IPC. For platforms that do support CUDA IPC, `BPF_MAP_TYPE_PERGPUTD_ARRAY_MAP` and `BPF_MAP_TYPE_GPU_ARRAY_MAP` are preferred, as they allocate memory on the device, enabling faster GPU access.
- Add `HOST_MAP_MAX_ENTRIES` variable with a default value of 10
- Pass `HOST_MAP_MAX_ENTRIES` to both the BPF and C compilation commands
- Enable the documented feature allowing users to customize map entries via `make HOST_MAP_MAX_ENTRIES=N`

Signed-off-by: jingxuanxie <jingxuanxie@deeproute.ai>
Force-pushed from 149142a to 0e34818.
@Officeyutong Could you please review this PR and help assess whether it can be merged?
Please resolve the merge conflicts; after that, I will review this PR.
```cpp
if (did_switch_ctx) {
	cuCtxSetCurrent(prev_ctx);
}
// Memory barrier: ensure GPU data is visible to CPU
```
Do we need this?
Since both the CPU and GPU access the same variable, and some memory models are only weakly consistent, I think a memory barrier operation is needed to be on the safe side.
Why change so much of the trampoline?
Because I modified default_trampoline.cu, I needed to recompile trampoline_ptx.h. I used clang 18, so the diff looks quite large.