[EBPF] gpu: replace map with fixed-size array for memory usage tracking#46799

Open
pjo256 wants to merge 3 commits into DataDog:main from pjo256:gpu-perf-map-to-array

Conversation


@pjo256 pjo256 commented Feb 23, 2026

What does this PR do?

Replaces map[memAllocType]uint64 with [memAllocTypeCount]uint64 fixed-size arrays in the GPU monitoring hot path. This affects two locations:

  • kernelSpan.avgMemoryUsage in stream.go — accessed on every CUDA synchronization event
  • memTsBuilders in aggregator.go — used when computing per-process GPU memory stats

If new memory types are added in the future, adding a new constant before memAllocTypeCount in the memAllocType enum is all that's needed.
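The enum-sentinel pattern described above can be sketched as follows (a minimal sketch: the constant names follow the PR description and test code, but the agent's actual definitions may differ):

```go
package main

import "fmt"

// memAllocType enumerates GPU memory allocation kinds. The four values
// correspond to kernel binary, global, shared, and constant memory.
type memAllocType int

const (
	kernelMemAlloc memAllocType = iota
	globalMemAlloc
	sharedMemAlloc
	constantMemAlloc
	memAllocTypeCount // sentinel: always last, sizes the arrays below
)

// kernelSpan sketch: avgMemoryUsage is a fixed-size array instead of
// map[memAllocType]uint64, so every access is direct indexing with no
// hashing and no heap allocation.
type kernelSpan struct {
	avgMemoryUsage [memAllocTypeCount]uint64
}

func main() {
	var span kernelSpan
	span.avgMemoryUsage[sharedMemAlloc] += 1024 // same syntax as the map version
	fmt.Println(span.avgMemoryUsage[sharedMemAlloc])
}
```

Because a new constant added before `memAllocTypeCount` automatically grows the array, no call sites need to change when a memory type is added.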

Motivation

memAllocType is a 4-value enum (kernel binary, global, shared, constant memory). Using a map here adds overhead: hash computation, bucket lookups, and heap allocations on every access. A fixed-size array replaces this with direct indexing.

A benchmarking script shows a 9-23x speedup depending on workload (number of kernel launches between syncs), with allocations dropping from 240 B/op to 64 B/op.

Describe how you validated your changes

Existing unit tests in stream_test.go and stats_test.go already index avgMemoryUsage by memAllocType constants (e.g., span.avgMemoryUsage[sharedMemAlloc]), which works for both arrays and maps.

Wrote standalone microbenchmarks reproducing the exact access patterns of getCurrentKernelSpan and getRawStats, comparing map vs array across varying kernel launch counts.

| Launches | Map | Array | Speedup (per sync call) |
|---|---|---|---|
| 1 | 260 ns | 29 ns | 9.0x |
| 10 | 555 ns | 42 ns | 13.2x |
| 100 | 3.4 μs | 164 ns | 20.5x |
| 500 | 15.9 μs | 700 ns | 22.6x |
| 1000 | 31.5 μs | 1.4 μs | 23.0x |
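A standalone sketch of the access pattern being compared (a hypothetical reproduction, not the author's actual benchmark script): both helpers accumulate per-type memory usage across a number of kernel launches; the array version replaces hash lookups with direct indexed stores.

```go
package main

import "fmt"

const memAllocTypeCount = 4

// accumulateMap mirrors the old map-based hot path: every access hashes
// the key, probes a bucket, and the map itself is heap-allocated.
func accumulateMap(launches int) map[int]uint64 {
	usage := make(map[int]uint64, memAllocTypeCount)
	for i := 0; i < launches; i++ {
		usage[i%memAllocTypeCount] += 64
	}
	return usage
}

// accumulateArray mirrors the new array-based path: a stack-allocated
// fixed-size array with plain indexed stores.
func accumulateArray(launches int) [memAllocTypeCount]uint64 {
	var usage [memAllocTypeCount]uint64
	for i := 0; i < launches; i++ {
		usage[i%memAllocTypeCount] += 64
	}
	return usage
}

func main() {
	m, a := accumulateMap(100), accumulateArray(100)
	for t := 0; t < memAllocTypeCount; t++ {
		fmt.Println(t, m[t], a[t]) // both paths produce identical totals
	}
}
```

Wrapping each helper in a `testing.B` loop and running `go test -bench .` reproduces the shape of the comparison in the table above.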

In a production setting serving an LLM like Qwen3.5-397B-A17B, we might expect 1500-3000+ kernel launches per forward pass and 10+ forward passes per second, so this can cumulatively save ~20+ ms of agent CPU time per second.

Signed-off-by: Philip Ottesen <phiott256@gmail.com>
@pjo256 pjo256 requested a review from a team as a code owner February 23, 2026 13:13

github-actions bot commented Feb 23, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.


pjo256 commented Feb 23, 2026

I have read the CLA Document and I hereby sign the CLA

@gjulianm
Hi @pjo256. At first look this seems good; I'll run the full test pipeline and do a full review later.

Thanks a lot!

Signed-off-by: Philip Ottesen <phiott256@gmail.com>

pjo256 commented Feb 23, 2026

@gjulianm Thanks! I've added a reno release note in the latest commit, let me know if anything else needs adjusting.

@gjulianm gjulianm added qa/done QA done before merge and regressions are covered by tests changelog/no-changelog labels Feb 24, 2026 — with Graphite App
@gjulianm
Hi @pjo256, CI is green. Just one minor change: could you remove the changelog note? We don't usually write them for changes like these to avoid having a massive changelog :D

Signed-off-by: Philip Ottesen <phiott256@gmail.com>

pjo256 commented Feb 24, 2026

@gjulianm Done! I had seen a failing release-notes check earlier and wasn't sure from the Reno docs whether a note was required 👍


Labels

changelog/no-changelog · community · qa/done (QA done before merge and regressions are covered by tests) · team/ebpf-platform
