staryxchen/cuda_vmm_api_bench

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CUDA VMM API Benchmark

A lightweight CLI tool to measure CPU-side latency of CUDA Driver Virtual Memory Management (VMM) APIs, including:

  • cuMemGetAllocationGranularity
  • cuMemAddressReserve / cuMemAddressFree
  • cuMemCreate / cuMemRelease
  • cuMemMap / cuMemUnmap
  • cuMemSetAccess

For each API and each size, the tool reports avg, p50, and p95 latency. Results are printed as a console table; with --csv, records are also appended to a CSV file.

Requirements

  • Linux with NVIDIA driver (providing libcuda.so).
  • CUDA Toolkit headers (cuda.h) are required to build. If you don’t have nvcc, the g++ fallback links against -lcuda but still needs the headers.
  • CUDA 11+ recommended; device must support VMM.

Build

Preferred: nvcc (if available)

cd /data/workspace/vmm
make

If nvcc is not available, use g++ fallback (headers and library required):

make gpp
# If headers/libraries are not in default paths:
make gpp INCLUDES="-I/usr/local/cuda/include" LDFLAGS="-L/usr/local/cuda/lib64 -lcuda"

You can inspect detected CUDA paths:

make print-vars

Run

./bin/vmm_bench [options]

Console prints a table like:

API                           |     avg (us) |     p50 (us) |     p95 (us)
------------------------------+--------------+--------------+--------------
cuMemGetAllocationGranularity |        0.030 |        0.030 |        0.030
cuMemAddressReserve           |        0.210 |        0.210 |        0.230
...

CLI Options

  • --device N Select CUDA device id (default: 0)
  • --iters I Iterations per test (default: 100)
  • --warmup W Warmup iterations per API per size (default: 10)
  • --threads T Concurrent threads for multi-thread benchmark (default: 1)
  • --min-granularity Use minimum granularity (default: recommended granularity)
  • --csv path Append CSV output to path (created if it does not exist)
  • --sizes list Comma-separated sizes. Supports k/m/g suffix, e.g. 1m,8m,64m
  • --help Show usage

Examples

Single-thread, default recommended granularity:

./bin/vmm_bench --iters 200 --warmup 20 --sizes 1m,8m,64m --csv results.csv

Use minimum granularity:

./bin/vmm_bench --min-granularity --sizes 2m,32m --iters 50

Multi-thread (4 threads):

./bin/vmm_bench --threads 4 --iters 100 --warmup 10 --sizes 2m,8m,32m --csv mt_results.csv

CSV Format

CSV columns: device,size_bytes,api,avg_us,p50_us,p95_us

  • Time unit is microseconds (us) in the current version.
  • Records are appended; header written once when file is created.

Sizes and Granularity

  • Sizes are automatically aligned to the selected granularity.
  • Granularity is derived from cuMemGetAllocationGranularity using either RECOMMENDED (default) or MINIMUM (with --min-granularity).
  • VMM constraints require mapping sizes/addresses/offsets to be multiples of the minimum granularity.

Concurrency Notes

  • The tool uses the device primary context; each thread calls cuCtxSetCurrent(ctx) before VMM operations.
  • Multi-threaded VMM API calls may contend inside the driver; expect distributions to widen. Consider increasing warmup and iterations for stable stats.
  • If you need strict isolation, a future variant could create independent per-thread contexts.

Troubleshooting

  • nvcc: No such file or directory
    • Install CUDA Toolkit or use g++ fallback: make gpp
    • Ensure headers in CUDA_INCLUDE_DIR (e.g., /usr/local/cuda/include) and libraries in CUDA_LIB_DIR (e.g., /usr/local/cuda/lib64).
  • fatal error: cuda.h: No such file or directory
    • Install CUDA Toolkit headers or pass include path: INCLUDES="-I/usr/local/cuda/include"
  • CUDA Driver API error 201: invalid device context
    • Occurs if a primary context is destroyed incorrectly. The tool retains and releases the primary context correctly via cuDevicePrimaryCtxRetain/cuDevicePrimaryCtxRelease.
  • Runtime cannot find libcuda.so
    • Ensure NVIDIA driver is installed; set LD_LIBRARY_PATH to include the driver library directory.

Notes

  • This tool measures CPU-side latency of API calls, not kernel execution time.
  • cuMemMap requires cuMemSetAccess to make memory accessible; both calls are measured separately.
  • cuMemUnmap must unmap an entire previously mapped contiguous range.
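
The notes above correspond to the standard VMM lifecycle: reserve an address range, create a physical allocation, map it, then enable access; teardown runs in reverse. A hedged sketch of one such pass, assuming device 0 and a single granule (this mirrors the driver API documentation, not necessarily the tool's exact code):

```cuda
#include <cuda.h>
#include <cstdio>
#include <cstdlib>

#define CHECK(call) do { CUresult r = (call); if (r != CUDA_SUCCESS) { \
    const char *s; cuGetErrorString(r, &s); \
    fprintf(stderr, "%s failed: %s\n", #call, s); exit(1); } } while (0)

int main() {
    CHECK(cuInit(0));
    CUdevice dev; CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx; CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;

    size_t gran = 0;
    CHECK(cuMemGetAllocationGranularity(&gran, &prop,
          CU_MEM_ALLOC_GRANULARITY_RECOMMENDED));
    size_t size = gran;  // one granule; real sizes must be multiples of this

    CUdeviceptr va;                                // 1. reserve virtual range
    CHECK(cuMemAddressReserve(&va, size, 0, 0, 0));
    CUmemGenericAllocationHandle h;                // 2. create physical memory
    CHECK(cuMemCreate(&h, size, &prop, 0));
    CHECK(cuMemMap(va, size, 0, h, 0));            // 3. map into the range

    CUmemAccessDesc acc = {};                      // 4. enable RW access
    acc.location = prop.location;
    acc.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    CHECK(cuMemSetAccess(va, size, &acc, 1));

    // Teardown in reverse order; cuMemUnmap covers the entire mapped range.
    CHECK(cuMemUnmap(va, size));
    CHECK(cuMemRelease(h));
    CHECK(cuMemAddressFree(va, size));
    CHECK(cuDevicePrimaryCtxRelease(dev));
    printf("ok: granularity=%zu bytes\n", gran);
    return 0;
}
```

Each numbered step is one of the timed APIs; memory is not device-accessible until step 4 completes.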

License

MIT (or your preferred license)
