bpftime-super: Extending eBPF Programmability and Observability to GPUs


bpftime-super is the first system to dynamically offload eBPF instrumentation and bytecode directly onto running GPU kernels using real-time PTX injection, significantly reducing instrumentation overhead compared to existing methods.

Installation

```shell
git clone https://github.com/eunomia-bpf/bpftime-super.git
cd bpftime-super
make release
```

eGPU – Extending eBPF Programmability & Observability to GPUs

eGPU is the first open-source framework that lets you run eBPF programs inside live GPU kernels. By JIT-translating eBPF bytecode to NVIDIA PTX at runtime, eGPU injects ultra-lightweight probes directly into running kernels without pausing or recompiling them. The result is microsecond-level visibility into kernel execution, memory transfers, and heterogeneous orchestration with minimal overhead.


Why eGPU?

  • Traditional GPU profilers (CUPTI, NVBit, …) either interrupt kernels or impose high per‑event cost.
  • Linux eBPF offers elegant, safe instrumentation—but only for CPUs.
  • Modern AI & HPC workloads need continuous telemetry across both CPU and GPU to catch memory stalls, launch gaps, and anomalous behavior in production.

eGPU bridges that gap by marrying the flexibility of eBPF with the parallel firepower of GPUs.


Core capabilities

| Capability | How it works | Benefit |
| --- | --- | --- |
| Dynamic PTX injection | At load time we JIT eBPF → PTX and patch it into the resident kernel | < 1 µs probe overhead on micro-benchmarks |
| Shared eBPF maps across CPU & GPU | boost::managed_shared_memory exposes the same map to host threads and device code | Zero-copy metrics exchange |
| User-space verifier & JIT (bpftime) | All safety checks stay in user space; no root privileges required | Fast iteration & lower attack surface |
| Hot-swap instrumentation | Add/remove probes while kernels keep running | Debug live services without downtime |
| CXL.mem latency modelling | Optional delay injection emulates tier-2 memory | Prototype far-memory systems on today's hardware |

Project highlights

  • Low overhead: < 5 % runtime impact on memory‑bound kernels up to 128 KB access size (see Fig. 2 of the paper).
  • Open ecosystem: Works with standard eBPF tooling—clang, bpftool, bpftrace.
  • Future‑proof: Design anticipates Grace‑Hopper architectures & CXL memory pools.
Citation

```bibtex
@inproceedings{yang2025bpftimesuper,
  title     = {eGPU: Extending eBPF Programmability and Observability to GPUs},
  author    = {Yang, Yiwei and Tong, Yu and Zheng, Yusheng and Quinn, Andrew},
  booktitle = {4th Workshop on Heterogeneous Composable and Disaggregated Systems},
  year      = {2025}
}
```
