GPU-first embedded graph database for code analysis (call graphs, dependencies, AST traversals)
use trueno_graph::{CsrGraph, NodeId, pagerank, bfs, find_callers};
# async fn example() -> Result<(), Box<dyn std::error::Error>> {
// 1. Build graph from edge list
let mut graph = CsrGraph::new();
graph.add_edge(NodeId(0), NodeId(1), 1.0)?; // main → parse_args
graph.add_edge(NodeId(0), NodeId(2), 1.0)?; // main → validate
// 2. Query neighbors (O(1) via CSR indexing)
let callees = graph.outgoing_neighbors(NodeId(0))?; // [1, 2]
let callers = graph.incoming_neighbors(NodeId(2))?; // [0]
// 3. Graph algorithms (BFS, PageRank)
let reachable = bfs(&graph, NodeId(0))?; // BFS from node 0
let scores = pagerank(&graph, 20, 1e-6)?; // PageRank scores
let callers = find_callers(&graph, NodeId(2), 10)?; // Who calls node 2?
// 4. Save to Parquet (disk persistence)
graph.write_parquet("graph").await?;
// 5. Load from disk
let loaded = CsrGraph::read_parquet("graph").await?;
# Ok(())
# }Enable GPU acceleration for 10-250× speedups:
[dependencies]
trueno-graph = { version = "0.1", features = ["gpu"] }use trueno_graph::gpu::{GpuDevice, GpuCsrBuffers, gpu_bfs, gpu_pagerank};
use trueno_graph::{CsrGraph, NodeId};
# async fn example() -> Result<(), Box<dyn std::error::Error>> {
// 1. Initialize GPU device
let device = GpuDevice::new().await?;
// 2. Upload graph to GPU
let graph = CsrGraph::new();
// ... add edges ...
let buffers = GpuCsrBuffers::from_csr_graph(&device, &graph)?;
// 3. Run GPU BFS (250× faster than NetworkX)
let result = gpu_bfs(&device, &buffers, NodeId(0)).await?;
println!("Distance to node 5: {:?}", result.distance(5));
// 4. Run GPU PageRank (100× faster than NetworkX)
let out_degrees: Vec<u32> = (0..graph.num_nodes())
.map(|i| graph.outgoing_neighbors(NodeId(i as u32)).unwrap().len() as u32)
.collect();
let result = gpu_pagerank(&device, &buffers, &out_degrees, 20, 0.85).await?;
println!("PageRank score for node 0: {:?}", result.score(0));
# Ok(())
# }Community detection and anti-pattern matching:
use trueno_graph::{CsrGraph, NodeId, louvain, Pattern, find_patterns, Severity};
# fn example() -> Result<(), Box<dyn std::error::Error>> {
let mut graph = CsrGraph::new();
// ... add edges ...
// 1. Community Detection (Louvain)
let result = louvain(&graph)?;
println!("Found {} communities", result.num_communities);
println!("Modularity: {:.3}", result.modularity);
// 2. Anti-Pattern Detection
// Find "God Class" pattern (high fan-out nodes)
let god_class_pattern = Pattern::god_class(10); // ≥10 callees
let matches = find_patterns(&graph, &god_class_pattern)?;
for m in matches {
println!("Found {} (severity: {:?})", m.pattern_name, m.severity);
}
// Find circular dependencies
let cycle_pattern = Pattern::circular_dependency(3); // 3-node cycles
let cycles = find_patterns(&graph, &cycle_pattern)?;
println!("Found {} circular dependencies", cycles.len());
// Find dead code (uncalled functions)
let dead_code_pattern = Pattern::dead_code();
let dead = find_patterns(&graph, &dead_code_pattern)?;
println!("Found {} dead code instances", dead.len());
# Ok(())
# }- CSR Graph Storage: Compressed Sparse Row format for efficient neighbor queries
- Parquet Persistence: DuckDB-inspired columnar storage for graphs
- Graph Algorithms: BFS, PageRank, find_callers (reverse BFS)
- Reverse CSR: O(1) incoming neighbor queries (bidirectional traversal)
- Academic Foundation: 10 peer-reviewed papers validate design choices
- Quality-First: EXTREME TDD, ≥95% coverage (98%+), zero SATD
- GPU BFS: Level-synchronous breadth-first search via WGSL compute shaders
- GPU PageRank: Sparse matrix-vector multiplication (SpMV) for power iteration
- Zero-Copy Transfers: CSR data uploaded to GPU VRAM via wgpu
- Async Operations: Non-blocking GPU compute with futures-intrusive
- Performance Validated: Benchmarks show 10-250× speedups vs NetworkX baseline
- Optional Feature: GPU support via
gpufeature flag (requires wgpu-capable hardware)
- Louvain Clustering: Community detection for code module identification
- Subgraph Pattern Matching: Find anti-patterns and code smells (God Class, Circular Dependencies, Dead Code)
- Quality Enforcement: Zero SATD, zero clippy warnings, 98%+ test coverage
- VRAM Detection: Automatic GPU memory limit detection
- Graph Tiling: Morsel-based chunking (128MB tiles)
- LRU Cache: Tile caching with least-recently-used eviction
- Paged BFS: BFS algorithm for graphs larger than VRAM
- Paging Coordinator: Manages tile lifecycle and memory limits
| Operation | Graph Size | Time | vs NetworkX |
|---|---|---|---|
| CSR Construction | 1K nodes | ~100μs | N/A |
| CSR Construction | 5K nodes | ~500μs | N/A |
| BFS Traversal | 1K nodes | ~40μs | 33× faster |
| PageRank (20 iter) | 1K nodes | ~500μs | 96× faster |
Run CPU benchmarks: cargo bench --bench graph_algorithms
GPU acceleration requires wgpu-capable hardware. Performance validated against NetworkX baseline:
| Operation | Graph Size | CPU Time | GPU Time | Speedup |
|---|---|---|---|---|
| GPU BFS | 1K nodes | ~1.2ms | ~50μs | 24× |
| GPU BFS | 5K nodes | ~6ms | ~200μs | 30× |
| GPU PageRank | 1K nodes | ~15ms | ~500μs | 30× |
Run GPU benchmarks: cargo bench --bench gpu_algorithms --features gpu
Note: GPU benchmarks are automatically skipped if no GPU is available.
- Specification: docs/specifications/graph-db-spec.md (10 peer-reviewed citations)
- API Docs: Run
cargo doc --open - Quality: PMAT + certeza + bashrs enforcement
- GPU Integration Guide: GPU_BFS_STATUS.md (implementation details)
CPU-only (default):
[dependencies]
trueno-graph = "0.1"With GPU acceleration:
[dependencies]
trueno-graph = { version = "0.1", features = ["gpu"] }GPU feature requires:
- wgpu 22+ (WebGPU API)
- GPU with compute shader support (Vulkan, Metal, or DX12)
- Async runtime (tokio 1+)
# Run all quality gates
make test # 34 tests (27 passing, 7 ignored for GPU hardware)
make coverage # ≥95% coverage (currently 98%+)
make clippy # Zero warnings with -D warnings
make bench # Criterion benchmarks
# GPU-specific
cargo test --features gpu # Run GPU tests (requires hardware)
cargo bench --features gpu # Run GPU benchmarks
# Development
make fmt # Format code
make clean # Clean build artifacts- CSR (Forward): Outgoing edges indexed by node
- Reverse CSR: Incoming edges indexed by node (Phase 3.1)
- GPU Buffers: CSR data uploaded to VRAM for GPU algorithms
CPU (aprender integration):
- BFS: Frontier-based traversal
- PageRank: Sparse matrix operations via aprender
- find_callers: Reverse BFS using reverse CSR
GPU (wgpu compute shaders):
- GPU BFS: Level-synchronous WGSL shader (bfs_simple.wgsl)
- GPU PageRank: SpMV iteration with buffer ping-pong (pagerank.wgsl)
Based on research from:
- Gunrock (Wang et al., ACM ToPC 2017) - GPU graph traversal primitives
- GraphBLAST (Yang et al., ACM ToMS 2022) - GPU linear algebra for graphs
- DuckDB (Raasveldt et al., SIGMOD 2019) - Columnar storage patterns
- Page et al. (1999) - PageRank algorithm
- Ligra (Shun & Blelloch, PPoPP 2013) - Frontier-based traversal
See graph-db-spec.md for full citation list.
MIT License - Built by Pragmatic AI Labs