A performance-oriented reimplementation of DiffDRR with the following improvements:
- Optimized, pure PyTorch implementation (~5× faster than DiffDRR at baseline)
- Modular design (freely swap subjects, extrinsics, and intrinsics during rendering)
- Compatibility with torch.compile and mixed precision
- Extensive type hints with jaxtyping
- Standard Python package structure managed with uv
All projective geometry is implemented internally using the standard Hartley and Zisserman pinhole camera formulation.
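As a rough illustration of that formulation, the sketch below projects a single 3D point with x ~ K (R X + t). It is dependency-free and is not the package's API: the names `matvec` and `project`, the example intrinsics, and the assumption that the camera looks along +z are all hypothetical choices made for this example.

```python
from typing import Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def matvec(m: Mat3, v: Vec3) -> list[float]:
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def project(K: Mat3, R: Mat3, t: Vec3, X: Vec3) -> tuple[float, float]:
    """Project world point X to pixel coordinates via x ~ K (R X + t)."""
    # Extrinsics: move the point from world to camera coordinates.
    Xc = [a + b for a, b in zip(matvec(R, X), t)]
    if Xc[2] <= 0:
        raise ValueError("point is not in front of the camera")
    # Intrinsics: map camera coordinates to homogeneous pixel coordinates.
    x = matvec(K, Xc)
    # Perspective divide.
    return x[0] / x[2], x[1] / x[2]


# Hypothetical intrinsics: focal length 1000 px, principal point (100, 100).
K = [[1000.0, 0.0, 100.0], [0.0, 1000.0, 100.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity rotation
t = [0.0, 0.0, 0.0]

u, v = project(K, R, t, [0.1, -0.2, 2.0])
```

Keeping K separate from R and t mirrors the split between intrinsics and extrinsics, so either can be replaced without touching the other.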
Note
On pytorch<2.9, torch.compile with bfloat16 is slower than eager due to a CUDA graph capture issue (see Benchmarks). Use pytorch>=2.9 (Triton ≥3.5) for best results.
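One way to act on this note is to gate compilation on the installed version. The sketch below is a minimal, hypothetical helper (the names `parse_major_minor` and `should_compile_bf16` are not part of any library) that only parses a version string, so it runs without PyTorch installed.

```python
import re


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.9.0+cu128'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def should_compile_bf16(
    torch_version: str, minimum: tuple[int, int] = (2, 9)
) -> bool:
    """Return True if the version meets the minimum recommended in the note."""
    # Tuple comparison handles two-digit minors correctly: (2, 10) >= (2, 9).
    return parse_major_minor(torch_version) >= minimum
```

In practice the argument would be `torch.__version__`, with eager execution as the fallback when the function returns False.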
To install only the renderer:
pip install nanodrr
To install the optional 3D visualization module:
pip install "nanodrr[scene]"
Important
- ~5× faster than DiffDRR out of the box, without compilation (946 FPS vs 213 FPS)
- ~8× faster with torch.compile and bfloat16 on pytorch>=2.9 (1,650 FPS vs 213 FPS)
- ~2.5× less memory than DiffDRR (516 MB vs 1,344 MB peak reserved with bfloat16 + compile)
Mean ± std. dev. of 10 runs, 100 loops each. Benchmarked by rendering 200×200 DRRs on an NVIDIA RTX 6000 Ada (48 GB) with Python 3.12. Compile represents torch.compile(mode="reduce-overhead", fullgraph=True). Full experiment at tests/benchmark/.
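The headline multipliers are rounded. As a quick check, the following sketch recomputes them from the FPS and memory figures quoted above; the dictionary keys are labels chosen for this example only.

```python
# Figures quoted in the benchmark summary.
fps = {"DiffDRR": 213, "no compile": 946, "compile + bfloat16": 1650}
peak_mb = {"DiffDRR": 1344, "compile + bfloat16": 516}

# Throughput ratios relative to DiffDRR.
baseline_speedup = fps["no compile"] / fps["DiffDRR"]          # about 4.4
compiled_speedup = fps["compile + bfloat16"] / fps["DiffDRR"]  # about 7.7

# Peak reserved memory ratio (DiffDRR over compiled bfloat16).
memory_reduction = peak_mb["DiffDRR"] / peak_mb["compile + bfloat16"]  # about 2.6

# Equivalent time per 200x200 frame, in milliseconds.
ms_per_frame = {name: 1000.0 / rate for name, rate in fps.items()}

print(f"{baseline_speedup:.1f}x, {compiled_speedup:.1f}x, {memory_reduction:.1f}x")
```

Per-frame latency is often easier to reason about than FPS when budgeting an optimization loop that renders once per iteration.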
To build and preview the docs locally, run:
uv run --group docs jupyter nbconvert --to markdown tutorials/*.ipynb --output-dir docs/tutorials/
uv run --group docs zensical serve
- Implement a fully optimized renderer
- Port strictly necessary modules from DiffDRR (e.g., SE(3) utilities, loss functions, and 2D plotting)
- Migrate 3D plotting functions to an optional module
- Integrate with xvr to speed up network training and registration
- Integrate with polypose to speed up registration
- Release as v1.0.0 of DiffDRR!