sm120

Here are 16 public repositories matching this topic...

From-scratch C++/CUDA inference engine for the NVIDIA RTX 5090 (sm_120a), presented as the best single-GPU backend for agentic AI: tool calling, long-context loops, reasoning, and concurrent sub-agents, built on the fastest single-stream decode on the 5090 (beats llama.cpp; on par with or ahead of vLLM on NVFP4). Written entirely by Claude Code.

  • Updated Jun 26, 2026
  • Cuda
