Deploy Nemotron 3 Nano 30B with a 1M context window on NVIDIA DGX Spark using llama.cpp (Blackwell sm_121, Q4_0 KV cache quantization)
cuda inference aarch64 mamba mixture-of-experts blackwell long-context llama-cpp local-llm gguf kv-cache-quantization nemotron nvidia-dgx-spark 1m-context-window
Updated Mar 22, 2026 - Shell