run DeepSeek-R1 GGUFs on KTransformers
Updated Mar 3, 2025 - Python. A minimal launch sketch appears after the list below.
LvLLM is a NUMA-aware extension of vLLM that makes full use of CPU and host memory, reduces GPU memory requirements, and provides an efficient GPU-parallel and NUMA-parallel architecture, supporting hybrid CPU/GPU inference of large MoE models.
Tsinghua University KTransformers Docker Image Build Tool
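
As a rough illustration of the first entry above, the sketch below launches KTransformers' local chat on a DeepSeek-R1 GGUF from Python. The `ktransformers.local_chat` entry point and the `--model_path` / `--gguf_path` flags follow the KTransformers README, but flag names and defaults can vary between releases, and the GGUF directory is a placeholder; check `python -m ktransformers.local_chat --help` for your installed version.

```python
"""Minimal sketch: launch KTransformers' local chat on a DeepSeek-R1 GGUF.

Assumptions (verify against your installed KTransformers version):
the ktransformers.local_chat entry point, its --model_path / --gguf_path
flags, and the local GGUF directory path below, which is a placeholder.
"""
import subprocess
import sys

MODEL_ID = "deepseek-ai/DeepSeek-R1"   # Hugging Face repo supplying config and tokenizer
GGUF_DIR = "/models/DeepSeek-R1-GGUF"  # placeholder: directory holding the GGUF shards

cmd = [
    sys.executable, "-m", "ktransformers.local_chat",
    "--model_path", MODEL_ID,  # source of model config + tokenizer
    "--gguf_path", GGUF_DIR,   # quantized GGUF weights to load
]

# Run the interactive chat session in the current terminal.
subprocess.run(cmd, check=True)
```

Running the script drops you into an interactive prompt; the Hugging Face model ID supplies the configuration and tokenizer, while the GGUF directory supplies the quantized weights that KTransformers offloads across CPU and GPU.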