Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
Updated Apr 8, 2025 - C++
A high-performance inference system for large language models, designed for production environments.
A great hands-on project for campus recruiting (fall/spring hiring) and internships: build an LLM inference framework from scratch, with support for LLama2/3 and Qwen2.5.
Run generative AI models on the Sophgo BM1684X
Explore LLM deployment on AXera's AI chips
llama.cpp 🦙 LLM inference in TypeScript