cpu-inference

Here are 35 public repositories matching this topic...

kennethleungty / Llama-2-Open-Source-LLM-CPU-Inference

Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A

Updated Nov 6, 2023
Python

FoxNoseTech / diarize

Speaker diarization for Python — "who spoke when?" CPU-only, no API keys, Apache 2.0. ~10.8% DER on VoxConverse, 8x faster than real-time.

python audio-analysis speech-to-text speaker-recognition speech-processing speaker-diarization spectral-clustering voice-activity-detection onnx speaker-embedding diarization apache-2 rttm cpu-inference meeting-transcription who-spoke-when

Updated May 6, 2026
Python

gyunggyung / Tiny-MoA

Running Mixture of Agents on CPU: LFM2.5 Brain (1.2B) + Falcon-R Reasoner (600M) + Tool Caller (90M). CPU-only, 16GB RAM. Lightweight AI Legion.

multilingual lightweight falcon agents moa uv on-device-ai cpu-inference llm llama-cpp mixture-of-agents tool-calling lfm2

Updated Feb 7, 2026
Python

laelhalawani / gguf_llama

Wrapper for simplified use of Llama2 GGUF quantized models.

llama quantization cpu-inference llamacpp llama2 gguf

Updated Jan 14, 2024
Python

grctest / fastapi-gemma-translate

A FastAPI server for querying Google's Gemma Translate AI models for translations

docker google cuda translate gemma ai-api fastapi cpu-inference gpu-inference translategemma gemmatranslate

Updated Apr 26, 2026
Python

HaseebKhalid1507 / VelociRAG

Lightning-fast RAG for AI agents. ONNX-powered, 4-layer fusion, MCP server. No PyTorch.

Updated Apr 5, 2026
Python

lahcenkh / rag-network-docs

Privacy-focused RAG chatbot for network documentation. Chat with your PDFs locally using Ollama, Chroma & LangChain. CPU-only, fully offline.

ai python3 network-programming cpu-inference vector-database-embedding rag-chatbot

Updated Sep 7, 2025
Python

mjlzz / voice-studio

🎤 Voice Studio - ASR & TTS toolkit with real-time streaming transcription, CPU inference, offline mode, desktop floating mic, Web UI & CLI

desktop-app python text-to-speech streaming offline tts speech-synthesis voice-recognition speech-to-text whisper asr vue3 fastapi cpu-inference

Updated Mar 24, 2026
Python

kevin046 / VibeBlade

VibeDrift - Run any LLM on your own hardware. Bypass the VRAM wall with CPU/RAM inference, MOE expert offloading, and 4-bit quantization. No Cloud, no Subscription.

python inference moe quantization openai-api memory-tiering cpu-inference constrained-decoding llm sparse-inference local-ai gguf

Updated May 12, 2026
Python

shyamsridhar123 / AZR-CPU

Absolute Zero Reasoning Experiments on CPU

python reasoning ai-research absolute-zero cpu-inference llm

Updated Aug 24, 2025
Python

idkBsy / aion-core

Neuro-symbolic inference framework for edge-class hardware. Fuses INT8-quantized neural anomaly detection with formal symbolic reasoning and explainable proof trees. Sub-millisecond latency on AMD Ryzen PRO, no GPU required.

python rust anomaly-detection edge-computing explainable-ai xai proof-tree neuro-symbolic duckdb cpu-inference

Updated May 10, 2026
Python

engr-afnan786 / SLM-Math-Storyteller-RAG

CPU-only AI math storyteller with RAG, SymPy verification, and coherence tracking

nlp edtech spacy knowledge-base sympy slm gradio faiss rag math-education cpu-inference llm llama-cpp retrieval-augmented-generation qwen

Updated May 16, 2026
Python

vikukumar / neuroswift

NeuroSwift 1.0.0 is the world's most advanced MatMul-Free Hybrid State-Space Model (H-SSM). By integrating Dynamic Depth Scaling (DDS), Selective SSD (Mamba-2), and MLA (DeepSeek), it achieves the intelligence of the world's largest dense models with zero-latency CPU inference.

ai deep-learning neural-network transformers pytorch openai moe ssm hebbian-learning cpu-inference llm openllm claude-code

Updated Apr 9, 2026
Python

SaiVarunPappla / nanoGPT-visualizer

Interactive GPT-2 inference explorer with token probability visualization, entropy curves, confidence heatmap, and sampling strategy comparison. Built on nanoGPT.

nlp machine-learning deep-learning transformers text-generation pytorch language-model gradio gpt2 nlp-visualization cpu-inference llm nanogpt interactive-ml

Updated May 17, 2026
Python

SrabanMondal / voice-assistant-v2

CPU-first, turn-aware local voice assistant with multiprocessing, streaming STT→LLM→TTS, and interruption-safe orchestration.

natural-language-processing text-to-speech streaming multiprocessing concurrency speech-recognition event-driven low-latency inter-process-communication voice-assistant real-time-systems system-design wake-word-detection onnx edge-ai cpu-inference llm ollama offline-ai

Updated Apr 13, 2026
Python

raghav-potdar / LeanLLM-as-a-service

Lightweight LLM API stack for local or cloud CPU deployment. OpenAI-compatible inference with llama.cpp, managed through Docker Compose with built-in monitoring, alerting, and request logging.

docker-compose grafana-dashboard observability prometheus-monitoring cpu-inference llama-cpp llm-as-a-service nginx-rate-limiting

Updated Jan 25, 2026
Python

NeuroTinkerLab / local-rag-chat-with-foundry

A RAG system for chatting with local documents using Foundry and LLM models on CPU

ai cpu-inference llm local-ai document-chat rag-chatbot

Updated Oct 27, 2025
Python

cwccie / small-model-big-infra

Evaluate SLMs (Phi-4-mini, Gemma-3-4B, Qwen3-3B) on infrastructure NLP tasks

nlp infrastructure benchmarking ai-research cpu-inference small-language-models

Updated Feb 22, 2026
Python

bhimanbaghel / llama-streamlit-app

🤖 AI Text Completion App built with Streamlit and Llama-3.2-1B. Generate creative text completions with an intuitive web interface. GPU & CPU optimized, easy to deploy, perfect for content creation and AI experimentation.

python nlp machine-learning ai transformers text-generation webapp llama huggingface streamlit streamlit-webapp cpu-inference

Updated Jun 21, 2025
Python

MckAnissa / echo-rag-chatbot

Personal project. Local RAG chatbot using Mistral v0.2/TinyLlama with TF-IDF retrieval. Streamlit interface for CPU-optimized inference without GPU requirements.

python nlp privacy ai chatbot philosophy mistral ethics conversational-ai rag streamlit cpu-inference local-llm retrieval-augmented-generation mistral-7b rag-chatbot

Updated Nov 17, 2025
Python
