Ultra-low-latency LLM gateway with microsecond caching, dynamic routing, budgets, analytics, and forecasting.
Updated Apr 2, 2026 · Go
Official implementation of "SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching" (COLM 2025). A novel KV cache compression method that organizes cache at sentence level using semantic similarity.
This app leverages Semantic Caching to minimize inference latency and reduce API costs by reusing semantically similar prompt responses.
Semantic caching demo with real-time streaming and a cost & sizing calculator, powered by Azure Managed Redis and Azure OpenAI.
Rust Local Token Compression Proxy for coding agents, built solo for GenAI Genesis 2026. 🏆 1st Google Sustainability Hack
Evaluate how a semantic cache performs on your dataset by computing key KPIs over a threshold sweep and producing plots/CSVs.
Semantic caching for LLM responses using Redis Vector DB, LangChain, and HuggingFace embeddings; parses PDFs, generates FAQs with Groq, and serves similarity-based answers without redundant LLM calls.
LLM cost monitoring and optimization toolkit
LLMOps API Gateway in Go. Optimizes GenAI workloads with Qdrant semantic caching, Redis rate-limiting, and OpenTelemetry metrics.
Multi-agent content pipeline with LangGraph, FastAPI, and Redis semantic caching
Semantic LLM Gateway featuring intelligent prompt routing (basic MoE), L1/L2 semantic caching (Redis + pgvector), fault-tolerant model fallbacks, and real-time streaming telemetry. Built to reduce AI inference latency and optimize API compute costs.
Simple RAG implementation with semantic caching using Redis and Langchain
Public skill project for OpenAI- and Groq-aware LLM rate limiting, retries, quota handling, and provider-specific pacing.
Intelligent LLM agent cost optimization runtime.
LLM CLI built on the Ember framework — model comparison and semantic caching, written in Rust
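Most of the projects above share one core pattern: embed the incoming prompt, search cached entries by cosine similarity, and reuse the stored response when similarity clears a threshold. A minimal self-contained sketch of that pattern (the hash-bucket `embed()` is a toy stand-in for a real embedding model, and the `SemanticCache` name and 0.8 threshold are illustrative, not from any specific repo above):

```python
import hashlib
import math

def embed(text, dim=64):
    # Toy deterministic embedding: hash each word into one of `dim`
    # buckets, count hits, then L2-normalize. A real system would use a
    # sentence-embedding model here instead.
    vec = [0.0] * dim
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached_response)

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))

    def get(self, prompt):
        # Linear scan; production systems use a vector index (Redis,
        # Qdrant, pgvector) for the nearest-neighbor search.
        q = embed(prompt)
        best_sim, best_resp = -1.0, None
        for emb, resp in self.entries:
            sim = sum(a * b for a, b in zip(q, emb))  # cosine (unit vectors)
            if sim > best_sim:
                best_sim, best_resp = sim, resp
        return best_resp if best_sim >= self.threshold else None

cache = SemanticCache(threshold=0.8)
cache.put("what is the capital of france", "Paris")
print(cache.get("what is the capital of france?"))  # near-duplicate prompt: hit
print(cache.get("how do I bake bread"))             # unrelated prompt: miss (None)
```

The threshold is the knob the "threshold sweep" evaluation project above measures: raise it and hit rate falls but answers stay faithful; lower it and the cache serves more (possibly wrong) reused responses.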