⚡️ llm-cache-ts is a lightweight, plug-and-play semantic cache for LLM API calls, written in TypeScript.

Cache responses by exact prompt or semantic similarity to save latency, tokens, and money.
## Features

| Category | Description |
|---|---|
| ♻️ Exact + Semantic Caching | Stores model responses and reuses them when prompts are identical or semantically similar. |
| 🧮 Cosine Similarity Search | Finds similar prompts using embeddings. |
| ⏱ TTL-aware Expiry | Entries expire automatically after a configurable time (applies to both exact + semantic layers). |
| 🧩 Pluggable Storage Layers | Swap between in-memory, Redis, or Qdrant vector DB without changing code. |
| 🧠 Custom Embeddings | Bring your own embedding function (OpenAI, local model, etc.). |
| 🔒 Singleflight Protection | Prevents multiple identical concurrent model calls (anti-stampede). |
| 🧾 Namespace Isolation | Cache entries are scoped per namespace or tenant. |
| 📊 Structured Logging | Emits structured events (`hit_exact`, `hit_semantic`, `miss_populated`, etc.). |
| 🧱 Type-Safe API | Full TypeScript types with minimal configuration. |
## Installation

```bash
npm install llm-cache-ts
# or
yarn add llm-cache-ts
```

Requires Node.js ≥ 18.
## Quick Start

```ts
import { GptCache, MemoryCacheStore, MemoryVectorStore } from "llm-cache-ts";
// Example fake embedding (you can use OpenAI, etc.)
const embeddingFn = async (text: string) => {
  const v = new Array(16).fill(0);
  for (let i = 0; i < text.length; i++) v[i % 16] += text.charCodeAt(i) % 13;
  return v.map(x => x / 100);
};
// Your model call (OpenAI, Anthropic, local, etc.)
async function callModel(req: any) {
  return { content: "MODEL:" + req.prompt };
}
const cache = new GptCache({
  cacheStore: new MemoryCacheStore(),
  vectorStore: new MemoryVectorStore(),
  embeddingFn,
  similarityThreshold: 0.85,
  ttlMs: 10_000,
  logger: console.log,
});
const req = {
  provider: "openai",
  model: "gpt-4o-mini",
  prompt: "What is tail recursion?",
  namespace: "demo",
};
// 1st call → miss (model runs)
console.log(await cache.respond(req, callModel));

// 2nd call → exact hit
console.log(await cache.respond(req, callModel));

// Paraphrase → semantic hit
console.log(await cache.respond({ ...req, prompt: "Explain tail recursion simply." }, callModel));

// After TTL → miss again (expired)
await new Promise(r => setTimeout(r, 11_000));
console.log(await cache.respond(req, callModel));
```

Output:

```
{ event: 'miss_populated' }
{ event: 'hit_exact' }
{ event: 'hit_semantic', score: 0.88 }
{ event: 'semantic_expired' }
```
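
In real use, you'd replace the toy `embeddingFn` with a real embedding model. A minimal sketch using the official `openai` package (assumes `OPENAI_API_KEY` is set; `text-embedding-3-small` is one common choice, and its 1536-dimensional vectors match the `vectorSize` used in the Qdrant example below):

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Drop-in replacement for the toy embeddingFn above.
const embeddingFn = async (text: string): Promise<number[]> => {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
};
```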
## How It Works

```
┌─────────────┐       ┌────────────────────┐
│  LLM call   │ --->  │  GptCache.respond  │
└─────────────┘       └────────────────────┘
        │
        ├── 1️⃣ Check exact cache (Redis / Memory)
        │
        ├── 2️⃣ If miss → embed prompt → semantic search (Qdrant / Memory)
        │
        ├── 3️⃣ If semantic match ≥ threshold → reuse response
        │
        └── 4️⃣ Otherwise call model → store result in both layers
```
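
Step 2️⃣'s semantic search ranks stored prompts by cosine similarity, i.e. the normalized dot product of the two embedding vectors. For reference, a minimal standalone implementation of the metric (illustrative; not the library's internal code):

```ts
// Cosine similarity: dot(a, b) / (|a| * |b|). Ranges over [-1, 1] in general,
// [0, 1] for non-negative vectors. Assumes a and b have equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}
```

A `similarityThreshold` of 0.85 means a stored prompt is reused only when this score is at least 0.85.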
## Storage Adapters

In-memory (default):

```ts
import { MemoryCacheStore, MemoryVectorStore } from "llm-cache-ts";
```

Redis:

```ts
import { RedisCacheStore } from "llm-cache-ts";

const cacheStore = new RedisCacheStore({ url: "redis://127.0.0.1:6379" });
```

Qdrant:

```ts
import { QdrantVectorStore } from "llm-cache-ts";

const vectorStore = new QdrantVectorStore({
  url: "http://127.0.0.1:6333",
  vectorSize: 1536,
});
await vectorStore.ensureCollection();
```
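
Either pair of stores plugs into `GptCache` exactly as in the quick start; for example (threshold and TTL values here are illustrative):

```ts
const cache = new GptCache({
  cacheStore,   // RedisCacheStore from above
  vectorStore,  // QdrantVectorStore from above
  embeddingFn,
  similarityThreshold: 0.85,
  ttlMs: 60_000,
});
```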
## Singleflight

Ensures only one concurrent model call per key:

```ts
await Promise.all([
  cache.respond(req, callModel),
  cache.respond(req, callModel),
  cache.respond(req, callModel)
]);
// only one model call actually runs!
```
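
Under the hood, singleflight is the classic "share the in-flight promise" pattern. A minimal generic sketch of the idea (illustrative; not the library's exact implementation):

```ts
// Concurrent callers with the same key share a single in-flight promise.
const inflight = new Map<string, Promise<unknown>>();

async function singleflight<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const existing = inflight.get(key);
  if (existing) return existing as Promise<T>;

  // First caller starts the work; the entry is cleared once it settles.
  const p = fn().finally(() => inflight.delete(key));
  inflight.set(key, p);
  return p;
}
```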
## TTL

All cache entries expire after `ttlMs`. Expired entries trigger a new model call automatically.
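
The expiry check itself is just an age comparison; a sketch (illustrative):

```ts
// An entry is stale once its age exceeds ttlMs.
const isExpired = (storedAtMs: number, ttlMs: number): boolean =>
  Date.now() - storedAtMs > ttlMs;
```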
## Logging

Pass a `logger` to receive structured events:

```ts
logger: (event) => {
  if (event.event) console.log(event);
}
```

Outputs:

```
{ event: 'hit_exact' }
{ event: 'hit_semantic', score: 0.91 }
{ event: 'semantic_expired' }
```
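
Since the events are plain objects, they are easy to aggregate. For example, a sketch that tracks the overall hit rate, reusing the quick-start setup (event names taken from the outputs above):

```ts
const counts: Record<string, number> = {};

const cache = new GptCache({
  cacheStore: new MemoryCacheStore(),
  vectorStore: new MemoryVectorStore(),
  embeddingFn,
  similarityThreshold: 0.85,
  ttlMs: 10_000,
  logger: (event) => {
    counts[event.event] = (counts[event.event] ?? 0) + 1;
  },
});

// After some traffic:
const hits = (counts.hit_exact ?? 0) + (counts.hit_semantic ?? 0);
const total = hits + (counts.miss_populated ?? 0);
console.log(`cache hit rate: ${((hits / total) * 100).toFixed(1)}%`);
```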
## Roadmap

| Version | Feature | Status |
|---|---|---|
| v0.1.0 | In-memory + Redis + Qdrant adapters | ✅ Done |
| v0.2.0 | SQLite (VSS) adapter for local persistence | 🧩 Planned |
| v0.3.0 | Stale-While-Revalidate (SWR) strategy | 🧩 Planned |
| v0.4.0 | Express / Next.js middleware wrappers | 🧩 Planned |
| v0.5.0 | Pgvector adapter + auto-tuning thresholds | 🧩 Planned |
| v1.0.0 | Docs site + metrics & benchmark suite | 🧩 Planned |
## Configuration

| Option | Type | Description |
|---|---|---|
| `cacheStore` | `CacheStore` | Key-value cache (in-memory or Redis). |
| `vectorStore` | `VectorStore` | Semantic storage backend (Memory, Qdrant). |
| `embeddingFn` | `(text) => Promise<number[]>` | Embedding function used for semantic similarity. |
| `similarityThreshold` | `number` | Minimum cosine similarity (0–1) to count as a semantic hit. |
| `ttlMs` | `number` | Time-to-live in milliseconds. |
| `logger` | `(event) => void` | Optional structured logger. |
## Testing

```bash
npm install
npm run test
```

Uses Vitest for unit testing.
## Development

```bash
npm run dev                    # watch build (tsup)
npm run build                  # build to dist/
npm run test                   # run tests
npm pack                       # create local tarball
npm publish --access public    # publish to npm
```

## Contributing

Pull requests and issues are welcome!
If you'd like to add a new adapter (e.g. SQLite-VSS, pgvector, or LanceDB):

- Implement the `VectorStore` interface.
- Add tests in `test/stores/`.
- Open a PR 🚀
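
For orientation, a rough sketch of the shape such an adapter takes (method names here are illustrative assumptions, not the library's actual signatures; check the source for the authoritative interface):

```ts
// Hypothetical adapter shape; the real VectorStore interface in llm-cache-ts
// may differ. Method names below are assumptions for illustration only.
interface MyVectorStoreSketch {
  // Store an embedding under an id with an arbitrary payload.
  upsert(id: string, vector: number[], payload: Record<string, unknown>): Promise<void>;
  // Return the nearest stored entries with a similarity score in [0, 1].
  search(vector: number[], limit: number): Promise<Array<{ id: string; score: number }>>;
}
```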
## License

MIT © Hibbaan
If you like this project:

- 🌟 Star it on GitHub
- 🧩 Use it in your LLM projects
- 💬 Share feedback; PRs and issues are welcome!
“Cache smarter, not harder: reuse your intelligence.”