⚡️ llm-cache-ts is a lightweight, plug-and-play semantic cache for LLM API calls, written in TypeScript.

Cache responses by exact prompt or semantic similarity to save latency, tokens, and money.
## Features

| Category | Description |
|---|---|
| ♻️ Exact + Semantic Caching | Stores model responses and reuses them when prompts are identical or semantically similar. |
| 🧮 Cosine Similarity Search | Finds similar prompts using embeddings. |
| ⏱ TTL-aware Expiry | Entries expire automatically after a configurable time (applies to both exact + semantic layers). |
| 🧩 Pluggable Storage Layers | Swap between in-memory, Redis, or Qdrant vector DB without changing code. |
| 🧠 Custom Embeddings | Bring your own embedding function (OpenAI, local model, etc.). |
| 🔒 Singleflight Protection | Prevents multiple identical concurrent model calls (anti-stampede). |
| 🧾 Namespace Isolation | Cache entries are scoped per namespace or tenant. |
| 📊 Structured Logging | Emits structured events (`hit_exact`, `hit_semantic`, `miss_populated`, etc.). |
| 🧱 Type-Safe API | Full TypeScript types with minimal configuration. |
## Installation

```bash
npm install llm-cache-ts
# or
yarn add llm-cache-ts
```

Requires Node.js ≥ 18.
## Quick Start

```ts
import { GptCache, MemoryCacheStore, MemoryVectorStore } from "llm-cache-ts";
// Example fake embedding (you can use OpenAI, etc.)
const embeddingFn = async (text: string) => {
  const v = new Array(16).fill(0);
  for (let i = 0; i < text.length; i++) v[i % 16] += text.charCodeAt(i) % 13;
  return v.map(x => x / 100);
};
// Your model call (OpenAI, Anthropic, local, etc.)
async function callModel(req: any) {
  return { content: "MODEL:" + req.prompt };
}
const cache = new GptCache({
  cacheStore: new MemoryCacheStore(),
  vectorStore: new MemoryVectorStore(),
  embeddingFn,
  similarityThreshold: 0.85,
  ttlMs: 10_000,
  logger: console.log,
});
const req = {
  provider: "openai",
  model: "gpt-4o-mini",
  prompt: "What is tail recursion?",
  namespace: "demo",
};
// 1st call → miss (model runs)
console.log(await cache.respond(req, callModel));

// 2nd call → exact hit
console.log(await cache.respond(req, callModel));

// Paraphrase → semantic hit
console.log(await cache.respond({ ...req, prompt: "Explain tail recursion simply." }, callModel));

// After TTL → miss again (expired)
await new Promise(r => setTimeout(r, 11_000));
console.log(await cache.respond(req, callModel));
```

Output:

```
{ event: 'miss_populated' }
{ event: 'hit_exact' }
{ event: 'hit_semantic', score: 0.88 }
{ event: 'semantic_expired' }
```
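
In real use, you'd replace the toy `embeddingFn` with a real embedding model. A minimal sketch using the official `openai` package (assumes `OPENAI_API_KEY` is set; `text-embedding-3-small` is one common choice, and its 1536-dimensional vectors match the `vectorSize` used in the Qdrant example below):

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Drop-in replacement for the toy embeddingFn above.
const embeddingFn = async (text: string): Promise<number[]> => {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
};
```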
## How It Works

```
┌─────────────┐       ┌────────────────────┐
│  LLM call   │ --->  │  GptCache.respond  │
└─────────────┘       └────────────────────┘
        │
        ├── 1️⃣ Check exact cache (Redis / Memory)
        │
        ├── 2️⃣ If miss → embed prompt → semantic search (Qdrant / Memory)
        │
        ├── 3️⃣ If semantic match ≥ threshold → reuse response
        │
        └── 4️⃣ Otherwise call model → store result in both layers
```
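
Step 2️⃣'s semantic search ranks stored prompts by cosine similarity, i.e. the normalized dot product of the two embedding vectors. For reference, a minimal standalone implementation of the metric (illustrative; not the library's internal code):

```ts
// Cosine similarity: dot(a, b) / (|a| * |b|). Ranges over [-1, 1] in general,
// [0, 1] for non-negative vectors. Assumes a and b have equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}
```

A `similarityThreshold` of 0.85 means a stored prompt is reused only when this score is at least 0.85.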
## Storage Adapters

In-memory (default):

```ts
import { MemoryCacheStore, MemoryVectorStore } from "llm-cache-ts";
```

Redis:

```ts
import { RedisCacheStore } from "llm-cache-ts";

const cacheStore = new RedisCacheStore({ url: "redis://127.0.0.1:6379" });
```

Qdrant:

```ts
import { QdrantVectorStore } from "llm-cache-ts";

const vectorStore = new QdrantVectorStore({
  url: "http://127.0.0.1:6333",
  vectorSize: 1536,
});
await vectorStore.ensureCollection();
```
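
Either pair of stores plugs into `GptCache` exactly as in the quick start; for example (threshold and TTL values here are illustrative):

```ts
const cache = new GptCache({
  cacheStore,   // RedisCacheStore from above
  vectorStore,  // QdrantVectorStore from above
  embeddingFn,
  similarityThreshold: 0.85,
  ttlMs: 60_000,
});
```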
## Singleflight

Ensures only one concurrent model call per key:

```ts
await Promise.all([
  cache.respond(req, callModel),
  cache.respond(req, callModel),
  cache.respond(req, callModel)
]);
// only one model call actually runs!
```
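
Under the hood, singleflight is the classic "share the in-flight promise" pattern. A minimal generic sketch of the idea (illustrative; not the library's exact implementation):

```ts
// Concurrent callers with the same key share a single in-flight promise.
const inflight = new Map<string, Promise<unknown>>();

async function singleflight<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const existing = inflight.get(key);
  if (existing) return existing as Promise<T>;

  // First caller starts the work; the entry is cleared once it settles.
  const p = fn().finally(() => inflight.delete(key));
  inflight.set(key, p);
  return p;
}
```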
## TTL

All cache entries expire after `ttlMs`. Expired entries trigger a new model call automatically.
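
The expiry check itself is just an age comparison; a sketch (illustrative):

```ts
// An entry is stale once its age exceeds ttlMs.
const isExpired = (storedAtMs: number, ttlMs: number): boolean =>
  Date.now() - storedAtMs > ttlMs;
```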
## Logging

Pass a `logger` to receive structured events:

```ts
logger: (event) => {
  if (event.event) console.log(event);
}
```

Outputs:

```
{ event: 'hit_exact' }
{ event: 'hit_semantic', score: 0.91 }
{ event: 'semantic_expired' }
```
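
Since the events are plain objects, they are easy to aggregate. For example, a sketch that tracks the overall hit rate, reusing the quick-start setup (event names taken from the outputs above):

```ts
const counts: Record<string, number> = {};

const cache = new GptCache({
  cacheStore: new MemoryCacheStore(),
  vectorStore: new MemoryVectorStore(),
  embeddingFn,
  similarityThreshold: 0.85,
  ttlMs: 10_000,
  logger: (event) => {
    counts[event.event] = (counts[event.event] ?? 0) + 1;
  },
});

// After some traffic:
const hits = (counts.hit_exact ?? 0) + (counts.hit_semantic ?? 0);
const total = hits + (counts.miss_populated ?? 0);
console.log(`cache hit rate: ${((hits / total) * 100).toFixed(1)}%`);
```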
## Roadmap

| Version | Feature | Status |
|---|---|---|
| v0.1.0 | In-memory + Redis + Qdrant adapters | ✅ Done |
| v0.2.0 | SQLite (VSS) adapter for local persistence | 🧩 Planned |
| v0.3.0 | Stale-While-Revalidate (SWR) strategy | 🧩 Planned |
| v0.4.0 | Express / Next.js middleware wrappers | 🧩 Planned |
| v0.5.0 | Pgvector adapter + auto-tuning thresholds | 🧩 Planned |
| v1.0.0 | Docs site + metrics & benchmark suite | 🧩 Planned |
## Configuration

| Option | Type | Description |
|---|---|---|
| `cacheStore` | `CacheStore` | Key-value cache (in-memory or Redis). |
| `vectorStore` | `VectorStore` | Semantic storage backend (Memory, Qdrant). |
| `embeddingFn` | `(text) => Promise<number[]>` | Embedding function used for semantic similarity. |
| `similarityThreshold` | `number` | Minimum cosine similarity (0–1) to count as a semantic hit. |
| `ttlMs` | `number` | Time-to-live in milliseconds. |
| `logger` | `(event) => void` | Optional structured logger. |
## Testing

```bash
npm install
npm run test
```

Uses Vitest for unit testing.
## Development

```bash
npm run dev                    # watch build (tsup)
npm run build                  # build to dist/
npm run test                   # run tests
npm pack                       # create local tarball
npm publish --access public    # publish to npm
```

## Contributing

Pull requests and issues are welcome!
If you'd like to add a new adapter (e.g. SQLite-VSS, pgvector, or LanceDB):

- Implement the `VectorStore` interface.
- Add tests in `test/stores/`.
- Open a PR 🚀
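
For orientation, a rough sketch of the shape such an adapter takes (method names here are illustrative assumptions, not the library's actual signatures; check the source for the authoritative interface):

```ts
// Hypothetical adapter shape; the real VectorStore interface in llm-cache-ts
// may differ. Method names below are assumptions for illustration only.
interface MyVectorStoreSketch {
  // Store an embedding under an id with an arbitrary payload.
  upsert(id: string, vector: number[], payload: Record<string, unknown>): Promise<void>;
  // Return the nearest stored entries with a similarity score in [0, 1].
  search(vector: number[], limit: number): Promise<Array<{ id: string; score: number }>>;
}
```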
## License

MIT © Hibbaan
If you like this project:

- 🌟 Star it on GitHub
- 🧩 Use it in your LLM projects
- 💬 Share feedback; PRs and issues are welcome!
“Cache smarter, not harder: reuse your intelligence.”