🧠 llm-cache-ts


โšก๏ธ llm-cache-ts is a lightweight, plug-and-play semantic cache for LLM API calls โ€” written in TypeScript.
Cache responses by exact prompt or semantic similarity to save latency, tokens, and money.


✨ Features

| Category | Description |
| --- | --- |
| ⚙️ Exact + Semantic Caching | Stores model responses and reuses them when prompts are identical or semantically similar. |
| 🧮 Cosine Similarity Search | Finds similar prompts using embeddings (cosine similarity is sketched below this table). |
| ⏱ TTL-aware Expiry | Entries expire automatically after a configurable time (applies to both the exact and semantic layers). |
| 🧩 Pluggable Storage Layers | Swap between in-memory, Redis, or the Qdrant vector DB without changing code. |
| 🧠 Custom Embeddings | Bring your own embedding function (OpenAI, local model, etc.). |
| 🔄 Singleflight Protection | Prevents multiple identical concurrent model calls (anti-stampede). |
| 🧾 Namespace Isolation | Cache entries are scoped per namespace or tenant. |
| 📊 Structured Logging | Emits structured events (hit_exact, hit_semantic, miss_populated, etc.). |
| 🧱 Type-Safe API | Full TypeScript types with minimal configuration. |
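
For intuition, cosine similarity compares two embedding vectors by the angle between them: the dot product divided by the product of their magnitudes. A minimal standalone sketch (not the library's internal code):

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero vectors so we never divide by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

A score of 1 means identical direction; the similarityThreshold option (e.g. 0.85) is compared against this score.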

🚀 Installation

npm install llm-cache-ts
# or
yarn add llm-cache-ts

Requires Node.js ≥ 18.


🧰 Quick Start

import { GptCache, MemoryCacheStore, MemoryVectorStore } from "llm-cache-ts";

// Example fake embedding (you can use OpenAI, etc.)
const embeddingFn = async (text: string) => {
  const v = new Array(16).fill(0);
  for (let i = 0; i < text.length; i++) v[i % 16] += text.charCodeAt(i) % 13;
  return v.map(x => x / 100);
};

// Your model call (OpenAI, Anthropic, local, etc.)
async function callModel(req: any) {
  return { content: "MODEL:" + req.prompt };
}

const cache = new GptCache({
  cacheStore: new MemoryCacheStore(),
  vectorStore: new MemoryVectorStore(),
  embeddingFn,
  similarityThreshold: 0.85,
  ttlMs: 10_000,
  logger: console.log,
});

const req = {
  provider: "openai",
  model: "gpt-4o-mini",
  prompt: "What is tail recursion?",
  namespace: "demo",
};

// 1st call → miss (model runs)
console.log(await cache.respond(req, callModel));

// 2nd call → exact hit
console.log(await cache.respond(req, callModel));

// Paraphrase → semantic hit
console.log(await cache.respond({ ...req, prompt: "Explain tail recursion simply." }, callModel));

// After TTL → miss again (expired)
await new Promise(r => setTimeout(r, 11_000));
console.log(await cache.respond(req, callModel));

Output:

{ event: 'miss_populated' }
{ event: 'hit_exact' }
{ event: 'hit_semantic', score: 0.88 }
{ event: 'semantic_expired' }
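
To swap the fake embedding for a real one, any async function returning a number[] will do. For example, with the official openai package (the model name below is just one reasonable choice, not a requirement of this library):

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const embeddingFn = async (text: string) => {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small", // any embedding model works here
    input: text,
  });
  return res.data[0].embedding;
};

If you use Qdrant, match its vectorSize to your model's embedding dimension (1536 for text-embedding-3-small).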

🧱 Architecture Overview

 ┌────────────┐      ┌───────────────────┐
 │  LLM call  │ ---> │  GptCache.respond │
 └────────────┘      └───────────────────┘
        │
        ├── 1️⃣ Check exact cache (Redis / Memory)
        │
        ├── 2️⃣ If miss → embed prompt → semantic search (Qdrant / Memory)
        │
        ├── 3️⃣ If semantic match ≥ threshold → reuse response
        │
        └── 4️⃣ Otherwise call model → store result in both layers
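
The same flow as a simplified TypeScript sketch; the injected helper names are illustrative, not the library's actual internals:

type ModelResponse = { content: string };

// Simplified sketch of the respond() flow diagrammed above.
async function respondSketch(
  prompt: string,
  deps: {
    getExact: (key: string) => Promise<ModelResponse | undefined>;
    setExact: (key: string, r: ModelResponse) => Promise<void>;
    embed: (text: string) => Promise<number[]>;
    searchSimilar: (v: number[]) => Promise<{ response: ModelResponse; score: number } | undefined>;
    storeVector: (v: number[], r: ModelResponse) => Promise<void>;
    callModel: (prompt: string) => Promise<ModelResponse>;
    threshold: number;
  }
): Promise<ModelResponse> {
  const exact = await deps.getExact(prompt);            // 1. exact layer
  if (exact) return exact;

  const vector = await deps.embed(prompt);              // 2. embed + search
  const match = await deps.searchSimilar(vector);
  if (match && match.score >= deps.threshold) {         // 3. semantic reuse
    return match.response;
  }

  const response = await deps.callModel(prompt);        // 4. run the model...
  await deps.setExact(prompt, response);                //    ...and populate both layers
  await deps.storeVector(vector, response);
  return response;
}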

🔌 Storage Adapters

1. In-Memory (default)

import { MemoryCacheStore, MemoryVectorStore } from "llm-cache-ts";

2. Redis Cache (for multi-process setups)

import { RedisCacheStore } from "llm-cache-ts";
const cacheStore = new RedisCacheStore({ url: "redis://127.0.0.1:6379" });

3. Qdrant Vector Store (persistent semantic cache)

import { QdrantVectorStore } from "llm-cache-ts";

const vectorStore = new QdrantVectorStore({
  url: "http://127.0.0.1:6333",
  vectorSize: 1536,
});
await vectorStore.ensureCollection();
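
Because both stores are injected through the constructor, moving from the in-memory setup to Redis + Qdrant is only a wiring change; the calling code stays identical (embeddingFn as in the Quick Start):

import { GptCache, RedisCacheStore, QdrantVectorStore } from "llm-cache-ts";

const cache = new GptCache({
  cacheStore: new RedisCacheStore({ url: "redis://127.0.0.1:6379" }),
  vectorStore: new QdrantVectorStore({ url: "http://127.0.0.1:6333", vectorSize: 1536 }),
  embeddingFn,                // same embedding function as before
  similarityThreshold: 0.85,
  ttlMs: 60_000,
});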

🧠 Advanced Features

🔄 Singleflight

Ensures only one concurrent model call per key:

await Promise.all([
  cache.respond(req, callModel),
  cache.respond(req, callModel),
  cache.respond(req, callModel)
]);
// only one model call actually runs!
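
Conceptually, singleflight just memoizes in-flight promises per key; a minimal standalone sketch of the pattern (not the library's internal code):

const inflight = new Map<string, Promise<unknown>>();

async function singleflight<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const existing = inflight.get(key);
  if (existing) return existing as Promise<T>; // join the call already in flight

  const p = fn().finally(() => inflight.delete(key)); // clean up once settled
  inflight.set(key, p);
  return p;
}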

🕒 TTL (Time-to-Live)

All cache entries expire after ttlMs.
Expired entries trigger a new model call automatically.
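
Under the hood, this pattern amounts to storing an expiry timestamp alongside each entry and checking it at read time; a minimal sketch:

type Entry<T> = { value: T; expiresAt: number };
const store = new Map<string, Entry<string>>();

function put(key: string, value: string, ttlMs: number) {
  store.set(key, { value, expiresAt: Date.now() + ttlMs });
}

function get(key: string): string | undefined {
  const entry = store.get(key);
  if (!entry) return undefined;
  if (Date.now() >= entry.expiresAt) { // expired: evict and report a miss
    store.delete(key);
    return undefined;
  }
  return entry.value;
}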

🧾 Logging Events

logger: (event) => {
  if (event.event) console.log(event);
}

Outputs:

{ event: 'hit_exact' }
{ event: 'hit_semantic', score: 0.91 }
{ event: 'semantic_expired' }
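
Because events are plain objects, the logger option doubles as a metrics hook. For example, tallying a hit rate (assuming each lookup emits exactly one of the events above):

const counts: Record<string, number> = {};

// Pass this as the `logger` option.
const metricsLogger = (event: { event: string; score?: number }) => {
  counts[event.event] = (counts[event.event] ?? 0) + 1;
};

function hitRate(): number {
  const hits = (counts.hit_exact ?? 0) + (counts.hit_semantic ?? 0);
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  return total === 0 ? 0 : hits / total;
}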

🧭 Roadmap

| Version | Feature | Status |
| --- | --- | --- |
| v0.1.0 | In-memory + Redis + Qdrant adapters | ✅ Done |
| v0.2.0 | SQLite (VSS) adapter for local persistence | 🧩 Planned |
| v0.3.0 | Stale-While-Revalidate (SWR) strategy | 🧩 Planned |
| v0.4.0 | Express / Next.js middleware wrappers | 🧩 Planned |
| v0.5.0 | Pgvector adapter + auto-tuning thresholds | 🧩 Planned |
| v1.0.0 | Docs site + metrics & benchmark suite | 🧩 Planned |

🧩 API Reference

new GptCache(options)

| Option | Type | Description |
| --- | --- | --- |
| cacheStore | CacheStore | Key–value cache (in-memory or Redis). |
| vectorStore | VectorStore | Semantic storage backend (Memory, Qdrant). |
| embeddingFn | (text) => Promise<number[]> | Embedding function used for semantic similarity. |
| similarityThreshold | number | Minimum cosine similarity (0–1) to count as a semantic hit. |
| ttlMs | number | Time-to-live in milliseconds. |
| logger | (event) => void | Optional structured logger. |

🧪 Testing Locally

npm install
npm run test

Uses Vitest for unit testing.


โš™๏ธ Development

npm run dev        # watch build (tsup)
npm run build      # build to dist/
npm run test       # run tests
npm pack           # create local tarball
npm publish --access public  # publish to npm

🧑‍💻 Contributing

Pull requests and issues are welcome!
If you'd like to add a new adapter (e.g. SQLite-VSS, pgvector, or LanceDB):

  1. Implement the VectorStore interface (an illustrative shape is sketched below).
  2. Add tests in test/stores/.
  3. Open a PR 🚀
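
This README does not spell the interface out; inferred from the built-in adapters above, a custom vector store would look roughly like this (the method names are assumptions; check the package's exported types for the real contract):

// Illustrative shape only, inferred from the built-in adapters; consult the
// exported VectorStore type before implementing a real adapter.
interface VectorStoreSketch {
  upsert(id: string, vector: number[], payload: unknown): Promise<void>;
  search(vector: number[], limit: number): Promise<Array<{ id: string; score: number; payload: unknown }>>;
}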

📄 License

MIT © Hibbaan


โญ๏ธ Support & Inspiration

If you like this project:

  • ๐ŸŒŸ Star it on GitHub
  • ๐Ÿงฉ Use it in your LLM projects
  • ๐Ÿ’ฌ Share feedback โ€” PRs and issues are welcome!

โ€œCache smarter, not harder โ€” reuse your intelligence.โ€
