# Infinite Memory

Infinite context windows for Claude via OpenMemory semantic retrieval.
Drop-in replacement for Anthropic's AI SDK provider that automatically manages infinite conversation context using OpenMemory for semantic storage and retrieval.

## Features
- 🎯 Truly infinite context - Never lose conversation history, no matter how long
- 🧠 Smart retrieval - Semantic search finds relevant context from thousands of messages
- 🔄 Transparent operation - Drop-in replacement for `@ai-sdk/anthropic`
- ⚡ Token-aware - Automatically fits context under model limits (200k for Sonnet 4)
- 💾 Automatic storage - Messages stored in OpenMemory with zero configuration
- 🛡️ Resilient - Falls back to recent messages if OpenMemory is unavailable
- 🔧 Zero config - Just provide `conversationId` and `userId`
## Installation

```bash
npm install infinite-memory
```

You need an OpenMemory server running. See the OpenMemory Quick Start for setup.
## Quick Start

```ts
import { createInfiniteMemory } from 'infinite-memory';
import { streamText } from 'ai';

// Create the infinite memory provider
const memory = createInfiniteMemory({
  openMemoryUrl: 'http://localhost:8080',
  openMemoryApiKey: process.env.OPENMEMORY_API_KEY!,
  anthropicApiKey: process.env.ANTHROPIC_API_KEY!,
});

// Create a model with conversation context
const model = memory('claude-sonnet-4', {
  conversationId: 'conv_123',
  userId: 'user_456'
});

// Use it like any AI SDK model - infinite memory happens automatically
const result = await streamText({
  model,
  messages: [
    { role: 'user', content: 'What did we discuss 100 messages ago?' }
  ],
});

// Stream the response
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```

That's it! The model will:
- Query OpenMemory for relevant historical context
- Combine with recent messages
- Stay under token budget
- Store the conversation automatically
## How It Works

For each request, Infinite Memory (see the sketch after this list):
- Always includes the last 3-5 messages (chronological context)
- Queries OpenMemory for semantically relevant older messages
- Scores and ranks by relevance + recency
- Fills token budget (50% of model limit, e.g., 100k for Sonnet 4)
- Deduplicates to avoid sending messages twice
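A minimal sketch of that selection loop, assuming hypothetical helpers (`fetchRecent`, `semanticSearch`, `estimateTokens`) and a greedy token-cost fill; none of these names are part of the package's API:

```ts
// Sketch of per-request context assembly. fetchRecent, semanticSearch,
// and estimateTokens are hypothetical helpers, not part of the package.
interface StoredMessage {
  id: string;
  role: 'user' | 'assistant';
  content: string;
  timestamp: number;
  score?: number; // relevance score from semantic search
}

declare function fetchRecent(n: number): Promise<StoredMessage[]>;
declare function semanticSearch(query: string): Promise<StoredMessage[]>;
declare function estimateTokens(text: string): number;

async function buildContext(
  query: string,
  budget: number, // e.g. 100_000 tokens for Sonnet 4
): Promise<StoredMessage[]> {
  // 1. Always include the last few messages verbatim.
  const recent = await fetchRecent(5);
  let used = recent.reduce((n, m) => n + estimateTokens(m.content), 0);

  // 2. Query OpenMemory for semantically relevant older messages.
  const candidates = await semanticSearch(query);

  // 3. Rank by relevance, then recency; skip duplicates; fill the budget.
  candidates.sort(
    (a, b) => (b.score ?? 0) - (a.score ?? 0) || b.timestamp - a.timestamp,
  );
  const seen = new Set(recent.map((m) => m.id));
  const retrieved: StoredMessage[] = [];
  for (const msg of candidates) {
    const cost = estimateTokens(msg.content);
    if (seen.has(msg.id) || used + cost > budget) continue;
    seen.add(msg.id);
    used += cost;
    retrieved.push(msg);
  }

  // 4. Retrieved history first, recent messages last (chronological tail).
  return [...retrieved, ...recent];
}
```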
After each request (storage sketched below):
- User message → Stored with full JSON structure
- Assistant response → Stored after completion (streaming supported)
- Metadata: `conversationId`, `userId`, `role`, `timestamp`
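For illustration only, storage could look roughly like this; the endpoint path and payload shape here are assumptions, not OpenMemory's documented API:

```ts
// Hypothetical storage call: the /memories endpoint and payload shape
// are assumptions for illustration, not OpenMemory's documented API.
async function storeMessage(
  baseUrl: string,
  apiKey: string,
  message: { role: 'user' | 'assistant'; content: unknown },
  scope: { conversationId: string; userId: string },
): Promise<void> {
  await fetch(`${baseUrl}/memories`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      content: JSON.stringify(message), // full JSON structure of the message
      metadata: {
        conversationId: scope.conversationId,
        userId: scope.userId,
        role: message.role,
        timestamp: new Date().toISOString(),
      },
    }),
  });
}
```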
If OpenMemory is slow or unavailable (fallback sketched below):
- Falls back to recent messages only
- Ensures messages fit under context window
- Chat continues without interruption
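A sketch of that fallback path, assuming the default 2-second timeout; `queryOpenMemory` and `fetchRecent` are hypothetical helpers:

```ts
// Fallback path sketch; helper names are hypothetical.
type Message = { role: string; content: string };

declare function queryOpenMemory(query: string): Promise<Message[]>;
declare function fetchRecent(n: number): Promise<Message[]>;

function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error('OpenMemory timeout')), ms),
    ),
  ]);
}

async function getContext(query: string): Promise<Message[]> {
  try {
    // Normal path: semantic retrieval, bounded by openMemoryTimeout (default 2000 ms).
    return await withTimeout(queryOpenMemory(query), 2000);
  } catch {
    // Fallback: recent messages only, so the chat continues uninterrupted.
    return fetchRecent(5);
  }
}
```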
## API

### `createInfiniteMemory`

Creates an infinite memory provider.
```ts
const memory = createInfiniteMemory({
  openMemoryUrl: string;      // OpenMemory server URL
  openMemoryApiKey: string;   // OpenMemory API key
  anthropicApiKey: string;    // Anthropic API key
  openMemoryTimeout?: number; // Query timeout in ms (default: 2000)
});
```

Returns a model creator function: `(modelId, context) => LanguageModel`
```ts
const model = memory(modelId, context);
```

Parameters:

- `modelId: string` - Claude model ID (e.g., `'claude-sonnet-4'`)
- `context: ModelContext` - Conversation scope
  - `conversationId: string` - Unique conversation identifier
  - `userId: string` - User identifier for scoping
Returns: `LanguageModel` - Compatible with all AI SDK functions
## Supported Models

- `claude-sonnet-4` / `claude-sonnet-4-20250514` (200k context)
- `claude-opus-4` / `claude-opus-4-20250514` (200k context)
- `claude-haiku-3-5` / `claude-haiku-3-5-20250514` (100k context)
## Examples

### Streaming

```ts
import { streamText } from 'ai';

const model = memory('claude-sonnet-4', {
  conversationId: 'conv_123',
  userId: 'user_456'
});

const result = await streamText({
  model,
  messages: [{ role: 'user', content: 'Hello!' }],
});

for await (const chunk of result.textStream) {
  console.log(chunk);
}
```

### Tool Calling

```ts
import { generateText, tool } from 'ai';
import { z } from 'zod';

const result = await generateText({
  model: memory('claude-sonnet-4', { conversationId, userId }),
  messages,
  tools: {
    getWeather: tool({
      description: 'Get weather for a location',
      parameters: z.object({
        location: z.string(),
      }),
      execute: async ({ location }) => {
        return { temperature: 72, condition: 'sunny' };
      },
    }),
  },
});
```

### Express Server

```ts
import express from 'express';
import { createInfiniteMemory } from 'infinite-memory';
import { streamText } from 'ai';

const app = express();
app.use(express.json()); // needed so req.body is populated

const memory = createInfiniteMemory({ /* config */ });

app.post('/api/chat', async (req, res) => {
  const { messages, conversationId, userId } = req.body;

  const model = memory('claude-sonnet-4', {
    conversationId,
    userId
  });

  const result = await streamText({ model, messages });

  // Stream response back to client
  result.pipeDataStreamToResponse(res);
});
```
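To exercise the route, a client might post and read the stream like this; the port and message shape are assumptions:

```ts
// Hypothetical client for the /api/chat route above; the port is an assumption.
const res = await fetch('http://localhost:3000/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    conversationId: 'conv_123',
    userId: 'user_456',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});

// Read the streamed body as it arrives. Note: pipeDataStreamToResponse
// emits the AI SDK data stream protocol; the chunks are printed raw here.
const reader = res.body!.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value));
}
```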
## Configuration

### Token Budget

By default, Infinite Memory reserves 50% of the model's context window for input:
- Sonnet 4: 100k tokens for context
- Opus 4: 100k tokens for context
- Haiku 3.5: 50k tokens for context
This leaves room for output and system prompts.
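As a worked illustration, using the window sizes from the supported-models list above (the helper name is hypothetical):

```ts
// Context window sizes (tokens) per the supported-models list above.
const CONTEXT_WINDOWS: Record<string, number> = {
  'claude-sonnet-4': 200_000,
  'claude-opus-4': 200_000,
  'claude-haiku-3-5': 100_000,
};

// Half the window is reserved for input context; the rest is left for
// output and system prompts. Helper name is hypothetical.
function inputBudget(modelId: string): number {
  return Math.floor(CONTEXT_WINDOWS[modelId] * 0.5);
}

inputBudget('claude-sonnet-4');  // 100_000
inputBudget('claude-haiku-3-5'); //  50_000
```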
### Timeouts

Queries time out after 2 seconds by default. Adjust if needed:
```ts
const memory = createInfiniteMemory({
  // ...
  openMemoryTimeout: 5000, // 5 seconds
});
```

## Architecture

```
Client Request
      ↓
InfiniteMemoryModel.doStream()
      ↓
ContextManager.getRelevantContext()
  ├─→ Get last 3-5 messages (recent)
  ├─→ Query OpenMemory (semantic search)
  └─→ Merge + deduplicate (under token budget)
      ↓
Anthropic API (with augmented context)
      ↓
Stream Response
      ↓
Store in OpenMemory (after completion)
```
## Performance

- OpenMemory queries: ~50-200ms (localhost)
- Fallback mode: Instant (recent messages only)
- Storage: Async, non-blocking
- Memory overhead: Minimal (~10MB per conversation)
## Debugging

Infinite Memory logs its activity to the console:

```
✨ [InfiniteMemory] Provider initialized
🎨 [InfiniteMemory] Creating model: claude-sonnet-4 (conv: conv_123, user: user_456)
🎯 [InfiniteMemory] Context budget: 100,000 tokens (model: claude-sonnet-4)
📌 [InfiniteMemory] Recent 5 messages: 1,234 tokens
🔍 [InfiniteMemory] Found 15 relevant messages
✅ [InfiniteMemory] Context built: 12 retrieved (45,678 tokens) + 5 recent = 46,912 tokens
📝 [InfiniteMemory] Stored message msg_xyz (assistant)
```
## Contributing

Contributions are welcome! Please open an issue or PR on GitHub.

## License

Apache 2.0 © Dark Research

## Acknowledgments

- Vercel AI SDK - AI framework
- Anthropic Claude - Language model
- OpenMemory - Semantic memory engine
Made with ❤️ by Dark Research