Core Concepts
This guide explains the fundamental concepts behind Fold's holographic memory system. If you're building on Fold or want to understand how it works, this is where to find the reasoning and architecture.
- What is a Memory?
- Hash-Based Storage
- Memory Sources
- Embeddings and Semantic Search
- The Knowledge Graph
- Search and Retrieval
- ACT-R Memory Decay
- A-MEM Agentic Evolution
- The Indexing Pipeline
- Example: Holographic Reconstruction
At its core, a memory is a unit of knowledge stored in Fold. It's not just a single record—it's a rich object combining content, semantics, relationships, and metadata.
{
"id": "aBcD123456789abc",
"project_id": "proj_123",
"repository_id": "repo_main",
"title": "Authentication Service",
"type": "codebase",
"source": "File",
"author": "system",
"keywords": ["auth", "jwt", "security", "validation"],
"tags": ["auth", "typescript", "security"],
"context": "This module implements JWT-based authentication...",
"file_path": "src/auth/service.ts",
"language": "typescript",
"created_at": "2026-02-03T10:30:00Z",
"updated_at": "2026-02-03T10:30:00Z",
"retrieval_count": 47,
"last_accessed": "2026-02-03T14:15:00Z"
}
- ID (16-char hash): Deterministic from repo path SHA256. Same file path = same ID.
- Type: Classifies memory (codebase, session, spec, decision, commit, pr, task, general)
- Source: Where it came from (File = auto-indexed, Agent = manual/AI, Git = from webhooks)
- Content: Stored as markdown in fold/a/b/hash.md
- Metadata: Title, author, keywords, tags, context
- Semantics: Vector embedding in Qdrant for similarity search
- Relationships: Links to other memories in the knowledge graph
- Strength: Retrieval count and last accessed for decay calculation
Fold uses path-addressed storage where the repository file path determines identity.
fold/
├── a/
│ ├── b/
│ │ ├── aBcD123456789abc.md
│ │ └── ab12def456789abc.md
│ └── f/
│ └── af87654321fedcba.md
├── 9/
│ └── a/
│ └── 9a8b7c6d5e4f3g2h.md
Path Format: fold/{first_hex}/{second_hex}/{full_hash}.md
The hash is the first 16 characters of the SHA256 of the normalised path: the project slug followed by the repository file path.
Project slug: my-app
File path: src/auth/service.ts
Normalised input: my-app/src/auth/service.ts
SHA256(normalised_path) = aBcD123456789abcDEF123456789ABCD...
Memory ID = aBcD123456789abc (first 16 chars)
Storage path = fold/a/b/aBcD123456789abc.md
The path is normalised with forward slashes and no absolute paths, ensuring the same ID is generated regardless of which machine indexes the file.
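A minimal Python sketch of the ID derivation, assuming the hash input is exactly `<slug>/<repo-relative path>` with forward slashes, encoded as UTF-8. The function names are hypothetical, and Fold's own normalisation may handle more edge cases. Note that `hashlib` produces lowercase hex.

```python
import hashlib


def normalise_path(project_slug: str, file_path: str) -> str:
    """Build the hash input: '<slug>/<repo-relative path>' with forward slashes."""
    # Convert Windows separators so every machine produces the same string.
    rel = file_path.replace("\\", "/")
    # Strip leading "./" and "/" so the path is always repository-relative.
    while rel.startswith("./"):
        rel = rel[2:]
    rel = rel.lstrip("/")
    return f"{project_slug}/{rel}"


def memory_id(project_slug: str, file_path: str) -> str:
    """First 16 hex characters of SHA256(normalised path)."""
    normalised = normalise_path(project_slug, file_path)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:16]


def storage_path(mem_id: str) -> str:
    """fold/{first_hex}/{second_hex}/{full_hash}.md"""
    return f"fold/{mem_id[0]}/{mem_id[1]}/{mem_id}.md"


if __name__ == "__main__":
    mid = memory_id("my-app", "src/auth/service.ts")
    print(mid, storage_path(mid))
```

Because the path, not the content, is hashed, editing `service.ts` leaves its ID unchanged, which is what allows re-indexing to update the memory in place.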
Benefits:
- Machine-independent: Same ID across different machines and environments
- Deterministic: Same file path always produces same ID
- Stable identity: Content changes don't create new memories
- Update in place: Re-indexing updates existing memory
- Recoverable: Can rebuild database from fold/ directory
Each markdown file has YAML frontmatter + content:
---
id: aBcD123456789abc
title: Authentication Service
author: system
type: codebase
source: File
file_path: src/auth/service.ts
language: typescript
tags:
- auth
- typescript
- security
keywords:
- authenticate
- jwt
- session
context: "Implements JWT-based authentication with refresh tokens. Handles user login flows and API token validation. Uses RS256 asymmetric signing for enhanced security and refresh token rotation to prevent theft."
created_at: 2026-02-03T10:30:00Z
updated_at: 2026-02-03T10:30:00Z
related_to:
- f0123456789abcde
- 9a8b7c6d5e4f3g2h
---
# Authentication Service
This module implements JWT-based authentication with refresh tokens.
It serves as the core authentication layer for the application,
handling both user login flows and API token validation.
## Key Components
- AuthService - Main authentication service class
- validateToken() - Token validation with expiry check
- refreshToken() - Refresh token rotation
## Dependencies
- jsonwebtoken, bcrypt
Related memories are linked using markdown:
## Related Memories
- [[f/0/f0123456789abcde.md|Authentication Middleware]]
- [[9/a/9a8b7c6d5e4f3g2h.md|JWT Decision Document]]
This format works with Obsidian and other markdown editors that support wiki links.
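Because the files in fold/ are the source of truth, the database can be rebuilt by reading them back. The following is a minimal sketch of that reading side. It assumes the layout above: a `---`-delimited frontmatter block followed by markdown, and `[[path|title]]` links whose file name is the memory ID. The function names are hypothetical. The frontmatter is returned as raw text; a real implementation would pass it to a YAML parser such as PyYAML's `yaml.safe_load`.

```python
import re

# [[relative/path.md|Title]] -> groups: path, title (links without "|" are ignored)
WIKI_LINK = re.compile(r"\[\[([^\]|]+)\|([^\]]+)\]\]")


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a fold file into (raw YAML frontmatter, markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("unterminated frontmatter block")


def related_links(body: str) -> list[tuple[str, str, str]]:
    """Return (memory_id, relative_path, title) for each wiki link in the body."""
    links = []
    for path, title in WIKI_LINK.findall(body):
        name = path.rsplit("/", 1)[-1]
        mem_id = name[:-3] if name.endswith(".md") else name
        links.append((mem_id, path, title))
    return links
```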
Every memory has a source field indicating how it was created. This replaces the old system of 8 distinct memory types with a single unified Memory model; the type field remains as a classification.
File: created by automatic indexing from repositories
Example:
Memory of: src/auth/service.ts
Source: File
Type: codebase
Files are indexed from:
- GitHub webhooks (on push)
- GitLab webhooks (on push)
- Polling loop (every 5 minutes)
- Manual trigger via API
Agent: created manually or generated by AI
Examples:
Type: session
Source: Agent
→ "Today we refactored the auth flow"
Type: decision
Source: Agent
→ "We decided to use JWT instead of server sessions"
Type: spec
Source: Agent
→ "User authentication requirements..."
Type: task
Source: Agent
→ "Implement password reset functionality"
Type: general
Source: Agent
→ "Research notes on OAuth 2.0"
Stored in fold/ like any other memory, but created through the API or MCP.
Git: created from git commit webhooks
Example:
Type: commit
Source: Git
Title: "Add JWT validation middleware"
Content: AI-generated summary of the commit
Automatically generated when commits are pushed via webhooks.
This is where the "semantic" in "semantic memory" comes from.
An embedding converts text into a vector (list of numbers) where similar meanings are close together and different meanings are far apart.
Text → Embedding Model → Vector (768 dimensions with Gemini, 1536 with OpenAI)
"Authentication service using JWT tokens"
→ [0.234, -0.156, 0.891, ..., -0.045]
"JWT token validation in authentication"
→ [0.241, -0.148, 0.878, ..., -0.039]
Closer vectors ≈ more similar meanings
Fold uses either Gemini or OpenAI embeddings:
| Provider | Model | Dimensions |
|---|---|---|
| Google Gemini | text-embedding-004 | 768 |
| OpenAI | text-embedding-3-small | 1536 |
The embedding model is trained on millions of text pairs where similar meanings get similar vectors.
Traditional keyword search:
Query: "How do we validate users?"
→ Find files with "validate" OR "users"
→ Get lots of unrelated results
Semantic search:
Query: "How do we validate users?"
→ Embed query
→ Find vectors closest to query vector
→ Get authentication code, login specs, decisions about auth
→ All conceptually related, not just keyword matches
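The following brute-force sketch illustrates the "find vectors closest to the query vector" step using cosine similarity. The vectors are invented 4-dimensional toys; real embeddings have 768 or 1536 dimensions. Whether Fold configures Qdrant with cosine distance is an assumption. At scale, Qdrant answers the same question with an approximate nearest-neighbour (HNSW) index instead of scanning every vector.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0 = orthogonal, negative = opposed."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 10):
    """Score every stored vector against the query and keep the k closest."""
    scored = [(mem_id, cosine(query_vec, vec)) for mem_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (values are made up).
index = {
    "auth_service": [0.24, -0.15, 0.88, -0.04],
    "login_spec": [0.20, -0.10, 0.80, 0.05],
    "css_theme": [-0.70, 0.50, 0.10, 0.40],
}
query = [0.23, -0.16, 0.89, -0.05]  # pretend embedding of "How do we validate users?"

if __name__ == "__main__":
    for mem_id, score in top_k(query, index, k=2):
        print(f"{mem_id}: {score:.3f}")
```

Neither "auth_service" nor "login_spec" needs to contain the words "validate" or "users" to rank highly; only the direction of their vectors matters.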
Query: "Sessions expiring too fast"
Keyword search finds:
• Files with "session"
• Files with "expiring"
• Miss the decision about timeout values
• Miss the commit that changed the timeout
Semantic search finds:
• Code that handles session timeout
• Decision: "Sessions should expire after 7 days"
• Commit: "Reduced timeout for security"
• Related: Session management patterns
→ Full context, not just keywords
Beyond embeddings, Fold maintains an explicit graph where memories are connected by typed links.
Embeddings answer: "What means something similar to this?"
Graphs answer: "What relates to this in a specific way?"
Combined power:
┌─────────────────────────────────────────────────────┐
│ Embeddings: Find semantically similar memories │
│ Graphs: Understand specific relationships │
│ Together: Complete context reconstruction │
└─────────────────────────────────────────────────────┘
Related - Semantically related (found via embedding similarity)
References - One memory explicitly references another
DependsOn - One memory depends on another
Modifies - A commit/change modified another memory
Session: "Investigating session timeout"
├─ References → Decision: "Session timeout policy"
└─ References → File: "src/auth.rs"
Commit: "Reduce session timeout for security"
├─ Modifies → File: "src/auth.rs"
└─ Related → Decision: "Session timeout policy"
Decision: "Session timeout policy"
└─ Related ← Commit: "Reduce session timeout"
From any memory, you can walk the graph:
Start: "Fix session timeout bug"
↓ Related
Found: "Session timeout decision"
↓ Related
Found: "Commit that changed timeout"
↓ Modifies
Found: "src/auth.rs"
↓ Related
Found: "Similar timeout implementations elsewhere"
This is holographic reconstruction: starting from any point, you can rebuild full context.
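The walk above amounts to a breadth-first traversal. The following sketch uses a hypothetical in-memory list of `(source, link_type, target)` links that mirrors the example. Links are followed in both directions, since the example graph shows incoming edges (←) as well as outgoing ones. `max_depth` bounds the walk in the same way the `depth` parameter of the context endpoint appears to.

```python
from collections import deque

# Hypothetical link table mirroring the walk above: (source, link_type, target).
LINKS = [
    ("Fix session timeout bug", "Related", "Session timeout decision"),
    ("Session timeout decision", "Related", "Commit that changed timeout"),
    ("Commit that changed timeout", "Modifies", "src/auth.rs"),
    ("src/auth.rs", "Related", "Similar timeout implementations elsewhere"),
]


def neighbours(node, links):
    """Yield (link_type, other_memory), following links in both directions."""
    for src, kind, dst in links:
        if src == node:
            yield kind, dst
        elif dst == node:
            yield kind, src


def walk(start, links, max_depth):
    """Breadth-first walk returning {memory: (depth, link_type_used_to_reach_it)}."""
    seen = {start: (0, None)}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node][0]
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for kind, nxt in neighbours(node, links):
            if nxt not in seen:
                seen[nxt] = (depth + 1, kind)
                queue.append(nxt)
    return seen
```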
Fold provides multiple ways to find memories.
Search by meaning:
POST /api/projects/my-app/search
{
"query": "How do we handle user authentication?",
"limit": 10
}
Response:
{
"results": [
{
"id": "aBcD123456789abc",
"type": "codebase",
"title": "Authentication Service",
"relevance": 0.95,
"snippet": "Handles JWT validation and session management..."
},
{
"id": "f0123456789abcde",
"type": "decision",
"title": "Use JWT for authentication",
"relevance": 0.91,
"snippet": "We chose JWT because..."
}
]
}
For AI agents starting work on a task:
POST /api/projects/my-app/context/my-task-id
{
"depth": 2
}
Response:
{
"memory": { "id": "...", "title": "...", "content": "..." },
"related_memories": [
{
"id": "...",
"title": "...",
"type": "...",
"relationship": "Related"
}
],
"similar_memories": [
{ "id": "...", "relevance": 0.87 }
]
}
Fold implements memory decay inspired by cognitive science research (ACT-R). Memories have strength that decays over time but is boosted by access frequency.
Human memory research shows that forgetting is a feature, not a bug:
- Recent memories are more relevant
- Frequently used memories matter more
- Stale memories are less likely to be useful
Storing everything forever with equal weight creates noise.
strength = recency_factor × access_boost
recency_factor = exp(-age_days × ln(2) / half_life)
access_boost = ln(retrieval_count + 1)
Example:
Memory created 30 days ago, accessed 5 times:
- Age: 30 days, half_life: 30 days
- recency_factor: exp(-30 × 0.693 / 30) = 0.5
- access_boost: ln(6) = 1.79
- strength = 0.5 × 1.79 = 0.895
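The decay formulas translate directly into Python. This sketch follows them as written, using natural logarithms. The function names are illustrative, not Fold's API. If ln(6) is not rounded to 1.79 first, the example strength comes out at about 0.896. Taken literally, the formulas also give a memory that has never been retrieved an access_boost of ln(1) = 0, and therefore zero strength.

```python
import math


def recency_factor(age_days: float, half_life_days: float = 30.0) -> float:
    """Halves every half_life_days: exp(-age * ln 2 / half_life)."""
    return math.exp(-age_days * math.log(2) / half_life_days)


def access_boost(retrieval_count: int) -> float:
    """Natural log of (retrievals + 1); zero for a never-retrieved memory."""
    return math.log(retrieval_count + 1)


def strength(age_days: float, retrieval_count: int, half_life_days: float = 30.0) -> float:
    return recency_factor(age_days, half_life_days) * access_boost(retrieval_count)


if __name__ == "__main__":
    print(round(strength(30, 5), 3))
```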
Search combines semantic relevance with memory strength:
score = (1 - weight) × relevance + weight × strength
With default weight=0.3:
- Perfect semantic match (1.0) with low strength (0.2): score = 0.76
- Good semantic match (0.8) with high strength (0.9): score = 0.83
Recent, frequently-accessed memories can outrank slightly better semantic matches.
| Parameter | Default | Use Case |
|---|---|---|
| decay_half_life_days | 30 | Balance: 7d = fast-moving, 90d = longer projects |
| decay_strength_weight | 0.3 | 0 = pure semantic, 1 = pure strength, 0.3 = balanced |
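The following sketch shows the re-ranking step. It uses a hypothetical `DecayConfig` whose field names mirror the parameters in the table, and represents results as `(memory_id, relevance, strength)` tuples, which is an assumed shape. Applied to the two cases above, it ranks the fresher memory first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecayConfig:
    decay_half_life_days: float = 30.0  # feeds the strength calculation, not used here
    decay_strength_weight: float = 0.3


def combined_score(relevance: float, strength: float, cfg: DecayConfig = DecayConfig()) -> float:
    """score = (1 - weight) * relevance + weight * strength"""
    w = cfg.decay_strength_weight
    return (1 - w) * relevance + w * strength


def rerank(results, cfg: DecayConfig = DecayConfig()):
    """Sort (memory_id, relevance, strength) tuples by combined score, best first."""
    return sorted(results, key=lambda r: combined_score(r[1], r[2], cfg), reverse=True)


if __name__ == "__main__":
    print(rerank([("perfect_but_stale", 1.0, 0.2), ("good_and_fresh", 0.8, 0.9)]))
```

Setting `decay_strength_weight` to 0 reduces the score to pure semantic relevance, matching the table.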
After creating a memory, Fold can automatically suggest and create relationships.
Step 1: New Memory Created
Title: "Fixed session timeout bug"
Content: "Changed SESSION_TIMEOUT from 1 hour to 7 days..."
Step 2: Find Similar Memories
- Embed the new memory
- Vector search for 5 nearest neighbours
- Get candidates for linking
Step 3: Ask LLM for Suggestions
LLM Input:
- New memory: "Fixed session timeout"
- Candidate A: "Session timeout decision"
- Candidate B: "Commit that reduced timeout"
- Candidate C: "Auth service code"
LLM Output:
{
"links": [
{
"target": "Session timeout decision",
"type": "References",
"confidence": 0.92,
"reasoning": "This memory implements the decision"
},
{
"target": "Commit that reduced timeout",
"type": "Related",
"confidence": 0.85,
"reasoning": "Related to fixing the timeout issue"
}
]
}
Step 4: Store Links
- Create memory_links in database
- Update fold file with wiki-style links
- Update neighbour metadata
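LLM output needs to be checked before it becomes a link. The following sketches the input side of Step 4. It assumes the reply has the JSON shape shown in Step 3, that the candidate titles shown to the LLM map back to memory IDs, and that a hypothetical `min_confidence` cut-off applies (the text does not say whether Fold uses one). Links to memories that were never offered as candidates, or with a type outside the four link types listed earlier, are dropped.

```python
import json

ALLOWED_TYPES = {"Related", "References", "DependsOn", "Modifies"}


def links_from_llm(raw_json: str, source_id: str, candidates: dict, min_confidence: float = 0.5):
    """Turn the LLM's JSON reply into memory_links rows.

    candidates maps each title shown to the LLM to its memory ID; suggestions
    that point elsewhere, use an unknown type, or fall below min_confidence
    are discarded.
    """
    rows = []
    for link in json.loads(raw_json).get("links", []):
        target_id = candidates.get(link.get("target"))
        if target_id is None or link.get("type") not in ALLOWED_TYPES:
            continue
        confidence = float(link.get("confidence", 0.0))
        if confidence < min_confidence:
            continue
        rows.append({
            "source_id": source_id,
            "target_id": target_id,
            "link_type": link["type"],
            "confidence": confidence,
            "reasoning": link.get("reasoning", ""),
        })
    return rows
```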
When creating a memory with auto-analysis enabled:
LLM extracts:
1. Keywords - Key terms and concepts (max 15)
2. Context - Detailed 3-5 sentence summary
3. Tags - Broad categories (max 6)
The context should cover:
- What this memory does
- Its role in the system
- Key responsibilities
- Important relationships
- Notable design decisions
How Fold converts source files into memories.
GitHub Webhook ─┐
GitLab Webhook ─┼─→ Job Queue
Polling Loop ──┤
Manual API ────┘
Files are added to a job queue for processing:
jobs (
id: "job_123",
project_id: "proj_abc",
job_type: "index_repo",
status: "pending",
priority: 1,
payload: { "files": [...] },
attempts: 0,
next_retry: now
)
Features:
- Atomic job claiming (prevents duplicate processing)
- Automatic retry with exponential backoff
- Stale job recovery
- Heartbeat to prevent timeouts
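The following sketch builds atomic claiming and exponential backoff on Python's built-in `sqlite3`. It assumes a simplified `jobs` table with the fields shown above (payload as JSON text, `next_retry` as a Unix timestamp), that higher `priority` values are claimed first, and a hypothetical 30-second base delay that doubles per attempt, up to 5 attempts. `BEGIN IMMEDIATE` takes SQLite's write lock before the `SELECT`, so two workers sharing a database file cannot both claim the same row. Stale-job recovery and heartbeats are omitted.

```python
import sqlite3
import time
from typing import Optional


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None: no implicit transactions, so BEGIN/COMMIT are explicit.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id TEXT PRIMARY KEY, project_id TEXT, job_type TEXT,"
        " status TEXT NOT NULL, priority INTEGER NOT NULL, payload TEXT,"
        " attempts INTEGER NOT NULL DEFAULT 0, next_retry REAL NOT NULL)"
    )
    return conn


def enqueue(conn, job_id, project_id, job_type, payload, priority=1, now=None):
    now = time.time() if now is None else now
    conn.execute(
        "INSERT INTO jobs (id, project_id, job_type, status, priority, payload, attempts, next_retry)"
        " VALUES (?, ?, ?, 'pending', ?, ?, 0, ?)",
        (job_id, project_id, job_type, priority, payload, now),
    )


def claim_next(conn, now=None) -> Optional[str]:
    """Atomically move the best due 'pending' job to 'running' and return its id."""
    now = time.time() if now is None else now
    conn.execute("BEGIN IMMEDIATE")  # write lock first: no other worker can claim in between
    try:
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'pending' AND next_retry <= ?"
            " ORDER BY priority DESC, next_retry LIMIT 1",
            (now,),
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'running', attempts = attempts + 1 WHERE id = ?",
                (row[0],),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row[0] if row else None


def fail(conn, job_id, base_delay=30.0, max_attempts=5, now=None):
    """Reschedule with exponential backoff (30 s, 60 s, 120 s, ...) or mark as failed."""
    now = time.time() if now is None else now
    (attempts,) = conn.execute("SELECT attempts FROM jobs WHERE id = ?", (job_id,)).fetchone()
    if attempts >= max_attempts:
        conn.execute("UPDATE jobs SET status = 'failed' WHERE id = ?", (job_id,))
    else:
        delay = base_delay * 2 ** (attempts - 1)
        conn.execute(
            "UPDATE jobs SET status = 'pending', next_retry = ? WHERE id = ?",
            (now + delay, job_id),
        )
```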
Claims jobs atomically and processes them:
1. Claims job atomically
2. Checks LLM/embedding availability
3. Ensures local clone exists (clone or pull)
4. Routes to indexer
5. Updates job status
For each file:
1. Read from local clone
2. Skip if:
- Empty
- > 100KB
- Binary/non-code
- Matches exclude pattern
3. Calculate SHA256 hash of file path
→ memory_id = first 16 chars
4. Check if already indexed:
- Content hash unchanged? Skip
- fold/a/b/hash.md exists? Skip
5. Continue to summarization
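The skip rules in step 2 can be sketched as a single check. The sketch reads the 100KB limit as 100 × 1024 bytes, detects binaries with a NUL-byte heuristic, and treats exclude patterns as shell-style globs via `fnmatch`. All three are assumptions about details the text leaves open, and detecting "non-code" text files is not attempted.

```python
from fnmatch import fnmatch
from typing import List, Optional

MAX_FILE_BYTES = 100 * 1024  # "> 100KB", assuming 1 KB = 1024 bytes


def skip_reason(path: str, content: bytes, exclude_patterns: List[str]) -> Optional[str]:
    """Return why a file should be skipped, or None if it should be indexed."""
    if len(content) == 0:
        return "empty"
    if len(content) > MAX_FILE_BYTES:
        return "too large"
    # Heuristic (assumption): a NUL byte in the first 8 KB usually means binary.
    if b"\x00" in content[:8192]:
        return "binary"
    # fnmatch's "*" also matches "/", so "node_modules/*" covers nested paths.
    if any(fnmatch(path, pattern) for pattern in exclude_patterns):
        return "excluded"
    return None
```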
Summarize file with LLM:
Input: File content, path, language
LLM extracts:
- title: "Authentication Service" (max 100 chars)
- summary: "Comprehensive 2-4 sentence description"
- keywords: Key terms (max 15)
- tags: Categories (max 6)
- exports: Public functions/classes
- dependencies: Imported modules
- created_date: Earliest date from file
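An LLM will not always respect the limits above: a title of at most 100 characters, and at most 15 keywords and 6 tags (the same caps used by auto-analysis). A defensive post-processing step is therefore worth having. The following sketch assumes the reply has already been parsed into a dict. Case-insensitive de-duplication is an added assumption, and `FileSummary` is a hypothetical type.

```python
from dataclasses import dataclass, field


@dataclass
class FileSummary:
    title: str
    summary: str
    keywords: list = field(default_factory=list)
    tags: list = field(default_factory=list)


def _dedupe(items, limit):
    """Trim whitespace, drop blanks and case-insensitive duplicates, then cap the count."""
    seen, out = set(), []
    for item in items:
        key = str(item).strip().lower()
        if key and key not in seen:
            seen.add(key)
            out.append(str(item).strip())
    return out[:limit]


def clamp_summary(raw: dict) -> FileSummary:
    return FileSummary(
        title=str(raw.get("title", "")).strip()[:100],  # max 100 chars
        summary=str(raw.get("summary", "")).strip(),
        keywords=_dedupe(raw.get("keywords", []), 15),  # max 15
        tags=_dedupe(raw.get("tags", []), 6),  # max 6
    )
```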
Create and store memory:
1. Create Memory object with metadata
2. Write to fold/a/b/hash.md
- YAML frontmatter
- Markdown content
3. Insert metadata into SQLite
4. Generate embedding
- content + context + keywords + tags
5. Store in Qdrant
6. Process A-MEM evolution
- Find 5 neighbours
- Ask LLM for linking
- Create memory_links
- Update fold file with wiki links
If enabled, commit fold/ changes:
git add fold/
git commit -m "fold: Index N files from project"
Let's trace a complete scenario showing how all concepts work together.
Day 1: Problem
User reports: "Sessions expire after 1 hour but should be 7 days"
Developer creates task: "Fix session timeout"
Day 2: Search for Context
Developer asks: "How should session timeouts work?"
Fold searches and returns:
1. Semantic Match: src/auth.rs (0.97)
"Handles JWT validation and session creation"
2. Recent Commit (0.96):
"Reduced session timeout for security"
(probably the bug!)
3. Decision (0.93):
"Sessions should expire after 7 days of inactivity"
4. Related Code: Similar timeout patterns
How it works internally:
- Query embedded → vector
- Qdrant finds top matches
- Apply decay weighting (recent=stronger)
- Return ranked results
Day 3: Fix Applied
Developer:
- Reads src/auth.rs (relevance 0.97 guided them)
- Finds SESSION_TIMEOUT = 3600 (1 hour)
- Changes to 604800 (7 days)
- Commits: "Fix session timeout bug - set to 7 days per policy"
Auto-Generated Memory:
Type: commit
Source: Git
Title: "Fix session timeout - set to 7 days"
Content: "Restored SESSION_TIMEOUT to 7 days per policy,
fixing the bug introduced by security timeout reduction"
A-MEM Evolution:
LLM automatically suggests links:
1. References → Decision "Session timeout policy"
(confidence: 0.95)
2. Related → Commit "Reduce session timeout"
(confidence: 0.92 - the commit that caused the bug)
3. Modifies → File "src/auth.rs"
(confidence: 0.99)
Knowledge Graph:
Task: "Fix session timeout"
├─ Related → Commit: "Fix session timeout"
│ ├─ Modifies → File: "src/auth.rs"
│ ├─ References → Decision: "Session timeout policy"
│ └─ Related → Commit: "Reduce timeout"
└─ Related → Decision: "Session timeout policy"
Day 4: New Developer Onboarding
Teammate searches: "How do we handle session management?"
Fold returns the complete story:
- Decision: why 7 days
- Bug fix commit: what went wrong
- Related code: patterns to follow
- Session management: complete picture
This is holographic memory: enter at any point, reconstruct full context.
Key Takeaways:
- Unified Memory Model - One Memory type with source field (File, Agent, Git)
- Hash-Based Storage - Repo path determines identity via SHA256
- Git-Native - Markdown files in fold/ are source of truth
- Semantic Embeddings - Find meaning, not keywords
- Knowledge Graph - Explicit relationships between memories
- A-MEM Evolution - LLM-powered automatic linking
- Memory Decay - Recency and access frequency matter
- Holographic - Any fragment reconstructs full context
The result: a knowledge system where memories are deeply interconnected, automatically organized, and readily accessible, enabling both humans and AI agents to understand complex projects quickly.