
Core Concepts

hitchhiker edited this page Feb 3, 2026 · 2 revisions

Core Concepts Guide

This guide explains the fundamental concepts behind Fold's holographic memory system. If you're building on Fold or want to understand how it works, this is where to find the reasoning and architecture.

Table of Contents

  1. What is a Memory?
  2. Hash-Based Storage
  3. Memory Sources
  4. Embeddings and Semantic Search
  5. The Knowledge Graph
  6. Search and Retrieval
  7. ACT-R Memory Decay
  8. A-MEM Agentic Memory Evolution
  9. The Indexing Pipeline
  10. Example: Holographic Reconstruction

What is a Memory?

At its core, a memory is a unit of knowledge stored in Fold. It is more than a single record: it combines content, semantics, relationships, and metadata in one object.

Memory Structure

{
  "id": "aBcD123456789abc",
  "project_id": "proj_123",
  "repository_id": "repo_main",

  "title": "Authentication Service",
  "type": "codebase",
  "source": "File",

  "author": "system",
  "keywords": ["auth", "jwt", "security", "validation"],
  "tags": ["auth", "typescript", "security"],
  "context": "This module implements JWT-based authentication...",

  "file_path": "src/auth/service.ts",
  "language": "typescript",

  "created_at": "2026-02-03T10:30:00Z",
  "updated_at": "2026-02-03T10:30:00Z",
  "retrieval_count": 47,
  "last_accessed": "2026-02-03T14:15:00Z"
}

Key Properties

  • ID (16-char hash): The first 16 characters of the SHA256 of the normalised repository path. Same file path = same ID.
  • Type: Classifies memory (codebase, session, spec, decision, commit, pr, task, general)
  • Source: Where it came from (File = auto-indexed, Agent = manual/AI, Git = from webhooks)
  • Content: Stored as markdown in fold/a/b/hash.md
  • Metadata: Title, author, keywords, tags, context
  • Semantics: Vector embedding in Qdrant for similarity search
  • Relationships: Links to other memories in the knowledge graph
  • Strength: Retrieval count and last accessed for decay calculation

Hash-Based Storage

Fold uses path-addressed storage where the repository file path determines identity.

The Structure

fold/
├── a/
│   ├── b/
│   │   ├── aBcD123456789abc.md
│   │   └── aB12def456789abc.md
│   └── f/
│       └── af87654321fedcba.md
├── 9/
│   └── a/
│       └── 9a8b7c6d5e4f3g2h.md

Path Format: fold/{first_hex}/{second_hex}/{full_hash}.md

The hash is the first 16 characters of the SHA256 of the normalised path (project slug plus repository file path, as shown below).

How Identity Works

Project slug: my-app
File path: src/auth/service.ts

Normalised input: my-app/src/auth/service.ts
SHA256(normalised_path) = aBcD123456789abcDEF123456789ABCD...
Memory ID = aBcD123456789abc (first 16 chars)
Storage path = fold/a/b/aBcD123456789abc.md

The path is normalised with forward slashes and no absolute paths, ensuring the same ID is generated regardless of which machine indexes the file.

Benefits:

  • Machine-independent: Same ID across different machines and environments
  • Deterministic: Same file path always produces same ID
  • Stable identity: Content changes don't create new memories
  • Update in place: Re-indexing updates existing memory
  • Recoverable: Can rebuild database from fold/ directory
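
The derivation above can be sketched in a few lines of Python. This is a minimal illustration, not Fold's actual implementation: the function names `memory_id` and `storage_path` are hypothetical, and the exact normalisation rules (backslash conversion, stripping a leading slash), the UTF-8 encoding, and the lowercase hex output are assumptions.

```python
import hashlib


def memory_id(project_slug: str, file_path: str) -> str:
    """Return a 16-character ID derived from the normalised path."""
    # Assumption: normalising means forward slashes and a repo-relative path.
    relative = file_path.replace("\\", "/").lstrip("/")
    normalised = f"{project_slug}/{relative}"
    digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
    return digest[:16]


def storage_path(mem_id: str) -> str:
    """Map an ID to fold/{first_hex}/{second_hex}/{full_hash}.md."""
    return f"fold/{mem_id[0]}/{mem_id[1]}/{mem_id}.md"


mid = memory_id("my-app", "src/auth/service.ts")
print(mid, storage_path(mid))
```

Because only the path is hashed, a Windows-style `src\auth\service.ts` yields the same ID as the forward-slash form under these assumed rules, which is the machine-independence property listed above.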

File Format

Each markdown file has YAML frontmatter + content:

---
id: aBcD123456789abc
title: Authentication Service
author: system
type: codebase
source: File
file_path: src/auth/service.ts
language: typescript
tags:
  - auth
  - typescript
  - security
keywords:
  - authenticate
  - jwt
  - session
context: "Implements JWT-based authentication with refresh tokens. Handles user login flows and API token validation. Uses RS256 asymmetric signing for enhanced security and refresh token rotation to prevent theft."
created_at: 2026-02-03T10:30:00Z
updated_at: 2026-02-03T10:30:00Z
related_to:
  - f0123456789abcde
  - 9a8b7c6d5e4f3g2h
---

# Authentication Service

This module implements JWT-based authentication with refresh tokens.
It serves as the core authentication layer for the application,
handling both user login flows and API token validation.

## Key Components
- AuthService - Main authentication service class
- validateToken() - Token validation with expiry check
- refreshToken() - Refresh token rotation

## Dependencies
- jsonwebtoken, bcrypt
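
Since every memory is a markdown file with a frontmatter block, rebuilding an index from fold/ starts with separating the two parts. The sketch below shows one way to do that using only the standard library. `split_frontmatter` is a hypothetical helper, and parsing the YAML itself (which would need a YAML library) is deliberately left out.

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a fold file into (frontmatter, body); frontmatter is '' if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return frontmatter, body
    return "", text  # no closing delimiter: treat the whole file as body


sample = (
    "---\n"
    "id: aBcD123456789abc\n"
    "title: Authentication Service\n"
    "---\n"
    "\n"
    "# Authentication Service\n"
)
front, body = split_frontmatter(sample)
print(front)
print(body)
```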

Wiki-Style Links

Related memories are linked using markdown:

## Related Memories

- [[f/0/f0123456789abcde.md|Authentication Middleware]]
- [[9/a/9a8b7c6d5e4f3g2h.md|JWT Decision Document]]

This format works with Obsidian and other markdown editors that support wiki links.
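
As an illustration, links in this `[[path|label]]` form can be pulled out of a file with a regular expression. The pattern and the `extract_links` name are assumptions made for this sketch. It handles only the piped form shown above, not bare `[[target]]` links.

```python
import re

# Matches [[target|label]]: target has no "|" or "]", label has no "]".
WIKI_LINK = re.compile(r"\[\[([^\]|]+)\|([^\]]+)\]\]")


def extract_links(markdown: str) -> list[tuple[str, str]]:
    """Return (target_path, label) pairs in document order."""
    return [
        (m.group(1).strip(), m.group(2).strip())
        for m in WIKI_LINK.finditer(markdown)
    ]


text = """## Related Memories

- [[f/0/f0123456789abcde.md|Authentication Middleware]]
- [[9/a/9a8b7c6d5e4f3g2h.md|JWT Decision Document]]
"""
links = extract_links(text)
for target, label in links:
    print(target, "->", label)
```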


Memory Sources

Memories have a source field indicating how they were created. This replaces the old system of 8 distinct types.

Source: File

Created: Automatic indexing from repositories

Example:

Memory of: src/auth/service.ts
Source: File
Type: codebase

Files are indexed from:

  • GitHub webhooks (on push)
  • GitLab webhooks (on push)
  • Polling loop (every 5 minutes)
  • Manual trigger via API

Source: Agent

Created: Manual or AI-generated content

Examples:

Type: session
Source: Agent
→ "Today we refactored the auth flow"

Type: decision
Source: Agent
→ "We decided to use JWT instead of server sessions"

Type: spec
Source: Agent
→ "User authentication requirements..."

Type: task
Source: Agent
→ "Implement password reset functionality"

Type: general
Source: Agent
→ "Research notes on OAuth 2.0"

Agent memories are stored in fold/ like any other memory, but they are created through the API or MCP rather than by indexing.

Source: Git

Created: From git commit webhooks

Example:

Type: commit
Source: Git
Title: "Add JWT validation middleware"
Content: AI-generated summary of the commit

These memories are generated automatically when a push webhook delivers new commits.


Embeddings and Semantic Search

This is where the "semantic" in "semantic memory" comes from.

What Are Embeddings?

An embedding converts text into a vector (list of numbers) where similar meanings are close together and different meanings are far apart.

Text → Embedding Model → 768-dimensional Vector

"Authentication service using JWT tokens"
→ [0.234, -0.156, 0.891, ..., -0.045]

"JWT token validation in authentication"
→ [0.241, -0.148, 0.878, ..., -0.039]

Distance between vectors ≈ semantic similarity
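
To make "distance between vectors" concrete, the sketch below computes cosine similarity, one common measure for comparing embeddings. Which metric Fold configures in Qdrant is not stated here, so treat cosine as an assumption. The three-dimensional vectors are toy values invented for the example, not real model output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; real ones have hundreds of dimensions.
auth_service = [0.9, 0.1, 0.2]
jwt_validation = [0.8, 0.2, 0.3]
css_layout = [0.1, 0.9, 0.1]

print(cosine_similarity(auth_service, jwt_validation))  # close to 1
print(cosine_similarity(auth_service, css_layout))      # much lower
```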

How It Works

Fold uses either Gemini or OpenAI embeddings:

| Provider | Model | Dimensions |
|---|---|---|
| Google Gemini | text-embedding-004 | 768 |
| OpenAI | text-embedding-3-small | 1536 |

The embedding model is trained so that texts with similar meanings map to nearby vectors, even when they share few words.

Semantic Similarity vs Keyword Matching

Traditional keyword search:

Query: "How do we validate users?"
→ Find files with "validate" OR "users"
→ Get lots of unrelated results

Semantic search:

Query: "How do we validate users?"
→ Embed query
→ Find vectors closest to query vector
→ Get authentication code, login specs, decisions about auth
→ All conceptually related, not just keyword matches

Why This Matters

Query: "Sessions expiring too fast"

Keyword search finds:
  • Files with "session"
  • Files with "expiring"

Keyword search misses:
  • The decision about timeout values
  • The commit that changed the timeout

Semantic search finds:
  • Code that handles session timeout
  • Decision: "Sessions should expire after 7 days"
  • Commit: "Reduced timeout for security"
  • Related: Session management patterns
  → Full context, not just keywords

The Knowledge Graph

Beyond embeddings, Fold maintains an explicit graph where memories are connected by typed links.

Why Both Embeddings AND Graphs?

Embeddings answer: "What means something similar to this?"

Graphs answer: "What relates to this in a specific way?"

Combined power:
┌─────────────────────────────────────────────────────┐
│  Embeddings: Find semantically similar memories    │
│  Graphs: Understand specific relationships          │
│  Together: Complete context reconstruction          │
└─────────────────────────────────────────────────────┘

Link Types

Related       - Semantically related (found via embedding similarity)
References    - One memory explicitly references another
DependsOn     - One memory depends on another
Modifies      - A commit/change modified another memory

Example Graph

Session: "Investigating session timeout"
  ├─ References → Decision: "Session timeout policy"
  └─ References → File: "src/auth.rs"

Commit: "Reduce session timeout for security"
  ├─ Modifies → File: "src/auth.rs"
  └─ Related → Decision: "Session timeout policy"

Decision: "Session timeout policy"
  └─ Related ← Commit: "Reduce session timeout"

Traversing the Graph

From any memory, you can walk the graph:

Start: "Fix session timeout bug"
  ↓ Related
Found: "Session timeout decision"
  ↓ Related
Found: "Commit that changed timeout"
  ↓ Modifies
Found: "src/auth.rs"
  ↓ Related
Found: "Similar timeout implementations elsewhere"

This is holographic reconstruction: starting from any point, you can rebuild full context.
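
The walk above amounts to a breadth-first traversal over typed links with a depth limit. The following sketch uses an in-memory adjacency list built from the example. The `GRAPH` structure and the `walk` function are hypothetical and say nothing about how Fold stores or queries links internally.

```python
from collections import deque

# Hypothetical adjacency list: memory title -> [(link_type, target title)]
GRAPH = {
    "Fix session timeout bug": [("Related", "Session timeout decision")],
    "Session timeout decision": [("Related", "Commit that changed timeout")],
    "Commit that changed timeout": [("Modifies", "src/auth.rs")],
    "src/auth.rs": [("Related", "Similar timeout implementations elsewhere")],
}


def walk(graph, start, max_depth):
    """Breadth-first walk; returns (depth, link_type, memory) per memory reached."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for link_type, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                found.append((depth + 1, link_type, target))
                queue.append((target, depth + 1))
    return found


for depth, link_type, memory in walk(GRAPH, "Fix session timeout bug", max_depth=2):
    print(depth, link_type, memory)
```

The `seen` set keeps the walk from looping when two memories link to each other, which typed graphs like the one above commonly do.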


Search and Retrieval

Fold provides multiple ways to find memories.

Semantic Search

Search by meaning:

POST /api/projects/my-app/search
{
  "query": "How do we handle user authentication?",
  "limit": 10
}

Response:
{
  "results": [
    {
      "id": "aBcD123456789abc",
      "type": "codebase",
      "title": "Authentication Service",
      "relevance": 0.95,
      "snippet": "Handles JWT validation and session management..."
    },
    {
      "id": "f0123456789abcde",
      "type": "decision",
      "title": "Use JWT for authentication",
      "relevance": 0.91,
      "snippet": "We chose JWT because..."
    }
  ]
}
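
A client-side sketch of this call, using only the Python standard library, might look like the following. The base URL and port are placeholders, authentication is not covered in this guide and is omitted, and both helper names are hypothetical. The request is built but not sent.

```python
import json
import urllib.request


def build_search_request(base_url, project, query, limit=10):
    """Build (but do not send) a POST to the project search endpoint."""
    url = f"{base_url.rstrip('/')}/api/projects/{project}/search"
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def titles_by_relevance(response_json):
    """Return result titles ordered from most to least relevant."""
    payload = json.loads(response_json)
    ranked = sorted(payload["results"], key=lambda r: r["relevance"], reverse=True)
    return [r["title"] for r in ranked]


# "http://localhost:8765" is a placeholder, not a documented default.
req = build_search_request(
    "http://localhost:8765", "my-app", "How do we handle user authentication?"
)
print(req.get_method(), req.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(req)` against a running server, with whatever credentials the deployment requires.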

Context Retrieval

For AI agents starting work on a task:

POST /api/projects/my-app/context/my-task-id
{
  "depth": 2
}

Response:
{
  "memory": { "id": "...", "title": "...", "content": "..." },
  "related_memories": [
    {
      "id": "...",
      "title": "...",
      "type": "...",
      "relationship": "Related"
    }
  ],
  "similar_memories": [
    { "id": "...", "relevance": 0.87 }
  ]
}

ACT-R Memory Decay

Fold implements memory decay inspired by cognitive science research (ACT-R). Memories have strength that decays over time but is boosted by access frequency.

Why Decay Matters

Human memory research shows that forgetting is a feature, not a bug:

  • Recent memories are more relevant
  • Frequently used memories matter more
  • Stale memories are less likely to be useful

Storing everything forever with equal weight creates noise.

Memory Strength Formula

strength = recency_factor × access_boost

recency_factor = exp(-age_days × ln(2) / half_life)
access_boost = log(retrieval_count + 1)    (log is the natural logarithm)

Example:

Memory created 30 days ago, accessed 5 times:
- Age: 30 days, half_life: 30 days
- recency_factor: exp(-30 × 0.693 / 30) = 0.5
- access_boost: log(6) = 1.79
- strength = 0.5 × 1.79 = 0.895
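
The formula translates directly into code. This is a sketch of the formula as written above, with a hypothetical function name. It does not claim to match Fold's implementation in details such as clamping or rounding.

```python
import math


def memory_strength(age_days, retrieval_count, half_life_days=30.0):
    """strength = recency_factor * access_boost, as defined above."""
    recency_factor = math.exp(-age_days * math.log(2) / half_life_days)
    access_boost = math.log(retrieval_count + 1)  # natural logarithm
    return recency_factor * access_boost


# Memory created 30 days ago, accessed 5 times.
print(round(memory_strength(30, 5), 3))  # 0.896
```

The result differs from 0.895 in the third decimal only because the worked example rounds log(6) to 1.79 before multiplying. Note that, taken literally, the formula gives a memory that has never been retrieved a strength of 0. Whether Fold applies a floor in that case is not stated here.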

Blending with Semantic Similarity

Search combines semantic relevance with memory strength:

score = (1 - weight) × relevance + weight × strength

With default weight=0.3:
- Perfect semantic match (1.0) with low strength (0.2): score = 0.76
- Good semantic match (0.8) with high strength (0.9): score = 0.83

Recent, frequently-accessed memories can outrank slightly better semantic matches.
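
The blend is a weighted average, and the two cases above can be reproduced with a short sketch. The function name and the candidate labels are made up for the example.

```python
def blended_score(relevance, strength, weight=0.3):
    """score = (1 - weight) * relevance + weight * strength."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return (1 - weight) * relevance + weight * strength


# (label, semantic relevance, memory strength)
candidates = [
    ("perfect match, low strength", 1.0, 0.2),
    ("good match, high strength", 0.8, 0.9),
]
ranked = sorted(candidates, key=lambda c: blended_score(c[1], c[2]), reverse=True)
for label, relevance, strength in ranked:
    print(label, round(blended_score(relevance, strength), 2))
```

With the default weight the stronger memory ranks first (0.83 against 0.76). Setting `weight=0` restores pure semantic ordering.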

Configuration

| Parameter | Default | Use Case |
|---|---|---|
| decay_half_life_days | 30 | Balance: 7d=fast-moving, 90d=longer projects |
| decay_strength_weight | 0.3 | 0=pure semantic, 1=pure strength, 0.3=balanced |

A-MEM Agentic Memory Evolution

After creating a memory, Fold can automatically suggest and create relationships.

The Process

Step 1: New Memory Created

Title: "Fixed session timeout bug"
Content: "Changed SESSION_TIMEOUT from 1 hour to 7 days..."

Step 2: Find Similar Memories

  • Embed the new memory
  • Vector search for 5 nearest neighbours
  • Get candidates for linking

Step 3: Ask LLM for Suggestions

LLM Input:
- New memory: "Fixed session timeout"
- Candidate A: "Session timeout decision"
- Candidate B: "Commit that reduced timeout"
- Candidate C: "Auth service code"

LLM Output:
{
  "links": [
    {
      "target": "Session timeout decision",
      "type": "References",
      "confidence": 0.92,
      "reasoning": "This memory implements the decision"
    },
    {
      "target": "Commit that reduced timeout",
      "type": "Related",
      "confidence": 0.85,
      "reasoning": "Related to fixing the timeout issue"
    }
  ]
}

Step 4: Store Links

  • Create memory_links in database
  • Update fold file with wiki-style links
  • Update neighbour metadata
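
Between Steps 3 and 4, LLM output generally needs checking before it is stored. The sketch below shows one plausible validation pass: keep only suggestions that name an offered candidate, use one of the four link types, and meet a confidence threshold. The `accept_links` function and the 0.8 threshold are assumptions, not documented Fold behaviour. The low-confidence third suggestion is invented to show the filter working.

```python
import json

LINK_TYPES = {"Related", "References", "DependsOn", "Modifies"}


def accept_links(llm_output: str, candidates: set[str],
                 min_confidence: float = 0.8) -> list[dict]:
    """Filter LLM link suggestions down to ones that are safe to store."""
    suggestions = json.loads(llm_output).get("links", [])
    accepted = []
    for link in suggestions:
        if link.get("target") not in candidates:
            continue  # the model named a memory that was not offered
        if link.get("type") not in LINK_TYPES:
            continue  # unknown link type
        if float(link.get("confidence", 0.0)) < min_confidence:
            continue  # below the (hypothetical) threshold
        accepted.append(link)
    return accepted


llm_output = json.dumps({"links": [
    {"target": "Session timeout decision", "type": "References", "confidence": 0.92},
    {"target": "Commit that reduced timeout", "type": "Related", "confidence": 0.85},
    {"target": "Auth service code", "type": "Related", "confidence": 0.40},
]})
candidates = {"Session timeout decision", "Commit that reduced timeout", "Auth service code"}
accepted = accept_links(llm_output, candidates)
print([link["target"] for link in accepted])
```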

Content Analysis

When creating a memory with auto-analysis enabled:

LLM extracts:
1. Keywords - Key terms and concepts (max 15)
2. Context - Detailed 3-5 sentence summary
3. Tags - Broad categories (max 6)

The context should cover:

  • What this memory does
  • Its role in the system
  • Key responsibilities
  • Important relationships
  • Notable design decisions

The Indexing Pipeline

How Fold converts source files into memories.

Data Sources

GitHub Webhook ─┐
GitLab Webhook ─┼─→ Job Queue
Polling Loop ───┤
Manual API ─────┘

Step 1: Job Queue

Files are added to a job queue for processing:

jobs (
  id: "job_123",
  project_id: "proj_abc",
  job_type: "index_repo",
  status: "pending",
  priority: 1,
  payload: { "files": [...] },
  attempts: 0,
  next_retry: now
)

Features:

  • Atomic job claiming (prevents duplicate processing)
  • Automatic retry with exponential backoff
  • Stale job recovery
  • Heartbeat to prevent timeouts
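
Two of these features, atomic claiming and exponential backoff, can be illustrated with SQLite from the Python standard library. This is a sketch under several assumptions: that jobs live in a SQL table, that a `worker` column and a `running` status exist, that a higher `priority` value means more urgent, and that the backoff base and cap have the values shown. None of these are confirmed by this guide.

```python
import sqlite3


def retry_delay_seconds(attempts, base=30, cap=3600):
    """Exponential backoff: base * 2^attempts, limited to cap."""
    return min(cap, base * 2 ** attempts)


def claim_next_job(conn, worker_id):
    """Claim one pending job, or return None if there is nothing to claim."""
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'pending' "
            "ORDER BY priority DESC, id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status check in WHERE makes the claim safe: if another worker
        # got there first, no row is updated and rowcount is 0.
        claimed = conn.execute(
            "UPDATE jobs SET status = 'running', worker = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker_id, row[0]),
        )
        return row[0] if claimed.rowcount == 1 else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT, priority INTEGER, worker TEXT)"
)
conn.execute("INSERT INTO jobs VALUES ('job_123', 'pending', 1, NULL)")

first = claim_next_job(conn, "worker-a")
second = claim_next_job(conn, "worker-b")
print(first, second)
```

The second call returns `None` because the only job is already marked as running, which is the duplicate-processing guard described above.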

Step 2: Job Worker

The worker handles each job as follows:

1. Claims job atomically
2. Checks LLM/embedding availability
3. Ensures local clone exists (clone or pull)
4. Routes to indexer
5. Updates job status

Step 3: Indexer

For each file:

1. Read from local clone
2. Skip if:
   - Empty
   - > 100KB
   - Binary/non-code
   - Matches exclude pattern

3. Calculate SHA256 hash of file path
   → memory_id = first 16 chars

4. Check if already indexed:
   - Hash unchanged? Skip
   - fold/a/b/hash.md exists? Skip

5. Continue to summarization
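
The skip rules in step 2 can be expressed as a small predicate. In this sketch the 100KB limit is read as 100 × 1024 bytes, binary files are detected with a NUL-byte heuristic, and exclude patterns are treated as shell-style globs. All three are assumptions, as is the `skip_reason` name.

```python
from fnmatch import fnmatch

MAX_BYTES = 100 * 1024  # assumption: "100KB" means 100 KiB


def skip_reason(path, data, exclude_patterns=()):
    """Return why a file should be skipped, or None if it should be indexed."""
    if len(data) == 0:
        return "empty"
    if len(data) > MAX_BYTES:
        return "too large"
    if b"\x00" in data[:8192]:
        return "binary"  # heuristic: text files rarely contain NUL bytes
    if any(fnmatch(path, pattern) for pattern in exclude_patterns):
        return "excluded"
    return None


print(skip_reason("src/auth/service.ts", b"export class AuthService {}"))
print(skip_reason("node_modules/x/index.js", b"code", ["node_modules/*"]))
```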

Step 4: LLM Summarization

Summarize file with LLM:

Input: File content, path, language

LLM extracts:
- title: "Authentication Service" (max 100 chars)
- summary: "Comprehensive 2-4 sentence description"
- keywords: Key terms (max 15)
- tags: Categories (max 6)
- exports: Public functions/classes
- dependencies: Imported modules
- created_date: Earliest date from file

Step 5: Memory Service

Create and store memory:

1. Create Memory object with metadata

2. Write to fold/a/b/hash.md
   - YAML frontmatter
   - Markdown content

3. Insert metadata into SQLite

4. Generate embedding
   - content + context + keywords + tags

5. Store in Qdrant

6. Process A-MEM evolution
   - Find 5 neighbours
   - Ask LLM for linking
   - Create memory_links
   - Update fold file with wiki links

Step 6: Auto-Commit

If enabled, commit fold/ changes:

git add fold/
git commit -m "fold: Index N files from project"

Example: Holographic Reconstruction

Let's trace a complete scenario showing how all concepts work together.

Scenario: Session Timeout Bug

Day 1: Problem

User reports: "Sessions expire after 1 hour but should be 7 days"
Developer creates task: "Fix session timeout"

Day 2: Search for Context

Developer asks: "How should session timeouts work?"

Fold searches and returns:

1. Semantic Match: src/auth.rs (0.97)
   "Handles JWT validation and session creation"

2. Recent Commit: (0.96)
   "Reduced session timeout for security"
   (probably the bug!)

3. Decision: (0.93)
   "Sessions should expire after 7 days of inactivity"

4. Related Code: Similar timeout patterns

How this works internally:

  1. Query embedded → vector
  2. Qdrant finds top matches
  3. Apply decay weighting (recent=stronger)
  4. Return ranked results

Day 3: Fix Applied

Developer:

  • Reads src/auth.rs (relevance 0.97 guided them)
  • Finds SESSION_TIMEOUT = 3600 (1 hour)
  • Changes to 604800 (7 days)
  • Commits: "Fix session timeout bug - set to 7 days per policy"

Auto-Generated Memory:

Type: commit
Source: Git
Title: "Fix session timeout - set to 7 days"
Content: "Restored SESSION_TIMEOUT to 7 days per policy,
          fixing the bug introduced by security timeout reduction"

A-MEM Evolution:

LLM automatically suggests links:

1. References → Decision "Session timeout policy"
   (confidence: 0.95)

2. Related → Commit "Reduce session timeout"
   (confidence: 0.92 - the commit that caused the bug)

3. Modifies → File "src/auth.rs"
   (confidence: 0.99)

Knowledge Graph:

Task: "Fix session timeout"
  ├─ Related → Commit: "Fix session timeout"
  │             ├─ Modifies → File: "src/auth.rs"
  │             ├─ References → Decision: "Session timeout policy"
  │             └─ Related → Commit: "Reduce timeout"
  └─ Related → Decision: "Session timeout policy"

Day 4: New Developer Onboarding

Teammate searches: "How do we handle session management?"

Fold returns the complete story:

  • Decision: why 7 days
  • Bug fix commit: what went wrong
  • Related code: patterns to follow
  • Session management: complete picture

This is holographic memory: enter at any point, reconstruct full context.


Summary

Key Takeaways:

  1. Unified Memory Model - One Memory type with source field (File, Agent, Git)
  2. Hash-Based Storage - Repo path determines identity via SHA256
  3. Git-Native - Markdown files in fold/ are source of truth
  4. Semantic Embeddings - Find meaning, not keywords
  5. Knowledge Graph - Explicit relationships between memories
  6. A-MEM Evolution - LLM-powered automatic linking
  7. Memory Decay - Recency and access frequency matter
  8. Holographic - Any fragment reconstructs full context

The result: a knowledge system in which memories are interconnected, automatically organized, and searchable by meaning, so that both humans and AI agents can recover the context of a complex project quickly.
