
A converter toolkit that takes an Anthropic Claude account message history export and converts it into an Obsidian-consumable knowledge graph.


aaronsb/llmchat-knowledge-converter


LLM Chat Knowledge Converter 🧠

Obsidian Graph View of Claude Conversations

Transform your AI conversations into a searchable knowledge base with knowledge graph integration.

Convert LLM chat exports (Claude, ChatGPT) into a unified, searchable format with full-text search, optional semantic embeddings, and support for knowledge graph ingestion. Your conversations become a queryable knowledge repository that can feed knowledge graph systems and note-taking tools such as Obsidian.

🎯 What This Does

This toolkit provides a complete pipeline for converting, searching, and ingesting LLM conversations:

graph LR
    A[Chat Export<br/>ZIP] -->|convert.py| B[Normalized<br/>Markdown]
    B -->|SQLite FTS5| C[Full-Text<br/>Search]
    B -->|Optional| D[Semantic<br/>Embeddings]
    C -->|search_chats.py| E[Find<br/>Conversations]
    E -->|kg ingest| F[Knowledge<br/>Graph]
    B -->|Optional| G[Obsidian<br/>Vault]

    style A fill:#e74c3c,stroke:#333,stroke-width:2px,color:#fff
    style B fill:#3498db,stroke:#333,stroke-width:2px,color:#fff
    style C fill:#27ae60,stroke:#333,stroke-width:2px,color:#fff
    style D fill:#9b59b6,stroke:#333,stroke-width:2px,color:#fff
    style E fill:#f39c12,stroke:#333,stroke-width:2px,color:#fff
    style F fill:#1abc9c,stroke:#333,stroke-width:2px,color:#fff
    style G fill:#34495e,stroke:#333,stroke-width:2px,color:#fff

🚀 Quick Start

Installation

git clone https://github.com/aaronsb/llmchat-knowledge-converter.git
cd llmchat-knowledge-converter
./scripts/install-pipx.sh  # One-time setup

Basic Usage

# Convert a Claude or a ChatGPT export
llmchat-convert claude export.zip --name my-vault
llmchat-convert chatgpt export.zip --name my-vault

# Search conversations
llmchat-search output/my-vault "search query"

# Get paths for knowledge graph ingestion
llmchat-search output/my-vault "topic" --json --ontology topic-name

📥 Exporting Chat History

📘 Claude Export
  1. Visit https://claude.ai/settings
  2. Click "Download my data"
  3. Wait for email with download link
  4. Download the ZIP file

Note: The export contains conversations.json, projects.json, and users.json in a single ZIP.
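Because the converter reads the ZIP directly, you can peek at a Claude export the same way. The sketch below uses only Python's standard `zipfile` and `json` modules. It assumes that `conversations.json` sits at the archive root and that each conversation has `uuid`, `name`, and `chat_messages` fields. Those names match commonly seen Claude exports but are not a published schema, so check them against your own file. `summarize_claude_export` is a hypothetical helper, not part of this toolkit.

```python
import json
import zipfile
from collections.abc import Iterator


def summarize_claude_export(zip_path: str) -> Iterator[tuple[str, str, int]]:
    """Yield (uuid, name, message_count) for each conversation in a Claude export ZIP.

    Assumes conversations.json is at the archive root and uses the
    uuid / name / chat_messages fields (verify against your own export).
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open("conversations.json") as fh:
            conversations = json.load(fh)  # json accepts the bytes read from the archive
    for conv in conversations:
        messages = conv.get("chat_messages", [])
        # Untitled chats can have an empty name; fall back to a placeholder.
        yield conv.get("uuid", ""), conv.get("name") or "Untitled", len(messages)
```

For example, `for uuid, name, count in summarize_claude_export("export.zip"): print(uuid[:8], count, name)` lists every conversation. For brevity, the sketch loads the whole file into memory. The converter's `ijson` dependency suggests it streams instead, which matters for large exports.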

📗 ChatGPT Export
  1. Sign in to ChatGPT
  2. Click your profile icon (top right)
  3. Settings → Data controls → Export data
  4. Click "Confirm export"
  5. Check email for download link (expires in 24 hours)
  6. Download the ZIP file

Note: The export includes conversations.json and any images from DALL-E or uploads.
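ChatGPT's `conversations.json` is shaped differently from Claude's. Each conversation commonly stores its messages as a tree under a `mapping` key, with `parent` links and a `current_node` pointer, because edited prompts and regenerated replies create branches.

The sketch below flattens one conversation into the branch that ends at `current_node`. Keep three caveats in mind:

- The field names reflect commonly observed exports rather than a published schema.
- `linear_thread` is a hypothetical helper, not part of this toolkit.
- The converter itself may pick branches differently.

```python
from typing import Any


def linear_thread(conversation: dict[str, Any]) -> list[tuple[str, str]]:
    """Return [(role, text), ...] for the branch ending at current_node.

    Walking parent links from current_node recovers the branch the user
    last saw; abandoned branches (old regenerations) are skipped.
    """
    mapping = conversation["mapping"]
    node_id = conversation.get("current_node")
    thread: list[tuple[str, str]] = []
    while node_id is not None:
        node = mapping[node_id]
        message = node.get("message")  # the root node usually has no message
        if message:
            role = message["author"]["role"]
            parts = message.get("content", {}).get("parts") or []
            # Non-string parts (e.g. image references) are ignored here.
            text = "\n".join(p for p in parts if isinstance(p, str)).strip()
            if text and role in ("user", "assistant"):
                thread.append((role, text))
        node_id = node.get("parent")
    thread.reverse()  # we walked leaf -> root, so flip to chronological order
    return thread
```

Following parent links from the leaf yields exactly one path to the root. This avoids having to decide among sibling `children` when walking forward from the root.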

🔄 Conversion Workflow

Convert to Searchable Format

# Basic conversion (FTS search only)
python src/convert.py claude ~/Downloads/export.zip --name my-vault --no-embeddings

# With semantic search (requires Nomic API key)
python src/convert.py chatgpt ~/Downloads/export.zip --name my-vault

# Skip tag configuration
python src/convert.py claude export.zip --name my-vault --skip-tags

Output structure:

output/my-vault/
├── conversations/              # Year/Month/Day/ConversationName/
│   └── 2025/
│       └── 11-November/
│           └── 20/
│               └── Barbecue_Philosophy_a1b2c3d4/
│                   ├── messages/
│                   │   ├── *.json          # Message metadata
│                   │   └── *.md            # Markdown content (for KG)
│                   ├── images/             # Preserved images
│                   └── metadata.json
├── conversations.db            # SQLite with FTS5
└── .obsidian/                  # Optional Obsidian config

🔍 Searching Conversations

Full-Text Search (No Embeddings Needed)

# Basic search
llmchat-search output/my-vault "barbecue"

# Keyword search
llmchat-search output/my-vault "keyword:javascript"

# Limit results
llmchat-search output/my-vault "machine learning" --limit 10
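Here is a minimal standalone sketch of what an FTS5 keyword search does underneath, using Python's built-in `sqlite3` module. The table and column names are illustrative, not the tool's actual schema. In FTS5, the hidden `rank` column orders results by BM25 relevance by default, and `snippet()` returns a highlighted excerpt. The sketch needs an SQLite build with FTS5 enabled, which most Python distributions include.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the message text is tokenized; the other columns are stored but not indexed.
conn.execute(
    "CREATE VIRTUAL TABLE messages_fts USING fts5("
    "conversation_name UNINDEXED, sender UNINDEXED, content)"
)
conn.executemany(
    "INSERT INTO messages_fts (conversation_name, sender, content) VALUES (?, ?, ?)",
    [
        ("Barbecue Philosophy", "human", "Is low and slow barbecue worth the wait?"),
        ("JS Tips", "assistant", "In JavaScript, prefer const over let."),
        ("Grill Notes", "assistant", "Barbecue smoke rings come from nitrogen dioxide."),
    ],
)


def search(query: str, limit: int = 10) -> list[tuple[str, str]]:
    """Return (conversation_name, snippet) pairs, best BM25 match first."""
    rows = conn.execute(
        """
        SELECT conversation_name,
               snippet(messages_fts, 2, '[', ']', '...', 8)  -- column 2 = content
        FROM messages_fts
        WHERE messages_fts MATCH ?
        ORDER BY rank
        LIMIT ?
        """,
        (query, limit),
    )
    return rows.fetchall()
```

The default tokenizer is case-insensitive, so `search("javascript")` finds "JavaScript". Multi-word queries such as `"machine learning"` match rows containing both terms.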

Knowledge Graph Integration

Get file paths for selective ingestion into knowledge graph systems:

# Conversation-level (directories)
llmchat-search output/my-vault "topic" --json --ontology topic-name

# File-level (markdown files)
llmchat-search output/my-vault "topic" --json --granularity file

# Message-level (all files)
llmchat-search output/my-vault "topic" --json --granularity message
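The three granularities can be read as different ways of expanding one matched conversation folder into paths. This sketch encodes one plausible reading, using the normalized layout shown in this README:

| Granularity | Paths returned |
| --- | --- |
| Conversation | The folder itself |
| File | Its Markdown files |
| Message | Every file in `messages/` |

It is an illustration only: `paths_for` is hypothetical, and the search tool's real selection logic may differ.

```python
from pathlib import Path


def paths_for(conversation_dir: Path, granularity: str = "conversation") -> list[Path]:
    """Expand one matched conversation into ingestion paths at a given granularity.

    Assumes a messages/ folder holding *.json metadata and *.md content files.
    """
    root = conversation_dir.resolve()  # absolute paths for external tools
    messages = root / "messages"
    if granularity == "conversation":
        return [root]
    if granularity == "file":
        return sorted(messages.glob("*.md"))
    if granularity == "message":
        return sorted(p for p in messages.iterdir() if p.is_file())
    raise ValueError(f"unknown granularity: {granularity!r}")
```

Returning resolved paths mirrors the README's point that output paths are absolute and ready for external tools.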

Example workflow:

# 1. Search for conversations
llmchat-search output/my-vault "philosophy consciousness" --json > results.json

# 2. Ingest into knowledge graph (if using kg system)
kg ingest directory /path/to/conversation -o philosophy-discussions -r --depth 1
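Step 2 leaves it to you to get from `results.json` to a path. The exact shape of the `--json` output is not documented here, so this sketch walks the JSON without assuming any key names and collects every string that is an absolute path. It then prints one `kg ingest` command per path, reusing only the flags shown in the workflow above.

It fits conversation-level (directory) results. Note that any unrelated string starting with `/` would also be picked up. `collect_paths` and `kg_commands` are hypothetical helpers.

```python
import json
import os
import shlex
from typing import Any


def collect_paths(node: Any) -> list[str]:
    """Recursively gather every string in a JSON value that is an absolute path.

    Deliberately schema-agnostic: no key names in the --json output are assumed.
    """
    found: list[str] = []
    if isinstance(node, str):
        if os.path.isabs(node):
            found.append(node)
    elif isinstance(node, dict):
        for value in node.values():
            found.extend(collect_paths(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_paths(item))
    return found


def kg_commands(results_file: str, ontology: str) -> list[str]:
    """Turn a results.json file into one shell-quoted kg ingest command per unique path."""
    with open(results_file, encoding="utf-8") as fh:
        paths = dict.fromkeys(collect_paths(json.load(fh)))  # dedupe, keep order
    return [
        shlex.join(["kg", "ingest", "directory", p, "-o", ontology, "-r", "--depth", "1"])
        for p in paths
    ]
```

Printing the commands instead of running them lets you review the list before anything is ingested. `shlex.join` quotes paths that contain spaces.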

🌟 Features

Unified Conversion

  • ✅ Single Tool - One CLI for both Claude and ChatGPT
  • ✅ Direct ZIP Input - No manual extraction needed
  • ✅ Auto-Detection - Provider detected automatically
  • ✅ Normalized Output - Identical structure for both providers

Search Capabilities

  • 🔍 Full-Text Search - SQLite FTS5 for instant keyword search
  • 🧠 Semantic Search - Optional Nomic embeddings (local or remote)
  • 🎯 Keyword Extraction - TF-IDF based tagging
  • 📊 Statistics - Message counts, date ranges, keyword analytics
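TF-IDF ranks a term highly when it is frequent within one conversation but rare across the whole archive, which is why it works for automatic tagging. The sketch below makes two simplifying choices:

- It uses a tiny hard-coded stop-word set.
- It applies the plain `tf * log(N / df)` weighting.

The converter lists NLTK as its keyword-extraction dependency and may tokenize, weight, and filter differently. `tfidf_keywords` is a hypothetical helper.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphanumeric words of 2+ characters, drop stop words."""
    return [w for w in re.findall(r"[a-z][a-z0-9]+", text.lower()) if w not in STOPWORDS]


def tfidf_keywords(docs: list[str], top_n: int = 3) -> list[list[tuple[str, float]]]:
    """Score each term per document by tf * idf and return the top_n per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append(ranked[:top_n])
    return results
```

A term that appears in every conversation gets an IDF of `log(1) = 0`, so archive-wide filler never becomes a tag.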

Knowledge Graph Ready

  • 📂 Multiple Granularities - Conversation, file, or message level
  • 🔗 Absolute Paths - Ready for external tool ingestion
  • 🏷️ Ontology Suggestions - Auto-generated from search queries
  • 📋 JSON Output - Structured data for pipeline integration

Content Preservation

  • 📝 Markdown Extraction - Formatting preserved
  • 💻 Code Snippets - Syntax highlighting ready
  • 🖼️ Images - DALL-E and uploaded images preserved
  • 🗂️ Metadata - Full conversation context retained

🔗 Integration with Knowledge Graph Systems

This converter is designed to work with knowledge graph systems that extract concepts and relationships from documents. The search tool outputs file paths in formats compatible with:

  • Custom knowledge graph systems (via MCP integration)
  • Graph databases (Neo4j, Apache AGE, etc.)
  • Vector databases (with semantic search)
  • Note-taking tools (Obsidian, Logseq, etc.)

Example: Continuous Knowledge Accumulation

# Weekly workflow: ingest new conversations into persistent ontologies
# 1. Export new conversations from Claude/ChatGPT
# 2. Convert to searchable format
llmchat-convert claude weekly-export.zip --name weekly-vault

# 3. Search for specific topics
llmchat-search output/weekly-vault "system architecture" --json > arch.json

# 4. Ingest selected conversations into knowledge graph
kg ingest directory /path/from/search -o system-architecture -r --depth 1

🎨 Obsidian Integration (Optional)

The converter also works as an Obsidian vault generator:

  1. Convert your export (vault created in output/)
  2. Move to your Obsidian vaults location:
    mv output/my-vault ~/Documents/ObsidianVaults/
  3. Open in Obsidian → Graph View

Obsidian MCP Plugin: Use obsidian-mcp-plugin to let Claude interact with your vault.

⚙️ Advanced Options

Embedding Generation

# Requires Nomic API key (for remote) or local model
export NOMIC_API_KEY=your_key_here
llmchat-convert claude export.zip --name vault-with-embeddings
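Once embeddings exist, semantic search usually comes down to comparing a query vector against stored vectors. This sketch ranks precomputed vectors by cosine similarity in plain Python. It does not call the Nomic client, and it does not show how the converter actually stores and compares vectors (the README lists numpy for vector operations). `semantic_rank` is a hypothetical helper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as unrelated


def semantic_rank(
    query_vec: list[float], doc_vecs: dict[str, list[float]], top_k: int = 5
) -> list[tuple[str, float]]:
    """Return the top_k (doc_id, similarity) pairs, most similar first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

Cosine similarity ignores vector length, so a long message and a short one on the same topic can still score as close neighbours.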

Tag Configuration

# Interactive tag/color setup (Obsidian graph)
llmchat-convert claude export.zip --name my-vault

# Skip interactive setup
llmchat-convert claude export.zip --name my-vault --skip-tags

Custom Exclusions

Edit src/tag_exclusions.txt to filter common words from keyword extraction.
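A sketch of how such an exclusion list can be applied, assuming one word per line. Support for blank lines and `#` comments is also an assumption here, so check the shipped file's own format. `load_exclusions` and `filter_keywords` are hypothetical helpers, not functions from `src/`.

```python
from pathlib import Path


def load_exclusions(path: Path) -> set[str]:
    """Read one excluded word per line; skip blank lines and # comments (assumed format)."""
    words: set[str] = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        word = line.split("#", 1)[0].strip().lower()  # drop trailing comments
        if word:
            words.add(word)
    return words


def filter_keywords(keywords: list[str], excluded: set[str]) -> list[str]:
    """Drop keywords found in the exclusion set, comparing case-insensitively."""
    return [k for k in keywords if k.lower() not in excluded]
```

Loading the list into a set keeps each exclusion check constant-time, even when the list grows to thousands of words.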

📊 What Gets Indexed

The SQLite database tracks:

  • Conversations: UUID, name, dates, message count, source provider
  • Messages: Sender, content, timestamps, code detection
  • Keywords: TF-IDF extracted tags with scores
  • Embeddings: Optional semantic vectors (Nomic)
  • Full-Text Search: FTS5 virtual table for instant search
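The "code detection" flag on messages can be approximated by looking for Markdown fenced code blocks. This is one plausible heuristic, not necessarily the converter's. It catches triple-backtick fences and records the language tag when one is present. It ignores inline backticks and indented code. `detect_code` is a hypothetical helper.

```python
import re

# Opening fence (three backticks + optional language tag), body, closing fence.
FENCED_BLOCK = re.compile(r"^`{3}([\w+#.-]*)[^\n]*\n.*?^`{3}", re.MULTILINE | re.DOTALL)


def detect_code(content: str) -> tuple[bool, list[str]]:
    """Return (has_code, languages) based on Markdown fenced code blocks."""
    languages = [m.group(1).lower() or "unknown" for m in FENCED_BLOCK.finditer(content)]
    return bool(languages), languages
```

The non-greedy `.*?` stops at the first closing fence. Two consecutive blocks are therefore counted separately instead of being merged into one.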

🛠️ Architecture

Normalized Data Structure

Both Claude and ChatGPT exports convert to identical structure:

conversations/YYYY/MM-MonthName/DD/ConversationName_ID/
├── messages/
│   ├── 000_human_*.json         # Message metadata
│   ├── 001_assistant_*.json
│   ├── *-001_Assistant_Message.md   # Markdown (KG-ready)
│   └── *-003_Assistant_Message.md
├── images/                      # Preserved media
└── metadata.json                # Conversation metadata

Database Schema

  • conversations - Main conversation table
  • messages - Individual messages
  • keywords - Extracted tags
  • embeddings - Semantic vectors
  • messages_fts - FTS5 virtual table
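Here is an illustrative DDL sketch of those five tables, with columns taken from the "What Gets Indexed" list. The exact column names and types are assumptions, and so is the use of an external-content FTS5 table; this is not the project's actual schema.

An external-content table indexes text that lives in `messages` without storing a second copy. It is not updated automatically, so the sketch refreshes it with FTS5's `rebuild` command.

```python
import sqlite3

SCHEMA = """
CREATE TABLE conversations (
    uuid          TEXT PRIMARY KEY,
    name          TEXT,
    created_at    TEXT,
    updated_at    TEXT,
    message_count INTEGER,
    source        TEXT CHECK (source IN ('claude', 'chatgpt'))
);
CREATE TABLE messages (
    id                INTEGER PRIMARY KEY,
    conversation_uuid TEXT REFERENCES conversations(uuid),
    sender            TEXT,
    content           TEXT,
    created_at        TEXT,
    has_code          INTEGER DEFAULT 0
);
CREATE TABLE keywords (
    conversation_uuid TEXT REFERENCES conversations(uuid),
    keyword           TEXT,
    score             REAL
);
CREATE TABLE embeddings (
    message_id INTEGER REFERENCES messages(id),
    vector     BLOB
);
-- Full-text index over messages.content, reading text from the messages table.
CREATE VIRTUAL TABLE messages_fts USING fts5(
    content, content='messages', content_rowid='id'
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO conversations VALUES "
    "('a1b2', 'Barbecue Philosophy', '2025-11-20', '2025-11-20', 1, 'claude')"
)
conn.execute(
    "INSERT INTO messages (conversation_uuid, sender, content, created_at) "
    "VALUES ('a1b2', 'human', 'Why does brisket stall?', '2025-11-20')"
)
# External-content tables are not kept in sync automatically; 'rebuild' re-reads messages.
conn.execute("INSERT INTO messages_fts(messages_fts) VALUES ('rebuild')")
```

A long-running indexer would normally keep the index in sync with triggers on `messages` rather than rebuilding it after each batch.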

Search Architecture

See docs/ADR-001-semantic-search-architecture.md for design decisions.

📋 Requirements

  • Python 3.11+
  • SQLite 3.9+ (FTS5 support)
  • Optional: Nomic API key for embeddings
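Python's `sqlite3` module uses the SQLite library it was built or linked against. The `sqlite3` command-line tool on your PATH may therefore be a different version from the one the converter sees. Run this quick check from the same interpreter you will use for the converter. `fts5_available` is just a throwaway probe.

```python
import sqlite3


def fts5_available() -> bool:
    """Return True if the linked SQLite library can create FTS5 tables."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE probe USING fts5(body)")
        return True
    except sqlite3.OperationalError:
        return False  # typically "no such module: fts5"
    finally:
        conn.close()


print(f"SQLite {sqlite3.sqlite_version}, FTS5 available: {fts5_available()}")
```

If the probe reports `False`, use a Python build linked against an FTS5-enabled SQLite.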

Python Dependencies:

ijson>=3.2.0        # Streaming JSON parsing
nltk>=3.8.0         # Keyword extraction
nomic>=3.0.0        # Optional: embeddings
numpy>=1.24.0       # Vector operations

🤝 Contributing

This tool is part of a larger knowledge management ecosystem. Contributions welcome for:

  • Additional LLM provider support (Gemini, Copilot, etc.)
  • Local embedding models (sentence-transformers)
  • Enhanced search algorithms
  • Better keyword extraction

📄 License

MIT - Free to use, modify, and distribute.

🙏 Acknowledgments

Built to integrate with knowledge graph systems that externalize LLM latent space into queryable structures. Special thanks to the open source community for SQLite FTS5, NLTK, and Nomic embeddings.


Related Projects:

Built for the AI-assisted knowledge worker who keeps their promises. 🚀
