Transform your AI conversations into a searchable knowledge base with knowledge graph integration.
Convert LLM chat exports (Claude, ChatGPT) into a unified, searchable format with full-text search, semantic embeddings, and knowledge graph ingestion capabilities. Your conversations become a queryable knowledge repository that integrates with sophisticated knowledge management systems.
This toolkit provides a complete pipeline for converting, searching, and ingesting LLM conversations:
```mermaid
graph LR
A[Chat Export<br/>ZIP] -->|convert.py| B[Normalized<br/>Markdown]
B -->|SQLite FTS5| C[Full-Text<br/>Search]
B -->|Optional| D[Semantic<br/>Embeddings]
C -->|search_chats.py| E[Find<br/>Conversations]
E -->|kg ingest| F[Knowledge<br/>Graph]
B -->|Optional| G[Obsidian<br/>Vault]
style A fill:#e74c3c,stroke:#333,stroke-width:2px,color:#fff
style B fill:#3498db,stroke:#333,stroke-width:2px,color:#fff
style C fill:#27ae60,stroke:#333,stroke-width:2px,color:#fff
style D fill:#9b59b6,stroke:#333,stroke-width:2px,color:#fff
style E fill:#f39c12,stroke:#333,stroke-width:2px,color:#fff
style F fill:#1abc9c,stroke:#333,stroke-width:2px,color:#fff
style G fill:#34495e,stroke:#333,stroke-width:2px,color:#fff
```
```bash
git clone https://github.com/aaronsb/llmchat-knowledge-converter.git
cd llmchat-knowledge-converter
./scripts/install-pipx.sh   # One-time setup

# Convert (both providers auto-detected)
llmchat-convert claude export.zip --name my-vault
llmchat-convert chatgpt export.zip --name my-vault

# Search conversations
llmchat-search output/my-vault "search query"

# Get paths for knowledge graph ingestion
llmchat-search output/my-vault "topic" --json --ontology topic-name
```

### Claude Export
- Visit https://claude.ai/settings
- Click "Download my data"
- Wait for email with download link
- Download the ZIP file
Note: The export contains conversations.json, projects.json, and users.json in a single ZIP.
### ChatGPT Export
- Sign in to ChatGPT
- Click your profile icon (top right)
- Settings → Data controls → Export data
- Click "Confirm export"
- Check email for download link (expires in 24 hours)
- Download the ZIP file
Note: The export includes conversations.json and any images from DALL-E or uploads.
```bash
# Basic conversion (FTS search only)
python src/convert.py claude ~/Downloads/export.zip --name my-vault --no-embeddings

# With semantic search (requires Nomic API key)
python src/convert.py chatgpt ~/Downloads/export.zip --name my-vault

# Skip tag configuration
python src/convert.py claude export.zip --name my-vault --skip-tags
```

Output structure:
```
output/my-vault/
├── conversations/          # Year/Month/Day/ConversationName/
│   └── 2025/
│       └── 11-November/
│           └── 20/
│               └── Barbecue_Philosophy_a1b2c3d4/
│                   ├── messages/
│                   │   ├── *.json   # Message metadata
│                   │   └── *.md     # Markdown content (for KG)
│                   ├── images/      # Preserved images
│                   └── metadata.json
├── conversations.db        # SQLite with FTS5
└── .obsidian/              # Optional Obsidian config
```
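Given this layout, collecting every KG-ready markdown file reduces to a glob over the dated directory structure. A minimal sketch with `pathlib`, assuming the `Year/Month/Day/Conversation/messages/` nesting shown above:

```python
from pathlib import Path

def markdown_files(vault: str):
    """Yield every message markdown file under conversations/,
    following the Year/Month/Day/ConversationName/messages/ layout."""
    root = Path(vault) / "conversations"
    # Four wildcard levels (year, month, day, conversation), then messages/
    yield from sorted(root.glob("*/*/*/*/messages/*.md"))
```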
```bash
# Basic search
llmchat-search output/my-vault "barbecue"

# Keyword search
llmchat-search output/my-vault "keyword:javascript"

# Limit results
llmchat-search output/my-vault "machine learning" --limit 10
```

Get file paths for selective ingestion into knowledge graph systems:

```bash
# Conversation-level (directories)
llmchat-search output/my-vault "topic" --json --ontology topic-name

# File-level (markdown files)
llmchat-search output/my-vault "topic" --json --granularity file

# Message-level (all files)
llmchat-search output/my-vault "topic" --json --granularity message
```

Example workflow:
```bash
# 1. Search for conversations
llmchat-search output/my-vault "philosophy consciousness" --json > results.json

# 2. Ingest into knowledge graph (if using kg system)
kg ingest directory /path/to/conversation -o philosophy-discussions -r --depth 1
```

- **Single Tool** - One CLI for both Claude and ChatGPT
- **Direct ZIP Input** - No manual extraction needed
- **Auto-Detection** - Provider detected automatically
- **Normalized Output** - Identical structure for both providers
- **Full-Text Search** - SQLite FTS5 for instant keyword search
- **Semantic Search** - Optional Nomic embeddings (local or remote)
- **Keyword Extraction** - TF-IDF based tagging
- **Statistics** - Message counts, date ranges, keyword analytics
- **Multiple Granularities** - Conversation, file, or message level
- **Absolute Paths** - Ready for external tool ingestion
- **Ontology Suggestions** - Auto-generated from search queries
- **JSON Output** - Structured data for pipeline integration
- **Markdown Extraction** - Formatting preserved
- **Code Snippets** - Syntax highlighting ready
- **Images** - DALL-E and uploaded images preserved
- **Metadata** - Full conversation context retained
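The TF-IDF tagging above can be sketched in a few lines. This is a toy illustration of the scoring idea (term frequency weighted by inverse document frequency), not the project's actual NLTK-based implementation:

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Toy TF-IDF keyword extractor: score each term in a document
    by tf * log(N / df) and keep the top_k highest-scoring terms.
    Terms that appear in every document score zero (log(1) == 0)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter()                      # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n / df[t])
                  for t, c in tf.items()}
        top = sorted(scores, key=scores.get, reverse=True)[:top_k]
        results.append(top)
    return results
```

A real extractor would also strip stop words (see `src/tag_exclusions.txt`) and stem tokens before scoring.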
This converter is designed to work with knowledge graph systems that extract concepts and relationships from documents. The search tool outputs file paths in formats compatible with:
- Custom knowledge graph systems (via MCP integration)
- Graph databases (Neo4j, Apache AGE, etc.)
- Vector databases (with semantic search)
- Note-taking tools (Obsidian, Logseq, etc.)
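For pipeline integration, the saved `--json` search results can be filtered before ingestion. A sketch, assuming a hypothetical schema of `{"results": [{"path": ..., "score": ...}]}` — check your actual `--json` output, since the real field names may differ:

```python
import json

def extract_paths(results_file: str, min_score: float = 0.0):
    """Pull absolute paths out of a saved search result file.

    Assumes a hypothetical schema like
    {"results": [{"path": "/abs/path", "score": 0.8}, ...]};
    verify against the tool's real JSON output before relying on it."""
    with open(results_file) as f:
        data = json.load(f)
    return [r["path"] for r in data.get("results", [])
            if r.get("score", 0.0) >= min_score]
```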
```bash
# Weekly workflow: ingest new conversations into persistent ontologies

# 1. Export new conversations from Claude/ChatGPT

# 2. Convert to searchable format
llmchat-convert claude weekly-export.zip --name weekly-vault

# 3. Search for specific topics
llmchat-search output/weekly-vault "system architecture" --json > arch.json

# 4. Ingest selected conversations into knowledge graph
kg ingest directory /path/from/search -o system-architecture -r --depth 1
```

The converter also works as an Obsidian vault generator:
1. Convert your export (vault created in `output/`)
2. Move to your Obsidian vaults location: `mv output/my-vault ~/Documents/ObsidianVaults/`
3. Open in Obsidian → Graph View
Obsidian MCP Plugin: Use obsidian-mcp-plugin to let Claude interact with your vault.
```bash
# Requires Nomic API key (for remote) or local model
export NOMIC_API_KEY=your_key_here
llmchat-convert claude export.zip --name vault-with-embeddings
```

```bash
# Interactive tag/color setup (Obsidian graph)
llmchat-convert claude export.zip --name my-vault

# Skip interactive setup
llmchat-convert claude export.zip --name my-vault --skip-tags
```

Edit `src/tag_exclusions.txt` to filter common words from keyword extraction.
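Once embeddings are stored, semantic search reduces to cosine similarity between a query vector and the stored message vectors. A minimal numpy sketch — the vectors here are placeholders, not real Nomic embeddings:

```python
import numpy as np

def rank_by_similarity(query_vec, message_vecs):
    """Rank stored message vectors by cosine similarity to the query.

    message_vecs is assumed to be a 2-D float array (one row per
    message); real vectors would come from Nomic or a local model."""
    q = query_vec / np.linalg.norm(query_vec)
    m = message_vecs / np.linalg.norm(message_vecs, axis=1, keepdims=True)
    sims = m @ q                 # cosine similarity per message
    return np.argsort(-sims)     # indices, best match first
```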
The SQLite database tracks:
- Conversations: UUID, name, dates, message count, source provider
- Messages: Sender, content, timestamps, code detection
- Keywords: TF-IDF extracted tags with scores
- Embeddings: Optional semantic vectors (Nomic)
- Full-Text Search: FTS5 virtual table for instant search
Both Claude and ChatGPT exports convert to an identical structure:
```
conversations/YYYY/MM-MonthName/DD/ConversationName_ID/
├── messages/
│   ├── 000_human_*.json            # Message metadata
│   ├── 001_assistant_*.json
│   ├── *-001_Assistant_Message.md  # Markdown (KG-ready)
│   └── *-003_Assistant_Message.md
├── images/                         # Preserved media
└── metadata.json                   # Conversation metadata
```
Database tables:

- `conversations` - Main conversation table
- `messages` - Individual messages
- `keywords` - Extracted tags
- `embeddings` - Semantic vectors
- `messages_fts` - FTS5 virtual table
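The FTS5 table can also be queried directly with Python's `sqlite3` for scripting outside the CLI. A minimal sketch; the table name `messages_fts` matches the schema above, but the column layout (a single indexed `content` column) is an assumption:

```python
import sqlite3

def search_messages(db_path: str, query: str, limit: int = 10):
    """Full-text search against the FTS5 index, returning
    (rowid, snippet) pairs with match terms wrapped in [brackets].
    Assumes messages_fts has its text in column 0."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT rowid, snippet(messages_fts, 0, '[', ']', '...', 8) "
            "FROM messages_fts WHERE messages_fts MATCH ? "
            "ORDER BY rank LIMIT ?",
            (query, limit),
        ).fetchall()
    finally:
        conn.close()
```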
See `docs/ADR-001-semantic-search-architecture.md` for design decisions.
- Python 3.11+
- SQLite 3.9+ (FTS5 support)
- Optional: Nomic API key for embeddings
Python Dependencies:
```
ijson>=3.2.0     # Streaming JSON parsing
nltk>=3.8.0      # Keyword extraction
nomic>=3.0.0     # Optional: embeddings
numpy>=1.24.0    # Vector operations
```
This tool is part of a larger knowledge management ecosystem. Contributions welcome for:
- Additional LLM provider support (Gemini, Copilot, etc.)
- Local embedding models (sentence-transformers)
- Enhanced search algorithms
- Better keyword extraction
MIT - Free to use, modify, and distribute.
Built to integrate with knowledge graph systems that externalize LLM latent space into queryable structures. Special thanks to the open source community for SQLite FTS5, NLTK, and Nomic embeddings.
Related Projects:
- Knowledge Graph System - Large Concept Model for persistent AI memory
- Obsidian MCP Plugin - Claude integration for Obsidian vaults
Built for the AI-assisted knowledge worker who keeps their promises.
