TokenSmith is a Retrieval-Augmented Generation (RAG) application that enables intelligent document search and question answering using local LLMs. Built with llama.cpp for efficient inference and FAISS for high-performance vector search, TokenSmith allows you to index PDF documents and chat with them using natural language queries.
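At its core this is the standard retrieve-then-generate loop. The sketch below illustrates that loop with `sentence-transformers` and FAISS directly; the chunk texts, prompt format, and variable names are illustrative, not TokenSmith's actual API:

```python
# Minimal retrieve-then-generate sketch (illustrative, not TokenSmith's API):
# embed chunks, search FAISS for the closest one, prompt a local model with it.
import faiss
from sentence_transformers import SentenceTransformer

chunks = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "llama.cpp runs GGUF-quantized language models locally on CPU or GPU.",
]

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vectors = embedder.encode(chunks).astype("float32")

index = faiss.IndexFlatL2(vectors.shape[1])  # exact L2 search over chunk vectors
index.add(vectors)

question = "How do I run models locally?"
query = embedder.encode([question]).astype("float32")
_, ids = index.search(query, 1)              # top-1 nearest chunk
context = chunks[ids[0][0]]

# The retrieved context is prepended to the question and handed to the local
# llama.cpp model; the exact prompt template depends on the model in use.
prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"
```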
- 📚 PDF Document Processing: Extract and index content from PDF documents
- 🔍 Intelligent Retrieval: Fast semantic search using FAISS vector database
- 🤖 Local LLM Integration: Powered by llama.cpp for privacy-focused inference
- ⚡ Hardware Acceleration: Supports Metal (Apple Silicon), CUDA (NVIDIA), and CPU inference
- 🎯 Flexible Chunking: Token-based or character-based document segmentation
- 📊 Visualization Support: Optional indexing progress visualization
- 🛠️ Production-Ready: Conda-based environment management with automated builds
- 🔧 Configurable: YAML-based configuration system
- Python: 3.9+
- Conda/Miniconda: For environment management
- System Requirements:
  - macOS: Xcode Command Line Tools
  - Linux: GCC, make, cmake
  - Windows: Visual Studio Build Tools (for compilation)
```bash
git clone https://github.com/georgia-tech-db/TokenSmith.git
cd TokenSmith
make build
```
This will:

- Create a conda environment named `tokensmith`
- Install all Python dependencies
- Detect or build llama.cpp with platform-specific optimizations
- Install TokenSmith in development mode
```bash
conda activate tokensmith

# Place your PDF files in the data directory
mkdir -p data/chapters
cp your-documents.pdf data/chapters/
```
```bash
# Index with default settings
make run-index

# Or with custom parameters, e.g.:
make run-index ARGS="--pdf_range 1-10 --chunk_mode chars --visualize"
```
```bash
# Activate environment first (required for interactive mode)
conda activate tokensmith
python -m src.main chat
```
If you get an error about a missing model, download `qwen2.5-0.5b-instruct-q5_k_m.gguf` into your `llama.cpp/models` directory.
```bash
conda deactivate
```
TokenSmith uses YAML configuration files, resolved in the following priority order:

1. Command-line `--config` argument
2. User config (`~/.config/tokensmith/config.yaml`)
3. Default config (`config/config.yaml`)
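A minimal sketch of that lookup order follows; `resolve_config` is a hypothetical helper, and the actual loader may merge files rather than pick the first match:

```python
# Hypothetical config resolution following the priority order above.
from pathlib import Path
from typing import Optional

import yaml

def resolve_config(cli_config: Optional[str] = None) -> dict:
    candidates = [
        Path(cli_config) if cli_config else None,         # 1. --config argument
        Path.home() / ".config/tokensmith/config.yaml",   # 2. user config
        Path("config/config.yaml"),                       # 3. default config
    ]
    for path in candidates:
        if path is not None and path.exists():
            with open(path) as f:
                return yaml.safe_load(f)
    raise FileNotFoundError("no TokenSmith configuration file found")
```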
```yaml
# config/config.yaml
embed_model: "sentence-transformers/all-MiniLM-L6-v2"
top_k: 5
max_gen_tokens: 400
halo_mode: "none"
seg_filter: null

# Model settings
model_path: "models/qwen2.5-0.5b-instruct-q5_k_m.gguf"

# Indexing settings
chunk_mode: "tokens"  # or "chars"
chunk_tokens: 500
chunk_size_char: 20000
```
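The two `chunk_mode` values correspond to the two segmentation strategies. A rough illustration of the difference, using whitespace-separated words as a stand-in for model tokens (the real splitter counts actual tokens):

```python
# Rough illustration of the two chunking modes; whitespace words stand in for
# model tokens here, whereas the real splitter counts actual tokens.
from typing import List

def chunk_by_tokens(text: str, chunk_tokens: int = 500) -> List[str]:
    words = text.split()
    return [" ".join(words[i:i + chunk_tokens])
            for i in range(0, len(words), chunk_tokens)]

def chunk_by_chars(text: str, chunk_size_char: int = 20000) -> List[str]:
    return [text[i:i + chunk_size_char]
            for i in range(0, len(text), chunk_size_char)]
```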
```bash
make run-index
make run-index ARGS="--pdf_range <start_page_number>-<end_page_number> --chunk_mode <tokens_or_chars>"
make run-index ARGS="--keep_tables --visualize --chunk_tokens <number_of_chunk_tokens>"
make run-index ARGS="--pdf_dir <path_to_pdf> --index_prefix book_index --config <path_to_yaml_config_file>"
```

```bash
python -m src.main chat --config <path_to_yaml_config_file> --model_path <path_to_llm_model>
```
```bash
export LLAMA_CPP_BINARY=/usr/local/bin/llama-cli
make build
```
```bash
make update-env
make export-env
make show-deps
```
- `mode`: Operation mode (`index` or `chat`)
- `--config`: Configuration file path
- `--pdf_dir`: Directory containing PDF files
- `--index_prefix`: Prefix for index files
- `--model_path`: Path to GGUF model file
- `--pdf_range`: Process specific page range (e.g., "1-10")
- `--chunk_mode`: Chunking strategy (`tokens` or `chars`)
- `--chunk_tokens`: Tokens per chunk (default: 500)
- `--chunk_size_char`: Characters per chunk (default: 20000)
- `--keep_tables`: Preserve table formatting
- `--visualize`: Show indexing progress visualization
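For orientation, here is an argparse layout consistent with the flags above; it is illustrative only, as the real parser lives in `src/main.py` and may differ:

```python
# Hypothetical argparse wiring matching the documented flags.
import argparse

parser = argparse.ArgumentParser(prog="tokensmith")
parser.add_argument("mode", choices=["index", "chat"], help="operation mode")
parser.add_argument("--config", help="configuration file path")
parser.add_argument("--pdf_dir", help="directory containing PDF files")
parser.add_argument("--index_prefix", help="prefix for index files")
parser.add_argument("--model_path", help="path to GGUF model file")
parser.add_argument("--pdf_range", help='page range to process, e.g. "1-10"')
parser.add_argument("--chunk_mode", choices=["tokens", "chars"], default="tokens")
parser.add_argument("--chunk_tokens", type=int, default=500)
parser.add_argument("--chunk_size_char", type=int, default=20000)
parser.add_argument("--keep_tables", action="store_true")
parser.add_argument("--visualize", action="store_true")

args = parser.parse_args(["index", "--pdf_range", "1-10"])  # example invocation
```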
```bash
make help          # Show all available commands
make env           # Create conda environment
make build-llama   # Build llama.cpp from source
make install       # Install package in development mode
make build         # Full build process
make test          # Run tests
make clean         # Clean build artifacts
make show-deps     # Show installed packages
make update-env    # Update environment
make export-env    # Export environment with exact versions
```
```bash
# Add a new conda package
conda activate tokensmith
conda install new-package
```

To persist the change across rebuilds, add the package to `environment.yml`, then:

```bash
make update-env
```
TokenSmith includes a comprehensive benchmark testing framework that evaluates answer quality across multiple similarity metrics. The framework uses pytest with YAML-defined test cases for easy management and execution.
Test cases are defined in `tests/benchmarks.yaml`. Each benchmark includes a question, expected answer, keywords, and similarity threshold:
```yaml
# tests/benchmarks.yaml
benchmarks:
  - id: "unique_test_id"
    question: "Your question here?"
    expected_answer: "The expected answer that should be generated."
    keywords: ["key", "terms", "to", "match"]
    similarity_threshold: 0.65  # Minimum score to pass (0.0-1.0)

  - id: "ml_basics"
    question: "What is machine learning?"
    expected_answer: "Machine learning is a subset of artificial intelligence that enables computers to learn and make decisions from data without being explicitly programmed."
    keywords: ["machine learning", "artificial intelligence", "data", "learn", "algorithm"]
    similarity_threshold: 0.6
```
Required fields:

- `id`: Unique identifier for the test case
- `question`: The question to ask TokenSmith
- `expected_answer`: Reference answer for comparison
- `keywords`: List of important terms to check for
- `similarity_threshold`: Minimum similarity score (0.6-0.8 recommended)
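A sketch of how such YAML cases can drive pytest parametrization; this is illustrative only (the real harness is `tests/test_benchmarks.py`, and `ask_tokensmith` is a hypothetical stand-in for the chat pipeline):

```python
# Illustrative pytest parametrization over the YAML benchmark cases.
import pytest
import yaml

with open("tests/benchmarks.yaml") as f:
    CASES = yaml.safe_load(f)["benchmarks"]

def ask_tokensmith(question: str) -> str:
    """Hypothetical stand-in for running the chat pipeline on one question."""
    raise NotImplementedError

@pytest.mark.parametrize("case", CASES, ids=[c["id"] for c in CASES])
def test_benchmark(case):
    answer = ask_tokensmith(case["question"])
    score = similarity_score(answer, case)  # weighted score, sketched below
    assert score >= case["similarity_threshold"]
```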
Scoring weights:
- Text similarity: 30%
- Semantic similarity: 50%
- Keyword matching: 20%
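A composite score with those weights might look like the following; this is hypothetical, and the real metrics (in particular the embedding-based semantic similarity) are more involved:

```python
# Hypothetical weighted score: 30% text overlap, 50% semantic, 20% keywords.
from difflib import SequenceMatcher

def similarity_score(answer: str, case: dict, semantic: float = 0.0) -> float:
    # Text similarity via difflib ratio (a cheap stand-in for the real metric).
    text_sim = SequenceMatcher(None, answer.lower(),
                               case["expected_answer"].lower()).ratio()
    # Fraction of expected keywords that appear in the answer.
    hits = sum(kw.lower() in answer.lower() for kw in case["keywords"])
    keyword_sim = hits / len(case["keywords"])
    # `semantic` would come from embedding cosine similarity in practice.
    return 0.3 * text_sim + 0.5 * semantic + 0.2 * keyword_sim
```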
```bash
make test-benchmarks
make test-benchmarks ARGS="--index-prefix my_test_index --timeout 600 --model_path models/custom-model.gguf"
make test-quick
```

Run a single benchmark by id:

```bash
conda activate tokensmith
pytest tests/test_benchmarks.py::test_tokensmith_benchmark -k "ml_basics" -v
```
Test results are automatically generated in:

- `tests/results/benchmark_results.json` - Detailed JSON data
- `tests/results/benchmark_summary.html` - Visual HTML report
- `tests/results/failed_tests.log` - Failed test details
Open the HTML report:

```bash
make show-test-results
```

Clear out previous results:

```bash
make clean-test-results
```