A production-ready Retrieval-Augmented Generation (RAG) system built with OpenAI embeddings, Pinecone vector database, and GPT-5. The system processes The Alchemist by Paulo Coelho to answer questions about the book with source citations.
This project implements a complete RAG pipeline with separate offline indexing and online query stages. The system chunks documents intelligently, generates high-dimensional embeddings, stores them in Pinecone's HNSW index, and retrieves relevant context for GPT-powered question answering.
Key Features:
- Semantic search with 3072-dimensional embeddings
- Intelligent sentence-based chunking (400-800 tokens)
- Top-K retrieval with configurable similarity thresholds
- GPT-5 powered answer generation with source citations
- Comprehensive logging and performance tracking
- 62 passing unit tests across all modules
```mermaid
graph TB
    subgraph "Offline Pipeline (Indexing)"
        A[document.txt<br/>The Alchemist] --> B[Chunking Module<br/>Sentence-based splitting]
        B --> C[Text Embedder<br/>OpenAI text-embedding-3-large]
        C --> D[Vector Indexer<br/>Pinecone HNSW]
        D --> E[(Pinecone Index<br/>78 chunks)]
    end

    subgraph "Online Pipeline (Query)"
        F[User Query] --> G[Query Processor<br/>Validation & Embedding]
        G --> H[Vector Retriever<br/>Semantic Search]
        E --> H
        H --> I[Response Generator<br/>GPT-5]
        I --> J[Answer with Citations]
    end

    style A fill:#e1f5ff,color:#000
    style E fill:#ffe1e1,color:#000
    style F fill:#e1ffe1,color:#000
    style J fill:#ffe1ff,color:#000
```
```
rag-prototype/
├── config.py              # Configuration management
├── document.txt           # Source text (The Alchemist)
├── .env                   # API credentials (not in repo)
├── .env.example           # Environment template
├── pyproject.toml         # Poetry dependencies
│
├── offline/               # Indexing pipeline
│   ├── chunking.py        # Sentence-based text chunking
│   ├── embedding.py       # OpenAI embedding generation
│   └── indexing.py        # Pinecone vector storage
│
├── online/                # Query pipeline
│   ├── query.py           # Query processing & embedding
│   ├── retrieval.py       # Vector similarity search
│   └── generation.py      # GPT answer generation
│
├── scripts/               # Executable scripts
│   ├── setup.py           # NLTK data download
│   ├── run_indexing.py    # Offline pipeline orchestrator
│   └── run_query.py       # Online pipeline orchestrator
│
├── tests/                 # Comprehensive test suite
│   ├── test_embedding.py  # 5 tests
│   ├── test_query.py      # 21 tests
│   ├── test_retrieval.py  # 17 tests
│   └── test_generation.py # 19 tests
│
└── docs/                  # Documentation
    ├── SETUP-GUIDE.md     # Detailed setup instructions
    └── TEST_CASES.md      # 20 example queries
```
| Component | Technology | Purpose |
|---|---|---|
| Language | Python 3.13 | Core implementation |
| Dependency Management | Poetry | Package management |
| Embeddings | OpenAI text-embedding-3-large | 3072-dimensional vectors |
| Vector DB | Pinecone (Serverless) | HNSW index for fast search |
| LLM | OpenAI GPT-5 | Answer generation |
| Chunking | NLTK + tiktoken | Sentence splitting + token counting |
| Logging | Loguru | Structured logging |
| Testing | pytest + unittest.mock | Unit & integration tests |
| Retry Logic | Tenacity | Exponential backoff for API calls |
```mermaid
sequenceDiagram
    participant Doc as document.txt
    participant Chunk as Chunker
    participant Embed as Embedder
    participant Pine as Pinecone

    Doc->>Chunk: Read text (226KB)
    Chunk->>Chunk: Split into sentences (NLTK)
    Chunk->>Chunk: Group by token count (400-800)
    Chunk->>Chunk: Add overlap (80-150 tokens)
    Chunk->>Embed: 78 chunks
    Embed->>Embed: Batch process (100/batch)
    Embed->>Pine: Upsert vectors (3072D)
    Pine->>Pine: Build HNSW index
    Note over Pine: Ready for queries
```
Steps:
1. **Load Document** - Read The Alchemist text (226KB)
2. **Chunking** - Sentence-based splitting with 400-800 tokens per chunk
3. **Embedding** - Generate 3072-dimensional vectors using OpenAI
4. **Indexing** - Upload to Pinecone with an HNSW index
5. **Result** - 78 searchable chunks ready for retrieval
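The chunking step can be sketched as follows. This is an illustrative, dependency-free approximation: the real module (`offline/chunking.py`) uses NLTK for sentence splitting and tiktoken for exact token counts, while this sketch substitutes a regex splitter and whitespace word counts, and the function name is a placeholder.

```python
import re

def chunk_text(text: str, max_tokens: int = 800, overlap: int = 80) -> list[str]:
    """Group sentences into chunks of at most max_tokens "tokens" (here: words),
    carrying roughly `overlap` tokens from the end of one chunk into the next."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())  # crude stand-in for a tiktoken count
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            # Seed the next chunk with the tail of this one (the overlap window)
            tail = " ".join(current).split()[-overlap:]
            current, count = [" ".join(tail)], len(tail)
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Sentence-based grouping keeps each chunk semantically coherent, and the overlap window preserves context that would otherwise be cut at chunk boundaries.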
```mermaid
sequenceDiagram
    participant User
    participant Query as QueryProcessor
    participant Retriever as VectorRetriever
    participant Generator as ResponseGenerator
    participant GPT as GPT-5

    User->>Query: "What is Santiago's Personal Legend?"
    Query->>Query: Validate & preprocess
    Query->>Query: Generate embedding (3072D)
    Query->>Retriever: Search vector
    Retriever->>Retriever: Cosine similarity search
    Retriever->>Retriever: Filter by threshold (0.6)
    Retriever->>Generator: Top-K chunks (10)
    Generator->>Generator: Construct context with [Source N]
    Generator->>GPT: System + User message
    GPT->>Generator: Answer + token usage
    Generator->>User: Answer with citations
```
Steps:
1. **Query Processing** - Validate and embed the user question
2. **Retrieval** - Find top-K similar chunks (default: 10, threshold: 0.6)
3. **Context Construction** - Format retrieved chunks with source citations
4. **Generation** - GPT-5 generates an answer from the retrieved context
5. **Response** - Return the answer with source references and token usage
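In production the similarity search runs inside Pinecone's index, but the scoring logic the retriever relies on (cosine similarity, threshold filter, top-K cut) can be shown in miniature. The function names below are illustrative, not the project's actual API:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec: list[float],
             chunk_vecs: dict[str, list[float]],
             top_k: int = 10,
             min_score: float = 0.6) -> list[tuple[str, float]]:
    """Return (chunk_id, score) pairs scoring at least min_score, best first."""
    scored = [(cid, cosine_similarity(query_vec, v)) for cid, v in chunk_vecs.items()]
    scored = [(cid, s) for cid, s in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The threshold filter is why low-relevance queries can return fewer than K chunks, or none at all.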
- Python 3.13+
- Poetry
- OpenAI API key
- Pinecone API key
1. Clone the repository

```bash
git clone <repository-url>
cd rag-prototype
```

2. Install dependencies

```bash
poetry install
```

3. Download NLTK data

```bash
poetry run python scripts/setup.py
```

4. Configure environment

```bash
cp .env.example .env
# Edit .env with your API keys:
# - OPENAI_API_KEY
# - PINECONE_API_KEY
# - PINECONE_ENVIRONMENT
```

Run once to index The Alchemist into Pinecone:

```bash
poetry run python scripts/run_indexing.py
```

Expected output:
```
✓ Loaded document (226.19 KB)
✓ Created 78 chunks
✓ Generated 78 embeddings (3072 dimensions)
✓ Uploaded 78 vectors to Pinecone
✓ Indexing complete (15.34s)
```
Single query mode:

```bash
poetry run python scripts/run_query.py "What is Santiago's Personal Legend?"
```

Interactive mode (multiple queries):

```bash
poetry run python scripts/run_query.py
# Then enter queries interactively
```

With custom parameters:

```bash
# Retrieve more chunks
poetry run python scripts/run_query.py -k 15 "Who is Fatima?"

# Lower similarity threshold
poetry run python scripts/run_query.py -s 0.5 "What is alchemy?"

# Verbose mode (show detailed progress)
poetry run python scripts/run_query.py -v "What are omens?"

# Combine parameters
poetry run python scripts/run_query.py -k 5 -s 0.7 -v "Who is the alchemist?"
```

Example output:

```
================================================================================
ANSWER:
================================================================================
Santiago's Personal Legend is to search for and find the hidden treasure
revealed in his dream near the Egyptian Pyramids. [Source 2][Source 1]
================================================================================
METADATA:
================================================================================
Model: gpt-5
Sources: 2 chunks
Prompt tokens: 1682
Response tokens: 549
Total tokens: 2231
Total time: 16.91s

Source Chunk IDs:
  [1] 11 (similarity: 0.6581)
  [2] 10 (similarity: 0.6238)
================================================================================
```
Run the comprehensive test suite:

```bash
# All tests
poetry run pytest

# Specific module
poetry run pytest tests/test_generation.py

# With verbose output
poetry run pytest -v

# With coverage
poetry run pytest --cov=offline --cov=online --cov=config
```

Test Coverage:
- ✓ 5 tests - Embedding module
- ✓ 21 tests - Query processing
- ✓ 17 tests - Vector retrieval
- ✓ 19 tests - Answer generation
- Total: 62 passing tests
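The tests mock the network boundary rather than calling OpenAI. The pattern looks roughly like this; `embed_texts` is a placeholder name standing in for the project's actual embedding wrapper:

```python
from unittest.mock import MagicMock

def embed_texts(texts: list[str], client) -> list[list[float]]:
    """Stand-in for the embedding wrapper: calls the client, returns vectors."""
    response = client.embeddings.create(
        model="text-embedding-3-large", input=texts
    )
    return [item.embedding for item in response.data]

def test_embed_texts_returns_one_vector_per_input():
    # Fake the OpenAI client so no network call or API key is needed
    client = MagicMock()
    fake_item = MagicMock()
    fake_item.embedding = [0.1] * 3072
    client.embeddings.create.return_value = MagicMock(data=[fake_item, fake_item])

    vectors = embed_texts(["a", "b"], client)

    assert len(vectors) == 2
    assert len(vectors[0]) == 3072
    client.embeddings.create.assert_called_once()
```

Mocking at this boundary keeps the suite fast, deterministic, and free of API costs.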
See docs/TEST_CASES.md for 20 example queries organized by category:
- Character questions (Santiago, Fatima, Alchemist)
- Concepts & themes (Personal Legend, Soul of the World)
- Plot & journey (Dreams, treasure, crystal shop)
- Symbolic questions (Alchemy, omens, love)
Example:

```bash
poetry run python scripts/run_query.py "What does Maktub mean?"
poetry run python scripts/run_query.py "How does Santiago learn the Language of the World?"
poetry run python scripts/run_query.py "What is the significance of the treasure location?"
```

Key parameters in `.env`:
| Parameter | Default | Description |
|---|---|---|
| `OPENAI_EMBEDDING_MODEL` | text-embedding-3-large | Embedding model |
| `OPENAI_CHAT_MODEL` | gpt-5 | Generation model |
| `OPENAI_EMBEDDING_DIMENSIONS` | 3072 | Vector dimensions |
| `CHUNK_MIN_TOKENS` | 400 | Minimum chunk size |
| `CHUNK_MAX_TOKENS` | 800 | Maximum chunk size |
| `CHUNK_OVERLAP_MIN` | 80 | Minimum overlap tokens |
| `CHUNK_OVERLAP_MAX` | 150 | Maximum overlap tokens |
| `TOP_K` | 10 | Number of chunks to retrieve |
| `RETRIEVAL_MIN_SCORE` | 0.6 | Similarity threshold |
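A minimal sketch of how `config.py` might surface these values; the real module's structure may differ, but the environment variable names and defaults match the table above:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    chunk_min_tokens: int
    chunk_max_tokens: int
    top_k: int
    retrieval_min_score: float

def load_settings() -> Settings:
    """Read tunables from the environment, falling back to documented defaults."""
    return Settings(
        chunk_min_tokens=int(os.getenv("CHUNK_MIN_TOKENS", "400")),
        chunk_max_tokens=int(os.getenv("CHUNK_MAX_TOKENS", "800")),
        top_k=int(os.getenv("TOP_K", "10")),
        retrieval_min_score=float(os.getenv("RETRIEVAL_MIN_SCORE", "0.6")),
    )
```

Centralizing the parsing this way means the CLI flags (`-k`, `-s`) only need to override one settings object rather than re-reading the environment.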
- SETUP-GUIDE.md - Detailed implementation guide
- TEST_CASES.md - Example queries with expected outputs
```
poetry run python scripts/run_query.py [OPTIONS] [QUERY]
```

Options:

```
-k, --top-k INT          Number of chunks to retrieve (overrides config)
-s, --min-score FLOAT    Minimum similarity score (overrides config)
-t, --temperature FLOAT  Sampling temperature (default: 0.7, ignored for GPT-5)
-m, --max-tokens INT     Maximum tokens to generate
-v, --verbose            Show detailed progress
```

Examples:

```bash
# Single query
poetry run python scripts/run_query.py "What is Santiago's dream?"

# Interactive mode
poetry run python scripts/run_query.py

# Custom parameters
poetry run python scripts/run_query.py -k 5 -s 0.7 "Who is the alchemist?"

# Verbose output
poetry run python scripts/run_query.py -v "What are omens?"
```

Typical Query Performance:
- Query processing: ~1.4s (embedding generation)
- Retrieval: ~0.5s (Pinecone search)
- Generation: ~12-15s (GPT-5 response)
- Total: ~14-17s per query
Token Usage:
- Embedding: ~50-100 tokens per query
- Prompt: 1,500-8,000 tokens (depends on retrieved chunks)
- Completion: 200-800 tokens (depends on answer complexity)
- Average: ~2,000-5,000 total tokens per query
Cost Estimates (OpenAI pricing):
- Embedding: $0.00013 per query
- GPT-5 generation: ~$0.01-0.05 per query (varies by token usage)
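The per-query cost follows directly from the token counts above. A back-of-envelope calculator (the rates passed in the example are placeholders, not actual OpenAI prices; check current pricing):

```python
def query_cost(prompt_tokens: int, completion_tokens: int,
               prompt_rate_per_m: float, completion_rate_per_m: float) -> float:
    """Cost in dollars for one query, given rates in $ per 1M tokens."""
    return (prompt_tokens * prompt_rate_per_m
            + completion_tokens * completion_rate_per_m) / 1_000_000

# Using the example query above (1682 prompt + 549 completion tokens)
# with hypothetical rates of $10/1M prompt and $30/1M completion tokens:
cost = query_cost(1682, 549, 10.0, 30.0)
```

Tracking `usage` from each API response (as the Metadata section shows) makes this calculation exact per query.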
The system includes comprehensive error handling:
- ✓ Retry logic with exponential backoff (3 attempts)
- ✓ Query validation (length, empty checks)
- ✓ Graceful degradation (continues with empty results)
- ✓ Detailed logging for debugging
- ✓ Token usage tracking for cost monitoring
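The project uses Tenacity for the retry behavior; the equivalent logic, sketched here with the standard library only to show what "3 attempts with exponential backoff" means:

```python
import functools
import time

def retry_with_backoff(attempts: int = 3, base_delay: float = 1.0):
    """Retry a flaky call up to `attempts` times, doubling the wait each try."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of attempts: surface the original error
                    time.sleep(delay)
                    delay *= 2
        return wrapper
    return decorator
```

Tenacity adds niceties on top of this (jitter, per-exception filters, logging hooks), which is why the project depends on it rather than hand-rolling.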
This is a prototype project for demonstrating RAG pipeline implementation. For production use, consider:
- Adding authentication and rate limiting
- Implementing caching for repeated queries
- Supporting multiple document sources
- Adding hybrid search (keyword + semantic)
- Implementing conversation memory for follow-up questions
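As one example, the repeated-query caching item could start as simple memoization of the pipeline entry point. `answer_query` here is a placeholder for the real retrieve-and-generate call, not the project's actual function:

```python
from functools import lru_cache

def answer_query(question: str) -> str:
    # Placeholder for the full pipeline: embed, retrieve, generate.
    return f"answer to: {question}"

@lru_cache(maxsize=256)
def cached_answer(question: str) -> str:
    """Identical questions skip embedding, retrieval, and generation entirely."""
    return answer_query(question)
```

A production cache would normalize the question text and add a TTL, but even this saves the full ~14-17s round trip on exact repeats.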
See LICENSE file for details.
- Document Source: The Alchemist by Paulo Coelho
- Technologies: OpenAI, Pinecone, NLTK, Poetry
- Testing: pytest, unittest.mock