Paul Graham Essays RAG System

A Retrieval-Augmented Generation (RAG) system that allows you to query and interact with Paul Graham's essays using natural language. This system combines web scraping, text embedding, and OpenAI's language models to provide intelligent responses based on Paul Graham's writings.

Components

1. Scraper (`scraper.py`)

Scrapes essays from Paul Graham's website
Extracts essay titles, content, and URLs
Saves data in a structured format
Includes error handling and logging

2. Embeddings Generator (`embeddings.py`)

Processes scraped essays into chunks
Generates embeddings using OpenAI's API
Implements retry logic and error handling
Saves embeddings for efficient retrieval

3. RAG System (`rag.py`)

Provides a conversational interface to query the essays
Uses embeddings for semantic search
Generates context-aware responses
Includes conversation history for better context

Usage

Scraping Essays

from scraper import scrape_essays
essays = scrape_essays()

Generating Embeddings

from embeddings import generate_embeddings
embeddings = generate_embeddings(essays)

Querying the RAG System

from rag import RAGSystem
rag = RAGSystem()
response = rag.query("What does Paul Graham say about startups?")

Features

Intelligent Search: Uses semantic search to find relevant essay content
Context-Aware Responses: Maintains conversation history for better context
Error Handling: Robust error handling and logging throughout
Chunking: Efficiently processes long essays by breaking them into manageable chunks
Retry Logic: Implements retry mechanisms for API calls

Requirements

Python 3.x
OpenAI API key
Required Python packages (see requirements.txt)

Note

This system requires an OpenAI API key to function. Make sure to set your API key in the environment variables before running the system.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
ReadMe.md		ReadMe.md
embeddings.py		embeddings.py
paul_essays.json		paul_essays.json
paul_essays_embeddings.json		paul_essays_embeddings.json
rag-output.png		rag-output.png
rag.py		rag.py
requirements.txt		requirements.txt
scraper.py		scraper.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Paul Graham Essays RAG System

Components

1. Scraper (`scraper.py`)

2. Embeddings Generator (`embeddings.py`)

3. RAG System (`rag.py`)

Usage

Features

Requirements

Note

About

Uh oh!

Releases

Packages

Languages

License

hackice20/pg-rag

Folders and files

Latest commit

History

Repository files navigation

Paul Graham Essays RAG System

Components

1. Scraper (scraper.py)

2. Embeddings Generator (embeddings.py)

3. RAG System (rag.py)

Usage

Features

Requirements

Note

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

1. Scraper (`scraper.py`)

2. Embeddings Generator (`embeddings.py`)

3. RAG System (`rag.py`)

Packages