Chat with PDFs

A modern web application that allows you to upload PDF documents and chat with them using AI-powered Retrieval-Augmented Generation (RAG). Built with React, Node.js, LangChain, FAISS vector store, and OpenAI.

Features

📄 PDF Upload: Drag and drop PDF files for processing
🤖 AI Chat: Interactive chat interface powered by OpenAI GPT
🔍 RAG Implementation: Uses LangChain for document processing and FAISS for vector storage
📚 Document Management: View, select, and delete uploaded documents
🎯 Context-Aware Responses: AI responses are based on the actual content of your PDFs
📱 Responsive Design: Works seamlessly on desktop and mobile devices
⚡ Real-time Processing: Documents are processed on upload and are ready for chat as soon as processing finishes

Tech Stack

Backend

Node.js with Express.js
LangChain for document processing and AI chains
FAISS for vector storage and similarity search
OpenAI API for language model integration
pdf-parse for PDF text extraction
Multer for file upload handling

Frontend

React with modern hooks
React Dropzone for file uploads
Axios for API communication
Lucide React for icons
CSS3 with modern styling and animations

Prerequisites

Node.js (v16 or higher)
npm or yarn
OpenAI API key

Installation

Clone the repository

git clone <repository-url>
cd chat-with-pdfs

Install dependencies

# Install server dependencies
npm install

# Install client dependencies
cd client
npm install
cd ..

Set up environment variables

# Copy the example environment file
cp env.example .env

# Edit .env so that it contains your OpenAI API key and the server port:
OPENAI_API_KEY=your_openai_api_key_here
PORT=5000

Start the development servers

# Start both server and client (recommended)
npm run dev

# Or start them separately:
# Terminal 1: Start server
npm run server

# Terminal 2: Start client
npm run client

Open your browser
- Frontend: http://localhost:3000
- Backend API: http://localhost:5000

Usage

Upload a PDF: Drag and drop a PDF file into the upload area or click to select
Wait for Processing: The system will process your PDF and create vector embeddings
Select Document: Choose your processed PDF from the document list
Start Chatting: Ask questions about your PDF content
View Sources: See which parts of your PDF the AI used to answer your questions

API Endpoints

POST /api/upload - Upload and process a PDF file
POST /api/chat - Send a message and get AI response
GET /api/documents - Get list of uploaded documents
DELETE /api/documents/:id - Delete a document
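
The endpoints above could be called from a script roughly as follows. This is a minimal sketch, not the project's client code: the JSON field names `documentId` and `message`, the helper names, and the assumption that every endpoint answers with JSON are hypothetical and should be checked against server/index.js. The global `fetch` used by `send` requires Node.js 18 or newer (or a browser).

```javascript
// Sketch only: request field names are assumptions, not taken from the server code.
const API_BASE = "http://localhost:5000";

// Build the URL and fetch options for POST /api/chat.
function buildChatRequest(documentId, message, baseUrl = API_BASE) {
  return {
    url: `${baseUrl}/api/chat`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // "documentId" and "message" are hypothetical field names.
      body: JSON.stringify({ documentId, message }),
    },
  };
}

// Build the URL and fetch options for DELETE /api/documents/:id.
function buildDeleteRequest(id, baseUrl = API_BASE) {
  return {
    url: `${baseUrl}/api/documents/${encodeURIComponent(id)}`,
    options: { method: "DELETE" },
  };
}

// Send a prepared request and parse the response as JSON (assumed format).
async function send({ url, options }) {
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

With the server running, `send(buildChatRequest(id, "Summarize section 2"))` would return a promise for the parsed response. Keeping request construction separate from sending makes the request shapes easy to inspect without a live server.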

Project Structure

chat-with-pdfs/
├── server/
│   └── index.js          # Express server with RAG implementation
├── client/
│   ├── public/
│   │   └── index.html
│   ├── src/
│   │   ├── App.js        # Main React component
│   │   ├── App.css       # Component styles
│   │   ├── index.js      # React entry point
│   │   └── index.css     # Global styles
│   └── package.json
├── uploads/              # Uploaded PDF files
├── vector_stores/        # FAISS vector stores
├── package.json
├── env.example
└── README.md

How RAG Works

Document Processing: PDFs are loaded and split into chunks using LangChain's text splitter
Embedding Generation: Each chunk is converted to vector embeddings using OpenAI's embedding model
Vector Storage: Embeddings are stored in FAISS for fast similarity search
Query Processing: User questions are converted to embeddings and similar document chunks are retrieved
Response Generation: Retrieved chunks are sent to OpenAI along with the user's question for context-aware responses
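
The retrieval steps above can be illustrated with a toy example in plain JavaScript. It is a sketch under loud assumptions: `embed` is a word-count stand-in for OpenAI's embedding model, a linear scan with cosine similarity stands in for FAISS, and all function names are hypothetical. It shows the shape of the pipeline, not the application's actual code.

```javascript
// Toy stand-in for an embedding model: counts occurrences of each vocabulary term.
function embed(text, vocabulary) {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  return vocabulary.map((term) => words.filter((w) => w === term).length);
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Linear-scan stand-in for a vector store: return the k most similar chunks.
function retrieve(question, chunks, vocabulary, k = 2) {
  const queryVector = embed(question, vocabulary);
  return chunks
    .map((chunk) => ({
      chunk,
      score: cosineSimilarity(queryVector, embed(chunk, vocabulary)),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Combine the retrieved chunks and the question into one prompt for the model.
function buildPrompt(question, retrieved) {
  const context = retrieved.map((r) => r.chunk).join("\n---\n");
  return `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
}
```

In the real application the embedding and search steps are delegated to OpenAI and FAISS, so only the overall flow (embed the question, rank chunks, build a prompt) carries over from this sketch.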

Configuration

Environment Variables

OPENAI_API_KEY: Your OpenAI API key (required)
PORT: Server port (default: 5000)
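
One way a server might read these two variables at startup is sketched below. The `loadConfig` helper and its error messages are hypothetical; only the variable names and the default port of 5000 come from this README.

```javascript
// Read and validate configuration from an environment-like object.
// loadConfig is a hypothetical helper, not part of the project.
function loadConfig(env = process.env) {
  const apiKey = (env.OPENAI_API_KEY || "").trim();
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is required");
  }

  // Fall back to the documented default when PORT is unset or empty.
  const rawPort = env.PORT;
  const port = rawPort === undefined || rawPort === "" ? 5000 : Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${rawPort}"`);
  }

  return { apiKey, port };
}
```

Failing fast on a missing key gives a clearer error at startup than a rejected OpenAI request during the first upload.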

LangChain Configuration

The application uses the following LangChain components:

PDFLoader: Loads PDF documents
RecursiveCharacterTextSplitter: Splits documents into chunks
OpenAIEmbeddings: Generates vector embeddings
FaissStore: Stores and retrieves vectors
RetrievalQAChain: Combines retrieval and question-answering
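
To give a feel for what the splitting step does, here is a simplified fixed-size splitter with overlap. It is not LangChain's RecursiveCharacterTextSplitter, which also tries to break on separators such as paragraph and line boundaries; the `splitText` name and the default sizes are assumptions chosen for illustration, not values taken from this project.

```javascript
// Simplified illustration of chunking with overlap (not LangChain's algorithm).
// Each chunk is up to chunkSize characters and shares chunkOverlap characters
// with the previous chunk so that context spanning a boundary is not lost.
function splitText(text, chunkSize = 1000, chunkOverlap = 200) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  if (!Number.isInteger(chunkOverlap) || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be in the range [0, chunkSize)");
  }

  const chunks = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Smaller chunks tend to make retrieval more precise, while larger chunks give the model more surrounding context per match; the overlap is a hedge against splitting a relevant sentence in two.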

Troubleshooting

Common Issues

"Failed to process PDF"
- Ensure your PDF is not corrupted
- Check that the file is actually a PDF
- Verify that your OpenAI API key is valid

"Document not found"
- The vector store may have been deleted
- Try re-uploading the PDF

Slow responses
- Large PDFs take longer to process initially
- Subsequent queries should be faster

Performance Tips

Keep PDFs under 50 MB for optimal performance
Use text-based PDFs rather than scanned images
The system automatically chunks documents for better retrieval
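
The file checks from Common Issues and the size guideline above could be applied before processing with a small pre-flight check like the one below. This is a hedged sketch: `checkPdfBuffer` is a hypothetical helper, the 50 MB figure is treated as 50 * 1024 * 1024 bytes by assumption, and nothing here implies the server enforces such a limit. The only format fact relied on is that PDF files begin with the bytes `%PDF-`.

```javascript
// Assumed interpretation of the 50 MB guideline, in bytes.
const MAX_PDF_BYTES = 50 * 1024 * 1024;

// Check that a Buffer looks like a PDF and is within the size guideline.
// Returns { ok, reason } rather than throwing, so callers can show the reason.
function checkPdfBuffer(buffer, maxBytes = MAX_PDF_BYTES) {
  if (buffer.length === 0) {
    return { ok: false, reason: "File is empty" };
  }
  if (buffer.length > maxBytes) {
    return { ok: false, reason: `File is larger than ${maxBytes} bytes` };
  }
  // PDF files start with the header "%PDF-" followed by a version number.
  const header = buffer.subarray(0, 5).toString("latin1");
  if (header !== "%PDF-") {
    return { ok: false, reason: "File does not start with a PDF header" };
  }
  return { ok: true, reason: null };
}
```

A header check catches renamed non-PDF files early, but it cannot tell a text-based PDF from a scanned one, so extraction can still produce little or no text.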

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

License

MIT License - see LICENSE file for details

Acknowledgments

Built with LangChain
Vector storage powered by FAISS
AI capabilities provided by OpenAI
UI components from Lucide React
