A modern web application that allows you to upload PDF documents and chat with them using AI-powered Retrieval-Augmented Generation (RAG). Built with React, Node.js, LangChain, FAISS vector store, and OpenAI.
- 📄 PDF Upload: Drag and drop PDF files for processing
- 🤖 AI Chat: Interactive chat interface powered by OpenAI GPT
- 🔍 RAG Implementation: Uses LangChain for document processing and FAISS for vector storage
- 📚 Document Management: View, select, and delete uploaded documents
- 🎯 Context-Aware Responses: AI responses are based on the actual content of your PDFs
- 📱 Responsive Design: Works seamlessly on desktop and mobile devices
- ⚡ Real-time Processing: Documents are processed as soon as they are uploaded, and chat responses are returned as soon as they are generated
- Node.js with Express.js
- LangChain for document processing and AI chains
- FAISS for vector storage and similarity search
- OpenAI API for language model integration
- pdf-parse for PDF text extraction
- Multer for file upload handling
- React with modern hooks
- React Dropzone for file uploads
- Axios for API communication
- Lucide React for beautiful icons
- CSS3 with modern styling and animations
- Node.js (v16 or higher)
- npm or yarn
- OpenAI API key
- Clone the repository

      git clone <repository-url>
      cd chat-with-pdfs

- Install dependencies

      # Install server dependencies
      npm install

      # Install client dependencies
      cd client
      npm install
      cd ..

- Set up environment variables

      # Copy the example environment file
      cp env.example .env

      # Edit .env and add your OpenAI API key
      OPENAI_API_KEY=your_openai_api_key_here
      PORT=5000

- Start the development servers

      # Start both server and client (recommended)
      npm run dev

      # Or start them separately:
      # Terminal 1: Start server
      npm run server

      # Terminal 2: Start client
      npm run client

- Open your browser
  - Frontend: http://localhost:3000
  - Backend API: http://localhost:5000
- Upload a PDF: Drag and drop a PDF file into the upload area or click to select
- Wait for Processing: The system will process your PDF and create vector embeddings
- Select Document: Choose your processed PDF from the document list
- Start Chatting: Ask questions about your PDF content
- View Sources: See which parts of your PDF the AI used to answer your questions
- `POST /api/upload` - Upload and process a PDF file
- `POST /api/chat` - Send a message and get AI response
- `GET /api/documents` - Get list of uploaded documents
- `DELETE /api/documents/:id` - Delete a document
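
As a rough illustration of how a client might call these endpoints, the sketch below wraps them in a small helper. The helper name `createApiClient` and the request shapes (the `message` and `documentId` JSON fields, a `FormData` body for uploads) are assumptions made for this example and are not confirmed by this README; check `server/index.js` for the actual contract. The default `fetch` is only available globally in browsers and in Node.js 18 or later, so `fetchImpl` is injectable.

```javascript
// Hypothetical client for the endpoints listed above.
// Field names ("message", "documentId") are assumptions, not the server's confirmed contract.
function createApiClient(baseUrl = 'http://localhost:5000', fetchImpl = globalThis.fetch) {
  // Send one request and parse the JSON body, throwing on non-2xx responses.
  async function request(method, path, options = {}) {
    const response = await fetchImpl(`${baseUrl}${path}`, { method, ...options });
    if (!response.ok) {
      throw new Error(`${method} ${path} failed with status ${response.status}`);
    }
    return response.json();
  }

  return {
    // POST /api/upload: `formData` is a FormData instance carrying the PDF.
    // No Content-Type header is set so fetch can add the multipart boundary itself.
    uploadPdf: (formData) => request('POST', '/api/upload', { body: formData }),

    // POST /api/chat: ask a question about one uploaded document.
    sendMessage: (message, documentId) =>
      request('POST', '/api/chat', {
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message, documentId }),
      }),

    // GET /api/documents
    listDocuments: () => request('GET', '/api/documents'),

    // DELETE /api/documents/:id
    deleteDocument: (id) => request('DELETE', `/api/documents/${encodeURIComponent(id)}`),
  };
}
```

With the development server running, `createApiClient().listDocuments()` would return a promise for whatever JSON the server sends back.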
chat-with-pdfs/
├── server/
│ └── index.js # Express server with RAG implementation
├── client/
│ ├── public/
│ │ └── index.html
│ ├── src/
│ │ ├── App.js # Main React component
│ │ ├── App.css # Component styles
│ │ ├── index.js # React entry point
│ │ └── index.css # Global styles
│ └── package.json
├── uploads/ # Uploaded PDF files
├── vector_stores/ # FAISS vector stores
├── package.json
├── env.example
└── README.md
- Document Processing: PDFs are loaded and split into chunks using LangChain's text splitter
- Embedding Generation: Each chunk is converted to vector embeddings using OpenAI's embedding model
- Vector Storage: Embeddings are stored in FAISS for fast similarity search
- Query Processing: User questions are converted to embeddings and similar document chunks are retrieved
- Response Generation: Retrieved chunks are sent to OpenAI along with the user's question for context-aware responses
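
The query and response steps above can be sketched in plain JavaScript. This is a minimal illustration of similarity retrieval, not the application's code: the real app delegates this work to FAISS and LangChain, the three-dimensional vectors are made-up stand-ins for OpenAI embeddings, cosine similarity is just one common metric (the FAISS index may use a different one), and `cosineSimilarity`, `topK`, and `buildPrompt` are hypothetical helper names.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose embeddings are most similar to the query embedding.
function topK(queryEmbedding, chunks, k) {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Combine the retrieved chunks and the user's question into one prompt string.
function buildPrompt(question, retrieved) {
  const context = retrieved.map((c, i) => `[${i + 1}] ${c.text}`).join('\n');
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

// Toy data: in the real pipeline these vectors would come from the embedding model.
const sampleChunks = [
  { text: 'Invoices are due within 30 days.', embedding: [0.9, 0.1, 0.0] },
  { text: 'The office is closed on Sundays.', embedding: [0.0, 0.2, 0.9] },
  { text: 'Late payments incur a 2% fee.', embedding: [0.8, 0.3, 0.1] },
];
const sampleQueryEmbedding = [1, 0.2, 0]; // pretend embedding of the question
const retrievedChunks = topK(sampleQueryEmbedding, sampleChunks, 2);
console.log(buildPrompt('When are invoices due?', retrievedChunks));
```

The two invoice-related chunks rank above the unrelated one, which is the behavior that lets the model answer from the relevant parts of a PDF and cite them as sources.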
- `OPENAI_API_KEY`: Your OpenAI API key (required)
- `PORT`: Server port (default: 5000)
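
One way a server could read these settings at startup is sketched below. The variable names and the default port come from this README; the function name `loadConfig` and the validation rules are illustrative assumptions, not the project's actual startup code.

```javascript
// Read server settings from an environment object (defaults to process.env).
// Validation rules here are assumptions made for illustration.
function loadConfig(env = process.env) {
  const apiKey = (env.OPENAI_API_KEY || '').trim();
  if (!apiKey) {
    throw new Error('OPENAI_API_KEY is required');
  }

  // Fall back to 5000 when PORT is unset or empty.
  const port = env.PORT === undefined || env.PORT === '' ? 5000 : Number(env.PORT);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }

  return { apiKey, port };
}
```

Failing fast on a missing key surfaces the problem at startup rather than on the first upload.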
The application uses the following LangChain components:
- PDFLoader: Loads PDF documents
- RecursiveCharacterTextSplitter: Splits documents into chunks
- OpenAIEmbeddings: Generates vector embeddings
- FaissStore: Stores and retrieves vectors
- RetrievalQAChain: Combines retrieval and question-answering
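
To give a feel for what the splitting step does, here is a simplified, dependency-free sketch of overlapping chunking. It is not LangChain's `RecursiveCharacterTextSplitter` (which tries a hierarchy of separators); `splitText` is a hypothetical helper, and the default sizes are placeholders rather than the values this application uses.

```javascript
// Split text into chunks of at most `chunkSize` characters, where each chunk
// repeats the last `chunkOverlap` characters of the previous one.
// Simplified stand-in for a real text splitter; an overlap may begin mid-word.
function splitText(text, chunkSize = 1000, chunkOverlap = 200) {
  if (chunkOverlap >= chunkSize) {
    throw new Error('chunkOverlap must be smaller than chunkSize');
  }
  const pieces = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + chunkSize, text.length);
    if (end < text.length) {
      // Prefer to cut at the last space in the window so the chunk ends on a word boundary.
      const lastSpace = text.lastIndexOf(' ', end);
      if (lastSpace > start + chunkOverlap) end = lastSpace;
    }
    const piece = text.slice(start, end).trim();
    if (piece) pieces.push(piece);
    if (end === text.length) break;
    start = end - chunkOverlap; // step back so neighbouring chunks share context
  }
  return pieces;
}

console.log(splitText('aaaa bbbb cccc dddd', 10, 3));
// → [ 'aaaa bbbb', 'bbb cccc', 'ccc dddd' ]
```

The overlap means a sentence that straddles a chunk boundary still appears mostly intact in at least one chunk, which tends to help retrieval.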
- "Failed to process PDF"
  - Ensure your PDF is not corrupted
  - Check that the file is actually a PDF
  - Verify OpenAI API key is valid
- "Document not found"
  - The vector store may have been deleted
  - Try re-uploading the PDF
- Slow responses
  - Large PDFs take longer to process initially
  - Subsequent queries should be faster
- Keep PDFs under 50MB for optimal performance
- Use text-based PDFs rather than scanned images
- The system automatically chunks documents for better retrieval
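
The "is it actually a PDF" check and the 50MB guideline above can be tested before a file is sent for processing. The sketch below is one possible pre-upload check, not the server's validation logic: `checkPdfUpload` and `MAX_PDF_BYTES` are hypothetical names, and the strict signature test would reject the occasional PDF that has stray bytes before its header.

```javascript
// Illustrative limit taken from the 50MB guideline; the server's real limit may differ.
const MAX_PDF_BYTES = 50 * 1024 * 1024;

// Return a list of problems found with a file; an empty list means it looks acceptable.
// `bytes` is a Buffer or Uint8Array holding at least the first few bytes of the file.
function checkPdfUpload(bytes, sizeInBytes = bytes.length) {
  const problems = [];
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // the ASCII characters "%PDF-"
  const hasSignature =
    bytes.length >= signature.length && signature.every((byte, i) => bytes[i] === byte);
  if (!hasSignature) problems.push('File does not start with the %PDF- signature');
  if (sizeInBytes > MAX_PDF_BYTES) problems.push('File is larger than 50MB');
  return problems;
}
```

A check like this cannot tell a text-based PDF from a scanned one; that still has to be judged from whether text extraction returns anything useful.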
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - see LICENSE file for details
- Built with LangChain
- Vector storage powered by FAISS
- AI capabilities provided by OpenAI
- UI components from Lucide React