RAG PDF Chatbot is a document analysis application that lets users upload PDF documents and ask questions about their content through a context-aware question-answering system. It combines vector search in Pinecone with Google's Gemini AI to turn static PDF documents into interactive knowledge bases.
- PDF Text Extraction: Extract text from uploaded PDF documents with PyPDF2
- Semantic Search: Use vector embeddings to find contextually relevant passages
- AI-Powered Responses: Generate context-aware answers using Google's Gemini AI
- Vector Database Integration: Store and retrieve embeddings with the Pinecone vector database
- Streamlit
- Google Gemini AI
- Pinecone Vector Database
- PyPDF2
- Python 3.8+
- Python 3.8 or higher
- Pinecone API Key
- Google Gemini API Key
- Clone the repository:
git clone https://github.com/leeh-nix/pdfPulse.git
cd pdfPulse
- Install required dependencies:
pip install -r requirements.txt
- Create a .secret file in the .streamlit folder (optional, for local development):
GEMINI_API_KEY=your_gemini_api_key
PINECONE_API_KEY=your_pinecone_api_key
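The README does not show how the application reads this file, so the following is only a minimal sketch of one way to do it. It assumes the file holds plain `KEY=value` lines as shown above and falls back to environment variables when a key is missing. The helper name `load_api_keys` and the default path `.streamlit/.secret` are illustrative assumptions, not part of the project.

```python
import os
from pathlib import Path


def load_api_keys(path=".streamlit/.secret"):
    """Hypothetical helper: read KEY=value lines, fall back to the environment."""
    values = {}
    secret_file = Path(path)
    if secret_file.exists():
        for line in secret_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blank lines, comments, and anything that is not KEY=value.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Tolerate optional spaces and surrounding quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")

    keys = {}
    for name in ("GEMINI_API_KEY", "PINECONE_API_KEY"):
        # Prefer the file; otherwise use an environment variable of the same name.
        keys[name] = values.get(name) or os.environ.get(name)
    return keys
```

A missing key comes back as `None`, so the caller can prompt for it in the sidebar instead. Note that Streamlit's built-in `st.secrets` reads `.streamlit/secrets.toml` in TOML syntax (string values must be quoted); this sketch does not rely on that mechanism.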
- Run the application:
streamlit run app.py
- Navigate to the Streamlit application
- Enter your Gemini and Pinecone API keys in the sidebar
- Upload a PDF document
- Ask questions about the document's content
- Receive contextually accurate responses
The application implements a Retrieval-Augmented Generation (RAG) workflow:
- Extract text from uploaded PDF
- Split text into semantic chunks
- Generate vector embeddings for chunks
- Store embeddings in Pinecone vector database
- When a query is made, retrieve the most relevant chunks
- Generate a response using retrieved context
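The steps above can be sketched end to end without any external service. This is an illustration of the RAG flow under loud assumptions, not the project's code: a bag-of-words vector stands in for real embeddings, the `InMemoryIndex` class stands in for Pinecone, and the final prompt is only built, not sent to Gemini. All function and class names here are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping chunks of `chunk_size` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text):
    """Toy embedding (assumption): term counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Stand-in for a vector database: stores (vector, chunk) pairs in a list."""

    def __init__(self):
        self.items = []

    def upsert(self, chunks):
        for chunk in chunks:
            self.items.append((embed(chunk), chunk))

    def query(self, question, top_k=2):
        """Return the `top_k` stored chunks most similar to the question."""
        q = embed(question)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [chunk for _, chunk in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Assemble the retrieved chunks and the question into one prompt string."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In use, the extracted PDF text would go through `chunk_text`, the chunks through `index.upsert`, and each user question through `index.query` followed by `build_prompt`. In the real application the resulting prompt would be passed to the language model, and the toy pieces would be replaced by an embedding model and a Pinecone index.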
- API keys are used locally and not stored
- Sensitive information is held in memory for the duration of the session
- Supports PDF documents only
- Requires an active internet connection
- Response accuracy depends on document complexity and AI model capabilities
Contributions are welcome! Please submit pull requests or open issues on the GitHub repository.
