pdfPulse

Overview

pdfPulse is a Retrieval-Augmented Generation (RAG) PDF chatbot: users upload a PDF document and ask questions about its content through a context-aware question-answering system. It combines Google's Gemini AI for answer generation with a Pinecone vector database for retrieval, turning a static PDF into an interactive knowledge base.

Key Features

PDF Text Extraction: Extract text from uploaded PDF documents
Semantic Search: Use vector embeddings to find the passages most relevant to a question
AI-Powered Responses: Generate context-aware answers using Google's Gemini AI
Vector Database Integration: Store and retrieve embeddings with the Pinecone vector database

Technology Stack

Streamlit
Google Gemini AI
Pinecone Vector Database
PyPDF2
Python 3.8+

Prerequisites

Python 3.8 or higher
Pinecone API Key
Google Gemini API Key

Installation

Clone the repository:

git clone https://github.com/leeh-nix/pdfPulse.git
cd pdfPulse

Install required dependencies:

pip install -r requirements.txt

Configuration

Create a .secret file in the .streamlit folder (optional, for local development):

GEMINI_API_KEY=your_gemini_api_key
PINECONE_API_KEY=your_pinecone_api_key
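
How the application reads this file is not shown here, so the following is only a minimal sketch, assuming the file holds plain KEY=value lines as above. The helper names parse_key_file and load_keys are hypothetical and not part of the project.

```python
from pathlib import Path
from typing import Dict


def parse_key_file(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    keys: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        keys[name.strip()] = value.strip().strip('"').strip("'")
    return keys


def load_keys(path: str = ".streamlit/.secret") -> Dict[str, str]:
    """Return the parsed keys, or an empty dict if the file is absent.

    The default path follows the Configuration step above; it is an
    assumption about where the app looks, not confirmed behavior.
    """
    file = Path(path)
    if not file.exists():
        return {}
    return parse_key_file(file.read_text(encoding="utf-8"))
```

Returning an empty dict when the file is missing matches the file being optional: the keys can still be entered in the sidebar at runtime.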

Running the Application

streamlit run app.py

Usage Instructions

Navigate to the Streamlit application
Enter your Gemini and Pinecone API keys in the sidebar
Upload a PDF document
Ask questions about the document's content
Receive contextually accurate responses

How It Works

The application implements a Retrieval-Augmented Generation (RAG) workflow:

Extract text from uploaded PDF
Split text into semantic chunks
Generate vector embeddings for chunks
Store embeddings in Pinecone vector database
When a query is made, retrieve the most relevant chunks
Generate a response using retrieved context
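
The steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: word-count vectors stand in for real embeddings, an in-memory list stands in for Pinecone, and the final Gemini call is reduced to assembling the prompt. All names (chunk_text, embed, InMemoryIndex, build_prompt) are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> List[str]:
    """Split text into windows of `size` words that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks: List[str] = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Stand-in for the vector database: keeps (chunk, vector) pairs."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def upsert(self, chunks: List[str]) -> None:
        self._items.extend((chunk, embed(chunk)) for chunk in chunks)

    def query(self, question: str, top_k: int = 2) -> List[str]:
        q = embed(question)
        ranked = sorted(
            self._items, key=lambda item: cosine(q, item[1]), reverse=True
        )
        return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question: str, context: List[str]) -> str:
    """Assemble the text that would be sent to the language model."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

In the real application the text would come from the uploaded PDF, the vectors from an embedding model, and the prompt would be passed to Gemini. The overlap between chunks is a common design choice so that a sentence split at a chunk boundary still appears whole in at least one chunk.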

Security Note

API keys entered in the sidebar are used locally and are not stored by the application
Sensitive information is held in memory only for the duration of the session

Limitations

Supports PDF documents only
Requires active internet connection
Response accuracy depends on document complexity and AI model capabilities

Contributing

Contributions are welcome! Please submit pull requests or open issues on the GitHub repository.
