AskPDF

PDF Question Answering Application

A Streamlit application that allows users to upload PDF documents and ask questions about their content using Google's Gemini AI and Vertex AI Vector Search.

Features

📄 PDF document upload and processing
🔍 Semantic search using Vertex AI Vector Search
💬 Question answering using Google's Gemini AI
🎯 Answers grounded in the content of the uploaded document
🔒 API keys and credentials kept in Streamlit secrets, outside the source code
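
As an illustration of the document processing step, extracted PDF text is commonly split into overlapping chunks before it is embedded for semantic search. The sketch below is a minimal example of that idea and is not taken from this repository: the function name `chunk_text` and the default sizes are assumptions.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Hypothetical helper. Consecutive chunks share `overlap` characters so
    that text near a boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip whitespace-only chunks
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

Each returned chunk would then be embedded and stored in the vector index. Character-based splitting is the simplest choice; splitting on sentences or pages is an equally reasonable alternative.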

Prerequisites

Python 3.8 or higher
Google Cloud Platform account
Gemini API key
Vertex AI enabled in your GCP project

Installation

Clone the repository:

git clone https://github.com/yourusername/askPdf.git
cd askPdf

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Set up your credentials:

Create a .streamlit/secrets.toml file with the following structure:

# Google Cloud Project Configuration
GCP_PROJECT_ID = "your-project-id"
GCP_LOCATION = "us-central1"
VERTEX_AI_INDEX_ID = "your-index-id"
VERTEX_AI_ENDPOINT_ID = "your-endpoint-id"
GEMINI_API_KEY = "your-gemini-api-key"

# Service Account Credentials
[gcp_service_account]
type = "service_account"
project_id = "your-project-id"
private_key_id = "your-private-key-id"
private_key = """your-private-key"""
client_email = "your-service-account-email"
client_id = "your-client-id"
auth_uri = "https://accounts.google.com/o/oauth2/auth"
token_uri = "https://oauth2.googleapis.com/token"
auth_provider_x509_cert_url = "https://www.googleapis.com/oauth2/v1/certs"
client_x509_cert_url = "your-client-cert-url"
universe_domain = "googleapis.com"

# Vertex AI Index Configuration
[vertex_ai_index]
dimensions = 384
algorithm_config = { "bruteForceConfig" = {} }
distance_measure_type = "DOT_PRODUCT_DISTANCE"

# Vertex AI Streaming Index Configuration
[vertex_ai_stream_index]
dimensions = 384
algorithm_config = { "bruteForceConfig" = {} }
distance_measure_type = "DOT_PRODUCT_DISTANCE"
shard_size = "SHARD_SIZE_MEDIUM"
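
Both index sections above select a brute-force algorithm with a dot-product measure. Conceptually, brute-force search scores the query vector against every stored vector and keeps the best matches. The sketch below shows that idea in plain Python, treating a larger dot product as a closer match. It is not the Vertex AI API, and the names `dot` and `brute_force_search` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def brute_force_search(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the top_k highest dot products."""
    scored = [(i, dot(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors; the configuration above expects 384 dimensions.
stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
best = brute_force_search([1.0, 0.0, 0.0], stored, top_k=2)
```

The `dimensions` value must match the length of the vectors produced by the embedding model, otherwise the scores cannot be computed.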

Usage

Start the Streamlit app:

streamlit run src/app.py

Open your browser and navigate to http://localhost:8501
Upload a PDF document
Ask questions about the document content

Project Structure

askPdf/
├── src/
│   └── app.py              # Main application code
├── .streamlit/
│   └── secrets.toml        # API keys and configuration
├── requirements.txt        # Python dependencies
└── README.md              # Project documentation

Configuration

The application uses Streamlit secrets for configuration. All sensitive information and configuration settings are stored in .streamlit/secrets.toml, including:

Google Cloud Project settings
Service Account credentials
Vertex AI index configurations
API keys
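
Because every setting lives in one file, a startup check can report missing entries before any cloud call is made. The sketch below is one possible way to do this and is not part of the repository: `missing_secrets` is a hypothetical helper, and the key and section names are the ones listed in the Installation section.

```python
from typing import Any, List, Mapping

# Top-level keys and table names expected in .streamlit/secrets.toml.
REQUIRED_KEYS = (
    "GCP_PROJECT_ID",
    "GCP_LOCATION",
    "VERTEX_AI_INDEX_ID",
    "VERTEX_AI_ENDPOINT_ID",
    "GEMINI_API_KEY",
)
REQUIRED_SECTIONS = (
    "gcp_service_account",
    "vertex_ai_index",
    "vertex_ai_stream_index",
)


def missing_secrets(secrets: Mapping[str, Any]) -> List[str]:
    """Return the required keys and sections that are absent or empty."""
    missing = [key for key in REQUIRED_KEYS if not secrets.get(key)]
    missing += [f"[{name}]" for name in REQUIRED_SECTIONS if name not in secrets]
    return missing
```

Inside the app, the function could be called as `missing_secrets(st.secrets)`, since `st.secrets` behaves like a read-only mapping. If the returned list is non-empty, the app can show an error and stop.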

Dependencies

streamlit: Web application framework
PyMuPDF (fitz): PDF processing
sentence-transformers: Text embedding generation
google-generativeai: Gemini API integration
google-cloud-aiplatform: Vertex AI integration
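
These libraries suggest a retrieve-then-generate flow: text is extracted from the PDF, embedded, searched in the vector index, and the best matches are passed to Gemini together with the question. The step that joins retrieval to generation is prompt assembly. The sketch below shows one way it might look; `build_prompt` and the prompt wording are assumptions, not the application's actual code.

```python
from typing import List


def build_prompt(question: str, chunks: List[str]) -> str:
    """Combine retrieved document chunks and the user's question.

    Hypothetical helper: numbers each chunk so the model can refer to it
    and instructs the model to stay within the supplied context.
    """
    context = "\n\n".join(
        f"[{i + 1}] {chunk.strip()}" for i, chunk in enumerate(chunks)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

The resulting string would be sent to the Gemini model. Restricting the model to the supplied context is what keeps answers tied to the uploaded document.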

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Create a Pull Request

Run tests:

python -m unittest tests/test_pdf_processing.py
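
New tests can follow the standard `unittest` layout. The sketch below shows the general shape of such a test. The helper `clean_page_text` is hypothetical and defined inline so the example is self-contained; a real test would import the function under test from `src`.

```python
import unittest


def clean_page_text(raw: str) -> str:
    """Hypothetical helper: collapse whitespace runs left by PDF extraction."""
    return " ".join(raw.split())


class CleanPageTextTest(unittest.TestCase):
    def test_collapses_whitespace(self):
        self.assertEqual(
            clean_page_text("Hello \n\n  world\t!"), "Hello world !"
        )

    def test_empty_page(self):
        # A page containing only whitespace should yield an empty string.
        self.assertEqual(clean_page_text("   \n"), "")
```

No `if __name__ == "__main__"` block is needed, because `python -m unittest` loads the module and collects the `TestCase` classes itself.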

Acknowledgments

Google Gemini AI for the language model
Vertex AI for vector search capabilities
Streamlit for the web interface framework
