A Streamlit application that allows users to upload PDF documents and ask questions about their content using Google's Gemini AI and Vertex AI Vector Search.
- 📄 PDF document upload and processing
- 🔍 Semantic search using Vertex AI Vector Search
- 💬 Question answering using Google's Gemini AI
- 🎯 Answers grounded in the content of the uploaded document
- 🔒 Secure handling of API keys and credentials
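The features above imply a retrieve-then-answer flow: passages found by the vector search are handed to Gemini together with the user's question. The following is a minimal sketch of how such a grounded prompt might be assembled. The `build_prompt` helper, its prompt wording, and the `max_chars` budget are assumptions for illustration and are not taken from `src/app.py`.

```python
from typing import List


def build_prompt(question: str, chunks: List[str], max_chars: int = 4000) -> str:
    """Assemble a prompt that asks the model to answer only from the given chunks.

    Hypothetical helper: the name, wording, and max_chars budget are assumptions.
    """
    context_parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        # Stop adding context once the character budget would be exceeded.
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
    )
```

Numbering the passages makes it possible to ask the model to cite which passage supports its answer, which is one way to keep answers tied to the document.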
- Python 3.8 or higher
- Google Cloud Platform account
- Gemini API key
- Vertex AI enabled in your GCP project
- Clone the repository:

      git clone https://github.com/yourusername/askPdf.git
      cd askPdf

- Create and activate a virtual environment:

      python -m venv venv
      source venv/bin/activate # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Set up your credentials:
  - Create a .streamlit/secrets.toml file with the following structure:

        # Google Cloud Project Configuration
        GCP_PROJECT_ID = "your-project-id"
        GCP_LOCATION = "us-central1"
        VERTEX_AI_INDEX_ID = "your-index-id"
        VERTEX_AI_ENDPOINT_ID = "your-endpoint-id"
        GEMINI_API_KEY = "your-gemini-api-key"

        # Service Account Credentials
        [gcp_service_account]
        type = "service_account"
        project_id = "your-project-id"
        private_key_id = "your-private-key-id"
        private_key = """your-private-key"""
        client_email = "your-service-account-email"
        client_id = "your-client-id"
        auth_uri = "https://accounts.google.com/o/oauth2/auth"
        token_uri = "https://oauth2.googleapis.com/token"
        auth_provider_x509_cert_url = "https://www.googleapis.com/oauth2/v1/certs"
        client_x509_cert_url = "your-client-cert-url"
        universe_domain = "googleapis.com"

        # Vertex AI Index Configuration
        [vertex_ai_index]
        dimensions = 384
        algorithm_config = { "bruteForceConfig" = {} }
        distance_measure_type = "DOT_PRODUCT_DISTANCE"

        # Vertex AI Streaming Index Configuration
        [vertex_ai_stream_index]
        dimensions = 384
        algorithm_config = { "bruteForceConfig" = {} }
        distance_measure_type = "DOT_PRODUCT_DISTANCE"
        shard_size = "SHARD_SIZE_MEDIUM"

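The index settings above combine a brute-force algorithm with `DOT_PRODUCT_DISTANCE`: every stored vector is compared against the query, and vectors with a larger dot product rank as closer matches. The sketch below reproduces that idea locally in plain Python purely as an illustration. It does not call Vertex AI, `top_k_by_dot_product` is a hypothetical name, and toy 3-dimensional vectors stand in for the 384-dimensional embeddings.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


def top_k_by_dot_product(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Score every vector against the query (brute force) and return the
    k best (index, score) pairs, highest dot product first.

    Hypothetical helper for illustration; not part of the application.
    """
    scored = [(i, dot(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: three stored vectors and one query.
stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
best = top_k_by_dot_product([1.0, 0.0, 0.0], stored, k=2)
```

Brute force checks every vector, so it returns exact results but scales linearly with the number of stored vectors.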
- Start the Streamlit app:

      streamlit run src/app.py

- Open your browser and navigate to http://localhost:8501
- Upload a PDF document
- Ask questions about the document content
askPdf/
├── src/
│   └── app.py            # Main application code
├── .streamlit/
│   └── secrets.toml      # API keys and configuration
├── requirements.txt      # Python dependencies
└── README.md             # Project documentation
The application uses Streamlit secrets for configuration. All sensitive information and configuration settings are stored in .streamlit/secrets.toml, including:
- Google Cloud Project settings
- Service Account credentials
- Vertex AI index configurations
- API keys
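As an illustration of the configuration described above, the sketch below gathers the top-level keys from `secrets.toml` into a typed object and fails early when one is missing. It accepts any dict-like mapping, so Streamlit's `st.secrets` could be passed in; the `AppConfig` class and `load_config` function are hypothetical names and are not part of the application.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Top-level keys listed in the secrets.toml structure shown in this README.
REQUIRED_KEYS = (
    "GCP_PROJECT_ID",
    "GCP_LOCATION",
    "VERTEX_AI_INDEX_ID",
    "VERTEX_AI_ENDPOINT_ID",
    "GEMINI_API_KEY",
)


@dataclass(frozen=True)
class AppConfig:
    """Hypothetical container for the top-level settings."""
    project_id: str
    location: str
    index_id: str
    endpoint_id: str
    gemini_api_key: str


def load_config(secrets: Mapping[str, Any]) -> AppConfig:
    """Build an AppConfig, raising ValueError if any key is absent or empty."""
    missing = [k for k in REQUIRED_KEYS if k not in secrets or not secrets[k]]
    if missing:
        raise ValueError("missing secrets: " + ", ".join(missing))
    return AppConfig(
        project_id=str(secrets["GCP_PROJECT_ID"]),
        location=str(secrets["GCP_LOCATION"]),
        index_id=str(secrets["VERTEX_AI_INDEX_ID"]),
        endpoint_id=str(secrets["VERTEX_AI_ENDPOINT_ID"]),
        gemini_api_key=str(secrets["GEMINI_API_KEY"]),
    )
```

Validating once at startup means a misconfigured deployment reports the missing key names immediately, instead of failing later on the first API call.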
- streamlit: Web application framework
- PyMuPDF (fitz): PDF processing
- sentence-transformers: Text embedding generation
- google-generativeai: Gemini API integration
- google-cloud-aiplatform: Vertex AI integration
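Between PDF text extraction and embedding generation, long text is usually split into smaller passages. This README does not describe how the application chunks text, so the following is only a sketch of one common approach: fixed-size word windows with overlap. The `chunk_text` function and its default sizes are assumptions.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into chunks of up to chunk_size words, where consecutive
    chunks share `overlap` words.

    Hypothetical helper; sizes are illustrative defaults, not the app's values.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of embedding some words twice.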
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
- Run tests:

      python -m unittest tests/test_pdf_processing.py
- Google Gemini AI for the language model
- Vertex AI for vector search capabilities
- Streamlit for the web interface framework
