
A project that delves into the concepts of LLMs, RAG, and LangChain

avanshh99/chatpdf


ChatPDF

Interact with your PDF documents using Google Gemini!
This Streamlit app lets you upload PDF files, process their contents, and then ask questions that are answered contextually using Google's Gemini model.


Features

  • Upload one or more PDF files.
  • Extract the text from each PDF and split it into chunks.
  • Embed the chunks using Google Generative AI embeddings.
  • Store the embeddings locally in a FAISS index for fast retrieval.
  • Ask natural-language questions about your PDFs.
  • Get answers generated by Gemini via LangChain's conversational retrieval QA.

Installation

  1. Clone the repository

    git clone https://github.com/avanshh99/Chat-with-PDF-using-Gemini.git
    cd Chat-with-PDF-using-Gemini
  2. Install the required packages. It's recommended to use a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use venv\Scripts\activate
    pip install -r requirements.txt

    If requirements.txt is missing, install the main packages directly:

    pip install streamlit PyPDF2 langchain langchain-google-genai google-generativeai langchain-community faiss-cpu python-dotenv
  3. Set up your Google API Key
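
A minimal sketch of one way to supply the key, assuming it is kept in a `.env` file at the project root (the file structure lists a `.env.example`) under the variable name `GOOGLE_API_KEY`. The variable name and the helper `get_api_key` are assumptions for illustration and are not taken from this repository.

```python
import os

try:
    # python-dotenv is in the install command above; it copies .env entries
    # into os.environ without overriding variables that are already set.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def get_api_key(var_name="GOOGLE_API_KEY"):
    """Return the API key from the environment, or fail with a clear message.

    `var_name` is an assumed name; change it to whatever your script reads.
    """
    if load_dotenv is not None:
        load_dotenv()  # no-op if there is no .env file
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Add it to your .env file or export it "
            "in your shell before starting the app."
        )
    return key
```

Failing early like this tends to give a clearer error than letting the first embedding or chat request fail with an authentication message.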


Usage

  1. Start the Streamlit application

    streamlit run your_script_name.py
  2. In the app:

    • Upload one or more PDF files via the sidebar.
    • Click "Submit & Process" to extract and embed the data.
    • Ask questions in the main input box; the app answers using the content of your PDFs.

How it works

  • PDF Extraction: Uses PyPDF2 to extract text from uploaded PDFs.
  • Text Chunking: Splits the text using LangChain’s RecursiveCharacterTextSplitter for efficient processing.
  • Embedding: Converts chunks into vector embeddings using Google Generative AI.
  • Vector Store: Stores embeddings in a local FAISS index for similarity search.
  • Conversational QA: Uses LangChain’s QA chain with Gemini to answer user questions based on retrieved document chunks.
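
The chunk, embed, and retrieve steps above can be illustrated with a small standard-library sketch. It is a toy stand-in, not the app's code: `split_text` is a simplified fixed-size splitter (RecursiveCharacterTextSplitter also prefers to break on separators such as paragraphs), `embed` is a bag-of-words counter rather than a Google Generative AI embedding, and `top_k` is a brute-force cosine search in place of a FAISS index. All function names and parameter values here are hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, chunk_overlap=50):
    """Split text into fixed-size character chunks that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def embed(text):
    """Toy 'embedding': word counts. A real embedding is a dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    chunks = [
        "The invoice total is 42 dollars.",
        "Cats sleep most of the day.",
        "Payment is due in thirty days.",
    ]
    # The retrieved chunks would be passed to the model as context.
    print(top_k("What is the invoice total?", chunks, k=1))
```

The overlap between neighbouring chunks is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk.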

File Structure

your_repo/
├── your_script_name.py
├── requirements.txt
├── .env.example
├── faiss_index/
└── README.md

Replace your_script_name.py with your actual script filename.


Requirements

  • Python 3.8+
  • Google Generative AI API Key

Notes

  • For best results, upload PDFs with selectable text. PyPDF2 reads the text layer only, so image-only scans without OCR yield little or no text.
  • The FAISS index is stored locally; delete the faiss_index folder to reset embeddings.
  • Be mindful of Google API usage limits.
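
Resetting the embeddings can also be scripted. A minimal sketch, assuming the index lives in a folder named `faiss_index` next to the script as shown in the file structure; the helper name `reset_index` is hypothetical.

```python
import shutil
from pathlib import Path


def reset_index(index_dir="faiss_index"):
    """Delete the saved index folder. Returns True if a folder was removed."""
    path = Path(index_dir)
    if not path.is_dir():
        return False  # nothing to reset
    shutil.rmtree(path)
    return True


if __name__ == "__main__":
    print("Index removed" if reset_index() else "No index found")
```

After a reset, the PDFs need to be uploaded and processed again before questions can be answered.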

License

MIT License. See LICENSE for details.


Credits


Troubleshooting

  • Ensure your .env and API key are set up correctly.
  • If you encounter package errors, check your Python version and dependency installations.
  • For any issues, please open an issue on this repository.
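
A quick environment check can narrow down package errors. The sketch below uses only the standard library; the module list mirrors the install command in the Installation section using the usual import names for those packages, which is an assumption you may need to adjust. The helper names are hypothetical.

```python
import importlib.util
import sys

# Usual import names for the packages in the install command (assumed).
REQUIRED_MODULES = [
    "streamlit", "PyPDF2", "langchain", "langchain_google_genai",
    "google.generativeai", "langchain_community", "faiss", "dotenv",
]


def missing_modules(names):
    """Return the subset of module names that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is absent
            found = False
        if not found:
            missing.append(name)
    return missing


def python_ok(minimum=(3, 8)):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```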
