This Streamlit application implements a Langchain-based retrieval system for processing PDF documents and querying them conversationally.
DocChat is a Langchain-based retrieval system that processes PDF documents and creates a conversational retrieval chain. It leverages multiple technologies to extract text, generate embeddings, and enable chat-based querying over processed content.
- FastAPI – Serves as the backend API for processing PDFs and handling chat requests.
- Streamlit – Provides the frontend user interface for uploading PDFs and interacting with the conversational system.
- Langchain – A core library for NLP tasks such as text splitting and conversational retrieval.
- Google PaLM & Google Generative Language – Used for generating embeddings.
- FAISS – Facebook AI Similarity Search used for efficient similarity search over embeddings.
- PyMuPDF (fitz) – Extracts text from PDFs.
- Docker & Docker Compose – Containerizes and orchestrates the backend and frontend applications.
- Python-dotenv – Loads environment variables (e.g. API keys) from a `.env` file.
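Before embedding, Langchain splits the extracted PDF text into overlapping chunks. A minimal pure-Python sketch in the spirit of Langchain's character-based text splitters (the chunk size and overlap values here are illustrative, not the app's actual settings):

```python
def split_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks, in the spirit of Langchain's
    CharacterTextSplitter. Overlap preserves context across chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = split_text("a" * 250, chunk_size=100, overlap=20)
print(len(chunks))  # 4 chunks, starting at offsets 0, 80, 160, 240
```

Each chunk is then embedded and stored in the FAISS index for similarity search.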
- Python Environment: Python 3.x is required.
- Environment Variables: Create a `.env` file in the project root with the following content:

GOOGLE_API_KEY=your_google_api_key_here

Replace `your_google_api_key_here` with your actual Google API key.
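At startup, python-dotenv reads these `KEY=VALUE` pairs into the process environment. A minimal stdlib stand-in for its `load_dotenv()` (illustrative only; the project uses the real library):

```python
import os
import tempfile

def load_env(path: str) -> None:
    """Minimal stand-in for python-dotenv's load_dotenv(): read KEY=VALUE
    lines, skipping blanks and comments, without overriding existing vars."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Demo with a throwaway file standing in for the project's .env:
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("GOOGLE_API_KEY=your_google_api_key_here\n")
    demo_path = fh.name
load_env(demo_path)
print("GOOGLE_API_KEY" in os.environ)
```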
- Clone the Repository:
git clone https://github.com/Varunv003/langchain-palm2-rag_application
- Set Up Virtual Environment:
python -m venv venv
# On Windows:
.\venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
- Install Dependencies:
pip install -r requirements.txt
- Initialize Folder Structure (if needed):
python template.py
- Running the Streamlit App (Frontend):
The application will be available at http://localhost:8501.
streamlit run app.py
- Running the FastAPI App (Backend):
The backend API will be available at http://localhost:8000.
uvicorn main:app --reload --host 0.0.0.0 --port 8000
This project includes Dockerfiles for both the FastAPI backend and the Streamlit frontend. Docker Compose is used to orchestrate both services.
- Ensure Docker Desktop is Running.
- From the project root (where the `docker-compose.yml` is located), run:

docker-compose up --build
- Services:
- Backend (FastAPI) will be available at http://localhost:8000.
- Frontend (Streamlit) will be available at http://localhost:8501.
The docker-compose.yml defines two services:
- backend: Built from `Dockerfile.backend`, exposing port 8000.
- frontend: Built from `Dockerfile.frontend`, exposing port 8501, with a dependency on the backend service.
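Such a compose file might look like the following sketch. The service names, Dockerfile names, and ports come from the description above; everything else (build context, file layout) is illustrative:

```yaml
# Sketch of the docker-compose.yml described above; only the service names,
# Dockerfiles, and ports are from the project description.
services:
  backend:
    build:
      context: .
      dockerfile: Dockerfile.backend
    ports:
      - "8000:8000"
  frontend:
    build:
      context: .
      dockerfile: Dockerfile.frontend
    ports:
      - "8501:8501"
    depends_on:
      - backend
```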
- Upload PDFs: Use the sidebar in the Streamlit interface to upload PDF files.
- Process Documents: Click "Submit and Process" to extract text, generate embeddings, and initialize the conversational chain.
- Chat: Ask questions related to the processed PDFs through the chat interface. The backend retrieves and forms responses using the Langchain conversational chain.
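Conceptually, the retrieval step finds the stored chunks most similar to the question and passes them to the model as context. A toy keyword-overlap retriever, standing in for the FAISS similarity search over embeddings (the scoring here is illustrative, not what the app does):

```python
def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Toy stand-in for FAISS similarity search: rank chunks by word
    overlap with the question and return the top k."""
    q_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

docs = [
    "Refunds are processed within 14 days of the request.",
    "The warranty covers manufacturing defects for one year.",
]
print(retrieve("How long do refunds take?", docs))
```

In the real chain, cosine similarity between embedding vectors replaces word overlap, and the retrieved chunks are injected into the prompt sent to the language model.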
- Enhance error handling and user feedback.
- Optimize scalability and performance for larger documents.
- Integrate additional AI models or refine existing conversational models for improved responses.

