Document Portal: End-to-End RAG Application

📄 Overview

Document Portal is an interactive, AI-powered platform that demonstrates Retrieval-Augmented Generation (RAG) workflows on PDF documents. It enables users to:

Analyze documents and extract key metadata.
Ask questions about one or multiple documents using a conversational AI interface.
Compare different versions of documents to quickly spot changes.
Rely on built-in custom logging, custom exception handling, and session management.

This application is built for research, knowledge management, and enterprise workflows, offering a clean, intuitive interface for document-centric AI operations.

🚀 Key Features

Document Analysis & Chat – Upload PDFs to extract text and metadata, then query their content in natural language. Supports both single-document and multi-document workflows.
Document Comparison – Upload two versions of a document and detect the differences between them, which simplifies version control and review.
Efficient & Scalable RAG Pipeline – Uses FAISS for vector storage and LangChain for retrieval and answer generation. Existing indexes are reused to reduce processing time.
Interactive UI – Built with Streamlit, providing a responsive, minimal interface.
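To make the comparison feature concrete, here is a minimal sketch that diffs two versions of already-extracted text with Python's standard `difflib`. It is an illustration under assumptions, not the project's implementation: the README does not say how Document Portal computes differences, and `compare_versions` is a hypothetical name.

```python
import difflib


def compare_versions(old_text: str, new_text: str) -> list[str]:
    """Return only the lines that were removed ('- ') or added ('+ ')."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff also emits unchanged lines ('  ') and hint lines ('? '); drop those.
    return [line for line in diff if line.startswith(("+ ", "- "))]


if __name__ == "__main__":
    version_a = "Payment due in 30 days.\nContact: legal@example.com"
    version_b = "Payment due in 45 days.\nContact: legal@example.com"
    for change in compare_versions(version_a, version_b):
        print(change)
```

A line-based diff like this only catches literal text changes. A comparison that summarizes changes in meaning would need an additional model-based step.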

🏗️ How It Works

[Upload PDFs] --> [Data Ingestion & Preprocessing] --> [FAISS Vectorization] --> [RAG Query] --> [Answers / Comparison]

Step-by-step workflow:

Upload Documents – Users can upload one or multiple PDFs.
Data Ingestion & Processing – Documents are parsed and structured for embeddings.
Vectorization – An embedding model converts text chunks into vectors, and FAISS indexes them for fast similarity search.
RAG Pipeline – LangChain retrieves context and generates natural language responses.
Output – Answers, analysis, or document differences are displayed interactively.
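The steps above can be sketched in dependency-free Python. This is a toy stand-in, not the project's code: word-count vectors replace a real embedding model, a linear scan replaces the FAISS index, and the names `chunk_text`, `embed`, `cosine`, and `retrieve` are hypothetical. In the real pipeline, the retrieved chunks would be handed to an LLM through LangChain to generate the answer.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real app uses a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (linear scan)."""
    query_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    demo_chunks = [
        "The lease term is twelve months starting in January.",
        "Rent is payable on the first day of each month.",
        "Either party may terminate with sixty days notice.",
    ]
    print(retrieve("When is rent payable?", demo_chunks, k=1))
```

The overlap between chunks is there so that a sentence split across a chunk boundary still appears whole in at least one chunk.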

🖼️ Screenshots / Visuals

Document Chat Example:

Document Comparison:

Document Summarizer:

Project Structure Overview:

document-RAG-app/
│
├── .github/
│   └── workflows/                  # GitHub Actions CI/CD workflows
│       ├── aws.yml                 # AWS deployment workflow
│       ├── task_definition.json    # ECS or container task definition
│       └── template.yml            # Template for workflows
│
├── .idea/                          # PyCharm IDE config files (can be ignored in Git)
│   └── ...
│
├── api/
│   ├── main.py                     # Main API entrypoint (FastAPI/Flask)
│   └── ...                         # Other API endpoints, routers
│
├── archive/
│   └── src/                        # Old or backup source code for reference
│       └── ...
│
├── config/
│   ├── config.yaml                 # Core configuration settings
│   └── ...                         # Additional config files (JSON/YAML)
│
├── data/
│   ├── doc_analysis/               # Data for document analysis workflow
│   ├── multidoc_chat/              # Data for multi-document chat workflow
│   └── single_doc/                 # Data for single-document chat workflow
│
├── exception/
│   ├── __init__.py                 # Makes folder a Python package
│   └── Custom_exception.py         # Custom exception classes for robust error handling
│
├── faiss_index/
│   ├── index.faiss                 # FAISS vector index for embeddings
│   ├── index.pkl                   # Pickled document store and ID mapping saved alongside the index
│   └── ...                         # Any session or auxiliary files
│
├── logger/
│   ├── __init__.py                 # Python package initializer
│   └── custom_logger.py            # Logging utilities for debugging & monitoring
│
├── model/
│   ├── __init__.py                 # Python package initializer
│   └── models.py                   # Trained ML/DL models or model utilities
│
├── notebook/
│   └── ...                         # Jupyter notebooks for experiments and testing
│
├── prompts/
│   ├── __init__.py                 # Python package initializer
│   └── prompts.py                  # Prompt templates and utilities for RAG pipelines
│
├── src/
│   ├── document_analyzer/
│   │   ├── __init__.py
│   │   └── data_analysis.py        # Core logic for document analysis
│   │
│   ├── document_chat/
│   │   ├── __init__.py
│   │   └── retrieval.py            # Single & multi-document chat workflows
│   │
│   ├── documents_compare/
│   │   ├── __init__.py
│   │   └── document_comparator.py  # Logic to compare different versions of documents
│   │
│   └── document_ingestion/
│       ├── __init__.py
│       └── data_ingestion.py       # Parsing & preprocessing documents for RAG
│
├── static/
│   └── style.css                   # Frontend CSS styling
│
├── templates/
│   └── index.html                  # HTML templates for FastAPI/Flask or frontend rendering
│
├── utils/
│   ├── __init__.py                 # Python package initializer
│   ├── config_loaders.py           # Load and manage configuration files
│   ├── document_ops.py             # Helper functions for document processing
│   ├── file_io.py                  # File reading/writing utilities
│   └── model_loader.py             # Load ML/DL models efficiently
│
├── .dockerignore                   # Docker ignore rules
├── .gitattributes                  # Git attributes
├── .gitignore                      # Git ignore rules
├── Dockerfile                      # Docker container setup
├── README.md                       # Project README (documentation)
├── app.py                          # Main app entrypoint (run with `streamlit run app.py`)
├── requirements.txt                # Python dependencies
├── setup.py                        # Package setup script
└── test.py                         # Test scripts or Streamlit UI for quick prototyping
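The `exception/` and `logger/` packages in the tree suggest a shared error-handling layer. Their contents are not shown in this README, so the following is only a guess at the pattern: a custom exception that records where the original error was raised, reported through a module-level logger. `DocumentPortalException`, `get_logger`, and `parse_page_count` are hypothetical names.

```python
import logging
import sys
import traceback
from typing import Optional


def get_logger(name: str) -> logging.Logger:
    """Return a logger that writes timestamped messages to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


class DocumentPortalException(Exception):
    """Wrap a lower-level error with the file and line where it was raised."""

    def __init__(self, message: str, cause: Optional[BaseException] = None):
        super().__init__(message)
        self.file_name = "<unknown>"
        self.line_number = 0
        if cause is not None and cause.__traceback__ is not None:
            # The last traceback frame is where the original error occurred.
            frame = traceback.extract_tb(cause.__traceback__)[-1]
            self.file_name, self.line_number = frame.filename, frame.lineno

    def __str__(self) -> str:
        return f"{self.args[0]} [{self.file_name}:{self.line_number}]"


def parse_page_count(raw: str) -> int:
    """Example caller: convert a low-level ValueError into the custom type."""
    try:
        return int(raw)
    except ValueError as err:
        raise DocumentPortalException("Failed to parse page count", err) from err


if __name__ == "__main__":
    log = get_logger("document_portal")
    try:
        parse_page_count("not a number")
    except DocumentPortalException as exc:
        log.error(str(exc))
```

Chaining with `raise ... from err` keeps the original traceback available, so the log entry can point at the failing line while callers only need to catch one exception type.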

🛠️ Tech Stack

Python 3.10+
Streamlit – Interactive frontend
LangChain – RAG orchestration
FAISS – Vector index library for similarity search over embeddings
Custom Modules – Document ingestion, analysis, retrieval, and comparison

⚙️ Getting Started

Clone the repository:

git clone https://github.com/sayed-ashfaq/document-RAG-app.git
cd document-RAG-app

Install dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
streamlit run app.py
```
Open the Streamlit interface and start uploading PDFs, chatting with documents, or comparing versions.

💡 Notes & Recommendations

Existing FAISS indexes are reused for faster querying.
Designed for both single and multi-document workflows in a single interface.
Ideal for research, document management, and AI-powered knowledge extraction.
Easily extendable with new RAG workflows or document types.
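One way to reuse an existing index is to key each index directory by a hash of the uploaded file's contents and rebuild only when no index exists for that hash. The sketch below shows the idea with the standard library only. The directory layout, the `build_index` callback, and the function names are assumptions for illustration, not the project's actual logic.

```python
import hashlib
from pathlib import Path
from typing import Callable


def index_dir_for(pdf_bytes: bytes, root: Path) -> Path:
    """Derive a stable directory name from the file contents."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()[:16]
    return root / digest


def load_or_build(
    pdf_bytes: bytes,
    root: Path,
    build_index: Callable[[bytes, Path], None],
) -> tuple[Path, bool]:
    """Return (index directory, reused flag); build only on a cache miss."""
    target = index_dir_for(pdf_bytes, root)
    if (target / "index.faiss").exists():
        return target, True  # same content was indexed before
    target.mkdir(parents=True, exist_ok=True)
    build_index(pdf_bytes, target)  # hypothetical: parse, embed, save index
    return target, False
```

With LangChain's FAISS wrapper, `build_index` would typically end by calling `save_local` on the vector store, which writes the `index.faiss` and `index.pkl` files listed under `faiss_index/`. Hashing the contents rather than the filename means a re-uploaded file is recognized even if it was renamed.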
