A powerful Streamlit-based application that uses LangChain and Ollama to generate intelligent summaries from multiple document formats. Upload documents, provide custom instructions, and get professionally formatted summaries tailored to your needs.
- Multi-Format Support: Load and process PDF, DOCX, PPTX, TXT, and HTML documents
- Custom Instructions: Control summarization style through natural language (e.g., "Teach this to a beginner", "Extract key financial risks")
- Multiple Summarization Methods:
  - Stuff: Fast, for smaller documents
  - Map Reduce: Scalable, processes large documents in chunks
  - Refine: Iterative refinement for comprehensive summaries
- Chat Interface: Interactive conversation-style interface with chat history
- Real-time Processing: Live feedback during document loading and summarization
- Clean Markdown Output: Professionally formatted summaries with headers, bullets, and tables
- Python 3.10 or higher
- Ollama installed and running locally
- A compatible Ollama model (default: `mistral-large-3:675b-cloud`)
1. Clone or navigate to the project directory:

   ```bash
   cd /path/to/summarization
   ```

2. Install dependencies using `uv` (recommended):

   ```bash
   uv sync
   ```

   Or with pip:

   ```bash
   pip install -r requirements.txt
   ```

3. Ensure Ollama is running:

   ```bash
   ollama serve
   ```

4. Pull your desired model (if not already available):

   ```bash
   ollama pull mistral-large-3:675b-cloud
   # Or use another model like:
   # ollama pull llama3
   # ollama pull mistral
   ```
Run the app with `uv`:

```bash
uv run streamlit run main.py
```

Or with standard Python:

```bash
streamlit run main.py
```

The app will open in your browser at http://localhost:8501.
- Ollama Model: Enter the model name (default: `mistral-large-3:675b-cloud`)
- Method: Choose the summarization approach (see the sketch after this list):
  - `Stuff` - Best for documents under ~4000 tokens
  - `Map Reduce` - Best for large documents
  - `Refine` - Best for iterative, comprehensive summaries
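The three methods correspond to LangChain's built-in summarize chain types. A minimal sketch of how the sidebar labels might map onto them, assuming summarizer.py builds its chain via `load_summarize_chain` (the actual wiring may differ):

```python
from langchain.chains.summarize import load_summarize_chain
from langchain_ollama import OllamaLLM

# Assumed mapping from the sidebar labels to LangChain chain types.
CHAIN_TYPES = {"Stuff": "stuff", "Map Reduce": "map_reduce", "Refine": "refine"}

def build_chain(model_name: str, method: str):
    llm = OllamaLLM(model=model_name)
    return load_summarize_chain(llm, chain_type=CHAIN_TYPES[method])
```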
- Click "Upload Files" in the sidebar
- Select one or more documents (PDF, DOCX, PPTX, TXT, or HTML)
- Click "Process Documents"
- Wait for confirmation: "Ready! Loaded X chunks."
- Type your instruction in the chat input (examples below)
- Press Enter
- View the formatted summary in the chat area
"Summarize the main points in bullet format""Teach this material to a Year 2 nursing student""Extract financial risks and present in a table""Give me a 3-paragraph executive summary""Explain this like I'm 10 years old"
```text
summarization/
├── main.py            # Streamlit UI and chat interface
├── loaders.py         # Document loading logic for all formats
├── summarizer.py      # LLM and summarization chain logic
├── utils.py           # Logging utilities
├── pyproject.toml     # Project dependencies (uv/pip)
├── README.md          # This file
└── .venv/             # Virtual environment (auto-created)
```
- `langchain` (0.3.14): LLM orchestration framework
- `langchain-community` (0.3.14): Community loaders and tools
- `langchain-ollama`: Ollama integration for LangChain
- `langchain-text-splitters`: Text chunking utilities
- `streamlit`: Web UI framework
- `pypdf`: PDF parsing
- `python-docx`: DOCX parsing
- `python-pptx`: PowerPoint parsing
- `beautifulsoup4` + `lxml`: HTML parsing
- `docx2txt`: Additional DOCX support
- Document Loading (`loaders.py`):
  - Files are saved to a temporary directory
  - The appropriate loader is selected based on file extension (sketched below)
  - Documents are converted to LangChain `Document` objects
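A simplified sketch of that dispatch; the specific loader classes are assumptions and may not match what loaders.py actually uses:

```python
from pathlib import Path

from langchain_community.document_loaders import (
    BSHTMLLoader,
    Docx2txtLoader,
    PyPDFLoader,
    TextLoader,
    UnstructuredPowerPointLoader,
)

# Extension-to-loader map (illustrative; loaders.py may choose differently).
LOADERS = {
    ".pdf": PyPDFLoader,
    ".docx": Docx2txtLoader,
    ".pptx": UnstructuredPowerPointLoader,
    ".txt": TextLoader,
    ".html": BSHTMLLoader,
}

def load_file(path: str):
    """Return a list of LangChain Document objects for one file."""
    loader_cls = LOADERS.get(Path(path).suffix.lower())
    if loader_cls is None:
        raise ValueError(f"Unsupported file type: {path}")
    return loader_cls(path).load()
```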
- Text Splitting (`summarizer.py`):
  - Documents are split into manageable chunks (default: 2000 chars, 200 overlap); see the sketch below
  - Chunk overlap preserves context across chunk boundaries
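A minimal sketch of `split_documents`, assuming it wraps `RecursiveCharacterTextSplitter` (an assumption; summarizer.py may use a different splitter):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

def split_documents(documents, chunk_size=2000, chunk_overlap=200):
    # The overlap region carries context across chunk boundaries.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    return splitter.split_documents(documents)
```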
- Summarization:
  - The user instruction is combined with a formatting directive (sketched below)
  - A chain is selected based on the method (Stuff/Map Reduce/Refine)
  - The LLM generates a summary following the instructions
  - Output is extracted and formatted for display
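A sketch of the prompt assembly for the "Stuff" path; the directive text and function name are illustrative, not taken from summarizer.py:

```python
from langchain.chains.summarize import load_summarize_chain
from langchain_core.prompts import PromptTemplate

# Illustrative formatting directive appended to every user instruction.
FORMAT_DIRECTIVE = (
    "Format the output as clean Markdown with headers, bullet points, "
    "and tables where appropriate."
)

def summarize_stuff(llm, chunks, instruction: str) -> str:
    # Fold the user's instruction and the directive into the chain prompt.
    prompt = PromptTemplate.from_template(
        instruction + "\n\n" + FORMAT_DIRECTIVE + "\n\n{text}"
    )
    chain = load_summarize_chain(llm, chain_type="stuff", prompt=prompt)
    result = chain.invoke({"input_documents": chunks})
    return result["output_text"]
```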
- Chat Interface (`main.py`):
  - Session state maintains document chunks and chat history (sketched below)
  - Real-time streaming and status updates
  - Clean Markdown rendering
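A minimal sketch of the session-state pattern; `run_summarizer` is a hypothetical helper standing in for the real summarization call, and main.py's actual keys may differ:

```python
import streamlit as st

if "messages" not in st.session_state:
    st.session_state.messages = []  # chat history survives reruns

# Replay prior turns on every rerun.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if instruction := st.chat_input("How should I summarize these documents?"):
    st.session_state.messages.append({"role": "user", "content": instruction})
    with st.chat_message("assistant"):
        with st.spinner("Summarizing..."):
            summary = run_summarizer(instruction)  # hypothetical helper
        st.markdown(summary)
    st.session_state.messages.append({"role": "assistant", "content": summary})
```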
To change the default model, edit main.py or summarizer.py:

```python
model_name = st.text_input("Ollama Model", value="llama3")  # set to the Ollama model you pulled
```

To adjust chunking, edit summarizer.py:

```python
def split_documents(documents, chunk_size=3000, chunk_overlap=300):
    # Increase or decrease these values to suit your documents
    ...
```

To change how the LLM processes documents, edit the prompt templates in summarizer.py; see the sketch below.
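For example, the Map Reduce path can take custom map and combine prompts. A hedged sketch, assuming summarizer.py builds its chain with `load_summarize_chain`; the prompt wording is illustrative:

```python
from langchain.chains.summarize import load_summarize_chain
from langchain_core.prompts import PromptTemplate
from langchain_ollama import OllamaLLM

llm = OllamaLLM(model="mistral")  # any Ollama model you have pulled

# Illustrative custom prompts; both must include the {text} variable.
map_prompt = PromptTemplate.from_template(
    "Summarize this section, keeping concrete figures and names:\n\n{text}"
)
combine_prompt = PromptTemplate.from_template(
    "Merge these partial summaries into one coherent Markdown summary:\n\n{text}"
)

chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    map_prompt=map_prompt,
    combine_prompt=combine_prompt,
)
```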
"No module named 'langchain.chains'"
- Ensure
langchain==0.3.14is installed (not 1.2.0) - Run:
uv syncor reinstall dependencies
"Connection refused to Ollama"
- Make sure Ollama is running:
ollama serve - Check Ollama is on default port
11434
"Model not found"
- Pull the model:
ollama pull mistral-large-3:675b-cloud - Or change to an available model in the sidebar
Slow summarization
- Use a smaller/faster model (e.g.,
mistralinstead ofmistral-large-3) - Reduce chunk size or use "Stuff" method for smaller docs
This project is provided as-is for educational and personal use.
Contributions welcome! Feel free to:
- Report bugs
- Suggest features
- Submit pull requests
For issues or questions, please open an issue in the repository.
Built with ❤️ using LangChain, Streamlit, and Ollama