| title | emoji | colorFrom | colorTo | sdk | app_port | pinned |
|---|---|---|---|---|---|---|
Multi-Document Summarizer |
📚 |
blue |
green |
docker |
8000 |
true |
A full-stack application that effectively summarizes information from multiple documents, focusing on key topics and reducing redundancy. This project uses a hybrid extractive-abstractive approach to generate coherent and concise summaries.
https://huggingface.co/spaces/NeelM47/summarizer
- Multi-Document Input: A dynamic web interface that allows users to input multiple documents for summarization.
- Configurable Topic Granularity: Users can select the number of topics (clusters) to extract, controlling the detail level of the summary.
- Redundancy Reduction: Uses
sentence-transformersto create semantic embeddings andscikit-learnKMeans clustering to identify unique topics across all documents. - Coherent Summaries: Employs a pre-trained BART model from Hugging Face
transformersto generate a fluent, abstractive summary from the most representative sentences. - RESTful API: The core NLP logic is exposed via a robust and fast FastAPI backend.
- Containerized Application: The entire application is containerized with Docker, ensuring easy and reproducible deployment.
- Backend: Python 3.9, FastAPI
- NLP/ML: Hugging Face
transformers,sentence-transformers,scikit-learn,nltk - Frontend: HTML, CSS, Vanilla JavaScript
- Deployment: Docker, Uvicorn, Hugging Face Spaces
- Ingestion: The system takes a list of documents and a desired number of topics (
k) as input. - Sentence Cleaning & Embedding: All documents are split into sentences. Each sentence is cleaned (lowercased, punctuation removed) and then converted into a high-dimensional vector embedding using a Sentence-Transformer model.
- Extractive Clustering: The sentence embeddings are clustered using KMeans with
kclusters. This groups semantically similar sentences into topics. - Centroid-Based Selection: The most representative sentence from each cluster is selected by finding the one closest to the cluster's centroid.
- Abstractive Summarization: These selected, non-redundant sentences are combined and passed to a BART model, which generates a final, coherent summary.
# Clone the repository
git clone https://github.com/NeelM47/summarizer.git
cd summarizer
# Create and activate the conda environment
conda env create -f environment.yml
conda activate summarizer
# Run the FastAPI server
uvicorn main:app --reloadAccess the app at http://127.0.0.1:8000
Ensure you have Docker installed and running.
# Clone the repository
git clone https://github.com/NeelM47/summarizer.git
<<<<<<< HEAD
cd summarizer
=======
cd suvidha-summarizer
>>>>>>> 1d551d7cc667304ddd3e9d9a672bebd1f68bc829
# Build the Docker image
docker build -t summarizer .
# Run the Docker container
docker run -p 8000:8000 summarizerAccess the app at http://127.0.0.1:8000
Once the server is running, interactive API documentation (powered by Swagger UI) is available at http://127.0.0.1:8000/docs.
