📰 NewsByte — Automated Nepali/English News Summarizer

NewsByte is an automated news aggregation and summarization platform for Nepali and English online news portals.
It collects articles from Ekantipur and The Kathmandu Post, summarizes them with a hybrid TF-IDF + TextRank algorithm, and serves the resulting summaries through a responsive web interface.

🚀 Features

Automated Web Scraping — Uses Scrapy and Playwright for scalable, policy-compliant scraping of Nepali news portals.
Hybrid NLP Summarization — Summarizes articles to ~30% of their original size using a TF-IDF-enhanced TextRank algorithm.
Asynchronous Processing — RabbitMQ handles scraping, summarization, and storage pipelines in parallel for high throughput.
Scalable Backend — NestJS APIs with Prisma ORM provide efficient, secure data access.
Search & Filter Support — Full-text search with PostgreSQL tsvector and topic-based filtering (Politics, Business, Sports, etc.).
Responsive Frontend — Built using React 18 with TypeScript, optimized for both mobile and desktop.
Bilingual Support — Summarization pipeline optimized for English and Nepali news content.
Secure & Reliable — JWT-authenticated API endpoints, HTTPS communication, and WCAG 2.1 accessibility compliance.
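
The README does not spell out how TF-IDF and TextRank are combined, so the following is only a minimal sketch of one common reading: sentences are nodes, TF-IDF cosine similarity gives the edge weights, and a PageRank-style iteration ranks the sentences. All function names, the naive sentence splitter, and the decision to apply the ~30% ratio to the sentence count are assumptions made for illustration, not the project's actual code.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()[]।"  # includes the Devanagari danda used in Nepali text


def split_sentences(text):
    """Naive splitter: ends a sentence at '.', '!', '?' or the danda."""
    sentences, current = [], []
    for ch in text:
        current.append(ch)
        if ch in ".!?।":
            sentence = "".join(current).strip()
            if sentence:
                sentences.append(sentence)
            current = []
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)
    return sentences


def tokenize(sentence):
    # Whitespace tokenization keeps Devanagari words whole; a regex such as
    # \w+ would break them at vowel signs, which are combining marks.
    tokens = (w.strip(PUNCT) for w in sentence.lower().split())
    return [t for t in tokens if t]


def tfidf_vectors(sentences):
    # Each sentence is treated as one "document" for the IDF statistics.
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def textrank(sim, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence similarity matrix."""
    n = len(sim)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in sim]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, ratio=0.3):
    """Return roughly `ratio` of the sentences, kept in original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= 1:
        return sentences
    vecs = tfidf_vectors(sentences)
    sim = [
        [cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = textrank(sim)
    keep = max(1, round(ratio * n))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(article_text)` returns a list of sentences; joining them with spaces gives the summary text. A production pipeline would likely need a more careful sentence splitter (abbreviations, decimals) and stop-word handling for both languages.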

🏗️ Architecture Overview

           ┌─────────────┐
           │   Scrapy    │  <-- Web Crawling (Ekantipur, Kathmandu Post)
           └──────┬──────┘
                  │
           ┌──────▼──────┐
           │ RabbitMQ    │  <-- Asynchronous Message Broker
           └──────┬──────┘
                  │
           ┌──────▼──────┐
           │ Summarizer  │  <-- TF-IDF + TextRank Engine
           └──────┬──────┘
                  │
         ┌────────▼────────┐
         │ PostgreSQL DB   │  <-- Stores Summarized Articles
         └────────┬────────┘
                  │
    ┌─────────────▼─────────────┐
    │       NestJS API          │  <-- JWT Auth + RESTful Endpoints
    └─────────────┬─────────────┘
                  │
         ┌────────▼────────┐
         │ React Frontend  │  <-- User Interface
         └─────────────────┘
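
The flow in the diagram can be sketched as a producer and a consumer joined by a queue. To stay self-contained, this sketch uses Python's in-process `queue.Queue` as a stand-in for RabbitMQ and a plain list as a stand-in for the PostgreSQL table; the function names, the message shape (`url`, `body`), and the `summarize` callback are all hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel that tells the worker to shut down


def summarizer_worker(inbox, store, summarize):
    """Consume scraped articles, summarize them, and persist the result."""
    while True:
        article = inbox.get()
        if article is STOP:
            break
        store.append(
            {"url": article["url"], "summary": summarize(article["body"])}
        )


def run_pipeline(articles, summarize):
    inbox = queue.Queue()  # stand-in for a RabbitMQ queue (assumption)
    store = []             # stand-in for the PostgreSQL table (assumption)
    worker = threading.Thread(
        target=summarizer_worker, args=(inbox, store, summarize)
    )
    worker.start()
    for article in articles:  # the scraper side publishes one message per article
        inbox.put(article)
    inbox.put(STOP)
    worker.join()
    return store
```

For example, `run_pipeline([{"url": "a", "body": "One. Two."}], lambda body: body.split(".")[0])` returns `[{"url": "a", "summary": "One"}]`. The point of the queue is that the scraper never waits for the summarizer; with a real broker the two sides would also run in separate processes or containers.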

🛠️ Tech Stack

Frontend

React 18 + TypeScript
Material-UI for responsive layouts
React Query for data synchronization
Axios for API integration

Backend

NestJS 9 (TypeScript-based framework)
Prisma ORM for type-safe database interactions
JWT Authentication for secure endpoints
Swagger (OpenAPI 3.0) for API documentation

Data Pipeline

Scrapy + Playwright for dynamic web scraping
RabbitMQ for distributed task management
TF-IDF + TextRank for extractive summarization

Database

PostgreSQL 14 (GIN-indexed full-text search)
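
As a rough illustration of GIN-indexed full-text search, the helper below builds a parameterized query against a hypothetical `articles` table with a `search_vector` column of type `tsvector` and a `topic` column. The table, the column names, and the choice of the `simple` text search configuration are assumptions; the actual schema lives in the Prisma models, and in the NestJS backend a query like this would more likely go through Prisma's raw query API.

```python
# Assumed index on the hypothetical table:
#   CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);


def build_search_query(terms, topic=None, limit=20):
    """Return (sql, params) for a ranked full-text search.

    Placeholders use the %s style of Python PostgreSQL drivers.
    """
    sql = (
        "SELECT id, title, summary, "
        "ts_rank(search_vector, plainto_tsquery('simple', %s)) AS rank "
        "FROM articles "
        "WHERE search_vector @@ plainto_tsquery('simple', %s)"
    )
    params = [terms, terms]
    if topic is not None:
        # Optional topic filter, e.g. "Politics" or "Sports".
        sql += " AND topic = %s"
        params.append(topic)
    sql += " ORDER BY rank DESC LIMIT %s"
    params.append(limit)
    return sql, params
```

The `@@` operator lets PostgreSQL use the GIN index to find matching rows, and `ts_rank` orders them by relevance. User input is passed only through parameters, never concatenated into the SQL string.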

Infrastructure

Docker for containerized deployment
GitHub Actions for CI/CD
Prometheus for monitoring
Sentry for error tracking

📦 Installation

1️⃣ Clone the repository

git clone https://github.com/fuunshi/newsbyte.git
cd newsbyte

2️⃣ Set up environment variables

Create a .env file in the root directory:

# Backend
DATABASE_URL=postgresql://user:password@localhost:5432/newsbyte
JWT_SECRET=your_jwt_secret
RABBITMQ_URL=amqp://localhost

# Frontend
NEXT_PUBLIC_API_URL=http://localhost:3000/api

3️⃣ Install dependencies

Backend:

cd backend
npm install
npx prisma migrate dev

Frontend:

cd frontend
npm install

4️⃣ Start services

docker-compose up -d

🧪 Testing

Unit Tests

cd backend
npm run test

System Tests

Scraping validation — Ensures that news articles are collected and summarized correctly.
API tests — Validates RESTful endpoints and JWT-based authentication.
Frontend tests — Covers filters, search, and accessibility compliance.

📊 Performance Benchmarks

Metric	Value
Avg. summarization time	< 15 s per article
API response time	< 500 ms
Daily capacity	1,000+ articles/day
Summarization accuracy	ROUGE-1 F1 ≈ 0.72
Uptime	99% SLA
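
For readers unfamiliar with the accuracy metric: ROUGE-1 F1 is the harmonic mean of unigram precision and recall between a generated summary and a reference summary. The sketch below shows the calculation with plain whitespace tokenization; the README does not say which ROUGE implementation or tokenizer produced the figure above, so treat this as an illustration of the formula only.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

With the candidate "the cat sat" and the reference "the cat sat on the mat", three unigrams overlap, so precision is 3/3, recall is 3/6, and the F1 score is 2/3.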

📌 Future Enhancements

🔹 Transformer-based Summarization: Integrate DistilBERT and Sentence-BERT for abstractive summaries.
🔹 Real-time News Updates: Switch to RSS monitoring and Kafka-based streaming for instant updates.
🔹 Personalized Recommendations: Build user profiles for customized news feeds.
🔹 Extended Multilingual Support: Add Maithili and other regional languages.

👨‍💻 Contributors

Name	Contributions
Pranil Shrestha	Backend & scraper; collaborated on frontend and NLP modules
Sumit Shrestha	Frontend & NLP modules; collaborated on backend and scraper

📝 License

This project is licensed under the Apache License.
