NewsByte is an automated Nepali news aggregation and summarization platform that streamlines information consumption from multiple online news portals.
It collects articles from Ekantipur and The Kathmandu Post, summarizes them using a hybrid TF-IDF + TextRank algorithm, and presents concise, contextually accurate summaries via an intuitive, responsive web interface.
- Automated Web Scraping — Uses Scrapy and Playwright for scalable, policy-compliant scraping of Nepali news portals.
- Hybrid NLP Summarization — Summarizes articles to ~30% of their original size using a TF-IDF-enhanced TextRank algorithm.
- Asynchronous Processing — RabbitMQ handles scraping, summarization, and storage pipelines in parallel for high throughput.
- Scalable Backend — NestJS APIs with Prisma ORM provide efficient, secure data access.
- Search & Filter Support — Full-text search with PostgreSQL tsvector and topic-based filtering (Politics, Business, Sports, etc.).
- Responsive Frontend — Built using React 18 with TypeScript, optimized for both mobile and desktop.
- Bilingual Support — Summarization pipeline optimized for English and Nepali news content.
- Secure & Reliable — JWT-authenticated API endpoints, HTTPS communication, and WCAG 2.1 accessibility compliance.
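
The hybrid summarization feature above can be illustrated with a minimal sketch. It is not the project's actual implementation: the function names, the tokenizer, the damping factor of 0.85, and the choice to apply the ~30% ratio to the sentence count (rather than to characters) are all assumptions made for illustration. Sentences are weighted with TF-IDF, compared by cosine similarity, and ranked with a PageRank-style TextRank iteration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Split after Latin sentence enders and the Devanagari danda.
    parts = re.split(r"(?<=[.!?।])\s+", text.strip())
    return [p for p in parts if p]


def tfidf_vectors(sentences):
    # Treat each sentence as one "document" when computing TF-IDF.
    tokenized = [re.findall(r"[^\s.,!?।;:\"'()]+", s.lower()) for s in sentences]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def textrank(sim, damping=0.85, iterations=50):
    # Weighted PageRank over the sentence-similarity graph.
    n = len(sim)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in sim]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, ratio=0.3):
    # Keep roughly `ratio` of the sentences, in their original order.
    sentences = split_sentences(text)
    if len(sentences) <= 1:
        return text.strip()
    vecs = tfidf_vectors(sentences)
    n = len(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    scores = textrank(sim)
    keep = max(1, round(n * ratio))
    top = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep])
    return " ".join(sentences[i] for i in top)
```

Calling `summarize(article_text)` on a ten-sentence article returns three of its sentences. Because the method is extractive, every sentence in the output appears verbatim in the input.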
       ┌─────────────┐
       │   Scrapy    │  <-- Web Crawling (Ekantipur, Kathmandu Post)
       └──────┬──────┘
              │
       ┌──────▼──────┐
       │  RabbitMQ   │  <-- Asynchronous Message Broker
       └──────┬──────┘
              │
       ┌──────▼──────┐
       │ Summarizer  │  <-- TF-IDF + TextRank Engine
       └──────┬──────┘
              │
     ┌────────▼────────┐
     │  PostgreSQL DB  │  <-- Stores Summarized Articles
     └────────┬────────┘
              │
┌─────────────▼─────────────┐
│        NestJS API         │  <-- JWT Auth + RESTful Endpoints
└─────────────┬─────────────┘
              │
     ┌────────▼────────┐
     │ React Frontend  │  <-- User Interface
     └─────────────────┘
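
The first three stages of the diagram form a producer/consumer pipeline. The sketch below shows that shape using only the Python standard library: a `queue.Queue` stands in for the RabbitMQ queue and a dictionary stands in for the PostgreSQL table. These stand-ins, the message fields (`url`, `title`, `body`), and all function names are assumptions for illustration, not the project's actual code.

```python
import queue
import threading

SENTINEL = None  # signals that the producer has finished


def scrape(articles, out_q):
    # Stand-in for the Scrapy stage: publish each raw article as a message.
    for article in articles:
        out_q.put(article)
    out_q.put(SENTINEL)


def summarize_worker(in_q, store, summarize):
    # Stand-in for the summarizer stage: consume, summarize, persist.
    while True:
        article = in_q.get()
        if article is SENTINEL:
            break
        store[article["url"]] = {
            "title": article["title"],
            "summary": summarize(article["body"]),
        }


def run_pipeline(articles, summarize):
    q = queue.Queue()
    store = {}  # stand-in for the table of summarized articles
    consumer = threading.Thread(target=summarize_worker, args=(q, store, summarize))
    consumer.start()
    scrape(articles, q)  # producer runs while the consumer is already listening
    consumer.join()
    return store
```

The producer never waits for summarization to finish, which is the property a message broker provides between the scraper and the summarizer. With a real broker, the stages would also run as separate processes and could be scaled independently.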
- React 18 + TypeScript
- Material-UI for responsive layouts
- React Query for data synchronization
- Axios for API integration
- NestJS 9 (TypeScript-based framework)
- Prisma ORM for type-safe database interactions
- JWT Authentication for secure endpoints
- Swagger (OpenAPI 3.0) for API documentation
- Scrapy + Playwright for dynamic web scraping
- RabbitMQ for distributed task management
- TF-IDF + TextRank for extractive summarization
- PostgreSQL 14 (GIN-indexed full-text search)
- Docker for containerized deployment
- GitHub Actions for CI/CD
- Prometheus for monitoring
- Sentry for error tracking
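
As an illustration of the GIN-indexed full-text search listed above, the following sketch builds a parameterized PostgreSQL query with optional topic filtering. The table name `articles`, the columns `search_vector` and `topic`, the `'simple'` text search configuration, and the function name are hypothetical; the project's real schema is defined through Prisma and may differ.

```python
def build_search_query(terms, topic=None, limit=20):
    # Returns (sql, params) using %s placeholders, as accepted by
    # common PostgreSQL drivers. All identifiers are hypothetical.
    sql = (
        "SELECT id, title, summary, "
        "ts_rank(search_vector, plainto_tsquery('simple', %s)) AS rank "
        "FROM articles "
        "WHERE search_vector @@ plainto_tsquery('simple', %s)"
    )
    params = [terms, terms]
    if topic is not None:
        sql += " AND topic = %s"
        params.append(topic)
    sql += " ORDER BY rank DESC LIMIT %s"
    params.append(limit)
    return sql, params
```

The `@@` operator matches a `tsvector` column against a `tsquery`, and a GIN index on that column lets PostgreSQL answer the match without scanning every row. Passing user input as parameters rather than interpolating it into the SQL string avoids injection.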
git clone https://github.com/fuunshi/newsbyte.git
cd newsbyte

Create a .env file in the root directory:

# Backend
DATABASE_URL=postgresql://user:password@localhost:5432/newsbyte
JWT_SECRET=your_jwt_secret
RABBITMQ_URL=amqp://localhost
# Frontend
NEXT_PUBLIC_API_URL=http://localhost:3000/api

cd backend
npm install
npx prisma migrate dev

cd frontend
npm install

docker-compose up -d

cd backend
npm run test

- Scraping validation — Ensures that news articles are collected and summarized correctly.
- API tests — Validates RESTful endpoints and JWT-based authentication.
- Frontend tests — Covers filters, search, and accessibility compliance.
| Metric | Value |
|---|---|
| Avg. Summarization Time | < 15 s per article |
| API Response Time | < 500 ms |
| Daily Capacity | 1000+ articles/day |
| Summarization Accuracy | ROUGE-1 F1 ≈ 0.72 |
| Uptime | 99% SLA |
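
For readers unfamiliar with the ROUGE-1 F1 metric in the table, it is the harmonic mean of unigram precision and recall between a generated summary and a reference summary. The sketch below shows the calculation with plain whitespace tokenization and no stemming. It is an assumption about the general form of the metric, not the evaluation script used to obtain the figure above.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    # Count unigrams in both texts; overlap uses clipped (minimum) counts.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, the candidate "the cabinet approved the budget" against the reference "the cabinet approved the new budget today" shares 5 unigrams, giving precision 5/5, recall 5/7, and an F1 of 5/6 (about 0.83).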
- 🔹 Transformer-based Summarization: Integrate DistilBERT and Sentence-BERT for abstractive summaries.
- 🔹 Real-time News Updates: Switch to RSS monitoring and Kafka-based streaming for instant updates.
- 🔹 Personalized Recommendations: Build user profiles for customized news feeds.
- 🔹 Extended Multilingual Support: Add Maithili and other regional languages.
| Name | Contributions |
|---|---|
| Pranil Shrestha | Backend & Scraper; collaborated on frontend and NLP modules |
| Sumit Shrestha | Frontend & NLP modules; collaborated on backend and Scraper |
This project is licensed under the Apache License.