Real-time News Sentiment Analysis for the Land of the Thunder Dragon
A comprehensive analytics dashboard that aggregates, filters, and analyzes news about Bhutan from domestic, international, and social media sources to visualize real-time sentiment trends.
- Strict Relevance Filtering: Automatically filters out articles that don't explicitly mention Bhutan (unless they come from wholly domestic sources), keeping irrelevant noise out of the data.
- 5-Hour Refresh Cooldown: Persists the time of the last refresh and blocks new fetches for five hours, which avoids spamming news sources and keeps data scraping ethical.
- Deduplication: Smart logic to prevent duplicate articles from cluttering the dataset.
- Enhanced Sentiment Analysis: Uses a custom-tuned VADER model with an expanded lexicon of 160+ Bhutan-specific terms and general news vocabulary (e.g., GNH, Dzongkha, hydropower, carbon negative) for more culturally aware scoring.
- Smart Hybrid Classifier: Eliminates "Uncategorized" content with a hybrid approach that combines Keyword Matching (high precision) with TF-IDF Vector Similarity (high recall), clustering articles into topics even without exact keyword matches.
- Mobile-First Design: Fully responsive UI that adapts to desktop and mobile screens, with dark-mode-friendly readability.
- Sentiment Clusters: Interactive scatter plot visualizing the spread of positive, neutral, and negative articles.
- Topic Word Cloud: Generates dynamic word clouds from article content to identify trending topics.
- Interactive Charts: Donut charts, Treemaps, and Bar charts built with Plotly.
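
The hybrid classifier described above can be illustrated with a small, dependency-free sketch. It is not the project's actual `domain_classifier.py`: the topic names, keyword sets, seed texts, and the `classify` function are all hypothetical, and TF-IDF is computed by hand here so the example runs without extra packages. Stage 1 tries exact keyword hits; stage 2 falls back to cosine similarity between TF-IDF vectors.

```python
import math
import re
from collections import Counter

# Hypothetical topics: the keyword sets and seed texts are illustrative only.
TOPICS = {
    "Economy": {
        "keywords": {"hydropower", "tourism", "budget", "trade"},
        "seed": "hydropower tourism budget trade revenue prices market loans jobs",
    },
    "Environment": {
        "keywords": {"forest", "climate", "carbon", "glacier"},
        "seed": "forest climate carbon glacier melting lakes temperatures rivers",
    },
    "Culture": {
        "keywords": {"festival", "tshechu", "dzongkha", "monastery"},
        "seed": "festival tshechu dzongkha monastery dance masks tradition monks",
    },
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency over the tokenized seed texts."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens, idf):
    """Term count times IDF; terms outside the seed vocabulary are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


SEED_TOKENS = {name: tokenize(spec["seed"]) for name, spec in TOPICS.items()}
IDF = build_idf(list(SEED_TOKENS.values()))
SEED_VECTORS = {name: tfidf(toks, IDF) for name, toks in SEED_TOKENS.items()}


def classify(text):
    """Return (topic, stage), where stage records which step made the decision."""
    tokens = tokenize(text)

    # Stage 1: exact keyword matching (high precision).
    hits = {
        name: sum(tok in spec["keywords"] for tok in tokens)
        for name, spec in TOPICS.items()
    }
    best = max(hits, key=hits.get)
    if hits[best] > 0:
        return best, "keyword"

    # Stage 2: TF-IDF cosine similarity against each topic's seed text (high recall).
    vec = tfidf(tokens, IDF)
    sims = {name: cosine(vec, seed) for name, seed in SEED_VECTORS.items()}
    best = max(sims, key=sims.get)
    if sims[best] > 0:
        return best, "tfidf"

    # Assumption: a catch-all label when the text shares no vocabulary with any topic.
    return "General", "fallback"
```

For instance, `classify("Glacial lakes are melting as temperatures rise")` contains none of the exact keywords, so the decision falls through to the similarity stage. A production version would more likely use a library vectorizer and a much larger vocabulary.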
# Clone the repository
git clone https://github.com/androidilicious/sentiment.bt.git
cd sentiment.bt
# Create virtual environment
python -m venv venv
# Windows:
venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Download required NLTK data
python -c "import nltk; nltk.download('vader_lexicon'); nltk.download('punkt')"
# Run the dashboard
streamlit run app.py
The app will open automatically at http://localhost:8501.
We monitor a curated list of sources to ensure comprehensive coverage:
| Category | Sources |
|---|---|
| 🇧🇹 Domestic | Kuensel, BBS, The Bhutanese, Business Bhutan, Bhutan Times |
| 🌍 International | Google News, Al Jazeera, BBC Asia, The Diplomat, SCMP, Reuters |
| 🔮 Unconventional | Reddit (r/Bhutan), Global Voices, Twitter/X (Coming Soon) |
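
Given these source categories, the relevance filter and deduplication from the feature list could be sketched as follows. This is a minimal illustration, not the project's code: the article dictionary shape (`source`, `title`, optional `summary`), the function names, and the choice to deduplicate on a normalized-title hash are all assumptions.

```python
import hashlib
import re

# Domestic sources taken from the table above; their articles skip the mention check.
DOMESTIC_SOURCES = {
    "Kuensel", "BBS", "The Bhutanese", "Business Bhutan", "Bhutan Times",
}

# Matches "Bhutan", "Bhutan's", and "Bhutanese" as whole words, case-insensitively.
BHUTAN_PATTERN = re.compile(r"\bbhutan(ese)?\b", re.IGNORECASE)


def is_relevant(article):
    """Keep domestic articles; otherwise require an explicit mention of Bhutan."""
    if article["source"] in DOMESTIC_SOURCES:
        return True
    text = f"{article['title']} {article.get('summary', '')}"
    return bool(BHUTAN_PATTERN.search(text))


def fingerprint(article):
    """Hash of the title with case and punctuation normalized away (an assumed heuristic)."""
    normalized = re.sub(r"\W+", " ", article["title"].lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_articles(articles):
    """Drop irrelevant articles, then drop repeats of an already seen title."""
    seen = set()
    kept = []
    for article in articles:
        if not is_relevant(article):
            continue
        key = fingerprint(article)
        if key in seen:
            continue
        seen.add(key)
        kept.append(article)
    return kept
```

A title hash only catches near-identical headlines; syndicated stories with rewritten titles would need a fuzzier comparison.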
sentiment.bt/
├── app.py # Main Streamlit dashboard application
├── config/
│ └── news_sources.py # Configuration for RSS feeds & scraping rules
├── core/
│ ├── news_fetcher.py # RSS fetching & web scraping logic
│ ├── sentiment_analyzer.py # Custom NLP engine with Bhutan lexicon
│ ├── domain_classifier.py # Topic categorization (keyword rules + TF-IDF similarity)
│ └── data_manager.py # SQLite database interface
├── data/
│ ├── bhutan_sentiment.db # Local database storing analyzed articles
│ └── last_refresh.txt # Persistent timestamp for cooldown logic
└── requirements.txt # Project dependencies
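
The cooldown backed by `data/last_refresh.txt` might work roughly like the sketch below. The file format (a single Unix timestamp) and the function names `seconds_until_refresh` and `mark_refreshed` are assumptions made for illustration; only the file path and the five-hour window come from this README.

```python
import time
from pathlib import Path

COOLDOWN_SECONDS = 5 * 60 * 60  # five-hour window from the feature list
STAMP_FILE = Path("data/last_refresh.txt")


def seconds_until_refresh(stamp_file=STAMP_FILE, now=None):
    """Return how many seconds remain before a refresh is allowed (0.0 if allowed now)."""
    now = time.time() if now is None else now
    try:
        last = float(stamp_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        # No readable timestamp yet: treat it as never refreshed.
        return 0.0
    return max(0.0, COOLDOWN_SECONDS - (now - last))


def mark_refreshed(stamp_file=STAMP_FILE, now=None):
    """Persist the refresh time so the cooldown survives app restarts."""
    now = time.time() if now is None else now
    stamp_file.parent.mkdir(parents=True, exist_ok=True)
    stamp_file.write_text(str(now))
```

A refresh handler would call `seconds_until_refresh()` first, fetch only when it returns `0.0`, and call `mark_refreshed()` after a successful fetch. Storing the timestamp on disk rather than in session state is what keeps the cooldown in force across restarts.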
This tool provides automated sentiment analysis for informational purposes only. Sentiment scores are generated using NLP algorithms and may not reflect the true nuance, context, or intent of the original articles. Always refer to original sources for accurate information.
MIT License. Built with ❤️ for Bhutan.