Note
Announcement: Our paper is now available on arXiv!
Title: CognitiveSky: Scalable Sentiment and Narrative Analysis for Decentralized Social Media
If you use this project, please consider citing our work. Thank you for your support!
CognitiveSky is an open-source research infrastructure and dashboard for analyzing mental health narratives on the Bluesky social platform. Inspired by TwiXplorer, it integrates real-time data ingestion, robust NLP processing, and interactive visualization to empower researchers, advocates, and developers with actionable social insights.
Live Dashboard: CognitiveSky Dashboard
- Features
- System Architecture
- Tools and Technologies
- Data Flow
- Summary Outputs
- Dashboard
- Get Started
- Makefile Commands
- Contributing
- License
- Acknowledgements
- Real-time Data Ingestion: Continuously collects public posts related to mental health from Bluesky using the Firehose API.
- NLP Processing: Applies state-of-the-art sentiment analysis, emotion detection, and topic modeling to understand mental health narratives.
- Interactive Dashboard: Visualizes trends, user engagement, and topic distributions using React and Next.js.
- Open Source: Fully transparent and community-driven, allowing contributions from researchers and developers.
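The CognitiveSky system is built around two primary components: the ingestion worker (mh_worker), described first, and the summarization pipeline (scripts/summary.py), described second.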
- Language: Node.js
- Host: Oracle Cloud (free-tier VM)
- Function: A real-time listener using Bluesky's Firehose API
- Purpose: Filters public posts related to mental health and stores them in a posts_unlabeled table within Supabase.
- Frequency: Continuous, 24×7 ingestion
- Output: Raw mental-health-related posts in Supabase
Read more about the worker: mh_worker README
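The worker itself is written in Node.js, and its actual filter terms are not listed here. Purely to illustrate the filter-then-store idea, the Python sketch below matches post text against a keyword pattern and shapes matching posts into rows. The keyword list, the `is_relevant`, `to_row`, and `filter_posts` helpers, and the row columns are all assumptions made for the example.

```python
import re

# Hypothetical keyword list; the real worker's filter terms are not given here.
KEYWORDS = ("anxiety", "depression", "therapy", "burnout")

# One case-insensitive pattern with word boundaries, so "therapy" matches
# but "aromatherapy" does not.
PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def is_relevant(text: str) -> bool:
    """Return True when the post text mentions at least one keyword."""
    return PATTERN.search(text) is not None


def to_row(post: dict) -> dict:
    """Shape a post into a row for the unlabeled table (columns are assumed)."""
    return {
        "uri": post["uri"],
        "text": post["text"],
        "created_at": post["created_at"],
    }


def filter_posts(posts):
    """Keep only relevant posts, converted to rows ready for insertion."""
    return [to_row(p) for p in posts if is_relevant(p.get("text", ""))]
```

A production filter would likely need a broader vocabulary and some handling of hashtags and misspellings; the word-boundary pattern is only one simple way to avoid substring false positives.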
- Language: Python
- Trigger: Scheduled daily via GitHub Actions (4 parallel shards × 500 posts)
- Purpose: Processes unlabeled posts using:
  - Sentiment analysis (cardiffnlp/twitter-roberta-base-sentiment)
  - Emotion detection (j-hartmann/emotion-english-distilroberta-base)
  - Topic modeling (NMF + TF-IDF)
- Database: Processed posts are stored in Turso (libSQL)
- Output: JSON snapshots written to /summary/*.json for dashboard rendering
View Latest Summary Output: Latest Summary JSON
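The schedule mentions 4 parallel shards of 500 posts each but does not say how posts are divided among them. One common way to keep parallel jobs from touching the same rows is to partition by ID modulo the shard count. The sketch below assumes that scheme, integer post IDs, and a hypothetical `select_for_shard` helper; the project may split work differently.

```python
NUM_SHARDS = 4    # parallel shards, as stated in the schedule
SHARD_SIZE = 500  # posts per shard per run, as stated in the schedule


def select_for_shard(post_ids, shard_index, num_shards=NUM_SHARDS, limit=SHARD_SIZE):
    """Return the IDs one shard would process in a single run.

    Assumes integer IDs. A post belongs to the shard whose index equals
    id % num_shards, so no two shards ever select the same post.
    """
    if not 0 <= shard_index < num_shards:
        raise ValueError("shard_index out of range")
    mine = [pid for pid in sorted(post_ids) if pid % num_shards == shard_index]
    return mine[:limit]


# Example: 3000 pending posts, split across the four shards.
pending = range(1, 3001)
batches = [select_for_shard(pending, i) for i in range(NUM_SHARDS)]
print([len(b) for b in batches])
```

Under these assumptions one daily run would label at most 4 × 500 = 2000 posts, and anything left over would wait for the next run.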
- Node.js: For real-time data ingestion
- Bluesky Firehose API: Streams public posts using the @atproto/sync and @atproto/api libraries
- Supabase: Acts as the database for storing unlabeled posts
- Oracle Cloud: Hosts the worker for continuous operation
- Python: Main language for NLP processing
- Transformers: For sentiment and emotion analysis using pre-trained models
- Turso (libSQL): Lightweight database for storing labeled data
- GitHub Actions: Automates daily processing and export of summaries
- NLP Libraries:
  - transformers for sentiment and emotion analysis
  - scikit-learn for topic modeling
- React + Next.js: Frontend framework for building the dashboard
- Tailwind CSS + shadcn/ui: For styling the dashboard components
- Recharts: For data visualization
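The topic-modeling step is described as NMF over TF-IDF, with scikit-learn listed as the library. As a rough, dependency-free illustration of what that factorization does (not the project's code, and not scikit-learn's solver), the sketch below builds a small TF-IDF matrix and factors it with the classic multiplicative-update rules. The function names, the whitespace tokenizer, and the toy documents are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, matrix) with one TF-IDF row per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    df = Counter(tok for toks in tokenized for tok in set(toks))
    n = len(docs)
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        # Smoothed IDF, so a term that appears in every document keeps weight 1.
        rows.append([counts[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab])
    return vocab, rows


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor v (docs x terms) into non-negative w (docs x k) and h (k x terms)."""
    rng = random.Random(seed)
    w = [[rng.random() for _ in range(k)] for _ in v]
    h = [[rng.random() for _ in v[0]] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(len(w))]
    return w, h


def top_terms(h, vocab, n=3):
    """List the n highest-weighted terms for each topic row of h."""
    topics = []
    for row in h:
        ranked = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)
        topics.append([vocab[j] for j in ranked[:n]])
    return topics


docs = [
    "anxiety panic anxiety stress",
    "panic stress anxiety",
    "sleep insomnia tired sleep",
    "insomnia tired sleep",
]
vocab, v = tfidf(docs)
w, h = nmf(v, k=2)
for i, terms in enumerate(top_terms(h, vocab)):
    print(f"topic {i}: {terms}")
```

Each row of `h` is a topic expressed as term weights, and each row of `w` says how strongly a document loads on each topic. For real workloads a library implementation with sparse matrices is the sensible choice; the pure-Python loops here only show the idea.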
+------------------------+
|    Bluesky Firehose    |
+-----------+------------+
            |
            v
+------------------------+
|    mh_worker (Node)    |
|    Filter + Ingest     |
+-----------+------------+
            |
            v
+------------------------+
|  Supabase (Unlabeled)  |
+-----------+------------+
            |
            v
+------------------------+
| summary.py (GitHub CI) |
| NLP + Topics + Migrate |
+-----------+------------+
            |
            v
+------------------------+
|        Turso DB        |
|     (Labeled Data)     |
+-----------+------------+
            |
            v
+------------------------+
|  summary_snapshots DB  |
+-----------+------------+
            |
            v
+------------------------+
|       JSON Files       |
+-----------+------------+
            |
            v
+------------------------+
|    Dashboard (Web)     |
+------------------------+

Every run of the summarization pipeline generates JSON files such as:
- summary/narratives.json: Sentiment & language distributions
- summary/emotions.json: Emotion category trends
- summary/hashtags.json: Trending hashtags and emojis
- summary/activity.json: Post volume over time
- summary/engagement.json: Top posts and active users
- summary/topics.json: Topic distributions, keywords, and per-topic sentiment/emotion/hashtags
Each is grouped by date to support historical and temporal exploration in the dashboard.
View Example Output: Sample JSON Output
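The exact schema of these files is not reproduced here. As a minimal sketch of what "grouped by date" could look like, the snippet below counts a labeled field per calendar day and prints the result as JSON. The `summarize_by_date` helper, the `created_at` and `sentiment` key names, and the sample posts are assumptions made for illustration, not the project's actual output format.

```python
import json
from collections import Counter, defaultdict


def summarize_by_date(posts, field):
    """Count the values of `field` per calendar day.

    Each post is a dict with an ISO 8601 `created_at` string plus the
    labeled field; both key names are assumptions for this sketch.
    """
    by_date = defaultdict(Counter)
    for post in posts:
        day = post["created_at"][:10]  # "YYYY-MM-DD" prefix of the timestamp
        by_date[day][post[field]] += 1
    # Sort the days so the exported file reads chronologically.
    return {day: dict(by_date[day]) for day in sorted(by_date)}


posts = [
    {"created_at": "2025-01-02T08:30:00Z", "sentiment": "negative"},
    {"created_at": "2025-01-01T09:00:00Z", "sentiment": "negative"},
    {"created_at": "2025-01-01T12:15:00Z", "sentiment": "positive"},
    {"created_at": "2025-01-01T18:45:00Z", "sentiment": "negative"},
]
print(json.dumps(summarize_by_date(posts, "sentiment"), indent=2))
```

Keying the top level by date lets a dashboard slice any time range without re-reading the labeled posts, which is presumably why the exported files follow that shape.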
- Framework: React + Next.js + Recharts
- Features:
- Topic-wise sentiment/emotion timelines
- Hashtag and emoji trends
- Most active users and posts
- Narrative shifts across time
- Data Source: JSON files from the summary/ directory
Live Demo: CognitiveSky Dashboard | Source Code: Dashboard Code
    git clone https://github.com/gauravfs-14/CognitiveSky.git
    cd CognitiveSky

    cp .env.example .env
    # Fill in Supabase, Turso, and Bluesky credentials

    pip install -r requirements.txt

Or use the conda environment provided with environment.yml:

    conda env create -f environment.yml
    conda activate cognitive-sky

Ensure you have Node.js installed, then run:

    cd mh_worker
    npm install

Set up the environment variables in mh_worker/.env with your Bluesky credentials and Supabase connection details.
Then start the worker:

    npm start

This will start the real-time listener that filters and ingests mental health posts into Supabase.
To process the unlabeled posts and generate summaries, run:
    EXPORT_ONLY=0 python scripts/summary.py && EXPORT_ONLY=1 python scripts/summary.py

This will:
- Process the unlabeled posts
- Generate sentiment, emotion, and topic summaries
- Export the results to JSON files in the summary/ directory
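The two-step command above relies on the EXPORT_ONLY environment variable to choose which stage runs. How scripts/summary.py reads it is not shown here; the sketch below is one plausible shape. It assumes that EXPORT_ONLY=0 runs labeling and snapshot generation without exporting (which the two-step command suggests but does not state), and that an unset variable behaves like 0. The stage names and the `export_only` and `run` helpers are hypothetical.

```python
import os


def export_only(environ=os.environ) -> bool:
    """Interpret EXPORT_ONLY: "1" means skip labeling and only export JSON.

    Treating an unset variable as "0" is an assumption of this sketch.
    """
    return environ.get("EXPORT_ONLY", "0") == "1"


def run(environ=os.environ):
    """Return the ordered stages a run would execute (stage names are hypothetical)."""
    if export_only(environ):
        return ["export_json"]
    return ["label_posts", "build_snapshots"]


# Mirrors the two-step command: first label and snapshot, then export.
print(run({"EXPORT_ONLY": "0"}) + run({"EXPORT_ONLY": "1"}))
```

Passing the environment in as a parameter keeps the branching easy to test without touching the real process environment.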
You can also set the EXPORT_ONLY environment variable to run the two stages separately. To process posts and generate snapshots without exporting:

    EXPORT_ONLY=0 python scripts/summary.py

Or to export only the JSON summaries from existing snapshots:

    EXPORT_ONLY=1 python scripts/summary.py

The project includes a Makefile for streamlined testing and production workflows. Below are the available commands:
- make test-label: Run full labeling and snapshot generation in TEST_MODE.
- make test-export: Export summary JSONs only from the test database.
- make test-db-to-db: Generate snapshot DB from labeled posts in TEST_MODE.
- make test-full: Run full labeling and snapshot generation in TEST_MODE, followed by exporting JSONs.
- make prod-label: Run full labeling and snapshot generation on the production database.
- make prod-export: Export summary JSONs only from the production database.
- make clean-test-db: Remove the local test database.
- make gen-dummy: Generate dummy data for testing.
- make help: Display the list of available Makefile commands.
We welcome contributions from researchers, developers, and mental health advocates. You can:
- Suggest new metrics or visualizations
- Help improve NLP model support
- Extend to other languages or regions
- Report bugs or submit PRs
This project is licensed under the MIT License. See LICENSE for details.
If you use our project, code, or artifacts in any way, please consider citing our paper:
@misc{chhetri2025cognitiveskyscalablesentimentnarrative,
title={CognitiveSky: Scalable Sentiment and Narrative Analysis for Decentralized Social Media},
author={Gaurab Chhetri and Anandi Dutta and Subasish Das},
year={2025},
eprint={2509.11444},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.11444},
}

This project was initially inspired by TwiXplorer and aims to build a similar infrastructure for Bluesky mental health narratives. Special thanks to:
- Bluesky Community: For their support and resources.
- Oracle Cloud: For providing the Forever Free Tier VM hosting the mh_worker.
- Supabase: For enabling seamless database integration and real-time data storage.
- Hugging Face Transformers: For providing pre-trained models used in sentiment and emotion analysis.
- AIT Lab: For their guidance, collaboration, and technical support.
- Open Source Contributors: For their valuable feedback, suggestions, and code contributions.
Developed by Gaurab Chhetri, supported by AIT Lab.