CognitiveSky is an open-source research tool designed to explore and analyze mental health narratives in public Bluesky data. Inspired by TwiXplorer, this dashboard enables researchers, analysts, and public health advocates to gain insights from social discourse using NLP, sentiment analysis, topic modeling, and interactive visualizations.

🧠 CognitiveSky

Note

📒 Announcement: Our paper is now available on arXiv!
Title: CognitiveSky: Scalable Sentiment and Narrative Analysis for Decentralized Social Media
If you use this project, please consider citing our work. Thank you for your support!

CognitiveSky is an open-source research infrastructure and dashboard for analyzing mental health narratives on the Bluesky social platform. Inspired by TwiXplorer, it integrates real-time data ingestion, robust NLP processing, and interactive visualization to empower researchers, advocates, and developers with actionable social insights.

Live Dashboard: CognitiveSky Dashboard

🌟 Features

  • Real-time Data Ingestion: Continuously collects public posts related to mental health from Bluesky using the Firehose API.
  • NLP Processing: Applies transformer-based sentiment analysis and emotion detection, plus NMF topic modeling, to characterize mental health narratives.
  • Interactive Dashboard: Visualizes trends, user engagement, and topic distributions using React and Next.js.
  • Open Source: Fully transparent and community-driven, allowing contributions from researchers and developers.

βš™οΈ System Architecture

The CognitiveSky system is built around two primary components:

1. Mental Health Worker (mh_worker)

  • Language: Node.js
  • Host: Oracle Cloud (free-tier VM)
  • Function: A real-time listener using Bluesky's Firehose API
  • Purpose: Filters public posts related to mental health and stores them in a posts_unlabeled table within Supabase.
  • Frequency: Continuous, 24×7 ingestion
  • Output: Raw mental-health-related posts in Supabase
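
The filtering step can be illustrated with a minimal sketch. The actual worker is written in Node.js and its filter terms are not listed in this README, so the Python below is only an illustration: the keyword list and the function names `is_mental_health_post` and `filter_posts` are hypothetical.

```python
import re

# Hypothetical keyword list; the worker's real filter terms are not given here.
KEYWORDS = ["anxiety", "depression", "therapy", "burnout"]

# One case-insensitive pattern with word boundaries, so a keyword only
# matches as a whole word and not as part of a longer one.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def is_mental_health_post(text: str) -> bool:
    """Return True if the post text mentions any tracked keyword."""
    return PATTERN.search(text) is not None


def filter_posts(posts):
    """Keep only the posts (dicts with a 'text' key) that match the filter."""
    return [p for p in posts if is_mental_health_post(p["text"])]


incoming = [
    {"text": "Dealing with Anxiety before my exam"},
    {"text": "Lovely weather in Lisbon today"},
]
kept = filter_posts(incoming)
```

In this sketch only the matching posts would be passed on for storage; everything else is dropped at ingestion time.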

Read more about the worker: mh_worker README

2. Summarization & Labeling Pipeline (summary.py)

  • Language: Python
  • Trigger: Scheduled daily via GitHub Actions (4 parallel shards × 500 posts)
  • Purpose: Processes unlabeled posts using:
    • Sentiment analysis (cardiffnlp/twitter-roberta-base-sentiment)
    • Emotion detection (j-hartmann/emotion-english-distilroberta-base)
    • Topic modeling (NMF + TF-IDF)
  • Database: Labeled posts are stored in Turso (libSQL)
  • Output: JSON snapshots written to /summary/*.json for dashboard rendering
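
The README does not say how posts are divided among the four shards. One simple possibility, sketched below as an assumption, is to give each shard a contiguous block of 500 posts. The names `SHARD_COUNT`, `SHARD_SIZE`, and `shard_slice` are hypothetical and not taken from summary.py.

```python
SHARD_COUNT = 4   # parallel jobs per daily run, as stated above
SHARD_SIZE = 500  # posts handled by each job, as stated above


def shard_slice(posts, shard_index, shard_count=SHARD_COUNT, shard_size=SHARD_SIZE):
    """Return the contiguous block of posts assigned to one shard.

    Assumption: shards take non-overlapping blocks in order, so shard 0
    gets posts 0..499, shard 1 gets posts 500..999, and so on.
    """
    if not 0 <= shard_index < shard_count:
        raise ValueError(f"shard_index must be in [0, {shard_count})")
    start = shard_index * shard_size
    return posts[start:start + shard_size]


# Stand-in for unlabeled post IDs; a real run would read them from the database.
unlabeled = list(range(2300))
shards = [shard_slice(unlabeled, i) for i in range(SHARD_COUNT)]
```

Under this scheme a daily run covers at most 4 × 500 = 2,000 posts, and anything beyond that waits for the next run.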

View Latest Summary Output: Latest Summary JSON

🔨 Tools And Technologies

Data Ingestion mh_worker

  • Node.js: For real-time data ingestion
  • Bluesky Firehose API: Streams public posts using @atproto/sync and @atproto/api libraries
  • Supabase: Acts as the database for storing unlabeled posts
  • Oracle Cloud: Hosts the worker for continuous operation

NLP Processing and Summarization summary.py

  • Python: Main language for NLP processing
  • Transformers: For sentiment and emotion analysis using pre-trained models
  • Turso (libSQL): Lightweight database for storing labeled data
  • GitHub Actions: Automates daily processing and export of summaries
  • NLP Libraries:
    • transformers for sentiment and emotion analysis
    • scikit-learn for topic modeling

Dashboard

  • React + Next.js: Frontend framework for building the dashboard
  • Tailwind CSS + shadcn/ui: For styling the dashboard components
  • Recharts: For data visualization

🧪 Data Flow

      ┌────────────────────┐
      │ Bluesky Firehose   │
      └────────┬───────────┘
               │
               ▼
     ┌────────────────────┐
     │   mh_worker (Node) │
     │  Filter + Ingest   │
     └────────┬───────────┘
              │
              ▼
   ┌──────────────────────┐
   │ Supabase (Unlabeled) │
   └────────┬─────────────┘
            │
            ▼
  ┌─────────────────────────┐
  │ summary.py (GitHub CI)  │
  │ NLP + Topics + Migrate  │
  └────────┬────────────────┘
           │
           ▼
    ┌────────────────┐
    │   Turso DB     │
    │ (Labeled Data) │
    └────────┬───────┘
             │
             ▼
    ┌──────────────────────┐
    │ summary_snapshots DB │
    └────────┬─────────────┘
             │
             ▼
    ┌────────────────────┐
    │  JSON Files (📁)   │
    └────────┬───────────┘
             │
             ▼
    ┌────────────────────┐
    │   Dashboard (Web)  │
    └────────────────────┘

📦 Summary Outputs

Each run of the summarization pipeline generates JSON files such as:

  • summary/narratives.json: Sentiment & language distributions
  • summary/emotions.json: Emotion category trends
  • summary/hashtags.json: Trending hashtags and emojis
  • summary/activity.json: Post volume over time
  • summary/engagement.json: Top posts and active users
  • summary/topics.json: Topic distributions, keywords, and per-topic sentiment/emotion/hashtags

Each is grouped by date to support historical and temporal exploration in the dashboard.
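
To show how date-grouped output supports temporal exploration, here is a small sketch that sums sentiment counts over a date range. The exact schema of the summary files is not documented here, so the shape assumed below (ISO date keys mapping to label counts), the sample values, and the function `totals_between` are all hypothetical.

```python
import json
from collections import Counter

# Invented sample in an assumed shape: {date: {label: count}}.
SAMPLE = """
{
  "2025-07-01": {"positive": 12, "neutral": 30, "negative": 18},
  "2025-07-02": {"positive": 8,  "neutral": 25, "negative": 7},
  "2025-07-03": {"positive": 5,  "neutral": 10, "negative": 5}
}
"""


def totals_between(snapshot, start, end):
    """Sum label counts for all dates in the inclusive range [start, end]."""
    totals = Counter()
    for day, counts in snapshot.items():
        # ISO 8601 dates (YYYY-MM-DD) sort correctly as plain strings.
        if start <= day <= end:
            totals.update(counts)
    return dict(totals)


snapshot = json.loads(SAMPLE)
first_two_days = totals_between(snapshot, "2025-07-01", "2025-07-02")
```

Keeping one entry per date means a dashboard can select any window by key comparison alone, without re-running the pipeline.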

View Example Output: Sample JSON Output

📊 Dashboard

  • Framework: React + Next.js + Recharts
  • Features:
    • Topic-wise sentiment/emotion timelines
    • Hashtag and emoji trends
    • Most active users and posts
    • Narrative shifts across time
  • Data Source: JSON files from summary/ directory

Live Demo: CognitiveSky Dashboard | Source Code: Dashboard Code

🚀 Get Started

1. Clone the Repo

git clone https://github.com/gauravfs-14/CognitiveSky.git
cd CognitiveSky

2. Setup Environment

cp .env.example .env
# Fill in Supabase, Turso, and Bluesky credentials

3. Install Dependencies

pip install -r requirements.txt

Alternatively, use the conda environment defined in environment.yml:

conda env create -f environment.yml
conda activate cognitive-sky

4. Start the Mental Health Worker

Ensure you have Node.js installed, then run:

cd mh_worker
npm install

Set up the environment variables in mh_worker/.env with your Bluesky credentials and Supabase connection details.

Then start the worker:

npm start

This will start the real-time listener that filters and ingests mental health posts into Supabase.

5. Run Summary Pipeline

To process the unlabeled posts and generate summaries, run:

EXPORT_ONLY=0 python scripts/summary.py && EXPORT_ONLY=1 python scripts/summary.py

This will:

  • Process the unlabeled posts
  • Generate sentiment, emotion, and topic summaries
  • Export the results to JSON files in the summary/ directory

Each stage can also be run on its own through the EXPORT_ONLY environment variable. With EXPORT_ONLY=0, the script labels posts and generates the snapshots without exporting JSON:

EXPORT_ONLY=0 python scripts/summary.py

With EXPORT_ONLY=1, it only exports the existing snapshots to JSON:

EXPORT_ONLY=1 python scripts/summary.py
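
The two-stage behavior can be pictured with the sketch below. It is an assumption about how a script might branch on EXPORT_ONLY, not the actual logic of scripts/summary.py; the step names and the functions `export_only` and `plan_steps` are hypothetical, as is the choice to treat an unset variable like EXPORT_ONLY=0.

```python
import os


def export_only(environ=os.environ) -> bool:
    """Read the EXPORT_ONLY flag; assumed to default to '0' when unset."""
    return environ.get("EXPORT_ONLY", "0") == "1"


def plan_steps(environ=os.environ):
    """Return the (hypothetical) pipeline steps selected by the flag."""
    if export_only(environ):
        # Export stage: write existing snapshots to summary/*.json.
        return ["export_json"]
    # Labeling stage: run the NLP models and build the snapshots.
    return ["label_posts", "build_snapshots"]


labeling_run = plan_steps({"EXPORT_ONLY": "0"})
export_run = plan_steps({"EXPORT_ONLY": "1"})
```

Chaining the two invocations with `&&`, as in the combined command above, runs the export only if the labeling stage exits successfully.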

🛠️ Makefile Commands

The project includes a Makefile for streamlined testing and production workflows. Below are the available commands:

Test Commands

  • make test-label: Run full labeling and snapshot generation in TEST_MODE.
  • make test-export: Export summary JSONs only from the test database.
  • make test-db-to-db: Generate snapshot DB from labeled posts in TEST_MODE.
  • make test-full: Run full labeling and snapshot generation in TEST_MODE, followed by exporting JSONs.

Production Commands

  • make prod-label: Run full labeling and snapshot generation on the production database.
  • make prod-export: Export summary JSONs only from the production database.

Utility Commands

  • make clean-test-db: Remove the local test database.
  • make gen-dummy: Generate dummy data for testing.
  • make help: Display the list of available Makefile commands.

🤝 Contributing

We welcome contributions from researchers, developers, and mental health advocates. You can:

  • Suggest new metrics or visualizations
  • Help improve NLP model support
  • Extend to other languages or regions
  • Report bugs or submit PRs

📄 License

This project is licensed under the MIT License. See LICENSE for details.

BibTeX Citation

If you decide to use our project, code, and artifacts in any way, please consider citing our paper.

@misc{chhetri2025cognitiveskyscalablesentimentnarrative,
      title={CognitiveSky: Scalable Sentiment and Narrative Analysis for Decentralized Social Media}, 
      author={Gaurab Chhetri and Anandi Dutta and Subasish Das},
      year={2025},
      eprint={2509.11444},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.11444}, 
}

Acknowledgements

This project was initially inspired by TwiXplorer and aims to build a similar infrastructure for Bluesky mental health narratives. Special thanks to:

  • Bluesky Community: For their support and resources.
  • Oracle Cloud: For providing the Forever Free Tier VM hosting the mh_worker.
  • Supabase: For enabling seamless database integration and real-time data storage.
  • Hugging Face Transformers: For providing pre-trained models used in sentiment and emotion analysis.
  • AIT Lab: For their guidance, collaboration, and technical support.
  • Open Source Contributors: For their valuable feedback, suggestions, and code contributions.

Developed by Gaurab Chhetri, supported by AIT Lab.
