WING-NUS/BuzzProphet

Data and Code for "Forecasting the Buzz: Enriching Hashtag Popularity Prediction with LLM Reasoning" (CIKM 2025)

This repo contains the data and code for the following paper:

Yifei Xu, Jiaying Wu, Herun Wan, Yang Li, Zhen Hou, Min-Yen Kan. Forecasting the Buzz: Enriching Hashtag Popularity Prediction with LLM Reasoning, ACM International Conference on Information and Knowledge Management (CIKM) 2025.

Abstract

Hashtag trends ignite campaigns, shift public opinion, and steer millions of dollars in advertising spend, yet forecasting which tag goes viral is elusive. Classical regressors digest surface features but ignore context, while large language models (LLMs) excel at contextual reasoning but misestimate numbers. We present BuzzProphet, a reasoning-augmented hashtag popularity prediction framework that (1) instructs an LLM to articulate a hashtag’s topical virality, audience reach, and timing advantage; (2) utilizes these popularity-oriented rationales to enrich the input features; and (3) regresses on these inputs. To facilitate evaluation, we release HashView, a 7,532-hashtag benchmark curated from social media. Across diverse regressor-LLM combinations, BuzzProphet reduces RMSE by up to 2.8% and boosts correlation by 30% over baselines, while producing human-readable rationales. Results demonstrate that using LLMs as context reasoners rather than numeric predictors injects domain insight into tabular models, yielding an interpretable and deployable solution for social media trend forecasting.

🔧 Installation

Install the required dependencies:

pip install -r requirement.txt

📂 HashView dataset (data/)

new_processed_time-sorted_data.csv: the original HashView dataset for hashtag popularity prediction, collected from Weibo, a Chinese social media platform.

This dataset includes the following attributes: id, title, datetime, browse_count, and browse_log_norm.

  • id: hashtag ID.
  • title: hashtag text.
  • datetime: posting time of the hashtag, in the format YYYY-MM-DD hh:mm:ss.
  • browse_count: view count of the hashtag, which serves as the main indicator of popularity.
  • browse_log_norm: log-normalized value of browse_count, used as the prediction target.
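As a rough illustration of how these attributes might be consumed, the sketch below parses rows with the documented `datetime` format and splits them chronologically. The sample rows are synthetic, the helper names (`load_rows`, `add_log_norm`, `chronological_split`) are hypothetical, and the `log1p` plus min-max transform is only an assumed stand-in: the README does not state how `browse_log_norm` is actually computed.

```python
import csv
import io
import math
from datetime import datetime

# Synthetic rows for illustration only; they are NOT taken from HashView.
SAMPLE_CSV = """id,title,datetime,browse_count
1,#tag_a#,2023-01-01 08:00:00,1200
2,#tag_b#,2023-01-02 21:30:00,560000
3,#tag_c#,2023-01-03 12:15:00,98
4,#tag_d#,2023-01-04 09:45:00,43000
"""


def load_rows(handle):
    """Parse CSV rows, converting datetime and browse_count to typed values."""
    rows = []
    for row in csv.DictReader(handle):
        row["datetime"] = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S")
        row["browse_count"] = int(row["browse_count"])
        rows.append(row)
    return rows


def add_log_norm(rows):
    """Assumed transform (not confirmed by the README): log1p, then min-max to [0, 1]."""
    logs = [math.log1p(r["browse_count"]) for r in rows]
    lo, hi = min(logs), max(logs)
    span = (hi - lo) or 1.0  # avoid division by zero when all counts are equal
    for r, value in zip(rows, logs):
        r["log_norm_sketch"] = (value - lo) / span
    return rows


def chronological_split(rows, train_fraction=0.8):
    """Sort by posting time and cut once, so the test set is strictly later."""
    ordered = sorted(rows, key=lambda r: r["datetime"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


rows = add_log_norm(load_rows(io.StringIO(SAMPLE_CSV)))
train, test = chronological_split(rows, train_fraction=0.75)
```

To try this on the released file, the `io.StringIO(...)` handle could be swapped for `open("data/new_processed_time-sorted_data.csv", newline="", encoding="utf-8")`; the encoding is an assumption. A chronological split is used here because the task is forecasting, so later hashtags should not leak into training.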

o3_instruction.csv: popularity-oriented reasoning elicited from the o3-mini model.

  • id, title, datetime, browse_log_norm: same as new_processed_time-sorted_data.csv.
  • category_instruction: o3-mini reasoning about the hashtag's topic category attribute.
  • audience_instruction: o3-mini reasoning about the hashtag's target audience attribute.
  • time_instruction: o3-mini reasoning about the hashtag's posting time attribute.
  • merge_instruction: o3-mini reasoning about the hashtag's overall popularity by jointly considering all three attributes.
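One simple way to use these columns is to join the per-attribute rationales with the hashtag title into a single text input, which could then be embedded or otherwise featurized before regression. This is only a sketch under that assumption: the README does not describe how BuzzProphet actually turns rationales into features, and `build_text_input` and the example row are hypothetical.

```python
# Column names come from o3_instruction.csv as documented above.
RATIONALE_FIELDS = (
    "category_instruction",
    "audience_instruction",
    "time_instruction",
)


def build_text_input(row, fields=RATIONALE_FIELDS):
    """Join the title and the non-empty rationale columns into one labeled text block."""
    parts = [f"Hashtag: {row['title']}"]
    for field in fields:
        text = (row.get(field) or "").strip()
        if text:
            label = field.replace("_instruction", "")
            parts.append(f"{label}: {text}")
    return "\n".join(parts)


# Made-up row for illustration; the rationale wording is not from the dataset.
example_row = {
    "title": "#tag_a#",
    "category_instruction": "Entertainment topics tend to spread quickly.",
    "audience_instruction": "  Appeals mainly to younger users.  ",
    "time_instruction": "",
}
text_input = build_text_input(example_row)
```

Empty rationales are skipped so that a missing column does not add a dangling label. Passing `fields=("merge_instruction",)` would use the joint rationale instead of the three separate ones.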

🚀 Run BuzzProphet

To run BuzzProphet with different base regression models, use the following shell scripts:

  • Run RandomForest + BuzzProphet:

    sh run_RF_BuzzProphet.sh
    
  • Run CatBoost + BuzzProphet:

    sh run_CB_BuzzProphet.sh
    

After a script finishes, the results are saved in CSV format under browse_trained_results/.
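The abstract reports RMSE and correlation. If you want to recompute comparable metrics from your own predictions, a minimal stdlib sketch follows. It assumes you have already extracted two equal-length lists of targets and predictions; the layout of the result CSVs is not documented here, and Pearson correlation is an assumption, since the README does not say which correlation measure is used.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def pearson(y_true, y_pred):
    """Pearson correlation; returns NaN if either sequence has zero variance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        return float("nan")
    return cov / math.sqrt(var_t * var_p)


# Toy values for illustration only, not results from the paper.
targets = [0.10, 0.40, 0.35, 0.80]
predictions = [0.20, 0.35, 0.30, 0.70]
print(f"RMSE={rmse(targets, predictions):.4f}  r={pearson(targets, predictions):.4f}")
```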
