Data and Code for "Forecasting the Buzz: Enriching Hashtag Popularity Prediction with LLM Reasoning" (CIKM 2025)
This repo contains the data and code for the following paper:
Yifei Xu, Jiaying Wu, Herun Wan, Yang Li, Zhen Hou, Min-Yen Kan. Forecasting the Buzz: Enriching Hashtag Popularity Prediction with LLM Reasoning, ACM International Conference on Information and Knowledge Management (CIKM) 2025.
Hashtag trends ignite campaigns, shift public opinion, and steer millions of dollars in advertising spend, yet forecasting which tag goes viral is elusive. Classical regressors digest surface features but ignore context, while large language models (LLMs) excel at contextual reasoning but misestimate numbers. We present BuzzProphet, a reasoning-augmented hashtag popularity prediction framework that (1) instructs an LLM to articulate a hashtag’s topical virality, audience reach, and timing advantage; (2) utilizes these popularity-oriented rationales to enrich the input features; and (3) regresses on these inputs. To facilitate evaluation, we release HashView, a 7,532-hashtag benchmark curated from social media. Across diverse regressor-LLM combinations, BuzzProphet reduces RMSE by up to 2.8% and boosts correlation by 30% over baselines, while producing human-readable rationales. Results demonstrate that using LLMs as context reasoners rather than numeric predictors injects domain insight into tabular models, yielding an interpretable and deployable solution for social media trend forecasting.
Install the required dependencies:
pip install -r requirement.txt
new_processed_time-sorted_data.csv: The original HashView dataset for hashtag popularity prediction, collected from Chinese Weibo.
This dataset includes the following attributes: id, title, datetime, browse_count, and browse_log_norm.
- id: hashtag ID.
- title: hashtag text.
- datetime: posting time of the hashtag, in the format YYYY-MM-DD hh:mm:ss.
- browse_count: view count of the hashtag, which serves as the main indicator of popularity.
- browse_log_norm: log-normalized value of browse_count, used as the prediction target.
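The attribute list above can be checked when loading the file. The following is a minimal sketch, not part of the released code: it uses only the Python standard library, and the function name `load_hashview`, the UTF-8 encoding, and the choice to parse browse_count as a float are assumptions made for illustration.

```python
import csv
from datetime import datetime

# Column names as listed in this README.
EXPECTED_COLUMNS = ["id", "title", "datetime", "browse_count", "browse_log_norm"]


def load_hashview(path, encoding="utf-8"):
    """Read the HashView CSV into a list of dicts with typed values.

    Hypothetical helper; the encoding default is an assumption.
    """
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f)
        present = reader.fieldnames or []
        missing = [c for c in EXPECTED_COLUMNS if c not in present]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        rows = []
        for raw in reader:
            rows.append({
                "id": raw["id"],
                "title": raw["title"],
                # Format documented above: YYYY-MM-DD hh:mm:ss
                "datetime": datetime.strptime(raw["datetime"], "%Y-%m-%d %H:%M:%S"),
                # Parsed as float so values such as "1000.0" are accepted (assumption).
                "browse_count": float(raw["browse_count"]),
                "browse_log_norm": float(raw["browse_log_norm"]),
            })
    return rows
```

Calling `load_hashview("new_processed_time-sorted_data.csv")` from the directory that contains the file should return one dict per hashtag, or raise `ValueError` if a documented column is absent.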
o3_instruction.csv: popularity-oriented reasoning elicited from the o3-mini model.
- id, title, datetime, browse_log_norm: same as in new_processed_time-sorted_data.csv.
- category_instruction: o3-mini reasoning about the hashtag's topic category attribute.
- audience_instruction: o3-mini reasoning about the hashtag's target audience attribute.
- time_instruction: o3-mini reasoning about the hashtag's posting time attribute.
- merge_instruction: o3-mini reasoning about the hashtag's overall popularity by jointly considering all three attributes.
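Because o3_instruction.csv does not carry browse_count, it can be useful to join the two files on id when inspecting the rationales next to raw view counts. The sketch below is one way to do that with the standard library; the helper names `read_rows`, `attach_browse_count`, and `rationales` are hypothetical, the UTF-8 encoding is an assumption, and this is not necessarily how the training scripts combine the files.

```python
import csv

# Rationale columns as listed in this README.
RATIONALE_COLUMNS = [
    "category_instruction",
    "audience_instruction",
    "time_instruction",
    "merge_instruction",
]


def read_rows(path, encoding="utf-8"):
    """Read a CSV file into a list of dicts (all values stay strings)."""
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.DictReader(f))


def attach_browse_count(instruction_rows, data_rows):
    """Copy browse_count from the original data onto each rationale row, matched on id."""
    counts = {row["id"]: row["browse_count"] for row in data_rows}
    joined = []
    for row in instruction_rows:
        if row["id"] not in counts:
            continue  # skip rationale rows with no match in the original data
        merged = dict(row)
        merged["browse_count"] = counts[row["id"]]
        joined.append(merged)
    return joined


def rationales(row):
    """Return only the four o3-mini rationale fields of a row."""
    return {col: row[col] for col in RATIONALE_COLUMNS}
```

For example, `attach_browse_count(read_rows("o3_instruction.csv"), read_rows("new_processed_time-sorted_data.csv"))` yields the rationale rows with an extra browse_count field.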
To run BuzzProphet with different base regression models, use the following shell scripts:
- Run RandomForest + BuzzProphet:
  sh run_RF_BuzzProphet.sh
- Run CatBoost + BuzzProphet:
  sh run_CB_BuzzProphet.sh
After running the script, the results will be saved under browse_trained_results/ in CSV format.
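The names and columns of the result files are not documented here, so the following sketch only lists whatever CSV files it finds in the output directory, with each file's header and row count. The function name `summarize_results` and the UTF-8 encoding are assumptions.

```python
import csv
from pathlib import Path


def summarize_results(results_dir="browse_trained_results"):
    """Return (file name, header, data row count) for every CSV in results_dir."""
    summaries = []
    for path in sorted(Path(results_dir).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader, [])       # first line, or [] for an empty file
            n_rows = sum(1 for _ in reader)  # remaining lines are data rows
        summaries.append((path.name, header, n_rows))
    return summaries
```

Running `for name, header, n in summarize_results(): print(name, header, n)` from the repository root after a script finishes gives a quick overview of what was written.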