Latest headlines → VADER sentiment → Bayesian Student-t regression (PyMC) → next-day log-return and multi-day price forecasts with uncertainty.
Authors
Shreemadhi Babu Rajendra Prasad (24207575) · Saipavan Narayanasamy (24233785)
M.Sc. in Data & Computational Science, University College Dublin
Poster: poster/final_project_poster_A0.pdf
- About the project
- Workflow overview
- Overview
- What the app does
- Bayesian Model
- Install & Run
- Using the app
- Outputs & Run Log
- Outputs
- Repo Structure
- Limitations & future work
- Tech stack
- License
- Cite
- Acknowledgments
- Contributors
Goal. Turn daily headlines into a quantitative sentiment signal and measure its predictive effect on next-day returns; produce uncertainty-aware price forecasts over short horizons.
Why Student-t? A heavy-tailed likelihood keeps the outliers and volatility bursts common in returns from dominating the fit.
Why Bayesian? Full posteriors + diagnostics (ESS, $\hat{R}$).
Why Streamlit? A fast, transparent interface to explore data, diagnostics, and forecasts.
We build a small research app that:
- pulls the latest news headlines per ticker,
- scores each headline with NLTK VADER (compound),
- aggregates to a daily sentiment signal $z_t$, and
- fits a Bayesian Student-t regression (with PyMC) for next-day log-return and 3-day price forecasts, reporting 94% HDIs for parameters and 90% prediction intervals (PIs) for prices.
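The scoring and aggregation steps above can be sketched as follows. This is a minimal illustration, not the app's actual code: it assumes headlines arrive as `(date, text)` pairs, and the function names `vader_compound` and `daily_sentiment` are hypothetical. The VADER call needs the `vader_lexicon` resource to be downloaded first.

```python
from collections import defaultdict
from datetime import date
from functools import lru_cache
from statistics import mean
from typing import Callable, Dict, Iterable, Tuple


@lru_cache(maxsize=1)
def _analyzer():
    # Imported lazily so the aggregation logic can be used without NLTK installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()


def vader_compound(text: str) -> float:
    """Return VADER's compound score in [-1, 1] for one headline."""
    return _analyzer().polarity_scores(text)["compound"]


def daily_sentiment(
    headlines: Iterable[Tuple[date, str]],
    score_fn: Callable[[str], float] = vader_compound,
) -> Dict[date, float]:
    """Average per-headline scores by calendar day to get the daily signal z_t."""
    by_day = defaultdict(list)
    for day, text in headlines:
        by_day[day].append(score_fn(text))
    # Sort by date so the result reads as a time series.
    return {day: mean(scores) for day, scores in sorted(by_day.items())}
```

Passing a different `score_fn` makes it easy to swap VADER for another scorer without touching the aggregation.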
- Headlines → sentiment: For each ticker, fetch recent public headlines and score each with VADER (compound). Average by day to create $z_t$.
- Bayesian regression: Fit a Student-t regression of next-day log-return on yesterday’s sentiment $z_{t-1}$ (lag-1). Heavy tails make the fit robust to outliers.
- Uncertainty first-class: Report 94% HDIs for $(\alpha,\beta,\sigma,\nu)$ and 90% PIs for predicted prices.
- Forecasts: Produce a 3-day-ahead price forecast table and chart.
- Comparison: Side-by-side β (sentiment effect) table across two tickers + indexed history vs mean forecast plot.
- Reproducible logging: Append each run to a local CSV at results/predictions_log.csv (kept out of Git by .gitignore).
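The forecast and prediction-interval steps listed above can be illustrated with a short sketch. It assumes you already hold posterior-predictive log-return draws in an array of shape `(n_draws, horizon)`; how the app produces those draws is not shown in this README, and the function name `price_forecast_table` is hypothetical. Prices follow from cumulative log-returns, $P_{t+h} = P_t \exp\left(\sum_{k=1}^{h} r_{t+k}\right)$, and the 90% PI is taken as the 5th to 95th percentile of the simulated prices.

```python
import numpy as np


def price_forecast_table(last_price, logret_draws):
    """Summarise simulated price paths as one row per day ahead.

    logret_draws: array-like of shape (n_draws, horizon) holding simulated
    daily log-returns (assumed to come from the fitted model).
    """
    draws = np.asarray(logret_draws, dtype=float)
    # Cumulative log-returns along the horizon, mapped back to price levels.
    paths = last_price * np.exp(np.cumsum(draws, axis=1))
    rows = []
    for h in range(paths.shape[1]):
        prices = paths[:, h]
        p05, p95 = np.percentile(prices, [5, 95])  # endpoints of the 90% PI
        rows.append(
            {
                "day_ahead": h + 1,
                "price_mean": float(prices.mean()),
                "price_p05": float(p05),
                "price_p95": float(p95),
            }
        )
    return rows
```

Because the interval is computed on prices rather than on log-returns, it is asymmetric around the mean whenever the return draws are dispersed.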
We model daily log-returns with heavy tails:

$$r_t \sim \text{StudentT}(\nu,\ \mu_t,\ \sigma), \qquad \mu_t = \alpha + \beta\, z_{t-1}$$

- $r_t$: next-day log-return
- $z_{t-1}$: yesterday’s (lag-1) VADER daily average
- Parameters $(\alpha,\beta,\sigma,\nu)$ are inferred with PyMC (NUTS).
- β answers: does yesterday’s sentiment move tomorrow’s return?
- Price forecasts are obtained by transforming simulated log-return paths to prices.
Windows
python -m venv venv
venv\Scripts\activate
macOS/Linux
python -m venv venv
source venv/bin/activate
Install dependencies, download the VADER lexicon, and start the app
pip install -r requirements.txt
python -m nltk.downloader vader_lexicon
streamlit run app/streamlit_app.py
Open the local URL shown by Streamlit (http://localhost:8501).
Inputs
- Ticker A (required) and Ticker B (optional)
- Run Speed: Fast / Standard / Accurate (controls MCMC draws / tuning)
Tabs
- Ticker 1 / Ticker 2: company blurb, Latest Headlines, Predicted Log-Return, Next Price (90% PI), Today’s Sentiment, Posterior Summary, 3-day Price Forecast (table + chart).
- Comparison: quick table of $\beta$ (mean + 94% HDI) and Day-1 price forecast; indexed history vs mean forecast dots.
- Run log: a banner displays whether the log CSV was Created or Appended. You can also download the log directly from the UI.
Figures & tables shown in the UI
- Predicted log-return (mean + 94% HDI)
- Next price (mean + 90% PI)
- Posterior summary for $(\alpha, \beta, \sigma, \nu)$ with diagnostics (ESS, $\hat{R}$)
- 3-day price forecast: (day_ahead, price_mean, price_p05, price_p95) + chart
Run log CSV: results/predictions_log.csv (local; ignored by Git)
Contains timestamp, tickers, posterior summaries and key forecast numbers (including day-ahead price mean and PI endpoints).
Useful for auditing, comparisons across runs, and lightweight experimentation.
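The create-or-append behaviour of the run log can be sketched with the standard library alone. This is an assumption about how such logging might be written, not the app's implementation; `append_run` and the example column names are hypothetical, and the sketch assumes every run writes the same set of columns.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def append_run(log_path, row: dict) -> str:
    """Append one run to the CSV log; return 'Created' or 'Appended'."""
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # A missing or empty file still needs its header row.
    is_new = not path.exists() or path.stat().st_size == 0
    record = {"timestamp": datetime.now(timezone.utc).isoformat(), **row}
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(record))
        if is_new:
            writer.writeheader()
        writer.writerow(record)
    return "Created" if is_new else "Appended"
```

The returned string is the kind of status a UI banner could display after each run.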
We tested the app by predicting the next-day return of BIC on 20 August and checking the forecast against the actual closing price on 21 August, taken from Yahoo Finance.
project/
├─ app/
│ └─ streamlit_app.py
├─ poster/
│ └─ final_project_poster_A0.pdf
├─ literature/
├─ outputs/
├─ results/
│ └─ predictions_log.csv
├─ requirements.txt
└─ README.md
- Predictability may be weak/noisy; real-world alpha is hard.
- Headline sampling & VADER rules can bias the signal — try domain-tuned or LLM sentiment.
- Extend to multivariate models (market/sector factors), hierarchical priors, or state-space models with stochastic volatility.
- Evaluation: add rolling backtests; CRPS/quantile loss for PIs; compare with AR/ARX/GARCH baselines.
- Scheduled data refresh, richer news sources, and caching.
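For the evaluation item above, two simple scores for prediction intervals are the quantile (pinball) loss and empirical coverage. The sketch below is only an illustration of those standard definitions; the function names are hypothetical and nothing here is part of the current app.

```python
import numpy as np


def pinball_loss(y_true, y_quantile, q):
    """Mean quantile (pinball) loss of a predicted q-quantile, with 0 < q < 1."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_quantile, dtype=float)
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))


def interval_coverage(y_true, lower, upper):
    """Fraction of realised values that fall inside [lower, upper]."""
    y = np.asarray(y_true, dtype=float)
    inside = (y >= np.asarray(lower, dtype=float)) & (y <= np.asarray(upper, dtype=float))
    return float(inside.mean())
```

In a rolling backtest, a well-calibrated 90% PI should show coverage close to 0.90, and the pinball loss at q = 0.05 and q = 0.95 gives a way to compare interval endpoints across models.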
Disclaimer: For research/education only — not financial advice.
Python · Streamlit · PyMC · ArviZ · NumPy · pandas · Matplotlib · NLTK (VADER) · requests/bs4 · yfinance
MIT — see LICENSE.
If you reference this project:
Narayanasamy, S.; Rajendra Prasad, S.B. (2025). Bayesian Estimation of Sentiment Impact on Stock Prices. Version 1.0.0. MIT License. Poster: poster/final_project_poster_A0.pdf.
@misc{narayanasamy_prasad_2025,
title={Bayesian Estimation of Sentiment Impact on Stock Prices},
author={Narayanasamy, Sai Pavan and Rajendra Prasad, Shreemadhi Babu},
year={2025},
note={Version 1.0.0. Poster: poster/final_project_poster_A0.pdf},
howpublished={GitHub repository}
}
- VADER sentiment (NLTK)
- Public headline sources used by the app; Yahoo price data
- UCD — ACM40960 Projects in Maths Modelling
- Saipavan Narayanasamy (24233785) - saipavan.narayanasamy@ucdconnect.ie
- Shreemadhi Babu Rajendra Prasad (24207575) - shreemadhi.baburajendrapra@ucdconnect.ie






