Skip to content

project-bayesian-stock-impact-estimation created by GitHub Classroom

License

ACM40960/project-bayesian-stock-impact-estimation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Bayesian Estimation of Sentiment Impact on Stock Prices

Latest headlines → VADER sentiment → Bayesian Student-t regression (PyMC) → next-day log-return and multi-day price forecasts with uncertainty.

License Python Streamlit PyMC Status

Authors
Shreemadhi Babu Rajendra Prasad (24207575) · Saipavan Narayanasamy (24233785) - M.Sc. in Data & Computational Science, University College Dublin

Poster: Project Poster


Table of Contents


About the project

Goal. Turn daily headlines into a quantitative sentiment signal and measure its predictive effect on next-day returns; produce uncertainty-aware price forecasts over short horizons.

Why Student-t? Heavy-tailed residuals guard against outliers and volatility clustering common in returns.

Why Bayesian? Full posteriors + diagnostics (ESS, $\hat{R}$) + calibrated prediction intervals.

Why Streamlit? A fast, transparent interface to explore data, diagnostics, and forecasts.


Workflow overview

End-to-end workflow: data → sentiment → model → forecasts


Overview

We build a small research app that:

  1. pulls the latest news headlines per ticker,
  2. scores each headline with NLTK VADER (compound),
  3. aggregates to a daily sentiment signal (z_t), and
  4. fits a Bayesian Student-t regression (with PyMC) for next-day log-return and 3-day price forecasts, reporting 94% HDIs for parameters and 90% prediction intervals (PIs) for prices.

What the app does

  • Headlines → sentiment: For each ticker, fetch recent public headlines and score with VADER (compound). Average by day to create (z_t).
  • Bayesian regression: Fit a Student-t regression of next-day log-return on yesterday’s sentiment (z_{t-1}) (lag-1). Heavy tails robustify against outliers.
  • Uncertainty first-class: Report 94% HDIs for (\alpha,\beta,\sigma,\nu) and 90% PIs for predicted prices.
  • Forecasts: Produce next-3-day price forecast table and chart.
  • Comparison: Side-by-side β (sentiment effect) table across two tickers + indexed history vs mean forecast plot.
  • Reproducible logging: Append each run to a local CSV at results/predictions_log.csv (kept out of Git by .gitignore).

Bayesian Model

We model daily log-returns with heavy tails:

$$ r_t ;=; \alpha + \beta, z_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \text{Student-t}(\nu, 0, \sigma) $$

  • $r_t$: next-day log-return
  • $z_{t-1}$: yesterday’s (lag-1) VADER daily average
  • Parameters $(\alpha,\beta,\sigma,\nu)$ are inferred with PyMC (NUTS).
  • β answers: does yesterday’s sentiment move tomorrow’s return?
  • Price forecasts are obtained by transforming simulated log-return paths to prices.

Install & Run

Python 3.9+ recommended · Docs

1) Create & activate a virtual environment

Windows

python -m venv venv
venv\Scripts\activate

macOS/Linux

python -m venv venv
source venv/bin/activate

2) Install packages

pip install -r requirements.txt

3) One-time: download VADER lexicon used by NLTK

python -m nltk.downloader vader_lexicon

4) Launch the app

streamlit run app/streamlit_app.py

Open the local URL shown by Streamlit (http://localhost:8501).


Using the app

Inputs

  • Ticker A (required) and Ticker B (optional)
  • Run Speed: Fast / Standard / Accurate (controls MCMC draws / tuning)

Tabs

  • Ticker 1 / Ticker 2: company blurb, Latest Headlines, Predicted Log-Return, Next Price (90% PI), Today’s Sentiment, Posterior Summary, 3-day Price Forecast (table + chart).
  • Comparison: quick table of (\beta) (mean + 94% HDI) and Day-1 price forecast; indexed history vs mean forecast dots.
  • Run log: a banner displays whether the log CSV was Created or Appended. You can also download the log directly from the UI.

Outputs & Run Log

Figures & tables shown in the UI

  • Predicted log-return (mean + 94% HDI)
  • Next price (mean + 90% PI)
  • Posterior summary for (\alpha, \beta, \sigma, \nu) with diagnostics (ESS, (\hat{R}))
  • 3-day price forecast: (day_ahead, price_mean, price_p05, price_p95) + chart

Run log CSV: results/predictions_log.csv (local; ignored by Git)
Contains timestamp, tickers, posterior summaries and key forecast numbers (including day-ahead price mean and PI endpoints).
Useful for auditing, comparisons across runs, and lightweight experimentation.


Outputs

Forecast charts

TSLA — Price forecast (next 3 days) AAPL — Price forecast (next 3 days)

Comparison

Comparison around 'Today': history vs mean forecast

Comparison table

Predicted price and range of price that 90% price falls under this range for next day.

Tested Predictions

We tested the app and predicted the next day return of BIC on 20th Aug and checked against the actual closing price on 21st Aug using yahoo finance.

App prediction (20 Aug run) Comparison: predicted vs actual close on 21 Aug


Repo structure

project/
├─ app/
│  └─ streamlit_app.py          
├─ poster/
│  └─ final_project_poster_A0.pdf
├─ literature/                   
├─ outputs/                      
├─ results/
│  └─ predictions_log.csv        
├─ requirements.txt
└─ README.md
├─ requirements.txt
└─ README.md

Limitations & future work

  • Predictability may be weak/noisy; real-world alpha is hard.
  • Headline sampling & VADER rules can bias the signal — try domain-tuned or LLM sentiment.
  • Extend to multivariate models (market/sector factors), hierarchical priors, or state-space models with stochastic volatility.
  • Evaluation: add rolling backtests; CRPS/quantile loss for PIs; compare with AR/ARX/GARCH baselines.
  • Scheduled data refresh, richer news sources, and caching.

Disclaimer: For research/education only — not financial advice.


Tech stack

Python · Streamlit · PyMC · ArviZ · NumPy · pandas · Matplotlib · NLTK (VADER) · requests/bs4 · yfinance


License

MIT — see LICENSE.


Cite

If you reference this project:

Narayanasamy, S.; Rajendra Prasad, S.B. (2025). Bayesian Estimation of Sentiment Impact on Stock Prices. Version 1.0.0. MIT License. Poster: poster/final_project_poster_A0.pdf.

BibTeX

@misc{narayanasamy_prasad_2025,
  title={Bayesian Estimation of Sentiment Impact on Stock Prices},
  author={Narayanasamy, Sai Pavan and Rajendra Prasad, Shreemadhi Babu},
  year={2025},
  note={Version 1.0.0. Poster: poster/final_project_poster_A0.pdf},
  howpublished={GitHub repository}
}

Acknowledgments

  • VADER sentiment (NLTK)
  • Public headline sources used by the app; Yahoo price data
  • UCD — ACM40960 Projects in Maths Modelling

Contributors

About

project-bayesian-stock-impact-estimation created by GitHub Classroom

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages