📈 FinPredict: End-to-End Stock Price Direction Prediction Using Machine Learning & Deep Learning

A comprehensive data science project covering EDA, feature engineering, classical ML, and neural networks, all applied to real financial time-series data.


🧭 Project Overview

This project aims to predict whether a stock's closing price will go UP or DOWN on the next trading day (i.e., whether tomorrow's close will be above today's) using historical OHLCV (Open, High, Low, Close, Volume) data. We follow a full pipeline:

  1. ✅ Exploratory Data Analysis (EDA) – Understand data structure, trends, and seasonality
  2. 🧹 Data Preprocessing & Feature Engineering – Create technical indicators, handle missing values, and build lag features
  3. 📊 Statistical Analysis & Visualization – Correlation, stationarity tests, and the distribution of returns
  4. 🤖 Classical Machine Learning – Train Logistic Regression, Random Forest, and XGBoost models
  5. 🧠 Deep Learning (LSTM/GRU) – Build sequence models that predict the next-day direction from windows of past trading days
  6. 📈 Model Evaluation & Comparison – Compare performance across models using time-based (chronological) train/test splits
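The UP/DOWN target described above can be built with a per-ticker shift of the closing price, and the time-based split is a single calendar cutoff. The sketch below is a minimal illustration, not the project's notebook code: the function names `add_next_day_target` and `time_based_split`, the toy prices, and the cutoff date are hypothetical, and it assumes the `date`, `close`, and `Name` columns listed under Dataset.

```python
import pandas as pd


def add_next_day_target(df: pd.DataFrame) -> pd.DataFrame:
    """Add a binary `target`: 1 if the next trading day's close is higher, else 0.

    Assumes one row per (Name, date). The last row of each ticker has no
    next day, so it is dropped rather than given a made-up label.
    """
    df = df.sort_values(["Name", "date"]).copy()
    # Shift within each ticker so a label never uses another symbol's price
    next_close = df.groupby("Name")["close"].shift(-1)
    df["target"] = (next_close > df["close"]).astype(int)
    return df[next_close.notna()]


def time_based_split(df: pd.DataFrame, test_start: str):
    """Chronological split: rows before `test_start` train, the rest test."""
    cutoff = pd.Timestamp(test_start)
    return df[df["date"] < cutoff], df[df["date"] >= cutoff]


# Toy prices for two made-up tickers (not real data)
demo = pd.DataFrame({
    "date": pd.to_datetime(["2013-02-08", "2013-02-11", "2013-02-12"] * 2),
    "Name": ["AAA"] * 3 + ["BBB"] * 3,
    "close": [10.0, 10.5, 10.2, 50.0, 49.0, 49.5],
})
labeled = add_next_day_target(demo)
train, test = time_based_split(labeled, "2013-02-11")
print(labeled[["Name", "date", "target"]])
print(len(train), len(test))
```

Splitting on one calendar cutoff, rather than shuffling rows, keeps every test day after every training day, which avoids leaking future prices into training.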

📦 Dataset

We use the S&P 500 Stock Prices (2010–2024) dataset from Kaggle.

Columns:

  • date – Trading date
  • open, high, low, close – Daily price levels
  • volume – Number of shares traded
  • Name – Ticker symbol (e.g., AAPL, MSFT)

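💡 Note: The data file used here is all_stocks_5yr.csv, which we load into pandas for analysis. As the file name suggests, it may cover only about five years rather than the full 2010–2024 range above, so check the actual first and last dates after loading.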
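A minimal loading sketch, assuming the column layout listed above. `load_prices` and `summarize` are hypothetical helper names, and the three toy rows are made-up values in the same format, used only so the example runs without the Kaggle download.

```python
import io

import pandas as pd


def load_prices(source) -> pd.DataFrame:
    """Read OHLCV rows (date, open, high, low, close, volume, Name) into a sorted DataFrame."""
    df = pd.read_csv(source, parse_dates=["date"])
    # Sort so each ticker's rows are contiguous and in chronological order
    return df.sort_values(["Name", "date"]).reset_index(drop=True)


def summarize(df: pd.DataFrame) -> dict:
    """Basic sanity checks: date coverage, number of tickers, missing values per column."""
    return {
        "first_date": df["date"].min(),
        "last_date": df["date"].max(),
        "n_tickers": df["Name"].nunique(),
        "missing": df.isna().sum().to_dict(),
    }


# Toy rows in the same layout as the Kaggle file (values are made up)
toy_csv = io.StringIO(
    "date,open,high,low,close,volume,Name\n"
    "2013-02-11,10.2,10.6,10.1,10.5,1000,AAA\n"
    "2013-02-08,10.0,10.3,9.9,10.2,1200,AAA\n"
    "2013-02-08,50.0,50.5,49.0,49.5,800,BBB\n"
)
prices = load_prices(toy_csv)
print(summarize(prices))
# With the real file: prices = load_prices("data/all_stocks_5yr.csv")
```

Running `summarize` on the real file is a quick way to confirm the date range discussed in the note above.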


🛠️ Tech Stack

  • Language: Python 3.9+
  • Libraries:
    • pandas, numpy → Data handling
    • matplotlib, seaborn, plotly → Visualization
    • scikit-learn → Classical ML
    • xgboost → Gradient-boosted trees (XGBoost)
    • tensorflow / keras → Neural networks (LSTM/GRU)
    • statsmodels, scipy → Statistical tests
    • ta (Technical Analysis library) → Feature engineering
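Keras recurrent layers such as `tf.keras.layers.LSTM` and `GRU` expect input shaped `(samples, timesteps, features)`. Below is a minimal NumPy sketch of turning one ticker's feature rows into such windows; `make_windows` and the lookback of 3 in the toy example are illustrative assumptions. Windows should be built separately for each ticker so that no window spans two symbols.

```python
import numpy as np


def make_windows(features: np.ndarray, targets: np.ndarray, lookback: int):
    """Turn a (T, F) feature matrix into (T - lookback + 1, lookback, F) windows.

    Window i covers rows i .. i + lookback - 1 and is labeled with the target of
    its last row, i.e. the next-day direction as seen from that last day.
    """
    if features.shape[0] != targets.shape[0]:
        raise ValueError("features and targets must have the same number of rows")
    n = features.shape[0] - lookback + 1
    if n <= 0:
        raise ValueError("not enough rows for the requested lookback")
    X = np.stack([features[i : i + lookback] for i in range(n)])
    y = targets[lookback - 1 :]
    return X, y


# Toy example: 6 days, 2 features per day, one ticker
feats = np.arange(12, dtype=np.float32).reshape(6, 2)
labels = np.array([1, 0, 1, 1, 0, 1])
X, y = make_windows(feats, labels, lookback=3)
print(X.shape, y.shape)  # (4, 3, 2) (4,)
```

The resulting `X` can be passed directly to a Keras model whose first layer is an LSTM or GRU.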

📁 Project Structure

FinPredict/
│
├── data/                   # Raw and processed datasets
├── notebooks/              # Jupyter notebooks for each phase
│   ├── 01_eda.ipynb
│   ├── 02_preprocessing.ipynb
│   ├── 03_statistics_visualization.ipynb
│   ├── 04_machine_learning.ipynb
│   └── 05_neural_networks.ipynb
├── models/                 # Saved trained models
├── utils/                  # Helper functions (feature engineering, plotting, etc.)
├── README.md               # This file
└── requirements.txt        # Python dependencies
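As a hedged sketch of what a feature-engineering helper in `utils/` might contain (the function names `rsi` and `add_features`, and the window lengths, are assumptions rather than the repository's code), the following computes lagged returns, a close-to-moving-average ratio, and a 14-day RSI per ticker with plain pandas. The `ta` library listed in the tech stack offers ready-made indicators; this version avoids the extra dependency and makes the no-look-ahead handling explicit. The RSI uses Wilder-style exponential smoothing, one common variant, so early values may differ slightly from other implementations.

```python
import numpy as np
import pandas as pd


def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """Relative Strength Index with Wilder-style exponential smoothing (alpha = 1/window)."""
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = -delta.clip(upper=0)
    avg_gain = gain.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    avg_loss = loss.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


def add_features(df: pd.DataFrame, lags=(1, 2, 3), sma_window: int = 10) -> pd.DataFrame:
    """Add lagged returns, a close/SMA ratio and a 14-day RSI, computed per ticker."""
    df = df.sort_values(["Name", "date"]).copy()
    close_by_name = df.groupby("Name")["close"]
    # Today's return uses only today's and yesterday's close, so it is known at the close
    df["ret_1d"] = df["close"] / close_by_name.shift(1) - 1
    for k in lags:
        df[f"ret_lag_{k}"] = df.groupby("Name")["ret_1d"].shift(k)
    sma = close_by_name.transform(lambda s: s.rolling(sma_window).mean())
    df["close_to_sma"] = df["close"] / sma - 1
    df["rsi_14"] = close_by_name.transform(rsi)
    # The first rows of each ticker lack enough history for the indicators
    return df.dropna()


# Synthetic random-walk prices for two made-up tickers (not real data)
rng = np.random.default_rng(0)
dates = pd.bdate_range("2013-02-08", periods=40)
demo = pd.concat(
    [pd.DataFrame({"date": dates, "Name": name, "close": 100 + rng.normal(0, 1, 40).cumsum()})
     for name in ("AAA", "BBB")],
    ignore_index=True,
)
features = add_features(demo)
print(features[["Name", "date", "ret_lag_1", "close_to_sma", "rsi_14"]].head())
```

Every feature for day t uses only closes up to day t, so it can be paired with the next-day target without leaking future information.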

🚀 Getting Started

  1. Clone this repository
  2. Install dependencies:
    pip install -r requirements.txt
  3. Download the dataset from Kaggle and place it in data/all_stocks_5yr.csv
  4. Open notebooks/01_eda.ipynb to begin!
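As a possible first step in `notebooks/01_eda.ipynb` (the helper name `return_stats` is hypothetical), this sketch summarizes the distribution of daily log returns per ticker. The `up_share` column is the fraction of days that closed higher than the day before, which roughly gives the accuracy of always predicting UP, a useful floor for the models later on.

```python
import numpy as np
import pandas as pd


def return_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Per-ticker distribution of daily log returns from a table with date, Name, close."""
    df = df.sort_values(["Name", "date"])
    # log(close_t / close_{t-1}) within each ticker; the first day of each ticker is NaN
    log_ret = np.log(df["close"]).groupby(df["Name"]).diff()
    by_name = log_ret.groupby(df["Name"])
    return pd.DataFrame({
        "mean": by_name.mean(),
        "std": by_name.std(),
        "skew": by_name.skew(),
        # pandas' kurt() reports excess kurtosis (0 for a normal distribution)
        "excess_kurtosis": by_name.apply(lambda r: r.kurt()),
        # share of days that closed higher than the previous day
        "up_share": by_name.apply(lambda r: (r.dropna() > 0).mean()),
    })


# Toy closes for two made-up tickers (not real data)
toy = pd.DataFrame({
    "date": pd.to_datetime(["2013-02-08", "2013-02-11", "2013-02-12", "2013-02-13"] * 2),
    "Name": ["AAA"] * 4 + ["BBB"] * 4,
    "close": [100.0, 101.0, 100.0, 102.0, 50.0, 49.0, 48.0, 49.0],
})
print(return_stats(toy))
```

On the real data, heavy tails (large positive excess kurtosis) are worth checking before choosing models or loss functions.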

📌 Key Insights (To Be Filled After Analysis)

  • Stationarity of returns?
  • Most predictive features?
  • Best performing model? (XGBoost vs LSTM)
  • Accuracy on test set?
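To answer the model and accuracy questions above, one option is to fit each classifier on the earlier period, score it on the later one, and report it next to a majority-class baseline, since a model that merely matches the share of up days has learned nothing useful. This sketch uses scikit-learn only; `compare_models` and the hyperparameters are assumptions, and the random toy data exists just to show the call, so it says nothing about real results. `xgboost.XGBClassifier` follows the same `fit`/`predict` interface and could be added to the dictionary.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X_train, y_train, X_test, y_test) -> dict:
    """Fit each classifier on the earlier period and return test accuracy on the later one."""
    models = {
        # Always predicts the most common training class: the bar to beat
        "majority_baseline": DummyClassifier(strategy="most_frequent"),
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "random_forest": RandomForestClassifier(n_estimators=200, min_samples_leaf=20, random_state=0),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Toy data only: random features and labels, so accuracies should hover near 0.5
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = rng.integers(0, 2, size=600)
split = 450  # rows are assumed to be in chronological order
print(compare_models(X[:split], y[:split], X[split:], y[split:]))
```

The scaler sits inside the pipeline so it is fitted on training rows only, keeping test-period statistics out of training.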

📝 Future Work

  • Add fundamental data (P/E ratio, EPS) from Yahoo Finance
  • Try Transformer models for multi-stock forecasting
  • Deploy model via Streamlit or FastAPI
  • Backtest trading strategy based on predictions
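For the backtesting item, a minimal long/flat sketch: hold the stock on days the model predicted UP, stay in cash otherwise, and compare against buy-and-hold. It assumes one ticker at a time, fills at the closing price, and an optional flat cost per position change; `backtest_long_flat` is a hypothetical helper, and slippage, dividends, and position sizing are ignored.

```python
import pandas as pd


def backtest_long_flat(close: pd.Series, pred_up: pd.Series, cost_per_trade: float = 0.0) -> pd.DataFrame:
    """Long/flat backtest for a single ticker.

    pred_up[t] is the call made at the close of day t that day t + 1 closes higher (1) or not (0).
    A position opened at the close of day t earns the close-to-close return of day t + 1.
    """
    daily_ret = (close / close.shift(1) - 1).fillna(0.0)
    # Shift so the position held on day t + 1 comes from the prediction made on day t
    position = pred_up.astype(float).shift(1).fillna(0.0)
    # Charge the cost once per position change (entering or leaving the stock)
    trades = position.diff().abs().fillna(position.abs())
    strategy_ret = position * daily_ret - cost_per_trade * trades
    return pd.DataFrame({
        "position": position,
        "strategy_return": strategy_ret,
        "equity": (1 + strategy_ret).cumprod(),
        "buy_and_hold": (1 + daily_ret).cumprod(),
    })


# Toy example: four days of closes and the UP/DOWN calls made at each close (made-up values)
close = pd.Series([100.0, 110.0, 99.0, 108.9])
pred_up = pd.Series([1, 0, 1, 0])
result = backtest_long_flat(close, pred_up)
print(result)
```

The one-day shift of the predictions is the key detail: without it, the backtest would trade on information that is not yet available.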

🙋‍♂️ Author

Madhav Madupu | [LinkedIn/GitHub] | Date: December 10, 2025


🌟 Built for learning, portfolio showcase, and real-world finance applications.
