
📈 StockSage: End-to-End LSTM Stock Price Prediction Pipeline

StockSage is a robust, reproducible machine learning pipeline for stock price prediction using LSTM neural networks. The project leverages DVC for data and model versioning, MLflow for experiment tracking, Hyperopt for automated hyperparameter tuning, and DagsHub for collaborative data science and remote storage.


🚀 Features

  • LSTM Neural Network for time series regression
  • Automated Hyperparameter Tuning with Hyperopt
  • Experiment Tracking with MLflow
  • Reproducible Pipelines using DVC
  • Remote Data & Model Storage with DagsHub
  • Robust Data Preprocessing and validation
  • Easy Configuration via params.yaml

🗂️ Project Structure

StockSage/
├── data/
│   ├── raw/
│   │   └── data.csv
│   └── processed/
│       └── data.csv
├── models/
│   └── model.h5
├── src/
│   ├── preprocess.py
│   ├── train.py
│   └── evaluate.py
├── params.yaml
├── dvc.yaml
├── requirements.txt
├── .env
└── README.md

⚙️ Setup & Installation

1. Clone the Repository

git clone https://github.com/yourusername/StockSage.git
cd StockSage

2. Install Dependencies

pip install -r requirements.txt

3. Configure Environment Variables

Create a .env file in the project root:

MLFLOW_TRACKING_URI=http://your-mlflow-server:5000
MLFLOW_TRACKING_USERNAME=your_username
MLFLOW_TRACKING_PASSWORD=your_password

4. Prepare Data

  • Place your raw stock data as data/raw/data.csv.
  • The file must include a CloseUSD column (target) and any number of numeric feature columns.
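As a quick sanity check before running the pipeline, you can verify that the raw CSV meets these requirements. This is a minimal sketch (the `validate_raw_data` helper is illustrative, not part of the pipeline):

```python
import pandas as pd

def validate_raw_data(path: str) -> pd.DataFrame:
    """Check that the raw CSV has the CloseUSD target and numeric features."""
    data = pd.read_csv(path)
    if "CloseUSD" not in data.columns:
        raise ValueError("data.csv must contain a 'CloseUSD' target column")
    # Coerce every feature column to numeric; non-numeric cells become NaN
    features = data.drop("CloseUSD", axis=1).apply(pd.to_numeric, errors="coerce")
    n_bad = int(features.isna().sum().sum())
    if n_bad:
        print(f"Warning: {n_bad} non-numeric/missing cells will be filled with 0")
    return data
```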

☁️ DagsHub Integration

This project uses DagsHub for:

  • Remote DVC storage: Store and version datasets and models in the cloud.
  • Collaboration: Share experiments, data, and models with your team.
  • Experiment tracking: Integrate MLflow and DVC for a seamless MLOps experience.

To use DagsHub as your DVC remote:

dvc remote add -d origin https://dagshub.com/<username>/<repo>.dvc
dvc remote modify origin --local auth basic
dvc remote modify origin --local user <your-dagshub-username>
dvc remote modify origin --local password <your-dagshub-token>

Push your data and models to DagsHub:

dvc push

🏃 Pipeline Usage


📝 DVC Stage Commands

These stages are already defined in dvc.yaml; the commands below show how to add them manually:

dvc stage add -n preprocess \
    -p preprocess.input,preprocess.output \
    -d src/preprocess.py -d data/raw/data.csv \
    -o data/processed/data.csv \
    python src/preprocess.py

dvc stage add -n train \
    -p train.data,train.model \
    -d src/train.py -d data/processed/data.csv \
    -o models/model.h5 \
    python src/train.py

dvc stage add -n evaluate \
    -d src/evaluate.py -d models/model.h5 -d data/processed/data.csv \
    python src/evaluate.py

Run the Full Pipeline

dvc repro

This will:

  1. Preprocess the data
  2. Train the LSTM model with hyperparameter tuning
  3. Evaluate the model and log metrics

Run Stages Individually

dvc repro preprocess
dvc repro train
dvc repro evaluate

Or run scripts directly:

python src/preprocess.py
python src/train.py
python src/evaluate.py

📋 Configuration

params.yaml

preprocess:
  input: data/raw/data.csv
  output: data/processed/data.csv

train:
  data: data/processed/data.csv
  model: models/model.h5
  learning_rate: 0.001
  momentum: 0.9
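The pipeline scripts can read these values with PyYAML. A minimal sketch (the `load_params` helper is illustrative; the actual scripts may load the file differently):

```python
import yaml

def load_params(path: str = "params.yaml") -> dict:
    """Load pipeline configuration from params.yaml."""
    with open(path) as f:
        return yaml.safe_load(f)

# Example: pull the training hyperparameters
# params = load_params()
# lr = params["train"]["learning_rate"]   # 0.001 in the default config
```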

dvc.yaml

Defines the pipeline stages and their dependencies.


🧠 Model & Training

  • Model: 2-layer LSTM with Dropout and Dense output
  • Input: All features are numeric, reshaped for LSTM
  • Loss: Mean Squared Error (MSE)
  • Optimizer: SGD (learning rate and momentum tuned)
  • Metrics: Root Mean Squared Error (RMSE), accuracy (rounded, optional)
  • Hyperparameters Tuned: LSTM units, dropout, learning rate, momentum, batch size, epochs
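The RMSE metric listed above is simply the square root of the MSE training loss; a minimal NumPy sketch:

```python
import numpy as np

def rmse(y_true, y_pred) -> float:
    """Root Mean Squared Error: sqrt of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

For example, `rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])` gives sqrt(4/3) ≈ 1.155.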

📊 Experiment Tracking

  • MLflow logs all hyperparameters, metrics, and models.

  • Access the MLflow UI with:

    mlflow ui

    Then visit http://localhost:5000 (or your configured URI).

  • DagsHub can also visualize MLflow experiments and DVC data lineage in the cloud.


🐍 Example Code Snippet

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv(data_path)
X = data.drop("CloseUSD", axis=1)
y = data["CloseUSD"]

# Ensure all features are numeric; non-numeric cells become NaN, then 0
X = X.apply(pd.to_numeric, errors='coerce').fillna(0)
y = y.fillna(0)

# Split chronologically (no shuffling for time series) and add a timestep axis for the LSTM
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
train_X_lstm = np.expand_dims(X_train.values, axis=2)
# ... model definition and training ...

🐛 Troubleshooting

  • Keras model saving error:
    Ensure model_path ends with .h5 or .keras.

  • DVC parameter errors:
    Remove unused parameters from dvc.yaml or add them to params.yaml.

  • MLflow connection issues:
    Check your .env file and MLflow server status.

  • NaN or dtype errors:
    Ensure all features are numeric and fill missing values before training.

  • DagsHub authentication issues:
    Make sure your DagsHub token is correct and you have access to the repository.


📑 License

MIT License


🙏 Acknowledgments


Happy Predicting!
