Bitcoin Price Prediction with MLFlow and Feast

Overview

This repository contains code that predicts the price of Bitcoin using various blockchain features combined from two sources. The project includes various deep learning model architectures (primarily CNN and LSTM) to predict Bitcoin prices using different size time steps. The full machine learning lifecycle is managed by MLFlow and Feast.

Feast feature views for Bitcoin price and metric data are combined and pulled for training and batch model scoring. MLFlow Projects is used to orchestrate each step in the process; ETL, model training using hyperparameter optimization, model selection and deployment, and model scoring. Optuna is used to perform hyperparameter optimization, which is integrated with MLFlow Tracking using Optuna's callback integration. The best performing model is registered programatically and made available for scoring inside the MLFlow Model Registry.

Project Structure

All methods can be executed via the command line using mlflow run . or main.py. Using mlflow run . ensures all features of MLFlow are integrated properly.

├── artifacts                               <- tokenizers, scalers, and optuna studies 
├── base                                    <- base classes 
├── data                                    <- bitcoin metric and price data
    ├── feature_repo                        <- feast feature repository
      ├── data                              <- feast data home
    ├── mlflow_backend                      <- backend for mlflow server 
    ├── processed                           <- cleansed, transformed, scaled data
    ├── raw                                 <- raw price and metric data from apis 
├── data_loaders                            <- classes that load preprocessed data and splits it 
├── data_producers                          <- classes that load data from apis and preprocesses data
├── mlruns                                  <- default MLFlow artifact folder created automatically
├── models                                  <- price models
├── optimizers                              <- optuna optimizers for price models
├── predictors                              <- model prediction classes
├── trainers                                <- model training classes
├── bitcoin-price-prediction-eda.ipynb      <- Jupyter notebook for project eda
├── main.py                                 <- command line methods
├── MLproject                               <- MLflow Project configuration
├── price-prediction-experiments.sh         <- script for price prediction experiments
├── conda.yaml                              <- conda environment requirements

Data Sources

Alpha Vantage API

Daily bitcoin price data, open, high, low, close, volume, and market, are pulled using the Alpha Vantage API. A free API key must be obtained and set the value to an alphavantage_api_key key in a config.yaml file in the root of the project.

Blockchain.com API

The blockchain metrics are obtained from the blockchain.com API. More information about the information is available at https://www.blockchain.com/charts. An API key is not required.

Envrionment Setup

The following are required to run the machine learning pipelines locally.

Setup MLFlow Environment in your command-line

These environment variables need to be set prior to using mlflow run . commands.

export MLFLOW_TRACKING_URI=http://127.0.0.1:5000
export MLFLOW_EXPERIMENT_NAME="crypto-reg-experiment"

Running MLFlow Server

This command starts MLFlow server locally. See MLFlow documentation for production grade deployments options.

mlflow server --backend-store-uri sqlite:///data/mlflow_backend/mlflow.db --host 0.0.0.0

Feast Setup

These commands will create or update the Feast feature repository. Feast teardowm can also be run to reset the repository.

cd ./data/feature_repo
feast apply

Model Pipelines

The command line methods below represent the steps in the model pipeline for retrieving the bitcoin blockchain metrics and price data, training the regression Bitcoin price prediction models in the context of an experiment, registering the best performing model, and scoring against the registered model. The MLproject file defines the entry points into main.py so the options are the same and only listed once in each step of the pipeline. You can run all the commands below together using price-prediction-regression-experiment.sh.

Retrieve Data

These commands will pull data from Blockchain.com and Alpha Vantage when --pull-data is True and save them as parquet files. They are used as Feast file sources for the feature views and feature services. See Feast docs for a review of key concepts.

Usage: mlflow run . -e price-data-retrieve -P pull-data=True --env-manager=local

OR

Usage: main.py price-data-retrieve [OPTIONS]

Options:
  --pull-data BOOLEAN  load saved data or pull new data. Default is False
  --help               Show this message and exit.

Price Regression Model Training with Hyperparameter Optimization

These commands create an Optuna study with the number of trials specified. Model metrics and parameters are logged, and models are saved for each trial as a run under the specified experiment in MLFlow. Subsequent command execution can use the same experiment. If a new experiment is desired, be sure to modify the environment variable for experiment name as described above. The data is loaded using Feast.

mlflow run . -P model_type='LSTM' -P n_trials=4 --env-manager=local --experiment-name="crypto-reg-experiment"

OR

Usage: main.py price-regression-optimize [OPTIONS]

Options:
  --model-type TEXT            LSTM, CNN. Default uses all
  --pull-data BOOLEAN          load saved data or pull new data. Default is
                               False
  --n-trials INTEGER           number of study trials. Default is 2
  --timesteps INTEGER          Number of timesteps, if not set the optimizer
                               will select the timesteps. Default is 0
  --prediction-length INTEGER  The number of predictions in the test set. The
                               size of the test set will be prediction length
                               + timesteps + 1. Be careful not to go above the
                               total size of data set. Default is 5
  --version INTEGER            version of study. Default is 1
  --experiment-name TEXT       Name of the experiment
  --help                       Show this message and exit.

Register Best Model in MLFlow Model Registry

These commands select the best model based on the model in an experiment that had the best MAPE (Mean Absolute Percentage Error) and registers in for scoring.

mlflow run . -e register-best-model -P model-name="bitcoin-regression" -P experiment-id="2" --env-manager=local

OR

Usage: main.py register-best-model [OPTIONS]

Options:
  --model-name TEXT     Model name
  --experiment-id TEXT  Experiment id
  --help                Show this message and exit.

Price Prediction

This command will load the latest version of a registered model and predict the Bitcoin price for the next day. The prediciton is printed in the console and logged in the experiment as a run.

mlflow run . -e price-predict -P date-end="2023-01-02" -P registered-model-name="bitcoin-regression" -P timesteps=10 --env-manager=local

OR

Usage: main.py price-predict [OPTIONS]

Options:
  --date-end [%Y-%m-%d]         Final date for time series prior to prediction
                                date
  --registered-model-name TEXT  Registered model name without version. The
                                latest version will be selected from the
                                registry
  --timesteps INTEGER           Number of timesteps the model was trained
                                with. Default is 10
  --help                        Show this message and exit.

Tools

Running Optuna Dashboard

optuna-dashboard sqlite:///artifacts/db.sqlite3

Running the Feast UI

cd ./data/feature_repo
feast ui

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Bitcoin Price Prediction with MLFlow and Feast

Overview

Project Structure

Data Sources

Alpha Vantage API

Blockchain.com API

Envrionment Setup

Setup MLFlow Environment in your command-line

Running MLFlow Server

Feast Setup

Model Pipelines

Retrieve Data

Price Regression Model Training with Hyperparameter Optimization

Register Best Model in MLFlow Model Registry

Price Prediction

Tools

Running Optuna Dashboard

Running the Feast UI

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
base		base
data		data
data_loaders		data_loaders
data_producers		data_producers
models		models
optimizers		optimizers
predictors		predictors
trainers		trainers
.gitignore		.gitignore
MLproject		MLproject
README.md		README.md
__init__.py		__init__.py
bitcoin-price-prediction-eda.ipynb		bitcoin-price-prediction-eda.ipynb
conda.yaml		conda.yaml
main.py		main.py
price-prediction-regression-experiment.sh		price-prediction-regression-experiment.sh

pahurlocker/crypto-price-prediction

Folders and files

Latest commit

History

Repository files navigation

Bitcoin Price Prediction with MLFlow and Feast

Overview

Project Structure

Data Sources

Alpha Vantage API

Blockchain.com API

Envrionment Setup

Setup MLFlow Environment in your command-line

Running MLFlow Server

Feast Setup

Model Pipelines

Retrieve Data

Price Regression Model Training with Hyperparameter Optimization

Register Best Model in MLFlow Model Registry

Price Prediction

Tools

Running Optuna Dashboard

Running the Feast UI

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages