(Please check out the extended README if you have the time. It is much more fleshed out.)
- Take snapshots of the order book state at different times from BitMEX websockets, collecting all L2 (individual order) data
- Process the raw data and perform feature extraction and engineering to create features the models can use effectively
- Design a series of predictive models, including LSTM neural networks and gradient tree boosting, to predict future order book states
- Compare model results and find the best model to predict order book states
There are a few ways to set up XGBoost (and the LSTM) for multi-step predictions:
- Direct Approach: Fit a new regressor for each future time point we want to predict.
- Recursive Approach: Create a cluster of models that each predict an individual feature one timestep ahead (one per variable), feed those predictions back in to roll the inputs forward, and then use a larger model to regress the final value we want.
We will use the direct approach with an XGBoost model, and the recursive approach with an LSTM neural network. The goal is to capture trends, so we look at the most recent 10 timesteps...
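To make the direct approach concrete, here is a minimal sketch of fitting one regressor per forecast horizon. The names (`X`, `y`, `fit_direct_models`) and the hyperparameters are illustrative, not the project's actual code:

```python
# Direct multi-step forecasting: one XGBoost regressor per future timestep.
import numpy as np
import xgboost as xgb

N_HORIZONS = 10   # predict the next 10 timesteps

def fit_direct_models(X, y, n_horizons=N_HORIZONS):
    """Fit one XGBRegressor per horizon.

    X : (n_samples, n_features) feature matrix at time t
    y : (n_samples,) target series aligned with X
    """
    models = []
    for h in range(1, n_horizons + 1):
        # The target for horizon h is the series shifted h steps into the future.
        y_h = y[h:]
        X_h = X[:-h]
        model = xgb.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
        model.fit(X_h, y_h)
        models.append(model)
    return models

def predict_direct(models, x_latest):
    """Predict every horizon from the most recent feature row."""
    return np.array([m.predict(x_latest.reshape(1, -1))[0] for m in models])
```

The trade-off is training cost: predicting 10 steps ahead means training 10 separate models, but no prediction ever depends on another prediction.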
| Model | MSE |
| --- | --- |
| Exponential Smoothing | 1.833 |
| ARIMA | 3.496 |
| LSTM - Recursive | 1.972 |
| XGBoost - Direct | 1.725 |
- The XGBoost model performs the best, followed by our exponential smoothing baseline, and then the LSTM
- The LSTM model performs worse as we increase the number of timesteps
- This is because the recursive approach essentially compounds our error at each step (sketched below); more on this in the model section
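To see why the error compounds, here is a simplified, single-variable sketch of a recursive forecast loop; `one_step_model` is a placeholder for any fitted one-step-ahead predictor, not the project's actual model:

```python
# Recursive forecasting: each one-step prediction is appended to the history
# and used as an input for the next step, so any mistake at step t becomes
# part of the "observed" data at step t+1 and errors accumulate.
import numpy as np

def recursive_forecast(one_step_model, window, n_steps):
    """window: 1D array of the most recent observations (the lookback)."""
    history = list(window)
    preds = []
    for _ in range(n_steps):
        x = np.array(history[-len(window):]).reshape(1, -1)
        y_hat = float(one_step_model.predict(x)[0])
        preds.append(y_hat)
        history.append(y_hat)   # prediction re-enters the inputs here
    return preds
```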
40 timesteps with XGBoost
- We don't see increasing error even at 40 timesteps into the future, compared to the original 10 that we predicted.
Direct Approach explained:
Every 30 seconds, I collect data on the current order book and save it in "batches", since most exchange APIs don't provide historical order book data.
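As a rough idea of what that collection loop can look like, here is a sketch that subscribes to BitMEX's public `orderBookL2` websocket topic using the `websocket-client` package; the in-memory book structure and file layout are illustrative, not the project's actual code:

```python
# Maintain a local copy of the L2 book from websocket updates and
# write a snapshot to disk roughly every 30 seconds.
import json, os, time
import websocket  # pip install websocket-client

BOOK = {}                  # price-level id -> {"side", "size", "price", ...}
LAST_SAVE = time.time()
os.makedirs("snapshots", exist_ok=True)

def on_message(ws, message):
    global LAST_SAVE
    msg = json.loads(message)
    if msg.get("table") != "orderBookL2":
        return                          # ignore welcome / subscription messages
    action, rows = msg.get("action"), msg.get("data", [])
    if action == "partial":             # full snapshot: start the book fresh
        BOOK.clear()
    for row in rows:
        if action == "delete":
            BOOK.pop(row["id"], None)
        else:                           # partial / insert / update
            BOOK.setdefault(row["id"], {}).update(row)
    if time.time() - LAST_SAVE >= 30:   # save a batch every ~30 seconds
        with open(f"snapshots/book_{int(time.time())}.json", "w") as f:
            json.dump(list(BOOK.values()), f)
        LAST_SAVE = time.time()

ws = websocket.WebSocketApp(
    "wss://www.bitmex.com/realtime?subscribe=orderBookL2:XBTUSD",
    on_message=on_message,
)
ws.run_forever()
```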
Here is an example of the market depth (liquidity) near the mid-price:
I engineered about 12 features based on the data we collected. The formulas and algorithms can be found in the feature engineering doc.
To combat noise:
- I smooth the data so the model is less sensitive to sharp spikes, making it better at capturing general trends.
A good example is with the directional signal value...
Non-smoothed data:
Smoothed data:
It is clear the smoothed series does a much better job of capturing the general trend, because it is less sensitive to small changes.
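As a rough idea of the smoothing step itself, here is a sketch using an exponentially weighted moving average with pandas; the actual method and window used in the project may differ, and the DataFrame/column names (`features`, `directional_signal`) are illustrative:

```python
# Exponentially weighted smoothing: recent points still count the most,
# but sharp one-off spikes are damped.
import pandas as pd

def smooth(series: pd.Series, span: int = 10) -> pd.Series:
    return series.ewm(span=span, adjust=False).mean()

features["directional_signal_smooth"] = smooth(features["directional_signal"])
```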
First, I built some univariate baseline models using ARIMA and Exponential Smoothing.
After building the baselines, I built two models that take multivariate inputs to see if we can improve: an LSTM and XGBoost.
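Here is a sketch of what those univariate baselines can look like with statsmodels; the ARIMA order and trend settings are illustrative rather than the tuned values, and `series` stands for the target series:

```python
# Univariate baselines: ARIMA and Exponential Smoothing from statsmodels.
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

train, test = series[:-10], series[-10:]   # hold out the 10 timesteps we forecast

arima = ARIMA(train, order=(2, 1, 2)).fit()
arima_forecast = arima.forecast(steps=len(test))

es = ExponentialSmoothing(train, trend="add").fit()
es_forecast = es.forecast(len(test))
```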
Some resources I used for XGBoost:
- XGBoost - arXiv
- Time Series Prediction Models - arXiv
- Fine-tuning XGBoost - Medium
At each timestep, I also added the values from the previous 20 timesteps, so the XGBoost model would have relevant information about recent history.
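A sketch of that lagging step with pandas; `features` stands for the engineered feature DataFrame, and the 20 lags match the description above:

```python
# Append a shifted copy of every feature for each of the previous 20 timesteps.
import pandas as pd

def add_lags(df: pd.DataFrame, n_lags: int = 20) -> pd.DataFrame:
    lagged = [df]
    for lag in range(1, n_lags + 1):
        lagged.append(df.shift(lag).add_suffix(f"_lag{lag}"))
    out = pd.concat(lagged, axis=1)
    return out.dropna()   # the first n_lags rows have incomplete history

X = add_lags(features)
```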
Hyperparameters:
- The most important hyperparameters I focused on when tuning were:
  - max_depth: The maximum depth of the trees. Keeping this value from being too high is crucial to avoid overfitting.
  - learning_rate: Many models use very small learning rates, but given the noisy, stochastic nature of this data, a higher learning rate of 0.1 is more appropriate (see the configuration sketch below)
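Putting those choices together, here is a sketch of the kind of configuration described above; apart from `learning_rate=0.1`, the values are illustrative defaults rather than the exact tuned parameters, and `X_train`/`y_train` stand for the lagged feature matrix and target:

```python
import xgboost as xgb

model = xgb.XGBRegressor(
    max_depth=4,          # shallow trees to keep the model from fitting noise
    learning_rate=0.1,    # higher than the usual "very small" rates, as discussed above
    n_estimators=300,
    subsample=0.8,
    colsample_bytree=0.8,
)
model.fit(X_train, y_train)
```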
Here we can see the performance of the XGBoost model in comparison to the baseline models we created.
To build the LSTM, some additional data processing is needed compared to the XGBoost model (a rough sketch follows below).
- I got a lot of inspiration from this article as well
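As a rough sketch of that extra processing: scale the features and cut the series into overlapping lookback windows so the LSTM receives a 3D (samples, timesteps, features) tensor. The scaler choice, window length, and variable names are illustrative:

```python
# Scale features and build sliding lookback windows for the LSTM.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

LOOKBACK = 10   # illustrative window length

scaler = MinMaxScaler()
scaled = scaler.fit_transform(feature_matrix)   # feature_matrix: (n_samples, n_features)

def make_windows(data, targets, lookback=LOOKBACK):
    X, y = [], []
    for i in range(lookback, len(data)):
        X.append(data[i - lookback:i])   # the previous `lookback` rows
        y.append(targets[i])             # value to predict at step i
    return np.array(X), np.array(y)

X_lstm, y_lstm = make_windows(scaled, target_series)
```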
For the LSTM, we use two bidirectional LSTM layers followed by several dense layers. The LSTM also uses a lookback function that gives us a sliding window over past timesteps.
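A sketch of that architecture in Keras; the layer sizes, lookback length, and training settings are illustrative rather than the exact ones used, and `X_lstm`/`y_lstm` are the windowed arrays from the processing step above:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense

LOOKBACK, N_FEATURES = 10, 12   # illustrative: 10-step window over ~12 engineered features

model = Sequential([
    # Two stacked bidirectional LSTMs read the lookback window in both directions
    Bidirectional(LSTM(64, return_sequences=True), input_shape=(LOOKBACK, N_FEATURES)),
    Bidirectional(LSTM(32)),
    # Dense head regresses the next value
    Dense(16, activation="relu"),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_lstm, y_lstm, epochs=50, batch_size=32, validation_split=0.1)
```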
Here we can see the LSTM model compared to the training data:
Here is the model on the testing data:
When using recursive approaches, errors can compound on each other. Here is a great illustration I found:
With a little research, you will find that LSTM neural networks tend to perform poorly on real financial data. They are extremely prone to overfitting, and on top of that, they struggle with autoregressive problems.