(Please check out the extended README if you have the time. It is much more fleshed out.)
- Take snapshots of the order book state at different times from BitMEX websockets, collecting all L2 (individual order) data
- Process the raw data and perform feature extraction and engineering to create features the models can use effectively
- Design a series of predictive models, including LSTM neural networks and gradient tree boosting, to predict future order book states
- Compare model results and find the best model to predict order book states
There are a few ways to set up XGBoost (and the LSTM) for multi-step predictions:
- Direct Approach: Fit a new regressor for each future time point we want to predict.
- Recursive Approach: Create a cluster of models that each predict an individual feature one timestep ahead (one per variable), feed those predictions back in to roll the inputs forward, and then use a larger model to regress the final value we want.
We will use the direct approach with an XGBoost model, and the recursive approach with an LSTM neural network. The goal is to capture trends, so we look at the most recent 10 timesteps...
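To make the direct approach concrete, here is a minimal sketch of fitting one regressor per forecast horizon. The names (`X`, `y`, `fit_direct_models`) and the hyperparameters are illustrative, not the project's actual code:

```python
# Direct multi-step forecasting: one XGBoost regressor per future timestep.
import numpy as np
import xgboost as xgb

N_HORIZONS = 10   # predict the next 10 timesteps

def fit_direct_models(X, y, n_horizons=N_HORIZONS):
    """Fit one XGBRegressor per horizon.

    X : (n_samples, n_features) feature matrix at time t
    y : (n_samples,) target series aligned with X
    """
    models = []
    for h in range(1, n_horizons + 1):
        # The target for horizon h is the series shifted h steps into the future.
        y_h = y[h:]
        X_h = X[:-h]
        model = xgb.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
        model.fit(X_h, y_h)
        models.append(model)
    return models

def predict_direct(models, x_latest):
    """Predict every horizon from the most recent feature row."""
    return np.array([m.predict(x_latest.reshape(1, -1))[0] for m in models])
```

The trade-off is training cost: predicting 10 steps ahead means training 10 separate models, but no prediction ever depends on another prediction.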
| Model | MSE |
| --- | --- |
| Exponential Smoothing | 1.833 |
| ARIMA | 3.496 |
| LSTM - Recursive | 1.972 |
| XGBoost - Direct | 1.725 |
- The XGBoost model performs the best, followed by our exponential smoothing baseline, and then the LSTM
- The LSTM model performs worse as we increase the number of timesteps
- This is because the recursive approach essentially compounds our error at each step (sketched below); more on this in the model section
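To see why the error compounds, here is a simplified, single-variable sketch of a recursive forecast loop; `one_step_model` is a placeholder for any fitted one-step-ahead predictor, not the project's actual model:

```python
# Recursive forecasting: each one-step prediction is appended to the history
# and used as an input for the next step, so any mistake at step t becomes
# part of the "observed" data at step t+1 and errors accumulate.
import numpy as np

def recursive_forecast(one_step_model, window, n_steps):
    """window: 1D array of the most recent observations (the lookback)."""
    history = list(window)
    preds = []
    for _ in range(n_steps):
        x = np.array(history[-len(window):]).reshape(1, -1)
        y_hat = float(one_step_model.predict(x)[0])
        preds.append(y_hat)
        history.append(y_hat)   # prediction re-enters the inputs here
    return preds
```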
40 timesteps with XGBoost
- We don't see increasing error even at 40 timesteps into the future, compared to the original 10 that we predicted.
Direct Approach explained:
Every 30 seconds, I collect data on the current order book and save it in "batches", since most exchange APIs don't provide historical order book data.
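As a rough idea of what that collection loop can look like, here is a sketch that subscribes to BitMEX's public `orderBookL2` websocket topic using the `websocket-client` package; the in-memory book structure and file layout are illustrative, not the project's actual code:

```python
# Maintain a local copy of the L2 book from websocket updates and
# write a snapshot to disk roughly every 30 seconds.
import json, os, time
import websocket  # pip install websocket-client

BOOK = {}                  # price-level id -> {"side", "size", "price", ...}
LAST_SAVE = time.time()
os.makedirs("snapshots", exist_ok=True)

def on_message(ws, message):
    global LAST_SAVE
    msg = json.loads(message)
    if msg.get("table") != "orderBookL2":
        return                          # ignore welcome / subscription messages
    action, rows = msg.get("action"), msg.get("data", [])
    if action == "partial":             # full snapshot: start the book fresh
        BOOK.clear()
    for row in rows:
        if action == "delete":
            BOOK.pop(row["id"], None)
        else:                           # partial / insert / update
            BOOK.setdefault(row["id"], {}).update(row)
    if time.time() - LAST_SAVE >= 30:   # save a batch every ~30 seconds
        with open(f"snapshots/book_{int(time.time())}.json", "w") as f:
            json.dump(list(BOOK.values()), f)
        LAST_SAVE = time.time()

ws = websocket.WebSocketApp(
    "wss://www.bitmex.com/realtime?subscribe=orderBookL2:XBTUSD",
    on_message=on_message,
)
ws.run_forever()
```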
Here is an example of the market depth (liquidity) near the mid-price:
I engineered about 12 features based on the data we collected. The formulas and algorithms can be found in the feature engineering doc.
To combat noise:
- I smooth the data so the model is less sensitive to sharp spikes, making it better at capturing general trends.
A good example is with the directional signal value...
Non-smoothed data:
Smoothed data:
It is clear the smoothed series does a much better job of capturing the general trend, because it is less sensitive to small changes.
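As a rough idea of the smoothing step itself, here is a sketch using an exponentially weighted moving average with pandas; the actual method and window used in the project may differ, and the DataFrame/column names (`features`, `directional_signal`) are illustrative:

```python
# Exponentially weighted smoothing: recent points still count the most,
# but sharp one-off spikes are damped.
import pandas as pd

def smooth(series: pd.Series, span: int = 10) -> pd.Series:
    return series.ewm(span=span, adjust=False).mean()

features["directional_signal_smooth"] = smooth(features["directional_signal"])
```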
First, I built some univariate baseline models using ARIMA and Exponential Smoothing.
After building the baselines, I built two models that take multivariate inputs to see if we can improve: an LSTM and XGBoost.
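Here is a sketch of what those univariate baselines can look like with statsmodels; the ARIMA order and trend settings are illustrative rather than the tuned values, and `series` stands for the target series:

```python
# Univariate baselines: ARIMA and Exponential Smoothing from statsmodels.
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

train, test = series[:-10], series[-10:]   # hold out the 10 timesteps we forecast

arima = ARIMA(train, order=(2, 1, 2)).fit()
arima_forecast = arima.forecast(steps=len(test))

es = ExponentialSmoothing(train, trend="add").fit()
es_forecast = es.forecast(len(test))
```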
Some resources I used for XGBoost:
- XGBoost - arXiv
- Time Series Prediction Models - arXiv
- Fine-tuning XGBoost - Medium
At each timestep, I also added the values from the previous 20 timesteps, so the XGBoost model would have relevant information about recent history.
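A sketch of that lagging step with pandas; `features` stands for the engineered feature DataFrame, and the 20 lags match the description above:

```python
# Append a shifted copy of every feature for each of the previous 20 timesteps.
import pandas as pd

def add_lags(df: pd.DataFrame, n_lags: int = 20) -> pd.DataFrame:
    lagged = [df]
    for lag in range(1, n_lags + 1):
        lagged.append(df.shift(lag).add_suffix(f"_lag{lag}"))
    out = pd.concat(lagged, axis=1)
    return out.dropna()   # the first n_lags rows have incomplete history

X = add_lags(features)
```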
Hyperparameters:
- The most important hyperparameters I focused on when tuning were:
  - max_depth: The maximum depth of the trees. Keeping this value from being too high is crucial to avoid overfitting.
  - learning_rate: Many models use very small learning rates, but given the noisy, stochastic nature of this data, a higher learning rate of 0.1 is more appropriate (see the configuration sketch below)
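Putting those choices together, here is a sketch of the kind of configuration described above; apart from `learning_rate=0.1`, the values are illustrative defaults rather than the exact tuned parameters, and `X_train`/`y_train` stand for the lagged feature matrix and target:

```python
import xgboost as xgb

model = xgb.XGBRegressor(
    max_depth=4,          # shallow trees to keep the model from fitting noise
    learning_rate=0.1,    # higher than the usual "very small" rates, as discussed above
    n_estimators=300,
    subsample=0.8,
    colsample_bytree=0.8,
)
model.fit(X_train, y_train)
```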
Here we can see the performance of the XGBoost model in comparison to the baseline models we created.
To build the LSTM, some additional data processing is needed compared to the XGBoost model (a rough sketch follows below).
- I got a lot of inspiration from this article as well
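As a rough sketch of that extra processing: scale the features and cut the series into overlapping lookback windows so the LSTM receives a 3D (samples, timesteps, features) tensor. The scaler choice, window length, and variable names are illustrative:

```python
# Scale features and build sliding lookback windows for the LSTM.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

LOOKBACK = 10   # illustrative window length

scaler = MinMaxScaler()
scaled = scaler.fit_transform(feature_matrix)   # feature_matrix: (n_samples, n_features)

def make_windows(data, targets, lookback=LOOKBACK):
    X, y = [], []
    for i in range(lookback, len(data)):
        X.append(data[i - lookback:i])   # the previous `lookback` rows
        y.append(targets[i])             # value to predict at step i
    return np.array(X), np.array(y)

X_lstm, y_lstm = make_windows(scaled, target_series)
```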
For the LSTM, we use two bidirectional LSTM layers followed by several dense layers. The LSTM also uses a lookback function that gives us a sliding window over past timesteps.
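A sketch of that architecture in Keras; the layer sizes, lookback length, and training settings are illustrative rather than the exact ones used, and `X_lstm`/`y_lstm` are the windowed arrays from the processing step above:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense

LOOKBACK, N_FEATURES = 10, 12   # illustrative: 10-step window over ~12 engineered features

model = Sequential([
    # Two stacked bidirectional LSTMs read the lookback window in both directions
    Bidirectional(LSTM(64, return_sequences=True), input_shape=(LOOKBACK, N_FEATURES)),
    Bidirectional(LSTM(32)),
    # Dense head regresses the next value
    Dense(16, activation="relu"),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_lstm, y_lstm, epochs=50, batch_size=32, validation_split=0.1)
```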
Here we can see the LSTM model compared to the training data:
Here is the model on the testing data:
When using recursive approaches, errors can compound on each other. Here is a great illustration I found:
With a little research, you will find that LSTM neural networks tend to perform poorly on real financial data. They are extremely prone to overfitting, and on top of that, they struggle with autoregressive problems.