Benchmark comparing the quality of GBDT packages on the Rossmann Store Sales dataset (Kaggle).
The number of hyperopt iterations was set to 50; the final model is then trained on the full training data with the best hyperparameters found.
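The tuning loop can be sketched as follows. This is a minimal stdlib stand-in: plain random search replaces hyperopt's TPE sampler, and the search space and objective are illustrative placeholders, not the benchmark's exact code.

```python
import math
import random

random.seed(0)

def sample_params():
    # Stand-in for the hyperopt search space; the ranges are illustrative only.
    return {
        "learning_rate": math.exp(random.uniform(math.log(0.01), math.log(0.3))),
        "max_depth": random.randint(1, 16),
    }

def cv_rmse(params):
    # Placeholder objective: in the benchmark this trains a GBDT with early
    # stopping and returns the validation RMSE for the sampled params.
    return (params["learning_rate"] - 0.1) ** 2 + 0.001 * params["max_depth"]

best_params, best_score = None, float("inf")
for _ in range(50):  # 50 tuning iterations, as in the benchmark
    params = sample_params()
    score = cv_rmse(params)
    if score < best_score:
        best_params, best_score = params, score

# The final model is then refit on all training data using best_params.
```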
Experiment | Best hyperparameters | RMSE on test |
--- | --- | --- |
catboost with specifying cat features | best_n_estimators = 1415, params = {'random_seed': 0, 'learning_rate': 0.10663314690544494, 'iterations': 1500, 'od_wait': 100, 'one_hot_max_size': 143.0, 'bagging_temperature': 0.39933964736871874, 'random_strength': 1, 'depth': 8.0, 'loss_function': 'RMSE', 'l2_leaf_reg': 5.529962582104021, 'border_count': 254, 'boosting_type': 'Plain', 'bootstrap_type': 'Bayesian'} | 489.75 |
lightgbm with specifying cat features | best_n_estimators = 3396, params = {'num_leaves': 63, 'max_cat_threshold': 2, 'cat_l2': 12.93150760783131, 'verbose': -1, 'bagging_seed': 3, 'max_cat_to_onehot': 2, 'learning_rate': 0.12103165638430856, 'max_delta_step': 0.0, 'data_random_seed': 1, 'cat_smooth': 4.287437698866151, 'min_data_in_leaf': 26, 'bagging_fraction': 0.6207358917316325, 'min_data_per_group': 261, 'min_sum_hessian_in_leaf': 7.515138790064522e-05, 'feature_fraction_seed': 2, 'min_gain_to_split': 0.0, 'lambda_l1': 0, 'bagging_freq': 1, 'lambda_l2': 0.1709660204090765, 'max_depth': -1, 'objective': 'mean_squared_error', 'drop_seed': 4, 'metric': 'l2', 'feature_fraction': 0.8168930995735235} | 504.76 |
xgboost | best_n_estimators = 4011, params = {'reg_alpha': 0.14747200224681817, 'tree_method': 'gpu_hist', 'colsample_bytree': 0.883176060062088, 'silent': 1, 'eval_metric': 'rmse', 'grow_policy': 'depthwise', 'learning_rate': 0.10032091014826115, 'subsample': 0.5740170782945163, 'reg_lambda': 0, 'max_bin': 1020, 'objective': 'reg:linear', 'min_split_loss': 0, 'max_depth': 7} | 490.83 |
The maximum iterations limit was set to 9999 and early_stopping_rounds to 100.
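The early-stopping rule above can be sketched in plain Python. This is a generic outline of the mechanism, not any library's internal implementation:

```python
def early_stopped_rounds(eval_scores, max_iterations=9999, patience=100):
    """Return the number of boosting rounds actually run, given per-round
    validation RMSE, an iteration cap, and an early-stopping patience
    (the benchmark's early_stopping_rounds)."""
    best_score = float("inf")
    best_round = 0
    for round_idx, score in enumerate(eval_scores[:max_iterations], start=1):
        if score < best_score:
            best_score, best_round = score, round_idx
        elif round_idx - best_round >= patience:
            return round_idx  # no improvement for `patience` rounds: stop early
    return min(len(eval_scores), max_iterations)  # hit the iteration cap
```

Runs that "reached max iterations limit" in the tables below correspond to the second return: the validation score kept improving through round 9999.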
Note that CatBoost results differ between the CPU and GPU implementations because the border_count
parameter defaults to 254 in CPU mode and to 128 in GPU mode.
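To make CPU and GPU runs comparable, border_count can be pinned explicitly. A minimal sketch; everything in this dict besides border_count is illustrative:

```python
# Pinning border_count removes the CPU (254) vs GPU (128) default mismatch
# noted above. The surrounding training call is assumed, not shown.
catboost_params = {
    "loss_function": "RMSE",
    "border_count": 254,  # same feature-quantization borders on CPU and GPU
}
```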
CPU - Intel Xeon E312xx (Sandy Bridge) VM, 16 cores.
Experiment | Early stopping time (sec) | RMSE on test | Comments |
--- | --- | --- | --- |
catboost w/o specifying cat features | 212.67 | 578.10 | reached max iterations limit |
catboost with specifying cat features | 894.51 | 520.07 | |
lightgbm w/o specifying cat features | 51.17 | 499.67 | |
lightgbm with specifying cat features | 9.90 | 490.57 | |
xgboost | 272.3 | 567.8 | reached max iterations limit |
GPU - 2x NVIDIA GeForce GTX 1080 Ti.
Experiment | Early stopping time (sec) | RMSE on test | Comments |
--- | --- | --- | --- |
catboost w/o specifying cat features | 39.5 | 575.75 | reached max iterations limit |
catboost with specifying cat features | 90.83 | 528.63 | |
lightgbm w/o specifying cat features | 97.93 | 501.22 | |
lightgbm with specifying cat features | n/a | n/a | Failed: [LightGBM] [Fatal] bin size 1093 cannot run on GPU, see microsoft/LightGBM#1116 |
xgboost in 'gpu_exact' mode | 125.48 | 566.55 | reached max iterations limit |
xgboost in 'gpu_hist' mode | 68.04 | 626.09 | reached max iterations limit |
Hyperparameter distributions:
'n_estimators' : LogUniform(100, 1000, True),
'max_depth' : scipy.stats.randint(low=1, high=16),
'learning_rate' : scipy.stats.uniform(0.01, 1.0)
(see experiments_lib.py file for LogUniform definition)
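The actual LogUniform lives in experiments_lib.py; a plausible stdlib stand-in with the same call shape could look like the following. The meaning of the third argument (rounding samples to integers, as n_estimators needs) is an assumption:

```python
import math
import random

class LogUniform:
    """Sample log-uniformly in [low, high]; a sketch of the LogUniform
    used in experiments_lib.py (the integer flag is a guess)."""
    def __init__(self, low, high, to_int=False):
        self.low, self.high, self.to_int = low, high, to_int

    def rvs(self, random_state=None):
        # Mimics the scipy.stats .rvs() interface so it can sit alongside
        # randint and uniform in the search dictionary above.
        rng = random.Random(random_state)
        value = math.exp(rng.uniform(math.log(self.low), math.log(self.high)))
        return int(round(value)) if self.to_int else value
```

Log-uniform sampling spends equal probability mass per order of magnitude, which suits span-several-decades parameters like n_estimators.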
CPU - Intel Xeon E312xx (Sandy Bridge) VM, 16 cores.
Experiment | Time (sec) | RMSE on test |
--- | --- | --- |
catboost w/o specifying cat features | 239.91 | 568.38 |
catboost with specifying cat features | 1145.98 | 534.13 |
lightgbm w/o specifying cat features | 105.02 | 523.62 |
lightgbm with specifying cat features | 97.94 | 510.53 |
xgboost | 437.8 | 512.74 |
OS - Linux (tested on Ubuntu 16.04 LTS).
Installed packages (via 'pip install'):
- kaggle
- hyperopt
- numpy
- pandas
- scipy
- scikit-learn
Tested on:
- catboost 0.11.0
- lightgbm 2.2.1
- xgboost 0.80
Benchmark steps:
- Download the dataset from Kaggle
- Preprocess it (extract features and save in CatBoost data format)
- Run the benchmarks
(see 'run_all.sh', which performs all these steps)
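The three steps could be driven from Python as well. A hedged sketch: the kaggle CLI invocation follows the standard `kaggle competitions download` form, and the two script names are hypothetical placeholders for the repo's own scripts, not real files:

```python
import subprocess

# Mirrors the run_all.sh steps; script names below are hypothetical.
STEPS = [
    ["kaggle", "competitions", "download", "-c", "rossmann-store-sales"],
    ["python", "preprocess.py"],      # hypothetical: extract features, save CatBoost format
    ["python", "run_benchmarks.py"],  # hypothetical: run the experiments above
]

def run_all(dry_run=True):
    """Print or execute each step in order; dry_run avoids side effects."""
    for cmd in STEPS:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.check_call(cmd)
```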