
Test xgboost modeling engine #31

Open
@dfsnow

Description

The Data Department recently performed some model benchmarking (ccao-data/report-model-benchmark) comparing the run times of XGBoost and LightGBM. We found that the current iteration of XGBoost runs much faster than LightGBM on most machines while achieving similar predictive performance.

We should test replacing LightGBM with XGBoost as the primary modeling engine in both models.
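For reference, a minimal timing sketch of the kind of comparison the benchmark ran. This is not the actual code from ccao-data/report-model-benchmark; `X`, `y`, and the parameter values are placeholders:

```r
# Hypothetical timing sketch; not the code from
# ccao-data/report-model-benchmark. Assumes a prepared numeric
# feature matrix `X` and response vector `y` already exist.
library(xgboost)
library(lightgbm)

dtrain_xgb <- xgb.DMatrix(data = X, label = y)
dtrain_lgb <- lgb.Dataset(data = X, label = y)

nrounds <- 500  # placeholder; the real benchmark settings may differ

# XGBoost with the histogram tree method (the fast path as of 2.0.0)
system.time(
  xgb_fit <- xgb.train(
    params = list(objective = "reg:squarederror", tree_method = "hist"),
    data = dtrain_xgb,
    nrounds = nrounds
  )
)

# LightGBM with roughly comparable settings
system.time(
  lgb_fit <- lgb.train(
    params = list(objective = "regression"),
    data = dtrain_lgb,
    nrounds = nrounds
  )
)
```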

LightGBM

Pros

  • Native categorical support (easier feature engineering + clean SHAP values)
  • Better-maintained R package
  • Already have bindings for advanced features (via Lightsnip)
  • Slightly better predictive performance on our data

Cons

  • Slightly slower for general training (as of XGBoost 2.0.0)
  • Massively slower for calculating SHAP values (a full order of magnitude; see the sketch after this list)
  • Backend code seems much buggier
  • GPU support is lacking (+ hard to build for the R package)
  • Approximately 50,000 hyperparameters
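A minimal sketch of the two points above that matter most for us, native categorical handling and SHAP contributions. The column name and data are illustrative, not from the real pipeline; `X` is assumed to be a numeric matrix whose "town" column holds integer-coded categories:

```r
# Illustrative only; assumes `X` (numeric matrix with an
# integer-coded "town" column) and `y` already exist.
library(lightgbm)

dtrain <- lgb.Dataset(
  data = X,
  label = y,
  categorical_feature = "town"  # consumed natively, no one-hot step
)

fit <- lgb.train(
  params = list(objective = "regression"),
  data = dtrain,
  nrounds = 100
)

# Per-feature SHAP contributions, one column per feature plus a bias
# column. This is the step that runs roughly an order of magnitude
# slower than XGBoost's equivalent. (lightgbm >= 4.0 uses
# type = "contrib"; older versions take predcontrib = TRUE.)
shap <- predict(fit, X, type = "contrib")
```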

XGBoost

Pros

  • Well-maintained codebase, will definitely exist in perpetuity
  • Excellent GPU and multi-core training support; calculates SHAP values very quickly
  • More widely used than LightGBM

Cons

  • No native categorical support in the R package, even though the underlying XGBoost C++ supports it. Unlikely to change by the time we need to ship the 2024 model (see the workaround sketch after this list)
  • R package support seems lacking
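A sketch of the workaround the first con implies: factors have to be one-hot encoded before training, and the resulting SHAP contributions come back per dummy column rather than per original feature. Column names and data are illustrative:

```r
# Illustrative workaround: expand factors to dummy columns before
# training, since the R package can't consume factors directly.
# `df` (a data frame of predictors with a factor column "town")
# and `y` are placeholders.
library(xgboost)

X <- model.matrix(~ . - 1, data = df)  # one-hot encodes the factors
dtrain <- xgb.DMatrix(data = X, label = y)

fit <- xgb.train(
  params = list(objective = "reg:squarederror", tree_method = "hist"),
  data = dtrain,
  nrounds = 100
)

# SHAP contributions come back fast, but per dummy column, so they
# have to be re-aggregated by original feature to recover the "clean"
# per-feature SHAP values LightGBM gives us for free
shap <- predict(fit, X, predcontrib = TRUE)
```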
