Quick Start

This is a quick start guide for the command-line (CLI) version of LightGBM.

Follow the Installation Guide to install LightGBM first.

List of other Helpful Links

Parameters
Parameters Tuning
Python Package quick start guide
Python API Reference

Training data format

LightGBM supports input data files in CSV, TSV and LibSVM formats.

By default, the label is the data in the first column, and the file has no header.
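As an illustration of this layout, here is a minimal Python sketch (not part of LightGBM) that renders the same two hypothetical rows as CSV/TSV lines and as LibSVM lines. It assumes zero-based feature indices for LibSVM and lists only non-zero features there; check how your LightGBM version interprets LibSVM indices.

```python
# Hypothetical example rows: (label, feature values). The label is written first.
ROWS = [
    (1, [0.5, 0.0, 3.2]),
    (0, [1.5, 2.0, 0.0]),
]


def to_delimited(rows, sep):
    """CSV (sep=",") or TSV (tab separator): label first, then all features, no header."""
    lines = [sep.join(str(v) for v in [label, *features]) for label, features in rows]
    return "\n".join(lines) + "\n"


def to_libsvm(rows):
    """LibSVM: '<label> <index>:<value> ...', listing only non-zero features.

    Assumes zero-based feature indices.
    """
    lines = []
    for label, features in rows:
        pairs = [f"{i}:{v}" for i, v in enumerate(features) if v != 0]
        lines.append(" ".join([str(label), *pairs]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_delimited(ROWS, ","), end="")  # CSV, e.g. for train.csv
    print(to_libsvm(ROWS), end="")          # LibSVM, e.g. for train.svm
```

Writing either string to a file gives a training file with the label in the first column and no header line.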

Categorical feature support

update 12/5/2016:

LightGBM can use categorical features directly (without one-hot encoding). An experiment on the Expo data showed about an 8x speed-up compared with one-hot encoding.

For the setting details, please refer to Parameters.
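To make the contrast with one-hot encoding concrete, here is a minimal Python sketch (not LightGBM code) of the two representations of one hypothetical categorical column: a single column of integer codes versus one 0/1 column per category. It assumes that a categorical feature is passed to LightGBM as non-negative integer codes; the parameter that marks a column as categorical is documented in Parameters and is not shown here.

```python
def integer_encode(values):
    """Map each distinct category to a non-negative integer code, in first-seen order."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


def one_hot(values):
    """Expand one categorical column into one 0/1 column per distinct category."""
    encoded, codes = integer_encode(values)
    width = len(codes)
    return [[1 if code == j else 0 for j in range(width)] for code in encoded]


if __name__ == "__main__":
    cities = ["Seattle", "Beijing", "Seattle", "London"]  # hypothetical column
    print(integer_encode(cities)[0])  # one column of integer codes
    print(one_hot(cities))            # three 0/1 columns instead of one
```

With many distinct categories, the one-hot form grows one column per category, while the integer-coded form stays a single column.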

Weight and query/group data

LightGBM also supports weighted training, which requires additional weight data. Ranking tasks additionally require query data.
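The following minimal Python sketch prepares the contents of these extra files. It assumes the convention described in LightGBM's Parameters documentation: for a data file train.txt, the weights go in train.txt.weight (one weight per line, same row order as the data) and the query data in train.txt.query (one line per query, giving how many consecutive rows belong to it). The helper names are hypothetical; verify the file conventions for your version.

```python
def weight_file_content(weights, num_rows):
    """One weight per line, in the same row order as the training data."""
    if len(weights) != num_rows:
        raise ValueError(f"expected {num_rows} weights, got {len(weights)}")
    return "".join(f"{w}\n" for w in weights)


def query_file_content(query_ids):
    """Collapse per-row query ids into counts of consecutive rows per query.

    Rows belonging to one query must be contiguous in the training data.
    """
    sizes = []
    seen = set()
    previous = None
    for qid in query_ids:
        if not sizes or qid != previous:
            if qid in seen:
                raise ValueError(f"rows of query {qid!r} are not contiguous")
            seen.add(qid)
            sizes.append(0)
            previous = qid
        sizes[-1] += 1
    return "".join(f"{n}\n" for n in sizes)


if __name__ == "__main__":
    # Would be written to train.txt.weight and train.txt.query under the assumed naming.
    print(weight_file_content([1.0, 0.5, 2.0, 1.0, 1.0], num_rows=5), end="")
    print(query_file_content([7, 7, 7, 9, 9]), end="")  # two queries: 3 rows, then 2 rows
```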

update 11/3/2016:

input files with a header are now supported
can specify the label column, weight column and query/group id column, by either column index or column name
can specify a list of columns to ignore

For detailed usage, please refer to Configuration.
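Selecting a column either by index or by name can be sketched as follows. This is a hypothetical helper, not LightGBM code: it assumes the convention from later LightGBM Parameters documentation that a leading `name:` marks a comma-separated list of column names, and it returns positions in the header row. How LightGBM itself counts indices for the weight and group columns (for example, whether the label column is included) is documented in Configuration and Parameters, not here.

```python
def resolve_columns(spec, header):
    """Resolve a column spec to zero-based positions in the header row.

    Assumed convention: a leading "name:" means the comma-separated entries are
    column names; otherwise they are integer column indices.
    """
    spec = spec.strip()
    if spec.startswith("name:"):
        names = [n.strip() for n in spec[len("name:"):].split(",") if n.strip()]
        return [header.index(n) for n in names]  # ValueError if a name is missing
    return [int(i) for i in spec.split(",") if i.strip()]


if __name__ == "__main__":
    header = ["click", "weight", "qid", "f1", "f2"]  # hypothetical header row
    print(resolve_columns("name:qid", header))    # query/group id column by name
    print(resolve_columns("name:f1,f2", header))  # a list of ignored columns by name
    print(resolve_columns("0", header))           # label column by index
```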

Parameter quick look

The parameter format is key1=value1 key2=value2 ... . Parameters can be set both in a config file and on the command line.
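The key=value format can be illustrated with a minimal Python parser sketch. It is not LightGBM's own parser: splitting on the first "=", keeping values as strings, and letting a repeated key keep its last value are all assumptions of this sketch.

```python
def parse_params(args):
    """Parse "key=value" tokens (a list, or one space-separated string) into a dict.

    Each token is split on its first "=", so values may themselves contain "=".
    A key given twice keeps its last value in this sketch.
    """
    if isinstance(args, str):
        args = args.split()
    params = {}
    for token in args:
        key, sep, value = token.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {token!r}")
        params[key.strip()] = value.strip()
    return params


if __name__ == "__main__":
    print(parse_params("task=train num_leaves=63 learning_rate=0.05"))
```

Values stay strings here; converting them to int, double or enum according to each parameter's type is left to the caller.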

Some important parameters:

config, default="", type=string, alias=config_file
- path of config file
task, default=train, type=enum, options=train,prediction
- train for training
- prediction for prediction.
application, default=regression, type=enum, options=regression,binary,lambdarank,multiclass, alias=objective,app
- regression, regression application
- binary, binary classification application
- lambdarank, lambdarank application
- multiclass, multi-class classification application, should set num_class as well
boosting, default=gbdt, type=enum, options=gbdt,dart, alias=boost,boosting_type
- gbdt, traditional Gradient Boosting Decision Tree
- dart, Dropouts meet Multiple Additive Regression Trees
data, default="", type=string, alias=train,train_data
- training data, LightGBM will train from this data
valid, default="", type=multi-string, alias=test,valid_data,test_data
- validation/test data; LightGBM will output metrics for these data
- supports multiple validation data files, separated by ,
num_iterations, default=100, type=int, alias=num_iteration,num_tree,num_trees,num_round,num_rounds
- number of boosting iterations/trees
learning_rate, default=0.1, type=double, alias=shrinkage_rate
- shrinkage rate
num_leaves, default=31, type=int, alias=num_leaf
- number of leaves in one tree
tree_learner, default=serial, type=enum, options=serial,feature,data
- serial, single machine tree learner
- feature, feature parallel tree learner
- data, data parallel tree learner
- Refer to the Parallel Learning Guide for more details.
num_threads, default=OpenMP_default, type=int, alias=num_thread,nthread
- Number of threads for LightGBM.
- For the best speed, set this to the number of real CPU cores, not the number of threads (most CPUs use hyper-threading to run 2 threads per CPU core).
- For parallel learning, do not use all CPU cores, since this will cause poor network performance.
max_depth, default=-1, type=int
- Limits the max depth of the tree model. This is used to deal with over-fitting when the amount of data is small. Trees still grow leaf-wise.
- < 0 means no limit
min_data_in_leaf, default=20, type=int, alias=min_data_per_leaf, min_data
- Minimal number of data points in one leaf. Can be used to deal with over-fitting.
min_sum_hessian_in_leaf, default=1e-3, type=double, alias=min_sum_hessian_per_leaf, min_sum_hessian, min_hessian
- Minimal sum of hessians in one leaf. Like min_data_in_leaf, it can be used to deal with over-fitting.
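The num_threads advice can be turned into a starting value with a small Python heuristic. This is a sketch, not LightGBM behavior: `os.cpu_count()` reports logical CPUs, the division assumes 2 hardware threads per core as described for num_threads, and reserving one core in parallel learning is a hypothetical choice for "do not use all CPU cores".

```python
import os


def suggest_num_threads(threads_per_core=2, parallel_learning=False, logical_cpus=None):
    """Heuristic num_threads value following the num_threads advice.

    os.cpu_count() reports logical CPUs (hardware threads); dividing by the assumed
    threads_per_core approximates the number of physical cores.
    """
    logical = logical_cpus if logical_cpus is not None else (os.cpu_count() or 1)
    physical = max(1, logical // threads_per_core)
    if parallel_learning and physical > 1:
        # Hypothetical choice: leave one core free for network communication.
        return physical - 1
    return physical


if __name__ == "__main__":
    print(f"num_threads={suggest_num_threads()}")
```

If you know the exact physical core count of the machine, setting num_threads to it directly is more reliable than this estimate.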

For all parameters, please refer to Parameters.
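Because most parameters have several aliases, it can help to normalize a parameter set before inspecting it. The minimal Python sketch below covers only the aliases listed above and the one cross-parameter rule stated there (multiclass needs num_class). Raising on conflicting alias values is this sketch's own choice; LightGBM's handling of such conflicts is not described here.

```python
# Aliases copied from the parameter list above (a subset of all parameters).
ALIASES = {
    "config": ["config_file"],
    "application": ["objective", "app"],
    "boosting": ["boost", "boosting_type"],
    "data": ["train", "train_data"],
    "valid": ["test", "valid_data", "test_data"],
    "num_iterations": ["num_iteration", "num_tree", "num_trees", "num_round", "num_rounds"],
    "learning_rate": ["shrinkage_rate"],
    "num_leaves": ["num_leaf"],
    "num_threads": ["num_thread", "nthread"],
    "min_data_in_leaf": ["min_data_per_leaf", "min_data"],
    "min_sum_hessian_in_leaf": ["min_sum_hessian_per_leaf", "min_sum_hessian", "min_hessian"],
}
ALIAS_TO_NAME = {alias: name for name, aliases in ALIASES.items() for alias in aliases}


def canonicalize(params):
    """Rename aliases to their main parameter names.

    Raising on conflicting values is this sketch's choice, not documented LightGBM behavior.
    """
    result = {}
    for key, value in params.items():
        name = ALIAS_TO_NAME.get(key, key)
        if name in result and result[name] != value:
            raise ValueError(f"conflicting values for {name}: {result[name]!r} vs {value!r}")
        result[name] = value
    return result


def check(params):
    """Apply the cross-parameter rule stated above: multiclass needs num_class."""
    p = canonicalize(params)
    if p.get("application") == "multiclass" and "num_class" not in p:
        raise ValueError("application=multiclass requires num_class")
    return p
```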

Run LightGBM

For Windows:

lightgbm.exe config=your_config_file other_args ...

For Unix:

./lightgbm config=your_config_file other_args ...
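To launch the CLI from a Python script, the commands above can be assembled as an argument list. This is a minimal sketch; `build_command` and `run_lightgbm` are hypothetical helpers, and like the commands above they assume the lightgbm executable is in the current directory.

```python
import subprocess
import sys


def build_command(config_file, overrides=None, platform=None):
    """Build the argument list for the commands above.

    Assumes the lightgbm executable is in the current directory, as in those commands.
    """
    platform = platform or sys.platform
    executable = "lightgbm.exe" if platform.startswith("win") else "./lightgbm"
    args = [executable, f"config={config_file}"]
    for key, value in (overrides or {}).items():
        args.append(f"{key}={value}")
    return args


def run_lightgbm(config_file, overrides=None):
    """Run the CLI and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(config_file, overrides), check=True)
```

For example, `run_lightgbm("train.conf", {"num_trees": 10})` runs `./lightgbm config=train.conf num_trees=10` on Unix. Passing a list instead of a single string avoids shell quoting issues with file paths.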

Parameters can be set both in the config file and on the command line, and parameters on the command line take priority over those in the config file. For example, the following command line keeps num_trees=10 and ignores the same parameter in the config file.

./lightgbm config=train.conf num_trees=10
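The priority rule can be sketched in Python as: read the config file named by config=, then overlay every command-line key=value pair. This is an illustration, not LightGBM's implementation; treating "#" as a comment marker (as in LightGBM's example config files) and stripping spaces around "=" are assumptions of this sketch, and `read_text` is a hypothetical hook for reading the file.

```python
from pathlib import Path


def parse_config_lines(text):
    """Config file: one key=value per line; blank lines and '#' comments are skipped."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            key, _, value = line.partition("=")
            params[key.strip()] = value.strip()
    return params


def effective_params(argv, read_text=lambda path: Path(path).read_text()):
    """Merge config-file parameters with command-line ones; the command line wins."""
    cli = dict(token.split("=", 1) for token in argv)  # every token must contain "="
    merged = {}
    if "config" in cli:
        merged.update(parse_config_lines(read_text(cli["config"])))
    merged.update(cli)  # command-line values override config-file values
    return merged
```

With a train.conf containing num_trees = 100, `effective_params(["config=train.conf", "num_trees=10"])` yields num_trees equal to "10", matching the command above.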

Examples

Binary Classification
Regression
Lambdarank
Parallel Learning
