Xgb datasets adding #60

Merged: 39 commits, Apr 26, 2021
62f87c3
Applied mypy + flake8 for all files
Mar 22, 2021
132d73f
Sorted imports with ISort
Mar 22, 2021
4aa4898
Moved env change to runner
Mar 22, 2021
5a8db33
fixed all mypy errors and added mypy check to CI
Mar 22, 2021
5594efd
Yet another mypy fixes
Mar 22, 2021
35b55b8
Small runner refactoring
Mar 23, 2021
56de8f7
First attempt of adding nvidia datasets
Mar 29, 2021
0ee5f05
Merge branch 'master' into mypy-applying
Mar 29, 2021
04e7a64
removed E265 ignoring for flake8 job
Mar 29, 2021
8268747
Merge remote-tracking branch 'my/mypy-applying' into xgb-nvidia-datasets
Mar 30, 2021
b6a7eb0
NVidia benchmarks are working now
Mar 30, 2021
7e780bb
Added higgs, msrank and airline fetching
Mar 30, 2021
670c289
small fixes of env
Mar 30, 2021
dc0e9c9
Applying comments
Apr 1, 2021
f64ae68
Merge branch 'mypy-applying' into xgb-nvidia-datasets
Apr 1, 2021
873754b
Split dataset loading to different files
Apr 1, 2021
93ea32d
Merge remote-tracking branch 'origin/master' into xgb-nvidia-datasets
Apr 1, 2021
dcfc5b9
Why doesnt mypy work?
Apr 1, 2021
340402e
Added abalone + letters, updated all GB configs
Apr 15, 2021
6e47423
Added links and descriptions for new datasets
Apr 15, 2021
340a628
Merge remote-tracking branch 'origin/master' into xgb-nvidia-datasets
Apr 15, 2021
4be3720
handling mypy
Apr 15, 2021
8184016
Handled skex fake message throwing
Apr 15, 2021
cf5ee76
Trying to handle mypy, at. 3
Apr 15, 2021
9db3177
Trying to handle mypy, at. 4
Apr 15, 2021
5e76a0b
Trying to handle mypy, at. 5
Apr 15, 2021
13fcd20
Changed configs readme and made small fixes in GB testing configs
Apr 20, 2021
0873f97
Merge branch 'master' of https://github.com/IntelPython/scikit-learn_…
Apr 20, 2021
877e0fd
Applying more comments, updating readme's
Apr 20, 2021
8bdc7f2
Applying comments: renamed configs
Apr 20, 2021
f9cf09b
Changed all datasets to npy, applied Kirill's comments
Apr 23, 2021
41e003f
Merge branch 'master' of https://github.com/IntelPython/scikit-learn_…
Apr 23, 2021
523df30
Cleanup after someone's commit
Apr 23, 2021
59303fa
Applying mypy
Apr 23, 2021
b56e42c
Applied Ekaterina's suggestions
Apr 23, 2021
ad176e5
Applied other Ekaterina's comments
Apr 23, 2021
b92a27f
Merge branch 'xgb-nvidia-datasets' of https://github.com/RukhovichIV/…
Apr 23, 2021
11a8ffc
Final commits applying
Apr 26, 2021
37d5461
Alexander's final comments
Apr 26, 2021
Added abalone + letters, updated all GB configs
Igor Rukhovich committed Apr 15, 2021
commit 340402e1176f7f5839c1372e172871b6f5186a47
144 changes: 78 additions & 66 deletions configs/lgbm_mb_cpu_config.json
@@ -1,108 +1,120 @@
{
"common": {
"lib": ["modelbuilders"],
"data-format": ["pandas"],
"data-order": ["F"],
"dtype": ["float32"]
"lib": "modelbuilders",
Review comment (Contributor): Add a note to the README that a parameter may be set to either a single value or a list of values.
Reply (Author): Done earlier.

"data-format": "pandas",
"data-order": "F",
"dtype": "float32",
"algorithm": "lgbm_mb"
},
"cases": [
{
"algorithm": "lgbm_mb",
"dataset": [
{
"source": "csv",
"name": "mortgage1Q",
"source": "npy",
"name": "airline-ohe",
"training":
{
"x": "data/mortgage_x.csv",
"y": "data/mortgage_y.csv"
"x": "data/airline-ohe_x_train.npy",
"y": "data/airline-ohe_y_train.npy"
},
"testing":
{
"x": "data/airline-ohe_x_test.npy",
"y": "data/airline-ohe_y_test.npy"
}
}
],
"n-estimators": [100],
"objective": ["regression"],
"max-depth": [8],
"scale-pos-weight": [2],
"learning-rate": [0.1],
"subsample": [1],
"reg-alpha": [0.9],
"reg-lambda": [1],
"min-child-weight": [0],
"max-leaves": [256]
"reg-alpha": 0.9,
"max-bin": 256,
"scale-pos-weight": 2,
"learning-rate": 0.1,
"subsample": 1,
"reg-lambda": 1,
"min-child-weight": 0,
"max-depth": 8,
"max-leaves": 256,
"n-estimators": 1000,
"objective": "binary"
},
{
"algorithm": "lgbm_mb",
"dataset": [
{
"source": "csv",
"name": "airline-ohe",
"source": "npy",
"name": "higgs1m",
"training":
{
"x": "data/airline-ohe_x_train.csv",
"y": "data/airline-ohe_y_train.csv"
"x": "data/higgs1m_x_train.npy",
"y": "data/higgs1m_y_train.npy"
},
"testing":
{
"x": "data/higgs1m_x_test.npy",
"y": "data/higgs1m_y_test.npy"
}
}
],
"reg-alpha": [0.9],
"max-bin": [256],
"scale-pos-weight": [2],
"learning-rate": [0.1],
"subsample": [1],
"reg-lambda": [1],
"min-child-weight": [0],
"max-depth": [8],
"max-leaves": [256],
"n-estimators": [1000],
"objective": ["binary"]
"reg-alpha": 0.9,
"max-bin": 256,
"scale-pos-weight": 2,
"learning-rate": 0.1,
"subsample": 1,
"reg-lambda": 1,
"min-child-weight": 0,
"max-depth": 8,
"max-leaves": 256,
"n-estimators": 1000,
"objective": "binary"
},
{
"algorithm": "lgbm_mb",
"dataset": [
{
"source": "csv",
"name": "higgs1m",
"source": "csv",
"name": "mortgage1Q",
"training":
{
"x": "data/higgs1m_x_train.csv",
"y": "data/higgs1m_y_train.csv"
"x": "data/mortgage_x.csv",
"y": "data/mortgage_y.csv"
}
}
],
"reg-alpha": [0.9],
"max-bin": [256],
"scale-pos-weight": [2],
"learning-rate": [0.1],
"subsample": [1],
"reg-lambda": [1],
"min-child-weight": [0],
"max-depth": [8],
"max-leaves": [256],
"n-estimators": [1000],
"objective": ["binary"]
"n-estimators": 100,
"objective": "regression",
"max-depth": 8,
"scale-pos-weight": 2,
"learning-rate": 0.1,
"subsample": 1,
"reg-alpha": 0.9,
"reg-lambda": 1,
"min-child-weight": 0,
"max-leaves": 256
},
{
"algorithm": "lgbm_mb",
"dataset": [
{
"source": "csv",
"name": "msrank",
"source": "npy",
"name": "msrank",
"training":
{
"x": "data/mlsr_x_train.csv",
"y": "data/mlsr_y_train.csv"
"x": "data/msrank_x_train.npy",
"y": "data/msrank_y_train.npy"
},
"testing":
{
"x": "data/msrank_x_test.npy",
"y": "data/msrank_y_test.npy"
}
}
],
"max-bin": [256],
"learning-rate": [0.3],
"subsample": [1],
"reg-lambda": [2],
"min-child-weight": [1],
"min-split-gain": [0.1],
"max-depth": [8],
"max-leaves": [256],
"n-estimators": [200],
"objective": ["multiclass"]
"max-bin": 256,
"learning-rate": 0.3,
"subsample": 1,
"reg-lambda": 2,
"min-child-weight": 1,
"min-split-loss": 0.1,
"max-depth": 8,
"max-leaves": 256,
"n-estimators": 200,
"objective": "multiclass"
}
]
}
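
The review comment on this file notes that a parameter may be given as a single value or as a list of values. The following is a minimal sketch of how a runner could normalize both forms and expand them into individual parameter combinations. The helpers `as_list` and `expand_params` are hypothetical names introduced for illustration; they are not taken from the repository's runner, and the actual expansion logic may differ.

```python
from itertools import product
from typing import Any, Dict, Iterator


def as_list(value: Any) -> list:
    """Wrap a scalar in a one-element list; return lists unchanged."""
    return value if isinstance(value, list) else [value]


def expand_params(params: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per combination of parameter values (hypothetical helper)."""
    keys = list(params)
    # Cartesian product over the per-key value lists.
    for combo in product(*(as_list(params[k]) for k in keys)):
        yield dict(zip(keys, combo))


# Scalars and lists can be mixed freely in one case.
case = {"learning-rate": [0.1, 0.3], "max-depth": 8, "objective": "binary"}
combos = list(expand_params(case))
for combo in combos:
    print(combo)
```

Under this scheme a config written with scalars, as in the updated file, describes one combination per case, while list values fan out into several runs.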
155 changes: 99 additions & 56 deletions configs/xgb_cpu_config.json
@@ -4,77 +4,32 @@
"data-format": "pandas",
"data-order": "F",
"dtype": "float32",
"count-dmatrix":"",
"algorithm": "gbt",
"tree-method": "hist",
"num-threads": 56
"count-dmatrix":""
},
"cases": [
{
"dataset": [
{
"source": "csv",
"name": "plasticc",
"source": "npy",
"name": "abalone",
"training":
{
"x": "data/plasticc_x_train.csv",
"y": "data/plasticc_y_train.csv"
"x": "data/abalone_x_train.npy",
"y": "data/abalone_y_train.npy"
},
"testing":
{
"x": "data/plasticc_x_test.csv",
"y": "data/plasticc_y_test.csv"
}
}
],
"n-estimators": 60,
"objective": "multi:softprob",
"max-depth": 7,
"subsample": 0.7,
"colsample-bytree": 0.7
},
{
"dataset": [
{
"source": "csv",
"name": "santander",
"training":
{
"x": "data/santander_x_train.csv",
"y": "data/santander_y_train.csv"
}
}
],
"n-estimators": 10000,
"objective": "binary:logistic",
"max-depth": 1,
"subsample": 0.5,
"eta": 0.1,
"colsample-bytree": 0.05,
"single-precision-histogram": ""
},
{
"dataset": [
{
"source": "csv",
"name": "mortgage1Q",
"training":
{
"x": "data/mortgage_x.csv",
"y": "data/mortgage_y.csv"
"x": "data/abalone_x_test.npy",
"y": "data/abalone_y_test.npy"
}
}
],
"n-estimators": 100,
"objective": "reg:squarederror",
"max-depth": 8,
"scale-pos-weight": 2,
"learning-rate": 0.1,
"subsample": 1,
"reg-alpha": 0.9,
"reg-lambda": 1,
"min-child-weight": 0,
"max-leaves": 256
"learning-rate": 0.03,
"max-depth": 6,
"n-estimators": 1000,
"objective": "reg:squarederror"
},
{
"dataset": [
@@ -136,6 +91,51 @@
"enable-experimental-json-serialization": "False",
"inplace-predict": ""
},
{
"dataset": [
{
"source": "npy",
"name": "letters",
"training":
{
"x": "data/letters_x_train.npy",
"y": "data/letters_y_train.npy"
},
"testing":
{
"x": "data/letters_x_test.npy",
"y": "data/letters_y_test.npy"
}
}
],
"learning-rate": 0.03,
"max-depth": 6,
"n-estimators": 1000,
"objective": "multi:softprob"
},
{
"dataset": [
{
"source": "csv",
"name": "mortgage1Q",
"training":
{
"x": "data/mortgage_x.csv",
"y": "data/mortgage_y.csv"
}
}
],
"n-estimators": 100,
"objective": "reg:squarederror",
"max-depth": 8,
"scale-pos-weight": 2,
"learning-rate": 0.1,
"subsample": 1,
"reg-alpha": 0.9,
"reg-lambda": 1,
"min-child-weight": 0,
"max-leaves": 256
},
{
"dataset": [
{
@@ -163,6 +163,49 @@
"n-estimators": 200,
"objective": "multi:softprob",
"single-precision-histogram": ""
},
{
"dataset": [
{
"source": "csv",
"name": "plasticc",
"training":
{
"x": "data/plasticc_x_train.csv",
"y": "data/plasticc_y_train.csv"
},
"testing":
{
"x": "data/plasticc_x_test.csv",
"y": "data/plasticc_y_test.csv"
}
}
],
"n-estimators": 60,
"objective": "multi:softprob",
"max-depth": 7,
"subsample": 0.7,
"colsample-bytree": 0.7
},
{
"dataset": [
{
"source": "csv",
"name": "santander",
"training":
{
"x": "data/santander_x_train.csv",
"y": "data/santander_y_train.csv"
}
}
],
"n-estimators": 10000,
"objective": "binary:logistic",
"max-depth": 1,
"subsample": 0.5,
"eta": 0.1,
"colsample-bytree": 0.05,
"single-precision-histogram": ""
}
]
}
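
Each dataset entry in these configs carries a `source`, a `name`, a `training` pair of `x`/`y` paths, and sometimes a `testing` pair. The sketch below shows one way such a file could be sanity-checked before a benchmark run. `check_config` and `KNOWN_SOURCES` are hypothetical names, the set of allowed sources is inferred only from the values visible in this diff, and treating `training` as required is an assumption rather than a documented rule.

```python
import json
from typing import List

KNOWN_SOURCES = {"csv", "npy"}  # assumption: only the sources seen in this diff


def check_config(text: str) -> List[str]:
    """Return human-readable problems found in a config string (hypothetical helper)."""
    problems = []
    config = json.loads(text)
    for i, case in enumerate(config.get("cases", [])):
        for ds in case.get("dataset", []):
            name = ds.get("name", "<unnamed>")
            if ds.get("source") not in KNOWN_SOURCES:
                problems.append(f"case {i}: {name}: unknown source {ds.get('source')!r}")
            if "training" not in ds:
                problems.append(f"case {i}: {name}: missing 'training'")
            # 'testing' is optional, but any split that is present needs x and y.
            for split in ("training", "testing"):
                paths = ds.get(split)
                if paths is not None and not {"x", "y"} <= set(paths):
                    problems.append(f"case {i}: {name}: '{split}' needs both 'x' and 'y'")
    return problems


sample = """
{
  "common": {"lib": "modelbuilders"},
  "cases": [
    {"dataset": [{"source": "npy", "name": "abalone",
                  "training": {"x": "data/abalone_x_train.npy",
                               "y": "data/abalone_y_train.npy"}}],
     "max-depth": 6},
    {"dataset": [{"source": "parquet", "name": "demo",
                  "training": {"x": "data/demo_x.parquet"}}]}
  ]
}
"""
problems = check_config(sample)
for line in problems:
    print(line)
```

The second case in the sample is deliberately malformed, so the check reports both its unrecognized source and its incomplete `training` pair.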