Commit 0bcaab3

Merge pull request #286 from meng-ustc/main
Add a new method to benchmarks: DoubleEnsemble
2 parents a96f0c2 + 1de4def commit 0bcaab3

7 files changed

+445
-1
lines changed

README.md

+1
@@ -237,6 +237,7 @@ Here is a list of models built on `Qlib`.
 - [SFM based on pytorch (Liheng Zhang, et al. 2017)](qlib/contrib/model/pytorch_sfm.py)
 - [TFT based on tensorflow (Bryan Lim, et al. 2019)](examples/benchmarks/TFT/tft.py)
 - [TabNet based on pytorch (Sercan O. Arik, et al. 2019)](qlib/contrib/model/pytorch_tabnet.py)
+- [DoubleEnsemble based on LightGBM (Chuheng Zhang, et al. 2020)](qlib/contrib/model/double_ensemble.py)
 
 Your PR of new Quant models is highly welcomed.
 
@@ -0,0 +1,4 @@
+# DoubleEnsemble
+* DoubleEnsemble is an ensemble framework that combines learning-trajectory-based sample reweighting with shuffling-based feature selection to address two problems in financial data: a low signal-to-noise ratio and a growing number of features. It identifies key samples from the training dynamics of each sample and selects key features from the ablation impact of each feature, measured by shuffling. The framework applies to a wide range of base models and can extract complex patterns while mitigating overfitting and instability in financial market prediction.
+* The code used in Qlib is our own implementation.
+* Paper: DoubleEnsemble: A New Ensemble Method Based on Sample Reweighting and Feature Selection for Financial Data Analysis [https://arxiv.org/pdf/2010.01265.pdf](https://arxiv.org/pdf/2010.01265.pdf).
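
The shuffling-based ablation described in the README above can be illustrated with a small, self-contained sketch. This is not the code from `qlib/contrib/model/double_ensemble.py`; the function names (`mse`, `shuffle_importance`), the toy data, and the use of a plain loss difference as the score are assumptions made for illustration only.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of a callable model over a list of feature rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def shuffle_importance(model, rows, targets, seed=0):
    """Hypothetical helper: loss increase when each feature column is shuffled.

    A larger increase suggests the model relies more on that feature.
    """
    rng = random.Random(seed)
    base_loss = mse(model, rows, targets)
    scores = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        scores.append(mse(model, shuffled, targets) - base_loss)
    return scores


# Toy data: the target depends only on feature 0; feature 1 is irrelevant.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
targets = [2.0 * r[0] for r in rows]
model = lambda r: 2.0 * r[0]  # stands in for a trained base model

scores = shuffle_importance(model, rows, targets)
```

In this toy setup, shuffling feature 0 raises the loss while shuffling feature 1 leaves it unchanged, so a selection step would keep feature 0 first.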
@@ -0,0 +1,3 @@
+pandas==1.1.2
+numpy==1.17.4
+lightgbm==3.1.0
@@ -0,0 +1,90 @@
+qlib_init:
+    provider_uri: "~/.qlib/qlib_data/cn_data"
+    region: cn
+market: &market csi300
+benchmark: &benchmark SH000300
+data_handler_config: &data_handler_config
+    start_time: 2008-01-01
+    end_time: 2020-08-01
+    fit_start_time: 2008-01-01
+    fit_end_time: 2014-12-31
+    instruments: *market
+port_analysis_config: &port_analysis_config
+    strategy:
+        class: TopkDropoutStrategy
+        module_path: qlib.contrib.strategy.strategy
+        kwargs:
+            topk: 50
+            n_drop: 5
+    backtest:
+        verbose: False
+        limit_threshold: 0.095
+        account: 100000000
+        benchmark: *benchmark
+        deal_price: close
+        open_cost: 0.0005
+        close_cost: 0.0015
+        min_cost: 5
+task:
+    model:
+        class: DEnsembleModel
+        module_path: qlib.contrib.model.double_ensemble
+        kwargs:
+            base_model: "gbm"
+            loss: mse
+            num_models: 6
+            enable_sr: True
+            enable_fs: True
+            alpha1: 1
+            alpha2: 1
+            bins_sr: 10
+            bins_fs: 5
+            decay: 0.5
+            sample_ratios:
+                - 0.8
+                - 0.7
+                - 0.6
+                - 0.5
+                - 0.4
+            sub_weights:
+                - 1
+                - 0.2
+                - 0.2
+                - 0.2
+                - 0.2
+                - 0.2
+            epochs: 28
+            colsample_bytree: 0.8879
+            learning_rate: 0.2
+            subsample: 0.8789
+            lambda_l1: 205.6999
+            lambda_l2: 580.9768
+            max_depth: 8
+            num_leaves: 210
+            num_threads: 20
+            verbosity: -1
+    dataset:
+        class: DatasetH
+        module_path: qlib.data.dataset
+        kwargs:
+            handler:
+                class: Alpha158
+                module_path: qlib.contrib.data.handler
+                kwargs: *data_handler_config
+            segments:
+                train: [2008-01-01, 2014-12-31]
+                valid: [2015-01-01, 2016-12-31]
+                test: [2017-01-01, 2020-08-01]
+    record:
+        - class: SignalRecord
+          module_path: qlib.workflow.record_temp
+          kwargs: {}
+        - class: SigAnaRecord
+          module_path: qlib.workflow.record_temp
+          kwargs:
+              ana_long_short: False
+              ann_scaler: 252
+        - class: PortAnaRecord
+          module_path: qlib.workflow.record_temp
+          kwargs:
+              config: *port_analysis_config
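
In the model `kwargs` of the config above, `sub_weights` has as many entries as `num_models` (6), and `sample_ratios` has as many entries as `bins_fs` (5). The sketch below checks those length relationships and shows one plausible way per-model weights could combine sub-model predictions. Treating the lengths as hard requirements, and combining predictions as a normalized weighted average, are both assumptions here; the helper names `check_lengths` and `weighted_average` are hypothetical and not part of Qlib.

```python
# Values copied from the model kwargs in the workflow config.
kwargs = {
    "num_models": 6,
    "bins_fs": 5,
    "sample_ratios": [0.8, 0.7, 0.6, 0.5, 0.4],
    "sub_weights": [1, 0.2, 0.2, 0.2, 0.2, 0.2],
}


def check_lengths(kw):
    """Assumed consistency rule: one weight per sub-model, one ratio per feature bin."""
    return (
        len(kw["sub_weights"]) == kw["num_models"]
        and len(kw["sample_ratios"]) == kw["bins_fs"]
    )


def weighted_average(preds, weights):
    """Combine one prediction per sub-model into a single score (assumed scheme)."""
    total = sum(weights)
    return sum(p * w for p, w in zip(preds, weights)) / total


ok = check_lengths(kwargs)

# With these weights the first sub-model counts as much as the other five combined.
combined = weighted_average([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], kwargs["sub_weights"])
```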
@@ -0,0 +1,97 @@
+qlib_init:
+    provider_uri: "~/.qlib/qlib_data/cn_data"
+    region: cn
+market: &market csi300
+benchmark: &benchmark SH000300
+data_handler_config: &data_handler_config
+    start_time: 2008-01-01
+    end_time: 2020-08-01
+    fit_start_time: 2008-01-01
+    fit_end_time: 2014-12-31
+    instruments: *market
+    infer_processors: []
+    learn_processors:
+        - class: DropnaLabel
+        - class: CSRankNorm
+          kwargs:
+              fields_group: label
+    label: ["Ref($close, -2) / Ref($close, -1) - 1"]
+port_analysis_config: &port_analysis_config
+    strategy:
+        class: TopkDropoutStrategy
+        module_path: qlib.contrib.strategy.strategy
+        kwargs:
+            topk: 50
+            n_drop: 5
+    backtest:
+        verbose: False
+        limit_threshold: 0.095
+        account: 100000000
+        benchmark: *benchmark
+        deal_price: close
+        open_cost: 0.0005
+        close_cost: 0.0015
+        min_cost: 5
+task:
+    model:
+        class: DEnsembleModel
+        module_path: qlib.contrib.model.double_ensemble
+        kwargs:
+            base_model: "gbm"
+            loss: mse
+            num_models: 6
+            enable_sr: True
+            enable_fs: True
+            alpha1: 1
+            alpha2: 1
+            bins_sr: 10
+            bins_fs: 5
+            decay: 0.5
+            sample_ratios:
+                - 0.8
+                - 0.7
+                - 0.6
+                - 0.5
+                - 0.4
+            sub_weights:
+                - 1
+                - 0.2
+                - 0.2
+                - 0.2
+                - 0.2
+                - 0.2
+            epochs: 136
+            colsample_bytree: 0.8879
+            learning_rate: 0.0421
+            subsample: 0.8789
+            lambda_l1: 205.6999
+            lambda_l2: 580.9768
+            max_depth: 8
+            num_leaves: 210
+            num_threads: 20
+            verbosity: -1
+    dataset:
+        class: DatasetH
+        module_path: qlib.data.dataset
+        kwargs:
+            handler:
+                class: Alpha360
+                module_path: qlib.contrib.data.handler
+                kwargs: *data_handler_config
+            segments:
+                train: [2008-01-01, 2014-12-31]
+                valid: [2015-01-01, 2016-12-31]
+                test: [2017-01-01, 2020-08-01]
+    record:
+        - class: SignalRecord
+          module_path: qlib.workflow.record_temp
+          kwargs: {}
+        - class: SigAnaRecord
+          module_path: qlib.workflow.record_temp
+          kwargs:
+              ana_long_short: False
+              ann_scaler: 252
+        - class: PortAnaRecord
+          module_path: qlib.workflow.record_temp
+          kwargs:
+              config: *port_analysis_config
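
The `label` expression in the Alpha360 config above, `Ref($close, -2) / Ref($close, -1) - 1`, can be read as a forward return. The sketch below evaluates it on a plain list of closing prices, assuming that `Ref(x, -k)` refers to the value `k` periods ahead. The helper `forward_return_labels` is hypothetical and only mirrors the arithmetic; it does not use Qlib's expression engine or apply the `DropnaLabel` and `CSRankNorm` processors.

```python
def forward_return_labels(close):
    """label[t] = close[t + 2] / close[t + 1] - 1, or None when future data is missing."""
    n = len(close)
    return [
        close[t + 2] / close[t + 1] - 1 if t + 2 < n else None
        for t in range(n)
    ]


# Toy closing prices for a single instrument over four days.
close = [10.0, 11.0, 12.1, 12.1]
labels = forward_return_labels(close)
```

Under this reading, the label for day `t` is the return from day `t+1` to day `t+2`, so the last two days of any series have no label.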

examples/benchmarks/README.md

+3-1
@@ -16,7 +16,7 @@ The numbers shown below demonstrate the performance of the entire `workflow` of
 | LSTM (Sepp Hochreiter, et al.) | Alpha360 | 0.0443±0.01 | 0.3401±0.05| 0.0536±0.01 | 0.4248±0.05 | 0.0627±0.03 | 0.8441±0.48| -0.0882±0.03 |
 | ALSTM (Yao Qin, et al.) | Alpha360 | 0.0493±0.01 | 0.3778±0.06| 0.0585±0.00 | 0.4606±0.04 | 0.0513±0.03 | 0.6727±0.38| -0.1085±0.02 |
 | GATs (Petar Velickovic, et al.) | Alpha360 | 0.0475±0.00 | 0.3515±0.02| 0.0592±0.00 | 0.4585±0.01 | 0.0876±0.02 | 1.1513±0.27| -0.0795±0.02 |
-
+| DoubleEnsemble (Chuheng Zhang, et al.) | Alpha360 | 0.0407±0.00| 0.3053±0.00 | 0.0490±0.00 | 0.3840±0.00 | 0.0380±0.02 | 0.5000±0.21 | -0.0984±0.02 |
 ## Alpha158 dataset
 | Model Name | Dataset | IC | ICIR | Rank IC | Rank ICIR | Annualized Return | Information Ratio | Max Drawdown |
 |---|---|---|---|---|---|---|---|---|
@@ -31,5 +31,7 @@ The numbers shown below demonstrate the performance of the entire `workflow` of
 | LSTM (Sepp Hochreiter, et al.) | Alpha158 (with selected 20 features) | 0.0312±0.00 | 0.2394±0.04| 0.0418±0.00 | 0.3324±0.03 | 0.0298±0.02 | 0.4198±0.33| -0.1348±0.03 |
 | ALSTM (Yao Qin, et al.) | Alpha158 (with selected 20 features) | 0.0385±0.01 | 0.3022±0.06| 0.0478±0.00 | 0.3874±0.04 | 0.0486±0.03 | 0.7141±0.45| -0.1088±0.03 |
 | GATs (Petar Velickovic, et al.) | Alpha158 (with selected 20 features) | 0.0349±0.00 | 0.2511±0.01| 0.0457±0.00 | 0.3537±0.01 | 0.0578±0.02 | 0.8221±0.25| -0.0824±0.02 |
+| DoubleEnsemble (Chuheng Zhang, et al.) | Alpha158 | 0.0544±0.00 | 0.4338±0.01 | 0.0523±0.00 | 0.4257±0.01 | 0.1253±0.01 | 1.4105±0.14 | -0.0902±0.01 |
 
 - The selected 20 features are based on the feature importance of a lightgbm-based model.
+- The base model of DoubleEnsemble is LGBM.
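
For readers unfamiliar with the IC and ICIR columns in the tables above, the following sketch shows one common way such numbers are computed: IC as the mean of per-day Pearson correlations between predicted scores and realized returns, and ICIR as that mean divided by the standard deviation of the daily values. The exact conventions used to produce the benchmark tables (for example, the use of the sample standard deviation) are assumptions here, and the helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def ic_and_icir(daily_scores, daily_returns):
    """Mean daily IC and ICIR (mean IC divided by the sample std of daily ICs)."""
    ics = [pearson(s, r) for s, r in zip(daily_scores, daily_returns)]
    mean_ic = sum(ics) / len(ics)
    var = sum((ic - mean_ic) ** 2 for ic in ics) / (len(ics) - 1)
    std = math.sqrt(var)
    return mean_ic, (mean_ic / std if std > 0 else float("nan"))


# Two toy trading days, three instruments each.
daily_scores = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
daily_returns = [[1.0, 2.0, 3.0], [1.0, 3.0, 2.0]]
mean_ic, icir = ic_and_icir(daily_scores, daily_returns)
```

Rank IC follows the same pattern with the scores and returns replaced by their within-day ranks.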
