Commit cf35562

you-n-g, wendili-cs, and demon143 authored
DDG-DA paper code (#743)
* Merge data selection to main
* Update trainer for reweighter
* Typos fixed.
* update data selection interface
* successfully run exp after refactor some interface
* data selection share handler & trainer
* fix meta model time series bug
* fix online workflow set_uri bug
* fix set_uri bug
* updawte ds docs and delay trainer bug
* docs
* resume reweighter
* add reweighting result
* fix qlib model import
* make recorder more friendly
* fix experiment workflow bug
* commit for merging master incase of conflictions
* Successful run DDG-DA with a single command
* remove unused code
* asdd more docs
* Update README.md
* Update & fix some bugs.
* Update configuration & remove debug functions
* Update README.md
* Modfify horizon from code rather than yaml
* Update performance in README.md
* fix part comments
* Remove unfinished TCTS.
* Fix some details.
* Update meta docs
* Update README.md of the benchmarks_dynamic
* Update README.md files
* Add README.md to the rolling_benchmark baseline.
* Refine the docs and link
* Rename README.md in benchmarks_dynamic.
* Remove comments.
* auto download data

Co-authored-by: wendili-cs <wendili.academic@qq.com>
Co-authored-by: demon143 <785696300@qq.com>
1 parent 184ce34 commit cf35562

52 files changed, +2440 -455 lines changed

README.md (+28 -7)

@@ -11,6 +11,7 @@
 Recent released features
 | Feature | Status |
 | -- | ------ |
+| Meta-Learning-based framework & DDG-DA | [Released](https://github.com/microsoft/qlib/pull/743) on Jan 10, 2022 |
 | Planning-based portfolio optimization | [Released](https://github.com/microsoft/qlib/pull/754) on Dec 28, 2021 |
 | Release Qlib v0.8.0 | [Released](https://github.com/microsoft/qlib/releases/tag/v0.8.0) on Dec 8, 2021 |
 | ADD model | [Released](https://github.com/microsoft/qlib/pull/704) on Nov 22, 2021 |
@@ -50,9 +51,12 @@ For more details, please refer to our paper ["Qlib: An AI-oriented Quantitative
 - [Data Preparation](#data-preparation)
 - [Auto Quant Research Workflow](#auto-quant-research-workflow)
 - [Building Customized Quant Research Workflow by Code](#building-customized-quant-research-workflow-by-code)
-- [**Quant Model(Paper) Zoo**](#quant-model-paper-zoo)
-- [Run a single model](#run-a-single-model)
-- [Run multiple models](#run-multiple-models)
+- [Main Challenges & Solutions in Quant Research](#main-challenges--solutions-in-quant-research)
+- [Forecasting: Finding Valuable Signals/Patterns](#forecasting-finding-valuable-signalspatterns)
+- [**Quant Model (Paper) Zoo**](#quant-model-paper-zoo)
+- [Run a Single Model](#run-a-single-model)
+- [Run Multiple Models](#run-multiple-models)
+- [Adapting to Market Dynamics](#adapting-to-market-dynamics)
 - [**Quant Dataset Zoo**](#quant-dataset-zoo)
 - [More About Qlib](#more-about-qlib)
 - [Offline Mode and Online Mode](#offline-mode-and-online-mode)
@@ -69,7 +73,6 @@ Your feedbacks about the features are very important.
 | -- | ------ |
 | Point-in-Time database | Under review: https://github.com/microsoft/qlib/pull/343 |
 | Orderbook database | Under review: https://github.com/microsoft/qlib/pull/744 |
-| Meta-Learning-based data selection | Under review: https://github.com/microsoft/qlib/pull/743 |

 # Framework of Qlib

@@ -280,8 +283,18 @@ Qlib provides a tool named `qrun` to run the whole workflow automatically (inclu
 ## Building Customized Quant Research Workflow by Code
 The automatic workflow may not suit the research workflow of all Quant researchers. To support a flexible Quant research workflow, Qlib also provides a modularized interface to allow researchers to build their own workflow by code. [Here](examples/workflow_by_code.ipynb) is a demo for customized Quant research workflow by code.

+# Main Challenges & Solutions in Quant Research
+Quant investment is a unique scenario with many key challenges to be solved.
+Currently, Qlib provides solutions for several of them.

-# [Quant Model (Paper) Zoo](examples/benchmarks)
+## Forecasting: Finding Valuable Signals/Patterns
+Accurate forecasting of the stock price trend is a very important part of constructing profitable portfolios.
+However, the huge amount of data in various formats in the financial market makes it challenging to build forecasting models.
+
+An increasing number of SOTA Quant research works/papers, which focus on building forecasting models to mine valuable signals/patterns in complex financial data, are released in `Qlib`.
+
+
+### [Quant Model (Paper) Zoo](examples/benchmarks)

 Here is a list of models built on `Qlib`.
 - [GBDT based on XGBoost (Tianqi Chen, et al. KDD 2016)](examples/benchmarks/XGBoost/)
@@ -308,7 +321,7 @@ Your PR of new Quant models is highly welcomed.

 The performance of each model on the `Alpha158` and `Alpha360` dataset can be found [here](examples/benchmarks/README.md).

-## Run a single model
+### Run a single model
 All the models listed above are runnable with ``Qlib``. Users can find the config files we provide and some details about the model through the [benchmarks](examples/benchmarks) folder. More information can be retrieved at the model files listed above.

 `Qlib` provides three different ways to run a single model; users can pick the one that fits their case best:
@@ -318,7 +331,7 @@ All the models listed above are runnable with ``Qlib``. Users can find the confi
 - Users can use the script [`run_all_model.py`](examples/run_all_model.py) listed in the `examples` folder to run a model. Here is an example of the specific shell command to be used: `python run_all_model.py run --models=lightgbm`, where the `--models` argument can take any number of the models listed above (the available models can be found in [benchmarks](examples/benchmarks/)). For more use cases, please refer to the file's [docstrings](examples/run_all_model.py).
 - **NOTE**: Each baseline has different environment dependencies; please make sure that your Python version aligns with the requirements (e.g., TFT only supports Python 3.6~3.7 due to the limitation of `tensorflow==1.15.0`).

-## Run multiple models
+### Run multiple models
 `Qlib` also provides a script [`run_all_model.py`](examples/run_all_model.py) which can run multiple models for several iterations. (**Note**: the script only supports *Linux* for now. Other OSes will be supported in the future. It also does not support running the same model in parallel multiple times; this will be fixed in future development.)

 The script will create a unique virtual environment for each model, and delete the environments after training. Thus, only experiment results such as `IC` and `backtest` results will be generated and stored.
@@ -330,6 +343,14 @@ python run_all_model.py run 10

 It also provides the API to run specific models at once. For more use cases, please refer to the file's [docstrings](examples/run_all_model.py).

+## [Adapting to Market Dynamics](examples/benchmarks_dynamic)
+
+Due to the non-stationary nature of the financial market environment, the data distribution may change across periods, which makes the performance of models built on training data decay on future test data.
+Therefore, adapting the forecasting models/strategies to market dynamics is very important to their performance.
+
+Here is a list of solutions built on `Qlib`.
+- [Rolling Retraining](examples/benchmarks_dynamic/baseline/)
+- [DDG-DA on pytorch (Wendi, et al. AAAI 2022)](examples/benchmarks_dynamic/DDG-DA/)

 # Quant Dataset Zoo
 Dataset plays a very important role in Quant. Here is a list of the datasets built on `Qlib`:
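
The "Rolling Retraining" baseline added to the README above retrains a model periodically on the most recent data. As a rough illustration of that idea only, the sketch below generates sliding (train, test) index windows over a time-ordered sample. The function name `rolling_segments` and its parameters are hypothetical; this is not the code under `examples/benchmarks_dynamic/baseline/`.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive (start, end) positions in a time-ordered sample


def rolling_segments(n_periods: int, train_len: int, test_len: int) -> List[Tuple[Segment, Segment]]:
    """Return (train, test) segments; each step slides forward by `test_len` periods."""
    segments = []
    start = 0
    # Stop as soon as a full train + test window no longer fits.
    while start + train_len + test_len <= n_periods:
        train = (start, start + train_len - 1)
        test = (start + train_len, start + train_len + test_len - 1)
        segments.append((train, test))
        start += test_len
    return segments


if __name__ == "__main__":
    # Hypothetical sizes: 10 periods, train on 4, test on the next 2.
    for train, test in rolling_segments(n_periods=10, train_len=4, test_len=2):
        print("train", train, "test", test)
```

A model would be refit on each `train` segment and evaluated on the matching `test` segment, so every test period is predicted by a model trained on the data immediately before it.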

docs/component/meta.rst (+68)

@@ -0,0 +1,68 @@
+.. _meta:
+
+======================================================
+Meta Controller: Meta-Task & Meta-Dataset & Meta-Model
+======================================================
+.. currentmodule:: qlib
+
+
+Introduction
+=============
+``Meta Controller`` provides guidance to the ``Forecast Model``. It aims to learn regular patterns among a series of forecasting tasks and use the learned patterns to guide forthcoming forecasting tasks. Users can implement their own meta-model instances based on the ``Meta Controller`` module.
+
+Meta Task
+=============
+
+A `Meta Task` instance is the basic element in the meta-learning framework. It saves the data that can be used for the `Meta Model`. Multiple `Meta Task` instances may share the same `Data Handler`, controlled by `Meta Dataset`. Users should use `prepare_task_data()` to obtain the data that can be directly fed into the `Meta Model`.
+
+.. autoclass:: qlib.model.meta.task.MetaTask
+    :members:
+
+Meta Dataset
+=============
+
+`Meta Dataset` controls the meta-information generating process. It is responsible for providing data for training the `Meta Model`. Users should use `prepare_tasks` to retrieve a list of `Meta Task` instances.
+
+.. autoclass:: qlib.model.meta.dataset.MetaTaskDataset
+    :members:
+
+Meta Model
+=============
+
+General Meta Model
+------------------
+A `Meta Model` instance is the part that controls the workflow. The usage of the `Meta Model` includes:
+1. Users train their `Meta Model` with the `fit` function.
+2. The `Meta Model` instance guides the workflow by giving useful information via the `inference` function.
+
+.. autoclass:: qlib.model.meta.model.MetaModel
+    :members:
+
+Meta Task Model
+------------------
+This type of meta-model may interact with task definitions directly; such meta-models should inherit from the `Meta Task Model` class. They guide the base tasks by modifying the base task definitions. The function `prepare_tasks` can be used to obtain the modified base task definitions.
+
+.. autoclass:: qlib.model.meta.model.MetaTaskModel
+    :members:
+
+Meta Guide Model
+------------------
+This type of meta-model participates in the training process of the base forecasting model. The meta-model may guide the base forecasting models during their training to improve their performance.
+
+.. autoclass:: qlib.model.meta.model.MetaGuideModel
+    :members:
+
+
+Example
+=============
+``Qlib`` provides an implementation of the ``Meta Model`` module, ``DDG-DA``,
+which adapts to the market dynamics.
+
+``DDG-DA`` includes four steps:
+
+1. Calculate meta-information and encapsulate it into ``Meta Task`` instances. All the meta-tasks form a ``Meta Dataset`` instance.
+2. Train ``DDG-DA`` based on the training data of the meta-dataset.
+3. Run inference with ``DDG-DA`` to get guide information.
+4. Apply guide information to the forecasting models to improve their performance.
+
+The `above example <https://github.com/microsoft/qlib/tree/main/examples/benchmarks_dynamic/DDG-DA>`_ can be found in ``examples/benchmarks_dynamic/DDG-DA/workflow.py``.
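
The four steps listed in `meta.rst` can be pictured with a toy, self-contained sketch. The classes `ToyMetaTask`, `ToyMetaDataset` and `ToyMetaModel` below are hypothetical stand-ins that only mirror the method names mentioned in the documentation (`prepare_task_data`, `prepare_tasks`, `fit`, `inference`); they are not Qlib's `MetaTask`, `MetaTaskDataset` or `MetaModel`, and the "sample weight" rule is an arbitrary assumption chosen to keep the example small.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class ToyMetaTask:
    """Hypothetical stand-in for a meta-task: a base task plus its meta-information."""
    task_def: Dict[str, object]
    meta_info: float  # placeholder for whatever meta-information step 1 computes

    def prepare_task_data(self) -> float:
        # Return the data that can be fed directly into the meta-model.
        return self.meta_info


class ToyMetaDataset:
    """Hypothetical stand-in for a meta-dataset: owns all meta-tasks and splits them."""

    def __init__(self, tasks: List[ToyMetaTask], n_train: int) -> None:
        self.tasks = tasks
        self.n_train = n_train

    def prepare_tasks(self, segment: str) -> List[ToyMetaTask]:
        if segment == "train":
            return self.tasks[: self.n_train]
        if segment == "test":
            return self.tasks[self.n_train :]
        raise ValueError(f"unknown segment: {segment!r}")


class ToyMetaModel:
    """Hypothetical meta-model: learns a baseline, then emits guided task definitions."""

    def __init__(self) -> None:
        self.baseline = None

    def fit(self, dataset: ToyMetaDataset) -> None:
        # Step 2: learn from the training meta-tasks.
        self.baseline = mean(t.prepare_task_data() for t in dataset.prepare_tasks("train"))

    def inference(self, dataset: ToyMetaDataset) -> List[Dict[str, object]]:
        # Step 3: produce guide information for the held-out meta-tasks.
        if self.baseline is None:
            raise RuntimeError("call fit() before inference()")
        return [
            {**t.task_def, "sample_weight": t.prepare_task_data() / self.baseline}
            for t in dataset.prepare_tasks("test")
        ]


# Step 1: wrap meta-information into meta-tasks and collect them in a meta-dataset.
tasks = [ToyMetaTask({"period": i}, float(i + 1)) for i in range(4)]
dataset = ToyMetaDataset(tasks, n_train=3)

meta_model = ToyMetaModel()
meta_model.fit(dataset)
guided_tasks = meta_model.inference(dataset)  # step 4 would train base models on these
print(guided_tasks)
```

The point of the split is that the meta-model only sees the training meta-tasks in `fit`, and its output is a set of modified task definitions that the base forecasting models consume afterwards.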

docs/index.rst (+2 -1)

@@ -36,10 +36,11 @@ Document Structure
 :caption: COMPONENTS:

 Workflow: Workflow Management <component/workflow.rst>
-Data Layer: Data Framework&Usage <component/data.rst>
+Data Layer: Data Framework & Usage <component/data.rst>
 Forecast Model: Model Training & Prediction <component/model.rst>
 Portfolio Management and Backtest <component/strategy.rst>
 Nested Decision Execution: High-Frequency Trading <component/highfreq.rst>
+Meta Controller: Meta-Task & Meta-Dataset & Meta-Model <component/meta.rst>
 Qlib Recorder: Experiment Management <component/recorder.rst>
 Analysis: Evaluation & Results Analysis <component/report.rst>
 Online Serving: Online Management & Strategy & Tool <component/online.rst>

examples/benchmarks/Linear/workflow_config_linear_Alpha158.yaml (-1)

@@ -22,7 +22,6 @@ data_handler_config: &data_handler_config
 - class: CSRankNorm
 kwargs:
 fields_group: label
-label: ["Ref($close, -2) / Ref($close, -1) - 1"]
 port_analysis_config: &port_analysis_config
 strategy:
 class: TopkDropoutStrategy
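
The `label` entry removed from the Linear config is a Qlib expression. Assuming that `Ref(x, -k)` reads the value `k` periods ahead, it is the return from the next period's close to the close one period after that. The plain-Python sketch below works through that arithmetic on made-up prices; `forward_return_label` is a hypothetical helper, not a Qlib function.

```python
from typing import List, Optional


def forward_return_label(close: List[float]) -> List[Optional[float]]:
    """label[t] = close[t + 2] / close[t + 1] - 1; None where future prices are missing."""
    labels: List[Optional[float]] = []
    for t in range(len(close)):
        if t + 2 < len(close):
            labels.append(close[t + 2] / close[t + 1] - 1)
        else:
            # The last two periods have no complete two-step-ahead window.
            labels.append(None)
    return labels


if __name__ == "__main__":
    prices = [10.0, 11.0, 12.1, 12.1]  # made-up closing prices
    print(forward_return_label(prices))
```

Because the label looks two periods ahead, the last two rows of any sample have no label and are typically dropped before training.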

examples/benchmarks/TFT/tft.py (-1)

@@ -209,7 +209,6 @@ def fit(self, dataset: DatasetH, MODEL_FOLDER="qlib_tft_model", USE_GPU_ID=0, **
 fixed_params = self.data_formatter.get_experiment_params()
 params = self.data_formatter.get_default_model_params()

-# Wendi: merge the tuned and non-tuned parameters
 params = {**params, **fixed_params}

 if not os.path.exists(self.model_folder):
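
The line kept in this hunk, `params = {**params, **fixed_params}`, merges two dictionaries with Python's unpacking syntax, where keys from the right-hand dictionary win on conflict. A minimal illustration follows; the keys and values are invented for the example and are not the TFT formatter's actual parameters.

```python
# Hypothetical parameter dictionaries; the real ones come from the data formatter.
params = {"learning_rate": 0.01, "dropout_rate": 0.3}
fixed_params = {"total_time_steps": 12, "dropout_rate": 0.1}

# Later unpacking overrides earlier keys, so fixed_params takes precedence.
merged = {**params, **fixed_params}
print(merged)
```

In other words, any setting present in both dictionaries ends up with the value from `fixed_params`.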
@@ -0,0 +1,27 @@
+# Introduction
+This is the implementation of `DDG-DA` based on the `Meta Controller` component provided by `Qlib`.
+
+## Background
+In many real-world scenarios, we often deal with streaming data that is sequentially collected over time. Due to the non-stationary nature of the environment, the streaming data distribution may change in unpredictable ways, which is known as concept drift. To handle concept drift, previous methods first detect when/where the concept drift happens and then adapt models to fit the distribution of the latest data. However, there are still many cases in which some underlying factors of environment evolution are predictable, making it possible to model the future concept drift trend of the streaming data; such cases are not fully explored in previous work.
+
+Therefore, we propose a novel method, `DDG-DA`, that can effectively forecast the evolution of data distribution and improve the performance of models. Specifically, we first train a predictor to estimate the future data distribution, then leverage it to generate training samples, and finally train models on the generated data.
+
+## Dataset
+The data used in the paper are private, so we conduct experiments on Qlib's public dataset.
+Though the dataset is different, the conclusion remains the same. By applying `DDG-DA`, users can see rising trends in the test phase in both the proxy models' ICs and the performance of the forecasting models.
+
+## Run the Code
+Users can try `DDG-DA` by running the following command:
+```bash
+python workflow.py run_all
+```
+
+The default forecasting model is `Linear`. Users can choose other forecasting models by changing the `forecast_model` parameter when `DDG-DA` is initialized. For example, users can try `LightGBM` forecasting models by running the following command:
+```bash
+python workflow.py --forecast_model="gbdt" run_all
+```
+
+
+## Results
+
+The results of other methods on Qlib's public dataset can be found [here](../).
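
The Background section above describes training on data that has been adjusted toward a predicted future distribution, and the commit log mentions a "reweighter". The sketch below is not the DDG-DA algorithm; it only illustrates, under that assumption, how per-sample weights change a fitted model. The weights are hand-picked rather than predicted, and `weighted_slope` is a hypothetical helper.

```python
from typing import Sequence


def weighted_slope(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Weighted least-squares slope for the no-intercept model y ~ slope * x."""
    num = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
    den = sum(wi * xi * xi for xi, wi in zip(x, w))
    if den == 0:
        raise ValueError("weights and inputs give a zero denominator")
    return num / den


# Made-up data: the first two samples follow y = x, the last two follow y = 3x,
# standing in for an old and a recent data regime.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 9.0, 12.0]

uniform = weighted_slope(x, y, [1.0, 1.0, 1.0, 1.0])  # treats both regimes alike
recent = weighted_slope(x, y, [0.0, 0.0, 1.0, 1.0])   # only the recent regime counts
print(uniform, recent)
```

With uniform weights the slope is pulled toward the old regime, while weighting only the recent samples recovers the recent regime's slope; a method that forecasts where the distribution is heading could supply such weights instead of choosing them by hand.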

@@ -0,0 +1 @@
+torch==1.10.0
