
The hyperparameters optimized by optuna cannot reproduce the results when run separately. #1890

@ethonchen

## 🐛 Bug Description

Does qlib's execution process involve any caching? The hyperparameters optimized by optuna cannot reproduce the results when the corresponding runs are repeated separately.

After ruling out environmental influences, multi-threading effects, random-seed effects, and other factors, I found that, in the same environment, running several fits in succession causes the later executions to be affected by the earlier ones.
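(Not part of the original scripts.) One quick way to rule out leftover global RNG state between the two fits in both.py is to reset the Python, NumPy, and PyTorch seeds immediately before each model.fit call. Below is a minimal sketch, assuming the model draws from the global RNGs; reset_seeds is an illustrative helper, not a qlib API:

```python
import random

import numpy as np
import torch


def reset_seeds(seed: int = 618) -> None:
    """Reset all global RNGs so every fit starts from the same state (illustrative helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


# In both.py, call reset_seeds(618) right before each model.fit(dataset, ...)
# to check whether the second fit is influenced by RNG state consumed by the first.
```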

To Reproduce

Steps to reproduce the behavior:
The simplified reproduction process is as follows:

  1. Run two different models consecutively and print the results of Model 1 and Model 2.
    --------------- both.py -----------------
import qlib
from qlib.constant import REG_CN
from qlib.tests import GetData
from qlib.utils import init_instance_by_config

if __name__ == '__main__':
    provider_uri = "~/.qlib/qlib_data/cn_data"
    GetData().qlib_data(target_dir=provider_uri, region=REG_CN, exists_skip=True)
    qlib.init(provider_uri=provider_uri, region="cn")

    dataset_str = {
        "class": "DatasetH",
        "kwargs": {
            "handler": {
                "class": "Alpha360",
                "kwargs": {
                    "end_time": "2024-09-30",
                    "fit_end_time": "2021-12-31",
                    "fit_start_time": "2019-01-01",
                    "infer_processors": [
                        {
                            "class": "RobustZScoreNorm",
                            "kwargs": {
                                "clip_outlier": True,
                                "fields_group": "feature",
                                "fit_end_time": "2021-12-31",
                                "fit_start_time": "2019-01-01",
                            },
                        },
                        {
                            "class": "Fillna",
                            "kwargs": {
                                "fields_group": "feature"
                            }
                        }
                    ],
                    "instruments": "csi300",
                    "label": ["Ref($close, -2) / Ref($close, -1) - 1"],
                    "learn_processors": [
                        {"class": "DropnaLabel"},
                        {
                            "class": "CSRankNorm",
                            "kwargs": {
                                "fields_group": "label"
                            }
                        }
                    ],
                    "start_time": "2019-01-01",
                },
                "module_path": "qlib.contrib.data.handler",
            },
            "segments": {
                "test": ["2022-01-01", "2023-12-31"],
                "train": ["2019-01-01", "2021-12-31"],
                "valid": ["2023-01-01", "2024-09-30"],
            },
        },
        "module_path": "qlib.data.dataset",
    }
    dataset = init_instance_by_config(dataset_str)
    # base task model
    task = {
        "model": {
            "class": "LocalformerModel",
            "module_path": "qlib.contrib.model.pytorch_localformer",
            "kwargs": {
                "d_feat": 6,  
                "d_model": 64,  
                "batch_size": 512,  
                "nhead": 4,  
                "num_layers": 3,  
                "dropout": 0.4,  
                "n_epochs": 100,  
                "lr": 0.1,  
                "early_stop": 10,  
                "loss": "mse",  
                "optimizer": "adam",  
                "reg": 0.001,  
                "n_jobs": 1,  
                "GPU": 0,  
                "seed": 618,  
            }
        }
    }

    # model param 1
    task["model"]["kwargs"]["lr"] = 0.005
    evals_result = dict()
    model = init_instance_by_config(task["model"])
    model.fit(dataset, evals_result=evals_result)
    print("model 1 : ", max(evals_result["valid"]))

    # model param 2
    task["model"]["kwargs"]["lr"] = 0.3
    evals_result = dict()
    model = init_instance_by_config(task["model"])
    model.fit(dataset, evals_result=evals_result)
    print("model 2 : ", max(evals_result["valid"]))
  2. Run the code for Model 1 separately.
    --------------- model1.py -----------------

import qlib
from qlib.constant import REG_CN
from qlib.tests import GetData
from qlib.utils import init_instance_by_config

if __name__ == '__main__':
    provider_uri = "~/.qlib/qlib_data/cn_data"
    GetData().qlib_data(target_dir=provider_uri, region=REG_CN, exists_skip=True)
    qlib.init(provider_uri=provider_uri, region="cn")


    dataset_str = {
        "class": "DatasetH",
        "kwargs": {
            "handler": {
                "class": "Alpha360",
                "kwargs": {
                    "end_time": "2024-09-30",
                    "fit_end_time": "2021-12-31",
                    "fit_start_time": "2019-01-01",
                    "infer_processors": [
                        {
                            "class": "RobustZScoreNorm",
                            "kwargs": {
                                "clip_outlier": True,
                                "fields_group": "feature",
                                "fit_end_time": "2021-12-31",
                                "fit_start_time": "2019-01-01",
                            },
                        },
                        {
                            "class": "Fillna",
                            "kwargs": {
                                "fields_group": "feature"
                            }
                        }
                    ],
                    "instruments": "csi300",
                    "label": ["Ref($close, -2) / Ref($close, -1) - 1"],
                    "learn_processors": [
                        {"class": "DropnaLabel"},
                        {
                            "class": "CSRankNorm",
                            "kwargs": {
                                "fields_group": "label"
                            }
                        }
                    ],
                    "start_time": "2019-01-01",
                },
                "module_path": "qlib.contrib.data.handler",
            },
            "segments": {
                "test": ["2022-01-01", "2023-12-31"],
                "train": ["2019-01-01", "2021-12-31"],
                "valid": ["2023-01-01", "2024-09-30"],
            },
        },
        "module_path": "qlib.data.dataset",
    }
    dataset = init_instance_by_config(dataset_str)
    # base task model
    task = {
        "model": {
            "class": "LocalformerModel",
            "module_path": "qlib.contrib.model.pytorch_localformer",
            "kwargs": {
                "d_feat": 6,
                "d_model": 64,
                "batch_size": 512,
                "nhead": 4,
                "num_layers": 3,
                "dropout": 0.4,
                "n_epochs": 100,
                "lr": 0.1,
                "early_stop": 10,
                "loss": "mse",
                "optimizer": "adam",
                "reg": 0.001,
                "n_jobs": 1,
                "GPU": 0,
                "seed": 618,
            }
        }
    }

    # model param 1
    task["model"]["kwargs"]["lr"] = 0.005
    evals_result = dict()
    model = init_instance_by_config(task["model"])
    model.fit(dataset, evals_result=evals_result)
    print("model 1 : ", max(evals_result["valid"]))

  3. Run the code for Model 2 separately.
    --------------- model2.py -----------------

import qlib
from qlib.constant import REG_CN
from qlib.tests import GetData
from qlib.utils import init_instance_by_config

if __name__ == '__main__':
    provider_uri = "~/.qlib/qlib_data/cn_data"
    GetData().qlib_data(target_dir=provider_uri, region=REG_CN, exists_skip=True)
    qlib.init(provider_uri=provider_uri, region="cn")

    dataset_str = {
        "class": "DatasetH",
        "kwargs": {
            "handler": {
                "class": "Alpha360",
                "kwargs": {
                    "end_time": "2024-09-30",
                    "fit_end_time": "2021-12-31",
                    "fit_start_time": "2019-01-01",
                    "infer_processors": [
                        {
                            "class": "RobustZScoreNorm",
                            "kwargs": {
                                "clip_outlier": True,
                                "fields_group": "feature",
                                "fit_end_time": "2021-12-31",
                                "fit_start_time": "2019-01-01",
                            },
                        },
                        {
                            "class": "Fillna",
                            "kwargs": {
                                "fields_group": "feature"
                            }
                        }
                    ],
                    "instruments": "csi300",
                    "label": ["Ref($close, -2) / Ref($close, -1) - 1"],
                    "learn_processors": [
                        {"class": "DropnaLabel"},
                        {
                            "class": "CSRankNorm",
                            "kwargs": {
                                "fields_group": "label"
                            }
                        }
                    ],
                    "start_time": "2019-01-01",
                },
                "module_path": "qlib.contrib.data.handler",
            },
            "segments": {
                "test": ["2022-01-01", "2023-12-31"],
                "train": ["2019-01-01", "2021-12-31"],
                "valid": ["2023-01-01", "2024-09-30"],
            },
        },
        "module_path": "qlib.data.dataset",
    }
    dataset = init_instance_by_config(dataset_str)
    # base task model
    task = {
        "model": {
            "class": "LocalformerModel",
            "module_path": "qlib.contrib.model.pytorch_localformer",
            "kwargs": {
                "d_feat": 6,
                "d_model": 64,
                "batch_size": 512,
                "nhead": 4,
                "num_layers": 3,
                "dropout": 0.4,
                "n_epochs": 100,
                "lr": 0.1,
                "early_stop": 10,
                "loss": "mse",
                "optimizer": "adam",
                "reg": 0.001,
                "n_jobs": 1,
                "GPU": 0,
                "seed": 618,
            }
        }
    }

    # model param 2
    task["model"]["kwargs"]["lr"] = 0.3
    evals_result = dict()
    model = init_instance_by_config(task["model"])
    model.fit(dataset, evals_result=evals_result)
    print("model 2 : ", max(evals_result["valid"]))

Expected Behavior

The output of running python both.py:
----- python both.py --------
model 1 : -0.7482357621192932
model 2 : -0.7483096122741699

The output of running python model1.py:
----- python model1.py --------
model 1 : -0.7482357621192932 # the same as in both.py

The output of running python model2.py:
----- python model2.py --------
model 2 : -0.7509312033653259 # differs from both.py even though the parameters are identical

Screenshot

With the same parameters, the output of Model 1 matches both.py, while the output of Model 2 differs from both.py. The only difference is that in both.py Model 2 is executed immediately after Model 1, which raises the suspicion that some cached state is affecting the later execution.
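(Not part of the original report.) To separate data-side caching from model-side state, one could check that Model 2 is fed identical training data in both.py and model2.py, for example by hashing the prepared segment right before each fit. A minimal sketch, assuming DatasetH.prepare("train") returns a DataFrame for the configuration above:

```python
import hashlib

import pandas as pd


def fingerprint(dataset, segment: str = "train") -> str:
    """Return a stable hash of the prepared data for one segment (diagnostic helper)."""
    df = dataset.prepare(segment)  # DataFrame of features/labels for the segment
    row_hashes = pd.util.hash_pandas_object(df, index=True).values
    return hashlib.md5(row_hashes.tobytes()).hexdigest()


# Print fingerprint(dataset) right before each model.fit(...) in both.py, model1.py
# and model2.py; identical hashes would point away from data caching and toward
# global state (e.g. RNG) carried over from the first fit.
```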

Environment

Note: users can run cd scripts && python collect_info.py all under the project directory to get the system information and paste it here directly.
Linux
x86_64
Linux-5.15.0-112-generic-x86_64-with-glibc2.17
#122-Ubuntu SMP Thu May 23 07:48:21 UTC 2024

Python version: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]

Qlib version: 0.9.6
numpy==1.23.5
pandas==1.5.3
scipy==1.10.1
requests==2.31.0

Additional Notes
