Auto3DSeg AutoRunner HPO training would save model checkpoint to wrong locations #7585

Closed
mingxin-zheng opened this issue Mar 27, 2024 · 0 comments · Fixed by #7586

Describe the bug

When the NNI HPO process starts, it first calls generate to create a templated algo with the new hyper-parameters, and then calls run_algo under NNI to execute algo.train(). However, the generated algo does not update bundle_root, and the model checkpoint path is resolved relative to the bundle_root config. As a result, the model checkpoints are written to the wrong location.
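
The mechanism can be illustrated with a minimal, MONAI-free sketch. It is not the actual Auto3DSeg code: the dictionary config, the folder names, the `model_fold0` subfolder, and the `resolve_ckpt_path` helper are all hypothetical and only stand in for a checkpoint path that is derived from `bundle_root`.

```python
import os


def resolve_ckpt_path(config):
    """Resolve the checkpoint directory relative to the config's bundle_root (hypothetical helper)."""
    return os.path.join(config["bundle_root"], "model_fold0")


# Hypothetical config of the original template algo.
template_cfg = {
    "bundle_root": os.path.join("work_dir", "segresnet_0"),
    "learning_rate": 0.0002,
}

# Hypothetical folder generated for one HPO trial.
trial_dir = os.path.join("work_dir", "segresnet_0_override_learning_rate_0.01")

# Stale case: only the hyper-parameter is overridden, bundle_root still points at the template.
stale_cfg = {**template_cfg, "learning_rate": 0.01}

# Updated case: bundle_root is also pointed at the generated trial folder.
updated_cfg = {**stale_cfg, "bundle_root": trial_dir}

stale_ckpt = resolve_ckpt_path(stale_cfg)
updated_ckpt = resolve_ckpt_path(updated_cfg)

print("stale bundle_root   ->", stale_ckpt)
print("updated bundle_root ->", updated_ckpt)
```

With the stale config, the checkpoint directory resolves inside the template folder rather than the trial folder, which matches the symptom described above.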

To Reproduce

import os
import tempfile

from monai.bundle.config_parser import ConfigParser
from monai.apps import download_and_extract

from monai.apps.auto3dseg import AutoRunner

if __name__ == '__main__':
    directory = "./build"
    root_dir = tempfile.mkdtemp() if directory is None else directory
    print(root_dir)

    msd_task = "Task04_Hippocampus"
    resource = "https://msd-for-monai.s3-us-west-2.amazonaws.com/" + msd_task + ".tar"

    compressed_file = os.path.join(root_dir, msd_task + ".tar")
    dataroot = os.path.join(root_dir, msd_task)
    if not os.path.exists(dataroot):
        download_and_extract(resource, compressed_file, root_dir)

    datalist_file = os.path.join("../tutorials/auto3dseg/tasks", "msd", msd_task, "msd_" + msd_task.lower() + "_folds.json")

    input_cfg = {
        "name": msd_task,  # optional, it is only for your own record
        "task": "segmentation",  # optional, it is only for your own record
        "modality": "MRI",  # required
        "datalist": datalist_file,  # required
        "dataroot": dataroot,  # required
    }
    # Avoid shadowing the built-in `input` with the config file path.
    input_file = "./input.yaml"
    ConfigParser.export_config_file(input_cfg, input_file)

    runner = AutoRunner(
        work_dir="./work_dir",
        algos=("swinunetr", "segresnet"),
        input=input_file,
        analyze=False,
        hpo=True,
        ensemble=False,
    )
    hpo_params = {
        "maxTrialNumber": 20,
        "maxExperimentDuration": "30m",
        "num_epochs_per_validation": 1,
        "num_images_per_batch": 1,
        "num_epochs": 2,
        "num_warmup_epochs": 1,
        "training#num_epochs": 2,
        "training#num_epochs_per_validation": 1,
        "searching#num_epochs": 2,
        "searching#num_epochs_per_validation": 1,
        "searching#num_warmup_epochs": 1,
        "training#auto_scale_allowed": False,  # new
        "auto_scale_allowed": False,  # new
    }
    search_space = {"learning_rate": {"_type": "choice", "_value": [0.0001, 0.01]}}
    runner.set_num_fold(num_fold=1)
    runner.set_hpo_params(params=hpo_params)
    runner.set_nni_search_space(search_space)
    runner.run()

Expected behavior

Model checkpoints are saved in the correct location, that is, relative to the bundle_root of the algo generated for each HPO trial.
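
To see where checkpoints actually ended up after a run, a small helper along these lines may be useful. It is only a sketch: the `find_checkpoints` name is hypothetical, and the `.pt` suffix is an assumption about how the checkpoint files are named, so adjust it to match the files your run produces.

```python
import os
import tempfile


def find_checkpoints(work_dir, suffix=".pt"):
    """Return the sorted sub-directories of work_dir that contain files ending in suffix."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(work_dir):
        if any(name.endswith(suffix) for name in filenames):
            found.add(os.path.relpath(dirpath, work_dir))
    return sorted(found)


# Demo on a throwaway directory tree (hypothetical folder and file names).
with tempfile.TemporaryDirectory() as demo_dir:
    ckpt_dir = os.path.join(demo_dir, "trial_a", "model_fold0")
    os.makedirs(ckpt_dir)
    open(os.path.join(ckpt_dir, "model.pt"), "w").close()
    os.makedirs(os.path.join(demo_dir, "trial_b"))  # no checkpoint written here
    demo_dirs = find_checkpoints(demo_dir)
    print(demo_dirs)
```

Running `find_checkpoints("./work_dir")` after the reproduction script lists the folders that received checkpoint files, which can then be compared against the generated trial folders.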

@mingxin-zheng mingxin-zheng self-assigned this Mar 27, 2024
KumoLiu added a commit that referenced this issue Mar 27, 2024
Fixes #7585.

### Description

Because the NNI test takes too long to run, the previous behavior was
not caught by the dry-run mode of HPO Gen.

### Types of changes
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).

---------

Signed-off-by: Mingxin Zheng <mingxinz@nvidia.com>
Co-authored-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>