Some algorithm templates in Auto3DSeg don't support validation skipping #7777

@mingxin-zheng

Describe the bug

If a datalist contains 3 folds of data, it is debatable whether running a 4th fold should be allowed.

For example, we split the data into 3 groups: #0, #1, and #2.
The 1st experiment holds out #0 for validation and trains on #1 and #2.
The 2nd experiment holds out #1 for validation and trains on #0 and #2.
The 3rd experiment holds out #2 for validation and trains on #0 and #1.
The question is whether a 4th fold should be allowed that holds out nothing and trains on #0, #1, and #2.
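
The fold layout above can be sketched in a few lines. This is only an illustration, not Auto3DSeg's actual code: the helper `split_for_fold` and the toy item list are hypothetical, and it assumes each datalist entry carries an integer `fold` key. It shows that asking for one fold past the last leaves the validation set empty.

```python
# Hypothetical helper, not part of MONAI: split a datalist by fold index.
def split_for_fold(items, fold):
    """Return (train, val) where val holds the entries tagged with `fold`."""
    train = [it for it in items if it["fold"] != fold]
    val = [it for it in items if it["fold"] == fold]
    return train, val


# Toy datalist with 3 folds (0, 1, 2); file names are made up.
items = [{"image": f"img_{i}.nii.gz", "fold": i % 3} for i in range(6)]

for fold in range(4):  # fold 3 is the debated "no validation" case
    train, val = split_for_fold(items, fold)
    print(f"fold {fold}: {len(train)} training, {len(val)} validation")
```

With an empty validation list, any template that averages metrics over validation cases has nothing to divide by unless it handles that case explicitly.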

The comment in the code says this is allowed:

Auto3DSeg allows no validation set, so the maximum fold number is max_fold + 1

But in practice it causes an error in DiNTS.

To Reproduce
Steps to reproduce the behavior:

  1. Create a datalist with 4 folds
  2. Run AutoRunner.
  3. Set num_fold to 5.

Expected behavior
Consistent behavior between the documentation and the algorithm results.

Additional context

16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:
dints_2 - training ...:   0%|          | 0/1 [00:00<?, ?round/s]
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:
dints_2 - training ...: 100%|██████████| 1/1 [00:35<00:00, 35.25s/round]
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:
dints_2 - training ...: 100%|██████████| 1/1 [00:35<00:00, 35.25s/round]
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: dints_2 - validation at original spacing/resolution
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: 2024-05-16 06:32:56,886 - WARNING - dints_2 - training: finished
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: 2024-05-16 06:32:58,570 - INFO - The keys num_warmup_epochs cannot be found in the /shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/hyper_parameters.yaml for training. Skipped overriding key num_warmup_epochs.
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: 2024-05-16 06:32:58,571 - INFO - ['python', '/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/scripts/train.py', 'run', "--config_file='/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/hyper_parameters.yaml,/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/hyper_parameters_search.yaml,/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/network.yaml,/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/network_search.yaml,/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/transforms_infer.yaml,/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/transforms_train.yaml,/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/transforms_validate.yaml'", '--training#num_epochs_per_validation=1', '--training#num_images_per_batch=2', '--training#num_epochs=1']
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: 2024/05/16 06:33:05 INFO mlflow.tracking.fluent: Experiment with name 'Auto3DSeg' does not exist. Creating a new experiment.
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:
dints_3 - training ...:   0%|          | 0/1 [00:00<?, ?round/s]
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:
dints_3 - training ...:   0%|          | 0/1 [00:43<?, ?round/s]
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: Traceback (most recent call last):
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:   File "/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/scripts/train.py", line 1002, in <module>
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:     fire.Fire()
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:   File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 143, in Fire
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:     component_trace = _Fire(component, args, parsed_flag_args, context, name)
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:   File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 477, in _Fire
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:     component, remaining_args = _CallAndUpdateTrace(
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:   File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 693, in _CallAndUpdateTrace
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:     component = fn(*varargs, **kwargs)
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:   File "/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/scripts/train.py", line 767, in run
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:     logger.debug(f"evaluation metric - class {_c + 1}: {metric[2 * _c] / metric[2 * _c + 1]}")
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: ZeroDivisionError: float division by zero
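
The traceback points at a per-class division of `metric[2 * _c]` by `metric[2 * _c + 1]`. One possible guard is sketched below. It assumes, based only on that indexing, that `metric` interleaves a per-class sum with a per-class count; the helper `per_class_scores` is hypothetical and is not the template's actual fix.

```python
# Hypothetical guard, assuming metric = [sum_0, count_0, sum_1, count_1, ...].
def per_class_scores(metric, num_classes):
    """Return one score per class, or None where no validation case was counted."""
    scores = []
    for c in range(num_classes):
        total, count = metric[2 * c], metric[2 * c + 1]
        # With an empty validation set every count stays at zero,
        # so skip the division instead of raising ZeroDivisionError.
        scores.append(total / count if count > 0 else None)
    return scores


# All-zero accumulator, as would result from a fold with no validation data.
print(per_class_scores([0.0, 0.0, 0.0, 0.0], num_classes=2))
```

Whether a missing score should be reported as `None`, skipped, or treated as a configuration error is a design decision for the template maintainers.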
