Closed
Description
Describe the bug
If a datalist contains 3 folds of data, it is debatable whether running a 4th fold should be allowed.
For example, suppose we split the data into 3 groups: #0, #1, and #2.
The 1st experiment would hold out #0 for validation and train on #1 and #2.
The 2nd experiment would hold out #1 for validation and train on #0 and #2.
The 3rd experiment would hold out #2 for validation and train on #0 and #1.
The question is whether a 4th fold should be allowed, holding out nothing and training on #0, #1, and #2.
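The fold assignment described above can be sketched in a few lines. This is only an illustration of the question being asked, not the Auto3DSeg implementation; `fold_splits` is a hypothetical helper, and the groups are represented by their fold ids alone.

```python
def fold_splits(fold_ids, num_fold):
    """Yield (fold_index, train_ids, val_ids) for fold indices 0..num_fold inclusive.

    Index num_fold is the extra "no validation" run: nothing is held out.
    """
    for k in range(num_fold + 1):
        val = [g for g in fold_ids if g == k]
        train = [g for g in fold_ids if g != k]
        yield k, train, val


splits = list(fold_splits([0, 1, 2], num_fold=3))
for k, train, val in splits:
    print(f"fold {k}: train on {train}, validate on {val}")
```

The last iteration (fold index 3) trains on all three groups and has an empty validation list, which is the case under discussion.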
A comment in the code says this is allowed:
Auto3DSeg allows no validation set, so the maximum fold number is max_fold + 1
In practice, however, it causes an error in DiNTS.
To Reproduce
Steps to reproduce the behavior:
- Create a datalist with 4 folds
- Run AutoRunner.
- Set `num_fold` to 5
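For the first step, a 4-fold datalist could be generated along these lines. This is a rough sketch: the `training`/`testing` layout with a per-case `fold` key is my assumption about what Auto3DSeg expects, the file names are placeholders, and `make_datalist` is a hypothetical helper.

```python
import json


def make_datalist(num_cases, num_fold):
    """Build a datalist dict, assigning cases to folds round-robin."""
    training = [
        {
            "image": f"case_{i:03d}_image.nii.gz",  # placeholder file name
            "label": f"case_{i:03d}_label.nii.gz",  # placeholder file name
            "fold": i % num_fold,
        }
        for i in range(num_cases)
    ]
    return {"training": training, "testing": []}


datalist = make_datalist(num_cases=8, num_fold=4)
print(json.dumps(datalist["training"][:2], indent=2))
```

With folds 0 to 3 in the datalist, setting `num_fold` to 5 adds a fold index 4 that no case belongs to.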
Expected behavior
Consistent behavior between doc and algorithm result
Additional context
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:
dints_2 - training ...: 0%| | 0/1 [00:00<?, ?round/s]
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:
dints_2 - training ...: 100%|██████████| 1/1 [00:35<00:00, 35.25s/round]
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:
dints_2 - training ...: 100%|██████████| 1/1 [00:35<00:00, 35.25s/round]
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: dints_2 - validation at original spacing/resolution
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: 2024-05-16 06:32:56,886 - WARNING - dints_2 - training: finished
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: 2024-05-16 06:32:58,570 - INFO - The keys num_warmup_epochs cannot be found in the /shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/hyper_parameters.yaml for training. Skipped overriding key num_warmup_epochs.
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: 2024-05-16 06:32:58,571 - INFO - ['python', '/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/scripts/train.py', 'run', "--config_file='/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/hyper_parameters.yaml,/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/hyper_parameters_search.yaml,/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/network.yaml,/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/network_search.yaml,/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/transforms_infer.yaml,/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/transforms_train.yaml,/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/configs/transforms_validate.yaml'", '--training#num_epochs_per_validation=1', '--training#num_images_per_batch=2', '--training#num_epochs=1']
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: 2024/05/16 06:33:05 INFO mlflow.tracking.fluent: Experiment with name 'Auto3DSeg' does not exist. Creating a new experiment.
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:
dints_3 - training ...: 0%| | 0/1 [00:00<?, ?round/s]
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx:
dints_3 - training ...: 0%| | 0/1 [00:43<?, ?round/s]
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: Traceback (most recent call last):
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: File "/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/scripts/train.py", line 1002, in <module>
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: fire.Fire()
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 143, in Fire
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: component_trace = _Fire(component, args, parsed_flag_args, context, name)
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 477, in _Fire
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: component, remaining_args = _CallAndUpdateTrace(
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 693, in _CallAndUpdateTrace
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: component = fn(*varargs, **kwargs)
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: File "/shared/orgs/iasixjqzw1hj/users/9550eff3-7258-5c36-96a0-5f8d3b030ad8/jobs/4cc2678c-1244-4d1b-ac3d-b47cfb7da171/auto3dseg_v0.0.8/dints_3/scripts/train.py", line 767, in run
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: logger.debug(f"evaluation metric - class {_c + 1}: {metric[2 * _c] / metric[2 * _c + 1]}")
16/May/2024:06:39:18,6511159,4cc2678c-1244-4d1b-ac3d-b47cfb7da171-dgx: ZeroDivisionError: float division by zero
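The traceback ends in `metric[2 * _c] / metric[2 * _c + 1]` with a zero denominator, which would be consistent with the validation set being empty for the extra fold. One possible guard is sketched below. It assumes `metric` holds alternating (sum, count) pairs per class, which I inferred from the indexing in the traceback and have not checked against the DiNTS template; `per_class_scores` is a hypothetical helper, not existing code.

```python
import math


def per_class_scores(metric, num_classes):
    """Return sum/count per class, or NaN when a class has no validation samples."""
    scores = []
    for c in range(num_classes):
        total, count = metric[2 * c], metric[2 * c + 1]
        # A zero count means nothing was validated for this class; avoid dividing.
        scores.append(total / count if count > 0 else math.nan)
    return scores


scores = per_class_scores([1.6, 2.0, 0.0, 0.0], num_classes=2)
print(scores)
```

Whether to report NaN, skip validation entirely for the extra fold, or reject the fold number up front is the design question this issue raises.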