Description
Running Auto3d with instance22 works with all networks. However, when I duplicated the entries in the dataset JSON to simulate a larger dataset, every network still worked except SwinUNETR.
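For reference, the duplication itself was just repeating the entries in the dataset JSON. A minimal sketch of that step (assuming an MSD-style dataset.json with a "training" list; the file names and repeat count below are placeholders):

import json

# Placeholder file names; assumes an MSD-style dataset.json with a "training" list.
with open("dataset.json") as f:
    data = json.load(f)

# Repeat every training entry N times to simulate a larger dataset.
N = 4
data["training"] = [dict(entry) for entry in data["training"] for _ in range(N)]

with open("dataset_duplicated.json", "w") as f:
    json.dump(data, f, indent=2)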
To Reproduce
1 - Use Auto3d with instance22 and the attached dataset.json (I changed the extension to .txt because .json is not supported for uploading).
2 - Run the script below to trigger only swinunetr:
train_1_node(){
  # WORK_DIR, MODEL, FOLD, SCRIPT, and EXTRA_PARAMS are set by the caller.
  FOLDER="/workspace/${WORK_DIR}/${MODEL}_${FOLD}"
  rm -rf "${FOLDER}/model_fold${FOLD}"
  CONF_FOLDER="${FOLDER}/configs"
  rm -f "${FOLDER}/${MODEL}.log"
  (time \
  torchrun --nnodes=1 --nproc_per_node=8 \
  ${SCRIPT} run \
  --config_file "['${CONF_FOLDER}/hyper_parameters.yaml','${CONF_FOLDER}/network.yaml','${CONF_FOLDER}/transforms_train.yaml','${CONF_FOLDER}/transforms_validate.yaml']" \
  $EXTRA_PARAMS ) 2>&1 | tee -i -p "${FOLDER}/${MODEL}.log"
}
swinunetr(){
  MODEL="swinunetr"
  SCRIPT="-m ${WORK_DIR}.${MODEL}_${FOLD}.scripts.train"
  ## the new default parameters make it run for 20,000 epochs !! force it down to 1,500 iterations
  EXTRA_PARAMS=" --num_images_per_batch 16"
  EXTRA_PARAMS=$EXTRA_PARAMS" --num_patches_per_image 1"
  EXTRA_PARAMS=$EXTRA_PARAMS" --num_iterations 1500"
  EXTRA_PARAMS=$EXTRA_PARAMS" --num_iterations_per_validation 100"
  EXTRA_PARAMS=$EXTRA_PARAMS" --num_sw_batch_size 36"
  train_1_node
}
swinunetr
Error
epoch 8/210
learning rate is set to 0.0001
[2022-11-29 21:44:18] 1/7, train_loss: 0.4237
[2022-11-29 21:44:19] 2/7, train_loss: 0.4575
2022-11-29 21:44:25,647 - > collate dict key "image" out of 4 keys
2022-11-29 21:44:25,701 - >> collate/stack a list of tensors
2022-11-29 21:44:25,705 - >> E: stack expects each tensor to be equal size, but got [1, 96, 96, 64] at entry 0 and [1, 96, 95, 64] at entry 10, shape [(1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 95, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64)] in collate([tensor([[[[0.0000, ...]]]]) ... ])
[tensor dump truncated; each MetaTensor in the dump reports orig_size: (96, 96, 64) and Is batch?: False]
2022-12-06 20:32:04,170 - > collate dict key "label" out of 4 keys
2022-12-06 20:32:04,219 - >> collate/stack a list of tensors
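The failure is the default collate calling torch.stack on patches of unequal spatial size. A minimal sketch that reproduces the same error outside Auto3d (shapes copied from the log above):

import torch

# 16 patches as in the log above; entry 10 is one voxel short along the second spatial dim.
batch = [torch.zeros(1, 96, 96, 64) for _ in range(16)]
batch[10] = torch.zeros(1, 96, 95, 64)

# RuntimeError: stack expects each tensor to be equal size,
# but got [1, 96, 96, 64] at entry 0 and [1, 96, 95, 64] at entry 10
stacked = torch.stack(batch)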
Expected behavior
As you can see from the error log, training actually runs for one, and sometimes up to ten, epochs before erroring out. The expected behavior is for it to continue running to completion.
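A possible workaround (not a fix for whichever transform produces the off-by-one patch) is to pad mismatched items before stacking with MONAI's pad_list_data_collate. A minimal sketch, assuming the training DataLoader's collate_fn can be overridden (the stand-in data below is hypothetical):

import torch
from monai.data import DataLoader, Dataset, pad_list_data_collate

# Hypothetical stand-in batch: one item is a voxel short, as in the log.
data = [{"image": torch.zeros(1, 96, 96, 64)} for _ in range(15)]
data.append({"image": torch.zeros(1, 96, 95, 64)})

# pad_list_data_collate pads each item to the largest shape in the batch before stacking.
loader = DataLoader(Dataset(data=data), batch_size=16, collate_fn=pad_list_data_collate)
batch = next(iter(loader))
print(batch["image"].shape)  # torch.Size([16, 1, 96, 96, 64])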