SlidingWindowInferer runtime increase if sw_batch_size is too big #6628

Description

@matt3o

Describe the bug
I am currently using the SlidingWindowInferer for some modified DeepEdit Code. I discovered that for small sw_roi_sizes like (32,32,32) I have to set a higher sw_batch_size to make it run faster. See the data for that below.
However when the sw_batch_size becomes too big, the performance takes a dramatic hit which does not make any sense to me. Initial inputs volume shape is (1,3,344,344,284) and the inferer is created with eval_inferer = SlidingWindowInferer(roi_size=args.sw_roi_size, sw_batch_size=args.sw_batch_size, mode="gaussian")
Results of my test runs:

138 seconds for (32,32,32) on sw_batch_size 1
13.38 seconds for (32,32,32) on sw_batch_size 200 (12 iterations)
11 seconds for (32,32,32) on sw_batch_size 500 (8 iterations)
11 seconds for (32,32,32) on sw_batch_size 1000 (3 iterations)
93 seconds for (32,32,32) on sw_batch_size 2000 (2 iterations)
191 seconds for (32,32,32) on sw_batch_size 2400 (1 iteration)

I tried to debug this, but I am not sure why this drastic increase in runtime happens. Of course I can always compute a good sw_batch_size beforehand (roughly 1/4 of the total number of sliding windows, judging from the numbers above, although that requires knowing the size of the largest volume in advance); a rough sketch of that workaround is shown below. Still, an actual fix would be nice. Or maybe it is an issue in my own code that I am not aware of, which would also be good to know.
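For reference, a rough sketch of that workaround, assuming MONAI's default overlap of 0.25 and a dense window grid; the helper num_sliding_windows and the cap of 2000 are purely illustrative and not part of my actual code:

```python
import math

def num_sliding_windows(spatial_shape, roi_size, overlap=0.25):
    """Approximate the number of windows a dense sliding-window grid produces."""
    total = 1
    for dim, roi in zip(spatial_shape, roi_size):
        step = max(int(roi * (1 - overlap)), 1)  # scan interval for the given overlap
        total *= math.ceil(max(dim - roi, 0) / step) + 1
    return total

n_windows = num_sliding_windows((344, 344, 284), (32, 32, 32))
print(n_windows)                      # window count for the volume shape reported above
sw_batch_size = min(2000, n_windows)  # illustrative cap: never request more windows than exist
```

For the shape above this comes out to a few thousand windows, consistent with sw_batch_size 2400 covering everything in a single iteration.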

To Reproduce
Use the SlidingWindowInferer and set sw_batch_size higher than the actual number of sliding windows; performance then deteriorates heavily. See the sketch below.
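
A minimal, self-contained sketch of such a run (the Conv3d is only a stand-in for the actual DeepEdit model, and the random input merely mimics the shape reported above):

```python
import time

import torch
from monai.inferers import SlidingWindowInferer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Placeholder network standing in for the real model.
net = torch.nn.Conv3d(3, 2, kernel_size=3, padding=1).to(device)
# Random volume with the input shape reported above.
inputs = torch.rand(1, 3, 344, 344, 284, device=device)

for sw_batch_size in (1, 200, 500, 1000, 2000, 2400):
    inferer = SlidingWindowInferer(
        roi_size=(32, 32, 32), sw_batch_size=sw_batch_size, mode="gaussian"
    )
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        inferer(inputs, net)
    if device.type == "cuda":
        torch.cuda.synchronize()
    print(f"sw_batch_size={sw_batch_size}: {time.perf_counter() - start:.2f} s")
```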

Environment

Tried it on MONAI 1.1 and also on the nightly build; no change.

================================
Printing MONAI config...
================================
MONAI version: 1.1.0
Numpy version: 1.24.3
Pytorch version: 2.0.0+cu117
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: a2ec3752f54bfc3b40e7952234fbeb5452ed63e3
MONAI __file__: /homes/mhadlich/.conda/envs/monai/lib/python3.10/site-packages/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: 0.4.12
Nibabel version: 5.1.0
scikit-image version: 0.20.0
Pillow version: 9.5.0
Tensorboard version: 2.13.0
gdown version: 4.7.1
TorchVision version: 0.15.1+cu117
tqdm version: 4.65.0
lmdb version: 1.4.1
psutil version: 5.9.5
pandas version: 2.0.1
einops version: 0.6.1
transformers version: 4.21.3
mlflow version: 2.3.1
pynrrd version: 1.0.0

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies


================================
Printing system config...
================================
System: Linux
Linux version: Ubuntu 22.04.2 LTS
Platform: Linux-5.15.0-73-generic-x86_64-with-glibc2.35
Processor: x86_64
Machine: x86_64
Python version: 3.10.10
Process name: python
Command: ['python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: [popenfile(path='/projects/mhadlich_segmentation/sliding-window-based-interactive-segmentation-of-volumetric-medical-images_main/tmp.txt', fd=1, position=1040, mode='w', flags=32769)]
Num physical CPUs: 48
Num logical CPUs: 48
Num usable CPUs: 1
CPU usage (%): [100.0, 100.0, 59.9, 1.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.4, 0.2, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0, 2.7, 0.0, 0.0, 0.0, 0.0]
CPU freq. (MHz): 1724
Load avg. in last 1, 5, 15 mins (%): [5.1, 5.0, 5.1]
Disk usage (%): 66.3
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 1007.8
Available memory (GB): 980.8
Used memory (GB): 20.0

================================
Printing GPU config...
================================
Num GPUs: 1
Has CUDA: True
CUDA version: 11.7
cuDNN enabled: True
cuDNN version: 8500
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86']
GPU 0 Name: NVIDIA RTX A6000
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 84
GPU 0 Total memory (GB): 47.5
GPU 0 CUDA capability (maj.min): 8.6
