change dataloader_persistent_workers default value to True #39963

@farbodbj

Description

dataloader_persistent_workers: bool = field(

As described in the documentation, setting this option to True can speed up training at the cost of higher RAM usage, and the default is currently False.
I believe this option should default to True for two reasons:
1- Recreating the workers during training bottlenecks the GPU and causes a significant slowdown, potentially doubling the training time (benchmarked on a single A100 fine-tuning whisper-large-v3).
2- Practitioners who want to mitigate this slowdown, which shows up as a square-wave pattern in GPU utilization, have to work through a lot of configuration and spend many hours on time-consuming tests.

I request changing the default value of dataloader_persistent_workers to True.
If the maintainers of this repo are OK with this decision, I will submit the PR promptly.
