
Training on a large set is much slower than on a smaller set - proportionally #1624

Open
@Jamiroquai88

Description

Tested versions

Running pyannote.audio 3.1

System information

Ubuntu 20.04, NVIDIA V100 GPU (p3.2x instance)

Issue description

Hello Hervé,

I noticed that when training PyanNet on a large set, training speed deteriorates significantly. I have a training set and a development set (statistics below).

Train:

  • 26,000 hours of audio
  • 7,501,003 lines in the RTTM

Dev:

  • 545 hours of audio
  • 157,514 lines in the RTTM

When I train on the training set, one epoch takes 1 day and 17 hours, at around 1.05 it/s.
When I swap the training set for the dev set, one epoch takes 17 minutes, at around 6.50 it/s.
The training set contains ~48x more audio, yet if I iterated 48 times over the development set it would take ~13.5 hours, which is about 3 times faster than a single epoch on the training set.
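A quick sanity check on those numbers (all values below are copied from the figures in this report; nothing new is measured, and the variable names are mine):

```python
# Back-of-the-envelope check of the figures above.
train_epoch_h = 24 + 17            # 1 day 17 hours per epoch on the training set
dev_epoch_h = 17 / 60              # 17 minutes per epoch on the dev set
audio_ratio = 26_000 / 545         # ~48x more audio in the training set

# If throughput did not depend on corpus size, ~48 dev-sized epochs
# should cost about the same as one training epoch.
extrapolated_h = dev_epoch_h * audio_ratio
print(f"extrapolated epoch: {extrapolated_h:.1f} h")                 # ~13.5 h
print(f"actual epoch:       {train_epoch_h} h")                      # 41 h
print(f"overall slowdown:   {train_epoch_h / extrapolated_h:.1f}x")  # ~3x

# The reported iteration rates show the per-step slowdown directly.
print(f"per-iteration slowdown: {6.50 / 1.05:.1f}x")                 # ~6.2x
```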

Do you have any idea where this comes from? Both sets are on the same disk. I am going to investigate further (a rough check I plan to start with is sketched below); I just wanted to know if you have an idea of where to start.
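This is the kind of timing check I have in mind to separate data loading from GPU compute. It is a generic PyTorch / PyTorch Lightning sketch rather than pyannote-specific code: `model` stands for the LightningModule passed to `trainer.fit()`, and `model.train_dataloader()` is a hypothetical accessor (the dataloader may need to be fetched from the task object instead).

```python
import itertools
import time

from pytorch_lightning import Trainer

# Option 1: let Lightning report how the epoch time is spent
# (batch fetching vs. forward/backward), using its built-in simple profiler.
trainer = Trainer(max_epochs=1, profiler="simple")
# trainer.fit(model)   # prints a per-hook timing report at the end of training

# Option 2: time batch fetching by hand, with no model involved at all,
# to see whether the dataloader alone is slower on the large protocol.
def time_batches(dataloader, n=200):
    start = time.time()
    for _ in itertools.islice(iter(dataloader), n):
        pass
    elapsed = time.time() - start
    print(f"{n} batches in {elapsed:.1f}s -> {n / elapsed:.2f} it/s (data only)")

# time_batches(model.train_dataloader())   # hypothetical accessor; compare train vs. dev
```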
Thanks.

-Jan

Minimal reproduction example (MRE)

Can't share my data, sorry.
