Description
Tested versions
Running 3.1
System information
Ubuntu 20.04, V100 GPU (p3.2x instance)
Issue description
Bonjour Hervé,
I noticed that when training PyanNet on a large set, training speed deteriorates significantly. I have a training and development set (statistics below).
Train:
- 26,000 hours of audio
- 7,501,003 lines in RTTM
Dev:
- 545 hours of audio
- 157,514 lines in RTTM
When I train on the training set, one epoch takes 1 day and 17 hours, at around 1.05 it/s.
When I swap the training set for the dev set, one epoch takes 17 minutes, at around 6.50 it/s.
The training set has ~48x more audio, but if I iterated 48 times over the development set it would take ~13.5 hours, which is about 3 times faster than a single epoch on the training set.
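For clarity, here is the back-of-the-envelope calculation behind that claim (plain Python, all numbers taken from the timings above, assuming epoch time would scale linearly with the amount of audio):

```python
# Numbers reported above in this issue.
train_hours = 26_000          # audio in the training set
dev_hours = 545               # audio in the dev set
train_epoch_hours = 41        # 1 day 17 hours per epoch on the training set
dev_epoch_hours = 17 / 60     # 17 minutes per epoch on the dev set

ratio = train_hours / dev_hours           # ~48x more audio in the training set
extrapolated = ratio * dev_epoch_hours    # ~13.5 hours if speed scaled linearly

print(f"audio ratio:        {ratio:.1f}x")
print(f"extrapolated epoch: {extrapolated:.1f} h")
print(f"actual epoch:       {train_epoch_hours} h "
      f"({train_epoch_hours / extrapolated:.1f}x slower than expected)")
```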
Do you have any idea where this comes from? Both sets are on the same disk. I am going to investigate further; I just wanted to ask whether you have an idea of where to start.
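One thing I was thinking of checking first is whether data loading alone explains the gap, by timing the dataloader in isolation without any forward/backward pass. Rough sketch only; `train_dataloader` is a placeholder for whatever loader the task builds:

```python
import itertools
import time

def measure_throughput(dataloader, n_batches=200):
    """Time iteration over the dataloader only (no model involved)."""
    it = iter(dataloader)
    next(it)  # skip the first batch so worker startup is not counted
    start = time.perf_counter()
    for _ in itertools.islice(it, n_batches):
        pass
    elapsed = time.perf_counter() - start
    return n_batches / elapsed  # batches per second

# `train_dataloader` is hypothetical here; swap in the actual loader.
print(f"{measure_throughput(train_dataloader):.2f} it/s (data loading only)")
```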
Thanks.
-Jan
Minimal reproduction example (MRE)
I can't share my data, sorry.