Description
System information
- OS version/distro: Windows Server 2016
- .NET Version (e.g., dotnet --info): .NET 4.8
Issue
My training times exploded when going from 1.4 to 1.5.2 (from seconds to tens of hours). The culprit is ShuffleRows, which is now extremely slow.
ShuffleRows was already broken in the previous release (see issue #5312). The fix for that issue (#5313) is, in my opinion, broken too. It contains a Thread.Sleep(1) call to wait for async completion. This is a no-go. First, the Sleep should not be there at all. Second, a 1 ms sleep depends on the system timer resolution, which is in general 15 ms, so in most cases each wait actually lasts about 15 ms. As my training tasks read millions of data rows, this cannot work: training time explodes to the point of being unusable.
To confirm that Thread.Sleep(1) is the culprit, I changed the timer resolution on my server to 1 ms. Training time improved greatly but is still very far from the times I got with version 1.4. The fix for #5312 needs to be redone properly (sorry for being a bit harsh).
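To give a rough sense of scale, here is a back-of-envelope estimate of the time spent purely in the sleep call. It is written in Python only for brevity. The row count and the assumption of one sleep per row are hypothetical worst-case figures; I have not measured how often the sleep is actually hit, so treat this as an illustration and not a profile of the real code.

```python
def sleep_overhead_hours(num_sleeps: int, effective_sleep_ms: float) -> float:
    """Total time spent sleeping, converted from milliseconds to hours."""
    return num_sleeps * effective_sleep_ms / 1000.0 / 3600.0


# Hypothetical workload: 5 million rows, one Sleep(1) per row (assumed worst case).
ROWS = 5_000_000
DEFAULT_TICK_MS = 15.0  # effective wait at the usual timer resolution
RAISED_TICK_MS = 1.0    # effective wait after raising the resolution to 1 ms

default_hours = sleep_overhead_hours(ROWS, DEFAULT_TICK_MS)
raised_hours = sleep_overhead_hours(ROWS, RAISED_TICK_MS)

print(f"{default_hours:.1f} h at 15 ms, {raised_hours:.1f} h at 1 ms")
```

Under these assumptions the sleeps alone cost about 21 hours at a 15 ms tick and still more than an hour at 1 ms, which is consistent with what I observe: a large improvement after changing the timer resolution, but nowhere near the seconds it took with 1.4.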
So ShuffleRows needs a fix, as I suspect this bug will impact many users. I'm not an async specialist so I can't fix the code myself.