Further optimize BlobDataProvider constructor #2625
Labels
A-performance
Area: Performance (CPU, Memory)
C-data-infra
Component: provider, datagen, fallback, adapters
S-small
Size: One afternoon (small bug fix or enhancement)
T-enhancement
Type: Nice-to-have but not required
In #2603, I improved the performance of the BlobDataProvider constructor by 60%. However, there is still much more that can be done. In principle, construction should be a constant-time operation, apart from the minimal validation we need to do on the bytes.
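To illustrate what "constant time apart from minimal validation" could look like, here is a minimal sketch using a hypothetical blob layout: a 4-byte magic `BLOB` followed by a little-endian `u32` payload length, then the payload. `BlobView`, `BlobError`, and this layout are invented for illustration and are not the ICU4X blob schema. The constructor inspects only the fixed-size header, so its cost does not depend on the payload size.

```rust
/// Hypothetical 4-byte magic number marking the start of a blob.
const MAGIC: &[u8; 4] = b"BLOB";
/// Magic (4 bytes) plus a little-endian u32 payload length (4 bytes).
const HEADER_LEN: usize = 8;

#[derive(Debug, PartialEq)]
pub enum BlobError {
    TooShort,
    BadMagic,
    LengthMismatch,
}

/// A borrowed view over blob bytes (hypothetical type).
pub struct BlobView<'a> {
    payload: &'a [u8],
}

impl<'a> BlobView<'a> {
    /// Checks only the fixed-size header; the payload itself is never
    /// scanned, so this runs in O(1) regardless of blob size.
    pub fn new(bytes: &'a [u8]) -> Result<Self, BlobError> {
        if bytes.len() < HEADER_LEN {
            return Err(BlobError::TooShort);
        }
        if !bytes.starts_with(MAGIC) {
            return Err(BlobError::BadMagic);
        }
        // Declared payload length, stored little-endian after the magic.
        let declared = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]) as usize;
        let payload = &bytes[HEADER_LEN..];
        // Comparing lengths is O(1); the payload contents are not read.
        if payload.len() != declared {
            return Err(BlobError::LengthMismatch);
        }
        Ok(Self { payload })
    }

    /// Returns the payload bytes, still unvalidated.
    pub fn payload(&self) -> &'a [u8] {
        self.payload
    }
}
```

Any deeper checks on the payload would then be the job of whatever code later reads from it.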
There are two parts of the constructor that take the most time:
Both of these get slower as the data file grows, especially (1).
If we could skip this work during construction and defer the check until data load time, we should be able to get BlobDataProvider construction time down to 100 ns or less.
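A minimal sketch of deferring the check to load time, assuming a hypothetical layout: a little-endian `u32` record count, a table of `count + 1` little-endian `u32` offsets, then the record bytes. UTF-8 validity stands in for whatever per-record validation the real data needs. `LazyBlob`, `LoadError`, and the layout are illustrative only, not the ICU4X implementation. The constructor checks only that the offset table fits in the buffer; each `load` validates just the record it returns.

```rust
use std::str;

#[derive(Debug, PartialEq)]
pub enum LoadError {
    Truncated,
    OutOfRange,
    BadOffsets,
    InvalidUtf8,
}

/// Reads a little-endian u32 at `pos`, or None if it would run past the end.
fn read_u32(bytes: &[u8], pos: usize) -> Option<usize> {
    let chunk = bytes.get(pos..pos + 4)?;
    Some(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]) as usize)
}

/// Hypothetical lazily validated blob.
pub struct LazyBlob<'a> {
    bytes: &'a [u8],
    count: usize,
    data_start: usize,
}

impl<'a> LazyBlob<'a> {
    /// O(1): checks only that the count and offset table fit in the buffer.
    pub fn new(bytes: &'a [u8]) -> Result<Self, LoadError> {
        let count = read_u32(bytes, 0).ok_or(LoadError::Truncated)?;
        // 4 bytes for the count, then (count + 1) offsets of 4 bytes each.
        let data_start = count
            .checked_add(1)
            .and_then(|n| n.checked_mul(4))
            .and_then(|n| n.checked_add(4))
            .ok_or(LoadError::Truncated)?;
        if data_start > bytes.len() {
            return Err(LoadError::Truncated);
        }
        Ok(Self { bytes, count, data_start })
    }

    /// Validates only the requested record: its offsets and its contents.
    pub fn load(&self, index: usize) -> Result<&'a str, LoadError> {
        if index >= self.count {
            return Err(LoadError::OutOfRange);
        }
        let start = read_u32(self.bytes, 4 + 4 * index).ok_or(LoadError::Truncated)?;
        let end = read_u32(self.bytes, 8 + 4 * index).ok_or(LoadError::Truncated)?;
        let bytes: &'a [u8] = self.bytes;
        let data = &bytes[self.data_start..];
        // `get` returns None if start > end or end is past the data.
        let record = data.get(start..end).ok_or(LoadError::BadOffsets)?;
        str::from_utf8(record).map_err(|_| LoadError::InvalidUtf8)
    }
}
```

The trade-off in this sketch is that a corrupt record is reported when it is first loaded rather than when the provider is built, and repeated loads of the same record repeat its validation.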
I made one attempt at this in #2609 by changing the data model. However, this can be done without touching the data model, which means we can make the change later and still read the same data files.
CC @zbraniecki