Plans for lazy Loading? #199

Closed
@dotChris90

Hello ML.NET experts :D,

There is just one other point that has been on my mind. In a time of cloud and big data there is always one important question: how do we access data with good performance and low memory usage?

I had a lot of trouble in the past with simulations because our customers' tools loaded (mostly) all measurement data at once. Loading took about 5-10 minutes and 20-30 GB of memory. Just so you know, this was in the car simulation area and, to be honest, that was a small amount of data: today such cars produce TBs of data in just 1 hour of driving. So we reinvented their tools and added lazy loading.

We achieved this lazy loading by building some tricks into the import level. Every data source got one general class called "dataSrc", which was injected with a data provider. When the user accessed the "DataSets" property (a property in the C# sense), the getter invoked the provider, and the provider looked up all dataset IDs (column names or, generally speaking, the ID of a measurement array) in the source. The DataSets property returned an array of dataset objects. Each dataset had an Array property (which had still not been evaluated) and a dictionary of attributes (key-value pairs). So we could search the meta information (via the dictionary) without loading the array (our actual measurement data). Only when the user was really sure ("okay, I want the data from DataSet XYZ") was the provider invoked to fetch that data.

With this strategy we were able to reduce the time and memory a lot. The users could search for attributes and take only the datasets they needed. Moreover, it was even possible to use LINQ in the same way for every possible data source.
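
To make the idea concrete, here is a minimal sketch of the pattern described above. It is not our original code: apart from "dataSrc", "DataSets" and the Array property, every name here (`IDataProvider`, `DataSet`, `InMemoryProvider`, `LoadCount`) is a hypothetical stand-in, and the in-memory provider only exists so that the sketch is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical provider contract: one implementation per data source format.
public interface IDataProvider
{
    IEnumerable<string> GetDataSetIds();
    IReadOnlyDictionary<string, string> GetAttributes(string id);
    double[] LoadArray(string id);
}

// One dataset: cheap metadata is available immediately, the measurement
// array is only fetched on first access to the Array property.
public sealed class DataSet
{
    private readonly Lazy<double[]> _array;

    public DataSet(string id, IReadOnlyDictionary<string, string> attributes, Func<double[]> load)
    {
        Id = id;
        Attributes = attributes;
        _array = new Lazy<double[]>(load);
    }

    public string Id { get; }
    public IReadOnlyDictionary<string, string> Attributes { get; }
    public bool IsLoaded => _array.IsValueCreated;
    public double[] Array => _array.Value;
}

// General data source class with an injected provider.
public sealed class DataSrc
{
    private readonly Lazy<DataSet[]> _dataSets;

    public DataSrc(IDataProvider provider)
    {
        // The provider is first queried when DataSets is read, and even
        // then only for IDs and attributes, not for the arrays.
        _dataSets = new Lazy<DataSet[]>(() => provider.GetDataSetIds()
            .Select(id => new DataSet(id, provider.GetAttributes(id), () => provider.LoadArray(id)))
            .ToArray());
    }

    public DataSet[] DataSets => _dataSets.Value;
}

// Toy provider (assumption, for illustration only) that counts array loads.
public sealed class InMemoryProvider : IDataProvider
{
    public int LoadCount { get; private set; }

    public IEnumerable<string> GetDataSetIds() => new[] { "speed", "rpm" };

    public IReadOnlyDictionary<string, string> GetAttributes(string id) =>
        new Dictionary<string, string> { ["unit"] = id == "speed" ? "km/h" : "1/min" };

    public double[] LoadArray(string id)
    {
        LoadCount++;
        return id == "speed" ? new[] { 10.0, 20.0, 30.0 } : new[] { 900.0, 1500.0 };
    }
}
```

With something like this, a LINQ query such as `src.DataSets.Where(d => d.Attributes["unit"] == "km/h")` only touches the attribute dictionaries, and `LoadArray` runs just for the datasets whose `Array` property is actually read.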

One more thing to consider:

  • If the arrays get too big, maybe we could think about loading only parts of the total measurement array (like Span?).
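
A rough sketch of what such partial loading could look like, assuming (purely for illustration) that a measurement array is stored as a raw binary file of doubles in the machine's native byte order. `WindowedReader` and `ReadWindow` are hypothetical names, not an existing ML.NET API; the caller passes a `Span<double>` as the window to fill.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class WindowedReader
{
    // Fills `destination` with doubles starting at element index `start`
    // and returns how many elements were actually read (fewer than
    // destination.Length when the end of the file is reached).
    public static int ReadWindow(string path, long start, Span<double> destination)
    {
        using var stream = File.OpenRead(path);
        stream.Seek(start * sizeof(double), SeekOrigin.Begin);

        // View the caller's buffer as bytes so the stream writes into it directly.
        Span<byte> bytes = MemoryMarshal.AsBytes(destination);
        int total = 0;
        while (total < bytes.Length)
        {
            int n = stream.Read(bytes.Slice(total));
            if (n == 0) break; // end of file
            total += n;
        }
        return total / sizeof(double);
    }
}
```

Only the requested window is read from disk, so memory use is bounded by the size of the buffer the caller supplies rather than by the size of the whole array.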


Labels: question (Further information is requested)
