Accounting for data objects that only iterate #15

In the old MLLearn terminology, we have *data containers* (observations can be randomly accessed) and mere *iterators*. Dataloaders (as currently implemented in DataUtils.jl), for example, are only iterators, and some models will want to support them *and* regular data containers. For data containers, a higher-level interface will want to control observation resampling (CV); for iterators, we're happy to forgo that functionality. How does the current "data interface" adapt to this complication? That is, how does the implementation articulate the fact that some allowed data objects cannot be subsampled? (At present the implicit assumption is that all data objects are data containers.)

(Originally there had been [some discussion](https://github.com/lorenzoh/DataLoaders.jl/issues/26) that Dataloaders would support (slow) random access, but that idea appears to have been abandoned in the DataLoaders -> MLUtils refactoring.  Perhaps @lorenzoh would care to comment.)

One idea is for a model accepting an iterator `Xiter` to set `getobs(model, I, Xiter) = nothing`, and to define `getobs(model, I, X)` as normal for a data container `X`. Is it safe to say we will generally be able to distinguish iterators from containers based on type alone, and so avoid possible type instabilities here?
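To make the idea concrete, here is a minimal sketch of the dispatch-based approach. Everything in it is hypothetical and for illustration only: `ToyModel`, `ToyIterator`, and the locally defined `getobs` and `can_subsample` are stand-ins, not the API of any existing package. It assumes iterators and containers really are distinguishable by type.

```julia
# Hypothetical stand-ins; none of these names come from an existing package.
struct ToyModel end

# A data object that can only be iterated (no random access is exposed).
struct ToyIterator{T}
    batches::Vector{T}
end
Base.iterate(it::ToyIterator, state=1) =
    state > length(it.batches) ? nothing : (it.batches[state], state + 1)
Base.length(it::ToyIterator) = length(it.batches)

# Data container: observations can be indexed, so subsampling is supported.
getobs(::ToyModel, I, X::AbstractVector) = X[I]

# Iterator: signal "cannot be subsampled" by returning `nothing`.
getobs(::ToyModel, I, X::ToyIterator) = nothing

# A higher-level interface could probe with an empty index set before
# attempting any resampling.
can_subsample(model, X) = getobs(model, 1:0, X) !== nothing

model = ToyModel()
getobs(model, 2:3, [10, 20, 30, 40])      # subsample of a container
getobs(model, 2:3, ToyIterator([[1, 2]])) # `nothing` for an iterator
```

Because the method is selected from the type of the data argument, each method has a single return type (a vector in one case, `Nothing` in the other), so there should be no type instability as long as the caller knows the concrete type of the data object. If a single type could act as either a container or an iterator depending on run-time state, this sketch would not be enough.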

Any other ideas?
