-
-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose api for custom datasets #1288
Conversation
The interface could just be |
Yes, this would be a better solution) I've changed code accordingly |
@DhairyaLGandhi are you ok with the interface we expose here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, this seems like a nice approach. I added a couple thoughts.
There needs to be an interface that can handle a more functional and generic approach. How would I make a lazy loaded dataset? How would putting things into channels and taking from there work? Things of that nature.
docs/src/data/dataset.md
Outdated
to make it compatible with `DataLoader`. | ||
|
||
```julia | ||
struct ZerosDataset{T, N} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we name this as CustomDataset
, or add a line that says what it is intended to return (just an array of zeros)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
total::Int | ||
end | ||
|
||
Base.eltype(::DataLoader{ZerosDataset{T, N}}) where {T, N} = Array{T, N} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the eltype to be defined separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because by default eltype of the dataloader is the type of data
that is holds, not the type of data that data
returns.
Thus it causes type instability and issues with things like @inferred
, collect
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little uncomfortable having to define eltypes like that, could we maybe make it part of the signature because strictly speaking, the elements of the data loader are the mini batches, which isn't represented by the T
here.
I think these are orthogonal concerns possibly related to the implementation of the Dataset itself and not the Dataloader, and I wouldn't charge this PR with solving things we haven't yet discussed |
Instead of defining something custom, it would be best to wait for JuliaML/LearnBase.jl#44. Ideally Flux itself wouldn't contain any data loading functionality (see also: MLDatasets removal) and depend on a library (much like we're trying to do with Optimisers.jl). |
Expose
Flux.Data.nobs
andFlux.Data.getobs
to be able to make custom datasets compatible with DataLoader.PR Checklist
@dhairyagandhi96
(for API changes).