Turn on using tasks in dataloader #1530
base: master
Conversation
This currently switches on shuffling, so the tests would need to be modified.
@maleadt do you think this would appropriately use the Task parallelism in CUDA.jl?
I would suggest this effort is better spent on DataLoaders.jl, cleaning it up and preparing it for GPU tasks.
It's not clear to me where and what kind of GPU operations would be happening here. Assuming it's mostly memory operations, it will need a solution to JuliaGPU/CUDA.jl#735 before this can efficiently use the GPU though, either in the form of a call to …
In the case of DataLoaders.jl, the memory is loaded into ring buffers under the control of the dataloader, so we can manually pin the memory. I don't want to derail this PR too much, since it's WIP. Just wanted to add that the direction it is headed in looks a lot like the work in DataLoaders.jl (i.e. channels/tasks by default, …
So the interface might be to have gpu operations happen either at …
Let me know what needs to happen for the pinning; we can add tests for it as well.
Nothing is required, so you can go ahead and benefit from multitasking already for kernels and other operations. Once we have proper memory pinning, we can add the use of those APIs here.
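For illustration, here is a minimal sketch of what that multitasking could look like with CUDA.jl, where each Julia task gets its own stream so independent kernels can overlap; the array sizes and operations are arbitrary placeholders:

```julia
using CUDA

@sync begin
    Threads.@spawn begin
        x = CUDA.rand(1024, 1024)
        CUDA.@sync x .* 2   # executes on this task's stream
    end
    Threads.@spawn begin
        y = CUDA.rand(1024, 1024)
        CUDA.@sync y .+ 1   # can run concurrently on another task's stream
    end
end
```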
@DhairyaLGandhi it would be useful if you could review DataLoaders.jl and submit changes there if needed. As @darsnack said, a lot of work has already gone into it, so it would be good to consolidate effort instead of duplicating it. We can then bring the package into the org and re-export it from Flux.jl.
@KristofferC is this an appropriate setup for using Channels to parallelise data loading, or do you think we would be better off doing the regular iteration?
The utility here has a slightly different goal, which is to generalise to arbitrary tuples of data, while also making it possible to get the benefits of lazy loading or to manage loading functions via the first argument. Why can't we pin memory here? Seems reasonable to me. The …
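For context, the Channel-based pattern being discussed can be sketched roughly as follows; `channel_loader` and its arguments are hypothetical names for illustration, not the actual implementation in this PR:

```julia
# A producer task fills a bounded buffer while the consumer
# (e.g. the training loop) drains it concurrently.
function channel_loader(load_batch, indices; buffersize=4)
    Channel(buffersize; spawn=true) do ch
        for idx in indices
            put!(ch, load_batch(idx))  # blocks when the buffer is full
        end
    end
end

loader = channel_loader(i -> rand(Float32, 28, 28, 1, 32), 1:10)
for batch in loader
    # the next batches are being loaded while this one is consumed
end
```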
Cleaned up the case where it's a tuple of args. It also seems to have the side effect of making it slightly more usable:

```julia
for paths_to_sampled_images in DataLoader(sampling_function, args)
    for minibatch in DataLoader(gpu ∘ preprocess, paths_to_sampled_images)
        # ...
    end
end
```

Currently requires the lengths of all the elements of the tuples to agree for proper …
Adding functionality unrelated to tasks shouldn't be part of this PR; it should be discussed separately.
This also came up in the DDP work, where we could tier and compose data loading of large data sets. It is handy to have a loader which can also handle the …
Relax type of returned values by DataLoader in case we want to onehot and pad per batch, so it's not the same as input.
Allows using Channels and task-based iteration by default, making it easier to hook into the new Channels-based approach of CUDA.jl as well, keeping the main thread free. Largely backwards compatible, and we can always add hooks. With this, we can move preprocessing on the GPU to a different task, so we don't have to block the main task, which can continue with training.
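As a rough sketch of that intent, assuming Flux's `gpu` and a user-supplied `preprocess` function (`raw_batches` is a placeholder for any iterable of CPU batches; this is illustrative, not the PR's exact code):

```julia
using Flux  # provides gpu

# Preprocessing and host-to-device transfer happen on a separate task,
# keeping the main task free for training.
preprocessed = Channel(2; spawn=true) do ch
    for (x, y) in raw_batches
        put!(ch, (gpu(preprocess(x)), gpu(y)))
    end
end

for (x, y) in preprocessed
    # gradient computation / parameter updates on the main task
end
```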