
WebDataset.jl, a linearly scalable data loader based on iterable datasets #2037

Open
tmbdev opened this issue Apr 28, 2021 · 3 comments

Comments

@tmbdev

tmbdev commented Apr 28, 2021

I'm the developer of WebDataset, a linearly scalable storage format, set of libraries, and server for PyTorch. WebDataset represents datasets as .tar archives of files on disk and allows them to be read from any web server, object store, or cloud storage system. It's all open source, and we have demonstrated I/O speeds of 1 Gbyte/s per GPU.
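
For readers unfamiliar with the format, here is a minimal Python sketch of how a reader might turn such a .tar archive into training samples. It assumes the WebDataset naming convention that files sharing a key (the path up to the first dot in the file name, e.g. `s0.jpg` and `s0.cls`) are stored consecutively and together form one sample. `split_key` and `iter_samples` are hypothetical names for illustration only, not the API of the PyTorch library or of WebDataset.jl.

```python
import io
import tarfile


def split_key(name):
    # Assumed convention: key = directory + file name up to the first ".",
    # extension = everything after that first "." (e.g. "a.b.png" -> "a", "b.png").
    dirname, _, base = name.rpartition("/")
    stem, _, ext = base.partition(".")
    key = f"{dirname}/{stem}" if dirname else stem
    return key, ext


def iter_samples(fileobj):
    """Stream a tar archive and yield one dict per sample.

    Consecutive members with the same key are merged into one dict mapping
    extension -> raw bytes, plus a "__key__" entry. Stream mode ("r|*") reads
    the archive sequentially, so it also works on non-seekable sources such
    as an HTTP response body.
    """
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        sample = {}
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, etc.
            key, ext = split_key(member.name)
            if not ext:
                continue  # no extension: not a sample field under this convention
            if sample and sample["__key__"] != key:
                yield sample  # key changed, so the previous sample is complete
                sample = {}
            sample["__key__"] = key
            # In stream mode, the current member must be read before advancing.
            sample[ext] = tar.extractfile(member).read()
        if sample:
            yield sample


if __name__ == "__main__":
    # Build a tiny two-sample archive in memory and read it back.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in [("s0.jpg", b"img0"), ("s0.cls", b"3"),
                              ("s1.jpg", b"img1"), ("s1.cls", b"7")]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    for sample in iter_samples(buf):
        print(sample)
```

Because each sample is read sequentially from one archive, a loader can scale by assigning different .tar shards to different workers or nodes rather than doing random access into a single file.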

The PyTorch implementation is at github.com/tmbdev/webdataset; the server implementation is at github.com/nvidia/aistore.

I have recently implemented a multithreaded loader for Julia that can read the same format. You can find it at github.com/tmbdev/WebDataset.jl.

You might want to add this to the resources and take it into account for DataLoaders.jl and FastAI.jl.

(I work on very large scale machine learning problems, so my next step is to see how I can get multi-GPU and multinode training to work in Julia.)

@AriMKatz

AriMKatz commented Apr 28, 2021

Cool!

Both @jpsamaroo and @vchuravy work on multinode/multi-GPU computing, so you might want to reach out to them or work with them. Also check out Dagger.jl, https://github.com/JuliaComputing/DataSets.jl, FileTrees.jl, and the JuliaFolds ecosystem.

@darsnack darsnack transferred this issue from FluxML/FluxML-Community-Call-Minutes Aug 9, 2022
@darsnack
Member

darsnack commented Aug 9, 2022

We might want to add this to the ecosystem page when the package is ready?

@tmbdev
Author

tmbdev commented Aug 12, 2022

I'm starting to use Flux.jl more heavily, so I'll be adding more examples over the next few weeks.
