🚀 Feature
It would be great to be able to pass a StreamingDataLoader to map. When experimenting with CLIP embeddings, I found that I needed a StreamingDataLoader to fully utilise the GPU, but it doesn't play nicely with map because we don't set the environment variables and other settings it needs to work in a distributed setting.
Motivation
This would let us run distributed embedding of much larger datasets, such as LAION.
Pitch
Allow providing a StreamingDataLoader to the map function, and then set the correct environment variables etc. so that each sample is still visited exactly once.
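To make the "visit each sample just once" requirement concrete, here is a rough sketch of the kind of partitioning I have in mind. It is plain Python and does not use the library's internals: `shard_bounds`, `shard_from_env`, and the `EXAMPLE_WORLD_SIZE` / `EXAMPLE_RANK` variable names are all hypothetical, chosen only for illustration. The real variable names and the place where map would set them are exactly what this request is asking about.

```python
import os
from typing import Mapping


def shard_bounds(num_samples: int, world_size: int, rank: int) -> range:
    """Return the contiguous block of sample indices owned by `rank`.

    Blocks are disjoint and together cover 0..num_samples-1, so every
    sample is assigned to exactly one worker.
    """
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    base, extra = divmod(num_samples, world_size)
    # The first `extra` ranks each take one additional sample.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def shard_from_env(num_samples: int, env: Mapping[str, str] = os.environ) -> range:
    """Read the worker layout from the environment (hypothetical names)."""
    # ASSUMPTION: these variable names are placeholders, not real settings.
    world_size = int(env.get("EXAMPLE_WORLD_SIZE", "1"))
    rank = int(env.get("EXAMPLE_RANK", "0"))
    return shard_bounds(num_samples, world_size, rank)


if __name__ == "__main__":
    # Simulate three workers splitting ten samples.
    for rank in range(3):
        env = {"EXAMPLE_WORLD_SIZE": "3", "EXAMPLE_RANK": str(rank)}
        print(rank, list(shard_from_env(10, env)))
```

If map set the equivalent variables before the loader is constructed in each worker, the loader could derive its own slice in a similar way instead of every worker iterating over the full dataset.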