Redo Dataset and Dataloader #90
I don't want to lick the cookie, but one of the things I'm excited about in Mojo is the type safety / memory management. What are everyone's thoughts on torchdata? I experimented with building an RL framework on top of it, fastrl.
Pros
Cons
Some things I'm seeing that would be needed from Mojo:
Major blockers
Minor needs
I'm curious what other frameworks / libs people have used, liked, and disliked.
Hi @josiahls, I think data pipelining is a complex but important topic that Basalt probably won't focus on in the near future. From what I've read, torchdata suffers from limited lower-level control over things like multiprocessing, and even though that control should be possible in Mojo, other than …

For sure, type safety and safely passing references to the data, without copies, will and must be possible. As a first rework of the current dataloader (which simply loads all data into memory), I think the goal should be an ultra-simple pipeline that 'chunk-loads' the data into memory and passes it to the model that way. Additionally, Mojo might have an edge here with its very convenient, easy-to-use compile-time features. Are you perhaps interested in trying this out?

Thinking long term, I can see cloud storage integration and distributed computing being massively important here as well, and I wonder whether that was one of the considerations behind the torchdata redesign.
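To make the 'chunk-load' idea a bit more concrete, here is a minimal sketch in Python rather than Mojo. It is not Basalt's API. `ChunkedDataLoader` and the `load_chunk(start, stop)` callback are hypothetical names, and the callback stands in for whatever actually reads a slice of the dataset from disk. The idea is that only one chunk of samples is resident at a time (plus at most one pending batch), instead of the whole dataset.

```python
from typing import Callable, Generic, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


class ChunkedDataLoader(Generic[T]):
    """Hypothetical loader that keeps only one chunk of samples in memory at a time."""

    def __init__(
        self,
        num_samples: int,
        load_chunk: Callable[[int, int], Sequence[T]],  # assumed: returns samples [start, stop)
        chunk_size: int,
        batch_size: int,
    ) -> None:
        if chunk_size <= 0 or batch_size <= 0:
            raise ValueError("chunk_size and batch_size must be positive")
        self.num_samples = num_samples
        self.load_chunk = load_chunk
        self.chunk_size = chunk_size
        self.batch_size = batch_size

    def __iter__(self) -> Iterator[List[T]]:
        pending: List[T] = []
        for start in range(0, self.num_samples, self.chunk_size):
            stop = min(start + self.chunk_size, self.num_samples)
            # Only this chunk is loaded; the previous one can be garbage-collected.
            chunk = self.load_chunk(start, stop)
            for sample in chunk:
                pending.append(sample)
                if len(pending) == self.batch_size:
                    yield pending  # hand a full batch to the model
                    pending = []
        if pending:
            yield pending  # final, possibly smaller, batch


# Usage: a fake "disk read" that just returns integer sample IDs.
loader = ChunkedDataLoader(10, lambda start, stop: list(range(start, stop)), chunk_size=4, batch_size=3)
for batch in loader:
    print(batch)
```

Batches are allowed to span chunk boundaries here, which keeps batch sizes uniform at the cost of holding one partial batch across a chunk switch. A Mojo version could presumably make `chunk_size` and `batch_size` compile-time parameters, but that is speculation about a design, not something the current code does.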