What do we want to build? #1
Welcome!
I created this repository as a discussion hub for the ML ecosystem in Rust, following up on a talk I gave at the Rust meetup in London (slides).
I do believe that Rust has great potential in this area, but to fully realize this potential we need to provide building blocks: we need to tackle those shared challenges that, once removed, will enable more and more people to just come to Rust and build what they want to build.
The three building blocks I see as fundamental for an ML ecosystem are:
- n-dimensional arrays;
- dataframes;
- an ML model interface.
I have spent the last year, when it comes to open-source contributions, enhancing n-dimensional arrays: direct contributions to ndarray, statistical routines on top of it (ndarray-stats) and tutorials to help people get into the Rust scientific ecosystem from Python, Julia or R. I do believe that ndarray is in more than good shape when it comes to fulfilling NumPy's role in the Rust ecosystem.
There is now movement on dataframes as well: a discussion is taking place at rust-dataframe/discussion#1 to explore use cases and potential designs. (The idea of opening this repository comes directly from that experiment in community-led design for dataframes.)
Given that one of the two data structures usually consumed by ML models is ready (n-dimensional arrays) and the other one is baking (dataframes), I think it's time to start thinking about what to do with the ML-specific piece.
I don't want to steer the debate too much with the opening post (I'll chip in once the discussion starts), but the questions I'd like to see tackled are:
- what use-cases could make Rust shine in the ML ecosystem?
- what are the basic capabilities that have to be built to enable the usage of Rust for ML workloads?
- how should we structure such a project? A core library with a few traits and a set of separate crates tackling different aspects? A large batteries-included scikit-learn equivalent?
- why do you want to use Rust for ML?
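To make the "core library with a few traits" option a bit more concrete, here is a minimal sketch of what such a core could look like. Everything in it is hypothetical and only meant to seed discussion: the trait names `Fit` and `Predict`, the `MeanRegressor` toy model and the `FitError` type are illustrative assumptions, not an existing crate or an agreed design. It uses plain slices rather than ndarray or a dataframe type so that it compiles with the standard library alone.

```rust
/// Hypothetical core trait: an untrained estimator that can be fitted
/// on inputs `X` and targets `Y`, producing a separate fitted model.
pub trait Fit<X: ?Sized, Y: ?Sized> {
    type Model;
    type Error;

    fn fit(&self, inputs: &X, targets: &Y) -> Result<Self::Model, Self::Error>;
}

/// Hypothetical core trait: a fitted model that can produce predictions.
pub trait Predict<X: ?Sized> {
    type Output;

    fn predict(&self, inputs: &X) -> Self::Output;
}

/// Toy estimator (illustration only): always predicts the mean of the
/// training targets, ignoring the input features.
pub struct MeanRegressor;

/// Result of fitting `MeanRegressor`.
pub struct FittedMean {
    mean: f64,
}

/// Illustrative error type for the toy estimator.
#[derive(Debug, PartialEq)]
pub enum FitError {
    Empty,
    LengthMismatch,
}

impl Fit<[Vec<f64>], [f64]> for MeanRegressor {
    type Model = FittedMean;
    type Error = FitError;

    fn fit(&self, inputs: &[Vec<f64>], targets: &[f64]) -> Result<FittedMean, FitError> {
        if targets.is_empty() {
            return Err(FitError::Empty);
        }
        if inputs.len() != targets.len() {
            return Err(FitError::LengthMismatch);
        }
        let mean = targets.iter().sum::<f64>() / targets.len() as f64;
        Ok(FittedMean { mean })
    }
}

impl Predict<[Vec<f64>]> for FittedMean {
    type Output = Vec<f64>;

    fn predict(&self, inputs: &[Vec<f64>]) -> Vec<f64> {
        // One prediction per input row, all equal to the training mean.
        vec![self.mean; inputs.len()]
    }
}
```

In this shape, a separate crate would depend only on the small trait crate and implement `Fit` and `Predict` for its own models. Keeping the fitted model as a distinct type from the estimator means `predict` cannot be called before `fit`, which is one design choice among several worth debating.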