GSoC 2023 Ideas Page
PyData/Sparse is a software project that provides sparse arrays for the PyData ecosystem, conforming to the NumPy API. That's a lot to digest, so let's break it down:
A sparse array is one that has a lot of zeros in it. In this package, however, we can also treat other arrays as sparse: ones in which a single non-zero value is repeated many times.
We don't have infinite memory or computational power, so it's important to make the best possible use of what we have. If we "skip over" the zeros when doing computations, they run a lot faster. In practice, this also means keeping track of where the non-zero entries are, which adds some overhead of its own.
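To make the "skip over the zeros" idea concrete, here is a minimal sketch in plain Python. The `TinyCOO` class and its methods are hypothetical names made up for illustration; this is not how PyData/Sparse is implemented, only a toy coordinate-style layout that stores positions and values of the non-zero entries.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TinyCOO:
    """Hypothetical 1-D coordinate-format vector (illustration only)."""

    length: int          # logical length, zeros included
    coords: List[int]    # positions of the non-zero entries (the bookkeeping overhead)
    data: List[float]    # the non-zero values themselves

    @classmethod
    def from_dense(cls, values):
        # Remember only where the non-zero entries are and what they hold.
        coords = [i for i, v in enumerate(values) if v != 0]
        return cls(len(values), coords, [values[i] for i in coords])

    def dot(self, other):
        # Dot product with a dense sequence: loops over stored entries only,
        # so the work scales with the number of non-zeros, not the length.
        return sum(v * other[i] for i, v in zip(self.coords, self.data))


dense = [0.0] * 1000
dense[3] = 2.0
dense[500] = 4.0

vec = TinyCOO.from_dense(dense)
weights = [1.0] * 1000
result = vec.dot(weights)  # visits 2 stored entries instead of 1000 slots
```

The trade-off mentioned above shows up directly: `coords` is extra storage that a dense list does not need, so the layout only pays off when most entries are zero.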
It means you can use it mostly as you would use NumPy. In fact, if you try it, familiar functions such as np.max and np.exp work on the arrays provided by this project.
A lot of people, actually. Sparse arrays are important in physics and simulations, as well as electron microscopy. If you look at the public dependents, you'll even find some COVID-19 research done with this package.
Look at our contributing page! There are a lot of great instructions there. Our source code is hosted here.
Currently, we use mainly Numba, a package that makes Python go faster than it normally does. However, we are considering using other approaches, such as leveraging research by the TACO team to make things faster. For the curious reader, here's a PhD thesis from the pioneer of the topic. Most of our ideas are in that direction.
Our Gitter Channel is the best place to get in touch, or to ask if something should go someplace else. We also have an issue tracker for the more experienced among you!
We have a contributing page that we'll link to as the go-to source for how to get started. If you get stuck, just see above on how to contact us!
Usually, your GSoC application has to be a true "game plan" of what you'd like to achieve. It has to be worked out in enough detail that we are reasonably sure you can make it to the very end. We'd like to remind you that the title of the sub-org, in this case "PyData/Sparse", must be in the title of your application. We'd also like to point you to Google's own instructions for writing GSoC proposals.
- Completion of the XSparse re-implementation of PyData/Sparse
- Description: The TACO project does some JIT compilation in an ad-hoc manner by writing out *.c files, compiling them, and dynamically linking them into the executable. We would like to have a back-end for PyData/Sparse that instantiates C++ templates at runtime, therefore providing a much nicer experience/API to work with.
- Skills: C++ Template MetaProgramming (TMP) skills
- Difficulty Level: Hard
- Related Readings/Links:
  - The research paper that moved to the current method of code generation.
  - Some partial work on the C++ implementation so far.
- Potential mentors: Hameer Abbasi (@hameerabbasi), Bharath K K (@bharath2438)
-
Description: The TACO project does some JIT compilation in an ad-hoc manner by writing out