Description
Submitting Author: Erik Welch (@eriknw)
All current maintainers: (@eriknw, @jim22k, @SultanOrazbayev)
Package Name: Python-graphblas
One-Line Description of Package: Python library for GraphBLAS: high-performance sparse linear algebra for scalable graph analytics
Repository Link: https://github.com/python-graphblas/python-graphblas
Version submitted: 2023.1.0
Editor: @tomalrussell
Reviewer 1: @sneakers-the-rat
Reviewer 2: @szhorvat
Archive:
JOSS DOI: N/A
Version accepted: 2023.7.0
Date accepted (month/day/year): 07/14/2023
Description
Python-graphblas is like a faster, more capable scipy.sparse
that can implement NetworkX. It is a Python library for GraphBLAS: high-performance sparse linear algebra for scalable graph analytics. Python-graphblas mimics the math notation, making it the most natural way to learn, use, and think about GraphBLAS. In contrast to other high level GraphBLAS bindings, Python-graphblas can fully and cleanly support any implementation of the GraphBLAS C API specification thereby allowing us to be vendor-agnostic.
Scope
- Please indicate which category or categories this package falls under:
- Data retrieval
- Data extraction
- Data munging
- Data deposition
- Reproducibility
- Geospatial
- Education
- Data visualization*
- Scientific software wrappers (added from here)
Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.
-
For all submissions, explain how the and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
- Who is the target audience and what are scientific applications of this package?
Audience: anybody who works with sparse data or graphs. We are also implementing a backend to NetworkX (which supports dispatching in version 3.0) written in Python-graphblas called graphblas-algorithms
, so we are quite literally targeting NetworkX users!
Python-graphblas provides a faster, easier, more flexible, and more scalable way to operate on sparse data, including for graph algorithms. There are too many scientific applications to list ranging from neuroscience, genomics, biology, etc. It may be useful wherever scipy.sparse
or NetworkX are used. Although GraphBLAS was designed to build graph algorithms, it is flexible enough to be used in other applications. Anecdotally, most of our current users that I know about are from research groups in universities and laboratories.
We are also targeting applications that need very large distributed graphs. We have experimented with Dask-ifying python-graphblas
here, and we get regular interest from people who want e.g. distributed PageRank or connected components.
- Are there other Python packages that accomplish the same thing? If so, how does yours differ?
pygraphblas
, which hasn't been updated in more than 16 months. There are many differences in syntax, functionality, philosophy, architecture, and (I would argue) robustness and maturity. python-graphblas
syntax targets the math syntax, whereas pygraphblas
is much closer to C. python-graphblas
handles dtypes much more robustly, has efficient conversions to/from numpy and other formats, is architected to handle additional GraphBLAS implementations (more are on the way!), has exceptional error messages, has many more tests and functionality, supports Windows, and much, much more. We have also been growing our team, because sustainability is very important to us.
Although we have/had irreconcilable differences (which is why we decided to create python-graphblas
), the authors have always been cordial. We all believe strongly in the ethos of open source, and I would describe our relationship as having "radical generosity". For example, we have an outstanding agreement that each library is welcome to "borrow" from the other (with credit). We may "borrow" some of their documentation :)
We also worked together to create and maintain the C binding to SuiteSparse:GraphBLAS:
https://github.com/GraphBLAS/python-suitesparse-graphblas/
We could use help automatically generating wheels for this library on major platforms via cibuildwheel.
- If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or
@tag
the editor you contacted:
Limited prior discussion in this issue: pyOpenSci/python-package-guide#21 (comment)
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
- does not violate the Terms of Service of any service it interacts with.
- has an OSI approved license.
- contains a README with instructions for installing the development version.
- includes documentation with examples for all functions.
- We're working on this! The C API and SuiteSparse:GraphBLAS C library are both well documented. We have a very large API surface area to cover, so "documentation with examples for all functions" is a really, really high bar, but one I hope we achieve someday :)
- contains a vignette with examples of its essential functions and uses.
- has a test suite.
- has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
Publication options
- Do you wish to automatically submit to the Journal of Open Source Software? If so:
- Undecided. We don't have a paper, but in principle I would like for us to submit a paper to JOSS someday.
JOSS Checks
- The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- The package contains a
paper.md
matching JOSS's requirements with a high-level description in the package root or ininst/
. - The package is deposited in a long-term repository with the DOI:
Note: Do not submit your package separately to JOSS
Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
- Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.
Code of conduct
- I agree to abide by pyOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.
Other comments (manually added)
Given a product mindset, we believe that Python-graphblas is a great product, but I think our go-to-market strategy has been lacking. We have been very engineering-heavy, and even our goal of targeting NetworkX users is engineering-heavy via creating graphblas-algorithms
. I hope this peer-review process can help us prioritize our efforts (such as a plan to improve documentation) as well as a place to write a blog post or two.
Please fill out our survey
- Last but not least please fill out our pre-review survey. This helps us track
submission and improve our peer review process. We will also ask our reviewers
and editors to fill this out.
P.S. *Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Metadata
Metadata
Assignees
Type
Projects
Status