ARROW-854: [Format] Add tentative SparseTensor format
I'm interested in defining a language-agnostic sparse tensor format. I believe Apache Arrow is one of the suitable places to do this, so let me propose my idea here.
First of all, in my investigation I found that there is no common memory layout for sparse tensor representations. This means some kind of conversion is needed to share sparse tensors among different systems, even if the data format is logically the same. Dataframes were in the same situation, and this is why I believe Apache Arrow is a suitable place.
There are many formats for representing a sparse tensor. Most of them are specialized for matrices, which have two dimensions. There are few formats for general sparse tensors with more than two dimensions.
I think the COO format is a suitable starting point because COO can handle any number of dimensions, and many systems support it. In my investigation, the systems that support COO are SciPy, dask, pydata/sparse, TensorFlow, and PyTorch.
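To illustrate why COO extends to any number of dimensions, here is a minimal pure-Python sketch. It is not the Arrow API and not the layout defined in this pull request; the function name `dense_to_coo` and the nested-list input are assumptions made only for this example. Each non-zero value is stored next to its full coordinate tuple, so the rank of the tensor only changes the length of each tuple.

```python
def dense_to_coo(dense):
    """Convert a non-empty, rectangular nested-list tensor into COO form.

    Returns (shape, indices, values), where indices[k] is the coordinate
    tuple of values[k]. Hypothetical helper, for illustration only.
    """
    # Derive the shape by descending along the first element of each level.
    shape = []
    probe = dense
    while isinstance(probe, list):
        shape.append(len(probe))
        probe = probe[0]

    indices, values = [], []

    def walk(node, coord):
        # Recurse through nested lists, recording non-zero leaves.
        if isinstance(node, list):
            for i, child in enumerate(node):
                walk(child, coord + (i,))
        elif node != 0:
            indices.append(coord)
            values.append(node)

    walk(dense, ())
    return tuple(shape), indices, values


# A 2 x 2 x 3 tensor with three non-zero entries.
tensor = [[[0, 1, 0], [0, 0, 2]],
          [[3, 0, 0], [0, 0, 0]]]
shape, indices, values = dense_to_coo(tensor)
```

The same function works unchanged for a matrix or a rank-4 tensor, which is the property that makes COO a convenient first format.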
Additionally, the CSR format for matrices may also be good to support from the start. The reason is that CSR is efficient for extracting row slices, which may be important for extracting samples from tidy data, and it is supported by SciPy, MXNet, and R's Matrix library.
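The row-slicing advantage of CSR can be sketched in a few lines of pure Python. This is an illustration under assumptions, not the Arrow implementation: the helper names `dense_to_csr` and `csr_row_slice` are hypothetical, and plain lists stand in for buffers. Because `indptr[r]` and `indptr[r + 1]` bound the entries of row `r`, a contiguous range of rows maps to one contiguous range of the `indices` and `values` arrays.

```python
def dense_to_csr(matrix):
    """Convert a list-of-rows matrix into CSR arrays (indptr, indices, values)."""
    indptr = [0]
    indices = []
    values = []
    for row in matrix:
        for col, x in enumerate(row):
            if x != 0:
                indices.append(col)
                values.append(x)
        # After each row, record where the next row's entries will start.
        indptr.append(len(indices))
    return indptr, indices, values


def csr_row_slice(indptr, indices, values, start, stop):
    """Return the CSR arrays for rows start..stop-1 without scanning other rows."""
    lo, hi = indptr[start], indptr[stop]
    # Rebase the row pointers so the slice starts at offset 0.
    new_indptr = [p - lo for p in indptr[start:stop + 1]]
    return new_indptr, indices[lo:hi], values[lo:hi]


matrix = [[1, 0, 2],
          [0, 0, 3],
          [4, 5, 0],
          [0, 0, 0]]
indptr, indices, values = dense_to_csr(matrix)
sub_indptr, sub_indices, sub_values = csr_row_slice(indptr, indices, values, 1, 3)
```

In COO, by contrast, selecting the same rows would require filtering every stored coordinate unless the entries are known to be sorted by row.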
I have added my prototype definition of the SparseTensor format in this pull request. I designed this prototype format to be extensible so that we can support additional sparse formats. I think we need to support at least one additional sparse tensor format for more than two dimensions in addition to COO, so we will need this extensibility.
Author: Kenta Murata <mrkn@mrkn.jp>
Closes #2546 from mrkn/sparse_tensor_proposal and squashes the following commits:
148bff8 <Kenta Murata> make format
d57e56f <Kenta Murata> Merge sparse_tensor_format.h into sparse_tensor.h
880bbc4 <Kenta Murata> Rename too-verbose function name
c83ea6a <Kenta Murata> Add type aliases of sparse tensor types
90e8b31 <Kenta Murata> Rename sparse tensor classes
07a6518 <Kenta Murata> Use substitution instead of constructor call
37a0a14 <Kenta Murata> Remove needless function declaration
97e85bd <Kenta Murata> Use std::make_shared
3dd434c <Kenta Murata> Capitalize member function name
6ef6ad0 <Kenta Murata> Apply code formatter
6f29158 <Kenta Murata> Mark APIs for sparse tensor as EXPERIMENTAL
ff3ea71 <Kenta Murata> Rename length to non_zero_length in SparseTensor
f782303 <Kenta Murata> Return Status::IOError instead of DCHECK if message header type is not matched
7e814de <Kenta Murata> Put EXPERIMENTAL markn in comments
357860d <Kenta Murata> Fix typo in comments
43d8eea <Kenta Murata> Fix coding style
99b1d1d <Kenta Murata> Add missing ARROW_EXPORT specifiers
401ae80 <Kenta Murata> Fix SparseCSRIndex::ToString and add tests
9e457ac <Kenta Murata> Remove needless virtual specifiers
3b1db7d <Kenta Murata> Add SparseTensorBase::Equals
d6a8c38 <Kenta Murata> Unify Tensor.fbs and SparseTensor.fbs
b3a62eb <Kenta Murata> Fix format
6bc9e29 <Kenta Murata> Support IPC read and write of SparseTensor
1d90427 <Kenta Murata> Fix format
51a83bf <Kenta Murata> Add SparseTensorFormat
93c03ad <Kenta Murata> Add SparseIndex::ToString()
021b46b <Kenta Murata> Add SparseTensorBase
ed3984d <Kenta Murata> Add SparseIndex::format_type
4251b4d <Kenta Murata> Add SparseCSRIndex
433c9b4 <Kenta Murata> Change COO index matrix to column-major in a format description
392a25b <Kenta Murata> Implement SparseTensor and SparseCOOIndex
b24f3c3 <Kenta Murata> Insert additional padding in sparse tensor format
c508db0 <Kenta Murata> Write sparse tensor format in IPC.md
2b50040 <Kenta Murata> Add an example of the CSR format in comment
76c56dd <Kenta Murata> Make indptr of CSR a buffer
d7e653f <Kenta Murata> Add an example of COO format in comment
866b2c1 <Kenta Murata> Add header comments in SparseTensor.fbs
aa9b8a4 <Kenta Murata> Add SparseTensor.fbs in FBS_SRC
1f16ffe <Kenta Murata> Fix syntax error in SparseTensor.fbs
c3bc6ed <Kenta Murata> Add tentative SparseTensor format