Description
We currently have two methods for dataset size reduction, `precision` and `subsample`, introduced more clearly in PR #1250. However, we have not implemented precision reduction for pandas DataFrames, as it is a bit more involved: an ndarray has a single uniform dtype, while a DataFrame has a dtype per column.
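For illustration, a minimal sketch of what per-column precision reduction could look like. The helper name `reduce_dataframe_precision` is hypothetical, not the API from PR #1250:

```python
import numpy as np
import pandas as pd


def reduce_dataframe_precision(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast each column individually, since dtypes vary per column."""
    reduced = df.copy()
    for col in reduced.columns:
        dtype = reduced[col].dtype
        if dtype == np.float64:
            # Halves float storage, assuming the precision loss is tolerable
            reduced[col] = reduced[col].astype(np.float32)
        elif dtype == np.int64:
            # Only safe if values fit in int32; a real implementation
            # would need to check the column's min/max first
            reduced[col] = reduced[col].astype(np.int32)
        # Non-numeric columns (object, category, ...) are left untouched
    return reduced
```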
We also cannot use `reduce_dataset_size_if_too_large` with DataFrames yet, as we have not implemented a method to calculate their size, which we need in order to know how much to subsample.
This shouldn't be too hard to implement but will require updating tests as well.
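As a rough sketch, the missing size calculation could lean on pandas' own memory accounting (`dataframe_nbytes` is just an illustrative name):

```python
import pandas as pd


def dataframe_nbytes(df: pd.DataFrame) -> int:
    # deep=True measures the real memory of object (e.g. string) columns,
    # not just the size of the pointers to them
    return int(df.memory_usage(deep=True).sum())
```

With something like this in place, the subsample step could compare the result against the memory limit to decide how many rows to keep.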
Edit:
Just adding an extra point to include a more nuanced size calculation for sparse matrices: `arr.data.nbytes + arr.indices.nbytes + arr.indptr.nbytes`.
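For a CSR/CSC matrix, that expression sums the three underlying arrays (values, indices, and index pointers). A small sketch, assuming scipy.sparse:

```python
from scipy import sparse


def sparse_nbytes(arr) -> int:
    # CSR/CSC store three arrays: the non-zero values, their column
    # (or row) indices, and the row (or column) pointers
    return arr.data.nbytes + arr.indices.nbytes + arr.indptr.nbytes


X = sparse.random(1000, 50, density=0.01, format="csr")
print(sparse_nbytes(X))  # bytes actually held by the sparse structure
```

Note this only applies to formats with `indices`/`indptr` attributes (CSR/CSC); a COO matrix would need `arr.data.nbytes + arr.row.nbytes + arr.col.nbytes` instead.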