No support for precision reduction when reducing dataset size for pandas dataframe or series. #1278

Open
@eddiebergman

Description

We currently have two methods for dataset size reduction, precision and subsample, introduced more clearly in PR #1250. However, we have not implemented precision reduction for pandas dataframes, as this is a bit more involved: an ndarray has a single uniform dtype, while a dataframe has a dtype per column.
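To make the per-column point concrete, here is a minimal sketch of what precision reduction on a dataframe could look like. The function name `reduce_precision_df` and the one-step-down mapping are assumptions for illustration, not the existing implementation; non-float columns are left untouched.

```python
import pandas as pd

# Hypothetical mapping: each float dtype goes one precision step down.
_NEXT_LOWER = {"float64": "float32", "float32": "float16"}


def reduce_precision_df(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with every float column cast one step down.

    Unlike an ndarray, each column is inspected separately because
    dtypes can differ per column.
    """
    casts = {
        col: _NEXT_LOWER[str(dtype)]
        for col, dtype in df.dtypes.items()
        if str(dtype) in _NEXT_LOWER
    }
    if not casts:
        # Nothing to reduce; still return a copy so the input is never aliased.
        return df.copy()
    return df.astype(casts)
```

Integer, categorical and object columns pass through unchanged in this sketch; whether those should also be reduced is a separate design decision.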

We also cannot use reduce_dataset_size_if_too_large with dataframes yet, as we have not implemented a method to calculate a dataframe's size, which we need in order to know how much to subsample.
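One possible way to get the size is pandas' own `DataFrame.memory_usage`. The sketch below assumes that is acceptable as the size estimate; `dataframe_nbytes` and `subsample_fraction` are hypothetical helper names, and `max_bytes` stands in for whatever memory limit the caller has.

```python
import pandas as pd


def dataframe_nbytes(df: pd.DataFrame, deep: bool = True) -> int:
    """Estimate the memory footprint of `df` in bytes.

    `deep=True` also counts the contents of object columns (e.g. strings),
    which is slower but closer to the real usage.
    """
    return int(df.memory_usage(index=True, deep=deep).sum())


def subsample_fraction(df: pd.DataFrame, max_bytes: int) -> float:
    """Fraction of rows to keep so that `df` fits in roughly `max_bytes`."""
    size = dataframe_nbytes(df)
    if size <= 0:
        return 1.0
    return min(1.0, max_bytes / size)
```

The resulting fraction could then be passed to something like `df.sample(frac=frac, random_state=seed)`, though a stratified split may be preferable for classification targets.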

This shouldn't be too hard to implement but will require updating tests as well.

Edit:
Just adding an extra point: we should also use a more nuanced size calculation for sparse matrices.
arr.data.nbytes + arr.indices.nbytes + arr.indptr.nbytes
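As a rough sketch of the expression above wrapped in a helper: `sparse_nbytes` is a hypothetical name, and the fallback of converting other formats to CSR is an assumption, so for those formats the number is only an approximation of the stored size.

```python
from scipy import sparse


def sparse_nbytes(arr) -> int:
    """Bytes used by the underlying arrays of a scipy sparse matrix.

    CSR, CSC and BSR all store `data`, `indices` and `indptr`.
    Other formats are converted to CSR first (an assumption of this sketch).
    """
    if arr.format not in ("csr", "csc", "bsr"):
        arr = arr.tocsr()
    return int(arr.data.nbytes + arr.indices.nbytes + arr.indptr.nbytes)
```

This counts the index arrays as well as the values, so it is larger than `arr.data.nbytes` alone but usually far smaller than the dense equivalent.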
