WIP: Support decimal dtypes #17113

wence- · 2024-10-17T17:30:11Z

Description

Polars only has 128 bit decimals. This currently loses the precision for round trips.

I can run a groupby and it seems to work.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

Polars only has 128 bit decimals. This currently loses the precision for round trips.

wence- · 2024-10-17T17:35:49Z

Conversion back to arrow doesn't work because we lost the precision argument

vyasr · 2024-10-30T00:16:48Z

Conversion back to arrow doesn't work because we lost the precision argument

Would resolving #17080 (which should be possible with arrow 18) resolve this? Could we carry around extra metadata in the interim to manage this (necessary until we drop support for older versions of arrow anyway)?

wence- · 2024-10-30T11:15:37Z

Conversion back to arrow doesn't work because we lost the precision argument

Would resolving #17080 (which should be possible with arrow 18) resolve this? Could we carry around extra metadata in the interim to manage this (necessary until we drop support for older versions of arrow anyway)?

So I don't really understand decimals, but AFAICT, polars only has an underlying Decimal128 dtype (rather than separate 32, 64, and 128 types like libcudf does). When I therefore go from arrow column to libcudf column, because libcudf's decimal type only cares about the scale and not the precision, we upgrade all of the precisions to "full" (whatever that means). Now when I go back again, the precision argument is different from when we started.

We haven't really looked at dtypes that need extra metadata in cudf-polars yet, but both decimals and struct dtypes do need that. And then, transitively, list dtypes also need it. So I think we're going to have to do something. One thing might just be to keep the polars dtype around and use that in various places.

wence- · 2024-10-30T11:20:24Z

As an example:

import polars as pl
import pylibcudf as plc

df = pl.DataFrame({"a": pl.Series([1, 2, 3], dtype=pl.Decimal(precision=5, scale=3))})

print(df.to_arrow())

print(plc.interop.to_arrow(plc.interop.from_arrow(df.to_arrow())))

pyarrow.Table
a: decimal128(5, 3)
----
a: [[1.000,2.000,3.000]]
pyarrow.Table
: decimal128(38, 3) not null
----
: [[1.000,2.000,3.000]]

So the data are all there, but the precision is missing

vyasr · 2024-10-31T19:19:47Z

Ah OK so this is about the values stored in the dtype, not the dtype itself being lost (which is what my question was about, i.e. whether getting 32 and 64 bit decimals in arrow would help; although even then I guess you'd need those explicitly added in polars too).

WIP: Support decimal dtypes

3b20bcd

Polars only has 128 bit decimals. This currently loses the precision for round trips.

github-actions bot added Python Affects Python cuDF API. cudf.polars Issues specific to cudf.polars labels Oct 17, 2024

vyasr assigned wence- Nov 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Support decimal dtypes #17113

WIP: Support decimal dtypes #17113

wence- commented Oct 17, 2024

wence- commented Oct 17, 2024

vyasr commented Oct 30, 2024

wence- commented Oct 30, 2024

wence- commented Oct 30, 2024

vyasr commented Oct 31, 2024

WIP: Support decimal dtypes #17113

Are you sure you want to change the base?

WIP: Support decimal dtypes #17113

Conversation

wence- commented Oct 17, 2024

Description

Checklist

wence- commented Oct 17, 2024

vyasr commented Oct 30, 2024

wence- commented Oct 30, 2024

wence- commented Oct 30, 2024

vyasr commented Oct 31, 2024