Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wip: Support to_string for Duration dtype #19663

Conversation

alexander-beedie
Copy link
Collaborator

@alexander-beedie alexander-beedie commented Nov 6, 2024

Closes #7174.

Consolidates four duration formatting functions (used to convert Duration to String when printing the frame repr) into one, and enables reuse of this function for casting to string.

Note: This seems like the most reasonable/consistent cast; turning a duration into a specific type of string (such as ISO8601 format1) should be done with a dedicated expression.

Example

from datetime import timedelta
import polars as pl

df = pl.DataFrame(
    {'td': [
        timedelta(days=180, seconds=56789, microseconds=987654),
        timedelta(days=0, seconds=64875, microseconds=8884),
        timedelta(days=2, hours=23, seconds=4975, milliseconds=1),
        timedelta(hours=1, seconds=1, milliseconds=1, microseconds=1),
        timedelta(seconds=-42, milliseconds=42),
        None,
    ]},
)

df.with_columns(
    td_str=pl.col("td").cast(str),
)
# shape: (6, 2)
# ┌───────────────────────────┬───────────────────────────┐
# │ td                        ┆ td_str                    │
# │ ---                       ┆ ---                       │
# │ duration[μs]              ┆ str                       │
# ╞═══════════════════════════╪═══════════════════════════╡
# │ 180d 15h 46m 29s 987654µs ┆ 180d 15h 46m 29s 987654µs │
# │ 18h 1m 15s 8884µs         ┆ 18h 1m 15s 8884µs         │
# │ 3d 22m 55s 1000µs         ┆ 3d 22m 55s 1000µs         │
# │ 1h 1s 1001µs              ┆ 1h 1s 1001µs              │
# │ -41s -958000µs            ┆ -41s -958000µs            │
# │ null                      ┆ null                      │
# └───────────────────────────┴───────────────────────────┘

Footnotes

  1. ISO 8601 duration format

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Nov 6, 2024
@ritchie46
Copy link
Member

I understand the feature, but it feels to me more like a formatting functionality than a cast. I think a to_string with formatting options is more fitting.

@alexander-beedie
Copy link
Collaborator Author

alexander-beedie commented Nov 6, 2024

I understand the feature, but it feels to me more like a formatting functionality than a cast. I think a to_string with formatting options is more fitting.

Yup, can redo it inside a to_string or equivalent; however, the only duration format I'm aware of that has been standardised is ISO8601, which doesn't offer any flexibility: it's always P(n)Y(n)M(n)DT(n)H(n)M(n)S, where S is fractional secs - the standard doesn't separately account for milli/micro/nano (any zero entries can be omitted, as long as at least one entry is present).

We would have to output P(n)DT(n)H(n)M(n)S (as the number of days in Y and M is obviously not fixed), so I don't think there are many opportunities for custom formatting, unless you have any other ideas for that?

(It would be nice to be able to get the same form we display for some purposes; maybe that's why it felt like a cast 🤔)

@alexander-beedie alexander-beedie force-pushed the cast-duration-to-string branch 2 times, most recently from 7437325 to 3e3b435 Compare November 6, 2024 18:12
@ritchie46
Copy link
Member

We already have dt.to_string for those reasons. We don't have to support different formats, but I do think it must be under a to_string method as it is not a cast but a formatting representation IMO.

@alexander-beedie
Copy link
Collaborator Author

alexander-beedie commented Nov 6, 2024

We already have dt.to_string for those reasons. We don't have to support different formats, but I do think it must be under a to_string method as it is not a cast but a formatting representation IMO.

Ok, can look at turning it into an ISO8601 duration via to_string; any formatting directive will probably have to raise an error, as durations don't work with strftime (or perhaps we allow "iso" and "repr" as format strings for duration, defaulting to one of them if unspecified).

The other temporal types default to their respective ISO format via to_string (if not given a format directive), so this would be reasonable 👍

(Parking in Draft; will come back and finish it up tomorrow)

Copy link

codecov bot commented Nov 6, 2024

Codecov Report

Attention: Patch coverage is 87.78626% with 16 lines in your changes missing coverage. Please review.

Project coverage is 79.73%. Comparing base (3cdb7c2) to head (20bbaed).

Files with missing lines Patch % Lines
...lars-core/src/chunked_array/temporal/conversion.rs 66.66% 7 Missing ⚠️
crates/polars-core/src/fmt.rs 91.89% 6 Missing ⚠️
...polars-core/src/chunked_array/temporal/duration.rs 76.92% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #19663   +/-   ##
=======================================
  Coverage   79.73%   79.73%           
=======================================
  Files        1542     1542           
  Lines      212223   212291   +68     
  Branches     2449     2450    +1     
=======================================
+ Hits       169216   169272   +56     
- Misses      42453    42465   +12     
  Partials      554      554           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@alexander-beedie alexander-beedie force-pushed the cast-duration-to-string branch 2 times, most recently from 4d8fdeb to 6b6850c Compare November 7, 2024 15:04
@alexander-beedie alexander-beedie changed the title feat: Support cast from Duration to String feat: Support to_string for Duration dtype Nov 7, 2024
@alexander-beedie alexander-beedie force-pushed the cast-duration-to-string branch 8 times, most recently from 20bbaed to ffe3eab Compare November 8, 2024 04:55
@alexander-beedie alexander-beedie changed the title feat: Support to_string for Duration dtype wip: Support to_string for Duration dtype Nov 8, 2024
@alexander-beedie
Copy link
Collaborator Author

Ok, think I've got it; will re-open under a fresh PR 👌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars title needs formatting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve Duration => Utf8 cast; currently returns stringified physical (Int64) representation
2 participants