
@hagenw hagenw commented Jan 22, 2026

Closes #487

This adds tests and fixes for the new pandas==3.0.0 default of timedelta64[s] instead of timedelta64[ns].

This required the following fixes:

| Location | Fix |
| --- | --- |
| audformat.utils.to_segmented_index() | Convert ends to timedelta64[ns] before iloc assignment |
| audformat.utils.union() | Normalize timedelta dtypes to timedelta64[ns] in all code paths |
| audformat.utils.intersect() | Normalize timedelta dtypes to timedelta64[ns] |
| audformat.utils.set_index_dtypes() | Add .astype(dtype) after pd.to_timedelta() for empty levels |
| audformat.segmented_index() | Call set_index_dtypes() to ensure timedelta64[ns] |
| audformat.testing.add_table() | Remove unnecessary pd.to_timedelta() call |
| audformat.utils.hash() | Enforce object dtype for string columns to get the same hash under Python 3.14 |
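The "normalize to timedelta64[ns]" rows above can be illustrated with a minimal sketch, assuming a segmented MultiIndex with levels named file, start, and end as in audformat. The helper normalize_timedelta_levels is hypothetical and not part of audformat; it casts every timedelta level to nanosecond resolution, whatever resolution pandas inferred.

```python
import pandas as pd


def normalize_timedelta_levels(index: pd.MultiIndex) -> pd.MultiIndex:
    """Cast every timedelta level of ``index`` to timedelta64[ns].

    Hypothetical helper for illustration; not part of audformat.
    """
    for position, level in enumerate(index.levels):
        # numpy dtype kind "m" identifies timedelta64 of any resolution
        if level.dtype.kind == "m":
            index = index.set_levels(
                level.astype("timedelta64[ns]"), level=position
            )
    return index


# Segmented index whose start/end levels use second resolution,
# as may happen with pandas 3.0 defaults
index = pd.MultiIndex.from_arrays(
    [
        ["a.wav", "b.wav"],
        pd.to_timedelta([0, 1], unit="s").astype("timedelta64[s]"),
        pd.to_timedelta([1, 2], unit="s").astype("timedelta64[s]"),
    ],
    names=["file", "start", "end"],
)
index = normalize_timedelta_levels(index)
print(index.get_level_values("end").dtype)  # timedelta64[ns]
```

Casting from seconds to nanoseconds keeps realistic audio durations unchanged, so the level values stay unique and set_levels() accepts them.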

Summary by Sourcery

Ensure segmented index duration handling preserves sub-second precision with pandas 3.0 and later.

Bug Fixes:

  • Fix to_segmented_index to always use nanosecond timedelta precision, so that assigning high-precision durations no longer raises a TypeError or emits a FutureWarning with pandas 3.0.

Tests:

  • Add a regression test verifying that to_segmented_index correctly handles sub-second duration values when the index uses second-level timedelta precision, and that no FutureWarning is raised.


sourcery-ai bot commented Jan 22, 2026

Reviewer's Guide

Adjusts audformat.utils.to_segmented_index() to preserve nanosecond timedelta precision under pandas 3.0’s new default of second precision, and adds a regression test to guard against FutureWarnings and precision loss when filling NaT segment ends from file durations.

File-Level Changes

Change: Ensure to_segmented_index preserves sub-second timedelta precision under pandas 3.0 and add a regression test for the behavior.
Details:
  • Add a pytest that constructs a MultiIndex with timedelta64[s] start/end levels, applies to_segmented_index with sub-second file durations, and asserts that no warnings are raised and that precise end times are preserved.
  • Update the logic that replaces NaT entries in the segmented index end level: first convert the ends array to a Series and cast it to timedelta64[ns], then assign the duration values. This prevents dtype incompatibilities and precision loss with the pandas 3.0 timedelta defaults.
Files: tests/test_utils.py, audformat/core/utils.py

Assessment against linked issues

| Issue | Objective |
| --- | --- |
| #487 | Update audformat.utils.to_segmented_index() so it correctly handles pandas 3.0.0's changed default timedelta precision (seconds instead of nanoseconds), preserving sub-second precision and avoiding type/warning issues. |
| #487 | Add tests that verify to_segmented_index() works with pandas 3.0.0, including sub-second duration handling and absence of FutureWarning/TypeError due to timedelta dtype incompatibilities. |



@sourcery-ai sourcery-ai bot left a comment


Hey, I've left some high-level feedback:

  • Before calling ends = ends.astype("timedelta64[ns]"), consider guarding with a check that ends has a timedelta-like dtype (e.g., using is_timedelta64_dtype) to avoid surprising failures if the index level type changes in the future.
  • In test_to_segmented_index_timedelta_precision, you can simplify and make the duration comparison more robust by using pandas.testing.assert_index_equal (or assert_series_equal) for result_ends vs expected_ends instead of the manual loop.
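The first suggestion could look like the following sketch. as_ns_timedelta is a hypothetical name, and raising TypeError is one possible failure mode, not something the PR settles. The usage part applies the second suggestion by comparing whole Series with pandas.testing.assert_series_equal instead of looping over elements.

```python
import pandas as pd
from pandas.api.types import is_timedelta64_dtype


def as_ns_timedelta(values) -> pd.Series:
    """Cast ``values`` to timedelta64[ns], rejecting non-timedelta input.

    Hypothetical helper illustrating the suggested guard.
    """
    series = pd.Series(values)
    if not is_timedelta64_dtype(series.dtype):
        # Fail with a clear message instead of relying on whatever
        # astype() would do with numbers or strings
        raise TypeError(f"expected a timedelta dtype, got {series.dtype}")
    return series.astype("timedelta64[ns]")


ends = pd.Series(pd.to_timedelta([1.0, float("nan")], unit="s")).astype(
    "timedelta64[s]"
)
result = as_ns_timedelta(ends)

# One call compares dtype, index, and values instead of a manual loop
expected = pd.Series(pd.to_timedelta([1.0, float("nan")], unit="s")).astype(
    "timedelta64[ns]"
)
pd.testing.assert_series_equal(result, expected)
```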

@hagenw hagenw marked this pull request as draft January 22, 2026 15:05

hagenw commented Jan 22, 2026

We now have too many changes here and have introduced many new issues that were not present before, so it is probably better to address the updates step by step in several smaller pull requests.



Development

Successfully merging this pull request may close these issues.

Pandas 3.0.0 breaks timedelta precision
