@hagenw hagenw commented Jan 23, 2026

Updates code and tests of audformat.utils.set_index_dtypes() and audformat.segmented_index() to be compatible with pandas 3.0.

sourcery-ai bot commented Jan 23, 2026

Reviewer's Guide

Adjusts tests and the implementation of audformat.utils.set_index_dtypes() to align index and MultiIndex dtype handling with pandas 2.x and 3.0, especially for empty indexes, timedelta, datetime, and segmented indexes.

Sequence diagram for set_index_dtypes() MultiIndex timedelta conversion

sequenceDiagram
    participant Caller
    participant Utils as audformat_utils
    participant Pandas as pandas

    Caller->>Utils: set_index_dtypes(index, dtypes)
    Utils->>Utils: Build df from index
    Utils->>Utils: Detect MultiIndex and iterate levels
    Utils->>Utils: Check if level dtype is timedelta64
    alt timedelta64 level
        Utils->>Pandas: to_timedelta(list(df[level]))
        Pandas-->>Utils: TimedeltaArray
        Utils->>Pandas: astype(dtype) on TimedeltaArray
        Pandas-->>Utils: TimedeltaArray with target dtype
        Utils->>Utils: Assign back to df[level]
    else other dtype
        Utils->>Pandas: astype(dtype) on df[level]
        Pandas-->>Utils: Series with target dtype
        Utils->>Utils: Assign back to df[level]
    end
    Utils->>Pandas: MultiIndex.from_frame(df)
    Pandas-->>Utils: MultiIndex with updated dtypes
    Utils-->>Caller: Converted index

Flow diagram for updated set_index_dtypes() timedelta handling

flowchart TD
    A[start set_index_dtypes] --> B["Create DataFrame df from index"]
    B --> C{"Index is MultiIndex?"}
    C -- No --> D["Handle non-MultiIndex dtypes (not shown)"]
    D --> Z["Return converted index"]
    C -- Yes --> E["Iterate over MultiIndex levels"]
    E --> F{"Target level dtype is timedelta64?"}
    F -- No --> G["df[level] = df[level].astype(dtype)"]
    G --> H["Continue with next level or finish"]
    F -- Yes --> I["df[level] = pd.to_timedelta(list(df[level]))"]
    I --> J["df[level] = df[level].astype(dtype)"]
    J --> H
    H --> K["index = pd.MultiIndex.from_frame(df)"]
    K --> Z["Return converted index"]
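The MultiIndex branch of the flow above can be sketched roughly as follows. This is a minimal illustration, not the code in audformat/core/utils.py: the helper name cast_levels and the example index are assumptions made for this sketch, and the real function handles more cases (for instance non-MultiIndex inputs).

```python
import pandas as pd


def cast_levels(index: pd.MultiIndex, dtypes: dict) -> pd.MultiIndex:
    """Hypothetical helper following the flow diagram, not audformat's code."""
    # Build a DataFrame with one column per index level
    df = index.to_frame(index=False)
    for level, dtype in dtypes.items():
        if pd.api.types.pandas_dtype(dtype).kind == "m":
            # Timedelta target: rebuild the values from a plain list,
            # then cast explicitly to the requested dtype
            df[level] = pd.to_timedelta(list(df[level])).astype(dtype)
        else:
            df[level] = df[level].astype(dtype)
    return pd.MultiIndex.from_frame(df)


# Example index (an assumption): start/end given as integer nanoseconds
index = pd.MultiIndex.from_arrays(
    [["f1.wav", "f2.wav"], [0, 1], [1, 2]],
    names=["file", "start", "end"],
)
converted = cast_levels(
    index,
    {"start": "timedelta64[ns]", "end": "timedelta64[ns]"},
)
print(converted.dtypes)
```

The explicit astype(dtype) after pd.to_timedelta() means the resulting level does not depend on whichever resolution pandas picks by default.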

File-Level Changes

Change: Update set_index_dtypes() to correctly cast timedelta index levels to the requested dtype under pandas 3.0.
Details:
  • When a MultiIndex level needs a timedelta64 dtype, chain .astype(dtype) onto pd.to_timedelta() instead of relying on default timedelta casting.
  • Preserve the intended target dtype for timedelta levels while avoiding a TypeError when casting from datetime-like data.
Files: audformat/core/utils.py

Change: Align tests for set_index_dtypes() with pandas 2.x and 3.0 index dtype behavior.
Details:
  • Explicitly set dtype='object' for string-like Index inputs where pandas 3 changes the default dtype.
  • Force timedelta and datetime arrays used in MultiIndex construction to have explicit 'timedelta64[ns]' or 'datetime64[ns]' dtypes for stable comparisons across pandas versions.
  • Replace audformat.segmented_index() in one test with an explicit MultiIndex.from_arrays construction that fixes the dtypes of the file, start, and end levels.
  • Use 'str' instead of 'string' in the dtypes mapping where the implementation expects NumPy/pandas dtype aliases instead of the pandas StringDtype name.
  • Add a new test case for an empty segmented MultiIndex where the start/end levels are converted from int64/object to timedelta64[ns] to cover the pandas 3.0 regression scenario.
Files: tests/test_utils.py
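As an illustration of the test changes listed above, an expected index can be built from arrays whose dtypes are pinned explicitly, so that the comparison does not rely on version-specific defaults. This is only a sketch under assumptions: the file names and values are made up, and it is not copied from tests/test_utils.py.

```python
import pandas as pd

# Pin the dtype of every input array instead of relying on pandas defaults
files = pd.Index(["f1.wav", "f2.wav"], dtype="object")
starts = pd.to_timedelta([0, 1], unit="s").astype("timedelta64[ns]")
ends = pd.to_timedelta([1, 2], unit="s").astype("timedelta64[ns]")

# Explicit construction in place of a convenience helper
expected = pd.MultiIndex.from_arrays(
    [files, starts, ends],
    names=["file", "start", "end"],
)
print(expected.dtypes)
```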

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • In the tests, dtypes sometimes uses "string" and sometimes "str" for string conversion; consider standardizing on one of these to avoid ambiguity about what the helper is supposed to accept.
  • The new astype(dtype) after pd.to_timedelta(list(df[level])) in set_index_dtypes() assumes dtype is a NumPy timedelta64 dtype (e.g. timedelta64[ns]); if other timedelta-like specifiers are valid inputs to dtypes, it may be safer to normalize/validate dtype before casting.
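
One possible shape for the normalization suggested in the second point is sketched below. The helper normalize_timedelta_dtype is hypothetical and not part of audformat; the sketch assumes that only NumPy timedelta64 dtypes with a resolution of s, ms, us, or ns should be accepted.

```python
import numpy as np
import pandas as pd


def normalize_timedelta_dtype(dtype) -> np.dtype:
    """Hypothetical validator, not part of audformat."""
    # Resolve strings such as "timedelta64[ns]" to a dtype object
    resolved = pd.api.types.pandas_dtype(dtype)
    if not (isinstance(resolved, np.dtype) and resolved.kind == "m"):
        raise TypeError(f"Expected a timedelta64 dtype, got {dtype!r}")
    # Assumption: accept only the resolutions s, ms, us, and ns
    unit, _ = np.datetime_data(resolved)
    if unit not in ("s", "ms", "us", "ns"):
        raise ValueError(f"Unsupported timedelta resolution: {unit!r}")
    return resolved


print(normalize_timedelta_dtype("timedelta64[ns]"))
```

Calling such a check before astype(dtype) would turn an unexpected specifier into an early, descriptive error instead of a failure inside the cast.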

codecov bot commented Jan 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.0%. Comparing base (3c88176) to head (0c775ee).
⚠️ Report is 1 commit behind head on dev.

Additional details and impacted files
Files with missing lines | Coverage Δ
audformat/core/index.py | 100.0% <100.0%> (ø)
audformat/core/utils.py | 100.0% <100.0%> (ø)

@hagenw hagenw force-pushed the fix-set-index-dtypes branch from 4eee423 to 8b968b4 Compare January 23, 2026 10:32
@hagenw hagenw changed the base branch from main to dev January 23, 2026 10:33
@hagenw hagenw changed the title Fix set_index_dtypes() for pandas 3.0 pandas 3.0: segmented_index() and set_index_dtypes() Jan 23, 2026
@hagenw hagenw merged commit 5d0a915 into dev Jan 23, 2026
13 checks passed
@hagenw hagenw deleted the fix-set-index-dtypes branch January 23, 2026 10:46
hagenw added a commit that referenced this pull request Jan 24, 2026
* Add failing test

* Make test pandas 3.0.0 compatible

* Fix set_index_dtypes() for pandas 3.0

* Add comment

* Fix doctests

* Update segmented_index()

* Use segmented_index in test

* Add test for segmented_index
hagenw added a commit that referenced this pull request Jan 24, 2026
* pandas 3.0: segmented_index() and set_index_dtypes() (#490)

* Add failing test

* Make test pandas 3.0.0 compatible

* Fix set_index_dtypes() for pandas 3.0

* Add comment

* Fix doctests

* Update segmented_index()

* Use segmented_index in test

* Add test for segmented_index

* Avoid warning in testing.add_table() (#491)

* pandas 3.0: fix utils.hash() (#492)

* pandas 3.0: fix utils.hash()

* Fix comment

* Remove unneeded code

* Add more tests

* Preserve ordered setting

* Update comment

* Fix categorical dtype with Database.get() (#493)

* Fix categorical dtype with Database.get()

* Update tests

* Add additional test

* Improve code

* Clean up comment

* We converted to categorical data

* Simplify test

* Simplify string test

* Require timedelta64[ns] in assert_index() (#494)

* Require timedelta64[ns] in assert_index()

* Add tests for mixed cases

* pandas 3.0: fix doctests output