pandas 3.0: segmented_index() and set_index_dtypes() #490

hagenw · 2026-01-23T10:06:38Z

Updates code and tests of audformat.utils.set_index_dtypes() and audformat.segmented_index() to be compatible with pandas 3.0.

sourcery-ai · 2026-01-23T10:06:52Z

Reviewer's Guide

Adjusts tests and the implementation of audformat.utils.set_index_dtypes() to align index and MultiIndex dtype handling with pandas 2.x and 3.0, especially for empty indexes, timedelta, datetime, and segmented indexes.

Sequence diagram for set_index_dtypes() MultiIndex timedelta conversion

sequenceDiagram
    participant Caller
    participant Utils as audformat_utils
    participant Pandas as pandas

    Caller->>Utils: set_index_dtypes(index, dtypes)
    Utils->>Utils: Build df from index
    Utils->>Utils: Detect MultiIndex and iterate levels
    Utils->>Utils: Check if level dtype is timedelta64
    alt timedelta64 level
        Utils->>Pandas: to_timedelta(list(df[level]))
        Pandas-->>Utils: TimedeltaArray
        Utils->>Pandas: astype(dtype) on TimedeltaArray
        Pandas-->>Utils: TimedeltaArray with target dtype
        Utils->>Utils: Assign back to df[level]
    else other dtype
        Utils->>Pandas: astype(dtype) on df[level]
        Pandas-->>Utils: Series with target dtype
        Utils->>Utils: Assign back to df[level]
    end
    Utils->>Pandas: MultiIndex.from_frame(df)
    Pandas-->>Utils: MultiIndex with updated dtypes
    Utils-->>Caller: Converted index

Flow diagram for updated set_index_dtypes() timedelta handling

flowchart TD
    A[start set_index_dtypes] --> B["Create DataFrame df from index"]
    B --> C{"Index is MultiIndex?"}
    C -- No --> D["Handle non-MultiIndex dtypes (not shown)"]
    D --> Z["Return converted index"]
    C -- Yes --> E["Iterate over MultiIndex levels"]
    E --> F{"Target level dtype is timedelta64?"}
    F -- No --> G["df[level] = df[level].astype(dtype)"]
    G --> H["Continue with next level or finish"]
    F -- Yes --> I["df[level] = pd.to_timedelta(list(df[level]))"]
    I --> J["df[level] = df[level].astype(dtype)"]
    J --> H
    H --> K["index = pd.MultiIndex.from_frame(df)"]
    K --> Z["Return converted index"]
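
The MultiIndex branch of the flow above can be sketched roughly as follows. This is an illustration, not the code in `audformat/core/utils.py`: the helper name `convert_multiindex_levels` is hypothetical, and detecting timedelta targets via `pandas_dtype(...).kind` is an assumption about how such a check could be written.

```python
import pandas as pd


def convert_multiindex_levels(index: pd.MultiIndex, dtypes: dict) -> pd.MultiIndex:
    """Hypothetical helper mirroring the MultiIndex branch of the diagram."""
    # Build a DataFrame with one column per index level.
    df = index.to_frame(index=False)
    for level, dtype in dtypes.items():
        # Assumption: a target dtype counts as timedelta if its kind is "m".
        if pd.api.types.pandas_dtype(dtype).kind == "m":
            # Convert from a plain list first, then cast to the requested dtype.
            values = pd.to_timedelta(list(df[level]))
            df[level] = values.astype(dtype)
        else:
            df[level] = df[level].astype(dtype)
    return pd.MultiIndex.from_frame(df)


index = pd.MultiIndex.from_arrays(
    [["a.wav", "b.wav"], [0, 1], [1, 2]],
    names=["file", "start", "end"],
)
converted = convert_multiindex_levels(
    index,
    {"start": "timedelta64[ns]", "end": "timedelta64[ns]"},
)
```

The explicit `astype(dtype)` after `pd.to_timedelta()` is the step the diagram highlights: it pins the level to the requested dtype rather than whatever resolution the conversion picks by default.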

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Update set_index_dtypes() to correctly cast timedelta index levels to the requested dtype under pandas 3.0. | When a MultiIndex level needs a timedelta64 dtype, wrap pd.to_timedelta() with .astype(dtype) instead of relying on default timedelta casting.<br>Preserves the intended target dtype for timedelta levels while avoiding TypeError when casting from datetime-like data. | `audformat/core/utils.py` |
| Align tests for set_index_dtypes() with pandas 2.x and 3.0 index dtype behavior. | Explicitly set dtype='object' for string-like Index inputs where pandas 3 changes the default dtype.<br>Force timedelta and datetime arrays used in MultiIndex construction to have explicit 'timedelta64[ns]' or 'datetime64[ns]' dtypes for stable comparisons across pandas versions.<br>Replace audformat.segmented_index() in one test with an explicit MultiIndex.from_arrays construction that fixes the dtypes of file, start, and end levels.<br>Use 'str' instead of 'string' in dtypes mapping where the implementation expects NumPy/pandas dtype aliases instead of the pandas StringDtype name.<br>Add a new test case for empty segmented MultiIndex where start/end levels are converted from int64/object to timedelta64[ns] to cover the pandas 3.0 regression scenario. | `tests/test_utils.py` |
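
As a rough illustration of the dtype pinning described for the tests, the sketch below builds an expected segmented-style MultiIndex with explicit dtypes so that the comparison does not depend on version-specific defaults. The file names and values are made up, and this is not the code from `tests/test_utils.py`.

```python
import pandas as pd

# Pin the string-like level to object instead of relying on the default dtype.
files = pd.Index(["f1.wav", "f2.wav"], dtype="object", name="file")

# Pin timedelta levels to nanosecond resolution.
starts = pd.to_timedelta([0, 1], unit="s").astype("timedelta64[ns]")
ends = pd.to_timedelta([1, 2], unit="s").astype("timedelta64[ns]")

# The same idea applies to datetime values.
dates = pd.to_datetime(["2026-01-23", "2026-01-24"]).astype("datetime64[ns]")

expected = pd.MultiIndex.from_arrays(
    [files, starts, ends],
    names=["file", "start", "end"],
)
```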

sourcery-ai

Hey - I've left some high-level feedback:

- In the tests, `dtypes` sometimes uses "string" and sometimes "str" for string conversion; consider standardizing on one of these to avoid ambiguity about what the helper is supposed to accept.
- The new `astype(dtype)` after `pd.to_timedelta(list(df[level]))` in `set_index_dtypes()` assumes `dtype` is a NumPy timedelta64 dtype (e.g. `timedelta64[ns]`); if other timedelta-like specifiers are valid inputs to `dtypes`, it may be safer to normalize/validate `dtype` before casting.
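
One possible shape for the normalization suggested in the second point is sketched below. The helper `normalize_timedelta_dtype` is hypothetical and not part of audformat; it only assumes that valid targets are NumPy dtypes of kind `"m"`.

```python
import numpy as np
import pandas as pd


def normalize_timedelta_dtype(dtype) -> np.dtype:
    """Hypothetical validator: resolve a dtype specifier and require timedelta64."""
    # Resolve strings such as "timedelta64[ns]" or "m8[ns]" to a dtype object.
    resolved = pd.api.types.pandas_dtype(dtype)
    # Reject extension dtypes and any NumPy dtype that is not a timedelta.
    if not (isinstance(resolved, np.dtype) and resolved.kind == "m"):
        raise ValueError(f"Expected a timedelta64 dtype, got {dtype!r}")
    return resolved


target = normalize_timedelta_dtype("timedelta64[ns]")
values = pd.to_timedelta([0, 1], unit="s").astype(target)
```

Calling such a helper before `astype(dtype)` would turn an unexpected specifier into an explicit error at the point of validation instead of a failure inside the cast.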

codecov · 2026-01-23T10:08:42Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.0%. Comparing base (3c88176) to head (0c775ee).
⚠️ Report is 1 commit behind head on dev.

Additional details and impacted files

Files with missing lines	Coverage Δ
audformat/core/index.py	`100.0% <100.0%> (ø)`
audformat/core/utils.py	`100.0% <100.0%> (ø)`

* Add failing test
* Make test pandas 3.0.0 compatible
* Fix set_index_dtypes() for pandas 3.0
* Add comment
* Fix doctests
* Update segmented_index()
* Use segmented_index in test
* Add test for segmented_index

* pandas 3.0: segmented_index() and set_index_dtypes() (#490)
* Add failing test
* Make test pandas 3.0.0 compatible
* Fix set_index_dtypes() for pandas 3.0
* Add comment
* Fix doctests
* Update segmented_index()
* Use segmented_index in test
* Add test for segmented_index
* Avoid warning in testing.add_table() (#491)
* pandas 3.0: fix utils.hash() (#492)
* pandas 3.0: fix utils.hash()
* Fix comment
* Remove unneeded code
* Add more tests
* Preserve ordered setting
* Update comment
* Fix categorical dtype with Database.get() (#493)
* Fix categorical dtype with Database.get()
* Update tests
* Add additional test
* Improve code
* Clean up comment
* We converted to categorical data
* Simplify test
* Simplify string test
* Require timedelta64[ns] in assert_index() (#494)
* Require timedelta64[ns] in assert_index()
* Add tests for mixed cases
* pandas 3.0: fix doctests output

sourcery-ai bot reviewed Jan 23, 2026

hagenw added 5 commits January 23, 2026 11:32

Add failing test

1598844

Make test pandas 3.0.0 compatible

697add8

Fix set_index_dtypes() for pandas 3.0

863059e

Add comment

3b56f3b

Fix doctests

8b968b4

hagenw force-pushed the fix-set-index-dtypes branch from 4eee423 to 8b968b4 January 23, 2026 10:32

hagenw changed the base branch from main to dev January 23, 2026 10:33

hagenw added 3 commits January 23, 2026 11:40

Update segmented_index()

a387e3a

Use segmented_index in test

85982ab

Add test for segmented_index

0c775ee

hagenw changed the title ~~Fix set_index_dtypes() for pandas 3.0~~ pandas 3.0: segmented_index() and set_index_dtypes() Jan 23, 2026

hagenw merged commit 5d0a915 into dev Jan 23, 2026
13 checks passed

hagenw deleted the fix-set-index-dtypes branch January 23, 2026 10:46
