Fix to_segmented_index() for pandas 3.0 #488
base: main
Conversation
Reviewer's Guide
Adjusts `audformat.utils.to_segmented_index()` to preserve nanosecond timedelta precision under pandas 3.0’s new default of second precision, and adds a regression test to guard against FutureWarnings and precision loss when filling NaT segment ends from file durations.
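The idea described above can be sketched as follows. This is a minimal illustration, not the code from the pull request: the helper name `fill_missing_ends` and the sample values are hypothetical, and it only assumes that segment ends and file durations are available as aligned timedelta sequences.

```python
import numpy as np
import pandas as pd


def fill_missing_ends(ends, durations):
    """Fill NaT entries in `ends` from `durations` (hypothetical helper)."""
    # Cast to nanosecond resolution first, so that the positional
    # assignment below cannot truncate sub-second durations
    ends = pd.Series(ends).astype("timedelta64[ns]")
    durations = pd.Series(durations).astype("timedelta64[ns]")
    # Boolean mask of positions where the segment end is missing
    missing = ends.isna().to_numpy()
    # Fill the missing ends with the matching file durations
    ends.iloc[missing] = durations.to_numpy()[missing]
    return ends


# Second segment has no end (NaT); its file lasts 2.123456789 s
ends = pd.to_timedelta([1.0, np.nan], unit="s")
durations = pd.TimedeltaIndex(
    np.array([1_000_000_000, 2_123_456_789], dtype="timedelta64[ns]")
)
filled = fill_missing_ends(ends, durations)
```

Casting before the assignment keeps the result independent of whatever resolution pandas infers for the incoming `ends`.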
Hey - I've left some high level feedback:
- Before calling `ends = ends.astype("timedelta64[ns]")`, consider guarding with a check that `ends` has a timedelta-like dtype (e.g., using `is_timedelta64_dtype`) to avoid surprising failures if the index level type changes in the future.
- In `test_to_segmented_index_timedelta_precision`, you can simplify and make the duration comparison more robust by using `pandas.testing.assert_index_equal` (or `assert_series_equal`) for `result_ends` vs `expected_ends` instead of the manual loop.
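Both suggestions can be illustrated with a short sketch. It is not the code under review: the helper `to_nanoseconds` and the sample data are hypothetical, while `pandas.api.types.is_timedelta64_dtype` and `pandas.testing.assert_index_equal` are existing pandas functions.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_timedelta64_dtype
from pandas.testing import assert_index_equal


def to_nanoseconds(ends):
    """Cast `ends` to timedelta64[ns] only if it is timedelta-like."""
    # Guard: leave non-timedelta levels untouched instead of failing
    if is_timedelta64_dtype(ends.dtype):
        return ends.astype("timedelta64[ns]")
    return ends


# Index with second resolution, as an assumed stand-in for the new default
second_ends = pd.TimedeltaIndex(np.array([1, 2], dtype="timedelta64[s]"))
result_ends = to_nanoseconds(second_ends)

expected_ends = pd.TimedeltaIndex(
    np.array([1_000_000_000, 2_000_000_000], dtype="timedelta64[ns]")
)
# One call compares values and dtype, replacing a manual loop
assert_index_equal(result_ends, expected_ends)

# A non-timedelta index passes through unchanged
untouched = to_nanoseconds(pd.Index([1, 2], dtype="int64"))
```

Because `assert_index_equal` also compares the dtype by default, a silent fallback to second resolution would fail the check.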
This pull request now contains too many changes and has introduced many new issues that were not present before, so it would be better to split the updates into several pull requests and address them step by step.
Closes #487
This adds tests and fixes for the new `pandas==3.0.0` default of `timedelta64[s]` instead of `timedelta64[ns]`.
This required the following fixes:
- `audformat.utils.to_segmented_index()`: `timedelta64[ns]` before iloc assignment
- `audformat.utils.union()`: `timedelta64[ns]` in all code paths
- `audformat.utils.intersect()`: `timedelta64[ns]`
- `audformat.utils.set_index_dtypes()`: `.astype(dtype)` after `pd.to_timedelta()` for empty levels
- `audformat.segmented_index()`: `set_index_dtypes` to ensure `timedelta64[ns]`
- `audformat.testing.add_table()`: `pd.to_timedelta()` call
- `audformat.utils.hash()`: `object` dtype for string columns to get same hash under Python 3.14

Summary by Sourcery
Ensure segmented index duration handling preserves sub-second precision with pandas 3.0 and later.
Bug Fixes:
Tests:
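A minimal regression-style check for the empty-level case mentioned in the description (`.astype(dtype)` after `pd.to_timedelta()`) might look like the sketch below. The helper `empty_segmented_index` is hypothetical and does not reproduce the audformat implementation; the level names `file`, `start`, and `end` are assumed for illustration.

```python
import pandas as pd

DTYPE = "timedelta64[ns]"


def empty_segmented_index():
    """Build an empty (file, start, end) index (hypothetical helper)."""
    files = pd.Index([], dtype="object")
    # Pin the resolution explicitly: with no values to inspect, the
    # resulting dtype would otherwise depend on the pandas default
    starts = pd.to_timedelta([]).astype(DTYPE)
    ends = pd.to_timedelta([]).astype(DTYPE)
    return pd.MultiIndex.from_arrays(
        [files, starts, ends],
        names=["file", "start", "end"],
    )


index = empty_segmented_index()
start_dtype = index.get_level_values("start").dtype
end_dtype = index.get_level_values("end").dtype
```

Checking the dtype on an empty index is useful because there are no values from which a mismatch could otherwise be noticed.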