-
Notifications
You must be signed in to change notification settings - Fork 1
pandas 3.0: segmented_index() and set_index_dtypes() #490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's GuideAdjusts tests and the implementation of audformat.utils.set_index_dtypes() to align index and MultiIndex dtype handling with pandas 2.x and 3.0, especially for empty indexes, timedelta, datetime, and segmented indexes. Sequence diagram for set_index_dtypes() MultiIndex timedelta conversionsequenceDiagram
participant Caller
participant Utils as audformat_utils
participant Pandas as pandas
Caller->>Utils: set_index_dtypes(index, dtypes)
Utils->>Utils: Build df from index
Utils->>Utils: Detect MultiIndex and iterate levels
Utils->>Utils: Check if level dtype is timedelta64
alt timedelta64 level
Utils->>Pandas: to_timedelta(list(df[level]))
Pandas-->>Utils: TimedeltaArray
Utils->>Pandas: astype(dtype) on TimedeltaArray
Pandas-->>Utils: TimedeltaArray with target dtype
Utils->>Utils: Assign back to df[level]
else other dtype
Utils->>Pandas: astype(dtype) on df[level]
Pandas-->>Utils: Series with target dtype
Utils->>Utils: Assign back to df[level]
end
Utils->>Pandas: MultiIndex.from_frame(df)
Pandas-->>Utils: MultiIndex with updated dtypes
Utils-->>Caller: Converted index
Flow diagram for updated set_index_dtypes() timedelta handlingflowchart TD
A[start set_index_dtypes] --> B["Create DataFrame df from index"]
B --> C{"Index is MultiIndex?"}
C -- No --> D["Handle non-MultiIndex dtypes (not shown)"]
D --> Z["Return converted index"]
C -- Yes --> E["Iterate over MultiIndex levels"]
E --> F{"Target level dtype is timedelta64?"}
F -- No --> G["df[level] = df[level].astype(dtype)"]
G --> H["Continue with next level or finish"]
F -- Yes --> I["df[level] = pd.to_timedelta(list(df[level]))"]
I --> J["df[level] = df[level].astype(dtype)"]
J --> H
H --> K["index = pd.MultiIndex.from_frame(df)"]
K --> Z["Return converted index"]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've left some high level feedback:
- In the tests,
dtypessometimes uses "string" and sometimes "str" for string conversion; consider standardizing on one of these to avoid ambiguity about what the helper is supposed to accept. - The new
astype(dtype)afterpd.to_timedelta(list(df[level]))inset_index_dtypes()assumesdtypeis a NumPy timedelta64 dtype (e.g.timedelta64[ns]); if other timedelta-like specifiers are valid inputs todtypes, it may be safer to normalize/validatedtypebefore casting.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In the tests, `dtypes` sometimes uses "string" and sometimes "str" for string conversion; consider standardizing on one of these to avoid ambiguity about what the helper is supposed to accept.
- The new `astype(dtype)` after `pd.to_timedelta(list(df[level]))` in `set_index_dtypes()` assumes `dtype` is a NumPy timedelta64 dtype (e.g. `timedelta64[ns]`); if other timedelta-like specifiers are valid inputs to `dtypes`, it may be safer to normalize/validate `dtype` before casting.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files
🚀 New features to boost your workflow:
|
4eee423 to
8b968b4
Compare
* Add failing test * Make test pandas 3.0.0 compatible * Fix set_index_dtypes() for pandas 3.0 * Add comment * Fix doctests * Update segmented_index() * Use segmented_index in test * Add test for segmented_index
* pandas 3.0: segmented_index() and set_index_dtypes() (#490) * Add failing test * Make test pandas 3.0.0 compatible * Fix set_index_dtypes() for pandas 3.0 * Add comment * Fix doctests * Update segmented_index() * Use segmented_index in test * Add test for segmented_index * Avoid warning in testing.add_table() (#491) * pandas 3.0: fix utils.hash() (#492) * pandas 3.0: fix utils.hash() * Fix comment * Remove unneeded code * Add more tests * Preserve ordered setting * Update comment * Fix categorical dtype with Database.get() (#493) * Fix categorical dtype with Database.get() * Update tests * Add additional test * Improve code * Clean up comment * We converted to categorical data * Simplify test * Simplify string test * Require timedelta64[ns] in assert_index() (#494) * Require timedelta64[ns] in assert_index() * Add tests for mixed cases * pandas 3.0: fix doctests output
Updates code and tests of
audformat.utils.set_index_dtypes()andaudformat.segmented_index()to be compatible with pandas 3.0.