#28283, Finalize coverage for DataFrame.merge #1
+103
−7
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
Issue pandas-dev#28283
Evaluation and Solution Summary
Issue Analysis:
GitHub issue pandas-dev#28283 tracks improving the coverage of NDFrame.__finalize__ in pandas. Many pandas methods, including DataFrame.merge, do not call __finalize__, so metadata such as attrs and flags is not propagated from the input DataFrames to the result.
Problem:
When you merge DataFrames that carry metadata (stored in .attrs), the resulting DataFrame loses that metadata because the merge code path does not call __finalize__.
Solution Components:
Core Fix: Modify the merge-related functions in pandas to call __finalize__ after creating the result DataFrame.
Key Files to Modify:
Implementation Strategy:
Add result.__finalize__(left, method="merge") calls after merge operations
Use the left DataFrame as the primary source for metadata propagation
Ensure all join types (inner, outer, left, right) and merge_asof are covered
Handle both DataFrame-DataFrame and DataFrame-Series merges
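The strategy above can be sketched with a toy model. This is not pandas code: ToyFrame and toy_merge are hypothetical stand-ins that only mimic the shape of the proposed change, namely building the merged result first and then calling __finalize__ with the left input so that its attrs win.

```python
from copy import deepcopy


class ToyFrame:
    """Hypothetical stand-in for a DataFrame: rows plus an attrs dict."""

    def __init__(self, rows, attrs=None):
        self.rows = list(rows)
        self.attrs = dict(attrs or {})

    def __finalize__(self, other, method=None):
        # Copy metadata from `other` onto `self`. A deep copy keeps later
        # changes to the result from leaking back into the input.
        # `method` is accepted only to mirror the call shape; it is unused.
        if isinstance(other, ToyFrame) and other.attrs:
            self.attrs = deepcopy(other.attrs)
        return self


def toy_merge(left, right, on):
    """Inner-join two ToyFrames on `on`, then propagate left's attrs."""
    # Index the right-hand rows by key for lookup.
    by_key = {}
    for row in right.rows:
        by_key.setdefault(row[on], []).append(row)

    joined = [
        {**lrow, **rrow}
        for lrow in left.rows
        for rrow in by_key.get(lrow[on], [])
    ]

    result = ToyFrame(joined)
    # The one-line change the strategy describes: finalize from `left`.
    return result.__finalize__(left, method="merge")


left = ToyFrame([{"key": 1, "a": 10}, {"key": 2, "a": 20}],
                attrs={"source": "left.csv"})
right = ToyFrame([{"key": 1, "b": 30}], attrs={"source": "right.csv"})
merged = toy_merge(left, right, on="key")
print(merged.rows, merged.attrs)
```

Because both inputs define "source", the result keeps the left value, which is the conflict rule this PR proposes. Whether the real implementation should instead drop conflicting keys is a design question this sketch does not settle.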
Testing Strategy:
Benefits of the Fix:
Preserves important metadata during merge operations
Maintains consistency with other pandas operations that already call __finalize__
Enables better data lineage tracking
Supports custom metadata propagation workflows
Implementation Notes:
The fix follows pandas' existing pattern of calling __finalize__ in similar operations
Metadata conflicts are resolved by preferring the left DataFrame's attributes
The solution is backward compatible and doesn't change the existing API
Performance impact is minimal since __finalize__ is only called once per operation
This solution addresses the specific DataFrame.merge part of the broader issue pandas-dev#28283 and provides a template for fixing other methods mentioned in the issue.