ENH: Add sort_columns parameter to combine_first #60437
This PR adds a new parameter, sort_columns, to pandas.DataFrame.combine_first. It lets users choose whether the result's columns are sorted lexicographically or keep the order of the calling DataFrame (self).
Currently, combine_first returns its columns in lexicographically sorted order. Users who want to keep the column order of the original DataFrame may not want this.
With the new sort_columns parameter:
Default Behavior (sort_columns=True): Columns remain sorted as before.
New Behavior (sort_columns=False): Columns retain the order from the original DataFrame (self).
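The two behaviors above can be approximated with released pandas by post-processing the result of combine_first. The sketch below is illustrative only and is not the PR's implementation. combine_first_ordered is a hypothetical helper, not a pandas API. The PR does not say where columns that exist only in the other DataFrame should go, so this sketch assumes they are appended after self's columns, in the other DataFrame's order.

```python
import pandas as pd


def combine_first_ordered(
    left: pd.DataFrame, right: pd.DataFrame, sort_columns: bool = True
) -> pd.DataFrame:
    """Hypothetical stand-in for ``left.combine_first(right, sort_columns=...)``."""
    # Existing pandas behavior: fill missing values in ``left`` from ``right``.
    combined = left.combine_first(right)
    if sort_columns:
        # Default: keep whatever column order pandas already produces.
        return combined
    # Assumption: columns of ``left`` come first, in their original order,
    # followed by columns found only in ``right``, in ``right``'s order.
    extra = [col for col in right.columns if col not in left.columns]
    return combined.reindex(columns=list(left.columns) + extra)


df1 = pd.DataFrame({"b": [1.0, None], "a": [3, 4]})
df2 = pd.DataFrame({"c": [5, 6], "a": [7, 8], "b": [9.0, 10.0]})

result = combine_first_ordered(df1, df2, sort_columns=False)
print(list(result.columns))  # ['b', 'a', 'c']
print(result["b"].tolist())  # [1.0, 10.0]
```

Reindexing after the fact keeps the sketch independent of how pandas orders the aligned columns internally. A real implementation inside combine_first would more likely choose the column order during alignment instead.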
Tests: Added new test cases in pandas/tests/frame/methods/test_combine_first.py to validate:
Default behavior with sort_columns=True.
Column order preservation with sort_columns=False.
Documentation:
Updated the docstring for combine_first with examples showcasing the new parameter.
Added a changelog entry in doc/source/whatsnew/v3.0.0.rst.
This enhancement maintains backward compatibility, as the default behavior (sort_columns=True) remains unchanged. The new parameter provides additional flexibility for users who need control over column order.