BUG: repr aligning left for string dtype columns #54801

Merged 2 commits on Aug 28, 2023

Conversation

phofl
Member

@phofl phofl commented Aug 28, 2023

cc @jorisvandenbossche this is a bit hacky, but changing format_array breaks index-related stuff. I don't feel comfortable doing that now; it can be a follow-up (we are casting to object anyway, so this does not impact performance).

Member

@jorisvandenbossche jorisvandenbossche left a comment

Yes, if the formatter converts to object dtype anyway, then this looks like a fine short term solution.

Comment on lines 1398 to 1399
if is_string_dtype(values.dtype):
    values = np.asarray(values)
Member

Suggested change
-if is_string_dtype(values.dtype):
-    values = np.asarray(values)
+if is_string_dtype(values.dtype):
+    # ensure we have an object dtype numpy array
+    values = np.asarray(values)
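
As a rough illustration of what the two lines in the suggestion do, the sketch below runs the same check and conversion on a small made-up array. It uses only public calls (`pandas.api.types.is_string_dtype`, `numpy.asarray`); the sample data and the `converted` name are assumptions for the example, and this is not the surrounding code from the PR.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical sample data standing in for the values being formatted.
values = pd.array(["a", "bb", None], dtype="string")

if is_string_dtype(values.dtype):
    # ensure we have an object dtype numpy array
    converted = np.asarray(values)
else:
    converted = values

# Inspect the container type and dtype after the conversion.
print(type(converted).__name__, converted.dtype)
```

Since the formatter ends up with object-dtype values either way, doing the conversion up front is what lets the existing object-dtype code path handle the alignment.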

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jorisvandenbossche
Member

> this is a bit hacky, but changing format_array breaks index-related stuff.

Looking at the code in Index._format_with_header, it seems we just never use format_array in case of object dtype (i.e. current strings). And in the code path for non-object dtype calling format_array, we specify justify="left". That's the reason for this behaviour, and so something we would have to change (or let the value depend on the dtype; I don't fully understand the reasoning for this justify="left" for other dtypes)

@phofl
Member Author

phofl commented Aug 28, 2023

Yeah, that's correct. We can set justify=all, but this screws with Index formatting if the index has a NaN, because we are doing something weird there. That's too big a change for this PR, which is why I went with the short-term solution.

@jorisvandenbossche
Member

> I don't fully understand the reasoning for this justify="left" for other dtypes

You can also get this "bug" with other data types because of this left alignment. For example, with a numeric index:

In [14]: df = pd.DataFrame({99999999999999999: [1, 2, 3], 1: [4, 5, 6]})

In [15]: df
   99999999999999999  1                
0                  1                  4
1                  2                  5
2                  3                  6

(will move this to the issue)
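
The layout in the example above can be modeled with a small pure-Python sketch. It is a toy renderer, not pandas' implementation: the `render` function and its `justify_labels` parameter are hypothetical names. It assumes the mechanism described in this thread, namely that left-justifying the column labels pads them all to one shared width before each column is sized.

```python
def render(labels, columns, index, justify_labels="left"):
    """Toy fixed-width table renderer (a simplified model, not pandas code)."""
    labels = [str(label) for label in labels]
    if justify_labels == "left":
        # Model of the problematic path: every label is padded to the
        # width of the longest label before the columns are laid out.
        shared = max(len(label) for label in labels)
        labels = [label.ljust(shared) for label in labels]

    index = [str(i) for i in index]
    index_width = max(len(i) for i in index)

    # Each column is as wide as its (possibly padded) header or widest value.
    cells = []
    for label, values in zip(labels, columns):
        values = [str(v) for v in values]
        width = max(len(label), *(len(v) for v in values))
        cells.append([label.rjust(width)] + [v.rjust(width) for v in values])

    # The first row is the header, with a blank cell above the index.
    first_column = [" " * index_width] + [i.rjust(index_width) for i in index]
    return ["  ".join([lead] + [col[r] for col in cells])
            for r, lead in enumerate(first_column)]


labels = [99999999999999999, 1]
columns = [[1, 2, 3], [4, 5, 6]]
padded = render(labels, columns, [0, 1, 2], justify_labels="left")
compact = render(labels, columns, [0, 1, 2], justify_labels="right")
print("\n".join(padded))
print("\n".join(compact))
```

In this model the padded label `"1"` becomes 17 characters wide, so the second column inherits that width and its header appears left-aligned over right-aligned values. Skipping the padding step leaves the column at its natural width.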

@mroeschke mroeschke added the Output-Formatting __repr__ of pandas objects, to_string label Aug 28, 2023
@phofl phofl added this to the 2.1 milestone Aug 28, 2023
@phofl phofl merged commit dc0ec0b into pandas-dev:main Aug 28, 2023
33 of 37 checks passed
@phofl phofl deleted the alignment branch August 28, 2023 17:35
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Aug 28, 2023
mroeschke pushed a commit that referenced this pull request Aug 28, 2023
…g dtype columns) (#54819)

Backport PR #54801: BUG: repr aligning left for string dtype columns

Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
mroeschke pushed a commit to mroeschke/pandas that referenced this pull request Sep 11, 2023
Labels
Output-Formatting __repr__ of pandas objects, to_string
Development

Successfully merging this pull request may close these issues.

BUG: wrong alignment of column names in the DataFrame repr when using pyarrow-backed string dtype
3 participants