
fixes for pandas 2 #416

Merged
merged 5 commits into from
Apr 12, 2023

Conversation

jtwhite79
Collaborator

ran out of steam on one for the res 1-to-1 plot tests, otherwise, things seem to be at least passing tests now...

fix for line_terminator

set indexer fix

removed moouu tests

starting to fix applys with np functions ugh

more apply ufunc fixes

more apply ufunc fixes

more apply ufunc fixes

more pandas bs

more fixes

ugh

more

more fixes

more

more

less reals for py3.9 mem issue
@jtwhite79 jtwhite79 requested a review from briochh April 9, 2023 04:48
@coveralls

coveralls commented Apr 9, 2023

Coverage Status

Coverage: 79.5% (+1.2%) from 78.319% when pulling 9dbf62e on jtwhite79:hotfix_pandas into 0883198 on pypest:develop.

Collaborator


Looks like a lot of the pandas 2 issues can be traced back to this assignment:
`df.loc[:, col] = df.loc[:, col].apply(str).apply(converters[col])`. I have run a few tests and it looks like RHS values are now cast to the LHS dtype when using this type of assignment. Not sure if this is a bug in pandas or not.
Even `df.loc[:, col] = df.loc[:, col].astype(int)` will not modify the dtype of `df.loc[:, col]`. As a result, this conversion line was doing nothing and our dtypes remained object, causing downstream NumPy ufuncs to fail.
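
For anyone wanting to reproduce this, here is a minimal sketch of the two assignment forms side by side. The frame and the `parval1` column are hypothetical stand-ins, not the actual pst data, and the dtype printed for the `.loc` form depends on the installed pandas version (the pandas 2.0 release notes describe `df.loc[:, col] = ...` as first attempting to set values in place, which is consistent with the object dtype being retained).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a column that was read in as strings (object dtype).
df = pd.DataFrame({"parval1": pd.Series(["1.0", "10.0", "100.0"], dtype=object)})

# Form 1: full-column .loc assignment. On pandas 2.x this is expected to write
# the converted values into the existing object column rather than replace it.
via_loc = df.copy()
via_loc.loc[:, "parval1"] = via_loc.loc[:, "parval1"].astype(float)

# Form 2: direct column assignment. This replaces the column, so the new
# dtype (float64) is the one that sticks.
via_setitem = df.copy()
via_setitem["parval1"] = via_setitem["parval1"].astype(float)

print("pandas", pd.__version__)
print("via .loc     :", via_loc["parval1"].dtype)
print("via df[col]  :", via_setitem["parval1"].dtype)

# A ufunc works on the properly typed column.
log_vals = np.log10(via_setitem["parval1"])
print(log_vals.tolist())
```

Comparing the two printed dtypes on pandas 1.5 and 2.x should show whether the `.loc` form is the source of the lingering object columns in a given environment.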

Collaborator

@briochh briochh left a comment


@jtwhite79, thanks for chasing this through. I have done some more hunting and it looks like the issues can be traced back to a change in how df.loc[:, col]=RHS assignments are working in Pandas 2.0 (dtype changes are not happening as we might think/need).

I have made some changes locally that hopefully mean that we can strip back some of these mods. I'll push to this commit.

Looks like the issues are less related to the use of ufuncs (e.g. np.log10) on object-dtype series
(which has been deprecated since before Pandas 1.5) and more related to assignments of the form
`df.loc[:,col] = RHS` casting RHS into the dtype of `df[col]`. Therefore,
when RHS is expected to update the series dtype (e.g. to float or int, using apply()
or astype()), the assignment does nothing to the dtype. Conversion of pst data cols was not occurring as expected
and subsequent ufunc calls (e.g. log10) were failing.
A big WTF, but it seems to work if the assignment is of the form `df[col] = RHS`.

Have stripped back most of JTW mods after making this change in pst_handler.py
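
A rough sketch of the `df[col] = RHS` pattern applied to a converter loop follows. The `converters` dict, the column names, and the data here are assumptions made up for illustration; this is not the actual code in pst_handler.py.

```python
import numpy as np
import pandas as pd

# Hypothetical per-column converters (illustrative only).
converters = {"parnme": str, "parval1": float}

# Hypothetical frame whose columns all start out as object dtype.
df = pd.DataFrame(
    {
        "parnme": pd.Series(["k1", "k2"], dtype=object),
        "parval1": pd.Series(["1.0", "100.0"], dtype=object),
    }
)

for col, conv in converters.items():
    # Assign with df[col] = RHS so the column is replaced and takes the
    # dtype of the converted values, instead of df.loc[:, col] = RHS.
    df[col] = df[col].apply(str).apply(conv)

print(df.dtypes)

# With parval1 now float64, the ufunc call goes through.
log_parval = np.log10(df["parval1"])
print(log_parval.tolist())
```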
@briochh
Collaborator

briochh commented Apr 11, 2023

@briochh briochh merged commit ceec87a into pypest:develop Apr 12, 2023
@jtwhite79 jtwhite79 deleted the hotfix_pandas branch December 5, 2024 03:22