fixes for pandas 2 #416
fix for line_terminator set indexer fix removed moouu tests starting to fix applys with np functions ugh more apply ufunc fixes more apply ufunc fixes more apply ufunc fixes more pandas bs more fixes ugh more more fixes more more less reals for py3.9 mem issue
Looks like a lot of the Pandas 2 issues can be traced back to this assignment:
df.loc[:, col] = df.loc[:, col].apply(str).apply(converters[col])
I have run a few tests and it looks like RHS values are now cast to the LHS dtype when using this type of assignment. Not sure if this is a bug in Pandas or not.
Even df.loc[:, col] = df.loc[:, col].astype(int) will not modify the dtype of df.loc[:, col]. As a result, this conversion line was not doing anything and our dtypes were remaining as object, causing downstream np.ufuncs to fail.
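A minimal sketch of the behaviour described above, for anyone wanting to reproduce it. The DataFrame and the column name `parval1` are hypothetical stand-ins, not taken from the actual pst handling code, and the dtype printed for the `.loc` case may depend on the installed pandas version, so no output is asserted for it here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a pst table: numeric values stored as strings
# in an object-dtype column (names here are illustrative only).
df = pd.DataFrame({"parval1": ["1.0", "10.0", "100.0"]}, dtype=object)
col = "parval1"

# Form 1: assignment through .loc[:, col]. As reported in this thread,
# pandas 2.x writes the values into the existing column, so the column
# may keep its original object dtype (older versions may behave differently).
via_loc = df.copy()
via_loc.loc[:, col] = via_loc.loc[:, col].astype(float)
print("dtype after df.loc[:, col] = RHS:", via_loc[col].dtype)

# Form 2: plain column assignment replaces the column outright,
# so the dtype of the RHS is kept.
via_setitem = df.copy()
via_setitem[col] = via_setitem[col].astype(float)
print("dtype after df[col] = RHS:", via_setitem[col].dtype)

# With a real float column, numpy ufuncs work as expected.
logged = np.log10(via_setitem[col])
print(logged.tolist())
```

If the first print shows object while the second shows float64, the conversion through `.loc` is not changing the dtype, which matches the failure mode described in the comment above.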
@jtwhite79, thanks for chasing this through. I have done some more hunting and it looks like the issues can be traced back to a change in how df.loc[:, col] = RHS assignments work in Pandas 2.0 (dtype changes are not happening as we might think/need).
I have made some changes locally that should let us strip back some of these mods. I'll push to this commit.
Looks like the issues are less related to the use of ufuncs (e.g. np.log10) on object-dtype Series (which has been deprecated since before Pandas 1.5) and more related to assignments of the form `df.loc[:,col] = RHS` casting RHS into the dtype of `df[col]`. Therefore, when RHS is expected to update the Series dtype (e.g. to float or int, using apply() or astype()), the assignment does nothing. Conversion of pst data cols was not occurring as expected, and subsequent ufunc calls (e.g. log10) were failing. A big WTF, but it seems to work if the assignment is of the form `df[col]=RHS`. Have stripped back most of the JTW mods after making this change in pst_handler.py.
Chasing pandas 2.0 issues
ran out of steam on one for the res 1-to-1 plot tests, otherwise, things seem to be at least passing tests now...