Skip to content

refactor code to work with pandas 2.0 #660

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

ClaudioSalvatoreArcidiacono
Copy link
Contributor

fixes #659

From pandas 2.0 any only accepts keyworkd arguments
ref pandas-dev/pandas#44896
I have not fully understood why this solve the problem, but splitting
the operation in 2 lines does not seem to work
Now pandas.to_datetime raises a warning when the column cannot be converted
Pandas dataframes created from python integers are created with int
column types `int64` but the operation tested returns `int32` which
caused issues
Merging dfs with different column lelvels has been disallowed
ref pandas-dev/pandas#34862
I am not sure why this caused an issue, maybe due to type casting?
Copy link
Collaborator

@solegalli solegalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ClaudioSalvatoreArcidiacono

Thanks for taking care of this fixes.

I've got to small questions. Could you please have a look?

Thank you!

@@ -216,7 +216,7 @@ def test_raises_error_when_nan_in_transform():
"unit, expected",
[
("D", [31, 61, 183]),
("M", [1.018501, 2.004148, 6.012444]),
("M", [1.0, 1.967741935483871, 5.903225806451613]),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are the values changing?

Copy link
Contributor Author

@ClaudioSalvatoreArcidiacono ClaudioSalvatoreArcidiacono Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need some time to deep dive into this, it is strange indeed. Might be that something has changed with subtraction between times or that something has changed with divisions of time deltas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely the latter as the test with days passes with no problem. Might as well be a bug with the new version of pandas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a bug with the latest version of pandas. Please see pandas-dev/pandas#52806

Copy link
Contributor Author

@ClaudioSalvatoreArcidiacono ClaudioSalvatoreArcidiacono Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want, we could also fix this by reverting the change in the tests and by changing line 321 of datetime /datetime_subtraction.py.py as follows

    def _sub(self, dt_df: pd.DataFrame):
        """make datetime subtraction"""
        new_df = pd.DataFrame()
        for reference in self.reference_:
            new_varnames = [f"{var}_sub_{reference}" for var in self.variables_]
            new_df[new_varnames] = (
                dt_df[self.variables_]
                .sub(dt_df[reference], axis=0)
-               .apply(lambda s: s / np.timedelta64(1, self.output_unit))
+               .div(np.timedelta64(1, self.output_unit).astype("timedelta64[ns]"))
            )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, .div looks more pro than the lambda function.

Awesome work, thanks for digging this out!

It would be great if you can take care of this, then we are good to merge :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! And thanks for your review comments!

I have made the change from the previous comment.

Let me know if you have other comments.

Before merging, it would be great to run tests with pandas 1.5.x just to double check that changes are backward compatible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried downgrading to pandas 1.5.3 in my laptop and running the tests. Everything passed for me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing! Thank you!

@solegalli solegalli changed the title 659 fix unit tests not passing refactor code to work with pandas 2.0 Apr 20, 2023
@solegalli solegalli merged commit feddb06 into feature-engine:main Apr 22, 2023
@ClaudioSalvatoreArcidiacono ClaudioSalvatoreArcidiacono deleted the 659-fix-unit-tests-not-passing branch April 22, 2023 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unit tests not passing
3 participants