Minor changes to make pandas 1.4.0 work #1421
Codecov Report
@@ Coverage Diff @@
## dev #1421 +/- ##
==========================================
+ Coverage 83.46% 83.48% +0.02%
==========================================
Files 64 64
Lines 6930 6937 +7
==========================================
+ Hits 5784 5791 +7
Misses 1146 1146
Looks good! It seems like the timezone thing isn't a huge issue? Maybe we address it in a separate PR.
Pandas 1.4.0 added support for reading CSVs with pyarrow. I'm going to experiment with running CEMS using the pyarrow engine for pd.read_csv().
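For reference, a minimal sketch of what selecting the pyarrow engine looks like. The helper name, the file name, and the column names are hypothetical and not taken from the CEMS code. It assumes pandas >= 1.4 with pyarrow installed, and falls back to the default parser otherwise:

```python
import os
import tempfile

import pandas as pd


def read_csv_prefer_pyarrow(path):
    """Read a CSV with the pyarrow engine, falling back to the default parser.

    Hypothetical helper for illustration only.
    """
    try:
        return pd.read_csv(path, engine="pyarrow")
    except (ImportError, ValueError):
        # ImportError: pyarrow is not installed.
        # ValueError: e.g. this pandas version predates the "pyarrow" engine.
        return pd.read_csv(path)


# Write a tiny CSV with made-up columns, then read it back.
with tempfile.TemporaryDirectory() as tmpdir:
    csv_path = os.path.join(tmpdir, "example.csv")
    with open(csv_path, "w") as f:
        f.write("plant_id,gross_load_mw\n1,10.5\n2,20.0\n")
    df = read_csv_prefer_pyarrow(csv_path)

print(df)
```

The pyarrow engine does not support every read_csv() option, so any real switch would need checking against the arguments the existing calls pass.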
Yeah, I created another issue for the timezone thing, but I do think we should investigate it and come up with a way to treat the timezone-aware/naive columns appropriately. We need to give
A PR to fix anything that Pandas 1.4.0 breaks. Draft for now because I am running tox -e nuke locally. The full ETL + data validation works with this PR, but it's also giving a warning about converting between naive & timezone-aware datetime objects using df.astype(), and I'm concerned that using apply_pudl_dtypes() across the board may be messing up timezone info. But I think that's a separate issue (See: #1423)
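To illustrate the naive vs. timezone-aware distinction behind that warning, here is a rough sketch that uses the explicit .dt.tz_localize() and .dt.tz_convert() accessors rather than df.astype(). The to_utc() helper is hypothetical and is not how apply_pudl_dtypes() works. Treating naive timestamps as UTC is an assumption made only for this example:

```python
import pandas as pd


def to_utc(series):
    """Return a UTC-aware copy of a datetime Series (hypothetical helper)."""
    if series.dt.tz is None:
        # Naive timestamps: declare them to be UTC (an assumption about the data).
        return series.dt.tz_localize("UTC")
    # Aware timestamps: convert to UTC, preserving the instant in time.
    return series.dt.tz_convert("UTC")


naive = pd.Series(pd.to_datetime(["2022-01-01 00:00", "2022-01-01 01:00"]))
aware = to_utc(naive)

# Viewing the same instants in another zone changes the wall-clock time only.
eastern = aware.dt.tz_convert("America/New_York")

# Going back to naive is also explicit: convert to UTC, then drop the tz.
roundtrip = to_utc(eastern).dt.tz_localize(None)

print(naive.dtype, aware.dtype, eastern.dtype, roundtrip.dtype)
```

Keeping localization and conversion as separate, explicit steps makes it harder to silently reinterpret timestamps, which is the risk raised about applying dtypes across the board.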