-
-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update PUDL to SQLAlchemy 2.0 #2267
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## dev #2267 +/- ##
=====================================
Coverage 88.7% 88.7%
=====================================
Files 90 90
Lines 10994 10995 +1
=====================================
+ Hits 9758 9759 +1
Misses 1236 1236 ☔ View full report in Codecov by Sentry. |
The unit tests in I still don't fully understand the distinction and when it's better to use |
0ba3a5b
to
4530d8e
Compare
Fast ETL and integration tests are passing locally, so I guess this is ready. I will run a full ETL and validation tests locally to see how that goes. With the centralization of all of our interactions with the database in the I ran a full |
* Use replace engine.connect() with engine.begin() * Update unit tests to work with SQLAlchemy 2.0 * Require SQLAlchemy 2.0. Again. Oops. * Remove deprecated use_nullable_dtypes arg from read_parquet calls. * Use string 'sum' rather than callable sum() in groupby transforms. * Use explicit observed=True in timezone groupby * Update conda-lock.yml and rendered conda environment files.
354c218
to
6f603e9
Compare
@@ -1012,7 +1012,7 @@ def _allocate_unassociated_pm_records( | |||
eia_generators_connected = gen_assoc.loc[connected_mask].assign( | |||
capacity_mw_minus_one=lambda x: x.groupby(idx_minus_one)[ | |||
"capacity_mw" | |||
].transform(sum), | |||
].transform("sum"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressing a deprecation warning. Future pandas will use the callable here directly, which is not what we want.
@@ -147,7 +147,7 @@ def local_to_utc(local: pd.Series, tz: Iterable, **kwargs: Any) -> pd.Series: | |||
1 2020-01-01 06:00:00 | |||
dtype: datetime64[ns] | |||
""" | |||
return local.groupby(tz).transform( | |||
return local.groupby(tz, observed=True).transform( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressing a deprecation warning. There are 15 million records with these timestamps and if we're ever operating on a subset (e.g. a single state that only contains one timezone) this behavior will be much more space efficient. Also it'll become the default pandas behavior in the future.
@@ -137,7 +137,6 @@ def epacems( | |||
|
|||
epacems = dd.read_parquet( | |||
epacems_path, | |||
use_nullable_dtypes=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This argument is deprecated since pandas 2.0 and is now the default behavior.
@@ -193,7 +193,7 @@ def _get_fk_list(self, table: str) -> pd.DataFrame: | |||
method collapses foreign keys with multiple fields into one record for | |||
readability. | |||
""" | |||
with self.engine.connect() as con: | |||
with self.engine.begin() as con: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incredibly, now that all of our interactions with the database are centralized in the IOManager
classes, this is the only place we need to change... anything to work with SQLAlchemy 2.0!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for fixing these deprecation warnings.
PR Overview
While we allowed SQLAlchemy 2.0 in our dependencies one or more of our dependencies prohibited it, but that no longer seems to be the case as of this morning, so I'm getting some local failures in a fresh
pudl-dev
environment. This PR will address whatever issues come up from actually using SQLAlchemy 2.0However for the moment it looks like
pandas
doesn't work with SQLAlchemy 2.0, so the first thing to do is probably re-pin our dependency.Blocked by #2320
PR Checklist
dev
).