-
-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apply new naming conventions to devtool
notebooks
#3228
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
@@ -50,61 +50,8 @@ | |||
"\n", | |||
"from pudl.etl import defs\n", | |||
"\n", | |||
"asset_key = \"exploded_balance_sheet_assets_ferc1\"\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed some custom asset exploration work that got committed here. This is just suppose to be a template notebook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deleted this file because the PudlTabl method dagsterfication is complete!
aws s3 cp s3://pudl.catalyst.coop/nightly/pudl.sqlite.gz ./ --no-sign-request | ||
|
||
.. note:: | ||
|
||
To reduce network transfer times, we ``gzip`` the SQLite database files, which can | ||
be quite large when uncompressed. To decompress them locally, at the command line | ||
on Linux, MacOS, or Windows you can use the ``gunzip`` command. | ||
|
||
.. code-block:: console | ||
|
||
$ gunzip *.sqlite.gz | ||
|
||
On Windows you can also use a 3rd party tool like | ||
`7zip <https://www.7-zip.org/download.html>`__. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This portion of the docs was out of date.
@@ -156,8 +156,8 @@ | |||
"\n", | |||
"def get_table_classes(module):\n", | |||
" classes = [member[1] for member in inspect.getmembers(module, inspect.isclass)]\n", | |||
" table_classes = [x for x in classes if x.__name__.endswith(\"Ferc1TableTransformer\")]\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmgosnell I ran into some FERC related errors I wasn't sure how to solve in the last two cells of this notebook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay on the last cell i believe is a @katherinelamb / @zschira question/ verification of my assumption: it looks like the new FERC plant classifier got pulled out of the transform step and thus this special exception of needing to pass in the fuel table into the steam tables' transform is no longer needed! so if that's true i think we can delete this cell and take out the if table_name == "core_ferc1__yearly_steam_plants_sched402":
in the previous cell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the second to last cell failed while validating the calculations in the table. the expected error rates were all set using full and fast etl settings and this notebook has a cell up top that set years = [2020, 2021]
if you change it to years = [2020, 2022]
and rerun the notebook this will work fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is definitely a fragile part of testing the transform step because BELIEVE IT OR NOT the newer years are much less clean in the calculations than the old years.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmgosnell, yes that's true! core_ferc1__yearly_steam_plants_sched402
is now handled like any other transform, and no longer has plant IDs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks y'all! Just made the changes.
… update years in devtools/debug-ferc1-etl.ipynb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i opened em all up and restart run all cells and they all ran and made sense to me with a minor exception that should be vv easy to fix in sqlite-table-diff.ipynb
devtools/sqlite-table-diff.ipynb
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one there is still a denorm_plants_steam_ferc1
down in the last cell!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch! Just updated it.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #3228 +/- ##
=====================================
Coverage 92.6% 92.7%
=====================================
Files 143 143
Lines 12979 13047 +68
=====================================
+ Hits 12025 12089 +64
- Misses 954 958 +4 ☔ View full report in Codecov by Sentry. |
Overview
This PR updates the devtool notebooks to use the new asset naming conventions.
Testing
How did you make sure this worked? How can a reviewer verify this?
To-do list