Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply new naming conventions to devtool notebooks #3228

Merged
merged 5 commits into from
Jan 17, 2024

Conversation

bendnorman
Copy link
Member

@bendnorman bendnorman commented Jan 10, 2024

Overview

This PR updates the devtool notebooks to use the new asset naming conventions.

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@@ -50,61 +50,8 @@
"\n",
"from pudl.etl import defs\n",
"\n",
"asset_key = \"exploded_balance_sheet_assets_ferc1\"\n",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed some custom asset exploration work that got committed here. This is just suppose to be a template notebook.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted this file because the PudlTabl method dagsterfication is complete!

Comment on lines +225 to +238
aws s3 cp s3://pudl.catalyst.coop/nightly/pudl.sqlite.gz ./ --no-sign-request

.. note::

To reduce network transfer times, we ``gzip`` the SQLite database files, which can
be quite large when uncompressed. To decompress them locally, at the command line
on Linux, MacOS, or Windows you can use the ``gunzip`` command.

.. code-block:: console

$ gunzip *.sqlite.gz

On Windows you can also use a 3rd party tool like
`7zip <https://www.7-zip.org/download.html>`__.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This portion of the docs was out of date.

@@ -156,8 +156,8 @@
"\n",
"def get_table_classes(module):\n",
" classes = [member[1] for member in inspect.getmembers(module, inspect.isclass)]\n",
" table_classes = [x for x in classes if x.__name__.endswith(\"Ferc1TableTransformer\")]\n",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmgosnell I ran into some FERC related errors I wasn't sure how to solve in the last two cells of this notebook.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay on the last cell i believe is a @katherinelamb / @zschira question/ verification of my assumption: it looks like the new FERC plant classifier got pulled out of the transform step and thus this special exception of needing to pass in the fuel table into the steam tables' transform is no longer needed! so if that's true i think we can delete this cell and take out the if table_name == "core_ferc1__yearly_steam_plants_sched402": in the previous cell.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the second to last cell failed while validating the calculations in the table. the expected error rates were all set using full and fast etl settings and this notebook has a cell up top that set years = [2020, 2021] if you change it to years = [2020, 2022] and rerun the notebook this will work fine.

Copy link
Member

@cmgosnell cmgosnell Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is definitely a fragile part of testing the transform step because BELIEVE IT OR NOT the newer years are much less clean in the calculations than the old years.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmgosnell, yes that's true! core_ferc1__yearly_steam_plants_sched402 is now handled like any other transform, and no longer has plant IDs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks y'all! Just made the changes.

@bendnorman bendnorman linked an issue Jan 10, 2024 that may be closed by this pull request
… update years in devtools/debug-ferc1-etl.ipynb
@zaneselvans zaneselvans added this to the v2024.01 milestone Jan 12, 2024
Copy link
Member

@cmgosnell cmgosnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i opened em all up and restart run all cells and they all ran and made sense to me with a minor exception that should be vv easy to fix in sqlite-table-diff.ipynb

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one there is still a denorm_plants_steam_ferc1 down in the last cell!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch! Just updated it.

@bendnorman bendnorman marked this pull request as ready for review January 17, 2024 17:49
Copy link

codecov bot commented Jan 17, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (f4f2470) 92.6% compared to head (b4baedb) 92.7%.
Report is 4 commits behind head on main.

Files Patch % Lines
...rc/pudl/analysis/record_linkage/embed_dataframe.py 92.1% 5 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main   #3228   +/-   ##
=====================================
  Coverage   92.6%   92.7%           
=====================================
  Files        143     143           
  Lines      12979   13047   +68     
=====================================
+ Hits       12025   12089   +64     
- Misses       954     958    +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bendnorman bendnorman merged commit eb7b278 into main Jan 17, 2024
20 checks passed
@bendnorman bendnorman deleted the update-table-names-devtools branch January 17, 2024 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Clean up devtools notebooks to use new names
4 participants