Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve CI workflows, add autoformatters / linters #236

Merged
merged 56 commits into from
Nov 10, 2022
Merged

Conversation

zaneselvans
Copy link
Member

No description provided.

@zaneselvans
Copy link
Member Author

Okay @cmgosnell whenever you have a chance to look, this seems to work as well as it can given the current state of the main branch. The tests fail locally with Index mismatch errors that I think we've already talked about on the other long-running development branch, but the CI and other infrastructure stuff seems to work fine.

@cmgosnell cmgosnell self-assigned this Jun 6, 2022
@cmgosnell cmgosnell added the rmi use on all issues in this repo (for project management tools) label Jun 6, 2022
@zaneselvans
Copy link
Member Author

@katie-lamb I updated the CI and package requirements to only ever use Python 3.10 (and to use the new bot automerge PR workflows) so hopefully your 3.8/3.9 dependency resolution issues can be ignored.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@zaneselvans
Copy link
Member Author

What's the max memory usage of the tests at this point? Is it close to 7GB? I imagine the runner uses some memory on its own just to exist and have an OS, and even if the tests themselves were a bit under 7GB, the sum of the overhead and the tests could still be too much.

@codecov
Copy link

codecov bot commented Oct 18, 2022

Codecov Report

❗ No coverage uploaded for pull request base (main@d9b708d). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #236   +/-   ##
=======================================
  Coverage        ?   74.39%           
=======================================
  Files           ?       10           
  Lines           ?     1183           
  Branches        ?        0           
=======================================
  Hits            ?      880           
  Misses          ?      303           
  Partials        ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@katie-lamb katie-lamb linked an issue Oct 19, 2022 that may be closed by this pull request
5 tasks
@katie-lamb
Copy link
Member

katie-lamb commented Oct 19, 2022

Update update update (for @cmgosnell):

The CI now passes wowww! Major changes:

  • I just made the GitHub workflow CI run on 5 years of data coverage (2015 - 2020). I did this instead of 1 year because with 5 years the matching models only perform slightly worse than with all the data, so the CI is somewhat testing how well the models are performing while still staying under the GitHub memory limit. Running tox or pytest without the --five-year-coverage argument will still run the CI on all the years of data.
  • I added new expected_errors CSV files for the data validation tests for the five year coverage. These roughly match the original expected_errors CSV files for all the data so I just went with it. But maybe there's a better way to check these expected errors files and make sure there's nothing wrong. The expected_errors for all years of data probably also need to be updated so that the validation tests pass.
  • The pudl installation in the pudl-rmi environment now pulls from the dev branch instead of the released version of PUDL. This might lead to more/faster maintenance down the line but it's also way nicer for modifying PUDL and having changes appear in this repo.
  • For clarity, I updated the plant_name_new and ownership columns in the plant part list to be plant_name_ppe and ownership_record_type respectively when it's created in PUDL. These name changes are reflected in this PR. Additionally, pudl.analysis.plant_parts_eia.PLANT_PARTS_ORDERED was taken out and now the PLANT_PARTS global dictionary does all the work.
  • Some clean up was done now that we're not memory constrained.
    • The distinct plant parts list is now an output instead of being made as part of the plant part list output.
    • The training data connections were moved back into the FERC to EIA connection module

I'm not entirely sure why one of the data validation tests isn't passing. It seems like a weird type error because I'm pretty sure the expected and actual data is the same. Need to look into this.

@katie-lamb
Copy link
Member

I feel confused why the validation tests are failing when checking if the index are equal. It seems to be an issue with the type of the report_year index level (the actual index is an Index while the index read in from the CSV is Int64Index. I set the argument exact=False and am still failing the assert even though it's supposed to ignore type differences.

],
level=levels_set_types,
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole index level data type setting business is super janky but it fixes the weird one off error I was getting with the type mismatch. Maybe I should look into a more general/lasting solution here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this looks janky/hard to understand what is happening.. if you are going to keep it i'd suggest at the least adding some why/what comments in here

Copy link
Member

@cmgosnell cmgosnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @katie-lamb ! these changes looks good overall but added a smattering of suggestions/questions.

src/pudl_rmi/connect_deprish_to_ferc1.py Outdated Show resolved Hide resolved
src/pudl_rmi/coordinate.py Outdated Show resolved Hide resolved
src/pudl_rmi/coordinate.py Outdated Show resolved Hide resolved
src/pudl_rmi/connect_ferc1_to_eia.py Outdated Show resolved Hide resolved
src/pudl_rmi/deprish.py Outdated Show resolved Hide resolved
test/integration/rmi_out_test.py Outdated Show resolved Hide resolved
],
level=levels_set_types,
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this looks janky/hard to understand what is happening.. if you are going to keep it i'd suggest at the least adding some why/what comments in here

src/pudl_rmi/connect_ferc1_to_eia.py Outdated Show resolved Hide resolved
Copy link
Member

@cmgosnell cmgosnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great!

@katie-lamb katie-lamb merged commit 87150eb into main Nov 10, 2022
@katie-lamb katie-lamb deleted the update-ci branch November 10, 2022 02:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rmi use on all issues in this repo (for project management tools)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

enable 1-year of data processing
3 participants