NREL ATB extraction #3498

Merged
merged 14 commits into from
Mar 26, 2024

Conversation

@cmgosnell (Member) commented Mar 25, 2024

Overview

Closes #3468.

What did you change?

  • Added the NREL ATB DOI to the datastore
  • Added NREL ATB settings
  • Added a Parquet extractor
  • Added an NREL ATB extractor
  • Mapped columns from multiple years
  • Added Dagster assets using raw_df_factory
  • Imported those assets in the Dagster setup
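
To illustrate the multi-year column mapping, here is a minimal sketch using only the standard library. It assumes a column map CSV with one row per year and one column per standardized name, where each cell holds the raw column name used in that year's file. The layout, the column names, and the helpers `load_column_map` and `rename_record` are all made up for illustration and are not taken from this PR.

```python
import csv
import io

# Hypothetical column map: one row per year, one column per standardized name.
# Each cell is the raw column name found in that year's file (invented values).
COLUMN_MAP_CSV = """year_index,technology,capex_per_kw
2023,technology,CAPEX
2024,technology_alias,capex
"""


def load_column_map(text):
    """Return {year: {raw_name: standardized_name}} parsed from CSV text."""
    maps = {}
    for row in csv.DictReader(io.StringIO(text)):
        year = int(row.pop("year_index"))
        # Blank cells mean the column is absent in that year, so skip them.
        maps[year] = {raw: std for std, raw in row.items() if raw}
    return maps


def rename_record(record, year, maps):
    """Rename the keys of one raw record using the map for its year."""
    mapping = maps[year]
    return {mapping.get(key, key): value for key, value in record.items()}


maps = load_column_map(COLUMN_MAP_CSV)
renamed = rename_record({"technology_alias": "Nuclear", "capex": 1.0}, 2024, maps)
```

Keying the map by year means a raw column that changes name between releases still lands in the same standardized column.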

Remaining Questions?

  • Is it okay that the ParquetExtractor assumes that the archive is one non-zipped file per partition?
  • Is there a safer/better way to open these parquet files?
  • Context: I noticed that only the Excel extractor was actually renaming columns based on the column_maps by default. This felt wrong, so I moved it. Q: This required a bit of a hack in the unit tests... is that okay? Is there a cleaner way to do this? It relates to the use of any_year in the column maps. I like that convention overall. If we made it a little more general, like any or any_part, I could imagine adding this default behavior to the generic extractor. I didn't do that only because it felt out of scope here.
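
On the first two questions, one possible way to make the assumptions explicit is to fail loudly when they break. The sketch below is not the PR's ParquetExtractor; `pick_single_file` and `looks_like_parquet` are hypothetical helpers. The only external fact it relies on is that a plain Parquet file begins and ends with the 4-byte magic `PAR1`.

```python
PARQUET_MAGIC = b"PAR1"  # plain Parquet files begin and end with these bytes
MIN_PARQUET_SIZE = 12    # leading magic + 4-byte footer length + trailing magic


def pick_single_file(filenames, partition):
    """Return the lone file for a partition, failing loudly otherwise."""
    names = sorted(filenames)
    if len(names) != 1:
        raise ValueError(
            f"Expected one file for partition {partition}, "
            f"found {len(names)}: {names}"
        )
    if names[0].lower().endswith(".zip"):
        raise ValueError(f"Expected a non-zipped file, got {names[0]}")
    return names[0]


def looks_like_parquet(payload):
    """Cheap sanity check on raw bytes before handing them to a real reader."""
    return (
        len(payload) >= MIN_PARQUET_SIZE
        and payload[:4] == PARQUET_MAGIC
        and payload[-4:] == PARQUET_MAGIC
    )
```

Bytes that pass both checks could then be wrapped in `io.BytesIO` and handed to a reader such as `pandas.read_parquet`, which accepts file-like objects. The magic-byte check only catches obviously wrong payloads (for example a zip archive). It does not validate the file's contents.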

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

@cmgosnell cmgosnell self-assigned this Mar 25, 2024
@cmgosnell cmgosnell added the new-data, gridlab, and nrelatb labels Mar 25, 2024
@cmgosnell (Member, Author) commented

Hm, I accidentally made this branch off of my unfinished EIA extraction branch... I think it's okay now, but I probably should have done something other than merge in main...

@cmgosnell cmgosnell requested a review from e-belfer March 25, 2024 21:04
@e-belfer e-belfer marked this pull request as ready for review March 26, 2024 14:25
@e-belfer (Member) left a comment

Two smallish blocking changes:

  1. Please add any ATB docs that you're referencing into our data_sources folder.
  2. Suggested changes to the extraction test to actually test the rename, rather than skipping it.

Otherwise extraction works great and as expected.
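
As a rough illustration of the second point, a unit test can exercise the rename directly instead of skipping it. The sketch below is not the PR's test code. `get_column_map`, `rename_columns`, and the `any_part` wildcard key (a hypothetical generalization of `any_year`) are assumptions made for the example.

```python
WILDCARD = "any_part"  # hypothetical generalization of the any_year key


def get_column_map(column_maps, partition_value):
    """Return the map for a partition, falling back to the wildcard entry."""
    if partition_value in column_maps:
        return column_maps[partition_value]
    return column_maps[WILDCARD]


def rename_columns(rows, column_map):
    """Rename the keys of each row; keys missing from the map pass through."""
    return [
        {column_map.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


def test_rename_is_applied():
    """Check the rename for both a specific partition and the wildcard."""
    column_maps = {
        WILDCARD: {"Raw Name": "raw_name"},
        2024: {"Raw Name 2024": "raw_name"},
    }
    specific = rename_columns(
        [{"Raw Name 2024": 1}], get_column_map(column_maps, 2024)
    )
    fallback = rename_columns(
        [{"Raw Name": 2}], get_column_map(column_maps, 2023)
    )
    assert specific == [{"raw_name": 1}]
    assert fallback == [{"raw_name": 2}]
```

Asserting on the renamed keys means the test fails if the rename step is dropped or silently skipped.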

src/pudl/package_data/nrelatb/column_maps/data.csv (outdated, resolved)
test/unit/extract/csv_test.py (resolved)
@cmgosnell cmgosnell requested a review from e-belfer March 26, 2024 18:42
@cmgosnell cmgosnell added this pull request to the merge queue Mar 26, 2024
@e-belfer (Member) left a comment

Looks good to go!

Merged via the queue into main with commit b9dc700 Mar 26, 2024
14 checks passed
@cmgosnell cmgosnell deleted the extract-nrel-atb branch March 26, 2024 21:58
Labels
  • gridlab: Work related to open modeling input data integration funded/coordinated by GridLab
  • new-data: Requests for integration of new data.
  • nrelatb: NREL's Annual Technology Baseline data
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Extract the NREL ATB data to a raw dataframe
2 participants