Update to 2022 data #322
src/oge/data_cleaning.py
@@ -1002,7 +1018,8 @@ def clean_cems(year: int, small: bool, primary_fuel_table, subplant_emission_fac
)

# manually remove steam-only units
cems = manually_remove_steam_units(cems)
# NOTE(greg): disabling this for the 2022 data release
# cems = manually_remove_steam_units(cems)
Why do we stop removing these units in 2022?
Sorry, this is unclear! This is not just for 2022; we are stopping using this method altogether. See the note about this in the main PR description. I'll delete this whole line.
I understand it will apply to all years from now on. My question is why we are no longer removing steam-only units. Do we feel confident estimating emissions from these generators going forward?
# Cut off emissions at 9 hours after UTC year
emissions = emissions[: f"{self.year+1}-01-01 09:00:00+00:00"]
# Cut off emissions at 8 hours after UTC year
emissions = emissions[: f"{self.year+1}-01-01 08:00:00+00:00"]
How does this work? I thought the emissions would be hourly over a full year in UTC. Are we doing this in Pacific Time instead? And why was it nine hours before? Did we have Alaska in 2021 but not in 2022?
So this is just saying that we only want to do the calculations through midnight Pacific Time on 12/31, which is 8 a.m. UTC on Jan 1. I'm not sure why this was previously set to 9 hours, since we do not do the consumed calculations for AK or HI. I'm also not sure why this didn't raise a KeyError in previous years, but it does now, since 2023-01-01 09:00 does not exist in the data.
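Because January 1 falls in standard time, local midnight maps to a fixed UTC offset: 8 hours for Pacific (PST, UTC-8) and 9 hours for Alaska (AKST, UTC-9). One way to avoid hard-coding the hour is to derive the cutoff from a time zone. This is a minimal sketch, assuming pandas and IANA zone names. The helper `utc_cutoff_for_local_year_end` is hypothetical and not part of OGE.

```python
import pandas as pd


def utc_cutoff_for_local_year_end(year: int, tz: str) -> pd.Timestamp:
    """Return local midnight at the start of Jan 1 of `year + 1`, in UTC.

    Hypothetical helper (not part of OGE): derives the slice cutoff from an
    IANA time zone name instead of hard-coding an hour offset.
    """
    local_midnight = pd.Timestamp(f"{year + 1}-01-01 00:00:00").tz_localize(tz)
    return local_midnight.tz_convert("UTC")


# Jan 1 is in standard time: Pacific is UTC-8 and Alaska is UTC-9.
print(utc_cutoff_for_local_year_end(2022, "America/Los_Angeles"))  # 2023-01-01 08:00:00+00:00
print(utc_cutoff_for_local_year_end(2022, "America/Anchorage"))  # 2023-01-01 09:00:00+00:00
```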
Should we also set a lower bound, so that it starts on 2022-01-01 Pacific Time? I.e., do:
emissions = emissions[f"{self.year}-01-01 08:00:00+00:00": f"{self.year+1}-01-01 08:00:00+00:00"]
Right now, we probably also include the last 8 hours of 2021-12-31 (Pacific Time).
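One detail worth checking with the proposed bounds: pandas label-based slicing on a sorted `DatetimeIndex` includes both endpoints. The slice above therefore keeps 8,761 hourly timestamps for 2022, with 08:00 UTC appearing on both January 1 dates, while a half-open window keeps exactly 8,760. This is a minimal sketch on synthetic data. The frame and its `value` column are made up. Which form is correct depends on whether the pipeline's timestamps mark the start or the end of each hour, and this thread doesn't say.

```python
import pandas as pd

year = 2022
# Synthetic hourly UTC data standing in for the real emissions table.
index = pd.date_range("2021-12-31 00:00", "2023-01-02 00:00", freq="h", tz="UTC")
emissions = pd.DataFrame({"value": 1.0}, index=index)

start = pd.Timestamp(f"{year}-01-01 08:00:00", tz="UTC")
end = pd.Timestamp(f"{year + 1}-01-01 08:00:00", tz="UTC")

# Label slicing includes both endpoints: 8,761 rows.
inclusive = emissions.loc[start:end]

# A half-open window [start, end) keeps exactly one year of hours: 8,760 rows.
half_open = emissions[(emissions.index >= start) & (emissions.index < end)]

print(len(inclusive), len(half_open))  # 8761 8760
```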
Updates since last review: (in general, these are aimed at reducing the number of missing values in the output data)
Thanks
Purpose
This PR updates the OGE pipeline to work with 2022 data, and also updates the manual tables.
What the code is doing
- Updates default years to 2022 (Fixes CAR-3399)
- Updates the source of eGRID2020 data to the v2 file (not used in the pipeline, only for comparison)
- Updates reference tables (see #260) (Fixes CAR-3349)
  - `ba_reference`: no update to the FERC table; new retirements of GLHB and GRIF according to EIA
  - `default_gross_to_net_ratios.csv`: no updates
  - `eGRID2020_crosswalk_of_EIA_ID_to_EPA_ID.csv`: updated based on eGRID2021. One new plant added to the list
  - `emission_factors_for_co2_ch4_n2o`: no updates to AP-42 or IPCC
  - `emission_factors_for_nox`: no updates to AP-42 or IPCC. Added several new factors for boiler configurations that appear in the 2022 data but were not previously included.
  - `emission_factors_for_so2`: no updates to AP-42 or IPCC. Added several new factors for boiler configurations that appear in the 2022 data but were not previously included.
  - `energy_source_groups`: no changes based on pudl metadata
  - `epa_eia_crosswalk_manual`: used notebook to identify new additions to the table
  - `geothermal_emission_factors`: no changes to source data
  - `ipcc_gwp`: the most current report is still AR6
  - `physical_ba`
  - `plants_not_connected_to_grid`: no changes based on eGRID2021
  - `steam_units_to_remove`: not updated, since steam units are no longer being removed
  - `updated_oth_energy_source_codes`: ran notebook; no new matches needed
  - `utility_name_ba_code_map`: ran notebook; added several new maps, sorted alphabetically
- `load_data.py`: … `epa_eia_crosswalk_manual` would not be reflected. Now, whenever CEMS data is being loaded, we run `update_epa_to_eia_map()` to update the `plant_id_eia` codes.
- `data_cleaning.py`:
  - … `steam_units_to_remove` manual table, which didn't seem worth it if we are not going to be dropping these units in the future.
  - … `pudl.analysis.allocate_gen_fuel_by_generator_energy_source()` instead of loading the table from pudl. This lets us see the data quality warnings generated while that part of the pipeline runs, which will help with data quality checks.
- `eia930.py`
- `emissions.py`: … `annual_avg_fuel_sulfur_content`. Now, if no sulfur content data is available for a fuel in a given year, the pipeline checks the sulfur contents reported in the previous year to see if it can fill in the annual average value. For the 2022 pipeline, this means the JF generators without a specified sulfur content will use the average JF sulfur content from 2021. (Looking at multiple years, this sulfur content does not seem to change from year to year, so this seems like a reasonable backstop.) In implementing this fix, I split an existing function into two components.
- `validation.py`:
  - … `test_for_negative_values` check does not pass
  - … `check_for_complete_timeseries` check to `check_for_complete_hourly_timeseries`, and adds a `check_for_complete_monthly_timeseries` check. This new check ensures that monthly-resolution data contains all 12 months of data.

Testing
Running the pipeline for 2022 (not yet complete)
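The previous-year fallback described for `emissions.py` can be sketched in pandas. This is not the pipeline's code. The function `annual_avg_sulfur_with_fallback`, the columns `report_year`, `energy_source_code`, and `sulfur_content_pct`, and the toy values are all hypothetical. The sketch also uses an unweighted mean, whereas the real annual average may be weighted.

```python
import pandas as pd


def annual_avg_sulfur_with_fallback(fuel_data: pd.DataFrame, year: int) -> pd.Series:
    """Average sulfur content per fuel for `year`, filling gaps from `year - 1`.

    Hypothetical sketch: expects columns report_year, energy_source_code and
    sulfur_content_pct, and uses an unweighted mean for simplicity.
    """
    reported = fuel_data.dropna(subset=["sulfur_content_pct"])

    def mean_by_fuel(y: int) -> pd.Series:
        rows = reported[reported["report_year"] == y]
        return rows.groupby("energy_source_code")["sulfur_content_pct"].mean()

    # Every fuel that appears in the target year needs a value, even if none was reported.
    fuels = pd.Index(
        fuel_data.loc[fuel_data["report_year"] == year, "energy_source_code"].unique(),
        name="energy_source_code",
    )
    current = mean_by_fuel(year).reindex(fuels)
    # Only fuels still missing a current-year average take the previous year's value.
    return current.fillna(mean_by_fuel(year - 1))


# Toy values for illustration only, not real sulfur contents.
fuel_data = pd.DataFrame(
    {
        "report_year": [2021, 2021, 2022, 2022, 2022],
        "energy_source_code": ["JF", "DFO", "JF", "DFO", "DFO"],
        "sulfur_content_pct": [0.125, 0.5, None, 0.25, 0.75],
    }
)
result = annual_avg_sulfur_with_fallback(fuel_data, 2022)
print(result.to_dict())  # {'JF': 0.125, 'DFO': 0.5}
```

`fillna` aligns on the fuel labels, so reported current-year averages stay as they are. Only fuels with no current-year data take the prior year's value.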
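A monthly completeness check along the lines of `check_for_complete_monthly_timeseries` in `validation.py` could count the distinct months in each group. This is a minimal sketch, assuming a frame with a datetime `report_date` column, a grouping key such as `plant_id_eia`, and a single year of data. The function `find_incomplete_monthly_groups` is hypothetical, and the PR does not show the real check's signature or whether it logs or raises.

```python
import pandas as pd


def find_incomplete_monthly_groups(
    df: pd.DataFrame, key: str, date_col: str = "report_date"
) -> pd.DataFrame:
    """Return the groups in `df` that lack data for some of the 12 months.

    Hypothetical sketch: assumes `date_col` is datetime64 and that `df`
    covers a single year, so counting distinct month numbers is enough.
    """
    months_present = df[date_col].dt.month.groupby(df[key]).nunique()
    incomplete = months_present[months_present < 12]
    return incomplete.rename("months_present").reset_index()


# Plant 1 reports all 12 months; plant 2 is missing December.
monthly = pd.DataFrame(
    {
        "plant_id_eia": [1] * 12 + [2] * 11,
        "report_date": list(pd.date_range("2022-01-01", periods=12, freq="MS"))
        + list(pd.date_range("2022-01-01", periods=11, freq="MS")),
    }
)
print(find_incomplete_monthly_groups(monthly, key="plant_id_eia"))
```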
Usage Example/Visuals
How the code can be used and/or images of any graphs, tables or other visuals (not always applicable).
Review estimate
How long will it take for reviewers and observers to understand this code change?
Future work
The following warnings were raised when running the pipeline:
- `pudl.analysis.allocate_gen_fuel` is dropping/adding data (see catalyst-cooperative/pudl#3165)

Checklist
ruff