plants_utils_eia860 introduces NA values which get dropped #1700

Open
zaneselvans opened this issue Jun 20, 2022 · 1 comment
Labels
data-repair Interpolating or extrapolating data that we don't actually have. output Exporting data from PUDL into other platforms or interchange formats.

Comments

@zaneselvans
Member

zaneselvans commented Jun 20, 2022

In some cases, plants or utilities have missing attributes in their entity tables. For example, there are several thousand plants that have no state. This can cause problems when we construct the denormalized output tables, and can result in data rows being dropped, possibly unnecessarily.

For example, in fuel_receipts_costs_eia923(), after merging in the results of plants_utils_eia860, there are 11,040 records that lack a utility_id_eia, and most of them also lack a state value. To ensure that the output table is usable and has all the IDs that downstream data products expect, these records are dropped. As a result, 11,040 FRC records with fuel delivery data are missing from the output table, even though they do have the plant, date, and fuel type information that is more fundamental to this table.
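A minimal pandas sketch of the pattern described above, using made-up toy data. The column names are modeled loosely on PUDL's, but the values, the extra columns, and the exact merge keys are assumptions for illustration, not the actual `fuel_receipts_costs_eia923()` code:

```python
import pandas as pd

# Toy fuel receipts: every row has the core plant/date/fuel fields.
frc = pd.DataFrame({
    "plant_id_eia": [1, 2, 3],
    "report_date": pd.to_datetime(["2020-01-01"] * 3),
    "fuel_type_code_pudl": ["coal", "gas", "gas"],
    "fuel_received_units": [100.0, 250.0, 80.0],
})

# Toy plant/utility attributes: plant 3 has no utility or state on record.
plants_utils = pd.DataFrame({
    "plant_id_eia": [1, 2, 3],
    "report_date": pd.to_datetime(["2020-01-01"] * 3),
    "utility_id_eia": pd.array([10, 20, None], dtype="Int64"),
    "state": ["CO", "TX", None],
})

# The left merge keeps all three receipts, but plant 3 now has a null utility ID.
merged = frc.merge(plants_utils, on=["plant_id_eia", "report_date"], how="left")

# Dropping on the merged-in ID discards plant 3's fuel delivery record,
# even though its plant, date, and fuel information are all present.
dropped = merged.dropna(subset=["utility_id_eia"])
print(len(merged), len(dropped))  # 3 2
```

The merge itself loses nothing; the rows disappear only at the `dropna` step, which is where the missing entity attribute gets turned into a missing fuel receipt.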

In creating the database views that replace the output tables, we should be more careful with these kinds of merges, make sure we aren't introducing null values we don't need to introduce, and keep as many of these data records as we can.
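One possible shape for a more careful merge, sketched under assumptions: `merge_keep_records` and its `core_cols` parameter are hypothetical names, not part of PUDL. It left-merges entity attributes, uses pandas' `validate="m:1"` to refuse a merge that would duplicate rows, uses `indicator=True` to report unmatched rows, and only drops rows that are missing one of the table's own core columns:

```python
import pandas as pd


def merge_keep_records(left, right, on, core_cols):
    """Hypothetical helper: left-merge entity attributes without losing rows.

    Only rows missing one of the table's core columns are dropped; rows
    that merely lack a merged-in attribute are kept with a null value.
    """
    # validate="m:1" raises a MergeError if `right` has duplicate keys,
    # which would otherwise silently multiply rows in `left`.
    merged = left.merge(right, on=on, how="left", validate="m:1", indicator=True)
    unmatched = int((merged["_merge"] == "left_only").sum())
    if unmatched:
        print(f"Rows without a match in the attribute table: {unmatched}")
    merged = merged.drop(columns="_merge")
    return merged.dropna(subset=core_cols)


# Usage with toy data: plant 2 has a null utility ID, plant 4 has no attribute row.
frc = pd.DataFrame({
    "plant_id_eia": [1, 2, 4],
    "fuel_type_code_pudl": ["coal", "gas", "oil"],
})
plants = pd.DataFrame({
    "plant_id_eia": [1, 2],
    "utility_id_eia": pd.array([10, None], dtype="Int64"),
})
out = merge_keep_records(
    frc, plants, on="plant_id_eia", core_cols=["plant_id_eia", "fuel_type_code_pudl"]
)
print(len(out))  # 3
```

The design choice here is to treat nulls in merged-in attributes as acceptable in the output and to surface them (via the unmatched count) rather than silently dropping the rows that carry them.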

This caused problems and confusion in issue #1343.

@zaneselvans zaneselvans added output Exporting data from PUDL into other platforms or interchange formats. data-repair Interpolating or extrapolating data that we don't actually have. labels Jun 20, 2022
@katie-lamb
Member

Could we try to fill in the state values? I've seen this problem crop up a few times: a date_merge (or a normal merge) is performed on the output tables, and the subsequent dropna removes a lot of previously valid records. It seems like we have a few of these imputation problems we could work on in the future.
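One simple imputation for missing states, sketched under an explicit assumption: that a plant's state does not change over time, so a state reported for a plant in any year can fill that plant's other years. The function name `fill_state_by_plant` is hypothetical, and this is not an existing PUDL routine:

```python
import pandas as pd


def fill_state_by_plant(plants):
    """Hypothetical helper: fill missing `state` from the same plant's other records.

    Assumes a plant's state is constant over time, so the nearest reported
    value for the same plant_id_eia (earlier first, then later) is used.
    """
    out = plants.sort_values(["plant_id_eia", "report_date"]).copy()
    # Carry values forward in time within each plant, then backward.
    out["state"] = out.groupby("plant_id_eia")["state"].ffill()
    out["state"] = out.groupby("plant_id_eia")["state"].bfill()
    return out


# Usage with toy data: plant 1 reports its state only in 2019; plant 2 never does.
plants = pd.DataFrame({
    "plant_id_eia": [1, 1, 1, 2, 2],
    "report_date": pd.to_datetime(
        ["2018-01-01", "2019-01-01", "2020-01-01", "2019-01-01", "2020-01-01"]
    ),
    "state": [None, "CO", None, None, None],
})
filled = fill_state_by_plant(plants)
print(filled[["plant_id_eia", "report_date", "state"]])
```

Plant 1 gets "CO" in every year, while plant 2 stays missing because no record for it reports a state; those cases would still need another source (for example, geographic coordinates) or would need to be kept with a null state.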


2 participants