plants_utils_eia860 introduces NA values which get dropped #1700
Labels
data-repair
Interpolating or extrapolating data that we don't actually have.
output
Exporting data from PUDL into other platforms or interchange formats.
In some cases, plants or utilities may have missing attributes in their entity tables. E.g. there are several thousand plants that have no `state`. This can create issues when we're constructing the denormalized output tables, and result in data rows getting dropped, maybe unnecessarily.
For example, in `fuel_receipts_costs_eia923()`, after merging in the results of `plants_utils_eia860`, there are 11,040 records that lack a `utility_id_eia`, most of which also happen to lack a `state` value. To ensure that the output table is usable and has all the IDs that downstream data products expect, these records are dropped. This means that 11,040 FRC records with fuel delivery data are missing from the output table, even though they do have the plant, date, and fuel type information that's more fundamental to this table.
In creating the database views which replace the output tables, we should be more careful with these kinds of merges, ensuring that we aren't introducing null values we don't need to introduce and that we keep as many of these data records as we can.
This caused problems / confusion in issue #1343.
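To make the failure mode concrete, here is a minimal sketch on toy data. It is not the actual PUDL code: the frames, their values, and the column names `plant_id_eia`, `report_date`, `fuel_type_code`, and `fuel_received_units` are assumptions chosen for illustration, and the real `fuel_receipts_costs_eia923()` may merge and filter differently. It only shows how a left merge against an incomplete plant-utility table yields null `utility_id_eia` values, and how much is lost when rows are dropped on that column rather than on the columns the table fundamentally needs.

```python
import pandas as pd

# Toy stand-in for fuel receipts and costs records (values are made up).
frc = pd.DataFrame({
    "plant_id_eia": [1, 2, 3],
    "report_date": pd.to_datetime(["2020-01-01"] * 3),
    "fuel_type_code": ["NG", "BIT", "NG"],
    "fuel_received_units": [100.0, 250.0, 75.0],
})

# Toy stand-in for the plant-utility association (hypothetical contents).
# Plant 2 has no utility ID or state; plant 3 is absent entirely.
plants_utils = pd.DataFrame({
    "plant_id_eia": [1, 2],
    "report_date": pd.to_datetime(["2020-01-01"] * 2),
    "utility_id_eia": pd.array([10, None], dtype="Int64"),
    "state": ["CO", None],
})

# A left merge keeps every FRC row, but unmatched or incomplete plants
# end up with null utility_id_eia and state values.
merged = frc.merge(
    plants_utils,
    on=["plant_id_eia", "report_date"],
    how="left",
    validate="m:1",
)

# Dropping on the merged-in ID discards rows that still carry fuel data.
strict = merged.dropna(subset=["utility_id_eia"])

# Alternative: only require the columns fundamental to this table.
required = ["plant_id_eia", "report_date", "fuel_type_code"]
lenient = merged.dropna(subset=required)

n_lost = len(merged) - len(strict)
print(f"rows lost by dropping on utility_id_eia: {n_lost}")
print(f"rows kept when requiring only {required}: {len(lenient)}")
```

Counting null IDs right after the merge (for example with `merged["utility_id_eia"].isna().sum()`) is one way to see how many records a subsequent drop would remove before deciding whether the drop is necessary.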