Create a `data_maturity` label for EIA data #1855
Having these columns anywhere in the non-monthly eia860 tables lets them exist without the eia860m being loaded. Moving them from generators -> generators_existing is slightly aspirational: we hope they'll add them in their next non-monthly data update, so this just adds the empty columns where we hope they'll show up later.
Codecov Report
@@ Coverage Diff @@
## dev #1855 +/- ##
=======================================
+ Coverage 83.0% 83.2% +0.1%
=======================================
Files 65 65
Lines 7327 7518 +191
=======================================
+ Hits 6088 6255 +167
- Misses 1239 1263 +24
This looks like some hackish stuff happening. Can you explain more in the PR comment what all you had to do to make it work?
I think we should have a coding table that explains what the different levels of `data_maturity` mean with some examples (maybe even exhaustive lists since there's not much right now).
Don't forget to update release notes!
annual release and should be used with caution. :pr:`1834`
annual release and should be used with caution. We also integrated a ``data_maturity``
column and related ``data_maturities`` table into most of the EIA data tables in
order to alter users to the level of finality of the data. :pr:`1834` :pr:`1855`
alter => alert?
Since you've set up the foreign key generation rule for the `data_maturity` columns, the ENUM constraint on that column is duplicative.
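To illustrate why the two constraints overlap, here is a minimal sketch using only Python's built-in `sqlite3` module. It is not the project's actual schema: the table name `generators_demo`, the column layout, and the maturity codes in `MATURITY_CODES` are all hypothetical. The point is only that once a column references a coding table, any value outside that table is already rejected, so a separate ENUM-style check adds nothing.

```python
import sqlite3

# Hypothetical maturity codes, for illustration only; the real list may differ.
MATURITY_CODES = ["final", "provisional"]


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with a coding table and a table that references it."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE data_maturities (code TEXT PRIMARY KEY, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO data_maturities VALUES (?, ?)",
        [(code, f"{code} data") for code in MATURITY_CODES],
    )
    # No CHECK / ENUM constraint here: the foreign key alone restricts the values.
    conn.execute(
        "CREATE TABLE generators_demo ("
        "generator_id TEXT PRIMARY KEY, "
        "data_maturity TEXT REFERENCES data_maturities (code))"
    )
    conn.commit()
    return conn


def try_insert(conn: sqlite3.Connection, generator_id: str, data_maturity) -> bool:
    """Insert one row; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO generators_demo VALUES (?, ?)",
                (generator_id, data_maturity),
            )
    except sqlite3.IntegrityError:
        return False
    return True
```

In this sketch a row with a known code is accepted, a row with an unknown code is rejected by the foreign key, and a NULL `data_maturity` is still allowed because foreign keys do not constrain NULLs.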
We added the early release EIA data in #1834. This PR adds a `data_maturity` column into the EIA data tables to communicate with users about where the data comes from in the hope of communicating how much trust they should place in the permanence of the data.

Things that happened in here:

- […] `add_data_maturity` method into the standard excel extractor that 🥁 adds a `data_maturity` column. This is deployed in `process_raw` (Question: the generic version of `process_raw` is customized in all of the datasets... should `add_data_maturity` be called directly in `GenericExtractor.extract` instead?)
- […] `utility_id_eia`. This PR actually sets this up in a non-hacky way (imo). For each entity in `ENTITIES` we add a `not_to_drop_cols` list which is employed in the harvesting process to effectively avoid dropping those columns during harvesting.
- […] `boiler_fuel_eia923` without some additional work (see enable non-data columns in aggregated boiler_fuel_eia923 table #1847)
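
The two mechanisms described in the list can be sketched roughly as follows. This is a dependency-free illustration using lists of dicts rather than the project's DataFrames, and it is not the PR's actual code. The function signatures, the `harvest` helper, the "most common value wins" rule, and the sample column names are all assumptions made for the example. It also assumes that harvesting normally removes harvested columns from the source records and that `not_to_drop_cols` exempts some of them from that removal.

```python
from collections import Counter


def add_data_maturity(records, maturity):
    """Return copies of the records with a data_maturity column added."""
    return [{**rec, "data_maturity": maturity} for rec in records]


def harvest(records, id_col, harvest_cols, not_to_drop_cols=()):
    """Harvest entity columns into one row per entity.

    Returns (entity_rows, slimmed_records). Harvested columns are removed
    from the source records unless they are listed in not_to_drop_cols.
    """
    # Group the source records by entity ID, preserving first-seen order.
    by_entity = {}
    for rec in records:
        by_entity.setdefault(rec[id_col], []).append(rec)

    entity_rows = []
    for entity_id, recs in by_entity.items():
        row = {id_col: entity_id}
        for col in harvest_cols:
            # Assumed consensus rule: keep the most frequently reported value.
            counts = Counter(r[col] for r in recs if r.get(col) is not None)
            row[col] = counts.most_common(1)[0][0] if counts else None
        entity_rows.append(row)

    # Drop harvested columns from the source, except the protected ones.
    to_drop = set(harvest_cols) - set(not_to_drop_cols)
    slimmed = [
        {key: val for key, val in rec.items() if key not in to_drop}
        for rec in records
    ]
    return entity_rows, slimmed
```

Calling `harvest(tagged, "plant_id_eia", ["plant_name", "utility_id_eia"], not_to_drop_cols=["utility_id_eia"])` on records tagged by `add_data_maturity` would move `plant_name` into the entity rows while leaving `utility_id_eia` and `data_maturity` in place on the source records.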