Clean up FIPS codes and use same method for ZIP codes #1476
Add a string cleaning function, `pudl.helpers.zero_pad_numeric_string()`, which can be used to standardize columns that are supposed to contain fixed-width numeric codes stored as strings. These codes are particularly susceptible to corruption through data type conversions. This function replaces the similar `zero_pad_zips` function, and can be used to clean up ZIP codes and FIPS codes, which must be all numeric and keep their leading zeroes to be valid. The FIPS codes being cleaned up here are the ones associated with coal mines reported in the `fuel_receipts_costs_eia923` table. Separately: turn on processing of the eia860m data by default.
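For readers who have not hit this problem before, here is a minimal sketch of the kind of cleanup described above. It is not the code in this PR: `zero_pad_codes` is a hypothetical stand-in that works on a plain list using only the standard library, whereas the real helper lives in `pudl.helpers`. Treating an all-zero code as missing is an assumption made for illustration.

```python
import re
from typing import Iterable, List, Optional


def zero_pad_codes(values: Iterable[object], n_digits: int) -> List[Optional[str]]:
    """Hypothetical helper: normalize fixed-width numeric codes stored as strings."""
    if n_digits <= 0:
        raise ValueError("n_digits must be positive")
    all_zeros = "0" * n_digits
    cleaned: List[Optional[str]] = []
    for value in values:
        if value is None:
            cleaned.append(None)
            continue
        text = str(value).strip()
        # Undo float conversion damage such as "6037.0" -> "6037".
        text = re.sub(r"\.0*$", "", text)
        # Anything non-numeric, empty, or too long cannot be a valid code.
        if not re.fullmatch(rf"[0-9]{{1,{n_digits}}}", text):
            cleaned.append(None)
            continue
        padded = text.zfill(n_digits)
        # Assumption: an all-zero code is a placeholder, so treat it as missing.
        cleaned.append(None if padded == all_zeros else padded)
    return cleaned


# County FIPS codes are 5 digits wide; leading zeroes are often lost on import.
county_fips = zero_pad_codes(
    ["1001", 6037.0, " 08031 ", "unknown", None, "123456"], n_digits=5
)
print(county_fips)
```

In this sketch, `"1001"` and the float `6037.0` are restored to five-digit strings, while non-numeric, missing, and over-length values come back as `None` instead of being silently padded.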
Codecov Report
@@ Coverage Diff @@
## dev #1476 +/- ##
==========================================
+ Coverage 83.37% 83.37% +0.01%
==========================================
Files 64 64
Lines 6938 6935 -3
==========================================
- Hits 5784 5782 -2
+ Misses 1154 1153 -1
for more information, see https://pre-commit.ci
Removed derelict ZIP code columns from `fields.py` which had already been replaced in the renaming of EIA spreadsheet columns with standard `zip_code` and `zip_code_4`. Updated the pre-commit.ci configuration to skip local repo hooks that need additional software installed to run (nb-clear-outputs and pytest). Pre-commit autoupdates will also not be made as PRs against the `dev` branch. Closes #550
…ve/pudl into fips-check-constraint
* Warning filters to pytest configuration in `tox.ini`
* More exception types for Zenodo tests to xfail on
* Defensive assertion in `zero_pad_numeric_string()`
Looks good. Love the unit tests for the new helper function.
Separately:
… `accumulated_depreciation_ferc1` table, which we're no longer processing. Really this should be removed from the list of working tables.
… `Settings` objects.