Clean up FIPS codes and use same method for ZIP codes #1476

Merged
merged 11 commits into from
Feb 20, 2022

@zaneselvans zaneselvans commented Feb 17, 2022

Add a string-cleaning function, `pudl.helpers.zero_pad_numeric_string()`, for
standardizing columns that are supposed to contain fixed-width numeric codes stored as
strings. These codes are particularly susceptible to corruption through data type
conversions.
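
To make the corruption concrete, here is a small plain-Python illustration of how a fixed-width code loses information when it passes through a numeric type. The ZIP code value is a made-up example, and this is not code from the PR.

```python
# Hypothetical ZIP code, used only for illustration.
original = "02134"

# Reading the code as a number silently drops the leading zero.
as_int = int(original)
as_float = float(original)

# Converting back to a string does not restore the original code,
# and the float path also picks up a trailing ".0".
round_trip_int = str(as_int)      # "2134"
round_trip_float = str(as_float)  # "2134.0"
```

Neither round-tripped string equals the original five-character code, which is why such columns need explicit cleaning rather than a simple cast back to string.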

This function replaces the similar `zero_pad_zips()` function and can be used to clean
up ZIP codes and FIPS codes, which must be all numeric and retain their leading zeroes
to be valid.
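
As a rough sketch of the idea, the following plain-Python function pads codes to a fixed width. It is not the PUDL implementation: the name `zero_pad_numeric_string_sketch`, the `n_digits` parameter, the use of a list instead of a pandas column, and the choice to map unusable values to `None` are all assumptions made for illustration.

```python
import re


def zero_pad_numeric_string_sketch(values, n_digits):
    """Pad numeric codes to a fixed width (illustrative sketch only)."""
    cleaned = []
    for value in values:
        if value is None:
            cleaned.append(None)
            continue
        text = str(value).strip()
        # Drop a trailing "." or ".0" left behind by a float conversion.
        text = re.sub(r"\.0*$", "", text)
        # Assumption: anything non-numeric or too long is treated as missing.
        if not re.fullmatch(rf"\d{{1,{n_digits}}}", text):
            cleaned.append(None)
            continue
        cleaned.append(text.zfill(n_digits))
    # Defensive check: every surviving code is exactly n_digits digits.
    assert all(
        code is None or re.fullmatch(rf"\d{{{n_digits}}}", code)
        for code in cleaned
    )
    return cleaned


# Hypothetical inputs showing several kinds of damage.
padded = zero_pad_numeric_string_sketch(["2134", 2134.0, "02134", None, "abc"], 5)
```

With these inputs, the first three values all come back as `"02134"`, while the missing and non-numeric entries come back as `None`. The same call with a different `n_digits` would cover other fixed-width codes.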

The FIPS codes cleaned up here are the ones associated with coal mines reported in the
`fuel_receipts_costs_eia923` table.

Separately:

  • Turn on processing of the `eia860m` by default.
  • Avoid validating the `accumulated_depreciation_ferc1` table, which we're no longer processing. Really, this should be removed from the list of working tables.
  • Set up automatic pre-commit hook checking and fixing in CI.
  • Fix a bunch of little linting issues that the automatic pre-commit hooks found.
  • Ignore a bunch of third-party warnings that come up in the tests. There are more to ignore.
  • Update our EIA ETL debugging notebook to use the new Settings objects.


codecov bot commented Feb 17, 2022

Codecov Report

Merging #1476 (894522e) into dev (ace5811) will increase coverage by 0.01%.
The diff coverage is 88.89%.


@@            Coverage Diff             @@
##              dev    #1476      +/-   ##
==========================================
+ Coverage   83.37%   83.37%   +0.01%     
==========================================
  Files          64       64              
  Lines        6938     6935       -3     
==========================================
- Hits         5784     5782       -2     
+ Misses       1154     1153       -1     

| Impacted Files | Coverage Δ |
|---|---|
| src/pudl/metadata/classes.py | 83.45% <ø> (ø) |
| src/pudl/metadata/fields.py | 100.00% <ø> (ø) |
| src/pudl/metadata/resources/ferc1.py | 100.00% <ø> (ø) |
| src/pudl/transform/eia923.py | 93.97% <ø> (ø) |
| src/pudl/helpers.py | 84.78% <85.71%> (-0.36%) ⬇️ |
| src/pudl/glue/ferc1_eia.py | 80.27% <100.00%> (+0.27%) ⬆️ |
| src/pudl/settings.py | 94.63% <100.00%> (ø) |
| src/pudl/analysis/timeseries_cleaning.py | 88.89% <0.00%> (+0.22%) ⬆️ |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

pre-commit-ci bot and others added 4 commits February 17, 2022 23:19
Removed derelict ZIP code columns from `fields.py` which had already been replaced in
the renaming of EIA spreadsheet columns with standard `zip_code` and `zip_code_4`.

Updated the pre-commit.ci configuration to skip local repo hooks that need additional
software installed to run (nb-clear-outputs and pytest). Pre-commit autoupdates will
also not be made as PRs against the `dev` branch.

Closes #550

@zaneselvans zaneselvans linked an issue Feb 18, 2022 that may be closed by this pull request
* Warning filters to pytest configuration in tox.ini
* More exception types for Zenodo tests to xfail on
* Defensive assertion in zero_pad_numeric_string()
@zaneselvans zaneselvans marked this pull request as ready for review February 18, 2022 15:00

@bendnorman bendnorman left a comment


Looks good. Love the unit tests for the new helper function.

src/pudl/glue/ferc1_eia.py (outdated, resolved)
src/pudl/helpers.py (outdated, resolved)
@zaneselvans zaneselvans merged commit bf50c65 into dev Feb 20, 2022
@zaneselvans zaneselvans deleted the fips-check-constraint branch February 20, 2022 03:18
This was referenced Mar 10, 2022
Linked issue: Clean up postal codes
2 participants