Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update transform documentation #939

Merged
merged 9 commits into from
Mar 19, 2021
Merged

Conversation

aesharpe
Copy link
Member

@aesharpe aesharpe commented Mar 4, 2021

This branch adds more detail to the doc strings that transform each of the data tables. These can be linked in the documentation to show what is happening to each dataset to change it from the raw version.

What needs to be done:

I added a chunk of detail from the eia861.py transform module to the __init__.py file.
This section in particular needs to be standardized so that it applies to all of the data:

At the end of the main coordinating transform() function, every column that remains in each of the transformed dataframes should correspond to a column that will exist in the database and be associated with the EIA datasets, which means it is also part of the EIA column namespace.

I'm also curious whether this statement pertains to all the data or just EIA:

This information is important for the step after the intra-table transformations during which the collection of EIA tables is normalized as a whole.

@aesharpe aesharpe added the eia861 Anything having to do with EIA Form 861 label Mar 4, 2021
@aesharpe aesharpe self-assigned this Mar 4, 2021
Copy link
Member

@zaneselvans zaneselvans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I think the content here is good, though I'm concerned that updating all of these docstrings that often say the same things repeatedly will be tedious and not attended to (like replacing . with NA... which happens in a bajillion EIA tables), and having that documentation get out of sync with the contents of the functions will be very confusing.

All of the bulletized lists (of which there are many) need to be re-formatted to be valid RST. They need to have a blank line separating them from other text blocks. You can (and probably should) try building the docs before you submit a doc-heavy PR. You can do this with:

tox -rve docs

If the formatting is invalid, it'll give you an error and a pointer to where the (first) error is. Here's a cheat-sheet on formatting RST lists.

Comment on lines 346 to 347
lambda x: x.replace(
offset_fixes[x.name]) if x.name in offset_fixes else x
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check and make sure that your editor is wrapping lines at 88 characters, and not at 79? I've seen a few of these auto-formatting changes come up.

Comment on lines +433 to +434
df["utc_offset_code"] = df.pipe(
_standardize_offset_codes, OFFSET_CODE_FIXES)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another line that appears to have been auto formatted to 79 instead of 88 characters.

Comment on lines 18 to 22
Transformations include:
- Replace . values with NA.
- Convert pre-2012 ownership percentages to proportions to match
post-2012 reporting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a blank line between the list and the preceding text.

Comment on lines 86 to 87
Transformations include:
- Replace . values with NA.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a blank line between list and preceding text -- as do all of the other bulletized lists.

@zaneselvans zaneselvans added docs Documentation for users and contributors. and removed eia861 Anything having to do with EIA Form 861 labels Mar 18, 2021
@zaneselvans zaneselvans changed the base branch from main to sprint31 March 18, 2021 01:50
@codecov
Copy link

codecov bot commented Mar 18, 2021

Codecov Report

Merging #939 (f7377a5) into sprint31 (b1869c6) will decrease coverage by 0.15%.
The diff coverage is 83.33%.

Impacted file tree graph

@@             Coverage Diff              @@
##           sprint31     #939      +/-   ##
============================================
- Coverage     83.27%   83.12%   -0.15%     
============================================
  Files            44       45       +1     
  Lines          5498     5526      +28     
============================================
+ Hits           4578     4593      +15     
- Misses          920      933      +13     
Impacted Files Coverage Δ
src/pudl/transform/eia.py 97.15% <ø> (ø)
src/pudl/transform/eia860.py 96.81% <ø> (ø)
src/pudl/transform/epacems.py 70.59% <ø> (ø)
src/pudl/transform/eia923.py 88.12% <60.00%> (ø)
src/pudl/transform/eia861.py 96.91% <100.00%> (ø)
src/pudl/transform/ferc1.py 91.18% <100.00%> (ø)
src/pudl/transform/ferc714.py 97.83% <100.00%> (ø)
src/pudl/convert/datapkg_to_rst.py 53.57% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b1869c6...f7377a5. Read the comment docs.

one dataframe and include an ``operational_status`` to indicate which tab
the record came from. We use ``operational_status`` to parse the pre 2009
files as well.
There are three tabs that the generator records come from (proposed, existing, retired). Pre 2009, the existing and retired data are lumped together under a single generator file with one tab. We pull each tab into one dataframe and include an ``operational_status`` to indicate which tab the record came from. We use ``operational_status`` to parse the pre 2009 files as well.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Now it seems like line wrapping has been disabled entirely -- we want to automatically wrap at 88 characters. These very long lines will cause errors and make reading the docstrings difficult in some contexts.

@zaneselvans zaneselvans merged commit cfea163 into sprint31 Mar 19, 2021
@zaneselvans zaneselvans deleted the transform-documentation branch March 20, 2021 01:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation for users and contributors.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants