Update transform documentation #939

aesharpe · 2021-03-04T20:16:13Z

This branch adds more detail to the docstrings of the functions that transform each of the data tables. These docstrings can be linked from the documentation to show what happens to each dataset as it is changed from its raw version.

What needs to be done:

I added a chunk of detail from the eia861.py transform module to the __init__.py file.
This section in particular needs to be standardized so that it applies to all of the data:

At the end of the main coordinating transform() function, every column that remains in each of the transformed dataframes should correspond to a column that will exist in the database and be associated with the EIA datasets, which means it is also part of the EIA column namespace.

I'm also curious whether this statement pertains to all the data or just EIA:

This information is important for the step after the intra-table transformations during which the collection of EIA tables is normalized as a whole.
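One way to picture the invariant quoted above is a check that every column left in each transformed table belongs to the EIA column namespace. The sketch below is hypothetical. Plain lists stand in for the transformed dataframes. `EIA_COLUMNS`, `check_eia_namespace`, and the example table names are placeholders, not PUDL's actual metadata or API.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical stand-in for the columns that will exist in the database and be
# associated with the EIA datasets (the "EIA column namespace").
EIA_COLUMNS: FrozenSet[str] = frozenset(
    {"plant_id_eia", "generator_id", "report_date", "operational_status"}
)


def check_eia_namespace(
    transformed: Dict[str, Iterable[str]],
    namespace: FrozenSet[str] = EIA_COLUMNS,
) -> Dict[str, List[str]]:
    """Return, for each table, the columns that are not in the EIA namespace.

    An empty dict means every remaining column maps to a database column.
    """
    problems: Dict[str, List[str]] = {}
    for table, columns in transformed.items():
        stray = sorted(set(columns) - namespace)
        if stray:
            problems[table] = stray
    return problems


# Example: a leftover helper column would be flagged before normalization.
tables = {
    "generators_example": ["plant_id_eia", "generator_id", "operational_status"],
    "plants_example": ["plant_id_eia", "report_date", "tmp_year"],
}
print(check_eia_namespace(tables))  # {'plants_example': ['tmp_year']}
```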


zaneselvans

Hey, I think the content here is good, but I'm concerned that keeping all of these docstrings up to date will be tedious and easily neglected, since they often repeat the same steps (like replacing . with NA, which happens in a bajillion EIA tables). Having that documentation get out of sync with what the functions actually do would be very confusing.

All of the bulletized lists (of which there are many) need to be re-formatted to be valid RST. They need to have a blank line separating them from other text blocks. You can (and probably should) try building the docs before you submit a doc-heavy PR. You can do this with:

tox -rve docs

If the formatting is invalid, it'll give you an error and a pointer to where the (first) error is. Here's a cheat-sheet on formatting RST lists.
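Building the docs with tox is the real test. A quick local pre-check for the specific problem described here (a bullet list starting on the line right after a paragraph) can be sketched as follows. `bullets_missing_blank_line` is a hypothetical helper and a rough heuristic: it does not track indented multi-paragraph list items.

```python
import re
from typing import List

# A line that starts an RST bullet item: optional indent, "-", "*" or "+", then a space.
BULLET = re.compile(r"^\s*[-*+]\s")


def bullets_missing_blank_line(docstring: str) -> List[int]:
    """Return 1-based line numbers of bullets that start a list with no blank line before it."""
    bad: List[int] = []
    in_list = False
    prev_blank = True  # the start of the docstring counts as a boundary
    for lineno, line in enumerate(docstring.splitlines(), start=1):
        if not line.strip():
            prev_blank = True
            continue
        if BULLET.match(line):
            # A list may begin after a blank line or continue an already-open list.
            if not prev_blank and not in_list:
                bad.append(lineno)
            in_list = True
        elif prev_blank:
            # Text after a blank line starts a new paragraph and closes any open list.
            in_list = False
        # Otherwise the line is a wrapped continuation of the current paragraph or item.
        prev_blank = False
    return bad


bad_doc = (
    "Transformations include:\n"
    "- Replace . values with NA.\n"
    "- Convert pre-2012 ownership percentages to proportions to match\n"
    "  post-2012 reporting.\n"
)
good_doc = bad_doc.replace("include:\n", "include:\n\n")
print(bullets_missing_blank_line(bad_doc))   # [2]
print(bullets_missing_blank_line(good_doc))  # []
```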

zaneselvans · 2021-03-18T01:42:53Z

src/pudl/transform/ferc714.py

+        lambda x: x.replace(
+            offset_fixes[x.name]) if x.name in offset_fixes else x


Can you check and make sure that your editor is wrapping lines at 88 characters, and not at 79? I've seen a few of these auto-formatting changes come up.

zaneselvans · 2021-03-18T01:43:35Z

src/pudl/transform/ferc714.py

+    df["utc_offset_code"] = df.pipe(
+        _standardize_offset_codes, OFFSET_CODE_FIXES)


Here's another line that appears to have been auto formatted to 79 instead of 88 characters.
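Both auto-wrapped statements in this review fit within 88 characters once they are joined back onto a single line, even though they exceed 79. The sketch below checks this with a hypothetical `too_long` helper. The joined lines reuse the original identifiers and indentation, but the check itself is only an illustration.

```python
from typing import List, Tuple

# The two wrapped statements from the review, joined back onto one line each
# with their original indentation (8 and 4 spaces).
JOINED = [
    "        lambda x: x.replace(offset_fixes[x.name]) if x.name in offset_fixes else x",
    '    df["utc_offset_code"] = df.pipe(_standardize_offset_codes, OFFSET_CODE_FIXES)',
]


def too_long(lines: List[str], limit: int) -> List[Tuple[int, int]]:
    """Return (index, length) for every line longer than ``limit`` characters."""
    return [(i, len(line)) for i, line in enumerate(lines) if len(line) > limit]


print(too_long(JOINED, 79))  # [(0, 82), (1, 81)]
print(too_long(JOINED, 88))  # []
```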

zaneselvans · 2021-03-18T01:44:30Z

src/pudl/transform/eia860.py

+    Transformations include:
+    - Replace . values with NA.
+    - Convert pre-2012 ownership percentages to proportions to match
+      post-2012 reporting.
+


Needs a blank line between the list and the preceding text.
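Here is a sketch of what a docstring with a valid RST list looks like, attached to a toy version of the two steps it describes. The function name, the list-of-dicts input, and the `report_year` / `fraction_owned` keys are hypothetical stand-ins. This is not the actual pandas-based transform in `eia860.py`.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def transform_ownership_sketch(records: List[Record]) -> List[Record]:
    """Clean ownership records (hypothetical list-of-dicts stand-in).

    Transformations include:

    - Replace . values with NA.
    - Convert pre-2012 ownership percentages to proportions to match
      post-2012 reporting.

    """
    cleaned = []
    for rec in records:
        # Replace the "." placeholder for missing values with None (NA).
        out = {k: (None if v == "." else v) for k, v in rec.items()}
        year, frac = out.get("report_year"), out.get("fraction_owned")
        # Assumption: pre-2012 rows report a percentage (e.g. 50.0), so dividing
        # by 100 gives the proportion (0.5) used from 2012 onward.
        if year is not None and year < 2012 and frac is not None:
            out["fraction_owned"] = frac / 100
        cleaned.append(out)
    return cleaned


rows = [
    {"report_year": 2010, "fraction_owned": 50.0, "owner_name": "."},
    {"report_year": 2015, "fraction_owned": 0.5, "owner_name": "Acme"},
]
print(transform_ownership_sketch(rows))
```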

zaneselvans · 2021-03-18T01:45:06Z

src/pudl/transform/eia860.py

+    Transformations include:
+    - Replace . values with NA.


Needs a blank line between list and preceding text -- as do all of the other bulletized lists.

codecov · 2021-03-18T18:56:08Z

Codecov Report

Merging #939 (f7377a5) into sprint31 (b1869c6) will decrease coverage by 0.15%.
The diff coverage is 83.33%.

```diff
@@             Coverage Diff              @@
##           sprint31     #939      +/-   ##
============================================
- Coverage     83.27%   83.12%   -0.15%
============================================
  Files            44       45       +1
  Lines          5498     5526      +28
============================================
+ Hits           4578     4593      +15
- Misses          920      933      +13
```
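Line coverage in this report is hits divided by tracked lines. The short check below reproduces the base and head percentages and the -0.15% change from the counts in the table. `coverage_pct` is an illustrative helper, not part of Codecov.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage, rounded to two decimals like the report."""
    return round(100 * hits / lines, 2)


# Counts from the table: hits + misses should equal tracked lines.
assert 4578 + 920 == 5498  # sprint31 (b1869c6)
assert 4593 + 933 == 5526  # this PR (f7377a5)

base = coverage_pct(4578, 5498)
head = coverage_pct(4593, 5526)
print(base, head, round(head - base, 2))  # 83.27 83.12 -0.15
```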

| Impacted Files | Coverage Δ |
|---|---|
| src/pudl/transform/eia.py | `97.15% <ø> (ø)` |
| src/pudl/transform/eia860.py | `96.81% <ø> (ø)` |
| src/pudl/transform/epacems.py | `70.59% <ø> (ø)` |
| src/pudl/transform/eia923.py | `88.12% <60.00%> (ø)` |
| src/pudl/transform/eia861.py | `96.91% <100.00%> (ø)` |
| src/pudl/transform/ferc1.py | `91.18% <100.00%> (ø)` |
| src/pudl/transform/ferc714.py | `97.83% <100.00%> (ø)` |
| src/pudl/convert/datapkg_to_rst.py | `53.57% <0.00%> (ø)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b1869c6...f7377a5.

zaneselvans · 2021-03-18T23:32:23Z

src/pudl/transform/eia860.py

-    one dataframe and include an ``operational_status`` to indicate which tab
-    the record came from. We use ``operational_status`` to parse the pre 2009
-    files as well.
+    There are three tabs that the generator records come from (proposed, existing, retired). Pre 2009, the existing and retired data are lumped together under a single generator file with one tab. We pull each tab into one dataframe and include an ``operational_status`` to indicate which tab the record came from. We use ``operational_status`` to parse the pre 2009 files as well.


Hmm. Now it seems like line wrapping has been disabled entirely -- we want to automatically wrap at 88 characters. These very long lines will cause errors and make reading the docstrings difficult in some contexts.
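A paragraph like the one above can be re-wrapped mechanically. This sketch uses Python's standard `textwrap` module with a 4-space indent to mimic a docstring body. It illustrates the 88-character target and is not the formatter the project actually runs.

```python
import textwrap

LONG = (
    "There are three tabs that the generator records come from (proposed, "
    "existing, retired). Pre 2009, the existing and retired data are lumped "
    "together under a single generator file with one tab. We pull each tab into "
    "one dataframe and include an ``operational_status`` to indicate which tab "
    "the record came from. We use ``operational_status`` to parse the pre 2009 "
    "files as well."
)

# width counts the indent too, so every line (indent included) is <= 88 chars.
wrapped = textwrap.fill(LONG, width=88, initial_indent="    ", subsequent_indent="    ")
print(wrapped)
```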

Austen Sharpe added 5 commits March 3, 2021 13:17

add list of transformation to function doc strings for eia860 transform module

0b93af6

add general transform information to init file, add transform steps to doc strings in eia861 transform module.

59b724a

add transformation descriptions to eia923 transform module doc-strings

1a41531

add transform details to FERC1 transform module doc strings

2a83d6b

add transformation descriptions to FERC714 transform module doc-strings

49f8d3d

aesharpe added the eia861 label (Anything having to do with EIA Form 861) Mar 4, 2021

aesharpe requested a review from zaneselvans March 4, 2021 20:16

aesharpe self-assigned this Mar 4, 2021

Merge branch 'sprint31' into transform-documentation

87efae9

zaneselvans requested changes Mar 18, 2021

zaneselvans added the docs label (Documentation for users and contributors.) and removed the eia861 label (Anything having to do with EIA Form 861) Mar 18, 2021

Fixed a couple of badly formatted lists.

1c6a383

zaneselvans changed the base branch from main to sprint31 March 18, 2021 01:50

add space between title and bullets in doc strings, update line length to 88

5888f5b

zaneselvans reviewed Mar 19, 2021

fix line length to hard 88

f7377a5

zaneselvans approved these changes Mar 19, 2021

zaneselvans merged commit cfea163 into sprint31 Mar 19, 2021

zaneselvans deleted the transform-documentation branch March 20, 2021 01:25
