-
-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update transform documentation #939
Conversation
…o doc strings in eia861 transform module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, I think the content here is good, though I'm concerned that updating all of these docstrings that often say the same things repeatedly will be tedious and not attended to (like replacing . with NA... which happens in a bajillion EIA tables), and having that documentation get out of sync with the contents of the functions will be very confusing.
All of the bulletized lists (of which there are many) need to be re-formatted to be valid RST. They need to have a blank line separating them from other text blocks. You can (and probably should) try building the docs before you submit a doc-heavy PR. You can do this with:
tox -rve docs
If the formatting is invalid, it'll give you an error and a pointer to where the (first) error is. Here's a cheat-sheet on formatting RST lists.
src/pudl/transform/ferc714.py
Outdated
lambda x: x.replace( | ||
offset_fixes[x.name]) if x.name in offset_fixes else x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check and make sure that your editor is wrapping lines at 88 characters, and not at 79? I've seen a few of these auto-formatting changes come up.
df["utc_offset_code"] = df.pipe( | ||
_standardize_offset_codes, OFFSET_CODE_FIXES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's another line that appears to have been auto formatted to 79 instead of 88 characters.
src/pudl/transform/eia860.py
Outdated
Transformations include: | ||
- Replace . values with NA. | ||
- Convert pre-2012 ownership percentages to proportions to match | ||
post-2012 reporting. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a blank line between the list and the preceding text.
src/pudl/transform/eia860.py
Outdated
Transformations include: | ||
- Replace . values with NA. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a blank line between list and preceding text -- as do all of the other bulletized lists.
Codecov Report
@@ Coverage Diff @@
## sprint31 #939 +/- ##
============================================
- Coverage 83.27% 83.12% -0.15%
============================================
Files 44 45 +1
Lines 5498 5526 +28
============================================
+ Hits 4578 4593 +15
- Misses 920 933 +13
Continue to review full report at Codecov.
|
src/pudl/transform/eia860.py
Outdated
one dataframe and include an ``operational_status`` to indicate which tab | ||
the record came from. We use ``operational_status`` to parse the pre 2009 | ||
files as well. | ||
There are three tabs that the generator records come from (proposed, existing, retired). Pre 2009, the existing and retired data are lumped together under a single generator file with one tab. We pull each tab into one dataframe and include an ``operational_status`` to indicate which tab the record came from. We use ``operational_status`` to parse the pre 2009 files as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Now it seems like line wrapping has been disabled entirely -- we want to automatically wrap at 88 characters. These very long lines will cause errors and make reading the docstrings difficult in some contexts.
This branch adds more detail to the doc strings that transform each of the data tables. These can be linked in the documentation to show what is happening to each dataset to change it from the raw version.
What needs to be done:
I added a chunk of detail from the
eia861.py
transform module to the__init__.py
file.This section in particular needs to be standardized so that it applies to all of the data:
At the end of the main coordinating transform() function, every column that remains in each of the transformed dataframes should correspond to a column that will exist in the database and be associated with the EIA datasets, which means it is also part of the EIA column namespace.
I'm also curious whether this statement pertains to all the data or just EIA:
This information is important for the step after the intra-table transformations during which the collection of EIA tables is normalized as a whole.