-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: io/formats/html.py: refactor #22726
Conversation
Hello @simonjayhawkins! Thanks for submitting the PR.
|
@simonjayhawkins : Not sure where this was discussed before, but could you explain in your PR description why making |
@gfyoung : good question. why change? at the moment this was just an exercise to see if the current behavior could be implemented in a simpler way. why? was thinking what would the code look like if the common behavior of to_string()/DataFrameFormatter and to_html()/HTMLFormatter were inherited from TableFormatter. was also thinking how would the code look if to_html(notebook=True) were a subclass of HTMLFormatter, say NotebookFormatter and to_html(notebook=False) were also a subclass, say ToHTMLFormatter. i think this would allow easier implementation of some of the enhancements commonly requested for the to_html() method without making potentially breaking changes to the notebook display code path. with regard to fine grain control of the indentation on a method level, another subclass of HTMLFormatter maybe. but that is not needed at the moment as the current indentation behavior can be replicated without it. i'll report back when i have explored this avenue further. |
Sounds good. Might have preferred that an issue be filed first, but admittedly, the discussion would have been hard to carry without something concrete to analyse. |
quote from #20341
|
@gfyoung : i've reverted the changes to indentation for the moment and added some simpler refactorings to start. |
@@ -79,7 +79,7 @@ def _write_cell(self, s, kind='td', indent=0, tags=None): | |||
self.write(u'{start}{rs}</{kind}>' | |||
.format(start=start_tag, rs=rs, kind=kind), indent) | |||
|
|||
def write_tr(self, line, indent=0, indent_delta=4, header=False, | |||
def write_tr(self, line, indent=0, indent_delta=0, header=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the default argument is an API change. Could you explain why you are doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the only public method of HTMLFormatter
should be write_result()
.
The other methods are currently a mix of private and public. There does not seem to be any consistency.
If write_tr
should be private, the API change would not be an issue. The indent_delta
is not left to the default by any calls in the current implementation. indent_delta=0
would be a cleaner default.
Following on from the discussion regarding making indent
a stateful entity rather than leaving it as a parameter to individual methods: The write_tr
method accesses the bold_rows
instance variable. The value should probably be passed through the method's nindex_levels
parameter which was probably the intention of adding that parameter to the method.
The other HTML formatting methods write
, write_th
, write_td
and _write_cell
do not access instance variables with the exception that write
needs access to elements
which effectively the output buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also should have said: the use of bold_rows
in write_tr
is why I stopped refactoring at that point for the moment. I'm not sure how many changes should be in one PR.
I've added some additional tests, but not all paths can be tested until the fixes for #22655 have been implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is ok, as its internal. I agree its less that clear what is external / internal (and we don't usually use _ prefixed). If you want to indicate in the doc-string of an external API would be ok.
does this actually change any output? |
@jreback : Are you referring to the comment regarding I've added some additional tests, but not all paths can be tested until the fixes for #22655 have been implemented. All the tests in maybe the way forward is to add unit tests for the |
lgtm. @WillAyd if you'd have a quick look. |
_has_names(index)) | ||
|
||
|
||
@pytest.fixture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to be used outside this module? If not then can probably just be a global function instead of a fixture. If so then I think would make more sense to move fixtures to a conftest for this package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will not be of use outside this module. It uses read_file
which uses the datapath
fixture. So does it not need to be a fixture or am I doing something wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think I misread this originally. Are you expecting to use this fixture in any of the test methods or just in the expected_output
fixture? If the latter it might just make more sense to have this all in one fixture so that this doesn't get unintentionally used later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be a good idea if the methodology used here, if approved, is then used to move all the inline HTML from the other tests into separate files. Would be easier to check the expected output of other tests. Would be easier to re-use the expect outputs of other tests. And would allow much larger outputs/comprehensive tests to be used in future tests without cluttering the test file.
I'll move it all into one fixture for the moment. It can always be separated out again in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #22655, I've added a issue specific test as well as the parametric regression tests. So I am already using expected_html
in more than one test and will need to leave it as it is.
|
||
|
||
@pytest.fixture | ||
def expected_output(expected_html): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment as for expected_html
Codecov Report
@@ Coverage Diff @@
## master #22726 +/- ##
==========================================
+ Coverage 92.23% 92.26% +0.02%
==========================================
Files 161 161
Lines 51408 51387 -21
==========================================
- Hits 47416 47410 -6
+ Misses 3992 3977 -15
Continue to review full report at Codecov.
|
can you merge master on this. |
@WillAyd comments? |
Been a little tight for time recently but should be able to review again before the end of the week. @simonjayhawkins if you could merge master in the interim would be helpful |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good - one question around the handling of MI but overall a lot simpler. Nice job!
row.append(single_column_table(self.columns.names)) | ||
else: | ||
row.append('') | ||
style = "text-align: {just};".format(just=self.fmt.justify) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relatively nuanced but in the case of a MultiIndex isn't this formatting the very first column row different from the rest? If so is that accounted for in the refactor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if
clause is unreachable since _column_header()
was only called from the else
clause of another if isinstance(self.columns, ABCMultiIndex):
so the styling is not accounted for in this refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense thanks for clarifying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback back to you
thanks @simonjayhawkins |
…fixed * upstream/master: (46 commits) DEPS: bump xlrd min version to 1.0.0 (pandas-dev#23774) BUG: Don't warn if default conflicts with dialect (pandas-dev#23775) BUG: Fixing memory leaks in read_csv (pandas-dev#23072) TST: Extend datetime64 arith tests to array classes, fix several broken cases (pandas-dev#23771) STYLE: Specify bare exceptions in pandas/tests (pandas-dev#23370) ENH: between_time, at_time accept axis parameter (pandas-dev#21799) PERF: Use is_utc check to improve performance of dateutil UTC in DatetimeIndex methods (pandas-dev#23772) CLN: io/formats/html.py: refactor (pandas-dev#22726) API: Make Categorical.searchsorted returns a scalar when supplied a scalar (pandas-dev#23466) TST: Add test case for GH14080 for overflow exception (pandas-dev#23762) BUG: Don't extract header names if none specified (pandas-dev#23703) BUG: Index.str.partition not nan-safe (pandas-dev#23558) (pandas-dev#23618) DEPR: tz_convert in the Timestamp constructor (pandas-dev#23621) PERF: Datetime/Timestamp.normalize for timezone naive datetimes (pandas-dev#23634) TST: Use new arithmetic fixtures, parametrize many more tests (pandas-dev#23757) REF/TST: Add more pytest idiom to parsers tests (pandas-dev#23761) DOC: Add ignore-deprecate argument to validate_docstrings.py (pandas-dev#23650) ENH: update pandas-gbq to 0.8.0, adds credentials arg (pandas-dev#23662) DOC: Improve error message to show correct order (pandas-dev#23652) ENH: Improve error message for empty object array (pandas-dev#23718) ...
git diff upstream/master -u -- "*.py" | flake8 --diff