Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: io/formats/html.py: refactor #22726

Merged
merged 12 commits into from
Nov 18, 2018
Merged

Conversation

simonjayhawkins
Copy link
Member

@simonjayhawkins simonjayhawkins commented Sep 16, 2018

@pep8speaks
Copy link

Hello @simonjayhawkins! Thanks for submitting the PR.

@gfyoung gfyoung added IO HTML read_html, to_html, Styler.apply, Styler.applymap Clean Testing pandas testing functions or related to the test suite Refactor Internal refactoring of code and removed Testing pandas testing functions or related to the test suite Clean labels Sep 16, 2018
@gfyoung
Copy link
Member

gfyoung commented Sep 16, 2018

@simonjayhawkins : Not sure where this was discussed before, but could you explain in your PR description why making indent a stateful entity makes sense than leaving it as a parameter to individual methods?

@simonjayhawkins
Copy link
Member Author

@gfyoung : good question. why change?

at the moment this was just an exercise to see if the current behavior could be implemented in a simpler way. why?

was thinking what would the code look like if the common behavior of to_string()/DataFrameFormatter and to_html()/HTMLFormatter were inherited from TableFormatter.

was also thinking how would the code look if to_html(notebook=True) were a subclass of HTMLFormatter, say NotebookFormatter and to_html(notebook=False) were also a subclass, say ToHTMLFormatter.

i think this would allow easier implementation of some of the enhancements commonly requested for the to_html() method without making potentially breaking changes to the notebook display code path.
for example, #22692 just in the last few days,

with regard to fine grain control of the indentation on a method level, another subclass of HTMLFormatter maybe. but that is not needed at the moment as the current indentation behavior can be replicated without it.

i'll report back when i have explored this avenue further.

@gfyoung
Copy link
Member

gfyoung commented Sep 16, 2018

i'll report back when i have explored this avenue further.

Sounds good. Might have preferred that an issue be filed first, but admittedly, the discussion would have been hard to carry without something concrete to analyse.

@simonjayhawkins
Copy link
Member Author

quote from #20341

... PR splits the various formatters into different files so that it's easier to refactor a specific formatter.

@simonjayhawkins simonjayhawkins changed the title CLN: io/formats/html.py: refactor indentation CLN: io/formats/html.py: refactor Sep 17, 2018
@simonjayhawkins
Copy link
Member Author

@gfyoung : i've reverted the changes to indentation for the moment and added some simpler refactorings to start.

@@ -79,7 +79,7 @@ def _write_cell(self, s, kind='td', indent=0, tags=None):
self.write(u'{start}{rs}</{kind}>'
.format(start=start_tag, rs=rs, kind=kind), indent)

def write_tr(self, line, indent=0, indent_delta=4, header=False,
def write_tr(self, line, indent=0, indent_delta=0, header=False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the default argument is an API change. Could you explain why you are doing this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the only public method of HTMLFormatter should be write_result().

The other methods are currently a mix of private and public. There does not seem to be any consistency.

If write_tr should be private, the API change would not be an issue. The indent_delta is not left to the default by any calls in the current implementation. indent_delta=0 would be a cleaner default.

Following on from the discussion regarding making indent a stateful entity rather than leaving it as a parameter to individual methods: The write_tr method accesses the bold_rows instance variable. The value should probably be passed through the method's nindex_levels parameter which was probably the intention of adding that parameter to the method.

The other HTML formatting methods write, write_th, write_td and _write_cell do not access instance variables with the exception that write needs access to elements which effectively the output buffer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also should have said: the use of bold_rows in write_tr is why I stopped refactoring at that point for the moment. I'm not sure how many changes should be in one PR.

I've added some additional tests, but not all paths can be tested until the fixes for #22655 have been implemented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is ok, as its internal. I agree its less that clear what is external / internal (and we don't usually use _ prefixed). If you want to indicate in the doc-string of an external API would be ok.

@jreback
Copy link
Contributor

jreback commented Sep 18, 2018

does this actually change any output?

@simonjayhawkins
Copy link
Member Author

simonjayhawkins commented Sep 18, 2018

@jreback : Are you referring to the comment regarding indent_delta, or the removal of the unreachable code and subsequent refactoring? The answer is no and no.

I've added some additional tests, but not all paths can be tested until the fixes for #22655 have been implemented.

All the tests in pandas\tests\io\formats\test_to_html.py currently test the full df.to_html() path. Adding additional tests is quite time consuming. I still need to add some more tests to #22655 before I'm happy that the permutations of input parameters with DataFrame truncation has been fully tested.

maybe the way forward is to add unit tests for the HTMLFormatter methods?

@jreback jreback added this to the 0.24.0 milestone Sep 18, 2018
@jreback
Copy link
Contributor

jreback commented Sep 18, 2018

lgtm. @WillAyd if you'd have a quick look.

pandas/tests/io/formats/test_to_html.py Outdated Show resolved Hide resolved
pandas/tests/io/formats/test_to_html.py Outdated Show resolved Hide resolved
_has_names(index))


@pytest.fixture
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to be used outside this module? If not then can probably just be a global function instead of a fixture. If so then I think would make more sense to move fixtures to a conftest for this package

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will not be of use outside this module. It uses read_file which uses the datapath fixture. So does it not need to be a fixture or am I doing something wrong?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think I misread this originally. Are you expecting to use this fixture in any of the test methods or just in the expected_output fixture? If the latter it might just make more sense to have this all in one fixture so that this doesn't get unintentionally used later

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a good idea if the methodology used here, if approved, is then used to move all the inline HTML from the other tests into separate files. Would be easier to check the expected output of other tests. Would be easier to re-use the expect outputs of other tests. And would allow much larger outputs/comprehensive tests to be used in future tests without cluttering the test file.

I'll move it all into one fixture for the moment. It can always be separated out again in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #22655, I've added a issue specific test as well as the parametric regression tests. So I am already using expected_html in more than one test and will need to leave it as it is.



@pytest.fixture
def expected_output(expected_html):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as for expected_html

@codecov
Copy link

codecov bot commented Sep 19, 2018

Codecov Report

Merging #22726 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22726      +/-   ##
==========================================
+ Coverage   92.23%   92.26%   +0.02%     
==========================================
  Files         161      161              
  Lines       51408    51387      -21     
==========================================
- Hits        47416    47410       -6     
+ Misses       3992     3977      -15
Flag Coverage Δ
#multiple 90.65% <100%> (+0.02%) ⬆️
#single 42.32% <0%> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/io/formats/html.py 97.68% <100%> (+4.49%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1250500...0905033. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Nov 6, 2018

can you merge master on this.

@jreback
Copy link
Contributor

jreback commented Nov 6, 2018

@WillAyd comments?

@WillAyd
Copy link
Member

WillAyd commented Nov 6, 2018

Been a little tight for time recently but should be able to review again before the end of the week. @simonjayhawkins if you could merge master in the interim would be helpful

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good - one question around the handling of MI but overall a lot simpler. Nice job!

row.append(single_column_table(self.columns.names))
else:
row.append('')
style = "text-align: {just};".format(just=self.fmt.justify)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatively nuanced but in the case of a MultiIndex isn't this formatting the very first column row different from the rest? If so is that accounted for in the refactor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if clause is unreachable since _column_header() was only called from the else clause of another if isinstance(self.columns, ABCMultiIndex): so the styling is not accounted for in this refactor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense thanks for clarifying

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback back to you

@jreback
Copy link
Contributor

jreback commented Nov 18, 2018

thanks @simonjayhawkins

@simonjayhawkins simonjayhawkins deleted the indentation branch November 19, 2018 00:14
thoo added a commit to thoo/pandas that referenced this pull request Nov 19, 2018
…fixed

* upstream/master: (46 commits)
  DEPS: bump xlrd min version to 1.0.0 (pandas-dev#23774)
  BUG: Don't warn if default conflicts with dialect (pandas-dev#23775)
  BUG: Fixing memory leaks in read_csv (pandas-dev#23072)
  TST: Extend datetime64 arith tests to array classes, fix several broken cases (pandas-dev#23771)
  STYLE: Specify bare exceptions in pandas/tests (pandas-dev#23370)
  ENH: between_time, at_time accept axis parameter (pandas-dev#21799)
  PERF: Use is_utc check to improve performance of dateutil UTC in DatetimeIndex methods (pandas-dev#23772)
  CLN: io/formats/html.py: refactor (pandas-dev#22726)
  API: Make Categorical.searchsorted returns a scalar when supplied a scalar (pandas-dev#23466)
  TST: Add test case for GH14080 for overflow exception (pandas-dev#23762)
  BUG: Don't extract header names if none specified (pandas-dev#23703)
  BUG: Index.str.partition not nan-safe (pandas-dev#23558) (pandas-dev#23618)
  DEPR: tz_convert in the Timestamp constructor (pandas-dev#23621)
  PERF: Datetime/Timestamp.normalize for timezone naive datetimes (pandas-dev#23634)
  TST: Use new arithmetic fixtures, parametrize many more tests (pandas-dev#23757)
  REF/TST: Add more pytest idiom to parsers tests (pandas-dev#23761)
  DOC: Add ignore-deprecate argument to validate_docstrings.py (pandas-dev#23650)
  ENH: update pandas-gbq to 0.8.0, adds credentials arg (pandas-dev#23662)
  DOC: Improve error message to show correct order (pandas-dev#23652)
  ENH: Improve error message for empty object array (pandas-dev#23718)
  ...
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO HTML read_html, to_html, Styler.apply, Styler.applymap Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants