BUG: pandas styling doesn't display for all rows in large dataframes #40913

Open
2 of 3 tasks
trenton3983 opened this issue Apr 13, 2021 · 7 comments
Labels
Bug · IO HTML (read_html, to_html, Styler.apply, Styler.applymap) · Styler (conditional formatting using DataFrame.style)

Comments

@trenton3983

trenton3983 commented Apr 13, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Update 2

  • The issue seems to be specific to Google Chrome and Microsoft Edge.
  • Jupyter Lab in Firefox correctly displays all of the styled rows and correctly renders an output HTML file.
  • The question is then: which Chrome and Edge settings are relevant, or why doesn't this work in those browsers?

Original

Code Sample, a copy-pastable example

import pandas as pd
import numpy as np
from faker import Faker  # conda install -c conda-forge faker or pip install Faker

# for fake names
fake = Faker()

# test data
np.random.seed(365)
rows = 11000

# change 36 or 158 to test where the rows stop appearing
vals = {f'val{i}': np.random.randint(1, 11, size=(rows)) for i in range(1, 36)}
data = {'name': np.random.choice([fake.unique.name() for i in range(158)], size=rows),
        'cat': np.random.randint(1, 4, size=(rows))}
data.update(vals)

df = pd.DataFrame(data)

# used to create the mask for the background color
mean = df.groupby('cat').mean().round(2)

# calculate the mean for each name and cat
cat_mean = df.groupby(['name', 'cat']).mean()


def color(x):
    """Build the style map used to apply the background color"""
    c1 = 'background-color: green'
    c = ''
    # compare columns against the mean DataFrame
    mask1 = x.gt(mean)
    # DataFrame with the same index and column names as the original, filled with empty strings
    df1 = pd.DataFrame(c, index=x.index, columns=x.columns)
    # set the background color wherever the boolean mask is True
    df1.iloc[mask1] = c1
    # show the style map for inspection (display is available in Jupyter/IPython)
    display(df1)

    return df1

# displays the styled DataFrame in Jupyter
cat_mean.style.apply(color, axis=None)

# in PyCharm: save the rendered Styler to a file
cm = cat_mean.style.apply(color, axis=None).set_precision(3).render()

with open('cm_test.html', 'w') as f:
    f.write(cm)

Problem description

  • Given a large dataframe, in this case 474 rows × 35 columns, the applied styling is not displayed for all rows, either in Jupyter or when the output is saved to an HTML file.
  • If the number of rows or columns increases beyond this size, then even more rows are not displayed properly.
  • We can see from the styling map that the rows are correctly mapped with a background color, but it isn't displayed.
  • If the number of rows or columns is reduced, then all of the rows display the correct styling.
  • jupyterlab v3.0.11 and pandas v1.2.3
  • In PyCharm 2021.1 (Professional Edition) Build #PY-211.6693.115, built on April 6, 2021, saving the rendered styler to a file has the same result, so this isn't just an issue with Jupyter.
  • This issue is reproducible on two different systems that I have tried.
  • If the shape is reduced to 471 rows × 35 columns or 474 rows × 34 columns, then all rows correctly display the highlighting.
  • Associated Stack Overflow question.
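
One way to check whether the styling is missing from the generated HTML or merely not displayed is to inspect the saved file directly. The following is a minimal sketch using only the standard library. The helper name `styled_cell_ids` is hypothetical, and it assumes the per-cell rules are written as `#<cell id>` selectors inside a `<style>` block, which may not hold for every pandas version.

```python
import re
from html.parser import HTMLParser


class CellIdCollector(HTMLParser):
    """Collect the id attribute of every <td> element, in document order."""

    def __init__(self):
        super().__init__()
        self.cell_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            cell_id = dict(attrs).get("id")
            if cell_id:
                self.cell_ids.append(cell_id)


def styled_cell_ids(html):
    """Return (all td ids, td ids that appear as #id selectors in a <style> block)."""
    collector = CellIdCollector()
    collector.feed(html)
    # join the contents of every <style> block, then pull out the #id selectors
    css = " ".join(re.findall(r"<style[^>]*>(.*?)</style>", html, flags=re.S | re.I))
    selectors = set(re.findall(r"#([\w-]+)", css))
    styled = [cell_id for cell_id in collector.cell_ids if cell_id in selectors]
    return collector.cell_ids, styled


# Small hand-written sample (not real pandas output)
sample = (
    '<style type="text/css">#T_x_row0_col0, #T_x_row1_col1 '
    "{ background-color: green; }</style>"
    '<table id="T_x_"><tbody>'
    '<tr><td id="T_x_row0_col0" class="data row0 col0">1</td>'
    '<td id="T_x_row0_col1" class="data row0 col1">2</td></tr>'
    '<tr><td id="T_x_row1_col0" class="data row1 col0">3</td>'
    '<td id="T_x_row1_col1" class="data row1 col1">4</td></tr>'
    "</tbody></table>"
)
all_ids, styled = styled_cell_ids(sample)
print(len(all_ids), styled)

# Usage with the file written above (not run here):
# with open('cm_test.html') as f:
#     all_ids, styled = styled_cell_ids(f.read())
```

If the ids of the rows that lose their highlighting do show up in `styled`, the CSS was generated and the problem is more likely on the display side.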

Workaround

Split the DataFrame
  • It's unsatisfying, but splitting the DataFrame and applying the style to each part works, since it reduces the overall size.
cat_mean.iloc[:237, :].style.apply(color, axis=None)
cat_mean.iloc[237:, :].style.apply(color, axis=None)
Save to Excel
  • All rows are displayed correctly with the highlight color when saving to Excel
test = cat_mean.style.apply(color, axis=None)
test.to_excel('test.xlsx', engine='openpyxl')
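
The split workaround can be generalized to any number of rows. This is a rough sketch, not a tested fix: `chunk_bounds` is a hypothetical helper, and the chunk size of 237 is simply the value used in the two slices above, not a known safe limit.

```python
def chunk_bounds(n_rows, chunk_size):
    """Return (start, stop) pairs covering range(n_rows) in chunks of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, n_rows))
        for start in range(0, n_rows, chunk_size)
    ]


# 474 rows in chunks of 237 gives the same two slices as the workaround
print(chunk_bounds(474, 237))

# Usage with the objects from this issue (not run here):
# for start, stop in chunk_bounds(len(cat_mean), 237):
#     display(cat_mean.iloc[start:stop, :].style.apply(color, axis=None))
```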

Expected Output

  • All rows should display highlighting.

Output1 of pd.show_versions()

INSTALLED VERSIONS

commit : f2c8480
python : 3.8.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.2.3
numpy : 1.19.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 52.0.0.post20210125
Cython : 0.29.22
pytest : 6.2.3
hypothesis : None
sphinx : 3.5.3
blosc : None
feather : None
xlsxwriter : 1.3.8
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.22.0
pandas_datareader: 0.9.0
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.9.0
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.4.5
tables : 3.6.1
tabulate : 0.8.9
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1

Update 1

  • The following options produced the same results as the original attempts
    • Standard python console creating HTML file
    • PyCharm creating HTML file
    • Jupyter Lab displaying styled dataframe in a cell

Output2 of pd.show_versions()

INSTALLED VERSIONS

commit : 2cb9652
python : 3.9.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.2.4
numpy : 1.20.0
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 52.0.0.post20210125
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.22.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

trenton3983 added the Bug and Needs Triage labels Apr 13, 2021
trenton3983 changed the title from "BUG: pandas styling doesn't appear for all rows in large dataframes" to "BUG: pandas styling doesn't display for all rows in large dataframes" Apr 14, 2021
@attack68
Contributor

This is a duplicate issue. You can read more here: #39400.

Essentially this is a browser limitation issue, and has nothing to do with pandas (see the last two posts in the above thread).

Additionally, you can read more about solutions in the upcoming pandas 1.3.0 docs, currently here: https://pandas.pydata.org/pandas-docs/dev/user_guide/style.html#Optimization

Please consider closing this issue if there is nothing more to add.

attack68 added the Duplicate Report, Styler, IO HTML, and Closing Candidate labels and removed the Bug and Needs Triage labels Apr 14, 2021
@trenton3983
Author

trenton3983 commented Apr 14, 2021

Thank you for pointing to the other bug. I did not find it when checking for Style related bugs prior to creating my post.

"and has nothing to do with pandas"

  • Arguably, since pandas is formatting the CSS/HTML in such a non-performant way, this is just as much a pandas issue as a browser issue.
    • Styler Optimization suggests solutions, which imply the styling is not optimized for large dataframes, which is specifically a pandas issue.

@attack68
Contributor

Yes, OK, a valid point. Fixing this would be a marked departure from the current implementation and might be quite extensive, so it is low priority, but I'm leaving this open for now.

attack68 reopened this Apr 14, 2021
attack68 removed the Closing Candidate label Apr 14, 2021
@trenton3983
Author

trenton3983 commented Apr 14, 2021

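def test(s, props=''):
    # for one column: props where the value is greater than the mean, else ''
    t = np.where(s.gt(mean[s.name]), props, '')
    return t

# wrap the result in a DataFrame with the same index and columns as cat_mean
build = lambda x: pd.DataFrame(x, index=cat_mean.index, columns=cat_mean.columns)
cls1 = build(cat_mean.apply(test, props='cls-1 ', axis=0))

# named styled so that the test function above is not overwritten
styled = cat_mean.style.set_table_styles([{'selector': '.cls-1', 'props': [('color', 'white'), ('background-color', 'darkblue')]}]).set_td_classes(cls1)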
  • Looking at cls1, we can see the class in the cell.

  • But the class doesn't get set in the output along with the table styles.

@attack68
Contributor

@trenton3983 can you post the rendered HTML? It's easier to spot bugs in that.

It might be related to this bug fix: #39317

Unfortunately I was new to pandas at that point and didn't provide a PR that was easily back-ported to fix the 1.2.x versions in their workflow, since it was a mixed PR. That means the bug fix might only arrive in 1.3.0, which is targeting release next month.

That said, if you need to get it to work temporarily (locally), you can see the minor loop comprehension fix in that issue.

@trenton3983
Author

trenton3983 commented Apr 15, 2021

  • Here is the top of the file and the first row of data.
  • From the previous post, we can see that at least the first few values for 'Adriana Mcknight' should be highlighted and have the CSS class cls-1, but no class= attribute contains cls-1.
  • Smaller dataframes have styling as follows
    • <td id="T_b3f37_row0_col34_cls-1 " class="data row0 col34 cls-1 " >5.521739</td>
    • Even then, not all values that should have a class update, do.
  • I looked at #39317, and the issues with using .set_table_styles and .set_td_classes seem the same.
    • The only difference is that for the large dataframe, no class data is applied anywhere.
<style  type="text/css" >
    #T_9a870_ .cls-1 {
          color: white;
          background-color: darkblue;
    }</style><table id="T_9a870_" ><thead>    <tr>        <th class="blank" ></th>        <th class="blank level0" ></th>        <th class="col_heading level0 col0" >val1</th>        <th class="col_heading level0 col1" >val2</th>        <th class="col_heading level0 col2" >val3</th>        <th class="col_heading level0 col3" >val4</th>        <th class="col_heading level0 col4" >val5</th>        <th class="col_heading level0 col5" >val6</th>        <th class="col_heading level0 col6" >val7</th>        <th class="col_heading level0 col7" >val8</th>        <th class="col_heading level0 col8" >val9</th>        <th class="col_heading level0 col9" >val10</th>        <th class="col_heading level0 col10" >val11</th>        <th class="col_heading level0 col11" >val12</th>        <th class="col_heading level0 col12" >val13</th>        <th class="col_heading level0 col13" >val14</th>        <th class="col_heading level0 col14" >val15</th>        <th class="col_heading level0 col15" >val16</th>        <th class="col_heading level0 col16" >val17</th>        <th class="col_heading level0 col17" >val18</th>        <th class="col_heading level0 col18" >val19</th>        <th class="col_heading level0 col19" >val20</th>        <th class="col_heading level0 col20" >val21</th>        <th class="col_heading level0 col21" >val22</th>        <th class="col_heading level0 col22" >val23</th>        <th class="col_heading level0 col23" >val24</th>        <th class="col_heading level0 col24" >val25</th>        <th class="col_heading level0 col25" >val26</th>        <th class="col_heading level0 col26" >val27</th>        <th class="col_heading level0 col27" >val28</th>        <th class="col_heading level0 col28" >val29</th>        <th class="col_heading level0 col29" >val30</th>        <th class="col_heading level0 col30" >val31</th>        <th class="col_heading level0 col31" >val32</th>        <th class="col_heading level0 col32" >val33</th>        <th class="col_heading level0 col33" 
>val34</th>        <th class="col_heading level0 col34" >val35</th>    </tr>    <tr>        <th class="index_name level0" >name</th>        <th class="index_name level1" >cat</th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>        <th class="blank" ></th>    </tr></thead><tbody>
                <tr>
                        <th id="T_9a870_level0_row0" class="row_heading level0 row0" rowspan="3">Adriana Mcknight</th>
                        <th id="T_9a870_level1_row0" class="row_heading level1 row0" >1</th>
                        <td id="T_9a870_row0_col0" class="data row0 col0" >5.782609</td>
                        <td id="T_9a870_row0_col1" class="data row0 col1" >5.652174</td>
                        <td id="T_9a870_row0_col2" class="data row0 col2" >6.130435</td>
                        <td id="T_9a870_row0_col3" class="data row0 col3" >6.086957</td>
                        <td id="T_9a870_row0_col4" class="data row0 col4" >4.478261</td>
                        <td id="T_9a870_row0_col5" class="data row0 col5" >4.565217</td>
                        <td id="T_9a870_row0_col6" class="data row0 col6" >5.826087</td>
                        <td id="T_9a870_row0_col7" class="data row0 col7" >5.956522</td>
                        <td id="T_9a870_row0_col8" class="data row0 col8" >4.782609</td>
                        <td id="T_9a870_row0_col9" class="data row0 col9" >5.347826</td>
                        <td id="T_9a870_row0_col10" class="data row0 col10" >5.260870</td>
                        <td id="T_9a870_row0_col11" class="data row0 col11" >5.130435</td>
                        <td id="T_9a870_row0_col12" class="data row0 col12" >5.217391</td>
                        <td id="T_9a870_row0_col13" class="data row0 col13" >6.173913</td>
                        <td id="T_9a870_row0_col14" class="data row0 col14" >5.043478</td>
                        <td id="T_9a870_row0_col15" class="data row0 col15" >6.391304</td>
                        <td id="T_9a870_row0_col16" class="data row0 col16" >5.217391</td>
                        <td id="T_9a870_row0_col17" class="data row0 col17" >5.913043</td>
                        <td id="T_9a870_row0_col18" class="data row0 col18" >5.608696</td>
                        <td id="T_9a870_row0_col19" class="data row0 col19" >5.869565</td>
                        <td id="T_9a870_row0_col20" class="data row0 col20" >6.086957</td>
                        <td id="T_9a870_row0_col21" class="data row0 col21" >4.826087</td>
                        <td id="T_9a870_row0_col22" class="data row0 col22" >5.739130</td>
                        <td id="T_9a870_row0_col23" class="data row0 col23" >6.304348</td>
                        <td id="T_9a870_row0_col24" class="data row0 col24" >5.347826</td>
                        <td id="T_9a870_row0_col25" class="data row0 col25" >5.173913</td>
                        <td id="T_9a870_row0_col26" class="data row0 col26" >4.608696</td>
                        <td id="T_9a870_row0_col27" class="data row0 col27" >5.391304</td>
                        <td id="T_9a870_row0_col28" class="data row0 col28" >5.652174</td>
                        <td id="T_9a870_row0_col29" class="data row0 col29" >5.434783</td>
                        <td id="T_9a870_row0_col30" class="data row0 col30" >5.565217</td>
                        <td id="T_9a870_row0_col31" class="data row0 col31" >5.956522</td>
                        <td id="T_9a870_row0_col32" class="data row0 col32" >6.043478</td>
                        <td id="T_9a870_row0_col33" class="data row0 col33" >5.217391</td>
                        <td id="T_9a870_row0_col34" class="data row0 col34" >5.521739</td>
            </tr>
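
A quick way to confirm whether any cell received the class is to count the class tokens on the `<td>` elements of the rendered HTML. This is a minimal standard-library sketch. `count_td_classes` is a hypothetical helper, and the two sample cells are copied from the snippets in this comment.

```python
from collections import Counter
from html.parser import HTMLParser


class TdClassCounter(HTMLParser):
    """Count every class token that appears on a <td> element."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            classes = dict(attrs).get("class") or ""
            self.counts.update(classes.split())


def count_td_classes(html):
    """Return a Counter mapping each td class token to its number of occurrences."""
    counter = TdClassCounter()
    counter.feed(html)
    return counter.counts


# One cell from the smaller dataframe (has cls-1) and one from the large one (does not)
sample = (
    '<td id="T_b3f37_row0_col34_cls-1 " class="data row0 col34 cls-1 " >5.521739</td>'
    '<td id="T_9a870_row0_col0" class="data row0 col0" >5.782609</td>'
)
counts = count_td_classes(sample)
print(counts["cls-1"], counts["data"])
```

Running `count_td_classes` on the full rendered string and looking at the `cls-1` entry would show how many cells actually carry the class.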

mroeschke added the Bug label and removed the Duplicate Report label Aug 19, 2021
@trendyllama

trendyllama commented May 30, 2024

Following up on this: the issue still occurs with all new versions of pandas. The hack fix for me was to limit the line width of the HTML string with the textwrap module.

import textwrap

# html_str holds the rendered Styler HTML
html_str = textwrap.fill(html_str, width=200)
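
One caveat with this approach: by default `textwrap.fill` may also break right after a hyphen in a compound word (for example inside `background-color`) and will split any token longer than `width`. The sketch below is a more conservative variant that only breaks at whitespace. The HTML string is a made-up stand-in, not real Styler output, and whether this avoids the rendering problem in every browser is not something verified here.

```python
import textwrap

# Hypothetical stand-in for a rendered Styler string: one very long line of cells
cells = [
    f'<td id="T_x_row0_col{i}" class="data row0 col{i}" '
    f'style="background-color: green">{i}</td>'
    for i in range(50)
]
html_str = "<tr> " + " ".join(cells) + " </tr>"

wrapped = textwrap.fill(
    html_str,
    width=200,
    break_long_words=False,  # never cut a token that is longer than width
    break_on_hyphens=False,  # never break after the hyphen in e.g. background-color
)

longest = max(len(line) for line in wrapped.splitlines())
print(len(html_str), longest)
```

Note that `textwrap.fill` treats the whole string as one paragraph and replaces existing newlines and tabs with spaces, which matters only for whitespace-sensitive content such as `<pre>` blocks.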
