DataFrame rendering stylists? #459

Closed
lodagro opened this issue Dec 7, 2011 · 7 comments
Labels
Enhancement Output-Formatting __repr__ of pandas objects, to_string

Comments

@lodagro
Contributor

lodagro commented Dec 7, 2011

Been working a bit on the idea of DataFrame rendering stylists. Would like some feedback on the idea / API.

The idea is that, on top of the formatters used in DataFrame.to_string() and DataFrame.to_html(), I would like to add stylists. A stylist takes a string as input and returns a string, which allows the string representation of each individual cell of a DataFrame to be altered. Each DataFrame element first goes through a formatter, and the resulting string then goes through a stylist. This makes it possible, e.g., to use ANSI escape sequences to change the text and background color of a DataFrame cell displayed on screen. In the DataFrame.to_html() case, a stylist can add HTML div or class tags which, combined with CSS, change the rendering of an HTML table cell.

This is the description of stylists and of the new API for DataFrame.to_string() and DataFrame.to_html():

class DataFrameFormatter(object):
    """
    Render a DataFrame

    self.to_string() : console-friendly tabular output
    self.to_html() : html table
    """
    def __init__(self, frame, buf=None, columns=None, col_space=None,
                 na_rep='NaN', formatters=None, float_format=None,
                 sparsify=True, index_names=True, stylists=None):
        """
        Parameters
        ----------
        stylists : object that, when indexed with [row_key][column_key],
            returns a callable taking a string as input and returning a string.
            When rendering a DataFrame, each cell can be altered individually
            using stylists: a cell is first formatted with formatters, after
            which a style can be applied. For example, a stylist can add ANSI
            escape sequences to display a cell in a different color, or add
            HTML class or div tags.
            If stylists[row_key][column_key] does not exist, no styling is
            applied to that particular cell of the DataFrame.
        """

class DataFrame(object):
    def to_string(self, buf=None, columns=None, colSpace=None,
                  na_rep='NaN', formatters=None, float_format=None,
                  sparsify=True, nanRep=None, index_names=True,
                  stylists=None):
        ...  # signature only; body unchanged apart from the new stylists argument

    def to_html(self, buf=None, columns=None, colSpace=None,
                na_rep='NaN', formatters=None, float_format=None,
                sparsify=True, index_names=True, stylists=None):
        ...  # signature only; body unchanged apart from the new stylists argument
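A minimal standalone sketch of the formatter-then-stylist pipeline described in the docstring, assuming plain nested dicts instead of a real DataFrame. `render_cells` is a hypothetical helper for illustration, not part of pandas; a missing stylists[row][col] entry simply leaves the cell unstyled.

```python
from typing import Callable, Dict, Hashable, List, Mapping, Optional

StyleFunc = Callable[[str], str]


def render_cells(
    data: Mapping[Hashable, Mapping[Hashable, object]],
    rows: List[Hashable],
    columns: List[Hashable],
    formatter: Callable[[object], str] = str,
    stylists: Optional[Mapping] = None,
) -> Dict[Hashable, Dict[Hashable, str]]:
    """Format every cell, then style it if stylists[row][col] exists."""
    out: Dict[Hashable, Dict[Hashable, str]] = {}
    for r in rows:
        out[r] = {}
        for c in columns:
            text = formatter(data[r][c])  # step 1: formatter -> string
            try:
                style: Optional[StyleFunc] = stylists[r][c] if stylists else None
            except KeyError:
                style = None  # no stylist for this cell: leave it unstyled
            out[r][c] = style(text) if style else text  # step 2: stylist
    return out


data = {'a': {'A': 1, 'B': 2.5}, 'b': {'A': 3, 'B': 4.0}}
stylists = {'a': {'A': lambda s: '*' + s + '*'}}
cells = render_cells(data, ['a', 'b'], ['A', 'B'],
                     formatter=lambda v: '%.1f' % v, stylists=stylists)
print(cells)
```

Because the stylist only ever sees the formatter's output, formatters and stylists stay independent: changing the float format does not require touching any style.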

A little demo using stylists on screen:

import pandas
import numpy as np
from colorama import Fore, Back, Style

df = pandas.DataFrame(np.random.randint(0, 10, (5, 2)),
                      columns=['A', 'B'],
                      index=['a', 'b', 'c', 'd', 'e'])

red = lambda x: Back.RED + x + Back.RESET
green = lambda x: Back.GREEN + x + Back.RESET
yellow = lambda x: Back.YELLOW + x + Back.RESET

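stylists = {'a': {'A': red, 'B': yellow},
            'b': {'A': green},
            'c': {'B': green}}

# stylists= is the keyword proposed above, not an existing pandas parameter
print(df.to_string(stylists=stylists))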

Results in the following:

[image: stylists demo]

As you can see, more work is needed. The ANSI escape sequences are counted when determining the number of characters needed for each column, which is wrong since they are invisible. One solution is to set column widths before stylists are applied, implying that a stylist cannot change the width of a column; that seems reasonable.
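A sketch of the "set column widths before stylists are applied" idea, assuming cells arrive already formatted as strings and styles are keyed by (row, column) position. `render_table` and `visible_len` are hypothetical names; the regex covers only CSI-style escapes such as the color codes used in the demo.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Matches CSI-style ANSI escape sequences such as '\x1b[41m' (red background)
ANSI_RE = re.compile(r'\x1b\[[0-?]*[ -/]*[@-~]')


def visible_len(text: str) -> int:
    """On-screen length of text, ignoring ANSI escape sequences."""
    return len(ANSI_RE.sub('', text))


def render_table(cells: List[List[str]],
                 styles: Optional[Dict[Tuple[int, int], Callable[[str], str]]] = None,
                 sep: str = '  ') -> str:
    """Right-justify formatted cells, then apply per-cell styles.

    Column widths are fixed before any style runs, so a style can only add
    invisible escapes around a cell; it cannot change the column width.
    """
    styles = styles or {}
    ncols = len(cells[0])
    widths = [max(visible_len(row[j]) for row in cells) for j in range(ncols)]
    lines = []
    for i, row in enumerate(cells):
        parts = []
        for j, text in enumerate(row):
            padded = ' ' * (widths[j] - visible_len(text)) + text  # pad first
            style = styles.get((i, j))
            parts.append(style(padded) if style else padded)       # then style
        lines.append(sep.join(parts))
    return '\n'.join(lines)


# Same codes colorama uses for Back.RED and Back.RESET
red = lambda s: '\x1b[41m' + s + '\x1b[49m'
table = render_table([['A', 'B'], ['1', '10'], ['200', '3']], {(1, 0): red})
print(table)
```

Padding before styling also means the background color covers the whole padded cell, which matches how a colored cell usually looks in a terminal table.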

Or maybe this should be taken one step further, finding some way to combine the functionality of formatters and stylists into one (I have not thought about how this should look). Ideas, feedback?

@kisielk
Contributor

kisielk commented Apr 19, 2012

Interesting idea. How about making the argument a function of two arguments instead of a nested dictionary?

Your code would become:

print(df.to_string(stylist=lambda r, c: stylists[r][c]))

I renamed the argument to be singular since it accepts just one function.

This would also allow you to compose multiple styling functions by wrapping them, or have styling that's algorithmic instead of defined statically in a data-structure.

Sure, you could achieve the same thing with your approach by defining __getitem__ on a class, but that's just more busywork.
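A sketch of the function-based variant, assuming a stylist is a function (row_key, column_key) -> style function, returning None for "no styling". `from_nested`, `compose` and `bracket_column_b` are hypothetical helpers showing how static and algorithmic stylists can be wrapped and combined.

```python
from typing import Callable, Hashable, Optional

StyleFunc = Callable[[str], str]
# A stylist maps (row_key, column_key) to a style function, or None for no style
Stylist = Callable[[Hashable, Hashable], Optional[StyleFunc]]


def from_nested(table) -> Stylist:
    """Wrap the nested-dict form from the original proposal."""
    return lambda r, c: table.get(r, {}).get(c)


def compose(*stylists: Stylist) -> Stylist:
    """Apply every matching stylist in order (the first one is innermost)."""
    def combined(r, c):
        funcs = [f for f in (s(r, c) for s in stylists) if f is not None]
        if not funcs:
            return None

        def apply(text: str) -> str:
            for f in funcs:
                text = f(text)
            return text
        return apply
    return combined


def bracket_column_b(r, c):
    """An algorithmic stylist: no data structure needed."""
    return (lambda s: '[' + s + ']') if c == 'B' else None


stylist = compose(from_nested({'a': {'A': str.upper}}), bracket_column_b)
print(stylist('a', 'A')('x'))  # X
print(stylist('a', 'B')('x'))  # [x]
print(stylist('b', 'A'))       # None
```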

@CRP
Contributor

CRP commented Feb 17, 2014

Hi all,
I am interested in adding functionality to DataFrame.to_html() rendering that would allow the user to show and hide subgroups of data. This would mimic the outline functionality in Excel.
One possible use case would be a pivot table where one clicks on a subtotal to show the detail of the items that make it up.

Here is a very simple example of what the final result would look like:

<html>
   <head>
   <script>
        function toggle(thisname) {
           var tr = document.getElementsByTagName('tr');
           for (var i = 0; i < tr.length; i++) {
              if (tr[i].getAttribute(thisname)) {
                 if (tr[i].style.display == 'none') {
                    tr[i].style.display = '';
                 }
                 else {
                    tr[i].style.display = 'none';
                 }
              }
           }
        }
   </script>
   </head>

<body>
<table border=1>
   <tr onClick="toggle('hide1');"><td >Main Row 1</td><td >Value 1</td></tr>
   <tr hide1=yes ><td>Subrow 11</td><td >Value 11</td></tr>
   <tr onClick="toggle('hide2');"><td>Main Row 2</td><td >Value 2</td></tr>
   <tr hide2=yes ><td>Subrow 21</td><td >Value 21</td></tr>
   <tr hide2=yes ><td>Subrow 22</td><td >Value 22</td></tr>
   <tr onClick="toggle('hide3');"><td>Main Row 3</td><td >Value 3</td></tr>
   <tr hide3=yes ><td>Subrow 31</td><td >Value 31</td></tr>
   <tr hide3=yes ><td>Subrow 32</td><td >Value 32</td></tr>
   <tr onClick="toggle('hide4');"><td>Main Row 4</td><td >Value 4</td></tr>
   <tr hide4=yes ><td>Subrow 41</td><td >Value 41</td></tr>
   <tr hide4=yes ><td>Subrow 42</td><td >Value 42</td></tr>
</table>
</body>
</html>

So basically the to_html code would need:

  1. A way to identify "main" and "sub" rows. In my tests I achieved this by assuming that the feature applies only to DataFrames with hierarchical indices, and by letting the user specify a string that identifies "main" rows, for example "Total"; the code then identifies a subgroup as the rows between two main row labels.
  2. Attach group labels to subrows.
  3. Add onClick attributes to main rows.
  4. Some way to add the JavaScript function to the output HTML string. I do not know much about HTML, so I do not know the best way to do this, especially if one wishes to build a page with multiple tables.

What do you think?
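Steps 1 to 3 above can be sketched in plain Python, assuming the rows are already flattened to (label, value) string pairs and, as in the HTML example, each main row comes first and is followed by its subrows. `outline_table` and `TOGGLE_JS` are hypothetical names; `is_main` stands in for the user-supplied marker (e.g. `lambda label: 'Total' in label`). Prepending the script to the table string is only one possible answer to step 4.

```python
from html import escape
from typing import Callable, List, Tuple

TOGGLE_JS = """<script>
function toggle(name) {
  var rows = document.getElementsByTagName('tr');
  for (var i = 0; i < rows.length; i++) {
    if (rows[i].getAttribute(name)) {
      rows[i].style.display = rows[i].style.display == 'none' ? '' : 'none';
    }
  }
}
</script>"""


def outline_table(rows: List[Tuple[str, str]],
                  is_main: Callable[[str], bool]) -> str:
    """Render (label, value) rows as a collapsible HTML table.

    Rows after a main row, up to the next main row, form its subgroup.
    """
    html = ['<table border=1>']
    group = 0
    for label, value in rows:
        cells = '<td>%s</td><td>%s</td>' % (escape(label), escape(value))
        if is_main(label):
            group += 1  # step 3: main rows toggle their own group
            html.append('<tr onClick="toggle(\'hide%d\');">%s</tr>' % (group, cells))
        elif group:
            html.append('<tr hide%d=yes>%s</tr>' % (group, cells))  # step 2
        else:
            html.append('<tr>%s</tr>' % cells)  # before the first main row
    html.append('</table>')
    return TOGGLE_JS + '\n' + '\n'.join(html)


rows = [('Main Row 1', 'Value 1'), ('Subrow 11', 'Value 11'),
        ('Main Row 2', 'Value 2'), ('Subrow 21', 'Value 21')]
out = outline_table(rows, lambda label: label.startswith('Main'))
print(out)
```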

@jreback
Contributor

jreback commented Feb 17, 2014

@CRP can you inline a picture of that? Also, I realized #3190 is probably a better issue for this.

@jreback
Contributor

jreback commented Mar 11, 2014

closing in favor of #3190

@ghost711

ghost711 commented Aug 17, 2019

The approach below appears to work great for calculating the correct column print width when ANSI codes are involved.

You can replace the "TextAdjustment" class in site-packages/pandas/io/formats/format.py with the version below:

class TextAdjustment(object):
    def __init__(self):
        import re
        # Matches ANSI escape sequences (ESC followed by a CSI or other code)
        self.ansi_regx = re.compile(r'\x1B[@-_][0-?]*[ -/]*[@-~]')
        self.encoding = get_option("display.encoding")

    def len(self, text):
        # Printed width: ignore the (invisible) escape sequences
        return compat.strlen(self.ansi_regx.sub('', text),
                             encoding=self.encoding)

    def justify(self, texts, max_len, mode='right'):
        jfunc = {'left': str.ljust, 'right': str.rjust}.get(mode, str.center)
        out = []
        for s in texts:
            escapes = self.ansi_regx.findall(s)
            if len(escapes) == 2:
                # "<start>text<reset>": justify the visible text, then re-wrap
                # it so the color also covers the padding
                out.append(escapes[0] +
                           jfunc(self.ansi_regx.sub('', s), max_len) +
                           escapes[1])
            else:
                # Otherwise widen the target by the invisible escape length
                out.append(jfunc(s, max_len + len(s) - self.len(s)))
        return out

    def _join_unicode(self, lines, sep=''):
        try:
            return sep.join(lines)
        except UnicodeDecodeError:
            sep = compat.text_type(sep)
            return sep.join([x.decode('utf-8') if isinstance(x, str) else x
                             for x in lines])

    def adjoin(self, space, *lists, **kwargs):
        # Add space for all but the last column:
        pads = ([space] * (len(lists) - 1)) + [0]
        max_col_len = max(len(col) for col in lists)
        new_cols = []
        for col, pad in zip(lists, pads):
            width = max(self.len(s) for s in col) + pad
            c = self.justify(col, width, mode='left')
            # Add blank cells to end of col if needed for different col lens:
            if len(col) < max_col_len:
                c.extend([' ' * width] * (max_col_len - len(col)))
            new_cols.append(c)

        rows = [self._join_unicode(row_tup) for row_tup in zip(*new_cols)]
        return self._join_unicode(rows, sep='\n')

@celynw

celynw commented Oct 18, 2019

Thanks @ghost711!
For posterity, compat.strlen seems to have been removed in #25903.
I added it back to this file with this (simple) definition:

def strlen(data, encoding=None):
    return len(data)

and also needed:

import pandas.compat as compat

@TomAugspurger
Contributor

TomAugspurger commented Oct 18, 2019 via email
