PLT: Cleaner plotting backend API, and unify Series and DataFrame accessors #27009

datapythonista · 2019-06-23T12:18:33Z

xref API: Define API for pandas plotting backends #26747
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

The main changes are:

The API that backends need to implement is simply a function plot(data, kind, **kwargs). After testing different approaches (abstract class previously mentioned and others), I think this is what makes things simpler in both ends pandas and the backends. For hvplot hvPlot.__call__ is already what needs to be implemented, and this change will mainly require to delete a bit of code. For pdvega a bit more work will be needed, but this will help remove a significant amount of repetitive code.

Second thing, more in the pandas side, I unify the accessors of Series and DataFrame. There will be minor changes, like unified documentation (which I think it's great, but docstrings will have to be generic), and also Series.plot.scatter will exist (which now doesn't) but will generate an exception if called. Not a big deal I'd say, because now it's already possible to do Series.plot(kind='scatter'), so it'll just make both cases consistent. Same things apply to hexbin.

CC: @pandas-dev/pandas-core @philippjfr @jakevdp

TomAugspurger · 2019-06-24T15:01:56Z

I'm still a bit unclear on how the dispatching to backends will work. IIUC, on master we structure things as DataFrame.plot.__call__(kind, ...), which selects the right method based on the kind. So DataFrame.plot(kind='line') calls DataFrame.plot.line which does its own thing.

Now we're going to change things a bit so that DataFrame.plot.line calls DataFrame.plot.__call__ with kind=line.

Does that have a direct impact on how backends will provide their own implementations?

TomAugspurger · 2019-06-24T14:52:49Z

pandas/plotting/_core.py


    Plotting methods can also be accessed by calling the accessor as a method
    with the ``kind`` argument:
    ``s.plot(kind='line')`` is equivalent to ``s.plot.line()``
    """
+    def __init__(self, data):
+        assert isinstance(data, (pandas.Series, pandas.DataFrame))


Why this assert?

There was a comment saying that data had to be Series or DataFrame, I thought that an assert would say the same and also check. No other reason, and happy to get rid of it.

TomAugspurger · 2019-06-24T14:53:40Z

pandas/plotting/_matplotlib/__init__.py

-           'bootstrap_plot', 'lag_plot', 'parallel_coordinates', 'radviz',
-           'scatter_matrix', 'register', 'deregister']
+def plot(data, kind, **kwargs):
+    if isinstance(data, pandas.Series):


Makes sense; not sure if copied this, but will change it.

datapythonista · 2019-06-24T15:18:09Z

Now we're going to change things a bit so that DataFrame.plot.line calls DataFrame.plot.__call__ with kind=line.

Not exactly. I didn't modify that part (besides merging the accessors for Series and DataFrame). In master DataFrame.plot.line calls the accessor line method, and the line method calls __call__. And this didn't change.

The __call__ method was calling plot_frame and plot_series, and those were calling plot. plot was instantiating/calling LinePlot() (LinePlot` was the public part of the backend). This is the part that changed.

Now __call__ will call plot in the backend (plot is the public part of the backend). And plot will instantiate and call LinePlot and the other classes. I think this is much better, since all backends will run some common code first no matter the plot kind.

…ailing)

codecov · 2019-06-25T14:54:41Z

Codecov Report

Merging #27009 into master will decrease coverage by 50.16%.
The diff coverage is 41.79%.

@@             Coverage Diff             @@
##           master   #27009       +/-   ##
===========================================
- Coverage   91.99%   41.83%   -50.17%     
===========================================
  Files         180      180               
  Lines       50774    50673      -101     
===========================================
- Hits        46712    21199    -25513     
- Misses       4062    29474    +25412

Flag	Coverage Δ
#multiple	`?`
#single	`41.83% <41.79%> (-0.11%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/frame.py	`34.99% <100%> (-62.02%)`	⬇️
pandas/core/series.py	`45.45% <100%> (-48.22%)`	⬇️
pandas/plotting/_matplotlib/__init__.py	`56% <26.66%> (-44%)`	⬇️
pandas/plotting/_core.py	`46.87% <44%> (-42.52%)`	⬇️
pandas/io/formats/latex.py	`0% <0%> (-100%)`	⬇️
pandas/io/gcs.py	`0% <0%> (-100%)`	⬇️
pandas/io/sas/sas_constants.py	`0% <0%> (-100%)`	⬇️
pandas/core/groupby/categorical.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/converter.py	`0% <0%> (-100%)`	⬇️
... and 136 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f0919f2...f00ec30. Read the comment docs.

codecov · 2019-06-25T14:54:46Z

Codecov Report

❗ No coverage uploaded for pull request base (master@ad2e98c). Click here to learn what that means.
The diff coverage is 36.84%.

@@            Coverage Diff            @@
##             master   #27009   +/-   ##
=========================================
  Coverage          ?   41.82%           
=========================================
  Files             ?      180           
  Lines             ?    50683           
  Branches          ?        0           
=========================================
  Hits              ?    21197           
  Misses            ?    29486           
  Partials          ?        0

Flag	Coverage Δ
#single	`41.82% <36.84%> (?)`

Impacted Files	Coverage Δ
pandas/plotting/_core.py	`26.66% <ø> (ø)`
pandas/plotting/_matplotlib/timeseries.py	`12.75% <0%> (ø)`
pandas/core/series.py	`45.45% <100%> (ø)`
pandas/core/frame.py	`34.99% <100%> (ø)`
pandas/plotting/_matplotlib/__init__.py	`60% <33.33%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ad2e98c...2597bc9. Read the comment docs.

jorisvandenbossche

As said on the issue, +1 on such a simpler functional API!

jorisvandenbossche · 2019-06-26T05:33:32Z

pandas/plotting/_core.py

+                                    figsize=figsize, bins=bins, **kwds)
+
+
+hist_series.__doc__ = PlotAccessor.hist.__doc__


I would leave the original hist_series and hist_frame intact, as the accessor's docstring version is not correct for them (different keywords, different order of the keywords, ...)

And the same for boxplot below

Didn't realize, good point, thanks!

…frame

… things were wrong or could be improved)

…Series or DataFrame) was being modified in the first call

…ignature mismatches

jorisvandenbossche · 2019-06-28T04:22:17Z

pandas/plotting/_core.py

-        Plot a whole dataframe to a bar plot. Each column is assigned a
-        distinct color, and each row is nested in a group along the
-        horizontal axis.
+def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None,


It makes sense to move those non-accessor functions down in the file, but for the (already large) diff it might make sense to keep them at the top where they were (those ones are just a copy paste I think), and only do the move in a follow-up PR

Yes, that makes sense, but surely better in a follow up.

@datapythonista I actually meant moving them back here, so we can do the actual move in a follow-up
(the diff is right now basically not informative)

datapythonista · 2019-06-28T13:28:39Z

Still working on the failing tests here. Of the two that are failing locally, the problem can be reproduced with this simple example:

import pandas

df = pandas.DataFrame({'a': [.19, .78, .27],
                       'b': [.62, .77, .80]})
df.plot()
df.plot(y='a')

In master this generates two plots, but in this branch is generating a single plot for both calls. Couldn't figure out why yet.

…e this caused the parallel coordinates test failure)

pep8speaks · 2019-07-02T13:37:27Z

Hello @datapythonista! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-07-03 14:07:47 UTC

…ooks like this caused the parallel coordinates test failure)" This reverts commit 4d70d5d.

jorisvandenbossche · 2019-07-02T15:14:39Z

pandas/plotting/__init__.py

+future (e.g. `title` which should be useful for any backend).
+
+Currently, all the Matplotlib functions in pandas are accessed through
+the selected backend. For example, `pandas.plotting.boxplot` (equivalent


I don't think we should mention those right now, as there is still some discussion on whether to see them as part of the plotting backend interface or not.

two lines below I say that This is expected to change. Personally I think it adds more value to have this comment now, and remove it when/if that changes, than just not having this documented, or document the future behavior.

jorisvandenbossche · 2019-07-02T15:39:30Z

@datapythonista how are the changes in this PR affecting the misc plots? (since the failing tests seem to be related to those)

datapythonista · 2019-07-02T15:47:35Z

No idea, I think the test started to fail after some changes to the documentation and comments, or a merge, doesn't make any sense. The changes here shouldn't affect andrews_curves at all. Can't reproduce locally, so I need to debug in the CI.

This reverts commit 34d189f.

…rves test failure

datapythonista · 2019-07-02T16:21:27Z

Failing example:

from pandas import DataFrame
from pandas.plotting import parallel_coordinates

df = DataFrame({"feat": [i for i in range(30)],
                "class": [2 for _ in range(10)] +
                         [3 for _ in range(10)] +
                         [1 for _ in range(10)]})
ax = parallel_coordinates(df, 'class', sort_labels=True)
polylines, labels = ax.get_legend_handles_labels()
color_label_tuples = \
    zip([polyline.get_color() for polyline in polylines], labels)
ordered_color_label_tuples = sorted(color_label_tuples,
                                    key=lambda x: x[1])
ordered_color_label_tuples

Locally returns:

[([0.417022004702574, 0.7203244934421581, 0.00011437481734488664], '1'),
 ([0.43599490214200376, 0.025926231827891333, 0.5496624778787091], '2'),
 ([0.5488135039273248, 0.7151893663724195, 0.6027633760716439], '3')]

Which is correct, but in the Mac build in the CI seems to return:

[([0.417022004702574, 0.7203244934421581, 0.00011437481734488664], '1'),
 ([0.43599490214200376, 0.025926231827891333, 0.5496624778787091], '2'),
 ([0.5488135039273248, 0.7151893663724195, 0.6027633760716439], '3'),
 ('#1f77b4', 'None')]

No idea where this mysterious last element comes from. If anyone with a Mac can reproduce it locally that can makes things easy.

The CI was green, and this started to fail after updating comments and the docs, and adding a warning for Series.plot with positional arguments. I'm temporary removing the warning to see if it is causing the error.

jorisvandenbossche · 2019-07-02T17:19:55Z

I'm temporary removing the warning to see if it is causing the error.

It didn't help .. (the mac build still has the additional parallel_coordinates error).

I am fine with skipping that test on Mac to get this merged, and try to fix in a follow-up.

datapythonista · 2019-07-02T17:42:44Z

Very weird, the other commit when this started to fail was just on comments and docs. Will continue tomorrow.

…drews_curves test failure" This reverts commit 29d7547.

datapythonista · 2019-07-03T14:05:55Z

Looks like this is the commit that is making the parallel_coordinates test fail: 16d1f88

I'll remove it for now if that's all right.

jorisvandenbossche · 2019-07-03T14:25:01Z

Looks like this is the commit that is making the parallel_coordinates test fail: 16d1f88

It might be that putting it in the class fixes it. The setup method of that class imports matplotlib (which might influence if that is imported before pyplot).

datapythonista · 2019-07-03T18:11:01Z

The warning is tested in another test anyway, it's redundant, so I think it's not a problem to remove that test.

@jreback you requested changes, can you have a look?

jreback · 2019-07-03T18:21:26Z

thanks @datapythonista very nice!

prob want to have a look at built docs / doc-strings to make sure everything renderning correctly.

TomAugspurger · 2019-07-03T18:22:52Z

Nice work @datapythonista! Anything else you want to get to for the release candidate? I'm hoping to tag it in ~7 hours.

jorisvandenbossche · 2019-07-03T18:25:12Z

I just commented on the issue that I think we should decide about the scope of the plotting backend (include misc plotting functions or not) for 0.25.0, but that doesn't need to be done for the RC.

datapythonista · 2019-07-03T18:28:35Z

will check the docs once they're built.

Thanks Tom, we started to work on hvplot, and I fixed here couple of minor things that were creating problems for the backends. I think everything should be all right on this now, but if there is anything should be very small and can wait after the RC.

The main thing pending on this affecting the release is what Joris just mentioned on what we delegate to the backends. We can discuss that in #26747.

datapythonista added 2 commits June 21, 2019 22:46

Some experiments so far

f279d16

Refactoring of pandas plotting to make the API clearer

ca5671c

datapythonista added Visualization plotting Clean Needs Discussion Requires discussion from core team before further action labels Jun 23, 2019

TomAugspurger reviewed Jun 24, 2019

View reviewed changes

Marc Garcia added 3 commits June 25, 2019 10:14

Merge remote-tracking branch 'upstream/master' into plot_api

8a56ad7

Addressing review comments, and fixing many tests (still some tests f…

196388b

…ailing)

Merge remote-tracking branch 'upstream/master' into plot_api

f00ec30

jorisvandenbossche mentioned this pull request Jun 26, 2019

API: Define API for pandas plotting backends #26747

Open

jorisvandenbossche reviewed Jun 26, 2019

View reviewed changes

Marc Garcia added 7 commits June 26, 2019 10:48

Merge remote-tracking branch 'upstream/master' into plot_api

1c16faa

Restoring docstrings of hist_series, hist_frame, boxplot and boxplot_…

995d72e

…frame

Fixing plot accessor docstring (was in the wrong place, and couple of…

7e45996

… things were wrong or could be improved)

Fixing hexbin plot tests

4fbfed0

Fixing bug when calling plot twice on the same data, since the data (…

7d7263a

…Series or DataFrame) was being modified in the first call

Raising missing exception for pie in DataFrame, and fixing accessor s…

1a03cbf

…ignature mismatches

Fixing bug that shown the legend for Series plot

cf7cbc0

datapythonista changed the title ~~WIP/PLT: Cleaner plotting backend API, and unify Series and DataFrame accessors~~ PLT: Cleaner plotting backend API, and unify Series and DataFrame accessors Jun 26, 2019

datapythonista removed the Needs Discussion Requires discussion from core team before further action label Jun 26, 2019

Fix linting

d063e05

jorisvandenbossche reviewed Jun 28, 2019

View reviewed changes

Merge remote-tracking branch 'upstream/master' into plot_api

c83551d

jreback added the Blocker Blocking issue or pull request for an upcoming release label Jun 28, 2019

jorisvandenbossche added this to the 0.25.0 milestone Jun 28, 2019

Marc Garcia added 2 commits July 2, 2019 14:34

Fixing test of plotting accessor parameters

0cf4514

Temporary not warning for Series.plot positional arguments (looks lik…

4d70d5d

…e this caused the parallel coordinates test failure)

Marc Garcia added 3 commits July 2, 2019 15:37

Revert "Temporary not warning for Series.plot positional arguments (l…

a2330b2

…ooks like this caused the parallel coordinates test failure)" This reverts commit 4d70d5d.

Merge remote-tracking branch 'upstream/master' into plot_api

42a1b35

Adding debug info in the CI for failing test

34d189f

jorisvandenbossche reviewed Jul 2, 2019

View reviewed changes

Marc Garcia added 2 commits July 2, 2019 17:10

Revert "Adding debug info in the CI for failing test"

5819585

This reverts commit 34d189f.

Temporary removing the warning, to see if it's causing the andrews_cu…

29d7547

…rves test failure

jreback and others added 2 commits July 2, 2019 22:19

Merge branch 'master' into PR_TOOL_MERGE_PR_27009

37fb064

Revert "Temporary removing the warning, to see if it's causing the an…

a5d0fd9

…drews_curves test failure" This reverts commit 29d7547.

datapythonista mentioned this pull request Jul 3, 2019

DO NOT MERGE: Debugging test failure in plotting refactoring #27204

Closed

Removing test that causes parallel_coordinates test to fail

ce544e1

jreback approved these changes Jul 3, 2019

View reviewed changes

jreback merged commit 903a09c into pandas-dev:master Jul 3, 2019

datapythonista mentioned this pull request Sep 28, 2019

CLN: Exception in io, plotting #28649

Closed

nafarya mentioned this pull request Mar 10, 2022

TYP: annotation of __init__ return type (PEP 484) (pandas/plotting) #46283

Merged

1 task

rhshadrach mentioned this pull request Sep 9, 2024

QST: DataFrame.plot() vs Series.plot() have inconsistency wrt reuse plot #59716

Open

2 tasks

		figsize=figsize, bins=bins, **kwds)


		hist_series.__doc__ = PlotAccessor.hist.__doc__

Uh oh!

PLT: Cleaner plotting backend API, and unify Series and DataFrame accessors #27009

PLT: Cleaner plotting backend API, and unify Series and DataFrame accessors #27009

Uh oh!

Conversation

datapythonista commented Jun 23, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TomAugspurger commented Jun 24, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

datapythonista commented Jun 24, 2019

Uh oh!

codecov bot commented Jun 25, 2019

Codecov Report

Uh oh!

codecov bot commented Jun 25, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jorisvandenbossche left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

datapythonista commented Jun 28, 2019

Uh oh!

pep8speaks commented Jul 2, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Comment last updated at 2019-07-03 14:07:47 UTC

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jorisvandenbossche commented Jul 2, 2019

Uh oh!

datapythonista commented Jul 2, 2019

Uh oh!

datapythonista commented Jul 2, 2019

Uh oh!

jorisvandenbossche commented Jul 2, 2019

Uh oh!

datapythonista commented Jul 2, 2019

Uh oh!

datapythonista commented Jul 3, 2019

Uh oh!

jorisvandenbossche commented Jul 3, 2019

Uh oh!

datapythonista commented Jul 3, 2019

Uh oh!

jreback commented Jul 3, 2019

Uh oh!

TomAugspurger commented Jul 3, 2019

Uh oh!

jorisvandenbossche commented Jul 3, 2019

Uh oh!

datapythonista commented Jul 3, 2019

Uh oh!

Uh oh!

datapythonista commented Jun 23, 2019 •

edited

Loading

codecov bot commented Jun 25, 2019 •

edited

Loading

pep8speaks commented Jul 2, 2019 •

edited

Loading