-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PLT: Cleaner plotting backend API, and unify Series and DataFrame accessors #27009
Conversation
I'm still a bit unclear on how the dispatching to backends will work. IIUC, on master we structure things as Now we're going to change things a bit so that Does that have a direct impact on how backends will provide their own implementations? |
pandas/plotting/_core.py
Outdated
|
||
Plotting methods can also be accessed by calling the accessor as a method | ||
with the ``kind`` argument: | ||
``s.plot(kind='line')`` is equivalent to ``s.plot.line()`` | ||
""" | ||
def __init__(self, data): | ||
assert isinstance(data, (pandas.Series, pandas.DataFrame)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a comment saying that data
had to be Series
or DataFrame
, I thought that an assert would say the same and also check. No other reason, and happy to get rid of it.
'bootstrap_plot', 'lag_plot', 'parallel_coordinates', 'radviz', | ||
'scatter_matrix', 'register', 'deregister'] | ||
def plot(data, kind, **kwargs): | ||
if isinstance(data, pandas.Series): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ABCSeries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense; not sure if copied this, but will change it.
Not exactly. I didn't modify that part (besides merging the accessors for Series and DataFrame). In master The Now |
Codecov Report
@@ Coverage Diff @@
## master #27009 +/- ##
===========================================
- Coverage 91.99% 41.83% -50.17%
===========================================
Files 180 180
Lines 50774 50673 -101
===========================================
- Hits 46712 21199 -25513
- Misses 4062 29474 +25412
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #27009 +/- ##
=========================================
Coverage ? 41.82%
=========================================
Files ? 180
Lines ? 50683
Branches ? 0
=========================================
Hits ? 21197
Misses ? 29486
Partials ? 0
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said on the issue, +1 on such a simpler functional API!
pandas/plotting/_core.py
Outdated
figsize=figsize, bins=bins, **kwds) | ||
|
||
|
||
hist_series.__doc__ = PlotAccessor.hist.__doc__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave the original hist_series and hist_frame intact, as the accessor's docstring version is not correct for them (different keywords, different order of the keywords, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the same for boxplot below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't realize, good point, thanks!
… things were wrong or could be improved)
…Series or DataFrame) was being modified in the first call
…ignature mismatches
pandas/plotting/_core.py
Outdated
Plot a whole dataframe to a bar plot. Each column is assigned a | ||
distinct color, and each row is nested in a group along the | ||
horizontal axis. | ||
def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to move those non-accessor functions down in the file, but for the (already large) diff it might make sense to keep them at the top where they were (those ones are just a copy paste I think), and only do the move in a follow-up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense, but surely better in a follow up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@datapythonista I actually meant moving them back here, so we can do the actual move in a follow-up
(the diff is right now basically not informative)
Still working on the failing tests here. Of the two that are failing locally, the problem can be reproduced with this simple example: import pandas
df = pandas.DataFrame({'a': [.19, .78, .27],
'b': [.62, .77, .80]})
df.plot()
df.plot(y='a') In master this generates two plots, but in this branch is generating a single plot for both calls. Couldn't figure out why yet. |
…e this caused the parallel coordinates test failure)
Hello @datapythonista! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-07-03 14:07:47 UTC |
…ooks like this caused the parallel coordinates test failure)" This reverts commit 4d70d5d.
future (e.g. `title` which should be useful for any backend). | ||
|
||
Currently, all the Matplotlib functions in pandas are accessed through | ||
the selected backend. For example, `pandas.plotting.boxplot` (equivalent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should mention those right now, as there is still some discussion on whether to see them as part of the plotting backend interface or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two lines below I say that This is expected to change
. Personally I think it adds more value to have this comment now, and remove it when/if that changes, than just not having this documented, or document the future behavior.
@datapythonista how are the changes in this PR affecting the misc plots? (since the failing tests seem to be related to those) |
No idea, I think the test started to fail after some changes to the documentation and comments, or a merge, doesn't make any sense. The changes here shouldn't affect andrews_curves at all. Can't reproduce locally, so I need to debug in the CI. |
This reverts commit 34d189f.
…rves test failure
Failing example: from pandas import DataFrame
from pandas.plotting import parallel_coordinates
df = DataFrame({"feat": [i for i in range(30)],
"class": [2 for _ in range(10)] +
[3 for _ in range(10)] +
[1 for _ in range(10)]})
ax = parallel_coordinates(df, 'class', sort_labels=True)
polylines, labels = ax.get_legend_handles_labels()
color_label_tuples = \
zip([polyline.get_color() for polyline in polylines], labels)
ordered_color_label_tuples = sorted(color_label_tuples,
key=lambda x: x[1])
ordered_color_label_tuples Locally returns:
Which is correct, but in the Mac build in the CI seems to return:
No idea where this mysterious last element comes from. If anyone with a Mac can reproduce it locally that can makes things easy. The CI was green, and this started to fail after updating comments and the docs, and adding a warning for |
It didn't help .. (the mac build still has the additional parallel_coordinates error). I am fine with skipping that test on Mac to get this merged, and try to fix in a follow-up. |
Very weird, the other commit when this started to fail was just on comments and docs. Will continue tomorrow. |
…drews_curves test failure" This reverts commit 29d7547.
Looks like this is the commit that is making the parallel_coordinates test fail: 16d1f88 I'll remove it for now if that's all right. |
It might be that putting it in the class fixes it. The setup method of that class imports matplotlib (which might influence if that is imported before pyplot). |
The warning is tested in another test anyway, it's redundant, so I think it's not a problem to remove that test. @jreback you requested changes, can you have a look? |
thanks @datapythonista very nice! prob want to have a look at built docs / doc-strings to make sure everything renderning correctly. |
Nice work @datapythonista! Anything else you want to get to for the release candidate? I'm hoping to tag it in ~7 hours. |
I just commented on the issue that I think we should decide about the scope of the plotting backend (include misc plotting functions or not) for 0.25.0, but that doesn't need to be done for the RC. |
will check the docs once they're built. Thanks Tom, we started to work on hvplot, and I fixed here couple of minor things that were creating problems for the backends. I think everything should be all right on this now, but if there is anything should be very small and can wait after the RC. The main thing pending on this affecting the release is what Joris just mentioned on what we delegate to the backends. We can discuss that in #26747. |
git diff upstream/master -u -- "*.py" | flake8 --diff
The main changes are:
The API that backends need to implement is simply a function
plot(data, kind, **kwargs)
. After testing different approaches (abstract class previously mentioned and others), I think this is what makes things simpler in both ends pandas and the backends. Forhvplot
hvPlot.__call__
is already what needs to be implemented, and this change will mainly require to delete a bit of code. Forpdvega
a bit more work will be needed, but this will help remove a significant amount of repetitive code.Second thing, more in the pandas side, I unify the accessors of
Series
andDataFrame
. There will be minor changes, like unified documentation (which I think it's great, but docstrings will have to be generic), and alsoSeries.plot.scatter
will exist (which now doesn't) but will generate an exception if called. Not a big deal I'd say, because now it's already possible to doSeries.plot(kind='scatter')
, so it'll just make both cases consistent. Same things apply tohexbin
.CC: @pandas-dev/pandas-core @philippjfr @jakevdp