
By-keyword, Inconsistencies between the new df.plot.hist and its df counterpart #11483

Open
Twizzledrizzle opened this issue Oct 30, 2015 · 4 comments
Labels
API - Consistency Internal Consistency of API/Behavior Enhancement Visualization plotting

Comments

@Twizzledrizzle

Found this while working on #11441.

```python
import pandas as pd

d = {'one': ['A', 'A', 'B', 'B', 'C'],
     'two': [4., 3., 2., 2, 1],
     'three': [10., 8., 3, 5, 7.]}
df = pd.DataFrame(d)

# this works
df.hist('two', by='one', bins=range(0, 10))

# this does not work (everything in one plot), also no way to specify column
df.plot.hist(by='one', bins=range(0, 10))
```

My idea was to make df.plot.hist behave like df.hist, but its code is much more complex. Would it not be better to have df.plot.hist delegate to df.hist, instead of maintaining two separate implementations of the same logic?
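The delegation idea can be sketched as a small wrapper. This is a minimal sketch, assuming a hypothetical helper named `plot_hist` (not a pandas function): when `by` is given it forwards to `DataFrame.hist`, which already draws one subplot per group, and otherwise it falls back to `DataFrame.plot.hist`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd


def plot_hist(df, column=None, by=None, **kwargs):
    """Hypothetical wrapper, not a pandas API.

    Grouped requests are routed to DataFrame.hist (one subplot per group);
    ungrouped requests go to DataFrame.plot.hist / Series.plot.hist.
    """
    if by is not None:
        return df.hist(column=column, by=by, **kwargs)
    data = df if column is None else df[column]
    return data.plot.hist(**kwargs)


df = pd.DataFrame({'one': ['A', 'A', 'B', 'B', 'C'],
                   'two': [4., 3., 2., 2, 1],
                   'three': [10., 8., 3, 5, 7.]})

grouped_axes = plot_hist(df, column='two', by='one', bins=range(0, 10))
# DataFrame.hist titles each subplot with its group key; spare grid cells stay untitled
titles = sorted(ax.get_title() for ax in np.ravel(grouped_axes) if ax.get_title())

# without `by`, a single axes holding one bar per bin
single_ax = plot_hist(df, column='two', bins=range(0, 10))
print(titles, len(single_ax.patches))
```

Routing on `by` keeps a single grouping implementation, which is the point raised above; the trade-off is that `plot.hist` keyword arguments that `DataFrame.hist` does not accept would need separate handling.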

Oh, and the by keyword does not seem to work for df.plot.box either; I have not found any plot kind where it works, at least not the way I expected :)

@sinhrks sinhrks added the Visualization plotting label Oct 31, 2015
@sinhrks
Member

sinhrks commented Oct 31, 2015

Related to #8018 (which internally splits the data into groups).

by behaves differently in df.hist (one subplot per group) and df.boxplot (groups drawn side by side in a single ax). Thus, I don't think porting these behaviors to plot is a good idea. We should first decide how by should work.
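To make the difference concrete, here is a minimal sketch that counts the axes each call produces for the example frame from this issue. It relies only on `DataFrame.hist` and `DataFrame.boxplot` with the headless Agg backend; the exact subplot grid pandas picks is an implementation detail, so only visible axes are counted.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'one': ['A', 'A', 'B', 'B', 'C'],
                   'two': [4., 3., 2., 2, 1],
                   'three': [10., 8., 3, 5, 7.]})

# DataFrame.hist with `by`: a separate subplot for every group of 'one'
df.hist(column='two', by='one')
hist_fig = plt.gcf()
# the grid may contain hidden spare cells, so count only visible axes
n_hist_axes = sum(ax.get_visible() for ax in hist_fig.axes)

# DataFrame.boxplot with `by`: one subplot per column, with the groups
# of 'one' drawn side by side inside that single axes
df.boxplot(column='two', by='one')
box_fig = plt.gcf()
n_box_axes = len(box_fig.axes)

print(n_hist_axes, n_box_axes)
```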

@Twizzledrizzle
Author

Oh! I missed your work completely when looking through the pull requests. It looks really nice.

I did not know about groupby().hist or groupby().plot.hist. I would expect the by keyword to give the same results.
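One reason the results differ today: `groupby().hist` applies `Series.hist` to each group, and with no `ax` given every group is drawn onto the current axes, whereas `DataFrame.hist(by=...)` creates one subplot per group. A minimal sketch, assuming a recent pandas and the headless Agg backend:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': ['A', 'A', 'B', 'B', 'C'],
                   'two': [4., 3., 2., 2, 1],
                   'three': [10., 8., 3, 5, 7.]})

# groupby().hist: Series.hist runs once per group and reuses the
# current figure's axes, so all groups overlap in one plot
plt.figure()
per_group = df.groupby('one')['two'].hist(bins=range(0, 10))
n_groupby_axes = len({id(ax) for ax in per_group})

# DataFrame.hist(by=...): each group gets its own subplot
by_axes = df.hist(column='two', by='one', bins=range(0, 10))
n_by_axes = sum(ax.get_visible() for ax in np.ravel(by_axes))

print(n_groupby_axes, n_by_axes)
```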

Also, could you take a look at my pull request #11441?

I am trying to improve the implementation of the weights keyword so that it also works when the data and the weights have NaNs in different rows. If this could be integrated into your solution, I would be sooo happy.

For example:

```python
df.plot.hist(column='two', by='one', weights='three')
# or
df.groupby('one').plot.hist(column='two', weights='three')
```

and, when plotting multiple columns, perhaps like this:

```python
df.plot.hist(column=['data1', 'data2'], by='one', weights=['weights1', 'weights2'])
```
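The NaN alignment wanted for weights can be sketched independently of plotting. This minimal sketch assumes a hypothetical helper `grouped_weighted_hist` (not part of pandas) that drops a row whenever its value or its weight is NaN, then calls `numpy.histogram` with `weights=` for each group.

```python
import numpy as np
import pandas as pd


def grouped_weighted_hist(df, column, by, weights, bins):
    """Hypothetical helper: weighted histogram counts of `column` per group.

    A row is dropped when either its value or its weight is NaN, so data
    and weights stay aligned even if their NaNs sit in different rows.
    """
    counts = {}
    for key, group in df.groupby(by):
        valid = group[[column, weights]].dropna()
        counts[key], _ = np.histogram(valid[column], bins=bins,
                                      weights=valid[weights])
    return counts


df = pd.DataFrame({'one': ['A', 'A', 'B', 'B', 'C'],
                   'two': [4., np.nan, 2., 2, 1],       # NaN in the data
                   'three': [10., 8., np.nan, 5, 7.]})  # NaN in another row of the weights

hists = grouped_weighted_hist(df, 'two', by='one', weights='three',
                              bins=range(0, 10))
print(hists['A'])
```

Dropping pairwise, rather than filtering data and weights separately, is what keeps the two arrays the same length.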

@Twizzledrizzle
Author

And plotted in the same graph, unless your new subplots=True keyword is used?
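The proposed behaviour (all groups in one graph unless `subplots=True`, with data and weight columns paired) can be approximated with plain matplotlib. This is a minimal sketch under stated assumptions: `hist_by` is a hypothetical function, not the pandas API, and `column`/`weights` lists are paired by position.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd


def hist_by(df, column, by, weights=None, subplots=False, bins=10):
    """Hypothetical sketch of df.plot.hist(column=..., by=..., weights=...).

    `column` and `weights` are single names or equal-length lists paired by
    position. All groups share one axes unless subplots=True.
    """
    columns = [column] if isinstance(column, str) else list(column)
    if weights is None:
        weight_cols = [None] * len(columns)
    else:
        weight_cols = [weights] if isinstance(weights, str) else list(weights)
    if len(weight_cols) != len(columns):
        raise ValueError("weights must pair one-to-one with column")

    groups = list(df.groupby(by))
    _, axes = plt.subplots(1, len(groups) if subplots else 1, squeeze=False)
    axes = axes[0]  # take the single row of axes
    for i, (key, group) in enumerate(groups):
        ax = axes[i] if subplots else axes[0]
        for col, wcol in zip(columns, weight_cols):
            # drop a row if its value or its weight is NaN, keeping them aligned
            valid = group[[col] if wcol is None else [col, wcol]].dropna()
            ax.hist(valid[col], bins=bins, alpha=0.5,
                    weights=None if wcol is None else valid[wcol],
                    label=f"{key}: {col}")
    for ax in axes:
        ax.legend()
    return axes


df = pd.DataFrame({'one': ['A', 'A', 'B', 'B', 'C'],
                   'two': [4., 3., 2., 2, 1],
                   'three': [10., 8., 3, 5, 7.]})

overlay = hist_by(df, column='two', by='one', weights='three', bins=range(0, 10))
separate = hist_by(df, column='two', by='one', weights='three',
                   subplots=True, bins=range(0, 10))
print(len(overlay), len(separate))
```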

@Twizzledrizzle
Author

@sinhrks I tried to pull your changes into my own dev environment to test various things with weights, but alas I failed :(

I could not find your groupby repository. Could you publish it again? It would be really fun to try out your great-looking additions, and I hope I can contribute a little back.

@jbrockmendel jbrockmendel added the API - Consistency Internal Consistency of API/Behavior label Dec 26, 2019