Is there an out-of-the-box way to do a faceted histogram with percentages instead of counts? #1155

Closed
araichev opened this issue Aug 7, 2024 · 7 comments

Comments

@araichev

araichev commented Aug 7, 2024

That is, without having to compute the percentages in the dataframe ahead of time?
This seems possible in ggplot: https://forum.posit.co/t/trouble-scaling-y-axis-to-percentages-from-counts/42999/3

@ASmirnov-HORIS
Collaborator

Hello!

To make a histogram display density instead of counts:

+ geom_histogram(aes(y='..density..'))

To format it as a percentage:

+ scale_y_continuous(format=".0%")

Could you please clarify what you mean by "faceted histogram"?

@araichev
Author

araichev commented Aug 7, 2024

Thanks for your response, @ASmirnov-HORIS.
To be clear, I want to keep the histogram as a bar chart, not switch to kernel density estimation.
Re faceting, here's an example of what I mean: https://seaborn.pydata.org/examples/faceted_histogram.html. Instead of counts, though, I want percentages relative to each individual group's count.
Does that make sense?

@ASmirnov-HORIS
Collaborator

Sorry if I confused you: aes(y='..density..') does not apply a density (KDE) statistic to the histogram; it's just a way of normalising the y-values. The normalisation should make the total area of the bars equal to 1, but we seem to have found a bug in our formulas, and currently this is not the case.
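To illustrate the intended normalisation, here is a minimal plain-Python sketch of the usual histogram density definition (the one ggplot2's stat_bin uses): each bar's height is count / (n × binwidth), so the bar areas add up to 1. The helper names and toy values are hypothetical, and this is not Lets-Plot's internal code.

```python
# Sketch of the conventional histogram density: count / (n * binwidth).
# Hypothetical helpers and toy data; not Lets-Plot's implementation.
from collections import Counter
import math


def bin_counts(values, binwidth, origin=0.0):
    """Count values per bin; key k covers [origin + k*binwidth, origin + (k+1)*binwidth)."""
    return Counter(math.floor((v - origin) / binwidth) for v in values)


def density(values, binwidth, origin=0.0):
    """Return {bin_index: height} so that sum(height * binwidth) == 1."""
    n = len(values)
    counts = bin_counts(values, binwidth, origin)
    return {k: c / (n * binwidth) for k, c in counts.items()}


values = [181, 186, 195, 193, 190, 181, 195, 193, 190, 186]
dens = density(values, binwidth=3)

# Total bar area: each bar is height * binwidth wide.
area = sum(h * 3 for h in dens.values())
print(round(area, 10))  # → 1.0
```

With this definition the y-values are not "share of the data per bar", which is why they can look odd once formatted as percentages.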

Nevertheless, here is Lets-Plot code based on your demo:

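```python
import pandas as pd
from lets_plot import *

LetsPlot.setup_html()  # render plots as HTML in a notebook
df = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/penguins.csv")
(
    ggplot(df, aes(x="flipper_length_mm"))
    # y = the "bin" stat's ..density.. variable (intended total bar area = 1)
    + geom_histogram(aes(y='..density..'), binwidth=3, center=1)
    # display the y-values as percentages
    + scale_y_continuous(format=".0%")
    # one panel per species (columns) and sex (rows, descending order)
    + facet_grid(x="species", y="sex", y_order=-1)
)
```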

[Plot: 1155_plot1, the faceted histogram produced by the code above]

@araichev
Author

Thanks for the clarification and the example, @ASmirnov-HORIS.
Yes, with that code some of my bars exceed 100%, so something does seem to be off in the Lets-Plot formulas.
I'll keep an eye on issue #1157.

@ASmirnov-HORIS
Collaborator

I see.
Bar heights may exceed 1 (= 100%) when binwidth is less than 1, because ..density.. is normalized by area: each bar's height is its share of the data divided by binwidth, so with binwidth < 1 a bar can be taller than 1. Are you looking for a different normalization, one where the bar heights sum to 1?
You could also check the geom_bar() API or demo notebook. Let us know if there are any computed variables you would like to see in geom_histogram().
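The difference between the two normalizations shows up clearly on a toy sample with a bin width of 0.5. This plain-Python sketch (hypothetical data, not Lets-Plot internals) computes a density-style height, count / (n × binwidth), and a proportion-style height, count / n, for the same bins: the density exceeds 1 while the proportions sum to 1, and the two differ by exactly a factor of binwidth.

```python
# Two normalizations of the same binned data (hypothetical toy sample):
#   density    = count / (n * binwidth)  -> bar areas sum to 1
#   proportion = count / n               -> bar heights sum to 1
from collections import Counter
import math

values = [1.1, 1.2, 1.3, 1.6, 2.2]
binwidth = 0.5
n = len(values)

# Bin index k covers [k * binwidth, (k + 1) * binwidth).
counts = Counter(math.floor(v / binwidth) for v in values)
density = {k: c / (n * binwidth) for k, c in counts.items()}
proportion = {k: c / n for k, c in counts.items()}

print(density[2])                            # 1.2: taller than 1 because binwidth < 1
print(round(sum(proportion.values()), 10))   # 1.0: heights sum to 1
```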

@araichev
Author

For each facet group, I'm looking for a histogram of the counts within the group divided by the total count within that group, expressed as a percentage.
Thus the percentage bars within each group will sum to 100%.
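For reference, the quantity described here can be computed by hand. This is a minimal plain-Python sketch, with a hypothetical helper (`bin_pct_by_group`) and toy rows standing in for the penguins data: it bins values within each group and divides each bin count by that group's total, so each group's percentages add up to 100.

```python
# Per-group histogram percentages: bin count / group total * 100.
# bin_pct_by_group and the toy rows are hypothetical illustrations.
from collections import Counter, defaultdict
import math


def bin_pct_by_group(rows, binwidth):
    """rows: iterable of (group, value). Returns {group: {bin_start: pct}}."""
    counts = defaultdict(Counter)
    for group, value in rows:
        # Left edge of the bin this value falls into.
        counts[group][math.floor(value / binwidth) * binwidth] += 1
    result = {}
    for group, c in counts.items():
        total = sum(c.values())  # normalize by this group's count only
        result[group] = {start: 100 * cnt / total for start, cnt in sorted(c.items())}
    return result


rows = [("Adelie", 181), ("Adelie", 186), ("Adelie", 190), ("Adelie", 191),
        ("Gentoo", 211), ("Gentoo", 217), ("Gentoo", 230)]
pct = bin_pct_by_group(rows, binwidth=5)
for group, bins in pct.items():
    print(group, bins, "sum =", round(sum(bins.values()), 6))
```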

ASmirnov-HORIS added a commit that referenced this issue on Aug 20, 2024
alshan closed this as completed in 858a708 on Aug 20, 2024
@alshan
Collaborator

alshan commented Aug 21, 2024

Hi @araichev, we've just added ..sumprop.. and ..sumpct.. computed variables to the "bin" statistic. This should cover your use case; see https://nbviewer.org/github/JetBrains/lets-plot/blob/master/docs/f-24f/new_stat_bin_vars.ipynb

UPD: released in v4.4.1.

alshan added this to the 2024Q3 milestone on Aug 21, 2024
3 participants