Description
When we want to calculate a group stat that requires knowledge of other groups, it would be useful for compute_group to have access to the rest of the data.
I would like to be able to create a new computed variable, bin_prop, in StatBin that returns the proportion of the data in each bin that belongs to the group.
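To make the definition concrete, here is a small base R sketch on made-up data. The `plays` data frame and its columns are hypothetical and are not the lakers data. For each bin, bin_prop is a group's count divided by the total count of all groups in that bin.

```r
# Toy data (hypothetical): one row per play, with the player and the bin it falls in.
plays <- data.frame(
  player = c("A", "A", "B", "A", "B", "B", "B"),
  bin    = c(1, 1, 1, 2, 2, 2, 2)
)

# Count plays per player (rows) and bin (columns).
counts <- table(plays$player, plays$bin)

# Divide each column by its total, so every bin sums to 1 across players.
bin_prop <- prop.table(counts, margin = 2)
print(bin_prop)
```

The point of the request is that this column-wise division needs the counts of every group in the bin, which a single compute_group call does not see.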
In the example below, I want to analyze the number of plays by each player on the Lakers. I will use geom_freqpoly to show the counts, but what I really want is the proportion of plays per player within each bin.
Set up data
library(lubridate)
library(ggplot2)
library(dplyr)
# set up data
laker_player_plays = lakers |>
  tibble::as_tibble() |>
  filter(team == 'LAL', stringr::str_length(player) > 0) |>
  mutate(date = ymd(date))
Just counts. This is close to what I want, but I would love to use after_stat(bin_prop) instead.
# I'd like to do this, but instead create a new property `bin_prop` that shows the percentage of plays by that player
ggplot(laker_player_plays) +
  geom_freqpoly(aes(x = date,
                    color = player,
                    y = after_stat(count)
                    ),
                binwidth = 31)
Side note
I do see that something equivalent can be done with geom_histogram + position = 'fill', but I do not believe this is being done by the stat layer; maybe it is done by the scales layer?
# I do notice this is done to some extent using geom_histogram + position = fill, but I believe this position is not computed during the stat step
ggplot(laker_player_plays) +
geom_histogram(aes(x = date, fill = player), position = 'fill', binwidth = 31)
Desired output
Here is an example of what I'd like to achieve, but using stats instead of precomputing proportion_of_plays ahead of time.
# This is the type of plot I think we should be able to create, without having to pre-calculate the proportions (should be computed in StatBin)
# calculate breaks, for solutions that can't use stat_bin
breaks = seq(min(laker_player_plays$date), max(laker_player_plays$date)+31, by = 31)
laker_player_plays |>
  mutate(date_group = cut(date, breaks = breaks)) |>
  group_by(player, date_group) |>
  count(name = 'plays') |>
  group_by(date_group) |>
  mutate(proportion_of_plays = plays / sum(plays)) |>
  ggplot(aes(x = date_group,
             y = proportion_of_plays,
             color = player,
             group = player)) +
  geom_point() +
  geom_line() +
  scale_y_continuous(labels = scales::percent)
Created on 2025-05-22 with reprex v2.1.1
Suggested API
ggplot(laker_player_plays) +
  geom_freqpoly(aes(x = date,
                    color = player,
                    y = after_stat(bin_prop)
                    ),
                binwidth = 31)
I've attempted to create a PR for this, but noticed that each group is calculated independently. Is there a solution or workaround that you would propose so that a PR can calculate bin_prop in StatBin, given that it requires proportions computed across groups? I do see that after_stat(prop) is available for geom_bar, so I suspect this pattern has been solved before?
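For discussion, here is a rough sketch of one possible workaround, not a proposed final implementation. It assumes that overriding compute_panel is acceptable, because compute_panel receives every group in a panel while compute_group receives only one. StatBinProp is a hypothetical name that does not exist in ggplot2. The sketch also assumes the default x orientation and that all groups end up with the same bin breaks, which appears to be the case because StatBin derives its breaks from the shared x scale. The toy data frame is made up.

```r
library(ggplot2)

# Hypothetical subclass of StatBin (not part of ggplot2).
StatBinProp <- ggproto(
  "StatBinProp", StatBin,
  compute_panel = function(self, data, scales, ...) {
    # Let StatBin bin each group as usual; the result stacks all groups.
    binned <- ggproto_parent(StatBin, self)$compute_panel(data, scales, ...)
    if (nrow(binned) == 0) {
      return(binned)
    }
    # Total count per bin across groups, keyed by the bin centre `x`
    # (assumes the default orientation, where bins lie along x).
    bin_total <- ave(binned$count, binned$x, FUN = sum)
    # Share of each bin belonging to each group; empty bins get 0.
    binned$bin_prop <- ifelse(bin_total > 0, binned$count / bin_total, 0)
    binned
  }
)

# Toy data (hypothetical) with two groups.
toy <- data.frame(
  x = c(1.0, 1.2, 1.4, 2.1, 2.3, 1.1, 2.2, 2.4, 2.6),
  g = c("a", "a", "a", "a", "a", "b", "b", "b", "b")
)

p <- ggplot(toy, aes(x = x, colour = g)) +
  geom_freqpoly(aes(y = after_stat(bin_prop)),
                stat = StatBinProp, binwidth = 1, boundary = 0)
```

The same layer could presumably be used with the lakers data by mapping x = date and color = player with binwidth = 31. Whether this belongs in StatBin itself, or whether compute_panel is the right extension point, is exactly the design question above.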