Allow empty `keys` argument in `by()` #1837

alecloudenback · 2019-05-31T20:47:36Z

When exploring data, or when writing a function that produces a fixed set of summary results for different combinations of grouping variables, it would be nice to be able to pass an empty array as `keys`, which would simply put all rows of the data frame into a single group, e.g.:

```julia
using DataFrames, RDatasets, Statistics

iris = dataset("datasets", "iris")

function summarize(varies_by)
    by(iris, varies_by) do df
        (m = mean(df.PetalLength), s² = var(df.PetalLength))
    end
end

summarize([])
```

would return

```
     m         s²
1    3.75867   3.11318
```

Currently it errors:

```
BoundsError: attempt to access ()
  at index [1]
```

nalimilan · 2019-05-31T21:13:59Z

Currently `GroupedDataFrame` isn't prepared to handle this, but I guess it would be doable (putting all rows in the same group). Is this supported by other implementations (like dplyr and pandas)?
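A minimal sketch of the "all rows in the same group" semantics, written in Python with only the standard library. The helpers `group_rows` and `summarize` and the list-of-dicts row layout are hypothetical and are not the DataFrames.jl API. The idea is that each row's group key is the tuple of its values in the key columns, so an empty key list gives every row the empty tuple `()` as its key and all rows fall into a single group.

```python
from collections import defaultdict
from statistics import mean, variance


def group_rows(rows, keys):
    """Hypothetical helper: group a list of row dicts by the `keys` columns.

    The group key is a tuple of the key-column values. With keys == [],
    every row gets the key (), so all rows land in one group.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return dict(groups)


def summarize(rows, varies_by):
    # Mirrors the issue's `summarize`: mean and sample variance (n - 1),
    # like Julia's default `var`, of PetalLength within each group.
    out = {}
    for key, members in group_rows(rows, varies_by).items():
        lengths = [r["PetalLength"] for r in members]
        out[key] = {"m": mean(lengths), "s²": variance(lengths)}
    return out


# Toy rows for illustration only (not the real iris data).
rows = [
    {"Species": "setosa", "PetalLength": 1.4},
    {"Species": "setosa", "PetalLength": 1.3},
    {"Species": "virginica", "PetalLength": 6.0},
    {"Species": "virginica", "PetalLength": 5.1},
]

print(summarize(rows, ["Species"]))  # one entry per species
print(summarize(rows, []))           # a single entry keyed by ()
```

Treating the empty key list as "one group keyed by the empty tuple" means no special case is needed: the general grouping loop already produces the desired result.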

alecloudenback · 2019-06-01T02:09:53Z

In R:

```r
library(dplyr)
starwars %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mass = mean(mass, na.rm = TRUE)
  )
```

returns

```
# A tibble: 38 x 3
   species       n  mass
   <chr>     <int> <dbl>
 1 <NA>          5  48  
 2 Aleena        1  15  
 3 Besalisk      1 102  
 4 Cerean        1  82  
 5 Chagrian      1 NaN  
 6 Clawdite      1  55  
 7 Droid         5  69.8
 8 Dug           1  40  
 9 Ewok          1  20  
10 Geonosian     1  80  
# ... with 28 more rows
```

Leaving `group_by()` empty aggregates across all rows:

```r
starwars %>%
  group_by() %>%
  summarise(
    n = n(),
    mass = mean(mass, na.rm = TRUE)
  )
```

returns:

```
# A tibble: 1 x 2
      n  mass
  <int> <dbl>
1    87  97.3
```

alecloudenback · 2019-06-01T02:20:56Z

While dplyr supports this (above), pandas does not:

```python
>>> grouped = df.groupby()
TypeError: You have to supply one of 'by' and 'level'
```

nalimilan · 2019-06-01T19:57:05Z

OK, then I guess we could support this too. It's not a high priority for us, but feel free to make a pull request.

bkamins · 2019-12-01T13:57:40Z

fixed

bkamins closed this as completed Dec 1, 2019

alecloudenback mentioned this issue on Jun 19, 2020 in combine/select/transform on empty GroupedDataFrame #2297 (closed)
