
dplyr can't summarize this variable #2919

Closed
@jabranham

Description

I'm working with a data.frame, and dplyr returns NA for every summary of this variable.
Here's the data (from the General Social Survey). Sorry for the zip file; GitHub won't let me upload the .rds file directly.

test2.zip

And here's the R code. Note that you can change the summarize statement to anything (e.g. summarize(m = mean(russia, na.rm = TRUE))) and it still returns NA:

library(dplyr)
test2 <- readRDS("test2.rds")

## Returns NA
test2 %>%
  summarize_at("russia", funs(m = mean(., na.rm = TRUE)))

## Returns about 5.5
mean(test2$russia, na.rm = TRUE)
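Since the attached file can't be pasted inline, here is a hedged sketch of a self-contained version of the same comparison. The data frame `sim` is synthetic (an assumption, not the GSS data): a numeric `russia` column with values 0 to 9 and roughly three quarters missing. It calls `summarise()` directly to keep things minimal, which also avoids `funs()` (deprecated in later dplyr releases). On a plain numeric column, the dplyr result and base `mean()` should agree.

```r
library(dplyr)

# Synthetic stand-in for test2.rds (assumption): numeric column, mostly NA
set.seed(1)
n <- 1000
sim <- data.frame(
  year   = rep(c(1972, 1973), length.out = n),
  russia = ifelse(runif(n) < 0.75, NA_real_, sample(0:9, n, replace = TRUE))
)

# dplyr summary, dropping NAs before averaging
dplyr_mean <- sim %>%
  summarise(m = mean(russia, na.rm = TRUE)) %>%
  pull(m)

# Base R summary of the same column
base_mean <- mean(sim$russia, na.rm = TRUE)

# On an ordinary numeric column these should be identical
print(c(dplyr = dplyr_mean, base = base_mean))
```

If the synthetic version agrees but the real file still gives NA, that points at something specific to how the column is stored in test2.rds rather than at the summary call itself.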

The data aren't unusual, and not all values of "russia" are missing:

> str(test2)
'data.frame':	62466 obs. of  2 variables:
 $ year  : num  1972 1972 1972 1972 1972 ...
 $ russia: num  NA NA NA NA NA NA NA NA NA NA ...
> summary(test2$russia)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   0.00    4.00    5.00    5.52    9.00    9.00   46935 
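`str()` and `summary()` only show the column's base type and its values. A hedged sketch for listing everything else attached to a column (class, storage type, attribute names, missing counts) follows; `describe_column()` is a hypothetical helper, not part of dplyr, and the toy data frame with its `"label"` attribute is invented purely for illustration.

```r
# Hypothetical helper: report what is attached to one column of a data frame
describe_column <- function(df, col) {
  x <- df[[col]]
  list(
    class      = class(x),              # S3 class(es) of the column
    typeof     = typeof(x),             # underlying storage type
    attributes = names(attributes(x)),  # any extra attributes (NULL if none)
    n_missing  = sum(is.na(x)),
    n_present  = sum(!is.na(x))
  )
}

# Made-up example; the "label" attribute is illustrative only
toy <- data.frame(russia = c(NA, 5, 9, NA, 0))
attr(toy$russia, "label") <- "Rating of Russia"
info <- describe_column(toy, "russia")
str(info)
```

Applied to the issue's data it would be called as `describe_column(test2, "russia")`; no output is shown for that call here because the file isn't reproduced.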

Am I missing something really simple here?

Labels: reprex (needs a minimal reproducible example)