Closed
Opened on Oct 1, 2018
When importing string variables via read_spss, missing labels (and sometimes also value labels) seem to be imported inconsistently, depending on the width of the variable. At least in educational large-scale assessments, missing and value labels for string variables are not that uncommon. I used the current GitHub version of haven.
The sav-file I attached looks like this (using SPSS 22.0.0.1 or SPSS 25)
Importing results in the following attributes on variable level:
rawDat <- haven::read_spss(file = "N:/spss/test1.sav", user_na = TRUE)
lapply(rawDat, attributes)
#> $v1
#> $v1$na_values
#> [1] 99
#>
#> $v1$class
#> [1] "haven_labelled_spss" "haven_labelled"
#>
#> $v1$format.spss
#> [1] "F8.2"
#>
#> $v1$labels
#> one
#> 1
#>
#>
#> $v2
#> $v2$na_values
#> [1] NA
#>
#> $v2$class
#> [1] "haven_labelled_spss" "haven_labelled"
#>
#> $v2$format.spss
#> [1] "A8"
#>
#> $v2$labels
#> one
#> "1"
#>
#>
#> $v3
#> $v3$format.spss
#> [1] "A9"
#>
#> $v3$class
#> [1] "haven_labelled"
#>
#> $v3$labels
#> one
#> "1"
#>
#>
#> $v4
#> $v4$format.spss
#> [1] "A21"
Created on 2018-10-01 by the reprex package (v0.2.1)
When writing to sav, missing labels for string variables are also dropped:
# set up data frame
df <- data.frame(v1 = c(1, 99), v2 = c("aa", "99"), stringsAsFactors = FALSE)
attributes(df$v1) <- list(na_values = 99, class = c("haven_labelled_spss", "haven_labelled"), format.spss = "F8.2", labels = c(one = 1))
attributes(df$v2) <- list(na_values = "99", class = c("haven_labelled_spss", "haven_labelled"), format.spss = "A2", labels = c(sth = "aa"))
# write sav
haven::write_sav(df, path = "N:/spss/test2.sav")
# read sav
spssDF <- haven::read_spss(file = "N:/spss/test2.sav", user_na = TRUE)
lapply(spssDF, attributes)
#> $v1
#> $v1$na_values
#> [1] 99
#>
#> $v1$class
#> [1] "haven_labelled_spss" "haven_labelled"
#>
#> $v1$format.spss
#> [1] "F8.2"
#>
#> $v1$labels
#> one
#> 1
#>
#>
#> $v2
#> $v2$format.spss
#> [1] "A2"
#>
#> $v2$class
#> [1] "haven_labelled"
#>
#> $v2$labels
#> sth
#> "aa"
Created on 2018-10-01 by the reprex package (v0.2.1)
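For comparison, the same two test vectors can be declared with haven's labelled_spss() constructor instead of assigning attributes by hand. This is only a minimal sketch: it does not set format.spss, the use of tibble is an assumption made for convenience, and it shows what is declared on the R side rather than what survives a round trip through write_sav.

```r
library(haven)

# Numeric variable: value label one = 1, user-defined missing value 99.
v1 <- labelled_spss(c(1, 99), labels = c(one = 1), na_values = 99)

# String variable: value label sth = "aa", user-defined missing value "99".
v2 <- labelled_spss(c("aa", "99"), labels = c(sth = "aa"), na_values = "99")

# tibble is used here only for convenience; it is not part of the original report.
df <- tibble::tibble(v1 = v1, v2 = v2)

# Inspect what is declared before anything is written to disk.
attr(df$v2, "na_values")
attr(df$v2, "labels")
class(df$v2)
```

The resulting df can be passed to haven::write_sav() and read back with user_na = TRUE exactly as in the example above, so the attributes before and after the round trip can be compared.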
And the SPSS variable view looks like this:
Is there any way to import missing and value labels consistently from sav files to R?
Thank you!