Closed
Opened on Dec 20, 2017
When defining a labelled object, the label-value vector needs to be sorted by its numerical values, or else Stata cannot correctly read the label:
library(haven)
suppressPackageStartupMessages(library(dplyr))
labs <- c(Democrat = 1, Republican = 2, Independent = 3) # named vector
lbl <- tibble(pid3 = labelled(1L:3L, labs)) # both sorted
lbl_num <- tibble(pid3 = labelled(3L:1L, labs)) # values not sorted
lbl_lab <- tibble(pid3 = labelled(3L:1L, labs[c(1, 3, 2)])) # labels not sorted
write_dta(lbl, "foo.dta")
write_dta(lbl_num, "foo_num.dta")
write_dta(lbl_lab, "foo_lab.dta") # this one gets misread in Stata
When I open foo_lab.dta in Stata, I get
. tab pid3

       pid3 |      Freq.     Percent        Cum.
------------+-----------------------------------
   Democrat |          1       33.33       33.33
          2 |          1       33.33       66.67
Independent |          1       33.33      100.00
------------+-----------------------------------
      Total |          3      100.00
So the label for 2 dropped out. Is this a bug?
This error does not occur when I read the dta file back into R with read_dta and analyze it there; it occurs only when I open the file in Stata.
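A possible workaround, sketched on the assumption that only the order of the label vector matters (as the three files above suggest), is to sort the labels by value before building the labelled column. The names labs_unsorted, labs_sorted, lbl_fixed, and the file name foo_sorted.dta are made up for illustration, and I have not confirmed in Stata that the resulting file reads correctly.

```r
library(haven)

labs <- c(Democrat = 1, Republican = 2, Independent = 3)
labs_unsorted <- labs[c(1, 3, 2)]  # same label order as in lbl_lab

# sort() orders a named numeric vector by value and keeps the names attached
labs_sorted <- sort(labs_unsorted)

# Hypothetical output file, written to the session's temporary directory
lbl_fixed <- data.frame(pid3 = labelled(3L:1L, labs_sorted))
write_dta(lbl_fixed, file.path(tempdir(), "foo_sorted.dta"))
```

This only reorders the label vector; the data values themselves (3, 2, 1) stay as they are, which matches the foo_num.dta case that Stata reads correctly.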