Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reshaping forces to reorder the columns. #2009

Open
skanskan opened this issue Jan 31, 2017 · 1 comment
Open

Reshaping forces to reorder the columns. #2009

skanskan opened this issue Jan 31, 2017 · 1 comment

Comments

@skanskan
Copy link

skanskan commented Jan 31, 2017

I've already asked this question but not received a working answer

https://stackoverflow.com/questions/41163500/r-transform-from-wide-to-long-without-sorting-columns/41163691#41163691

I would like to convert a data.table from long to wide format.
For example this simple data.table.

library(data.table)
mydata <- data.table(ID=1:5, ZA_1=1:5, ZA_2=5:1,
BB_1=rep(3,5),BB_2=rep(6,5),CC_7=6:2)

There are some variables that will remain as is (here only ID) and some that will be transformed to long format (here all other variables, all ending with _1, _2 o _7)

The method I was using to do it is:

idvars =  grep("_[1-7]$",names(mydata) , invert = TRUE)
temp <- melt(mydata, id.vars = idvars)   # long
temp[, `:=`(var = sub("_[1-7]$", '', variable), measure = sub('.*_', '', variable), variable = NULL)]  #splits
dcast( temp,   ... ~ var, value.var='value' ) 

But it has a small problem, it reorders the columns alphabetically even if you don't want to.

Wouldn't it be better if dcast just keeps the original order?

Somebody there, mtoto, suggested to do it using factors, splitting the data and using melt.

measurevars <- names(mydata)[grepl("[1-9]$",names(mydata))]
groups <- gsub("
[1-9]$","",measurevars)
split_on <- factor(groups, levels = unique(groups))
measure_list <- split(measurevars, split_on)
measurenames <- unique(groups)
melt(setDT(mydata), measure = measure_list,
value.name = measurenames, variable.name = "measure")

It has the advantage of using only melt, and not dcast.
But it only works if every letter has the same number of variation numbers. If not, we get the error

Warning message:
In split.default(measurevars, split_on) :
data length is not a multiple of split variable

and many cases are just dropped. I guess is because this method relies on building a list and expects all the elements to have the same length.
Another problem with this method is it doesnt use the right names if we were using longer numbers

mydata <- data.frame(ID=1:5, ZA_2001=1:5, ZA_2002=5:1,
BB_2001=rep(3,5),BB_2002=rep(6,5), CC_2007=rep(6,5))
@skanskan
Copy link
Author

skanskan commented Jan 31, 2017

Finally I've found the way, I wasn't able to do it specifying a list for the measure, but I got modifying my initial solution:

mydata <- data.table(ID=1:5, ZA_2001=1:5, ZA_2002=5:1,
BB_2001=rep(3,5),BB_2002=rep(6,5),CC_2007=6:2)

idvars =  grep("_20[0-9][0-9]$",names(mydata) , invert = TRUE)
temp <- melt(mydata, id.vars = idvars)  
temp[, `:=`(var = sub("_20[0-9][0-9]$", '', variable), 
measure = sub('.*_', '', variable), variable = NULL)]  
temp[,var:=factor(var, levels=unique(var))]
dcast( temp,   ... ~ var, value.var='value' )

And it gives you the proper measure values.
Anyway this solution needs a lot of memory.

The trick was converting the var variable to factor specifying the order I want with levels, I don't know we need to do it that way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants