Description
Here's a minimal reproducible example to illustrate the issue (thanks to Steve Miller for the email exchanges):
require(reshape2)
require(data.table)
dt <- data.table(aa=c(1,1,1,2,2), bb=letters[1:5], cc=11:15, dd=letters[20:24])
# aa bb cc dd
#1: 1 a 11 t
#2: 1 b 12 u
#3: 1 c 13 v
#4: 2 d 14 w
#5: 2 e 15 x
Now, what we'd like to do is cast with the formula aa ~ bb, with the rest of the columns cast wide. The issue is not that we have to use melt on the data set, but that melt will coerce the integer type to character.
dcast.data.table(melt(dt, id=1:2), aa ~ bb+variable, value.var="value")
# aa a_cc a_dd b_cc b_dd c_cc c_dd d_cc d_dd e_cc e_dd
#1: 1 11 t 12 u 13 v NA NA NA NA
#2: 2 NA NA NA NA NA NA 14 w 15 x
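The coercion can be checked directly by inspecting column classes before and after melting. This is a minimal sketch on the same dt; it assumes a data.table version that provides its own melt method (so reshape2 is not loaded here), and the name long is only illustrative.

```r
library(data.table)

dt <- data.table(aa = c(1, 1, 1, 2, 2), bb = letters[1:5],
                 cc = 11:15, dd = letters[20:24])

# Before melting: cc is integer, dd is character
sapply(dt, class)

# Melting stacks cc and dd into a single "value" column, which must have
# one type, so the integers are coerced to character.
# data.table warns about the mixed types; the warning is silenced here.
long <- suppressWarnings(melt(dt, id.vars = c("aa", "bb")))
class(long$value)
```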
That can be quite frustrating: 1) on large data, the character conversion can take a considerable amount of time, especially when there are many unique values; 2) after casting, one has to convert the affected columns back to their original types, which is entirely unnecessary.
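To make the second point concrete, the round trip currently looks roughly like the sketch below. It assumes the dcast generic dispatches on data.tables (older versions would call dcast.data.table explicitly), and the names long, wide and cc_cols are illustrative only.

```r
library(data.table)

dt <- data.table(aa = c(1, 1, 1, 2, 2), bb = letters[1:5],
                 cc = 11:15, dd = letters[20:24])

# Melt (coercing cc to character), then cast wide again
long <- suppressWarnings(melt(dt, id.vars = c("aa", "bb")))
wide <- dcast(long, aa ~ bb + variable, value.var = "value")

# Every *_cc column is now character and has to be converted back by hand
cc_cols <- grep("_cc$", names(wide), value = TRUE)
wide[, (cc_cols) := lapply(.SD, as.integer), .SDcols = cc_cols]

sapply(wide, class)
```

The extra pass over cc_cols is exactly the work the proposal aims to make unnecessary.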
melt and dcast in data.table are (re)implemented with large data dimensions in mind, where even these type conversions could be costly. Not to mention the annoyance of having to get the types back.
What should be possible for data.tables is:
dcast.data.table(dt, aa ~ bb, value.var=c("cc", "dd"))
As simple as that. There are probably some concerns that will come up later, but we'll address them as and when they arise.
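As a sketch of the intended behaviour, the snippet below assumes a data.table version whose dcast accepts a vector for value.var (to my knowledge 1.9.6 and later; treat that as an assumption). It only checks that each value column keeps its type; the exact naming of the result columns is not specified in this issue.

```r
library(data.table)

dt <- data.table(aa = c(1, 1, 1, 2, 2), bb = letters[1:5],
                 cc = 11:15, dd = letters[20:24])

# Cast cc and dd wide in one call, with no intermediate melt
wide <- dcast(dt, aa ~ bb, value.var = c("cc", "dd"))

# Columns derived from cc stay integer, those derived from dd stay character
sapply(wide, class)
```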
Needs to be done carefully along with #716.