dcast not limited to casting one column #739

@arunsrinivasan

Here's a minimal reproducible example to illustrate the issue (thanks to Steve Miller for the email exchanges):

require(reshape2)
require(data.table)
dt <- data.table(aa=c(1,1,1,2,2), bb=letters[1:5], cc=11:15, dd=letters[20:24])
#    aa bb cc dd
#1:  1  a 11  t
#2:  1  b 12  u
#3:  1  c 13  v
#4:  2  d 14  w
#5:  2  e 15  x

Now, what we'd like to do is cast with the formula aa ~ bb, with the remaining columns (cc and dd) cast wide. The issue is not that we have to use melt on the data set first, but that melt will coerce the integer column cc to character, because it gets stacked with the character column dd in a single value column.

dcast.data.table(melt(dt, id=1:2), aa ~ bb+variable, value.var="value")
#    aa a_cc a_dd b_cc b_dd c_cc c_dd d_cc d_dd e_cc e_dd
#1:  1   11    t   12    u   13    v   NA   NA   NA   NA
#2:  2   NA   NA   NA   NA   NA   NA   14    w   15    x

That can be quite frustrating for two reasons: 1) on large data, the character conversion can take a considerable amount of time, especially when there are many unique values; and 2) after casting, one has to convert the affected columns back to their original types, which should be unnecessary.

melt and dcast in data.table were (re)implemented with large data in mind, at sizes where even these type conversions can be costly. Not to mention the annoyance of having to restore the types afterwards.
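The coercion described above can be checked directly. This is a minimal sketch that assumes a data.table version exporting its own melt method, so reshape2 is not loaded. The name `long` is introduced here for illustration only, and `suppressWarnings()` is used because melt warns when the measure columns have different types.

```r
library(data.table)

# Same example table as above
dt <- data.table(aa = c(1, 1, 1, 2, 2), bb = letters[1:5],
                 cc = 11:15, dd = letters[20:24])

# Melting stacks cc (integer) and dd (character) into one "value" column,
# so the integers have to be coerced to character
long <- suppressWarnings(melt(dt, id.vars = c("aa", "bb")))

class(dt$cc)       # type of cc in the original table
class(long$value)  # type of the stacked value column after melt
```

Comparing the two `class()` results shows the type change that then has to be undone after casting.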

What should be possible for data.tables is:

dcast.data.table(dt, aa ~ bb, value.var=c("cc", "dd"))

As simple as that. There will probably be some concerns that come up later, but we'll address them as and when they arise.
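To illustrate what the proposed call is meant to achieve, here is a hedged sketch. It assumes a data.table release in which dcast accepts more than one value.var column, which was not the case when this issue was filed. The name `wide` and the result column names used in the comments (cc_a, dd_a, and so on) are assumptions about that later behaviour, not something this issue specifies.

```r
library(data.table)

# Same example table as above
dt <- data.table(aa = c(1, 1, 1, 2, 2), bb = letters[1:5],
                 cc = 11:15, dd = letters[20:24])

# Cast both value columns in one call, without melting first
# (assumes dcast supports a vector of value.var names)
wide <- dcast(dt, aa ~ bb, value.var = c("cc", "dd"))

# Each value column keeps its own type, so no conversion back is needed:
# the columns derived from cc stay integer, those from dd stay character
sapply(wide, class)
```

Because nothing is stacked into a shared value column, neither the character conversion nor the conversion back is needed.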

Needs to be done carefully along with #716.
