... should exclude anything mentioned in value.var #1833

Open
MichaelChirico opened this issue Aug 31, 2016 · 3 comments
Labels
reshape dcast melt

Comments

@MichaelChirico
Member

MichaelChirico commented Aug 31, 2016

Consider:

library(data.table)

set.seed(120)
DT <- data.table(V1 = sample(10))
DT[ , paste0("V", 2:20) := replicate(19, sample(10), simplify = FALSE)]
DT[ , paste0("X", 1:50) := replicate(50, rnorm(10), simplify = FALSE)]

DT_m <- melt(DT, measure.vars = patterns("^X"), value.name = "X")

I want to reshape DT_m back to being "wide". I want V1:20 to be as originally in DT, but to aggregate all of X1:50 by summing into a single column.

It seems the way to do this is:

dcast(DT_m, V1 + V2 + V3 + [...] + V20 ~ ., fun.aggregate = sum)
#    V1 V2          .
#1   1  5   5.452721
#2   2  9   2.855705
#3   3  2  -1.775939
#4   4  6   7.033915
#5   5  1 -10.456389
#6   6  8   9.050576
#7   7  4   4.917982
#8   8  7   1.901975
#9   9 10   3.969899
#10 10  3  -1.367950

But obviously it's not optimal to need to type all the variables on the LHS. I thought that was the point of ..., but this doesn't work:

dcast(DT_m, ... ~ ., value.var = "X", fun.aggregate = sum)

... has included X, so the returned table is just DT_m (with X renamed to .).
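As a side note on the "typing all the variables" problem above, here is a minimal sketch of one way to build the LHS programmatically instead of writing it out. It assumes the id columns are exactly V1 to V20; the names `lhs`, `f`, and `agg` are illustrative only and do not come from the thread.

```r
library(data.table)

# Rebuild the example data from the post above
set.seed(120)
DT <- data.table(V1 = sample(10))
DT[, paste0("V", 2:20) := replicate(19, sample(10), simplify = FALSE)]
DT[, paste0("X", 1:50) := replicate(50, rnorm(10), simplify = FALSE)]
DT_m <- melt(DT, measure.vars = patterns("^X"), value.name = "X")

# Assumption: the id columns are named V1..V20
lhs <- paste0("V", 1:20)

# Paste the names into "V1 + V2 + ... + V20 ~ ." and convert to a formula
f <- as.formula(paste(paste(lhs, collapse = " + "), "~ ."))

# Aggregate X within each combination of the LHS columns
agg <- dcast(DT_m, f, value.var = "X", fun.aggregate = sum)
```

This keeps `variable` in `DT_m` and simply leaves it out of the formula, so it does not depend on how `...` is expanded.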

@franknarf1
Contributor

> I want to reshape DT_m back to being "wide". I want V1:20 to be as originally in DT, but to aggregate all of X1:50 by summing into a single column.

You can do

DT_m[, variable := NULL]
dcast(DT_m, ... ~ ., value.var = "X", fun = sum)

However it won't be "as originally", since dcast sorts by the LHS of ~. Not sure if I'm missing something here, but this result does match the direct one: DT[, Reduce(`+`, .SD), keyby=V1:V20, .SDcols=X1:X50]
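The check described above can be sketched end to end as follows. This is only an illustration under stated assumptions: the column name `X_sum` and the objects `wide` and `direct` are made up here, and the grouping and `.SDcols` columns are passed as character vectors rather than as the `V1:V20` / `X1:X50` ranges used in the comment.

```r
library(data.table)

# Rebuild the example data from the original post
set.seed(120)
DT <- data.table(V1 = sample(10))
DT[, paste0("V", 2:20) := replicate(19, sample(10), simplify = FALSE)]
DT[, paste0("X", 1:50) := replicate(50, rnorm(10), simplify = FALSE)]
DT_m <- melt(DT, measure.vars = patterns("^X"), value.name = "X")

# Drop `variable` so that `...` only picks up V1..V20 (modifies DT_m in place)
DT_m[, variable := NULL]
wide <- dcast(DT_m, ... ~ ., value.var = "X", fun.aggregate = sum)
setnames(wide, ".", "X_sum")  # hypothetical name for the aggregated column

# Direct computation on the original wide table, for comparison
direct <- DT[, .(X_sum = Reduce(`+`, .SD)),
             keyby = c(paste0("V", 1:20)),
             .SDcols = paste0("X", 1:50)]

# Both results are sorted by V1..V20, so the sums can be compared row by row;
# all.equal() allows for small floating-point differences
all.equal(wide$X_sum, direct$X_sum)
```

Because `dcast` sorts by the LHS and `keyby` sorts by the grouping columns, both tables come back ordered by V1 to V20 rather than in the original row order of DT.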

@MichaelChirico
Member Author

Indeed you're right. However my particular application here is part of an intro to R so I'm trying to demonstrate reshaping, and Reduce-based stuff should probably be reserved for well beyond when people have gotten their feet wet.

Anyway, I don't think it's unheard of to find data in a .csv in DT_m form to start with, so that we'd need this instead of the Reduce approach.

Good suggestion on just deleting variable... may be the only way to go since I know this FR probably runs up against compatibility with reshape2::dcast.

@franknarf1
Contributor

I agree. I wasn't suggesting the Reduce way as an alternative, just using it to verify that my result was correct (since I was seeing different numbers from those you showed in the OP).

@jangorecki added the reshape dcast melt label on Apr 6, 2020