[R-Forge #1757] Add drop to [.data.table, and "dontmove" to not put group columns first #648

Open
arunsrinivasan opened this issue Jun 8, 2014 · 7 comments

Comments

@arunsrinivasan
Member

Submitted by: Matt Dowle; Assigned to: Nobody; R-Forge link

http://r.789695.n4.nabble.com/Syntax-ag-M-age-age-tp4313006p4313006.html

It wouldn't just build the table and extract the vector at the end, but only construct the vector in the first place; i.e., dogroup.c would need to switch on drop and not populate iby when drop=TRUE.

@mattdowle
Member

@jangorecki
Member

+1 for support of drop=FALSE to return data.table and not reduce to vector

library(data.table)
x = data.table(a=c(2,5))
x[, sum(a), drop=FALSE]
# could translate to
x[, .(sum(a))]

This is a minor thing, but it increases consistency: the type returned won't depend on whether by is present, which would be nice to have at least as an option (drop=FALSE).
I would use it in another FR; I will work around it for now.
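For reference, a minimal sketch of the inconsistency described above, using only syntax that exists today (the proposed drop argument is not used). The table x2 and its column g are made up for illustration.

```r
library(data.table)

# Illustrative data; the grouping column g is an assumption, not from the thread
x2 = data.table(g = c("u", "v"), a = c(2, 5))

no_by   = x2[, sum(a)]          # no by: result is reduced to a plain vector
wrapped = x2[, .(sum(a))]       # wrapping j in .() keeps a data.table
with_by = x2[, sum(a), by = g]  # with by: result is always a data.table
```

So the same j expression gives a vector or a data.table depending on by, which is what drop=FALSE would make controllable.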

@eantonya
Contributor

@jangorecki I don't understand why, in the given example, drop=FALSE would be preferred to .(). Can you explain, or do you have a better example in mind?

@jangorecki
Member

jangorecki commented Apr 22, 2016

@eantonya When I'm just passing the j argument from a higher-level function to data.table, it may be an arbitrary expression, so I don't have much control over adding .(). I need a data.table as the result so that rbindlist will work. Example here: https://github.com/Rdatatable/data.table/pull/1667/files#diff-3b7395e08ce77c2fc6f3923f6790b45eR79 (2016-09-30 link updated)
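A rough sketch of the kind of workaround this implies, assuming the caller hands over j as a quoted expression. The helper name j_as_dt and the fallback column name V1 are hypothetical; this is not the code from the linked PR.

```r
library(data.table)

# Hypothetical helper: evaluate a quoted j expression and always return a data.table
j_as_dt = function(x, jexpr) {
  ans = x[, eval(jexpr)]
  # without by, a j that is not wrapped in .() comes back as a vector, so wrap it here
  if (!is.data.table(ans)) ans = data.table(V1 = ans)
  ans
}

x = data.table(a = c(2, 5))
parts = list(
  j_as_dt(x, quote(sum(a))),          # plain expression, wrapped by the helper
  j_as_dt(x, quote(.(V1 = mean(a))))  # already a data.table
)
res = rbindlist(parts, use.names = FALSE)  # bind by position
```

With drop=FALSE available, the is.data.table check in the helper would presumably not be needed.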

@MichaelChirico
Member

See also #1188

@jangorecki jangorecki added this to the 1.13.0 milestone Aug 1, 2019
@jangorecki
Member

This strikes again in #3653: auto-naming of .N, .GRP, .I is lost for a grand-total aggregation (missing by).
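As I read this comment, the point can be illustrated with .N alone. The sketch below uses a made-up table and does not reproduce the code from #3653.

```r
library(data.table)

# Illustrative data, not taken from #3653
dd = data.table(id1 = c(1L, 1L, 1L, 2L, 2L), v1 = c(1, 2, 3, 4, 5))

grouped = dd[, .N, by = id1]  # data.table; the count column is auto-named N
total   = dd[, .N]            # grand total: a bare integer, so there is no column name
```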

@jangorecki
Member

jangorecki commented Jan 26, 2020

I would like to suggest a use case for drop=TRUE. When keeping data in a normalized form (also known as tidy), the identity columns (not measures) that happen to be constant in particular cases could safely be removed. They cost memory unnecessarily: because they are constant, their value is recycled .N times and they do not contribute to the analysis of measures. They could easily be stored as metadata.

library(data.table)
d = data.table(id1=c(1L,1L,1L,2L,2L), id2=c(6L,6L,7L,7L,7L), v1=c(1,2,3,4,5))
d[id1==1L]
#   id1 id2 v1
#1:   1   6  1
#2:   1   6  2
#3:   1   7  3
d[id1==2L]
#   id1 id2 v1
#1:   2   7  4
#2:   2   7  5
d[id1==1L, drop=TRUE]
#   id2 v1
#1:   6  1
#2:   6  2
#3:   7  3
d[id1==2L, drop=TRUE]
#   v1
#1:  4
#2:  5

This may be seen as making it difficult to predict the dimensions of the answer, but there are use cases for it. I had exactly such a use case, for which I used the following helper.

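drop.data.table = function(x, cols) {
  # work on a shallow copy so that x itself keeps its columns
  ans = data.table:::shallow(x)
  # number of distinct values in each candidate identity column
  un = sapply(cols, function(col) uniqueN(x[[col]]))
  # columns with at most one distinct value are constant
  rm = names(un)[un <= 1L]
  if (length(rm)) set(ans, NULL, rm, NULL) # Rdatatable/data.table#4086
  ans
}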

The tricky part is how to distinguish identity columns from measures; that seems to be nearly impossible without requiring the user to provide such metadata. In the helper above, the cols argument is exactly for that.
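To make the role of cols concrete, here is a sketch of a variant that avoids the internal data.table:::shallow by selecting the kept columns instead. The name drop_constant_cols is hypothetical, and unlike the helper above it copies the kept columns.

```r
library(data.table)

# Hypothetical variant: drop those columns listed in cols that hold at most one distinct value
drop_constant_cols = function(x, cols) {
  n_unique = vapply(cols, function(col) uniqueN(x[[col]]), numeric(1L))
  constant = cols[n_unique <= 1L]
  keep = setdiff(names(x), constant)
  x[, keep, with = FALSE]
}

d = data.table(id1 = c(1L, 1L, 1L, 2L, 2L), id2 = c(6L, 6L, 7L, 7L, 7L), v1 = c(1, 2, 3, 4, 5))
r1 = drop_constant_cols(d[id1 == 1L], c("id1", "id2"))  # id1 is constant, id2 is not
r2 = drop_constant_cols(d[id1 == 2L], c("id1", "id2"))  # both id1 and id2 are constant
```

Only columns named in cols are candidates, so a measure such as v1 is kept even if it happens to be constant in a subset.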

It is also worth noting that this is basically what drop=TRUE does for arrays: when a particular dimension has a constant value, that dimension is dropped. So we would be consistent with the base R way of handling multidimensional data.
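The base R behaviour referred to here can be seen with a small array (illustrative values only):

```r
# 2 x 3 x 2 array; fixing the first index leaves that dimension with extent one
a = array(1:12, dim = c(2, 3, 2))

dropped = a[1, , ]                # default drop=TRUE removes the extent-one dimension
kept    = a[1, , , drop = FALSE]  # drop=FALSE keeps all three dimensions
```

Here dim(dropped) is 3 2, while dim(kept) is 1 3 2.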

@mattdowle mattdowle modified the milestones: 1.13.1, 1.13.3 Oct 17, 2020
@mattdowle mattdowle removed this from the 1.14.1 milestone Aug 28, 2021