data.table in j results in an error #1473

Open
eantonya opened this issue Dec 18, 2015 · 6 comments
Labels
bug Low non-atomic column e.g. list columns, S4 vector columns

Comments

@eantonya
Contributor

Not sure what's going on here; this looks like a weird bug:

dt1 = data.table(a = 1)

dt2 = data.table(b = 1:2)

dt2[, dt1, by = b]
#Error in FUN(X[[2L]], ...) : 
#  Invalid column: it has dimensions. Can't format it. If it's the result of data.table(table()), use as.data.table(table()) instead.

dt2[, dt1[], by = b]
#   b a
#1: 1 1
#2: 2 1
@MichaelChirico
Member

I think these lines in format.data.table are the culprit:

do.call("cbind",lapply(x,function(col,...){
        if (!is.null(dim(col))) stop("Invalid column: it has dimensions. Can't format it. If it's the result of data.table(table()), use as.data.table(table()) instead.")
        if (is.list(col)) col = sapply(col, format.item)
        else col = format(char.trunc(col), justify=justify, ...) # added an else here to fix #5435
        col
    },...))

Note that x <- dt2[ , dt1, by = b] produces no error; it's only when print.data.table is called that we get this.

Not sure why dt1[] avoids the problem, though...

Could we just switch the order of the tests? That is, test is.list(col) first and only check !is.null(dim(col)) when the column is not a list. (I'm not at a machine to test this myself yet.)
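Below is a minimal base-R sketch of the proposed reordering. format_col() and format_item() are hypothetical stand-ins, not data.table's actual internals. The real code also calls the internal char.trunc() and format.item(), and those calls are left out here.

```r
# Minimal sketch, assuming simplified stand-ins for data.table internals.
# format_item() is HYPOTHETICAL: it summarises one element of a list column.
format_item <- function(x) {
  if (is.atomic(x) && length(x) == 1L) format(x) else paste0("<", class(x)[1L], ">")
}

# HYPOTHETICAL per-column formatter with the checks reordered:
# is.list() is tested first, so list-like columns (including data.frames,
# which carry dimensions) never reach the dim() check.
format_col <- function(col, justify = "left") {
  if (is.list(col)) return(vapply(col, format_item, character(1L)))
  if (!is.null(dim(col)))
    stop("Invalid column: it has dimensions. Can't format it.")
  format(col, justify = justify)
}

format_col(list(1, data.frame(a = 1)))   # "1" "<data.frame>"
format_col(data.frame(a = 1, c = 3))     # a = "1", c = "3": one entry per column
format_col(c("a", "bb"))                 # "a " "bb"
```

With is.list() tested first, a data.frame-like column no longer triggers the dim() error. However, vapply() walks the columns of such an object rather than its rows. For that reason, format_col(data.frame(a = 1, c = 3)) returns one string per column (a and c) and not one per row, so reordering the checks alone may not be a complete fix.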

@MichaelChirico
Member

@eantonya note the following:

x <- dt2[, dt1, by = b]
y <- dt2[, dt1[], by = b]
> lapply(x,class)
$b
[1] "integer"

$dt1
[1] "data.table" "data.frame"

> lapply(y,class)
$b
[1] "integer"

$a
[1] "numeric"

We would expect x to also incorporate the columns of dt1 as new columns, correct? That is, the problem seems to be that x stores dt1 as a single "list" column, while y correctly includes the columns of dt1.
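Here is a base-R analogy for the two outcomes described above. It does not exercise data.table's grouping code. It only contrasts two things: storing a data.frame as one nested column, which is what happened for x, and splicing its columns into the result, which is what happened for y. The object names are illustrative.

```r
# Base-R analogy only (no data.table involved).
df2   <- data.frame(b = 1:2)
inner <- data.frame(a = c(1, 1))

nested <- df2
nested$dt1 <- inner              # ONE column named "dt1" that is itself a data.frame
names(nested)                    # "b" "dt1"
class(nested$dt1)                # "data.frame"

spliced <- cbind(df2, inner)     # columns of inner become top-level columns
names(spliced)                   # "b" "a"
```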


A more elaborate example makes the problem clearer.

Consider

dt3 = data.table(a = 1, c = 3)

w <- dt2[ , dt3, by = b]
z <- dt2[ , dt3[ ], by = b]

z produces what I think is the natural output:

> z
   b a c
1: 1 1 3
2: 2 1 3

But w (ignoring for now that we can't print it) seems to have been assigned incorrectly:

> lapply(w, "[")
$b
[1] 1 1 2 2

$dt3
   NA NA NA NA
1:  1  3  1  3

So perhaps the error runs a bit deeper than just in format.data.table (which is itself an easy fix).

@eantonya
Contributor Author

@MichaelChirico I completely agree: the classes produced by the square-bracket versions are the ones I would expect.

@MichaelChirico
Member

I've been poking around on this a bit. I think a key difference is that dt1 is parsed as a name (symbol), whereas dt1[] is parsed as a call.
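The parse difference is easy to confirm in base R. This does not show how [.data.table then routes each form. The sketch only assumes that it inspects the unevaluated j expression.

```r
j_symbol <- quote(dt1)     # j as written in dt2[, dt1, by = b]
j_call   <- quote(dt1[])   # j as written in dt2[, dt1[], by = b]

is.name(j_symbol)          # TRUE: a bare symbol
is.call(j_symbol)          # FALSE
is.call(j_call)            # TRUE: a call to `[` with dt1 as its first argument
identical(j_call[[1L]], as.name("["))  # TRUE
```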

@jangorecki
Member

It no longer errors, but I'm not sure the results are exactly what we should expect:

dt1 = data.table(a = 1)
dt2 = data.table(b = 1:2)
dt2[, dt1, by = b]
#       b            dt1
#   <int>   <data.table>
#1:     1 <multi-column>
#2:     2 <multi-column>
dt2[, dt1[], by = b]
#       b     a
#   <int> <num>
#1:     1     1
#2:     2     1

The AsIs class (pending PR) is closely related to use cases like this.

@jangorecki jangorecki added the non-atomic column e.g. list columns, S4 vector columns label Apr 5, 2020
@JoshOBrien
Contributor

Is the difference between the two cases related to a special route by which the symbol dt1 is found when doing dt2[, dt1, by = b]?

Somehow, a value for the plain symbol dt1 is found when the call to [.data.table includes a by argument, but not when just doing dt2[, dt1]:

dt2[,dt1]
## Error: j (the 2nd argument inside [...]) is a single symbol but column name 'dt1' is not found. If you intended 
## to select columns using a variable in calling scope, please try DT[, ..dt1]. The .. prefix conveys one-level-up 
## similar to a file system path.
dt2[, dt1, by = b]
##    b            dt1
## 1: 1 <multi-column>
## 2: 2 <multi-column>

It seems likely that whatever code branch lets the symbol dt1 be found in the second case but not the first is also to blame for mis-processing the data.table that the symbol points to.

It is evidently a different branch from the one used when dt1 is part of a call, as in any of the following:

dt2[, dt1[]]
##    a
## 1: 1
dt2[, dt1[], by = b]
##    b a
## 1: 1 1
## 2: 2 1
dt2[, (dt1)]
##    a
## 1: 1
dt2[, (dt1), by = b]
##    b a
## 1: 1 1
## 2: 2 1
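Wrapping dt1 in parentheses is enough to turn j into a call. That fits with (dt1) taking the same route as dt1[]. In base R, (dt1) parses as a call to `(`, and its value is just the value of dt1:

```r
paren <- quote((dt1))
is.name(paren)                          # FALSE: no longer a bare symbol
is.call(paren)                          # TRUE
identical(paren[[1L]], as.name("("))    # TRUE: the function being called is `(`
eval(paren, list(dt1 = 42))             # 42: evaluates to the same value as dt1
```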

4 participants