allow.cartesian when i has no duplicate values #742

@nigmastar

Please consider the following:

> dt <- data.table(id=rep(letters[1:2], 2), var = rnorm(4), key="id")
> dt
   id       var
1:  a 0.9609685
2:  a 0.1432707
3:  b 1.1276582
4:  b 0.8051821

> dt[letters[1:3], list(var)]
Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x),  : 
  Join results in 5 rows; more than 4 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. [...]

> dt[letters[1:3], list(var), by=.EACHI]
   id       var
1:  a 0.9609685
2:  a 0.1432707
3:  b 1.1276582
4:  b 0.8051821
5:  c        NA

The second join also returns 5 rows. Shouldn't the two joins above behave consistently (perhaps both like the second)?

I also wonder: is the idea behind allow.cartesian simply 1) that the output must not have more rows than max(nrow(x),nrow(i)), or 2) to guard against duplicate key values in i?
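
To make the arithmetic concrete, here is a rough base-R sketch of how I read the error message. It does not call data.table or reproduce its internals; the variable names are mine, and counting one NA row per unmatched value of i is an assumption based on the by=.EACHI output above.

```r
# Key values of x (dt$id) and of i, as in the example above
x_id <- rep(letters[1:2], 2)
i_id <- letters[1:3]

# Number of rows in x matching each value of i: a = 2, b = 2, c = 0
n_match <- vapply(i_id, function(k) sum(x_id == k), integer(1))

# Assumption: an unmatched value of i still contributes one NA row
join_rows <- sum(pmax(n_match, 1L))

# Limit quoted in the error message: max(nrow(x), nrow(i))
limit <- max(length(x_id), length(i_id))

# The two possible readings of the check
no_dups_in_i  <- anyDuplicated(i_id) == 0L
exceeds_limit <- join_rows > limit
```

Here i has no duplicate values, yet the join yields 5 rows against a limit of 4, so the two readings give different answers for this case.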

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
# data.table installed today from github

Labels: High, bug, joins
