Revisit allow.cartesian #1123

@arunsrinivasan

Just came across some inconsistencies in allow.cartesian:

require(data.table) # v1.9.5, commit 1813
x = data.table(a=rep(1:2, each=2), b=10, key="a")
#    a  b
#1: 1 10
#2: 1 10
#3: 2 10
#4: 2 10
y = data.table(a=rep(1L, 4), b=5:6, key="a")
#    a b
#1: 1 5
#2: 1 6
#3: 1 5
#4: 1 6

y[x]
# Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  : 
# Join results in 10 rows; more than 8 = nrow(x)+nrow(i). Check for duplicate key values in i ...

y[x, nomatch=0L]
#    a b i.b
#1: 1 5  10
#2: 1 6  10
#3: 1 5  10
#4: 1 6  10
#5: 1 5  10
#6: 1 6  10
#7: 1 5  10
#8: 1 6  10

?data.table explains allow.cartesian as:

FALSE prevents joins that would result in more than max(nrow(x),nrow(i)) rows.

Both joins result in more than max(nrow(x), nrow(i)) = 4 rows: nomatch=NA gives 10 and nomatch=0L gives 8. So why does the second one work fine? And why does the error message say the join result is larger than nrow(x) + nrow(i)?
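
For reference, the two row counts can be reproduced by hand. This is only a rough sketch in base R: it counts matches on the key column directly and does not call data.table. It assumes the check compares the result size against nrow(x) + nrow(i), as the error message suggests; that has not been confirmed against the source. The variable names are made up for illustration.

```r
# Key columns from the example above (y[x]: y is x, x is i)
x_a <- rep(1:2, each = 2)   # key column of x, used as i in y[x]
y_a <- rep(1L, 4)           # key column of y

# Number of rows in y that match each row of x
matches <- vapply(x_a, function(k) sum(y_a == k), integer(1))

# nomatch=NA keeps each unmatched row of i as one NA row; nomatch=0L drops it
rows_nomatch_na <- sum(pmax(matches, 1L))
rows_nomatch_0  <- sum(matches)

# Assumed thresholds: one from the error message, one from ?data.table
limit_sum <- length(x_a) + length(y_a)
limit_max <- max(length(x_a), length(y_a))

print(c(rows_nomatch_na, rows_nomatch_0, limit_sum, limit_max))
```

Under that assumption, the nomatch=NA result exceeds limit_sum while the nomatch=0L result only equals it, which would explain why just the first join errors. Both results exceed limit_max, the threshold given in the documentation.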

Additionally, if we are to rename allow.cartesian to allow.i.dups (#914), then the error should occur irrespective of the number of rows, depending only on whether i has duplicates in its key columns.
