Just came across some inconsistencies in `allow.cartesian`:
require(data.table) # v1.9.5, commit 1813
x = data.table(a=rep(1:2, each=2), b=10, key="a")
# a b
#1: 1 10
#2: 1 10
#3: 2 10
#4: 2 10
y = data.table(a=rep(1L, 4), b=5:6, key="a")
# a b
#1: 1 5
#2: 1 6
#3: 1 5
#4: 1 6
y[x]
# Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
# Join results in 10 rows; more than 8 = nrow(x)+nrow(i). Check for duplicate key values in i ...
y[x, nomatch=0L]
# a b i.b
#1: 1 5 10
#2: 1 6 10
#3: 1 5 10
#4: 1 6 10
#5: 1 5 10
#6: 1 6 10
#7: 1 5 10
#8: 1 6 10
`?data.table` explains `allow.cartesian` as: "`FALSE` prevents joins that would result in more than `max(nrow(x), nrow(i))` rows."
Both joins result in more than `max(nrow(x), nrow(i))` rows: `nomatch=NA` results in 10, and `nomatch=0L` results in 8. So why does the second one work fine? And why does the error message talk about the join rows being larger than `nrow(x) + nrow(i)`?
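To make the numbers concrete, here is a rough base-R sketch of the row-count arithmetic for the two joins. It does not call any data.table internals. The variable names (`x_key`, `y_key`, `documented_limit`, `reported_limit`) are mine, and the only assumption is that each row of `i` contributes one output row per match, or a single `NA` row under `nomatch=NA` when it has no match.

```r
# Key columns from the example; in y[x], `y` plays the role of x and `x` is i.
x_key <- rep(1:2, each = 2)   # key of i: 1 1 2 2
y_key <- rep(1L, 4)           # key of x: 1 1 1 1

# Number of rows in y matching each row of i.
matches <- vapply(x_key, function(k) sum(y_key == k), integer(1))

# nomatch=NA keeps unmatched i rows as one NA row each; nomatch=0L drops them.
rows_nomatch_NA <- sum(pmax(matches, 1L))
rows_nomatch_0  <- sum(matches)

# Limit as documented in ?data.table vs. limit quoted in the error message.
documented_limit <- max(length(y_key), length(x_key))
reported_limit   <- length(y_key) + length(x_key)

print(c(nomatch_NA = rows_nomatch_NA, nomatch_0 = rows_nomatch_0))
print(c(documented = documented_limit, reported = reported_limit))

# Which joins exceed which limit (strict "more than").
print(c(rows_nomatch_NA, rows_nomatch_0) > documented_limit)
print(c(rows_nomatch_NA, rows_nomatch_0) > reported_limit)
```

If that assumption holds, both joins exceed the documented limit, but only the `nomatch=NA` join exceeds `nrow(x) + nrow(i)`, which would be consistent with the error appearing for the first join only.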
Additionally, if we are to rename `allow.cartesian` to `allow.i.dups` (#914), then the error should occur irrespective of the number of rows, depending only on whether `i` has duplicates in its key columns.
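As a minimal illustration of what a duplicates-only check could look like, the sketch below uses base R's `anyDuplicated()` on the key values of `i`. The helper name `i_has_key_dups` is hypothetical and is not part of data.table; this only shows the idea of a condition that ignores row counts.

```r
# Hypothetical helper: TRUE when the key values of i contain duplicates.
# anyDuplicated() returns the index of the first duplicate, or 0 if none.
i_has_key_dups <- function(i_key) {
  anyDuplicated(i_key) > 0L
}

# Key of x from the example (used as i in y[x]): 1 1 2 2
print(i_has_key_dups(rep(1:2, each = 2)))

# A key without duplicates, for contrast.
print(i_has_key_dups(1:4))
```

With a rule like this, both `y[x]` and `y[x, nomatch=0L]` would be treated the same way, since the outcome depends only on the key of `i` and not on `nomatch` or the resulting number of rows.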