merge/inner join not symmetric #743

@arcosdium

I encountered a case where merge is not symmetric, using version 1.9.2 as well as 1.9.3. I have two data.tables: Filter (read from a file with fread) and KOut (constructed by several data.table operations, including merge and setcolorder).

tables() gives

NAME        NROW  MB COLS                       KEY
KOut   3,741,559 143 x1,x2,x3,x4,x5,value,y1,y2
Filter    17,172   1 x1,x2,x3,x4,x5

Merging the tables results in a different number of rows, depending on the order of the arguments:

> nrow(merge(KOut, Filter,by=names(Filter)))
[1] 2936586
> nrow(merge(Filter,KOut,by=names(Filter)))
[1] 2555944

For comparison, the same merges as data.frames:

> nrow(merge(data.frame(KOut),data.frame(Filter),by=names(Filter)))
[1] 2936586
> nrow(merge(data.frame(Filter),data.frame(KOut),by=names(Filter)))
[1] 2936586
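
To narrow down which key combinations account for the difference, one option is to count rows per key in both merge results and compare the counts. The following is only a sketch on made-up toy tables (A, B and the column k are hypothetical stand-ins for KOut, Filter and x1..x5), not the real data:

```r
library(data.table)

# Hypothetical toy tables standing in for KOut and Filter.
A <- data.table(k = c(1L, 1L, 2L, 3L), v = 1:4)
B <- data.table(k = c(1L, 2L, 4L))

# Inner merge in both argument orders.
ab <- merge(A, B, by = "k")
ba <- merge(B, A, by = "k")

# Number of result rows per key combination in each merge.
cnt_ab <- ab[, .N, by = k]
cnt_ba <- ba[, .N, by = k]

# Full outer merge of the counts; keys present in only one result get NA.
cmp <- merge(cnt_ab, cnt_ba, by = "k", all = TRUE, suffixes = c(".ab", ".ba"))

# Key combinations whose row counts differ between the two merges.
mismatch <- cmp[is.na(N.ab) | is.na(N.ba) | N.ab != N.ba]
print(mismatch)
```

On a correct inner join mismatch is empty. On the real tables, by = names(Filter) would take the place of by = "k", and any rows left in mismatch would point at the keys that one of the two merges drops.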

I suspected a bug in merge(Filter,KOut,by=names(Filter)), so I followed the code of merge down to the essential statement:

y[xkey,nomatch=ifelse(all.x,NA,0),allow.cartesian=allow.cartesian]   
# same as y[xkey,nomatch=0]

Here tables() gives

NAME      NROW  MB COLS                       KEY
xkey    17,172   1 x1,x2,x3,x4,x5             x1,x2,x3,x4,x5
y    3,741,559 143 x1,x2,x3,x4,x5,value,y1,y2 x1,x2,x3,x4,x5

Some joins are:

y[xkey, nomatch=0]
x1 x2 x3 x4 x5 value y1 y2
1: 1 1 1 1 0 1.20693421 57 1
2: 1 1 1 1 0 -0.36395694 57 2
3: 1 1 1 1 0 -1.91636684 57 3
4: 1 1 1 1 0 -0.38118758 57 4
5: 1 1 1 1 0 0.84860626 57 5
---
2555940: 3 1 1 21 2 0.49530287 11697 4400
2555941: 3 1 1 21 2 -2.03795092 11697 4401
2555942: 3 1 1 21 2 1.28866177 11697 4402
2555943: 3 1 1 21 2 -2.02472550 11697 4403
2555944: 3 1 1 21 2 0.01210244 11697 4404

xkey[y, nomatch=0]
x1 x2 x3 x4 x5 value y1 y2
1: 1 0 1 1 0 -0.693537811 57 70578
2: 1 0 1 1 0 0.585084541 57 70579
3: 1 0 1 1 0 0.384647254 57 70580
4: 1 0 1 1 0 -1.011123900 57 70581
5: 1 0 1 1 0 -0.008338746 57 70582
---
2936582: 3 1 1 21 2 0.495302870 11697 4400
2936583: 3 1 1 21 2 -2.037950918 11697 4401
2936584: 3 1 1 21 2 1.288661770 11697 4402
2936585: 3 1 1 21 2 -2.024725499 11697 4403
2936586: 3 1 1 21 2 0.012102439 11697 4404

y[xkey]
x1 x2 x3 x4 x5 value y1 y2
1: 1 0 1 1 0 NA NA NA
2: 1 0 1 1 10 NA NA NA
3: 1 0 1 1 20 NA NA NA
4: 1 0 1 1 30 NA NA NA
5: 1 0 1 1 40 NA NA NA
---
2573000: 3 1 3 21 56 NA NA NA
2573001: 3 1 3 21 57 NA NA NA
2573002: 3 1 3 21 58 NA NA NA
2573003: 3 1 3 21 59 NA NA NA
2573004: 3 1 3 21 60 NA NA NA

What is remarkable is the first row of y[xkey]: it says that the key combination (1,0,1,1,0) in xkey has no match in y. But the first row of xkey[y, nomatch=0] shows that y does in fact contain such a row!
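
One thing that could be checked next is whether both tables really are stored in the order their key claims, since a keyed join relies on that order. This is only a guess at a possible cause, not a confirmed diagnosis. The sketch below uses small made-up tables, and is_sorted_by_key is a hypothetical helper, not part of data.table:

```r
library(data.table)

# Hypothetical toy tables standing in for y and xkey.
y <- data.table(x1 = c(1L, 1L, 3L), x2 = c(0L, 1L, 1L), value = c(0.1, 0.2, 0.3))
xkey <- data.table(x1 = c(1L, 3L), x2 = c(0L, 1L))
setkeyv(y, c("x1", "x2"))
setkeyv(xkey, c("x1", "x2"))

# An inner join should return the same number of rows in both directions.
n_yx <- nrow(y[xkey, nomatch = 0])
n_xy <- nrow(xkey[y, nomatch = 0])

# Look the key combination up by vector scan, which does not use the key.
hit <- y[x1 == 1L & x2 == 0L]

# Hypothetical helper: TRUE if the rows are physically in key order.
# Uses base order(), which is adequate for numeric key columns.
is_sorted_by_key <- function(dt) {
  cols <- key(dt)
  o <- do.call(order, unname(as.list(dt[, cols, with = FALSE])))
  identical(o, seq_len(nrow(dt)))
}

# A table that only claims to be keyed: the attribute is set by hand
# without sorting, purely to illustrate what the helper detects.
bad <- data.table(x1 = c(3L, 1L), x2 = c(1L, 0L))
setattr(bad, "sorted", c("x1", "x2"))

print(c(y = is_sorted_by_key(y), xkey = is_sorted_by_key(xkey), bad = is_sorted_by_key(bad)))

# Column classes of the key columns, to spot integer vs. double mismatches.
print(sapply(y[, key(y), with = FALSE], class))
```

If the helper returned FALSE for the real y or xkey, re-running setkeyv after clearing the key with setkey(y, NULL) would be a cheap experiment before repeating the merge.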

Any ideas?
