Description
I encountered a case where merge is not symmetric, with both version 1.9.2 and 1.9.3. I have two data.tables: Filter (read from a file with fread) and KOut (constructed by several data.table operations, including merge and setcolorder).
`tables()` gives:
NAME | NROW | MB | COLS | KEY |
---|---|---|---|---|
KOut | 3,741,559 | 143 | x1,x2,x3,x4,x5,value,y1,y2 | |
Filter | 17,172 | 1 | x1,x2,x3,x4,x5 | |
Merging the tables gives a different number of rows depending on the order of the arguments:
> nrow(merge(KOut, Filter,by=names(Filter)))
[1] 2936586
> nrow(merge(Filter,KOut,by=names(Filter)))
[1] 2555944
For comparison, the same merges on data.frames return the same count in both orders:
> nrow(merge(data.frame(KOut),data.frame(Filter),by=names(Filter)))
[1] 2936586
> nrow(merge(data.frame(Filter),data.frame(KOut),by=names(Filter)))
[1] 2936586
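As a cross-check that bypasses both merge implementations, the expected inner-join row count can be computed directly in base R by counting the (x row, y row) pairs whose key columns are equal. This is a minimal sketch: `key_string`, `count_inner_join_rows` and the toy tables are hypothetical, and it assumes the key values convert to identical strings in both tables, which holds for integer-valued keys like the ones here. Because it counts matching pairs, the result is symmetric in x and y by construction.

```r
# Hypothetical helper: build one string per row from the key columns.
# d[[col]] works for both data.frame and data.table.
key_string <- function(d, by) {
  do.call(paste, c(lapply(by, function(col) d[[col]]), sep = "\r"))
}

# Number of rows an inner join on `by` should return: the number of
# (x row, y row) pairs whose keys are equal. Symmetric in x and y.
count_inner_join_rows <- function(x, y, by) {
  counts_y <- table(key_string(y, by))    # occurrences of each key in y
  matched <- counts_y[key_string(x, by)]  # NA where a key of x is absent from y
  sum(matched, na.rm = TRUE)
}

# Toy data, not the tables from this issue
x_toy <- data.frame(a = c(1, 1, 2), b = c(0, 1, 0))
y_toy <- data.frame(a = c(1, 1, 1, 2, 3), b = c(0, 0, 1, 5, 0), v = 1:5)
count_inner_join_rows(x_toy, y_toy, c("a", "b"))  # 3
count_inner_join_rows(y_toy, x_toy, c("a", "b"))  # 3
```

On the real data, `count_inner_join_rows(KOut, Filter, names(Filter))` should agree with whichever of the two merge results is correct.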
I suspected a bug in `merge(Filter, KOut, by=names(Filter))`, so I followed the code of merge down to the essential statement:
y[xkey,nomatch=ifelse(all.x,NA,0),allow.cartesian=allow.cartesian]
# same as y[xkey,nomatch=0]
At this point, `tables()` gives:
NAME | NROW | MB | COLS | KEY |
---|---|---|---|---|
xkey | 17,172 | 1 | x1,x2,x3,x4,x5 | x1,x2,x3,x4,x5 |
y | 3,741,559 | 143 | x1,x2,x3,x4,x5,value,y1,y2 | x1,x2,x3,x4,x5 |
Here are the results of some joins:
y[xkey, nomatch=0]
x1 x2 x3 x4 x5 value y1 y2
1: 1 1 1 1 0 1.20693421 57 1
2: 1 1 1 1 0 -0.36395694 57 2
3: 1 1 1 1 0 -1.91636684 57 3
4: 1 1 1 1 0 -0.38118758 57 4
5: 1 1 1 1 0 0.84860626 57 5
---
2555940: 3 1 1 21 2 0.49530287 11697 4400
2555941: 3 1 1 21 2 -2.03795092 11697 4401
2555942: 3 1 1 21 2 1.28866177 11697 4402
2555943: 3 1 1 21 2 -2.02472550 11697 4403
2555944: 3 1 1 21 2 0.01210244 11697 4404
xkey[y, nomatch=0]
x1 x2 x3 x4 x5 value y1 y2
1: 1 0 1 1 0 -0.693537811 57 70578
2: 1 0 1 1 0 0.585084541 57 70579
3: 1 0 1 1 0 0.384647254 57 70580
4: 1 0 1 1 0 -1.011123900 57 70581
5: 1 0 1 1 0 -0.008338746 57 70582
---
2936582: 3 1 1 21 2 0.495302870 11697 4400
2936583: 3 1 1 21 2 -2.037950918 11697 4401
2936584: 3 1 1 21 2 1.288661770 11697 4402
2936585: 3 1 1 21 2 -2.024725499 11697 4403
2936586: 3 1 1 21 2 0.012102439 11697 4404
y[xkey]
x1 x2 x3 x4 x5 value y1 y2
1: 1 0 1 1 0 NA NA NA
2: 1 0 1 1 10 NA NA NA
3: 1 0 1 1 20 NA NA NA
4: 1 0 1 1 30 NA NA NA
5: 1 0 1 1 40 NA NA NA
---
2573000: 3 1 3 21 56 NA NA NA
2573001: 3 1 3 21 57 NA NA NA
2573002: 3 1 3 21 58 NA NA NA
2573003: 3 1 3 21 59 NA NA NA
2573004: 3 1 3 21 60 NA NA NA
Note the first line of `y[xkey]`: it says that the key combination (1,0,1,1,0) in xkey has no match in y. Yet the first line of `xkey[y, nomatch=0]` shows that y does contain such a row!
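To separate a lookup problem from a data problem, the row can also be searched for with a plain vector scan that does not go through the keyed join, and the key column types can be compared between the two tables. This is a minimal base R sketch: `key_classes`, `has_key` and the toy table are hypothetical, and a type mismatch between key columns is only one possibility worth ruling out, not a confirmed cause.

```r
# Hypothetical helpers: inspect key columns with plain vector scans,
# independent of data.table's keyed join.
key_classes <- function(d, by) {
  # Class of each key column, e.g. "integer" vs "numeric"
  vapply(by, function(col) class(d[[col]])[1], character(1))
}

has_key <- function(d, by, values) {
  # TRUE if any row of d equals `values` in every column of `by`
  any(Reduce(`&`, Map(function(col, v) d[[col]] == v, by, values)))
}

# Toy data standing in for y; not the data from this issue
y_toy <- data.frame(x1 = c(1, 1, 3), x2 = c(0L, 1L, 1L), x3 = 1L,
                    x4 = c(1, 1, 21), x5 = c(0, 0, 2), value = c(0.5, -1, 2))
key_cols <- c("x1", "x2", "x3", "x4", "x5")
key_classes(y_toy, key_cols)
has_key(y_toy, key_cols, c(1, 0, 1, 1, 0))  # TRUE
```

On the real tables, comparing `key_classes(y, key_cols)` with `key_classes(xkey, key_cols)` shows whether the key columns share a type, and `has_key(y, key_cols, c(1, 0, 1, 1, 0))` confirms by vector scan whether the row exists.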
Any ideas?