Commit 27c3bd8

by = .EACHI key fix 4603 (#4917)
1 parent ec1259a commit 27c3bd8

3 files changed: +26 -1 lines changed
NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@

 ## BUG FIXES

+1. `by=.EACHI` when `i` is keyed but `on=` different columns than `i`'s key could create an invalidly keyed result, [#4603](https://github.com/Rdatatable/data.table/issues/4603) [#4911](https://github.com/Rdatatable/data.table/issues/4911). Thanks to @myoung3 and @adamaltmejd for reporting, and @ColeMiller1 for the PR. An invalid key is where a `data.table` is marked as sorted by the key columns but the data is not sorted by those columns, leading to incorrect results from subsequent queries.
+
 ## NOTES

 1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.<type>()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example :

R/data.table.R

Lines changed: 2 additions & 1 deletion
@@ -1385,7 +1385,8 @@ replace_dot_alias = function(e) {
 byval = i
 bynames = if (missing(on)) head(key(x),length(leftcols)) else names(on)
 allbyvars = NULL
-bysameorder = haskey(i) || (is.sorted(f__) && ((roll == FALSE) || length(f__) == 1L)) # Fix for #1010
+bysameorder = (haskey(i) && identical(leftcols, chmatch(head(key(i),length(leftcols)), names(i)))) || # leftcols leading subset of key(i); see #4917
+(roll==FALSE && is.sorted(f__)) # roll==FALSE is fix for #1010
 ## 'av' correct here ?? *** TO DO ***
 xjisvars = intersect(av, names_x[rightcols]) # no "x." for xvars.
 # if 'get' is in 'av' use all cols in 'i', fix for bug #34
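The first half of the new `bysameorder` condition can be read as a "leading subset of the key" check. The following is a minimal sketch of that check in base R, not the code from the commit: the helper name `leftcols_lead_key` and its arguments are hypothetical, base `match()` stands in for `chmatch()`, and it assumes `leftcols` holds integer column positions in `i`, as the comparison against `chmatch(..., names(i))` suggests.

```r
# Hypothetical helper, not part of data.table.
# Returns TRUE when the join columns of i (integer positions in i_names)
# are exactly the leading columns of i's key, in the same order.
leftcols_lead_key = function(i_names, i_key, leftcols) {
  if (is.null(i_key)) return(FALSE)            # plays the role of haskey(i) being FALSE
  lead = head(i_key, length(leftcols))         # first length(leftcols) key columns
  identical(as.integer(leftcols), match(lead, i_names))
}

# Column layout of Y from the minimal test case: columns A, B keyed by (B, A).
i_names = c("A", "B")
i_key   = c("B", "A")

leftcols_lead_key(i_names, i_key, 1L)          # joining on A only: not a leading key column
leftcols_lead_key(i_names, i_key, 2L)          # joining on B only: leading key column
leftcols_lead_key(i_names, i_key, c(2L, 1L))   # joining on B then A: the full key
```

Under the old condition, `haskey(i)` alone was enough to treat the result as sorted, which is why a join on `A` against an `i` keyed by `(B, A)` could be marked as keyed even though its rows were not in that order.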

inst/tests/tests.Rraw

Lines changed: 22 additions & 0 deletions
@@ -17273,3 +17273,25 @@ if (test_bit64) {
 test(2164.3, d[, mean(b, na.rm=TRUE), by=a], data.table(a=INT(1,2), V1=c(2.5, 4)))
 }

+# invalid key when by=.EACHI, haskey(i) but on= non-leading-subset of i's key, #4603 #4911
+X = data.table(id = c(6456372L, 6456372L, 6456372L, 6456372L,6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L),
+id_round = c(197801L, 199405L, 199501L, 197901L, 197905L, 198001L, 198005L, 198101L, 198105L, 198201L, 198205L, 198301L, 198305L, 198401L),
+field = c(NA, NA, NA, "medicine", "medicine", "medicine", "medicine", "medicine", "medicine", "medicine", "medicine", "medicine", "medicine", "medicine"),
+key = "id")
+Y = data.table(id = c(6456372L, 6456345L, 6456356L),
+id_round = c(197705L, 197905L, 201705L),
+field = c("medicine", "teaching", "health"),
+prio = c(6L, 1L, 10L),
+key = c("id_round", "id", "prio", "field" ))
+test(2165.1, X[Y, on = .(id, id_round > id_round, field), .(x.id_round[1], i.id_round[1]), by=.EACHI][id==6456372L],
+data.table(id=6456372L, id_round=197705L, field='medicine', V1=197901L, V2=197705L))
+# Y$id_round happens to be sorted, so in 2165.2 we test Y$field which is not sorted
+test(2165.2, X[Y, on="field", .(x.id_round[1]), by=.EACHI][field=="health"],
+data.table(field="health", V1=NA_integer_))
+# a minimal example too ...
+X = data.table(A=c(4L,2L,3L), B=1:3, key="A")
+Y = data.table(A=2:1, B=2:3, key=c("B","A"))
+test(2165.3, X[Y], data.table(A=2:3, B=2:3, i.A=2:1, key="A")) # keyed
+test(2165.4, X[Y, on=.(A)], data.table(A=2:1, B=c(2L,NA), i.B=2:3)) # no key
+test(2165.5, X[Y, on=.(A), x.B, by=.EACHI], data.table(A=2:1, x.B=c(2L,NA))) # no key
+
