     if (any(duplicated(names(x)))) stop("x has some duplicated column name(s): ",paste(names(x)[duplicated(names(x))],collapse=","),". Please remove or rename the duplicate(s) and try again.")
     if (any(duplicated(names(y)))) stop("y has some duplicated column name(s): ",paste(names(y)[duplicated(names(y))],collapse=","),". Please remove or rename the duplicate(s) and try again.")

-    ## Determine by and rename columns of y, if by.x and by.y are supplied
-    if (!is.null(by.x)){
-        if (!is.null(by)) warning("Supplied both by and by.x and only by will be used")
-        else {
-            by <- by.x
-            if (length(by.x) != length(by.y)) stop("by.x and by.y must be of the same length")
-            setnames(y, by.y, by.x)
-            on.exit(setnames(y, by.x, by.y))
-        }
+    ## set up 'by'/'by.x'/'by.y'
+    if ( (!is.null(by.x) || !is.null(by.y)) && length(by.x)!=length(by.y) )
+        stop("`by.x` and `by.y` must be of same length.")
+    if (!missing(by) && !missing(by.x))
+        warning("Supplied both `by` and `by.x/by.y`. `by` argument will be ignored.")
+    if (!is.null(by.x)) {
+        if ( !is.character(by.x) || !is.character(by.y))
+            stop("A non-empty vector of column names are required for `by.x` and `by.y`.")
+        if (!all(by.x %in% names(x)))
+            stop("Elements listed in `by.x` must be valid column names in x.")
+        if (!all(by.y %in% names(y)))
+            stop("Elements listed in `by.y` must be valid column names in y.")
+        by = by.x
+        names(by) = by.y
+    } else {
+        if (is.null(by))
+            by = intersect(key(x), key(y))
+        if (is.null(by))
+            by = key(x)
+        if (is.null(by))
+            stop("Can not match keys in x and y to automatically determine appropriate `by` parameter. Please set `by` value explicitly.")
+        if (length(by) == 0L || !is.character(by))
+            stop("A non-empty vector of column names for `by` is required.")
+        if (!all(by %in% intersect(colnames(x), colnames(y))))
+            stop("Elements listed in `by` must be valid column names in x and y")
+        by.x = by.y = by
     }
-
-    ## Try to infer proper value for `by`
-    if (is.null(by)) {
-        by <- intersect(key(x), key(y))
-    }
-    if (is.null(by)) {
-        by <- key(x)
-    }
-    if (is.null(by)) {
-        stop("Can not match keys in x and y to automatically determine ",
-             "appropriate `by` parameter. Please set `by` value explicitly.")
-    }
-    if (length(by) == 0L || !is.character(by)) {
-        stop("A non-empty vector of column names for `by` is required.")
-    }
-    if (!all(by %in% intersect(colnames(x), colnames(y)))) {
-        stop("Elements listed in `by` must be valid column names in x and y")
-    }
-
-    ## Checks to see that keys on dt are set and are in correct order
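The resolution order in the added lines of this hunk (explicit `by.x`/`by.y` first, then an explicit `by`, then the shared key columns, then the key of `x`) can be sketched in base R. This is only an illustration: `resolve_by` is a hypothetical helper, not part of data.table, and it assumes the column names and keys are passed in as plain character vectors. It also tests `is.null(by)` where the hunk uses `missing(by)`, because a standalone helper has no enclosing `merge` call.

```r
# Hypothetical helper (not part of data.table): mirrors the checks added in
# this hunk, taking column names and keys as plain character vectors.
resolve_by <- function(names_x, names_y, key_x = NULL, key_y = NULL,
                       by = NULL, by.x = NULL, by.y = NULL) {
  if ((!is.null(by.x) || !is.null(by.y)) && length(by.x) != length(by.y))
    stop("`by.x` and `by.y` must be of same length.")
  # Assumption: is.null() stands in for missing(), which the hunk uses
  if (!is.null(by) && !is.null(by.x))
    warning("Supplied both `by` and `by.x/by.y`. `by` argument will be ignored.")
  if (!is.null(by.x)) {
    if (!is.character(by.x) || !is.character(by.y))
      stop("A non-empty vector of column names are required for `by.x` and `by.y`.")
    if (!all(by.x %in% names_x))
      stop("Elements listed in `by.x` must be valid column names in x.")
    if (!all(by.y %in% names_y))
      stop("Elements listed in `by.y` must be valid column names in y.")
    by <- by.x
    names(by) <- by.y            # named vector: names are y's columns
  } else {
    if (is.null(by)) by <- intersect(key_x, key_y)   # keys shared by x and y
    if (is.null(by)) by <- key_x                     # fall back to key of x
    if (is.null(by))
      stop("Can not match keys in x and y to automatically determine appropriate `by` parameter.")
    if (length(by) == 0L || !is.character(by))
      stop("A non-empty vector of column names for `by` is required.")
    if (!all(by %in% intersect(names_x, names_y)))
      stop("Elements listed in `by` must be valid column names in x and y")
    by.x <- by.y <- by
  }
  list(by = by, by.x = by.x, by.y = by.y)
}

# Differently named join columns
r1 <- resolve_by(c("id", "v"), c("key", "w"), by.x = "id", by.y = "key")
# No by arguments at all: inferred from the keys
r2 <- resolve_by(c("id", "v"), c("id", "w"), key_x = "id", key_y = "id")
# by.x without by.y fails the length check
r3 <- try(resolve_by("id", "k", by.x = "id"), silent = TRUE)
```

With `by.x`/`by.y`, the returned `by` is a named vector whose values are columns of `x` and whose names are columns of `y`, so in this sketch no renaming of `y`'s columns is needed, unlike the removed `setnames(y, by.y, by.x)` approach.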
README.md
6 additions & 6 deletions
@@ -64,16 +64,16 @@

   21. `setDF` also converts `list` of equal length to `data.frame` by reference now. Closes [#1132](https://github.com/Rdatatable/data.table/issues/1132).

-  22. `merge.data.table` now has new arguments `by.x` and `by.y`. Closes [#637](https://github.com/Rdatatable/data.table/issues/637). Thanks to @NelloBlaser.
+  22. `CJ` gains logical `unique` argument with default `FALSE`. If `TRUE`, unique values of vectors are automatically computed and used. This is convenient, for example, `DT[CJ(a, b, c, unique=TRUE)]` instead of doing `DT[CJ(unique(a), unique(b), unique(c))]`. Ultimately, `unique = TRUE` will be default. Closes [#1148](https://github.com/Rdatatable/data.table/issues/1148).

-  23. `CJ` gains logical `unique` argument with default `FALSE`. If `TRUE`, unique values of vectors are automatically computed and used. This is convenient, for example, `DT[CJ(a, b, c, unique=TRUE)]` instead of doing `DT[CJ(unique(a), unique(b), unique(c))]`. Ultimately, `unique = TRUE` will be default. Closes [#1148](https://github.com/Rdatatable/data.table/issues/1148).
+  23. Implemented `stringsAsFactors` argument for `fread()`. When `TRUE`, character columns are converted to factors. Default is `FALSE`. Thanks to Artem Klevtsov for filing [#501](https://github.com/Rdatatable/data.table/issues/501), and to @hmi2015 for [this SO post](http://stackoverflow.com/q/31350209/559784).

-  24. Implemented `stringsAsFactors` argument for `fread()`. When `TRUE`, character columns are converted to factors. Default is `FALSE`. Thanks to Artem Klevtsov for filing [#501](https://github.com/Rdatatable/data.table/issues/501), and to @hmi2015 for [this SO post](http://stackoverflow.com/q/31350209/559784).
+  24. `fread` gains `check.names` argument, with default value `FALSE`. When `TRUE`, it uses the base function `make.unique()` to ensure that the column names of the data.table read in are all unique. Thanks to David Arenburg for filing [#1027](https://github.com/Rdatatable/data.table/issues/1027).

-  25. `fread` gains `check.names` argument, with default value `FALSE`. When `TRUE`, it uses the base function `make.unique()` to ensure that the column names of the data.table read in are all unique. Thanks to David Arenburg for filing [#1027](https://github.com/Rdatatable/data.table/issues/1027).
+  25. data.tables can join now without having to set keys by using the new `on` argument. For example: `DT1[DT2, on=c(x = "y")]` would join column 'y' of `DT2` with 'x' of `DT1`. `DT1[DT2, on="y"]` would join on column 'y' on both data.tables. Closes [#1130](https://github.com/Rdatatable/data.table/issues/1130) partly.
+
+  22. `merge.data.table` gains arguments `by.x` and `by.y`. Closes [#637](https://github.com/Rdatatable/data.table/issues/637) and [#1130](https://github.com/Rdatatable/data.table/issues/1130). No copies are made even when the specified columns aren't key columns in the data.tables, which makes merging much faster and more memory efficient. Thanks to @blasern for the initial PRs.

-  26. data.tables can join now without having to set keys by using the new `on` argument. For example: `DT1[DT2, on=c(x = "y")]` would join column 'y' of `DT2` with 'x' of `DT1`. `DT1[DT2, on="y"]` would join on column 'y' on both data.tables. Closes [#1130](https://github.com/Rdatatable/data.table/issues/1130) partly.
-
 #### BUG FIXES

   1. `if (TRUE) DT[,LHS:=RHS]` no longer prints, [#869](https://github.com/Rdatatable/data.table/issues/869) and [#1122](https://github.com/Rdatatable/data.table/issues/1122). Tests added. To get this to work we've had to live with one downside: if a `:=` is used inside a function with no `DT[]` before the end of the function, then the next time `DT` or `print(DT)` is typed at the prompt, nothing will be printed. A repeated `DT` or `print(DT)` will print. To avoid this: include a `DT[]` after the last `:=` in your function. If that is not possible (e.g., it's not a function you can change) then `DT[]` at the prompt is guaranteed to print. As before, adding an extra `[]` on the end of a `:=` query is a recommended idiom to update and then print; e.g. `> DT[,foo:=3L][]`. Thanks to Jureiss and Jan Gorecki for reporting.
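As a rough illustration of the NEWS items in this hunk, the snippet below exercises the `on` argument, `merge` with `by.x`/`by.y`, and `CJ(..., unique=TRUE)`. It assumes a data.table version of at least 1.9.6 is installed. The tables `DT1` and `DT2` and their contents are made up for the example.

```r
library(data.table)

# Made-up example tables; neither has a key set
DT1 <- data.table(x = c(1L, 2L, 3L), v1 = c("a", "b", "c"))
DT2 <- data.table(y = c(2L, 3L, 4L), v2 = c("B", "C", "D"))

# Join column 'y' of DT2 with column 'x' of DT1 without setting keys.
# One result row per row of DT2; unmatched rows get NA in DT1's columns.
j <- DT1[DT2, on = c(x = "y")]

# The same column pairing through merge(), which defaults to an inner join
m <- merge(DT1, DT2, by.x = "x", by.y = "y")

# CJ with unique=TRUE de-duplicates each input vector before crossing
cj <- CJ(c(1L, 1L, 2L), c("a", "a"), unique = TRUE)
```

Here `j` keeps all three rows of `DT2`, while `m` keeps only the two rows whose join values appear in both tables.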
 In versions \code{< v1.9.6}, if the columns specified in \code{by} were not the key (or head of the key) of \code{x} or \code{y}, then a \code{\link{copy}} was first rekeyed prior to performing the merge. This was slower and less memory efficient.
+
+In version \code{v1.9.4}, secondary keys were implemented. In \code{v1.9.6}, the concept of secondary keys has been
+extended to \code{merge}. No deep copies are made anymore, which makes merging faster and more memory efficient. There is also better control over which columns to merge on, through the newly implemented \code{by.x} and \code{by.y} arguments.
+
+For a more \code{data.table}-centric way of merging two \code{data.table}s, see \code{\link{[.data.table}}; e.g., \code{x[y, ...]}. See FAQ 1.12 for a detailed comparison of \code{merge} and \code{x[y, ...]}.
+
+Merges on numeric columns: Columns of numeric types (i.e., double) have their last two bytes rounded off while computing order, by default, to avoid any unexpected behaviour due to limitations in representing floating point numbers precisely. For large numbers (integers > 2^31), we recommend using \code{bit64::integer64}. Have a look at \code{\link{setNumericRounding}} to learn more.
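The floating point limitation mentioned in the last paragraph can be seen in base R alone. The sketch below is not data.table's rounding mechanism. It only shows why two doubles that "should" be equal may differ in their trailing bits, which is the situation the rounding described above is meant to absorb.

```r
a <- 0.1 + 0.2
b <- 0.3

exact_equal <- (a == b)                  # FALSE: the doubles differ in their last bits
close_equal <- isTRUE(all.equal(a, b))   # TRUE: equal within a small tolerance

# Print both values with 17 significant digits to make the difference visible
print(sprintf("%.17g", c(a, b)))
```

A join key built from computed doubles can therefore fail to match under exact comparison even though the values print identically at default precision.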