-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Join without merge #240
Join without merge #240
Conversation
Conflicts: R/step-colorder.R tests/testthat/test-step-join.R
R/step-colorder.R
Outdated
|
||
step_call(parent, | ||
if (is.integer(col_order)) { | ||
if (all(col_order == seq_along(col_order))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that should use identical()
since ==
is vectorised?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe move the check until after handling the different types? i.e. if (identical(vars, x$vars)) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check doesn't work like this as the data.table can have duplicate names. And this actually happens in merge()
:
library(data.table)
merge(
data.table(x = 1, y = 2),
data.table(x = 1, y = 3, y.x = 4),
by = "x"
)
#> Warning in merge.data.table(data.table(x = 1, y = 2), data.table(x = 1, : column
#> names 'y.x' are duplicated in the result
#> x y.x y.y y.x
#> 1: 1 2 3 4
Created on 2021-05-11 by the reprex package (v2.0.0)
And this then also happens in the new implementation of full_join()
Co-authored-by: Hadley Wickham <h.wickham@gmail.com>
Co-authored-by: Hadley Wickham <h.wickham@gmail.com>
Co-authored-by: Hadley Wickham <h.wickham@gmail.com>
switch(x$style, | ||
full = call2("merge", lhs, rhs, all = TRUE, by.x = x$on$x, by.y = x$on$y, allow.cartesian = TRUE), | ||
left = call2("[", lhs, rhs, on = on, allow.cartesian = TRUE), | ||
inner = call2("[", lhs, rhs, on = on, nomatch = NULL), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mgirlich does this deliberately not include allow.cartesian = TRUE
? Or was it omitted by accident?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this was by accident.
The main goal was to ensure correct handling of a named
by
in the joins and to ensure correct column names also in edge cases. Now,left/right/inner_join()
always use[.data.table()
. Unfortunately, this makes the translation ofleft_join()
longer as it now usually requires calls to both,setcolorder()
andsetnames()
.Could make use of
step_setnames()
added in PR #241.Relates to #239. Fixes #243 and fixes #222 and fixes #198.