-
-
Notifications
You must be signed in to change notification settings - Fork 7
ard_compare draft #437 #527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…overlapping key values, enabling mismatches to surface through the full join comparison.
ddsjoberg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @malanbos for this!
For now, let's skip env handling. I think it's more complex than we need. We can revisit in the future. Let me know if you'd like to discuss
| #' | ||
| #' ard_compare(ard_base, ard_modified)$stat | ||
| #' | ||
| ard_compare <- function(x, y, key_columns = NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of our functions that begin with ard_*() create new ARDs. Let's name the function compare_ard().
| #' | ||
| #' ard_compare(ard_base, ard_modified)$stat | ||
| #' | ||
| ard_compare <- function(x, y, key_columns = NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make the default value keys = c(all_ard_groups(), all_ard_variables(), any_of(c("variable", "variable_level", "stat_name"))).
| #' | ||
| #' ard_compare(ard_base, ard_modified)$stat | ||
| #' | ||
| ard_compare <- function(x, y, key_columns = NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add an argument of the columns to compare, compare = any_of(c("stat_label", "stat", "stat_fmt")).
(Is there anything else we should compare by default?)
| check_class(x, cls = "card") | ||
| check_class(y, cls = "card") | ||
|
|
||
| .validate_environment_metadata(x, y, call = get_cli_abort_call()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove the env checking for now. It's quite complicated.
|
|
||
| .validate_environment_metadata(x, y, call = get_cli_abort_call()) | ||
|
|
||
| primary_x <- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we can evaluate the keys and compare columns with
keys <- .process_keys_arg(x, y, keys = {{ keys }})
compare <- .process_compare_arg(x, y, compare = {{ compare }})
# outside the function we define these functions
.process_keys_arg <- function(x, y, keys) {
keys <- intersect(cards_select({{ keys }}, data = x), cards_select({{ keys }}, data = y))
.check_not_empty(keys)
cli::cli_inform("The comparison {.arg keys} are {.emph {.val {keys}}}.")
keys
}
.process_compare_arg <- function(x, y, compare) {
# add checks and return evaluated compare vector...
}
.check_not_empty <- function(x, arg_name = rlang::caller_arg(x)) {
if (rlang::is_empty()) {
cli::cli_abort("The {.arg {arg_name}} argument cannot be empty.")
}
invisible(x)
}| fmt_column <- if ("fmt_fun" %in% names(x) || "fmt_fun" %in% names(y)) { | ||
| "fmt_fun" | ||
| } else if ("fmt_fn" %in% names(x) || "fmt_fn" %in% names(y)) { | ||
| "fmt_fn" | ||
| } else { | ||
| "fmt_fun" | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just use the columns provided in the compare argument to assess which comparisons to make. We can compare all columns in the same way.
| y_selected <- .ensure_column(y_selected, column) | ||
| } | ||
|
|
||
| # .check_rows_not_in_x_y(x_selected, y_selected, key_columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we can initialize an empty list of results.
results <- rlang::rep_named(c("rows_in_x_not_y", "rows_in_y_not_x"), list(NULL))
results[["compare"]] <- rlang::rep_named(compare, list(NULL))In this example the "compare" element will also be a named list. The names are the columns that we compare.
We could then follow this up with calls to functions that will populate these parts of the list, e.g.
results[["rows_in_x_not_y"]] <- .compare_rows(x, y) # returns the results of the anti join of x and y on the key columns
results[["rows_in_y_not_x"]] <- .compare_rows(y, x) # same as above, but reversed
results[["compare"]] <- .compare_columns(x, y, compare) # loop through the columns we will compare and return a named list of data frames where each data frame contains the rows that are not equal between x and y. The data frame will have the key columns and the two columns compared (from x and y).|
|
||
| names(mismatch_list) <- names(comparison_targets) | ||
|
|
||
| mismatch_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lastly, the function will return the results object, and add a class onto this list.
After we get this settled, we will write a print method for class to make it nice.
What changes are proposed in this pull request?
New function to compare two ARDs: ard_compare
Fixes#437 , @malanbos
Comes with top-level ard_compare function, as well as a script with ard_compare_helpers.R, and check_environment.R to modularize it.
Pre-review Checklist (if item does not apply, mark is as complete)
usethis::pr_merge_main()devtools::test_coverage()Reviewer Checklist (if item does not apply, mark is as complete)
pkgdown::build_site(). Check the R console for errors, and review the rendered website.devtools::test_coverage()When the branch is ready to be merged:
NEWS.mdwith the changes from this pull request under the heading "# cards (development version)". If there is an issue associated with the pull request, reference it in parentheses at the end update (seeNEWS.mdfor examples).Optional Reverse Dependency Checks:
Install
checkedwithpak::pak("Genentech/checked")orpak::pak("checked")