Skip to content

Conversation

@malanbos
Copy link
Collaborator

@malanbos malanbos commented Dec 1, 2025

What changes are proposed in this pull request?

New function to compare two ARDs: ard_compare

Fixes#437 , @malanbos

Comes with top-level ard_compare function, as well as a script with ard_compare_helpers.R, and check_environment.R to modularize it.


Pre-review Checklist (if item does not apply, mark is as complete)

  • All GitHub Action workflows pass with a ✅
  • PR branch has pulled the most recent updates from master branch: usethis::pr_merge_main()
  • If a bug was fixed, a unit test was added.
  • Code coverage is suitable for any new functions/features (generally, 100% coverage for new code): devtools::test_coverage()
  • Request a reviewer

Reviewer Checklist (if item does not apply, mark is as complete)

  • If a bug was fixed, a unit test was added.
  • Run pkgdown::build_site(). Check the R console for errors, and review the rendered website.
  • Code coverage is suitable for any new functions/features: devtools::test_coverage()

When the branch is ready to be merged:

  • Update NEWS.md with the changes from this pull request under the heading "# cards (development version)". If there is an issue associated with the pull request, reference it in parentheses at the end update (see NEWS.md for examples).
  • All GitHub Action workflows pass with a ✅
  • Approve Pull Request
  • Merge the PR. Please use "Squash and merge" or "Rebase and merge".

Optional Reverse Dependency Checks:

Install checked with pak::pak("Genentech/checked") or pak::pak("checked")

# Check dev versions of `cardx`, `gtsummary`, and `tfrmt` which are in the `ddsjoberg` R Universe
Rscript -e "options(checked.check_envvars = c(NOT_CRAN = TRUE)); checked::check_rev_deps(path = '.', n = parallel::detectCores() - 2L, repos = c('https://ddsjoberg.r-universe.dev', 'https://cloud.r-project.org'))"

# Check CRAN reverse dependencies but run tests skipped on CRAN
Rscript -e "options(checked.check_envvars = c(NOT_CRAN = TRUE)); checked::check_rev_deps(path = '.', n = parallel::detectCores() - 2, repos = 'https://cloud.r-project.org')"

# Check CRAN reverse dependencies in a CRAN-like environment
Rscript -e "options(checked.check_envvars = c(NOT_CRAN = FALSE), checked.check_build_args = '--as-cran'); checked::check_rev_deps(path = '.', n = parallel::detectCores() - 2, repos = 'https://cloud.r-project.org')"

@malanbos malanbos requested a review from ddsjoberg December 1, 2025 15:24
Copy link
Collaborator

@ddsjoberg ddsjoberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @malanbos for this!

For now, let's skip env handling. I think it's more complex than we need. We can revisit in the future. Let me know if you'd like to discuss

#'
#' ard_compare(ard_base, ard_modified)$stat
#'
ard_compare <- function(x, y, key_columns = NULL) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of our functions that begin with ard_*() create new ARDs. Let's name the function compare_ard().

#'
#' ard_compare(ard_base, ard_modified)$stat
#'
ard_compare <- function(x, y, key_columns = NULL) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make the default value keys = c(all_ard_groups(), all_ard_variables(), any_of(c("variable", "variable_level", "stat_name"))).

#'
#' ard_compare(ard_base, ard_modified)$stat
#'
ard_compare <- function(x, y, key_columns = NULL) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add an argument of the columns to compare, compare = any_of(c("stat_label", "stat", "stat_fmt")).

(Is there anything else we should compare by default?)

check_class(x, cls = "card")
check_class(y, cls = "card")

.validate_environment_metadata(x, y, call = get_cli_abort_call())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove the env checking for now. It's quite complicated.


.validate_environment_metadata(x, y, call = get_cli_abort_call())

primary_x <-
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we can evaluate the keys and compare columns with

keys <- .process_keys_arg(x, y, keys = {{ keys }})
compare <- .process_compare_arg(x, y, compare = {{ compare }})



# outside the function we define these functions

.process_keys_arg <- function(x, y, keys) {
  keys <- intersect(cards_select({{ keys }}, data = x), cards_select({{ keys }}, data = y))
  .check_not_empty(keys)
  cli::cli_inform("The comparison {.arg keys} are {.emph {.val {keys}}}.")
  keys
}

.process_compare_arg <- function(x, y, compare) {
  # add checks and return evaluated compare vector...
}

.check_not_empty <- function(x, arg_name = rlang::caller_arg(x)) {
  if (rlang::is_empty()) {
     cli::cli_abort("The {.arg {arg_name}} argument cannot be empty.")
  }
  invisible(x)
}

Comment on lines +140 to +146
fmt_column <- if ("fmt_fun" %in% names(x) || "fmt_fun" %in% names(y)) {
"fmt_fun"
} else if ("fmt_fn" %in% names(x) || "fmt_fn" %in% names(y)) {
"fmt_fn"
} else {
"fmt_fun"
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just use the columns provided in the compare argument to assess which comparisons to make. We can compare all columns in the same way.

y_selected <- .ensure_column(y_selected, column)
}

# .check_rows_not_in_x_y(x_selected, y_selected, key_columns)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we can initialize an empty list of results.

results <- rlang::rep_named(c("rows_in_x_not_y", "rows_in_y_not_x"), list(NULL))
results[["compare"]] <- rlang::rep_named(compare, list(NULL))

In this example the "compare" element will also be a named list. The names are the columns that we compare.

We could then follow this up with calls to functions that will populate these parts of the list, e.g.

results[["rows_in_x_not_y"]] <- .compare_rows(x, y) # returns the results of the anti join of x and y on the key columns
results[["rows_in_y_not_x"]] <- .compare_rows(y, x) # same as above, but reversed
results[["compare"]] <- .compare_columns(x, y, compare) # loop through the columns we will compare and return a named list of data frames where each data frame contains the rows that are not equal between x and y. The data frame will have the key columns and the two columns compared (from x and y).


names(mismatch_list) <- names(comparison_targets)

mismatch_list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lastly, the function will return the results object, and add a class onto this list.

After we get this settled, we will write a print method for class to make it nice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants