Closes #2644: Use all variables for extract_duplicate_records by default #2651

Open: wants to merge 8 commits into base: main
2 changes: 2 additions & 0 deletions NEWS.md
@@ -4,6 +4,8 @@

## Updates of Existing Functions

- `extract_duplicate_records()` now uses all variables of the input dataset as `by_vars` if the `by_vars` argument is omitted. (#2644)

## Breaking Changes

- The following function arguments are entering the next phase of the [deprecation process](https://pharmaverse.github.io/admiraldev/articles/programming_strategy.html#deprecation): (#2487) (#2595)
12 changes: 9 additions & 3 deletions R/duplicates.R
@@ -39,6 +39,7 @@ get_duplicates_dataset <- function() {
#' @param by_vars Grouping variables
#'
#' Defines groups of records in which to look for duplicates.
#' If omitted, all variables in the input dataset are used to define the groups.
#'
#' `r roxygen_param_by_vars()`
#'
@@ -55,9 +56,14 @@ get_duplicates_dataset <- function() {
#' adsl <- rbind(admiral_adsl[1L, ], admiral_adsl)
#'
#' extract_duplicate_records(adsl, exprs(USUBJID))
extract_duplicate_records <- function(dataset, by_vars) {
assert_expr_list(by_vars)
assert_data_frame(dataset, required_vars = extract_vars(by_vars), check_is_grouped = FALSE)
extract_duplicate_records <- function(dataset, by_vars = NULL) {
if (is.null(by_vars)) {
assert_data_frame(dataset, check_is_grouped = FALSE)
# Convert column names to symbols directly so non-syntactic names are not parsed as code
by_vars <- exprs(!!!rlang::syms(names(dataset)))
} else {
assert_expr_list(by_vars)
assert_data_frame(dataset, required_vars = extract_vars(by_vars), check_is_grouped = FALSE)
}

data_by <- dataset %>%
ungroup() %>%
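When `by_vars` is omitted, the default is built by turning every column name of `dataset` into a grouping expression. The sketch below, which assumes only that the rlang package is available, compares two ways of doing that conversion: parsing each name as R code and converting each name straight to a symbol. The column name `"Visit Date"` is a hypothetical example and does not come from this PR.

```r
library(rlang)

col_names <- c("USUBJID", "COUNTRY", "AAGE")

# Parsing treats each name as R source code, then splices the results
by_parsed <- exprs(!!!parse_exprs(col_names))

# syms() converts each string directly into a symbol, with no parsing step
by_syms <- syms(col_names)

# A hypothetical non-syntactic column name (illustration only)
odd_name <- "Visit Date"

# Direct conversion accepts any string
odd_sym <- sym(odd_name)

# Parsing the same string as code fails, so capture the condition
parse_result <- tryCatch(parse_expr(odd_name), error = function(e) e)
inherits(parse_result, "error")
```

For syntactic names such as the ADaM variables used in the tests, both approaches yield symbols. The difference only matters for datasets whose column names are not valid R identifiers.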
3 changes: 2 additions & 1 deletion man/extract_duplicate_records.Rd

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion tests/testthat/_snaps/duplicates.md
@@ -1,4 +1,4 @@
# signal_duplicate_records Test 2: dataset of duplicate records can be accessed using `get_duplicates_dataset()`
# signal_duplicate_records Test 3: dataset of duplicate records can be accessed using `get_duplicates_dataset()`

Code
get_duplicates_dataset()
24 changes: 22 additions & 2 deletions tests/testthat/test-duplicates.R
@@ -18,9 +18,29 @@ test_that("extract_duplicate_records Test 1: duplicate records are extracted", {
)
})

## Test 2: duplicate records for all variables ----
test_that("extract_duplicate_records Test 2: duplicate records for all variables", {
input <- tibble::tribble(
~USUBJID, ~COUNTRY, ~AAGE,
"P01", "GER", 22,
"P01", "JPN", 34,
"P02", "CZE", 41,
"P03", "AUS", 39,
"P04", "BRA", 21,
"P04", "BRA", 21
)
expected_output <- input[5:6, ]

expect_equal(
extract_duplicate_records(input),
expected_output
)
})


# signal_duplicate_records ----
## Test 2: dataset of duplicate records can be accessed using `get_duplicates_dataset()` ----
test_that("signal_duplicate_records Test 2: dataset of duplicate records can be accessed using `get_duplicates_dataset()`", { # nolint
## Test 3: dataset of duplicate records can be accessed using `get_duplicates_dataset()` ----
test_that("signal_duplicate_records Test 3: dataset of duplicate records can be accessed using `get_duplicates_dataset()`", { # nolint
input <- tibble::tribble(
~USUBJID, ~COUNTRY, ~AAGE,
"P01", "GER", 22,
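For intuition about what Test 2 of `extract_duplicate_records` checks, here is a minimal base R sketch of finding records that are duplicated across all variables. The helper `find_full_duplicates()` is hypothetical and written for illustration only; it is not how admiral implements the function, and it uses a plain `data.frame` instead of a tibble to stay dependency-free.

```r
# Hypothetical helper for illustration only; not admiral's implementation.
find_full_duplicates <- function(dataset) {
  # Flag rows that repeat an earlier row or are repeated by a later one,
  # comparing all columns at once
  is_dup <- duplicated(dataset) | duplicated(dataset, fromLast = TRUE)
  dataset[is_dup, , drop = FALSE]
}

# Same records as the test input above
input <- data.frame(
  USUBJID = c("P01", "P01", "P02", "P03", "P04", "P04"),
  COUNTRY = c("GER", "JPN", "CZE", "AUS", "BRA", "BRA"),
  AAGE = c(22, 34, 41, 39, 21, 21)
)

dups <- find_full_duplicates(input)
print(dups)
```

Only the two `P04` records match on every column, so they are the only rows kept. The two `P01` records share `USUBJID` but differ in `COUNTRY` and `AAGE`, so they count as duplicates only when `by_vars` is restricted to `USUBJID`.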