Closes #2644: Use all variables for extract_duplicate_records by default #2651

Open
wants to merge 6 commits into
base: main
2 changes: 2 additions & 0 deletions NEWS.md
@@ -4,6 +4,8 @@

## Updates of Existing Functions

+- The function `extract_duplicate_records()` was updated to consider all variables in the input dataset for the by group if the `by_vars` argument is omitted entirely. (#2644)
+
## Breaking Changes

- The following function arguments are entering the next phase of the [deprecation process](https://pharmaverse.github.io/admiraldev/articles/programming_strategy.html#deprecation): (#2487) (#2595)
12 changes: 9 additions & 3 deletions R/duplicates.R
@@ -39,6 +39,7 @@ get_duplicates_dataset <- function() {
#' @param by_vars Grouping variables
#'
#' Defines groups of records in which to look for duplicates.
+#' If omitted, all variables in the input dataset are used in the by group.
#'
#' `r roxygen_param_by_vars()`
#'
@@ -55,9 +56,14 @@ get_duplicates_dataset <- function() {
#' adsl <- rbind(admiral_adsl[1L, ], admiral_adsl)
#'
#' extract_duplicate_records(adsl, exprs(USUBJID))
-extract_duplicate_records <- function(dataset, by_vars) {
-  assert_expr_list(by_vars)
-  assert_data_frame(dataset, required_vars = extract_vars(by_vars), check_is_grouped = FALSE)
+extract_duplicate_records <- function(dataset, by_vars = NULL) {
+  if (is.null(by_vars)) {
+    assert_data_frame(dataset, check_is_grouped = FALSE)
+    by_vars <- exprs(!!!parse_exprs(names(dataset)))
+  } else {
+    assert_expr_list(by_vars)
+    assert_data_frame(dataset, required_vars = extract_vars(by_vars), check_is_grouped = FALSE)
+  }

  data_by <- dataset %>%
    ungroup() %>%
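One thing worth checking in the new default branch: `parse_exprs()` parses each column name as R code, so a non-syntactic name would fail to parse, whereas `rlang::syms()` converts strings to symbols without parsing. A minimal sketch of the difference, assuming rlang is installed. The column name `my var` is made up for illustration, and whether such names can reach this function in practice is not stated in the PR.

```r
library(rlang)

# Hypothetical column names; "my var" is not a syntactic R name.
col_names <- c("USUBJID", "my var")

# syms() turns each string into a symbol without parsing it as code.
by_syms <- syms(col_names)

# parse_exprs() parses each string as R code, so "my var" raises a parse error.
parse_result <- tryCatch(
  parse_exprs(col_names),
  error = function(e) "parse error"
)
```

If this matters for the datasets the function is expected to receive, `syms(names(dataset))` would be one candidate replacement for the `exprs(!!!parse_exprs(...))` line; that is a suggestion, not something the PR does.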
3 changes: 2 additions & 1 deletion man/extract_duplicate_records.Rd

2 changes: 1 addition & 1 deletion tests/testthat/_snaps/duplicates.md
@@ -1,4 +1,4 @@
-# signal_duplicate_records Test 2: dataset of duplicate records can be accessed using `get_duplicates_dataset()`
+# signal_duplicate_records Test 3: dataset of duplicate records can be accessed using `get_duplicates_dataset()`

Code
get_duplicates_dataset()
24 changes: 22 additions & 2 deletions tests/testthat/test-duplicates.R
@@ -18,9 +18,29 @@ test_that("extract_duplicate_records Test 1: duplicate records are extracted", {
)
})

+## Test 2: duplicate records for all variables ----
+test_that("extract_duplicate_records Test 2: duplicate records for all variables", {
+  input <- tibble::tribble(
+    ~USUBJID, ~COUNTRY, ~AAGE,
+    "P01", "GER", 22,
+    "P01", "JPN", 34,
+    "P02", "CZE", 41,
+    "P03", "AUS", 39,
+    "P04", "BRA", 21,
+    "P04", "BRA", 21
+  )
+  expected_output <- input[c(5:6), ]
+
+  expect_equal(
+    expected_output,
+    extract_duplicate_records(input)
+  )
+})
+
+
# signal_duplicate_records ----
-## Test 2: dataset of duplicate records can be accessed using `get_duplicates_dataset()` ----
-test_that("signal_duplicate_records Test 2: dataset of duplicate records can be accessed using `get_duplicates_dataset()`", { # nolint
+## Test 3: dataset of duplicate records can be accessed using `get_duplicates_dataset()` ----
+test_that("signal_duplicate_records Test 3: dataset of duplicate records can be accessed using `get_duplicates_dataset()`", { # nolint
  input <- tibble::tribble(
    ~USUBJID, ~COUNTRY, ~AAGE,
    "P01", "GER", 22,
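For readers without admiral installed, the behavior exercised by Test 2 (duplicates judged across every column when no by variables are given) can be approximated in base R. This is only a sketch: `find_duplicate_rows()` and its `by` argument are hypothetical names, not part of admiral, and the real `extract_duplicate_records()` may order or arrange its output differently.

```r
# Toy data with the same values as the Test 2 input in this PR.
input <- data.frame(
  USUBJID = c("P01", "P01", "P02", "P03", "P04", "P04"),
  COUNTRY = c("GER", "JPN", "CZE", "AUS", "BRA", "BRA"),
  AAGE = c(22, 34, 41, 39, 21, 21),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not admiral code): keep every row whose values on the
# `by` columns occur more than once. `by` defaults to all columns.
find_duplicate_rows <- function(data, by = names(data)) {
  key <- data[, by, drop = FALSE]
  # Flag both the first and the later occurrences of each duplicated key.
  is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  data[is_dup, , drop = FALSE]
}

# All columns as the key: only the two identical P04 rows qualify.
dups_all <- find_duplicate_rows(input)

# USUBJID alone as the key: both P01 rows and both P04 rows qualify.
dups_subject <- find_duplicate_rows(input, by = "USUBJID")
```

The contrast between `dups_all` and `dups_subject` shows why the default matters: with every column in the key, the two P01 rows are not duplicates because `COUNTRY` and `AAGE` differ.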