Enhancements to get_duplicates_dataset()
#2019
Replies: 15 comments 6 replies
-
Always resetting the duplicates dataset when What about using a list and a separate reset function. Each time |
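To make the "list plus separate reset function" idea concrete, here is a minimal sketch in base R. Every name in it (`.dup_store`, `store_duplicates()`, `get_all_duplicates()`, `reset_duplicates()`) is hypothetical and not part of admiral; it only illustrates collecting duplicates across checks and clearing them explicitly.

```r
# Hypothetical sketch: none of these names are admiral functions.
.dup_store <- new.env(parent = emptyenv())
.dup_store$datasets <- list()

# Append the duplicates found by one check instead of overwriting earlier ones.
store_duplicates <- function(duplicates) {
  .dup_store$datasets[[length(.dup_store$datasets) + 1L]] <- duplicates
  invisible(duplicates)
}

# Return everything collected since the last reset.
get_all_duplicates <- function() {
  .dup_store$datasets
}

# Explicit reset, triggered by the user rather than automatically.
reset_duplicates <- function() {
  .dup_store$datasets <- list()
  invisible(NULL)
}

store_duplicates(data.frame(USUBJID = "a"))
store_duplicates(data.frame(USUBJID = c("b", "c")))
length(get_all_duplicates())
reset_duplicates()
length(get_all_duplicates())
```

With this design the user decides when old duplicates are discarded, at the cost of having to remember to call the reset function.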
-
@zdz2101 @sadchla-codes Another interesting issue you might want to tackle. |
-
It seems By restricting the use of |
-
I'm not sure this approach would work in practice. |
-
I think we cannot automate resetting the duplicates dataset, or issue a warning or error when it is out of date, because we do not know when the user has fixed the program and rerun it.
-
Within the |
-
User should never run |
-
After going back to the drafting board for a bit, how does this sound (if user is running in an interactive environment like RStudio):
Draft available |
-
Rerunning user commands looks dangerous to me. It could change the global environment and cause confusion for the user. E.g., if the last command was |
-
Hey @galachad can we get your ideas on this one as well? |
-
This issue is stale because it has been open for 90 days with no activity. |
-
Hi all, @pharmaverse/admiral This issue is stale. Reviewing the discussion between @zdz2101, @bundfussr and @thomas-neitmann I don't see us implementing this proposal as it seems to invite a lot of potential user-error and confusion. What do you think? We can move it to a discussion and continue with folks experimenting and discussing and move it up the ladder for discussion at a Core Meeting? |
-
I agree. I would close the issue. |
-
@ddsjoberg @millerg23 @zdz2101 Hi all - Don't forget to put in your ideas from today's meeting around this issue. I was having trouble focusing at the end (lot of ideas being discussed today) and so wasn't able to capture it in my notes. |
-
One way around this would be to print the code the user can run to see the duplicate values.
# create a data frame with duplicates
df <- data.frame(USUBJID = c(letters, letters[1:2]))
columns_to_check <- "USUBJID"
cli::cli_warn(
  c("!" = "Duplicate values were found in columns {.val {columns_to_check}}",
    "i" = "Run {.run df[duplicated(df[{shQuote(columns_to_check)}]), {shQuote(columns_to_check)}, drop = FALSE]} to print the duplicate rows.")
)
#> Warning: ! Duplicate values were found in columns "USUBJID"
#> ℹ Run `df[duplicated(df['USUBJID']), 'USUBJID', drop = FALSE]` to print the
#> duplicate rows.
df[duplicated(df['USUBJID']), 'USUBJID', drop = FALSE]
#>    USUBJID
#> 27       a
#> 28       b
Created on 2023-08-09 with reprex v2.0.2
CAVEAT: If we don't know the name (i.e. the symbol) of the data frame we're checking, the printed code would need a placeholder in place of the data frame name, which the user would then have to replace.
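The caveat about the data frame name can be sketched as follows. This is only an illustration in base R: `duplicates_hint()` and the `<your_dataset>` placeholder are hypothetical names, not admiral or cli functionality. The helper uses the caller's symbol when one is available and falls back to the placeholder otherwise.

```r
# Hypothetical helper, not part of admiral: build the code a user could run
# to print the duplicate rows.
duplicates_hint <- function(dataset, by_vars) {
  expr <- substitute(dataset)
  # Use the caller's symbol when there is one; otherwise fall back to a
  # placeholder that the user has to replace by hand.
  name <- if (is.symbol(expr)) as.character(expr) else "<your_dataset>"
  cols <- paste0("c(", paste(shQuote(by_vars), collapse = ", "), ")")
  sprintf("%s[duplicated(%s[%s]), %s, drop = FALSE]", name, name, cols, cols)
}

df <- data.frame(USUBJID = c(letters, letters[1:2]))

# A plain symbol is passed, so the hint can name the data frame.
duplicates_hint(df, "USUBJID")

# An expression is passed, so the hint contains the placeholder instead.
duplicates_hint(rbind(df, df), "USUBJID")
```

Note that `substitute()` only sees the expression passed to the helper directly. If the check runs several calls deep inside a derivation function, the captured symbol would be that function's internal argument name, which is exactly the situation where the placeholder is needed.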
-
User feedback:
Based upon that I think we should implement the following:
get_duplicates_dataset()
information about which call generates this dataset should be printed.
signal_duplicate_records()
is called.
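A rough sketch of how the originating call could be stored alongside the duplicates and reported on retrieval is shown below. It is a sketch under stated assumptions, not admiral's implementation: `.dup_env`, `record_duplicates()`, `get_recorded_duplicates()` and `derive_step()` are all hypothetical names, and only base R is used.

```r
# Hypothetical sketch; these are not admiral's actual internals.
.dup_env <- new.env(parent = emptyenv())

# Stand-in for a duplicate check: stores the duplicates together with the
# call of the function that invoked the check.
record_duplicates <- function(dataset, by_vars) {
  caller <- sys.call(-1)
  dups <- dataset[duplicated(dataset[by_vars]), , drop = FALSE]
  .dup_env$dataset <- dups
  .dup_env$call <- caller
  invisible(dups)
}

# Getter that reports which call produced the stored dataset.
get_recorded_duplicates <- function() {
  if (!is.null(.dup_env$call)) {
    message("Duplicates dataset was created by: ", deparse(.dup_env$call)[1])
  }
  .dup_env$dataset
}

# Stand-in for a derivation function that runs the check internally.
derive_step <- function(dataset) {
  record_duplicates(dataset, by_vars = "USUBJID")
  dataset
}

df <- data.frame(USUBJID = c(letters, letters[1:2]))
invisible(derive_step(df))
get_recorded_duplicates()
```

Printing the stored call lets the user judge whether the dataset they are looking at belongs to the step they just ran or to an earlier one.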