Closes #2644: Use all variables for extract_duplicate_records by default #2651

Open: wants to merge 6 commits into base: main


ynsec37
Contributor

@ynsec37 ynsec37 commented Jan 22, 2025

Thank you for your Pull Request! We have developed this task checklist from the Development Process Guide to help with the final steps of the process. Completing the tasks below helps our reviewers make the most of their time on your code and keeps the admiral codebase robust and consistent.

Please check off each taskbox to acknowledge that you completed the task, or check it off if it is not relevant to your Pull Request. This checklist is part of the GitHub Actions workflows, and the Pull Request will not be merged into the main branch until you have checked off each task.

  • Place Closes #<insert_issue_number> into the beginning of your Pull Request Title (Use Edit button in top-right if you need to update)
  • Code is formatted according to the tidyverse style guide. Run styler::style_file() to style R and Rmd files
  • Updated relevant unit tests or have written new unit tests, which should consider realistic data scenarios and edge cases, e.g. empty datasets, errors, boundary cases etc. - See Unit Test Guide
  • If you removed/replaced any function and/or function parameters, did you fully follow the deprecation guidance?
  • Review the Cheat Sheet. Make any required updates to it by editing the file inst/cheatsheet/admiral_cheatsheet.pptx and re-upload a PDF and a PNG version of it to the same folder. (The PNG version can be created by taking a screenshot of the PDF version.)
  • Update all relevant roxygen headers and examples, including keywords and families. Refer to the categorization of functions to tag the appropriate keyword/family.
  • Run devtools::document() so all .Rd files in the man folder and the NAMESPACE file in the project root are updated appropriately
  • Address any updates needed for vignettes and/or templates
  • Update NEWS.md under the header # admiral (development version) if the changes pertain to a user-facing function (i.e. it has an @export tag) or documentation aimed at users (rather than developers). A Developer Notes section is available in NEWS.md for tracking developer-facing issues.
  • Build admiral site pkgdown::build_site() and check that all affected examples are displayed correctly and that all new functions occur on the "Reference" page.
  • Address or fix all lintr warnings and errors - lintr::lint_package()
  • Run R CMD check locally and address all errors and warnings - devtools::check()
  • Link the issue in the Development Section on the right hand side.
  • Address all merge conflicts and resolve appropriately
  • Pat yourself on the back for a job well done! Much love to your accomplishment!

…rds by default

* update `by_vars` to use all variables
* add test for `by_vars = NULL`
* update documentation
* update NEWS
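
The behaviour described in the commit message can be sketched in base R as follows. This is a minimal illustration, not admiral's implementation: `find_duplicates()`, its `by` argument, and the example data are hypothetical names introduced here, and the real `extract_duplicate_records()` takes `by_vars` in admiral's own format. The sketch only shows the idea that leaving the grouping variables as `NULL` falls back to comparing on every column.

```r
# Hypothetical helper (not admiral's code): returns every row that shares
# its `by` values with at least one other row.
find_duplicates <- function(data, by = NULL) {
  # Assumed default: when `by` is NULL, compare on all columns
  if (is.null(by)) {
    by <- names(data)
  }
  keys <- data[, by, drop = FALSE]
  # duplicated() flags second and later occurrences;
  # the fromLast = TRUE pass also flags the first occurrence
  is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
  data[is_dup, , drop = FALSE]
}

# Made-up example data for illustration only
example_data <- data.frame(
  USUBJID = c("01", "01", "02", "02", "03"),
  AVISIT = c("WEEK 1", "WEEK 1", "WEEK 1", "WEEK 1", "WEEK 1"),
  AVAL = c(10, 10, 20, 21, 30),
  stringsAsFactors = FALSE
)

# Default: only rows identical in every column count as duplicates
all_vars_dups <- find_duplicates(example_data)

# Explicit keys: rows that share USUBJID and AVISIT count as duplicates
key_dups <- find_duplicates(example_data, by = c("USUBJID", "AVISIT"))
```

With all columns, only the two fully identical rows for subject "01" are returned; with the two key columns, the rows for subject "02" are returned as well, because they differ only in `AVAL`.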
Collaborator

@manciniedoardo manciniedoardo left a comment


Thanks @ynsec37 - this looks great! I've tested the code and it looks good to me. Thanks for also adding a new test. I've just left a couple of comments to make wording clearer.

Please feel free to also add your name in the Acknowledgments section of the README 😄

NEWS.md (outdated, resolved)
R/duplicates.R (outdated, resolved)
ynsec37 and others added 2 commits January 23, 2025 21:00
Co-authored-by: Edoardo Mancini <[email protected]>
Co-authored-by: Edoardo Mancini <[email protected]>
NEWS.md (outdated, resolved)
Collaborator

@bms63 bms63 left a comment


@bundfussr and @manciniedoardo, so we are okay with combing through 100 variables for duplicates? Are there no issues we should alert users to, or tests that should be implemented for large data?

@bundfussr
Collaborator

> @bundfussr and @manciniedoardo, so we are okay with combing through 100 variables for duplicates? Are there no issues we should alert users to, or tests that should be implemented for large data?

Maybe we should add a note in the documentation of the by_vars argument that omitting it could increase the run-time.

@manciniedoardo
Collaborator

> > @bundfussr and @manciniedoardo, so we are okay with combing through 100 variables for duplicates? Are there no issues we should alert users to, or tests that should be implemented for large data?
>
> Maybe we should add a note in the documentation of the by_vars argument that omitting it could increase the run-time.

Yes, that's a good suggestion @bundfussr
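
For readers who want to gauge the run-time concern raised above, the following base R sketch compares a duplicate check over 100 columns with one over two key columns. It is an assumption-laden illustration: the data are randomly generated, `duplicated()` stands in for whatever admiral does internally, and timings vary by machine, so it says nothing definitive about `extract_duplicate_records()` itself.

```r
set.seed(1)
n_rows <- 10000
n_cols <- 100

# Made-up wide dataset; as.data.frame() names the columns V1, V2, ..., V100
wide <- as.data.frame(
  matrix(sample(1:5, n_rows * n_cols, replace = TRUE), nrow = n_rows)
)

# Duplicate check across all 100 columns (analogous to omitting the by variables)
time_all <- system.time(dup_all <- duplicated(wide))[["elapsed"]]

# Duplicate check restricted to two key columns
time_key <- system.time(
  dup_key <- duplicated(wide[, c("V1", "V2"), drop = FALSE])
)[["elapsed"]]

# Print both elapsed times (in seconds) for comparison
print(c(all_columns = time_all, key_columns = time_key))
```

Any row flagged in the all-column check is also flagged in the key-column check, since matching on every column implies matching on the keys; the reverse does not hold.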

Development

Successfully merging this pull request may close these issues.

Feature Request: Use all variables for extract_duplicate_records by default
4 participants