-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add assert_count_true()
to verify that an expected number of values are TRUE
#573
base: main
Are you sure you want to change the base?
Conversation
@billdenney thanks for this and sorry to take so long to review! Three things come to mind after reviewing: 1. Function name 2. Providing a before-and-after example I get the value of this function and would use it. But it took me reading each code block to understand its value. I think we can make it clearer upfront in the documentation (package vignette and function docs) that this doesn't change anything in the data; it's a wrapper that performs a data quality check mid-pipeline and halts things if an assertion fails. The best way I can think of is to show a scenario with before and after example. For instance: Imagine we have a column in a dataset that contains student test scores. A few scores were entered into the grade platform after the data file was exported, so we're filling them in manually during data cleaning. Ordinarily we might do that like this:
But this has a risk: if in a later version of the data, one of these students is missing two test scores, the value will be imputed for both cases.
3. janitor's approach in wading into the assertion space I am also considering how janitor complements the existing assertion functions out there. I looked at the assertr vignette and this blog post on 4 ways to be assertive in R. To me the big difference is that I would argue this niche is unfilled as far as I know and that's this function's reason for being. Do you agree? If so maybe we briefly note that in the function or package docs so people understand when/why to use it. |
For point 2, yes, I like the example. I will incorporate it in some way (maybe a new assertion vignette). For points 1 and 3, you understand all the ideas that I'm looking toward. I was thinking of the I agree that this would be a new chapter for |
Hi @billdenney - friendly ping here. I am gonna try to submit to CRAN by the end of the year, I got another email from them this month saying it was my final notice to fix the package and resubmit. Think you'll be able to implement the above feedback? If not we could try for a fast-follow release and get a few more outstanding things in Spring 2025. |
…true # Conflicts: # NEWS.md
…into assert_count_true
@sfirke, I think that I covered all the to-do items here. Please take a look. Also, I found a bug in either the |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #573 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 27 28 +1
Lines 1413 1428 +15
=========================================
+ Hits 1413 1428 +15
|
# Conflicts: # .gitignore
I merged the current |
Related to #539
I use a variant of this often in my data pipelines where I want to select a certain number of values and be sure that I'm only getting the number I expect. For example, I want to change one date for one person, and I want to be sure that I only change that one date: