Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transition from add_tailor(prop) and method to fit.workflow(calibration) #262

Merged
merged 5 commits into from
Oct 1, 2024

Conversation

simonpcouch
Copy link
Contributor

Closes #255, closes #254, closes #252, closes #233, and related to this gist.

This PR removes the add_tailor(prop) and method arguments in favor of an argument fit.workflow(calibration) giving the data to fit the calibrator on.

Benefits:

  • The workflow now truly has enough information to fit without leaking data (method wasn't actually enough)
  • add_tailor(method) really wasn't truly independent of the data/resampling scheme
  • workflows does not need to "know about" rsample

Thanks @hfrick for this suggestion—this feels much better.

This PR should be much easier to review commit-by-commit than altogether.

TODOs from here: update tune and (possibly?) rename .should_inner_split() to something like .workflow_needs_calibration().

workflows will no longer take an `add_tailor(prop)` or `add_tailor(method)` argument, instead taking a `fit.workflow(calibration)` argument that supersedes both of them.

first, remove machinery that relates specifically to those arguments.
…bration)`

* removes `add_tailor(prop)` and `add_tailor(method)`
* adds `fit.workflow(calibration)`
* various documentation updates
R/fit.R Outdated Show resolved Hide resolved
tests/testthat/test-post-action-tailor.R Show resolved Hide resolved
@simonpcouch simonpcouch requested a review from hfrick September 27, 2024 16:47
Copy link
Member

@hfrick hfrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is such a nice progression! I've left a few questions but overall I mainly wanted to say that your very structured whittling down of the discussion around this in the gist is what made my suggestion so easy. 🙇‍♀️ 🙌

R/fit.R Outdated Show resolved Hide resolved
#'
#' @param ... Not used
#'
#' @param calibration A data frame of predictors and outcomes to use when
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should adapt the name slightly to make it more obvious that this the data for calibration, rather than, say, the method. data_calibration, calibration_data, calibration_set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! I don't have strong preferences between any of these options, but agree that calibration by itself could be ambiguous.

R/fit.R Outdated Show resolved Hide resolved
R/fit.R Outdated Show resolved Hide resolved
R/post-action-tailor.R Outdated Show resolved Hide resolved
tests/testthat/_snaps/fit.md Outdated Show resolved Hide resolved
tests/testthat/_snaps/fit.md Outdated Show resolved Hide resolved
tests/testthat/test-post-action-tailor.R Show resolved Hide resolved
R/fit.R Outdated Show resolved Hide resolved
@@ -59,7 +59,7 @@
#' datasets, resulting in the preprocessor and model generating predictions on
#' rows they've seen before. Similarly problematic situations could arise in the
#' context of other resampling situations, like time-based splits.
#' In general, use the [rsample::inner_split()] function to prevent data
#' In general, use the `rsample::inner_split()` function to prevent data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about inner_split() being "internal" or not

* edits to `validate_has_calibration()`:
     * refer to "The workflow" rather than `caller_arg()`
     * Warn rather than error on unneeded calibration set
     * All arguments on one line in function signature
* comment on expectation re: predictions
* doc tweaks
* `.should_inner_split()` -> `.workflow_includes_calibration()`
* refactor conditional in `fit.workflow()` into two
@simonpcouch simonpcouch merged commit 7929511 into main Oct 1, 2024
11 checks passed
@simonpcouch simonpcouch deleted the calibration-set-arg branch October 1, 2024 12:51
Copy link

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Oct 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
2 participants