In classification problems, merging the `probably` package when determining the best threshold #986
It is an important feature. After the Posit conference, we will be working on post-processing tools, and this is one of them. We'll try to make it natural so that you can treat the threshold parameter like any other tuning parameter. If you use a workflow, it will also adjust the hard class predictions automatically (once you've picked a threshold).
Thank you so much for all the hard work you do to make the system more complete.
I think that hyperparameterizing to find the optimal threshold would be time-consuming and could lead to overfitting. Instead, I searched for a way to determine the optimal threshold directly; see Section 2.3, "Threshold criteria," of the related paper. The code to implement this in the training process is as follows:

```r
prob_to_class_2_custom <- function(x, object) {
  # Observed proportion of the second (event) level in the training data
  obs_ratio <- object$fit$y |> mean()
  # Choose the threshold so the predicted event rate matches the observed rate
  pred_equal_obs_threshold <- quantile(x, probs = 1 - obs_ratio)
  x <- ifelse(x >= pred_equal_obs_threshold, object$lvl[2], object$lvl[1])
  unname(x)
}
```
I would like to use this function as the default option.
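To illustrate the idea behind that function, here is a small self-contained sketch on synthetic data (the probabilities and the 30% event rate are made up for the example; they stand in for `x` and `obs_ratio` above):

```r
set.seed(1)

# Synthetic predicted probabilities and an assumed 30% observed event rate
probs <- runif(200)
obs_ratio <- 0.3

# Threshold chosen so the predicted event rate matches the observed rate
threshold <- quantile(probs, probs = 1 - obs_ratio)
pred_class <- ifelse(probs >= threshold, "event", "non_event")

# By construction, roughly 30% of cases end up labeled "event"
mean(pred_class == "event")
```

Because the threshold is the (1 - obs_ratio) quantile of the predicted probabilities, the fraction of predicted events tracks the observed event rate regardless of how the probabilities are calibrated.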
Long time no see😝 We've got some good news here, though: custom probability thresholds and other postprocessing functionality are now available via tailors, which can be added to workflows in the dev version of the workflows package. You can read more about that work in this blog post. Since these changes will otherwise live in the tailor repo, I'm going to go ahead and close!
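As a rough sketch of what that interface looks like (function names are from the tailor and dev workflows packages at the time of writing; check the blog post for current usage, and note this needs those packages installed):

```r
library(tailor)

# A postprocessor that turns probabilities into hard classes at a 0.3 threshold
post <- tailor() |>
  adjust_probability_threshold(threshold = 0.3)

# With the dev version of workflows, this can be attached to a workflow, e.g.:
# wf <- workflow(preprocessor, model_spec) |> add_tailor(post)
```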
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
As far as I can understand, we're using `prob_to_class_2` as the default option when predicting class. However, in many cases the threshold is not 0.5 (especially in imbalanced datasets). In this case, I wonder if we could use the `threshold_perf()` function in the `probably` package during the tuning process to check whether the model is potentially classifying really well. I think it's a really necessary feature; what do you think?
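For reference, a minimal sketch of what that check could look like with `threshold_perf()` (using the `segment_logistic` example data that ships with probably; the `Class` and `.pred_good` column names come from that dataset):

```r
library(probably)

# Two-class predictions with a .pred_good probability column
data("segment_logistic", package = "probably")

# Default metrics (sensitivity, specificity, J-index, distance)
# computed across a grid of candidate thresholds
perf <- threshold_perf(
  segment_logistic,
  truth = Class,
  estimate = .pred_good,
  thresholds = seq(0.2, 0.8, by = 0.1)
)

# Pick the threshold that maximizes the J-index
j <- perf[perf$.metric == "j_index", ]
j[which.max(j$.estimate), ]
```

Running something like this on the holdout predictions during tuning would show whether a non-default threshold meaningfully improves classification.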