In classification problems, merging `probably` package when determining best threshold. #986

SHo-JANG · 2023-07-08T07:43:22Z

As far as I understand, prob_to_class_2 is used by default when predicting classes.

prob_to_class_2 <- function(x, object) {
  x <- ifelse(x >= 0.5, object$lvl[2], object$lvl[1])
  unname(x)
}

However, in many cases the best threshold is not 0.5, especially with imbalanced datasets.

In that case, I wonder if we could use the threshold_perf() function from the probably package during tuning to check how well the model could classify at other thresholds.

I think it's a really necessary feature. What do you think?
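
As a rough illustration of the kind of threshold sweep being asked about, here is a minimal base R sketch on simulated data. The data, the `perf_at()` helper, and the threshold grid are all assumptions made up for this example. It does not call `threshold_perf()` itself, which reports comparable metrics (sensitivity, specificity, J-index) over a set of thresholds.

```r
# Simulated imbalanced data (hypothetical, for illustration only)
set.seed(986)
n <- 1000
truth <- rbinom(n, 1, 0.1)                    # roughly 10% events
prob  <- plogis(-2.5 + 2 * truth + rnorm(n))  # noisy predicted probabilities

# Sensitivity, specificity, and Youden's J at a single threshold
perf_at <- function(threshold) {
  pred <- as.integer(prob >= threshold)
  sens <- sum(pred == 1 & truth == 1) / sum(truth == 1)
  spec <- sum(pred == 0 & truth == 0) / sum(truth == 0)
  data.frame(
    threshold   = threshold,
    sensitivity = sens,
    specificity = spec,
    j_index     = sens + spec - 1
  )
}

# Evaluate a grid of candidate thresholds and keep the one with the largest J
thresholds <- seq(0.05, 0.95, by = 0.05)
perf <- do.call(rbind, lapply(thresholds, perf_at))
best <- perf[which.max(perf$j_index), ]
best
```

With an imbalanced outcome like this one, the row in `best` will often sit well away from 0.5, which is the motivation for the request.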

topepo · 2023-07-08T14:15:24Z

It is an important feature. After the posit conference, we will be working on post-processing tools and this is one of them.

We'll try to make it natural so that you can treat the threshold parameter like any other tuning parameter. If you use a workflow, it will also adjust the hard class predictions automatically (once you've picked a threshold).

SHo-JANG · 2023-07-09T07:31:17Z

Thank you so much for all the hard work you do to make the system more complete.

SHo-JANG · 2023-10-26T05:18:49Z

I think that treating the threshold as a tuning parameter would be time-consuming and could lead to overfitting.

Instead, I looked for a direct way to determine the optimal threshold (related paper).

In Section 2.3, "Threshold criteria", criterion (6) is PredPrev = Obs.
This means that we want the class ratio of the predictions to equal the class ratio observed in the training data, i.e., we use quantile(probs = 1 - "Obs class ratio") of the predicted probability vector as the threshold.

The code to implement this in the training process is as follows.

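prob_to_class_2_custom <- function(x, object) {
  # Observed event rate in the training data (assumes object$fit$y is coded 0/1)
  obs_ratio <- object$fit$y |> mean()
  # Threshold at which the predicted event rate matches the observed rate
  pred_equal_obs_threshold <- quantile(x, probs = 1 - obs_ratio)
  x <- ifelse(x >= pred_equal_obs_threshold, object$lvl[2], object$lvl[1])
  unname(x)
}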

I would like to use this function as the default option.
However, it seems that I need to redefine the engine to apply this function. Is there any way to use this function in an existing engine?
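
The PredPrev = Obs rule described in this comment can be checked on simulated data. This is a minimal base R sketch, not parsnip code: the outcome `y`, the probabilities `p`, and the simulation settings are hypothetical stand-ins for `object$fit$y` and the predicted probability vector.

```r
set.seed(1)
y <- rbinom(500, 1, 0.2)                  # hypothetical 0/1 training outcome
p <- plogis(-1.5 + 1.5 * y + rnorm(500))  # hypothetical predicted probabilities

# Observed event rate, and the quantile that leaves that share above it
obs_ratio <- mean(y)
threshold <- unname(quantile(p, probs = 1 - obs_ratio))

# Share of predictions at or above the threshold (the predicted prevalence)
pred_prev <- mean(p >= threshold)

c(observed = obs_ratio, predicted = pred_prev, threshold = threshold)
```

One thing to keep in mind with this approach: the quantile is taken over whatever probabilities are passed in, so the resulting threshold depends on the batch being predicted. For a single row, the quantile is that row's own probability, so `x >= threshold` is always true.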

simonpcouch · 2024-10-08T16:00:17Z

Long time no see 😝 We've got some good news here, though: custom probability thresholds and other postprocessing functionality are now available via tailors, which can be added to workflows in the dev version of the workflows package. You can read more about that work in this blog post.

Since these changes will otherwise live on the tailor repo, I'm going to go ahead and close!
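
For readers landing here, a minimal sketch of a custom probability threshold with tailor might look like the following. It assumes the tailor and modeldata packages are installed and uses the `two_class_example` data from modeldata. The 0.1 threshold is an arbitrary value chosen for illustration, not a recommendation.

```r
library(tailor)

# Example predictions: columns truth, Class1, Class2, predicted
data("two_class_example", package = "modeldata")

# A tailor that predicts the event class (Class1) when its probability is >= 0.1
tlr <- tailor() |>
  adjust_probability_threshold(threshold = 0.1)

# Tell the tailor which columns hold the outcome, hard class, and probabilities
tlr_fit <- fit(
  tlr,
  two_class_example,
  outcome = c(truth),
  estimate = c(predicted),
  probabilities = c(Class1, Class2)
)

# Hard class predictions are recomputed using the new threshold
adjusted <- predict(tlr_fit, two_class_example)
table(original = two_class_example$predicted, adjusted = adjusted$predicted)
```

In a workflow, the same tailor object would be attached with `add_tailor()` from the workflows package, so that the adjusted hard class predictions come out of `predict()` on the fitted workflow.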

github-actions · 2024-10-24T01:05:33Z

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

simonpcouch closed this as completed Oct 8, 2024

github-actions bot locked and limited conversation to collaborators Oct 24, 2024
