Allow successes/failures matrix as response for `logistic_regression()` #266

juliasilge · 2020-02-25T03:29:10Z

In situations where users need to model an outcome that is a proportion (such as clicks out of impressions, registrations out of visits, etc.), a useful approach is a generalized linear model with `family = binomial` (i.e., just like `parsnip::logistic_reg()`). The difference is that in the formula, the response is specified not as a factor but as a "two-column matrix with the columns giving the numbers of successes and failures", as the `glm()` documentation puts it.
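For readers less familiar with this response form, here is a minimal sketch in base R. It uses a small hypothetical clicks/impressions dataset (column names and values are made up for illustration). The two-column `cbind(successes, failures)` response fits the same model as a proportion response with the number of trials passed as prior `weights`, which `?glm` describes as the other way to give binomial trial counts.

```r
# Hypothetical aggregated data: clicks out of impressions at five ad positions
ads <- data.frame(
  position    = c(1, 2, 3, 4, 5),
  clicks      = c(30, 22, 15, 11, 6),
  impressions = c(200, 200, 180, 190, 150)
)

# Response as a two-column matrix: successes and failures
fit_matrix <- glm(cbind(clicks, impressions - clicks) ~ position,
                  family = binomial(), data = ads)

# Equivalent specification: observed proportion, weighted by number of trials
fit_weights <- glm(clicks / impressions ~ position,
                   family = binomial(), weights = impressions, data = ads)

# The two specifications should give the same coefficient estimates
all.equal(coef(fit_matrix), coef(fit_weights))
```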

We don't currently support this kind of response in parsnip. Here is what the model looks like with the underlying `glm()` function, followed by the error we currently get when trying the same thing through parsnip:

```r
library(Sleuth3)

glm(cbind(Extinct, AtRisk - Extinct) ~ log(Area),
    family = binomial(), data = case2101)
#> 
#> Call:  glm(formula = cbind(Extinct, AtRisk - Extinct) ~ log(Area), family = binomial(), 
#>     data = case2101)
#> 
#> Coefficients:
#> (Intercept)    log(Area)  
#>     -1.1962      -0.2971  
#> 
#> Degrees of Freedom: 17 Total (i.e. Null);  16 Residual
#> Null Deviance:       45.34 
#> Residual Deviance: 12.06     AIC: 75.39

library(parsnip)

logistic_reg() %>%
    set_engine("glm") %>%
    fit(cbind(Extinct, AtRisk - Extinct) ~ log(Area),
        data = case2101)
#> Error: For classification models, the outcome should be a factor.
```

Created on 2020-02-24 by the reprex package (v0.3.0)
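Until a matrix response is supported, one possible workaround (a sketch, not an official parsnip recommendation) is to expand the aggregated counts into one row per trial with a factor outcome, which `logistic_reg()` does accept. With the glm engine this should give the same coefficient estimates as the `cbind()` fit, although the data grows to one row per trial and the reported deviance differs. The data and the `expand_trials()` helper below are hypothetical, and only base R is used, so the check compares two `glm()` fits directly.

```r
# Hypothetical aggregated data: `successes` out of `trials` at each value of x
agg <- data.frame(
  x         = c(1, 2, 3, 4, 5),
  successes = c(2, 4, 7, 9, 12),
  trials    = c(20, 20, 20, 20, 20)
)

# Hypothetical helper: expand counts to one row per trial. The outcome factor
# lists "failure" first because glm() models the probability of the second
# level, which matches the cbind(successes, failures) parameterization.
expand_trials <- function(df, successes, trials) {
  failures <- df[[trials]] - df[[successes]]
  idx_s <- rep(seq_len(nrow(df)), times = df[[successes]])
  idx_f <- rep(seq_len(nrow(df)), times = failures)
  keep <- setdiff(names(df), c(successes, trials))
  long <- df[c(idx_s, idx_f), keep, drop = FALSE]
  long$outcome <- factor(
    rep(c("success", "failure"), times = c(length(idx_s), length(idx_f))),
    levels = c("failure", "success")
  )
  rownames(long) <- NULL
  long
}

long <- expand_trials(agg, "successes", "trials")

fit_agg  <- glm(cbind(successes, trials - successes) ~ x,
                family = binomial(), data = agg)
fit_long <- glm(outcome ~ x, family = binomial(), data = long)

# Same coefficients (up to IRLS convergence tolerance); deviances differ
all.equal(coef(fit_agg), coef(fit_long), tolerance = 1e-6)
```

The resulting `long` data frame could then go through the parsnip interface as, e.g., `logistic_reg() %>% set_engine("glm") %>% fit(outcome ~ x, data = long)`. The obvious cost is memory: this is fine for small trial counts but impractical for something like clicks out of millions of impressions, which is part of why a native successes/failures response would be useful.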


llendway · 2023-11-03T17:08:02Z

Agreed that this would be helpful! One example of where I'd use it: modeling the probability that a member/customer takes a certain action in each month over the next 12 months. With the binomial (successes/failures) framework above, I get a per-month probability (assuming the months are independent), and I can also estimate the number of months in which they will take the action. Without this option, I can only model the probability that they take the action in at least one of the months, which isn't quite what I need. I hope that explanation helps.
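A minimal sketch of this use case in base R, with hypothetical data and column names (`tenure_years`, `months_active`): each customer contributes the number of months, out of 12, in which they took the action. Under the binomial model the fitted probability `p` is a per-month probability, assumed constant across months and independent between them, so the expected number of active months is `12 * p` and the probability of acting in at least one month is `1 - (1 - p)^12`.

```r
# Hypothetical data: months (out of 12) in which each customer took the action
customers <- data.frame(
  tenure_years  = c(0.5, 1, 2, 3, 4, 5, 6, 8),
  months_active = c(1, 2, 3, 5, 6, 8, 9, 11)
)
n_months <- 12

# Binomial GLM with a successes/failures response
fit <- glm(cbind(months_active, n_months - months_active) ~ tenure_years,
           family = binomial(), data = customers)

new_customers <- data.frame(tenure_years = c(1, 5))

# Per-month probability of taking the action
p <- predict(fit, newdata = new_customers, type = "response")

# Expected number of active months, and P(active in at least one month),
# both assuming independent months with a constant probability p
expected_months <- n_months * p
p_any_month <- 1 - (1 - p)^n_months

data.frame(new_customers, p, expected_months, p_any_month)
```

Both quantities come from the same fitted `p`, whereas a model fit only to "acted in at least one month" would recover `p_any_month` but not the per-month probability or the expected count.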

juliasilge added the `feature` label (a feature request or enhancement) on Apr 3, 2020
