Allow successes/failures matrix as response for logistic_regression() #266

Open
juliasilge opened this issue Feb 25, 2020 · 1 comment
Labels
feature a feature request or enhancement

Comments

@juliasilge
Member

In situations where users need to model an outcome that is a proportion (such as clicks out of impressions, registrations out of visits, etc.), a useful approach is to use a generalized linear model with family = binomial (i.e. just like parsnip::logistic_reg()) but, instead of a factor, specify the response in the formula as a "two-column matrix with the columns giving the numbers of successes and failures", according to the docs.

We don't currently support this. This is what it looks like using the underlying glm() function, and this is the error we currently get trying to use parsnip:

library(Sleuth3)

glm(cbind(Extinct, AtRisk - Extinct) ~ log(Area), 
    family = binomial(), data = case2101)
#> 
#> Call:  glm(formula = cbind(Extinct, AtRisk - Extinct) ~ log(Area), family = binomial(), 
#>     data = case2101)
#> 
#> Coefficients:
#> (Intercept)    log(Area)  
#>     -1.1962      -0.2971  
#> 
#> Degrees of Freedom: 17 Total (i.e. Null);  16 Residual
#> Null Deviance:       45.34 
#> Residual Deviance: 12.06     AIC: 75.39

library(parsnip)

logistic_reg() %>%
    set_engine("glm") %>%
    fit(cbind(Extinct, AtRisk - Extinct) ~ log(Area),
        data = case2101)
#> Error: For classification models, the outcome should be a factor.

Created on 2020-02-24 by the reprex package (v0.3.0)
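
For anyone comparing options in the meantime, here is a minimal base R sketch of three ways to write the same binomial model: the two-column matrix response, a proportion response with the number of trials as weights, and a long format with one row per trial and a factor outcome (the form the error message above asks for). The `toy` data frame and its column names are made up for illustration and are not from case2101, and nothing here goes through parsnip.

```r
# Hypothetical toy data (not from the issue): successes out of trials per group.
toy <- data.frame(
  x         = c(1, 2, 3, 4, 5, 6),
  successes = c(9, 7, 6, 4, 3, 1),
  trials    = c(10, 10, 12, 10, 11, 9)
)

# 1. Two-column (successes, failures) matrix response, as in the glm() call above.
fit_matrix <- glm(cbind(successes, trials - successes) ~ x,
                  family = binomial(), data = toy)

# 2. Observed proportion as the response, with the number of trials as weights.
fit_weights <- glm(successes / trials ~ x,
                   family = binomial(), weights = trials, data = toy)

# 3. One row per trial with a factor outcome.
#    glm() treats the first factor level ("no") as failure.
outcome <- unlist(Map(
  function(s, n) rep(c("yes", "no"), times = c(s, n - s)),
  toy$successes, toy$trials
))
long <- data.frame(
  x       = rep(toy$x, times = toy$trials),
  outcome = factor(outcome, levels = c("no", "yes"))
)
fit_long <- glm(outcome ~ x, family = binomial(), data = long)

# All three give the same coefficient estimates (up to numerical tolerance).
rbind(
  matrix  = coef(fit_matrix),
  weights = coef(fit_weights),
  long    = coef(fit_long)
)
```

The long format is the least convenient of the three, since the data grow to one row per trial, which is part of why accepting the matrix response directly would be useful.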

@juliasilge juliasilge added the feature a feature request or enhancement label Apr 3, 2020
@llendway

llendway commented Nov 3, 2023

Agreed that this would be helpful! An example of where I might use this is if I wanted to model the probability a member/customer takes a certain action each month for the next 12 months. Using the binomial framework above, this gives me the probability for each month (assuming independence), and I can also estimate the number of months they will take an action. Without this option, I can model the probability that they will take an action in at least one of the months, but that doesn't quite get me what I'd need. I hope that explanation helps.
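
A rough sketch of that use case with plain glm(), assuming a made-up `members` data frame (the column names, values, and the single predictor are hypothetical): the fitted per-month probability gives both the expected number of active months out of 12 and, under the independence assumption, the probability of acting in at least one month.

```r
# Hypothetical data: number of months (out of 12) in which each member acted.
members <- data.frame(
  tenure_years  = c(1, 2, 3, 5, 8, 10),
  active_months = c(2, 3, 5, 6, 9, 11),
  months        = 12
)

# Binomial GLM with a (successes, failures) matrix response.
fit <- glm(cbind(active_months, months - active_months) ~ tenure_years,
           family = binomial(), data = members)

# Predicted probability of acting in any single month for a new member.
new_member <- data.frame(tenure_years = 4)
p_month <- predict(fit, newdata = new_member, type = "response")

# Expected number of active months over the next 12 months.
expected_months <- 12 * p_month

# Probability of acting in at least one month, assuming independent months.
p_at_least_once <- 1 - (1 - p_month)^12
```

Modelling only "at least one action" as a factor outcome would recover something like `p_at_least_once` but not `p_month` or `expected_months`.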
