Support multiple interfaces of (not to) an engine #1114

Open
hfrick opened this issue Apr 23, 2024 · 5 comments

@hfrick
Member

hfrick commented Apr 23, 2024

We currently allow only one interface per engine, set via set_fit(). Some engines have multiple interfaces themselves, but we don't leverage that. This SO post runs into trouble with the formula interface of kernlab::ksvm(), which could be resolved by using the matrix interface of the kernlab function. The workflow does use the tidymodels matrix interface, but parsnip eventually translates it to the formula interface of kernlab because that's how the engine is registered in parsnip.

This single translation point from parsnip to engine is also a challenge for tidymodels/censored#311

library(tidymodels)
library(kernlab)
# [...]

x <- matrix(rnorm(2000000), nrow = 100, ncol = 20000)
colnames(x) <- paste0("x", 1:20000)
y <- rnorm(n = 100)
data <- cbind(y, x) %>% as.data.frame()

# formula interface struggles
svm.train <- ksvm(y ~ ., type="eps-svr", data = data, kernel ="rbfdot")
#> Error: protect(): protection stack overflow

# matrix interface works
svm.train <- ksvm(x = x, y = y, type = "eps-svr", kernel ="rbfdot")

# tidymodels always uses the formula interface of kernlab itself, 
# regardless of the tidymodels interface

svm_spec <- svm_rbf(engine = "kernlab", mode = "regression")

fit_f <- fit(svm_spec, y ~ ., data = data)
#> Error: protect(): protection stack overflow
fit_xy <- fit_xy(svm_spec, x = x, y = y)
#> Error: protect(): protection stack overflow

Created on 2024-04-23 with reprex v2.1.0
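
For reference, the registered interface can be inspected directly. This is a minimal sketch, assuming parsnip's exported get_fit() helper returns a tibble with engine, mode, and value columns (the names fit_registry and kernlab_reg are only illustrative). At the time of this issue, the interface element is expected to report the formula interface described above.

```r
library(parsnip)

# Look up how svm_rbf() fits are registered in the parsnip model environment.
fit_registry <- get_fit("svm_rbf")

# Keep the entry for the kernlab engine in regression mode.
is_kernlab_reg <- fit_registry$engine == "kernlab" &
  fit_registry$mode == "regression"
kernlab_reg <- fit_registry$value[is_kernlab_reg][[1]]

# The single registered interface and the engine function it targets.
kernlab_reg$interface
kernlab_reg$func
```

Since only one such entry exists per engine and mode, both fit() and fit_xy() end up at the same engine call.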

@simonpcouch
Contributor

Do you anticipate any downsides to, instead, just fully switching to registering kernlab::ksvm() via its XY interface? Same question would go for coxnet_train() as well, I guess.

@hfrick
Member Author

hfrick commented Apr 23, 2024

I'm not familiar with kernlab so can't give a qualified answer on that right now :)

For glmnet/coxnet: 😬

There is a fundamental design clash with respect to stratification. glmnet expects the response to be stratified, which would mean that we would not have stratification information available at prediction time with tidymodels. To get around that, coxnet_train() handles the translation of stratification.
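
To illustrate what a stratified response looks like on the glmnet side, here is a minimal sketch with simulated data (all variable names and values are made up for illustration). It assumes glmnet::stratifySurv() as available in glmnet 4.1 and later, and it is not what coxnet_train() does internally.

```r
library(survival)
library(glmnet)

set.seed(1)
n <- 100
x <- matrix(rnorm(n * 5), nrow = n, ncol = 5)
time <- rexp(n)
status <- rbinom(n, size = 1, prob = 0.7)
strata <- rep(1:2, length.out = n)

# glmnet carries the strata on the response: stratifySurv() attaches
# them to the Surv object rather than taking them as a predictor.
y <- Surv(time, status)
y_strat <- stratifySurv(y, strata)

fit <- glmnet(x, y_strat, family = "cox")

# At prediction time only the predictors are supplied; there is no
# response object that could carry strata for the new observations.
lp <- predict(fit, newx = x[1:3, ], s = 0.05)
```

With a formula interface, by contrast, the strata can be specified alongside the predictors, which is why the translation has to happen somewhere between parsnip and the engine.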

@EmilHvitfeldt
Member

Having multiple interfaces might be nice once we have sparse tibble support. All sparsity should be handled via _xy, but a given model might perform better with non-sparse data using a formula interface.

@hfrick
Member Author

hfrick commented Apr 23, 2024

+1 on the sparsity comment - one example (of probably a few more?) is tidymodels/censored#276

@EmilHvitfeldt
Member

With the work I'm doing in #1165 and #1125, I can make it work, but having dgCMatrix as an interface would make some of the code clearer, as right now I'm forced to make changes in other places to make things work.
