Naive Bayes slow prediction #1010

Closed
jxu opened this issue Oct 27, 2023 · 6 comments

@jxu

jxu commented Oct 27, 2023

I'm not sure if this is a problem with klaR or parsnip per se, but naive Bayes prediction takes a long time on a large dataset, much longer than training. Could it be because of the large number of warnings generated?

> warnings()
Warning messages:
1: In FUN(X[[i]], ...) :
  Numerical 0 probability for all classes with observation 1
2: In FUN(X[[i]], ...) :
  Numerical 0 probability for all classes with observation 2
3: In FUN(X[[i]], ...) :
  Numerical 0 probability for all classes with observation 3

etc.
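
The warning text suggests that, for these observations, the computed probability is numerically zero for every class. One common way that happens is floating-point underflow when many small per-feature likelihoods are multiplied together; whether that is what klaR does internally is an assumption and is not confirmed in this thread. A minimal base R sketch with made-up likelihood values (the vector `likelihoods` is hypothetical) shows the effect and the usual log-scale workaround:

```r
# Hypothetical per-feature likelihoods for one observation under one class.
# The values and the count (100) are made up purely for illustration.
likelihoods <- rep(1e-5, 100)

# Multiplying on the raw probability scale underflows to exactly 0,
# because 1e-500 is far below the smallest positive double.
raw_product <- prod(likelihoods)

# Summing logs represents the same quantity without underflow.
log_product <- sum(log(likelihoods))

print(raw_product)
print(log_product)
```

If every class underflows to zero in this way, the classes cannot be ranked on the raw scale, which would be consistent with the wording of the warning.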

@EmilHvitfeldt
Member

Hello @jxu 👋

This is a known issue, and there is not much we can do on our end. We do have another engine, naivebayes, that has a significantly faster prediction time:

library(parsnip)
library(modeldata)
library(discrim)

nb_klaR <- naive_Bayes() |>
  set_engine("klaR") |>
  fit(Street ~ ., data = ames)

tictoc::tic()
predict(nb_klaR, ames)
#> observation 2930
#> # A tibble: 2,930 × 1
#>    .pred_class
#>    <fct>      
#>  1 Pave       
#>  2 Pave       
#>  3 Pave       
#>  4 Pave       
#>  5 Pave       
#>  6 Pave       
#>  7 Pave       
#>  8 Pave       
#>  9 Pave       
#> 10 Pave       
#> # ℹ 2,920 more rows
tictoc::toc()
#> 4.247 sec elapsed

nb_naivebayes <- naive_Bayes() |>
  set_engine("naivebayes") |>
  fit(Street ~ ., data = ames)

tictoc::tic()
predict(nb_naivebayes, ames)
#> # A tibble: 2,930 × 1
#>    .pred_class
#>    <fct>      
#>  1 Pave       
#>  2 Pave       
#>  3 Pave       
#>  4 Pave       
#>  5 Pave       
#>  6 Pave       
#>  7 Pave       
#>  8 Pave       
#>  9 Pave       
#> 10 Pave       
#> # ℹ 2,920 more rows
tictoc::toc()
#> 0.025 sec elapsed

@jxu
Author

jxu commented Oct 27, 2023

Thanks for getting back to me. Is there a reason why naivebayes is not the default? The Ames dataset, 2930 x 74, is not even that big, and naive Bayes is supposed to be extremely fast.

@jxu
Author

jxu commented Oct 27, 2023

Related #1007

@EmilHvitfeldt
Member

{klaR} was the original engine for naive_Bayes(), and when the {naivebayes} engine was added, {klaR} was kept as the default to preserve the old behavior. I.e., it was likely that some people relied on the fact that a {klaR} model came through, and changing that would break things.

tidymodels/discrim@b27a1c5

@simonpcouch
Contributor

Will go ahead and close. Glad we've documented this here!


This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 17, 2023