Ranger engine not outputting verbose messages #1004

Closed
jxu opened this issue Oct 18, 2023 · 5 comments

jxu commented Oct 18, 2023

Reproducing this requires a large dataset (I'm not sure which built-in datasets are large enough). A ranger fit should print progress messages like

Growing trees.. Progress: 16%. Estimated remaining time: 2 minutes, 45 seconds.
Growing trees.. Progress: 42%. Estimated remaining time: 1 minute, 25 seconds.

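However, a parsnip fit shows no progress messages, even with verbosity = 2 set in control_parsnip().

library(parsnip)

# Random forest specification using the ranger engine with parsnip's defaults
rf_mod <- rand_forest(mode = "classification") %>%
  set_engine("ranger")

# `train` is the (large) training data; verbosity = 2 does not enable ranger's progress output
rf_fit <- rf_mod %>%
  fit(price ~ ., data = train,
      control = control_parsnip(verbosity = 2, catch = FALSE))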

jxu commented Oct 18, 2023

I got the verbose output to show up with set_engine("ranger", verbose = TRUE, num.threads = 32).
Also, parsnip seems to default to 1 thread even though ranger defaults to the number of CPUs. Does parsnip change ranger's defaults?

@EmilHvitfeldt
Member

Regarding the progress messages: they are missing because parsnip sets the verbose argument to FALSE by default for this engine:

set_fit(
  model = "rand_forest",
  eng = "ranger",
  mode = "classification",
  value = list(
    interface = "data.frame",
    data = c(x = "x", y = "y", weights = "case.weights"),
    protect = c("x", "y", "weights"),
    func = c(pkg = "ranger", fun = "ranger"),
    defaults =
      list(
        num.threads = 1,
        verbose = FALSE,
        seed = expr(sample.int(10 ^ 5, 1))
      )
  )
)

You can override it by setting verbose = TRUE in set_engine(), as you have seen.

rf_mod <- rand_forest(mode = "classification") %>%
  set_engine("ranger", verbose = TRUE) 
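
To check which engine arguments will actually be passed to ranger, the specification can be inspected with translate() before fitting. This is only a minimal sketch, assuming the parsnip package is installed; the object names rf_default and rf_verbose are made up for illustration.

```r
library(parsnip)

# Specification that relies on parsnip's defaults for the ranger engine
rf_default <- rand_forest(mode = "classification") %>%
  set_engine("ranger")

# Same specification, opting in to ranger's progress messages
rf_verbose <- rand_forest(mode = "classification") %>%
  set_engine("ranger", verbose = TRUE)

# translate() prints the fit template, including verbose and num.threads
translate(rf_default)
translate(rf_verbose)
```

Comparing the two templates should make it visible where the user-supplied verbose = TRUE replaces the default.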

Regarding the choice of num.threads = 1: we strongly believe that code should run single-threaded by default, with an easy way to opt into multithreaded computation. This prevents a number of problems:

  • grabbing all 1000 cores of a supercomputer is probably not ideal, but it will happen when code defaults to all available cores
  • when fitting multiple models in parallel, which is often done in tidymodels during hyperparameter tuning, you generally want to fit one model in each thread. If each of those models tries to use all cores, you run into issues as well. Hence the default of num.threads = 1.
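
Opting in to multithreading for a single fit then comes down to passing num.threads to set_engine(). A minimal sketch, assuming the parsnip package is installed; the thread count of 4 is an arbitrary illustrative value, not a recommendation.

```r
library(parsnip)

# Explicitly opt in to 4 threads (illustrative value) and progress messages
rf_mod <- rand_forest(mode = "classification") %>%
  set_engine("ranger", num.threads = 4, verbose = TRUE)
```

When models are already being fitted in parallel, for example during tuning, leaving num.threads at its default of 1 avoids each worker trying to claim several cores of its own.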

jxu commented Oct 18, 2023

Ok, thanks for the info.

@jxu jxu closed this as completed Oct 18, 2023

jxu commented Oct 19, 2023

I think the verbose default should be set to TRUE. Tidyverse functions (e.g. read_csv) print a lot of progress output anyway.

github-actions bot commented Nov 3, 2023

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 3, 2023