
Allow mtry to accept a proportion of features, maybe via an mtry_prop parameter? #602

Open
stevenpawley opened this issue Nov 16, 2021 · 9 comments
Labels
feature a feature request or enhancement

Comments

@stevenpawley
Contributor

Machine learning packages such as scikit-learn and mlr3 allow mtry/max_features-style hyperparameters to be supplied (and tuned) as a proportion of the available features (max_features in sklearn, mtry_ratio in mlr3).

Currently, parsnip appears to require mtry to be supplied as an integer number of features for tree-based models, with mtry needing to be finalized if that number is unknown. Maybe I have missed something, but this seems problematic within a workflow whose preceding recipe steps add or select features, because the total number of available features is unknown ahead of time. I couldn't see a way of ensuring a proper range for mtry when tuning hyperparameters as part of a workflow that contains such recipe steps. Currently, the only option appears to be to set the upper range of mtry to the maximum possible number of features and let tune/ranger warn that mtry has exceeded the number of features available.

Although most random forest implementations in R seem to allow mtry only as a number of features rather than a proportion, mlr3 appears to handle this in its fit method via the mtry_ratio hyperparameter. Would this be possible in parsnip?
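For illustration, the mtry_ratio behavior amounts to converting a proportion into an integer count once the number of predictors is known at fit time. A minimal base-R sketch (the function name is hypothetical; this is not mlr3's actual code):

```r
# Convert a proportion of features into an integer mtry at fit time,
# once the number of predictors is known.
mtry_from_ratio <- function(ratio, n_predictors) {
  stopifnot(ratio > 0, ratio <= 1)
  # Round up and never sample fewer than one predictor.
  max(1L, as.integer(ceiling(ratio * n_predictors)))
}

mtry_from_ratio(0.3, 10)  # -> 3
```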

@juliasilge
Member

For xgboost, we did set up colsample_bynode (i.e., mtry) and colsample_bytree so that they can be specified either as counts (counts = TRUE) or as proportions (counts = FALSE), so you can do this for xgboost, FWIW.
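Since xgboost's colsample_bynode is itself a proportion, the counts flag essentially decides whether mtry needs dividing by the number of predictors first. A base-R sketch of that idea (hypothetical function name, not parsnip's source):

```r
# Map a user-supplied mtry onto xgboost's colsample_bynode, which
# expects a proportion in (0, 1].
as_colsample_bynode <- function(mtry, n_predictors, counts = TRUE) {
  if (counts) {
    mtry / n_predictors  # mtry is a count of predictors
  } else {
    mtry                 # mtry is already a proportion
  }
}

as_colsample_bynode(3, 10)                    # counts = TRUE:  3 of 10 -> 0.3
as_colsample_bynode(0.3, 10, counts = FALSE)  # already a proportion -> 0.3
```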

We'd have to think about what it would take to support this more broadly.

@topepo added the "feature" label (a feature request or enhancement) Nov 17, 2021
@stevenpawley
Contributor Author

Thanks for the information regarding xgboost, very useful. FYI, ranger's mtry also accepts a function of the number of predictors that returns the number of variables to sample:

library(ranger)

ranger(Species ~ ., data = iris, mtry = function(x) ceiling(x * 0.3))
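That inline function can be generalized into a small factory that builds a function-valued mtry from a proportion. The helper name below is hypothetical (it is not part of ranger); it just packages the ceiling logic from the example above:

```r
# Build a function suitable for ranger(mtry = ...) from a proportion.
mtry_prop <- function(prop) {
  stopifnot(prop > 0, prop <= 1)
  force(prop)
  function(n_predictors) {
    max(1L, as.integer(ceiling(prop * n_predictors)))
  }
}

f <- mtry_prop(0.3)
f(4)  # ceiling(1.2) -> 2
```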

@simonpcouch
Contributor

Our docs will be clarified here on merge of #734. :)

@jxu

jxu commented Nov 3, 2023

There is no need for a counts flag, right? If mtry is in [0, 1], use it as a proportion; if it's an integer > 1, use it as a count; otherwise, error. As long as it's documented, it shouldn't be too confusing.
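That dispatch rule can be sketched in a few lines of base R (hypothetical function name; note that mtry = 1 sits on the boundary between the two branches, which is the ambiguity raised below):

```r
# Resolve mtry as either a proportion or a count, based on its value.
resolve_mtry <- function(mtry, n_predictors) {
  if (mtry > 0 && mtry < 1) {
    # Treat as a proportion of predictors.
    max(1L, as.integer(ceiling(mtry * n_predictors)))
  } else if (mtry >= 1 && mtry == as.integer(mtry)) {
    # Treat as a count; note this arbitrarily resolves mtry = 1 as a count.
    as.integer(mtry)
  } else {
    stop("`mtry` must be a proportion in (0, 1) or a positive integer.")
  }
}

resolve_mtry(0.25, 8)  # proportion -> 2
resolve_mtry(3, 8)     # count -> 3
```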

@simonpcouch
Contributor

mtry = 1 is ambiguous, unfortunately. 🙂

@jxu

jxu commented Nov 3, 2023

If you mean a count of 1, I think only decision stumps will be learned? That's technically possible but not meaningful.

@simonpcouch
Contributor

I mean mtry. If mtry = 1, it could be interpreted as a proportion, in which case all predictors are sampled, or it could be interpreted as a count, in which case 1 predictor is sampled.

@jxu

jxu commented Nov 3, 2023

OK. I mean that if you exclude the count interpretation for mtry = 1, then you can support both. But I guess the option to sample a single predictor is desired.

@mirkoruks

Hi, are there any updates on this issue?
