
GH-15780 set weak learner parameter #15901

Merged 2 commits into rel-3.44.0 on Nov 22, 2023

Conversation

valenad1 (Collaborator) commented Nov 3, 2023

@valenad1 valenad1 force-pushed the valenad-15780-set-weak-learner-parameter branch from c00b6ad to 49c5497 Compare November 7, 2023 17:45
@valenad1 valenad1 marked this pull request as ready for review November 7, 2023 17:52
@valenad1 valenad1 changed the title Valenad 15780 set weak learner parameter GH-15780 set weak learner parameter Nov 8, 2023
@valenad1 valenad1 changed the base branch from rel-3.44.0 to valenad-15779-add-deeplearning-to-adaboost November 9, 2023 16:08
    parms._seed = _parms._seed;
    if (!useCustomWeakLearnerParameters()) {
        parms._mtries = 1;
Collaborator:

What if the user specifies just some parameters that are not in this list? Shouldn't we rather set these defaults and let the user override them with custom parameters if specified?

Collaborator (author):

Usage of custom parameters overrides usage of default parameters.

It's not possible to, e.g., keep the defaults and change only max_depth=4. In that case, the user has to define all of the parameters:

{'ntrees':1, 'mtries':1, 'min_rows':1, 'sample_rate':1, 'max_depth':4}

Collaborator (author):

> Shouldn't we rather set these defaults and let the user override them with custom parameters if specified?

It is possible, but I would rather go with: it is either the defaults, or whatever the user specifies.

Contributor:

@valenad1 So what happens if the user sets params only as {'ntrees':1}? Are the model defaults used?

valenad1 (Collaborator, author) commented Nov 20, 2023:

Edit: If your weak_learner=DRF, then the defaults of the DRF algorithm are used...

The defaults of the DRF algorithm are used, i.e. sample_rate=0.6320000291, mtries=-1, ...

We discussed it with @mn-mikke but didn't come to a conclusion. My preferred way is to go with either the defaults or whatever the user specifies. In your example, if the user specifies {'ntrees':1}, it does the same as RandomForestEstimator(ntrees=1). That makes sense to me if you consider all of the other algorithms and their setup.

@mn-mikke's preferred way would be to still apply the defaults. E.g., when the user specifies {'sample_rate':0.6}, it means RandomForestEstimator(ntrees=1, min_rows=1, sample_rate=0.6, max_depth=1). That makes sense if you look at it only from the perspective of AdaBoost with DRF. Everybody assumes AdaBoost with DRF is a root with one split, and users may want to play with it and try different "defaults". IMHO there is no such assumption about AdaBoost with GBM, GLM, NN, ...

My point of view is not to implement the possibility of different defaults, but to make the user able to specify their own weak learner.

The only thing I know is that I have to document it thoroughly, and maybe I would like to refactor weak_learner to algo and weak_learner_parms to algo_parms to make it clearer.

What do you think, @maurever, @wendycwong?

Contributor:

Okay, there are 3 ways parameter values can be set:

  1. user-specified values (for either AdaBoost parameters or for weak-learner parameters that the user chooses);
  2. AdaBoost sets certain parameters to defaults. In addition, for the default algo (GBM), AdaBoost may choose to set the GBM parameters to certain values. Let's call these AdaBoost defaults;
  3. for each weak learner, the algo will set default parameter values. Let's call these algorithm-specific defaults.

So, in terms of priorities, we will always use the user-specified values for the parameters (AdaBoost or algorithm-specific) as long as they are valid;

Then, for parameters that are not specified by users, we will first set them to the AdaBoost defaults;

Next, for parameters that are not specified by users or AdaBoost, we will set them to the algorithm-specific defaults.

I think this is what the conversation chain converges to. Please let me know if you think otherwise. Thanks, W
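The three-tier priority described above can be sketched in plain Python. This is a hypothetical illustration with made-up dicts, not the actual H2O implementation; the resolve_params helper and the default values are assumptions for the example:

```python
def resolve_params(user, adaboost_defaults, algo_defaults):
    """Merge parameters with priority: user > AdaBoost defaults > algorithm defaults."""
    merged = dict(algo_defaults)      # lowest priority: algorithm-specific defaults
    merged.update(adaboost_defaults)  # AdaBoost defaults override algorithm defaults
    merged.update(user)               # user-specified values always win
    return merged

# Hypothetical values for a DRF weak learner:
algo_defaults = {'ntrees': 50, 'max_depth': 20, 'sample_rate': 0.632}
adaboost_defaults = {'ntrees': 1, 'max_depth': 1, 'sample_rate': 1}
user = {'sample_rate': 0.6}

print(resolve_params(user, adaboost_defaults, algo_defaults))
# {'ntrees': 1, 'max_depth': 1, 'sample_rate': 0.6}
```

Under this scheme, a user who specifies only {'sample_rate': 0.6} would still get the AdaBoost defaults for the remaining parameters, which is @mn-mikke's preferred behavior described earlier in the thread.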

Collaborator (author):

In a private conversation, @wendycwong agreed that it should be either the AdaBoost defaults or whatever the user specifies.

E.g.

  • H2OAdaBoostEstimator(nlearners=2) –> weak learner DRF with the configuration
    H2ORandomForestEstimator(ntrees=1, mtries=1, min_rows=1, sample_rate=1, max_depth=1)
  • H2OAdaBoostEstimator(nlearners=2, weak_learner_parms="{'ntrees':1,'max_depth':3}") –> the parameters of the weak learner will be the same as if you called
    H2ORandomForestEstimator(ntrees=1, max_depth=3)

The reason is that the AdaBoost default settings follow only convention (or my idea of a weak learner); when users want to customize, they can do so without worrying about our default choices and focus only on the weak learner.
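The agreed all-or-nothing semantics can be illustrated with a small sketch. Plain Python dicts stand in for the real H2O parameter handling here; the function name and default values are assumptions for the example, not the actual implementation:

```python
# AdaBoost's conventional defaults for a DRF weak learner (a one-split stump).
ADABOOST_DEFAULTS = {'ntrees': 1, 'mtries': 1, 'min_rows': 1,
                     'sample_rate': 1, 'max_depth': 1}

def weak_learner_params(user_params=None):
    """If the user passes any custom parameters, they fully replace the
    AdaBoost defaults; keys the user leaves out fall back to the weak
    learner algorithm's own defaults (applied later by that algorithm)."""
    if user_params:  # any custom parameters given -> AdaBoost defaults are ignored
        return dict(user_params)
    return dict(ADABOOST_DEFAULTS)

print(weak_learner_params())
# {'ntrees': 1, 'mtries': 1, 'min_rows': 1, 'sample_rate': 1, 'max_depth': 1}
print(weak_learner_params({'ntrees': 1, 'max_depth': 3}))
# {'ntrees': 1, 'max_depth': 3}
```

This matches the second bullet above: specifying {'ntrees':1, 'max_depth':3} behaves exactly like constructing the DRF estimator with only those two arguments, with DRF's own defaults filling in the rest.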

Base automatically changed from valenad-15779-add-deeplearning-to-adaboost to rel-3.44.0 November 16, 2023 17:21
…l be loaded each time. The parameters are not immutable objects and training of the previous algorithm can change parameters and therefore affect next training if we didn't reload the parameters
@valenad1 valenad1 force-pushed the valenad-15780-set-weak-learner-parameter branch from 49c5497 to 9712ad8 Compare November 21, 2023 15:49
maurever (Contributor) left a comment:

LGTM. Your final solution makes sense to me, @valenad1. Thanks.

@valenad1 valenad1 merged commit 77987a5 into rel-3.44.0 Nov 22, 2023
2 checks passed
@valenad1 valenad1 deleted the valenad-15780-set-weak-learner-parameter branch November 22, 2023 14:52