GH-15780 set weak learner parameter #15901
Conversation
Force-pushed from c00b6ad to 49c5497
parms._seed = _parms._seed;
if (!useCustomWeakLearnerParameters()) {
    parms._mtries = 1;
What if the user specifies just some parameters that are not in this list? Shouldn't we rather set these defaults and let the user override them with custom parameters if specified?
Usage of custom parameters overrides usage of default parameters.
It's not possible to, e.g., keep the defaults and only change max_depth=4. In that case the user has to define all the parameters:
{'ntrees':1, 'mtries':1, 'min_rows':1, 'sample_rate':1, 'max_depth':4}
Shouldn't we rather set these defaults and let them override with custom parameters if specified?
It is possible, but I would rather go with either the defaults or whatever the user specifies.
@valenad1 So what happens if the user sets params only as {'ntrees':1}? Are the model defaults used?
Edit: If your weak_learner=DRF, then the defaults of the DRF algorithm are used...
Defaults of the DRF algorithm are used. That means sample_rate=0.6320000291, mtries=-1, ...
We discussed it with @mn-mikke but didn't come to a conclusion. My preferred way is to go with either the defaults or whatever the user specifies. In your example, if the user specifies {'ntrees':1}, it does the same as RandomForestEstimator(ntrees=1). That makes sense to me if you consider all of the other algorithms and their setups.
@mn-mikke's preferred way would be to still apply the defaults, e.g. when the user specifies {'sample_rate':0.6}, it means RandomForestEstimator(ntrees=1, min_rows=1, sample_rate=0.6, max_depth=1). That makes sense if you look at it only from the perspective of AdaBoost with DRF. Everybody assumes AdaBoost with DRF is a root with one split, and a user may want to play with that and try different "defaults". IMHO there is no such assumption about AdaBoost with GBM, GLM, NN, ...
My point of view is not to implement the possibility of different defaults, but to let the user specify their own weak learner.
The only thing I know is that I have to document it perfectly, and maybe I would like to refactor weak_learner to algo and weak_learner_parms to algo_parms, to make it clearer.
What do you think, @maurever, @wendycwong?
Okay, there are three ways parameter values can be set:
- user-specified values (for either AdaBoost parameters or for weak-learner parameters that the user chooses);
- AdaBoost sets certain parameters to defaults. In addition, for the default algo (GBM), AdaBoost may choose to set the GBM parameters to certain values. Let's call these AdaBoost defaults;
- for each weak learner, the algo will set default parameter values. Let's call these algorithm-specific defaults.
So, in terms of priorities, we will always use the user-specified values for the parameters (AdaBoost or algorithm-specific) as long as they are valid.
Then, for parameters not specified by the user, we will set them to the AdaBoost defaults.
Next, for parameters not specified by the user or AdaBoost, we will set them to the algorithm-specific defaults.
I think this is what the conversation chain converges to. Please let me know if you think otherwise. Thanks, W
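The layered priority proposed above (user values over AdaBoost defaults over algorithm-specific defaults) amounts to a simple ordered dict merge. A minimal sketch of that semantics, assuming it is not the h2o-3 implementation and the helper name `resolve_params` is hypothetical (the default values are taken from the numbers mentioned in this thread):

```python
def resolve_params(user, adaboost_defaults, algo_defaults):
    """Merge with priority: user > AdaBoost defaults > algorithm defaults."""
    resolved = dict(algo_defaults)       # lowest priority: algorithm-specific defaults
    resolved.update(adaboost_defaults)   # AdaBoost defaults override them
    resolved.update(user)                # user-specified values always win
    return resolved

# Illustrative defaults for a DRF weak learner, as quoted in this discussion.
algo_defaults = {'ntrees': 50, 'mtries': -1, 'sample_rate': 0.6320000291}
adaboost_defaults = {'ntrees': 1, 'mtries': 1, 'min_rows': 1,
                     'sample_rate': 1, 'max_depth': 1}

# User overrides only sample_rate; every other AdaBoost default is kept.
print(resolve_params({'sample_rate': 0.6}, adaboost_defaults, algo_defaults))
```

Under this proposal, a user setting `{'sample_rate': 0.6}` would still get `ntrees=1, max_depth=1`, etc. from the AdaBoost defaults.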
In a private conversation, @wendycwong agreed that it should be the AdaBoost defaults or whatever the user specifies.
E.g.
- H2OAdaBoostEstimator(nlearners=2) –> weak_learner DRF with the configuration H2ORandomForestEstimator(ntrees=1, mtries=1, min_rows=1, sample_rate=1, max_depth=1)
- H2OAdaBoostEstimator(nlearners=2, weak_learner_parms="{'ntrees':1,'max_depth':3}") –> the parameters of the weak_learner will be the same as if you called H2ORandomForestEstimator(ntrees=1, max_depth=3)
The reason is that the AdaBoost default settings only follow a convention (my idea of a weak learner), but when users want to customize, they can do so without worrying about our default choices and focus only on the weak_learner.
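The agreed behavior is all-or-nothing rather than a merge. A minimal sketch of that semantics, assuming a hypothetical helper name (this is not the actual h2o-3 code; the default values are the ones quoted in this thread):

```python
# Illustrative AdaBoost defaults for a DRF weak learner, from the examples above.
ADABOOST_DRF_DEFAULTS = {'ntrees': 1, 'mtries': 1, 'min_rows': 1,
                         'sample_rate': 1, 'max_depth': 1}

def effective_weak_learner_params(user_params=None):
    # All-or-nothing: if the user supplies weak_learner_parms, those are used
    # exactly as if the estimator were constructed with only those arguments
    # (the algorithm's own defaults then fill in the rest); otherwise
    # AdaBoost's conventional defaults apply.
    return dict(user_params) if user_params else dict(ADABOOST_DRF_DEFAULTS)

# User params replace the AdaBoost defaults entirely.
print(effective_weak_learner_params({'ntrees': 1, 'max_depth': 3}))
```

Note that `{'ntrees': 1, 'max_depth': 3}` does not inherit `mtries=1` or `sample_rate=1`; those fall back to the DRF algorithm's own defaults, matching `H2ORandomForestEstimator(ntrees=1, max_depth=3)`.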
…l be loaded each time. The parameters are not immutable objects, and training of the previous algorithm can change them and therefore affect the next training if we don't reload the parameters.
Force-pushed from 49c5497 to 9712ad8
LGTM. Your final solution makes sense to me, @valenad1. Thanks.
#15780