-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-15780 set weak learner parameter #15901
Merged
valenad1
merged 2 commits into
rel-3.44.0
from
valenad-15780-set-weak-learner-parameter
Nov 22, 2023
Merged
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if user specifies just some parameters that or not in this list? Shouldn't we rather set these defaults and let them override with custom parameters if specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usage of custom parameters overrides usage of default parameters.
Its not possible to e.g: Set default, but only cahnge max_depth=4. User in this case have to define all the parameters:
{'ntrees':1, 'mtries':1, 'min_rows':1, 'sample_rate':1, 'max_depth':4}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible, I would rather go with –> it is default or whatever user specify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@valenad1 So what happens if the user sets params only as {'ntrees':1}? Model defaults are used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: If your weak_learner=DRF then defaults of DRF algorithm are used...
Defaults of DRF algorithm are used. It means sample_rate=0.6320000291, mtries=-1,...
We discuss it with @mn-mikke but we didn't come to conclusion. My preferred way is to go either with default or whatever user specify. By your example if user specify
{'ntrees':1}
it does the same asRandomForestEstimator(ntrees=1)
. It make sense for me if you consider all of the other algorithms and their setup.@mn-mikke preferred way would be to still apply defaults. E.g when user specify
{'sample_rate':0.6}
the it meansRandomForestEstimator(ntrees=1, min_rows=1,sample_rate=0.6, max_depth=1)
. It make sense if you look on it only from AdaBoost perpective and DRF. Everybody assume AdaBoost with DRF is root with one split and user maybe want to play with it and try different "defaults". IMHO there is no assumption about AdaBoost with GBM, GLM, NN,...My point of view is not to implement possibility for different defaults but make a user to be able to specify his own weak learner.
Only thing I know is that I have to document it perfectly, and maybe I would like to refactor weak_learner to algo and weak_learner_parms to algo_parms, to make it more clear.
What do you think @maurever, @wendycwong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, there are 3 ways parameter values can be set:
So, in terms of priorities, we will always use user specified values for the parameters (adaboost or algorithm specific) as long as they are valid;
Then, for parameters that are not specified by users, we will first set them to adaboost defaults;
Next, for parameters that are not specified by users or adaboost, we will set them to algorithm specific defaults.
I think this is what the conversation chain converges to. Please let me know if you think otherwise. Thanks, W
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In private conversation, @wendycwong agrees that it should be AdaBoost default or whatever user specify.
E.g.
The reason is that AdaBoost default setting follow only convention or my idea of weak learner, but when user want to customize, then he can customize without worring about our default choices and focus only on weak_learner.