GH-15780 set weak learner parameter #15901

Merged: 2 commits, Nov 22, 2023
72 changes: 54 additions & 18 deletions h2o-algos/src/main/java/hex/adaboost/AdaBoost.java
@@ -1,5 +1,6 @@
package hex.adaboost;

import com.google.gson.*;
import hex.Model;
import hex.ModelBuilder;
import hex.ModelCategory;
@@ -19,8 +20,8 @@
import water.util.Timer;
import water.util.TwoDimTable;

import java.util.ArrayList;
import java.util.List;
import java.lang.reflect.Field;
import java.util.*;

/**
* Implementation of AdaBoost algorithm based on
@@ -37,6 +38,7 @@ public class AdaBoost extends ModelBuilder<AdaBoostModel, AdaBoostModel.AdaBoost

private AdaBoostModel _model;
private String _weightsName = "weights";
private Gson _gsonParser;

// Called from an http request
public AdaBoost(AdaBoostModel.AdaBoostParameters parms) {
@@ -74,6 +76,34 @@ public void init(boolean expensive) {
if( !(0. < _parms._learn_rate && _parms._learn_rate <= 1.0) ) {
error("learn_rate", "learn_rate must be between 0 and 1");
}
if (useCustomWeakLearnerParameters()) {
try {
_gsonParser = new GsonBuilder()
.setFieldNamingStrategy(new PrecedingUnderscoreNamingStrategy())
.create();
_gsonParser.fromJson(_parms._weak_learner_params, JsonObject.class);
} catch (JsonSyntaxException syntaxException) {
error("weak_learner_params", "Provided parameters are not in the valid json format. Got error: " + syntaxException.getMessage());
}
}
}

private boolean useCustomWeakLearnerParameters() {
return _parms._weak_learner_params != null && !_parms._weak_learner_params.isEmpty();
}

private class PrecedingUnderscoreNamingStrategy implements FieldNamingStrategy
{
public String translateName(Field field)
{
String fieldName =
FieldNamingPolicy.LOWER_CASE_WITH_UNDERSCORES.translateName(field);
if (fieldName.startsWith("_"))
{
fieldName = fieldName.substring(1);
}
return fieldName;
}
}

private class AdaBoostDriver extends Driver {
@@ -181,21 +211,23 @@ private ModelBuilder chooseWeakLearner(Frame frame) {
}

private DRF getDRFWeakLearner(Frame frame) {
DRFModel.DRFParameters parms = new DRFModel.DRFParameters();
DRFModel.DRFParameters parms = useCustomWeakLearnerParameters() ? _gsonParser.fromJson(_parms._weak_learner_params, DRFModel.DRFParameters.class) : new DRFModel.DRFParameters();
parms._train = frame._key;
parms._response_column = _parms._response_column;
parms._weights_column = _weightsName;
parms._mtries = 1;
parms._min_rows = 1;
parms._ntrees = 1;
parms._sample_rate = 1;
parms._max_depth = 1;
parms._seed = _parms._seed;
if (!useCustomWeakLearnerParameters()) {
parms._mtries = 1;
Collaborator:

What if the user specifies just some parameters that are not in this list? Shouldn't we rather set these defaults and let them override with custom parameters if specified?

Collaborator Author:

Usage of custom parameters overrides usage of the default parameters.

It's not possible to, e.g., take the defaults but change only max_depth=4. In this case the user has to define all the parameters:

{'ntrees':1, 'mtries':1, 'min_rows':1, 'sample_rate':1, 'max_depth':4}
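For reference, the validation that the new init() code performs via Gson can be sketched in Python (the helper name is hypothetical; note that Gson is lenient and accepts the single-quoted keys shown above, while Python's json module requires strict double-quoted JSON):

```python
import json

# Sketch (not the actual H2O code) of the weak_learner_params check in
# AdaBoost.init(): an empty string means "use the AdaBoost defaults",
# otherwise the string must parse as a JSON object.
def parse_weak_learner_params(params: str) -> dict:
    if not params:
        return {}  # no custom parameters -> AdaBoost defaults apply
    try:
        obj = json.loads(params)
    except json.JSONDecodeError as e:
        raise ValueError("weak_learner_params is not valid JSON: %s" % e)
    if not isinstance(obj, dict):
        raise ValueError("weak_learner_params must be a JSON object")
    return obj

print(parse_weak_learner_params('{"ntrees": 1, "max_depth": 4}'))
```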

Collaborator Author:

> Shouldn't we rather set these defaults and let them override with custom parameters if specified?

It is possible, but I would rather go with: either the defaults, or whatever the user specifies.

Contributor:

@valenad1 So what happens if the user sets params only as {'ntrees':1}? Are the model defaults used?

Collaborator Author (@valenad1), Nov 20, 2023:

Edit: if your weak_learner=DRF, then the defaults of the DRF algorithm are used...

The defaults of the DRF algorithm are used, i.e. sample_rate=0.6320000291, mtries=-1, ...

We discussed it with @mn-mikke but didn't come to a conclusion. My preferred way is to go with either the defaults or whatever the user specifies. In your example, if the user specifies {'ntrees':1}, it does the same as RandomForestEstimator(ntrees=1). That makes sense to me if you consider all of the other algorithms and their setups.

@mn-mikke's preferred way would be to still apply the defaults, e.g. when the user specifies {'sample_rate':0.6}, it means RandomForestEstimator(ntrees=1, min_rows=1, sample_rate=0.6, max_depth=1). That makes sense if you look at it only from the perspective of AdaBoost with DRF: everybody assumes an AdaBoost weak learner is a root with one split, and users may want to play with it and try different "defaults". IMHO there is no such assumption about AdaBoost with GBM, GLM, NN, ...

My point of view is not to implement the possibility of different defaults, but to let users specify their own weak learner.

The only thing I know is that I have to document it perfectly, and maybe I would like to refactor weak_learner to algo and weak_learner_params to algo_params to make it clearer.

What do you think, @maurever, @wendycwong?

Contributor:

Okay, there are 3 ways parameter values can be set:

  1. user-specified values (for either the AdaBoost parameters or the weak-learner parameters that the user chooses);
  2. AdaBoost sets certain parameters to defaults. In addition, for the default algo (GBM), AdaBoost may choose to set the GBM parameters to certain values. Let's call these AdaBoost defaults;
  3. for each weak learner, the algo will set default parameter values. Let's call these algorithm-specific defaults.

So, in terms of priorities, we will always use user-specified values for the parameters (AdaBoost or algorithm-specific) as long as they are valid.

Then, for parameters that are not specified by users, we will first set them to the AdaBoost defaults.

Next, for parameters that are not specified by users or AdaBoost, we will set them to the algorithm-specific defaults.

I think this is what the conversation chain converges to. Please let me know if you think otherwise. Thanks, W

Collaborator Author:

In a private conversation, @wendycwong agreed that it should be the AdaBoost defaults or whatever the user specifies.

E.g.:

  • H2OAdaBoostEstimator(nlearners=2) -> weak learner DRF with the configuration H2ORandomForestEstimator(ntrees=1, mtries=1, min_rows=1, sample_rate=1, max_depth=1)
  • H2OAdaBoostEstimator(nlearners=2, weak_learner_params="{'ntrees':1,'max_depth':3}") -> weak learner DRF configured the same as if you called H2ORandomForestEstimator(ntrees=1, max_depth=3)

The reason is that the AdaBoost default settings follow only a convention (my idea of a weak learner); when users want to customize, they can do so without worrying about our default choices and focus only on the weak_learner.
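The agreed "AdaBoost defaults OR user-specified" semantics can be sketched in a few lines of Python (helper and constant names are hypothetical; the real logic lives in getDRFWeakLearner in the Java diff above):

```python
import json

# Custom parameters replace the AdaBoost defaults wholesale: any key the
# user omits falls back to the plain DRF algorithm defaults, NOT to these
# stump-style values.
ADABOOST_DRF_DEFAULTS = {"mtries": 1, "min_rows": 1, "ntrees": 1,
                         "sample_rate": 1, "max_depth": 1}

def resolve_drf_params(weak_learner_params: str) -> dict:
    if weak_learner_params:
        return json.loads(weak_learner_params)  # user config wins wholesale
    return dict(ADABOOST_DRF_DEFAULTS)          # AdaBoost's conventional stump

print(resolve_drf_params('{"ntrees": 1, "max_depth": 3}'))
```

With an empty string the stump defaults are returned; with `'{"ntrees": 1, "max_depth": 3}'` only those two keys are set explicitly, matching the H2ORandomForestEstimator(ntrees=1, max_depth=3) behavior described above.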

parms._min_rows = 1;
parms._ntrees = 1;
parms._sample_rate = 1;
parms._max_depth = 1;
}
return new DRF(parms);
}

private GLM getGLMWeakLearner(Frame frame) {
GLMModel.GLMParameters parms = new GLMModel.GLMParameters();
GLMModel.GLMParameters parms = useCustomWeakLearnerParameters() ? _gsonParser.fromJson(_parms._weak_learner_params, GLMModel.GLMParameters.class) : new GLMModel.GLMParameters();
parms._train = frame._key;
parms._response_column = _parms._response_column;
parms._weights_column = _weightsName;
@@ -204,26 +236,30 @@ private GLM getGLMWeakLearner(Frame frame) {
}

private GBM getGBMWeakLearner(Frame frame) {
GBMModel.GBMParameters parms = new GBMModel.GBMParameters();
GBMModel.GBMParameters parms = useCustomWeakLearnerParameters() ? _gsonParser.fromJson(_parms._weak_learner_params, GBMModel.GBMParameters.class) : new GBMModel.GBMParameters();
parms._train = frame._key;
parms._response_column = _parms._response_column;
parms._weights_column = _weightsName;
parms._min_rows = 1;
parms._ntrees = 1;
parms._sample_rate = 1;
parms._max_depth = 1;
parms._seed = _parms._seed;
if (!useCustomWeakLearnerParameters()) {
parms._min_rows = 1;
parms._ntrees = 1;
parms._sample_rate = 1;
parms._max_depth = 1;
parms._seed = _parms._seed;
}
return new GBM(parms);
}

private DeepLearning getDeepLearningWeakLearner(Frame frame) {
DeepLearningModel.DeepLearningParameters parms = new DeepLearningModel.DeepLearningParameters();
DeepLearningModel.DeepLearningParameters parms = useCustomWeakLearnerParameters() ? _gsonParser.fromJson(_parms._weak_learner_params, DeepLearningModel.DeepLearningParameters.class) :new DeepLearningModel.DeepLearningParameters();
parms._train = frame._key;
parms._response_column = _parms._response_column;
parms._weights_column = _weightsName;
parms._seed = _parms._seed;
parms._epochs = 10;
parms._hidden = new int[]{2};
if (!useCustomWeakLearnerParameters()) {
parms._epochs = 10;
parms._hidden = new int[]{2};
}
return new DeepLearning(parms);
}

6 changes: 6 additions & 0 deletions h2o-algos/src/main/java/hex/adaboost/AdaBoostModel.java
@@ -103,6 +103,11 @@ public static class AdaBoostParameters extends Model.Parameters {
*/
public double _learn_rate;

/**
* Custom _weak_learner parameters.
*/
public String _weak_learner_params;

@Override
public String algoName() {
return "AdaBoost";
@@ -128,6 +133,7 @@ public AdaBoostParameters() {
_nlearners = 50;
_weak_learner = Algorithm.AUTO;
_learn_rate = 0.5;
_weak_learner_params = "";
}
}
}
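The PrecedingUnderscoreNamingStrategy introduced in AdaBoost.java is what lets JSON keys like `ntrees` map onto H2O's underscore-prefixed Java fields like `_ntrees`. A rough Python sketch of the translation (function name hypothetical; the Java side delegates the camel-case handling to Gson's LOWER_CASE_WITH_UNDERSCORES policy):

```python
import re

# Sketch of the field-name translation: convert the Java field name to
# lower_case_with_underscores, then strip the leading underscore so that
# `_max_depth` matches the JSON key `max_depth`.
def translate_name(java_field: str) -> str:
    snake = re.sub(r'(?<!^)(?=[A-Z])', '_', java_field).lower()
    return snake[1:] if snake.startswith('_') else snake

print(translate_name("_max_depth"))  # -> max_depth
```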