AMPL pipeline parameters (options)

The AMPL pipeline accepts many parameters and options for fitting models and making predictions. The parameters are organized into the sections below.

Training Dataset Parameters

bucket


Description:	Name of datastore bucket. Specific to LLNL datastore system.
Default:	gsk_ml

dataset_key


Description:	Datastore key (LLNL system) or file path for dataset.

dataset_name


Description:	Parameter for overriding the output files/dataset object names. Default is set within model_pipeline.

dataset_oid


Description:	OID of the model dataset inserted into the datastore. Specific to LLNL datastore system.

datastore


Description:	Boolean flag for loading the input file from the LLNL-specific datastore system, using the key given in dataset_key.
Default:	FALSE
Type:	Bool

id_col


Description:	Name of column containing compound IDs. Will default to "compound_id" if not specified
Default:	compound_id

min_compound_number


Description:	Minimum number of dataset compounds considered adequate for model training. A warning message will be issued if the dataset size is less than this.
Default:	200
Type:	int

response_cols


Description:	Name of the column(s) containing response values. Defaults to the last column if not specified. Can be input as a string of comma-separated values or as a comma-separated list (e.g. 'column1','column2'). Multitask models are generated when multiple columns are specified.
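As a minimal sketch, both accepted forms of response_cols can be normalized into a plain list of column names, and the length of that list tells you whether a multitask model will be built. `parse_response_cols` is a hypothetical helper for illustration, not part of AMPL, and the column names are placeholders.

```python
from typing import List, Sequence, Union


def parse_response_cols(value: Union[str, Sequence[str]]) -> List[str]:
    """Normalize response_cols to a list of column names (hypothetical helper)."""
    if isinstance(value, str):
        # Handles both "pIC50,pKi" and "'pIC50','pKi'".
        parts = value.split(",")
    else:
        parts = list(value)
    # Strip whitespace and any surrounding quotes, then drop empty entries.
    cols = [str(p).strip().strip("'\"") for p in parts]
    return [c for c in cols if c]


if __name__ == "__main__":
    cols = parse_response_cols("'pIC50','pKi'")
    # More than one response column means a multitask model.
    print(cols, "multitask" if len(cols) > 1 else "single task")
```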

save_results


Description:	Save model results to MongoDB. Specific to the LLNL model_tracker system.
Default:	FALSE
Type:	Bool

smiles_col


Description:	Name of column containing SMILES strings. Will default to "rdkit_smiles" if not specified
Default:	rdkit_smiles

Model Building Parameters

Autoencoders

autoencoder_bucket


Description:	Datastore bucket for the autoencoder file. Specific to LLNL datastore system. TODO: Not yet implemented
Default:	gsk_ml

autoencoder_key


Description:	Base of key for the autoencoder. TODO: Not yet implemented

autoencoder_type


Description:	Type of autoencoder being used as features. TODO: not yet implemented
Default:	molvae

mol_vae_model_file


Description:	Trained model HDF5 file path, only needed for MolVAE featurizer

Classifiers

class_name


Description:	User-specified list of class names

class_number


Description:	User specified number of classes. This is required for NN models but inferred for RF and XGBoost models.
Default:	2
Type:	int

Descriptors

descriptor_bucket


Description:	Datastore bucket for the descriptor file. Specific to LLNL datastore system.
Default:	gskdata

descriptor_key


Description:	Base of the key for the descriptor table file. Subset file names are prefixed with "subset" and suffixed with the dataset name. Specific to LLNL datastore system.

descriptor_oid


Description:	dataset_oid for the descriptor file in the datastore. Specific to LLNL datastore system.

descriptor_spec_bucket


Description:	Bucket where descriptor specification is located for a descriptor type. Specific to LLNL datastore system.
Default:	public

descriptor_spec_key


Description:	Datastore key or file path for a table specifying descriptor columns for each descriptor type. Specific to LLNL datastore system.
Default:	descriptor_sets_sources_by_descr_type.csv

descriptor_type


Description:	Type of descriptors used as features (e.g. moe, dragon7) when featurizer = "computed_descriptors". Sets the subclass within featurizer.py
Default:	moe
Options:	'moe', 'mordred_filtered', and 'rdkit_raw' are recommended. See atomsci/ddm/data/descriptor_sets_sources_by_descr_type.csv for more.

ECFP

ecfp_radius


Description:	Radius used for ECFP generation
Default:	2
Type:	int

ecfp_size


Description:	Size of ECFP bit vectors
Default:	1024
Type:	int

General

featurizer


Description:	Type of featurizer to use on chemical structures. Currently supported options: ["ecfp","graphconv","molvae","computed_descriptors","descriptors"]. Further information on descriptors is given under descriptor_type. The option sets the featurization subclass in the create_featurization method of featurization.py. Can be input as a comma-separated list for hyperparameter search (e.g. 'ecfp','molvae')
Type:	str

model_choice_score_type


Description:	Type of score function used to choose best epoch and/or hyperparameters (defaults to "roc_auc" for classification and "r2" for regression).

model_type


Description:	Type of model to fit (NN, RF, or xgboost). The model_type sets the model subclass in model_wrapper. Can be input as a comma separated list for hyperparameter search (e.g. 'NN','RF','xgboost')
Type:	str

prediction_type


Description:	Sets the prediction type of the model to a choice between ["regression","classification"]. Used as a flag for model behavior throughout the pipeline.
Default:	regression
Type:	choice

previously_featurized


Description:	Boolean flag for loading in previously featurized data files. If set to True, the method get_featurized_data within model_datasets will attempt to load the featurized dataset associated with the given dataset_oid parameter
Default:	TRUE
Type:	Bool

uncertainty


Description:	Boolean flag for computing uncertainty estimates for regression model predictions. Will also change the default values for dropouts if set to True.
Default:	TRUE
Type:	Bool

verbose


Description:	True/False flag for setting verbosity
Default:	FALSE
Type:	Bool

production


Description:	True/False flag for training models in production mode. The entire dataset is used for training, validation, and testing. For models trained over epochs, training runs for max_epochs regardless of validation error.
Default:	FALSE
Type:	Bool

Graph Convolution

optimizer_type


Description:	Optimizer specific for graph conv, defaults to "adam"
Default:	adam

Mordred

mordred_cpus


Description:	Maximum number of CPUs to use for Mordred descriptor computations. If None, all available CPUs are used.
Type:	int

Neural Networks

baseline_epoch


Description:	Baseline epoch at which to evaluate performance for DNN models
Default:	30
Type:	int

batch_size


Description:	Sets the model batch size within model_wrapper
Default:	50
Type:	int

bias_init_consts


Description:	Comma-separated list of initial bias values per layer for dense NN models. Must be the same length as layer_sizes. Can be input as a space-separated list of comma-separated lists for hyperparameter search (e.g. '1.0,1.0 0.9,0.9 0.8,0.9'). Default behavior is set within the __init__ method of DCNNModelWrapper. Defaults: all: [1.0]*len(layer_sizes), e.g. [1.0,1.0]

dropouts


Description:	Comma-separated list of dropout rates per layer for NN models, with default values conditional on featurizer. Must be the same length as layer_sizes. Can be input as a space-separated list of comma-separated lists for hyperparameter search (e.g. '0.4,0.4 0.2,0.2 0.3,0.3'). Default behavior is set within the __init__ method of DCNNModelWrapper in model_wrapper.py. Defaults: graphconv: [0,0,0], non-graphconv: [0.40,0.40]
Type:	list

layer_sizes


Description:	Comma-separated list of layer sizes for NN models, with default values conditional on featurizer. The number of entries sets the number of layers; dropouts, bias_init_consts, and weight_init_stddevs must have the same length. Can be input as a space-separated list of comma-separated lists for hyperparameter search (e.g. '64,16 200,100 1000,500'). Default behavior is set within the __init__ method of DCNNModelWrapper. Defaults: graphconv: [64,64,128], ecfp: [1000,500], descriptors: [200,100]
Type:	list
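The "space-separated list of comma-separated lists" format shared by layer_sizes, dropouts, bias_init_consts, and weight_init_stddevs can be read as: each space-separated group is one candidate setting, and each comma-separated entry in a group is the value for one layer. A minimal sketch of parsing that format and checking the equal-length rule follows; `parse_layer_param` and `check_lengths` are hypothetical helpers, not AMPL functions, and the check conservatively assumes any dropout candidate may be paired with any layer_sizes candidate.

```python
from typing import Callable, List


def parse_layer_param(value: str, cast: Callable = float) -> List[list]:
    """Parse '64,16 200,100' into [[64, 16], [200, 100]] (hypothetical helper)."""
    return [[cast(x) for x in group.split(",")] for group in value.split()]


def check_lengths(sizes: list, candidates: List[list], name: str) -> None:
    """Raise if any candidate does not have one entry per layer of `sizes`."""
    bad = [vals for vals in candidates if len(vals) != len(sizes)]
    if bad:
        raise ValueError(f"{name} {bad} do not match layer_sizes {sizes}")


if __name__ == "__main__":
    layer_sizes = parse_layer_param("64,16 200,100 1000,500", int)
    dropouts = parse_layer_param("0.4,0.4 0.2,0.2 0.3,0.3")
    for sizes in layer_sizes:
        check_lengths(sizes, dropouts, "dropouts")
    print(layer_sizes, dropouts)
```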

learning_rate


Description:	Learning rate for dense NN models. Can be input as comma-separated floats for hyperparameter search (e.g. '0.0005,0.0004,0.0003')
Default:	0.0005

max_epochs


Description:	Maximum number of training epochs to run for DNN models. Default 30.
Default:	30
Type:	int

weight_decay_penalty


Description:	Magnitude of the weight decay penalty (float). Can be input as a comma-separated list for hyperparameter search (e.g. '0.0001,0.0002,0.0003').
Default:	0.0001

weight_decay_penalty_type


Description:	Type of penalty to use for weight decay, either "l1" or "l2". Can be input as a comma-separated list for hyperparameter search (e.g. 'l1,l2').
Default:	l2
Type:	str

weight_init_stddevs


Description:	Comma-separated list of standard deviations per layer for initializing weights in dense NN models. Must be the same length as layer_sizes. Can be input as a space-separated list of comma-separated lists for hyperparameter search (e.g. '0.001,0.001 0.002,0.002 0.03,0.03'). Default behavior is set within the __init__ method of DCNNModelWrapper. Defaults: all: [0.02,0.02]
Default:	[0.02]*len(layer_sizes)

Random Forests

rf_estimators


Description:	Number of estimators to use in random forest models. Hyperparameter searching requires 3 inputs: start, end, step when used with search_type geometric or grid (example: '100,500,100') or can be input as a list of possible values for search_type user_specified (example: '100,200,300,400,500')
Default:	500

rf_max_depth


Description:	The maximum depth of a decision tree in the random forest. Hyperparameter searching requires 3 inputs: start, end, step when used with search_type geometric or grid (example: '4,7,1') or can be input as a list of possible values for search_type user_specified (example: '4,5,6,7')

rf_max_features


Description:	Maximum number of features considered when splitting random forest nodes. Hyperparameter searching requires 3 inputs: start, end, step when used with search_type geometric or grid (example: '16,32,4') or can be input as a list of possible values for search_type user_specified (example: '16,20,24,28,32')
Default:	32
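The paired examples in the random forest entries ('100,500,100' for grid versus '100,200,300,400,500' for user_specified) suggest that a grid specification expands to an inclusive range. A minimal sketch of that expansion, assuming the end value is inclusive; `expand_rf_values` is a hypothetical helper, and the geometric search type is deliberately not modeled because its spacing rule is not described here.

```python
from typing import List


def expand_rf_values(value: str, search_type: str) -> List[int]:
    """Expand an RF hyperparameter string into candidate values (hypothetical helper)."""
    nums = [int(v) for v in value.split(",")]
    if search_type == "user_specified":
        # The values are used as given.
        return nums
    if search_type == "grid":
        if len(nums) != 3 or nums[2] <= 0:
            raise ValueError("grid search expects 'start,end,step' with step > 0")
        start, end, step = nums
        # Assumes the end value is inclusive, matching the paired examples.
        return list(range(start, end + 1, step))
    # geometric and other search types are not sketched here.
    raise NotImplementedError(search_type)


if __name__ == "__main__":
    print(expand_rf_values("100,500,100", "grid"))
```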

Hybrid model

is_ki


Description:	True/False flag indicating whether the dose-response activity is Ki or XC50. If True, ki_convert_ratio is also needed to convert Ki into IC50 and into single-concentration activity.
Default:	False

ki_convert_ratio


Description:	Ratio needed to convert Ki into IC50. For enzymatic inhibition assays it can be [S]/Km, where [S] is the substrate concentration and Km is the Michaelis constant. For radioligand competitive binding assays it can be [S]/Kd, where [S] is the radioligand concentration and Kd is its dissociation constant. [S] and Km/Kd must be in the same units so that the ratio is unitless.
Default:	None
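The standard relation that uses such a ratio is the Cheng-Prusoff equation for a competitive inhibitor, IC50 = Ki * (1 + [S]/Km), or equivalently pIC50 = pKi - log10(1 + [S]/Km) on a -log10 scale. The sketch below shows that arithmetic only; whether the hybrid model applies exactly this form internally is an assumption, and the function names are hypothetical.

```python
import math
from typing import Optional


def ki_to_ic50(ki: float, ki_convert_ratio: Optional[float]) -> float:
    """Cheng-Prusoff for a competitive inhibitor: IC50 = Ki * (1 + [S]/Km).

    ki_convert_ratio is the unitless [S]/Km (or [S]/Kd) ratio; the result is in
    the same concentration units as ki.
    """
    if ki_convert_ratio is None:
        raise ValueError("ki_convert_ratio is required when is_ki is True")
    return ki * (1.0 + ki_convert_ratio)


def pki_to_pic50(pki: float, ki_convert_ratio: float) -> float:
    """Same relation on a -log10 scale: pIC50 = pKi - log10(1 + [S]/Km)."""
    return pki - math.log10(1.0 + ki_convert_ratio)


if __name__ == "__main__":
    # A substrate concentration equal to Km doubles the apparent IC50.
    print(ki_to_ic50(10.0, 1.0))
```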

loss_func


Description:	The loss function used in hybrid model training; currently supports poisson and l2
Default:	poisson

Splitting

base_splitter


Description:	Type of splitter to use for the train/validation split when a temporal split is used for the test set. May be random, scaffold, or ave_min. The allowable choices are set in splitter.py
Default:	scaffold
Type:	str

butina_cutoff


Description:	Cutoff Tanimoto similarity for clustering in the Butina splitter. TODO: will be implemented when DeepChem updates their Butina splitter. TODO: rename to butina_cutoff in v2
Default:	0.18
Type:	float

cutoff_date


Description:	Cutoff date for test set compounds in the temporal splitter. TODO: Needs some formatting guidelines
Type:	str

date_col


Description:	Column in dataset containing dates for temporal splitter
Type:	str

num_folds


Description:	Number of k-folds to use in k-fold cross validation
Default:	5
Type:	int

previously_split


Description:	Boolean flag for loading in previously split train, validation, and test csv files.
Default:	FALSE
Type:	bool

split_strategy


Description:	Choice of splitting type between "k_fold_cv" for k fold cross validation and "train_valid_test" for a normal train/valid/test split. If split_test_frac or split_valid_frac are not set, "train_valid_test" sets are split according to the model type default
Default:	train_valid_test
Type:	Choice

split_test_frac


Description:	Fraction of data to put in held-out test set for train_valid_test split strategy. TODO: Behavior of split_test_frac is dependent on the DeepChem model_wrapper.
Default:	0.1
Type:	float

split_uuid


Description:	UUID for csv file containing train, validation, and test split information

split_valid_frac


Description:	Fraction of data to put in validation set for train_valid_test split strategy. TODO: Behavior of split_valid_frac is dependent on the DeepChem model_wrapper.
Default:	0.1
Type:	float

splitter


Description:	Type of splitter to use: index, random, scaffold, butina, ave_min, temporal, fingerprint, multitaskscaffold, or stratified. Used to set the splitting.py subclass. Can be input as a comma separated list for hyperparameter search (e.g. 'scaffold','random')
Default:	scaffold
Type:	str

mtss_num_super_scaffolds


Description:	Number of genes in a chromosome for the genetic algorithm. Scaffold bins are often very small and may contain only one compound, so scaffolds are combined into this number of super scaffolds, which also reduces complexity and runtime.
Default:	40
Type:	int

mtss_num_generations


Description:	The number of generations the genetic algorithm will run.
Default:	20
Type:	int

mtss_num_pop


Description:	Size of population per generation in the genetic algorithm.
Default:	100
Type:	int

mtss_train_test_dist_weight


Description:	How much weight to give the Tanimoto distance between training and test partitions.
Default:	1.0
Type:	float

mtss_train_valid_dist_weight


Description:	How much weight to give the Tanimoto distance between training and validation partitions.
Default:	1.0
Type:	float

mtss_response_distr_weight


Description:	How much weight to give to matching the response value distributions between split subsets.
Default:	1.0
Type:	float

mtss_split_fraction_weight


Description:	How much weight to give to adherence to the requested subset fractions.
Default:	1.0
Type:	float
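One way to picture how the four mtss_*_weight parameters interact is as coefficients in a weighted sum of per-criterion scores for a candidate split. The sketch below assumes exactly that: the combination rule, the component names, and the idea that each score is already computed are all assumptions for illustration, not a description of the splitter's internals. Setting a weight to 0.0 removes that criterion from the objective.

```python
from typing import Dict, Optional

# Defaults mirror the mtss_*_weight defaults listed in this section.
DEFAULT_WEIGHTS = {
    "train_test_dist": 1.0,   # mtss_train_test_dist_weight
    "train_valid_dist": 1.0,  # mtss_train_valid_dist_weight
    "response_distr": 1.0,    # mtss_response_distr_weight
    "split_fraction": 1.0,    # mtss_split_fraction_weight
}


def split_fitness(scores: Dict[str, float],
                  weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of per-criterion scores for one candidate split (hypothetical)."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    return sum(w * scores[name] for name, w in weights.items())


if __name__ == "__main__":
    scores = {name: 0.5 for name in DEFAULT_WEIGHTS}
    print(split_fitness(scores))
```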

Transformers

feature_transform_type


Description:	Type of transformation for the features
Default:	normalization
Type:	Choice

response_transform_type


Description:	type of transformation for the response column (defaults to "normalization") TODO: Not currently implemented
Default:	normalization

transformer_bucket


Description:	Datastore bucket where the transformer is stored. Specific to LLNL datastore system.
Default:	gsk_ml

transformer_key


Description:	Path to a saved transformer (stored as tuple, e.g. (transform_features, transform_response))
Type:	str

transformer_oid


Description:	Dataset oid of the transformer saved in the datastore. Specific to LLNL datastore system.

transformers


Description:	Boolean switch for using transformation on regression output. Default is True
Default:	TRUE
Type:	Bool
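A minimal sketch of what a "normalization" transform on regression responses typically means: fit the mean and standard deviation on the training responses, train on standardized values, and map predictions back to the original scale. Treating "normalization" as z-scoring with a population standard deviation is an assumption (it matches DeepChem's NormalizationTransformer), and the helper names are hypothetical.

```python
import statistics
from typing import List, Tuple


def fit_normalization(train_values: List[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of the training responses."""
    mean = statistics.mean(train_values)
    std = statistics.pstdev(train_values)
    # Guard against a constant response column.
    return mean, (std if std > 0 else 1.0)


def normalize(values: List[float], mean: float, std: float) -> List[float]:
    """Map responses to zero mean and unit variance (relative to the training set)."""
    return [(v - mean) / std for v in values]


def denormalize(values: List[float], mean: float, std: float) -> List[float]:
    """Map model outputs back to the original response scale."""
    return [v * std + mean for v in values]


if __name__ == "__main__":
    mean, std = fit_normalization([5.1, 6.3, 7.2])
    print(denormalize(normalize([6.0], mean, std), mean, std))
```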

UMAP

umap_dim


Description:	Dimension of projected feature space, if UMAP transformation is requested. Can be input as a comma separated list for hyperparameter search (e.g. '2,6,10').
Default:	10

umap_metric


Description:	Distance metric used, if UMAP transformation is requested. Can be input as a comma separated list for hyperparameter search (e.g. 'euclidean','cityblock')
Default:	euclidean

umap_min_dist


Description:	Minimum distance used in UMAP projection, if UMAP transformation is requested. Can be input as a comma separated list for hyperparameter search (e.g. '0.01,0.02,0.05')
Default:	0.05

umap_neighbors


Description:	Number of nearest neighbors used in UMAP projection, if UMAP transformation is requested. Can be input as a comma separated list for hyperparameter search (e.g. '10,20,30')
Default:	20

umap_targ_wt


Description:	Weight given to training set response values in UMAP projection, if UMAP transformation is requested. Can be input as a comma separated list for hyperparameter search (e.g. '0.0,0.1,0.2')
Default:	0.0

XGBoost

xgb_colsample_bytree


Description:	Subsample ratio of columns when constructing each tree. Can be input as a comma separated list for hyperparameter search (e.g. '0.8,0.9,1.0')
Default:	1.0

xgb_gamma


Description:	Minimum loss reduction required to make a further partition on a leaf node of the tree. Can be input as a comma separated list for hyperparameter search (e.g. '0.0,0.1,0.2')
Default:	0.0

xgb_learning_rate


Description:	Boosting learning rate (xgboost's "eta"). Can be input as a comma separated list for hyperparameter search (e.g. '0.1,0.01,0.001')
Default:	0.1

xgb_max_depth


Description:	Maximum tree depth for base learners. Can be input as a comma separated list for hyperparameter search (e.g. '4,5,6')
Default:	6

xgb_min_child_weight


Description:	Minimum sum of instance weight (hessian) needed in a child. Can be input as a comma separated list for hyperparameter search (e.g. '1.0,1.1,1.2')
Default:	1.0

xgb_n_estimators


Description:	Number of estimators to use in xgboost models. Can be input as a comma separated list for hyperparameter search (e.g. '100,200,300')
Default:	100

xgb_subsample


Description:	Subsample ratio of the training instance. Can be input as a comma separated list for hyperparameter search (e.g. '0.8,0.9,1.0')
Default:	1.0
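The xgb_* parameters correspond to keyword arguments of xgboost's scikit-learn wrapper (XGBRegressor / XGBClassifier), where xgb_learning_rate maps to learning_rate (xgboost's "eta"). The sketch below merges user settings over the defaults listed in this section and renames them; how AMPL itself forwards these values is an assumption, and `to_xgb_kwargs` is a hypothetical helper.

```python
from typing import Any, Dict

# AMPL parameter name -> keyword argument of xgboost's scikit-learn wrapper.
AMPL_TO_XGB = {
    "xgb_colsample_bytree": "colsample_bytree",
    "xgb_gamma": "gamma",
    "xgb_learning_rate": "learning_rate",
    "xgb_max_depth": "max_depth",
    "xgb_min_child_weight": "min_child_weight",
    "xgb_n_estimators": "n_estimators",
    "xgb_subsample": "subsample",
}

# Defaults as listed in the XGBoost section.
XGB_DEFAULTS = {
    "xgb_colsample_bytree": 1.0,
    "xgb_gamma": 0.0,
    "xgb_learning_rate": 0.1,
    "xgb_max_depth": 6,
    "xgb_min_child_weight": 1.0,
    "xgb_n_estimators": 100,
    "xgb_subsample": 1.0,
}


def to_xgb_kwargs(params: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user settings over the defaults and rename them; unrelated keys are ignored."""
    merged = {**XGB_DEFAULTS, **params}
    return {AMPL_TO_XGB[k]: v for k, v in merged.items() if k in AMPL_TO_XGB}


# Usage (requires the xgboost package):
#   import xgboost
#   model = xgboost.XGBRegressor(**to_xgb_kwargs({"xgb_max_depth": 4}))
```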

Additional DeepChem Models and Featurizers

As of version 1.3, AMPL partially supports several DeepChem models. These models can be trained and used for prediction, but they are not currently integrated with the hyperparameter search wrapper.

Models

AMPL supports the following models:

AttentiveFPModel
GCNModel
GraphConvModel
MPNNModel
PytorchMPNNModel

These models can be selected with the model_type parameter, e.g. "model_type":"AttentiveFPModel". Parameters for each model are passed in by prefixing the parameter name with the name of the model.

    "comment": "Model",
    "comment": "----------------------------------------",
    "model_type": "AttentiveFPModel",
    "AttentiveFPModel_num_layers":"3",
    "AttentiveFPModel_learning_rate": "0.0007",
    "AttentiveFPModel_n_tasks": "1",
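The prefix convention can be illustrated by collecting every `<model_type>_<name>` entry into a dictionary of model arguments. This is a minimal sketch with a hypothetical helper (`extract_prefixed`), not AMPL's own code; the values stay strings because how AMPL converts them to numbers or booleans is not described here.

```python
from typing import Any, Dict


def extract_prefixed(params: Dict[str, Any], prefix: str) -> Dict[str, Any]:
    """Collect '<prefix>_<name>' entries as {name: value} (hypothetical helper)."""
    tag = prefix + "_"
    return {key[len(tag):]: value for key, value in params.items() if key.startswith(tag)}


config = {
    "model_type": "AttentiveFPModel",
    "AttentiveFPModel_num_layers": "3",
    "AttentiveFPModel_learning_rate": "0.0007",
    "AttentiveFPModel_n_tasks": "1",
}

if __name__ == "__main__":
    print(extract_prefixed(config, config["model_type"]))
```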

Featurizers

AMPL supports the following DeepChem featurizers:

MolGraphConvFeaturizer
WeaveFeaturizer
ConvMolFeaturizer

Each DeepChem model expects a specific featurizer. Model/Featurizer compatibility is listed in this table. Featurizers can be specified by setting the featurizer parameter. Featurizer parameters are passed in the same way as model parameters.

    "comment": "Features",
    "comment": "----------------------------------------",
    "featurizer":"MolGraphConvFeaturizer",
    "MolGraphConvFeaturizer_use_edges":"True",

Model Saving

collection_name


Description:	MongoDB collection to save model results in. Specific to LLNL model tracker system.
Default:	model_tracker

data_owner


Description:	Option for setting group permissions for created files. Options: ['username', 'data_owner_group', 'gsk', 'public']. Specific to LLNL model tracker system.
Default:	gsk

data_owner_group


Description:	When data_owner is set to data_owner_group, this is the option for custom group name of created files. Specific to LLNL model tracker system.
Default:	gskcraa

model_bucket


Description:	Bucket in the datastore for the model. Specific to LLNL model tracker system.
Default:	gsk_ml
Type:	str

model_dataset_oid


Description:	OID of the model dataset inserted into the datastore. Specific to LLNL model tracker system

model_filter


Description:	Path to the model filter configuration file. Is loaded and stored as a dictionary. Specific to LLNL model tracker system.

model_uuid


Description:	UUID generated after model creation (pythonic_ID). Specific to LLNL model tracker system.
Type:	str

output_dir


Description:	File location where the model output will be saved. Defaults to <result_dir>/. TODO: this parameter is redundant with result_dir

result_dir


Description:	Parent of directory where result files will be written, defaults to '/usr/local/data'
Default:	/usr/local/data/

Model Metadata

system


Description:	Computational system you are running on, LC or twintron-blue. LLNL system specific
Default:	twintron-blue
Type:	str

Miscellaneous

config_file


Description:	Full path to the optional configuration file. The configuration file is a set of parameters in .json file format. TODO: Does not send a warning if set concurrently with other parameters.
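A configuration file is simply a JSON object whose keys are the parameter names in this document. The sketch below writes one with Python's json module, using parameter names from this page; the dataset path, response column name, and result directory are hypothetical placeholders, and the string-valued style follows the JSON fragments shown for the DeepChem models.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical single-model configuration; paths and column names are placeholders.
config = {
    "dataset_key": "data/my_dataset.csv",
    "id_col": "compound_id",
    "smiles_col": "rdkit_smiles",
    "response_cols": "pIC50",
    "prediction_type": "regression",
    "featurizer": "ecfp",
    "model_type": "RF",
    "splitter": "scaffold",
    "split_strategy": "train_valid_test",
    "result_dir": "output/",
}


def write_config(path, params) -> Path:
    """Write params as an indented JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(params, indent=4))
    return path


if __name__ == "__main__":
    # Point config_file at the written path.
    print(write_config(Path(tempfile.mkdtemp()) / "config.json", config))
```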

num_model_tasks


Description:	DEPRECATED AND IGNORED. This argument is now inferred from response_cols. Number of tasks: 1 means a single-task model, > 1 means a multitask model.
Default:	1
Type:	int

Hyperparameter Optimization

dropout_list


Description:	Comma-separated list of dropout rates for permutation of NN layers (e.g. '0.0,0.4,0.6'). Used within permutate_NNlayer_combo_params to return combinations from layer_nums, node_nums, dropout_list and max_final_layer_size. dropout_list is used to set the allowable permutations of dropouts. For hyperparameters only.

hyperparam


Description:	Boolean flag to indicate whether we are running the hyperparameter search script
Default:	FALSE

hyperparam_uuid


Description:	UUID of hyperparam search run model was generated in. Not applicable for single-run jobs. Specific to LLNL model tracker system.

layer_nums


Description:	Comma-separated list of number of layers for permutation of NN layers. (e.g. '2,3,4'). Used within permutate_NNlayer_combo_params to return combinations from layer_nums, node_nums, dropout_list and max_final_layer_size. layer_nums is used to set the allowable lengths of layer_sizes. For hyperparameters only.

lc_account


Description:	SLURM account to charge hyperparameter batch runs to. This will be replaced by the slurm_account option. If lc_account and slurm_account are both set, slurm_account will be used. If set to None then this parameter will not be used.
Default:	baasic

max_final_layer_size


Description:	Maximum number of nodes in the last layer within layer_sizes and dropouts in hyperparameter search. If min(node_nums) > max_final_layer_size, max_final_layer_size is set to min(node_nums). (e.g. '16,32'). Used within permutate_NNlayer_combo_params to return combinations from layer_nums, node_nums, dropout_list and max_final_layer_size.
Default:	32

node_nums


Description:	Comma-separated list of number of nodes per layer for permutation of NN layers. (e.g. '4,8,16'). Used within permutate_NNlayer_combo_params to return combinations from layer_nums, node_nums, dropout_list and max_final_layer_size. node_nums is used to set the node values within layer_sizes. For hyperparameters only.
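To make the interplay of layer_nums, node_nums, dropout_list, and max_final_layer_size concrete, here is a minimal sketch that enumerates candidate (layer_sizes, dropouts) pairs: each layer count takes every combination of node counts and dropout rates, and the final layer is capped by max_final_layer_size (raised to min(node_nums) when needed, as described above). `nn_layer_combos` is hypothetical; permutate_NNlayer_combo_params may apply further constraints not shown here, so treat this as an illustration of the search-space size rather than its exact contents.

```python
import itertools
from typing import List, Tuple


def nn_layer_combos(layer_nums: List[int], node_nums: List[int],
                    dropout_list: List[float],
                    max_final_layer_size: int) -> List[Tuple[list, list]]:
    """Enumerate (layer_sizes, dropouts) candidates (hypothetical sketch)."""
    # If even the smallest node count exceeds the cap, raise the cap to it.
    cap = max(max_final_layer_size, min(node_nums))
    combos = []
    for n_layers in layer_nums:
        for sizes in itertools.product(node_nums, repeat=n_layers):
            if sizes[-1] > cap:
                continue  # final layer too large
            for drops in itertools.product(dropout_list, repeat=n_layers):
                combos.append((list(sizes), list(drops)))
    return combos


if __name__ == "__main__":
    print(len(nn_layer_combos([2, 3], [4, 8, 16], [0.0, 0.4], 8)))
```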

max_jobs


Description:	Max number of jobs to be in the queue at one time for an LC machine. Specific to LLNL system.
Default:	80
Type:	int

nn_size_scale_factor


Description:	Scaling factor for constraining network size based on number of parameters in the network for hyperparam search
Default:	1
Type:	float

python_path


Description:	Path to desired python version
Default:	The Python installation used to parse the JSON file (obtained via sys.executable)

rerun


Description:	After parameter combos have been generated, `rerun=False` will check the model tracker to see if a model with a particular param combination has already been built. If it’s been built, do not create a new model or submit a slurm job. If `rerun=True`, the check will be skipped completely and a slurm job will be submitted regardless of whether a model has previously been built with these parameters. Specific to hyperparameter search.
Default:	TRUE
Type:	Bool

script_dir


Description:	Directory containing the pipeline script from which to run the hyperparameter search
Default:	.

search_type


Description:	Type of hyperparameter search to do. Options = [grid, random, geometric, hyperopt and user_specified]
Default:	grid

shortlist_key


Description:	CSV file of assays of interest. Specific to LLNL model tracker system.

slurm_account


Description:	SLURM account to charge hyperparameter batch runs to. This will replace the lc_account option. If lc_account and slurm_account are both set, slurm_account will be used. If set to None then this parameter will not be used.
Default:	None

slurm_export


Description:	SLURM environment variables propagated for hyperparameter search batch jobs. If set to None then this parameter will not be used.
Default:	ALL

slurm_nodes


Description:	Number of nodes for hyperparameter search batch jobs. If set to None then this parameter will not be used.
Default:	1
Type:	int

slurm_options


Description:	Additional SLURM options for hyperparameter search batch jobs. Example: '--option1=value1 --option2=value2'. If set to None then this parameter will not be used.
Default:	None

slurm_partition


Description:	SLURM partition to run hyperparameter batch runs on. If set to None then this parameter will not be used.
Default:	pbatch

slurm_time_limit


Description:	Time limit in minutes for hyperparameter search batch jobs.
Default:	1440
Type:	int

split_only


Description:	Boolean flag used with model_pipeline.py to indicate splitting of the datasets when running the hyperparameter search
Default:	FALSE
Type:	bool

use_shortlist


Description:	Use a list of assays. Specific to LLNL model tracker system.
Default:	FALSE
Type:	Bool

Bayesian Optimization

Search Domain Specifications

The following parameters are used to specify the search domains for certain model parameters in a Bayesian hyperparameter optimization. Each search domain parameter is tied to a specific model parameter. Only a subset of model parameters may be optimized in this way, but more will be supported in future releases. See the hyperopt package documentation at https://github.com/hyperopt/hyperopt/wiki/FMin#2-defining-a-search-space to learn more about the search domain format.

lr


Description:	Search domain for NN model `learning_rate` parameter in Bayesian Optimization. The format is `scheme|parameters`, e.g. `choice|0.0001,0.0005,0.0002,0.001`. See https://github.com/ATOMScience-org/AMPL#hyperparameter-optimization
Default:	None

dp


Description:	Search domain for NN model `dropouts` parameter in Bayesian Optimization. The format is `scheme|num_layers|parameters`, e.g. `uniform|3|0,0.4`. Note that the number of layers (the number between the two `|` characters) cannot be changed during optimization; to try different numbers of layers, run several optimizations.
Default:	None

ls


Description:	Search domain for NN model `layer_sizes` parameter in Bayesian Optimization. The format is `scheme|num_layers|parameters`, e.g. `uniformint|3|8,512`. Note that the number of layers (the number between the two `|` characters) cannot be changed during optimization; to try different numbers of layers, run several optimizations.
Default:	None

rfe


Description:	Search domain for RF model `rf_estimators` parameter in Bayesian Optimization. The format is `scheme|parameters`, e.g. `uniformint|8,512`.
Default:	None

rfd


Description:	Search domain for RF model `rf_max_depth` parameter in Bayesian Optimization. The format is `scheme|parameters`, e.g. `uniformint|8,512`.
Default:	None

rff


Description:	Search domain for RF model `rf_max_features` parameter in Bayesian Optimization. The format is `scheme|parameters`, e.g. `uniformint|8,200`.
Default:	None

xgbg


Description:	Search domain for XGBoost model `xgb_gamma` parameter in Bayesian Optimization. The format is `scheme|parameters`, e.g. `loguniform|-9.2,-4.6`.
Default:	None

xgbl


Description:	Search domain for XGBoost model `xgb_learning_rate` parameter in Bayesian Optimization. The format is `scheme|parameters`, e.g. `loguniform|-4.6,-2.3`.
Default:	None

xgbd


Description:	Search domain for XGBoost model `xgb_max_depth` parameter in Bayesian Optimization. The format is `scheme|parameters`, e.g. `uniformint|3,10`.
Default:	None

xgbc


Description:	Search domain for XGBoost model `xgb_colsample_bytree` parameter in Bayesian Optimization. The format is `scheme|parameters`, e.g. `uniform|0.1,1.0`.
Default:	None

xgbs


Description:	Search domain for XGBoost model `xgb_subsample` parameter in Bayesian Optimization. The format is `scheme|parameters`, e.g. `uniform|0.1,1.0`.
Default:	None

xgbn


Description:	Search domain for XGBoost model `xgb_n_estimators` parameter in Bayesian Optimization. The format is `scheme|parameters`, e.g. `uniformint|200,1000`.
Default:	None

xgbw


Description:	Search domain for XGBoost model `xgb_min_child_weight` parameter in Bayesian Optimization. The format is `scheme|parameters`, e.g. `uniform|0.5,2.0`.
Default:	None
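The search domain strings have two shapes, `scheme|parameters` and `scheme|num_layers|parameters`. A minimal parsing sketch follows; `parse_search_domain` and `loguniform_bounds` are hypothetical helpers, not AMPL functions. One detail worth checking when choosing bounds: hyperopt's `hp.loguniform(label, low, high)` samples exp(uniform(low, high)), so `loguniform|-9.2,-4.6` covers roughly 1e-4 to 1e-2 rather than -9.2 to -4.6.

```python
import math
from typing import List, Optional, Tuple


def parse_search_domain(spec: str) -> Tuple[str, Optional[int], List[float]]:
    """Split 'scheme|parameters' or 'scheme|num_layers|parameters' (hypothetical helper)."""
    parts = spec.split("|")
    if len(parts) == 2:
        scheme, raw = parts
        num_layers = None
    elif len(parts) == 3:
        scheme, layers, raw = parts
        num_layers = int(layers)
    else:
        raise ValueError(f"unrecognized search domain: {spec!r}")
    values = [float(v) for v in raw.split(",")]
    return scheme, num_layers, values


def loguniform_bounds(low: float, high: float) -> Tuple[float, float]:
    """Convert natural-log bounds (as used by hyperopt's loguniform) to real values."""
    return math.exp(low), math.exp(high)


if __name__ == "__main__":
    scheme, _, (low, high) = parse_search_domain("loguniform|-9.2,-4.6")
    # Bounds of about 1e-4 and 1e-2.
    print(scheme, loguniform_bounds(low, high))
```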

Checkpointing parameters

hp_checkpoint_save


Description:	Binary file in which to save a checkpoint of the HPO trial project, which can be used to continue the HPO search later.
Default:	None

hp_checkpoint_load


Description:	Binary file from which to load a checkpoint of a previous HPO trial project, to continue the HPO search.
Default:	None
