Chapter 5: Writing Modules
This chapter presents the module interfaces and explains how to write custom modules for ASK.
Each kind of module has a strictly defined interface. For instance, every source module expects one configuration file, one input file, and one output file. This design makes it possible to use any implementation that adheres to the source interface as a source module. This section describes the interface for each kind of module.
Unless noted otherwise, every file used by the modules below must be in ASK's data exchange format.
A Bootstrap module selects the initial batch of points measured by ASK.
Interface: bootstrap <configuration> <output_file>
Input:
- configuration: ASK configuration file
Output:
- output_file: path of the output file. The output file contains a list of requested factors to be measured; therefore, it has no response column.
A Source module performs the actual measurements for the requested factors and returns the responses.
Interface: source <configuration> <requested_file> <output_file>
Input:
- configuration: ASK configuration file
- requested_file: path of the requested file containing the factors that need to be measured
Output:
- output_file: path of the output file. The output must include the factors passed in the requested file, with an additional column containing the measured response for each factor combination.
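As an illustration of the source interface, the following minimal sketch reads the requested file, computes a response for each row, and writes the labelled rows. It assumes the layout used by the grid example later in this chapter (one point per line, whitespace-separated columns). The `measure` function is a hypothetical stand-in for a real experiment, and the configuration argument is accepted but not read.

```python
import argparse


def measure(factors):
    # Hypothetical stand-in for a real measurement: sum of squared factors.
    return sum(x * x for x in factors)


def label_points(requested_path, output_path):
    # Assumption: one point per line, whitespace-separated numeric factors.
    with open(requested_path) as requested, open(output_path, "w") as out:
        for line in requested:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            response = measure([float(v) for v in fields])
            # Echo the factors unchanged and append the response column.
            out.write(" ".join(fields + [str(response)]) + "\n")


def main(argv=None):
    parser = argparse.ArgumentParser(description="Toy source module")
    parser.add_argument("configuration")  # unused in this sketch
    parser.add_argument("requested_file")
    parser.add_argument("output_file")
    args = parser.parse_args(argv)
    label_points(args.requested_file, args.output_file)
    return 0
```

A real module would call `main()` from the usual `if __name__ == "__main__":` guard and replace `measure` with code that runs the experiment.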
A Model module builds a surrogate model of the experiment from the sampled points: a function that predicts the response for every factor combination. The model is used to predict the unknown points, that is, the ones that have not been measured. Model modules are special in that their output is module dependent. For instance, the GBM model produces Generalized Boosted Models, and the TGP model produces Treed Gaussian Process models. All the model modules distributed with ASK come in two parts:
- model_build, which builds the model
- model_predict, which uses the model to predict unknown points
Interface: model_build <configuration> <labelled_file> <output_file>
Input:
- configuration: ASK configuration file
- labelled_file: path of the labelled file containing a list of measured points. It therefore contains factor columns and a response column.
Output:
- output_file: path of the output file, which contains the built model. Its format depends on the module and may or may not be ASK's data exchange format.
Interface: model_predict <configuration> <model_file> <requested_file> <output_file>
Input:
- configuration: ASK configuration file
- model_file: path of the model file produced by the associated model_build command
- requested_file: path of the requested file containing a list of points to predict; it therefore has no response column
Output:
- output_file: path of the output file. The output file contains the same points passed in the requested_file with a response column filled with the model’s predictions.
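To make the build/predict split concrete, here is a minimal sketch of a model pair. It is not one of the models shipped with ASK: it swaps in a trivial 1-nearest-neighbour predictor and stores the model as JSON, which is an arbitrary choice since the model file format is module dependent. The whitespace-separated file layout is assumed, and the configuration argument is left out for brevity.

```python
import json


def model_build(labelled_path, model_path):
    # The "model" is simply the labelled points (1-nearest-neighbour).
    points, responses = [], []
    with open(labelled_path) as handle:
        for line in handle:
            values = [float(v) for v in line.split()]
            if values:
                points.append(values[:-1])    # factor columns
                responses.append(values[-1])  # response column
    with open(model_path, "w") as out:
        json.dump({"points": points, "responses": responses}, out)


def model_predict(model_path, requested_path, output_path):
    with open(model_path) as handle:
        model = json.load(handle)
    with open(requested_path) as requested, open(output_path, "w") as out:
        for line in requested:
            fields = line.split()
            if not fields:
                continue
            point = [float(v) for v in fields]
            # Squared Euclidean distance to every training point.
            dists = [sum((a - b) ** 2 for a, b in zip(point, p))
                     for p in model["points"]]
            nearest = dists.index(min(dists))
            # Same factors as requested, plus the predicted response.
            prediction = str(model["responses"][nearest])
            out.write(" ".join(fields + [prediction]) + "\n")
```

The only contract between the two functions is the model file, which is why ASK can treat it as opaque.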
A Sampler module selects a new batch of points to measure for every iteration after the first. Bootstrap modules are in charge of selecting the first iteration’s batch of points.
Interface: sampler <configuration> <input_file> <output_file>
Input:
- configuration: ASK configuration file
- input_file: path of the input file containing the points that have already been measured and a column with their responses
Output:
- output_file: path of the output file containing the new list of points to measure; therefore, it has no response column
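The sampler contract can be sketched with a simple random strategy. This is an illustrative assumption rather than one of ASK's samplers: it draws integer points uniformly from the design space and skips those already measured. The `factors` argument is assumed to have the shape used in the grid example below (a `range` with `min` and `max`), and `batch_size` and `seed` are hypothetical parameters.

```python
import random


def random_sampler(input_path, output_path, factors, batch_size, seed=0):
    # Points already measured: every column except the trailing response.
    with open(input_path) as handle:
        measured = {tuple(int(v) for v in line.split()[:-1])
                    for line in handle if line.strip()}

    # Size of the (integer) design space.
    space_size = 1
    for f in factors:
        space_size *= f["range"]["max"] - f["range"]["min"] + 1

    # Never ask for more new points than the design space still contains.
    wanted = min(batch_size, space_size - len(measured))

    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < wanted:
        point = tuple(rng.randint(f["range"]["min"], f["range"]["max"])
                      for f in factors)
        if point not in measured:
            chosen.add(point)

    # Factors only: the output has no response column.
    batch = sorted(chosen)
    with open(output_path, "w") as out:
        for point in batch:
            out.write(" ".join(map(str, point)) + "\n")
    return batch
```

Rejection sampling is adequate for a sketch; it becomes slow when almost all of the design space has already been measured.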
A Control module decides when the sampling process ends. Two basic strategies are included in ASK: stopping when a predefined number of points has been sampled, or stopping when the accuracy improvement stays under a given threshold for a number of iterations.
Interface: points <configuration> <labelled_file> <model_file>
Input:
- configuration: ASK configuration file
- labelled_file: path of the labelled file. It contains the points that have already been measured and therefore has a response column
- model_file: path of the model file generated by the model module
Output:
- Control modules should return exit code 254 to stop the experiment, or any other code to continue it.
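The exit-code convention can be sketched with a point-budget strategy. This is a minimal illustration, not ASK's own control module: the budget is passed as a hypothetical `max_points` argument instead of being read from the configuration file, and the labelled file is assumed to hold one point per line.

```python
import argparse

STOP_EXIT_CODE = 254  # tells ASK to stop the experiment


def count_points(labelled_path):
    # One measured point per non-empty line.
    with open(labelled_path) as handle:
        return sum(1 for line in handle if line.strip())


def control(labelled_path, max_points):
    # Stop once the budget is reached; any other code (here 0) continues.
    if count_points(labelled_path) >= max_points:
        return STOP_EXIT_CODE
    return 0


def main(argv=None, max_points=100):
    parser = argparse.ArgumentParser(description="Toy point-budget control")
    parser.add_argument("configuration")  # a real module would read the budget here
    parser.add_argument("labelled_file")
    parser.add_argument("model_file")     # unused by this strategy
    args = parser.parse_args(argv)
    return control(args.labelled_file, max_points)
```

A script built on this sketch would end with `sys.exit(main())` so that the return value becomes the process exit code.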
A Reporter module produces detailed statistics about the sampling.
Interface: reporter <configuration> <iteration> <labelled_file> <newly_labelled_file> <model_file>
Input:
- configuration: ASK configuration file
- iteration: iteration number; the bootstrap iteration is numbered 0
- labelled_file: path of the labelled file containing the points that have already been measured; it has a response column
- newly_labelled_file: path of the newly labelled file containing the points that were measured in the last iteration; it has a response column
- model_file: path of the model file generated by the model module
Output:
- None required.
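Since a reporter has no required output, it is free to print or log whatever is useful. The following sketch is a hypothetical example that summarizes one iteration in a single line; it assumes the response is the last whitespace-separated column of each row and ignores the configuration and model files.

```python
def read_responses(path):
    # Assumption: the response is the last column of each non-empty line.
    with open(path) as handle:
        return [float(line.split()[-1]) for line in handle if line.strip()]


def summarize(iteration, labelled_path, newly_labelled_path):
    all_responses = read_responses(labelled_path)
    new_responses = read_responses(newly_labelled_path)
    if all_responses:
        mean = sum(all_responses) / len(all_responses)
    else:
        mean = float("nan")
    return "iteration {0}: {1} points ({2} new), mean response {3:.3f}".format(
        iteration, len(all_responses), len(new_responses), mean)
```

For example, with two labelled points whose responses are 1 and 3, one of them new, `summarize(1, ...)` returns `iteration 1: 2 points (1 new), mean response 2.000`.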
Custom modules may be written in any language as long as they adhere to the module interfaces. As an example, the following section shows how to design a simple bootstrap module named Grid.
The grid module selects points regularly distributed in a grid. To make things simpler, it will only work with integer factors. The module is written in Python.
The grid module takes one parameter, grid_size, which is the grid size. Supposing the design space has two factors x and y varying between 0 and 10, a grid size of 2 will select points (0,0), (0,2), (0,4) ... (0,10), (2,0), (2,2) ... (10,10).
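The enumeration described above can be checked with a few lines of Python. This is only a worked example of the two-factor case, using `itertools.product` to form every (x, y) combination:

```python
from itertools import product

grid_size = 2
axis = range(0, 10 + 1, grid_size)  # 0, 2, 4, 6, 8, 10
points = list(product(axis, axis))  # every (x, y) combination

print(len(points))  # 36
print(points[:3])   # [(0, 0), (0, 2), (0, 4)]
print(points[-1])   # (10, 10)
```

Each axis contributes 6 values, so the grid holds 6 × 6 = 36 points.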
The first step is writing a parser adhering to the Bootstrap interface:
bootstrap <configuration> <output_file>
In the example, the argparse module will be used:
#!/usr/bin/env python
if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="Sample the design space in a"
                                                 " grid fashion")
    parser.add_argument("configuration")
    parser.add_argument("output_file")
    args = parser.parse_args()
After parsing the command-line arguments, the module needs to parse the configuration file. ASK already provides a library for this purpose in Python and in R.
from common.configuration import Configuration
conf = Configuration(args.configuration)
The conf object can be used to retrieve configuration values. To retrieve the grid_size parameter, one can use:
grid_size = conf("modules.bootstrap.params.grid_size")
In the above code, if the parameter is missing or is of the wrong type, the module fails and an error is logged. Fortunately, the conf object also handles type checking and default values. The above code can be replaced with:
grid_size = conf("modules.bootstrap.params.grid_size", int, 1)
Now, if the parameter is of the wrong type, an appropriate error message is reported. Moreover, if the grid_size parameter is missing, the default value 1 is used.
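To illustrate the behaviour just described, the sketch below implements a typed lookup with a default over a plain nested dictionary. It is not ASK's `Configuration` class: the `lookup` function, its signature, and the exceptions it raises are assumptions made for this example only.

```python
_MISSING = object()  # sentinel meaning "no default supplied"


def lookup(settings, path, expected_type=None, default=_MISSING):
    # Walk the nested dictionary following the dotted path.
    node = settings
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            if default is _MISSING:
                raise KeyError("missing configuration value: " + path)
            return default  # parameter absent: fall back to the default
        node = node[key]
    # Parameter present: check its type if one was requested.
    if expected_type is not None and not isinstance(node, expected_type):
        raise TypeError("{0} should be of type {1}".format(
            path, expected_type.__name__))
    return node


settings = {"modules": {"bootstrap": {"params": {"grid_size": 20}}}}
print(lookup(settings, "modules.bootstrap.params.grid_size", int, 1))  # 20
print(lookup({}, "modules.bootstrap.params.grid_size", int, 1))        # 1
```

A missing value with no default raises `KeyError`, and a value of the wrong type raises `TypeError`, which mirrors the two failure cases discussed above.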
Then the module retrieves the factors' configuration and checks that only integer factors are used:
from common.util import fatal
factors = conf("factors")
for f in factors:
    if f["type"] != "integer":
        fatal("Grid bootstrap only works with integer factors.")
Finally, pass all the input information to an appropriate function:
grid_bootstrap(args.output_file, grid_size, factors)
The previous code ensures that all the arguments are correctly parsed and strictly adheres to the Bootstrap interface. All that remains is writing the logic of the module. First, for each factor, the list of values in the grid is computed.
def grid_bootstrap(output, grid_size, factors):
    from itertools import product
    # For each factor, list the values it takes on the grid (bounds inclusive).
    coords = []
    for f in factors:
        coords.append(range(f["range"]["min"], f["range"]["max"] + 1, grid_size))
Then, the module computes a Cartesian product of the previous values and writes the coordinates in ASK's data exchange format:
    # Write one grid point per line, with space-separated coordinates.
    with open(output, "w") as out:
        for c in product(*coords):
            out.write(" ".join(map(str, c)) + "\n")
The custom module is now complete. It can be used as a drop-in replacement for an existing bootstrap module. For example, changing the bootstrap module to “grid” inside examples/face/experiment.conf with a grid_size of 20 produces the following report: