Collect RGB images encoded as JPEG or PNG that contain the objects to be detected. Make sure the training images vary widely in angle, resolution, lighting, and background so that the model generalizes well to the test data. Use a reasonably large number of images per class for better results.
- Install Local Prerequisites
- Run the Setup Script
- Prepare Data for Training
- Customize Training
- Train the Model Using Watson Machine Learning
In this document $MODEL_REPO_HOME_DIR refers to the cloned MAX model repository directory, e.g. /users/gone_fishing/MAX-Object-Detector. The training script currently only supports the ssd_mobilenet_v1 model.
Open a terminal window, change into the $MODEL_REPO_HOME_DIR/training directory, and install the Python prerequisites. (Model training requires Python 3.6 or above.)
$ cd training/
$ pip install -r requirements.txt
...
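Because model training requires Python 3.6 or above, it can help to confirm the interpreter version before installing the prerequisites. The following is a minimal sketch; the helper name python_is_supported is hypothetical and not part of the repository.

```python
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in this document


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if python_is_supported():
        print("Python version OK:", sys.version.split()[0])
    else:
        print("Python %d.%d or above is required." % MIN_PYTHON)
```

Run it with the same interpreter that you use for pip so that both refer to the same Python installation.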
The directory contains two Python scripts, setup_max_model_training.py and train_max_model.py, which you'll use to prepare your environment for model training and to perform model training on Watson Machine Learning.
To perform model training, you need access to a Watson Machine Learning service instance and a Cloud Object Storage service instance on IBM Cloud. The setup_max_model_training.py script prepares your IBM Cloud resources for model training and configures your local environment.
- Open a terminal window.
- Locate the training configuration file. It is named max-object-detector-training-config.yaml.
    $ ls *.yaml
    max-object-detector-training-config.yaml
- Run setup_max_model_training.py and follow the prompts to configure model training.
    $ python setup_max_model_training.py max-object-detector-training-config.yaml
    ...
    ------------------------------------------------------------------------------
    Model training setup is complete and your configuration file was updated.
    ------------------------------------------------------------------------------
    Training data bucket name   : object-detector-sample-input
    Local data directory        : sample_training_data/
    Training results bucket name: object-detector-sample-output
    Compute configuration       : k80
  The setup script updates the training configuration file using the information you've provided. For security reasons, confidential information, such as API keys or passwords, is not stored in this file. Instead, the script displays a set of environment variables that you must define to make this information available to the training script.
- Once setup is completed, define the displayed environment variables. The model training script train_max_model.py uses those variables to access your training resources.
  macOS/Linux example:
    $ export ML_APIKEY=...
    $ export ML_INSTANCE=...
    $ export ML_ENV=...
    $ export AWS_ACCESS_KEY_ID=...
    $ export AWS_SECRET_ACCESS_KEY=...
  Microsoft Windows:
    $ set ML_APIKEY=...
    $ set ML_INSTANCE=...
    $ set ML_ENV=...
    $ set AWS_ACCESS_KEY_ID=...
    $ set AWS_SECRET_ACCESS_KEY=...
  If you re-run the setup script and select a different Watson Machine Learning service instance or Cloud Object Storage service instance, the displayed values will change. The values do not change if you modify any other configuration setting, such as the input data bucket or the compute configuration.
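Since the training script depends on these environment variables, a quick check in the same terminal session can catch a missing definition early. This is a minimal sketch; the variable names are the ones shown above, while the helper missing_env_vars is hypothetical and not part of the repository.

```python
import os

# Variable names as displayed by the setup script in this document.
REQUIRED_VARS = (
    "ML_APIKEY",
    "ML_INSTANCE",
    "ML_ENV",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Undefined environment variables:", ", ".join(missing))
    else:
        print("All training environment variables are defined.")
```

The check only confirms that each variable is non-empty; it does not validate the credentials themselves.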
You can test the model training process using the sample data in the sample_training_data directory. To use your own data, follow the instructions in data_preparation/README.md.
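When using your own data, the images need to be encoded as JPEG or PNG. The sketch below is an illustration only and not part of the repository: it identifies the encoding from each file's leading bytes and counts the files per format. The helper names are hypothetical. It does not verify that an image is RGB, and any non-image files in the directory (for example annotation files) are counted under None.

```python
from collections import Counter
from pathlib import Path

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"      # JPEG start-of-image marker


def sniff_image_format(header):
    """Return "png", "jpeg", or None based on a file's leading bytes."""
    if header.startswith(PNG_MAGIC):
        return "png"
    if header.startswith(JPEG_MAGIC):
        return "jpeg"
    return None


def count_image_formats(directory):
    """Count files under `directory` by detected format (None = other)."""
    counts = Counter()
    for path in Path(directory).rglob("*"):
        if path.is_file():
            with path.open("rb") as f:
                counts[sniff_image_format(f.read(8))] += 1
    return counts
```

For example, count_image_formats("sample_training_data") would report how many files in the sample data directory are detected as each format.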
- Note the local directory displayed after running the setup script, or look up the local directory path configured in max-object-detector-training-config.yaml under train/data_source/train_data_local.
- Create a folder named initial_model.
To initiate training using COCO pre-trained checkpoints, make sure no files are present under the folder initial_model.
To initiate training from custom trained checkpoints, place the checkpoint files inside the folder initial_model under the configured local directory.
Checkpoint files include:
model.ckpt-<step-number>.data-00*
model.ckpt-<step-number>.index
model.ckpt-<step-number>.meta
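To confirm that a custom checkpoint is complete before starting a training run, you could scan the initial_model folder for the three kinds of files listed above. The following is a minimal sketch under the assumption that the file names follow the model.ckpt-<step-number> pattern exactly; the helper names are hypothetical and not part of the repository.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches model.ckpt-<step-number>.<suffix>, e.g. model.ckpt-1000.index
CKPT_PATTERN = re.compile(r"^model\.ckpt-(\d+)\.(index|meta|data-00.*)$")


def find_checkpoints(initial_model_dir):
    """Map each step number to the kinds of checkpoint files found for it."""
    found = defaultdict(set)
    for path in Path(initial_model_dir).iterdir():
        match = CKPT_PATTERN.match(path.name)
        if match:
            step, suffix = int(match.group(1)), match.group(2)
            kind = "data" if suffix.startswith("data-") else suffix
            found[step].add(kind)
    return dict(found)


def complete_steps(initial_model_dir):
    """Return sorted step numbers that have data, index, and meta files."""
    required = {"data", "index", "meta"}
    return sorted(
        step
        for step, kinds in find_checkpoints(initial_model_dir).items()
        if kinds >= required
    )
```

An empty result from complete_steps means that no complete custom checkpoint was found in the folder.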
To change the number of training steps, update the variable NUM_TRAIN_STEPS in training_code/train-max-model.sh.
The train_max_model.py script verifies your configuration settings, packages the model training code, uploads it to Watson Machine Learning, launches the training run, monitors the training run, and downloads the trained model artifacts.
Complete the following steps in the terminal window where the earlier mentioned environment variables are defined.
- Verify that the training preparation steps complete successfully.
    $ python train_max_model.py max-object-detector-training-config.yaml prepare
    ...
    # --------------------------------------------------------
    # Checking environment variables ...
    # --------------------------------------------------------
    ...
  If preparation completed successfully:
  - Training data is present in the Cloud Object Storage bucket that WML will access during model training.
  - Model training code is packaged in max-object-detector-model-building-code.zip.
- Start model training.
    $ python train_max_model.py max-object-detector-training-config.yaml package
    ...
    # --------------------------------------------------------
    # Starting model training ...
    # --------------------------------------------------------
    Training configuration summary:
    Training run name     : train-max-...
    Training data bucket  : ...
    Results bucket        : ...
    Model-building archive: max-object-detector-model-building-code.zip
    Model training was started. Training id: model-...
    ...
- Note the displayed Training id. It uniquely identifies your training run in Watson Machine Learning.
- Monitor training progress.
    ...
    Checking model training status every 15 seconds. Press Ctrl+C once to stop monitoring or press Ctrl+C twice to cancel training.
    Status - (p)ending (r)unning (e)rror (c)ompleted or canceled:
    ppppprrrrrrr...
  To stop monitoring (but continue model training), press Ctrl+C once.
  To restart monitoring, run the following command, replacing <training-id> with the id that was displayed when you started model training.
    $ python train_max_model.py max-object-detector-training-config.yaml package <training-id>
  To cancel the training run, press Ctrl+C twice.
  After training has completed, the training log file training-log.txt is downloaded along with the trained model artifacts.
    ...
    # --------------------------------------------------------
    # Downloading training log file "training-log.txt" ...
    # --------------------------------------------------------
    Downloading "training-.../training-log.txt" from bucket "..." to "training_output/training-log.txt"
    ..
    # --------------------------------------------------------
    # Downloading trained model archive "model_training_output.tar.gz" ...
    # --------------------------------------------------------
    Downloading "training-.../model_training_output.tar.gz" from bucket "..." to "training_output/model_training_output.tar.gz"
    ....................................................................................
  If training was terminated early due to an error, only the log file is downloaded. Inspect it to identify the problem.
    $ ls training_output/
    model_training_output.tar.gz trained_model/ training-log.txt
- Return to the parent directory $MODEL_REPO_HOME_DIR.
    $ cd ..
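The status line printed during monitoring (for example ppppprrrrrrr) can be summarized to see how long a run spent in each state. This is a minimal sketch that assumes one status character is printed per check, at the 15-second interval the script reports; the helper names are hypothetical and not part of the repository.

```python
from collections import Counter

# Status codes as shown in the monitoring legend printed by the script.
STATUS_NAMES = {
    "p": "pending",
    "r": "running",
    "e": "error",
    "c": "completed or canceled",
}
POLL_INTERVAL_SECONDS = 15  # polling interval reported by the script


def summarize_status(line):
    """Count status checks per state, ignoring non-status characters."""
    counts = Counter(ch for ch in line if ch in STATUS_NAMES)
    return {STATUS_NAMES[ch]: n for ch, n in counts.items()}


def approx_elapsed_seconds(line):
    """Rough elapsed time: one status character per polling interval."""
    return sum(summarize_status(line).values()) * POLL_INTERVAL_SECONDS


if __name__ == "__main__":
    print(summarize_status("ppppprrrrrrr"))
    print(approx_elapsed_seconds("ppppprrrrrrr"), "seconds (approximate)")
```

The elapsed time is only an estimate, because it ignores any time spent between checks or while monitoring was stopped.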
Once the training run is complete, two files should be located in the $MODEL_REPO_HOME_DIR/custom_assets directory: frozen_inference_graph.pb and label_map.pbtxt.
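Before rebuilding the Docker image, you may want to confirm that both files are actually present. The following is a minimal sketch; the file names are taken from this document, while the helper missing_assets is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names listed in this document.
EXPECTED_ASSETS = ("frozen_inference_graph.pb", "label_map.pbtxt")


def missing_assets(custom_assets_dir, expected=EXPECTED_ASSETS):
    """Return the expected file names that are absent or empty."""
    base = Path(custom_assets_dir)
    return [
        name
        for name in expected
        if not (base / name).is_file() or (base / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_assets("custom_assets")
    print("Missing:", missing if missing else "none")
```

Run it from $MODEL_REPO_HOME_DIR so that the relative custom_assets path resolves correctly.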
Out of the box, the model-serving microservice serves the pre-trained model, which was trained on the COCO dataset. To serve the model trained on your dataset, you have to rebuild the Docker image:
- Rebuild the Docker image. In $MODEL_REPO_HOME_DIR run
    $ docker build -t max-object-detector --build-arg use_pre_trained_model=false .
    ...
  If the optional parameter use_pre_trained_model is set to true or if the parameter is not defined, the Docker image will be configured to serve the pre-trained model.
- Run the customized Docker image.
    $ docker run -it -p 5000:5000 max-object-detector
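Once the container is running, you would typically send a test image to the service and inspect which of your custom labels come back. This document does not describe the service API, so the sketch below rests on an assumption: that the prediction response is a JSON object with a status field and a predictions list whose entries carry label and probability. Treat that shape, the example data, and the helper name confident_predictions as hypothetical, and check them against the running service.

```python
def confident_predictions(response, threshold=0.5):
    """Return (label, probability) pairs at or above the threshold,
    sorted from most to least confident.

    The response layout is an assumption, not taken from this document.
    """
    if response.get("status") != "ok":
        return []
    pairs = [
        (p["label"], p["probability"])
        for p in response.get("predictions", [])
        if p["probability"] >= threshold
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical response used only for illustration.
example_response = {
    "status": "ok",
    "predictions": [
        {"label": "label_a", "probability": 0.62},
        {"label": "label_b", "probability": 0.91},
        {"label": "label_c", "probability": 0.12},
    ],
}

if __name__ == "__main__":
    for label, probability in confident_predictions(example_response):
        print(label, probability)
```

Filtering by a threshold keeps low-confidence detections out of the summary; the value 0.5 is an arbitrary starting point.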