Keras implementation of RetinaNet object detection as described in Focal Loss for Dense Object Detection by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár.
- Clone this repository.
- In the repository, execute python setup.py install --user. Note that due to inconsistencies with how tensorflow should be installed, this package does not define a dependency on tensorflow, because doing so would make the installer try to install it (which at least on Arch Linux results in an incorrect installation). Please make sure tensorflow is installed as per your system's requirements. Also, make sure Keras 2.1.2 is installed.
- As of writing, this repository requires the master branch of keras-resnet (run pip install --user --upgrade git+https://github.com/broadinstitute/keras-resnet).
- Optionally, install pycocotools if you want to train / test on the MS COCO dataset, by running: pip install --user git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI
keras-retinanet can be trained using this script.
Note that the train script uses relative imports since it is inside the keras_retinanet package.
If you want to adjust the script for your own use outside of this repository,
you will need to switch it to use absolute imports.
If you installed keras-retinanet correctly, the train script will be installed as retinanet-train.
However, if you make local modifications to the keras-retinanet repository, you should run the script directly from the repository.
That will ensure that your local changes will be used by the train script.
For training on Pascal VOC, run:
# Running directly from the repository:
keras_retinanet/bin/train.py pascal <path to VOCdevkit/VOC2007>
# Using the installed script:
retinanet-train pascal <path to VOCdevkit/VOC2007>
For training on MS COCO, run:
# Running directly from the repository:
keras_retinanet/bin/train.py coco <path to MS COCO>
# Using the installed script:
retinanet-train coco <path to MS COCO>
For training on a custom dataset, a CSV file can be used as a way to pass the data. See below for more details on the format of these CSV files. To train using your CSV, run:
# Running directly from the repository:
keras_retinanet/bin/train.py csv <path to csv file containing annotations> <path to csv file containing classes>
# Using the installed script:
retinanet-train csv <path to csv file containing annotations> <path to csv file containing classes>
In general, the steps to train on your own datasets are:
- Create a model by calling, for instance, keras_retinanet.models.ResNet50RetinaNet and compile it. Empirically, the following compile arguments have been found to work well:
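import keras
import keras_retinanet.losses

# `model` is the RetinaNet model created in the previous step.
model.compile(
    loss={
        'regression'    : keras_retinanet.losses.regression_loss,
        'classification': keras_retinanet.losses.focal_loss()
    },
    optimizer=keras.optimizers.Adam(lr=1e-5, clipnorm=0.001)
)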
- Create generators for training and testing data (an example is shown in keras_retinanet.preprocessing.PascalVocGenerator).
- Use model.fit_generator to start training.
An example of testing the network can be seen in this Notebook. In general, output can be retrieved from the network as follows:
_, _, detections = model.predict_on_batch(inputs)
Here, detections holds the resulting detections, shaped (None, None, 4 + num_classes), with each row laid out as (x1, y1, x2, y2, cls1, cls2, ...).
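As a rough illustration of that row layout, the sketch below picks the best-scoring class for each detection row and drops low-confidence rows. It works on plain nested lists standing in for one image's slice of the output (for example detections[0]); the function name decode_detections and the 0.5 score threshold are assumptions made for this example, not part of keras-retinanet.

```python
def decode_detections(detections, score_threshold=0.5):
    """Turn rows of (x1, y1, x2, y2, cls1, cls2, ...) into (box, class_id, score) tuples.

    Hypothetical helper; the threshold default is an assumption for illustration.
    """
    results = []
    for row in detections:
        box = tuple(row[:4])      # (x1, y1, x2, y2)
        scores = list(row[4:])    # one score per class
        # Pick the class with the highest score for this box.
        class_id = max(range(len(scores)), key=scores.__getitem__)
        score = scores[class_id]
        if score >= score_threshold:
            results.append((box, class_id, score))
    return results


# Made-up detections for one image with three classes.
example = [
    [10.0, 20.0, 110.0, 220.0, 0.05, 0.90, 0.05],
    [30.0, 40.0, 60.0, 80.0, 0.10, 0.20, 0.15],
]
kept = decode_detections(example)
```

Only the first row survives here, since the best score in the second row is below the threshold.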
Loading models can be done in the following manner:
import keras
from keras_retinanet.models.resnet import custom_objects

# custom_objects lets Keras resolve the RetinaNet-specific layers when loading.
model = keras.models.load_model('/path/to/model.h5', custom_objects=custom_objects)
Execution time on an NVIDIA Pascal Titan X is roughly 55 ms for an image of shape 1000x600x3.
The CSVGenerator provides an easy way to define your own datasets.
It uses two CSV files: one file containing annotations and one file containing a class name to ID mapping.
The CSV file with annotations should contain one annotation per line. Images with multiple bounding boxes should use one row per bounding box. Note that indexing for pixel values starts at 0. The expected format of each line is:
path/to/image.jpg,x1,y1,x2,y2,class_name
Some images may not contain any labeled objects.
To add these images to the dataset as negative examples,
add an annotation where x1, y1, x2, y2 and class_name are all empty:
path/to/image.jpg,,,,,
A full example:
/data/imgs/img_001.jpg,837,346,981,456,cow
/data/imgs/img_002.jpg,215,312,279,391,cat
/data/imgs/img_002.jpg,22,5,89,84,bird
/data/imgs/img_003.jpg,,,,,
This defines a dataset with 3 images.
img_001.jpg contains a cow.
img_002.jpg contains a cat and a bird.
img_003.jpg contains no interesting objects/animals.
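To check an annotations file before training, it can help to parse it independently of the generator. The following is a minimal sketch using only the standard library; read_annotations is a hypothetical helper, not the CSVGenerator's own parser, and the check that x2 > x1 and y2 > y1 is an assumption about what a sensible box looks like rather than a documented rule.

```python
import csv
import io


def read_annotations(csv_file):
    """Group boxes by image path; an all-empty row yields an image with no boxes.

    Hypothetical helper for sanity-checking a file, not part of keras-retinanet.
    """
    boxes_by_image = {}
    for line_number, row in enumerate(csv.reader(csv_file), start=1):
        if len(row) != 6:
            raise ValueError(
                "line {}: expected 6 fields, got {}".format(line_number, len(row)))
        path, x1, y1, x2, y2, class_name = row
        boxes = boxes_by_image.setdefault(path, [])
        if (x1, y1, x2, y2, class_name) == ("", "", "", "", ""):
            continue  # negative example: keep the image, add no box
        x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
        # Assumption: a valid box has positive width and height.
        if x2 <= x1 or y2 <= y1:
            raise ValueError("line {}: degenerate box".format(line_number))
        boxes.append((x1, y1, x2, y2, class_name))
    return boxes_by_image


# The full example from above, fed in as an in-memory file.
sample = io.StringIO(
    "/data/imgs/img_001.jpg,837,346,981,456,cow\n"
    "/data/imgs/img_002.jpg,215,312,279,391,cat\n"
    "/data/imgs/img_002.jpg,22,5,89,84,bird\n"
    "/data/imgs/img_003.jpg,,,,,\n"
)
annotations = read_annotations(sample)
```

With a real file, the same function can be called as read_annotations(open(path, newline="")).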
The class name to ID mapping file should contain one mapping per line. Each line should use the following format:
class_name,id
Indexing for classes starts at 0. Do not include a background class as it is implicit.
For example:
cow,0
cat,1
bird,2
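The class mapping file can be checked in the same spirit. The sketch below reads the mapping into a dictionary and verifies that IDs start at 0; read_classes is a hypothetical helper, and requiring the IDs to be consecutive and each name to appear once is an assumption that goes slightly beyond what is stated above.

```python
import csv
import io


def read_classes(csv_file):
    """Read class_name,id lines into a dict and check the IDs are 0..N-1.

    Hypothetical helper for sanity-checking a file, not part of keras-retinanet.
    """
    classes = {}
    for line_number, row in enumerate(csv.reader(csv_file), start=1):
        if len(row) != 2:
            raise ValueError(
                "line {}: expected 2 fields, got {}".format(line_number, len(row)))
        class_name, class_id = row[0], int(row[1])
        if class_name in classes:
            raise ValueError(
                "line {}: duplicate class name {!r}".format(line_number, class_name))
        classes[class_name] = class_id
    # Assumption: IDs are consecutive, starting at 0, with no duplicates.
    if sorted(classes.values()) != list(range(len(classes))):
        raise ValueError("class IDs should be 0..N-1 without gaps or duplicates")
    return classes


# The example mapping from above, fed in as an in-memory file.
classes = read_classes(io.StringIO("cow,0\ncat,1\nbird,2\n"))
```

A mapping whose IDs start at 1, or that lists an explicit background class with a skipped ID, would be rejected by the final check.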
The MS COCO model can be downloaded here. Results using the cocoapi are shown below (note: according to the paper, this configuration should achieve a mAP of 0.343).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.325
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.513
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.342
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.149
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.354
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.465
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.288
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.437
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.464
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.263
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.510
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.623
Example output images using keras-retinanet are shown below.
- This repository requires Keras 2.1.2.
- This repository is tested using OpenCV 3.3.
- Warnings such as UserWarning: Output "non_maximum_suppression_1" missing from loss dictionary. can safely be ignored. These warnings indicate that no loss is connected to these outputs, but they are intended to be outputs of the network for the user (i.e. the resulting network detections) and not loss outputs.
Contributions to this project are welcome.
Feel free to join the #keras-retinanet Keras Slack channel for discussions and questions.