Data+Train+Evaluate+App 4in1 repo within the paper [中文版] [English]
This is a repository that includes Pedestrian-Detection-on-YOLOv3_Research-and-APP, a 2020 undergraduate graduation project, ALL codes. The graduation project which has the Data+Train+Evaluate+App 4in1 repo Coded and paper Wrote by Ziqiang Xu from Jiangnan University.
Pedestrian Detection is a subset of Object Detection which only have one class of person. It aim to find out all pedestrians in the image or video's each frame, expressed location and size with bounding-boxes, just like this :
YOLO (You Look Only Once) is an advanced real-time object detection method. It is famous for processing images only once to get both location and classification, compared with previous object detection methods, while having similar accuracy with the state-of-the-art method, YOLO run faster.
This project researches Pedestrian Detection on YOLOv3 including Data-convert, keras-Train(keras-yolo3@qqwweee) and model-Evaluate. Finally I also build a Web App base on Flask to realize the visualization of pedestrian detection results of the real-time webcam, image, or video (whose language is chinese, but you can easily use by following 5. Web App or just translating).
For the whole project, I use python 3.6, and for packages needed :
pip install -r requirements.txt
official link : http://cocodataset.org/
Download link (2017 about 19GB) :
http://images.cocodataset.org/zips/train2017.zip
http://images.cocodataset.org/zips/val2017.zip
http://images.cocodataset.org/annotations/annotations_trainval2017.zip
annotation just use instances_train2017.json
+instances_val2017.json
official link : http://host.robots.ox.ac.uk:8080/pascal/VOC/
Download link (07+12 about 2.7GB) :
https://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar
https://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar
https://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar
after unzip just use /JPEGImages
+/Annotations
PS: unzip *.tar
file just like
tar xf *.tar
official link : http://pascal.inrialpes.fr/data/human/ down
Download link (about 969MB) :
-
official link : ftp://ftp.inrialpes.fr/pub/lear/douze/data/INRIAPerson.tardown -
Baidu Cloud Disk(中文) : https://pan.baidu.com/s/12TYw-8U9sxz9cUu2vxzvGQ password: jxqu
-
Google Drive : https://drive.google.com/file/d/1wTxod2BhY_HUkEdDYRVSuw-nDuqrgCu7/view?usp=sharing
after unzip just use /Train
+/Test
for train & test dataset divide
train = (COCO train) + (VOC 07+12 train+val) + (INRIA train)
test = (COCO val) + (VOC 07 test) + (INRIA test)
These dataset's annotation format is not same, so convert Ground-Truth to a Uniform format.
The train and test set are stored as a TXT file respectively, One row for One image, just like the following (without [ ]):
[image_path] [bbox1] [bbox2] [bbox3] … [bboxn]
bbox format:
[x_min,y_min,x_max,y_max,0]
PS: 0 is class id,because only one class(person), 0 is enough
For example:
./data/train/000000391895.jpg 339,22,492,322,0 471,172,506,220,0
./data/train/000000309022.jpg
./data/test/000000397133.jpg 388,69,497,346,0 0,262,62,298,0
For ./"2. Data", script are prepared individually by dataset. And the required directory structure 📂 is shown below before run python script.
./"2. Data"
├─COCO
│ ├─annotations
│ ├─train2017
│ └─val2017
├─VOC
│ └─VOCdevkit
│ ├─VOC2007
│ └─VOC2012
├─INRIA
| └─INRIAPerson
| ├─Test
| └─Train
├─coco_annotation.py
├─voc_annotation.py
└─INRIA_annotation.py
All python files are append write, so run separately is ok.
python coco_annotation.py
python voc_annotation.py
python INRIA_annotation.py
PS: the script will copy images to a new folder (./data/train or ./data/test)
After run script, the directory structure that will be added in the 2. Data folder is as follows
+
├─data
│ ├─train
│ └─test
├─train.txt
└─test.txt
✅ Almost full copy from keras-yolo3@qqwweee
By the way, job-batch32-95gpu4.sh
is the script to submit job for training model. Just as a souvenir, do not care :)
Here is the YOLOv3 network structure I draw for the project (by visio).
According to the codes+papers+blogs+experiments, I also conclude a loss function (for reference only).
In fact, the total loss is equal to the sum of the loss values of the 3 layer output feature map. This is the loss function of the output feature map of any layer.
core copy from keras-yolo3/kmeans.py@qqwweee, just ADD visualization
Before get anchors, that require train dataset Ground-Truth file (default train.txt
) from 2.4 Batch processing
python anchor_kmeans.py
After run script, the directory structure that will be added in the 3. Train folder is as follows
+
├─Anchors-by-Kmeans.png
├─yolo_anchors.txt
└─kmeans.npy
Anchors-by-Kmeans.png
is the visualization picture file.
yolo_anchors.txt
is exactly the file to store the anchors needed for training.
kmeans.npy
stores the Kmeans result, it helps to avoid repeated Kmeans when changing the visualization style.
Here is one of the anchors with relatively high accuracy, which is 65.81%, and it's also the train anchors I used. (Accuracy is the average value of IOU of each boundary-box and clustering center. The larger it is, the better the clustering effect of k-means will be.)
Anchors: (6,11), (12,22), (19,40), (33,65), (49,132), (68,81), (94,189), (170,287), (335,389)
follow keras-yolo3@qqwweee/README.md
1. Download YOLOv3 pre-trained weight file here (237 MB) from YOLO website. Or just run this:
wget https://pjreddie.com/media/files/yolov3.weights
2. Convert the Darknet YOLO model weights(*.cfg+*.weights) to a Keras model weights(*.h5).
python convert_yolov3_weight_darkent2keras.py -w yolov3.cfg yolov3.weights yolo_weights.h5
1. create class file(person_classes.txt
), each line is a class name, so just fill in one line: person.
vi person_classes.txt
2. make a folder (model_data
) place configuration file and pretrained weights.
mkdir model_data
3. move person_classes.txt
, yolo_anchors.txt
and yolo_weights.h5
to model_data
.
mv person_classes.txt yolo_anchors.txt yolo_weights.h5 ./model_data
Before training, don't forget move train dataset (/data/train
) and image_path+annotation file (train.txt
) from 2.4 Batch processing to current directory (3. Train
)
And then use keras-yolo3@qqwweee/train.py's default Train config, just save a weights per 10 epoch.
For train&val, All training data(train.txt
) are shuffled randomly, and divide train:val = 9:1.
For 1 training stage, Train with frozen layers first, to get a stable loss.
- freeze all but 3 output layers
- optimizer=Adam()
- learning_rate=1e-3
- epochs=50
For 2 training stage, Unfreeze and continue training, to fine-tune.
- Unfreeze all of the layers
- optimizer=Adam()
- learning_rate=1e-4
- epochs=300-50
- reduce_lr: When val_loss does not decrease for 3 epoch, the learning rate is reduced by 10% each time.
- early_stopping: When val_loss does not decrease for 10 epochs, the training is stopped early.
Now we can finally train !!!
python train.py
For the model I trained, I push it on pan.baidu.com
and drive.google.com
, it's a weight that has been trained 109 epochs.(235.4MB)
Download link:
-
Baidu Cloud Disk(中文) : https://pan.baidu.com/s/130J8c5B9RQILSDKAFKyK1A password: wmht
-
Google Drive : https://drive.google.com/file/d/1zxcpjklHKQ6hULa8HmeKIQqnkH_h_NSI/view?usp=sharing
The log was stored through the TensorBoard while training. So, just extract the log of loss and draw the loss curve.
The first parameter is the path to log of the first 50 epochs. The second parameter is the path to the rest of log after 50 epochs.
NOTE: Pay attention to the order
python Tensorboard_log_loss_visualization.py \
logs/batch32/events.out.tfevents.1586104689.r1cmpsrvs79-14ig0702 \
logs/batch32/events.out.tfevents.1586671921.r1cmpsrvs79-14ig0702
Here is the loss curve plot I trained model, it's 1 to 109 because it stopped early after 109 epoch. Finally, loss=9.05, val_loss=9.54.
Before Evaluating, require move test dataset (/data/test
) and image_path+annotation file (test.txt
) from 2.4 Batch processing to current folder (4. Evaluate
)
And of course, You need to make sure that the model weights(3.5 True Train process) are in the 4. Evaluate/model/
folder, default name: trained_weights_final.h5
NOTE: Using PASCAL metric, (predict bounding_boxes and Ground_Truth)'s IOU > 0.5 is good, to evaluate the detector. And filter box score is 0.3, NMS threshold is 0.45.
speed : average+max+min second per image, each image's time it takes
quality :
- exactly correct image number
- error image number
- missing image number
- Error&Missing image number
- Ground Truth number
- prediction bounding-box number
- number of correct prediction(good detection bounding-box)
- Precision
- Recall
Put ALL in testSet_eva.py
testSet_eva.py
's main func call func evaluate()
, which has two parameters:
IsVsImg
: Whether to draw Ground-Truth and predict BB(bounding-box) together on the image and store new image to a new location(4. Evaluate/test_eval/vs/
)IsErrorMissingCorrect
: Whether to copy the missing, error, correct image(with GT&pred_BB) to the new location and record the information to a TXT file(4. Evaluate/test_eval/ErrorMissingCorrect.txt
)
If IsVsImg==False and IsErrorMissingCorrect==False: will only print and save basic metric test result without getting any resulting images
default : IsVsImg=True,IsErrorMissingCorrect=True
The program will store the predict BB results, so you can run again without repeating predictions.
python testSet_eva.py
After run script, the directory structure that will be added in the 4. Evaluate folder is as follows
+
└─test_eval
├─correct
│ └─...
├─error
│ └─...
├─missing
│ └─...
├─vs
│ └─...
├─ErrorMissingCorrect.txt
├─Ground_Truth.npy
├─predict_bb.npy
└─pre_bb_sec.npy
vs
include all test set images with drawing Ground-Truth and predict bounding-box together.
correct
is the exactly correct image(with GT&pred_BB) folder. error
is the image(with GT&pred_BB) folder where image includes error prediction BB. missing
is the image(with GT&pred_BB) folder where image includes missing prediction BB. ErrorMissingCorrect.txt
stores missing, error, correct images info(including pred_num, gt_num, good_num)
Ground_Truth.npy
stores the Ground_Truth results, which helps to avoid repeated Ground_Truth extracted when runing again. predict_bb.npy
is the prediction BB results file, which helps run again without repeating predictions.
pre_bb_sec.npy
stores each image's time it takes to process.
Draw the run-time statistics, and the parameter(-p
or -pre_bb_sec_file
) is the path to the file of each image's seconds it takes to process prediction bounding-box.(default='test_eval/pre_bb_sec.npy'
)
python run_time_statistics.py -p test_eval/pre_bb_sec.npy
Here is the statistics of image number of run-time range on the test set(2.2 Data distribution) by the trained model(3.5 True Train process). By the way, GPU is a NVIDIA Tesla V100 SXM2 32GB
.
Don't care first image for Keras loading, it take about 3.2s >> others, remove it. So, there are total 10692 images, max=95ms, min=28ms, average=34ms ≈ 29fps, and 30-40ms is the most: 94%.
For testSet_eva.py
's print, there are some results to share.
There are total 10693 test images. At the level of images, display as following.
correct | error | missing | Error&Missing | |
---|---|---|---|---|
image number | 8234 | 1545 | 1798 | 884 |
percent | 77% | 14% | 17% | 8% |
At the level of bounding-box, there are 16820 Ground Truth, predictions bounding_boxes number is 15339 and the number of correct(good) prediction is 12160. Finally, precision and recall and so on are shown in the following table.
Precision | Recall | Error Rate | Miss Rate |
---|---|---|---|
79.28% | 72.29% | 20.72% | 27.72% |
Using the PR curve drawing tool and AP computing tool provided by Object-Detection-Metrics@rafaelpadilla.
PS: Only modify the plot and savePath part.
First, we need to prepare the ground truth bounding boxes files and detected bounding boxes files(score=0) according to Object-Detection-Metrics/README.md#how-to-use-this-project@rafaelpadilla's request.
python testSet_PR_AP_raw_bb_data.py
After run script, the directory structure that will be added in the 4. Evaluate folder is as follows
+
└─PRcurve_AP_raw_bb_data
├─gt
│ └─...
└─pre
└─...
And then we can plot PR-curve and compute AP.
python pascalvoc.py -gt ./PRcurve_AP_raw_bb_data/gt -det ./PRcurve_AP_raw_bb_data/pre -sp ./PR_AP_results
After run script, the directory structure that will be added in the 4. Evaluate folder is as follows
+
└─PR_AP_results
├─person.png
└─results.txt
person.png
is the PR-curve image. results.txt
includes AP computed result.
Here is the PR-curve for the trained model on test set. And its AP is 85.80%.
official link : http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/
In the case that the training set of the model does not contain the pictures in the Caltech dataset, make predictions on its test set, use the matlab tool provided by Caltech to draw the MR-FPPI curve, and download other algorithm test results for comparison.
The following workflow main copy from https://blog.csdn.net/qq_33614902/article/details/82622561 中文 and https://www.jianshu.com/p/6f3cf522d51b 中文
Caltech person dataset download link : http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/datasets/USA/
The testing data (set06-set10) consists of five sets, ~1GB each.
# download toolbox
git clone https://github.com/pdollar/toolbox pdollar_toolbox
# download extract/evaluation code
wget http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/code/code3.2.1.zip
mkdir code
unzip code3.2.1.zip -d code
# for video data
cd code
mkdir data-USA
cd data-USA
mkdir videos
move testing data (set06-set10) after decompression to code/data-USA/videos
.
open code/dbExtract.m
, add Toolbox path.
addpath(genpath('../pdollar_toolbox'));
And then just run code/dbExtract.m
to get test images in code/data-USA/images
.
python Caltech_predict_BB.py
After run script, the directory structure that will be added in the 4. Evaluate/code/data-USA/ folder is as follows
+
└─res
└─YOLOv3
├─set06
│ └─...
├─set07
│ └─...
├─set08
│ └─...
├─set09
│ └─...
└─set10
└─...
set06~set10
is the result folder of the prediction bounding-box according to the official requirements. Relative to the location of the image to store a prediction result TXT file with the same name, which include one bounding box per line. (line format is [x_min,y_min,width,height,score]
)
# download annotations
wget http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/datasets/USA/annotations.zip
unzip annotations.zip
move annotations/
to 4. Evaluate/code/data-USA/
.
Download other Algorithm's results to compare. link: http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/datasets/USA/res
After download, move these results to 4. Evaluate/code/data-USA/res/
too.
Finally, we can run part_algo-not_pdf-dbEval.m
. (select some algorithms and fix it work without Ghostscript and pdfcrop)
Here is the result to share. On the Caltech Reasonable test set, the YOLOv3 MR reached 19% when FPPI=0.1. Without training Caltech data, such result is not bad.
Prediction bounding-box: red | Ground-Truth: green
❗ 🔔 ❗ NOTE: The Web App currently only allows one client to access!!!
Development Environment:
- OS: Microsoft Windows 10 Home --version 10.0.18363
- CPU: Intel(R) Core(TM) i7-8550U [email protected] 1.99 GHz
- Memory: 8.00GB
- GPU: Interl(R) UHD Graphics 620 (Core graphics card)
NOTE: For no discrete graphics card, run keras slow, which takes 3~5s to process one image. It is unbearable in development and debugging, so I convert Keras weights to cpu-friendly Darkent format.
cd "keras2darknet_&_simpleEvaluate"
Before convert, we get weights rather than model from Train, so we change weights(trained_weights_final.h5
) to model(trained_model_final.h5
) first.
And of course, You need to make sure that the model weights(3.5 True Train process) are in the 5. App/keras2darknet_&_simpleEvaluate/model/
folder, default name: trained_weights_final.h5
python keras-yolo3_weights2model.py
After runing, you will get trained_model_final.h5
in the model/
directory.
And then, let's convert. model/yolo-person.cfg
is altered from yolov3.cfg
, which only modify anchors and last convolutional filters to fit the project.
python keras2darknet.py
After runing, you will get converted Darknet weights (yolov3-keras2darknet.weights
) in the model/
directory.
Now, just do some sample evaluation.
move test dataset (/data/test
) and image_path+annotation file (test.txt
) from 2.4 Batch processing to current folder (5. App/keras2darknet_&_simpleEvaluate/
)
python testSet_darknet-out-model_eva.py
testSet_darknet-out-model_eva.py
is almost exactly the same with 4. Evaluate/testSet_eva.py
. The only difference is that testSet_darknet-out-model_eva.py
use yolov3_opencv_dnn_detect.py
to predict bounding-box, but the interface is the same.
Here are some results to share.
There are total 10693 images, max=2.02s, min=0.98s, average=1.47s ≈ 0.68fps. It is much faster than Keras running on a computer without a independent GPU.
For quality, at the level of 10693 images, display as following.
correct | error | missing | Error&Missing | |
---|---|---|---|---|
Keras | 8234(77%) | 1545(14%) | 1798(17%) | 884(8%) |
Darknet | 8206(77%) | 1579(15%) | 1840(17%) | 932(9%) |
At the level of bounding-box, there are 16820 Ground Truth, predictions bounding_boxes number is 15183 and the number of correct(good) prediction is 12101. Finally, precision and recall and so on are shown in the following table.
Precision | Recall | Error Rate | Miss Rate | |
---|---|---|---|---|
Keras | 79.28% | 72.29% | 20.72% | 27.72% |
Darknet | 79.70% | 71.94% | 20.30% | 28.06% |
In terms of these two level quality metrics, the detection effect of the converted model is few different from that of the original model, so it can be used.
cd server
Of course, You need to make sure that the model weights(5.1 Keras to Darknet) are in the 5. App/server/model/
folder, default name: yolov3-keras2darknet.weights
.
PS:
- I also provided the keras detection file
keras_yolov3_detect.py
. If you would like to detect by keras, please make sure the model weights(3.5 True Train process) are in the5. App/server/model/
folder, default name:trained_weights_final.h5
and modifyconfig.ini
'sdetection_method
fromdarknet
tokeras
. - If you want to run without detecting person, please modify
config.ini
'sdetect_person
fromtrue
tofalse
.
Now, let's run the server.
python runserver.py
use IP:port
(xx.xx.xx.xx:5000) or just localhost:5000 to access the web app.
PS: If access by IP:port
, you have to add your IP
to ip_white_list.yml
, OR directly change config.ini
's use_ip_white_list
to false
.
There are 4 functions in it.
- Server Camera(remote webcam)
- Client Camera(local webcam)
- Image(online preview)
- Video(upload/download)
This page you can choose server camera. On my computer running the server, there are 2 cameras.
click to enter anyone, you would see like this:
The second option is client camera, there are 3 method to realize, see code for details 😛.
For only one camera on the client.
For multiple cameras on the client device.
Original web page
Drop and Submit
Initial
After processing done
The project has a detailed README and reproducible codes which takes two weeks to organize them.
This project researches Pedestrian Detection on YOLOv3. A lot of references to others' code, thanks here. Whether Data-convert, keras-Train or model-Evaluate and Web App, I did the best I could and they all constitute my 3 months life.
The university is about to end, I will also enter a new life, I wish myself a bright future~~