Postprocess taking ages #2486

gnscc · 2023-10-10T09:07:06Z

I have trained an RTMDet instance segmentation model. When I compile the model and run it with the C++ SDK, the inference time goes up to 150+ ms on a 3060 GPU, whereas normal mmdet inference in Python takes about 30 ms. I have analysed the profile output, and the postprocess step is taking about 135 ms, which is a lot and does not meet my runtime requirements. It seems odd to me that there is such a difference between Python and C++.

I have already trained the model with the recommended postprocess optimizations. Here is my test_cfg:

test_cfg=dict(
        mask_thr_binary=0.5,
        max_per_img=20,
        min_bbox_size=0,
        nms=dict(iou_threshold=0.6, type='nms'),
        nms_pre=200,
        score_thr=0.5)

And here is the profile output:

+-----------------------------+--------+-------+--------+---------+---------+---------+
|            name             | occupy | usage | n_call | t_mean  |  t_50%  |  t_90%  |
+=============================+========+=======+========+=========+=========+=========+
| ./Pipeline                  | -      | -     | 21     | 168.443 | 166.635 | 173.876 |
+-----------------------------+--------+-------+--------+---------+---------+---------+
|     Preprocess/Compose      | -      | -     | 21     | 0.921   | 0.889   | 0.983   |
+-----------------------------+--------+-------+--------+---------+---------+---------+
|         LoadImageFromFile   | 0.005  | 0.005 | 21     | 0.784   | 0.770   | 0.821   |
+-----------------------------+--------+-------+--------+---------+---------+---------+
|         Resize              | 0.000  | 0.000 | 21     | 0.031   | 0.030   | 0.032   |
+-----------------------------+--------+-------+--------+---------+---------+---------+
|         Pad                 | 0.000  | 0.000 | 21     | 0.019   | 0.016   | 0.017   |
+-----------------------------+--------+-------+--------+---------+---------+---------+
|         Normalize           | 0.000  | 0.000 | 21     | 0.030   | 0.028   | 0.029   |
+-----------------------------+--------+-------+--------+---------+---------+---------+
|         Pad                 | 0.000  | 0.000 | 21     | 0.002   | 0.002   | 0.003   |
+-----------------------------+--------+-------+--------+---------+---------+---------+
|         DefaultFormatBundle | 0.000  | 0.000 | 21     | 0.044   | 0.033   | 0.059   |
+-----------------------------+--------+-------+--------+---------+---------+---------+
|         Collect             | 0.000  | 0.000 | 21     | 0.007   | 0.007   | 0.009   |
+-----------------------------+--------+-------+--------+---------+---------+---------+
|     rtmdet                  | 0.189  | 0.189 | 21     | 31.810  | 31.629  | 31.655  |
+-----------------------------+--------+-------+--------+---------+---------+---------+
|     postprocess             | 0.805  | 0.805 | 21     | 135.679 | 134.025 | 141.250 |
+-----------------------------+--------+-------+--------+---------+---------+---------+
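
As a quick sanity check on the table, here is a minimal Python sketch that recomputes each stage's share of the pipeline time from the t_mean column. The values are copied from the profile above; the names `t_mean_ms`, `pipeline_ms`, and `stage_shares` are made up for illustration and are not part of the SDK or the profiler. The sketch assumes the occupy column is the stage's mean time divided by the pipeline's mean time, which the numbers appear to agree with.

```python
# Mean per-call times in ms, copied from the t_mean column of the profile above.
t_mean_ms = {
    "Preprocess/Compose": 0.921,
    "rtmdet": 31.810,
    "postprocess": 135.679,
}
pipeline_ms = 168.443  # t_mean of ./Pipeline


def stage_shares(stages, total):
    """Return each stage's fraction of the total pipeline time."""
    return {name: t / total for name, t in stages.items()}


shares = stage_shares(t_mean_ms, pipeline_ms)
for name, share in shares.items():
    print(f"{name:<20s} {t_mean_ms[name]:8.3f} ms  {share:6.1%}")

# Time not attributed to any of the three top-level stages.
unaccounted_ms = pipeline_ms - sum(t_mean_ms.values())
print(f"{'unaccounted':<20s} {unaccounted_ms:8.3f} ms")
```

Under that assumption, postprocess accounts for roughly four fifths of the pipeline time and the network itself (rtmdet) for most of the rest, so preprocessing is not the bottleneck here.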

I would really appreciate some tips or guidance on how to optimize postprocess in SDK inference.
