In the YOLO paper, the network predicts bounding boxes and class probabilities on a 7x7 grid of cells. So after the stack of conv and pooling layers, the activation maps have a spatial size of 7x7 (with a depth of 1024). The original input resolution is 448x448. After four stride-2 pooling layers and two stride-2 conv layers (see the inference() function), each spatial dimension is down-sampled to 448/(2^4)/(2^2) = 7. The final activation maps therefore have the shape 7x7x1024, which is flattened into a vector of length 49*1024 (in the local() function) and fed into the FC layer.
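The arithmetic above can be checked with a short sketch. This is not code from the repository: `final_grid_size` is a hypothetical helper, and it assumes every stride-2 layer halves the spatial size exactly (i.e. the padding is chosen so nothing is rounded).

```python
def final_grid_size(input_size=448, num_pool_layers=4, num_strided_convs=2, stride=2):
    """Spatial size after all stride-2 layers.

    Hypothetical helper: assumes each stride-2 layer divides the
    spatial size exactly, with no rounding from padding.
    """
    size = input_size
    for _ in range(num_pool_layers + num_strided_convs):
        if size % stride != 0:
            raise ValueError(f"size {size} is not divisible by stride {stride}")
        size //= stride
    return size


grid = final_grid_size()            # 448 -> 224 -> 112 -> 56 -> 28 -> 14 -> 7
depth = 1024                        # channels of the last conv layer
fc_input_dim = grid * grid * depth  # length of the flattened vector fed to the FC layer

print(grid, fc_input_dim)  # 7 50176
```

With a different input resolution the grid changes accordingly, so the FC layer's input dimension would need to change with it.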
Why did you set the first fully connected layer's input dimension to 49 * 1024?