Hello!
I have a small doubt regarding the model parameters of EfficientNet-B2 with 4 attention heads. The paper reports 13.64M parameters, but in practice, after removing the final classification layer from EfficientNet and adding the multi-head attention module, I get 7.71M instead of 13.64M. As you can see in the screenshot below, the EfficientNet-B2 parameter count drops to 7.7M once the classification layer is removed, and on top of that the multi-head module only adds around 11,000 parameters, resulting in 7.71M.
Am I missing something? I need to report the number of parameters of this model for my project, but I am a bit confused about it. Could you clarify this for me? :)
You are correct that the EfficientNet-B2 model without attention is 7.7M. The number of parameters in the multi-head attention module depends on the number of classes in the task, so it changes from task to task. In the paper, we report the model size for AudioSet (527 classes). Below is the detailed calculation:
The 9.2M figure is for the original EfficientNet-B2 model for 1,000-class image classification, which does not contain an attention module. In the efficientnet_pytorch implementation, the exact number of parameters is 9.109M. After removing the last fully connected layer for image classification, which has 1.409M parameters (input size 1,408, output size 1,000), the EfficientNet-B2 feature extractor has 7.700M parameters. For the attention module, each head has an attention branch and a classification branch, each with 1,408 × 527 ≈ 0.742M parameters. Hence, the four-headed attention module has 0.742M × 2 × 4 = 5.936M parameters, and the total model size is 7.700M + 5.936M = 13.64M parameters.
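If it helps to verify this, here is a minimal sketch of the same accounting in PyTorch. It assumes the efficientnet_pytorch package; the `AttentionHead` class below is illustrative only (each branch modeled as a single 1,408 → 527 linear map), not the repository's actual module, and the small difference from 5.936M comes from counting the bias terms:

```python
import torch.nn as nn
from efficientnet_pytorch import EfficientNet

num_classes = 527   # AudioSet
feat_dim = 1408     # EfficientNet-B2 feature dimension
num_heads = 4

# Backbone parameter count, with and without the 1,000-class image head.
backbone = EfficientNet.from_name('efficientnet-b2')
total = sum(p.numel() for p in backbone.parameters())    # ~9.109M
fc = sum(p.numel() for p in backbone._fc.parameters())   # 1408*1000 + 1000 = 1.409M
print(f'feature extractor: {(total - fc) / 1e6:.3f}M')   # ~7.700M

# Hypothetical head: an attention branch and a classification branch,
# each a linear map from the 1,408-dim features to 527 class logits.
class AttentionHead(nn.Module):
    def __init__(self, in_dim, n_classes):
        super().__init__()
        self.attention = nn.Linear(in_dim, n_classes)       # ~0.742M params
        self.classification = nn.Linear(in_dim, n_classes)  # ~0.742M params

heads = nn.ModuleList(AttentionHead(feat_dim, num_classes) for _ in range(num_heads))
head_params = sum(p.numel() for p in heads.parameters())
print(f'4-head attention module: {head_params / 1e6:.3f}M')    # ~5.940M with biases
print(f'total: {(total - fc + head_params) / 1e6:.2f}M')       # ~13.64M
```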
I am sorry, I completely forgot to include the number of classes in the computation. Now everything makes sense.
Thank you again for explaining everything so clearly, for answering so quickly, and for your help and consideration.