Why did you choose the addition method for feature fusion instead of concatenation? Is there a specific reason behind this choice, or is it based on experimental results that show the addition method performs better?
I noticed the code mentions "Use Bidirectional Attention Fusion", but I couldn't find the implementation of FuseBiAttn. Could you please provide more information on this?
Thank you for your time and assistance.
We experimented with several feature fusion methods, chiefly ADDITION and CONCATENATION, and adopted ADDITION in the final model because it gave the best sign language recognition performance in terms of WER.
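For illustration, here is a minimal sketch of the two fusion variants in PyTorch. The module names, channel counts, and tensor shapes below are assumptions for exposition, not the repo's actual code:

```python
import torch
import torch.nn as nn

# Illustrative only: module names and shape conventions are assumptions,
# not the released implementation.
class AddFusion(nn.Module):
    """Element-wise addition of two feature streams (requires equal dims)."""
    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return a + b  # (B, C, T) + (B, C, T) -> (B, C, T)

class ConcatFusion(nn.Module):
    """Channel concatenation followed by a 1x1 projection back to C channels."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv1d(2 * channels, channels, kernel_size=1)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([a, b], dim=1))  # (B, 2C, T) -> (B, C, T)

# Both keep the fused feature at C channels, so the downstream recognition
# head is unchanged; addition has the advantage of adding no parameters.
a = torch.randn(2, 512, 16)
b = torch.randn(2, 512, 16)
print(AddFusion()(a, b).shape)        # torch.Size([2, 512, 16])
print(ConcatFusion(512)(a, b).shape)  # torch.Size([2, 512, 16])
```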
The feature fusion methods we tried also included attention-based ones. In our initial experiments, ATTENTION-based fusion performed similarly to ADDITION but required excessive computational cost and GPU memory, so we did not pursue it further and removed the code from the released repo. For reference, our implementation was similar to the Nonlocal class (nonlocal_helper.py) in the SlowFast repo.
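To give a concrete picture, here is a minimal sketch of a non-local style cross-attention fusion in the spirit of that class. Every module name, shape, and design detail here is an assumption for illustration, not our released code:

```python
import torch
import torch.nn as nn

# Sketch of non-local style cross-attention fusion, following the pattern of
# SlowFast's Nonlocal class (nonlocal_helper.py). All names/shapes assumed.
class CrossNonlocalFusion(nn.Module):
    def __init__(self, channels: int, inner: int | None = None):
        super().__init__()
        inner = inner or channels // 2
        self.theta = nn.Conv1d(channels, inner, 1)  # queries from stream a
        self.phi = nn.Conv1d(channels, inner, 1)    # keys from stream b
        self.g = nn.Conv1d(channels, inner, 1)      # values from stream b
        self.out = nn.Conv1d(inner, channels, 1)
        self.scale = inner ** -0.5

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a, b: (B, C, T). The attention matrix is (B, T, T), which is where
        # the quadratic compute/memory cost mentioned above comes from.
        q = self.theta(a).transpose(1, 2)                 # (B, T, inner)
        k = self.phi(b)                                   # (B, inner, T)
        v = self.g(b).transpose(1, 2)                     # (B, T, inner)
        attn = torch.softmax(q @ k * self.scale, dim=-1)  # (B, T, T)
        fused = self.out((attn @ v).transpose(1, 2))      # (B, C, T)
        return a + fused  # residual connection, as in the Nonlocal class

x = torch.randn(2, 512, 16)
y = torch.randn(2, 512, 16)
print(CrossNonlocalFusion(512)(x, y).shape)  # torch.Size([2, 512, 16])
```

The T x T attention map grows quadratically with sequence length, which is consistent with why this variant was dropped despite matching ADDITION in accuracy.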