Why did you choose the addition method for feature fusion instead of concatenation? Is there a specific reason behind this choice, or is it based on experimental results that show the addition method performs better?
I noticed the code mentions "Use Bidirectional Attention Fusion", but I couldn't find the implementation of FuseBiAttn. Could you please provide more information on this?
Thank you for your time and assistance.
We experimented with several feature fusion methods, chiefly ADDITION and CONCATENATION, and adopted ADDITION in the final model because it gave the best sign language recognition performance in terms of WER.
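For illustration, here is a minimal sketch of the two fusion variants in PyTorch. The module names, channel counts, and tensor shapes below are assumptions for exposition, not the repo's actual code:

```python
import torch
import torch.nn as nn

# Illustrative only: module names and shape conventions are assumptions,
# not the released implementation.
class AddFusion(nn.Module):
    """Element-wise addition of two feature streams (requires equal dims)."""
    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return a + b  # (B, C, T) + (B, C, T) -> (B, C, T)

class ConcatFusion(nn.Module):
    """Channel concatenation followed by a 1x1 projection back to C channels."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv1d(2 * channels, channels, kernel_size=1)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([a, b], dim=1))  # (B, 2C, T) -> (B, C, T)

# Both keep the fused feature at C channels, so the downstream recognition
# head is unchanged; addition has the advantage of adding no parameters.
a = torch.randn(2, 512, 16)
b = torch.randn(2, 512, 16)
print(AddFusion()(a, b).shape)        # torch.Size([2, 512, 16])
print(ConcatFusion(512)(a, b).shape)  # torch.Size([2, 512, 16])
```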
The feature fusion methods we tried also included attention-based ones. In our initial experiments, ATTENTION-based fusion performed similarly to ADDITION but required excessive computational cost and GPU memory, so we did not pursue it further and removed the code from the released repo. For reference, our implementation was similar to the Nonlocal class (nonlocal_helper.py) in the SlowFast repo.
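To give a concrete picture, here is a minimal sketch of a non-local style cross-attention fusion in the spirit of that class. Every module name, shape, and design detail here is an assumption for illustration, not our released code:

```python
import torch
import torch.nn as nn

# Sketch of non-local style cross-attention fusion, following the pattern of
# SlowFast's Nonlocal class (nonlocal_helper.py). All names/shapes assumed.
class CrossNonlocalFusion(nn.Module):
    def __init__(self, channels: int, inner: int | None = None):
        super().__init__()
        inner = inner or channels // 2
        self.theta = nn.Conv1d(channels, inner, 1)  # queries from stream a
        self.phi = nn.Conv1d(channels, inner, 1)    # keys from stream b
        self.g = nn.Conv1d(channels, inner, 1)      # values from stream b
        self.out = nn.Conv1d(inner, channels, 1)
        self.scale = inner ** -0.5

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a, b: (B, C, T). The attention matrix is (B, T, T), which is where
        # the quadratic compute/memory cost mentioned above comes from.
        q = self.theta(a).transpose(1, 2)                 # (B, T, inner)
        k = self.phi(b)                                   # (B, inner, T)
        v = self.g(b).transpose(1, 2)                     # (B, T, inner)
        attn = torch.softmax(q @ k * self.scale, dim=-1)  # (B, T, T)
        fused = self.out((attn @ v).transpose(1, 2))      # (B, C, T)
        return a + fused  # residual connection, as in the Nonlocal class

x = torch.randn(2, 512, 16)
y = torch.randn(2, 512, 16)
print(CrossNonlocalFusion(512)(x, y).shape)  # torch.Size([2, 512, 16])
```

The T x T attention map grows quadratically with sequence length, which is consistent with why this variant was dropped despite matching ADDITION in accuracy.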