May I ask why you apply the rendering transformer over the entire feature map instead of per pixel? Would it still work well if the rendering transformer were applied the same way as the grounding transformer?
Thank you very much!
@coolbay Thanks, very good question. There are two main reasons why we do this: 1. The transformer itself is a very computationally intensive operation, and it is not necessary to use so many transformers in FPT. 2. It is not really meaningful to render a high-level object with the attributes of a distant low-level object or pixel position.
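To put rough numbers on the first point, here is a back-of-the-envelope sketch. It is not taken from the FPT code. It assumes that a pixel-wise transformer scores every query pixel against every key pixel, and that a map-level operation summarizes the whole key map as one weight per channel. The function names and the feature-map sizes are hypothetical, chosen only for illustration.

```python
def pixelwise_attention_entries(h_q, w_q, h_k, w_k):
    """Affinity scores needed when every query pixel attends to every key pixel."""
    return (h_q * w_q) * (h_k * w_k)


def map_level_weights(channels):
    """Weights needed when the whole key map is reduced to one weight per channel
    (an assumed simplification of a map-level interaction)."""
    return channels


# Hypothetical sizes: a 32x32 high-level map querying a 64x64 low-level map.
h_q, w_q = 32, 32
h_k, w_k = 64, 64
channels = 256

pixel_cost = pixelwise_attention_entries(h_q, w_q, h_k, w_k)
map_cost = map_level_weights(channels)

print(f"pixel-wise affinity scores: {pixel_cost}")
print(f"map-level channel weights:  {map_cost}")
print(f"ratio: {pixel_cost // map_cost}x")
```

Under these assumptions the pixel-wise variant needs several orders of magnitude more scores than the map-level one, and the gap grows quickly with resolution. This only counts scores; it says nothing about accuracy.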