Why perform rendering transformer by the entire feature map? #16

coolbay · 2020-12-01T08:49:35Z

May I ask why you perform the rendering transformer on the entire feature map instead of on individual pixels? Would it work well if the rendering transformer were done the same way as the grounding transformer?

Thank you very much!

dongzhang89 · 2020-12-04T09:01:12Z

@coolbay Thanks, that is a very good question. There are two main reasons why we do this: 1. The transformer itself is a very computationally intensive operation, and it is not necessary to use so many of them in FPT; 2. It is not really meaningful to render a high-level object with the attributes of a distant low-level object or of distant pixel positions.
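To put a rough number on the first reason, here is a minimal back-of-the-envelope sketch. It is not taken from the FPT code: the map sizes, the channel count, and both helper functions are hypothetical, and it assumes that a map-level interaction needs only one weight per channel (for example from pooling a whole map), which is one possible reading of "by the entire feature map".

```python
def pixelwise_attention_scores(h_q, w_q, h_k, w_k):
    """Similarity scores needed when every query pixel attends to every key pixel."""
    return (h_q * w_q) * (h_k * w_k)


def mapwise_channel_weights(channels):
    """Weights needed when a whole map is summarized as one value per channel (assumed scheme)."""
    return channels


# Hypothetical sizes, chosen only for illustration.
h_q, w_q = 32, 32   # higher-level (coarser) feature map
h_k, w_k = 64, 64   # lower-level (finer) feature map
channels = 256

pixel_scores = pixelwise_attention_scores(h_q, w_q, h_k, w_k)
channel_weights = mapwise_channel_weights(channels)

print("pixel-wise scores:", pixel_scores)
print("map-wise weights: ", channel_weights)
print("ratio:            ", pixel_scores // channel_weights)
```

Under these assumed sizes the pixel-wise variant needs millions of scores per pair of levels, while the map-level variant needs a few hundred weights, which is in line with the cost argument above.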
