Thanks for your great work.
I noticed that you have leveraged various techniques to accelerate inference. In your paper, DriveVLM uses MA-LMM for video input encoding, which confuses me, because MA-LMM is not time-efficient when encoding a large number of frames: for example, it takes about 500 seconds to encode 40 frames, as reported in the MA-LMM paper. This seems challenging for real-time processing. Am I missing something, or could you provide further insights?