Thanks for your great work.
I noticed that you have leveraged various techniques to accelerate inference. In your paper, DriveVLM uses MA-LMM for video input encoding, which confuses me, because MA-LMM is not time-efficient when encoding a large number of frames: for example, it takes about 500 seconds to encode 40 frames, as reported in the MA-LMM paper. This seems challenging for real-time processing. Am I missing something, or could you provide further insights?