
Question about time efficiency in the paper #3

Open
jzhzhang opened this issue Aug 16, 2024 · 0 comments

Thanks for your great work.

I noticed that you have leveraged various techniques to accelerate inference. In the paper, DriveVLM uses MA-LMM for video input encoding, which confuses me, because MA-LMM is not time-efficient when encoding a large number of frames. For example, it takes about 500 seconds to encode 40 frames, as reported in the MA-LMM paper. That seems hard to reconcile with real-time processing. Am I missing something, or could you provide further insights?
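
For context on the scale of the concern, here is a quick back-of-the-envelope sketch of what the quoted figure implies. The 500 s / 40 frames number is the one quoted above from the MA-LMM paper; the 10 Hz camera rate used for the real-time budget is only an illustrative assumption on my side, not a value from either paper:

```python
# Rough latency check for the concern described above.
# 500 s / 40 frames is the figure quoted from the MA-LMM paper;
# the camera frame rate below is an illustrative assumption.

MA_LMM_TOTAL_SECONDS = 500.0   # reported encoding time (quoted above)
NUM_FRAMES = 40                # frames encoded in that time

per_frame_latency = MA_LMM_TOTAL_SECONDS / NUM_FRAMES    # ~12.5 s per frame

ASSUMED_CAMERA_HZ = 10.0       # assumption: a typical driving-camera rate
real_time_budget = 1.0 / ASSUMED_CAMERA_HZ                # ~0.1 s per frame

print(f"Per-frame encoding latency: {per_frame_latency:.1f} s")
print(f"Real-time budget at {ASSUMED_CAMERA_HZ:.0f} Hz: {real_time_budget:.2f} s")
print(f"Gap: ~{per_frame_latency / real_time_budget:.0f}x slower than real time")
```

Under that assumption the gap is roughly two orders of magnitude, which is why I am asking how the encoding step fits the latency budget.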

[attached image]
