Stuck during parallel inference. #3057

I am performing parallel inference with a batch size of 8 on a machine with 4 * A6000 GPUs. However, after running inference for a while, it gets stuck and stops responding. Meanwhile, nvidia-smi shows the following situation:

[nvidia-smi screenshot]

Comments
Please share the env information by running `lmdeploy check_env`.
Here is the output of `lmdeploy check_env`:
TorchVision: 0.19.1+cu121
Legend: X = Self
Can you share the reproducible code snippet too?
I may not be able to provide the complete code because this is a complex project, but essentially, I am performing normal inference using a VLM with a batch size of 8, driven by a helper `def batch_generate_entities_and_relations(model: ...)`.
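For context, here is a minimal, hypothetical sketch of that kind of batched VLM inference with the lmdeploy `pipeline` API. The model path, image URLs, prompt text, `tp=4`, the generation settings, and the helper's signature are all assumptions chosen to mirror the setup described in this thread, not the reporter's actual code.

```python
# Sketch of batched VLM inference with lmdeploy (assumed setup, not the reporter's project code).
from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
from lmdeploy.vl import load_image

# Spread the model across 4 GPUs with tensor parallelism, as in the reports.
engine_cfg = TurbomindEngineConfig(tp=4)
pipe = pipeline('OpenGVLab/InternVL2_5-26B', backend_config=engine_cfg)  # placeholder model path

gen_cfg = GenerationConfig(max_new_tokens=512)

def batch_generate_entities_and_relations(pipe, image_urls, prompt):
    """Run one batch of (prompt, image) requests through the pipeline."""
    requests = [(prompt, load_image(url)) for url in image_urls]
    return pipe(requests, gen_config=gen_cfg)

# One batch of 8 images, matching the reported batch size; URLs are placeholders.
urls = [f'https://example.com/doc_{i}.png' for i in range(8)]
outputs = batch_generate_entities_and_relations(
    pipe, urls, 'Extract entities and relations from this image.')
for out in outputs:
    print(out.text)
```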
Can you set the log level to INFO and share the logs?
Same problem when inferring a 78B model with 4 * H800. With lmdeploy==0.6.3 it occurs occasionally, but with lmdeploy==0.7.0 it definitely occurs.
So, do I need to keep running until the “stuck” situation occurs, or do you only need the INFO logs from when the pipeline is created?
Set the log level to INFO when creating the pipeline.
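In case it helps, here is a minimal sketch of one way to turn INFO logging on when creating the pipeline; the `log_level` argument and the `LMDEPLOY_LOG_LEVEL` environment variable are assumptions based on recent lmdeploy releases, not a quote of the maintainer's exact instruction.

```python
# Sketch (assumed API): enable INFO-level logging for the lmdeploy pipeline so
# engine/scheduler activity is visible while reproducing the hang.
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    'OpenGVLab/InternVL2_5-26B',                  # placeholder model path
    backend_config=TurbomindEngineConfig(tp=4),   # 4-GPU tensor parallelism, as in the reports
    log_level='INFO',                             # assumed keyword; surfaces engine INFO logs
)

# Alternative (also assumed): set the environment variable before launching,
#   export LMDEPLOY_LOG_LEVEL=INFO
```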
Here is the log.