Model transfer to TurboMind costs a lot of time. #3064
Comments
Which version of lmdeploy are you using?

0.6.3
Can you try 0.7.0? We fixed it in 0.6.5.
```
language_model.model.layers.77.self_attn.v_proj.bias: 94%|█████████▍| 31/33 [00:00<00:00, 59.63w/s, dev=cpu]
0%| | 0/28 [00:00<?, ?w/s]
0%| | 0/7 [00:00<?, ?w/s]
[TM][WARNING] [LlamaTritonModel]
Convert to turbomind format: 0%| | 0/80 [00:00<?, ?it/s]
```

The model is already loaded, but the convert step costs a lot of time.
I use 0.7.0; it's the same. Sometimes it can cost 70 minutes.
I did the following experiments sequentially. My storage device is …
Loading InternVL2-78B is fast, but converting to the TurboMind format is very slow.
In my experiment, the time includes loading, converting, and kv cache allocation, i.e., the duration of creating the pipeline.
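For reference, a minimal sketch of how that pipeline-creation time can be measured. The model path and the `tp=4` setting are assumptions for illustration, not taken from the report above.

```python
import time

from lmdeploy import pipeline, TurbomindEngineConfig

start = time.perf_counter()
# Creating the pipeline covers weight loading, conversion to the
# TurboMind format, and kv cache allocation, so this single timing
# captures all three stages discussed in this issue.
pipe = pipeline(
    'OpenGVLab/InternVL2_5-78B',                # assumed model path
    backend_config=TurbomindEngineConfig(tp=4),  # assumed tensor parallelism
)
print(f'pipeline creation took {time.perf_counter() - start:.1f} s')
```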
I infer with InternVL2.5-78B, and when it transfers to TurboMind mode, it takes a lot of time. I want to know what determines the duration of this transfer step. When I use A800 x4, it costs 10 minutes, but when I use H800 x4, it costs more than 30 minutes.