model transfer to Turbomind cost a lot of time. #3064

Open
LaoWangGB opened this issue Jan 21, 2025 · 8 comments

@LaoWangGB

LaoWangGB commented Jan 21, 2025

I run inference with InternVL2.5-78B, and converting the model to the TurboMind format takes a long time. I want to know what determines the duration of this conversion step. On A800×4 it takes 10 minutes, but on H800×4 it takes more than 30 minutes.

@lvhan028
Collaborator

Which version of lmdeploy are you using?

@lvhan028 lvhan028 self-assigned this Jan 21, 2025
@LaoWangGB
Author

> Which version of lmdeploy are you using?

0.6.3

@lvhan028
Collaborator

Could you try 0.7.0? We fixed this issue in 0.6.5.

If InternVL2.5-78B is hosted on a remote machine, the initial model loading process might be time-consuming. However, subsequent loads should be much faster, as transformers caches the model locally.
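
Subsequent loads hit the local cache, which transformers keeps under `~/.cache/huggingface` by default. A quick stdlib sketch to check whether the weights are actually cached locally (the default path is an assumption and may be overridden by `HF_HOME`/`HF_HUB_CACHE` in your environment):

```python
import os

def hf_cache_size_mb(cache_dir=None):
    # Sum file sizes under the Hugging Face hub cache; a multi-GB result
    # means later loads should skip the remote download entirely.
    # NOTE: the default path is an assumption (transformers' usual default);
    # pass an explicit path if HF_HOME or HF_HUB_CACHE is set.
    cache_dir = cache_dir or os.path.expanduser("~/.cache/huggingface/hub")
    total = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # skip broken symlinks
    return total / 1e6

# e.g. print(f"{hf_cache_size_mb():.0f} MB cached")
```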

@LaoWangGB
Author

> Can you try 0.7.0? We fixed it in 0.6.5
>
> If InternVL2.5-78B is hosted on a remote machine, the initial model loading process might be time-consuming. However, subsequent loads should be much faster, as transformers caches the model locally.

language_model.model.layers.77.self_attn.v_proj.bias: 94%|█████████▍| 31/33 [00:00<00:00, 59.63w/s, dev=cpu]
language_model.model.layers.77.self_attn.v_proj.weight: 97%|█████████▋| 32/33 [00:00<00:00, 60.83w/s, dev=cpu]

0%| | 0/28 [00:00<?, ?w/s]
language_model.model.layers.77.input_layernorm.weight: 0%| | 0/28 [00:00<?, ?w/s, dev=cpu]
language_model.model.layers.77.mlp.down_proj.weight: 4%|▎ | 1/28 [00:00<00:00, 30.22w/s, dev=cpu]
language_model.model.layers.77.post_attention_layernorm.weight: 7%|▋ | 2/28 [00:00<00:00, 60.22w/s, dev=cpu]
language_model.model.layers.78.input_layernorm.weight: 11%|█ | 3/28 [00:00<00:00, 76.72w/s, dev=cpu]
language_model.model.layers.78.mlp.down_proj.weight: 14%|█▍ | 4/28 [00:00<00:00, 102.13w/s, dev=cpu]
language_model.model.layers.78.mlp.gate_proj.weight: 18%|█▊ | 5/28 [00:00<00:00, 127.43w/s, dev=cpu]
language_model.model.layers.78.mlp.up_proj.weight: 21%|██▏ | 6/28 [00:00<00:00, 131.52w/s, dev=cpu]
language_model.model.layers.78.post_attention_layernorm.weight: 25%|██▌ | 7/28 [00:00<00:00, 132.68w/s, dev=cpu]
language_model.model.layers.78.self_attn.k_proj.bias: 29%|██▊ | 8/28 [00:00<00:00, 137.53w/s, dev=cpu]
language_model.model.layers.78.self_attn.k_proj.weight: 32%|███▏ | 9/28 [00:00<00:00, 154.57w/s, dev=cpu]
language_model.model.layers.78.self_attn.o_proj.weight: 36%|███▌ | 10/28 [00:00<00:00, 171.57w/s, dev=cpu]
language_model.model.layers.78.self_attn.q_proj.bias: 39%|███▉ | 11/28 [00:00<00:00, 172.25w/s, dev=cpu]
language_model.model.layers.78.self_attn.q_proj.weight: 43%|████▎ | 12/28 [00:00<00:00, 171.63w/s, dev=cpu]
language_model.model.layers.78.self_attn.v_proj.bias: 46%|████▋ | 13/28 [00:00<00:00, 185.75w/s, dev=cpu]
language_model.model.layers.78.self_attn.v_proj.weight: 50%|█████ | 14/28 [00:00<00:00, 183.53w/s, dev=cpu]
language_model.model.layers.79.input_layernorm.weight: 54%|█████▎ | 15/28 [00:00<00:00, 196.48w/s, dev=cpu]
language_model.model.layers.79.mlp.down_proj.weight: 57%|█████▋ | 16/28 [00:00<00:00, 194.13w/s, dev=cpu]
language_model.model.layers.79.mlp.gate_proj.weight: 61%|██████ | 17/28 [00:00<00:00, 206.08w/s, dev=cpu]
language_model.model.layers.79.mlp.up_proj.weight: 64%|██████▍ | 18/28 [00:00<00:00, 202.67w/s, dev=cpu]
language_model.model.layers.79.post_attention_layernorm.weight: 68%|██████▊ | 19/28 [00:00<00:00, 199.96w/s, dev=cpu]
language_model.model.layers.79.post_attention_layernorm.weight: 71%|███████▏ | 20/28 [00:00<00:00, 197.91w/s, dev=cpu]
language_model.model.layers.79.self_attn.k_proj.bias: 71%|███████▏ | 20/28 [00:00<00:00, 197.81w/s, dev=cpu]
language_model.model.layers.79.self_attn.k_proj.weight: 75%|███████▌ | 21/28 [00:00<00:00, 207.59w/s, dev=cpu]
language_model.model.layers.79.self_attn.o_proj.weight: 79%|███████▊ | 22/28 [00:00<00:00, 217.31w/s, dev=cpu]
language_model.model.layers.79.self_attn.q_proj.bias: 82%|████████▏ | 23/28 [00:00<00:00, 214.65w/s, dev=cpu]
language_model.model.layers.79.self_attn.q_proj.weight: 86%|████████▌ | 24/28 [00:00<00:00, 212.58w/s, dev=cpu]
language_model.model.layers.79.self_attn.v_proj.bias: 89%|████████▉ | 25/28 [00:00<00:00, 221.13w/s, dev=cpu]
language_model.model.layers.79.self_attn.v_proj.weight: 93%|█████████▎| 26/28 [00:00<00:00, 220.17w/s, dev=cpu]
language_model.model.norm.weight: 96%|█████████▋| 27/28 [00:00<00:00, 228.54w/s, dev=cpu]

0%| | 0/7 [00:00<?, ?w/s]
mlp1.0.bias: 0%| | 0/7 [00:00<?, ?w/s, dev=3]
mlp1.0.weight: 14%|█▍ | 1/7 [00:00<00:00, 29.09w/s, dev=3]
mlp1.1.bias: 29%|██▊ | 2/7 [00:00<00:00, 57.95w/s, dev=3]
mlp1.1.weight: 43%|████▎ | 3/7 [00:00<00:00, 86.50w/s, dev=3]
mlp1.1.weight: 57%|█████▋ | 4/7 [00:05<00:03, 1.29s/w, dev=3]
mlp1.3.bias: 57%|█████▋ | 4/7 [00:05<00:03, 1.29s/w, dev=3]
mlp1.3.weight: 71%|███████▏ | 5/7 [00:05<00:02, 1.04s/w, dev=3]
language_model.lm_head.weight: 86%|████████▌ | 6/7 [00:08<00:01, 1.40s/w, dev=cpu]

[TM][WARNING] [LlamaTritonModel] max_context_token_num is not set, default to 8192.
udf-pod-37382-1-7e5bc46ecbe22193:200:200 [0] NCCL INFO Bootstrap : Using eth0:10.60.64.237<0>
udf-pod-37382-1-7e5bc46ecbe22193:200:200 [0] NCCL INFO cudaDriverVersion 12060
NCCL version 2.20.5+cuda12.4
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO P2P plugin v8 IBext_v8
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO NET/IB : No device found.
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO NET/IB : No device found.
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO NET/Socket : Using [0]eth0:10.60.64.237<0>
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Using non-device net plugin version 0
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Using network Socket
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Using non-device net plugin version 0
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Using non-device net plugin version 0
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Using network Socket
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Using network Socket
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Using non-device net plugin version 0
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Using network Socket
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO comm 0x564d2354b3f0 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 85000 commId 0x55bde81ef5b2dec - Init START
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO comm 0x564d1fa53910 rank 3 nranks 4 cudaDev 3 nvmlDev 3 busId dd000 commId 0x55bde81ef5b2dec - Init START
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO comm 0x564d24a7c950 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId c4000 commId 0x55bde81ef5b2dec - Init START
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO comm 0x564d24a83530 rank 2 nranks 4 cudaDev 2 nvmlDev 2 busId d0000 commId 0x55bde81ef5b2dec - Init START
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Setting affinity for GPU 1 to 03ff0000,00000000,00000000,03ff0000,00000000
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO NVLS multicast support is available on dev 1
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Setting affinity for GPU 2 to 03ff0000,00000000,00000000,03ff0000,00000000
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO NVLS multicast support is available on dev 2
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Setting affinity for GPU 0 to 03ff0000,00000000,00000000,03ff0000,00000000
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO NVLS multicast support is available on dev 0
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Setting affinity for GPU 3 to 03ff0000,00000000,00000000,03ff0000,00000000
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO NVLS multicast support is available on dev 3
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO comm 0x564d1fa53910 rank 3 nRanks 4 nNodes 1 localRanks 4 localRank 3 MNNVL 0
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO comm 0x564d24a7c950 rank 1 nRanks 4 nNodes 1 localRanks 4 localRank 1 MNNVL 0
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO comm 0x564d2354b3f0 rank 0 nRanks 4 nNodes 1 localRanks 4 localRank 0 MNNVL 0
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO comm 0x564d24a83530 rank 2 nRanks 4 nNodes 1 localRanks 4 localRank 2 MNNVL 0
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 00/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] -1/-1/-1->3->2 [2] -1/-1/-1->3->2 [3] -1/-1/-1->3->2 [4] -1/-1/-1->3->2 [5] -1/-1/-1->3->2 [6] -1/-1/-1->3->2 [7] -1/-1/-1->3->2 [8] -1/-1/-1->3->2 [9] -1/-1/-1->3->2 [10] -1/-1/-1->3->2 [11] -1/-1/-1->3->2 [12] -1/-1/-1->3->2 [13] -1/-1/-1->3->2 [14] -1/-1/-1->3->2 [15] -1/-1/-1->3->2
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO P2P Chunksize set to 524288
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 01/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 02/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO P2P Chunksize set to 524288
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 03/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 04/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 05/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 06/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO P2P Chunksize set to 524288
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 07/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 08/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 09/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 10/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 11/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 12/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 13/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 14/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 15/16 : 0 1 2 3
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO P2P Chunksize set to 524288
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 00/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 01/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 02/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 03/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 04/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 05/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 06/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 07/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 08/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 09/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 10/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 11/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 12/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 13/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 14/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 15/0 : 3[3] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Connected all rings
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Connected all rings
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Connected all rings
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Connected all rings
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 11/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 14/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/direct pointer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO Connected all trees
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO Connected all trees
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO Connected all trees
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO Connected all trees
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO NVLS comm 0x564d24a7c950 headRank 1 nHeads 4 buffSize 4194304 memSize 2097152 nvlsPerRankSize 201326592 nvlsTotalSize 805306368
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO NVLS comm 0x564d2354b3f0 headRank 0 nHeads 4 buffSize 4194304 memSize 2097152 nvlsPerRankSize 201326592 nvlsTotalSize 805306368
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO NVLS comm 0x564d24a83530 headRank 2 nHeads 4 buffSize 4194304 memSize 2097152 nvlsPerRankSize 201326592 nvlsTotalSize 805306368
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO NVLS comm 0x564d1fa53910 headRank 3 nHeads 4 buffSize 4194304 memSize 2097152 nvlsPerRankSize 201326592 nvlsTotalSize 805306368
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO 16 coll channels, 0 collnet channels, 16 nvls channels, 16 p2p channels, 16 p2p channels per peer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO 16 coll channels, 0 collnet channels, 16 nvls channels, 16 p2p channels, 16 p2p channels per peer
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO 16 coll channels, 0 collnet channels, 16 nvls channels, 16 p2p channels, 16 p2p channels per peer
[TM][WARNING] pad vocab size from 151674 to 151676
[TM][WARNING] pad embed size from 151676 to 151676
[TM][WARNING] pad vocab size from 151674 to 151676
[TM][WARNING] pad embed size from 151676 to 151676
[TM][WARNING] pad vocab size from 151674 to 151676
[TM][WARNING] pad embed size from 151676 to 151676
[TM][WARNING] pad vocab size from 151674 to 151676
[TM][WARNING] pad embed size from 151676 to 151676
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
2025-01-22 16:07:38,618 - lmdeploy - WARNING - turbomind.py:231 - get 2409 model params
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO 16 coll channels, 0 collnet channels, 16 nvls channels, 16 p2p channels, 16 p2p channels per peer
udf-pod-37382-1-7e5bc46ecbe22193:200:273 [0] NCCL INFO comm 0x564d2354b3f0 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 85000 commId 0x55bde81ef5b2dec - Init COMPLETE
udf-pod-37382-1-7e5bc46ecbe22193:200:275 [2] NCCL INFO comm 0x564d24a83530 rank 2 nranks 4 cudaDev 2 nvmlDev 2 busId d0000 commId 0x55bde81ef5b2dec - Init COMPLETE
udf-pod-37382-1-7e5bc46ecbe22193:200:276 [3] NCCL INFO comm 0x564d1fa53910 rank 3 nranks 4 cudaDev 3 nvmlDev 3 busId dd000 commId 0x55bde81ef5b2dec - Init COMPLETE
udf-pod-37382-1-7e5bc46ecbe22193:200:274 [1] NCCL INFO comm 0x564d24a7c950 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId c4000 commId 0x55bde81ef5b2dec - Init COMPLETE
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo

Convert to turbomind format: 0%| | 0/80 [00:00<?, ?it/s]
Convert to turbomind format: 1%|▏ | 1/80 [00:55<1:13:32, 55.86s/it]
Convert to turbomind format: 2%|▎ | 2/80 [01:37<1:02:05, 47.77s/it]
Convert to turbomind format: 4%|▍ | 3/80 [02:19<57:55, 45.14s/it]
Convert to turbomind format: 5%|▌ | 4/80 [03:02<55:56, 44.17s/it]
Convert to turbomind format: 6%|▋ | 5/80 [03:44<54:22, 43.50s/it]
Convert to turbomind format: 8%|▊ | 6/80 [04:27<53:06, 43.06s/it]
Convert to turbomind format: 9%|▉ | 7/80 [05:08<51:53, 42.66s/it]
Convert to turbomind format: 10%|█ | 8/80 [05:50<50:46, 42.31s/it]
Convert to turbomind format: 11%|█▏ | 9/80 [06:32<49:48, 42.10s/it]
Convert to turbomind format: 12%|█▎ | 10/80 [07:13<48:57, 41.96s/it]
Convert to turbomind format: 14%|█▍ | 11/80 [07:56<48:19, 42.03s/it]
Convert to turbomind format: 15%|█▌ | 12/80 [08:35<46:44, 41.24s/it]
Convert to turbomind format: 16%|█▋ | 13/80 [09:10<43:59, 39.39s/it]
Convert to turbomind format: 18%|█▊ | 14/80 [09:46<42:16, 38.43s/it]
Convert to turbomind format: 19%|█▉ | 15/80 [10:24<41:20, 38.16s/it]
Convert to turbomind format: 20%|██ | 16/80 [11:06<41:54, 39.29s/it]
Convert to turbomind format: 21%|██▏ | 17/80 [11:46<41:41, 39.70s/it]
Convert to turbomind format: 22%|██▎ | 18/80 [12:28<41:28, 40.14s/it]
Convert to turbomind format: 24%|██▍ | 19/80 [13:08<40:49, 40.15s/it]
Convert to turbomind format: 25%|██▌ | 20/80 [13:49<40:33, 40.56s/it]
Convert to turbomind format: 26%|██▋ | 21/80 [14:30<39:59, 40.66s/it]
Convert to turbomind format: 28%|██▊ | 22/80 [15:10<39:08, 40.49s/it]
Convert to turbomind format: 29%|██▉ | 23/80 [15:53<39:11, 41.25s/it]
Convert to turbomind format: 30%|███ | 24/80 [16:36<38:53, 41.68s/it]
Convert to turbomind format: 31%|███▏ | 25/80 [17:17<38:00, 41.46s/it]
Convert to turbomind format: 32%|███▎ | 26/80 [18:00<37:52, 42.09s/it]
Convert to turbomind format: 34%|███▍ | 27/80 [18:42<37:03, 41.94s/it]
Convert to turbomind format: 35%|███▌ | 28/80 [19:23<36:07, 41.69s/it]
Convert to turbomind format: 36%|███▋ | 29/80 [20:00<34:07, 40.15s/it]
Convert to turbomind format: 38%|███▊ | 30/80 [20:35<32:20, 38.82s/it]
Convert to turbomind format: 39%|███▉ | 31/80 [21:10<30:40, 37.56s/it]
Convert to turbomind format: 40%|████ | 32/80 [21:52<30:59, 38.74s/it]
Convert to turbomind format: 41%|████▏ | 33/80 [22:33<31:03, 39.65s/it]
Convert to turbomind format: 42%|████▎ | 34/80 [23:13<30:22, 39.61s/it]
Convert to turbomind format: 44%|████▍ | 35/80 [23:49<28:54, 38.55s/it]
Convert to turbomind format: 45%|████▌ | 36/80 [24:24<27:29, 37.50s/it]
Convert to turbomind format: 46%|████▋ | 37/80 [25:02<27:04, 37.78s/it]
Convert to turbomind format: 48%|████▊ | 38/80 [25:44<27:13, 38.88s/it]
Convert to turbomind format: 49%|████▉ | 39/80 [26:25<27:07, 39.71s/it]
Convert to turbomind format: 50%|█████ | 40/80 [27:07<26:47, 40.18s/it]
Convert to turbomind format: 51%|█████▏ | 41/80 [27:48<26:13, 40.35s/it]
Convert to turbomind format: 52%|█████▎ | 42/80 [28:28<25:35, 40.40s/it]
Convert to turbomind format: 54%|█████▍ | 43/80 [29:10<25:12, 40.87s/it]
Convert to turbomind format: 55%|█████▌ | 44/80 [29:51<24:33, 40.94s/it]
Convert to turbomind format: 56%|█████▋ | 45/80 [30:33<24:07, 41.34s/it]
Convert to turbomind format: 57%|█████▊ | 46/80 [31:15<23:24, 41.32s/it]
Convert to turbomind format: 59%|█████▉ | 47/80 [32:17<26:13, 47.70s/it]
Convert to turbomind format: 60%|██████ | 48/80 [32:59<24:32, 46.01s/it]
Convert to turbomind format: 61%|██████▏ | 49/80 [33:41<23:10, 44.86s/it]
Convert to turbomind format: 62%|██████▎ | 50/80 [34:22<21:48, 43.63s/it]
Convert to turbomind format: 64%|██████▍ | 51/80 [34:59<20:01, 41.42s/it]
Convert to turbomind format: 65%|██████▌ | 52/80 [35:34<18:27, 39.55s/it]
Convert to turbomind format: 66%|██████▋ | 53/80 [36:08<17:07, 38.07s/it]
Convert to turbomind format: 68%|██████▊ | 54/80 [36:51<17:05, 39.43s/it]
Convert to turbomind format: 69%|██████▉ | 55/80 [37:33<16:46, 40.27s/it]
Convert to turbomind format: 70%|███████ | 56/80 [38:14<16:10, 40.44s/it]
Convert to turbomind format: 71%|███████▏ | 57/80 [38:55<15:31, 40.52s/it]
Convert to turbomind format: 72%|███████▎ | 58/80 [39:35<14:51, 40.52s/it]
Convert to turbomind format: 74%|███████▍ | 59/80 [40:16<14:14, 40.67s/it]
Convert to turbomind format: 75%|███████▌ | 60/80 [40:57<13:31, 40.57s/it]
Convert to turbomind format: 76%|███████▋ | 61/80 [41:37<12:51, 40.60s/it]
Convert to turbomind format: 78%|███████▊ | 62/80 [42:22<12:32, 41.80s/it]
Convert to turbomind format: 79%|███████▉ | 63/80 [43:07<12:09, 42.91s/it]
Convert to turbomind format: 80%|████████ | 64/80 [43:55<11:51, 44.45s/it]
Convert to turbomind format: 81%|████████▏ | 65/80 [44:44<11:23, 45.58s/it]
Convert to turbomind format: 82%|████████▎ | 66/80 [45:26<10:26, 44.77s/it]
Convert to turbomind format: 84%|████████▍ | 67/80 [46:11<09:41, 44.72s/it]
Convert to turbomind format: 85%|████████▌ | 68/80 [46:52<08:42, 43.58s/it]
Convert to turbomind format: 86%|████████▋ | 69/80 [47:33<07:52, 42.92s/it]
Convert to turbomind format: 88%|████████▊ | 70/80 [48:14<07:01, 42.15s/it]
Convert to turbomind format: 89%|████████▉ | 71/80 [48:55<06:16, 41.88s/it]
Convert to turbomind format: 90%|█████████ | 72/80 [49:36<05:32, 41.57s/it]
Convert to turbomind format: 91%|█████████▏| 73/80 [50:16<04:48, 41.22s/it]
Convert to turbomind format: 92%|█████████▎| 74/80 [50:59<04:09, 41.62s/it]
Convert to turbomind format: 94%|█████████▍| 75/80 [51:40<03:27, 41.59s/it]
Convert to turbomind format: 95%|█████████▌| 76/80 [52:21<02:45, 41.34s/it]
Convert to turbomind format: 96%|█████████▋| 77/80 [53:02<02:03, 41.32s/it]
Convert to turbomind format: 98%|█████████▊| 78/80 [53:43<01:22, 41.08s/it]
Convert to turbomind format: 99%|█████████▉| 79/80 [54:25<00:41, 41.46s/it]
Convert to turbomind format: 100%|██████████| 80/80 [55:07<00:00, 41.41s/it]

The model is already loaded, but the conversion step still takes a long time.

@LaoWangGB
Author

> Can you try 0.7.0? We fixed it in 0.6.5
>
> If InternVL2.5-78B is hosted on a remote machine, the initial model loading process might be time-consuming. However, subsequent loads should be much faster, as transformers caches the model locally.

I am using 0.7.0 and it is the same. Sometimes it can take 70 minutes.

@lvhan028
Collaborator

I did the following experiments sequentially.

  • A100, TP8
    loading InternVL2-78B from a remote machine costs 172.57s

  • A100, TP4
    loading InternVL2-78B costs 66.58s. It is much faster because transformers cached the model during the first experiment

My storage device is NVMe.
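
Since the conversion step streams tens of GB of weights from disk, a quick sequential-read benchmark on the weight directory can show whether the step is I/O-bound. A minimal stdlib sketch (the helper name and path argument are placeholders, not part of lmdeploy):

```python
import os
import time

def read_throughput_mb_s(path, limit_bytes=1 << 30):
    # Sequentially read up to `limit_bytes` from files under `path`
    # and report MB/s -- a rough check for I/O-bound weight conversion.
    total = 0
    start = time.perf_counter()
    for root, _dirs, files in os.walk(path):
        for name in files:
            if total >= limit_bytes:
                break
            with open(os.path.join(root, name), "rb") as f:
                while chunk := f.read(8 << 20):  # 8 MiB chunks
                    total += len(chunk)
                    if total >= limit_bytes:
                        break
    elapsed = time.perf_counter() - start
    return total / elapsed / 1e6

# e.g. print(read_throughput_mb_s("/path/to/InternVL2_5-78B"))
```

An NVMe drive should report well over 1000 MB/s here; a number far below that would point at storage rather than lmdeploy.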

@LaoWangGB
Author

> I did the following experiments sequentially.
>
> • A100, TP8
>   loading InternVL2-78B from a remote machine costs 172.57s
> • A100, TP4
>   loading InternVL2-78B costs 66.58s. It's much faster, because transformers caches the model in the first experiment
>
> My storage device is nvme.

Loading InternVL2-78B is fast, but converting it to the TurboMind format is very slow.

@lvhan028
Collaborator

In my experiment, the reported time includes loading, conversion, and KV-cache allocation, i.e., the total duration of creating the pipeline.
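
For anyone comparing numbers, that total can be bracketed with a small timer around pipeline creation (a minimal sketch; the lmdeploy lines are commented out because they require GPUs and the model weights, and the model path is a placeholder):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Print how long the enclosed block takes; wrapping pipeline creation
    # captures load + convert + KV-cache allocation together.
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.2f}s")

# Hypothetical usage (requires lmdeploy and 4 GPUs; path is a placeholder):
# from lmdeploy import pipeline, TurbomindEngineConfig
# with timed("pipeline creation"):
#     pipe = pipeline("OpenGVLab/InternVL2_5-78B",
#                     backend_config=TurbomindEngineConfig(tp=4))
```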
