Hello, I noticed an issue while training the LLaMA-7B model in the SFT stage. With all training hyperparameters kept identical (lr, steps, weight_decay, warmup, etc.):
Setting 1: 8 x A100, per_device_train_batch_size=16, gradient_accumulation_steps=2, total_batch_size = 128
Setting 2: 2 x 8 x A100, per_device_train_batch_size=16, gradient_accumulation_steps=1, total_batch_size = 128
Setting 1 comes out better than Setting 2 on every metric. Have you looked into this issue?
I see the same behavior with Gemma-2B, so it may be a multi-node vs. single-node (multi-GPU) performance issue. Have you noticed this before, and is there a recommended fix?
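
For reference, a minimal sketch of how I compare the two settings, assuming the Hugging Face Trainer convention that the effective batch size per optimizer step is world_size × per_device_train_batch_size × gradient_accumulation_steps (the 128 quoted above counts only world_size × per_device_train_batch_size; the helper name below is illustrative):

```python
# Sketch: effective global batch size of the two settings, assuming the
# Hugging Face Trainer convention:
#   world_size * per_device_train_batch_size * gradient_accumulation_steps
# The helper name is illustrative, not part of the training code.

def effective_batch_size(num_nodes: int, gpus_per_node: int,
                         per_device_train_batch_size: int,
                         gradient_accumulation_steps: int) -> int:
    world_size = num_nodes * gpus_per_node
    return world_size * per_device_train_batch_size * gradient_accumulation_steps

# Setting 1: 1 node x 8 A100, per-device batch 16, 2 accumulation steps
print(effective_batch_size(1, 8, 16, 2))  # 256
# Setting 2: 2 nodes x 8 A100, per-device batch 16, 1 accumulation step
print(effective_batch_size(2, 8, 16, 1))  # 256
```

Under this convention both settings process the same number of samples per optimizer step, so the two runs should be batch-size-matched; the metric gap would have to come from somewhere else.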