
Multi-node multi-GPU training gives worse results than single-node multi-GPU #111

Open
DePengW opened this issue May 9, 2024 · 1 comment

DePengW commented May 9, 2024

Hi, I ran into a problem while training LLaMA-7B in the SFT stage. With all training hyperparameters kept identical (lr, steps, weight_decay, warmup, etc.):

Setting 1: 8 x A100, per_device_train_batch_size=16, gradient_accumulation_steps=2, total_batch_size = 128
Setting 2: 2 x 8 x A100, per_device_train_batch_size=16, gradient_accumulation_steps=1, total_batch_size = 128

Every metric under Setting 1 is better than under Setting 2. Have you looked into this issue?

I see the same behavior with Gemma-2B, so this may be a general multi-node vs. single-node multi-GPU issue. Have you noticed it before, and is there any known solution?
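For reference, a minimal sketch (assuming the standard data-parallel convention used by trainers such as the HuggingFace Trainer; the helper function name is hypothetical and not from the issue) showing that both settings amount to the same effective batch size per optimizer step:

```python
# Effective batch per optimizer step under plain data parallelism:
#   global_batch = per_device_train_batch_size * gradient_accumulation_steps * world_size

def global_batch_size(per_device_bs: int, grad_accum: int,
                      num_nodes: int, gpus_per_node: int = 8) -> int:
    """Number of samples contributing to one optimizer step across all ranks."""
    world_size = num_nodes * gpus_per_node
    return per_device_bs * grad_accum * world_size

setting_1 = global_batch_size(per_device_bs=16, grad_accum=2, num_nodes=1)  # 8 x A100
setting_2 = global_batch_size(per_device_bs=16, grad_accum=1, num_nodes=2)  # 2 x 8 x A100

# Both runs see the same number of samples per optimizer step, so the gap in
# metrics is not explained by a different effective batch size.
assert setting_1 == setting_2
print(f"setting 1: {setting_1}, setting 2: {setting_2}")
```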

@charlesCXK

@DePengW
I hit the same problem when training llava-1.5: with all hyperparameters kept the same, the 4-node results are worse than the 1-node results. My takeaway is to keep the number of nodes fixed across my own experiments.
