Training hangs on large-scale data #12

20 GB of data trains normally, but with 100 GB the run gets stuck at some step.
bytepiece==0.6.3
The stack trace of one of the threads is inconclusive; asking GPT directly, it seems to be a multiprocessing issue:
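When a run hangs like this, a stack dump of the whole live process is usually more informative than a single thread's trace. A minimal sketch of two ways to get one, assuming a standard CPython process (faulthandler is in the standard library; py-spy is a separate third-party tool, not part of bytepiece):

```python
# Option 1: bake a dump hook into the training script before starting.
# Once the run gets stuck, `kill -USR1 <pid>` from a shell prints the
# stack of every thread, which shows whether workers are blocked on a
# multiprocessing queue or lock.
import faulthandler
import signal

faulthandler.register(signal.SIGUSR1, all_threads=True)

# Option 2: attach to an already-hung process without restarting it
# (run in a shell; requires `pip install py-spy`):
#     py-spy dump --pid <PID>
```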
Comments
How much system memory does the machine have, and what are the Trainer parameters?
Memory is 1.3 TB. The trainer setup is roughly as follows. Does the memory peak usually occur during the merge step?

```python
trainer = Trainer(order=6, max_vocab_size=80000, min_count=32, isolate_digits=True)
trainer.train(corpus_instance, workers=128, batch_size=2000)
```
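To confirm whether the peak really lands in the merge phase, one can log resident memory from a background thread while training runs. A minimal sketch using psutil (third-party, not part of bytepiece); the Trainer call is copied from the snippet above, and `corpus_instance` is assumed to be defined as in that snippet:

```python
import threading
import time

import psutil
from bytepiece import Trainer  # import path assumed from the package name

def log_rss(interval=10):
    # Sum resident memory of the main process and all worker children,
    # printed every `interval` seconds alongside the training output.
    proc = psutil.Process()
    while True:
        rss = proc.memory_info().rss
        for child in proc.children(recursive=True):
            try:
                rss += child.memory_info().rss
            except psutil.NoSuchProcess:
                pass  # worker exited between listing and reading
        print(f"total RSS: {rss / 2**30:.1f} GiB", flush=True)
        time.sleep(interval)

threading.Thread(target=log_rss, daemon=True).start()

trainer = Trainer(order=6, max_vocab_size=80000, min_count=32, isolate_digits=True)
trainer.train(corpus_instance, workers=128, batch_size=2000)
```

Matching the timestamps of the RSS prints against the trainer's own progress output should show which phase drives the peak.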
Seeing the same behavior here. Tested with 200 GB of WuDao data on a machine with 1.0 TB of RAM, and observed memory usage peaking at 100%, so it appears to be running out of memory.
Is there a way to limit memory usage?
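Nothing in this thread shows a built-in memory cap, so the practical levers are the training parameters already shown above. A minimal sketch of the knobs worth trying; whether each one actually lowers the peak depends on bytepiece's internals, so treat the values as starting points, not recommendations:

```python
from bytepiece import Trainer  # import path assumed from the package name

trainer = Trainer(
    order=6,
    max_vocab_size=80000,
    min_count=64,        # assumption: a higher min_count prunes rare n-grams
                         # earlier, shrinking the count tables
    isolate_digits=True,
)

# Fewer workers and smaller batches mean fewer in-flight intermediate
# count structures held by worker processes at the same time.
trainer.train(corpus_instance, workers=32, batch_size=500)
```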