CUDA out of memory #3

Open
duany049 opened this issue Dec 15, 2023 · 1 comment
Comments

@duany049

I tried to train PB-LLM on an A800-80G, but it ran out of GPU memory. How can PB-LLM be trained on an A800-80G, and which device did you use to train the model?

Below is the error log:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacty of 79.33 GiB of which 49.81 MiB is free. Including non-PyTorch memory, this process has 79.27 GiB memory in use. Of the allocated memory 74.53 GiB is allocated by PyTorch, and 4.23 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
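
The log's hint about `max_split_size_mb` can be tried with a minimal sketch like the one below. The helper name `configure_allocator` and the value 128 are assumptions for illustration, not settings taken from PB-LLM. The variable has to be set before the process makes its first CUDA allocation. Since the log reports 74.53 GiB actually allocated and only 4.23 GiB reserved but unallocated, this addresses fragmentation only and may not be enough on its own.

```python
import os


def configure_allocator(max_split_size_mb: int = 128) -> str:
    """Set PYTORCH_CUDA_ALLOC_CONF for this process and return the value.

    Hypothetical helper: 128 is an illustrative default, not a tuned value.
    Call it before the first CUDA allocation (ideally before importing torch).
    """
    if max_split_size_mb <= 0:
        raise ValueError("max_split_size_mb must be positive")
    value = f"max_split_size_mb:{max_split_size_mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value


# Apply the setting at the very top of the training script.
conf = configure_allocator(128)
```

The same effect can be had by exporting `PYTORCH_CUDA_ALLOC_CONF` in the shell before launching the training script.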

@davidray222

Same error here. Have you solved it?
