I tried to train PB-LLM on an A800-80G, but it ran out of GPU memory. How can I train PB-LLM on an A800-80G, and which device did you use to train the model?
Below is the error log:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacty of 79.33 GiB of which 49.81 MiB is free. Including non-PyTorch memory, this process has 79.27 GiB memory in use. Of the allocated memory 74.53 GiB is allocated by PyTorch, and 4.23 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
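For reference, this is a minimal sketch of how the `max_split_size_mb` hint from the log could be tried. The helper name `configure_allocator` and the value 128 are my own assumptions, not anything from the PB-LLM repository. Since 74.53 GiB is already allocated and only 4.23 GiB is reserved but unallocated, this probably only helps with fragmentation and may not be enough on its own.

```python
import os

def configure_allocator(max_split_size_mb: int = 128) -> str:
    """Set PYTORCH_CUDA_ALLOC_CONF for the current process.

    Hypothetical helper; 128 is an arbitrary starting value, not a
    recommendation from the PB-LLM authors.
    """
    value = f"max_split_size_mb:{max_split_size_mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value

# Call this at the very top of the training script, before torch is
# imported, so the setting is in place before any CUDA memory is allocated.
configure_allocator(128)
```

The same setting can also be exported in the shell before launching the training script.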