Commit

Fix NCCL_ASYNC_ERROR_HANDLING deprecation warning
It looks like the patch from pytorch/pytorch#114077 landed in torch 2.2.0.

Fixes NVIDIA#568.
simonbyrne committed Nov 19, 2024
1 parent 9e96ddf commit 85733e9
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion modulus/distributed/manager.py
@@ -332,7 +332,11 @@ def initialize():
         addr = os.getenv("MASTER_ADDR", "localhost")
         port = os.getenv("MASTER_PORT", "12355")
         # https://pytorch.org/docs/master/notes/cuda.html#id5
-        os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "0"
+        # was changed in version 2.2
+        if torch.__version__ < (2, 2):
+            os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "0"
+        else:
+            os.environ["TORCH_NCCL_ASYNC_ERROR_HANDLING"] = "0"
         initialization_method = os.getenv("MODULUS_DISTRIBUTED_INITIALIZATION_METHOD")
         if initialization_method is None:
             try:
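
A minimal standalone sketch of the version gate in this change, written so it runs without importing torch. The helper names `nccl_async_env_var` and `disable_nccl_async_error_handling` and the regex-based version parsing are assumptions made for illustration and are not part of the commit, which compares `torch.__version__` against the tuple `(2, 2)` directly.

```python
import os
import re


def nccl_async_env_var(torch_version: str) -> str:
    """Return the environment variable name to set for a torch version string.

    Hypothetical helper: it mirrors the commit's (2, 2) cutoff but parses
    the version string itself instead of relying on torch.__version__.
    """
    # Take only the leading "major.minor"; suffixes such as "+cu121" are ignored.
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Compare as integer tuples so that 2.10 sorts after 2.2.
    if (major, minor) < (2, 2):
        return "NCCL_ASYNC_ERROR_HANDLING"
    return "TORCH_NCCL_ASYNC_ERROR_HANDLING"


def disable_nccl_async_error_handling(torch_version: str) -> str:
    """Set the version-appropriate variable to "0" and return its name."""
    name = nccl_async_env_var(torch_version)
    os.environ[name] = "0"
    return name
```

Comparing integer tuples rather than raw strings matters here: as plain strings, "2.10" sorts before "2.2", which would pick the old variable name on a newer release.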
