Improve Guidance for Using DDP in examples/pytorch
#35667
Labels: Feature request, examples/pytorch
Feature request
The examples in examples/pytorch/ (e.g., semantic-segmentation) would benefit from clearer guidance on how to use Distributed Data Parallel (DDP) with the Trainer version of the scripts.

Motivation
I modified the training script run_semantic_segmentation.py for my task, and it worked well on one or two GPUs. However, when scaling to four GPUs, training became significantly slower. After several days of debugging, I realized that the default example in the README.md does not use `accelerate` or another distributed launcher, which meant the script was running with Data Parallel (DP) instead of DDP the entire time.

The default command of the Trainer version provided in the README.md is:
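(I am reproducing this roughly from memory, so the exact arguments may differ from the current README; the important part is that the entry point is plain `python`.)

```bash
python run_semantic_segmentation.py \
    --model_name_or_path nvidia/mit-b0 \
    --dataset_name segments/sidewalk-semantic \
    --output_dir ./segformer_outputs/ \
    --do_train \
    --do_eval \
    ... # remaining arguments as listed in the README
```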
To enable DDP, the command needs to be modified as follows:
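For example, using `torchrun` with 4 GPUs (`accelerate launch` works just as well; the arguments below mirror the sketch above):

```bash
# Replace `python` with a distributed launcher so the Trainer initializes DDP
# instead of wrapping the model in DataParallel.
torchrun --nproc_per_node=4 run_semantic_segmentation.py \
    --model_name_or_path nvidia/mit-b0 \
    --dataset_name segments/sidewalk-semantic \
    --output_dir ./segformer_outputs/ \
    --do_train \
    --do_eval \
    ... # same remaining arguments as above

# Equivalently, with accelerate:
# accelerate launch --num_processes 4 run_semantic_segmentation.py <same arguments>
```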
While this might be obvious to experienced users, it can be misleading for new users like me, as the default command seems to imply it works efficiently across any number of GPUs.
Your contribution
To address this, we could include a note or alert in the README.md highlighting that, to use DDP with the Trainer, it is necessary to replace `python` with `accelerate launch`, `torchrun`, or another distributed launcher. This would greatly improve clarity for beginners and help avoid confusion.
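For example, the note could look something like this (the wording is only a suggestion, using GitHub's `> [!NOTE]` alert syntax):

```md
> [!NOTE]
> Running the script with plain `python` on a multi-GPU machine uses `DataParallel` (DP), which can be
> much slower than DDP. To train with `DistributedDataParallel` (DDP), launch the script with a
> distributed launcher instead, e.g.
> `torchrun --nproc_per_node=<num_gpus> run_semantic_segmentation.py <same arguments as above>`
> or `accelerate launch run_semantic_segmentation.py <same arguments as above>`.
```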