Fix ManagedDeviceMesh composability issues #86

Merged: 4 commits, Jan 29, 2025

@fegin fegin commented Jan 29, 2025

ManagedDeviceMesh has several gaps that prevent it from actually being used in TorchTitan. This PR fixes them:

  1. ManagedDeviceMesh can now be saved and loaded with torch.save()/torch.load().
  2. ManagedDeviceMesh now deliberately avoids reporting a size of 0 when the replicate group has zero participants, because a size-0 DeviceMesh confuses training loops.
  3. Coordinates are now returned correctly.
  4. The process group (pg) reinitialization issue is removed.
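
A minimal sketch of the ideas behind items 1 and 2, not the code in this PR. `MeshSketch`, `num_participants`, and `_handle` are hypothetical names, and the floor of 1 is an assumption about what size gets reported. It relies on `torch.save()` using pickle under the hood, so plain `pickle` stands in for it here:

```python
import pickle
import threading


class MeshSketch:
    """Hypothetical stand-in for a managed mesh; not the real ManagedDeviceMesh."""

    def __init__(self, num_participants: int) -> None:
        self.num_participants = num_participants
        # A lock stands in for an unpicklable runtime handle such as a process group.
        self._handle = threading.Lock()

    def size(self) -> int:
        # Assumption: report at least 1 so a training loop never sees a size-0 mesh.
        return max(self.num_participants, 1)

    def __getstate__(self) -> dict:
        # Drop the runtime handle so the object can be pickled.
        state = self.__dict__.copy()
        del state["_handle"]
        return state

    def __setstate__(self, state: dict) -> None:
        # Restore plain attributes, then recreate the runtime handle.
        self.__dict__.update(state)
        self._handle = threading.Lock()


mesh = MeshSketch(num_participants=0)
restored = pickle.loads(pickle.dumps(mesh))
```

Dropping the handle in `__getstate__` and recreating it in `__setstate__` is one common way to make such an object picklable; the actual change may take a different approach.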

@fegin fegin requested review from d4l3k and H-Huang January 29, 2025 18:49
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jan 29, 2025
@fegin fegin requested a review from wz337 January 29, 2025 18:49

@d4l3k d4l3k left a comment

LGTM

@fegin fegin merged commit 2a67d66 into main Jan 29, 2025
6 checks passed