@XilunWu
Contributor

@XilunWu XilunWu commented Jan 6, 2023

@pytorch-bot

pytorch-bot bot commented Jan 6, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91802

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e2b6a26:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

XilunWu added a commit that referenced this pull request Jan 6, 2023
ghstack-source-id: 9be42a7
Pull Request resolved: #91802
Collaborator

@wanchaol wanchaol left a comment

Same comment as on the previous PR: can we run this check only when we are creating the world_pg?

Collaborator

@wanchaol wanchaol left a comment

lgtm, one nit

# mesh must be contiguous (i.e. from 0 to N-1)
if 2 * unique_mesh_values.sum().item() != world_size * (world_size - 1):
    raise RuntimeError(
        f"DeviceMesh is expected to be contiguous, but found {self.mesh.tolist()}"
Collaborator

nit: the mesh might not always be contiguous, e.g. [[0, 2], [1, 3]]. Maybe change the error message to something like:

DeviceMesh should have all ranks of the world
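
To illustrate the nit, here is a minimal sketch in plain Python (not the PR's actual code) of why "all ranks of the world" describes the check better than "contiguous". The helper names `flatten`, `sum_check_passes`, and `covers_world` are hypothetical, and the mesh is modeled as a nested list rather than a tensor. The sum test mirrors the condition in the diff; it identifies 0..N-1 only under the assumption that the mesh holds exactly N distinct, non-negative ranks, which this sketch does not verify.

```python
def flatten(mesh):
    """Flatten a 2-D nested list of ranks (hypothetical stand-in for the mesh tensor)."""
    return [rank for row in mesh for rank in row]


def sum_check_passes(ranks, world_size):
    """Mirror of the condition in the diff: 0 + 1 + ... + (N-1) == N * (N-1) / 2."""
    return 2 * sum(set(ranks)) == world_size * (world_size - 1)


def covers_world(ranks, world_size):
    """Explicit form of the same intent: every rank in 0..N-1 appears in the mesh."""
    return set(ranks) == set(range(world_size))


# The reviewer's example: ranks are not laid out in order, yet every rank is present.
mesh = [[0, 2], [1, 3]]
ranks = flatten(mesh)
in_order = ranks == sorted(ranks)
accepted = sum_check_passes(ranks, 4) and covers_world(ranks, 4)

# A mesh that is missing rank 3 fails both tests.
incomplete = flatten([[0, 1], [2, 4]])
rejected = not sum_check_passes(incomplete, 4) and not covers_world(incomplete, 4)
```

Here `[[0, 2], [1, 3]]` is accepted even though its flattened order is not sorted, so a message along the lines of "DeviceMesh should have all ranks of the world" matches what the condition actually tests.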

@XilunWu XilunWu deleted the gh/XilunWu/10/head branch April 11, 2023 21:40

4 participants