dynamically set the number of SMs in torch.distributed.all_reduce #144538

@Rainlin007

Description

🚀 The feature, motivation and pitch

I want to dynamically set the number of SMs used by torch.distributed.all_reduce. NCCL supports capping the channel count via the NCCL_MAX_NCHANNELS environment variable, but an environment variable cannot be changed dynamically from within the program. It is mentioned that ncclCommInitRankConfig can be used to configure this per communicator (link), but the corresponding setting is not exposed in torch. Could this capability be supported? It is useful in inference optimization scenarios.
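
For context, here is a rough sketch of both approaches from Python. The NCCL_MAX_NCHANNELS workaround works today but is static for the process; the per-communicator part is hypothetical, since whether (and under which attribute names, e.g. config.max_ctas) the ncclConfig_t fields are exposed on ProcessGroupNCCL.Options depends on the PyTorch and NCCL versions:

```python
import os

import torch
import torch.distributed as dist

# Static workaround that works today: cap NCCL's channel (SM) usage
# process-wide via the environment, *before* the process group is created.
# It cannot be changed once communicators are initialized.
os.environ["NCCL_MAX_NCHANNELS"] = "8"

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank())
t = torch.ones(1024, device="cuda")
dist.all_reduce(t)  # uses at most 8 channels

# Hypothetical per-communicator variant (the capability this issue asks
# for): a group created with its own CTA/SM budget via ncclConfig_t.
# The attribute names (config.max_ctas / config.min_ctas) are assumptions
# and may not exist in a given build.
opts = dist.ProcessGroupNCCL.Options()
opts.config.max_ctas = 4  # would map to ncclConfig_t.maxCTAs
opts.config.min_ctas = 1  # would map to ncclConfig_t.minCTAs
small_pg = dist.new_group(ranks=list(range(dist.get_world_size())),
                          pg_options=opts)
dist.all_reduce(t, group=small_pg)  # this group capped at 4 CTAs

dist.destroy_process_group()
```

Even with per-group options, the SM budget is fixed at communicator creation, so truly dynamic control per call would require either recreating groups or a new API surface in torch.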

Alternatives

No response

Additional context

No response

cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

Metadata

Labels

oncall: distributed (Add this issue/PR to distributed oncall triage queue)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
