dynamically set the number of SMs in torch.distributed.all_reduce #144538

@Rainlin007

Description

🚀 The feature, motivation and pitch

I want to dynamically set the number of SMs used by torch.distributed.all_reduce. NCCL supports capping the channel count via the NCCL_MAX_NCHANNELS environment variable, but an environment variable cannot be changed dynamically from within the program. It is mentioned that ncclCommInitRankConfig can be used to configure this per communicator (link), but the corresponding setting is not exposed in torch. Could this capability be supported? It is useful in inference optimization scenarios.
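
For context, here is a rough sketch of both approaches from Python. The NCCL_MAX_NCHANNELS workaround works today but is static for the process; the per-communicator part is hypothetical, since whether (and under which attribute names, e.g. config.max_ctas) the ncclConfig_t fields are exposed on ProcessGroupNCCL.Options depends on the PyTorch and NCCL versions:

```python
import os

import torch
import torch.distributed as dist

# Static workaround that works today: cap NCCL's channel (SM) usage
# process-wide via the environment, *before* the process group is created.
# It cannot be changed once communicators are initialized.
os.environ["NCCL_MAX_NCHANNELS"] = "8"

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank())
t = torch.ones(1024, device="cuda")
dist.all_reduce(t)  # uses at most 8 channels

# Hypothetical per-communicator variant (the capability this issue asks
# for): a group created with its own CTA/SM budget via ncclConfig_t.
# The attribute names (config.max_ctas / config.min_ctas) are assumptions
# and may not exist in a given build.
opts = dist.ProcessGroupNCCL.Options()
opts.config.max_ctas = 4  # would map to ncclConfig_t.maxCTAs
opts.config.min_ctas = 1  # would map to ncclConfig_t.minCTAs
small_pg = dist.new_group(ranks=list(range(dist.get_world_size())),
                          pg_options=opts)
dist.all_reduce(t, group=small_pg)  # this group capped at 4 CTAs

dist.destroy_process_group()
```

Even with per-group options, the SM budget is fixed at communicator creation, so truly dynamic control per call would require either recreating groups or a new API surface in torch.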

Alternatives

No response

Additional context

No response

cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

Metadata

Labels

oncall: distributed (Add this issue/PR to distributed oncall triage queue)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
