[PT-D][Tensor parallelism] Add documentations for TP #94421
Changes from all commits
7c58fdd
ef98f57
596c896
8d99683

.. role:: hidden
    :class: hidden-section

.. py:module:: torch.distributed.tensor.parallel

Tensor Parallelism - torch.distributed.tensor.parallel
========================================================

We built Tensor Parallelism (TP) on top of DistributedTensor (DTensor) and
provide several parallelism styles: Rowwise, Colwise, and Pairwise Parallelism.

.. warning::
    Tensor Parallelism is experimental and subject to change.

The entrypoint to parallelize your module using Tensor Parallelism is:

.. automodule:: torch.distributed.tensor.parallel

.. currentmodule:: torch.distributed.tensor.parallel

.. autofunction:: parallelize_module
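For illustration, here is a minimal sketch of the entrypoint in use. The toy
MLP, its dimensions, and the two-process ``torchrun`` launch are assumptions
made only for this example:

.. code:: python

    # Launch with: torchrun --nproc_per_node=2 tp_example.py
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed._tensor import DeviceMesh
    from torch.distributed.tensor.parallel import PairwiseParallel, parallelize_module

    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    # A toy two-layer MLP, the shape the pairwise style is designed for.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16)).cuda(rank)

    # A 1-D device mesh over all ranks == plain Tensor Parallelism.
    device_mesh = DeviceMesh("cuda", list(range(dist.get_world_size())))

    # Shard the first Linear column-wise and the second row-wise.
    model = parallelize_module(model, device_mesh, PairwiseParallel())

    output = model(torch.rand(8, 16).cuda(rank))
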
Tensor Parallelism supports the following parallel styles:

.. autoclass:: torch.distributed.tensor.parallel.style.RowwiseParallel
  :members:

.. autoclass:: torch.distributed.tensor.parallel.style.ColwiseParallel
  :members:

.. autoclass:: torch.distributed.tensor.parallel.style.PairwiseParallel
  :members:
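The styles can also be assigned per submodule through a dict-style parallelize
plan. A sketch, reusing ``device_mesh`` from the previous example; the submodule
names ``net1``/``net2`` are invented for this illustration:

.. code:: python

    import torch
    import torch.nn as nn
    from torch.distributed.tensor.parallel import (
        ColwiseParallel,
        RowwiseParallel,
        parallelize_module,
    )

    class ToyBlock(nn.Module):
        def __init__(self):
            super().__init__()
            self.net1 = nn.Linear(16, 32)
            self.net2 = nn.Linear(32, 16)

        def forward(self, x):
            return self.net2(torch.relu(self.net1(x)))

    # Map fully qualified submodule names to the style applied to each one.
    block = parallelize_module(
        ToyBlock().cuda(),
        device_mesh,
        {"net1": ColwiseParallel(), "net2": RowwiseParallel()},
    )
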
Because we use DTensor within Tensor Parallelism, we need to specify the
DTensor placements of the module's inputs and outputs so that it interacts
as expected with the modules before and after it. The following functions are
used for input/output preparation:

.. currentmodule:: torch.distributed.tensor.parallel.style

.. autofunction:: make_input_replicate_1d
|
Collaborator
Actually I'm wondering if we should make those APIs something like
Contributor
Author
I understand where you are coming from... Although internally we don't have drastic changes here, from the user's perspective we still change a Tensor to a DTensor. So I really don't like the word "mark" (definition: https://www.merriam-webster.com/dictionary/mark); it does not carry any meaning related to change. I would prefer verbs like "change", "convert", "transform", etc., or maybe even "construct".

.. autofunction:: make_input_shard_1d
.. autofunction:: make_input_shard_1d_last_dim
.. autofunction:: make_output_replicate_1d
.. autofunction:: make_output_tensor
.. autofunction:: make_output_shard_1d
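A rough sketch of what these helpers do, reusing the 1-D ``device_mesh`` from
the earlier examples; the exact signatures (in particular the ``dim`` keyword)
are assumed here:

.. code:: python

    import torch
    from torch.distributed.tensor.parallel.style import (
        make_input_replicate_1d,
        make_input_shard_1d,
        make_output_tensor,
    )

    x = torch.rand(8, 16).cuda()

    # Plain tensor -> DTensor replicated across the 1-D mesh.
    x_replicated = make_input_replicate_1d(x, device_mesh)

    # Plain tensor -> DTensor sharded along dim 0 across the 1-D mesh
    # (the ``dim`` keyword is an assumption about the signature).
    x_sharded = make_input_shard_1d(x, device_mesh, dim=0)

    # DTensor coming out of a parallelized module -> plain, replicated torch.Tensor.
    y = make_output_tensor(x_replicated, device_mesh)
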
Currently, there are some constraints that make it hard for the ``nn.MultiheadAttention``
module to work out of the box with Tensor Parallelism, so we built this
multihead_attention module for Tensor Parallelism users. Also, in ``parallelize_module``,
we automatically swap ``nn.MultiheadAttention`` to this custom module when
``PairwiseParallel`` is specified.

.. autoclass:: torch.distributed.tensor.parallel.multihead_attention_tp.TensorParallelMultiheadAttention
  :members:
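A small sketch of the swap described above, again reusing ``device_mesh``;
whether the swap triggers in exactly this configuration is an assumption based
on the paragraph above:

.. code:: python

    import torch
    import torch.nn as nn
    from torch.distributed.tensor.parallel import PairwiseParallel, parallelize_module

    class ToySelfAttention(nn.Module):
        def __init__(self):
            super().__init__()
            self.attn = nn.MultiheadAttention(embed_dim=16, num_heads=4)

        def forward(self, x):
            out, _ = self.attn(x, x, x)
            return out

    model = parallelize_module(
        ToySelfAttention().cuda(), device_mesh, {"attn": PairwiseParallel()}
    )
    # Per the note above, ``model.attn`` should now be the custom
    # TensorParallelMultiheadAttention rather than nn.MultiheadAttention.
    print(type(model.attn))
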
We also enabled 2D parallelism to integrate Tensor Parallelism with
``FullyShardedDataParallel``. Users just need to call the following API explicitly:

Collaborator
I remember we have an FSDP extension. Does TP automatically register the extension now? Also, I wonder if we should give a small code snippet showing what the 2-D parallelism looks like.
Contributor
Author
The registration is in the

.. currentmodule:: torch.distributed.tensor.parallel.fsdp

.. autofunction:: is_available
|
Collaborator
Do we really need to add this API to the doc? I remember is_available was introduced back when we were in tau, but now that it's in PyTorch, shouldn't FSDP always be available?
Contributor
Author
Yes, because of 2D hook registration.
Contributor
Author
Will send a follow-up PR to address the naming of this one.
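
A rough sketch of how the 2-D composition discussed above might look, assuming
a four-process `torchrun` launch; the 2x2 mesh split and the `get_dim_groups()`
accessor are assumptions here:

```python
# Sketch only: composing Tensor Parallelism with FSDP on a 2-D device mesh.
# Launch with: torchrun --nproc_per_node=4 two_d_example.py
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed._tensor import DeviceMesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.tensor.parallel import PairwiseParallel, parallelize_module
from torch.distributed.tensor.parallel.fsdp import is_available

dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank % torch.cuda.device_count())

# Explicitly enable the FSDP extension for DTensor, as the doc above requires.
assert is_available()

# 2-D mesh: dim 0 for data parallelism (FSDP), dim 1 for tensor parallelism.
mesh = DeviceMesh("cuda", torch.arange(4).reshape(2, 2))

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16)).cuda()

# Apply TP along mesh dimension 1 ...
model = parallelize_module(model, mesh, PairwiseParallel(), tp_mesh_dim=1)

# ... then shard the TP-parallelized module with FSDP along mesh dimension 0.
dp_group = mesh.get_dim_groups()[0]  # assumed accessor for the dim-0 group
model = FSDP(model, process_group=dp_group)
```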