[jit] Support MultiheadedAttention module #24204
Conversation
```diff
         L is the target sequence length, S is the source sequence length.
         """
-        if hasattr(self, '_qkv_same_embed_dim') and self._qkv_same_embed_dim is False:
+        if not self._qkv_same_embed_dim:
```
The above check is there to avoid a backward compatibility problem: some models may have been trained with an older version of nn.MHA that has no _qkv_same_embed_dim attribute.
However, since we have released PyTorch 1.2, can we finally retire this support? @cpuhrsch
Does this block JIT-ing? If so, why?
The hasattr call does, and this version determination shouldn't be happening in forward anyway; it's something that can be determined in _load_state_dict(), as was suggested here.
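For context, a minimal sketch of why the `hasattr` form is a problem for scripting. The toy module below is hypothetical (not the PR's code), and the claim that `hasattr` cannot be compiled reflects TorchScript as of the time of this PR:

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self, qkv_same_embed_dim: bool = True):
        super().__init__()
        # a plain bool attribute set in __init__ is visible to the scripting compiler
        self._qkv_same_embed_dim = qkv_same_embed_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # scripts fine: the attribute and its type are known statically
        if not self._qkv_same_embed_dim:
            return 2 * x
        return x
        # a dynamic check such as
        #   if hasattr(self, '_qkv_same_embed_dim') and self._qkv_same_embed_dim is False:
        # could not be compiled by torch.jit.script at the time

scripted = torch.jit.script(Toy())
print(scripted(torch.ones(2)))  # tensor([1., 1.])
```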
Since we added the warning message for the BC-breaking change in the PyTorch 1.2 release, IMO we could drop the support now. How do you feel about it? @cpuhrsch
@zhangguanheng66 - let's move this mechanism into _load_state_dict as was originally suggested and keep it around for now. The BC-breaking change should be a separate PR so we can easily keep track of it.
In the old model, there is no _qkv_same_embed_dim attribute.
However, I cannot use the _load_from_state_dict() function to add _qkv_same_embed_dim, because _qkv_same_embed_dim is not in the state_dict of the new model.
Is there a way to add a new attribute to an old model?
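One possible direction (a hedged sketch of a workaround, not necessarily what was ultimately done): the flag is not a tensor, so it never appears in a state_dict, but it can be backfilled when an old module is unpickled via __setstate__, or inferred inside _load_from_state_dict from which projection weights were saved. A hypothetical subclass to illustrate the idea:

```python
import torch.nn as nn

class PatchedMHA(nn.MultiheadAttention):
    # Hypothetical subclass, only to illustrate the idea.

    def __setstate__(self, state):
        super().__setstate__(state)
        # Old pickled modules predate the flag; default it to the legacy behavior.
        if '_qkv_same_embed_dim' not in self.__dict__:
            self.__dict__['_qkv_same_embed_dim'] = True

    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        # The flag itself is never stored in a state_dict, but its value can be
        # inferred from whether the separate projection weights were saved.
        self._qkv_same_embed_dim = (prefix + 'q_proj_weight') not in state_dict
        super()._load_from_state_dict(state_dict, prefix, local_metadata, strict,
                                      missing_keys, unexpected_keys, error_msgs)
```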
Never mind, I figured out a workaround. After some tests, I will submit a PR first.
```python
                q_proj_weight=self.q_proj_weight, k_proj_weight=self.k_proj_weight,
                v_proj_weight=self.v_proj_weight)
        else:
            if not hasattr(self, '_qkv_same_embed_dim'):
```
Same as above. Will this block JIT?
```python
            self.k_proj_weight = Parameter(torch.Tensor(embed_dim, self.kdim))
            self.v_proj_weight = Parameter(torch.Tensor(embed_dim, self.vdim))
        else:
            self.register_parameter('q_proj_weight', None)
```
Does JIT also require registering all the parameters, even if they are not used?
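Roughly, yes: the scripting compiler needs every attribute that forward touches to exist with a single static type on every instance, so the unused projection weights are registered as None rather than left undefined. A hedged sketch with a hypothetical module (the explicit class-level Optional annotation is for illustration):

```python
import torch
import torch.nn as nn
from typing import Optional

class Proj(nn.Module):
    # the attribute may be a Parameter or None, so declare it Optional
    q_proj_weight: Optional[torch.Tensor]

    def __init__(self, embed_dim: int, separate_proj: bool):
        super().__init__()
        if separate_proj:
            self.q_proj_weight = nn.Parameter(torch.empty(embed_dim, embed_dim))
        else:
            # keeps the attribute present (as None) so forward still type-checks
            self.register_parameter('q_proj_weight', None)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.q_proj_weight
        if w is not None:  # TorchScript refines Optional[Tensor] -> Tensor here
            return x @ w.t()
        return x

# both configurations compile, whether or not the parameter is actually used
torch.jit.script(Proj(4, separate_proj=True))
torch.jit.script(Proj(4, separate_proj=False))
```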
```python
        >>> multihead_attn = nn.MultiheadAttention(embed_dim, num_heads)
        >>> attn_output, attn_output_weights = multihead_attn(query, key, value)
    """
    __annotations__ = {
```
I guess annotations and constants are required for JIT-ing?
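That's the gist, as I understand it (a sketch of the pattern, not verified against this PR's exact changes): __annotations__ declares static types for attributes the compiler cannot infer from __init__ alone (e.g. ones that may be None on some instances), and __constants__ marks plain Python values to be baked in as compile-time constants.

```python
import torch
import torch.nn as nn
from typing import Optional

class Attn(nn.Module):
    # equivalent to class-level variable annotations; gives the TorchScript
    # compiler a static type for attributes that may be None
    __annotations__ = {
        'bias_k': Optional[torch.Tensor],
        'bias_v': Optional[torch.Tensor],
    }
    # plain Python values treated as compile-time constants
    __constants__ = ['num_heads']

    def __init__(self, embed_dim: int, num_heads: int, add_bias_kv: bool = False):
        super().__init__()
        self.num_heads = num_heads
        if add_bias_kv:
            self.bias_k = nn.Parameter(torch.empty(1, 1, embed_dim))
            self.bias_v = nn.Parameter(torch.empty(1, 1, embed_dim))
        else:
            self.bias_k = self.bias_v = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.bias_k is not None:
            x = x + self.bias_k
        return x

torch.jit.script(Attn(8, 2))  # scripts with or without the optional biases
```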
@driazati Just wondering if we are still going to add

Closing in favor of #28555
This changes up `nn.MultiheadedAttention` so that it can be compiled with TorchScript and adds a test that it compiles.

Fixes #24173
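For readers landing here later, a sketch of what "compiles with TorchScript" means in practice, assuming a PyTorch release in which the follow-up work (#28555) has landed; this is illustrative usage, not the PR's actual test:

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 16, 4
mha = nn.MultiheadAttention(embed_dim, num_heads)
scripted = torch.jit.script(mha)  # should compile without errors

# shapes: query is (target_len, batch, embed_dim); key/value are (source_len, batch, embed_dim)
query = torch.randn(5, 2, embed_dim)
key = value = torch.randn(7, 2, embed_dim)
attn_output, attn_weights = scripted(query, key, value)
print(attn_output.shape)   # torch.Size([5, 2, 16])
print(attn_weights.shape)  # torch.Size([2, 5, 7])
```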