
Conversation

@glaringlee (Contributor) commented Jul 24, 2020

Stack from ghstack:

This is to fix #41951

Differential Revision: D22764717

glaringlee pushed a commit that referenced this pull request Jul 24, 2020
ghstack-source-id: bae3a2d
Pull Request resolved: #42037
@glaringlee glaringlee requested review from anjali411, pbelevich and zhangguanheng66 and removed request for ebetica and goldsborough July 24, 2020 21:37
@glaringlee (Contributor Author) commented:

This is to correct the parameter registration in MultiheadAttention; the root cause is described in #41951.
cc @pbelevich @yf225

} else {
    std::uniform_int_distribution<int> d(5, 20);
    kv_dim = d(generator);
    while (kv_dim == d_model) {
        kv_dim = d(generator);
    }
}
Reviewer (Contributor):

what's this for?

Contributor Author:

This branch was aimed at testing the kv_dim != d_model case, but the distribution d could generate the same value as d_model (e.g. 9), so I added this while loop to resample until the values differ.

bias_k = register_parameter("bias_k", torch::empty({1, 1, options.embed_dim()}));
bias_v = register_parameter("bias_v", torch::empty({1, 1, options.embed_dim()}));
} else {
    bias_k = {};
    bias_v = {};
}
Reviewer (Contributor):

Do we need to register_parameter('bias_k', {}, /*requires_grad=*/false)?

@glaringlee (Contributor Author) commented Jul 27, 2020:

Based on the Python implementation, no. It only registers bias_k and bias_v in the non-empty case; otherwise the Python code leaves them unset:

self.bias_k = self.bias_v = None

@zhangguanheng66 (Contributor) left a comment:

Not relevant. We merged two PRs in H1 for nn.MHA on the Python side.
#31996
#33763

@zhangguanheng66 (Contributor) left a comment:

LGTM for the MHA functionality.

@facebook-github-bot (Contributor):

@glaringlee merged this pull request in 5246bc4.

@facebook-github-bot facebook-github-bot deleted the gh/glaringlee/24/head branch July 31, 2020 14:17
