Conversation

@glaringlee
Contributor

@glaringlee glaringlee commented Sep 8, 2020

Stack from ghstack:

This is to provide C++ APIs for the Transformer NN module, which has been implemented in Python here: https://github.com/pytorch/pytorch/blob/eace0533985641d9c2f36e43e3de694aca886bd9/torch/nn/modules/transformer.py

It also adjusts the TransformerDecoderLayer's module registration order to exactly match the Python implementation.

Differential Revision: D23584010
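
For context, a rough sketch of the kind of C++ usage this PR is meant to enable (module, option, and method names follow the Python nn.Transformer; the exact C++ spelling here is an assumption, not copied from this diff):

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Hypothetical usage sketch of the new torch::nn::Transformer module.
  // Option names mirror the Python nn.Transformer arguments.
  torch::nn::Transformer transformer(
      torch::nn::TransformerOptions()
          .d_model(32)
          .nhead(4)
          .num_encoder_layers(2)
          .num_decoder_layers(2));

  // Shapes follow the Python convention: src is (S, N, E), tgt is (T, N, E).
  auto src = torch::rand({10, 3, 32});
  auto tgt = torch::rand({7, 3, 32});

  auto out = transformer(src, tgt);  // (T, N, E)
  std::cout << out.sizes() << std::endl;
}
```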

glaringlee pushed a commit that referenced this pull request Sep 8, 2020
ghstack-source-id: 5f33394
Pull Request resolved: #44333
@glaringlee glaringlee requested review from zhangguanheng66 and zou3519 and removed request for ebetica and goldsborough September 8, 2020 21:04
@dr-ci

dr-ci bot commented Sep 8, 2020

💊 CI failures summary and remediations

As of commit fd9f2aa (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed



@codecov

codecov bot commented Sep 9, 2020

Codecov Report

Merging #44333 into gh/glaringlee/25/base will increase coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@                  Coverage Diff                   @@
##           gh/glaringlee/25/base   #44333   +/-   ##
======================================================
  Coverage                  68.00%   68.00%           
======================================================
  Files                        382      382           
  Lines                      49527    49527           
======================================================
+ Hits                       33679    33680    +1     
+ Misses                     15848    15847    -1     
Impacted Files Coverage Δ
torch/testing/_internal/expecttest.py 78.57% <0.00%> (+1.02%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

glaringlee pushed a commit that referenced this pull request Sep 9, 2020
ghstack-source-id: d3cf909
Pull Request resolved: #44333
return output;
}

Tensor TransformerImpl::generate_square_subsequent_mask(int64_t sz) {
Contributor

I'm not sure if we want to have the generate_square_subsequent_mask func within the transformer. Although the Python version has it, it's really not part of the paper "Attention Is All You Need".

Contributor

@zhangguanheng66 do you know why generate_square_subsequent_mask is exposed as a part of the Python API?

Contributor

@zhangguanheng66 if I could start from scratch, I would not include it in the transformer.

Contributor Author

@zou3519 @zhangguanheng66
If we know someone is already using this function in Python, then I think we'd better keep it here; otherwise, this would be an unexpected BC-breaking change for users.

Contributor

It's not really BC-breaking if the functionality doesn't exist in C++ yet. It sounds like we might want to deprecate generate_square_subsequent_mask on the Python side?

Contributor Author
@glaringlee glaringlee Sep 10, 2020

@zhangguanheng66
Should I deprecate this? I saw this function is used in the python transformer test driver as well.

Or I could probably move this to the util namespace.
But I need to know the following:

  1. Is this function used specifically for the transformer? (If so, probably leave it here; otherwise, it can go in nn::util.)
  2. Is this a generic method to generate square subsequent masks?
  3. Is there any other place such a mask will be used within our codebase?

Thanks

Contributor

This func is mainly used to generate a square mask for the transformer application. In the word language task, a word in a sentence is only allowed to see the tokens before itself, so we need an attention mask here. Can we leave this func in the transformer namespace, but as an independent function rather than part of the transformer model?
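
For illustration, here is a minimal sketch (mirroring the Python implementation's triu/masked_fill approach, not the exact code in this diff; the function name is made up) of how such a square subsequent mask can be built, where position i may only attend to positions <= i:

```cpp
#include <torch/torch.h>
#include <limits>

// Sketch: square subsequent (causal) attention mask of size sz x sz.
// Allowed positions end up as 0.0, future positions as -inf, matching the
// masked_fill logic quoted elsewhere in this PR.
torch::Tensor square_subsequent_mask_sketch(int64_t sz) {
  auto mask =
      (torch::triu(torch::ones({sz, sz})) == 1).transpose(0, 1).to(torch::kFloat32);
  return mask.masked_fill(mask == 0, -std::numeric_limits<float>::infinity())
             .masked_fill(mask == 1, 0.0f);
}
```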

Contributor Author
@glaringlee glaringlee Sep 10, 2020

@zou3519 @zhangguanheng66
I am going to make this a static method within the TransformerImpl class. People can just call Transformer::Impl::generate_square_subsequent_mask() instead of creating an object and invoking it from the object.

Comment on lines +419 to +423
// check IEEE754 support here since -inf is not guaranteed to be valid on non IEEE754 platform
if (std::numeric_limits<float>::is_iec559) {
mask = mask.masked_fill(mask == 0, -std::numeric_limits<float>::infinity()).masked_fill(mask == 1, 0.f);
}
// if IEEE754 is not supported, we use the smallest float number in current platform
Contributor

This seems fine, but does the Python API have the same behavior?

Contributor Author

Python doesn't have this issue.
I checked with @zhangguanheng66; using the smallest float number in this situation is fine.
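
For reference, a hedged sketch of the fallback being discussed (the warning wording and the choice of std::numeric_limits<float>::lowest() are assumptions, not copied from this diff):

```cpp
#include <torch/torch.h>
#include <limits>

// Sketch: convert a 0/1 attention mask to the -inf/0 form, falling back to
// the lowest finite float on platforms without IEEE754 floats.
torch::Tensor fill_attention_mask_sketch(torch::Tensor mask) {
  float neg_fill = -std::numeric_limits<float>::infinity();
  if (!std::numeric_limits<float>::is_iec559) {
    TORCH_WARN_ONCE(
        "IEEE754 not supported on this platform; using the lowest finite "
        "float instead of -inf for the attention mask.");
    neg_fill = std::numeric_limits<float>::lowest();
  }
  return mask.masked_fill(mask == 0, neg_fill).masked_fill(mask == 1, 0.f);
}
```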

Contributor
@zou3519 zou3519 left a comment

Overall this looks good to me, the two main questions I have are:

  1. should generate_square_subsequent_mask be a part of the Transformer API? I don't know what it's for
  2. Is there a better way to test this? It would be nice if we could use the parity harness; it's hard for a code reader to figure out correctness from reading the sample inputs/outputs

@glaringlee
Contributor Author

glaringlee commented Sep 10, 2020

Overall this looks good to me, the two main questions I have are:

  1. should generate_square_subsequent_mask be a part of the Transformer API? I don't know what it's for
  2. Is there a better way to test this? It would be nice if we could use the parity harness; it's hard for a code reader to figure out correctness from reading the sample inputs/outputs
  1. See my comments to @zhangguanheng66.

  2. Yes, parity will be turned on in the next PR. This test is hard to read since it is a deterministic test (I run the same thing in Python to get the result, then run the C++ code and compare its result with the Python result), basically to ensure the logic is correct for this PR.

I will address other comments one by one.

@glaringlee
Contributor Author

This update fixes all previous comments except for generate_square_subsequent_mask. Will do that update once @zhangguanheng66 comments.

glaringlee pushed a commit that referenced this pull request Sep 10, 2020
ghstack-source-id: 2936c7f
Pull Request resolved: #44333
@glaringlee
Contributor Author

glaringlee commented Sep 10, 2020

Changed generate_square_subsequent_mask to a static method. Also exposed a type alias Impl from the Transformer class (due to the special implementation of Module, we cannot use Transformer::generate_square_subsequent_mask directly). So I standardized this to Transformer::Impl::generate_square_subsequent_mask; Impl becomes a standard way to get the underlying implementation class type from the model itself.
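
A hedged sketch of the resulting call site as described above (the Impl alias and the exact spelling are taken from this comment and may differ from the final diff):

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Call the static helper through the Impl type alias, without first
  // constructing a Transformer instance.
  torch::Tensor mask =
      torch::nn::Transformer::Impl::generate_square_subsequent_mask(5);
  std::cout << mask << std::endl;  // 5x5 causal mask: 0 on/below the diagonal, -inf above
}
```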

// NOTE: reset() is for initializing the model only, call reset() after the model is created
// will cause throwing exceptions. Call reset_parameter() if the created model need a reset
// NOTE: reset() is for initializing the model only, calling reset() after the model is created
// will cause throwing exceptions. Call reset_parameter() if the created model needs a reset
Contributor

nit: "will cause throwing exceptions" -> "will cause exceptions to be thrown", or "will throw exceptions"

"norm3",
LayerNorm(LayerNormOptions(std::vector<int64_t> {options.d_model()})));
// NOTE: reset() is for initializing the model only, calling reset() after the model is created
// will cause throwing exceptions. Call reset_parameter() if the created model needs a reset.
Contributor

nit: "will cause throwing exceptions" -> "will cause exceptions to be thrown", or "will throw exceptions"

/// 2. This function requires the platform support IEEE754, since `-inf` is guaranteed to
/// be valid only when IEEE754 is supported. If the platform doesn't support IEEE754,
/// this function will fill the mask with the smallest float number instead of `-inf`,
/// a one time warning will be pop up as well.
Contributor

nit: "will be pop up as well" -> "will pop up as well"

/// model with corresponding parameters.
///
/// See https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html to
/// learn abouut the exact behavior of this transformer model
Contributor

nit: "abouut" -> "about"

"src and tgt should have 3 dimensions, but got ", src.dim(), " and ", tgt.dim());

TORCH_CHECK(src.size(1) == tgt.size(1),
"src and tgt should have equal batch number (at dim 1), but got ", src.size(1), " and ", tgt.size(1));
Contributor

nit: "batch size" or "batch dim size" sounds more pytorch-like than "batch number"

"src and tgt should have equal batch number (at dim 1), but got ", src.size(1), " and ", tgt.size(1));

TORCH_CHECK(src.size(2) == options.d_model() && tgt.size(2) == options.d_model(),
"src and tgt should have same feature number as d_model (at dim 2), but got ",
Contributor

nit: "feature size" sounds more pytorch-like than "feature number"

ASSERT_EQ(result.sizes(), ref_output.sizes());
ASSERT_TRUE(result.equal(result_cus));
ASSERT_TRUE(torch::allclose(result, ref_output, 1e-7, 1e-5, /*equal_nan=*/true));
}
Contributor

I'd also test the error conditions (e.g., src and target don't have the same size in the batch dim) with an ASSERT_RAISES to check that it doesn't crash in an unfortunate way

Contributor Author

will add these tests
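
For what it's worth, a hedged sketch of one such error-condition test (gtest's EXPECT_THROW stands in for the ASSERT_RAISES mentioned above; the test name and option values are illustrative):

```cpp
#include <gtest/gtest.h>
#include <torch/torch.h>

// Sketch: src and tgt with mismatched batch dims should make forward() throw
// a c10::Error (via TORCH_CHECK) rather than crash.
TEST(TransformerSketchTest, MismatchedBatchDimThrows) {
  torch::nn::Transformer model(
      torch::nn::TransformerOptions().d_model(8).nhead(2));
  auto src = torch::rand({5, 3, 8});  // (S, N, E)
  auto tgt = torch::rand({4, 2, 8});  // batch dim (dim 1) differs from src
  EXPECT_THROW(model(src, tgt), c10::Error);
}
```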

Contributor
@zou3519 zou3519 left a comment

LGTM, some very minor comments

@glaringlee
Contributor Author

LGTM, some very minor comments

will address

glaringlee pushed a commit that referenced this pull request Sep 11, 2020
ghstack-source-id: aed3be1
Pull Request resolved: #44333
@facebook-github-bot
Contributor

@glaringlee merged this pull request in 77cc7d1.

@facebook-github-bot facebook-github-bot deleted the gh/glaringlee/25/head branch September 15, 2020 14:16
xuzhao9 pushed a commit that referenced this pull request Sep 18, 2020
Summary: Pull Request resolved: #44333

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23584010

Pulled By: glaringlee

fbshipit-source-id: 990026e3f1b5ae276776e344ea981386cb7528fe