
Conversation

@vinnamkim
Contributor

Solves issue #41332

I found that the bug in #41332 is caused by LayerNorm.

The current LayerNorm implementations have a disparity between:

  1. the create_graph=False CUDA implementation
  2. the create_graph=True implementation

With this bug-fix, #41332 is solved.
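For illustration, here is a minimal repro sketch of the kind of disparity described above (hypothetical shapes and inputs, not the exact script from #41332): the first-order LayerNorm gradient should not depend on whether create_graph is set.

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
x0 = torch.randn(4, 8, dtype=torch.double)
w0 = torch.randn(8, dtype=torch.double)
b0 = torch.randn(8, dtype=torch.double)

def grad_x(create_graph):
    # Same inputs each time; only the create_graph flag changes which
    # backward path is exercised.
    x = x0.to(device).detach().requires_grad_(True)
    w = w0.to(device).detach().requires_grad_(True)
    b = b0.to(device).detach().requires_grad_(True)
    out = F.layer_norm(x, (8,), w, b)
    (gx,) = torch.autograd.grad(out.sum(), x, create_graph=create_graph)
    return gx.detach().cpu()

# Before the fix these could disagree on CUDA; after the fix they should match.
print(torch.allclose(grad_x(create_graph=False), grad_x(create_graph=True)))
```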

@ailing @BIT-silence

Signed-off-by: Vinnam Kim <vinnamkim@gmail.com>

@zou3519 zou3519 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jul 17, 2020
Contributor

@facebook-github-bot facebook-github-bot left a comment


@BIT-silence has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@xiaomengy
Contributor

Thanks for fixing this.

@xiaomengy xiaomengy requested a review from ngimel July 17, 2020 15:42
@dr-ci

dr-ci bot commented Jul 17, 2020

💊 CI failures summary and remediations

As of commit 556b74a (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_backward_compatibility_check_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jul 21 16:49:18 The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not.
Jul 21 16:49:18 processing existing schema:  __str__(__torch__.torch.classes._TorchScriptTesting._StackString _0) -> (str _0) 
Jul 21 16:49:18 processing existing schema:  __init__(__torch__.torch.classes._TorchScriptTesting._PickleTester _0, int[] _1) -> (None _0) 
Jul 21 16:49:18 processing existing schema:  __getstate__(__torch__.torch.classes._TorchScriptTesting._PickleTester _0) -> (int[] _0) 
Jul 21 16:49:18 processing existing schema:  __setstate__(__torch__.torch.classes._TorchScriptTesting._PickleTester _0, int[] _1) -> (None _0) 
Jul 21 16:49:18 processing existing schema:  top(__torch__.torch.classes._TorchScriptTesting._PickleTester _0) -> (int _0) 
Jul 21 16:49:18 processing existing schema:  pop(__torch__.torch.classes._TorchScriptTesting._PickleTester _0) -> (int _0) 
Jul 21 16:49:18 processing existing schema:  get(__torch__.torch.classes._TorchScriptTesting._LiteInterpreterTest _0, Tensor _1) -> (str _0) 
Jul 21 16:49:18 processing existing schema:  __getstate__(__torch__.torch.classes._TorchScriptTesting._LiteInterpreterTest _0) -> (int _0) 
Jul 21 16:49:18 processing existing schema:  __setstate__(__torch__.torch.classes._TorchScriptTesting._LiteInterpreterTest _0, int _1) -> (None _0) 
Jul 21 16:49:18 processing existing schema:  __init__(__torch__.torch.classes.dist_rpc.WorkerInfo _0, str _1, int _2) -> (None _0) 
Jul 21 16:49:18 The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not.  
Jul 21 16:49:18  
Jul 21 16:49:18 Broken ops: [ 
Jul 21 16:49:18 	aten::split_with_sizes(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[]) 
Jul 21 16:49:18 ] 
Jul 21 16:49:18 + cleanup 
Jul 21 16:49:18 + retcode=1 
Jul 21 16:49:18 + set +x 
Jul 21 16:49:18 =================== sccache compilation log =================== 
Jul 21 16:49:18 =========== If your build fails, plea


Collaborator

@albanD albanD left a comment


Thanks for the fix.

I am surprised that our tests didn't catch this. The layer norm double backward should be checked with finite differences. It is properly listed in https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_nn.py
Or is it that we don't test cases where gamma is different from 1?

Also, the CI failures look related.

@xiaomengy
Contributor

Thanks for the fix.

I am surprised that our tests didn't catch this. The layer norm double backward should be checked with finite differences. It is properly listed in https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_nn.py
Or is it that we don't test cases where gamma is different from 1?

Also, the CI failures look related.

I think somehow our double grad check didn't cover the case where gamma and beta are defined. My comments in the code can help fix the test. We should check whether gamma is defined before computing the grads.
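For reference, a sketch of the missing coverage being described (illustrative only, not the PR's actual test): run gradgradcheck on layer_norm with gamma and beta passed as differentiable inputs, so the affine-dependent terms of the double backward get checked against finite differences.

```python
import torch
import torch.nn.functional as F
from torch.autograd import gradgradcheck

x = torch.randn(3, 5, dtype=torch.double, requires_grad=True)
gamma = torch.randn(5, dtype=torch.double, requires_grad=True)
beta = torch.randn(5, dtype=torch.double, requires_grad=True)

# With gamma != 1 and beta != 0 supplied as inputs, finite differences can
# catch errors in the weight-dependent terms of the double backward.
print(gradgradcheck(lambda x, g, b: F.layer_norm(x, (5,), g, b),
                    (x, gamma, beta)))
```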

Contributor

@facebook-github-bot facebook-github-bot left a comment


@BIT-silence has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ngimel
Collaborator

ngimel commented Jul 17, 2020

Please add a test in common_nn.py or in test_nn.py that would check this behavior (that is, would fail before your fix and pass with your fix).

@vinnamkim
Contributor Author

@albanD @BIT-silence @ngimel

It is hard to write a test for this using the current test frameworks (checking the double backward with finite differences). The buggy code actually passed the double backward checks with simple test inputs :< (maybe the error analytically has no effect on the double backward for those inputs?). So I had no option but to add an ad-hoc test.
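For concreteness, an ad-hoc check along these lines could look roughly like the following (hypothetical class and method names; the actual test added in this PR lives in test/test_nn.py and may differ):

```python
import unittest
import torch
import torch.nn.functional as F

class LayerNormDoubleBackwardTest(unittest.TestCase):
    # Standalone illustration; the review thread below discusses where such a
    # test should live inside test/test_nn.py.
    def test_grads_match_with_and_without_create_graph(self):
        device = "cuda" if torch.cuda.is_available() else "cpu"
        x0 = torch.randn(4, 8, dtype=torch.double, device=device)
        w0 = torch.randn(8, dtype=torch.double, device=device)
        b0 = torch.randn(8, dtype=torch.double, device=device)

        def grad_x(create_graph):
            x = x0.detach().requires_grad_(True)
            w = w0.detach().requires_grad_(True)
            b = b0.detach().requires_grad_(True)
            out = F.layer_norm(x, (8,), w, b)
            (gx,) = torch.autograd.grad(out.sum(), x,
                                        create_graph=create_graph)
            return gx.detach()

        # The disparity described in #41332 shows up when these two disagree.
        self.assertTrue(torch.allclose(grad_x(False), grad_x(True)))

if __name__ == "__main__":
    unittest.main()
```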

@xiaomengy
Contributor

@albanD @BIT-silence @ngimel

It is hard to write a test for this using the current test frameworks (checking the double backward with finite differences). The buggy code actually passed the double backward checks with simple test inputs :< (maybe the error analytically has no effect on the double backward for those inputs?). So I had no option but to add an ad-hoc test.

I can understand that it will be a little hard to add such a test in a short time. And when adding the test, there is also a risk that some other ops may fail it. So I think we can open another issue to explain this missing test and track it there. What do you think? @ngimel

Collaborator

@albanD albanD left a comment


Makes sense that you need a custom test. That looks good. We might want to open a task to make this a generic test indeed.

test/test_nn.py Outdated
Collaborator


If you just want to add a simple test outside of the generic test framework, you can just add a new method to the TestNN class above and it will just run.

Contributor Author

@vinnamkim vinnamkim Jul 21, 2020


Made a test class that inherits from TestCase.

Collaborator


We usually avoid new classes because, as you can see just below, we have custom setup code that is simpler to handle with a shared class.
Would it be possible to just move this test into the TestNN class just below?

Vinnam Kim and others added 4 commits July 22, 2020 00:04
Signed-off-by: Vinnam Kim <vinnamkim@gmail.com>
Co-authored-by: Xiaomeng Yang <bit.yangxm@gmail.com>
Signed-off-by: Vinnam Kim <vinnam.kim@gmail.com>
Signed-off-by: Vinnam Kim <vinnam.kim@gmail.com>
@albanD albanD requested a review from xiaomengy July 21, 2020 15:08
Collaborator

@albanD albanD left a comment


That looks great, thanks!
Can you just rebase on top of master to make sure all CI is fine?

test/test_nn.py Outdated
Collaborator


nit: the linter might not be happy with the removal of this line.

@vinnamkim vinnamkim force-pushed the patch_layernorm branch 2 times, most recently from 09cccc6 to 0f68a81 on July 21, 2020 15:13
Signed-off-by: Vinnam Kim <vinnam.kim@gmail.com>
Contributor

@facebook-github-bot facebook-github-bot left a comment


@BIT-silence has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@xiaomengy xiaomengy left a comment


The BC (backward-compatibility) test failure should not be related.

@facebook-github-bot
Contributor

@BIT-silence merged this pull request in 825a387.
