Skip to content

Conversation

@chang-l
Copy link
Collaborator

@chang-l chang-l commented Jul 6, 2022

Description

To fix #4202, the crash due to in-place operations for one variable needed for backward propagation.

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature]])
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change,
    or have been fixed to be compatible with this change
  • Related issue is referred in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

At the end of forward function, to compute average output over all stacks, instead of doing += (in-place operations) as:
output += features
using unsqueeze() + cat() to store all stack outputs and return the mean of them.

Additional notes

With profiling, no significant performance penalty observed with this fix (less than 10% slow-down)

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 6, 2022

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 6, 2022

Commit ID: c00d0ab

Build ID: 1

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

output += feats
return output / self.K
tot_output = torch.cat((tot_output, output), 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer defining tot_output as a list to store outputs and then calculating their mean by torch.stack(...).mean().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will update accordingly.

@yaox12 yaox12 added the Release Candidate Candidate PRs for the upcoming release label Jul 7, 2022
@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 7, 2022

Commit ID: 02eaa62d35d135b59f86ca8481a0918d289b3f24

Build ID: 2

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 8, 2022

Commit ID: 35b681a

Build ID: 3

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 8, 2022

Commit ID: a5cd0e5

Build ID: 4

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

@yaox12 yaox12 merged commit 52d4312 into dmlc:master Jul 8, 2022
@chang-l chang-l deleted the fix-example-arma branch July 13, 2022 20:38
BarclayII pushed a commit to BarclayII/dgl that referenced this pull request Aug 10, 2022
* Fix arma example

* Update

Co-authored-by: Xin Yao <xiny@nvidia.com>
@frozenbugs frozenbugs removed the Release Candidate Candidate PRs for the upcoming release label Jan 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Example][Bug] Running error on the example case: example/pytorch/arma

4 participants