
Conversation

@rohan-varma (Contributor) commented Sep 23, 2020

Stack from ghstack:

This request came up in feature review for DDP uneven inputs, so this PR adds a
warning when the discrepancy in the number of inputs across different processes
is much higher than expected when running with uneven inputs. A skew in the
thousands can reduce performance by a nontrivial amount, as shown in the
benchmarks in #42577, so it was proposed to add this warning. Tested by running
the tests so that the threshold is hit and observing the output.

Differential Revision: D23719270
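For readers skimming the conversation, the change amounts to something like the
sketch below. The constant SKEW_WARN_THRESHOLD, the helper name _warn_if_skewed,
and its arguments are illustrative placeholders, not the identifiers used in
torch/nn/parallel/distributed.py; only the tail of the warning text is taken from
the excerpt quoted in the review further down.

```python
import warnings

# Illustrative threshold: per the benchmarks referenced in #42577, a skew in
# the thousands of inputs can noticeably hurt performance.
SKEW_WARN_THRESHOLD = 1000

def _warn_if_skewed(rank, my_num_inputs, max_num_inputs):
    """Warn when this rank has processed far fewer inputs than the busiest rank."""
    skew = max_num_inputs - my_num_inputs
    if skew > SKEW_WARN_THRESHOLD:
        warnings.warn(
            f"Rank {rank} has processed {skew} fewer inputs than "
            "other currently active ranks. This level of skew could "
            "lead to performance degradation during training."
        )
```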

dr-ci bot commented Sep 23, 2020

💊 CI failures summary and remediations

As of commit fb43098 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)

Extra GitHub checks: 1 failed


codecov.io: 1 failed



@mrshenli (Contributor) left a comment


LGTM!

"other currently active ranks. This level of skew could "
"lead to performance degradation during training."
)
warned = True
Contributor


Please feel free to ignore. This can also be done using warnings.simplefilter("once") IIUC.

Contributor Author


Thanks! This is better than toggling the boolean.
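As a minimal illustration of the suggestion (assumed usage, not the code in this
diff): with the "once" filter, Python's warnings machinery emits each unique
warning message at most once per process, so the manual warned flag becomes
unnecessary.

```python
import warnings

# "once": print only the first occurrence of a given warning message,
# regardless of how many times or from where it is raised.
warnings.simplefilter("once")

def check_skew(skew, threshold=1000):  # illustrative threshold
    if skew > threshold:
        warnings.warn("Detected a large skew in the number of inputs across ranks.")

for _ in range(3):
    check_skew(5000)  # the warning is printed only on the first call
```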

rohan-varma added a commit that referenced this pull request Sep 24, 2020
…ining

Pull Request resolved: #45238

ghstack-source-id: 112773552

Differential Revision: [D23719270](https://our.internmc.facebook.com/intern/diff/D23719270/)

codecov bot commented Sep 24, 2020

Codecov Report

Merging #45238 into gh/rohan-varma/179/base will decrease coverage by 0.00%.
The diff coverage is 12.50%.


@@                     Coverage Diff                     @@
##           gh/rohan-varma/179/base   #45238      +/-   ##
===========================================================
- Coverage                    68.01%   68.00%   -0.01%     
===========================================================
  Files                          393      393              
  Lines                        50847    50854       +7     
===========================================================
+ Hits                         34583    34584       +1     
- Misses                       16264    16270       +6     
Impacted Files                       Coverage Δ
torch/nn/parallel/distributed.py     41.37% <12.50%> (-0.68%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 99242ec...fb43098.

@facebook-github-bot (Contributor)

This pull request has been merged in e57a081.

@facebook-github-bot deleted the gh/rohan-varma/179/head branch September 28, 2020 14:17