Skip to content

Conversation

@osalpekar
Copy link
Member

@osalpekar osalpekar commented Sep 1, 2020

Stack from ghstack:

This PR adds a getNumKeys API to the TCP Store, which essentially returns the number of keys in the store at that point. This API will be useful for some applications related to debug logging in ProcessGroupNCCL going forward. This PR also adds some C++ tests for this API and Python tests are added in #45223. We will build on this functionality in the future by implementing this API for FileStore, HashStore, RedisStore, and ZeusStore.

Differential Revision: D22985085

NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!

TCPStore needs a getNumKeys API for our logging needs.

Differential Revision: [D22985085](https://our.internmc.facebook.com/intern/diff/D22985085/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22985085/)!

[ghstack-poisoned]
@dr-ci
Copy link

dr-ci bot commented Sep 1, 2020

💊 CI failures summary and remediations

As of commit c857acf (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_backward_compatibility_check_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Sep 25 20:58:20 The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not.
Sep 25 20:58:20 processing existing schema:  __setstate__(__torch__.torch.classes.quantized.EmbeddingPackedParamsBase _0, (int, Tensor[], float[], int[]) _1) -> (None _0) 
Sep 25 20:58:20 processing existing schema:  bit_rate(__torch__.torch.classes.quantized.EmbeddingPackedParamsBase _0) -> (int _0) 
Sep 25 20:58:20 processing existing schema:  version(__torch__.torch.classes.quantized.EmbeddingPackedParamsBase _0) -> (int _0) 
Sep 25 20:58:20 processing existing schema:  __getstate__(__torch__.torch.classes.xnnpack.LinearOpContext _0) -> ((Tensor, Tensor?, Scalar?, Scalar?) _0) 
Sep 25 20:58:20 processing existing schema:  __setstate__(__torch__.torch.classes.xnnpack.LinearOpContext _0, (Tensor, Tensor?, Scalar?, Scalar?) _1) -> (None _0) 
Sep 25 20:58:20 processing existing schema:  __getstate__(__torch__.torch.classes.xnnpack.Conv2dOpContext _0) -> ((Tensor, Tensor?, int[], int[], int[], int, Scalar?, Scalar?) _0) 
Sep 25 20:58:20 processing existing schema:  __setstate__(__torch__.torch.classes.xnnpack.Conv2dOpContext _0, (Tensor, Tensor?, int[], int[], int[], int, Scalar?, Scalar?) _1) -> (None _0) 
Sep 25 20:58:20 processing existing schema:  __getstate__(__torch__.torch.classes.xnnpack.TransposeConv2dOpContext _0) -> ((Tensor, Tensor?, int[], int[], int[], int[], int, Scalar?, Scalar?) _0) 
Sep 25 20:58:20 processing existing schema:  __setstate__(__torch__.torch.classes.xnnpack.TransposeConv2dOpContext _0, (Tensor, Tensor?, int[], int[], int[], int[], int, Scalar?, Scalar?) _1) -> (None _0) 
Sep 25 20:58:20 processing existing schema:  __init__(__torch__.torch.classes.dist_rpc.WorkerInfo _0, str _1, int _2) -> (None _0) 
Sep 25 20:58:20 The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not.  
Sep 25 20:58:20  
Sep 25 20:58:20 Broken ops: [ 
Sep 25 20:58:20 	static::mul.a(Tensor a, Tensor b) -> (Tensor) 
Sep 25 20:58:20 	static::mul.b(Tensor a, int b) -> (Tensor) 
Sep 25 20:58:20 	static::add(Tensor a, Tensor b) -> (Tensor) 
Sep 25 20:58:20 ] 
Sep 25 20:58:20 =================== sccache compilation log =================== 
Sep 25 20:58:20 + cleanup 
Sep 25 20:58:20 + retcode=1 
Sep 25 20:58:20 + set +x 

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 19 times.

TCPStore needs a getNumKeys API for our logging needs.

Differential Revision: [D22985085](https://our.internmc.facebook.com/intern/diff/D22985085/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22985085/)!

[ghstack-poisoned]
TCPStore needs a getNumKeys API for our logging needs.

Differential Revision: [D22985085](https://our.internmc.facebook.com/intern/diff/D22985085/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22985085/)!

[ghstack-poisoned]
TCPStore needs a getNumKeys API for our logging needs.

Differential Revision: [D22985085](https://our.internmc.facebook.com/intern/diff/D22985085/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22985085/)!

[ghstack-poisoned]
TCPStore needs a getNumKeys API for our logging needs.

Differential Revision: [D22985085](https://our.internmc.facebook.com/intern/diff/D22985085/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22985085/)!

[ghstack-poisoned]
@codecov
Copy link

codecov bot commented Sep 23, 2020

Codecov Report

Merging #43962 into gh/osalpekar/79/base will increase coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@                   Coverage Diff                    @@
##           gh/osalpekar/79/base   #43962      +/-   ##
========================================================
+ Coverage                 68.06%   68.10%   +0.04%     
========================================================
  Files                       393      393              
  Lines                     50987    50945      -42     
========================================================
- Hits                      34706    34698       -8     
+ Misses                    16281    16247      -34     
Impacted Files Coverage Δ
torch/nn/modules/batchnorm.py 73.79% <0.00%> (-0.36%) ⬇️
torch/nn/parallel/distributed.py 41.37% <0.00%> (-0.21%) ⬇️
torch/_torch_docs.py 100.00% <0.00%> (ø)
torch/nn/qat/modules/conv.py 100.00% <0.00%> (ø)
torch/nn/qat/modules/linear.py 100.00% <0.00%> (ø)
...ch/testing/_internal/common_methods_invocations.py 91.41% <0.00%> (ø)
.../testing/_internal/distributed/distributed_test.py 30.85% <0.00%> (+<0.01%) ⬆️
torch/nn/intrinsic/qat/modules/conv_fused.py 88.63% <0.00%> (+0.08%) ⬆️
...orch/testing/_internal/distributed/rpc/rpc_test.py 25.76% <0.00%> (+0.15%) ⬆️
torch/cuda/__init__.py 55.38% <0.00%> (+0.21%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 00e704e...c857acf. Read the comment docs.

}

int64_t FileStoreHandler::getNumKeys() {
CHECK(false) << "getNumKeys not implemented for FileStoreHandler";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be TORCH_CHECK?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in the Caffe2 codebase so I don't think we should use TORCH_CHECK. We could use CAFFE_ENFORCE, but the other unimplemented function in this class used CHECK(false) so I used it as well.

}

int64_t FileStore::getNumKeys() {
throw std::runtime_error("getNumKeys not implemented for FileStore");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TORCH_CHECK here for consistency?

}

int64_t HashStore::getNumKeys() {
throw std::runtime_error("getNumKeys not implemented for HashStore");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TORCH_CHECK/

Copy link
Contributor

@lw lw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since currently this method (and deleteKey in the next pull request) is only supported in the TcpStore, does it make sense to add it to the virtual superclass? Most "users" of the stores (I am thinking of the RPC agents, but I suppose it's the same elsewhere) only accept a pointer to the virtual superclass. How can they reliably call getNumKeys on it, knowing that such a call would fail on most concrete implementations?

An alternative approach could be to add getNumKeys and deleteKey only to TcpStore, and not to the superclass. If someone wants to call these methods, they can first dynamic_cast the Store pointer to a TcpStore pointer (and, by doing so, confirm that it is indeed a TcpStore and not some other type). And, if that works, they can confidently call getNumKeys.

This PR adds a getNumKeys API to the TCP Store, which essentially returns the number of keys in the store at that point. This API will be useful for some applications related to debug logging in ProcessGroupNCCL going forward. This PR also adds some C++ tests for this API and Python tests are added in #45223. We will build on this functionality in the future by implementing this API for FileStore, HashStore, RedisStore, and ZeusStore.

Differential Revision: [D22985085](https://our.internmc.facebook.com/intern/diff/D22985085/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22985085/)!

[ghstack-poisoned]
This PR adds a getNumKeys API to the TCP Store, which essentially returns the number of keys in the store at that point. This API will be useful for some applications related to debug logging in ProcessGroupNCCL going forward. This PR also adds some C++ tests for this API and Python tests are added in #45223. We will build on this functionality in the future by implementing this API for FileStore, HashStore, RedisStore, and ZeusStore.

Differential Revision: [D22985085](https://our.internmc.facebook.com/intern/diff/D22985085/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22985085/)!

[ghstack-poisoned]
This PR adds a getNumKeys API to the TCP Store, which essentially returns the number of keys in the store at that point. This API will be useful for some applications related to debug logging in ProcessGroupNCCL going forward. This PR also adds some C++ tests for this API and Python tests are added in #45223. We will build on this functionality in the future by implementing this API for FileStore, HashStore, RedisStore, and ZeusStore.

Differential Revision: [D22985085](https://our.internmc.facebook.com/intern/diff/D22985085/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22985085/)!

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 304e1d1.

@facebook-github-bot facebook-github-bot deleted the gh/osalpekar/79/head branch September 29, 2020 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants