
Conversation

@mattip (Contributor) commented Jul 6, 2020

Fixes gh-40553 by clamping logit values when calculating Categorical.entropy
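A minimal sketch of the approach (an illustration only, not the exact diff): clamp the normalized logits to the smallest finite value of their dtype, so zero-probability categories contribute 0 * min_real = 0 instead of 0 * -inf = nan to the p * log p sum.

import torch

def categorical_entropy(logits):
    # normalize so the logits are log-probabilities
    logits = logits - logits.logsumexp(dim=-1, keepdim=True)
    probs = logits.exp()
    # clamp -inf (zero-probability categories) to the smallest finite
    # value of the dtype; probs * logits then yields 0 instead of nan
    min_real = torch.finfo(logits.dtype).min
    p_log_p = probs * logits.clamp(min=min_real)
    return -p_log_p.sum(-1)

print(categorical_entropy(torch.tensor([float('-inf'), 0.0])))  # zero entropy, no nan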

@dr-ci (bot) commented Jul 6, 2020

💊 CI failures summary and remediations

As of commit e3148bb (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

@ngimel (Collaborator) commented Jul 7, 2020

The flake8 errors are real.
@fritzo, @neerajprad do the changes look good to you otherwise?

@ngimel added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Jul 7, 2020
Contributor:

This will normalize but won't convert -inf to real.

Contributor Author:

I meant "real" as "non-inf".

Suggested change:
- # Convert -inf to a real number
+ # Normalize -inf to min_real

Contributor:

I meant that the normalization below would retain -inf (indices with zero probability):

>>> t = torch.tensor([float('-inf'), 2])
>>> t - t.logsumexp(dim=-1, keepdim=True)
tensor([-inf, 0.])
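(Editorial addition for reference: the clamp introduced by this PR is what then maps the surviving -inf entries to a finite value.)

>>> min_real = torch.finfo(t.dtype).min
>>> (t - t.logsumexp(dim=-1, keepdim=True)).clamp(min=min_real)
tensor([-3.4028e+38,  0.0000e+00])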

Collaborator:

So the comment should be merely "Normalize", right?

Contributor:

That's right. I also wanted to make sure we aren't assuming the logits get clipped here, but as far as I can see, entropy is the only place where -inf values could cause issues, and that has been fixed separately.

Contributor Author:

What would you prefer the comment to read? "Normalize -inf in logits"?

Collaborator:

Simply "Normalize"

Contributor:

This will be a good place to use xlogy when available.

Contributor Author:

I think xlogy calculates x * log(y), whereas here we want x * y.

Contributor:

That's right, we are using the precomputed probs / logits attributes directly.
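(Editorial sketch of the two formulations, assuming torch.xlogy as it landed in a later release: with precomputed probs and logits, the entropy is a plain product that needs the clamp from this PR, while xlogy recomputes the log and defines 0 * log(0) = 0, avoiding the nan without clamping.)

import torch

probs = torch.tensor([0.0, 1.0])
logits = probs.log()  # tensor([-inf, 0.])

# plain product over precomputed logits: needs the clamp from this PR
min_real = torch.finfo(logits.dtype).min
h_clamped = -(probs * logits.clamp(min=min_real)).sum(-1)

# torch.xlogy(x, y) computes x * log(y) with 0 * log(0) defined as 0
h_xlogy = -torch.xlogy(probs, probs).sum(-1)

print(h_clamped, h_xlogy)  # both zero for a deterministic distribution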

@fritzo (Collaborator) left a review:

LGTM except for the comment on line 54

@facebook-github-bot (Contributor) left a comment:

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ngimel (Collaborator) commented Jul 8, 2020

I see a related error on internal tests:

stderr:
test_entropy (test_distributions.TestJit) ... caffe2/torch/csrc/jit/ir/node_hashing.cpp:203:42: runtime error: -1.79769e+308 is outside the range of representable values of type 'float'
    #0 0x7ff8fec85d97 in torch::jit::HashNode::operator()(torch::jit::Node const*) const (/mnt/xarfuse/uid-30041/b83ac162-ns-4026533886/libomnibus.so+0x7013bd97)
    #1 0x7ff8fe5cd2f2 in std::pair<std::__detail::_Node_iterator<torch::jit::Node*, true, true>, bool> std::_Hashtable<torch::jit::Node*, torch::jit::Node*, std::allocator<torch::jit::Node*>, std::__detail::_Identity, torch::jit::EqualNode, torch::jit::HashNode, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_insert<torch::jit::Node* const&, std::__detail::_AllocNode<std::allocator<std::__detail::_Hash_node<torch::jit::Node*, true> > > >(torch::jit::Node* const&, std::__detail::_AllocNode<std::allocator<std::__detail::_Hash_node<torch::jit::Node*, true> > > const&, std::integral_constant<bool, true>) (/mnt/xarfuse/uid-30041/b83ac162-ns-4026533886/libomnibus.so+0x6fa832f2)
    #2 0x7ff8fe5e3c1c in torch::jit::(anonymous namespace)::ConstantPooling(torch::jit::Block*, std::unordered_set<torch::jit::Node*, torch::jit::HashNode, torch::jit::EqualNode, std::allocator<torch::jit::Node*> >&, torch::jit::AliasDb const&) (/mnt/xarfuse/uid-30041/b83ac162-ns-4026533886/libomnibus.so+0x6fa99c1c)
    #3 0x7ff8fe5e33d6 in torch::jit::ConstantPooling(std::shared_ptr<torch::jit::Graph> const&) (/mnt/xarfuse/uid-30041/b83ac162-ns-4026533886/libomnibus.so+0x6fa993d6)
    #4 0x7ff8fe79c93a in torch::jit::GraphFunction::optimized_graph() const (/mnt/xarfuse/uid-30041/b83ac162-ns-4026533886/libomnibus.so+0x6fc5293a)

This is a JIT version of test_entropy that was likely touched by this diff; I don't know why OSS CI did not catch it.

@fritzo (Collaborator) commented Jul 8, 2020

Do you understand why that error is being triggered? It looks as if torch.finfo() sees self.logits.dtype as torch.double, but self.logits.clamp(...) is operating on torch.float 😕
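(To make the suspected mismatch concrete, an editorial sketch: the constant in the sanitizer message is exactly the float64 minimum, which is not representable as a float32.)

import torch

print(torch.finfo(torch.float64).min)  # -1.7976931348623157e+308
print(torch.finfo(torch.float32).min)  # -3.4028234663852886e+38

d = torch.tensor(torch.finfo(torch.float64).min, dtype=torch.float64)
print(d.float())  # tensor(-inf): the value overflows float32

In Python the downcast just produces -inf, but in C++ the same out-of-range double-to-float conversion is undefined behavior, which is what the sanitizer in node_hashing.cpp is reporting.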

@ngimel (Collaborator) commented Jul 9, 2020

I don't; maybe torch.jit.script does not handle finfo correctly, or TestJit uses the default double dtype? cc @suo, @eellison for the internal JIT failure.
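(One hypothesis, as an editorial sketch and assuming the test traces rather than scripts: under torch.jit.trace, torch.finfo is evaluated eagerly in Python, so the float64 minimum gets baked into the graph as a double constant, which would explain the huge literal in the error.)

import torch

def f(x):
    return x.clamp(min=torch.finfo(x.dtype).min)

# tracing evaluates finfo eagerly, embedding the float64 minimum
# in the graph as a double-valued constant node
traced = torch.jit.trace(f, torch.randn(3, dtype=torch.float64))
print(traced.graph)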

@ngimel (Collaborator) commented Jul 14, 2020

@mattip can you please add the following to your diff:

diff --git a/pytorch/aten/src/ATen/core/ivalue.cpp b/fbcode/caffe2/aten/src/ATen/core/ivalue.cpp
--- a/pytorch/aten/src/ATen/core/ivalue.cpp
+++ b/pytorch/aten/src/ATen/core/ivalue.cpp
@@ -360,7 +360,7 @@
     case IValue::Tag::Double: {
       double d = v.toDouble();
       int c = std::fpclassify(d);
-      if (c == FP_NORMAL || c == FP_ZERO) {
+      if ((c == FP_NORMAL || c == FP_ZERO) && std::abs(d) < 1e10) {
         int64_t i = int64_t(d);
         if (double(i) == d) {
           return out << i << ".";
diff --git a/pytorch/torch/csrc/jit/ir/node_hashing.cpp b/fbcode/caffe2/torch/csrc/jit/ir/node_hashing.cpp
--- a/pytorch/torch/csrc/jit/ir/node_hashing.cpp
+++ b/pytorch/torch/csrc/jit/ir/node_hashing.cpp
@@ -200,7 +200,7 @@
     } else if (
         type->isSubtypeOf(NumberType::get()) &&
         k->kindOf(attr::value) == AttributeKind::f) {
-      constant_hash = std::hash<float>{}(k->f(attr::value));
+      constant_hash = std::hash<double>{}(k->f(attr::value));
     } else if (type->isSubtypeOf(BoolType::get())) {
       constant_hash = std::hash<bool>{}(k->i(attr::value));
     }

This solves internal test failures, but I want to make sure that OSS CI is ok with this too. Thanks!

@mattip mattip requested a review from apaszke as a code owner July 14, 2020 06:12
@mattip (Contributor, Author) commented Jul 14, 2020

> This solves internal test failures, but I want to make sure that OSS CI is ok

@ngimel From my limited understanding the changes seem OK; I applied the patch. It would be good if we could also add a test for this code path at some point (see the sketch below); there may be more issues like this lurking around.
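(A possible shape for such a test, as a hypothetical editorial sketch rather than the PR's actual test: script a function whose graph contains a double constant near the float64 minimum, then run it so the optimization passes (constant pooling and node hashing, where the sanitizer error fired) must process that constant.)

import torch

@torch.jit.script
def clamp_to_min_real(x: torch.Tensor) -> torch.Tensor:
    # a literal double constant near the float64 minimum ends up in the graph
    return x.clamp(min=-1.7976931348623157e+308)

# running the scripted function forces graph optimization, which
# previously hit undefined behavior when hashing the huge constant
print(clamp_to_min_real(torch.randn(3, dtype=torch.float64)))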

@mattip (Contributor, Author) commented Jul 14, 2020

It seems there are merge conflicts. Rebased to clear them.

@ngimel (Collaborator) commented Jul 14, 2020

These failures were coming from running the regular tests under ASAN; we have an ASAN build in OSS CI, but for some reason it's not triggering them.

@facebook-github-bot (Contributor) left a comment:

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented:
@ngimel merged this pull request in a0f1101.


Labels

Merged · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Categorical entropy of logits is inconsistent with probs

7 participants