
Conversation

@mattip (Contributor) commented Jul 6, 2020

Fixes gh-40553 by clamping logit values when calculating Categorical.entropy
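A minimal sketch of the approach (an illustration only, not the exact diff): clamp the normalized logits to the smallest finite value of their dtype, so zero-probability categories contribute 0 * min_real = 0 instead of 0 * -inf = nan to the p * log p sum.

import torch

def categorical_entropy(logits):
    # normalize so the logits are log-probabilities
    logits = logits - logits.logsumexp(dim=-1, keepdim=True)
    probs = logits.exp()
    # clamp -inf (zero-probability categories) to the smallest finite
    # value of the dtype; probs * logits then yields 0 instead of nan
    min_real = torch.finfo(logits.dtype).min
    p_log_p = probs * logits.clamp(min=min_real)
    return -p_log_p.sum(-1)

print(categorical_entropy(torch.tensor([float('-inf'), 0.0])))  # zero entropy, no nan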

@dr-ci (bot) commented Jul 6, 2020

💊 CI failures summary and remediations

As of commit e3148bb (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

@ngimel (Collaborator) commented Jul 7, 2020

The flake8 errors are real.
@fritzo, @neerajprad do the changes look good to you otherwise?

@ngimel added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Jul 7, 2020
Contributor:

This will normalize but won't convert -inf to real.

Contributor Author:

I meant "real" as "non-inf".

Suggested change:
- # Convert -inf to a real number
+ # Normalize -inf to min_real

Contributor:

I meant that the normalization below would retain -inf (indices with zero probability):

>>> t = torch.tensor([float('-inf'), 2])
>>> t - t.logsumexp(dim=-1, keepdim=True)
tensor([-inf, 0.])
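(Editorial addition for reference: the clamp introduced by this PR is what then maps the surviving -inf entries to a finite value.)

>>> min_real = torch.finfo(t.dtype).min
>>> (t - t.logsumexp(dim=-1, keepdim=True)).clamp(min=min_real)
tensor([-3.4028e+38,  0.0000e+00])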

Collaborator:

So the comment should be merely "Normalize", right?

Contributor:

That's right. I also wanted to make sure we aren't assuming the logits get clipped here, but as far as I can see, entropy is the only place where -inf values could cause issues, and that has been fixed separately.

Contributor Author:

What would you prefer the comment to read? "Normalize -inf in logits"?

Collaborator:

Simply "Normalize"

Contributor:

This will be a good place to use xlogy when available.

Contributor Author:

I think xlogy calculates x * log(y), whereas here we want x * y.

Contributor:

That's right, we are using the precomputed probs / logits attributes directly.
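(Editorial sketch of the two formulations, assuming torch.xlogy as it landed in a later release: with precomputed probs and logits, the entropy is a plain product that needs the clamp from this PR, while xlogy recomputes the log and defines 0 * log(0) = 0, avoiding the nan without clamping.)

import torch

probs = torch.tensor([0.0, 1.0])
logits = probs.log()  # tensor([-inf, 0.])

# plain product over precomputed logits: needs the clamp from this PR
min_real = torch.finfo(logits.dtype).min
h_clamped = -(probs * logits.clamp(min=min_real)).sum(-1)

# torch.xlogy(x, y) computes x * log(y) with 0 * log(0) defined as 0
h_xlogy = -torch.xlogy(probs, probs).sum(-1)

print(h_clamped, h_xlogy)  # both zero for a deterministic distribution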

@fritzo (Collaborator) left a review:

LGTM except for the comment on line 54

@facebook-github-bot (Contributor) left a comment:

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ngimel (Collaborator) commented Jul 8, 2020

I see a related error on internal tests:

stderr:
test_entropy (test_distributions.TestJit) ... caffe2/torch/csrc/jit/ir/node_hashing.cpp:203:42: runtime error: -1.79769e+308 is outside the range of representable values of type 'float'
    #0 0x7ff8fec85d97 in torch::jit::HashNode::operator()(torch::jit::Node const*) const (/mnt/xarfuse/uid-30041/b83ac162-ns-4026533886/libomnibus.so+0x7013bd97)
    #1 0x7ff8fe5cd2f2 in std::pair<std::__detail::_Node_iterator<torch::jit::Node*, true, true>, bool> std::_Hashtable<torch::jit::Node*, torch::jit::Node*, std::allocator<torch::jit::Node*>, std::__detail::_Identity, torch::jit::EqualNode, torch::jit::HashNode, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_insert<torch::jit::Node* const&, std::__detail::_AllocNode<std::allocator<std::__detail::_Hash_node<torch::jit::Node*, true> > > >(torch::jit::Node* const&, std::__detail::_AllocNode<std::allocator<std::__detail::_Hash_node<torch::jit::Node*, true> > > const&, std::integral_constant<bool, true>) (/mnt/xarfuse/uid-30041/b83ac162-ns-4026533886/libomnibus.so+0x6fa832f2)
    #2 0x7ff8fe5e3c1c in torch::jit::(anonymous namespace)::ConstantPooling(torch::jit::Block*, std::unordered_set<torch::jit::Node*, torch::jit::HashNode, torch::jit::EqualNode, std::allocator<torch::jit::Node*> >&, torch::jit::AliasDb const&) (/mnt/xarfuse/uid-30041/b83ac162-ns-4026533886/libomnibus.so+0x6fa99c1c)
    #3 0x7ff8fe5e33d6 in torch::jit::ConstantPooling(std::shared_ptr<torch::jit::Graph> const&) (/mnt/xarfuse/uid-30041/b83ac162-ns-4026533886/libomnibus.so+0x6fa993d6)
    #4 0x7ff8fe79c93a in torch::jit::GraphFunction::optimized_graph() const (/mnt/xarfuse/uid-30041/b83ac162-ns-4026533886/libomnibus.so+0x6fc5293a)

This is a JIT version of test_entropy that was likely touched by this diff; I don't know why OSS CI did not catch it.

@fritzo (Collaborator) commented Jul 8, 2020

Do you understand why that error is being triggered? It looks as if torch.finfo() sees self.logits.dtype as torch.double, but self.logits.clamp(...) is operating on torch.float 😕
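(To make the suspected mismatch concrete, an editorial sketch: the constant in the sanitizer message is exactly the float64 minimum, which is not representable as a float32.)

import torch

print(torch.finfo(torch.float64).min)  # -1.7976931348623157e+308
print(torch.finfo(torch.float32).min)  # -3.4028234663852886e+38

d = torch.tensor(torch.finfo(torch.float64).min, dtype=torch.float64)
print(d.float())  # tensor(-inf): the value overflows float32

In Python the downcast just produces -inf, but in C++ the same out-of-range double-to-float conversion is undefined behavior, which is what the sanitizer in node_hashing.cpp is reporting.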

@ngimel (Collaborator) commented Jul 9, 2020

I don't; maybe torch.jit.script does not handle finfo correctly, or TestJit uses the default double dtype? cc @suo, @eellison for the internal JIT failure.
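(One hypothesis, as an editorial sketch and assuming the test traces rather than scripts: under torch.jit.trace, torch.finfo is evaluated eagerly in Python, so the float64 minimum gets baked into the graph as a double constant, which would explain the huge literal in the error.)

import torch

def f(x):
    return x.clamp(min=torch.finfo(x.dtype).min)

# tracing evaluates finfo eagerly, embedding the float64 minimum
# in the graph as a double-valued constant node
traced = torch.jit.trace(f, torch.randn(3, dtype=torch.float64))
print(traced.graph)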

@ngimel (Collaborator) commented Jul 14, 2020

@mattip can you please add the following to your diff:

diff --git a/pytorch/aten/src/ATen/core/ivalue.cpp b/fbcode/caffe2/aten/src/ATen/core/ivalue.cpp
--- a/pytorch/aten/src/ATen/core/ivalue.cpp
+++ b/pytorch/aten/src/ATen/core/ivalue.cpp
@@ -360,7 +360,7 @@
     case IValue::Tag::Double: {
       double d = v.toDouble();
       int c = std::fpclassify(d);
-      if (c == FP_NORMAL || c == FP_ZERO) {
+      if ((c == FP_NORMAL || c == FP_ZERO) && std::abs(d) < 1e10) {
         int64_t i = int64_t(d);
         if (double(i) == d) {
           return out << i << ".";
diff --git a/pytorch/torch/csrc/jit/ir/node_hashing.cpp b/fbcode/caffe2/torch/csrc/jit/ir/node_hashing.cpp
--- a/pytorch/torch/csrc/jit/ir/node_hashing.cpp
+++ b/pytorch/torch/csrc/jit/ir/node_hashing.cpp
@@ -200,7 +200,7 @@
     } else if (
         type->isSubtypeOf(NumberType::get()) &&
         k->kindOf(attr::value) == AttributeKind::f) {
-      constant_hash = std::hash<float>{}(k->f(attr::value));
+      constant_hash = std::hash<double>{}(k->f(attr::value));
     } else if (type->isSubtypeOf(BoolType::get())) {
       constant_hash = std::hash<bool>{}(k->i(attr::value));
     }

This solves internal test failures, but I want to make sure that OSS CI is ok with this too. Thanks!

@mattip mattip requested a review from apaszke as a code owner July 14, 2020 06:12
@mattip (Contributor, Author) commented Jul 14, 2020

> This solves internal test failures, but I want to make sure that OSS CI is ok

@ngimel From my limited understanding the changes seem OK; I applied the patch. It would be good if we could also add a test for this code path at some point (see the sketch below); there may be more issues like this lurking around.
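(A possible shape for such a test, as a hypothetical editorial sketch rather than the PR's actual test: script a function whose graph contains a double constant near the float64 minimum, then run it so the optimization passes (constant pooling and node hashing, where the sanitizer error fired) must process that constant.)

import torch

@torch.jit.script
def clamp_to_min_real(x: torch.Tensor) -> torch.Tensor:
    # a literal double constant near the float64 minimum ends up in the graph
    return x.clamp(min=-1.7976931348623157e+308)

# running the scripted function forces graph optimization, which
# previously hit undefined behavior when hashing the huge constant
print(clamp_to_min_real(torch.randn(3, dtype=torch.float64)))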

@mattip (Contributor, Author) commented Jul 14, 2020

It seems there are merge conflicts. Rebased to clear them.

@ngimel (Collaborator) commented Jul 14, 2020

These failures were coming from running the regular tests under ASAN; we have an ASAN build in OSS CI, but for some reason it's not triggering them.

@facebook-github-bot (Contributor) left a comment:

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented:
@ngimel merged this pull request in a0f1101.


Labels

Merged · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Categorical entropy of logits is inconsistent with probs

7 participants