Fix multinomial sampling with total/partial probabilities = 0 #2896
Conversation
|
I'm surprised we allow negative values for the probability vector. This doesn't particularly make sense to me. @MicaelCarvalho is this something you want to be able to do? |
|
@killeent no, I don't see any reason to allow negative values as probabilities. The examples I gave are just meant to show that this problem wasn't addressed, while in the current implementation it kind of works (because of line 294, I suppose). Maybe the current implementation has some corner cases for negative values that are not dealt with, and I can do further analysis if the problem is pertinent (IMHO it is not). A question I would raise is: if negative probabilities were allowed (and I don't think they should be), shouldn't 0 take priority over them? |
|
@MicaelCarvalho I agree that this is something we should not allow. Note that you would also have to modify the THC implementation. As far as the changes themselves, I'm okay with replacing the undefined outputs with -1s for cases where we have zero-probability regions. This change wouldn't affect the correctness of existing code: if callers then wanted to index using the result, they would still hit the same problems as before, so I don't think this is a particularly breaking change. I'm less sure about allowing the number of samples to exceed the number of categories. |
|
@killeent I'm going to check what needs to be done in THC. I don't have experience with CUDA, but I should be able to learn it quickly. As for the values, should we actually check for negatives or just assume the input is correct (as before)? The categories check is an easy one; it's just a matter of putting back the code that did it. |
|
I haven't looked at the code yet, but returning -1 doesn't seem like a good idea. We should try to raise errors instead of silently returning outputs that are not what the user meant. |
|
@apaszke unfortunately in this situation raising an exception is not possible. Suppose that for each sample in your batch you're trying to retrieve the closest matches, without always taking the closest one, but rather using the similarities as probabilities of selecting them (i.e. a stochastic approach). For this situation, you will have a probability matrix that might look like the sketch after this comment (oversimplified example, of course). In this case, we cannot raise an exception, because each line has a different number of "valid" probabilities: the second line can pick 4 elements, but the third can only pick one. Trimming the matrix before calling `multinomial` is not practical either, since each line would be left with a different number of columns. Being able to pass a vector/matrix of probabilities that has a 0 inside to `multinomial` is therefore important for this use case. On a side note: I have not yet had the time to look into THC, I hope to be able to do it this week. |
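For illustration, a hypothetical probability matrix consistent with the description above (all values are made up for this sketch) might be:

```python
import torch

# Hypothetical similarity-derived probabilities: each row is one sample's
# distribution over candidate matches. The second row has four non-zero
# entries, the third has only one, so the number of "valid" choices
# differs per row.
probs = torch.tensor([
    [0.5, 0.3, 0.2, 0.0, 0.0],
    [0.4, 0.3, 0.2, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
])

# Requesting 3 samples per row without replacement: the third row cannot
# supply 3 distinct positive-probability indices, which is exactly the
# situation discussed in this thread. Post-fix PyTorch raises an error
# here; at the time of this comment it silently returned extra indices.
try:
    print(torch.multinomial(probs, 3, replacement=False))
except RuntimeError as err:
    print(err)
```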
|
Just because different distributions might have different numbers of "valid" classes (for some definition of valid), doesn't mean we can't raise errors. The only question is whether we can do this without sacrificing performance. If it's not possible, then I think I'd be ok with making `multinomial` return -1 in those cases. |
|
@apaszke I understand your point, but in this case the user would have to "manually" count the number of invalid probabilities in each line to later decide what is valid and what is not. In other words: there would be no direct way of knowing whether the second sample drawn for line 3 originally had probability 0 or not, i.e. whether it is valid or not. That being said, it is definitely doable, just not straightforward. :) |
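A minimal sketch of that manual bookkeeping, assuming a 2-D `probs` tensor (the values are illustrative):

```python
import torch

probs = torch.tensor([
    [0.2, 0.8, 0.0, 0.0],
    [0.1, 0.2, 0.3, 0.4],
    [0.0, 1.0, 0.0, 0.0],
])

# Number of categories with non-zero probability in each row; any sample
# drawn beyond this count cannot correspond to a valid category.
valid_per_row = (probs > 0).sum(dim=1)
print(valid_per_row)  # tensor([2, 4, 1])
```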
|
@MicaelCarvalho any update on this PR? Also a rebase is needed now. |
|
this PR isn't relevant anymore afaik. cc: @t-vi who knows this code the best at the moment. |
|
Sorry, I got caught up in some projects and did not follow up on this PR. It no longer applies to the current build of PyTorch. However, it's worth mentioning that the problem of returning random-like indices was not addressed. The code sketched after this comment exemplifies the behavior. With replacement=False, the function should not repeat positions, even if the remaining ones have a probability of 0. This is the same issue I tried to correct with my initial PR. In my opinion, the `multinomial` function should not behave like this. Am I wrong to understand it this way? |
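A call along these lines (the same one quoted in the fix summary below) exhibits the issue on builds from that time; later PyTorch versions raise an error instead:

```python
import torch

# Only index 1 has positive probability, yet three distinct samples are
# requested. Pre-fix builds padded the result with repeated indices
# (e.g. something like tensor([1, 0, 0])) instead of flagging the
# impossible request.
print(torch.multinomial(torch.FloatTensor([0, 1, 0, 0]), 3, replacement=False))
```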
|
I think the replacement=False bit hasn't changed much; we've worked to avoid prob 0 for replacement=True. I'll look into this next week. Of course, replacement=False makes limited sense when the number of samples is close to or larger than the number of classes - it's more like a "random, but probability-weighted" distribution. Returning -1 might be an option when considering multinomial by itself, but that might wreak havoc in further use (e.g. the index_* functions).
|
Summary:
Fixes #12260 #2896
```
torch.multinomial(torch.FloatTensor([0, 1, 0, 0]), 3, replacement=False)
```
The old behavior is that we return `0` after we run out of positive categories. Now we raise an error, based on the discussion in the issue thread.
- Add a testcase for the CPU & CUDA case; in the CUDA case `n_samples=1` is a simple special case, so we test against `n_sample=2` instead.

Pull Request resolved: #12490
Differential Revision: D10278794
Pulled By: ailzhang
fbshipit-source-id: d04de7a60f60d0c0d648b975db3f3961fcf42db1
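A minimal sketch of the post-fix behavior, assuming a PyTorch build that includes the change above (the exact error message is whatever the library raises, not reproduced here):

```python
import torch

# Requesting 3 samples without replacement from a distribution with only
# one positive category: after the fix this raises an error instead of
# returning padded or repeated indices.
try:
    torch.multinomial(torch.FloatTensor([0, 1, 0, 0]), 3, replacement=False)
except RuntimeError as err:
    print("multinomial rejected the request:", err)
```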
|
We've since fixed this (see the commit summary above). Please let me know if that doesn't fully address your points @MicaelCarvalho |
This PR probably needs to be discussed. I'm going to present the problem, what changes with the PR and the possible concerns with respect to these changes.
The problem
First known report on Slack's `beginner` channel on 27 Sept, by myself.
The function `multinomial` assumes non-negativity and a non-zero sum for the input; however, it does not handle elements with value zero, and returns random-like indices instead. Example (sketched below):
Basically, in the first example it runs out of 'positive' probabilities and starts outputting 0 (always), but in the second example it outputs 1 when it doesn't have any probabilities left. The biggest problem is that there is no reason for it to output 1, and it is not always 1 (sometimes 0, never 2).
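A hypothetical pair of calls consistent with the two examples described above (the tensors are illustrative; on post-fix PyTorch these calls raise an error instead):

```python
import torch

# First example: the only positive-probability category is at index 0, but
# three samples are requested without replacement. As reported, once the
# positive probabilities ran out the function kept returning 0.
print(torch.multinomial(torch.Tensor([0.7, 0.0, 0.0]), 3, replacement=False))

# Second example: the positive category is at index 1. As reported, once
# the probabilities ran out the function returned 1 (sometimes 0, never 2),
# even though there is no reason for it to pick those indices.
print(torch.multinomial(torch.Tensor([0.0, 0.7, 0.0]), 3, replacement=False))
```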
Taking a closer look at the code, I realized the function was designed to work with non-zero values, not merely a non-zero sum; it exhibits erratic behavior when there are zeros inside the matrix. This seems to be undesirable behavior, since the caller would have to manually check every probability, and it is not that uncommon to have a weight matrix with zeros inside that we still want to sample from.
What changed
Two constraints were removed from the `multinomial` function: the input no longer has to have a non-zero sum, and the number of samples no longer has to be smaller than the number of categories.
Values are sampled in the same way as before, except when there are no values left to be sampled. In the latter case, the sampled position is -1, and the examples given before now output -1 for every position that could not be drawn. The difference is that zero-sum probabilities are accepted, and the number of samples can be bigger than the number of categories; a sketch of the proposed behavior follows.
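A minimal sketch of the proposed behavior, assuming a build that includes this PR (the tensors and sampled indices are illustrative):

```python
import torch

probs = torch.Tensor([0.3, 0.0, 0.7])

# Four samples are requested without replacement, but only two categories
# have positive probability. Under the proposed change the extra positions
# are filled with -1 instead of arbitrary indices, e.g. something like
# tensor([2, 0, -1, -1]). Stock PyTorch does not behave this way.
samples = torch.multinomial(probs, 4, replacement=False)
print(samples)
```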
What did not change
Values are still supposed to be non-negative. Passing negative values for the probabilities will affect the behavior of the function.
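For example (the values are illustrative; builds from the time of this PR returned unspecified indices here, while newer PyTorch versions reject negative entries with an error):

```python
import torch

# A negative entry means the weights no longer form a valid distribution,
# so the sampled indices are not meaningful.
bad_probs = torch.Tensor([-1.0, 1.0, 1.0])
try:
    print(torch.multinomial(bad_probs, 2, replacement=False))
except RuntimeError as err:
    print(err)
```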
What should be debated about these changes
The behavior of the function changed: it used to output "random" (not really, but well...) indexes when the vector reached probability 0 because of some zero-valued item inside of it; now it outputs -1 for every sample not drawn. The previous behavior allowed anyone to use the output of the `multinomial` function to index a matrix and later erase the invalid elements -- the problem is that these invalid elements were not easily identifiable.
With the proposed changes, the user would have to replace the -1 indexes with a valid index before using the output of `multinomial` to index a vector/matrix. The benefit of this approach is that it is easy to identify any invalid elements (i.e. prob = 0) and remove them from the samples; a sketch of that workflow follows.
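A minimal sketch of that workflow under the proposed -1 convention (the `data` tensor and the sampled indices are illustrative):

```python
import torch

data = torch.tensor([10.0, 20.0, 30.0, 40.0])

# Suppose multinomial, with the proposed change, returned this: the last
# position could not be drawn, so it is flagged with -1.
samples = torch.tensor([2, 0, -1])

valid = samples != -1            # the -1 entries are easy to identify...
safe_idx = samples.clamp(min=0)  # ...but must be replaced before indexing
gathered = data[safe_idx]        # index as usual
print(gathered[valid])           # then drop the entries that came from -1
```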