Fix multinomial sampling with total/partial probabilities = 0 #2896
Conversation
|
I'm surprised we allow negative values for the probability vector. This doesn't particularly make sense to me. @MicaelCarvalho is this something you want to be able to do? |
|
@killeent no, I don't see any reason to allow negative values as probabilities. The examples I gave are just meant to show that this problem wasn't addressed, while in the current implementation it kind of works (because of line 294, I suppose). Maybe the current implementation has some corner cases for negative values that are not dealt with, and I can do further analysis if the problem is pertinent (IMHO it is not). A question I would raise is: if negative probabilities were allowed (and I don't think they should be), shouldn't 0 take priority over them? |
|
@MicaelCarvalho I agree that this is something we should not allow. Note that you would also have to modify the THC implementation. As far as the changes themselves, I'm okay with replacing the undefined outputs with -1s for cases where we have zero-probability regions. This change wouldn't affect the correctness of existing code: if callers then wanted to index using the result, they would still hit the same problems as before, so I don't think this is a particularly breaking change. I'm less sure about allowing the number of samples to exceed the number of categories. |
|
@killeent I'm going to check what needs to be done in THC. I don't have experience with CUDA, but I should be able to learn it quickly. As for the values, should we actually check for negatives or just assume the input is correct (as before)? The categories check is an easy one; it's just a matter of putting back the code that did it. |
|
I haven't looked at the code yet, but returning -1 doesn't seem like a good idea. We should try to raise errors instead of silently returning outputs that are not what the user meant. |
|
@apaszke unfortunately in this situation raising an exception is not possible. Suppose that for each sample in your batch you're trying to retrieve the closest matches, without always taking the closest one, but rather using the similarities as probabilities of selecting them (i.e. a stochastic approach). For this situation, you will have a probability matrix that might look like the sketch after this comment (oversimplified example, of course). In this case, we cannot raise an exception, because each line has a different number of "valid" probabilities: the second line can pick 4 elements, but the third can only pick one. Trimming the matrix before calling `multinomial` is not practical either, since each line would be left with a different number of columns. Being able to pass a vector/matrix of probabilities that has a 0 inside to `multinomial` is therefore important for this use case. On a side note: I have not yet had the time to look into THC, I hope to be able to do it this week. |
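For illustration, a hypothetical probability matrix consistent with the description above (all values are made up for this sketch) might be:

```python
import torch

# Hypothetical similarity-derived probabilities: each row is one sample's
# distribution over candidate matches. The second row has four non-zero
# entries, the third has only one, so the number of "valid" choices
# differs per row.
probs = torch.tensor([
    [0.5, 0.3, 0.2, 0.0, 0.0],
    [0.4, 0.3, 0.2, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
])

# Requesting 3 samples per row without replacement: the third row cannot
# supply 3 distinct positive-probability indices, which is exactly the
# situation discussed in this thread. Post-fix PyTorch raises an error
# here; at the time of this comment it silently returned extra indices.
try:
    print(torch.multinomial(probs, 3, replacement=False))
except RuntimeError as err:
    print(err)
```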
|
Just because different distributions might have different numbers of "valid" classes (for some definition of valid), doesn't mean we can't raise errors. The only question is whether we can do this without sacrificing performance. If it's not possible, then I think I'd be ok with making `multinomial` return -1 in those cases. |
|
@apaszke I understand your point, but in this case the user would have to "manually" count the number of invalid probabilities in each line to later decide what is valid and what is not. In other words: there would be no direct way of knowing whether the second sample drawn for line 3 originally had probability 0 or not, i.e. whether it is valid or not. That being said, it is definitely doable, just not straightforward. :) |
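A minimal sketch of that manual bookkeeping, assuming a 2-D `probs` tensor (the values are illustrative):

```python
import torch

probs = torch.tensor([
    [0.2, 0.8, 0.0, 0.0],
    [0.1, 0.2, 0.3, 0.4],
    [0.0, 1.0, 0.0, 0.0],
])

# Number of categories with non-zero probability in each row; any sample
# drawn beyond this count cannot correspond to a valid category.
valid_per_row = (probs > 0).sum(dim=1)
print(valid_per_row)  # tensor([2, 4, 1])
```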
|
@MicaelCarvalho any update on this PR? Also a rebase is needed now. |
|
this PR isn't relevant anymore afaik. cc: @t-vi who knows this code the best at the moment. |
|
Sorry, I got caught up in some projects and did not follow up on this PR. It no longer applies to the current build of PyTorch. However, it's worth mentioning that the problem of returning random-like indices was not addressed. The code sketched after this comment exemplifies the behavior. With replacement=False, the function should not repeat positions, even if the remaining ones have a probability of 0. This is the same issue I tried to correct with my initial PR. In my opinion, the `multinomial` function should not behave like this. Am I wrong to understand it this way? |
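A call along these lines (the same one quoted in the fix summary below) exhibits the issue on builds from that time; later PyTorch versions raise an error instead:

```python
import torch

# Only index 1 has positive probability, yet three distinct samples are
# requested. Pre-fix builds padded the result with repeated indices
# (e.g. something like tensor([1, 0, 0])) instead of flagging the
# impossible request.
print(torch.multinomial(torch.FloatTensor([0, 1, 0, 0]), 3, replacement=False))
```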
|
I think the replacement=False bit hasn't changed much; we've worked to avoid prob 0 for replacement=True. I'll look into this next week. Of course, replacement=False makes limited sense when the number of samples is close to or larger than the number of classes - it's more like a "random, but probability-weighted" distribution. Returning -1 might be an option when considering multinomial by itself, but that might wreak havoc in further use (e.g. the index_* functions).
|
Summary:
Fixes #12260 #2896
```
torch.multinomial(torch.FloatTensor([0, 1, 0, 0]), 3, replacement=False)
```
The old behavior is that we return `0` after we run out of positive categories. Now we raise an error, based on the discussion in the issue thread.
- Add a testcase for the CPU & CUDA case; in the CUDA case `n_samples=1` is a simple special case, so we test against `n_sample=2` instead.

Pull Request resolved: #12490
Differential Revision: D10278794
Pulled By: ailzhang
fbshipit-source-id: d04de7a60f60d0c0d648b975db3f3961fcf42db1
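A minimal sketch of the post-fix behavior, assuming a PyTorch build that includes the change above (the exact error message is whatever the library raises, not reproduced here):

```python
import torch

# Requesting 3 samples without replacement from a distribution with only
# one positive category: after the fix this raises an error instead of
# returning padded or repeated indices.
try:
    torch.multinomial(torch.FloatTensor([0, 1, 0, 0]), 3, replacement=False)
except RuntimeError as err:
    print("multinomial rejected the request:", err)
```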
|
We've since fixed this (see the commit summary above). Please let me know if that doesn't fully address your points @MicaelCarvalho |
This PR probably needs to be discussed. I'm going to present the problem, what changes with the PR and the possible concerns with respect to these changes.
The problem
First known report on Slack's `beginner` channel on 27 Sept, by myself.
The function `multinomial` assumes non-negativity and a non-zero sum for the input; however, it does not handle elements with value zero, and returns random-like indices instead. Example (sketched below):
Basically, in the first example it runs out of 'positive' probabilities and starts outputting 0 (always), but in the second example it outputs 1 when it doesn't have any probabilities left. The biggest problem is that there is no reason for it to output 1, and it is not always 1 (sometimes 0, never 2).
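A hypothetical pair of calls consistent with the two examples described above (the tensors are illustrative; on post-fix PyTorch these calls raise an error instead):

```python
import torch

# First example: the only positive-probability category is at index 0, but
# three samples are requested without replacement. As reported, once the
# positive probabilities ran out the function kept returning 0.
print(torch.multinomial(torch.Tensor([0.7, 0.0, 0.0]), 3, replacement=False))

# Second example: the positive category is at index 1. As reported, once
# the probabilities ran out the function returned 1 (sometimes 0, never 2),
# even though there is no reason for it to pick those indices.
print(torch.multinomial(torch.Tensor([0.0, 0.7, 0.0]), 3, replacement=False))
```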
Taking a closer look at the code, I realized the function was designed to work with non-zero values, not merely a non-zero sum; it exhibits erratic behavior when there are zeros inside the matrix. This seems to be undesirable behavior, since the caller would have to manually check every probability, and it is not that uncommon to have a weight matrix with zeros inside that we still want to sample from.
What changed
Two constraints were removed from the `multinomial` function: the input no longer has to have a non-zero sum, and the number of samples no longer has to be smaller than the number of categories.
Values are sampled in the same way as before, except when there are no values left to be sampled. In the latter case, the sampled position is -1, and the examples given before now output -1 for every position that could not be drawn. The difference is that zero-sum probabilities are accepted, and the number of samples can be bigger than the number of categories; a sketch of the proposed behavior follows.
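A minimal sketch of the proposed behavior, assuming a build that includes this PR (the tensors and sampled indices are illustrative):

```python
import torch

probs = torch.Tensor([0.3, 0.0, 0.7])

# Four samples are requested without replacement, but only two categories
# have positive probability. Under the proposed change the extra positions
# are filled with -1 instead of arbitrary indices, e.g. something like
# tensor([2, 0, -1, -1]). Stock PyTorch does not behave this way.
samples = torch.multinomial(probs, 4, replacement=False)
print(samples)
```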
What did not change
Values are still supposed to be non-negative. Passing negative values for the probabilities will affect the behavior of the function.
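For example (the values are illustrative; builds from the time of this PR returned unspecified indices here, while newer PyTorch versions reject negative entries with an error):

```python
import torch

# A negative entry means the weights no longer form a valid distribution,
# so the sampled indices are not meaningful.
bad_probs = torch.Tensor([-1.0, 1.0, 1.0])
try:
    print(torch.multinomial(bad_probs, 2, replacement=False))
except RuntimeError as err:
    print(err)
```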
What should be debated about these changes
The behavior of the function changed: it used to output "random" (not really, but well...) indexes when the vector reached probability 0 because of some zero-valued item inside of it; now it outputs -1 for every sample not drawn. The previous behavior allowed anyone to use the output of the `multinomial` function to index a matrix and later erase the invalid elements -- the problem is that these invalid elements were not easily identifiable.
With the proposed changes, the user would have to replace the -1 indexes with a valid index before using the output of `multinomial` to index a vector/matrix. The benefit of this approach is that it is easy to identify any invalid elements (i.e. prob = 0) and remove them from the samples; a sketch of that workflow follows.
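A minimal sketch of that workflow under the proposed -1 convention (the `data` tensor and the sampled indices are illustrative):

```python
import torch

data = torch.tensor([10.0, 20.0, 30.0, 40.0])

# Suppose multinomial, with the proposed change, returned this: the last
# position could not be drawn, so it is flagged with -1.
samples = torch.tensor([2, 0, -1])

valid = samples != -1            # the -1 entries are easy to identify...
safe_idx = samples.clamp(min=0)  # ...but must be replaced before indexing
gathered = data[safe_idx]        # index as usual
print(gathered[valid])           # then drop the entries that came from -1
```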