Conversation

@MicaelCarvalho
Contributor

This PR probably needs to be discussed. I'll present the problem, what this PR changes, and possible concerns about those changes.


The problem

First known report in the beginner channel on Slack on Sep 27, by myself.

The multinomial function assumes a non-negative input with a non-zero sum; however, it does not handle zero-valued elements, and returns random-looking indices instead. Example:

import torch
weights = torch.Tensor([0, 10, 3, 0])
torch.multinomial(weights, 4, replacement=False)
# > 1, 2, 0, 0
weights = torch.Tensor([1, 10, 3, 0])
torch.multinomial(weights, 4, replacement=False)
# > 1, 0, 2, 1

Basically, in the first example it runs out of 'positive' probabilities and starts outputting 0 (always). In the second example it outputs 1 once no probabilities are left. The biggest problem is that there is no reason for it to output 1, and it is not always 1 (sometimes 0, never 2).

Taking a closer look at the code, I realized the function was designed to work with non-zero values, not just a non-zero sum: it behaves erratically when there are zeros inside the matrix. This seems undesirable, since the caller would have to manually check every probability, and it is not uncommon to have a weight matrix with zeros inside that we still want to sample from.
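For reference, the sampling semantics being argued for can be sketched in pure Python (a hypothetical helper mirroring weighted sampling without replacement, not the actual C implementation):

```python
import random

def multinomial_no_replacement(weights, n):
    """Draw up to n distinct indices, weighted by `weights`, never
    picking a zero-weight category (hypothetical reference sketch)."""
    weights = list(weights)
    out = []
    for _ in range(n):
        total = sum(weights)
        if total <= 0:   # no positive mass left; stop instead of
            break        # emitting arbitrary indices
        r = random.uniform(0, total)
        acc = 0.0
        for i, w in enumerate(weights):
            if w <= 0:
                continue  # zero-weight categories are never eligible
            acc += w
            if r <= acc:
                out.append(i)
                weights[i] = 0  # remove the category: no replacement
                break
    return out

# With weights [0, 10, 3, 0] only indices 1 and 2 can ever be drawn,
# so asking for 4 samples yields exactly [1, 2] in some order.
```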

What changed

Two constraints were removed from the multinomial function:

  1. The number of samples must be less than or equal to the number of categories
  2. The sum of the values must be positive

Values are sampled in the same way as before, except when no values are left to be sampled. In that case, the sampled position is -1, and the earlier examples now output the following:

import torch
weights = torch.Tensor([0, 10, 3, 0])
torch.multinomial(weights, 4, replacement=False)
# > 1, 2, -1, -1
weights = torch.Tensor([1, 10, 3, 0])
torch.multinomial(weights, 4, replacement=False)
# > 1, 2, 0, -1 # or "1, 0, 2, -1", etc. Still a weighted random on the probs

Additionally, zero-sum probability vectors are accepted, and the number of samples can be larger than the number of categories:

import torch
weights = torch.Tensor([0, 0, 0, 0])
torch.multinomial(weights, 4, replacement=False) # zero sum
# > -1, -1, -1, -1
weights = torch.Tensor([1, 10, 3, 0])
torch.multinomial(weights, 6, replacement=False) # 6 samples
# > 2, 1, 0, -1, -1, -1

What did not change

Values are still supposed to be non-negative. Passing negative values for the probabilities will affect the behavior of the function:

import torch
weights = torch.Tensor([0, 0, 6, -5])
torch.multinomial(weights, 4, replacement=False) # negative value inside
# > 2, -1, -1, -1 # positive value outweighs negative value, but things are weird in the probability vector inside the function
weights = torch.Tensor([1, 4, 1, -5])
torch.multinomial(weights, 4, replacement=False) # negative value inside
# > 0, -1, -1, -1 # positive value outweighs negative value and allows only 1 sample to be drawn; weird things in the prob vector as well
weights = torch.Tensor([0, 0, 1, -5])
torch.multinomial(weights, 4, replacement=False) # negative value inside
# > -1, -1, -1, -1 # Sum is negative, it will skip sampling

What should be debated about these changes

The behavior of the function changed: it used to output "random" (not really, but well...) indices when the vector reached probability 0 because of some zero-valued item inside it; now it outputs -1 for every sample not drawn. The previous behavior allowed anyone to use the output of the multinomial function to index a matrix and later erase invalid elements -- the problem is that these invalid elements were not easily identifiable.

With the proposed changes, the user would have to replace the -1 indices with a valid index before using the output of multinomial to index a vector/matrix. The benefit of this approach is that it would be easy to identify invalid elements (i.e. prob = 0) and remove them from the samples.
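Under the proposed -1 convention, filtering invalid draws before indexing is a one-liner; a minimal NumPy sketch (the `samples` array below is a hand-written stand-in for the patched multinomial's output, not an actual call):

```python
import numpy as np

# Hand-written stand-in for the patched multinomial's output:
# -1 marks "no sample could be drawn".
samples = np.array([1, 2, -1, -1])
data = np.array([10.0, 20.0, 30.0, 40.0])

valid = samples >= 0              # invalid draws are trivially identifiable
gathered = data[samples[valid]]   # index only with the valid positions
print(gathered)                   # -> [20. 30.]
```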

@killeent
Contributor

I'm surprised we allow negative values for the probability vector. This doesn't particularly make sense to me. @MicaelCarvalho is this something you want to be able to do?

@MicaelCarvalho
Contributor Author

@killeent no, I don't see any reason to allow negative values as probabilities. The examples I gave are just to show that this problem wasn't addressed, while in the current implementation it kinda works (because of line 294, I suppose).

There may be corner cases for negative values that the current implementation does not handle, and I can do further analysis if the problem is pertinent (IMHO it is not). A question I would raise: if negative probabilities were allowed (and I don't think they should be), isn't 0 supposed to have priority over them?

@killeent
Contributor

killeent commented Sep 29, 2017

@MicaelCarvalho I agree that that is something that we should not allow.

Note that you would also have to modify THC to make the same changes to that codebase. I haven't looked into it enough to know how difficult that would be.

As far as the changes themselves, I'm okay with replacing the undefined outputs with -1s for cases where we have zero-probability regions. This change wouldn't modify the correctness of existing code: if users indexed with the result, they would hit the same problems as before. I don't think this is a particularly breaking change.

I'm less sure about allowing the number of samples to be bigger than the number of categories when sampling without replacement. What does it mean to sample 6 values from a distribution of four probabilities, without replacement? This seems to me like it should still be an error.

@MicaelCarvalho
Contributor Author

@killeent I'm going to check what needs to be done in THC. I don't have experience with CUDA, but I should be able to learn it quickly.

As for the values, should we actually check for negatives or just assume the input is correct (as before)? The category count is an easy check; it's just a matter of putting back the code that did it.
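A sketch of the guard being discussed (hypothetical helper name, pure Python for clarity; the real check would live in TH/THC):

```python
def validate_multinomial_input(weights, n_samples, replacement):
    """Hypothetical input validation sketched from this discussion:
    reject negative weights and, when sampling without replacement,
    reject more samples than categories."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if not replacement and n_samples > len(weights):
        raise ValueError(
            "cannot draw more samples than categories without replacement")

# validate_multinomial_input([1, 10, 3, 0], 4, False)   # ok
# validate_multinomial_input([1, -2, 3], 2, False)      # raises ValueError
```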

@apaszke
Contributor

apaszke commented Sep 29, 2017

I haven't looked at the code yet, but returning -1 doesn't seem like a good idea. We should try to raise errors instead of silently returning outputs that are not what the user meant.

@MicaelCarvalho
Contributor Author

@apaszke unfortunately in this situation raising an exception is not possible. Suppose that for each sample in your batch you're trying to retrieve the closest matches, without always taking the closest one, instead using the similarities as probabilities of selection (i.e. a stochastic approach).

For this situation, you will have a probability matrix that might look like this:

2, 1, 0, 3
3, 5, 1, 2
6, 0, 0, 0
0, 0, 2, 3

(oversimplified example, of course)

In this case, we cannot raise an exception, because each line has a different number of "valid" probabilities. The second line can pick 4 elements, but the third can only pick one.
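The per-row limits can be made explicit with a quick check (NumPy sketch over the matrix above):

```python
import numpy as np

# The (oversimplified) probability matrix from the example above.
probs = np.array([
    [2, 1, 0, 3],
    [3, 5, 1, 2],
    [6, 0, 0, 0],
    [0, 0, 2, 3],
])

# Number of draws each row supports without replacement:
valid_per_row = (probs > 0).sum(axis=1)
print(valid_per_row)   # -> [3 4 1 2]
```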

Trimming the matrix before calling multinomial is not an option, because we would cut out 3 valid samples from line two that had potential to be selected. The only solution I see to allow this kind of approach is to have multinomial return something to indicate that no sample could be drawn.

Being able to pass a vector/matrix of probabilities with a 0 inside to multinomial seems like a valid assumption, but in the current implementation it does not work...

On a side note: I have not yet had the time to look into THC, I hope to be able to do it this week.

@apaszke
Contributor

apaszke commented Oct 2, 2017

Just because different distributions might have different numbers of "valid" classes (for some definition of valid) doesn't mean we can't raise errors. The only question is whether we can do this without sacrificing performance. If it's not possible, then I think I'd be OK with letting replacement=False return classes with 0 probability.

@MicaelCarvalho
Contributor Author

@apaszke I understand your point, but in that case the user would have to "manually" count the number of invalid probabilities in each row to later decide what is valid and what is not. In other words, there would be no direct way of knowing whether the second sample drawn for row 3 originally had probability 0, i.e. whether it is valid or not.

That being said, it is definitely doable, just not straightforward. :)

@weiyangfb
Contributor

@MicaelCarvalho any update on this PR? Also a rebase is needed now.

@weiyangfb added the "awaiting response (this tag is deprecated)" label on Aug 14, 2018
@soumith
Contributor

soumith commented Aug 15, 2018

this PR isn't relevant anymore afaik.

cc: @t-vi who knows this code the best at the moment.

@MicaelCarvalho
Contributor Author

Sorry, I got caught up in some projects and did not follow up on this PR. It no longer applies to the current build of PyTorch.

However, it's worth mentioning that the problem of returning random-like indices was not addressed. The following code exemplifies the behavior:

import torch
weights = torch.Tensor([1, 10, 3, 0])
torch.multinomial(weights, 4, replacement=False)

Outputs:

>>> torch.multinomial(weights, 4, replacement=False)
tensor([2, 1, 0, 1])
>>> torch.multinomial(weights, 4, replacement=False)
tensor([1, 2, 0, 1])
>>> torch.multinomial(weights, 4, replacement=False)
tensor([0, 1, 2, 0])
>>> torch.multinomial(weights, 4, replacement=False)
tensor([1, 2, 0, 1])
>>> torch.multinomial(weights, 4, replacement=False)
tensor([1, 0, 2, 2])

With replacement=False, the function should not repeat positions, even if the remaining ones have probability of 0. This is the same issue I tried to correct with my initial PR.

In my opinion, the torch.multinomial function should behave like numpy.random.choice with p=weights when replacement=False (but accepting probs of 0), and like numpy.random.multinomial when replacement=True.
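For comparison, the NumPy behavior referenced here (note that `numpy.random.choice` requires `p` to be normalized, and with `replace=False` it refuses to draw more samples than there are non-zero probabilities; exact message text may vary by NumPy version):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([1.0, 10.0, 3.0, 0.0])
p = weights / weights.sum()   # choice requires normalized probabilities

# Without replacement, indices never repeat and the zero-probability
# category (index 3) is never drawn:
draws = rng.choice(4, size=3, replace=False, p=p)
assert len(set(draws.tolist())) == 3 and 3 not in draws

# Asking for more samples than non-zero probabilities raises:
try:
    rng.choice(4, size=4, replace=False, p=p)
except ValueError:
    print("raises ValueError")
```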

Am I wrong to understand it this way?
Would love to see your opinions on this.

@t-vi
Collaborator

t-vi commented Aug 21, 2018 via email

facebook-github-bot pushed a commit that referenced this pull request Oct 11, 2018
Summary:
Fixes #12260 #2896

```
torch.multinomial(torch.FloatTensor([0, 1, 0, 0]), 3, replacement=False)
```
The old behavior is that we return `0` after we run out of positive categories. Now we raise an error, based on discussion in the issue thread.

- Add testcases for the cpu & cuda case; in the cuda case `n_samples=1` is a simple special case, so we test against `n_samples=2` instead.
Pull Request resolved: #12490

Differential Revision: D10278794

Pulled By: ailzhang

fbshipit-source-id: d04de7a60f60d0c0d648b975db3f3961fcf42db1
@zou3519
Contributor

zou3519 commented Dec 11, 2018

We've since fixed this:

In [2]: import torch
   ...: weights = torch.Tensor([1, 10, 3, 0])
   ...: torch.multinomial(weights, 4, replacement=False)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-31cc2507d325> in <module>
      1 import torch
      2 weights = torch.Tensor([1, 10, 3, 0])
----> 3 torch.multinomial(weights, 4, replacement=False)

RuntimeError: invalid argument 2: invalid multinomial distribution (with replacement=False, not enough non-negative category to sample) at ../aten/src/TH/generic/THTensorRandom.cpp:320

Please let me know if that doesn't fully address your points @MicaelCarvalho
