Showing posts with label estimation. Show all posts

Tuesday, January 24, 2012

Beating Condorcet (well, sort of)

This builds on, but also goes back over the ground of, my previous post.

I've been playing with voting methods, or as I might prefer to call them "utility estimate aggregation methods." My basic model is that there are n options (say, candidates) to choose between and m evaluators ("voters"). The evaluators would like to choose the option that has the highest utility. Unfortunately, the actual utilities of the options are not known, and all we have are estimates of the utilities by all the evaluators.

A standard method for this is the Condorcet method. An option is a Condorcet winner provided that it "beats" every other option, where an option x "beats" an option y provided that a majority of the evaluators estimate x more highly than y. If there is no Condorcet winner, there are further resolution methods, but I will only be looking at cases where there is a Condorcet winner.
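The definition above can be sketched in a few lines of Python. This is only an illustration, not the code used for the experiments in this post; the function name `condorcet_winner` and the layout of `estimates` (one row per evaluator, one column per option) are assumptions made for the example.

```python
def condorcet_winner(estimates):
    """Return the index of the Condorcet winner, or None if there is none.

    estimates[i][j] is evaluator i's utility estimate for option j
    (a hypothetical data layout chosen for this sketch).
    """
    m = len(estimates)
    n = len(estimates[0])
    for x in range(n):
        # x "beats" y when a strict majority of evaluators rate x above y.
        if all(
            2 * sum(row[x] > row[y] for row in estimates) > m
            for y in range(n)
            if y != x
        ):
            return x
    return None


# Three evaluators, three options: a majority prefers option 1 to each rival.
estimates = [
    [0.2, 0.9, 0.1],
    [0.5, 0.4, 0.3],
    [0.1, 0.8, 0.6],
]
winner = condorcet_winner(estimates)

# A preference cycle (0 > 1 > 2, 1 > 2 > 0, 2 > 0 > 1) has no Condorcet winner.
cycle = [
    [3, 2, 1],
    [1, 3, 2],
    [2, 1, 3],
]
no_winner = condorcet_winner(cycle)
```

Only the ordering of each evaluator's estimates matters here, which is why Condorcet needs nothing more than rank orders.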

My first method is

  • Method A: Estimate each option's utility with the arithmetical average of the reported utilities assigned to it by all the evaluators, and choose the option with the highest utility.
(I will be ignoring tie-resolution in this post, because all the utilities I will work with are real-numbered, and the probability of a tie will be zero.) This method can be proved to maximize epistemically expected utility under the
  • Basic Setup: Each evaluator's reported estimate of each option's utility is equal to the actual utility plus an error term. The error terms are (a) independent of the actual utilities and (b) normally distributed with mean zero. Moreover, (c) our information as to the variances of the error terms is symmetric between the evaluators, but need not be symmetric between the options (thus, we may know that option 3 has a higher variance in its error terms than option 7; we may also know that some evaluators have a greater variance in their error terms; but we do not know which evaluators have a greater variance than which).
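A minimal sketch of Method A under a simplified version of the Basic Setup, assuming every error term has the same standard deviation `sigma` (the Basic Setup itself allows the variances to differ). The helper names `simulate_reports` and `method_a` are invented for this illustration.

```python
import random


def simulate_reports(utilities, m, sigma, rng):
    """Each of m evaluators reports actual utility plus N(0, sigma^2) noise."""
    return [[u + rng.gauss(0.0, sigma) for u in utilities] for _ in range(m)]


def method_a(reports):
    """Return the index of the option with the highest mean reported utility."""
    m = len(reports)
    means = [sum(column) / m for column in zip(*reports)]
    return max(range(len(means)), key=means.__getitem__)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
utilities = [rng.gauss(0.0, 1.0) for _ in range(10)]  # hypothetical option pool
reports = simulate_reports(utilities, m=3, sigma=0.5, rng=rng)
choice = method_a(reports)

# How far the chosen option's actual utility falls short of the best one.
shortfall = max(utilities) - utilities[choice]
```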

Unfortunately, it is really hard to estimate absolute utility numbers. It is a lot easier to rank order utilities. And that's all Condorcet needs. So in that way at least, Condorcet is superior to Method A. To fix this, modify the Basic Setup to:

  • Modified Setup: Just like the Basic Setup, except that what is reported by each evaluator is not the actual utility plus error term, but the rank order of the actual utility plus error term.
In particular, we still assume that beneath the surface—perhaps implicitly—there is a utility estimate subject to the same conditions. Our method now is
  • Method B: Replace each evaluator's rank ordering with roughly estimated Z-scores by using the following algorithm: a rank of k (between 1 and n) is transformed to f((n+1/2−k)/n), where f is the inverse of the cumulative normal distribution function. Each option's utility is then estimated as the arithmetical average of the roughly estimated Z-scores across the evaluators, and the option with the highest estimated utility is chosen.
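The rank-to-Z-score transformation in Method B can be sketched as follows, using `statistics.NormalDist().inv_cdf` from the Python standard library as the inverse cumulative normal f. The function names are illustrative assumptions, and rank 1 is taken to mean the evaluator's top choice, which is what makes f((n+1/2−k)/n) largest for k = 1.

```python
from statistics import NormalDist


def rank_to_z(k, n):
    """Rough Z-score for rank k out of n (rank 1 = best): f((n + 1/2 - k) / n)."""
    return NormalDist().inv_cdf((n + 0.5 - k) / n)


def ranks_from_estimates(row):
    """Convert one evaluator's utility estimates to ranks (1 = highest)."""
    order = sorted(range(len(row)), key=row.__getitem__, reverse=True)
    ranks = [0] * len(row)
    for position, option in enumerate(order, start=1):
        ranks[option] = position
    return ranks


def method_b(rankings):
    """Return the option with the highest average rough Z-score.

    rankings[i][j] is evaluator i's rank for option j.
    """
    m = len(rankings)
    n = len(rankings[0])
    scores = [sum(rank_to_z(r[j], n) for r in rankings) / m for j in range(n)]
    return max(range(n), key=scores.__getitem__)


rankings = [
    [1, 2, 3],
    [2, 1, 3],
    [1, 3, 2],
]
choice = method_b(rankings)
```

The argument of f always lies strictly between 0 and 1 (from 1/(2n) to 1 − 1/(2n)), so the inverse CDF is always defined.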

Now time for some experiments. Add to the Basic Setup the assumptions that (d) the actual utilities in the option pool are normally distributed with mean zero and variance one, and (e) the variances of all the evaluators' error terms are equal to 1/4 (i.e., standard deviation 1/2). All the experiments use 2000 runs. Because I developed this when thinking about grad admissions, the cases that interest me most are ones with a small number of evaluators and a large number of options, which is the opposite of how political cases work (though unlike in admissions, I am simplifying by looking for just the best option). Moreover, it doesn't really matter whether we choose the optimal option. What matters is how close the actual utility of the chosen option is to the actual utility of the optimal option. The difference in these utilities will be called the "error". If the error is small enough, there is no practically significant difference. Given the normal distribution of option utilities, about 95% of actual utilities are between -2 and 2, so if we have about 20 options, we can expect the best option to have a utility of the order of magnitude of 2. Choosing at random would then give us an average error of the order of magnitude of 2. The tables below give the average errors for the 2000 runs of the experiments. Moreover, so as to avoid having to choose between different resolution methods, I am discarding data from runs in which there was no Condorcet winner, and hence comparing Method A and Method B to Condorcet at its best (interestingly, Method A and Method B also work less well when there is no Condorcet winner). Discarded runs were approximately 2% of runs. Source code is available on request.
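For readers who want to try something similar, here is a rough sketch of such an experiment in standard-library Python. It is not the original source code: the function names, the seed, and the smaller run count and option count in the example call are assumptions chosen to keep it quick, so the numbers it produces should not be expected to match the tables.

```python
import random
from statistics import NormalDist


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def condorcet_winner(reports):
    """Index of the option that a strict majority rates above every rival, else None."""
    m, n = len(reports), len(reports[0])
    for x in range(n):
        if all(2 * sum(r[x] > r[y] for r in reports) > m
               for y in range(n) if y != x):
            return x
    return None


def method_a(reports):
    """Highest mean reported utility."""
    m = len(reports)
    return argmax([sum(column) / m for column in zip(*reports)])


def method_b(reports):
    """Highest total (equivalently, average) rough Z-score from rank orders."""
    n = len(reports[0])
    inv_cdf = NormalDist().inv_cdf
    totals = [0.0] * n
    for r in reports:
        order = sorted(range(n), key=r.__getitem__, reverse=True)
        for k, option in enumerate(order, start=1):
            totals[option] += inv_cdf((n + 0.5 - k) / n)
    return argmax(totals)


def average_errors(m, n, runs, sigma=0.5, seed=0):
    """Average error per method over the runs that have a Condorcet winner."""
    rng = random.Random(seed)
    sums = {"Condorcet": 0.0, "Method A": 0.0, "Method B": 0.0}
    kept = 0
    for _ in range(runs):
        utilities = [rng.gauss(0.0, 1.0) for _ in range(n)]
        reports = [[u + rng.gauss(0.0, sigma) for u in utilities]
                   for _ in range(m)]
        c = condorcet_winner(reports)
        if c is None:
            continue  # discard runs without a Condorcet winner
        kept += 1
        best = max(utilities)
        picks = {"Condorcet": c,
                 "Method A": method_a(reports),
                 "Method B": method_b(reports)}
        for name, choice in picks.items():
            sums[name] += best - utilities[choice]
    if kept == 0:
        return {}, 0
    return {name: total / kept for name, total in sums.items()}, kept


errors, kept = average_errors(m=3, n=20, runs=300)
```

Calling `average_errors(m=3, n=50, runs=2000)` would correspond to the shape of Experiment 1 below, at the cost of a longer running time in pure Python.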

Experiment 1: 3 evaluators, 50 options.

Method       Average error
Condorcet    0.030
Method A     0.023
Method B     0.029
So, with a small number of evaluators and a large number of options, Method A significantly beats Condorcet. Method B slightly beats Condorcet.

Experiment 2: 50 evaluators, 50 options.

Method       Average error
Condorcet    0.0017
Method A     0.0011
Method B     0.0015
So we have a similar distribution of values, but of course with a larger number of evaluators, the error is smaller. It is interesting, however, that even with only three evaluators, the error was already pretty small, about 0.03 sigma for all the methods.

Experiment 3: 3 evaluators, 3 options.

Method       Average error
Condorcet    0.010
Method A     0.007
Method B     0.029
Method B is much worse than Condorcet and Method A in this case. That's because with three options, the naive Z-score estimation method in Method B fails miserably. With 3 options Method B is equivalent to a very simple method we might call Method C where we simply average the rank order numbers of the options across the evaluators. At least with 3 options, that is a bad way to go. Condorcet is much better, and Method A is even better if it is workable.
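The equivalence with rank averaging for three options can be checked numerically. In this short sketch (variable names are illustrative), the three rough Z-scores come out symmetric about zero with the middle one equal to zero, so each Z-score is just a positive constant times (2 − rank), and averaging Z-scores orders the options exactly as averaging ranks does.

```python
from statistics import NormalDist

n = 3
# Rough Z-scores for ranks 1, 2, 3 under Method B's formula f((n + 1/2 - k) / n).
z = [NormalDist().inv_cdf((n + 0.5 - k) / n) for k in (1, 2, 3)]

# The scores are approximately (+0.97, 0, -0.97): evenly spaced and linear in the rank.
spacing_top = z[0] - z[1]
spacing_bottom = z[1] - z[2]
```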

Experiment 4: 50 evaluators, 3 options.

Method       Average error
Condorcet    0.0003
Method A     0.0002
Method B     0.0159
The badness of Method B for a small number of options really comes across here. Condorcet and Method A really benefit from boosting the number of evaluators, but with only 3 options, Method B works miserably.

So, one of the interesting consequences is that Method B is strongly outperformed by Condorcet when the number of options is small. How small? A bunch of experiments suggests that it's kind of complicated. For three evaluators, Method B catches up with Condorcet at around 12 options. Somewhat surprisingly, for a greater number of evaluators, it needs more options for Method B to catch up with Condorcet. I conjecture that Method B works better than Condorcet when the number of options is significantly greater than the number of evaluators. In particular, in political cases where the opposite inequality holds, Condorcet far outperforms Method B.

One could improve on Method B, whose Achilles heel is the Z-score estimation, by having the evaluators include in their rankings options that are not presently available. One way to do that would be to increase the size of the option pool by including fake options. (In the case of graduate admissions, one could include a body of fake applications generated by a service.) Another way would be by including options from past evaluations (e.g., applicants from previous years). Then these would enter into the Z-score estimation, thereby improving Method B significantly. Of course, the downside is that it would be a lot more work for the evaluators, which may make this unworkable.

Method A is subject to extreme evaluator manipulation, i.e., "strategic voting". Any evaluator can produce any result she desires by just reporting her utilities to swamp the utilities set by others. (The Basic Setup's description of the errors rules this out.) Method B is subject to more moderate evaluator manipulation. Condorcet, I am told, does fairly well. If anything like Method A is used, what is absolutely required is a community of justified mutual trust and reasonableness. Such mutual trust does, however, make possible noticeably better joint choices, which is an interesting result of the above.
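A toy illustration of the swamping problem, with made-up numbers: two honest evaluators and a third who wants option 2 to win and therefore reports an enormous utility for it. The helper names are assumptions for this sketch. Averaging raw reports (Method A) hands her the result, while the rank-based Method B caps how much a single ballot can move the outcome.

```python
from statistics import NormalDist


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def method_a(reports):
    """Highest mean reported utility."""
    m = len(reports)
    return argmax([sum(column) / m for column in zip(*reports)])


def method_b(reports):
    """Highest total rough Z-score, using only each evaluator's rank order."""
    n = len(reports[0])
    totals = [0.0] * n
    for row in reports:
        order = sorted(range(n), key=row.__getitem__, reverse=True)
        for k, option in enumerate(order, start=1):
            totals[option] += NormalDist().inv_cdf((n + 0.5 - k) / n)
    return argmax(totals)


honest = [
    [1.0, 0.2, -0.5],
    [0.8, 0.1, -0.3],
    [0.9, 0.3, -0.4],
]
# Hypothetical manipulation: the third evaluator inflates option 2.
manipulated = honest[:2] + [[0.9, 0.3, 100.0]]

a_honest = method_a(honest)
a_manipulated = method_a(manipulated)
b_manipulated = method_b(manipulated)
```

In this example Method A switches from option 0 to option 2 once the inflated report is added, while Method B still picks option 0. A rank-based method can of course still be manipulated by misreporting the order, just less dramatically.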

So, yes, in situations of great trust where all evaluators can accurately report their utility estimates, we can beat Condorcet by adopting Method A. But that's a rare circumstance. In situations of moderate trust and where the number of candidates exceeds the number of evaluators, Method B might be satisfactory, but its benefits over Condorcet are small.

One interesting method that I haven't explored numerically would be this:

  • Method D: Have each evaluator assign numerical evaluations to the options on a fixed scale (say, integers from 1 to 50). Adjust the numerical evaluations to Z-scores, using data from the evaluator's present and past evaluations using some good statistical method. Average these estimated Z-scores across evaluators and choose the option with the highest average.
Under appropriate conditions, this method should converge to Method A over time in the Modified Setup. There would be possibilities for manipulation, but they would require planning ahead, beyond the particular evaluation (e.g., one could keep all one's evaluations in a small subset of the scale, and then when one really wants to make a difference, one jumps outside of that small subset).
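One very simple way Method D might be sketched is below. The "good statistical method" is left open above; here it is replaced, purely as an assumption, by standardizing each evaluator's scores with the sample mean and standard deviation of that evaluator's pooled past and present scores. All names and numbers are hypothetical.

```python
from statistics import mean, stdev


def to_z_scores(current, history):
    """Standardize an evaluator's current scores against all of their scores.

    Assumes the pooled scores are not all identical (stdev must be nonzero).
    """
    pool = list(history) + list(current)
    mu, sd = mean(pool), stdev(pool)
    return [(score - mu) / sd for score in current]


def method_d(current_by_evaluator, history_by_evaluator):
    """Return the option with the highest average per-evaluator Z-score."""
    z_rows = [to_z_scores(current, history)
              for current, history in zip(current_by_evaluator,
                                          history_by_evaluator)]
    m = len(z_rows)
    averages = [sum(column) / m for column in zip(*z_rows)]
    return max(range(len(averages)), key=averages.__getitem__)


# Evaluator 0 spreads scores widely; evaluator 1 habitually uses a narrow band.
history = [
    [5, 20, 35, 45, 10],
    [24, 25, 26, 27, 25],
]
current = [
    [40, 25, 30],
    [25, 27, 26],
]
choice = method_d(current, history)

# For comparison: the option with the highest raw average score.
raw_means = [sum(column) / len(current) for column in zip(*current)]
raw_choice = max(range(len(raw_means)), key=raw_means.__getitem__)
```

With these numbers the raw averages favor option 0, but after standardization option 1 wins, because a 27 from the narrow-band evaluator is a strong endorsement relative to her usual scores. The same mechanism is what makes the planned-ahead manipulation described above possible.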

Monday, January 23, 2012

An optimal voting method (under some generally implausible assumptions)

Let me qualify what I'm going to say by saying that I know next to nothing about the voting literature.

It's time for admissions committees to deliberate. But Arrow's Theorem says that there is no really good voting method with more than two options.

In some cases, however, there is a simple voting method that, with appropriate assumptions, is provably optimal. The method is simply to have each voter estimate a voter-independent utility of every option, and then to average these estimates, and choose the option with the highest average. By a "voter-independent utility", I mean a utility that does not vary from voter to voter. This could be a global utility of the option or it could be a utility-for-the-community of the voter or even a degree to which a certain set of shared goals are furthered. In other words, it doesn't have to be a full agent-neutral utility, but it needs to be the case that the voters are all estimating the same value—so it can depend on the group of voters as a whole.

Now if we are instead to choose n non-interacting options (i.e., the utilities of the options are additive), then we just choose the n with the highest averages. Under some assumptions, these simple methods are optimal. The assumptions are onerous, however.

Voting theory, as far as I can tell, is usually conducted in terms of preferences between options. In political elections, many people's preferences are probably agent-centered: people are apt to vote for candidates they think will do more for them and for those they take to be close to them. In situations like that, the simple method won't work, because people aren't estimating voter-independent utilities but agent-centered utilities.

But there are cases where people really are doing something more like estimating voter-independent utilities. For instance, take graduate admissions or hiring. The voters there really are trying to optimize something like "the objective value of choosing this candidate or these candidates", though of course their deliberations suffer from all sorts of errors.

In such cases, instead of thinking of the problem as a preference reconciliation problem, we can think of it as an estimation problem. We have a set of unknown quantities, the values of the options. If we knew what these quantities are, we'd know what decision to take: we'd go for the option(s) with the highest values. Instead, we have a number of evaluators who are each trying to estimate this unknown. Assume that each evaluator's estimate of the unknown quantity simply adds an independent random error to the quantity, and that the error is normally distributed with mean zero. Assume, further, that either the variances of the normal errors are the same between evaluators or that our information about these variances is symmetric between the evaluators (thus, we may know that evaluators are not equally accurate, but we don't know which ones are the ones who are more accurate). Suppose that I have no further relevant information about the differences in the values of the options besides the evaluators' estimates, and so I have the same prior probability distribution for the value of each option (maybe it's a pessimistic one that says that the option is probably bad).

Given all of the above information, I now want to choose the option that maximizes, with respect to my epistemic probabilities, the expected value of the option. It turns out by Bayes' Theorem together with some properties of normal random variables that the expected value of an option o, given the above information, can be written A·a0 + B·a(o), where a0 is the mean value of my baseline estimate for all the options and a(o) is the average of the evaluators' evaluations of o, and where both A and B are positive. It follows that under the above assumptions, if I am trying to maximize expected value, choosing the option(s) with the highest value of a(o) is provably optimal.
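For the special case where all m evaluators have the same known error variance, the coefficients can be written out explicitly using the standard normal-normal conjugate update. The symbols σ² (common error variance) and τ² (variance of the prior on each option's value, centered at a0) are notation introduced here for illustration; the more general case with unknown but symmetric variances is not derived in this sketch.

```latex
\[
\mathbb{E}\bigl[v(o) \mid \text{evaluations}\bigr]
  = \underbrace{\frac{\sigma^2}{\sigma^2 + m\tau^2}}_{A}\, a_0
  + \underbrace{\frac{m\tau^2}{\sigma^2 + m\tau^2}}_{B}\, a(o),
\qquad A > 0,\quad B > 0,\quad A + B = 1 .
\]
```

Since A, B and a0 are the same for every option, ranking the options by expected value is the same as ranking them by a(o).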

Now there are some serious problems here, besides the looming problem that the whole business of numerical utilities may be bankrupt (which I think in some cases isn't so big an issue, because numerical utilities can be a useful approximation in some cases). One of them is that one evaluator can skew the evaluations by assigning such enormous utilities to the candidates that her evaluations swamp everyone else's data. The possibility of such an evaluator violates my assumption that each person's evaluation is equal to the unknown plus an error term centered on zero. Such an evaluator is either really stupid, or dishonest (i.e., not reporting her actual estimates of utilities). This problem by itself is enough to ensure that the method can't be used except in a community of justified mutual trust.

A second serious problem is that we're not very good at making absolute utility judgments, and are probably better at rank ordering. The optimality condition requires that we work with utilities rather than rank orderings. But in a case where the number of options is largish—admissions and hiring cases are like that—if we assume that value is normally distributed in the option pool, we can get an approximation to the utilities from an evaluator's rank ordering of the n options. One way to do this is to use the rank ordering to assign estimated percentile ranks to each option, and then convert them to one's best estimate of the normally distributed value (maybe this can just be done by applying the inverse normal cumulative distribution function—I am not a statistician). Then average these between evaluators. Doing this also compensates for any affine shift, such as that due to the exaggerating evaluator in the preceding paragraph. I can't prove the optimality of this method, and it is still subject to manipulation by a dishonest evaluator (say, one who engages in strategic voting rather than reporting her real views).

I think the above can also work under some restrictive assumptions even if the evaluators are evaluating value-for-them rather than voter-independent value.

The basic thought in the above is that in some cases instead of approaching a voting situation as a preference situation, we approach it as a scientific estimation situation.