
Monday, September 20, 2021

A defense of probabilistic inconsistency

Evidence E is misleading with regard to a hypothesis H provided that Bayesian update on E changes one’s credence in H in the direction opposed to the truth. It is known that pretty much any evidence is misleading with regard to some hypothesis or other. That’s no tragedy. But sometimes evidence is misleading with regard to an important hypothesis. That’s still no tragedy if the shift in the credence of that important hypothesis is small. But it could be tragic if the shift is significant—think of a quack cure for cancer beating out the best medication in a study due to experimental error or simply chance.

In other words, misleadingness by itself is not a big deal. But significant misleadingness with respect to an important hypothesis can be tragic.

Suppose I am lucky enough to start with consistent credences on a limited algebra F of propositions, and suppose I have a low credence in a consistent proposition q in F. Now two friends, whom I know for sure to speak only the truth, speak to me:

  • Alice: “Proposition q is actually true.”

  • Bob: “She’s right, as always, but the fact that q is true is significantly misleading with respect to a number of quite important hypotheses in F.”

What should I do? If I were a perfect Bayesian agent, my likelihoods would be sufficiently well defined that I would just update on Alice saying her piece and Bob saying his piece, and be done with it. My likelihoods would embody prior probability assignments to hypotheses about the kinds of reasons that Alice and Bob could have for giving me their information, the kinds of important hypotheses in F that q could be misleading about, etc.

But this is too complicated for a more ordinary Bayesian agent like me. Suppose I could, just barely, do a Bayesian update on q, and gain a new consistent credence assignment on F. Even if Bob were not to have said anything, updating on q would not be ideal, because the ideal agent would update not just on q, but on the facts that Alice chose to inform me of q at that very moment, in those very words, in that very tone of voice, etc. But that’s too complicated for me. For one, I don’t have enough clear credences in hypotheses about different informational choices Alice could have made. So if all I heard was Alice’s announcement, updating on q would be a reasonable choice given my limitations.

But with Bob speaking, the consequences of my simply updating on q could be tragic, because Bob has told me that q is significantly misleading in regard to important stuff. What should I do? One possibility is to ignore both statements and leave my credences unchanged, pretending I didn’t hear Alice. But that’s silly: I did hear her.

But if I accept q on the basis of Alice’s statement (and Bob’s confirmation), what should I do about Bob’s warning? Here is one option: I could raise my credence in q to 1, but leave everything else unchanged. This is a better move than just ignoring what I heard. For it gets me closer to the truth with regard to q (remember that Alice only says the truth), and I don’t get any further from the truth regarding anything else. The result will be an inconsistent probability assignment. But I can actually do a little better. Assuming q is true, it cannot be misleading about propositions entailed by q. For if q is true, then all propositions entailed by q are true, and raising my credences in them to 1 only improves my credences. Thus, I can safely raise my credence in everything entailed by q to 1. Similarly, I can safely lower my credence in anything that entails ∼q to 0.

Here, then, is a compromise: I set my credence in everything in F entailed by q to 1, and in everything in F that entails ∼q to 0, and leave all other credences for things in F unchanged. This has gotten me closer to the truth by any reasonable measure. Moreover, the resulting credences for F satisfy the Zero, Non-negativity, Normalization, Monotonicity, and Binary Non-Disappearance axioms, and as a result I can use a Level-Set Integral prevision to avoid various Dutch Book and domination problems. [Let’s check Monotonicity. Suppose r entails s. We need to show that C(r)≤C(s). Given that my original credences were consistent and hence had Monotonicity, the only way I could lack Monotonicity now would be if q entailed r and s entailed ∼q. Since r entails s, this would mean that q would entail ∼q, which would imply that q is not consistent. But I assumed it was consistent.]
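To make this concrete, here is a minimal sketch in Python of the compromise update on a toy possible-worlds model; the worlds, the prior weights, and the choice of q are made-up illustrative assumptions, not anything from the argument above.

```python
from itertools import chain, combinations

# Toy possible-worlds model: propositions are sets of worlds, entailment is
# the subset relation, and ~q is the set-complement of q.

worlds = {"w1", "w2", "w3"}

def powerset(s):
    """All subsets of s as frozensets: the full algebra F on the worlds."""
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

F = powerset(worlds)

# A consistent starting credence generated by weights on the worlds.
weights = {"w1": 0.2, "w2": 0.3, "w3": 0.5}
credence = {A: sum(weights[w] for w in A) for A in F}

def compromise_update(credence, q):
    """Raise to 1 everything entailed by q, lower to 0 everything that
    entails ~q, and leave all other credences unchanged."""
    updated = {}
    for A, c in credence.items():
        if q <= A:                    # q entails A
            updated[A] = 1.0
        elif not (A & q):             # A entails ~q
            updated[A] = 0.0
        else:
            updated[A] = c
    return updated

q = frozenset({"w1", "w2"})           # the proposition Alice vouches for
new_credence = compromise_update(credence, q)

# The Monotonicity check from the bracketed remark: if A entails B, then the
# new credence of A never exceeds that of B.
assert all(new_credence[A] <= new_credence[B] for A in F for B in F if A <= B)
print(new_credence[q], new_credence[frozenset({"w3"})])   # 1.0 0.0
```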

I think this line of reasoning shows that there are indeed times when it can be reasonable to have an inconsistent credence assignment.

By the way, if I continue to trust the propositions I had previously assigned extreme credences to despite Bob’s ominous words, an even better update strategy would be to set my credence to 1 for everything entailed by q conjoined with something that already had credence 1, and to 0 for everything that when conjoined with something that had credence 1 entails ∼q.

Wednesday, January 27, 2021

Nonadditive strictly proper scoring rules and arguments for probabilism

[This post uses the wrong concept of a strictly proper score. See the comments.]

A scoring rule for a credence assignment is a measure of the inaccuracy of the credences: the lower the value, the better.

A proper scoring rule is a scoring rule with the property that for each probabilistically consistent credence assignment P, the expected value according to P of the score of a credence assignment Q, considered as a function of Q, is minimized at Q = P. If it is minimized uniquely at Q = P, the scoring rule is said to be strictly proper.

A scoring rule is additive provided that it is the sum of scoring rules each of which depends only on the credence assigned to a single proposition and the truth value of that proposition.

The formal epistemology literature contains a lot of discussion of a strict domination theorem: given an additive strictly proper scoring rule, you do better to have a probabilistically consistent credence assignment; indeed, if your credences are inconsistent, another credence assignment will give a better score in every possible world.

The assumption of strict propriety gets a fair amount of discussion. Not so the assumption of additivity.

It turns out that if you drop additivity, the theorem fails. Indeed, this is trivial. Consider any strictly proper scoring rule s, and modify it to a rule s* that assigns the score −∞ to any inconsistent credence. Then any inconsistent credence receives the best possible score in every possible world. Moreover, since the definition of strict propriety only involves the behavior of the scoring rule as applied to consistent credences, s* is strictly proper if and only if s is. And, of course, s* is not additive.

But of course my rule s* is very much ad hoc: it is gerrymandered to reward inconsistency. Can we make a somewhat natural, non-gerrymandered, non-additive scoring rule for which the domination theorem fails?

I think so. Consider a finite probability space Ω, with n points ω1, ..., ωn in it. Now, consider a scoring rule generated as follows.

Say that a simple gamble g on Ω is an assignment of values to the n points. Let G be a set of simple gambles. Imagine an agent who decides which simple gamble g in G to take by the following natural method: she calculates ∑iP({ωi})g(ωi), where P is her credence assignment, and chooses the gamble g that maximizes this sum. If there is a tie, she has some tie-resolution mechanism. Then, we can say that the G-score of her credences is the negative of the utility gained from the gamble she chose. In other words, her G-score at location ωi is −g(ωi) where g is a maximally auspicious gamble according to her credences.
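Here is a minimal sketch in Python of the G-score just described; the three-point outcome space, the singleton credences, and the particular finite set G of simple gambles are all made up for illustration.

```python
# Compute the G-score of a credence assignment: the agent picks the gamble in G
# that maximizes sum_i P({w_i}) * g(w_i), and her score at the actual world is
# minus that gamble's payoff there.

def choose_gamble(singleton_credence, gambles):
    """Return a gamble maximizing the credence-weighted sum of payoffs
    (ties are broken by whichever maximizer is encountered first)."""
    return max(gambles,
               key=lambda g: sum(singleton_credence[w] * g[w] for w in singleton_credence))

def g_score(singleton_credence, gambles, actual_world):
    """G-score at the actual world: minus the chosen gamble's payoff there."""
    return -choose_gamble(singleton_credence, gambles)[actual_world]

G = [
    {"w1": 1.0, "w2": 0.0, "w3": 0.0},
    {"w1": 0.0, "w2": 1.0, "w3": 0.0},
    {"w1": 0.4, "w2": 0.4, "w3": 0.4},
]
P = {"w1": 0.2, "w2": 0.5, "w3": 0.3}     # credences on the singletons

print(g_score(P, G, actual_world="w2"))   # -1.0: the second gamble gets chosen
```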

It is easy to see that G-score is a proper score. Moreover, if there are never any ties in choosing the maximally auspicious gamble, the score is strictly proper.

This is a very natural way to generate a score: we score credences by looking at how well you would do when acting on them in the face of a practical decision. But the domination theorem will fail for any score generated in this way. Here’s why: the scoring rule scores any inconsistent non-negative credence P that is non-zero on some singleton the same way as it scores the consistent credence P* defined by P*(A) = ∑ω ∈ A P({ω}) / ∑ω ∈ Ω P({ω}), since rescaling the singleton credences by a positive constant does not change which gamble maximizes ∑iP({ωi})g(ωi). Thus, the domination theorem will fail to apply to any scoring rule generated in the above way: a credence assignment that dominated P would also dominate the consistent credence P*, and such domination does not happen for consistent credences.

The only thing that remains is to check that there is some natural strictly proper rule that can be generated using the above method. Here’s one. Let Gn be the set of simple gambles that assign to the n points of Ω values that lie in the n-dimensional unit ball. In other words, each simple gamble g ∈ Gn is such that ∑i g(ωi)^2 ≤ 1.

A bit of easy constrained maximization using Lagrange multipliers shows that if P is a credence assignment on Ω such that P({ωi}) ≠ 0 for at least one point ωi ∈ Ω, then there is a unique maximally auspicious gamble g and it is given by g(ωj) = P({ωj}) / √(∑i P({ωi})^2). Because of the uniqueness, we have a strictly proper scoring rule.

The Gn-score of a credence assignment P is then s(P, ωj) = −P({ωj}) / √(∑i P({ωi})^2).
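Here is a minimal numerical check of this closed form with a made-up inconsistent credence: P and its consistent normalization P* get exactly the same Gn-score in every world, which is why no credence assignment can dominate P without also dominating the consistent P*.

```python
from math import isclose, sqrt

def gn_score(singleton_credence, actual_world):
    """s(P, w_j) = -P({w_j}) / sqrt(sum_i P({w_i})^2)."""
    norm = sqrt(sum(p * p for p in singleton_credence.values()))
    return -singleton_credence[actual_world] / norm

P = {"w1": 0.5, "w2": 0.5, "w3": 0.3}          # made-up inconsistent credences (sum 1.3)
total = sum(P.values())
P_star = {w: p / total for w, p in P.items()}  # the consistent normalization P*

for w in P:                                    # identical scores in every world
    assert isclose(gn_score(P, w), gn_score(P_star, w))
print(gn_score(P, "w1"), gn_score(P_star, "w1"))
```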

This looks fairly natural. The choice of Gn seems fairly natural as well. There is no gerrymandering going on. And yet the domination theorem fails for the Gn-score. (I think any strictly convex set of simple gambles works for Gn, actually.)

Thus, absent some good argument for why Gn-score is a bad way to score credences, it seems that the scoring rule domination argument isn’t persuasive.

More generally, consider any credence-based procedure for deciding between finite sets of gambles that has the following two properties:

  1. The procedure yields a gamble that maximizes expected utility in the case of consistent credences, and

  2. The procedure never recommends a gamble that is dominated by another gamble.

There are such procedures that apply to interesting classes of inconsistent credences and that are nonetheless pretty natural. Given any such procedure, we can extend it arbitrarily to apply to all inconsistent credences, assign a credence assignment a score equal to the negative of the value of the selected gamble, and thereby obtain a proper score to which the domination theorem doesn’t apply. And if we make our set of gambles be the n-ball Gn, then the score is strictly proper.

Thursday, February 13, 2020

Domination and probabilistic consistency

Suppose that ≼ is a total preorder on simple utility functions on some space Ω with an algebra of subsets. Define f ≺ g iff f ≼ g but not g ≼ f. Think of ≼ as a decision procedure: you are required (permitted) to choose g over f iff f ≺ g (f ≼ g).

Suppose ≼ doesn’t allow choosing a dominated wager:

  1. If f < g everywhere, then f ≺ g.

Let 1A be the function that is 1 on A and 0 outside A. Define Ef = sup{c : c ⋅ 1Ω ≼ f}. Here are some facts about E:

  1. If c < Ef < c′, then c ⋅ 1Ω ≺ f ≺ c′⋅1Ω.

  2. E(c ⋅ 1Ω)=c

  3. If f ≤ g everywhere, then Ef ≤ Eg.

  4. If Ef < Eg, then f ≺ g.

(But we can’t count on its being the case that f ≺ g if and only if Ef < Eg.)

Now consider what I’ve called the independent and cumulative decision procedures for sequences of choices. On an independent decision procedure, at each stage you must choose a wager that is ≼-maximal (and you may choose any of the maximal ones). On a cumulative decision procedure, at each stage you must choose a wager that when added to what you’ve already chosen yields a ≼-maximal wager (and you may choose any of the maximal ones).
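Here is a minimal sketch in Python of the two procedures for a sequence of binary choices. Purely for illustration, the preorder ≼ is induced by a weighted-sum valuation under made-up singleton credences, and the wagers are made up as well; with an additive valuation like this one the two procedures pick the same wagers, and the point of the distinction is that they can come apart for the nonadditive ways of valuing wagers that inconsistent credences give rise to (see the posts below).

```python
# Two decision procedures over a fixed sequence of binary choices between
# simple wagers (dicts from worlds to payoffs).  The preorder is induced by
# value(): f is at most g iff value(f) <= value(g).

def value(P, f):
    """Illustrative valuation inducing the preorder."""
    return sum(P[w] * f[w] for w in P)

def add(f, g):
    return {w: f[w] + g[w] for w in f}

def independent(P, choices):
    """At each stage, pick a maximal wager from the pair on offer."""
    return [max(pair, key=lambda f: value(P, f)) for pair in choices]

def cumulative(P, choices):
    """At each stage, pick the wager whose sum with what has already been
    chosen is maximal."""
    chosen, running = [], {w: 0.0 for w in P}
    for pair in choices:
        best = max(pair, key=lambda f: value(P, add(running, f)))
        chosen.append(best)
        running = add(running, best)
    return chosen

P = {"w1": 0.3, "w2": 0.4}
choices = [
    ({"w1": 1.0, "w2": -1.0}, {"w1": -1.0, "w2": 1.0}),
    ({"w1": 2.0, "w2": 0.0}, {"w1": 0.0, "w2": 1.0}),
]
print(independent(P, choices))
print(cumulative(P, choices))
```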

I think (I haven’t written it all down) I can prove that the following conditions are equivalent:

  1. E is an expected value with respect to a finitely-additive probability P on Ω.

  2. The independent decision procedure applied to a sequence of binary choices never permits you to choose a sequence of wagers whose sum is strictly dominated by the sum of a different sequence of wagers you could have chosen.

  3. The cumulative decision procedure applied to a sequence of binary choices never permits you to choose a sequence of wagers whose sum is strictly dominated by the sum of a different sequence of wagers you could have chosen.

The probability P is defined by P(A)=E(1A).

All that said, I think that when your credences are inconsistent, you may need to decide neither independently nor cumulatively, but holistically, taking into account what wagers you made and what wagers you expect you will make.

Tuesday, November 12, 2019

More complications for Dutch Book results

Think of a wager as a sequence of event-payoff pairs:

  • W = ((e1, u1),...,(en, un)).

There are then two different ways to calculate the expected value of the wager. First, directly:

  1. ED(W)=u1P(e1)+...+unP(en).

Second, indirectly by letting UW be the utility function defined by W, i.e., UW = u1 ⋅ 1e1 + ... + un ⋅ 1en (where 1e is the function that is 1 if e happens and 0 otherwise) and then calculating the expected utility of the function UW:

  2. EI(W)=E(UW).

If the credence function P is additive, then the two ways are equivalent. But without additivity, they come apart. Moreover, there is more than one way of calculating E(U) if the credences are inconsistent, but for now I will assume the standard Lebesgue sum way where, assuming U has only finitely many values, E(U) = ∑y y ⋅ P(U = y).

The most common de Finetti Dutch Book Theorem, which says that inconsistent probabilities give rise to a Dutch Book, makes use of the direct way of calculating the values of wagers. Specifically, it considers wagers where you pay an amount x for a chance to win amount y if event E eventuates, and it calculates the value of such a wager as yP(E)−x. However, if instead one uses the indirect method of calculation, the value of such a wager becomes (y − x)P(E)−xP(Ec), where Ec is the complement of E.

This actually makes a real difference to Dutch Book theorems. Consider this inconsistent credence for a coin toss:

  • P(H)=1/4

  • P(T)=1/4

  • P(H&T)=0

  • P(H ∨ T)=1.

Then for any utility function U, it turns out that EI(U)>0 if and only if the expected value of U is positive given the standard consistent fair-toss measure. The reason is this. Either U has the same value at heads and tails or it does not. If it has the same value at heads and tails, then EI(U) has the same value as the expectation using the fair measure, since P agrees with the fair measure regarding H ∨ T. On the other hand, if U has different values at heads and tails, then EI(U)=(1/4)U(H)+(1/4)U(T), which is exactly half of the fair measure’s expectation for U, and hence, again, EI(U)>0 if and only if the fair measure says the expectation is positive. It seems to follow that EI recommends exactly the same wagers as the standard consistent fair-toss measure.
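Here is a minimal sketch checking this claim numerically over a small grid of utility assignments (the grid itself is an arbitrary illustrative choice):

```python
# With the inconsistent credence P(H) = P(T) = 1/4, P(H & T) = 0, P(H or T) = 1,
# the Lebesgue-sum indirect expectation E_I(U) = sum_y y * P(U = y) is positive
# exactly when the fair (1/2, 1/2) measure gives U a positive expectation.

CREDENCE = {frozenset(): 0.0, frozenset({"H"}): 0.25,
            frozenset({"T"}): 0.25, frozenset({"H", "T"}): 1.0}

def E_I(U):
    """Lebesgue-sum expectation under the inconsistent credence."""
    return sum(y * CREDENCE[frozenset(w for w in U if U[w] == y)]
               for y in set(U.values()))

def E_fair(U):
    return 0.5 * U["H"] + 0.5 * U["T"]

grid = [x / 2 for x in range(-6, 7)]          # -3.0, -2.5, ..., 3.0
for uh in grid:
    for ut in grid:
        U = {"H": uh, "T": ut}
        assert (E_I(U) > 0) == (E_fair(U) > 0)
print("E_I and the fair expectation agree in sign on this grid")
```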

Except that this isn’t quite true, either. For in addition to two ways of calculating expected values, there are two ways of making decisions on their basis in the case where a sequence of wagers is offered:

  3. Accept a wager whose individual expected utility is positive.

  4. Accept a wager when the expected utility of the already-accepted wagers combined with the currently offered wager exceeds the expected value of the combination of the already-accepted wagers.

Here, the combination of two wagers is concatenation. For instance, ((e1, u1),(e2, u2)) combines with ((e3, u3)) to form the wager ((e1, u1),(e2, u2),(e3, u3)). Given consistent credences, we have E(W1 + W2)=E(W1)+E(W2), and (3) and (4) are equivalent. But, again, for inconsistent credences this additivity property can fail, and so a choice needs to be made between (3) and (4).

Note that (4) is itself an oversimplification. For theoretically, what wagers one accepts earlier on may depend on one’s best estimate as to what wagers will be offered later.

All in all, I know of five utility maximization decision procedures for sequences of wagers, generated by the answers to these questions:

  • Direct or indirect utility calculation for a wager? (D or I)

  • If indirect, Lebesgue sum or level set integral for calculating expectations? (LSum or LSet)

  • If indirect, is the presently offered wager combined with previously accepted wagers in calculating expectations? (Indiv or Combo)

For consistent probabilities, these are all equivalent.

Moreover, there are two kinds of Dutch Books. There are Simple Dutch Books, where from the original position the agent accepts a Dutch Book, and Incremental Dutch Books, where after accepting some wagers, the agent goes on to accept a Dutch Book.

What happens with Dutch Books varies between the different procedures, and I am still working out the details. Say that a credence P is monotonic provided that P(∅)=0, P(Ω)=1 and P(A)≤P(B) whenever A ⊆ B. Here is what I have:

  • D: Simple Dutch Books whenever probabilities are inconsistent.

  • I+LSum+Indiv: I conjecture Incremental Dutch Books for some but not all inconsistent monotonic credences.

  • I+LSum+Combo: I conjecture Incremental Dutch Books for all non-additive credences.

  • I+LSet+Indiv: I don’t know.

  • I+LSet+Combo: No Dutch Books of either sort for any monotonic credences.

Thursday, November 7, 2019

Expected utility and inconsistent credences

Suppose that we have a utility function U and an inconsistent credence function P, and for simplicity let’s suppose that our utility function takes on only finitely many values. The standard way of calculating the expected utility of U with respect to P is to look at all the values U can take, multiply each by the credence that it takes that value, and add:

  1. E(U) = ∑y y ⋅ P(U = y).

Call this the Block Way or Lebesgue Sums.

Famously, doing this leads to Dutch Books if the credence function fails additivity. But there is another way to calculate the expected utility:

  1. E(U) = ∫_0^∞ P(U > y) dy − ∫_−∞^0 P(U < y) dy.

Call this the Level Set Way, because sets of points in a space where some function like U is bigger or smaller than some value are known as level sets.

Here is a picture of the two ways:

[Figure: Blocks vs. Level Sets]

On the Block Way, we broke up the sample space into chunks where the utility function is constant and calculated the contribution of each chunk using the inconsistent credence function, and then added. On the Level Set Way, we broke it up into narrow strips, and calculated the contribution of each strip, and then added.

It turns out that if the credence function P is at least monotone, so that P(A)≤P(B) if A ⊆ B, a condition strictly weaker than additivity, then an agent who maximizes utilities calculated the Level Set Way will not be Dutch Booked.

Here is another fact about the Level Set Way. Suppose two utility functions U1 and U2 are certain to be close to each other: |U1 − U2|≤ϵ everywhere. Then on the Block Way, their expected utilities may be quite far apart, even assuming monotonicity. On the other hand, on the Level Set Way, their expected utilities are guaranteed to be within ϵ of each other. The difference between the two Ways can be quite radical. Suppose a coin is tossed, and the monotone inconsistent credences are:

  • heads: 0.01

  • tails: 0.01

  • heads-or-tails: 1

  • neither: 0

Suppose that U1 says that you are paid a constant $100 no matter what happens. Both the Block Way and the Level Set Way agree that the expected utility is $100.
But now suppose that U2 says you get paid $99 on heads and $101 on tails. Then the Block Way yields:

  • E(U2)=0.01 ⋅ 99 + 0.01 ⋅ 101 = 2

while the Level Set Way yields:

  • E(U2)=1 ⋅ 99 + 0.01 ⋅ 2 = 99.02

Thus, the Block Way makes the expected value of U2 ridiculously small, and far from that of U1, while the Level Set Way is still wrong—after all, the credences are stupid—but is much closer.
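Here is a minimal sketch in Python of the two Ways for a utility function with finitely many non-negative values, reproducing the numbers in this example:

```python
# Block Way vs. Level Set Way for the coin example above
# (heads: 0.01, tails: 0.01, heads-or-tails: 1, neither: 0).

CREDENCE = {frozenset(): 0.0, frozenset({"H"}): 0.01,
            frozenset({"T"}): 0.01, frozenset({"H", "T"}): 1.0}

def block_way(U):
    """E(U) = sum over values y of y * P(U = y)."""
    return sum(y * CREDENCE[frozenset(w for w in U if U[w] == y)]
               for y in set(U.values()))

def level_set_way(U):
    """E(U) = integral over y > 0 of P(U > y) dy, for a non-negative U with
    finitely many values (so the negative part of the formula is not needed)."""
    total, previous = 0.0, 0.0
    for v in sorted(set(U.values())):
        total += (v - previous) * CREDENCE[frozenset(w for w in U if U[w] >= v)]
        previous = v
    return total

U1 = {"H": 100, "T": 100}
U2 = {"H": 99, "T": 101}
print(block_way(U1), level_set_way(U1))   # about 100.0 and 100.0
print(block_way(U2), level_set_way(U2))   # about 2.0 and 99.02
```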

So, it makes sense to think of the Level Set Way as harm reduction for those agents whose credences are inconsistent but still monotone.

That said, many irrational agents will fail monotonicity.