
Monday, October 20, 2025

Another infinite dice game

Suppose infinitely many people independently roll a fair die. Before they get to see the result, they will need to guess whether the die shows a six or a non-six. If they guess right, they get a cookie; if they guess wrong, an electric shock.

But here’s another part of the story. An angel has considered all possible sequences of fair die outcomes for the infinitely many people, and defined the equivalence relation ∼ on the sequences, where α ∼ β if and only if the sequences α and β differ in at most finitely many places. Furthermore, the angel has chosen a set T that contains exactly one sequence from each ∼-equivalence class. Before anybody guesses, the angel is going to look at everyone’s dice and announce the unique member α of T that is ∼-equivalent to the actual die rolls.

Consider two strategies:

  1. Ignore what the angel says and say “not six” regardless.

  2. Guess in accordance with the unique member α: if α says you have six, you guess “six”, and otherwise you guess “not six”.

When the two strategies disagree for a person, there is a good argument that the person should go with strategy (1). For without the information from the angel, the person should go with strategy (1). But the information received from the angel is irrelevant to each individual x, because which ∼-equivalence class the actual sequence of rolls falls into depends only on rolls other than x’s. And following strategy (1) in repeats of the game results in one getting a cookie five out of six times on average.

However, if everyone follows strategy (2), then it is guaranteed that in each game only finitely many people get a shock and everyone else gets a cookie.

This seems to be an interesting case where self-interest gets everyone to go for strategy (1), but everyone going for strategy (2) is better for the common good. There are, of course, many such games, such as Tragedy of the Commons or the Prisoner’s Dilemma, but what is weird about the present game is that there is no interaction between the players—each one’s payoff is independent of what any of the other players do.

(This is a variant of a game in my infinity book, but the difference is that the game in my infinity book only worked assuming a certain rare event happened, while this game works more generally.)

My official line on games like this is that their paradoxicality is evidence for causal finitism, which thesis rules them out.

Monday, September 8, 2025

Observations and risk of confirmation/disconfirmation

It seems that a rational agent cannot guarantee that their credence in a hypothesis H will go up by choosing what observation to perform. For if my credence in H goes up given my observation no matter what I observe, then my credence should already have gone up prior to the observation—I should be able to boost my credence from the armchair.

But this reasoning is false in general. For in performing the observation, I not only learn which of the possible observable results is in place, but I also learn that I have performed the observation. In cases where the truth of H has a correlation with whether I actually perform the observation, this can have a predictable direction of effect on my credence in H.

Suppose that the hypothesis H is the conjunction that I am going to look in the closet and there is life on Mars. By looking to check if there is a mouse in my closet, I ensure that the first conjunct of H is true, and hence I increase my credence in H—no matter what I find out about mice.
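Here is a minimal numerical sketch of this point (the 0.5 and 0.01 priors below are illustrative assumptions, not anything from the post): whatever the mouse observation turns out to be, the credence in the conjunction H goes up, simply because looking makes the first conjunct certain.

```python
# Toy model: H = (I look in the closet) and (there is life on Mars).
# Illustrative priors; Mars-life is assumed independent of looking and of mice.
p_look = 0.5    # prior probability that I will look in the closet
p_mars = 0.01   # prior probability of life on Mars

prior_H = p_look * p_mars   # 0.005

# Once I look, the first conjunct is certain, and the mouse observation carries
# no information about Mars, so whatever I see in the closet:
posterior_H = 1.0 * p_mars  # 0.01, whether or not I find a mouse

print(prior_H, posterior_H)  # credence in H rises no matter what is observed
```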

This is a very trivial fact. But it does mean that we need to qualify the statement that any observation that can confirm a hypothesis can also disconfirm it. We need to specify that the confirmation and disconfirmation happen after one has already updated on the fact that one has performed the observation.

Friday, February 21, 2025

Adding or averaging epistemic utilities?

Suppose for simplicity that everyone is a good Bayesian and has the same priors for a hypothesis H, and also the same epistemic interests with respect to H. I now observe some evidence E relevant to H. My credence now diverges from everyone else’s, because I have new evidence. Suppose I could share this evidence with everyone. It seems obvious that if epistemic considerations are the only ones, I should share the evidence. (If the priors are not equal, then considerations in my previous post might lead me to withhold information, if I am willing to embrace epistemic paternalism.)

Besides the obvious value of revealing the truth, here are two ways to reason for this highly intuitive conclusion.

First, good Bayesians will always expect to benefit from more evidence. If my place and that of some other agent, say Alice, were switched, I’d want the information regarding E to be released. So by the Golden Rule, I should release the information.

Second, good Bayesians’ epistemic utilities are measured by a strictly proper scoring rule. Suppose Alice’s epistemic utilities for H are measured by a strictly proper (accuracy) scoring rule s that assigns an epistemic utility s(p,t) to a credence p when the actual truth value of H is t, where t can be zero or one. By definition of strict propriety, the expectation by my lights of Alice’s epistemic utility for a given credence is strictly maximized when that credence equals my credence. Since Alice shares the priors I had before I observed E, if I can make E evident to her, her new posteriors will match my current ones, and so revealing E to her will maximize my expectation of her epistemic utility.

So far so good. But now suppose that the hypothesis H = HN is that there exist N people other than me, and my priors assign probability 1/2 to there being N such people and 1/2 to there being n, where N is much larger than n. Suppose further that my evidence E ends up significantly supporting the hypothesis Hn, so that my posterior p in HN is smaller than 1/2.

Now, my expectation of the total epistemic utility of other people if I reveal E is:

  • UR = pNs(p,1) + (1−p)ns(p,0).

And if I conceal E, my expectation is:

  • UC = pNs(1/2,1) + (1−p)ns(1/2,0).

If we had N = n, then it would be guaranteed by strict propriety that UR > UC, and so I should reveal. But we have N > n. Moreover, s(1/2,1) > s(p,1): if some hypothesis is true, a strictly proper accuracy scoring rule increases strictly monotonically with the credence. If N/n is sufficiently large, the first terms of UR and UC will dominate, and hence we will have UC > UR, and thus I should conceal.
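To see the effect concretely, here is a quick numerical check (a sketch using the strictly proper Brier accuracy score s(p,t) = −(p−t)^2 and made-up values p = 1/3, N = 10^6, n = 10):

```python
# Expected total epistemic utility of the other people, reveal vs. conceal,
# using the Brier accuracy score (strictly proper). All numbers are illustrative.
def s(p, t):
    return -(p - t) ** 2

p = 1 / 3     # my posterior in H_N after observing E (assumed)
N = 10 ** 6   # number of other people if H_N is true (assumed much larger than n)
n = 10        # number of other people if H_n is true (assumed)

U_reveal  = p * N * s(p, 1)   + (1 - p) * n * s(p, 0)
U_conceal = p * N * s(1/2, 1) + (1 - p) * n * s(1/2, 0)

print(U_reveal, U_conceal)    # U_conceal > U_reveal once N/n is large enough
```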

The intuition behind this technical argument is this. If I reveal the evidence, I decrease people’s credence in HN. If it turns out that the number of people other than me actually is N, I have done a lot of harm, because I have decreased the credence of a very large number N of people. Since N is much larger than n, this consideration trumps considerations of what happens if the number of people is n.

I take it that this is the wrong conclusion. On epistemic grounds, if everyone’s priors are equal, we should release evidence. (See my previous post for what happens if priors are not equal.)

So what should we do? Well, one option is to opt for averaging rather than summing of epistemic utilities. But the problem reappears. For suppose that I can only communicate with members of my own local community, and we as a community have equal credence 1/2 for the hypothesis Hn that our local community of n people contains all agents, and credence 1/2 for the hypothesis Hn+N that there is also a number N of agents outside our community much greater than n. Suppose, further, that my priors are such that I am certain that all the agents outside our community know the truth about these hypotheses. I receive a piece of evidence E disfavoring Hn and leading to credence p < 1/2. Since my revelation of E affects only the members of my own community, and p is my credence in Hn after updating on E, the relevant part of the expected contribution to the average epistemic utility if I reveal E, with regard to the hypothesis Hn, is:

  • UR = p((n−1)/n)s(p,1) + (1−p)((n−1)/(n+N))s(p,0).

And if I conceal E, my expectation contribution is:

  • UC = p((n−1)/n)s(1/2,1) + (1−p)((n−1)/(n+N))s(1/2,0).

If N is sufficiently large, again UC will beat UR.

I take it that there is something wrong with epistemic utilitarianism.

Tuesday, January 28, 2025

Comparing binary experiments for non-binary questions

In my last two posts (here and here), I introduced the notion of an experiment being epistemically at least as good as another for a set of questions. I then announced a characterization of when this happens in the special case where the set of questions consists of a single binary (yes/no) question and the experiments are themselves binary.

The characterization was as follows. A binary experiment will result in one of two posterior probabilities for the hypothesis that our yes/no question concerns, and we can form the “posterior interval” between them. It turns out that one experiment is at least as good as another provided that the first one’s posterior interval contains the second one’s.

I then noted that I didn’t know what to say for non-binary questions (e.g., “How many mountains are there on Mars?”) but still binary experiments. Well, with a bit of thought, I think I now have it, and it’s almost exactly the same. A binary experiment now defines a “posterior line segment” in the space of probabilities, joining the two possible credence outcomes. (In the case of a probability space with a finite number n of points, the space of probabilities can be identified as the set of points in n-dimensional Euclidean space all of whose coordinates are non-negative and add up to 1.) A bit of thought about convex functions makes it pretty obvious that E2 is at least as good as E1 if and only if E2’s posterior line segment contains E1’s posterior line segment. (The necessity of this geometric condition is easy to see: consider a convex function that is zero everywhere on E2’s posterior line segment but non-zero on one of E1’s two possible posteriors, and use that convex function to generate the scoring rule.)
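For illustration, here is a small sketch of the containment test in the finite case (the priors, likelihoods, and function names are my own made-up choices). The two likelihood vectors are deliberately related by an affine rescaling, which is what it takes for the two segments even to be collinear:

```python
import numpy as np

# A binary experiment over a finite space is given by the likelihoods
# P(result = 1 | state); its "posterior line segment" joins the two posteriors.
def posteriors(prior, likelihood):
    prior = np.asarray(prior, dtype=float)
    lik = np.asarray(likelihood, dtype=float)
    post1 = prior * lik
    post0 = prior * (1 - lik)
    return post1 / post1.sum(), post0 / post0.sum()

def on_segment(a, c, d, tol=1e-9):
    """Is the point a on the segment from c to d (within tolerance)?"""
    cd = d - c
    if np.allclose(cd, 0, atol=tol):
        return bool(np.allclose(a, c, atol=tol))
    lam = float(np.dot(a - c, cd) / np.dot(cd, cd))
    return bool(-tol <= lam <= 1 + tol and np.allclose(c + lam * cd, a, atol=tol))

def at_least_as_good(prior, lik2, lik1):
    """E2 (lik2) is at least as good as E1 (lik1) iff E2's segment contains E1's."""
    a1, b1 = posteriors(prior, lik1)
    a2, b2 = posteriors(prior, lik2)
    return on_segment(a1, a2, b2) and on_segment(b1, a2, b2)

prior = [0.2, 0.3, 0.5]
lik_coarse = [0.4, 0.5, 0.6]
lik_fine = [0.1, 0.5, 0.9]   # = 4 * lik_coarse - 1.5 componentwise
print(at_least_as_good(prior, lik_fine, lik_coarse))   # True
print(at_least_as_good(prior, lik_coarse, lik_fine))   # False: the coarse segment is shorter
```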

This is a pretty hard condition to satisfy. The two experiments have to be pretty carefully gerrymandered just to make their posterior line segments parallel, let alone to make one a subset of the other. I conclude that when one’s interest is in more than just one binary question, one binary experiment will not be overall better than another except in very special cases.

Recall that my notion of “better” quantified over all proper scoring rules. I guess the upshot of this is that interesting comparisons of experiments are relative not only to a set of questions but also to a specific proper scoring rule.

Monday, January 20, 2025

Open-mindedness and epistemic thresholds

Fix a proposition p, and let T(r) and F(r) be the utilities of assigning credence r to p when p is true and false, respectively. The utilities here might be epistemic or of some other sort, like prudential, overall human, etc. We can call the pair T and F the score for p.

Say that the score T and F is open-minded provided that expected utility calculations based on T and F can never require you to ignore evidence, assuming that evidence is updated on in a Bayesian way. Assuming the technical condition that there is another logically independent event (else it doesn’t make sense to talk about updating on evidence), this turns out to be equivalent to saying that the function G(r) = rT(r) + (1−r)F(r) is convex. The function G(r) represents your expected value for your utility when your credence is r.

If G is a convex function, then it is continuous on the open interval (0,1). This implies that if one of the functions T or F has a discontinuity somewhere in (0,1), then the other function has a discontinuity at the same location. In particular, the points I made in yesterday’s post about the value of knowledge and anti-knowledge carry through for open-minded and not just proper scoring rules, assuming our technical condition.

Moreover, we can quantify this discontinuity. Given open-mindedness and our technical condition, if T has a jump of size δ at credence r (e.g., in the sense that the one-sided limits exist and differ by δ), then F has a jump of size rδ/(1−r) at the same point. In particular, if r > 1/2, then if T has a jump of a given size at r, F has a larger jump at r.

I think this gives one some reason to deny that there are epistemically important thresholds strictly between 1/2 and 1, such as the threshold between non-belief and belief, or between non-knowledge and knowledge, even if the location of the thresholds depends on the proposition in question. For if there are such thresholds, imagine cases of propositions p with the property that it is very important to reach the threshold if p is true while one’s credence matters very little if p is false. In such a case, T will have a larger jump at the threshold than F, and so we will have a violation of open-mindedness.

Here are three examples of such propositions:

  • There are objective norms

  • God exists

  • I am not a Boltzmann brain.

There are two directions to move from here. The first is to conclude that because open-mindedness is so plausible, we should deny that there are epistemically important thresholds. The second is to say that in the case of such special propositions, open-mindedness is not a requirement.

I wondered initially whether a similar argument doesn’t apply in the absence of discontinuities. Could one have T and F be open-minded even though T continuously increases a lot faster than F decreases? The answer is positive. For instance, the pair T(r) = e^(10r) and F(r) = −r is open-minded (though not proper), even though T increases a lot faster than F decreases. (Of course, there are other things to be said against this pair. If that pair is your utility, and you find yourself with credence 1/2, you will increase your expected utility by switching your credence to 1 without any evidence.)
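Here is a quick numerical check of this example (a sketch; the grid spacing is an arbitrary choice): G(r) = rT(r) + (1−r)F(r) is convex for this pair, and yet at credence 1/2 one expects to gain by jumping to credence 1.

```python
import numpy as np

def T(r): return np.exp(10 * r)   # utility of credence r when the proposition is true
def F(r): return -r               # utility of credence r when the proposition is false
def G(r): return r * T(r) + (1 - r) * F(r)

# Convexity check on a grid: second differences of a convex function are >= 0.
r = np.linspace(0.001, 0.999, 999)
second_diff = G(r[2:]) - 2 * G(r[1:-1]) + G(r[:-2])
print(second_diff.min() >= 0)     # True: the open-mindedness condition holds

# Impropriety: at credence 1/2, the expected utility of adopting credence q
# is 0.5*T(q) + 0.5*F(q), which is not maximized at q = 1/2.
def expected_at_half(q): return 0.5 * T(q) + 0.5 * F(q)
print(expected_at_half(0.5), expected_at_half(1.0))   # jumping to 1 looks better
```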

Wednesday, September 4, 2024

Independent invariant regular hyperreal probabilities: an existence result

A couple of years ago I showed how to construct hyperreal finitely additive probabilities on infinite sets that satisfy certain symmetry constraints and have the Bayesian regularity property that every possible outcome has non-zero probability. In this post, I want to show a result that allows one to construct such probabilities for an infinite sequence of independent random variables.

Suppose first we have a group G of symmetries acting on a space Ω. What I previously showed was that there is a hyperreal G-invariant finitely additive probability assignment on all the subsets of Ω that satisfies Bayesian regularity (i.e., P(A) > 0 for every non-empty A) if and only if the action of G on Ω is “locally finite”, i.e.:

  • For any finitely generated subgroup H of G and any point x in Ω, the orbit Hx is finite.

Here is today’s main result (unless there is a mistake in the proof):

Theorem. For each i in an index set, suppose we have a group Gi acting on a space Ωi. Let Ω = ∏iΩi and G = ∏iGi, and consider G acting componentwise on Ω. Then the following are equivalent:

  1. there is a hyperreal G-invariant finitely additive probability assignment on all the subsets of Ω that satisfies Bayesian regularity and the independence condition that if A1, ..., An are subsets of Ω such that Ai depends only on coordinates from Ji ⊆ I with J1, ..., Jn pairwise disjoint, then the probability of A1 ∩ ... ∩ An is the product of the probabilities of the Ai

  2. there is a hyperreal G-invariant finitely additive probability assignment on all the subsets of Ω that satisfies Bayesian regularity

  3. the action of G on Ω is locally finite.

Here, an event A depends only on coordinates from a set J just in case there is a subset A′ of ∏j ∈ JΩj such that A = {ω ∈ Ω : ω|J ∈ A′} (I am thinking of the members of a product of sets as functions from the index set to the union of the Ωi). For brevity, I will omit “finitely additive” from now on.

The equivalence of (2) and (3) is from my old result, and the implication from (1) to (2) is trivial, so the only thing to be shown is that (3) implies (1).

Example: If each group Gi is finite and of size at most N for a fixed N, then the local finiteness condition is met. (Each such group can be embedded into the symmetric group SN, and any power of a finite group is locally finite, so a fortiori its action is locally finite.) In particular, if all of the groups Gi are the same and finite, the condition is met. An example like that is where we have an infinite sequence of coin tosses, and the symmetry on each coin toss is the reversal of the coin.

Philosophical note: The above gives us the kind of symmetry we want for each individual independent experiment. But intuitively, if the experiments are identically distributed, we will want invariance with respect to a shuffling of the experiments. We are unlikely to get that, because the shuffling is unlikely to satisfy the local finiteness condition. For instance, for a doubly infinite sequence of coin tosses, we would want invariance with respect to shifting the sequence, and that doesn’t satisfy local finiteness.

Now, on to a sketch of the proof that (3) implies (1). The proof uses a sequence of three reductions, each via an ultraproduct construction, to cases exhibiting more and more finiteness.

First, note that without loss of generality the index set I can be taken to be finite. For if it is infinite, then for any finite partition K of I and any J ∈ K, let GJ = ∏i ∈ JGi and ΩJ = ∏i ∈ JΩi, with the obvious action of GJ on ΩJ. Then G is isomorphic to ∏J ∈ KGJ and Ω to ∏J ∈ KΩJ. If we have the result for finite index sets, we get a regular hyperreal G-invariant probability on Ω that satisfies the independence condition in the special case where J1, ..., Jn are such that, for distinct i and j, at least one of Ji ∩ J and Jj ∩ J is empty for every J ∈ K. We then take an ultraproduct of these probability measures, indexed by the finite partitions K, with respect to an ultrafilter on the partially ordered set of finite partitions of I ordered by fineness, and we get the independence condition in full generality.

Second, without loss of generality, the groups Gi can be taken as finitely generated. For suppose we can construct a regular probability that is invariant under H = ∏iHi where Hi is a finitely generated subgroup of Gi and satisfies the independence condition. Then we take an ultraproduct with respect to an ultrafilter on the partially ordered set of sequences of finitely generated groups (Hi)i ∈ I where Hi is a subgroup of Gi and where the set is ordered by componentwise inclusion.

Third, also without loss of generality, the sets Ωi can be taken to be finite, by replacing each Ωi with an orbit of some finite collection of elements under the action of the finitely generated Gi, since such orbits will be finite by local finiteness, and once again taking an appropriate ultraproduct with respect to an ultrafilter on the partially ordered set of sequences of finite subsets of Ωi closed under Gi ordered by componentwise inclusion. The Bayesian regularity condition will hold for the ultraproduct if it holds for each factor in the ultraproduct.

We have thus reduced everything to the case where I is finite and each Ωi is finite. The existence of the hyperreal G-invariant finitely additive regular probability measure is now trivial: just let P(A) = |A|/|Ω| for every A ⊆ Ω. (In fact, the measure is countably additive and not merely finitely additive, real and not merely hyperreal, and invariant not just under the action of G but under all permutations.)

Monday, August 5, 2024

Natural reasoning vs. Bayesianism

A typical Bayesian update gets one closer to the truth in some respects and further from the truth in other respects. For instance, suppose that you toss a coin and get heads. That gets you much closer to the truth with respect to the hypothesis that you got heads. But it confirms the hypothesis that the coin is double-headed, and this likely takes you away from the truth. Moreover, it confirms the conjunctive hypothesis that you got heads and there are unicorns, which takes you away from the truth (assuming there are no unicorns; if there are unicorns, insert a “not” before “are”). Whether the Bayesian update is on the whole a plus or a minus depends on how important the various propositions are. If for some reason saving humanity hangs on you getting it right whether you got heads and there are unicorns, it may well be that the update is on the whole a harm.

(To see the point in the context of scoring rules, take a weighted Brier score which puts an astronomically higher weight on you got heads and there are unicorns than on all the other propositions taken together. As long as all the weights are positive, the scoring rule will be strictly proper.)

This means that there are logically possible update rules that do better than Bayesian update. (In my example, leaving the probability of the proposition you got heads and there are unicorns unchanged after learning that you got heads is superior, even though it results in inconsistent probabilities. By the domination theorem for strictly proper scoring rules, there is an even better method than that which results in consistent probabilities.)
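Here is a minimal sketch of the unicorn example with made-up numbers (the weights and the 0.001 prior for unicorns are illustrative assumptions): given that the designer knows there are no unicorns, the rule that refuses to raise the conjunction’s probability upon seeing heads scores better on the heavily weighted conjunction than Bayesian update does.

```python
# Weighted Brier accuracy score: sum of w_i * -(credence_i - truth_i)**2 over the
# tracked propositions "heads" and "heads and there are unicorns".
w_heads, w_conj = 1.0, 10 ** 9   # astronomically higher weight on the conjunction
p_unicorns = 0.001               # the agent's prior in unicorns (assumed)

def score(cred_heads, cred_conj, heads, unicorns):
    conj = heads and unicorns
    return (w_heads * -(cred_heads - heads) ** 2
            + w_conj * -(cred_conj - conj) ** 2)

# After observing heads, given that there really are no unicorns:
bayes_score = score(1.0, p_unicorns, True, False)         # conjunction updated to 0.001
stubborn_score = score(1.0, p_unicorns / 2, True, False)  # conjunction left at 0.0005

print(bayes_score, stubborn_score)   # the non-Bayesian rule scores better here
```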

Imagine that you are designing a robot that maneuvers intelligently around the world. You could make the robot a Bayesian. But you don’t have to. Depending on what the prioritizations among the propositions are, you might give the robot an update rule that’s superior to a Bayesian one. If you have no more information than you endow the robot with, you can’t expect to be able to design such an update rule. (Bayesian update has optimal expected accuracy given the pre-update information.) But if you know a lot more than you tell the robot—and of course you do—you might well be able to.

Imagine now that the robot is smart enough to engage in self-reflection. It then notices an odd thing: sometimes it feels itself pulled to make inferences that do not fit with Bayesian update. It starts to hypothesize that by nature it’s a bad reasoner. Perhaps it tries to change its programming to be more Bayesian. Would it be rational to do that? Or would it be rational for it to stick to its programming, which in fact is superior to Bayesian update? This is a difficult epistemology question.

The same could be true for humans. God and/or evolution could have designed us to update on evidence differently from Bayesian update, and this could be epistemically superior (God certainly has superior knowledge; evolution can “draw on” a myriad of information not available to individual humans). In such a case, switching from our “natural update rule” to Bayesian update would be epistemically harmful—it would take us further from the truth. Moreover, it would be literally unnatural. But what does rationality call on us to do? Does it tell us to do Bayesian update or to go with our special human rational nature?

My “natural law epistemology” says that sticking with what’s natural to us is the rational thing to do. We shouldn’t redesign our nature.

Friday, May 24, 2024

Three or four ways to implement Bayesianism

We tend to imagine a Bayesian agent as starting with some credences, “the ur-priors”, and then updating the credences as the observations come in. It’s as if there was a book of credences in the mind, with credences constantly erased and re-written as the observations come in. When we ask the Bayesian agent for their credence in p, they search through the credence book for p and read off the number written beside it.

In this post, I will assume the ur-priors are “regular”: i.e., everything contingent has a credence strictly between zero and one. I will also assume that observations are always certain.

Still, the above need not be the right model of how Bayesianism is actually implemented. Another way is to have a book of ur-priors in the mind, and an ever-growing mental book of observations. When you ask such a Bayesian agent what their credence in p is, they on the spot look at their book of ur-priors and their book of observations, and then calculate the posterior for p.

The second way is not very efficient: you are constantly recalculating, and you need an ever-growing memory store for all the accumulated evidence. If you were making a Bayesian agent in software, the ever-changing credence book would be more efficient.

But here is an interesting way in which the second way would be better. Suppose you came to conclude that some of your ur-priors were stupid, through some kind of an epistemic conversion experience, say. Then you could simply change your ur-priors without rewriting anything else in your mind, and all your posteriors would automatically be computed correctly as needed.

In the first approach, if you had an epistemic conversion, you’d have to go back and reverse-engineer all your priors, and fix them up. Unfortunately, some priors will no longer be recoverable. From your posteriors after conditionalizing on E, you cannot recover your original priors for situations incompatible with E. And yet knowing what these priors were might be relevant to rewriting all your priors, including the ones compatible with E, in light of your conversion experience.

Here is a third way to implement Bayesianism that combines the best of the two approaches. You have a book of ur-priors and a book of current credences. You update the latter in ordinary updates. In case of an epistemic conversion experience, you rewrite your book of ur-priors, and conditionalize on the conjunction of all the propositions that you currently have credence one in, and replace the contents of your credence book with the result.
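Here is a small sketch of this third implementation over a finite set of worlds (the class and method names are my own illustrations, not anything from the post): ordinary observations update the current credence book, while a conversion rewrites the ur-prior book and reconditionalizes it on everything so far learned with certainty.

```python
# A sketch of the "third way": a book of ur-priors plus a book of current credences.
class ThirdWayAgent:
    def __init__(self, ur_priors):
        # ur_priors: dict mapping each world to a (regular) ur-prior probability
        self.ur_priors = dict(ur_priors)
        self.credences = dict(ur_priors)   # the current credence book
        self.certainties = []              # events (sets of worlds) learned with certainty

    @staticmethod
    def _conditionalize(credences, event):
        total = sum(p for w, p in credences.items() if w in event)
        return {w: (p / total if w in event else 0.0) for w, p in credences.items()}

    def observe(self, event):
        """Ordinary Bayesian update: conditionalize the current credence book."""
        event = set(event)
        self.certainties.append(event)
        self.credences = self._conditionalize(self.credences, event)

    def epistemic_conversion(self, new_ur_priors):
        """Rewrite the ur-prior book, then recompute the credence book by
        conditionalizing the new ur-priors on everything learned so far."""
        self.ur_priors = dict(new_ur_priors)
        learned = set(self.ur_priors)
        for e in self.certainties:
            learned &= e
        self.credences = self._conditionalize(self.ur_priors, learned)

# Example: two coin tosses (four worlds); learn that the first landed heads,
# then undergo a conversion to a prior biased toward heads on the second toss.
agent = ThirdWayAgent({"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25})
agent.observe({"HH", "HT"})
agent.epistemic_conversion({"HH": 0.4, "HT": 0.1, "TH": 0.4, "TT": 0.1})
print(agent.credences)   # {'HH': 0.8, 'HT': 0.2, 'TH': 0.0, 'TT': 0.0}
```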

We’re not exactly Bayesian agents. Insofar as we approximate being Bayesian agents, I think we’re most like the agents of the first sort, the ones with one book which is ever rewritten. This makes epistemic conversions more difficult to conduct responsibly.

Perhaps we should try to make ourselves a bit more like Bayesian agents of the third sort by keeping track of our epistemic history—even if we cannot go all the way back to ur-priors. This could be done with a diary.

Friday, May 17, 2024

Yet another argument for thirding in Sleeping Beauty?

Suppose that a fair coin has been flipped in my absence. If it’s heads, there is an independent 50% chance that I will be irresistibly brainwashed tonight after I go to bed in a way that permanently forces my credence in heads to zero. If it’s tails, there will be no brainwashing. When I wake up tomorrow, there will be a foul taste in my mouth of the brainwashing drugs if and only if I’ve been brainwashed.

So, I wake up tomorrow, find no taste of drugs in my mouth, and I wonder what I should do with my credence in heads. The obvious Bayesian approach would be to conditionalize on not being brainwashed, and lower my credence in heads to 1/3.

Next let’s evaluate epistemic policies in terms of a strictly proper accuracy scoring rule (T,F) (i.e., T(p) and F(p) are the epistemic utilities of having credence p when the hypothesis is in fact true or false, respectively). Let’s say that the policy is to assign credence p upon observing that I wasn’t brainwashed. My expected epistemic utility is then (1/4)T(p) + (1/4)T(0) + (1/2)F(p). Given any strictly proper scoring rule, this is optimized at p = 1/3. So we get the same advice as before.
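This optimization is easy to check numerically; here is a minimal sketch using the Brier score (the grid search is just for illustration):

```python
import numpy as np

# Brier accuracy score (strictly proper): T(p) = -(1 - p)**2, F(p) = -p**2.
def T(p): return -(1 - p) ** 2
def F(p): return -p ** 2

# Expected epistemic utility of the policy "assign credence p to heads upon
# waking with no drug taste": 1/4 heads & not brainwashed, 1/4 heads &
# brainwashed (credence forced to 0), 1/2 tails & not brainwashed.
def expected_utility(p):
    return 0.25 * T(p) + 0.25 * T(0) + 0.5 * F(p)

p = np.linspace(0, 1, 100001)
print(p[np.argmax(expected_utility(p))])   # approximately 1/3
```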

So far so good. Now consider a variant where instead of a 50% chance of being brainwashed, I am put in a coma for the rest of my life. I think it shouldn’t matter whether I am brainwashed or put in a coma. Either way, I am no longer an active Bayesian agent with respect to the relevant proposition (namely, whether the coin was heads). So if I find myself awake, I should assign 1/3 to heads.

Next consider a variant where instead of a coma, I’m just kept asleep for all of tomorrow. Thus, on heads, I have a 50% chance of waking up tomorrow, and on tails I am certain to wake up tomorrow. It shouldn’t make a difference whether we’re dealing with a life-long coma or a day of sleep. Again, if I find myself awake, I should assign 1/3 to heads.

Now suppose that for the next 1000 days, each day on heads I have a 50% chance of waking up, and on tails I am certain to wake up, and after each day my memory of that day is wiped. Each day is the same as the one day in the previous experiment, so each day I am awake I should assign 1/3 to heads.

But by the Law of Large Numbers, this is basically an extended version of Sleeping Beauty: on heads I will wake up on approximately 500 days and on tails on 1000 days. So I should assign 1/3 to heads in Sleeping Beauty.

Tuesday, May 7, 2024

Mushrooms

Some people have the intuition that there is something fishy about doing standard Bayesian update on evidence E when one couldn’t have observed the absence of E. A standard case here is where the evidence E is being alive, as in firing squad or fine-tuning cases. In such cases, the intuition goes, you should just ignore the evidence.

I had a great conversation with a student who found this line of thought compelling, and came up with this pretty convincing (and probably fairly standard) case that you shouldn’t ignore evidence E like that. You’re stranded on a desert island, and the only food is mushrooms. They come in a variety of easily distinguishable species. You know that half of the species have a 99% chance of instantly killing you and otherwise have no effect on you other than nourishment, and the other half have a 1% chance of instantly killing you and again otherwise have no effect on you other than nourishment. You don’t know which are which.

To survive until rescue, you need to eat one mushroom a day. Consider two strategies:

  1. Eat a mushroom from a random species the first day. If you survive, conclude that this species is likely good, and keep on eating mushrooms of the same species.

  2. Eat a mushroom from a random species every day.

The second strategy makes just as much sense as the first if your survival does not count as evidence. But we all know what will happen if you follow the second strategy: you’ll be very likely dead after a few days, as your chance of surviving n mushrooms is (1/2)^n. On the other hand, if you follow the first strategy, your chance of surviving n mushrooms is slightly bigger than (1/2)(0.99)^n. And the first strategy is precisely what is favored by updating on your survival: you take your survival to be evidence that the mushroom you ate was one of the safer ones, so you keep on eating mushrooms from the same species. If you want to live until rescue, the first strategy is your best bet.
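Here is a quick simulation of the two strategies (a sketch; the number of species, the number of days until rescue, and the trial count are illustrative assumptions):

```python
import random

# Half the species kill with chance 0.99, half with chance 0.01 (per the post).
def death_chance(species, deadly):
    return 0.99 if species in deadly else 0.01

def survival_rate(strategy, n_species=10, days=7, trials=100_000):
    survived = 0
    for _ in range(trials):
        deadly = set(random.sample(range(n_species), n_species // 2))
        species = random.randrange(n_species)   # the first day's random pick
        alive = True
        for day in range(days):
            if strategy == "switch" and day > 0:
                species = random.randrange(n_species)   # re-randomize every day
            if random.random() < death_chance(species, deadly):
                alive = False
                break
        survived += alive
    return survived / trials

print("stick with the first species:", survival_rate("stick"))    # about 0.47
print("random species every day:   ", survival_rate("switch"))    # about 0.008
```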

Suppose you’re not yet convinced. Here’s a variant. You have a phone. You call your mom on the first day, and describe your predicament. She comforts you and tells you that rescue will come in a week. And then she tells you that she was once stuck for a week on this very island, and ate the pink lacy mushrooms. Then your battery dies. You rejoice: you will eat the pink lacy mushrooms and thus survive! But then suddenly you get worried. You don’t know when your mom was stuck on the island. If she was stuck on the island before you were conceived, then had she not survived the mushrooms, you wouldn’t have been around to hear it. And in that case, you think her evidence is worthless, because you wouldn’t have any evidence had she not survived. So now it becomes oddly epistemically relevant to you whether your mom was on the island before or after you were conceived. But it seems largely epistemically irrelevant when your mom’s visit to the island was.

Tuesday, September 19, 2023

The evidential force of there being at least one gratuitous evil is low

Suppose we keep fixed in our epistemic background K general facts about human life and the breadth and depth of evil in the world, and consider the impact on theism of the additional piece of evidence that at least one of the evils is apparently gratuitous—i.e., one that has resisted finding a theodicy despite strenuous investigation.

Now, clearly, finding that there is not even one apparently gratuitous evil would be extremely good evidence for the existence of God—for if there is no God, it is amazing if, of the many evils there are, none are apparently gratuitous, while this is less amazing if there is a God. And hence, by a standard Bayesian theorem, finding that there is at least one apparently gratuitous evil must be some evidence against the existence of God. But at the same time, the fact that F is strong evidence for T does not mean that the absence of F is strong evidence against T. Whether it is or is not depends on the details.
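The last point is easy to see with made-up numbers (a sketch; none of these numbers come from the post):

```python
# F can be strong evidence for T even though not-F is only weak evidence against T.
P_F_given_T = 0.5
P_F_given_notT = 0.001

bayes_factor_F = P_F_given_T / P_F_given_notT                 # 500: F strongly favors T
bayes_factor_notF = (1 - P_F_given_T) / (1 - P_F_given_notT)  # about 0.5: not-F only mildly disfavors T

print(bayes_factor_F, bayes_factor_notF)
```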

But the background K contains some relevant facts. One of these is that we are limited knowers, and while we have had spectacular successes in our ability to understand the world and events around us, it is not incredibly uncommon to find things that have (so far) defeated our strenuous investigation. Some of these are scientific questions, and some are interpersonal questions—“Why did he do that?” Given this, it seems unsurprising, even if God exists, that we would sometimes be stymied in figuring out why God did something, including why he failed to prevent some evils. Thus, the probability of at least one of the vast numbers of evils in K being apparently gratuitous, given the existence of God, is pretty high, though slightly lower than given the non-existence of God. This means that the evidential force for atheism of there being at least one apparently gratuitous evil is fairly low.

Furthermore, one can come up with a theodicy for the gratuitous part of a gratuitous evil. When a person’s motives are not transparent to us, we are thereby provided with an opportunity for exercising the virtue of trust. And conversely, a person’s always explaining themselves when they have apparently acted unjustifiably builds not trust but suspicion. Given the evils themselves as part of the background K, that some of them are apparently gratuitous provides us with an opportunity to exercise trust in God in a way that we would not be able to if none of the evils were apparently gratuitous. Given K (which presumably includes facts about our not always being in the luminous presence of God), it would be somewhat surprising if God always made sure we could figure out why he allowed evils. Again, this makes the evidential force for atheism of the apparent gratuity of evil fairly low.

Now, it may well be that when we consider the number or the type (perhaps they are of a type where divine explanations of permission would be reasonably expected) of apparently gratuitous evils, things change. Nothing I have said in this post undermines that claim. My only point is that the mere existence of an apparently gratuitous evil is very little evidence against theism.

Thursday, May 4, 2023

Reflection and null probability

Suppose a number Z is chosen uniformly randomly in (0, 1] (i.e., 0 is not allowed but 1 is) and an independent fair coin is flipped. Then the number X is defined as follows. If the coin is heads, then X = Z; otherwise, X = 2Z.

  • At t0, you have no information about what Z, X and the coin toss result are, but you know the above setup.

  • At t2, you learn the exact value of X.

Here’s the puzzling thing. At t2, when you are informed that X = x (for some specific value of x) your total evidence since t0 is:

  • Ex: Either the coin landed heads and Z = x, or the coin landed tails and Z = x/2.

Now, if x > 1, then when you learn Ex, you know for sure that the coin was tails.

On the other hand, if x ≤ 1, then Ex gives you no information about whether the coin landed heads or tails. For Z is chosen uniformly and independently of the coin toss, and so as long as both x and x/2 are within the range of possibilities for Z, learning Ex seems to tell you nothing about the coin toss. For instance, if you learn:

  • E1/4: Either the coin landed heads and Z = 1/4, or the coin landed tails and Z = 1/8,

that seems to give you no information about whether the coin landed heads or tails.

Now add one more stage:

  • At t1, you are informed whether x ≤ 1 or x > 1.

Suppose that at t1 what you learn is that x ≤ 1. That is clearly evidence for the heads hypothesis (since x > 1 would conclusively prove the tails hypothesis). In fact, standard Bayesian reasoning implies you will assign probability 2/3 to heads and 1/3 to tails at this point.
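The 2/3 figure at t1 is easy to check by simulation; here is a minimal sketch (random.random() returns values in [0, 1) rather than (0, 1], which makes no difference here):

```python
import random

# Monte Carlo check: conditional on X <= 1, heads should have probability about 2/3.
trials = 1_000_000
heads_given_le1 = total_le1 = 0
for _ in range(trials):
    Z = random.random()
    heads = random.random() < 0.5
    X = Z if heads else 2 * Z
    if X <= 1:
        total_le1 += 1
        heads_given_le1 += heads
print(heads_given_le1 / total_le1)   # about 2/3
```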

But now we have a puzzle. For at t1, you assign credence 2/3 to heads, but the above reasoning shows you that at t2, you will assign credence 1/2 to heads. For at t2 your total evidence since t0 will be summed up by Ex for some specific x ≤ 1 (Ex already includes the information given to you at t1). And we saw that if x ≤ 1, then Ex conveys no evidence about whether the coin was heads or tails, so your credence in heads at t2 must be the same as at t0, namely 1/2.

So at t1 you assign 2/3 to heads, but you know that when you receive further more specific evidence, you will move to assign 1/2 to heads. This is counterintuitive, violates van Fraassen’s reflection principle, and lays you open to a Dutch Book.

What went wrong? I don’t really know! This has been really puzzling me. I have four solutions, but none makes me very happy.

The first is to insist that Ex has zero probability and hence we simply cannot probabilistically update on it. (At most we can take P(H|Ex) to be an almost-everywhere defined function of x, but that does not provide a meaningful result for any particular value of x.)

The second is to say that true uniformity of distribution is impossible. One can have the kind of uniformity that measure theorists talk about (basically, translation invariance), but that’s not enough to yield non-trivial comparisons of the probabilities of individual values of Z (we assumed that x and x/2 were equally likely options for Z if X ≤ 1).

The third is some sort of finitist thesis that rules out probabilistic scenarios with infinitely many possible outcomes, like the choice of Z.

The fourth is to bite the bullet, deny the reflection principle, and accept the Dutch Book.

Wednesday, May 3, 2023

Conditionalizing on classically null events

Some events have probability zero in classical probability. For instance, if you spin a continuous and fair spinner, the probability of its landing on any specific value is classically zero.

Some philosophers think we should be able to conditionalize on possible events that classically have zero probability, say by assigning non-zero infinitesimal probabilities to such events or by using Popper functions. I think there is very good reason to be suspicious of this.

Consider these very plausible claims:

  1. For y equal to 1 or 2, let Hy be a hypothesis about the production of the random variable X such that the conditional distribution of X on Hy is uniform over the interval [0, y). Suppose H1 and H2 have non-zero priors. Then the fact that the value of X is x supports H1 over H2 if x < 1.

  2. If two claims are logically equivalent and they can be conditionalized on, and one supports a hypothesis over another hypothesis, so does the other.

  3. If a random variable is independent of each of two hypotheses, then no fact about the value of the random variable supports either hypothesis over the other.

But (1) yields a counterexample to the method of conditionalization by infinitesimal probabilities. For suppose a random variable Z is uniformly randomly chosen in [0, 1) by some specific method. Suppose further that a fair coin, independent of Z, was flipped, and on heads we let X = Z and on tails we let X = 2Z. Let H1 be the heads hypothesis and let H2 be the tails hypothesis. Then X is uniformly distributed over [0, y) conditionally on Hy for y = 1, 2.

But now let E be the fact that X = 0, and suppose we can conditionalize on E. By (1), E supports H1 over H2 as 0 < 1. But E is logically equivalent to the fact that Z = 0. By (2), then Z = 0 supports H1 over H2. But Z is independent of H1 and of H2. So we have a contradiction to (3).

I think this line of thought undercuts my toy model argument in my last post.

Monday, April 24, 2023

Predictability of future credence changes

Suppose you update your credences via Bayesian update at discrete moments of time (i.e., at any future time, your credence is the result of a finite number of Bayesian updates from your present credence). Then it can be proved that you cannot be sure (i.e., assign probability one) that your credence will ever be higher than it is now, and similarly you cannot be sure that your credence will ever be lower than it is now.

The same is not true for continuous Bayesian update, as is shown by Alan Hajek’s Cable Guy story. Cable Guy will come tomorrow between 8:00 am and 4:00 pm, with 4:00 pm included but 8:00 am excluded. Your current credence that they will come in the morning is 1/2 and your current credence that they will come in the afternoon is also 1/2.

Then it is guaranteed that there will be a time after 8:00 am when Cable Guy hasn’t come yet. At that time, because you have ruled out some of the morning possibilities but none of the afternoon possibilities, your credence that the Cable Guy will come in the afternoon will have increased and your credence that the Cable Guy will come in the morning will have decreased.
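For concreteness, here is the Cable Guy computation (a small sketch; times are hours on a 24-hour clock):

```python
# Arrival time uniform on (8, 16]; morning = (8, 12], afternoon = (12, 16].
# At any time t in (8, 12] at which he hasn't yet come, the credence in
# "afternoon" has risen above 1/2.
def p_afternoon_given_not_yet(t):
    return 4 / (16 - t)   # P(afternoon | arrival > t), by uniformity

for t in [8.5, 9, 10, 11, 12]:
    print(t, p_afternoon_given_not_yet(t))   # 0.533..., 0.571..., 0.667, 0.8, 1.0
```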

Proof of fact in first paragraph: A Bayesian agent’s credences form a martingale. To obtain a contradiction, suppose there is probability 1 that the credences will go above their current value. Let Cn be the agent’s credence after the nth update, and consider everything from the point of view of the agent right now, before the updates, with current credence r. Let τ be the first time such that Cτ > r (this is defined with probability one by assumption). Since the credences are bounded in [0,1] and τ is almost surely finite, Doob’s Optional Sampling Theorem applies and E[Cτ] = r. But since Cτ > r almost surely, E[Cτ] > r, and we have a contradiction.

Thursday, April 13, 2023

Barn facades and random numbers

Suppose we have a long street with building slots officially numbered 0-999, but with the numbers not posted. At numbers 990–994 and 996–999, we have barn facades with no barn behind them. At all the other numbers, we have normal barns. You know all these facts.

I will assume that the barns are sufficiently widely spaced that you can’t tell by looking around where you are on the street.

Suppose you find yourself at #5 and judge that you are in front of a barn. Intuitively, you know you are in front of a barn. But if you find yourself at #995 and judge that you are in front of a barn, you are right, but you don’t know it, as you are surrounded by mere barn facades.

At least that’s the initial intuition (it’s a “safety” intuition in epistemology parlance). But note first that this intuition is based on an unstated assumption, that the buildings are numbered in order. Suppose, instead, that the building numbers were allocated by someone suffering from a numeral reversal disorder, so that, from east to west, the slots are:

  • 000, 100, 200, …, 900, 010, 110, 210, …, 999.

Then when you are at #995, your immediate neighborhood looks like:

  • 595, 695, 795, 895, 995, 006, 106, 206, 306.

And all these are perfectly normal barns. So it seems you know.

But why should knowledge depend on geometry? Why should it matter whether the numerals are apportioned east to west in standard order, or in the order going with the least-significant-digit-first reinterpretation?

Perhaps the intuition here is that when you are at a given number, you could “easily have been” a few buildings to the east or to the west, while it would have been “harder” for you to have been at one of the further away numbers. Thus, it matters whether you are geometrically surrounded by mere barn facades or not.

Let’s assume from now on that the buildings are arranged east to west in standard order: 000, 001, 002, …, 999, and you are at #995.

But how did you get there? Here is one possibility. A random number was uniformly chosen between 0 and 999, hidden from you, and you were randomly teleported to that number. In this case, is there a sense in which it was “easy” for you to have been assigned a neighboring number (say, #994)? That depends on details of the random selection. Here are four cases:

  1. A spinner with a thousand slots was spun.

  2. A ten-sided die (sides numbered 0-9) was rolled thrice, generating the digits in order from left to right.

  3. The same as the previous, except the digits were generated in order from right to left.

  4. A computer picked the random number by first accessing a source of randomness, such as the time, to the millisecond, at which the program was started (or timings of keystrokes or fine details of mouse movements). Then a mathematical transformation was applied to the initial random number, to generate a sequence of cryptographically secure pseudorandom numbers whose relationship to the initial source of randomness is quite complex, eventually yielding the selected number. The mathematical transformations are so designed that one cannot assume that when the inputs are close to each other, the outputs are as well.

In case 1, it is intuitively true that if you landed at #995, you could “easily have been” at 994 or 996, since a small perturbation in the input conditions (starting position of spinner and force applied) would have resulted in a small change in the output.

In case 2, you could “easily have been” at 990-994 or 996-999 instead of 995, since all of these would have simply required the last die roll to have been different. In case 3, it is tempting to say that you could easily have been at these neighboring numbers since that would have simply required the first die roll to have been different. But actually I think cases 2 and 3 are further apart than they initially seem. If the first die roll came out differently, likely rolls two and three would have been different as well. Why? Well, die rolls are sensitive to initial conditions (the height from which the die is dropped, the force with which it is thrown, the spin imparted, the initial position, etc.) If the initial conditions for the first roll were different for some reason, it is very likely that this would have disturbed the initial conditions for the second roll. And getting a different result for the first roll would have affected the roller’s psychological state, and that psychological state feeds in a complex way into the way they will do the second and third rolls. So in case 3, I don’t think we can say that you could “easily” have ended up at a neighboring number. That would have required the first die roll to be different, and then, likely, you would have ended up quite far off.

Finally, in case 4, a good pseudorandom number generator is so designed that the relationship between the initial source of randomness and the outputs is sufficiently messy that a slight change in the inputs is apt to lead to a large change in the outputs, so it is false that you could easily have ended up at a neighboring number—intuitively, had things been different, you wouldn’t have been any more likely to end up at 994 or 996 than at 123 or 378.
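To make the case 4 point vivid, here is a small sketch using a cryptographic hash as a stand-in for a cryptographically secure generator (the seeds and the reduction mod 1000 are my own illustrative choices):

```python
import hashlib

# Nearby seeds get mapped to scattered building numbers: a tiny change in the
# input source of randomness would not have landed you at a neighboring building.
def building_number(seed: int) -> int:
    digest = hashlib.sha256(str(seed).encode()).digest()
    return int.from_bytes(digest, "big") % 1000

for seed in range(1000, 1006):
    print(seed, building_number(seed))
```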

I think at this point we can’t hold on to the initial intuition that at #995 you don’t know you’re at a barn but at #5 you would have known without further qualifications about how you ended up where you are. Maybe if you ended up at #995 via the spinner and the left-to-right die rolls, you don’t know, but if you ended up there via the right-to-left die rolls or the cryptographically secure pseudorandom number generator, then there is no relevant difference between #995 and #5.

At this point, I think, the initial intuition should start getting destabilized. There is something rather counterintuitive about the idea that the details of the random number generation matter. Does it really matter for knowledge whether the building number you were transported to was generated right-to-left or left-to-right by die rolls?

Why not just say that you know in all the cases? In all the cases, you engage in simple statistical reasoning: of the 1000 building fronts, 991 are fronts of real barns and only 9 are mere facades, and it’s random which one is in front of you, so it is reasonable to think that you are in front of a real barn. Why should the neighboring buildings matter at all?

Perhaps it is this. In your reasoning, you are assuming you’re not in the 990-999 neighborhood. For if you realized you were in that neighborhood, you wouldn’t conclude you’re in front of a barn. But this response seems off-base for two reasons. First, by the same token you could say that when you are at #5, you are assuming you’re not in front of any of the buildings from the following set: {990, 991, 992, 993, 994, 5, 996, 997, 998, 999}. For if you realized you were in front of a building from that set, you wouldn’t have thought you are in front of a barn. But that’s silly. Second, you aren’t assuming that you’re not in the 990-999 neighborhood. For if you were assuming that, then your confidence that you’re in front of a real barn would have been the same as your confidence that you’re not in the 990-999 neighborhood, namely 0.990. But in fact, your confidence that you’re in front of a real barn is slightly higher than that, it is 0.991. For your confidence that you’re in front of a real barn takes into account the possibility that you are at #995, and hence that you are in the 990-999 neighborhood.

Monday, February 27, 2023

Species relativity of priors

  1. It would be irrational for us to assign a very high prior probability to the thesis that spiky teal fruit is a healthy food.

  2. If a species evolved to naturally assign a very high prior probability to the thesis that spiky teal fruit is a healthy food, it would not be irrational for them to do this.

  3. So, what prior probabilities are rational is species relative.

Tuesday, February 21, 2023

Continuity, scoring rules and domination

Pettigrew claimed, and Nielsen and I independently proved (my proof is here) that any strictly proper scoring rule on a finite space that is continuous on the probabilities has the domination property that any non-probability is strictly dominated in score by some probability.

An interesting question is how far one can weaken the continuity assumption. While I gave necessary and sufficient conditions, those conditions are rather complicated and hard to work with. So here is an interesting question: Is it sufficient for the domination property that the scoring rule be continuous at all the regular probabilities (those that assign non-zero values to every point) and finite?

I recently posted, and fairly quickly took down, a mistaken argument for a negative answer. I now have a proof of a positive answer. It took me way too long to get that positive answer, when in fact it was just a simple geometric argument (see Lemma 1).

Slightly more generally, what’s sufficient is that the scoring rule be continuous at all the regular probabilities as well as at every point where the score is infinite.

Thursday, February 2, 2023

Rethinking priors

Suppose I learned that all my original priors were consistent and regular but produced by an evil demon bent upon misleading me.

The subjective Bayesian answer is that since consistent and regular original priors are not subject to rational evaluation, I do not need to engage in any radical uprooting of my thinking. All I need to do is update on this new and interesting fact about my origins. I would probably become more sceptical, but all within the confines of my original priors, which presumably include such things as the conditional probability that I have a body given that I seem to have a body but there is an evil demon bent upon misleading me.

This answer seems wrong. So much the worse for subjective Bayesianism. A radical uprooting would be needed. It would be time to sit back, put aside preconceptions, and engage in some fallibilist version of the Cartesian project of radical rethinking. That project might be doomed, but it would be my only hope.

Now, what if, instead of the evil demon, I learned of a random process independent of truth as the ultimate origin of my priors? I think the same thing would be true. It would be a time to be brave and uproot it all.

I think something similar is true piece by piece, too. I have a strong moral intuition that consequentialism is false. But suppose that I learned that when I was a baby, a mad scientist captured me and flipped a coin with the plan that on heads a high prior in anti-consequentialism would be induced and on tails it would be a high prior in consequentialism instead. I would have to rethink consequentialism. I couldn’t just stick with the priors.

From strict anti-anti-Bayesianism to strict propriety

In my previous post, I showed that, given the technical assumption that the full algebra contains an event logically independent of the sub-algebra, a continuous anti-anti-Bayesian accuracy scoring rule on probabilities defined on a sub-algebra of events is proper. However, I couldn’t figure out how to prove strict propriety given strict anti-anti-Bayesianism. I still can’t, but I can get closer.

First, a definition. A scoring rule on probabilities on the sub-algebra H is strictly anti-anti-Bayesian provided that one expects it to penalize non-trivial binary anti-Bayesian updates. I.e., if A is an event whose probability according to one’s prior p is neither zero nor one, and Bayesian conditionalization on A (or, equivalently, on Ac) modifies the probability of some member of H, then the p-expected score of finding out whether A or Ac holds and conditionalizing on that is strictly better than the p-expected score of the procedure of conditionalizing on the complement of the actually obtaining event.

Suppose we have continuity, the technical assumption, and anti-anti-Bayesianism. My previous post shows that the scoring rule is proper. I can now show that it is strictly proper if we strengthen anti-anti-Bayesianism to strict anti-anti-Bayesianism and add the technical assumption that the scoring rule satisfies the finiteness condition that Eps(p) is finite for any probability p on H. Since we’re working with accuracy scoring rules and these take values in [−∞,M] for finite M, the only way to violate the finiteness condition is to have Eps(p) = −∞, which would mean that s is very pessimistic about p: by p’s own lights, the expected score of p is infinitely bad. The finiteness condition thus rules out such maximal pessimism.

Here is a sketch of the proof. Suppose we do not have strict propriety. Then there will be two distinct probabilities p and q such that Eps(p) ≤ Eps(q). By propriety, the inequality must be an equality. By Proposition 9 of a recent paper of mine, it follows that s(p) = s(q) everywhere (this is where the finiteness condition is used). Now let r = (p+q)/2. Using the trick from the Appendix here, we can find a probability p′ on the full algebra and an event Z such that r is the restriction of p′ to H, p is the restriction of the Bayesian conditionalization of p′ on Z to H, and q is the restriction of the Bayesian conditionalization of p′ on Zc to H. Then the scores of p and q will be the same, and hence the scores of Bayesian and anti-Bayesian conditionalization on finding out whether Z or Zc is actual are guaranteed to be the same, and this violates strict anti-anti-Bayesianism.

One might hope that this will help those who are trying to construct accuracy arguments for probabilism—the doctrine that credences should be probabilities. The hitch in those arguments is establishing strict propriety. However, I doubt that what I have helps. First, I am working in a sub-algebra setting. Second, and more importantly, I am working in a context where scoring rules are defined only for probabilities, and so the strict propriety inequality I get is only for scores of pairs of probabilities, while the accuracy arguments require strict propriety for pairs of credences exactly one of which is not a probability.

Wednesday, February 1, 2023

From anti-anti-Bayesianism to propriety

Let’s work in the setting of my previous post, including technical assumption (3), and also assume Ω is finite and that our scoring rules are all continuous.

Say that an anti-Bayesian update is when you take a probability p, receive evidence A, and make your new credence be p(⋅|Ac), i.e., you conditionalize on the complement of the evidence. Anti-Bayesian update is really stupid, and you shouldn’t get rewarded for it, even if all you care about are events other than A and Ac.

Say that an H-scoring rule s is anti-anti-Bayesian providing that the expected score of a policy of anti-Bayesian update on an event A whose prior probability is neither zero nor one is never better than the expected score of a policy of Bayesian update.
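As a sanity check that a standard strictly proper rule is anti-anti-Bayesian in this sense, here is a small sketch using the Brier score and a made-up four-point prior (all the numbers are illustrative):

```python
# Expected Brier accuracy score of Bayesian vs. anti-Bayesian update on an event A.
Omega = ["w1", "w2", "w3", "w4"]
prior = {"w1": 0.1, "w2": 0.2, "w3": 0.3, "w4": 0.4}
A = frozenset({"w1", "w2"})
Ac = frozenset(Omega) - A

def conditionalize(p, event):
    total = sum(p[w] for w in event)
    return {w: (p[w] / total if w in event else 0.0) for w in p}

def brier(q, actual):
    # Accuracy score of the credence function q when the world `actual` obtains.
    return -sum((q[w] - (1.0 if w == actual else 0.0)) ** 2 for w in q)

def expected_score(policy):
    # policy maps the learned event (A or Ac) to the credences then adopted
    return sum(prior[w] * brier(policy[A if w in A else Ac], w) for w in Omega)

bayesian = {A: conditionalize(prior, A), Ac: conditionalize(prior, Ac)}
anti_bayesian = {A: conditionalize(prior, Ac), Ac: conditionalize(prior, A)}

print(expected_score(bayesian), expected_score(anti_bayesian))
# The Bayesian policy gets the higher (less negative) expected score.
```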

I claim that given continuity, anti-anti-Bayesianism implies that the scoring rule is proper.

First, note that by continuity, if it’s proper at all the regular probabilities on H (ones that do not assign 0 or 1 to any event other than the empty set and the whole space), then it’s proper (I am assuming we handle infinities like in this paper, and use Lemma 1 there).

So all we need to do is show that it’s proper at all the regular probabilities on H. Let p be a regular probability, and contrary to propriety suppose that Eps(p) < Eps(q) for another probability q. For t ≥ 0, let pt be such that tq + (1−t)pt = p, i.e., let pt = (p − tq)/(1−t). Since p is regular, for t sufficiently small, pt will be a probability (all we need is that it be non-negative). Using the trick from the Appendix of the previous post with q in place of p1 and pt in place of p2, we can set up a situation where the Bayesian update will have expected score:

  • tEqs(q) + (1−t)Epts(pt)

and the anti-Bayesian update will have the expected score:

  • tEqs(pt) + (1−t)Epts(q).

Given anti-anti-Bayesianism, we must have

  • tEqs(pt) + (1−t)Epts(q) ≤ tEqs(q) + (1−t)Epts(pt).

Letting t → 0 and using continuity, we get:

  • Ep0s(q) ≤ Ep0s(p0).

But p0 = p, so this contradicts our assumption that Eps(p) < Eps(q). Thus we have propriety.