
Monday, October 20, 2025

Another infinite dice game

Suppose infinitely many people independently roll a fair die. Before they get to see the result, they will need to guess whether the die shows a six or a non-six. If they guess right, they get a cookie; if they guess wrong, an electric shock.

But here’s another part of the story. An angel has considered all possible sequences of fair die outcomes for the infinitely many people, and defined the equivalence relation ∼ on the sequences, where α ∼ β if and only if the sequences α and β differ in at most finitely many places. Furthermore, the angel has chosen a set T that contains exactly one sequence from each ∼-equivalence class. Before anybody guesses, the angel is going to look at everyone’s dice and announce the unique member α of T that is ∼-equivalent to the actual die rolls.

Consider two strategies:

  1. Ignore what the angel says and say “not six” regardless.

  2. Guess in accordance with the unique member α: if α says you have six, you guess “six”, and otherwise you guess “not six”.

When the two strategies disagree for a person, there is a good argument that the person should go with strategy (1). For without the information from the angel, the person should go with strategy (1). But the information received from the angel is irrelevant to each individual x, because which ∼-equivalence class the actual sequence of rolls falls into depends only on rolls other than x’s. And following strategy (1) in repeats of the game results in one getting a cookie five out of six times on average.

However, if everyone follows strategy (2), then it is guaranteed that in each game only finitely many people get a shock and everyone else gets a cookie.
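
Here is a minimal Python sketch (an illustration of mine, not part of the post) of strategy (1) for a single player over many repeats. The angel's set T requires something like the Axiom of Choice and cannot be exhibited, so strategy (2) is not simulable.

```python
import random

# A sketch of mine, not part of the post: one player following strategy (1)
# ("not six") over many repeats.  The angel's choice set T cannot be written
# down explicitly, so strategy (2) is not simulated here.
trials = 100_000
cookies = sum(1 for _ in range(trials) if random.randint(1, 6) != 6)
print(cookies / trials)   # roughly 5/6, i.e. about 0.833
```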

This seems to be an interesting case where self-interest gets everyone to go for strategy (1), but everyone going for strategy (2) is better for the common good. There are, of course, many such games, such as Tragedy of the Commons or the Prisoner’s Dilemma, but what is weird about the present game is that there is no interaction between the players—each one’s payoff is independent of what any of the other players do.

(This is a variant of a game in my infinity book, but the difference is that the game in my infinity book only worked assuming a certain rare event happened, while this game works more generally.)

My official line on games like this is that their paradoxicality is evidence for causal finitism, which thesis rules them out.

Wednesday, September 4, 2024

Independent invariant regular hyperreal probabilities: an existence result

A couple of years ago I showed how to construct hyperreal finitely additive probabilities on infinite sets that satisfy certain symmetry constraints and have the Bayesian regularity property that every possible outcome has non-zero probability. In this post, I want to show a result that allows one to construct such probabilities for an infinite sequence of independent random variables.

Suppose first we have a group G of symmetries acting on a space Ω. What I previously showed was that there is a hyperreal G-invariant finitely additive probability assignment on all the subsets of Ω that satisfies Bayesian regularity (i.e., P(A) > 0 for every non-empty A) if and only if the action of G on Ω is “locally finite”, i.e.:

  • For any finitely generated subgroup H of G and any point x in Ω, the orbit Hx is finite.

Here is today’s main result (unless there is a mistake in the proof):

Theorem. For each i in an index set, suppose we have a group Gi acting on a space Ωi. Let Ω = ∏iΩi and G = ∏iGi, and consider G acting componentwise on Ω. Then the following are equivalent:

  1. there is a hyperreal G-invariant finitely additive probability assignment on all the subsets of Ω that satisfies Bayesian regularity and the independence condition that if A1, ..., An are subsets of Ω such that Ai depends only on coordinates from Ji ⊆ I, with J1, ..., Jn pairwise disjoint, then P(A1 ∩ ... ∩ An)=P(A1)...P(An)

  2. there is a hyperreal G-invariant finitely additive probability assignment on all the subsets of Ω that satisfies Bayesian regularity

  3. the action of G on Ω is locally finite.

Here, an event A depends only on coordinates from a set J just in case there is a subset A′ of ∏j ∈ JΩj such that A = {ω ∈ Ω : ω|J ∈ A′} (I am thinking of the members of a product of sets as functions from the index set to the union of the Ωi). For brevity, I will omit “finitely additive” from now on.

The equivalence of (2) and (3) is from my old result, and the implication from (1) to (2) is trivial, so the only thing to be shown is that (3) implies (1).

Example: If each group Gi is finite and of size at most N for a fixed N, then the local finiteness condition is met. (Each such group can be embedded into the symmetric group SN, and any power of a finite group is locally finite, so a fortiori its action is locally finite.) In particular, if all of the groups Gi are the same and finite, the condition is met. An example like that is where we have an infinite sequence of coin tosses, and the symmetry on each coin toss is the reversal of the coin.
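
Here is a small Python sketch (an illustration of mine, not part of the argument) of why this kind of example is locally finite: in a product of copies of the two-element reversal group, a subgroup generated by n elements has at most 2^n elements, so every orbit is finite. A finite index set stands in for the infinite one.

```python
import itertools
import random

# A sketch of mine, not part of the post: each G_i is the two-element
# coin-reversal group, so G = (Z/2)^I acts on coin configurations in {0,1}^I
# by flipping coordinates.  A subgroup generated by n elements has at most
# 2^n elements, so every orbit is finite -- the local finiteness condition.
I = 12
x = tuple(random.randint(0, 1) for _ in range(I))                          # a configuration
gens = [tuple(random.randint(0, 1) for _ in range(I)) for _ in range(3)]   # three generators

def act(g, omega):
    # componentwise action: flip coordinate i exactly when g[i] == 1
    return tuple(o ^ gi for o, gi in zip(omega, g))

# The subgroup generated by gens is the set of XOR-combinations of the generators.
orbit = set()
for mask in itertools.product([0, 1], repeat=len(gens)):
    g = tuple(0 for _ in range(I))
    for use, gen in zip(mask, gens):
        if use:
            g = tuple(a ^ b for a, b in zip(g, gen))
    orbit.add(act(g, x))

print(len(orbit), "<=", 2 ** len(gens))
```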

Philosophical note: The above gives us the kind of symmetry we want for each individual independent experiment. But intuitively, if the experiments are identically distributed, we will want invariance with respect to a shuffling of the experiments. We are unlikely to get that, because the shuffling is unlikely to satisfy the local finiteness condition. For instance, for a doubly infinite sequence of coin tosses, we would want invariance with respect to shifting the sequence, and that doesn’t satisfy local finiteness.

Now, on to a sketch of the proof that (3) implies (1). The proof uses a sequence of three reductions using an ultraproduct construction to cases exhibiting more and more finiteness.

First, note that without loss of generality, the index set I can be taken to be finite. For suppose it is infinite. For any finite partition K of I and any J ∈ K, let GJ = ∏i ∈ JGi and ΩJ = ∏i ∈ JΩi, with the obvious action of GJ on ΩJ. Then G is isomorphic to ∏J ∈ KGJ and Ω to ∏J ∈ KΩJ. So if we have the result for finite index sets, we get, for each such K, a regular hyperreal G-invariant probability on Ω that satisfies the independence condition in the special case where, for distinct i and j, at least one of Ji ∩ J and Jj ∩ J is empty for every J ∈ K. We then take an ultraproduct of these probability measures with respect to an ultrafilter on the partially ordered set of finite partitions of I ordered by fineness, and we get the independence condition in full generality.

Second, without loss of generality, the groups Gi can be taken as finitely generated. For suppose we can construct a regular probability that is invariant under H = ∏iHi where Hi is a finitely generated subgroup of Gi and satisfies the independence condition. Then we take an ultraproduct with respect to an ultrafilter on the partially ordered set of sequences of finitely generated groups (Hi)i ∈ I where Hi is a subgroup of Gi and where the set is ordered by componentwise inclusion.

Third, also without loss of generality, the sets Ωi can be taken to be finite, by replacing each Ωi with an orbit of some finite collection of elements under the action of the finitely generated Gi, since such orbits will be finite by local finiteness, and once again taking an appropriate ultraproduct with respect to an ultrafilter on the partially ordered set of sequences of finite subsets of Ωi closed under Gi ordered by componentwise inclusion. The Bayesian regularity condition will hold for the ultraproduct if it holds for each factor in the ultraproduct.

We have thus reduced everything to the case where I is finite and each Ωi is finite. The existence of the hyperreal G-invariant finitely additive regular probability measure is now trivial: just let P(A) = |A|/|Ω| for every A ⊆ Ω. (In fact, the measure is countably additive and not merely finitely additive, real and not merely hyperreal, and invariant not just under the action of G but under all permutations.)
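
For concreteness, here is a minimal Python sketch (mine, not from the post) of this final finite case with two coins, the two-element reversal group acting on each coordinate, and P(A) = |A|/|Ω|.

```python
from fractions import Fraction
from itertools import product

# A sketch of mine, not part of the post: the final finite case with two coins
# and P(A) = |A| / |Omega|.
Omega = list(product(['H', 'T'], repeat=2))

def P(A):
    return Fraction(len([w for w in Omega if w in A]), len(Omega))

def flip_first(w):
    # one of the componentwise symmetries: reverse the first coin
    return ({'H': 'T', 'T': 'H'}[w[0]], w[1])

A = {w for w in Omega if w[0] == 'H'}   # depends only on coordinate 0
B = {w for w in Omega if w[1] == 'T'}   # depends only on coordinate 1

assert P(A) == P({flip_first(w) for w in A})   # invariance under the group action
assert P(A & B) == P(A) * P(B)                 # independence across disjoint coordinate sets
assert all(P({w}) > 0 for w in Omega)          # Bayesian regularity
```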

Wednesday, October 19, 2022

More on independence

Suppose that I uniformly randomly choose a number x between 0, inclusive, and 1, exclusive. I then look at the bits b1, b2, ... after the binary point in the binary expansion x = 0.b1b2.... Each bit has equal probability 1/2 of being 0 or 1, and the bits are independent by the standard mathematical definition of independence.

Now, what I said is actually underspecified. For some numbers have two binary expansions. E.g., 1/2 can be written as 0.100000... or as 0.011111... (compare how in decimal we have 1/2 = 0.50000... = 0.49999...). So when I talk of “the” binary expansion, I need to choose one of the two. Suppose I do the intuitive thing, and consistently choose the expansion that ends with an infinite string of zeroes over the expansion that ends with an infinite string of ones.

This fine point doesn’t affect anything I said about independence, given the standard mathematical definition thereof. But there is an intuitive sense of independence in which we can now see that the bits are not independent. For instance, while each bit can be 1 on its own, it is impossible to have all the bits be 1 (this is actually impossible regardless of how I decided on choosing the expansion, because x = 1 is excluded), and indeed impossible to have all the bits be 1 from some point on. There is a very subtle dependence between the bits that we cannot define within classical probability, a dependence that would be lacking if we tossed an infinite number of "really" independent fair coins.
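
Here is a minimal Python sketch (mine, not from the post) of the bit extraction being described, using the convention that favors the expansion terminating in zeroes; the helper name bits is my own.

```python
import random

# A sketch of mine, not part of the post: extract the first k bits of the binary
# expansion of a uniform x in [0, 1), always using the expansion that terminates
# in zeroes (which is what repeated doubling produces).
def bits(x, k):
    out = []
    for _ in range(k):
        x *= 2
        b = int(x)      # the next bit after the binary point
        out.append(b)
        x -= b
    return out

samples = [bits(random.random(), 20) for _ in range(100_000)]
# each bit is (approximately) fair on its own ...
print(sum(s[0] for s in samples) / len(samples), sum(s[7] for s in samples) / len(samples))
# ... but no outcome can have every bit equal to 1 from some point on, since x = 1
# is excluded and the terminating expansion is always the one chosen.
```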

Wednesday, September 8, 2021

Two spinners and infinitesimal probabilities

Suppose you do two independent experiments, A and B, each of which uniformly generates a number in the interval I = [0, 1).

Here are some properties we would like to have on our probability assignment P:

  1. There is a value α such that P(A = x)=P(B = x)=α for all x ∈ I and P((A, B)=z)=α² for all z ∈ I².

  2. For every subset U of I² consisting of a finite union of straight lines, P((A, B)∈U) is well-defined.

  3. For any measurable U ⊆ I², if P((A, B)∈U|A = x)=y for all x ∈ I, then P((A, B)∈U)=y.

  4. For any measurable U ⊆ I², if P((A, B)∈U|B = x)=y for all x ∈ I, then P((A, B)∈U)=y.

  5. The assignment P satisfies the axioms of finitely additive probability with values in some real field.

Here is an interesting consequence. Let U consist of two line segments, one from (0, 0) to (1, 1/2) and the other from (0, 1/2) to (1, 1). Then every vertical line in I² intersects U in exactly two points. This is measurable by (2). It follows from (1) and (5) that P((A, B)∈U|A = x)=2α for all x ∈ I. Thus, P((A, B)∈U)=2α by (3). On the other hand, every horizontal line in I² meets U in exactly one point, so P((A, B)∈U|B = x)=α by (1) and P((A, B)∈U)=α by (4). Thus, 2α = α, and so α = 0.
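
Here is a tiny Python sketch (my illustration, not from the post) of the counting behind this: vertical sections of U have two points and horizontal sections have one.

```python
# A sketch of mine, not part of the post: U is the union of the graphs y = x/2 and
# y = 1/2 + x/2 over x in [0, 1), i.e. the two line segments described above.
def vertical_section(x):
    # the values of B on the vertical line A = x
    return [x / 2, 0.5 + x / 2]

def horizontal_section(y):
    # the values of A on the horizontal line B = y
    return [2 * y] if y < 0.5 else [2 * y - 1]

# Every vertical line meets U in two points and every horizontal line in one,
# which is what drives the 2α = α conclusion above.
assert all(len(set(vertical_section(k / 1000))) == 2 for k in range(1000))
assert all(len(set(horizontal_section(k / 1000))) == 1 for k in range(1000))
```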

In other words, if we require (1)-(5) to hold, then the probability of every single point outcome of either experiment must be exactly zero. In particular, it is not possible for the probability of a single point outcome to be a positive infinitesimal.

Cognoscenti of these kinds of arguments will recognize (3) and (4) as special cases of conglomerability, and are likely to say that we cannot expect conglomerability when dealing with infinitesimal probabilities. Maybe so: but (3) and (4) are only a special case of conglomerability, and they feel particularly intuitive to me, in that we are partitioning the sample space I² on the basis of the values of one of the two independent random variables that generate the sample space. The setup—say, two independent spinners—seems perfectly natural and unparadoxical, the partitions seem perfectly natural, and the set U to which we apply (3) and (4) is also a perfectly natural set, a union of two line segments. Yet even in this very natural setup, the friend of infinitesimal probabilities has to embrace a counterintuitive violation of (3) and (4).

Monday, December 7, 2020

Independence, spinners and infinitesimals

Say that a “spinner” is a process whose output is an angle from 0 (inclusive) to 360 (exclusive). Take as primitive a notion of uniform spinner. I don’t know how to define it. A necessary condition for uniformity is that every angle has the same probability, but this necessary condition is not sufficient.

Consider two uniform and independent spinners, generating angles X and Y. Consider a third “virtual spinner”, which generates the angle Z obtained by adding X and Y and wrapping to be in the 0 to 360 range (thus, if X = 350 and Y = 20, then Z = 10). This virtual spinner is intuitively statistically independent of each of X and Y on its own but not of both.
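
Here is a small Python sketch (my illustration, not from the post) using a discrete stand-in with 360 equally likely angles: it checks that Z is uniform and independent of X taken alone, but not of the pair (X, Y).

```python
from fractions import Fraction
from itertools import product

# A sketch of mine, not part of the post: a discrete stand-in with angles in
# {0, 1, ..., 359}, X and Y independent and uniform, and Z = (X + Y) mod 360.
n = 360
p = Fraction(1, n * n)   # probability of each (x, y) pair

def prob(event):
    # exact probability of an event given as a predicate on (x, y)
    return sum(p for x, y in product(range(n), repeat=2) if event(x, y))

assert prob(lambda x, y: (x + y) % n == 0) == Fraction(1, n)                  # Z is uniform
assert prob(lambda x, y: x == 0 and (x + y) % n == 0) == Fraction(1, n) ** 2  # Z independent of X
assert prob(lambda x, y: x == 0 and y == 0 and (x + y) % n == 0) == Fraction(1, n) ** 2
# The last probability is (1/n)^2, not (1/n)^3: Z is not independent of the pair (X, Y).
```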

Suppose we take the intuitive statistical independence at face value. Then:

  • P(Z = 0)P(X = 0)=P(Z = X = 0)=P(Y = X = 0)=P(Y = 0)P(X = 0),

where the second equality followed from the fact that if X = 0 then Z = 0 if and only if Y = 0. Suppose now that P(X = 0) is an infinitesimal α. Then we can divide both sides by α, and we get

  • P(Z = 0)=P(Y = 0).

By the same reasoning with X and Y swapped:

  • P(Z = 0)=P(X = 0).

We conclude that

  • P(X = 0)=P(Y = 0).

We thus now have an argument for a seemingly innocent thesis:

  1. Any two independent uniform spinners have the same probability of landing at 0.

But if we accept that uniform spinners have infinitesimal probabilities of landing at a particular value, then (1) is false. For suppose that X and Y are angles from two independent uniform spinners for which (1) is true. Consider a spinner whose angle is 2Y (wrapped to the [0, 360) range). This doubled spinner is clearly uniform, and independent of X. But its probability of yielding 0 is equal to the probability of Y being 0 or 180, which is twice the probability of Y being 0, and hence twice the probability of X being 0, in violation of (1) if P(X = 0)>0.

So, something has gone wrong for friends of infinitesimal probabilities. I see the following options available for them:

  2. Deny that Z = 0 has non-zero probability.

  3. Deny that Z is statistically independent of X as well as being statistically independent of Y.

I think (3) is probably the better option, though it strikes me as unintuitive. This option has an interesting consequence: we cannot independently rerandomize a spinner by giving it another spin.

The careful reader will notice that this is basically the same argument as the one here.

Wednesday, December 2, 2020

Another problem for infinitesimal probabilities

Here’s another problem with independence for friends of infinitesimal probabilities.

Let ..., X−2, X−1, X0, X1, X2, ... be an infinite sequence of independent fair coin tosses. For i = 0, 1, 2, ..., define Ei to be heads if Xi and X−1 − i are the same and tails otherwise.

Now define these three events:

  • L: X−1, X−2, ... are all heads

  • R: X0, X1, ... are all heads

  • E: E0, E1, ... are all heads.

Friends of infinitesimal probabilities insist that P(R) and P(L) are positive infinitesimals.

I now claim that E is independent of R, and the same argument will show that E is independent of L. This is because of this principle:

  1. If Y0, Y1, ... is a sequence of independent random variables, and f and g are functions such that f(Yi) and g(Yi) are independent of each other for each fixed i, then the sequences f(Y0),f(Y1),... and g(Y0),g(Y1),... are independent of each other.

But now let Yi = (Xi, X−1 − i). Then Y0, Y1, ... is a sequence of independent random variables. Let f(x, y)=x and let g(x, y) be heads if x = y and tails otherwise. Then it is easy to check that f(Yi) and g(Yi) are independent of each other for each fixed i. Thus, by (1), f(Y0),f(Y1),... and g(Y0),g(Y1),... are independent of each other. But f(Yi)=Xi and g(Yi)=Ei. So, X0, X1, ... and E0, E1, ... are independent of each other, and hence so are E and R.
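
Here is a short Python sketch (mine, not from the post) of the "easy to check" step: for a fixed i, the pair Yi = (Xi, X−1 − i) has four equally likely values, and f(Yi) and g(Yi) come out independent.

```python
from fractions import Fraction
from itertools import product

# A sketch of mine, not part of the post: for a fixed i, Y_i = (X_i, X_{-1-i}) takes
# four equally likely values; f picks the first toss, g records whether they agree.
outcomes = list(product(['H', 'T'], repeat=2))
p = Fraction(1, 4)

def f(y):
    return y[0]                              # f(Y_i) = X_i

def g(y):
    return 'H' if y[0] == y[1] else 'T'      # g(Y_i) = E_i

for a, b in product(['H', 'T'], repeat=2):
    joint = sum(p for y in outcomes if f(y) == a and g(y) == b)
    marginals = sum(p for y in outcomes if f(y) == a) * sum(p for y in outcomes if g(y) == b)
    assert joint == marginals                # f(Y_i) and g(Y_i) are independent
```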

The same argument shows that E and L are independent.

Write AB for the conjunction of A and B and note that EL, ER and RL are the same event—namely, the event of all the coins being heads. Then:

  2. P(E)P(L)=P(EL)=P(RL)=P(R)P(L)

Since friends of positive infinitesimals insist that P(R) and P(L) are positive infinitesimals, we can divide both sides by P(L) and get P(E)=P(R). The same argument with L and R swapped shows that P(E)=P(L). So, P(L)=P(R).

But now shift the sequence by one place: let Xi* = Xi+1 (so the starred sequence is the original sequence shifted by one), and define L* to be the event of X−1*, X−2*, … being all heads, and R* the event of X0*, X1*, … being all heads. The exact same argument as above will show that P(L*)=P(R*). But friends of infinitesimal probabilities have to say that P(R*)>P(R) and P(L*)<P(L), and so we have a contradiction if P(L)=P(R) and P(L*)=P(R*).

I think the crucial question is whether (1) is still true in settings with infinitesimal probabilities. I don’t have a great argument for it. It is, of course, true in classical probabilistic settings.

Monday, November 30, 2020

Independence, uniformity and infinitesimals

Suppose that a random variable X is uniformly distributed (in some intuitive sense) over some space. Then:

  1. P(X = y)=P(X = z) for any y and z in that space.

But I think something stronger should also be true:

  2. Let Y and Z be any random variables taking values in the same space as X, and suppose each variable is independent of X. Then P(X = Y)=P(X = Z).

Fixed constants are independent of X, so (1) follows from (2).

But if we have (2), and the plausible assumption:

  3. If X and Y are independent, then X and f(Y) are independent for any function f,

we cannot have infinitesimal probabilities. Here’s why. Suppose X and Y are independent random variables uniformly distributed over the interval [0, 1). Assume P(X = a) is infinitesimal for a in [0, 1). Then, so is P(X = Y).

Let f(x)=2x for x < 1/2 and f(x)=2x − 1 for 1/2 ≤ x. Then if X and Y are independent, so are X and f(Y). Thus:

  4. P(X = Y)=P(X = f(Y)).

Let g(x)=x/2 and let h(x)=(1 + x)/2. Then:

  5. P(Y = g(X)) = P(Y = X)

and

  6. P(Y = h(X)) = P(Y = X).

But now notice that:

  7. Y = g(X) if and only if X = f(Y) and Y < 1/2

and

  8. Y = h(X) if and only if X = f(Y) and 1/2 ≤ Y.

Thus:

  9. (Y = g(X) or Y = h(X)) if and only if X = f(Y)

and note that we cannot have both Y = g(X) and Y = h(X). Hence:

  10. P(X = Y)=P(X = f(Y)) = P(Y = g(X)) + P(Y = h(X)) = P(Y = X)+P(Y = X)=2P(X = Y).

Therefore:

  11. P(X = Y)=0,

which contradicts the infinitesimality of P(X = Y).

This argument works for any uniform distribution on an infinite set U. Just let A and B be a partition of U into two subsets of the same cardinality as U (this uses the Axiom of Choice). Let g be a bijection from U onto A and h a bijection from U onto B. Let f(x)=g−1(x) for x ∈ A and f(x)=h−1(x) for x ∈ B.
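
Here is a small Python sketch (my illustration, not from the post) checking, for the original [0, 1) case, the biconditionals relating f, g and h that the argument uses; the tolerance is only there for floating point.

```python
import random

# A sketch of mine, not part of the post: a numerical check of the biconditionals
# relating f, g and h in the original [0, 1) case.
def f(x):
    return 2 * x if x < 0.5 else 2 * x - 1   # the doubling map

def g(x):
    return x / 2                             # maps [0, 1) onto [0, 1/2)

def h(x):
    return (1 + x) / 2                       # maps [0, 1) onto [1/2, 1)

for _ in range(100_000):
    x = random.random()
    y = g(x)
    assert abs(f(y) - x) < 1e-12 and y < 0.5    # if Y = g(X) then X = f(Y) and Y < 1/2
    y = h(x)
    assert abs(f(y) - x) < 1e-12 and y >= 0.5   # if Y = h(X) then X = f(Y) and 1/2 <= Y
    y = random.random()
    x = f(y)
    recovered = g(x) if y < 0.5 else h(x)
    assert abs(recovered - y) < 1e-12           # if X = f(Y) then Y = g(X) or Y = h(X)
```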

Note: We may wish to restrict (3) to intuitively “nice” functions, ones that don’t introduce non-measurability. The functions in the initial argument are “nice”.

Friday, August 21, 2020

Complete Probabilistic Characterizations

Consider the concept of a complete probabilistic characterization (CPC) of an experiment. It’s a bit of a fuzzy concept, but we can get some idea about it. For instance, if I have a coin loaded in favor of heads, then saying that heads is more likely than tails is not a CPC. Minimally, the CPC will give exact numbers where the probabilities have exact numbers. But the CPC may go beyond giving numerical probabilities. For instance, if you toss infinitely many fair coins, the numerical probability that they are all heads is zero, as is the probability that all the even numbered ones are heads. But intuitively it is more likely that the even numbered ones are heads than that all of them are heads. If there is something to this intuition, the CPC will include the relevant information: it may do that by assigning different infinitesimal probabilities to the two events, or by giving conditional probabilities conditioned on various zero-probability events.

A deep question that has sometimes been discussed by philosophers of probability is what CPCs are like. Here are three prominent candidates:

  1. classical real-valued probabilities

  2. hyperreal probabilities assigning non-zero (but perhaps infinitesimal) probability to every possible event

  3. primitive conditional probabilities allowing conditioning on every possible event.

The argument against (1) and for (2) and (3) is that (1) doesn’t distinguish things that should be distinguished—like the heads case above. I want to offer an argument against (2) and (3), however.

Here is a plausible principle:

  4. If X and Y are measurements of two causally independent experiments, then the CPC of the pair (X, Y) is determined by the CPCs of X and Y together with the fact of independence.

If (4) is true, then a challenge for a defender of a particular candidate for CPC is to explain how the CPC of the pair is determined by the individual CPCs of the independent experiments.

In the case of (1), the challenge is easily met: the pair (X, Y) has as its probability measure the product of the probability measures for X and Y.

In the cases of (2) and (3), the challenge has yet to be met, and there is some reason to think it cannot be met. In this post, I will argue for this in the case of (2): the case of (3) follows from the details of the argument in the case of (2) plus the correspondence between Popper functions and hyperreal probabilities.

Consider the case where X and Y are uniformly distributed over the interval [0, 1]. By independence, we want the pair (X, Y) to have a hyperreal finitely additive probability measure P such that P(X ∈ A, Y ∈ B)=P(X ∈ A)P(Y ∈ B) for all events A and B. But it turns out that this requirement on P highly underdetermines P. In particular, it seems to be that for any positive real number r, we can find a hyperreal measure P such that P(X ∈ A, Y ∈ B)=P(X ∈ A)P(Y ∈ B) for all A and B, and such that P(X = Y)=rP(Y = 0). Hence, independence highly underdetermines what value P assigns to the diagonal X = Y as compared to the value it assigns to Y = 0.

Maybe some other conditions can be added that would determine the CPC of the pair. But I think we don’t know what these would be. As it stands, we don’t know how to determine the CPC of the pair in light of the CPC of the members of the pair, if CPCs are of type (2).

Thursday, August 13, 2020

Another simple way to see a problem with infinitesimal probabilities


Suppose I independently randomly and uniformly choose X and Y between 0 and 1, not including 1 but possibly including 0. Now in the diagram above, let the blue event B be that the point (X, Y) lies on one of the two blue line segments, and let the red event R be that it lies on one of the two red line segments. (The red event is the graph of the fractional part of 2x; the blue event is the reflection of this in the line y = x.) As usual, a filled circle indicates a point included and an unfilled circle indicates a point not included; the purple point at (0, 0) is in both the red and blue events.

It seems that B is twice as likely as R. For, given any value of X—see the dotted line in the diagram—there are two possible values of Y that put one in B but only one possible value of Y that puts one in R.

But of course the situation is completely symmetric between X and Y, and the above reasoning can be repeated with X and Y swapped to conclude that R is twice as likely as B.
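
Here is a tiny Python sketch (my illustration, not from the post) of the two counts: R is the graph of the fractional part of 2x, B is its reflection in the line y = x, and a vertical line meets B in two points but R in only one.

```python
# A sketch of mine, not part of the post: R is the graph of y = frac(2x) on [0, 1)^2
# and B is its reflection in y = x; count how often a fixed vertical line meets each.
def frac(t):
    return t - int(t)

def in_R(x, y):
    return abs(y - frac(2 * x)) < 1e-9

def in_B(x, y):
    return abs(x - frac(2 * y)) < 1e-9

x = 0.3
grid = [k / 10_000 for k in range(10_000)]
print([y for y in grid if in_B(x, y)])   # two values of Y: 0.15 and 0.65
print([y for y in grid if in_R(x, y)])   # one value of Y: 0.6
```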

Hmm.

Of course, there is no paradox in classical probability theory where we just say that the red and blue events have zero probability, and twice zero equals zero.

But if we have any probability theory that distinguishes different events that are classically of zero-probability and says things like “it’s more likely that Y is 0.2 or 0.8 than that Y is 0.2” (say because both events have infinitesimal probability, with one of these infinitesimals being twice as big as the other), then the above reasoning should yield the absurd conclusion that B is more likely than R and R is more likely than B.

Technically, there is nothing new in the above. It just shows that when we have a probability theory that distinguishes classically zero-probability events, that probability theory will fail conglomerability. I.e., we have to reject the reasoning that just because conditionally on any value of X it’s twice as likely that we’re in B as in R, therefore it’s twice as likely that we’re in B as in R. We already knew that conglomerability reasoning had to be rejected in such probability theories. But I think this is a really vivid way of showing the point, as this instance of conglomerability reasoning seems super plausible. And I think the vividness of it makes it clear that the problem doesn’t depend on any kind of weird trickery with strange sets, and that no mere technical tweak (such as moving to qualitative or comparative probabilities) is likely to get us out of it.

Tuesday, July 28, 2020

Independence and probability

Thesis: If we stick to real-numbered probabilities, genuine independence of events A and B cannot be defined in terms of any condition on the conditional probabilities P(X|Y), where X and Y are events that can be constructed from A and B by using the boolean operations of union, intersection and complement, even if the conditional probabilities are taken to be primitive rather than defined as ratios.

Argument: Suppose that three genuinely independent darts are thrown uniformly at the interval [0, 1], and consider the events:

  • A: the first dart hits [0, 1/2) or the third hits 1/2

  • B: the second dart hits [0, 1/2) or the third hits 1/2.

The events A and B are not genuinely independent. The third dart’s hitting 1/2 would guarantee that both events A and B happen. But it is easy to check that the conditional probabilities for any boolean combination of A and B are exactly the same as for the corresponding boolean combination of A′ and B′, where:

  • A′: the first dart hits [0, 1/2)

  • B′: the second dart hits [0, 1/2).

So, conditional probabilities can’t distinguish the non-genuinely independent pair A and B from the genuinely independent pair A′ and B′.
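
Here is a short Python sketch (my illustration, not from the post) of the "easy to check" claim: enumerating the atoms exactly, with P(third dart = 1/2) = 0, every boolean combination of A and B gets the same probability as the corresponding combination of A′ and B′, so any ratio-style conditional probability agrees too.

```python
from fractions import Fraction
from itertools import product

# A sketch of mine, not part of the post: enumerate the atoms generated by
# "first dart < 1/2", "second dart < 1/2" and "third dart = 1/2" exactly,
# using the fact that the third event has probability 0 for a uniform dart.
half, zero, one = Fraction(1, 2), Fraction(0), Fraction(1)
atoms = [((d1, d2, d3), half * half * (zero if d3 else one))
         for d1, d2, d3 in product([0, 1], repeat=3)]

def prob(event):
    return sum(pr for a, pr in atoms if event(a))

A  = lambda a: a[0] or a[2]   # first dart hits [0, 1/2) or third dart hits 1/2
B  = lambda a: a[1] or a[2]
A_ = lambda a: a[0]           # the primed events drop the third-dart disjunct
B_ = lambda a: a[1]

def table(X, Y):
    # probabilities of the four boolean atoms generated by X and Y
    return [prob(lambda a: X(a) and Y(a)), prob(lambda a: X(a) and not Y(a)),
            prob(lambda a: not X(a) and Y(a)), prob(lambda a: not X(a) and not Y(a))]

print(table(A, B) == table(A_, B_))   # True: no boolean combination distinguishes the pairs
```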

Nor should we mind this fact. For genuine independence is a concept about causation or rationality, while probabilities give us a statistical concept.

Monday, September 30, 2019

Classical probability theory is not enough

Here’s a quick argument that classical probability cannot capture all probabilistic phenomena even if we restrict our attention to phenomena where numbers should be assigned. Consider a nonmeasurable event E, maybe a dart hitting a nonmeasurable subset of the target, and consider a fair coin flip that is causally isolated from E. Let H and T be the heads and tails results of the flip. Then let A be this disjunctive event:

  • (E and H) or (not-E and T).

Intuitively, event A clearly has probability 1/2. If E happens, the probability of A is 1/2 (heads) and if E doesn’t happen, it’s also 1/2 (tails). (The argument uses finite conglomerability, but it is also highly intuitive.)

So a precise number should be assigned to A, namely 1/2. And ditto to H. But we cannot have these assignments in classical probability theory. For if we did that, then we would also have to assign a probability to the conjunction of H and A, which is equivalent to the conjunction of E and H. But we cannot assign a probability to the conjunction of E and H, because E and H are independent, and so we would have a precise probability for E, namely P(E)P(H)/P(H)=P(E&H)/P(H), contrary to the nonmeasurability of E.

Wednesday, May 16, 2018

Possibly giving a finite description of a nonmeasurable set

It is often assumed that one couldn’t finitely specify a nonmeasurable set. In this post I will argue for two theses:

  1. It is possible that someone finitely specifies a nonmeasurable set.

  2. It is possible that someone finitely specifies a nonmeasurable set and reasonably believes—and maybe even knows—that she is doing so.

Here’s the argument for (1).

Imagine we live in an uncountable multiverse where the universes differ with respect to some parameter V such that every possible value of V corresponds to exactly one universe in the multiverse. (Perhaps there is some branching process which generates a universe for every possible value of V.)

Suppose that there is a non-trivial interval L of possible values of V such that all and only the universes with V in L have intelligent life. Suppose that within each universe with V in L there runs a random evolutionary process, and that the evolutionary processes in different universes are causally isolated from each other.

Finally, suppose that for each universe with V in L, the chance that the first instance of intelligent life will be warm-blooded is 1/2.

Now, I claim that for every subset W of L, the following statement is possible:

  3. The set W is in fact the set of all the values of V corresponding to universes in which the first instance of intelligent life is warm-blooded.

The reason is that if some subset W of L were not a possible option for the set of all V-values corresponding to the first instance of intelligent life being warm-blooded, then that would require some sort of an interaction or dependency between the evolutionary processes in the different universes that rules out W. But the evolutionary processes in the different universes are causally isolated.

Now, let W be any nonmeasurable subset of L (I am assuming that there are nonmeasurable sets, say because of the Axiom of Choice). Then since (3) is possible, it follows that it is possible that the finite description “The set of values of V corresponding to universes in which the first instance of intelligent life is warm blooded” describes W, and hence describes a nonmeasurable set. It is also plainly compossible with everything above that somebody in this multiverse in fact makes use of this finite description, and hence (1) is true.

The argument for (2) is more contentious. Enrich the above assumptions with the added possibility that the people in one of the universes have figured out that they live in a multiverse such as above: one parametrized by values of V, with an interval L of intelligent-life-permitting values of V, with random and isolated evolutionary processes, and with the chance of intelligent life being warm-blooded being 1/2 conditionally on V being in L. For instance, the above claims might follow from particularly elegant and well-confirmed laws of nature.

Given that they have figured this out, they can then let “Q” be an abbreviation for “The set of all values of V corresponding to universes where the first instance of intelligent life is warm-blooded.” And they can ask themselves: Is Q likely to be measurable or not?

The set Q is a randomly chosen subset of L. On the standard (product measure) understanding of how to probabilistically make sense of this “random choice” of subset, the event of Q being nonmeasurable is itself nonmeasurable (see the Sawin answer here). However, intuitively we would expect Q to be nonmeasurable. Terence Tao shares this intuition (see the paragraph starting “Intuitively”). His reason for the intuition is that if Q were measurable, then by something like the Law of Large Numbers, we would expect the intersection of Q with a subinterval I of L to have a measure equal to half of the measure of I, which would be in tension with the Lebesgue Density Theorem. This reasoning may not be precisifiable mathematically, but it is intuitively compelling. One might also just have a reasonable and direct intuition that the nonmeasurability is the default among subsets, and so a “random subset” is going to be nonmeasurable.

So, the denizens of our multiverse can use these intuitions to reasonably conclude that Q is nonmeasurable. Hence, (2) is true. Can they leverage these intuitions into knowledge? That’s less clear to me, but I can’t rule it out.

Thursday, August 10, 2017

Uncountable independent trials

Suppose that I am throwing a perfectly sharp dart uniformly randomly at a continuous target. The chance that I will hit the center is zero.

What if I throw an infinite number of independent darts at the target? Do I improve my chances of hitting the center at least once?

Things depend on what size of infinity of darts I throw. Suppose I throw a countable infinity of darts. Then I don’t improve my chances: classical probability says that the union of countably many zero-probability events has zero probability.

What if I throw an uncountable infinity of darts? The answer is that the usual way of modeling independent events does not assign any meaningful probabilities to whether I hit the center at least once. Indeed, the event that I hit the center at least once is “saturated nonmeasurable”, i.e., it is nonmeasurable and every measurable subset of it has probability zero and every measurable superset of it has probability one.

Proposition: Assume the Axiom of Choice. Let P be any probability measure on a set Ω and let N be any non-empty event with P(N)=0. Let I be any uncountable index set. Let H be the subset of the product space ΩI consisting of those sequences ω that hit N, i.e., ones such that for some i we have ω(i)∈N. Then H is saturated nonmeasurable with respect to the I-fold product measure PI (and hence with respect to its completion).

One conclusion to draw is that the event H of hitting the center at least once in our uncountable number of throws in fact has a weird “nonmeasurable chance” of happening, one perhaps that can be expressed as the interval [0, 1]. But I think there is a different philosophical conclusion to be drawn: the usual “product measure” model of independent trials does not capture the phenomenon it is meant to capture in the case of an uncountable number of trials. The model needs to be enriched with further information that will then give us a genuine chance for H. Saturated nonmeasurability is a way of capturing the fact that the product measure can be extended to a measure that assigns any numerical probability between 0 and 1 (inclusive) one wishes. And one requires further data about the system in order to assign that numerical probability.

Let me illustrate this as follows. Consider the original single-case dart throwing system. Normally one describes the outcome of the system’s trials by the position z of the tip of the dart, so that the sample space Ω equals the set of possible positions. But we can also take a richer sample space Ω* which includes all the possible tip positions plus one more outcome, α, the event of the whole system ceasing to exist, in violation of the conservation of mass-energy. Of course, to be physically correct, we assign chance zero to outcome α.

Now, let O be the center of the target. Here are two intuitions:

  1. If the number of trials has a cardinality much greater than that of the continuum, it is very likely that O will result on some trial.

  2. No matter how many trials—even a large infinity—have been performed, α will not occur.

But the original single-case system based on the sample space Ω* does not distinguish O and α probabilistically in any way. Let ψ be a bijection of Ω* to itself that swaps O and α but keeps everything else fixed. Then P(ψ[A]) = P(A) for any measurable subset A of Ω* (this follows from the fact that the probability of O is equal to the probability of α, both being zero), and so with respect to the standard probability measure on Ω*, there is no probabilistic difference between O and α.

If I am right about (1) and (2), then what happens in a sufficiently large number of trials is not captured by the classical chances in the single-case situation. That classical probabilities do not capture all the information about chances is something we should already have known from cases involving conditional probabilities. For instance P({O}|{O, α}) = 1 and P({α}|{O, α}) = 0, even though O and α are on par.

One standard solution to conditional probability case is infinitesimals. Perhaps P({α}) is an infinitesimal ι but P({O}) is exactly zero. In that case, we may indeed be able to make sense of (1) and (2). But infinitesimals are not a good model on other grounds. (See Section 3 here.)

Thinking about the difficulties with infinitesimals, I get this intuition: we want to get probabilistic information about the single-case event that has a higher resolution than is given by classical real-valued probabilities but lower resolution than is given by infinitesimals. Here is a possibility. Those subsets of the outcome space that have probability zero also get attached to them a monotone-increasing function from cardinalities to the set [0, 1]. If N is such a subset, and it gets attached to it the function fN, then fN(κ) tells us the probability that κ independent trials will yield at least one outcome in N.

We can then argue that fN(κ) is always 0 or 1 for infinite κ. Here is why. Suppose fN(κ)>0. Then κ must be infinite, since if κ is finite then fN(κ)=1 − (1 − P(N))^κ = 0 as P(N)=0. Moreover, 1 − fN(κ + κ)=(1 − fN(κ))², since the probabilities of missing N in each of two independent blocks of κ trials multiply, and κ + κ = κ (assuming the Axiom of Choice), so that 1 − fN(κ)=(1 − fN(κ))², which implies that fN(κ) is zero or one. We can come up with other constraints on fN. For instance, if C is the union of A and B, then fC(κ) is the greater of fA(κ) and fB(κ).

Such an approach could help get a solution to a different problem, the problem of characterizing deterministic causation. To a first approximation, the solution would go as follows. Start with the inadequate story that deterministic causation is chancy causation with chance 1. (This is inadequate, because in the original dart-throwing case, the chance of missing the center is 1, but throwing the dart does not deterministically cause one to hit a point other than the center.) Then say that deterministic causation is chancy causation such that the failure event F is such that fF(κ)=0 for every cardinal κ.

But maybe instead of all this, one could just deny that there are meaningful chances to be assigned to events like the event of uncountably many trials missing or hitting the center of the target.

Sketch of proof of Proposition: The product space ΩI is the space of all functions ω from I to Ω, with the product measure PI generated by the product measures of cylinder sets. The cylinder sets are product sets of the form A = ∏i ∈ IAi such that there is a finite J ⊆ I such that Ai = Ω for i ∉ J, and the product measure of A is defined to be ∏i ∈ JP(Ai).

First I will show that there is an extension Q of PI such that Q(H)=0 (an extension of a measure is a measure on a larger σ-algebra that agrees with the original measure on the smaller σ-algebra). Any PI-measurable subset of H will then have Q measure zero, and hence will have PI-measure zero since Q extends PI.

Let Q1 be the restriction of P to Ω − N (this is still normalized to 1 as N is a null set). Let Q1I be the product measure on (Ω − N)I. Let Q be the measure on ΩI defined by Q(A)=Q1I(A ∩ (Ω − N)I). Consider a cylinder set A = ∏i ∈ IAi where there is a finite J ⊆ I such that Ai = Ω whenever i ∉ J. Then
Q(A)=∏i ∈ JQ1(Ai − N)=∏i ∈ JP(Ai − N)=∏i ∈ JP(Ai)=PI(A).
Since PI and Q agree on cylinder sets, by the definition of the product measure, Q is an extension of PI. Moreover, Q(H)=Q1I(H ∩ (Ω − N)I)=Q1I(∅)=0, as desired.

To show that H is saturated nonmeasurable, we now only need to show that any PI-measurable set in the complement of H must have probability zero. Let A be any PI-measurable set in the complement of H. Then A is of the form {ω ∈ ΩI : F(ω)}, where F(ω) is a condition involving only coordinates of ω numbered by a fixed countable set of indices from I (i.e., there is a countable subset J of I and a subset B of ΩJ such that F(ω) if and only if ω|J is a member of B, where ω|J is the restriction of ω to J). But no such condition can exclude the possibility that some coordinate of ω outside that countable set lies in N, unless the condition is entirely unsatisfiable, and hence no such set A lies in the complement of H, unless the set is empty. And that’s all we need to show.

Thursday, July 21, 2016

Divine aseity and light-weight Platonism

Here's a standard theistic argument against Platonism: If Platonism is true, then God is dependent on properties like divinity, goodness, omniscience and omnipotence. But God is not dependent on anything. So, Platonism is false.

I think it's worth noting that this argument only works given heavy-weight Platonism. The light- and heavy-weight Platonists agree that, at least if F is fundamental, x is F if and only if x instantiates Fness. But the heavy-weight Platonist adds the claim that if x is F, it is F because it instantiates Fness. The light-weight Platonist--van Inwagen is the most prominent example--makes no such explanatory claim.

Without the explanatory claim, the dependence argument for a conflict between Platonism and theism fails. For while it may be true on light-weight Platonism (assuming "is divine" is fundamental--something that Jon Jacobs at least will deny--or an abundant Platonism) that God is divine if and only if God instantiates divinity, we cannot conclude that God's being divine depends on God's instantiating divinity or on any other property. Indeed, the light-weight Platonist could (but does not have to) even make the opposite claim, that God instantiates divinity (or goodness, omniscience and omnipotence) because he is divine (and good, omniscient and omnipotent).

Of course, the aseity argument isn't the only reason to deny Platonism. God is the creator of everything other than himself, and that causes problems for properties, too.

Monday, February 22, 2016

Frequentism and explanation

This is really very obvious, and no doubt in the literature, but somehow hasn't occurred to me until now. Suppose that a fair coin is tossed an infinite number of times. Suppose, further, that in the first hundred tosses it lands heads about half the time. It's no mystery why it lands heads about half the time in the first hundred tosses: it's because the probability of heads is 1/2 (plus properties of the binomial distribution). But suppose frequentism is true. Then the reason the probability of heads is 1/2 is that the infinite sequence has a limiting proportion of heads of 1/2. Now consider these three statements:

  • A: approximately half of the first 100 tosses are heads
  • B: the limiting proportion of heads is 1/2
  • C: the limiting proportion of heads starting with the 101st toss is 1/2.
Then, C is statistically independent of A, as A depends on the first 100 tosses and C depends on the other tosses. Clearly, C has no explanatory power with respect to A. But B is logically equivalent to C (the first 100 tosses disappear in the limit). How can B explain A, then?

There are some gaps in the argument--explanation is hyperintensional, for instance. But I think the argument has a lot of intuitive force.

Tuesday, November 24, 2015

Dutch Books and infinite sequences of coin tosses

Suppose we have an infinite sequence of independent and fair coins. A betting portfolio is a finite list of subsets of the space of outcomes (heads-tails sequences) together with a payoff for each subset. Assume:

  1. Permutation: If a rational agent would be happy to pay x for a betting portfolio, and A is one of the subsets in the betting portfolio, then she would also be happy to pay x for a betting portfolio that is exactly the same but with A replaced by A*, where A* is isomorphic to A under a permutation of the coins.
  2. Equivalence: A rational agent who is happy to pay x for one betting scenario, will be willing to accept an equivalent betting scenario---one that is certain to give the same payoff for each outcome---for the same price.
  3. Great Deal: A rational agent will be happy to pay $1.00 for a betting scenario where she wins $1.25 as long as the outcome is not all-heads or all-tails.
Leveraging the Axiom of Choice and using the methods of the Banach-Tarski Paradox, one can then find two betting portfolios that the agent would be happy to accept for $1.00 each such that it is certain that if she accepts them both, she will win at most $1.25; hence she will accept a sure loss of at least $0.75. For details of a closely related result, see Chapter 6 in Infinity, Causation and Paradox (draft temporarily here).

So what to do? I think one should accept Causal Finitism, the doctrine that causal antecedents are always finite. Given Causal Finitism, one can't have a real betting scenario based on an infinite number of coin tosses. Moreover, the only known way to operationalize the use of the Axiom of Choice in the proof in a betting scenario also involves infinite causal dependencies.

Monday, April 20, 2015

Escaping infinitely many arrows

Suppose infinitely many thin arrows are independently shot at a continuous target, with hitting points uniformly distributed over the target. How many arrows would we need to shoot to make it likely that the center of the target has been hit?

Given finitely or countably infinitely many arrows, the probability that the center will be hit is zero. But what if there are as many arrows as points in the continuum? And what if there are more?

I don't know of a good mathematical model for these questions. Standard mathematical probability is defined up to sets of measure zero, and this makes it not useful for answering questions like this. Questions like this seem to make sense, nonetheless, thereby indicating a limitation of our mathematical models. But perhaps that is a mere seeming.

Monday, October 21, 2013

Popper functions and infinite sequences of heads

Williamson gave a lovely argument that infinitesimals can't capture the probability of an infinite sequence of heads in fair independent trials: Let Hn be the event that we have heads in each of the trials n,n+1,n+2,.... Then, P(H1)=(1/2)P(H2). But P(H1)=P(H2) since they're both just the probability of getting an infinite sequence of heads. Thus, P(H1)=(1/2)P(H1) and so P(H1) is zero, not infinitesimal.

It turns out that a somewhat similar result holds for Popper functions as well. For technical reasons, I need a bidirectionally infinite sequence of coin tosses, one for each integer (positive, zero or negative). Our probability space Ω of infinite sequences will then be the set of all functions from the integers Z to {H,T}. Let G be the group of transformations of Ω generated by reflections of Z. In other words, G is generated by the transformations Ra, where a is an integer or a half-integer, and (Ras)(n)=s(2a − n) for any sequence s in Ω.

Let F be any G-invariant field in Ω that contains all the Hn. A very plausible symmetry condition on the Popper function on Ω representing the double sequence of heads then is:

  1. For any g in G and any A and B in F, P(A|B)=P(gA|gB).
In other words, if we flip the sequences around, we don't change the probabilities. E.g., the probability of getting heads on tosses 2,3,4,5,... conditionally on B is the same as the probability of getting heads on tosses 1,0,-1,-2,... conditionally on R1.5B. This symmetry condition is related to Williamson's symmetry assumption that P(H2)=P(H1).

A second obvious condition is that the probability of getting heads on tosses 1,2,3,... given that one has heads on 2,3,... is equal to the probability of getting heads on toss 1, i.e., is 1/2:

  2. P(H1|H2)=1/2.

Theorem: There is no Popper function on F satisfying (1) and (2).

Wednesday, February 6, 2013

An argument for a version of the Axiom of Choice

This is an argument for the Axiom of Choice where the sets we're choosing from are all subsets of the real numbers. The argument needs the notion of really independent random processes. Real independence is not just probabilistic independence (if you're not convinced, read this). I don't know how to characterize real independence, but here is a necessary condition for it. If S is a collection of really independent processes producing outcomes, and for each s in S, Us is a non-empty subset of the range of s (where the range of a process here is all the outcomes that it can generate), then it is metaphysically possible that each member s of S produces an outcome in Us. (This need not hold for merely probabilistically independent processes.)

Now, let U be a set of disjoint non-empty subsets of the real numbers R. Let N be the cardinality of U. It is surely metaphysically possible to have N really independent random processes each of which has range R. For instance, one might have a multiverse with N universes, in each of which there is a random process that produces a particle at a normally distributed point from the emitter, and the outcome of the process can be taken to be the x-coordinate of where the particle is produced.

Now, there is a one-to-one correspondence between the members of U and the random processes. If r is one of the random processes, let Ur be the member of U that corresponds to it (after fixing one such correspondence). By real independence, it is metaphysically possible that for all r, the outcome of r is in Ur. Take a world w where this is the case. In that world, the set of outcomes of our processes will contain exactly one member from each member of U, and hence will be a choice set. But what sets of real numbers there are surely does not differ between worlds (I can imagine questioning this, though). So if in w there is a choice set, there actually is a choice set.

Granted, this only gives us the Axiom of Choice for subsets of the reals. But that's enough to generate the Banach-Tarski, Hausdorff and Vitali non-measurable sets. It's paradoxical enough.