Showing posts with label symmetry. Show all posts

Thursday, November 13, 2025

Symmetric relations and logic

Suppose Alice and Bob are friends, and that friendship is a fundamental relation. Consider the facts expressed by these two sentences:

  1. Alice and Bob are friends.

  2. Bob and Alice are friends.

It is implausible that these are different facts. For if they were different facts, they would both be fundamental facts (otherwise, which of them would be the fundamental one?), and we would be multiplying fundamental facts beyond necessity—only one of the two is needed in the totality of fundamental facts.

Furthermore, I think that the propositions expressed by (1) and (2) are the same. Here’s one reason to think this. Imagine a three-dimensional written language where plural symmetric predicates like “are friends” are written (say, laser-inscribed inside a piece of glass) with “and are friends” on one horizontal layer, with “Alice” on a layer below the “and” and “Bob” on a layer above it. If (1) and (2) express different propositions, we would have to ask which of them is a better translation of the three-dimensional language. But surely there is no fact about that.

If this is right, then First Order Logic (FOL) fails to accurately represent propositions about fundamental relations, by having two atomic sentences, F(a,b) and F(b,a), where there is only one fundamental fact. Moreover, FOL will end up having non-trivial proofs whose conclusion expresses the same proposition as the premise, since we will presumably have an axiom like ∀x∀y(F(x,y)→F(y,x)) that lets us prove F(b,a) from F(a,b). This is not the only example of this phenomenon. Take the proof that ∀xF(x) follows from ∀yF(y), even though surely the two express the same proposition, namely that everything is F.

In particular, the logic of sentences appears to differ from the logic of propositions, since the proposition that Bob and Alice are friends follows by reiteration from the proposition that Alice and Bob are friends if they are the same proposition, but sentence (2) does not follow from sentence (1) by reiteration (nor is this true for the FOL versions).
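One way to picture the gap between sentences and propositions is a toy data model. This is only an illustration, not a proposal about what the logic of propositions should be: ordered tuples stand in for FOL atomic sentences, and unordered sets stand in for the symmetric facts they express. The helper names `sentence` and `proposition` are hypothetical.

```python
# Toy model: an ordered tuple stands in for an FOL atomic sentence,
# an unordered frozenset stands in for the symmetric fact it expresses.
# Both helper names are illustrative assumptions, not from the post.

def sentence(relation, x, y):
    """Ordered representation, like the FOL atom F(x, y)."""
    return (relation, x, y)

def proposition(relation, x, y):
    """Unordered representation: argument order carries no information."""
    return (relation, frozenset({x, y}))

s1 = sentence("friends", "Alice", "Bob")
s2 = sentence("friends", "Bob", "Alice")
p1 = proposition("friends", "Alice", "Bob")
p2 = proposition("friends", "Bob", "Alice")

print(s1 == s2)  # False: two distinct sentences
print(p1 == p2)  # True: one and the same proposition
```

On this picture, getting from `s1` to `s2` takes a rewriting step (the symmetry axiom), while getting from `p1` to `p2` is mere reiteration.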

If we think there is a One True Logic, it will then presumably be a logic of propositions rather than of sentences. But what it will be like is a difficult question, and to answer it we will need a worked-out theory of when we have the same proposition and when we have different ones.

Tuesday, October 15, 2024

More on full conditional probabilities and comparative probabilities

I claim that there is no general, straightforward and satisfactory way to define a total comparative probability with the standard axioms using full conditional probabilities. By a “straightforward” way, I mean something like:

  1. A ≲ B iff P(A − B|AΔB) ≤ P(B − A|AΔB)

or:

  2. A ≲ B iff P(A|A ∪ B) ≤ P(B|A ∪ B).

The standard axioms of comparative probability are:

  1. Transitivity, reflexivity and totality.

  2. Non-negativity: ⌀ ≲ A for all A.

  3. Additivity: If A ∪ B is disjoint from C, then A ≲ B iff A ∪ C ≲ B ∪ C.

A “straightforward” definition is one where the right-hand-side is some expression involving conditional probabilities of events definable in a boolean way in terms of A and B.

To be “satisfactory”, I mean that it satisfies some plausible assumptions, and the one that I will specifically want is:

  1. If P(A|C) < P(B|C) where A ∪ B ⊆ C, then A < B.

Definitions (1) and (2) are straightforward and satisfactory in the above-defined senses, but (1) does not satisfy transitivity while (2) does not satisfy the right-to-left direction of additivity.

Here is a proof of my claim. If the definition is straightforward, then if A ≲ B, and A′ and B′ are events such that there is a boolean algebra isomorphism ψ from the algebra of events generated by A and B to the algebra of events generated by A′ and B′ such that ψ(A) = A′, ψ(B) = B′ and P(C|D) = P(ψ(C)|ψ(D)) for all C and D in the algebra generated by A and B, then A′ ≲ B′.

Now consider a full conditional probability P on the interval [0,1] such that P(A|[0,1]) is equal to the Lebesgue measure of A when A is an interval. Let A = (0,1/4) and suppose B is either (1/4,1/2) or (1/4, 1/2]. Then there is an isomorphism ψ from the algebra generated by A and B to the same algebra that swaps A and B around and preserves all conditional probabilities. For the algebra consists of the eight possible unions of sets taken from among A, B and [0,1] − (A ∪ B), and it is easy to define a natural map between these eight sets that swaps A and B, and this will preserve all conditional probabilities. It follows from my definition of straightforwardness that we have A ≲ B if and only if we have B ≲ A. Since the totality axiom for comparative probabilities implies that either A ≲ B or B ≲ A, we must have both A ≲ B and B ≲ A. Thus A ∼ B. Since this is true for both choices of B, we have

  1. (0,1/4) ∼ (1/4,1/2) ∼ (1/4, 1/2].

But now note that ⌀ < {1/2} by (3) (just let A = ⌀, B = {1/2} and C = {1/2}). The additivity axiom then implies that (1/4,1/2) < (1/4, 1/2], a contradiction.
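Laid out step by step, the final contradiction runs as follows. This is a sketch; the abbreviations $B_1$ and $B_2$ are introduced here only for readability.

```latex
\begin{align*}
&A = (0,\tfrac14),\quad B_1 = (\tfrac14,\tfrac12),\quad
  B_2 = (\tfrac14,\tfrac12] = B_1 \cup \{\tfrac12\}.\\
&\text{Symmetry argument:}\quad A \sim B_1 \ \text{and}\ A \sim B_2,
  \ \text{so by transitivity}\ B_1 \sim B_2.\\
&\text{Satisfactoriness:}\quad
  P(\varnothing \mid \{\tfrac12\}) = 0 < 1 = P(\{\tfrac12\} \mid \{\tfrac12\}),
  \ \text{so}\ \varnothing < \{\tfrac12\}.\\
&\text{Additivity with } C = B_1:\quad
  \varnothing \lesssim \{\tfrac12\} \iff B_1 \lesssim B_2,
  \qquad \{\tfrac12\} \lesssim \varnothing \iff B_2 \lesssim B_1.\\
&\text{Hence}\ B_1 < B_2,\ \text{contradicting}\ B_1 \sim B_2.
\end{align*}
```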

I think that if we want to define a probability comparison in terms of conditional probabilities, what we need to do is to weaken the axioms of comparative probabilities. My current best suggestion is to replace Additivity with this pair of axioms:

  1. One-Sided Additivity: If A ∪ B is disjoint from C and A ≲ B, then A ∪ C ≲ B ∪ C.

  2. Weak Parthood Principle: If A and B are disjoint, then A < A ∪ B or B < A ∪ B.

Definition (2) satisfies the axioms of comparative probabilities with this replacement.

Here is something else going for this. In this paper, I studied the possibility of defining non-classical probabilities (full conditional, hyperreal or comparative) that are invariant under a group G of transformations. Theorem 1 in the paper characterizes when there are full conditional probabilities that are strongly invariant. Interestingly, we can now extend Theorem 1 to include this additional clause:

  vi. There is a transitive, reflexive and total relation ≲ that satisfies (4), (8) and (9) as well as the regularity assumption that ⌀ < A whenever A is non-empty, and that is invariant under G in the sense that gA ∼ A whenever both A and gA are subsets of Ω.

To see this, note that if there are strongly invariant full conditional probabilities, then (2) will define ≲ in a way that satisfies (vi). For the converse, suppose (vi) is true. We show that condition (ii) of the original theorem is true, namely that there is no non-empty paradoxical subset. For, to obtain a contradiction, suppose there is a non-empty paradoxical subset E. Then E can be written as the disjoint union of A_1, ..., A_n, and there are g_1, ..., g_n in G and 1 ≤ m < n such that g_1A_1, ..., g_mA_m and g_{m+1}A_{m+1}, ..., g_nA_n are each a partition of E.

A standard result for additive comparative probabilities in Krantz et al.’s measurement book is that if B_1, ..., B_n are disjoint, and C_1, ..., C_n are disjoint, with B_i ≲ C_i for all i, then B_1 ∪ ... ∪ B_n ≲ C_1 ∪ ... ∪ C_n. One can check that the proof only uses One-Sided Additivity, so it holds in our case. It follows from G-invariance that A_1 ∪ ... ∪ A_m ∼ E ∼ A_{m+1} ∪ ... ∪ A_n. Since E is the disjoint union of A_1 ∪ ... ∪ A_m with A_{m+1} ∪ ... ∪ A_n, this violates the Weak Parthood Principle.

Thursday, August 29, 2024

Three invariance arguments

Suppose we have two infinite collections of items Ln and Rn indexed by integers n, and suppose we have a total preorder ≤ on all the items. Suppose further the following conditions hold for all n, m and k:

  1. L_n > L_{n−1}

  2. R_n > R_{n+1}

  3. If L_n ≤ R_m, then L_{n+k} ≤ R_{m+k}.

Theorem: It follows that either Ln > Rm for all n and m, or Rn > Lm for all n and m.

(I prove this in a special case here, but the proof works for the general case.)
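For what it is worth, here is one way the argument can go. It is a reconstruction from conditions (1)–(3) and totality, written with explicit subscripts, and not necessarily the linked proof.

```latex
\begin{align*}
&\textbf{Case 1: } L_n \le R_m \text{ for some } n, m.
  \ \text{Fix any } a, b \text{ and choose } k \text{ with } k > a - n \text{ and } k \ge b - m.\\
&\quad L_a < L_{n+k} \quad\text{(by (1) repeatedly, since } a < n + k\text{)}\\
&\quad L_{n+k} \le R_{m+k} \quad\text{(by (3))}\\
&\quad R_{m+k} \le R_b \quad\text{(by (2) repeatedly, since } m + k \ge b\text{)}\\
&\quad\text{so } L_a < R_b \text{ for all } a, b.\\
&\textbf{Case 2: } L_n \le R_m \text{ for no } n, m.
  \ \text{By totality, } R_m \le L_n \text{ always, hence } L_n > R_m \text{ for all } n, m.
\end{align*}
```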

Here are three interesting applications. First, suppose that an integer X is fairly chosen. Let L_n be the event that X ≤ n and let R_n be the event that X ≥ n. Let our preorder be comparison of the probabilities of events: A ≤ B means that A is no more likely than B. Intuitively, it is less likely that X is at most n − 1 than that it is at most n, so we have (1), and similar reasoning gives (2). Claim (3) says that the relationship between L_n and R_m is the same as that between L_{n+k} and R_{m+k}, and that seems right, too.

So all the conditions seem satisfied, but the conclusion of the Theorem seems wrong. It just doesn’t seem right to think that all the left-ward events (X being less than or equal to something) are more likely than all the right-ward events (X being bigger than or equal to something), nor that it be the other way around.

I am inclined to conclude that countably infinite fair lotteries are impossible.

Second application. Suppose that for each integer n, a coin is tossed. Let Ln be the event that all the coins ..., n − 2, n − 1, n are heads. Let Rn be the event that all the coins n, n + 1, n + 2, ... are heads. Let ≤ compare probabilities in reverse: bigger is less likely. Again, the conditions (1)–(3) all sound right: it is less likely that ..., n − 2, n − 1, n are heads than that ..., n − 2, n − 1 are heads, and similarly for the right-ward events. But the conclusion of the theorem is clearly wrong here. The rightward all-heads events aren’t all more likely, nor all less likely, than the leftward ones.

I am inclined to conclude that all the Ln and Rn have equal probability (namely zero).

Third application. Suppose that there is an infinite line of people, all morally on par, standing on numbered positions one meter apart, with their lives endangered in the same way. Let L_n be the action of saving the lives of the people at positions ..., n − 2, n − 1, n and let R_n be the action of saving the lives of the people at positions n, n + 1, n + 2, .... Let ≤ measure moral worseness: A ≤ B means that A is at least as bad as B. Then intuitively we have (1) and (2): it is worse to save fewer people. Moreover, (3) is a plausible symmetry condition: if saving one group of people beats saving another group of people, shifting both groups by the same amount doesn’t change that comparison. But again the conclusion of the theorem is clearly wrong.

I am less clear on what to say. I think I want to deny the totality of ≤, allowing for cases of incommensurability of actions. In particular, I suspect that Ln and Rm will always be incommensurable.

Wednesday, February 28, 2024

More on benefiting infinitely many people

Once again let’s suppose that there are infinitely many people on a line infinite in both directions, one meter apart, on positions numbered in meters. Suppose all the people are on par. Fix some benefit (e.g., saving a life or giving a cookie). Let L_n be the action of giving the benefit to all the people to the left of position n. Let R_n be the action of giving the benefit to all the people to the right of position n.

Write A ≤ B to mean that action B is at least as good as action A, and write A < B to mean that A ≤ B but not B ≤ A. If neither A ≤ B nor B ≤ A, then we say that A and B are incomparable.

Consider these three conditions:

  • Transitivity: If A ≤ B and B ≤ C, then A ≤ C for any actions A, B and C from among the {L_k} and the {R_k}.

  • Strict monotonicity: L_n < L_{n+1} and R_n > R_{n+1} for all n.

  • Weak translation invariance: If L_n ≤ R_m, then L_{n+k} ≤ R_{m+k}, and if L_n ≥ R_m, then L_{n+k} ≥ R_{m+k}, for any n, m and k.

Theorem: If we have transitivity, strict monotonicity and weak translation invariance, then exactly one of the following three statements is true:

  i. For all m and n, L_m and R_n are incomparable.

  ii. For all m and n, L_m < R_n.

  iii. For all m and n, L_m > R_n.

In other words, if any of the left-benefit actions is comparable with any of the right-benefit actions, there is an overwhelming moral skew whereby either all the left-benefit actions beat all the right-benefit actions or all the right-benefit actions beat all the left-benefit actions.

Proposition 1 in this paper is a special case of the above theorem, but the proof of the theorem proceeds in basically the same way. For a reductio, assume that (i) is false. Then either L_m ≥ R_n or L_m ≤ R_n for some m and n. First suppose that L_m ≥ R_n. Then the second and third paragraphs of the proof of Proposition 1 show that (iii) holds. Now suppose that L_m ≤ R_n. Let L_k* = R_{−k} and R_k* = L_{−k}. Say that A ≤* B iff A* ≤ B*. Then transitivity, strict monotonicity and weak translation invariance hold for ≤*. Moreover, we have L_m ≤ R_n, so R_{−m} ≤* L_{−n}. Applying the previous case with −m and −n in place of n and m respectively, we conclude that we always have L_j >* R_k and hence that we always have L_j < R_k, i.e., (ii).

I suppose the most reasonable conclusion is that there is complete incomparability between the left- and right-benefit actions. But this seems implausible, too.

Again, I think the big conclusion is that human ethics has limits of applicability.

I hasten to add this. One might reasonably think (Ian suggested this in a recent comment) that decisions about benefiting or harming infinitely many people (at once) do not come up for humans. Well, that’s a little quick. To vary the Pascal’s Mugger situation, suppose a strange guy comes up to you on the street, and tells you that there are infinitely many people in a line drowning in a parallel universe, and asks you if you want him to save all the ones to the left of position 123 or all the ones to the right of position −11, because he can magically do either one, and nothing else, and he needs help in his moral dilemma. You are, of course, very dubious of what he is saying. Your credence that he is telling the truth is very, very small. But as any good Bayesian will tell you, it shouldn’t be zero. And now the decision you need to make is a real one.

Thursday, August 18, 2022

Non-uniqueness of "uniform" full conditional probabilities

Consider a fair spinner that uniformly chooses an angle between 0 and 360. Intuitively, I’ve just fully described a probabilistic situation. In classical probability theory, there is indeed a very natural model of this: Lebesgue probability measure on the unit circle. This model’s probability measure can be proved to be the unique function λ on the subsets of the unit circle that satisfies these conditions:

  1. Kolmogorov axioms with countable additivity

  2. completeness: if λ(B) is zero and A ⊆ B, then λ is defined for A

  3. rotational invariance

  4. at least one arc on the circle of length greater than zero and less than 360 has an assigned probability

  5. minimality: any other function that satisfies 1-4 agrees with λ on the sets where λ is defined.

In that sense “uniformly chooses” can be given a precise and unique meaning.

But we may be philosophically unhappy with λ as our probabilistic model of the spinner for one of two reasons. First, but less importantly, we may want to have meaningful probabilities for all subsets of the unit circle, while λ famously has “non-measurable sets” where it is not defined. Second, we may want to do justice to such intuitions as that it is more likely that the spinner will land exactly at 0 or 180 than that it will land exactly at 0. But λ as applied to any finite (in fact, any countable) set of positions yields zero: there is no chance of the spinner landing there. Moreover, we want to be able to update our probabilities on learning, say, that the spinner landed on 0 or 180—presumably, after learning that disjunction, we want 0 and 180 to have probability 1/2—but λ provides no guidance how to do that.

One way to solve this is to move to probabilities whose values are in some field extending the reals, say the hyperreals. Then we can assign a non-zero (but in some cases infinitesimal) probability to every subset of the circle. But this comes with two serious costs. First, we lose rotational invariance: it is easy to prove that we cannot have rotational invariance in such a context. Second, we lose uniqueness: there are many ways of assigning non-zero probabilities, and we know of no plausible set of conditions that makes the assignment unique. Both costs put in serious question whether we have captured the notion of “uniform distribution”, because uniformity sure sounds like it should involve rotational invariance and be the kind of property that should uniquely determine the probability model given some plausible assumptions like (1)–(5).

There is another approach for which one might have hope: use Popper functions, i.e., take conditional probabilities to be primitive. It follows from results of Armstrong and the supramenability of the group of rotations on the circle that there is a rotation-invariant (and, if we like, rotation and reflection invariant) finitely-additive full conditional probability on the circle, which assigns a meaningful real number to P(A|B) for any subsets A and B with B non-empty. Moreover, if Ω is the whole circle, then we can further require that P(A|Ω) = λ(A) if λ(A) is defined. And now we can compare the probability of two points and the probability of one point. For although P({x,y}|Ω) = λ({x,y}) = 0 = λ({x}) = P({x}|Ω) when x ≠ y, there is a natural sense in which {x, y} is more likely than {x} because P({x}|{x,y}) = 1/2.

Unfortunately, the conditional probability approach still doesn’t have uniqueness, and this is the point of this post. Let’s say that what we require of our conditional probability assignment P is this:

  6. standard axioms of finitely-additive full conditional probabilities

  7. (strong) rotational and reflection invariance

  8. being defined for all pairs of subsets of the circle with the second one non-empty

  9. P(A|Ω) = λ(A) for any Lebesgue-measurable A.

Unfortunately, these conditions fail to uniquely define P. In fact, they fail to uniquely define P(A|B) for countably infinite B.

Here’s why. Let E be a countably infinite subset of the circle with the following property: for any non-identity isometry ρ of the circle (combination of rotations and reflections), E ∩ ρE is finite. (One way to generate E is this. Let E_0 be any singleton. Given E_n, let G_n be the set of isometries ρ such that ρx = y for some x, y in E_n. Then G_n is finite. Let z be any point not in {ρx : ρ ∈ G_n, x ∈ E_n}. Let E_{n+1} = E_n ∪ {z} (since z is not unique, we’re using the Axiom of Dependent Choice, but a lot of other stuff depends on stronger versions of Choice anyway). Let E be the union of the E_n. Then it’s easy to see that E ∩ ρE contains at most one point for any non-identity isometry ρ.)

Let μ be any finitely additive probability on E that assigns zero to finite subsets. Note that μ is not unique: there are many such μ. Now define a finitely additive measure ν on Ω as follows. If A is uncountable, let ν(A) = ∞. Otherwise, let ν(A) = ∑_ρ μ(E ∩ ρA), where the sum is taken over all isometries ρ. The condition that E ∩ ρE is finite for non-identity ρ and that μ is zero for finite sets ensures that if A ⊆ E, then ν(A) = μ(A). It is clear that ν is isometrically invariant.

Let λ* be any invariant extension of Lebesgue measure to a finitely additive measure on all subsets of the circle. By Armstrong’s results (most relevantly Proposition 1.7), there is a full conditional probability P satisfying (6)–(8) and such that P(A|E) = μ(A ∩ E) and P(A|Ω) = λ*(A) (here we use the fact that ν(A) = ∞ whenever λ*(A) > 0, since λ*(A) > 0 only for uncountable A). Since μ wasn’t unique and E is countable, conditions (6)–(9) fail to uniquely define P for countably infinite conditions.

Friday, February 18, 2022

How not to value wagers

Given the Axiom of Choice, there is a rotationally invariant finitely additive probability measure defined for all subsets of a circle. We can use such a finitely additive probability measure to define an expected value Ef or integral of a bounded function f on the circle, and we might want to have a decision theory based on this expected value. Given a wager that pays f(z) at a uniformly randomly chosen location z on the circle, we are indifferent to buying the wager at price Ef, we must accept the wager at lower prices, and we must reject it at higher prices.

This procedure, however, leads to the following interesting thing: There will be bounded wagers that pay more than y no matter what, but where one is indifferent with respect to buying the wager at price y. To see this, let x be an irrational number, and as in my previous post, let u be a bounded function on the circle such that u(ρz) > u(z) for all z where ρ is rotation by x degrees. Then let f(z) = u(ρz) − u(z). Because of the additivity of integrals with respect to finitely additive measures and rotational invariance, we have Ef = ∫u(ρz)dP(z) − ∫u(z)dP(z) = 0. But f(z) > 0 for all z. So the decision theory tells us to be indifferent to the game where you get payoff f(z) at z when the game is offered for free, even though no matter what the outcome of the game, you will receive a strictly positive amount.

More generally, given the Axiom of Choice, there is no finitely-additive rotationally-invariant expected value assignment for bounded utilities that respects the principle that any gamble that is sure to pay more than y ought to be accepted at price y.

Wednesday, February 16, 2022

Domination and uniform spinners

About a decade ago, I offered a counterexample to the following domination principle:

  1. Given two wagers A and B, if in every state B is at least as good as A and in at least one state B is better than A, then one should choose B over A.

But perhaps (1) is not so compelling anyway. For it might be that it’s reasonable to completely ignore zero probability outcomes. If a uniform spinner is spun, and on A you get a dollar as long as the spinner doesn’t land at 90 and on B you get a dollar no matter what, then (1) requires you to go for B, but it doesn’t seem crazy to say “It’s almost surely not going to land at 90, so I’ll be indifferent between A and B.”

But now consider the following domination principle:

  2. Given two wagers A and B, if in every state B is better than A, then one should choose B over A.

This seems way more reasonable. But here is a potential counterexample. Consider a spinner which uniformly selects a point on the circumference of a circle. Assume x is any irrational number. Consider a function u such that u(z) is a real number for any z on the circumference of the circle. Imagine two wagers:

  • A: After the spinner is spun and lands at z, you get u(z) units of utility

  • B: After the spinner is spun, the spinner is moved exactly x degrees counterclockwise to yield a new landing point z′, and you get u(z′) units of utility.

Intuitively, it seems absurd to think that B could be preferable to A. But it turns out that given the Axiom of Choice, we can define a function u such that:

  3. For any z on the circumference of the circle, if z′ is the result of rotating z by x degrees counterclockwise around the circle, then u(z′) > u(z).

And then if we take the states to be the initial landing points of the spinner, B always pays strictly better than A, and so by the domination principle (2), we should (seemingly absurdly) choose B.

Remarks:

  • The proof of the existence of u requires the Axiom of Choice (for collections of countable sets of reals). In my Infinity book, I argued that this version of the Axiom of Choice is true. However, arguments similar to those in the book’s Axiom of Choice chapter suggest that the causal finitist has a good way out of the paradox by denying the implementability of the function u.

  • Some people don’t like unbounded utilities. But we can make sure that u is bounded if we want (if the original function u is not bounded, then replace u(z) by arctan u(z)).

  • Of course the function u is Lebesgue non-measurable. To see this, replacing u by its arctangent if necessary, we may assume u is bounded. If u were measurable and bounded, it would be integrable, and its Lebesgue integral around the circle would be rotation invariant, which is impossible given (3).

It remains to prove the existence of u. Let ∼ be the relation for points on the (circumference of the) circle defined by z ∼ z′ if the angle between z and z′ is an integer multiple of x degrees. This is an equivalence relation, and hence it partitions the circle into equivalence classes. Let A be a choice set that contains exactly one element from each of the equivalence classes. For any z on the circle, let z_0 be the point in A such that z_0 ∼ z. Let u(z) be the (unique!) integer n such that rotating z_0 counterclockwise around the circle by an angle of nx degrees yields z. (Uniqueness holds because x is irrational, so distinct integer multiples of x degrees give distinct rotations.) Then for any z, if z′ is the result of rotating z by x degrees counterclockwise around the circle, then u(z′) = u(z) + 1 > u(z) and so we have (3).
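The choice set itself cannot be computed, but the behaviour of u on a single equivalence class is easy to illustrate once a representative has been picked by hand. The sketch below assumes x = √2 degrees and a representative at 10 degrees. Both values, and the representation of a point as a pair `(z0, n)`, are assumptions made for illustration only.

```python
import math

X_DEGREES = math.sqrt(2)  # an irrational rotation angle (assumed for illustration)

# A point in the equivalence class of a representative z0 is stored as
# (z0, n), meaning "z0 rotated counterclockwise by n * x degrees".
# Because x is irrational, distinct n give distinct points, so n is unique.

def rotate(point):
    """Rotate a point by x degrees counterclockwise."""
    z0, n = point
    return (z0, n + 1)

def u(point):
    """The utility from the construction: the integer n."""
    _, n = point
    return n

def u_bounded(point):
    """Bounded variant of u, obtained by taking the arctangent."""
    return math.atan(u(point))

def angle(point):
    """Approximate position in degrees, for display only (floating point)."""
    z0, n = point
    return (z0 + n * X_DEGREES) % 360.0

# A finite window of one equivalence class, with hand-picked representative 10.0.
orbit = [(10.0, n) for n in range(-5, 6)]
for p in orbit:
    assert u(rotate(p)) == u(p) + 1            # wager B beats wager A by exactly 1
    assert u_bounded(rotate(p)) > u_bounded(p)  # and still beats it after bounding
```

Storing the integer n directly, rather than recovering it from a floating-point angle, sidesteps rounding issues. The `angle` helper is only there to show where a point sits on the circle.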

Thursday, October 28, 2021

Symmetric qualitative (and other) probabilities

I recently worked out the precise conditions under which one can have Popper functions, hyperreal probabilities or qualitative probabilities that are invariant under some group of symmetries and are regular in the sense that they assign a bigger probability to non-empty sets than to the empty set.

But what if we don’t require regularity? Then the following is mainly a matter of putting together known theorems:

Proposition. Suppose G is a group acting on set Ω* ⊇ Ω where Ω is non-empty. Then the following are equivalent:

  a. There is a finitely additive G-invariant real-valued probability measure on the powerset of Ω

  b. There is a finitely additive G-invariant hyperreal probability measure on the powerset of Ω

  c. There is a finitely additive approximately G-invariant hyperreal probability measure on the powerset of Ω

  d. There is a strongly G-invariant total qualitative probability ⪅ on the powerset of Ω such that ⌀ < Ω

  e. There is a strongly G-invariant partial qualitative probability ⪅ on the powerset of Ω such that ⌀ < Ω

  f. The set Ω is not G-paradoxical.

The definitions are in the paper I linked to at the top, except that approximate G-invariance only requires that P(A)−P(gA) be infinitesimal rather than requiring that it be zero.

Proof: Trivially, (a) implies (b) which implies (c). The standard part of a finitely additive approximately G-invariant hyperreal probability measure will be a finitely additive G-invariant real-valued probability measure, so (c) implies (a). Thus, (a)–(c) are equivalent.

Condition (a) implies condition (d): just define A ⪅ B iff P(A)≤P(B) where P is the measure in (a). And (d) implies (e) trivially.

Now we show that not-(f) implies not-(e). Suppose Ω is G-paradoxical, so Ω has disjoint subsets A and B with partitions A1, ..., Am and B1, ..., Bn respectively, and there are elements g1, ..., gm and h1, ..., hn of G such that g1A1, ..., gmAm and h1B1, ..., hnBn are each a partition of Ω. Then by a standard result on qualitative probabilities (use the proof of Krantz, et al., Lemma 5.3.1.2):

  1. A = A1 ∪ ... ∪ Am ≈ g1A1 ∪ ... ∪ gmAm = Ω

  2. B = B1 ∪ ... ∪ Bn ≈ h1B1 ∪ ... ∪ hnBn = Ω.

Since ⌀ < Ω, we have ⌀ < A by (1). By the proof of Corollary 5.3.1.2 in Krantz, et al., we have B < Ω iff ⌀ < Ω − B. But A ⊆ Ω − B, and ⌀ < A, so indeed we must have B < Ω, which contradicts (2).

Finally, Tarski’s Theorem says that (f) implies (a). □

Note 1: The two results from Krantz et al. are given for total qualitative probabilities, but the proofs do not use totality. (In the linked paper, I didn’t notice that Krantz et al. are working with total qualitative probabilities, but fortunately all works out.)

Note 2: There is a pleasing direct construction of a partial qualitative probability satisfying (d). For each A ⊆ Ω, let [A] be the corresponding member of the equidecomposability type semigroup. Then define A ⪅ B providing there is a c in the semigroup such that [A]+c ≤ [B]+c. It turns out that the condition ⌀ < Ω is then equivalent to 2[Ω]≠[Ω], i.e., is equivalent to the non-paradoxicality of Ω under G.

Saturday, April 17, 2021

Regular Hyperreal and Qualitative Probabilities Invariant Under Symmetries

I just noticed that my talk "Regular Hyperreal and Qualitative Probabilities Invariant Under Symmetries" is up on YouTube. And the paper that this is based on (preprint here) has just been accepted by Synthese.



Thursday, October 22, 2020

Preprint: Conditional, Regular Hyperreal and Regular Qualitative Probabilities Invariant Under Symmetries

Abstract: Classical countably additive real-valued probabilities come at a philosophical cost: in many infinite situations, they assign the same probability value---namely, zero---to cases that are impossible as well as to cases that are possible. There are three non-classical approaches to probability that can avoid this drawback: full conditional probabilities, qualitative probabilities and hyperreal probabilities. These approaches have been criticized for failing to preserve intuitive symmetries that can easily be preserved by the classical probability framework, but there has not been a systematic study of the conditions under which these symmetries can and cannot be preserved. This paper fills that gap by giving complete characterizations under which symmetries understood in a certain "strong" way can be preserved by these non-classical probabilities, as well as by offering some results to make it plausible that the strong notion of symmetry here may be the right one. Philosophical implications are briefly discussed, but the main purpose of the paper is to offer technical results to inform more sophisticated further philosophical discussion.

Preprint here.

Wednesday, October 7, 2020

Weak invariance of full conditional probabilities

In two papers (here and here), I explored two different concepts of symmetry for conditional probabilities. The concept of strong invariance says that P(gA|B)=P(A|B) for a symmetry g as long as A and gA are subsets of B. The concept of weak invariance says that P(gA|gB)=P(A|B) for a symmetry g. In some special cases, the weak concept implies the strong concept.

Anyway, here’s an interesting thing: the weak concept does not capture our symmetry intuitions. Take perhaps the simplest case, a lottery on the set of integers Z, and say that the symmetries are shifts. It turns out that there is a weakly shift-invariant full conditional probability P such that:

  1. P({m}|{m, n}) = P({n}|{m, n}) (singleton fairness)

  2. P(A|A ∪ B)=0 and P(B|A ∪ B)=1 whenever B has infinitely many positive integers and A has finitely many positive integers.

Condition (2) implies that it is more likely that the winning ticket is a power of two than that it is a negative integer. So weak shift invariance is very far from strong invariance.

(And in fact one can have strong invariance for the lottery on Z if one wants. One can even have strong invariance under shifts and reflections if one wants.)

The proof is a modification of West's proof of a result for qualitative probabilities.

Tuesday, September 8, 2020

The fairness of infinite lotteries and qualitative probabilities

Suppose that we wish to model an infinite fair lottery with tickets numbered by integers by means of qualitative probabilities, i.e., a reflexive and transitive relation ≲ between sets of tickets that satisfies the non-negativity constraint that ∅ ≲ A for all A and the additivity constraint that A ≲ B iff A − B ≲ B − A. Suppose, further, that we want to have the regularity constraint that ∅ < A if A is not empty.

At this point, we want to ask what “fairness” is. One proposal is that fairness is strong translation invariance: if A is a set of integers and n + A is the set {n + m : m ∈ A} of all the members of A shifted over by n, then A and n + A are equally probable. Unfortunately, if we require strong translation invariance, then we violate the regularity constraint, since we will have to assign the same probability to the winning ticket being in {1, 2, 3, ...} as to the winning ticket being in {2, 3, ...}, which (given additivity) violates the constraint that ∅ < {1}.
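The clash between strong translation invariance and regularity can be spelled out in a few lines, using only the additivity constraint as stated above:

```latex
\begin{align*}
&A = \{1, 2, 3, \dots\},\qquad 1 + A = \{2, 3, \dots\}.\\
&\text{Strong translation invariance:}\quad A \lesssim 1 + A.\\
&\text{Additivity:}\quad A \lesssim 1 + A
  \iff A - (1 + A) \lesssim (1 + A) - A
  \iff \{1\} \lesssim \varnothing.\\
&\text{Regularity requires } \varnothing < \{1\},\ \text{i.e. not } \{1\} \lesssim \varnothing.
\end{align*}
```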

One possible option that I’ve been thinking about is to require weak translation invariance. Weak translation invariance says that A ≲ B iff n + A ≲ n + B. Thus, a set might not have the same probability as a shift of itself, but comparisons between sets are not changed by shifts. I’ve spent a good chunk of the last week or two trying to figure out whether (given the other constraints) it is coherent to require weak translation invariance. Last night, Harry West gave an elegant affirmative proof on MathOverflow. So, yes, one can require weak translation invariance.

However, weak translation invariance does not capture the concept of fairness. Here is one reason why.

Say that a set B of integers is right-to-left (RTL) bigger than a set A of integers provided that there is an integer n such that:

  1. n ∈ B but not n ∈ A, and

  2. for every m > n, if m ∈ A, then m ∈ B.

RTL comparison of sets of integers thus always favors sets with larger integers. Thus, the set {2, 3} is RTL bigger than the infinite set {..., −3, −2, −1, 0, 1, 3}, because the former set has 2 in it while the latter does not.
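For finite sets the RTL comparison can be checked mechanically. Here is a minimal sketch, not taken from West's proof: the function name `rtl_bigger` is just an illustrative choice, and the infinite set from the example is cut off at an arbitrary lower bound. That cutoff is harmless here, since only the members of A above the witness n enter into the definition.

```python
def rtl_bigger(b, a):
    """Return True if the finite set b is RTL bigger than the finite set a.

    b is RTL bigger than a when some n lies in b but not in a, and every
    member of a above n also lies in b.
    """
    return any(
        n not in a and all(m in b for m in a if m > n)
        for n in b
    )


# The example from the text, with the infinite set cut off at -50.
# (The cutoff is an assumption: members below the witness n = 2 never matter.)
truncated = set(range(-50, 2)) | {3}
print(rtl_bigger({2, 3}, truncated))    # the two-element set versus the large one
print(rtl_bigger({5}, {1, 2, 3, 4}))    # one ticket versus all the smaller ones
```

Both comparisons come out in favor of the set containing the larger integer, which is the unfairness at issue.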

It looks to me that West’s proof straightforwardly adapts to show that there is a weakly translation invariant qualitative probability that coheres with RTL ordering: if B is RTL bigger than A, then B is strictly more likely than A. But a probability comparison that coheres with RTL ordering is about as far from fairness as we can imagine: a bigger ticket number is always more likely than a smaller one, and indeed each ticket number is more likely to be the winner than the disjunction of all the smaller ticket numbers!

So, weak translation invariance doesn’t capture the concept of fairness.

Here is a natural suggestion. Let’s add to weak translation invariance the following constraint: any two tickets are equally likely.

I think—but here I need to check more details—that a variant of West’s proof again shows that this won’t do. Say that a set B of integers is right-skewed (RS) at least as big as a set A of integers provided that one or more of the following holds:

  1. A is finite and B has at least as many members as A, or

  2. B has infinitely many positive integers and A does not, or

  3. A is a subset of B.

Intuitively, a probability ordering that coheres with RS ordering fails to be fair, because, for instance, it makes it more likely that the winning ticket will be, say, a power of two than that it be a negative number. But at the same time, a probability ordering that coheres with RS ordering makes all individual tickets be equally likely by (1).

To make this work with West’s proof, replace his C0 with the set of bounded functions that have a well-defined and non-negative sum or whose positive part has an infinite sum.

Monday, August 24, 2020

Invariance under independently chosen random transformations

Often, a probabilistic situation is invariant under some set of transformations, in the sense that the complete probabilistic facts about the situation are unchanged by the transformation. For instance, in my previous post I suggested that a sequence of fair coin flips should be invariant under the transformation of giving a pre-specified subset of the coins an extra turn-over at the end and I proved that we can have this invariance in a hyperreal model of the situation.

Now, a very plausible thesis is this:

Randomized Invariance: If a probabilistic situation S is invariant under each member of some set T of transformations, then it is also invariant under the process where one chooses a random member of T independently of S and applies that member to S.

For instance, in the coin flip case, I could choose a random reversing transformation as follows: I line up (physically or mentally) the infinite set of coins with an independent second infinite set of coins, flip the second set of coins, and wherever that flip results in heads, I reverse the corresponding coin in the first set.

By Randomized Invariance, doing this should not change any of the probabilities. But insisting on this case of Randomized Invariance forces us to abandon the idea that we should assign such things as an infinite sequence of heads a non-zero but infinitesimal probability. Here is why. Consider a countably infinite sequence of fair coins arranged equidistantly in a line going to the left and to the right. Fix a point r midway between two successive coins. Now, use the coins to the left of r to define the random reversing transformation for the coins to the right of r: if after all the coins are flipped, the nth coin to the left of r is heads, then I give an extra turn-over to the nth coin to the right of r.

According to Randomized Invariance, the probability that all the coins to the right of r will be tails after the random reversing transformations will be the same as the probability that they were all tails before it. Let p be that probability. Observe that after the transformations, the coins to the right of r are all tails if and only if before the transformations the nth coin to the right and the nth coin to the left showed the same thing (for we only get tails on the nth coin on the right at the end if we had tails there at the beginning and the nth coin on the left was tails, or if we had heads there at the beginning, but the heads on the nth coin to the left forced us to reverse it). Hence, p is also the probability that the corresponding coins to the left and right of r showed the same thing before the transformation.
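The key observation in this step, that the right-hand coins are all tails afterwards exactly when the paired coins matched beforehand, can be checked exhaustively in a finite truncation. The sketch below assumes n coins on each side of r (the real argument needs infinitely many), and the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def reverse_right(left, right):
    """Turn over the nth right coin whenever the nth left coin is heads.

    Coins are booleans, with True meaning heads.
    """
    return tuple(r != l for l, r in zip(left, right))


def all_tails_after_probability(n):
    """Exact probability that the right coins are all tails afterwards,
    with n fair coins on each side of r (a finite stand-in, an assumption)."""
    hits = 0
    total = 0
    for outcome in product([False, True], repeat=2 * n):
        left, right = outcome[:n], outcome[n:]
        tails_after = not any(reverse_right(left, right))
        # All tails afterwards if and only if the paired coins matched before.
        assert tails_after == (left == right)
        hits += tails_after
        total += 1
    return Fraction(hits, total)


print(all_tails_after_probability(3))
```

In the finite case nothing paradoxical happens: the result equals the probability that the n right-hand coins were all tails to begin with, as Randomized Invariance requires.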

Thus, we have shown that the probability that all the paired coins on the left and right equidistant to r are the same (i.e., we have a palindrome centered at r) is the same as the probability that we have only tails to the right of r. Now, apply the exact same argument with “right” and “left” reversed. We conclude that the probability that the coins on the right and left equidistant to r are always the same is the same as the probability that we have only tails to the left of r. Hence, the probability of all-tails to the left of r is the same as the probability of all-tails to the right of r.

And this argument does not depend on the choice of the midpoint r between two coins. But as we move r one coin to the right, the probability of all-tails to the right of r is multiplied by two (there is one less coin that needs to be tails) and the probability of all-tails to the left of r is multiplied by a half. And yet these numbers have to be equal as well by the above argument. Thus, 2p = p/2. The only way this can be true is if p = 0.

Therefore, Randomized Invariance, plus the thesis that all the non-random reversing transformations leave unchanged the probabilistic situation (a thesis made plausible by the fact that even with infinitesimal probabilities, we provably can have a model of the probabilities that is invariant under these transformations), shows that we must assign probability zero to all-tails, and infinitesimal probabilities are mistaken.

This is, of course, a highly convoluted version of Timothy Williamson’s coin toss argument. The reason for the added complexity is to avoid any use of shift-based transformations that may be thought to beg the question against advocates of non-Archimedean probabilities. Instead, we simply use randomized reversal symmetry.

Hyperreal modeling of infinitely many coin flips

A lot of my work in philosophy of probability theory has been devoted to showing that one cannot use technical means to get rid of certain paradoxes of infinite situations. As such, most of the work has been negative. But here is a positive result. (Though admittedly it was arrived at in the service of a negative result which I hope to give in a future post.)

Consider the case of a (finite or infinite, countable or not) sequence of independent fair coin flips. Here is an invariance feature we would like to have for our coin flips. Suppose that ahead of time, I designate a (finite or infinite) set of locations in the sequence. You then generate the sequence of independent fair coin flips, and I go through my pre-designated set of locations, and turn over each of the coins corresponding to that location. (For instance, if you will make a sequence of four coin flips, and I predesignate the locations 1 and 3, and you get HTTH, then after my extra flipping the sequence of coin flips becomes TTHH: I turned over the first and third coins.) The invariance feature we want is that no matter what set of locations I predesignate, it won’t affect the probabilistic facts about the sequence of independent fair coin flips.
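For a finite number of coins the turn-over operation and its invariance are easy to spell out. The following is only a sketch: the names `turn_over` and `distribution_after` are illustrative, sequences are strings of "H" and "T", and locations are numbered from 1 as in the example above.

```python
from collections import Counter
from itertools import product


def turn_over(flips, locations):
    """Turn over the coins at the given 1-based locations."""
    swap = {"H": "T", "T": "H"}
    return "".join(
        swap[c] if i in locations else c
        for i, c in enumerate(flips, start=1)
    )


def distribution_after(n, locations):
    """Count how often each length-n sequence arises after the turn-overs,
    starting from all 2**n equally likely sequences."""
    return Counter(
        turn_over("".join(seq), locations) for seq in product("HT", repeat=n)
    )


print(turn_over("HTTH", {1, 3}))
counts = distribution_after(4, {1, 3})
print(len(counts), set(counts.values()))
```

Each of the sixteen sequences arises exactly once after the turn-overs, so the uniform distribution on four flips is unchanged. This is the finite-case invariance mentioned in the next paragraph.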

This invariance feature is clearly present in finite cases. It is also present if “probabilistic facts” are understood according to classical countably-additive real-valued probability theory. But what if we have infinitely many coins, and we want to be able to do things like comparing the probability of all the coins being heads to all the even-numbered coins being heads, and say that the latter is more likely than the former, with both probabilities being infinitesimal? Can we still have our reversal-invariance property for all predesignated sets of locations?

There are analogous questions for other probabilistic situations. For instance, for a spinner, the analogous property is adding an extra predesignated rotation to the spinner once the spinner stops, and it is well-known that one cannot have such invariance in a context that gives us “enough” infinitesimal probabilities (e.g., see here for a strong and simple result).

But the answer is positive for the coin flip case: there is a hyperreal-valued probability defined for all subsets of the set of sequences (with fixed index set) of heads and tails that has the reversal-invariance property for every set of locations.

This follows from the following theorem.

Theorem: Assume the Axiom of Choice. Let G be a locally finite group (i.e., every finite subset generates a finite subgroup) and suppose that G acts on some set X. Then there is a hyperreal finitely additive probability measure P defined for all subsets of X such that P(gA)=P(A) for every A ⊆ X and g ∈ G and P(A)>0 for all non-empty A.

To apply this theorem to the coin-flip case, let G be the abelian group whose elements are sets of locations with the exclusive-or operation (i.e., A ⊕ B = (A − B)∪(B − A) is the set of all locations that are in exactly one of A and B). The identity is the empty set, and every element has order two (i.e., A ⊕ A = ∅). But for abelian groups, the condition that every finite subset generates a finite subgroup is equivalent to the condition that every element has finite order (i.e., some finite multiple of it is zero).
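The local finiteness of this group can be seen concretely for finite sets of locations. Here is a minimal sketch under that assumption (members of G may be infinite sets, which a Python set cannot represent); the function name `generated_subgroup` is illustrative. It uses the fact that in a group where every element has order two and the operation commutes, adjoining one more generator g to a subgroup S yields S together with the elements s ⊕ g for s in S.

```python
def generated_subgroup(generators):
    """Subgroup generated by finite sets of locations under exclusive-or."""
    subgroup = {frozenset()}  # the identity is the empty set
    for g in map(frozenset, generators):
        # Adjoining a generator of order two at most doubles the subgroup.
        subgroup |= {h ^ g for h in subgroup}
    return subgroup


sub = generated_subgroup([{1}, {2}, {1, 2}])
print(len(sub))
# Closure under the operation, and every element being its own inverse:
print(all(a ^ b in sub for a in sub for b in sub))
print(all(a ^ a == frozenset() for a in sub))
```

With k generators the subgroup has at most 2^k members, so every finite subset generates a finite subgroup.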

Mathematical notes: The subgroup condition on G in the Theorem entails that every element of G has finite order, but is stronger than that in the non-abelian case (due to the non-trivial fact that there are infinite finitely generated torsion groups). In the special case where X = G, the condition that every element of G have finite order is necessary for the theorem. For if g has infinite order, let A = {g^n : n ≥ 0}, and note that gA is a proper subset of A, so the condition that non-empty sets get non-zero measure and finite additivity would imply that P(gA)<P(A), which would violate invariance. It is an interesting question whether the condition that every finite subset generates a finite subgroup is also necessary for the Theorem if X = G.

Proof of Theorem: Let F be the partially ordered set whose elements are pairs (H, V) where H is a finite subgroup of G and V is a finite algebra of subsets of X closed under the action of H, with the partial ordering (H_1, V_1) ≼ (H_2, V_2) if and only if H_1 ⊆ H_2 and V_1 ⊆ V_2.

Given (H, V) in F, let B_V be the basis of V, i.e., a subset of pairwise disjoint non-empty elements of V such that every element of V is a union of (finitely many) elements of B_V. For A ∈ B_V and g ∈ H, note that gA is a member of V since V is closed under the action of H. Thus, gA = B_1 ∪ ... ∪ B_n for distinct elements B_1, ..., B_n in B_V. I claim that n = 1. For suppose n ≥ 2. Then g^{-1}B_1 ⊆ A and g^{-1}B_2 ⊆ A, and yet both g^{-1}B_1 and g^{-1}B_2 are members of V by H-closure. But since A is a basis element it follows that g^{-1}B_1 = A = g^{-1}B_2, and hence B_1 = B_2, a contradiction. Thus, n = 1 and hence gA ∈ B_V. Moreover, if gA = gB then A = B, so each member g of H induces a bijection of B_V onto itself.

Now let P_(H,V) be the probability measure on V that assigns equal probability to each member of B_V. Since each member of H induces a bijection of B_V onto itself, it’s easy to see that P_(H,V) is an H-invariant probability measure on V. And, for convenience, if A ∉ V, write P_(H,V)(A) = 0.

Let F* = {{B ∈ F : A ≼ B} : A ∈ F}. This is a nonempty set with the finite intersection property (it is here that we will use the fact that every finite subset of G generates a finite subgroup). Hence it can be extended to an ultrafilter U. This ultrafilter will be fine: {B ∈ F : A ≼ B} ∈ U for every A ∈ F. Let *R be the ultraproduct of the reals R over F with respect to U, i.e., the set of functions from F to R modulo U-equivalence. Given a subset A of X, let P(A) be the equivalence class of (H, V) ↦ P_(H,V)(A).

It is now easy to verify that P has all the requisite properties of a finitely-additive hyperreal probability that is invariant under G and assigns non-zero probability to every non-empty set.
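The finite-stage measures in the proof, the ones that give equal probability to each basis element of a finite H-closed algebra, can be computed directly in a toy case. The sketch below is only an illustration under assumptions of my own choosing: X is a small finite set, the finite group H is given as a list of permutations written as dictionaries, and the algebra V is the one generated by the H-images of some chosen sets. None of the function names come from the proof.

```python
from fractions import Fraction


def atoms(X, sets):
    """Basis of the finite algebra on X generated by `sets`:
    points are grouped by which of the generating sets contain them."""
    blocks = {}
    for x in X:
        signature = tuple(x in s for s in sets)
        blocks.setdefault(signature, set()).add(x)
    return {frozenset(block) for block in blocks.values()}


def image(g, A):
    """Image of the set A under the permutation g (a dict)."""
    return frozenset(g[x] for x in A)


def finite_stage_measure(X, H, generators):
    """Return (basis, P), where P is uniform on the basis of the algebra
    generated by all H-images of `generators`, and P(A) = 0 by convention
    when A is not a union of basis elements."""
    closed = [image(g, S) for g in H for S in generators]
    basis = atoms(X, closed)

    def P(A):
        A = frozenset(A)
        inside = [b for b in basis if b <= A]
        if frozenset().union(*inside) != A:
            return Fraction(0)  # A is not in the algebra
        return Fraction(len(inside), len(basis))

    return basis, P


X = range(4)
identity = {x: x for x in X}
swap = {0: 1, 1: 0, 2: 3, 3: 2}
basis, P = finite_stage_measure(X, [identity, swap], [{0}])
print(sorted(map(sorted, basis)))
print(P({0}), P(image(swap, {0})), P({2, 3}), P({2}))
```

Note that the measure is uniform over basis elements rather than over points of X: in this toy case the two-point set {2, 3} gets the same probability as each of the singletons {0} and {1}.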

Friday, July 17, 2020

Symmetry, regularity and qualitative probability

Let ⪅ be a qualitative probability comparison for some collection F of subsets of a space Ω. Say that A ≈ B iff A ⪅ B and B ⪅ A, and that A < B provided that A ⪅ B but not B ⪅ A. Minimally suppose that ⪅ is a partial preorder (i.e., transitive and reflexive). Say it’s total provided that for all A and B either A ⪅ B or B ⪅ A. Suppose that G is a group of symmetries acting on Ω, and that F is G-invariant in the sense that gA ∈ F for all A ∈ F and g ∈ G. Then we can define:

  1. ⪅ is strongly G-invariant provided that for all A in F and all g in G we have A ≈ gA, and

  2. ⪅ is weakly G-invariant provided that for all A and B in F and all g in G we have A ⪅ B iff gA ⪅ gB.

There is some reason to be suspicious of strong G-invariance. For in some interesting cases, say where Ω is a circle and G is the set of all rotations, there will be cases where gA is a proper subset of A, and by regularity we would expect to have gA < A rather than gA ≈ A. But weak G-invariance seems harder to question.

Say that g ∈ G is of order n provided that g^n = e, where e is the identity. However, we also have:

Lemma 1. If ⪅ is total, g is of order 2, and A ⪅ B implies gA ⪅ gB for all A and B, then A ≈ gA for all A.

Proof: Since ⪅ is total, either A ⪅ gA or gA ⪅ A. Suppose A ⪅ gA. Then gA ⪅ g^2A. But g^2A = A, so gA ⪅ A and hence gA ≈ A. Similarly, if gA ⪅ A, then g^2A ⪅ gA. But g^2A = A, so A ⪅ gA and hence gA ≈ A.

So, we have:

Proposition 1. If ⪅ is total and G is a group generated by elements of order 2, then weak G-invariance entails strong G-invariance.

Say that ⪅ is strongly regular provided that if A is a proper subset of B, then A < B. Weak regularity would say that if B is non-empty then ∅ < B. Weak regularity together with an appropriate additivity condition will imply strong regularity (details left to the reader).

Proposition 2. If G is generated by elements of order 2, and ⪅ is total and weakly G-invariant, then if there exists a g ∈ G and A ∈ F such that gA is a proper subset of A, then ⪅ is not strongly regular.

Proof: Strong regularity would require that gA < A, but that would contradict strong G-invariance which we have by Proposition 1.

Corollary 1. If F is a collection of subsets of the unit circle containing all countable sets and invariant under all reflections, and ⪅ is a total qualitative probability comparison weakly invariant under all reflections, then ⪅ is not strongly regular.

Proof: The group generated by all reflections includes all rotations. But there is a subset A of the circle and a rotation g such that gA is a proper subset of A. For instance, let A be the set of points at angles r, 2r, 3r, ... in degrees, where r is irrational, and let g be rotation by r. Then gA is the set of points at angles 2r, 3r, 4r, ... in degrees.

Now, imagine an infinite line on which there are infinitely many evenly spaced people, stretching out in both directions, each of whom flips a fair coin. Let Ω be the probability space describing these flips. Let H_n be the event that all the flips starting with person number n (i.e., n, n + 1, n + 2, ...) land heads. Suppose that F contains all the H_n and is invariant under all reflections of the situation (where we reflect the setup either about the point at which some person stands or at a point half-way between two neighboring people).

Corollary 2. If ⪅ is a total qualitative probability comparison weakly invariant under all reflections, then ⪅ is not strongly regular.

Proof: Let G be the group generated by the reflections. This group contains all translations. A non-trivial translation of H_n will either be a proper subset or a proper superset of H_n, depending on the direction of the translation. So by Proposition 2 we cannot have strong regularity.

Bibliographic note: Lemma 1 and Corollary 2 are analogous to Lemma 2 and Theorem 4 of this paper.

Tuesday, July 14, 2020

Regularity and rotational invariance

Suppose that we have some sort of (not merely real-valued) probability assignment P to the Lebesgue measurable subsets of the unit circle Ω.

Theorem: Suppose that the probability values are rotationally invariant (P(A)=P(ρA) for any rotation ρ) and satisfy the two axioms:

  1. If A and B are disjoint, A and C are disjoint, and P(B)=P(C), then P(A ∪ B)=P(A ∪ C)

  2. P(Ω − A)=P(Ω − B) if and only if P(A)=P(B).

Then P(A)=P(∅) for every singleton A.

In other words, we cannot have regularity (non-empty sets having different probability from empty sets) if we have the additivity-type condition (1), the complement condition (2) and rotational invariance.

Proof: Fix an irrational number r and let B be the set of points at angles in degrees r, 2r, 3r, .... Let x_0 be the point at angle 0. Then B and C = B ∪ {x_0} are rotationally equivalent (you get the former from the latter by rotating by r degrees). So, P(B)=P(C). Let A = Ω − C. Then A and B are disjoint as are A and C. Hence, P(A ∪ B)=P(A ∪ C) by axiom 1. But A ∪ C = Ω. So, P(A ∪ B)=P(Ω). But A ∪ B = Ω − {x_0} and Ω = Ω − ∅. So, P({x_0}) = P(∅) by axiom 2. But all singletons are rotationally equivalent so they all have the same measure.

This result is a variant of the results here.

Thursday, November 17, 2016

Against isotropy

We think of Euclidean space as isotropic: any two points in space are exactly alike both intrinsically and relationally, and if we rotated or translated space, the only changes would be to the bare numerical identities of the points; qualitatively everything would stay the same, both at the level of individual points and of larger structures.

But our standard mathematical models of Euclidean space are not like that. For instance, we model Euclidean space on the set of triples (x, y, z) of real numbers. But that model is far from isotropy. For instance, some points, like (2, 2, 2) have the property that all three of their coordinates are the same, while others like (2, 3, 2) have the property that they have exactly two coordinates that are the same, and yet others like (3, 1, 2) have the property that their coordinates are all different.

Even in one-dimension, say that of time, when we represent the dimension by real numbers we do not have isotropy. For instance, if we start with the standard set-theoretic construction of the natural numbers as

0 = ∅, 1 = {0}, 2 = {0, 1}, 3 = {0, 1, 2}, ...

and ensure that the natural numbers are a subset of the reals, then 0 will be qualitatively very different from, say, 3. For instance, 0 has no members, while 3 has three members. (Perhaps, though, we do not embed the set-theoretic natural numbers into the reals, but make all reals—including those that are natural—into Dedekind cuts. But we will still have qualitative differences, just buried more deeply.)
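The qualitative differences between these set-theoretic numbers are easy to exhibit. Here is a small sketch that builds the first few of them as nested frozensets; the function name `von_neumann` is an illustrative choice.

```python
def von_neumann(n):
    """Return the set-theoretic natural number n as a nested frozenset,
    following 0 = {} and k + 1 = k together with {k}."""
    current = frozenset()
    for _ in range(n):
        current = current | {current}
    return current


zero, one, three = von_neumann(0), von_neumann(1), von_neumann(3)
print(len(zero), len(three))   # how many members each number has
print(zero in three)           # 0 is a member of 3
print(one < three)             # 1 is a proper subset of 3
```

So on this construction numbers differ in how many members they have, and smaller numbers are both members and proper subsets of larger ones. These are exactly the sorts of features that we ignore when we use the numbers to model time.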

The way we handle this in practice is that we ignore the mathematical structure that is incompatible with isotropy. We treat the Cartesian coordinate structure of Euclidean space as a mere aid to computation, while the set-theoretic construction of the natural numbers is ignored completely. Imagine the look of incomprehension we’d get from a scientist if one said something like: “At a time t2, the system behaved thus-and-so, because at a time t1 that is a proper subset of t2, it was arranged thus-and-so.” Times, even when represented mathematically as real numbers, just don’t seem the sort of thing to stand in subset relations. But on the Dedekind-cut construction of real numbers, an earlier time is indeed a proper subset of a later time.

But perhaps there is something to learn from the fact that our best mathematical models of isotropic space and time themselves lack true isotropy. Perhaps true isotropy cannot be achieved. And if so, that might be relevant to solving some problems.

First, probabilities. If a particle is on a line, and I have no further information about it, then if the line is truly isotropic, so should my probabilities for the particle’s position be. But that cannot be coherently modeled in classical (countably additive and normalized) probabilities. This is just one of many, many puzzles involving isotropy. Well, perhaps there is no isotropy. Perhaps points differ qualitatively. These differences may not be important to the laws of nature, but they may be important to the initial conditions. Perhaps, for instance, nature prefers the particles to start out at coordinates that are natural numbers.

Second, the Principle of Sufficient Reason. Leibniz argued against the substantiality of space on the grounds that there could be no explanation of why things are where they are rather than being shifted or rotated by some distance. But that assumed real isotropy. But if there is deep anisotropy, there could well be reasons for why things are where they are. Perhaps, for instance, there is a God who likes to put particles at coordinates whose binary digits encode his favorite poems. Of course, one can get out of Leibniz’s own problem by supposing with him that space is relational. But if the relation that constitutes space is metric, then the problem of shifts and rotations can be replaced by a problem of dilation—why aren’t objects all 2.7 times as far apart as they are? Again, that problem assumes that there isn’t a deep qualitative structure underneath numbers.

Friday, July 3, 2015

Symmetries in laws

Theists have often noticed that theism provides a nice aesthetically-based explanation for why we have simple laws, namely that such laws are beautiful and this gave God reason to enact them. (One can run this in two ways: (1) such laws are objectively beautiful, and God made them because of their objective beauty; (2) such laws are beautiful to us, and God created a world where the laws are beautiful to the intelligent creatures therein.)

Another interesting question about the fundamental laws is why they exhibit such nice symmetries. This question on its face seems independent of the question of why the laws are simple. You can have simple but asymmetric laws, and complex but symmetric ones. Again, an aesthetic theistic explanation seems to work well here (and again, it comes in two forms: either the symmetries are objectively beautiful or God made a world where the aesthetic properties of the laws fit with the aesthetic sensibilities of the intelligent creatures).

One might hope that symmetry considerations would thus allow one to run a teleological argument for the existence of God that escapes from the difficulty of making the notion of simplicity precise. However, while I think there is hope of a symmetry-based theistic argument, I don't think it escapes from the difficulties of theoretical simplicity. Any set of laws of nature that has an infinite space of solutions has an infinite number of symmetries: any bijection of the space of solutions onto itself is a symmetry. When we are excited by a potential symmetry like charge-parity-time invariance, we are excited by the fact that the symmetry can be specified in a simple way with respect to physically natural quantities. And if we can make sense of these twin notions (simplicity and physical naturalness), then we can likewise make sense of the notion of the simplicity of laws. So while a symmetry-based argument may provide additional evidence for the existence of God, it is subject to the same main difficulty as the simplicity of laws argument. (That said, I think this difficulty is not fatal.)

Tuesday, April 15, 2014

Popper functions, uniform distributions and infinite sequences of heads

Paper forthcoming in the Journal of Philosophical Logic, now posted. I argue that Popper functions don't solve the problems of uniform probabilities in infinite spaces. Yet another in a series of highly technical papers.

Tuesday, November 5, 2013

Invariance of Popper functions under symmetries

Popper functions are primitive finitely additive conditional probabilities, i.e., P(A|B) is the fundamental quantity, and P(A) is the defined quantity. Now, in some situations we expect our probabilities to be invariant under some group G of symmetries. For instance, if we're shooting an idealized dart at a circular target and aiming at the center, our idealized method of shooting might be rotationally invariant so that the probability of hitting some region A will be the same as the probability of hitting rA where r is some rotation about the center. (In real life, this need not be so. For instance, in archery, one might have bigger error in the vertical direction than in the horizontal direction, or vice versa, depending on one's skills.) We might also think that similarly there is invariance under reflections about lines through the center.

With unconditional probabilities, we can just formulate this invariance condition as: P(gA)=P(A) for all symmetries g in G and all (measurable) regions A. But how to formulate this for conditional probabilities?

There are two natural definitions:

  • P is weakly invariant if and only if P(A|B)=P(gA|gB) for all A, B and g.
  • P is strongly invariant if and only if (a) whenever A ∪ gA ⊆ B, we have P(A|B)=P(gA|B) and (b) whenever A ⊆ B ∩ gB, we have P(A|B)=P(A|gB).

Fact: Conditions (a) and (b) in the definition of strong invariance are equivalent.

Personally, I find weak invariance to be the more intuitive condition, though strong invariance has some intuitive pull. It's an interesting question how the two are related.

One interesting special case is where G is generated by symmetries of finite order. A symmetry g has finite order provided that there is a finite number n such that g^n is the identity, i.e., applying it n times gets you back to where you started. For instance, rotation by an angle of 360/n degrees where n is a non-zero integer has finite order: you do this |n| times and you're back where you started. And all reflections have finite order.

Fact: If every symmetry in G can be written as a combination of symmetries of finite order, then weak invariance implies strong invariance.

For instance, while most rotations in the plane don't have finite order (only ones by a rational-number angle do), any rotation in the plane can be generated by combining two reflections. Thus, in our circular target case, where we are looking at invariance under reflections and rotations, weak invariance implies strong invariance.
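The claim that any rotation is a combination of two reflections can be checked numerically with 2×2 matrices. The following is a rough sketch, with angles in radians rather than degrees and with helper names of my own choosing: it uses the standard matrix for reflection about the line through the origin at angle θ, and confirms that reflecting at angle b and then at angle a rotates by 2(a − b).

```python
import math


def reflection(theta):
    """Matrix of the reflection about the line through the origin at angle theta."""
    c, s = math.cos(2 * theta), math.sin(2 * theta)
    return ((c, s), (s, -c))


def rotation(phi):
    """Matrix of the counterclockwise rotation by angle phi."""
    c, s = math.cos(phi), math.sin(phi)
    return ((c, -s), (s, c))


def matmul(m, n):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def close(m, n, tol=1e-12):
    """Entrywise comparison up to floating-point error."""
    return all(abs(m[i][j] - n[i][j]) <= tol for i in range(2) for j in range(2))


a, b = 0.7, 0.2
print(close(matmul(reflection(a), reflection(b)), rotation(2 * (a - b))))
print(close(matmul(reflection(a), reflection(a)), rotation(0.0)))
```

The second check confirms that a reflection has order two. To obtain a rotation by a given angle φ, one can take any two reflection lines that are φ/2 apart.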

Armstrong in this paper claims that weak invariance implies strong invariance in general (Prop. 1.3). Unfortunately Armstrong's proof is incomplete. And well it might be. For yesterday I came up with a super-simple case showing:

Fact: Weak invariance does not imply strong invariance.

It would be interesting to characterize cases where weak invariance does imply strong invariance. Two general cases are known to me. One is where the symmetries are generated by symmetries of finite order. The second is where the conditional probabilities are defined by the ratio formula starting with a regular probability (one that assigns non-zero probability to each non-empty set).