
Friday, May 23, 2025

Hyperreal infinitesimal probabilities and definability

In order to assign non-zero probabilities to such things as a single ticket in an infinite fair lottery or hitting a specific point on a target with a uniformly distributed dart throw, some people have proposed using non-zero infinitesimal probabilities in a hyperreal field. Hajek and Easwaran criticized this on the grounds that we cannot mathematically specify a specific hyperreal field for the infinitesimal probability. If that were right, then even if there were hyperreal infinitesimal probabilities for such a situation, we would not be able to say what they are. But it’s not quite right: there is a hyperreal field that is “definable”, i.e., fully specifiable in the language of ZFC set theory (the Kanovei-Shelah construction).

However, for the Hajek-Easwaran argument against hyperreal infinitesimal probabilities to work, we don’t need the hyperreal field to be non-definable. All we need is that the pair (*R,α) be non-definable, where *R is a hyperreal field and α is the non-zero infinitesimal assigned to something specific (say, a single ticket or the center of the target).

But here is a fun fact, much of the proof of which comes from some remarks that Michael Nielsen sent me:

Theorem: Assume ZFC is consistent. Then ZFC is consistent with there not being any definable pair (*R,α) where *R is a hyperreal field and α is a non-zero infinitesimal in that field.

[Proof: Solovay showed that there is a model of ZFC in which every definable set of reals is Lebesgue measurable. But every free ultrafilter on the naturals, regarded as a subset of the space of all sets of naturals (which can be identified with the reals via binary expansions), is nonmeasurable. However, an infinite integer in a hyperreal field defines a free ultrafilter on the naturals—given an infinite integer M, say that a subset A of the naturals is a member of the ultrafilter iff |M| ∈ *A. And a non-zero infinitesimal defines an infinite integer—say, the floor of its reciprocal.]
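
To spell out the last two steps of the proof in symbols, here is a sketch in LaTeX (*A is the nonstandard extension of A, as in the proof; U_M is just a label for the induced family of sets):

```latex
\begin{align*}
M   &= \lfloor 1/|\alpha| \rfloor \in {}^{*}\mathbb{N} \setminus \mathbb{N}
    && \text{(infinite, since $\alpha$ is a non-zero infinitesimal)} \\
U_M &= \{\, A \subseteq \mathbb{N} : M \in {}^{*}A \,\}
    && \text{(an ultrafilter; free, since ${}^{*}A = A$ for finite $A$)}
\end{align*}
```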

Given the Theorem, without going beyond ZFC, we cannot count on being able to define a specific hyperreal non-zero infinitesimal probability for situations like a ticket in an infinite lottery or hitting the center of a target. Thus, if a friend of hyperreal infinitesimal probabilities wants to be able to define one, they must go beyond ZFC (ZFC plus constructibility will do).

Friday, January 10, 2025

Hyperreal worlds

In a number of papers, I argued against using hyperreal-valued probabilities to account for events that have zero probability but are nonetheless possible, such as a randomly thrown dart hitting the exact center of the target, by assigning such phenomena non-zero but infinitesimal probability.

But it is possible to accept all my critiques, and nonetheless hold that there is room for hyperreal-valued probabilities.

Typically, physicists model our world’s physics with a calculus centered on real numbers. Masses are real numbers, wavefunctions are functions whose values are pairs of real numbers (or, equivalently, complex numbers), and so on. This naturally fits with real-valued probabilities, for instance via the Born rule in quantum mechanics.

However, even if our world is modeled by the real numbers, perhaps there could be a world with similar laws to ours, but where hyperreal numbers figure in place of our world’s real ones. If so, then in such a world, we would expect to have hyperreal-valued probabilities. We could, then, say that whether chances are rightly modeled with real-valued probabilities or hyperreal-valued probabilities depends on the laws of nature.

This doesn’t make the zero-probability problems go away. In fact, in such a world we would expect the same issues to come up for the hyperreal probabilities. In that world, a dartboard would have a richer space of possible places for the dart to hit—a space with a coordinate system defined by pairs of hyperreal numbers instead of pairs of real numbers—and the probability of hitting a single point could still be zero. And in our world, the probabilities would still be real numbers. And my published critiques of hyperreal probabilities would not apply, because they are meant to be critiques of the application of such probabilities to our world.

There is, however, a potential critique available, on the basis of causal finitism. Plausibly, our world has an infinite number of future days, but a finite past, so on any day, our world’s past has only finitely many days. The set of future days in our world can be modeled with the natural numbers. An analogous hyperreal-based world would have a set of future days that would be modeled with the hypernatural numbers. But because the hypernatural numbers include infinite numbers, that world would have days that were preceded by infinitely (though hyperfinitely) many days. And that seems to violate causal finitism. More generally, any hyperreal world will either have a future that includes a finite number of days or one that includes days that have infinitely many days prior to them.

If causal finitism is correct, then “hyperreal worlds”, ones similar to ours but where hyperreals figure where our world has reals, must have a finite future, unlike our world. This is an interesting result: for worlds like ours, having real numbers as coordinates is required in order for causal finitism to be true while the future is nonetheless infinite.

Wednesday, April 26, 2023

Cable Guy and van Fraassen's Reflection Principle

Van Fraassen’s Reflection Principle (RP) says that if you are sure you will have a specific credence at a specific future time, you should have that credence now. To avoid easy counterexamples, the RP needs some qualifications, such as that there is no loss of memory, no irrationality, no suspicion of either, full knowledge of one’s own credences at any time, etc.

Suppose:

  1. Time can be continuous and causal finitism is false.

  2. There are non-zero infinitesimal probabilities.

Then we have an interesting argument against van Fraassen’s Reflection Principle. Start by letting RP+ be the strengthened version of RP which says that, with the same qualifications as needed for RP, if you are sure you will have at least credence r at a specific future time, then you should have at least credence r now. I claim:

  3. If RP is true, so is RP+.

This is pretty intuitive. I think one can actually give a decent argument for (3) beyond its intuitiveness, and I’ll do that in the appendix to the post.

Now, let’s use Cable Guy to give a counterexample to RP+ assuming (1) and (2). Recall that in the Cable Guy (CG) paradox, you know that CG will show up at one exact time uniformly randomly distributed between 8:00 and 16:00, with 8:00 excluded and 16:00 included. You want to know if CG is coming in the afternoon, which is stipulated to be between 12:00 (exclusive) and 16:00 (inclusive). You know there will come a time, say one shortly after 8:00, when CG hasn’t yet shown up. At that time, you will have evidence that CG is coming in the afternoon—the fact that CG hasn’t shown up between 8:00 and, say, 8:00+δ for some δ > 0 increases the probability that CG is coming in the afternoon. So even before 8:00, you know that there will come a time when your credence in the afternoon hypothesis will be higher than it is now, assuming you’re going to be rational and observing continuously (this uses (1)). But clearly before 8:00 your credence should be 1/2.
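
The size of the probability shift is easy to compute: conditional on no show by 8:00+δ, the chance of an afternoon arrival is 4/(8−δ), which exceeds 1/2 for every δ > 0. Here is a minimal Monte Carlo check in Python (a sketch; the arrival is modeled as uniform on the open interval, which differs from the stated (8:00, 16:00] setup only by a single measure-zero point):

```python
import random

def p_afternoon_given_no_show(delta, trials=1_000_000):
    """Estimate P(CG comes after 12:00 | CG hasn't shown up by 8:00 + delta hours)."""
    no_show = afternoon = 0
    for _ in range(trials):
        arrival = 8 + 8 * random.random()  # uniform over the 8:00-16:00 window
        if arrival > 8 + delta:            # CG hasn't shown up by 8:00 + delta
            no_show += 1
            if arrival > 12:               # CG comes in the afternoon
                afternoon += 1
    return afternoon / no_show

for delta in (0.5, 1.0, 2.0, 4.0):
    # exact values 4/(8 - delta): 0.533..., 0.571..., 0.666..., 1.0
    print(delta, p_afternoon_given_no_show(delta))
```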

This is not yet a counterexample to RP+ for two reasons. First, there isn’t a specific time such that you know ahead of time for sure your credence will be higher than 1/2, and, second, there isn’t a specific credence bigger than 1/2 that you know for sure you will have. We now need to do some tricksy stuff to overcome these two barriers to a counterexample to RP+.

The specific time barrier is actually pretty easy. Suppose that a continuous (i.e., not based on frames, but truly continuously recording—this may require other laws of physics than we have) video tape is being made of your front door. You aren’t yourself observing your front door. You are out of the country, and will return around 17:00. At that point, you will have no new information on whether CG showed up in the afternoon or before the afternoon. An associate will then play the tape back to you. The associate will begin playing the tape back strictly between 17:59:59 and 18:00:00, with the start of the playback so chosen that exactly at 18:00:00, CG won’t yet have shown up in the playback. However, you don’t get to see the clock after your return, so you can’t get any information from noticing the exact time at which playback starts. Thus, exactly at 18:00:00 you won’t know that it is exactly 18:00:00. However, exactly at 18:00:00, your credence that CG came in the afternoon will be bigger than 1/2, because you will know that the tape has already been playing for a certain period of time and CG hasn’t shown up yet on the tape. Thus, you know ahead of time that exactly at 18:00:00 your credence in the afternoon hypothesis will be higher than 1/2.

But you don’t know how much higher it will be. Overcoming that requires a second trick. Suppose that your associate is guaranteed to start the tape playback a non-infinitesimal amount of time before 18:00:00. Then at 18:00:00 your credence in the afternoon hypothesis will be more than 1/2 + α for any infinitesimal α. By RP+, before the tape playback, your credence in the afternoon hypothesis should be at least 1/2 + α for every infinitesimal α. But this is absurd: it should be exactly 1/2.

So, we now have a full counterexample to RP+, assuming infinitesimal probabilities and the coherence of the CG setup (i.e., something like (1)). At exactly 18:00:00, with no irrationality, memory loss or the like involved (ignorance of what time it is is neither irrational nor a type of memory loss), you will have a credence of at least 1/2 + α for some positive infinitesimal α, but right now your credence should be exactly 1/2.

Appendix: Here’s an argument that if RP is true, so is RP+. For simplicity, I will work with real-valued probabilities. Suppose all the qualifications of RP hold, and you now are sure that at t1 your credence in p will be at least r. Let X be a real number uniformly randomly chosen between 0 and 1 independently of p and of any evidence you will acquire by t1. Let Ct(q) be your credence in q at t. Let u be the following proposition: X < r/Ct1(p) and p is true. Then at t1, your credence in u will be (r/Ct1(p)) ⋅ Ct1(p) = r (where we use the fact that r ≤ Ct1(p)). Hence, by RP your credence now in u should be r. But since u is a conjunction of two propositions, one of them being p, your credence now in p should be at least r.

(One may rightly worry about difficulties in dropping the restriction that we are working with real-valued probabilities.)
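
Here is a toy numerical illustration of the appendix argument. It assumes, for simulation purposes only, that your t1 credence in p matches p’s chance given the evidence and takes one of two values, both at least r; the specific numbers 0.6, 0.9 and r = 0.5 are invented for the example:

```python
import random

def credence_in_u(trials=1_000_000, r=0.5):
    """Estimate the t1 credence in u = (X < r/C_t1(p)) & p in a toy model."""
    hits = 0
    for _ in range(trials):
        C = random.choice([0.6, 0.9])  # your credence in p at t1, fixed by the evidence
        p = random.random() < C        # p holds with probability C given that evidence
        X = random.random()            # uniform on [0,1), independent of p and evidence
        if X < r / C and p:            # the proposition u
            hits += 1
    return hits / trials

print(credence_in_u())  # approximately r = 0.5, whatever the distribution of C
```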

Wednesday, March 23, 2022

An analogy for divine infinity

Here’s an analogy I’ve been thinking about. God’s value is related to other infinities as (except with a reversal of order) zero is related to other infinitesimals. Just as zero is infinitely many times smaller than any other infinitesimal (technically, zero is an infinitesimal—an infinitesimal being a quantity x such that |x| < 1/n for every natural number n), and in an important sense is radically different from them, so too the infinity of God’s value is infinitely many times greater than any other infinity, and in an important sense is radically different from them.

Suppose we think with the medievals that value and being are correlative. Then zero value corresponds to complete non-being. There isn’t anything that has that. Between ordinary non-divine things like people and oak trees and non-being we have a radical ontological difference: there are people and oak trees, but there is no non-being. Suppose we push the analogy on the side of God. Then between ordinary non-divine things like people and oak trees and God we will have a radical ontological difference, too. Some theologians have infamously tried to mark this difference by saying that people and oak trees are but God is not. That way of marking the difference is misleading by making God seem like non-being instead of like its opposite. A better way to mark the difference is to say that in an important sense God is and people and oak trees are not (compare what Jesus is said to have said to St Catherine of Siena: “I am who I am and you are she who is not”). In any case, the gap between God’s “is” and our “is” is at least as radical as the gap between our “is” and the “is not” of non-being.

In fact, I think the gap is more radical: we and all other creatures are closer to non-being than to God. So the analogy I’ve been thinking about, that God’s value is related to other infinities like zero to other infinitesimals (but in reverse order) is misleading: God’s value is in a sense further from other infinities than zero is from other infinitesimals. (And not just because all infinitesimals are infinitesimally close to zero. The relevant scale should not be arithmetic but logarithmic, so that the gap between zero and anything—even an infinitesimal—bigger than zero is in an important sense infinite.)

Don’t take this too seriously. Remember this.

Tuesday, March 22, 2022

The spectrum of values

Let’s do some rough and ready thinking about the spectrum of possible values of objects on a classical theistic view.

  1. God has his value essentially.

  2. Necessarily, God is more valuable than everything other than God.

  3. Necessarily, everything that exists exists by participation in the good God and hence has positive value.

The spectrum of possible values thus has an upper bound: God’s value. Moreover, it follows from (2) that God is infinitely many times more valuable than anything else. For consider some object x other than God, and imagine a world (perhaps a multiverse) where x is duplicated some number n of times. By (2), God will be more valuable than the n duplicates of x taken together, and hence God is more than n times as valuable as x. Since n is arbitrary, it follows that God is infinitely many times more valuable than x.

Thus, our spectrum of values has God at the top, then an infinite gap, and below that possible creatures.

What does the lower end of the spectrum of possible values of objects look like? Well, by (3), all the values are positive. So the lower end of the spectrum lies above zero. I suspect that it asymptotically approaches zero. For consider an object x and now imagine an object y which has exactly one essential causal power, that of producing x with a probability of 1/2. Intuitively, y has something like half the value of x. So it is plausible that the lower end of the spectrum of possible values approaches zero but does not reach it.

But now suppose that y has only an infinitesimal probability of producing x (imagine y has an internal spinner and it produces x whenever that spinner lands exactly at ninety degrees). Then x seems like it would be infinitely more valuable than y. If this is right, then for every value in the spectrum, there is a value that is infinitely many times smaller than it.

The spectrum of values has a top (God) but no bottom. For any value on the spectrum of values, there is a value infinitely many times smaller than it. And for any value on the spectrum of values other than God, there is a value infinitely many times greater than it.

There is thus a very natural sense in which everything is relatively infinite in value: everything is infinitely many times more valuable than something else. But only God is absolutely infinite in value: God is infinitely many times more valuable than everything else.

Incommensurability complicates things, though.

Tuesday, February 22, 2022

Nano St Petersburg

In the St Petersburg game, you toss a fair coin until you get heads. If it took you n tosses to get to heads, you get a utility of 2ⁿ. The expected payoff is infinite (since (1/2) ⋅ 2 + (1/4) ⋅ 4 + (1/8) ⋅ 8 + ... = ∞), and paradoxes abound (e.g., this).

One standard way out is to deny the possibility of unboundedly large utilities.

Interestingly, though, it is possible to imagine St Petersburg style games without really large utilities.

One way is with tiny utilities. If it took you n tosses to get to heads, you get a utility of 2ⁿα, where α > 0 is a fixed infinitesimal. The expected payoff won’t be infinite, but the mathematical structure is the same, and so the paradoxes should all adapt.

Another way is with tiny probabilities. Let G(n) be this game: a real number is uniformly randomly chosen between zero and one, and if the number is one of 1, 1/2, 1/3, ..., 1/n, then you get a dollar. Intuitively, the utility of getting to play G(n) is proportional to n. Now our St Petersburg style game is this: you toss a coin until you get heads, and if you got heads on toss n, you get to play G(2ⁿ).

Saturday, February 19, 2022

Dominance and infinite lotteries

Suppose we have an infinite fair lottery with tickets 1,2,3,…. Now consider a wager W where you get 1/n units of utility if ticket n wins. How should you value that wager?

Any value less than zero is clearly a non-starter. How about zero? Well, that would violate the dominance principle: you would be indifferent between W and getting nothing, and yet W is guaranteed to give you something positive. What about something bigger than zero? Well, any real number y bigger than zero has the following problem as a price: You are nearly certain (i.e., within an infinitesimal of certainty) that the payoff of W will be less than y/2, and hence you’ve overpaid by at least y/2.

But what about some price that isn’t a real number? By the above argument, that price would have to be bigger than zero, but must be smaller than every positive real number. In other words, it must be infinitesimal. But any such price will violate dominance as well: you would be indifferent between W and getting that price, yet it is certain that W would give you something bigger—namely one of the real numbers 1, 1/2, 1/3, ....

So it seems that no price, real numbered or other, will do.

(This argument is adapted from one that Russell and Isaacs give in the case of the St Petersburg paradox.)

One way out will be familiar to readers of my work: Reject the possibility of infinite fair lotteries, and thereby get yet another argument for causal finitism.

But for those who don’t like controversial metaphysics solutions to decision theoretic problems, there is another way: Deny the dominance principle, price W at zero, and hold that sometimes it is rational to be indifferent between two outcomes, one of which is guaranteed to be better than the other no matter what.

This may sound crazy. But consider someone who assigns the price zero to a dart tossing game where you get a dollar if the dart hits the exact center and nothing otherwise, reasoning that the classical mathematical expected value of that game for any continuous distribution of dart tosses (such as a normal distribution around the center) is zero. I think this response to an offer to play is quite rational: “I am nearly certain to lose, so what’s the point of playing?” Now, that case doesn’t violate the same dominance principle as the lottery case—it violates a stronger dominance principle that says that if one option is guaranteed to be at least as good as the other and in some possible scenario is better, then it should be preferred. But I think the dart case may soften one up for thinking this:

  1. If (a) some game never has a negative outcome, and (b) for any positive real, it is nearly certain that the outcome of the game will be less than that, then I should value the game at zero or less.

And if we do that, then we have to value W at zero. Yes, if you reject W in favor of nothing, you’ve lost something. But probably very, very little.
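
To get a feel for the zero valuation, one can look at large finite truncations of W (a sketch; the infinite fair lottery itself cannot be sampled, so fair N-ticket lotteries stand in for it): the expected payoff H(N)/N and the chance of any fixed positive payoff both go to zero as N grows.

```python
def truncated_value(N):
    """Expected payoff of W in a fair N-ticket lottery paying 1/n on ticket n."""
    return sum(1.0 / n for n in range(1, N + 1)) / N  # harmonic(N) / N

def chance_payoff_at_least(N, y):
    """Probability that the payoff 1/n is at least y, i.e., some ticket n <= 1/y wins."""
    return min(N, int(1 / y)) / N

for N in (10, 1_000, 100_000):
    print(N, truncated_value(N), chance_payoff_at_least(N, 0.01))
    # value: 0.293, 0.0075, 0.00012; P(payoff >= 0.01): 1.0, 0.1, 0.001
```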

Here is another weakish reason to be suspicious of dominance. Dominance is too similar to conglomerability, and conglomerability should be suspect to anyone who likes exotic probabilistic cases. (By the way, this paper connects with this.)

Wednesday, September 8, 2021

Two spinners and infinitesimal probabilities

Suppose you do two independent experiments, A and B, each of which uniformly generates a number in the interval I = [0, 1).

Here are some properties we would like to have on our probability assignment P:

  1. There is a value α such that P(A = x)=P(B = x)=α for all x ∈ I and P((A, B)=z)=α² for all z ∈ I².

  2. For every subset U of I² consisting of a finite union of straight lines, P((A, B)∈U) is well-defined.

  3. For any measurable U ⊆ I², if P((A, B)∈U|A = x)=y for all x ∈ I, then P((A, B)∈U)=y.

  4. For any measurable U ⊆ I², if P((A, B)∈U|B = x)=y for all x ∈ I, then P((A, B)∈U)=y.

  5. The assignment P satisfies the axioms of finitely additive probability with values in some real field.

Here is an interesting consequence. Let U consist of two line segments, one from (0, 0) to (1, 1/2) and the other from (0, 1/2) to (1, 1). Then every vertical line in I² intersects U in exactly two points. This is measurable by (2). It follows from (1) and (5) that P((A, B)∈U|A = x)=2α for all x ∈ I. Thus, P((A, B)∈U)=2α by (3). On the other hand, every horizontal line in I² meets U in exactly one point, so P((A, B)∈U|B = x)=α by (1) and P((A, B)∈U)=α by (4). Thus, 2α = α, and so α = 0.

In other words, if we require (1)-(5) to hold, then the probability of every single point outcome of either experiment must be exactly zero. In particular, it is not possible for the probability of a single point outcome to be a positive infinitesimal.
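
The two counting claims above can be checked mechanically on a fine grid (a minimal sketch; the tolerance and grid sizes are arbitrary choices):

```python
def in_U(x, y, eps=1e-12):
    """U = two segments: y = x/2 and y = (1 + x)/2, for x in [0, 1)."""
    return abs(y - x / 2) < eps or abs(y - (1 + x) / 2) < eps

n = 200
xs = [i / n for i in range(n)]            # grid of x values in [0, 1)
ys = [j / (2 * n) for j in range(2 * n)]  # finer grid of y values in [0, 1)

vertical_counts = {sum(in_U(x, y) for y in ys) for x in xs}
horizontal_counts = {sum(in_U(x, y) for x in xs) for y in ys}
print(vertical_counts, horizontal_counts)  # {2} and {1}: two hits per vertical line, one per horizontal
```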

Cognoscenti of these kinds of arguments will recognize (3) and (4) as special cases of conglomerability, and are likely to say that we cannot expect conglomerability when dealing with infinitesimal probabilities. Maybe so: but (3) and (4) are only special cases of conglomerability, and they feel particularly intuitive to me, in that we are partitioning the sample space I² on the basis of the values of one of the two independent random variables that generate the sample space. The setup—say, two independent spinners—seems perfectly natural and unparadoxical, the partitions seem perfectly natural, and the set U to which we apply (3) and (4) is also a perfectly natural set, a union of two line segments. Yet even in this very natural setup, the friend of infinitesimal probabilities has to embrace a counterintuitive violation of (3) and (4).

Monday, December 7, 2020

Independence, spinners and infinitesimals

Say that a “spinner” is a process whose output is an angle from 0 (inclusive) to 360 (exclusive). Take as primitive a notion of uniform spinner. I don’t know how to define it. A necessary condition for uniformity is that every angle has the same probability, but this necessary condition is not sufficient.

Consider two uniform and independent spinners, generating angles X and Y. Consider a third “virtual spinner”, which generates the angle Z obtained by adding X and Y and wrapping to be in the 0 to 360 range (thus, if X = 350 and Y = 20, then Z = 10). This virtual spinner is intuitively statistically independent of each of X and Y on its own but not of both.
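
A finite analogue makes this independence structure easy to verify by brute force. Below is a sketch in Python with an n-position discrete spinner standing in for the continuous one (in the finite model the pairwise independence is exact; the question in this post is what survives when the point probabilities are infinitesimal):

```python
from itertools import product
from fractions import Fraction

n = 360  # discrete spinner with n equally likely positions
pairs = list(product(range(n), repeat=2))  # equally likely (X, Y) outcomes

def P(pred):
    return Fraction(sum(1 for x, y in pairs if pred(x, y)), len(pairs))

def Z(x, y):
    return (x + y) % n  # the wrapped sum of the two spinners

# Z is independent of X alone, and of Y alone (spot-checked on sample values):
for a, b in [(0, 0), (10, 350), (123, 45)]:
    assert P(lambda x, y: Z(x, y) == a and x == b) == P(lambda x, y: Z(x, y) == a) * P(lambda x, y: x == b)
    assert P(lambda x, y: Z(x, y) == a and y == b) == P(lambda x, y: Z(x, y) == a) * P(lambda x, y: y == b)

# ...but not of X and Y jointly: X and Y together determine Z.
assert P(lambda x, y: Z(x, y) == 0 and x == 0 and y == 0) != Fraction(1, n) ** 3
print("Z is independent of each of X and Y, but not of both")
```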

Suppose we take the intuitive statistical independence at face value. Then:

  • P(Z = 0)P(X = 0)=P(Z = X = 0)=P(Y = X = 0)=P(Y = 0)P(X = 0),

where the second equality follows from the fact that if X = 0 then Z = 0 if and only if Y = 0. Suppose now that P(X = 0) is an infinitesimal α. Then we can divide both sides by α, and we get

  • P(Z = 0)=P(Y = 0).

By the same reasoning with X and Y swapped:

  • P(Z = 0)=P(X = 0).

We conclude that

  • P(X = 0)=P(Y = 0).

We thus now have an argument for a seemingly innocent thesis:

  1. Any two independent uniform spinners have the same probability of landing at 0.

But if we accept that uniform spinners have infinitesimal probabilities of landing at a particular value, then (1) is false. For suppose that X and Y are angles from two independent uniform spinners for which (1) is true. Consider a spinner whose angle is 2Y (wrapped to the [0, 360) range). This doubled spinner is clearly uniform, and independent of X. But its probability of yielding 0 is equal to the probability of Y being 0 or 180, which is twice the probability of Y being 0, and hence twice the probability of X being 0, in violation of (1) if P(X = 0)>0.

So, something has gone wrong for friends of infinitesimal probabilities. I see the following options available for them:

  2. Deny that Z = 0 has non-zero probability.

  3. Deny that Z is statistically independent of X as well as being statistically independent of Y.

I think (3) is probably the better option, though it strikes me as unintuitive. This option has an interesting consequence: we cannot independently rerandomize a spinner by giving it another spin.

The careful reader will notice that this is basically the same argument as the one here.

Wednesday, December 2, 2020

Another problem for infinitesimal probabilities

Here’s another problem with independence for friends of infinitesimal probabilities.

Let ..., X−2, X−1, X0, X1, X2, ... be an infinite sequence of independent fair coin tosses. For i = 0, 1, 2, ..., define Ei to be heads if Xi and X−1−i are the same and tails otherwise.

Now define these three events:

  • L: X−1, X−2, ... are all heads

  • R: X0, X1, ... are all heads

  • E: E0, E1, ... are all heads.

Friends of infinitesimal probabilities insist that P(R) and P(L) are positive infinitesimals.

I now claim that E is independent of R, and the same argument will show that E is independent of L. This is because of this principle:

  1. If Y0, Y1, ... is a sequence of independent random variables, and f and g are functions such that f(Yi) and g(Yi) are independent of each other for each fixed i, then the sequences f(Y0),f(Y1),... and g(Y0),g(Y1),... are independent of each other.

But now let Yi = (Xi, X−1−i). Then Y0, Y1, ... is a sequence of independent random variables. Let f(x, y)=x and let g(x, y) be heads if x = y and tails otherwise. Then it is easy to check that f(Yi) and g(Yi) are independent of each other for each fixed i. Thus, by (1), f(Y0),f(Y1),... and g(Y0),g(Y1),... are independent of each other. But f(Yi)=Xi and g(Yi)=Ei. So, X0, X1, ... and E0, E1, ... are independent of each other, and hence so are E and R.

The same argument shows that E and L are independent.
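
The hypothesis of (1), that f(Yi) and g(Yi) are independent of each other for each fixed i, is just a four-case check; here it is by exhaustive enumeration (a minimal sketch):

```python
from itertools import product
from fractions import Fraction

# Y_i = (X_i, X_{-1-i}): a pair of independent fair coins, four equally likely values.
outcomes = list(product("HT", repeat=2))

def P(pred):
    return Fraction(sum(1 for o in outcomes if pred(o)), len(outcomes))

def f(o):
    return o[0]                          # f(Y_i) = X_i

def g(o):
    return "H" if o[0] == o[1] else "T"  # g(Y_i) = E_i: heads iff the coins match

for a, b in product("HT", repeat=2):
    assert P(lambda o: f(o) == a and g(o) == b) == P(lambda o: f(o) == a) * P(lambda o: g(o) == b)
print("f(Y_i) and g(Y_i) are independent for each fixed i")
```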

Write AB for the conjunction of A and B and note that EL, ER and RL are the same event—namely, the event of all the coins being heads. Then:

  2. P(E)P(L)=P(EL)=P(RL)=P(R)P(L)

Since friends of positive infinitesimals insist that P(R) and P(L) are positive infinitesimals, we can divide both sides by P(L) and get P(E)=P(R). The same argument with L and R swapped shows that P(E)=P(L). So, P(L)=P(R).

But now let Xi* = Xi+1, so that the starred sequence is the original sequence shifted by one, and define L* to be the event of X−1*, X−2*, ... being all heads, and R* the event of X0*, X1*, ... being all heads. The exact same argument as above will show that P(L*)=P(R*). But friends of infinitesimal probabilities have to say that P(R*)>P(R) and P(L*)<P(L) (R* drops the requirement that X0 be heads, while L* adds it), and so we have a contradiction given that P(L)=P(R) and P(L*)=P(R*).

I think the crucial question is whether (1) is still true in settings with infinitesimal probabilities. I don’t have a great argument for it. It is, of course, true in classical probabilistic settings.

Monday, November 30, 2020

Independence, uniformity and infinitesimals

Suppose that a random variable X is uniformly distributed (in some intuitive sense) over some space. Then:

  1. P(X = y)=P(X = z) for any y and z in that space.

But I think something stronger should also be true:

  2. Let Y and Z be any random variables taking values in the same space as X, and suppose each variable is independent of X. Then P(X = Y)=P(X = Z).

Fixed constants are independent of X, so (1) follows from (2).

But if we have (2), and the plausible assumption:

  3. If X and Y are independent, then X and f(Y) are independent for any function f,

we cannot have infinitesimal probabilities. Here’s why. Suppose X and Y are independent random variables uniformly distributed over the interval [0, 1). Assume P(X = a) is infinitesimal for a in [0, 1). Then, so is P(X = Y).

Let f(x)=2x for x < 1/2 and f(x)=2x − 1 for 1/2 ≤ x. Then if X and Y are independent, so are X and f(Y). Thus:

  4. P(X = Y)=P(X = f(Y)).

Let g(x)=x/2 and let h(x)=(1 + x)/2. Then:

  5. P(Y = g(X)) = P(Y = X)

and

  6. P(Y = h(X)) = P(Y = X).

But now notice that:

  7. Y = g(X) if and only if X = f(Y) and Y < 1/2

and

  8. Y = h(X) if and only if X = f(Y) and 1/2 ≤ Y.

Thus:

  9. (Y = g(X) or Y = h(X)) if and only if X = f(Y)

and note that we cannot have both Y = g(X) and Y = h(X). Hence:

  10. P(X = Y)=P(X = f(Y)) = P(Y = g(X)) + P(Y = h(X)) = P(Y = X)+P(Y = X)=2P(X = Y).

Therefore:

  11. P(X = Y)=0,

which contradicts the infinitesimality of P(X = Y).
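
Since everything turns on the exact algebra of f, g and h, identities (7) and (8) can be machine-checked with exact rational arithmetic (a minimal sketch; the grid size is an arbitrary choice):

```python
from fractions import Fraction

def f(x): return 2 * x if x < Fraction(1, 2) else 2 * x - 1
def g(x): return x / 2
def h(x): return (1 + x) / 2

N = 64
grid = [Fraction(k, N) for k in range(N)]  # rational grid in [0, 1)
for x in grid:
    for y in grid:
        assert (y == g(x)) == (x == f(y) and y < Fraction(1, 2))   # identity (7)
        assert (y == h(x)) == (x == f(y) and y >= Fraction(1, 2))  # identity (8)
print("identities (7) and (8) hold exactly on the grid")
```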

This argument works for any uniform distribution on an infinite set U. Just let A and B be a partition of U into two subsets of the same cardinality as U (this uses the Axiom of Choice). Let g be a bijection from U onto A and h a bijection from U onto B. Let f(x)=g⁻¹(x) for x ∈ A and f(x)=h⁻¹(x) for x ∈ B.

Note: We may wish to restrict (3) to intuitively “nice” functions, ones that don’t introduce non-measurability. The functions in the initial argument are “nice”.

Monday, August 24, 2020

Invariance under independently chosen random transformations

Often, a probabilistic situation is invariant under some set of transformations, in the sense that the complete probabilistic facts about the situation are unchanged by the transformation. For instance, in my previous post I suggested that a sequence of fair coin flips should be invariant under the transformation of giving a pre-specified subset of the coins an extra turn-over at the end, and I proved that we can have this invariance in a hyperreal model of the situation.

Now, a very plausible thesis is this:

Randomized Invariance: If a probabilistic situation S is invariant under each member of some set T of transformations, then it is also invariant under the process where one chooses a random member of T independently of S and applies that member to S.

For instance, in the coin flip case, I could choose a random reversing transformation as follows: I line up (physically or mentally) the infinite set of coins with an independent second infinite set of coins, flip the second set of coins, and wherever that flip results in heads, I reverse the corresponding coin in the first set.

By Randomized Invariance, doing this should not change any of the probabilities. But insisting on this case of Randomized Invariance forces us to abandon the idea that we should assign such things as an infinite sequence of heads a non-zero but infinitesimal probability. Here is why. Consider a countably infinite sequence of fair coins arranged equidistantly in a line going to the left and to the right. Fix a point r midway between two successive coins. Now, use the coins to the left of r to define the random reversing transformation for the coins to the right of r: if after all the coins are flipped, the nth coin to the left of r is heads, then I give an extra turn-over to the nth coin to the right of r.

According to Randomized Invariance, the probability that all the coins to the right of r will be tails after the random reversing transformations will be the same as the probability that they were all tails before it. Let p be that probability. Observe that after the transformations, the coins to the right of r are all tails if and only if before the transformations the nth coin to the right and the nth coin to the left showed the same thing (for we only get tails on the nth coin on the right at the end if we had tails there at the beginning and the nth coin on the left was tails, or if we had heads there at the beginning, but the heads on the nth coin to the left forced us to reverse it). Hence, p is also the probability that the corresponding coins to the left and right of r showed the same thing before the transformation.

Thus, we have shown that the probability that all the paired coins on the left and right equidistant to r are the same (i.e., we have a palindrome centered at r) is the same as the probability that we have only tails to the right of r. Now, apply the exact same argument with “right” and “left” reversed. We conclude that the probability that the coins on the right and left equidistant to r are always the same is the same as the probability that we have only tails to the left of r. Hence, the probability of all-tails to the left of r is the same as the probability of all-tails to the right of r.

And this argument does not depend on the choice of the midpoint r between two coins. But as we move r one coin to the right, the probability of all-tails to the right of r is multiplied by two (there is one fewer coin that needs to be tails) and the probability of all-tails to the left of r is multiplied by a half. And yet these numbers have to be equal as well by the above argument. Thus, 2p = p/2. The only way this can be true is if p = 0.
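
The combinatorial core of the argument, that after the randomized reversal "all tails to the right of r" is exactly the palindrome event, can be brute-forced in a finite analogue (a sketch with n coins on each side of r):

```python
from itertools import product

n = 6  # finite analogue: n coins on each side of the midpoint r
flip = {"H": "T", "T": "H"}

def transform(state):
    """Turn over the i-th coin right of r whenever the i-th coin left of r is heads."""
    left, right = state[:n], state[n:]
    new_right = tuple(flip[c] if l == "H" else c for l, c in zip(left, right))
    return left + new_right

states = list(product("HT", repeat=2 * n))  # (1st..nth left coin, 1st..nth right coin)
all_tails_after = sum(1 for s in states if all(c == "T" for c in transform(s)[n:]))
palindrome = sum(1 for s in states if all(s[i] == s[n + i] for i in range(n)))
all_tails_before = sum(1 for s in states if all(c == "T" for c in s[n:]))
print(all_tails_after, palindrome, all_tails_before)  # all three counts equal 2**n
```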

Therefore, Randomized Invariance, plus the thesis that all the non-random reversing transformations leave the probabilistic situation unchanged (a thesis made plausible by the fact that even with infinitesimal probabilities, we provably can have a model of the probabilities that is invariant under these transformations), shows that we must assign probability zero to all-tails, and so infinitesimal probabilities are mistaken.

This is, of course, a highly convoluted version of Timothy Williamson’s coin toss argument. The reason for the added complexity is to avoid any use of shift-based transformations that may be thought to beg the question against advocates of non-Archimedean probabilities. Instead, we simply use randomized reversal symmetry.

Thursday, August 13, 2020

Another simple way to see a problem with infinitesimal probabilities


[Figure: the red event R is the graph of the fractional part of 2x over [0, 1); the blue event B is its reflection in the line y = x.]

Suppose I independently randomly and uniformly choose X and Y between 0 and 1, not including 1 but possibly including 0. Now in the diagram above, let the blue event B be that the point (X, Y) lies on one of the two blue line segments, and let the red event R be that it lies on one of the two red line segments. (The red event is the graph of the fractional part of 2x; the blue event is the reflection of this in the line y = x.) As usual, a filled circle indicates a point included and an unfilled circle indicates a point not included; the purple point at (0, 0) is in both the red and blue events.

It seems that B is twice as likely as R. For, given any value of X—see the dotted line in the diagram—there are two possible values of Y that put one in B but only one possible value of Y that puts one in R.

But of course the situation is completely symmetric between X and Y, and the above reasoning can be repeated with X and Y swapped to conclude that R is twice as likely as B.

Hmm.

Of course, there is no paradox in classical probability theory where we just say that the red and blue events have zero probability, and twice zero equals zero.

But if we have any probability theory that distinguishes different events that are classically of zero-probability and says things like “it’s more likely that Y is 0.2 or 0.8 than that Y is 0.2” (say because both events have infinitesimal probability, with one of these infinitesimals being twice as big as the other), then the above reasoning should yield the absurd conclusion that B is more likely than R and R is more likely than B.

Technically, there is nothing new in the above. It just shows that when we have a probability theory that distinguishes classically zero-probability events, that probability theory will fail conglomerability. I.e., we have to reject the reasoning that just because conditionally on any value of X it’s twice as likely that we’re in B as in R, therefore it’s twice as likely that we’re in B as in R. We already knew that conglomerability reasoning had to be rejected in such probability theories. But I think this is a really vivid way of showing the point, as this instance of conglomerability reasoning seems super plausible. And I think the vividness of it makes it clear that the problem doesn’t depend on any kind of weird trickery with strange sets, and that no mere technical tweak (such as moving to qualitative or comparative probabilities) is likely to get us out of it.

Monday, August 3, 2020

Uncountably infinite fair lottery

A fair lottery is going to be held. There are uncountably infinitely many players and the prize is infinitely good. Specifically, countably infinitely many fair coins will be tossed, and corresponding to each infinite sequence of heads and tails there is a ticket that exactly one person has bought.

Along comes Truthful Alice. She offers you a deal: she’ll take your ticket and give you two tickets. Of course, you go for the deal since it doubles your chances of winning, and Alice gives the same deal to everyone else, and everyone else goes for it. Alice then has everyone’s tickets. She now proceeds as follows. If you had a ticket with the sequence X1X2X3..., she gives you the tickets HHHHHX1X2X3... and HHHHTX1X2X3.... And she keeps for herself all the tickets that start with something other than HHHH.
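
Alice’s bookkeeping can be sanity-checked on finite truncations of the tickets (a sketch; length-L strings stand in for the infinite sequences):

```python
from itertools import product

L = 6  # length of the truncated ticket tails
tails = ["".join(t) for t in product("HT", repeat=L)]

# Each original ticket s is exchanged for the pair HHHHH+s and HHHHT+s.
handed_out = {prefix + s for s in tails for prefix in ("HHHHH", "HHHHT")}

# Those are exactly the (L+5)-length tickets starting with HHHH:
start_HHHH = {"".join(t) for t in product("HT", repeat=L + 5) if "".join(t).startswith("HHHH")}
assert handed_out == start_HHHH

# Alice keeps every ticket not starting with HHHH: 15 of the 16 possible 4-prefixes.
kept = sum(1 for t in product("HT", repeat=4) if "".join(t) != "HHHH") / 16
print(kept)  # 0.9375 = 15/16
```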

So, everyone has gone for the deal, and Alice has a 15/16 chance of winning (since that’s the chance that the coin sequence won’t start with HHHH). That’s paradoxical!

This paradox suggests that there may be something wrong with the concept of a fair infinite lottery even when the number of tickets is uncountable.

Here is one way to soften the paradox. If you reason with classical probability theory, without any infinitesimals, you will agree that the deal offered you by Alice doubles your chances of winning, but you will also note that the chance it doubles is zero, and doubling zero is no increase. So if you reason with classical probability theory, you will be indifferent to Alice’s deal. There is still something strange in thinking that Alice is able to likely enrich herself at the expense of a bunch of people doing something they are rationally indifferent about. But it’s less surprising than if she can do so at the expense of people doing what they rationally ought.

There is another thought which I find myself attracted to. The very concept of a fair lottery breaks down in infinite cases. If the lottery were fair, exchanging a ticket for two tickets would be a good deal. But the lottery isn’t fair, because there are no infinite fair lotteries.

Monday, July 20, 2020

Complete conditional probabilities, infinitesimal probabilities and two easy Frankenstein facts

Suppose we have a complete finitely-additive conditional probability P(⋅|⋅) on some algebra F of events on a space Ω (i.e., P is a Popper function with all non-empty sets regular), so that P(A|B) is defined for every non-empty B.

Here’s a curious thing: there is a very large disconnect between how P behaves when conditioning on sets of zero measure and when conditioning on sets of non-zero measure.

Here’s one way to see the disconnect. Consider any other complete finitely additive conditional probability Q(⋅|⋅) on the same algebra F, and suppose that Q assigns unconditional measure zero to everything that P does: i.e., if P(A|Ω)=0, then Q(A|Ω)=0.

Then, Frankenstein-fashion, we can sew P and Q into a new conditional probability R where R(A|B)=P(A|B) if P(A|Ω)>0 and R(A|B)=Q(A|B) if P(A|Ω)=0. In other words, R behaves exactly like P when conditioning on non-zero measure sets and exactly like Q when conditioning on zero measure sets.

[To check that R is a conditional probability, the one non-trivial condition to check here is that R(A ∩ B|C)=R(A|C)R(B|A ∩ C). Suppose P(C|Ω)=0. Then the equality follows from the corresponding equality for Q. Suppose P(C|Ω)>0. If P(A ∩ C|Ω)>0 as well, then our equality follows from the corresponding equality for P. Suppose now that P(A ∩ C|Ω)=0. Then the equality to be demonstrated is equivalent to P(A ∩ B|C)=P(A|C)Q(B|A ∩ C). But if P(A ∩ C|Ω)=0, then P(A ∩ B|C)=0 and P(A|C)=0.]

There is an analogous Frankenstein fact for infinitesimal probabilities. Let P and Q be any two finitely-additive probabilities with values in some hyperreal field, and suppose that Q is tiny whenever P is tiny, where a hyperreal is “tiny” provided that it is zero or infinitesimal. Then there is a Frankenstein probability R = Std P + Inf Q, where Std x and Inf x are the standard and infinitesimal parts of a finite hyperreal. (The fact that Q is tiny when P is tiny is used to show that R is non-negative.) This R then has the same large-scale (i.e., standard scale) behavior as P and small-scale behavior as Q.

In other words, as we depart from classical probability, we get a nearly complete disconnect between small-scale and large-scale behavior.

Friday, February 2, 2018

A non-dimensionless infinitesimal probabilistic constant?

Suppose you throw a dart at a circular target of some radius r in such a way that the possible impact points are uniformly distributed over the target. Classically, the probability that you hit the center of the target is zero. But suppose that you believe in infinitesimal probabilities, and hence assign an infinitesimal probability α(r)>0 to hitting the center of the target.

Now, α(r) intuitively should vary with r. If you double the radius, you quadruple the area of the target, and so you should be only one quarter as likely to hit the center. If that’s right, then α(r)=β/r² for some infinitesimal constant β.

This means that in addition to the usual constants of physics, there is a special infinitesimal constant measuring the probability of hitting the center of a target. Now, there is nothing surprising about general probabilistic stuff involving constants like π and e. But these are dimensionless constants. However, β is not dimensionless: in SI units, it is expressed in square meters. And this seems incredible to me, namely that there should be a non-dimensionless constant calibrating the probabilities of a uniformly distributed dart throw. Any non-dimensionless constant should vary between worlds with different laws of nature—after all, there will be worlds where meters make no sense at all (a meter is 1/299792458 of the distance light travels in a second; but you can have a world where there is no such thing as light). So, it seems, the laws of nature tell us something about the probabilities of uniform throws. That seems incredible.

It is so much better to just say the probability is zero. :-)

Friday, January 26, 2018

Real questions about infinity and probability

One may have the impression that the kinds of questions I like to ask about infinity and probability, questions involving zero-probability events and infinitesimals, are all purely hypothetical. We don’t actually play games with infinitely many die rolls and the like.

This is a mistake. Here are five non-hypothetical questions that have problematic features dealing with infinity:

  1. How epistemically likely is it that we live in a multiverse with exactly K universes (where K is finite or infinite, and at least 1)?

  2. How epistemically likely is it that we live in a multiverse that includes at least one K-dimensional universe (where K is finite or infinite, but different from 4)?

  3. How epistemically likely is it that there are exactly K objects in existence (where K is finite or infinite, and at least around 10⁸⁰)?

  4. How epistemically likely is it that in the future I will toss a coin infinitely often and each time get heads?

  5. How epistemically likely is it that the first counterexample to Goldbach’s conjecture is between 10¹⁰⁰ and 10¹⁰¹?

Since the epistemic probability that we live in an infinite multiverse is non-zero, questions 1 and 2 have non-hypothetical bite. Question 3 obviously has bite. Likewise, question 4 has bite because the epistemic probability of an infinite afterlife is non-zero. Question 5 has bite because it is epistemically possible that Goldbach’s conjecture is false.

Wednesday, January 17, 2018

Arbitrariness, probability and infinitesimals

A well-known objection to replacing the zero probability of some events—such as getting heads infinitely many times in a row—with an infinitesimal is arbitrariness. Infinitesimals are usually taken to be hyperreals and there are infinitely many hyperreal extensions of the reals.

There is a response to this version of the arbitrariness objection: there are extensions of the reals that one can unambiguously define. Three examples: (1) the surreals, (2) formal Laurent series and (3) the Kanovei-Shelah model.

But it turns out that there is still an arbitrariness objection in these contexts. Instead of saying that the choice of extension of the reals is arbitrary, we can say that the choice of particular infinitesimals within the system to be assigned to events is arbitrary.

Here is a fun fact. Let R be the reals and let R* be any extension of R that is a totally ordered vector space over the reals, with the order agreeing with that on R. (This is a weaker assumption than taking R* to be an ordered field extension of the reals.) Say that an infinitesimal is an x in R* such that −y < x < y for any real y > 0.

Theorem: Suppose that P is an R*-valued finitely additive probability on some algebra of sets, and suppose that P assigns a non-real number to some set. Then there are uncountably many different R*-valued finitely additive probability assignments Q on the same algebra of sets such that:

  i. P(A) is real if and only if Q(A) is real, and in that case P(A)=Q(A).

  ii. All corresponding linear combinations of P and Q are ordinally equivalent to each other, i.e., for any sets A1, ..., An, B1, ..., Bm in the algebra and any reals a1, ..., an, b1, ..., bm, we have ∑aiP(Ai)<∑biP(Bi) if and only if ∑aiQ(Ai)<∑biQ(Bi).

  iii. P(A) and Q(A) differ by a non-zero infinitesimal whenever P(A) is non-real.

Condition (ii) has some important consequences. First, it follows that ordinal comparisons of probabilities will be equally preserved by P and by Q. Second, it follows that both probabilities will assign the same results to decision problems with real-number utilities. Third, it follows that P(A)=P(B) if and only if Q(A)=Q(B), so any symmetries preserved by P will be preserved by Q. These remarks show that it is difficult indeed to hold that the choice of P over Q (or any of the other uncountably many options) is non-arbitrary, since it seems that any epistemic, decision-theoretic and symmetry constraints satisfied by P will be satisfied by Q.

Sketch of proof: For any finite member x of R* (x is finite if and only if there is a real y such that −y < x < y), let s(x) be the unique real number such that x − s(x) is infinitesimal. Let i(x)=x − s(x). Then for any real number r > 0, let Qr(A)=s(P(A)) + r ⋅ i(P(A)). Note that s and i are linear transformations, from which it follows that Qr is a finitely additive probability assignment. It is not difficult to show that (i) and (ii) hold, and that (iii) holds if r ≠ 1.
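
To see what the construction does, here is a toy model in Python in which hyperreals are truncated to a standard part plus a single first-order infinitesimal part (an assumption made purely for illustration; a genuine R* is much richer):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hyper:
    """Toy hyperreal a + b*eps, keeping only a first-order infinitesimal part."""
    a: float  # standard part, s(x)
    b: float  # coefficient of eps, so i(x) = b*eps

    def __lt__(self, other):
        # eps is positive but smaller than every positive real: lexicographic order
        return (self.a, self.b) < (other.a, other.b)

def Q_r(x, r):
    """The proof's construction: keep s(P(A)), rescale i(P(A)) by r > 0."""
    return Hyper(x.a, r * x.b)

# Two "probabilities" with the same standard part but different infinitesimal parts:
P_A, P_B = Hyper(0.5, 3.0), Hyper(0.5, 7.0)
for r in (0.5, 2.0, 10.0):
    # order comparisons (the simplest instances of (ii)) are preserved...
    assert (P_A < P_B) == (Q_r(P_A, r) < Q_r(P_B, r))
    # ...while each non-real value shifts by a non-zero infinitesimal, as in (iii)
    assert Q_r(P_A, r) != P_A
print("order preserved for every r > 0; values differ infinitesimally when r != 1")
```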

Remark 1: I remember seeing the s + ri construction, but I can’t remember where. Maybe it was in my own work, maybe in something by someone else (Adam Elga?).

Remark 2: What if we want to preserve facts about conditional probabilities? This is a bit trickier. We’ll need to assume that R* is a totally ordered field rather than a totally ordered vector space. I haven’t yet checked what properties will be preserved by the construction above then.

Wednesday, September 2, 2015

From a past-infinite causal sequence to a paradoxical lottery: A cosmological argument

Infinite fair lotteries are well-known to be paradoxical. Let's say that an infinite fair lottery is played twice with tickets 1,2,3,.... Then whatever number wins first, you can be all but perhaps certain that in the next run of the lottery a bigger number will win (since the probability of any particular number winning is zero or infinitesimal, so the probability that the winner is a member of the finite set of numbers smaller than or equal to the first picked number is zero or infinitesimal). So as you keep on playing, you can be completely confident that the next number picked will be bigger than the one you just picked. But intuitively that's not what's going to happen. Or consider this neat paradox. Given the infinite fair lottery, there is a way to change the lottery that makes each ticket infinitely more likely to win. Just run a lottery where the probability of ticket n is 2⁻ⁿ (which is infinitely bigger than the zero or infinitesimal probability in the paradoxical lottery).

What makes the infinite fair lottery paradoxical is that

  1. there is a countable infinity of tickets
and
  2. each ticket has zero or infinitesimal chance of winning.
Let's stipulate that a lottery is "paradoxical" if and only if it satisfies (1) and (2).

Suppose now that a past-infinite causal sequence is possible (e.g., my being caused by my parents, their being caused by theirs, and so on ad infinitum). Then the following past-infinite causal sequence is surely possible as well. There is a machine that has always been on an infinite line with positions marked with integers: ...,-3,-2,-1,0,1,2,3,.... Each day, the machine has tossed a fair coin. If the coin was heads, it moved one position to the right on the line (e.g., from 2 to 3) and if it was tails, one position to the left (e.g., from 0 to -1). The machine moved in no other way.

We can think of today's position of the machine as picking out a ticket from a countably infinite lottery. Moreover, this countably infinite lottery is paradoxical. It satisfies (1) by stipulation. And it's not hard to argue that it satisfies (2), because of how random walks thin out probability distributions. (And all we need is finite additivity for the argument.)
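
Why believe (2)? After n days, the machine's position has a binomial distribution, and the largest probability of any single position is the central binomial term, which tends to zero. A quick exact computation (a sketch):

```python
from math import comb

def max_position_probability(n):
    """After n fair +/-1 steps, position n-2k has probability C(n,k)/2^n;
    the most likely positions are the middle ones, at k = n//2."""
    return comb(n, n // 2) / 2 ** n

for n in (10, 100, 10_000):
    print(n, max_position_probability(n))  # decays like sqrt(2/(pi*n)) toward 0
```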

So if past-infinite causal sequences are possible, paradoxical lotteries are as well. But paradoxical lotteries are not possible, I say. So past-infinite causal sequences are not possible. So there is an uncaused cause.

Wednesday, June 17, 2015

Non-conglomerability

This result is probably known, and probably not optimal. A conditional probability function P is conglomerable provided that for any partition {Hi} (perhaps infinite and maybe even uncountable) of the state space, if P(A|Hi)≥r for all i, then P(A)≥r.

Theorem. Assume the Axiom of Choice. Suppose P is a full conditional probability function (i.e., Popper function) on an uncountable space such that:

  1. all singletons are measurable
  2. the function satisfies this regularity condition for all elements x and y: P({x}|{x,y})>0
  3. there is a partition of the probability space into two disjoint subsets A and B with the same cardinality such that P(A)>0 and P(B)>0
Then P is not conglomerable.

Conditions (2) and (3) are going to be intuitively satisfied for plausible continuous probabilities, like uniform and Gaussian ones. So in those cases there is no hope for a conglomerable conditional probability.

Sketch of proof: Let Q be a hyperreal-valued unconditional probability corresponding to P, so that P(X|Y)=Q(XY)/Q(Y). The regularity condition (2) implies that there is a hyperreal α such that Q(F)/α is finite, non-zero and non-infinitesimal for each finite non-empty set F. (Just let α=Q({x0}) for any fixed x0.) Let R(F) be the standard part of Q(F)/α for any finite set F. Then P(F|G)=R(FG)/R(G) for any finite sets F and G with G non-empty. Moreover, R is finitely additive and non-zero on every singleton.

Since A² has the same cardinality as A, there is a function f from B to the subsets of A with the property that f(b) and f(c) are disjoint if b and c are distinct and every f(b) is uncountable. Choose a positive real c such that P(A)<c/(1+c). For each b in B, choose a finite subset Fb of f(b) such that R(Fb)>cR({b}). Such a finite subset exists as R is finitely additive and the sum of uncountably many positive numbers is always infinite. Let H be the union of the Fb as b ranges over B. Then A∖H has at most the cardinality of B. Let h be a one-to-one function from A∖H to B. For each b in B, let Gb=Fb if there is no a in A∖H such that h(a)=b; otherwise, let Gb=Fb∪{a} for such an a. Let Hb={b}∪Gb. Then R(Gb)>cR({b}) and so R(Gb)/R(Hb)>c/(1+c). Then P(A|Hb)=P(Gb|Hb)=R(Gb)/R(Hb)>c/(1+c). But the Hb are a partition of our probability space, and P(A)<c/(1+c), so we have a violation of conglomerability.