Showing posts with label complexity. Show all posts

Thursday, September 26, 2024

Laws and mathematical complexity

Over the last couple of days I have realized that the laws of physics are rather more complex than they seem. The lovely equations like G = 8πT and F = Gmm′/r^2 (with a different G in the two equations) seem to be an iceberg most of which is submerged in the icy waters of the foundations of mathematics, where the foundational concepts of real analysis and arithmetic are defined in terms of axioms.

This has a curious consequence. We might think that F = Gmm′/r^2 is much simpler than F = Gmm′/r^2 + Hmm′/r^3 (where H is presumably very, very small). But if we fill out each proposal with the foundational mathematical structure, the percentage difference in complexity will be slight, as almost all of the contribution to complexity will be in such things as the construction of real numbers (say, via Dedekind cuts).
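To see just how slight, here is a back-of-the-envelope sketch in Python. This is my own illustration, not a real complexity calculation: the million-character string standing in for the foundational structure is an arbitrary assumption, as is measuring complexity by raw description length.

```python
# Toy illustration: measure each proposal's complexity as the length of its
# full statement, including the shared mathematical foundations it presupposes.

foundations = "x" * 1_000_000  # stand-in for axioms of arithmetic, real analysis, etc.
law_simple  = "F = G*m*m'/r^2"
law_complex = "F = G*m*m'/r^2 + H*m*m'/r^3"

c_simple  = len(foundations) + len(law_simple)
c_complex = len(foundations) + len(law_complex)

# The absolute difference is just the extra term; the percentage difference
# is swamped by the shared foundations.
pct_diff = 100 * (c_complex - c_simple) / c_simple
print(f"{pct_diff:.4f}%")  # a tiny fraction of a percent
```

However one fills in the stand-in numbers, the shared foundational contribution dominates both totals, so the ratio of the two complexities is driven toward 1.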

Perhaps, though, the above line of thought is reason to think that real analysis and arithmetic are actually fundamental?

Friday, October 6, 2023

Complexity and skeptical hypotheses

Suppose a strong epistemic preference for simpler theories of the world. One might then think that a simulation hypothesis is automatically more complex than the best physical story about our world, because in addition to all the complexity of our simulated cosmos, it includes the complexity of whatever physical cosmos houses the hardware running the simulation.

But this need not be the case. The best physical story about our world makes our world include vast amounts of information that would not need to be included in the simulation. To simulate the history of the human race, we at most need information on the particles within a sphere of radius about a hundred thousand light-years, so basically just the Milky Way Galaxy, a very small fraction of the particles in the world. And even that is a vast overstatement. One can surely have a low simulation resolution for a lot of stuff, simulating things only on a macroscopic level, and only including particle-level information when the simulated humans peer into scientific instruments. So the information content of the simulation software could be much, much lower than the information content of the physical world that our best theories say we live in.

But what about the simulation hardware itself? Wouldn’t that need to live in a complex physical universe? Maybe, but that universe need not be as complex as our physical theories claim ours to be. It could be a universe that has a level of physical complexity optimized for running the computing hardware. The granularity of that universe could be much coarser than ours. For instance, instead of that universe being made of tiny subatomic particles like ours, requiring many (but fewer and fewer with progress in miniaturization) particles per logic gate, we could suppose a universe optimized for computing whose fundamental building blocks are logic gates, memory cells, etc.

I am dubious, thus, whether we can rule out simulation hypotheses by an epistemic preference for simpler theories. The same goes for Berkeleian skeptical hypotheses on which there is no physical world, but we are disembodied minds being presented with qualia.

And of course the “local five minute hypothesis”, on which the universe is five minutes old and has a radius of five light-minutes, posits a world with intuitively much less complexity than the world of our best theories, a world with vastly fewer particles.

But if we cannot avoid skeptical hypotheses on grounds of complexity, how can we avoid them?

My current view is that we simply have to suppose that our priors are normatively constrained by human nature (which on my Aristotelian view is a form, a real entity), and human nature requires us to have non-skeptical priors. This is a very anthropocentric account.

Friday, June 22, 2018

Language and specified complexity

Roughly speaking—but precisely enough for our purposes—Dembski’s criterion for the specified complexity of a system is that a ratio of two probabilities, pΦ/pL, is very low. Here, pL is the probability that by generating bits of a language L at random we will come up with a description of the system, while pΦ is the physical probability of the system arising. For instance, when you have the system of 100 coins all lying heads up, pΦ = (1/2)^100 while pL is something like (1/27)^9 (think of the description “all heads” generated by generating letters and spaces at random), so that pΦ/pL is something like 6 × 10^-18. Thus, the coin system has specified complexity, and we have significant reason to look for a design-based explanation.
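The arithmetic here is easy to check directly (my own quick computation of the figures in the coin example):

```python
# pPhi: physical probability of 100 fair coins all landing heads.
# pL: rough chance of randomly typing the 9-character description
# "all heads" from 27 symbols (26 letters plus a space).

p_phi = (1 / 2) ** 100   # ~7.9e-31
p_L   = (1 / 27) ** 9    # ~1.3e-13
ratio = p_phi / p_L
print(ratio)             # ~6e-18, as in the text
```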

I’ve always been worried about the language-dependence of the criterion. Consider a binary sequence that intuitively lacks specified complexity, say this sequence generated by random.org:

  • 0111101001100111010101011001100111001110000110011110101101101101001011011000011101100111100111111111

But it is possible to have a language L where the word “xyz” means precisely the above binary sequence, and then relative to that language pL will be much, much bigger than 2^-100 = pΦ.

However, I now wonder how much this actually matters. Suppose that L is the language that we actually speak. Then pL measures how “interesting” the system is relative to the interests of the one group of intelligent agents we know well—namely, ourselves. And interest relative to the one group of intelligent agents we know well is evidence of interest relative to intelligent agents in general. And when a system is interesting relative to intelligent agents but not probable physically, that seems to be evidence of design by intelligent agents.

Admittedly, the move from ourselves to intelligent agents in general is problematic. But we can perhaps just sacrifice a dozen orders of magnitude to the move—maybe the fact that something has an interest level pL = 10^-10 to us is good evidence that it has an interest level at least 10^-22 to intelligent agents in general. That means we need the pΦ/pL ratio to be smaller to infer design, but the criterion will still be useful: it will still point to design in the all-heads arrangement of coins, say.

Of course, all this makes the detection of design more problematic and messy. But there may still be something to it.

Monday, February 15, 2016

Presentism and theoretical simplicity

It's oft stated that Ockham's razor favors the B-theory over the A-theory, other things being equal. But the theoretical gain here is small: the A-theorist need only add one more thing to her ideology over what the B-theorist has, namely an absolute "now", and it wouldn't be hard to offset this loss of parsimony by explanatory gains. But I want to argue that the gain in theoretical simplicity by adopting B-theoretic eternalism over presentism is much, much larger than that. In fact, it could be one of the larger gains in theoretical simplicity in human history.

Why? Well, when we consider the simplicity of a proposed law of nature, we need to look at the law as formulated in joint-carving terms. Any law can be formulated very simply if we allow gerrymandered predicates. (Think of "grue" and "bleen".) Now, if presentism is true, then a transtemporally universally quantified statement like:

  1. All electrons (ever) are negatively charged
should be seen as a conjunction of three statements:
  2. All electrons have always been negatively charged, all electrons are negatively charged, and all electrons will always be negatively charged.
But every fundamental law of nature is transtemporally universally quantified, and even many non-fundamental laws, like the laws of chemistry and astronomy, are transtemporally universally quantified. The fundamental laws of nature, and many of the non-fundamental ones as well, look much simpler on B-theoretic eternalism. This escapes us, because we have compact formulations like (1). But if presentism is true, such compact formulations are mere shorthand for the complex formulations, and having convenient shorthand does not escape a charge of theoretical complexity.

In fact, the above story seems to give us an account of how it is that we have scientifically discovered that eternalist B-theory is true. It's not relativity theory, as some think. Rather it is that we have discovered that there are transtemporally quantified fundamental laws of nature, which are insensitive to the distinction between past, present and future and hence capable of a great theoretical simplification on the hypothesis that eternalist B-theory is true. It is the opposite of what happened with jade, where we discovered that in fact we achieve simplification by splitting jade into two natural kinds, jadeite and nephrite.

Technical notes: My paraphrase (2) fits best with something like Prior's temporal logic. A competitor to this is ersatz times, as in Crisp's theory. Ersatz time theories allow a paraphrase of (1) that seems very eternalist:

  3. For all times t, at t every electron is negatively charged.
However, first, the machinery of ersatz times is complex and so while (3) looks relatively simple (it just has one extra quantifier beyond (1)), if we expand out what "times" means for the ersatzist, it becomes very complex. Moreover on standard ersatzist views, the laws of nature become disjunctive in form, and that is quite objectionable. For a standard approach is to take abstract times to be maximal consistent tensed propositions, and then to distinguish actual times as times that were, are or will be true.

Tuesday, July 14, 2015

Vague mental states

I've been thinking through my intuitions about vagueness and mental states, especially conscious ones. It certainly seems natural to say that it can be vague whether you are in pain or itching, or that it can be vague whether you are sure of something or merely believe it strongly. But I find very plausible the following mental non-vagueness principle:

  • (MNV) Let M be a maximally determinate mental state. Then it cannot be vague whether I am in M.
MNV is compatible with the above judgments. For if I am in a borderline case between pain and itch, it is not vague that I have the maximally determinate unpleasant conscious state U that I have. Rather, what is vague is whether U is a pain or an itch. Intuitively, this is not a case of ontological vagueness, but simply of how to classify U. Similarly, if I am borderline between sureness and strong belief, there is a maximally determinate doxastic state D that I have, and I have it definitely. But it's vague whether this state is classified as sureness or strong belief.

Interestingly, though, MNV is strong enough to rule out a number of popular theories.

The first family of theories ruled out by MNV is just about any theory of diachronic personal identity that allows personal identity to be vague. Psychological continuity theories, for instance, are going to have to make personal identity be vague (on pain of having a very implausible cut-off). More generally, I suspect any theory of personal identity compatible with reductive materialism will make personal identity be vague. But suppose it's vague whether I am identical with person B who exists at a later time t. Then likely B has, and surely could have, a maximally determinate mental state M at t that definitely nobody else has at that time. Then if it's vague at t whether I am B, it's vague at t whether I have M, contrary to MNV.

I suppose one could weaken MNV to say that it's not vague whether something is in M. I would resist this weakening, but even the weakened MNV will be sufficiently strong to rule out typical (i.e., non-Aristotelian) functionalist theories of mind. For suppose that my present maximally determinate mental state M is constituted by computational state C. But now imagine a sequence of possible worlds, starting with the actual, and moving to worlds where my brain is more and more gerrymandered. Just replace bits of my brain by less and less natural prosthetics, in such ways that it becomes more and more difficult to interpret my brain as computing C. (For instance, at some point whether something counts as a computational state may depend on whether it's raining on a far away planet.) Suppose also nothing else computing C is introduced. Then there will be a continuum of worlds, at one end of which there is computation of C and at the other of which there isn't. But it would be arbitrary to have a cut-off as to where M is exemplified. So it's vague whether M is exemplified in some of these worlds, contrary to MNV.

Tuesday, November 25, 2014

Simplicity, language-independence and laws

One measure of the simplicity of a proposition is the length of the shortest sentence expressing the proposition. Unfortunately, this measure is badly dependent on the choice of language. Normally, we think of the proposed law of nature

  • F=Gmm'/r^2
as simpler than:
  • F=Gmm'/r^2.000000000000000000000000000000000000001,
but if my language has a name "H" for the number in the exponent, then the second law is as brief as the first:
  • F=Gmm'/r^H.

One common move is to employ theorems to the effect that given some assumptions, measures of simplicity using different languages are going to be asymptotically equivalent. These theorems look roughly like this: if cL is the measure of complexity with respect to language L, then cL(pn)/cM(pn) converges to 1 whenever pn is a sequence of propositions (or bit-strings or situations) such that either the numerator or the denominator goes to infinity. I.e., for sufficiently complex propositions, it doesn't matter which language we choose.

Unfortunately, one of the places where we want to engage in simplicity reasoning is with respect to choosing between different candidates for laws of nature. But it may very well turn out that the fundamental laws of physics—and maybe even a number of non-fundamental laws—are sufficiently simple that theorems about asymptotic behavior of complexity measures are of no help at all, since these theorems only tell us that for sufficiently complex cases the choice of language doesn't matter.
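A toy model makes the point vivid. This is entirely my own illustration, far cruder than the actual invariance theorems: language L has a one-character name "#" for a motif that language M must spell out, so L saves only a constant number of characters. For long descriptions the constant washes out and the ratio of complexities tends to 1, but for short descriptions—the regime of candidate fundamental laws—the choice of language matters enormously.

```python
# Two toy "languages": M spells everything out; L abbreviates one motif.
MOTIF = "abracadabra"

def c_M(s):
    # Description length in the plain language: just the string itself.
    return len(s)

def c_L(s):
    # Description length when the motif has a one-character name.
    return len(s.replace(MOTIF, "#"))

short = MOTIF + "xy"          # a short description
long_ = MOTIF + "x" * 100_000  # a very long one

print(c_L(short) / c_M(short))  # far from 1: language choice matters
print(c_L(long_) / c_M(long_))  # close to 1: language choice washes out
```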

Monday, February 24, 2014

"If there are so many, then probably there are more"

Suppose the police have found one person involved in the JFK assassination. Then simplicity grounds may give us significant reason to think that that one person is the sole killer. But suppose that they have found 15 people involved. Then while the hypothesis H15 that there were exactly 15 conspirators is simpler than the hypothesis Hn that there were exactly n for n>15, nonetheless barring special evidence that they got them all, we should suspect that there are more conspirators at large. With that large number, it's just not that likely that all were caught.

Why is this? I think it's because even though prior probabilities decrease with complexity, the increment of complexity from H15 to, say, H16 or H17 is much smaller than the increment of complexity from H1 to H2. Maybe P(H2)≈0.2P(H1). But surely we do not have P(H16)≈0.2P(H15). Rather, we have a modest decrease, maybe P(H16)≈0.9P(H15) and P(H17)≈0.9P(H16). If so, then P(H16)+P(H17)≈1.7P(H15). Unless we receive specific evidence that favors H15 over H16 and H17, something like this will be true of the posterior probabilities, and so the disjunction of H16 and H17 will be significantly more likely than H15.
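The decay-rate arithmetic can be spelled out directly (using the post's own illustrative 0.9 factors, which are of course just guesses):

```python
# Normalize relative to H15 and apply the gentle 0.9-per-conspirator penalty.
P_H15 = 1.0
P_H16 = 0.9 * P_H15
P_H17 = 0.9 * P_H16

# The disjunction of "16 conspirators" and "17 conspirators" beats "exactly 15".
print(P_H16 + P_H17)  # ~1.7 * P(H15)
```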

Thus we have a heuristic. If our information is that there are at least n items of some kind, but we have no evidence that there are no more, then when n is small, say 1 or 2 or maybe 3, it may be reasonable to think there are no more items of that kind. But if n is bigger—my intuition is that the switch-around is around 6—then under these conditions it is reasonable to think there are more. If there are so many, then probably there are more. And this just follows from the fact that the increase in complexity from 1 to 2 is great, and from 2 to 3 is significant, but from 6 to 7 or maybe even 4 to 5 it's not very large.

This is all just intuitive, since I do not have any precise way to assign prior probabilities. But staying at this intuitive level, we get some nice intuitive applications:

  • If after thorough investigation we have found only one kind of good that could justify God's permitting evil, then we have significant evidence that it's the only such good. And if some evil is not justified by that kind of good, then that gives significant evidence that it's not justified. But suppose we've found six, say. And it's easy to find at least six: (1) exercise of virtues that deal with evils; (2) significant freedom; (3) preservation of laws of nature; (4) opportunities to go beyond justice via forgiveness[note 1]; (5) adding variety to life; (6) punishment; (7) the great goods of the Incarnation and sacrifice of the cross. So we have good reason to think there are more permission-of-evil justifying goods that we have not yet found. (Alston makes this point.)
  • Suppose our best definition of knowledge has three clauses. Then we might reasonably suspect that we've got the definition. But it is likely, given Gettier stuff, that one needs at least four clauses. But for any proposed definition with four clauses, we should be much more cautious to think we've got them all.
  • Suppose we think we have four fundamental kinds of truths, as Chalmers does (physics, qualia, indexicals and that's all). Then we shouldn't be confident that we've got them all. But once we realize that the list leaves out several kinds (e.g., morality, mathematics, intentions and intentionality, pace Chalmers), our confidence that we have them all should be low.
  • If our best physics says that there are two fundamental laws, we have some reason to think we've got it all. But if it says that there are six, we should be dubious.

Wednesday, July 21, 2010

A defense (well, sort-of) of specified complexity as a guide to design

I will develop Dembski's specified complexity in a particular direction, which may or may not be exactly his, but which I think can be defended to a point.

Specified Complexity (SC) comes from the fact that there are three somewhat natural probability measures on physical arrangements. For definiteness, think of physical arrangements as black-and-white pixel patterns on a screen, and then there are 2^n arrangements where n is the number of pixels.

There are three different fairly natural probability measures on this.

1. There is what one might call "a rearrangement (or Humean) measure" which assigns every arrangement equal probability. In the pixel case, that is 2^-n.

2. There is "a nomic measure". Basically, the probability of an arrangement is the probability that, given the laws (and initial conditions? we're going to have two ways of doing it--one allowing the initial conditions to vary, and one holding them fixed), such an arrangement would arise.

3. There is what one might call "a description measure". This is relative to a language L that can describe pixel arrangements. One way to generate a description measure is to begin by generating random finite-length strings of symbols from L supplemented with an "end of sentence" marker which, when generated, ends a string. Thus, the probability of a string of length k is m^-k where m is the number of symbols in L (including the end of sentence marker). Take this probability measure and condition on (a) the string being grammatical and (b) describing a unique arrangement. The resulting conditional probability measure on the sentences of L that describe a unique arrangement then gives rise to a probability measure on the arrangements themselves: the description probability of an arrangement A is the (conditionalized as before) probability that a sentence of L describes A.

So, basically we have the less anthropocentric nomic and rearrangement measures, and the more anthropocentric description measure. The rearrangement measure has no biases. The nomic measure has a bias in favor of what the laws can produce. The description measure has a bias in favor of what can be more briefly described.

We can now define SC of two sorts. An arrangement A has specified rearrangement (respectively, nomic) complexity, relative to a language L, provided that A's rearrangement (respectively, nomic) measure is much smaller than its L-description measure. (There is some technical stuff to be done to extend this to less specific arrangements--the above works only for fully determinate arrangements.)

For instance, consider the arrangement where all the pixels are black. In a language L based on First Order Logic, there are some very short descriptions of this: "(x)(Bx)". So, the description measure of the all-black arrangement will be much bigger than the description measure of something messy that needs a description like "Bx1&Bx2&Wx3&...&Bxn". On the other hand, the rearrangement measure of the all-black arrangement is the same as that of any other arrangement. In this case, then, the L-description measure of the all-black arrangement will be much greater than its rearrangement measure, and so we will have specified rearrangement complexity, relative to L. Whether we will have specified nomic complexity depends on the physics involved in the arrangement.
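Here is a fully worked miniature of the construction for n = 4 pixels. The five-symbol toy language is my own stipulation, not anything in Dembski: sentences are either a direct four-bit listing ("b1b2b3b4#"), or "A#" meaning all black, or "Z#" meaning all white, with "#" the end-of-sentence marker.

```python
from itertools import product

n = 4
arrangements = ["".join(bits) for bits in product("01", repeat=n)]  # 1 = black

# Raw probability of generating a string of length k from 5 symbols: (1/5)^k.
def raw(s):
    return (1 / 5) ** len(s)

# Every grammatical, uniquely-describing sentence and the arrangement it names.
sentences = {a + "#": a for a in arrangements}  # direct listings
sentences["A#"] = "1111"                        # "all black"
sentences["Z#"] = "0000"                        # "all white"

# Condition on landing in the set of uniquely-describing sentences.
total = sum(raw(s) for s in sentences)

def description_measure(a):
    return sum(raw(s) for s, ref in sentences.items() if ref == a) / total

rearrangement = 1 / 2 ** n  # flat 1/16 for every arrangement

print(description_measure("1111"))  # large: a short description exists
print(description_measure("0110"))  # small: only the direct listing
print(rearrangement)
```

On this toy model the all-black pattern's description measure far exceeds its flat rearrangement measure, so it has specified rearrangement complexity relative to this language, while the messy pattern 0110 does not.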

All of the above seems pretty rigorous, or capable of being made so.

Now, given the above, we have the philosophical question: Does SC give one reason to suppose agency? Here is where things get more hairy and less rigorous.

An initial problem: The concept of SC is language-relative. For any arrangement A, there is a language L1 relative to which A lacks complexity and a language L2 relative to which A has complexity. So SC had better be defined in terms of a privileged kind of language. I think this is a serious problem for the whole approach, but I do not know that it is insuperable. For instance, easily inter-translatable languages are probably going to give rise to similar orders of magnitude within the description measures. We might require that the language L be the language of a completed and well-developed physics. Or we might stipulate L to be some extension of FOL with the predicates corresponding to the perfectly normal properties. There are tough technical problems here, and I wish Dembski would do more here. Call any language that works well here "canonical".

Once we have this taken care of, if it can be done, we can ask: Is there any reason to think that SC is a mark of design?

Here, I think Dembski's intuition is something like this: Suppose I know nothing of an agent's ends. What can I say about the agent's intentions? Well, an agent's space of thoughts is going to be approximately similar to a canonical language (maybe in some cases it will constitute a canonical language). Without any information on the agent's ends, it is reasonable to estimate the probabilities of an agent having a particular intention in terms of the description measure relative to a canonical language.

But if this is right, then the approach has some hope of working, doesn't it? For suppose you have nomic specified complexity of an arrangement A relative to a canonical language. Then P(A|no agency) will be much smaller than A's description measure relative to L, which is an approximation to P(A|agency) with no information about the sort of agency going on. Therefore, A incrementally confirms the agency hypothesis. The rest is a question of priors (which Dembski skirts by using absolute probability bounds).

I think the serious problems for this approach are:

  • The problem of canonical languages.
  • The problem that in the end we want this to apply even to supernatural designers who probably do not think linguistically. Why think that briefer descriptions are more likely to match their intentions?
  • We do have some information on the ends of agents in general--agents pursue what they take to be valuable. And the description measure does not take value into account. Still, insofar as there is value in simplicity, and the description measure favors briefer descriptions, the description measure captures something of value.

Wednesday, May 7, 2008

Dembski's definition of specified complexity

A central part of Dembski's definition of specified complexity is a way of measuring whether an event E is surprising. This is not just a probabilistic measure. If you roll eleven dice and get the "unsurprising" sequence 62354544555, this sequence has the same probability 1/6^11 as the intuitively more "surprising" sequences 12345654321 or 11111111111. It would be a mistake (a mistake actually made by some commenters on the design argument) to conclude from the probabilistic equality that there is no difference in surprisingness, since what one should conclude is that surprisingness is not just a matter of the probabilities. Instead of talking about "surprisingness", however, Dembski talks about "specification". The idea is that you can "specify" the sequences 12345654321 or 11111111111 ahead of time in a neat way. The first you specify as the only sequence of eleven dice throws consisting of a strict monotonic increase ending precisely where a strict monotonic decline begins. The second is one of only six sequences of eleven dice throws that each yield the same result on every throw.

I will describe Dembski's account of specification, and that will be somewhat technical, and then I will criticize it, and consider a way of fixing it up which is not entirely satisfactory.

Dembski proposes a measure of specification.[note 1] Suppose we have a probability space S (e.g., the space of all sequences of eleven dice throws) with a probability measure PH (defined by some chance hypothesis H). Let f be a "detached" real-valued function on S (a lot more on detachment later). Then an event E in the probability space S is just a measurable subset of the probability space. For any real-valued function f defined on S and real number y, let fy be the set of all points x in S such that f(x)≥y. This is an event in S. Indeed, fy is the event of being at a point x in our probability space where f(x) is at least y.

We now say that an event E in S is specified to significance a provided that there is a function f on S "detached" from E (a lot more on detachment later on) and a real number y such that fy contains E and PH(fy)<a.

For instance, in our eleven dice throw case, if x is a sequence of eleven dice throw results, let f(x) be the greatest number n such that at least n of the throw results in x are the same. Then, f11 is equivalent to the event that all eleven of the dice throws were the same. Let E be the event of the sequence 11111111111 occurring. Then E is contained in f11, and PH(f11)=1/6^11<10^-8, and so E is specified to significance 10^-8, as long as we can say that f is detached from E. Similarly, we can let f be the length of the largest interval over which a sequence of dice throws is monotonic increasing plus the length of the largest interval over which a sequence of dice throws is monotonic decreasing, and then our sequence 12345654321 will be a member of f12, and if f is detachable, we can thus compute a significance for this result.
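The two rejection functions are simple enough to write down as executable definitions (this is my own coding of them, applied to the post's example sequences):

```python
from collections import Counter

def f_repeat(seq):
    """Greatest n such that at least n throws in seq yield the same result."""
    return max(Counter(seq).values())

def longest_run(seq, cmp):
    # Length of the longest contiguous run where cmp holds between neighbors.
    best = cur = 1
    for a, b in zip(seq, seq[1:]):
        cur = cur + 1 if cmp(a, b) else 1
        best = max(best, cur)
    return best

def f_monotone(seq):
    """Longest strictly increasing run plus longest strictly decreasing run."""
    return (longest_run(seq, lambda a, b: a < b)
            + longest_run(seq, lambda a, b: a > b))

print(f_repeat("11111111111"))    # 11: lands in the extreme event f11
print(f_monotone("12345654321"))  # 12: lands in f12
print(f_repeat("62354544555"))    # 5: nothing remarkable
```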

The crucial part of the account is the notion of "detachability". Without such a condition, every improbable event E is significant. Consider our intuitively unsurprising sequence 62354544555 (which was as a matter of fact generated by a pretty random process: I made it by using random.org[note 2]). Let f be the function assigning 1 to the sequence 62354544555 and 0 to every other sequence. Then our given sequence is the only member of f1, and so without any detachability condition on f, we would conclude that we have specification to a high degree of significance. But of course this is cheating. The function f was jerryrigged by me to detect the event we were looking at, and one can always thus jerryrig a function. To check for specification, however, we need a function f that could in principle have been specified beforehand, i.e., before we found out what the result of the dice throwing experiment was. If we get significance with such a function, then we can have some confidence that our event E is specified.

Dembski, thus, owes us an account of detachability. In No Free Lunch, he offers the following:

a rejection function f is detachable from E if and only if a subject possesses background knowledge K that is conditionally independent of E (i.e., P(E|H& K) = P(E|H)) and such that K explicitly and univocally identifies the function f.

Or, to put it in our notation, f is detachable from E iff the epistemic agent has background knowledge K such that PH(E|K)=PH(E). It is hard to overstress how central this notion of detachability is to Dembski's account of specification, and therefore to his notion of specified complexity, and thus to his project.

But there is a serious problem with detachability: I am not sure that the independence condition PH(E)=PH(E|K) makes much sense. Ordinarily, the expression P(...|K) makes sense only if K is an event in the probability space or K is a random variable on the probability space (i.e., a measurable function on the probability space). In this case, K is "knowledge". This is ambiguous between the content of the knowledge and the state of knowing. Let's suppose first that K is the content of the knowledge—that's, after all, what we normally mean in probabilistic epistemology when we talk of conditioning on knowledge. So, K is some proposition which, presumably, expresses some event—probabilities are defined with respect to events, not propositions, strictly speaking.[note 3] What is this proposition and event? The knowledge is supposed to "identify" the function f. It seems, then, that K is a proposition of the form "There is a unique function f such that D(f)", where D is an explicit and univocal identification.

But on this reading of "knowledge", the definition threatens uselessness. Let K be the proposition that there is a unique function f such that f(x)=1 if and only if x equals 62354544555 and f(x)=0 otherwise. This function f was our paradigm of a non-detachable function. But what is PH(E|K)? Well, K is a necessary truth: It is a fact of mathematics that there is a unique function as described. If PH is an objective probability, then all necessary truths have probability 1, and so to condition on a necessary truth changes nothing: PH(E|K)=PH(E), and we get detachability for free for f, and indeed for every other function.

So on the reading where K is the content of the knowledge, if necessary truths get unit probability, Dembski's definition is pretty much useless—every function that has a finite mathematical description becomes detachable, since truths about whether a given finite mathematical description uniquely describes a function are necessary truths.

But perhaps PH is an epistemic probability, so that necessary truths might have probability less than 1. One problem with this is that much of the nice probabilistic apparatus now breaks down. How on earth do we define a probability space in such a way that we can assign probabilities less than 1 to necessary truths? Do we partition the space of possibilities-and-impossibilities into regions where it is true that there is a unique function f such that f(x)=1 iff x=62354544555 and f(x)=0 otherwise and regions where this is false? I am not sure what we can make of probabilities in the regions where this is false. Presumably they are regions where mathematics breaks down. How do we avoid incoherence in applying probability theory—as Dembski wants to!—over the space of possibilities-and-impossibilities?

Moreover, it seems to me that on any reasonable notion of epistemic probabilities, those necessary truths that the epistemic agent would immediately see as necessary truths were they presented to her should get probability 1. Any epistemic agent who is sufficiently smart to follow Dembski's arguments and who knows set theory would immediately see as a necessary truth the claim that there is a unique function f on S such that f(x)=1 iff x=62354544555 and f(x)=0 otherwise. So even if we allow that some necessary truths, such as that horses are mammals, might get epistemic probabilities less than 1, the ones that matter for Dembski are not like that—they are self-evident necessary truths in the sense that once you understand them, you understand that they are true. The prospects for an account of epistemic probability that does not assign 1 to such necessary truths strike me as unpromising, though I think this is the route Dembski actually wants to go according to Remark 2.5.7 of No Free Lunch.

Besides, as a matter of fact, any agent who is sufficiently smart to understand Dembski's methods will be one who will assign 1 to the claim that there is a unique function f as above. So on the objective probability reading, Dembski's definition of detachability applies to all finitely specifiable functions. On the epistemic reading, it does so too, at least for agents who are sufficiently smart. This makes Dembski's definition just about useless for any legitimate purposes.

Let's now try the second interpretation of K, where K is not the content of the knowledge, but the event of the agent's actually knowing the identification of f. This is more promising, I think. Let p be the proposition that there is a unique function f on S such that f(x)=1 iff x=62354544555 and f(x)=0 otherwise. Let us suppose, then, that K is the event of the agent's knowing that p. It is essential, we've seen, to judging f to be non-detachable that PH(K) not equal 1. This requires a theory of knowledge on which an agent's knowing p is more than the agent's merely being in a position to know p, as when the agent knows things that self-evidently entail p. An actual explicit belief is required for knowledge on this view. Seen this way, PH(K)<1, since the agent might never have thought about p. Since K is a bona fide event on this view, we can apply probability theory without any worries about incoherence. So far so good.

But new problems show up. It is essential to Dembski's application of his theory to Intelligent Design that it apply in cases where people have only thought of f after seeing the event E—cases of "old evidence". Take, for instance, Dembski's example of the guy whose allegedly random choices of ballot orderings heavily favored one party. Dembski proposes a function f that counts the number of times that one party is on the top of the ballot. But I bet that Dembski did not actually think of this function before he heard of the event E of skewed ballot orderings. Moreover, hearing of the event surely made him at least slightly more likely to think of this function. If he never heard of this event, he might never have thought about the issue of ballot orderings, and hence about functions counting them. There is surely some probabilistic dependence between Dembski's knowing that there is such a function and the event E. Similarly, seeing the sequence 11111111111 does make one more likely to think of the function counting the number of repetitions. One might have thought of that function anyway, but the chance of thinking of it is higher when one does see the result. Hence, there is no independence, and, thus, no detachability.

This problem is particularly egregious in some of the biological cases to which one might ultimately want to apply Dembski's theory. Let's consider the event E that there is intelligent life. Let K be any state of knowledge identifying a function. Surely, there is probabilistic dependence between E and K. After all, PH(K|~E)=0, since were there no intelligent life, nobody would know anything, as there would be nobody to do the knowing. Thus, PH(E|K)=1, which entails that E and K are not probabilistically independent unless PH(E)=1.
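The step from PH(K|~E)=0 to PH(E|K)=1 is standard probability algebra; spelled out (assuming PH(K)>0 so the conditional probability is defined):

```latex
% From P_H(K \mid \lnot E) = 0:
P_H(K \wedge \lnot E) = P_H(K \mid \lnot E)\, P_H(\lnot E) = 0,
\qquad \text{so} \qquad P_H(K) = P_H(K \wedge E).
% Hence, assuming P_H(K) > 0:
P_H(E \mid K) = \frac{P_H(E \wedge K)}{P_H(K)} = 1.
% Independence would require P_H(E \mid K) = P_H(E),
% which forces P_H(E) = 1.
```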

So the problem is that in just about no interesting case where we already knew about E will f be detachable from E, and yet the paradigmatic applications of Dembski's theory to Intelligent Design are precisely such cases. Here is a suggestion for how to fix this up (inspired by some ideas in Dembski's The Design Inference). We allow a little bit of dependence between E and K, but require that the amount of dependence not be too big. My intuition is that the smaller the significance a of the specification (note that the smaller the significance a, the more significant the specification—that's how it goes in statistics), the more dependence we can permit. To do that right, we'd have to choose an appropriate measure of dependence, but since I'm just sketching this, I will leave out the details.
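To make the sketch slightly more concrete (this is my own illustration, not anything in Dembski): one candidate measure of dependence is pointwise mutual information, log2 of PH(E∧K)/(PH(E)·PH(K)), which is zero exactly when E and K are independent. The relaxed condition would then permit a magnitude of dependence that grows as the significance a shrinks. The function names and the particular bound schedule below are hypothetical choices for illustration only:

```python
import math

def pmi(p_ek, p_e, p_k):
    """Pointwise mutual information: log2 of P(E&K) / (P(E)*P(K)).
    Zero exactly when E and K are probabilistically independent;
    its magnitude grows with the degree of dependence."""
    return math.log2(p_ek / (p_e * p_k))

def relaxed_detachable(p_ek, p_e, p_k, alpha, c=0.1):
    """Toy relaxation of Dembski's independence requirement: tolerate
    |PMI| up to a bound that grows as the significance level alpha
    shrinks.  The schedule -c*log2(alpha) and the constant c are
    arbitrary illustrative choices."""
    return abs(pmi(p_ek, p_e, p_k)) <= -c * math.log2(alpha)

# Independent events: PMI is (essentially) 0, condition holds.
print(pmi(0.06, 0.2, 0.3))                       # ~ 0.0
print(relaxed_detachable(0.06, 0.2, 0.3, 1e-6))  # True

# Strongly dependent events (P(E|K)=1, as in the intelligent-life
# case): PMI is large, and the condition fails at this alpha.
print(pmi(0.3, 0.3, 0.3))                        # ~ 1.74
print(relaxed_detachable(0.3, 0.3, 0.3, 1e-3))   # False
```

Choosing the measure and the schedule well is exactly the detail work the suggestion above leaves open.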

However, there is a difficulty. In "flagship cases" of Intelligent Design, such as the arising of intelligence or of reproducing life-forms, there is a lot of dependence between E and K, since our language is in large part designed (consciously or not) for discussing these kinds of events. It is in large part because reproducing life-forms are abundant on earth that our language makes it easy to describe reproduction, and that ease of description in turn significantly increases the probability that we will think of functions f that involve reproductive concepts. In these cases, the amount of dependence between E and K will be quite large.

There may still be cases where there is little dependence, at least relative to some background data. These will be cases where our language did not develop to describe the particular cases observed but developed to describe other cases, perhaps similar to the ones observed but largely probabilistically independent of them. Thus, our language about mechanics and propulsion plainly did not develop to describe bacterial flagella, and it may be that the existence of bacterial flagella is probabilistically independent of the things for which our language developed. So maybe the above account works if K is a state of knowing a specification that includes bacterial flagella. Or not! There are hard questions here. One of them concerns how particular K is. Is K the event of one particular knower, say William Dembski, having the identification of f? If so, then there is a lot of probabilistic dependence between the existence of bacterial flagella and K: without bacterial flagella, history would have gone very differently, and Dembski would probably never have come into existence.

Or is K the event of some knower or other having the identification of f? Then, to evaluate the dependence between K and the existence of bacterial flagella we would have to examine the almost intractable question of what a world without bacterial flagella would have been like.