
Tuesday, January 21, 2025

Kripke's standard meter

Back when there was a standard meter, Kripke claimed that it was contingent a priori that the standard meter is a meter in length.

This seems wrong. For anything narrowly logically entailed by something that’s a priori is also a priori. But that the standard meter is a meter in length entails that there is an extended object. And that there is an extended object is clearly a posteriori.

Kripke’s reasoning is that to know that the standard meter is a meter in length, all you need to know is how “meter” is stipulated, namely as the actual length of the standard meterstick, and anything you can know from knowing how the terms are stipulated is known a priori.

There is something fishy here. We don’t know a priori that the stipulation was successful (it might have failed if, for instance, the “standard meter” never existed but there was a conspiracy to pretend it did). In fact, we don’t know a priori that any stipulations were ever made—that, too, is clearly a posteriori.

Maybe what we need here is some concept of “stipulational content”, and the idea is that something is a priori if you can derive it a priori from the stipulational content of the terms. But the stipulational content of a term needs to be defined in such a way that it’s neutral on whether the stipulation happened or succeeded. If so, then Kripke should have said that it’s a priori that if there is a standard meterstick, it is a meter long.

Monday, January 9, 2023

Three non-philosophy projects

Here are some non-philosophy hobby projects I’ve been doing over the break:

  • Measuring exercise bike power output

  • Dumping NES ROMs

  • Adapting Dance Dance Revolution and other mat controllers to work as NES Power Pad controllers for emulation.



Friday, September 10, 2021

Comparing the epistemic relevance of measurements

Suppose P is a regular probability on (the powerset of) a finite space Ω representing my credences. A measurement M is a partition of Ω into disjoint events E1, ..., En, with the result of the experiment being one of these events. In a given context, my primary interest is some subalgebra F of the powerset of Ω.

Note that a measurement can be epistemically relevant to my primary interest without any of the events in the measurement being something I have a primary interest in. If I am interested in figuring out whether taller people smile more, my primary interest will be some algebra F generated by a number of hypotheses about the degree to which height and smiliness are correlated in the population. Then the measurement of Alice’s height and smiliness will not be a part of my primary interest, but it will be epistemically relevant to my primary interest.

Now, some measurements will be more relevant with respect to my primary interest than others. Measuring Alice’s height and smiliness will intuitively be more relevant to my primary interest about height/smile correlation, while measuring Alice’s mass and eye color will be less so.

The point of this post is to provide a relevance-based partial ordering on possible measurements. In fact, I will offer three, but I believe they are equivalent.

First, we have a pragmatic ordering. A measurement M1 is at least as pragmatically relevant to F as a measurement M2, relative to our current (prior) credence assignment P, just in case for every possible F-based wager W, the P-expected utility of wagering on W after a Bayesian update on the result of M1 is at least as big as that of wagering on W after updating on the result of M2; and M1 is more relevant if additionally for some wager W the expected utility of wagering after updating on the result of M1 is strictly greater.

Second, we have an accuracy ordering. A measurement M1 is at least as accuracy relevant to F as a measurement M2 just in case for every proper scoring rule s on F, the expected score of updating on the result of M1 is better than or equal to the expected score of updating on the result of M2, and M1 is more relevant when for some scoring rule the expected score is better in the case of M1.

Third, we have a geometric ordering. Let H_{P,F}(M) be the horizon of a measurement M, namely the set of all possible posterior credence assignments on F obtained by starting with P, conditionalizing on one of the events that M partitions Ω into, and restricting to F. Then we say that M1 is at least as (more) geometrically relevant to F as M2 just in case the convex hull of the horizon of M1 contains (strictly contains) the convex hull of the horizon of M2.

I have not written out the details, but I am pretty sure that all three orderings are equivalent, which suggests that I am on to something with these concepts.

An interesting special case is when one’s interest is binary, an algebra generated by a single hypothesis H, and the measurements are binary, i.e., partitions into two sets. In that case, I think, a measurement M1 is at least as (more) relevant as a measurement M2 if and only if the interval whose endpoints are the Bayes factors of the events in M1 contains (strictly contains) the interval whose endpoints are the Bayes factors of the events in M2.
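To make the binary special case concrete, here is a small numerical sketch in Python (the toy space and numbers are made up purely for illustration): it builds a four-world space with a binary hypothesis H, computes the horizons and Bayes factor intervals of two binary measurements, and checks that containment of the convex hulls goes together with containment of the Bayes factor intervals.

    from fractions import Fraction as F

    # Toy space: four worlds labeled (h, r), where h records whether the
    # hypothesis H holds and r is the result of a binary quantity.
    P = {(1, 1): F(3, 10), (1, 0): F(1, 10),
         (0, 1): F(2, 10), (0, 0): F(4, 10)}

    def posterior_H(event):
        """P(H | event), where the event is a set of worlds."""
        pe = sum(P[w] for w in event)
        return sum(P[w] for w in event if w[0] == 1) / pe

    def bayes_factor(event):
        """Bayes factor P(event | H) / P(event | not-H)."""
        ph = sum(p for w, p in P.items() if w[0] == 1)
        num = sum(P[w] for w in event if w[0] == 1) / ph
        den = sum(P[w] for w in event if w[0] == 0) / (1 - ph)
        return num / den

    # Two binary measurements, i.e., partitions of the space into two events.
    M1 = [{(1, 1), (0, 1)}, {(1, 0), (0, 0)}]
    M2 = [{(1, 1), (0, 1), (1, 0)}, {(0, 0)}]

    h1 = sorted(posterior_H(E) for E in M1)   # horizon of M1 as P(H) values
    h2 = sorted(posterior_H(E) for E in M2)
    b1 = sorted(bayes_factor(E) for E in M1)  # Bayes factor interval of M1
    b2 = sorted(bayes_factor(E) for E in M2)
    print("posteriors:", h1, h2)
    print("Bayes factors:", b1, b2)
    print("hull of M2 contains hull of M1:", h2[0] <= h1[0] <= h1[1] <= h2[1])
    print("BF interval of M2 contains M1's:", b2[0] <= b1[0] <= b1[1] <= b2[1])

On this toy space both containments hold in the same direction: M2 counts as the more relevant measurement, unsurprisingly, since its second cell conclusively refutes H.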

Thursday, July 22, 2021

Measuring rods

In his popular book on relativity theory, Einstein says that distance is just what measuring rods measure. I am having a hard time making sense of this in Einstein’s operationalist setting.

Either Einstein is talking of real measuring rods or idealized ones. If real ones, then it’s false. If I move a measuring rod from one location to another, its length changes, not for relativistic reasons, but simply because the acceleration causes some shock to it, resulting in a distortion in its shape and dimensions, or because of chemical changes as the rod ages. But if he’s talking about idealized rods, then I think we cannot specify the relevant kind of idealization without making circular use of dimensions—relevantly idealized rods are ones that don’t change their dimensions in the relevant circumstances.

If one drops Einstein’s operationalism, one can make perfect sense of what he says. We can say that distance is the most natural of the quantities that are reliably and to a high degree of approximation measured by measuring rods. But this depends on a metaphysics of naturalness: it’s not a purely operational definition.

Friday, February 10, 2017

Measurement error

Let’s say that I am in the lab and I am measuring some unknown value U. My best model of the measurement process involves a random additive error E independent of U, with E having some known distribution, say a Gaussian of some particular standard deviation (perhaps specified by the measurement equipment manufacturer) centered around zero. The measurement gives the value 7.3. How should I now answer probabilistic questions like: “How likely is it that U is actually between 7.2 and 7.4?”

Here’s how this is sometimes done in practice. We know that U = 7.3 − E. Then we say that the probability that U is, say, between 7.2 and 7.4 is the same as the probability that E is between −0.1 and 0.1, and we calculate the latter probability using the known distribution of E.
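In code, the naive calculation is just this (a sketch; the standard deviation of 0.05 is an illustrative number, not from any real datasheet):

    from scipy.stats import norm

    sigma = 0.05      # manufacturer-specified SD of the error E (illustrative)
    reading = 7.3
    lo, hi = 7.2, 7.4

    # U is in (lo, hi) iff E = reading - U is in (reading - hi, reading - lo);
    # the naive move computes that probability from the *prior* of E.
    p = norm.cdf(reading - lo, scale=sigma) - norm.cdf(reading - hi, scale=sigma)
    print(p)  # about 0.954 with these numbers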

But this is an un-Bayesian way of proceeding. We can see that from the fact that we never said anything about our priors regarding U, and for a Bayesian that should matter. Here’s another way to see the mistake: When I calculated the probability that U was between 7.2 and 7.4, I used the prior distribution of E. But doing that neglects data that I have received. For instance, suppose that U is the diameter of a human hair that I have placed between my digital calipers. And the calipers show 7.3 millimeters. What is the probability that the hair really has a diameter between 7.2 and 7.4 millimeters? It’s vanishingly small! That would be just an absurdly large diameter for a hair. Rather, the fact that the calipers show 7.3 millimeters shows that E is approximately equal to 7.3 millimeters. The posterior distribution of E, given background information on human hair thickness, is very different from the prior distribution.

Yet the above is what one does in practice. Can one justify that practice? Yes, in some cases. Generalize a little. Let’s say we measure the value of U to be α, and we want to know the posterior probability that U lies in some set I. This probability is:

P(U ∈ I | U + E = α) = P(α − E ∈ I | U + E = α).

Now suppose that E has a certain maximum range, say, from −δ to δ. (For instance, there is no way that digital calipers with a four-digit display can show more than 9999 or less than −9999.) And suppose that U is uniformly distributed over the region from α − δ to α + δ, i.e., its distribution over that region is perfectly flat. In that case, it’s easy to see that E and the event U + E = α are actually statistically independent. Thus:

P(U ∈ I | U + E = α) = P(α − E ∈ I).

And so in this case our initial naive approach works just fine.
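Here is a quick Monte Carlo check of that equivalence (a sketch with made-up numbers: α = 7.3, δ = 0.5, a flat prior for U, and a decidedly non-flat truncated Gaussian error; everything is discretized onto a grid so that conditioning on U + E = α is an event of positive probability):

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, delta, step, N = 7.3, 0.5, 0.01, 2_000_000

    k = int(delta / step)
    grid = np.arange(-k, k + 1) * step          # from -delta to +delta

    # U: flat over (alpha - delta, alpha + delta), as assumed above.
    U = alpha + rng.choice(grid, size=N)
    # E: bounded by delta but far from flat (truncated Gaussian, SD 0.1).
    w = np.exp(-grid**2 / (2 * 0.1**2))
    E = rng.choice(grid, size=N, p=w / w.sum())

    cond = np.isclose(U + E, alpha)             # the event U + E = alpha

    # P(U in I | U + E = alpha) vs. P(alpha - E in I) for I = [7.2, 7.4]:
    lhs = np.mean(np.abs(U[cond] - alpha) <= 0.1 + 1e-9)
    rhs = np.mean(np.abs(E) <= 0.1 + 1e-9)
    print(lhs, rhs)  # these agree up to sampling noise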

In the original setting, if for instance we’re completely confident that E cannot exceed 0.5 in absolute value, and our prior distribution for U is flat from 6.8 to 7.8, then the initial calculation that the probability that U is between 7.2 and 7.4 equals the prior probability that E is between −0.1 and 0.1 stands. (The counterexample then doesn’t apply, since in the counterexample we had the possibility, now ruled out, that E is really big.)

The original un-Bayesian way of proceeding basically pretended that U was (per impossibile) uniformly distributed over the whole real line. When U is close to uniformly distributed over a large salient portion of the real line, the original way kind of works.

The general point goes something like this: As long as the value of E is approximately independent of whether U + E = α, we can approximate the posterior distribution of E by its prior and all is well. In the case of the hair measurement, E was not approximately independent of whether U + E = 7.3, since if U + E = 7.3, then very likely E is enormous, whereas unconditionally E is very unlikely to be enormous.

This is no doubt stuff well-known to statisticians, but I’m not a statistician, and it’s clarified some things for me.

The naive un-Bayesian calculation I gave at the beginning is precisely the one that I used in my previous post when adjusting for errors in the evaluation of evidence. But an appropriate flatness of prior distribution assumption can rescue the calculations in that post.

Monday, June 30, 2014

Measuring rotational speed with a phone and an LED

Over the weekend, I was having fun with using an LED as a photodiode, and hooking it up to my oscilloscope. This can be used to measure the speed of a drill (just stick a reflective spot on a matte chuck and use a flashlight). I was going to make an Instructable on measuring rotational speeds of various objects, but my son told me that most people don't have an oscilloscope. But then I found you can just connect the LED to the microphone input on a phone and use a free oscilloscope app, and use that to measure rotational speed. And so I made an Instructable that doesn't need an oscilloscope.
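For reference, the arithmetic the oscilloscope app is doing is simple: with one reflective spot, the LED sees one pulse per revolution, so the speed in RPM is just 60 times the pulse frequency in Hz. Here is a sketch of recovering that frequency from a recorded capture (assuming a clean, dominant spectral peak; the plausible-range cutoffs are my own illustrative choices):

    import numpy as np

    def rotation_rpm(samples, sample_rate):
        """Estimate rotational speed from a mic-input capture in which the
        LED produces one pulse per revolution: find the strongest
        low-frequency peak in the spectrum and convert Hz to RPM."""
        spectrum = np.abs(np.fft.rfft(samples - np.mean(samples)))
        freqs = np.fft.rfftfreq(len(samples), d=1 / sample_rate)
        keep = (freqs > 1) & (freqs < 200)   # plausible range: 60-12000 RPM
        return 60 * freqs[keep][np.argmax(spectrum[keep])]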

Tuesday, June 17, 2014

Two kinds of tools

I picked up this beauty on Saturday from a generous person in Austin who was selling it on Craigslist, but gave it to me for free when it became clear that I'd be actually using it rather than reselling it. (The image on screen is a one-shot capture of the electricity generated by a DC motor when you set it spinning by hand and then let it slow down. Surprisingly linear down-slope, by the way.)

Anyway, thinking about this led me to a curious distinction between two kinds of tools. A tool is used for affecting something. We can distinguish tools into:

  • Tools designed to affect minds.
  • Tools designed to affect the extra-mental world.
An oscilloscope, a calculator, measuring tape, a book, binoculars and an anti-depressant are all tools designed to produce mental effects, of a particularly calibrated sort, whether by leading to beliefs about the results of measurements, or certain kinds of perceptions, or the like. On the other hand, a bulldozer, a vacuum cleaner, a test-tube and a roof are all tools designed to affect the extra-mental world. Of course, the affecting of the mind in the case of the first kind of tool might in some instance only be a means to affecting the extra-mental world (you might read a book about how to fix a car), while the affecting of the extra-mental world in the case of the second kind of tool might in some instance only be a means to affecting the mental world (the test-tube is used to contain chemicals in order to find out something). Nonetheless, the direct intended effect of the tools is, at least in the primary intended application, as stated.

Tuesday, January 24, 2012

Beating Condorcet (well, sort of)

This builds on, but also goes back over the ground of, my previous post.

I've been playing with voting methods, or as I might prefer to call them, "utility estimate aggregation methods." My basic model is that there are n options (say, candidates) to choose between and m evaluators ("voters"). The evaluators would like to choose the option that has the highest utility. Unfortunately, the actual utilities of the options are not known, and all we have are estimates of the utilities by all the evaluators.

A standard method for this is the Condorcet method. An option is a Condorcet winner provided that it "beats" every other option, where an option x "beats" an option y provided that a majority of the evaluators estimate x more highly than y. If there is no Condorcet winner, there are further resolution methods, but I will only be looking at cases where there is a Condorcet winner.

My first method is

  • Method A: Estimate each option's utility with the arithmetical average of the reported utilities assigned to it by all the evaluators, and choose the option with the highest utility.
(I will be ignoring tie-resolution in this post, because all the utilities I will work with are real-numbered, and the probability of a tie will be zero.) This method can be proved to maximize epistemically expected utility under the
  • Basic Setup: Each evaluator's reported estimate of each option's utility is equal to the actual utility plus an error term. The error terms are (a) independent of the actual utilities and (b) normally distributed with mean zero. Moreover, (c) our information as to the variances of the error terms is symmetric between the evaluators, but need not be symmetric between the options (thus, we may know that option 3 has a higher variance in its error terms than option 7; we may also know that some evaluators have a greater variance in their error terms; but we do not know which evaluators have a greater variance than which).

Unfortunately, it is really hard to estimate absolute utility numbers. It is a lot easier to rank order utilities. And that's all Condorcet needs. So in that way at least, Condorcet is superior to Method A. To fix this, modify the Basic Setup to:

  • Modified Setup: Just like the Basic Setup, except that what is reported by each evaluator is not the actual utility plus error term, but the rank order of the actual utility plus error term.
In particular, we still assume that beneath the surface—perhaps implicitly—there is a utility estimate subject to the same conditions. Our method now is
  • Method B: Replace each evaluator's rank ordering with roughly estimated Z-scores by using the following algorithm: a rank of k (between 1 and n) is transformed to f((n + 1/2 − k)/n), where f is the inverse of the cumulative normal distribution function. Each option's utility is then estimated as the arithmetical average of the roughly estimated Z-scores across the evaluators, and the option with the highest estimated utility is chosen.
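In code, the rough Z-score step of Method B is a one-liner (a sketch; scipy's norm.ppf plays the role of f):

    from scipy.stats import norm

    def rough_z_scores(ranks):
        """Method B's transform: a rank k out of n (1 = best) becomes
        f((n + 1/2 - k)/n), where f is the inverse normal CDF."""
        n = len(ranks)
        return {opt: norm.ppf((n + 0.5 - k) / n) for opt, k in ranks.items()}

    # One evaluator's ranking of three options:
    print(rough_z_scores({"x": 1, "y": 2, "z": 3}))
    # x -> f(5/6) = +0.97, y -> f(1/2) = 0, z -> f(1/6) = -0.97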

Now time for some experiments. Add to the Basic Setup the assumptions that (d) the actual utilities in the option pool are normally distributed with mean zero and variance one, and (e) the variances of all the evaluators' error terms are equal to 1/4 (i.e., standard deviation 1/2). All the experiments use 2000 runs. Because I developed this when thinking about grad admissions, the cases that interest me most are ones with a small number of evaluators and a large number of options, which is the opposite of how political cases work (though unlike in admissions, I am simplifying by looking for just the best option). Moreover, it doesn't really matter whether we choose the optimal option. What matters is how close the actual utility of the chosen option is to the actual utility of the optimal option. The difference in these utilities will be called the "error". If the error is small enough, there is no practically significant difference. Given the normal distribution of option utilities, about 95% of actual utilities are between -2 and 2, so if we have about 20 options, we can expect the best option to have a utility somewhere on the order of 2. Choosing at random would then give us an average error on the order of 2. The tables below give the average errors for the 2000 runs of the experiments. Moreover, so as to avoid choosing between different resolution methods, I am discarding data from runs during which there was no Condorcet winner, and hence comparing Method A and Method B to Condorcet at its best (interestingly, Method A and Method B also work less well when there is no Condorcet winner). Discarded runs were approximately 2% of runs. Source code is available on request.
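Here is a rough Python sketch of the experimental setup just described (a reconstruction from the description, not the original source code):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)

    def condorcet_winner(reports):
        """reports: m x n array of reported utilities. Returns the option
        that beats every other by majority pairwise comparison, or None."""
        m, n = reports.shape
        for x in range(n):
            if all(np.sum(reports[:, x] > reports[:, y]) > m / 2
                   for y in range(n) if y != x):
                return x
        return None

    def run(m, n, runs=2000, err_sd=0.5):
        errs = {"Condorcet": [], "Method A": [], "Method B": []}
        while len(errs["Condorcet"]) < runs:
            u = rng.normal(0, 1, n)                      # (d): actual utilities
            reports = u + rng.normal(0, err_sd, (m, n))  # (e): utility + error
            cw = condorcet_winner(reports)
            if cw is None:
                continue                                 # discard, as above
            best = u.max()
            errs["Condorcet"].append(best - u[cw])
            errs["Method A"].append(best - u[reports.mean(axis=0).argmax()])
            ranks = (-reports).argsort(axis=1).argsort(axis=1) + 1  # 1 = best
            z = norm.ppf((n + 0.5 - ranks) / n)          # Method B's transform
            errs["Method B"].append(best - u[z.mean(axis=0).argmax()])
        return {k: np.mean(v) for k, v in errs.items()}

    print(run(3, 50))   # cf. Experiment 1 below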

Experiment 1: 3 evaluators, 50 options.

Condorcet: 0.030
Method A: 0.023
Method B: 0.029
So, with a small number of evaluators and a large number of options, Method A significantly beats Condorcet. Method B slightly beats Condorcet.

Experiment 2: 50 evaluators, 50 options.

Condorcet: 0.0017
Method A: 0.0011
Method B: 0.0015
So we have a similar distribution of values, but of course with a larger number of evaluators, the error is smaller. It is interesting, however, that even with only three evaluators, the error was already pretty small, about 0.03 sigma for all the methods.

Experiment 3: 3 evaluators, 3 options.

Condorcet: 0.010
Method A: 0.007
Method B: 0.029
Method B is much worse than Condorcet and Method A in this case. That's because with three options, the naive Z-score estimation method in Method B fails miserably. With 3 options Method B is equivalent to a very simple method we might call Method C where we simply average the rank order numbers of the options across the evaluators. At least with 3 options, that is a bad way to go. Condorcet is much better, and Method A is even better if it is workable.

Experiment 4: 50 evaluators, 3 options.

Condorcet: 0.0003
Method A: 0.0002
Method B: 0.0159
The badness of Method B for a small number of options really comes across here. Condorcet and Method A really benefit from boosting the number of evaluators, but with only 3 options, Method B works miserably.

So, one of the interesting consequences is that Method B is strongly outperformed by Condorcet when the number of options is small. How small? A bunch of experiments suggests that it's kind of complicated. For three evaluators, Method B catches up with Condorcet at around 12 options. Somewhat surprisingly, for a greater number of evaluators, it needs more options for Method B to catch up with Condorcet. I conjecture that Method B works better than Condorcet when the number of options is significantly greater than the number of evaluators. In particular, in political cases where the opposite inequality holds, Condorcet far outperforms Method B.

One could improve on Method B, whose Achilles heel is the Z-score estimation, by having the evaluators include in their rankings options that are not presently available. One way to do that would be to increase the size of the option pool by including fake options. (In the case of graduate admissions, one could include a body of fake applications generated by a service.) Another way would be by including options from past evaluations (e.g., applicants from previous years). Then these would enter into the Z-score estimation, thereby improving Method B significantly. Of course, the down side of that is that it would be a lot more work for the evaluators, thereby making this unworkable.

Method A is subject to extreme evaluator manipulation, i.e., "strategic voting". Any evaluator can produce any result she desires just by reporting utilities extreme enough to swamp the utilities reported by the others. (The Basic Setup's description of the errors rules this out.) Method B is subject to more moderate evaluator manipulation. Condorcet, I am told, does fairly well. If anything like Method A is used, what is absolutely required is a community of justified mutual trust and reasonableness. Such mutual trust does, however, make possible noticeably better joint choices, which is an interesting result of the above.

So, yes, in situations of great trust where all evaluators can accurately report their utility estimates, we can beat Condorcet by adopting Method A. But that's a rare circumstance. In situations of moderate trust and where the number of candidates exceeds the number of evaluators, Method B might be satisfactory, but its benefits over Condorcet are small.

One interesting method that I haven't explored numerically would be this:

  • Method D: Have each evaluator assign a numerical evaluation to the options on a fixed scale (say, integers from 1 to 50). Adjust the numerical evaluations to Z-scores, using data from the evaluator's present and past evaluations and some good statistical method. Average these estimated Z-scores across evaluators and choose the option with the highest average.
Under appropriate conditions, this method should converge to Method A over time in the Modified Setup. There would be possibilities for manipulation, but they would require planning ahead, beyond the particular evaluation (e.g., one could keep all one's evaluations in a small subset of the scale, and then when one really wants to make a difference, one jumps outside of that small subset).
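As a sketch of what the Z-score adjustment in Method D might look like (standardizing against the evaluator's own history is just one simple choice of "good statistical method"):

    import numpy as np

    def method_d_z_scores(current, past):
        """Standardize one evaluator's raw scores against the mean and SD of
        that evaluator's present and past evaluations combined."""
        pool = np.array(list(current.values()) + list(past), dtype=float)
        mu, sd = pool.mean(), pool.std(ddof=1)
        return {opt: (score - mu) / sd for opt, score in current.items()}

    # An evaluator who compresses everything into 30-40 on the 1-50 scale:
    print(method_d_z_scores({"x": 38, "y": 31}, past=[30, 33, 35, 36, 40, 32]))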

Monday, January 23, 2012

An optimal voting method (under some generally implausible assumptions)

Let me qualify what I'm going to say by saying that I know next to nothing about the voting literature.

It's time for admissions committees to deliberate. But Arrow's Theorem says that there is no really good voting method with more than two options.

In some cases, however, there is a simple voting method that, with appropriate assumptions, is provably optimal. The method is simply to have each voter estimate a voter-independent utility of every option, and then to average these estimates, and choose the option with the highest average. By a "voter-independent utility", I mean a utility that does not vary from voter to voter. This could be a global utility of the option or it could be a utility-for-the-community of the voter or even a degree to which a certain set of shared goals are furthered. In other words, it doesn't have to be a full agent-neutral utility, but it needs to be the case that the voters are all estimating the same value—so it can depend on the group of voters as a whole.

Now if we are instead to choose n non-interacting options (i.e., the utilities of the options are additive), then we just choose the n with the highest averages. Under some assumptions, these simple methods are optimal. The assumptions are onerous, however.

Voting theory, as far as I can tell, is usually conducted in terms of preferences between options. In political elections, many people's preferences are probably agent-centered: people are apt to vote for candidates they think will do more for them and for those they take to be close to them. In situations like that, the simple method won't work, because people aren't estimating voter-independent utilities but agent-centered utilities.

But there are cases where people really are doing something more like estimating voter-independent utilities. For instance, take graduate admissions or hiring. The voters there really are trying to optimize something like "the objective value of choosing this candidate or these candidates", though of course their deliberations suffer from all sorts of errors.

In such cases, instead of thinking of the problem as a preference reconciliation problem, we can think of it as an estimation problem. We have a set of unknown quantities, the values of the options. If we knew what these quantities are, we'd know what decision to take: we'd go for the option(s) with the highest values. Instead, we have a number of evaluators who are each trying to estimate this unknown. Assume that each evaluator's estimate of the unknown quantity simply adds an independent random error to the quantity, and that the error is normally distributed with mean zero. Assume, further, that either the variances of the normal errors are the same between evaluators or that our information about these variances is symmetric between the evaluators (thus, we may know that evaluators are not equally accurate, but we don't know which ones are the ones who are more accurate). Suppose that I have no further relevant information about the differences in the values of the options besides the evaluators' estimates, and so I have the same prior probability distribution for the value of each option (maybe it's a pessimistic one that says that the option is probably bad).

Given all of the above information, I now want to choose the option that maximizes, with respect to my epistemic probabilities, the expected value of the option. It turns out by Bayes' Theorem together with some properties of normal random variables that the expected value of an option o, given the above information, can be written A·a0 + B·a(o), where a0 is the mean value of my baseline estimate for all the options and a(o) is the average of the evaluators' evaluations of o, and where both A and B are positive. It follows that under the above assumptions, if I am trying to maximize expected value, choosing the option(s) with the highest value of a(o) is provably optimal.
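The conjugate-normal computation behind this claim can be sketched as follows (assuming, per the setup, a common prior N(a0, s0^2) for each option's value and independent N(0, s^2) errors for each of the m evaluators):

    def posterior_mean(a0, s0, evals, s):
        """Posterior mean of an option's value with prior N(a0, s0^2), given
        evaluations equal to the value plus independent N(0, s^2) errors.
        Standard normal-normal conjugacy: the result is A*a0 + B*abar."""
        m = len(evals)
        abar = sum(evals) / m                # a(o): the average evaluation
        prec0, prec = 1 / s0**2, m / s**2    # prior and data precisions
        A = prec0 / (prec0 + prec)
        B = prec / (prec0 + prec)
        return A * a0 + B * abar

    # A and B are the same positive constants for every option, so ranking
    # options by posterior mean is the same as ranking them by a(o).
    print(posterior_mean(a0=-1.0, s0=1.0, evals=[0.4, 0.9, 0.2], s=0.5))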

Now there are some serious problems here, besides the looming problem that the whole business of numerical utilities may be bankrupt (which I think in some cases isn't so big an issue, because numerical utilities can be a useful approximation in some cases). One of them is that one evaluator can skew the evaluations by assigning such enormous utilities to the candidates that her evaluations swamp everyone else's data. The possibility of such an evaluator violates my assumption that each person's evaluation is equal to the unknown plus an error term centered on zero. Such an evaluator is either really stupid, or dishonest (i.e., not reporting her actual estimates of utilities). This problem by itself is enough to ensure that the method can't be used except in a community of justified mutual trust.

A second serious problem is that we're not very good at making absolute utility judgments, and are probably better at rank ordering. The optimality condition requires that we work with utilities rather than rank orderings. But in a case where the number of options is largish—admissions and hiring cases are like that—if we assume that value is normally distributed in the option pool, we can get an approximation to the utilities from an evaluator's rank ordering of the n options. One way to do this is to use the rank ordering to assign estimated percentile ranks to each option, and then convert them to one's best estimate of the normally distributed value (maybe this can just be done by applying the inverse normal cumulative distribution function—I am not a statistician). Then average these between evaluators. Doing this also compensates for any affine shift, such as that due to the exaggerating evaluator in the preceding paragraph. I can't prove the optimality of this method, and it is still subject to manipulation by a dishonest evaluator (say, one who engages in strategic voting rather than reporting her real views).

I think the above can also work under some restrictive assumptions even if the evaluators are evaluating value-for-them rather than voter-independent value.

The basic thought in the above is that in some cases instead of approaching a voting situation as a preference situation, we approach it as a scientific estimation situation.