Here are two important ideas from statistics that I wish would filter out a bit more, so that people could better evaluate statistics that are in the news.
1. It’s not the sample size, it’s the sample mechanism. Well, OK. It’s somewhat the sample size, obviously. My point is that most people who encounter a study’s methodology are much more likely to remark on the sample size (and pronounce it too small) than to remark on the sampling mechanism. I can’t tell you how often I’ve seen studies with an n = 100 dismissed by commenters online as too small to take seriously. Depending on the design of the study and the variables being evaluated, 100 is often a very large sample size. Under certain circumstances, an n of 30 is sufficient to draw broad conclusions about populations. When we use inferential statistics, we can’t say with 100% certainty what a population’s average for a given trait is. (We actually can’t say that with 100% certainty even when taking a census, but that’s another discussion.) But we can say with a chosen level of confidence (usually 95%, by convention) that the average lies in a particular range, which can often be quite narrow, and from which we can make predictions of remarkable accuracy, provided the sampling mechanism was adequately random. By random, we mean that every member of the population has an equal chance of being selected for the sample. If there are factors that make one group more or less likely to be selected for the sample, that is statistical bias (as opposed to statistical error).
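As a minimal sketch of the two ideas in that paragraph, the Python below draws a simple random sample (every member equally likely to be picked) and computes an approximate 95% confidence interval for the mean. The population of heights, the seed, and the helper names are hypothetical, chosen only for illustration, and the interval uses the normal approximation rather than a t distribution.

```python
import math
import random
import statistics


def simple_random_sample(population, n, rng):
    """Draw n members without replacement; every member has the same chance of selection."""
    return rng.sample(population, n)


def confidence_interval_95(sample):
    """Approximate 95% confidence interval for the mean (normal approximation, z = 1.96).

    For small samples, a t critical value (about 2.045 for n = 30) gives a
    slightly wider and more appropriate interval.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    half_width = 1.96 * standard_error
    return mean - half_width, mean + half_width


# Hypothetical population: 50,000 heights in cm, roughly normal around 170 cm.
rng = random.Random(7)
population = [rng.gauss(170, 10) for _ in range(50_000)]
sample = simple_random_sample(population, 100, rng)
low, high = confidence_interval_95(sample)
print(f"sample mean {statistics.mean(sample):.1f} cm, 95% CI ({low:.1f}, {high:.1f})")
print(f"population mean {statistics.mean(population):.1f} cm")
```

With 100 people and a spread of about 10 cm, the interval is only about 4 cm wide.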
Part of the reason is that sample size does less and less to reduce statistical error as it grows. Because the formulas for confidence intervals and margins of error place n under a square root sign, error shrinks only in proportion to 1/√n: to cut the margin of error in half, you have to quadruple the sample size. Going from an n = 100 to an n = 1,000, a tenfold increase in respondents, shrinks the margin of error only by a factor of about 3.2 (√10). Given the outlay of resources necessary for attracting truly large samples, it’s often not worth it to get samples of the size that people intuitively see as “big enough.”
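The square-root effect is easy to see numerically. This sketch assumes the textbook margin of error for a proportion from a simple random sample, z·√(p(1 − p)/n), with the conservative p = 0.5 and z = 1.96 for 95% confidence. The function name is only illustrative.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a sample proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)


# Error falls with 1/sqrt(n), so each extra respondent buys less than the last.
for n in (30, 90, 100, 400, 1000, 10000):
    print(f"n = {n:>6}: +/-{margin_of_error(n):.1%}")
```

Quadrupling from n = 100 to n = 400 halves the margin, from about ±9.8% to about ±4.9%. The tenfold jump to n = 1,000 only gets it down to about ±3.1%.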
Now compare a rigorously controlled study with an n = 30 drawn with a random sampling mechanism to, say, those surveys that ESPN.com runs all the time. Those very often get sample sizes in the thousands, sometimes hundreds of thousands. But the sampling mechanism is a nightmare. They’re voluntary response instruments that are biased in any number of ways: underrepresenting people without internet access, people who aren’t interested in sports, people who go to SI.com instead of ESPN.com, and so on. The 30-person study is worth far more than the ESPN.com data. The sampling mechanism makes the sample size irrelevant.
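A toy simulation makes the contrast concrete. Everything in it is an assumption for illustration, not data from ESPN.com: a hypothetical population of 200,000 in which 30% are sports fans, fans approve of some proposal 80% of the time versus 40% for everyone else, and fans are 25 times as likely to answer a voluntary online poll (5% versus 0.2%). The sketch compares the poll’s estimate with the typical miss of a 30-person simple random sample.

```python
import random


def make_population(size, rng):
    """Hypothetical population of (is_fan, approves) pairs.

    Assumed numbers: 30% are sports fans; fans approve 80% of the time and
    everyone else 40% of the time, so the true approval share is about 52%.
    """
    population = []
    for _ in range(size):
        is_fan = rng.random() < 0.30
        approves = rng.random() < (0.80 if is_fan else 0.40)
        population.append((is_fan, approves))
    return population


def voluntary_response_estimate(population, rng):
    """Approval share among people who choose to answer (assumed rates: fans 5%, others 0.2%)."""
    answers = [approves for is_fan, approves in population
               if rng.random() < (0.05 if is_fan else 0.002)]
    return sum(answers) / len(answers), len(answers)


def random_sample_estimate(population, n, rng):
    """Approval share in a simple random sample of size n."""
    sample = rng.sample(population, n)
    return sum(approves for _, approves in sample) / n


def run_comparison(size=200_000, n=30, trials=1_000, seed=42):
    rng = random.Random(seed)
    population = make_population(size, rng)
    truth = sum(approves for _, approves in population) / size
    biased, respondents = voluntary_response_estimate(population, rng)
    # Average miss of a small random sample across many repetitions.
    misses = [abs(random_sample_estimate(population, n, rng) - truth)
              for _ in range(trials)]
    return {
        "truth": truth,
        "voluntary_estimate": biased,
        "voluntary_respondents": respondents,
        "random_sample_mean_error": sum(misses) / trials,
    }


if __name__ == "__main__":
    for key, value in run_comparison().items():
        print(key, value)
```

Under these made-up numbers, the poll collects a few thousand responses yet lands at roughly three-quarters approval against a true share near 52%. A 30-person random sample, by contrast, misses by only a handful of percentage points on average.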
Sample size does matter, but in common discussions of statistics its importance is misunderstood, and the value of increasing it diminishes quickly as the sample grows.
2. For any reasonable definition of a sample, population size relative to sample size is irrelevant to the statistical precision of findings. A 1,000-person sample, if drawn with some sort of rigorous random sampling mechanism, is exactly as descriptive and predictive of the ~570,000-person population of Wyoming as it is of the ~315 million-person population of the United States. I have found this one very hard to wrap my mind around, but it’s the case. The standard formulas for margin of error, confidence intervals, and the like make no reference to the size of the total population. You can think about it this way: each additional member you draw at random makes it less likely that your sample as a whole is unlike the population, regardless of how large that population is. The mistake lies in thinking that the point of increasing sample size is to bring it closer in proportion to the population. In reality, the point is simply to increase the number of draws, so that later draws can correct our picture in case earlier ones happened to produce unlikely results.
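A small simulation sketches this claim. It assumes a hypothetical trait held by exactly half of each population, draws 1,000-person simple random samples from a Wyoming-sized population (570,000) and a US-sized one (315 million), and measures how much the sample proportion varies across 500 repetitions. The helper name and setup are illustrative.

```python
import random
import statistics


def estimate_spread(population_size, sample_size=1_000, true_share=0.5,
                    trials=500, seed=0):
    """Standard deviation of sample proportions across repeated simple random samples.

    Hypothetical setup: members whose index is below true_share * population_size
    have the trait; everyone else does not.
    """
    rng = random.Random(seed)
    cutoff = int(true_share * population_size)
    estimates = []
    for _ in range(trials):
        # Sampling from a range avoids building the full population in memory.
        sample = rng.sample(range(population_size), sample_size)
        estimates.append(sum(1 for member in sample if member < cutoff) / sample_size)
    return statistics.stdev(estimates)


wyoming = estimate_spread(570_000, seed=1)
united_states = estimate_spread(315_000_000, seed=2)
print(f"Wyoming-sized population: {wyoming:.4f}")
print(f"US-sized population:      {united_states:.4f}")
```

Both spreads should come out near √(0.25/1,000) ≈ 0.0158, the textbook standard error, even though one population is more than 500 times the size of the other.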
The essential caveat lies in “for any reasonable definition of a sample.” Yes, testing 900 out of a population of 1,000 is more accurate than testing 900 out of a population of 1,000,000. But nobody would ever call 90% of a population a sample. You see different thresholds for where a sample begins and ends; some people say that anything larger than 1/100th of the total population is no longer a sample, but it varies. The point holds: when we’re dealing with real-world samples, where the population we care about is vastly larger than any reasonable sample size, the population size is irrelevant to the error in our statistical inferences.
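The standard way to account for that caveat is the finite population correction, √((N − n)/(N − 1)), which scales the standard error when sampling without replacement from a population of size N. A minimal sketch, assuming the same p = 0.5, z = 1.96 margin of error for a proportion; the function names are illustrative:

```python
import math


def finite_population_correction(population_size, sample_size):
    """Multiplier on the standard error when sampling without replacement."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))


def corrected_margin_of_error(sample_size, population_size, p=0.5, z=1.96):
    """95% margin of error for a proportion, adjusted for a finite population."""
    base = z * math.sqrt(p * (1 - p) / sample_size)
    return base * finite_population_correction(population_size, sample_size)


for population_size, sample_size in [(1_000, 900), (1_000_000, 900),
                                     (570_000, 1_000), (315_000_000, 1_000)]:
    fpc = finite_population_correction(population_size, sample_size)
    moe = corrected_margin_of_error(sample_size, population_size)
    print(f"n = {sample_size:,} of N = {population_size:,}: "
          f"correction {fpc:.4f}, margin +/-{moe:.2%}")
```

For 900 out of 1,000, the factor is about 0.32, cutting the margin of error to roughly a third. For 1,000 out of Wyoming’s ~570,000, it is about 0.999, a change of less than a tenth of a percent.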
Inferential statistics are powerful things.