AC9M8ST03 · YEAR 8 · STATISTICS

Sample Size and Variation

ACARA v9 content description: Compare variations in distributions and proportions obtained from random samples of the same size drawn from a population, and recognise the effect of sample size on this variation.

Why samples vary

Whenever you study a group by looking at only a part of it, you are working with a sample, and a sample, by definition, leaves out some members of the population. Because each sample is just one slice of the whole, two samples taken in the very same fair way will usually give slightly different results. Ask one handful of people their favourite colour and you get one set of figures; ask another handful and the figures shift a little, even though nothing was done wrong. This ordinary scatter of results from one sample to the next is called sampling variation. It is a normal feature of sampling, not a mistake or a sign of careless work, and recognising it is the first step to reading any survey sensibly.

Figure: Different samples, different results. Each sample from the same population lands a little off the true value; this scatter is sampling variation.
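The idea can be tried out with a short simulation. The sketch below is only an illustration: the population of 1000 students, the 30% who walk, the sample size of 20 and the function name `sample_proportions` are all assumptions made up for the example, not figures from this unit.

```python
import random

def sample_proportions(population, sample_size, num_samples, seed=0):
    """Draw several random samples of the same size and return
    the proportion of walkers found in each one."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    proportions = []
    for _ in range(num_samples):
        sample = rng.sample(population, sample_size)  # no student picked twice
        proportions.append(sum(sample) / sample_size)
    return proportions

# Hypothetical population: 1000 students, 300 walk (1) and 700 do not (0).
population = [1] * 300 + [0] * 700
true_proportion = sum(population) / len(population)

results = sample_proportions(population, sample_size=20, num_samples=5)
print("True proportion:", true_proportion)
for i, p in enumerate(results, start=1):
    print(f"Sample {i}: {p:.2f}")
```

Every sample is drawn in the same fair way from the same population, yet the printed proportions will generally not all match each other or the true value.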

Small samples jump around

The size of a sample changes how far its result can wander. When a sample is small, a few unusual members can pull its result a long way from the truth, because each individual carries a large share of the total. Imagine asking only five people whether they walk to school; if two of them happen to be keen walkers, your sample reports 40% walking, which may be far above the real figure for the whole school. With small samples this kind of swing happens easily, so their results scatter widely around the true population value, and any single small sample can land well off the mark in either direction.

Figure: Small samples scatter widely. Many samples of n = 10 give results that spread far, so any one small sample can land far from the true value.
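One way to see this scatter is to tally the results of many small samples. The sketch below assumes a true proportion of 50% and treats each member as an independent yes or no, which is a reasonable stand-in for sampling from a large population; the function name `tally_small_samples` and the figure of 1000 samples are chosen only for illustration.

```python
import random
from collections import Counter

def tally_small_samples(sample_size=10, num_samples=1000,
                        true_proportion=0.5, seed=2):
    """Count how often each 'number of yes answers' turns up
    across many samples of the same small size."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(num_samples):
        yes = sum(rng.random() < true_proportion for _ in range(sample_size))
        tally[yes] += 1
    return tally

tally = tally_small_samples()
# Text histogram: one '#' for every 10 samples that gave this result.
for yes in range(11):
    percent = 10 * yes  # yes answers out of 10, as a percentage
    print(f"{percent:3d}% | {'#' * (tally[yes] // 10)}")
```

The bars are spread over many different percentages rather than piled on 50%, which is the wide scatter described above.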

Larger samples settle down

As a sample grows, the picture steadies. With more members included, the unusual ones are balanced out by the many ordinary members around them, so no single individual can dominate the result. A sample of 500 is barely moved by two keen walkers, where a sample of 5 was thrown right off. Larger samples therefore give results that sit close to the true population value and stay stable from one draw to the next: take another large sample and you tend to get almost the same answer. The result still varies a little, but the swings are far smaller, and the estimate you read is much more trustworthy.

Figure: Larger samples cluster tightly. Many samples of n = 1000 give results that stay near 50%, so each one is a better estimate of the true value.
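The two keen walkers make a handy worked example. The short calculation below shows what share of the result two people account for in a sample of 5 and in a sample of 500; the function name `share_of_result` is invented for this sketch.

```python
def share_of_result(members, sample_size):
    """Percentage of a sample's result contributed by `members` people."""
    return 100 * members / sample_size

small = share_of_result(2, 5)    # 40.0
large = share_of_result(2, 500)  # 0.4

print(f"Sample of 5:   two walkers make up {small}% of the result")
print(f"Sample of 500: two walkers make up {large}% of the result")
```

In the sample of 5 the two walkers account for 40% of the result, while in the sample of 500 the same two people account for less than half of one percent.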

Sampling variation and size

If you plot the spread of sample results against the sample size, the curve falls in a very particular way. It drops steeply at first, so moving from a tiny sample to a moderate one cuts the variation sharply, and then it flattens out, so each further increase buys less and less. This is the rule of diminishing returns: bigger samples do reduce sampling variation, but doubling the size does not halve the variation. The spread shrinks roughly in line with the square root of the sample size, so to make an estimate twice as precise you generally need about four times as many members. That is why polling organisations weigh the cost of a larger sample against the small extra precision it brings.

Figure: Spread shrinks as size grows. As the sample size grows, the spread of results shrinks quickly at first, then more slowly; more data buys less and less extra precision.
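The "four times as many" rule can be checked with a little arithmetic. The sketch below uses a standard formula from later statistics courses, the standard deviation of a sample proportion, sqrt(p × (1 − p) / n); the formula goes beyond this unit and is brought in here as an assumption, with p = 0.5 chosen only for illustration.

```python
import math

def typical_spread(sample_size, p=0.5):
    """Typical spread (standard deviation) of a sample proportion,
    using the standard formula sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / sample_size)

for n in (100, 200, 400):
    spread_points = 100 * typical_spread(n)  # as percentage points
    print(f"n = {n:3d}: spread is about {spread_points:.1f} percentage points")
```

Under these assumptions the spread is about 5 percentage points at n = 100, about 3.5 at n = 200 and about 2.5 at n = 400. Doubling the sample trims the spread by less than a third; it takes four times the sample to cut it in half.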

Reliable, but never certain

A larger sample makes a conclusion more reliable, because its estimate is likely to sit closer to the real value and is less at the mercy of a few odd members. Reliable, though, is not the same as certain. Short of measuring every single member in a full census, no sample can tell you the population value exactly; some residual variation always remains. This is why careful reporting speaks of estimates and likely ranges rather than exact answers, and why a good poll states a margin around its figure. A bigger sample narrows that range, but it never shrinks it to nothing, and pretending otherwise overstates what the data can prove.

Figure: 10 flips against 1000 flips. Ten coin flips can stray far from 50% heads, but 1000 flips stay close: more flips, less variation around 50%.
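The coin-flip comparison can be simulated as a rough check. The sketch below repeats each experiment many times and counts how often the proportion of heads lands within 5 percentage points of 50%; the 5-point band, the 500 repeats and the function name `share_close_to_half` are assumptions chosen for the example.

```python
import random

def share_close_to_half(num_flips, num_runs=500, tolerance_points=5, seed=1):
    """Fraction of runs whose percentage of heads is within
    `tolerance_points` percentage points of 50%."""
    rng = random.Random(seed)
    close = 0
    for _ in range(num_runs):
        heads = sum(rng.random() < 0.5 for _ in range(num_flips))
        # Whole-number comparison avoids rounding trouble at the boundary.
        if abs(100 * heads - 50 * num_flips) <= tolerance_points * num_flips:
            close += 1
    return close / num_runs

share_10 = share_close_to_half(10)
share_1000 = share_close_to_half(1000)
print(f"10 flips:   {share_10:.0%} of runs landed between 45% and 55% heads")
print(f"1000 flips: {share_1000:.0%} of runs landed between 45% and 55% heads")
```

With 10 flips only a run of exactly 5 heads falls inside the band, so most runs miss it. With 1000 flips nearly every run lands inside, yet the results still differ slightly from run to run: more reliable, but not certain.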

Why this matters

Sample size sits quietly behind almost every poll, survey and quality test you meet. It is the reason a serious opinion poll always reports how many people it asked, and the reason a quick online vote answered by a handful of users should be read with real caution. Once you grasp that bigger samples mean less variation and steadier, more reliable conclusions, but with diminishing returns, you can judge for yourself how much weight a given result deserves. The common slip runs in two directions: trusting a very small sample because its headline looks striking, or assuming that a large sample has removed all uncertainty. We stay within these ideas here, since planning and running full statistical investigations comes in the unit that follows.

Quick self-check
1. What is sampling variation?
2. How do the results of small samples behave?
3. What happens to the spread of sample results as the sample size grows?
4. A fair coin is flipped 10 times, and then 1000 times. Which set of flips is more likely to give a proportion of heads close to 50%?
5. Does a larger sample make a conclusion completely certain?