AC9M9ST02 · YEAR 9 · STATISTICS

Sampling Methods and Misleading Representations

ACARA v9 CONTENT DESCRIPTION: Analyse how different sampling methods can affect the results of surveys and how choice of representation can be used to support a particular point of view.
Builds on: Reading Surveys: Estimating the Mean and Median. This unit builds on using samples to estimate population values and on reading data displays. Recognising biased sampling and misleading graphs is essential for the display choices and investigations that complete this strand.

Two ways a statistic can mislead

A statistic is only as trustworthy as the way it was produced and the way it is shown. Two surveys asking the same question can reach opposite conclusions if they sample different people, and a single set of numbers can be drawn to look dramatic or trivial depending on the chart. This unit examines both halves of that problem: how the method of sampling shapes a survey's results, and how the choice of representation can be used, fairly or unfairly, to push a particular point of view.

Random sampling
Random sampling gives every member of the population an equal chance of being picked. The highlighted members, scattered across the whole group, are the sample. This even-handedness is what makes random sampling the least biased method.

Random sampling: the fair baseline

The goal of sampling is a group that fairly represents the whole population, because the sample's figures are used to estimate the population's. The cleanest approach is random sampling, where every member of the population has an equal chance of being chosen, like drawing fifty names from a list of every student. Randomness is what keeps the sample from systematically favouring one kind of person, so it tends to give the least biased estimates.
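The name-drawing idea above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the roll of 500 students and the helper name `simple_random_sample` are assumptions made up for the example.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Choose k members without replacement, each equally likely to be picked."""
    rng = random.Random(seed)  # a seed makes the draw repeatable for checking
    return rng.sample(population, k)

# Hypothetical roll of 500 students, invented for illustration only.
students = [f"student_{i:03d}" for i in range(500)]

sample = simple_random_sample(students, 50, seed=1)
print(len(sample))       # 50
print(len(set(sample)))  # 50, because nobody is drawn twice
```

Changing the seed changes who is chosen, but no student is favoured over any other.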

Stratified keeps the proportions
A population that is 60% junior and 40% senior is sampled in proportion, so the sample is also 60% junior and 40% senior. Stratified sampling keeps each group in the same balance as the population.

Systematic, stratified and convenience samples

Other methods trade some of that fairness for convenience or structure. Systematic sampling takes names at a fixed interval, such as every tenth name from a list: quick to do, though it can go wrong if the list has a hidden repeating pattern that lines up with the interval. Stratified sampling splits the population into groups and samples each in proportion, so if a school is sixty percent junior and forty percent senior, the sample keeps that sixty-forty balance, which is useful when the groups might differ. Convenience sampling simply asks whoever is easiest to reach, and self-selected sampling lets people volunteer, as in an online poll. These last two are cheap but usually biased, because the people who are easy to reach or who choose to respond tend to differ from those who do not.
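As a rough sketch, systematic and stratified sampling could be written as below. The school of 600 juniors and 400 seniors and both function names are hypothetical, chosen only to match the sixty-forty example.

```python
import random

def systematic_sample(items, step, start=0):
    """Take every `step`-th item, beginning at index `start`."""
    return items[start::step]

def stratified_sample(groups, total, seed=None):
    """Sample each group in proportion to its share of the population."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in groups.values())
    chosen = {}
    for name, members in groups.items():
        # Rounding means the quotas may not add to `total` for awkward splits.
        quota = round(total * len(members) / population_size)
        chosen[name] = rng.sample(members, quota)
    return chosen

# Hypothetical school: 600 juniors and 400 seniors (a 60/40 split).
juniors = [f"junior_{i}" for i in range(600)]
seniors = [f"senior_{i}" for i in range(400)]

every_tenth = systematic_sample(juniors + seniors, step=10)
strata = stratified_sample({"junior": juniors, "senior": seniors}, total=100, seed=1)

print(len(every_tenth))                              # 100
print({name: len(c) for name, c in strata.items()})  # {'junior': 60, 'senior': 40}
```

A careful systematic sample would normally choose `start` at random between 0 and `step - 1`; it is fixed at 0 here to keep the sketch short.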

A convenience sample is biased
A convenience sample takes only whoever is easiest to reach, here just one corner of the population, and ignores everyone else. The result is biased: it cannot represent the whole group, however many people it includes.

Bias: when a sample is not representative

Bias is the central danger: a biased sample systematically over-represents or under-represents part of the population, so it is not representative. Surveying only people leaving a gym about how much they exercise will overstate activity levels, because gym-goers exercise more than the population as a whole. Bias can also creep in through timing, leading questions, or low response rates. The crucial point, carried over from estimating population values, is that a biased sample mis-estimates the population no matter how large it is; size cannot fix a sample that was unfair to begin with.
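A small simulation can make the last point concrete. Everything in it is an assumption for illustration: the population of 1000 people, the 200 gym-goers, and the ranges of weekly exercise hours are invented, not survey data.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical weekly exercise hours, invented purely for illustration.
gym_goers = [rng.uniform(4, 8) for _ in range(200)]
others = [rng.uniform(0, 2) for _ in range(800)]
population = gym_goers + others
true_mean = mean(population)

results = []
for n in (10, 50, 200):
    biased = mean(rng.sample(gym_goers, n))  # only people leaving the gym
    fair = mean(rng.sample(population, n))   # simple random sample
    results.append((n, biased, fair))
    print(n, round(biased, 2), round(fair, 2), round(true_mean, 2))
```

With these made-up ranges, the gym-only estimate is at least four hours at every sample size, while the population mean cannot exceed 3.2 hours. Making the biased sample larger does not close that gap.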

Same data, axis from zero
With the vertical axis starting at 0, the values 95 and 98 produce bars of almost the same height. This is the honest picture, and the real difference is only 3.
Same data, truncated axis
The same two values, 95 and 98, drawn on an axis that starts at 90. The visible bars are now 5 and 8, so the second looks 1.6 times as tall as the first, even though nothing about the data has changed.

The truncated axis

The second half of the problem is representation: how the same data is displayed. The most common trick is a truncated axis, where a bar chart's vertical axis starts not at zero but partway up. Imagine two values, ninety-five and ninety-eight. Their real difference is only three, and on an axis starting at zero the bars look almost identical. But if the axis starts at ninety, the visible bar heights become five and eight, and the second bar suddenly looks 1.6 times as tall as the first, even though ninety-eight is only about three percent more than ninety-five. The numbers have not changed; only the impression has.
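The arithmetic behind the truncated axis can be checked directly. In this short sketch the helper `visible_heights` is a hypothetical name; it simply measures each bar from wherever the axis starts.

```python
def visible_heights(values, axis_start):
    """Height of each bar as drawn, measured from where the axis starts."""
    return [v - axis_start for v in values]

values = [95, 98]
full = visible_heights(values, axis_start=0)        # [95, 98]
truncated = visible_heights(values, axis_start=90)  # [5, 8]

print(round(full[1] / full[0], 2))   # 1.03: the bars look almost equal
print(truncated[1] / truncated[0])   # 1.6: the second bar looks much taller
```

The data ratio is about 1.03 in both charts; only the drawn ratio changes.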

Other ways to bend a graph

There are many such devices. Stretching or compressing an axis can dramatise or flatten a trend; cherry-picking the time range shows only the window that supports a claim; and area or three-dimensional effects mislead because doubling both the width and the height of a symbol makes a value that is twice as large appear four times as large. None of these is illegal, and each can be defended as a design choice, which is exactly why a reader has to look closely at the axes, the scale and the range.
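The area effect follows from how area scales. As a quick check, assuming a symbol whose width and height are both multiplied by the same factor (the function name is illustrative only):

```python
def apparent_ratio(scale):
    """Area ratio when both width and height are multiplied by `scale`."""
    return scale * scale

print(apparent_ratio(2))  # 4: a value twice as large looks four times as large
print(apparent_ratio(3))  # 9: a value three times as large looks nine times as large
```

An honest pictograph scales the area, not each side, in proportion to the value.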

Choosing the story
Identical data, two stories. The truncated chart on the left makes the rise look like dramatic growth; the full-axis chart on the right shows the same numbers as barely changed. Choosing the chart is choosing the conclusion.

Choosing the representation chooses the story

This is the heart of the descriptor: the choice of representation can be used to support a point of view. A company wanting to advertise growth will choose the truncated axis that makes a small rise look spectacular; a critic wanting to downplay the same growth will choose a full axis and a long time range that make it look flat. Both charts can be drawn from identical data. Choosing how to represent the data is, in effect, choosing the story it tells. The defence is critical reading: when you meet a survey result, ask how the sample was taken and whether it could be biased; when you meet a graph, check where the axis starts, what the scale is, and whether the range was chosen to flatter a conclusion.

Quick self-check
1. Which sampling method gives every member of the population an equal chance of being selected?
2. An online poll where people choose whether to respond is an example of which sampling method?
3. A school is 60% junior and 40% senior. How many juniors and how many seniors should a stratified sample of 100 students contain?
4. A bar chart's vertical axis starts at 90 instead of 0, comparing values 95 and 98. What is the effect?
5. Surveying only people leaving a gym about how often they exercise will most likely have what effect on the result?