AC9M9ST02 · YEAR 9 · STATISTICS

Sampling Methods and Misleading Representations

ACARA v9 CONTENT DESCRIPTION: “analyse how different sampling methods can affect the results of surveys and how choice of representation can be used to support a particular point of view”

Builds on: Reading Surveys: Estimating the Mean and Median. This unit extends the use of samples to estimate population values and the reading of data displays. Recognising biased sampling and misleading graphs is essential for the display choices and investigations that complete this strand.

Two ways a statistic can mislead

A statistic is only as trustworthy as the way it was produced and the way it is shown. Two surveys asking the same question can reach opposite conclusions if they sample different people, and a single set of numbers can be drawn to look dramatic or trivial depending on the chart. This unit examines both halves of that problem: how the method of sampling shapes a survey's results, and how the choice of representation can be used, fairly or unfairly, to push a particular point of view.

Random sampling


Sampling 12 of the 40 members at random: the highlighted members are spread across the whole group rather than bunched in one corner. Whatever the sample size, every member had an equal chance of being chosen, which is why random sampling tends to give the least biased results.

Random sampling: the fair baseline

The goal of sampling is a group that fairly represents the whole population, because the sample's figures are used to estimate the population's. The cleanest approach is random sampling, where every member of the population has an equal chance of being chosen, like drawing fifty names from a list of every student. Randomness is what keeps the sample from systematically favouring one kind of person, so it tends to give the least biased estimates.
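To make the "equal chance" idea concrete, here is a minimal Python sketch. It assumes a hypothetical class of 40 students labelled S01 to S40, and the function names are illustrative only. It uses the standard library's `random.sample`, which draws without replacement; the second function repeats the draw many times to check that each student ends up chosen about 12/40 = 30% of the time.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Choose n members so that every member has the same chance of selection.

    random.sample draws without replacement, so nobody is picked twice.
    """
    rng = random.Random(seed)  # a seed only makes the example repeatable
    return rng.sample(population, n)

def selection_rates(population, n, trials=10_000, seed=0):
    """Repeat the draw many times and report how often each member was chosen."""
    rng = random.Random(seed)
    counts = {member: 0 for member in population}
    for _ in range(trials):
        for member in rng.sample(population, n):
            counts[member] += 1
    return {member: count / trials for member, count in counts.items()}

# Hypothetical population: 40 student IDs, S01 to S40.
students = [f"S{i:02d}" for i in range(1, 41)]
sample = simple_random_sample(students, 12, seed=1)
rates = selection_rates(students, 12)
print(sample)
print(min(rates.values()), max(rates.values()))  # both close to 12 / 40 = 0.3
```

No single student's selection rate stands out, which is what "every member has an equal chance" means in practice.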

Stratified keeps the proportions


A population that is 60% junior and 40% senior is sampled in proportion, so the smaller sample bar keeps exactly the same 60/40 balance. Stratified sampling preserves each group's share, whatever it is.

Systematic, stratified and convenience samples

Other methods trade some of that fairness for convenience or structure. Systematic sampling takes names at a fixed interval, such as every tenth name on a list: quick to do, though it can go wrong if the list has a hidden repeating pattern that lines up with the interval. Stratified sampling splits the population into groups and samples each in proportion, so if a school is sixty percent junior and forty percent senior, the sample keeps that sixty-forty balance, which is useful when the groups might differ. Convenience sampling simply asks whoever is easiest to reach, and self-selected sampling lets people volunteer, as in an online poll. These last two are cheap but usually biased.
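The systematic and stratified methods can each be sketched in a few lines of Python. This is an illustrative sketch, not a standard procedure: the function names, the hypothetical school of 600 juniors and 400 seniors, and the hypothetical class roll in which every tenth entry is a class captain are all assumptions made for the example. The roll shows the hidden-pattern problem: when the pattern repeats at the same interval as the sampling, the sample catches either every captain or none.

```python
import random

def systematic_sample(ordered_list, k, start=None, seed=None):
    """Take every k-th item, starting at a random position among the first k."""
    if start is None:
        start = random.Random(seed).randrange(k)
    return ordered_list[start::k]

def stratified_sample(strata, n, seed=None):
    """Sample each group in proportion to its share of the population.

    strata maps a group name to the list of its members. Quotas are rounded,
    so with awkward group sizes the total can differ from n by one or two;
    this sketch does not correct for that.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    chosen = {}
    for name, members in strata.items():
        quota = round(n * len(members) / total)
        chosen[name] = rng.sample(members, quota)
    return chosen

# Hypothetical school: 60% junior, 40% senior.
school = {
    "junior": [f"J{i}" for i in range(600)],
    "senior": [f"S{i}" for i in range(400)],
}
picked = stratified_sample(school, 100, seed=3)
print({group: len(members) for group, members in picked.items()})
# {'junior': 60, 'senior': 40}

# Hypothetical roll with a hidden pattern: every tenth entry is a class captain.
roll = ["captain" if i % 10 == 0 else "student" for i in range(1000)]
print(systematic_sample(roll, 10, start=0).count("captain"))  # 100: all captains
print(systematic_sample(roll, 10, start=3).count("captain"))  # 0: no captains
```

Choosing an interval that does not match any pattern in the list, or shuffling the list first, avoids the captain problem.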

A convenience sample is biased


A convenience sample takes only whoever is by the door, a single cluster of the population, and ignores everyone else. Wherever the cluster sits, the sample is still biased, because it cannot represent the whole group, however many people it includes.

Bias: when a sample is not representative

Bias is the central danger, and it means a sample that systematically over-represents or under-represents part of the population. Surveying only people leaving a gym about how much they exercise will obviously overstate activity levels, because the sample is not representative. Bias can also creep in through timing, leading questions, or low response rates. The crucial point, carried over from estimating population values, is that a biased sample mis-estimates the population no matter how large it is; size cannot fix a sample that was unfair to begin with.
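A short simulation makes the point that size cannot fix a biased sample. It is a sketch with invented numbers: the hypothetical town of 1,000 people, its 200 regular gym-goers and their exercise hours are all assumptions. The convenience sample can only reach gym-goers, so its estimate stays high at every sample size, while the random sample lands near the true population mean.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: weekly exercise hours for 1,000 people.
# 800 rarely go to a gym (0 to 4 hours); 200 are regulars (4 to 10 hours).
non_gym = [rng.uniform(0, 4) for _ in range(800)]
gym_goers = [rng.uniform(4, 10) for _ in range(200)]
population = non_gym + gym_goers
true_mean = mean(population)

def gym_door_sample(n):
    """Convenience sample: only people leaving the gym can be asked."""
    return rng.sample(gym_goers, n)

def random_sample(n):
    """Fair baseline: everyone in the population can be chosen."""
    return rng.sample(population, n)

print(f"true mean: {true_mean:.1f} hours")
for n in (20, 100, 200):
    biased = mean(gym_door_sample(n))
    fair = mean(random_sample(n))
    print(f"n={n}: gym-door estimate {biased:.1f}, random estimate {fair:.1f}")
```

In this made-up setup, even at n = 200, when every gym-goer is surveyed, the gym-door estimate stays near the gym-goers' own average of about 7 hours, well above the population mean of roughly 3.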

Same data, axis from zero


With the vertical axis starting at 0, the values 95 and 98 produce bars of almost the same height. This is the honest picture: a real difference of only 3 looks like a small difference.

Same data, truncated axis


The same two values, 95 and 98, but the axis now starts at 90. The visible bars are only 5 and 8 tall, so the second looks 1.6 times as tall as the first, even though nothing about the data has changed.

The truncated axis

The second half of the problem is representation: how the same data is displayed. The most common trick is a truncated axis, where a bar chart's vertical axis starts not at zero but partway up. Imagine two values, ninety-five and ninety-eight. Their real difference is only three, and on an axis starting at zero the bars look almost identical. But if the axis starts at ninety, the visible bar heights become five and eight, and the second bar suddenly looks 1.6 times as tall as the first, because eight divided by five is 1.6. The numbers have not changed; only the impression has.
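The arithmetic behind the truncated axis is short enough to check directly. This Python sketch, with hypothetical helper names, subtracts the axis floor from each value to get the bar heights a reader actually sees, then compares them. Raising the floor from 0 towards 95 makes the same difference of 3 look larger and larger.

```python
def visible_heights(values, axis_floor=0):
    """Bar heights as drawn when the vertical axis starts at axis_floor."""
    return [v - axis_floor for v in values]

def apparent_ratio(smaller, larger, axis_floor=0):
    """How many times as tall the larger bar looks compared with the smaller."""
    if axis_floor >= smaller:
        raise ValueError("the axis floor must sit below both values")
    low, high = visible_heights([smaller, larger], axis_floor)
    return high / low

for floor in (0, 50, 90, 94):
    print(floor, visible_heights([95, 98], floor), round(apparent_ratio(95, 98, floor), 2))
# 0 [95, 98] 1.03
# 50 [45, 48] 1.07
# 90 [5, 8] 1.6
# 94 [1, 4] 4.0
```

The honest ratio is the one with the floor at 0; every other row is the same data drawn to look more dramatic.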

Other ways to bend a graph

There are many such devices. Stretching or compressing an axis can dramatise or flatten a trend; cherry-picking the time range shows only the window that supports a claim; and area or three-dimensional effects mislead because doubling both the width and the height of a symbol makes a value that is twice as large appear four times as large. None of these is illegal, and each can be defended as a design choice, which is exactly why a reader has to look closely at the axes, the scale and the range.
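The area trap follows from how size scales. In this small Python sketch (the function name is hypothetical), a value that doubles is drawn by multiplying every dimension of its symbol by 2: a bar changes only in height, a flat picture changes in width and height, and a three-dimensional object also changes in depth. The three-dimensional case extends the same reasoning one step further.

```python
def apparent_size(scale, dimensions):
    """How much bigger a symbol looks when each of its dimensions is multiplied by scale.

    dimensions=1: a bar, where only the height grows (the honest case)
    dimensions=2: a flat picture, where width and height both grow (area)
    dimensions=3: a 3D object, where width, height and depth all grow (volume)
    """
    return scale ** dimensions

for dims, kind in [(1, "bar"), (2, "flat picture"), (3, "3D object")]:
    print(f"value doubles, {kind} looks {apparent_size(2, dims)} times as large")
# value doubles, bar looks 2 times as large
# value doubles, flat picture looks 4 times as large
# value doubles, 3D object looks 8 times as large
```

Only the bar, where a single dimension carries the value, shows the true factor of 2.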

Choosing the story


To advertise growth, you reach for the truncated chart on the left: the same numbers, 95 and 98, now look like dramatic growth. The conclusion came from the chart, not the data.

Choosing the representation chooses the story

This is the heart of the descriptor: the choice of representation can be used to support a point of view. A company wanting to advertise growth will choose the truncated axis that makes a small rise look spectacular; a critic wanting to downplay the same growth will choose a full axis and a long time range that make it look flat. Both charts can be drawn from identical data. Choosing how to represent the data is, in effect, choosing the story it tells.

The defence is critical reading: when you meet a survey result, ask how the sample was taken and whether it could be biased; when you meet a graph, check where the axis starts, what the scale is, and whether the range was chosen to flatter a conclusion.

Quick self-check

1. Which sampling method gives every member of the population an equal chance of being selected?

2. An online poll where people choose whether to respond is an example of:

3. A school is 60% junior and 40% senior. A stratified sample of 100 students should contain:

4. A bar chart's vertical axis starts at 90 instead of 0, comparing values 95 and 98. What is the effect?

5. Surveying only people leaving a gym about how often they exercise will most likely:
