AC9M9ST01 · YEAR 9 · STATISTICS

Reading Surveys: Estimating the Mean and Median

ACARA v9 content description: analyse reports of surveys in digital media and elsewhere for information on how data was obtained to estimate population means and medians
Builds on: the Statistics strand. This unit extends earlier work on calculating the mean and median and on reading data displays. Analysing how survey data is obtained underpins the sampling, comparison and investigation work that follows in this strand.

Statistics behind the headlines

Surveys and statistics fill the news: a poll reports the average household spends a certain amount, a study claims the typical commute has a particular length. Behind every such headline is a sample, a smaller group standing in for a much larger population, and a calculation that turns the sample data into a single representative number. This unit is about reading those reports critically, understanding how the average and the middle value are found, and judging how trustworthy a reported figure really is.

Mean: balance the values
The five values sit on a number line. The mean is the sum of the values divided by how many there are: 28 ÷ 5 = 5.6. It acts like a balance point, the place where the spread of the data evens out.

The mean and the median

Two summary numbers describe the centre of a data set. The mean, or average, is the sum of all the values divided by how many there are. For the five values three, seven, seven, two and nine, the sum is twenty-eight and there are five of them, so the mean is five point six. The median is the middle value once the data is sorted into order. Sorting those same five values gives two, three, seven, seven, nine, and the middle one, the third of five, is seven. The mean and median are both measures of centre, but they can differ, sometimes considerably.
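The two calculations above can be checked step by step. The following is a minimal Python sketch, not part of the original unit; the variable names are chosen here purely for illustration.

```python
# The five values from the worked example.
values = [3, 7, 7, 2, 9]

# Mean: add the values, then divide by how many there are.
total = sum(values)              # 3 + 7 + 7 + 2 + 9 = 28
count = len(values)              # 5
mean_value = total / count       # 28 / 5 = 5.6

# Median: sort first, then take the middle value.
sorted_values = sorted(values)               # [2, 3, 7, 7, 9]
middle_index = len(sorted_values) // 2       # index 2, the 3rd of five
median_value = sorted_values[middle_index]   # 7

print("mean:", mean_value)
print("median:", median_value)
```

Here the mean (5.6) and the median (7) are different numbers even though both describe the centre of the same five values.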

Median: the middle of sorted data
Sort first, then take the middle. With an odd count of five, the median is the 3rd sorted value, which here is 7. Sorting is essential: the middle position of unsorted data is not the median.

Finding the median: odd and even

The rule for finding the median depends on whether the number of values is odd or even. With an odd number of values there is a single one in the middle, as with the five values above. With an even number there are two middle values, and the median is their average. For the four values ten, fourteen, eighteen and twenty, the two middle values are fourteen and eighteen, so the median is sixteen, the average of those two. Sorting the data first is essential: the value that happens to sit in the middle of an unsorted list is not the median.
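The odd and even cases can be written as one short rule. This is an illustrative Python sketch; the function name `median_of` is made up for this example and is not a standard library function.

```python
def median_of(values):
    """Return the median of a list of numbers (illustrative helper)."""
    if len(values) == 0:
        raise ValueError("the median of an empty data set is undefined")

    ordered = sorted(values)      # sorting always comes first
    n = len(ordered)
    mid = n // 2

    if n % 2 == 1:
        # Odd count: a single middle value.
        return ordered[mid]

    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Even count, deliberately given out of order: middles are 14 and 18.
even_median = median_of([18, 10, 20, 14])   # (14 + 18) / 2 = 16.0

# Odd count: the five values from earlier.
odd_median = median_of([3, 7, 7, 2, 9])     # 7

print(even_median, odd_median)
```

Because the function sorts its input, it gives the same answer whatever order the values arrive in.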

Even number: average two middles
With an even count of four there is no single middle value, so the median is the average of the two middle values, 14 and 18: (14 + 18) ÷ 2 = 16, sitting halfway between them.

Samples estimate populations

The deeper purpose of these numbers is estimation. It is almost never possible to measure an entire population, so a sample is taken and its mean or median is used to estimate the corresponding value for the whole population. A survey of two hundred shoppers might estimate the average spending of an entire city. The sample statistic is an estimate, not the exact truth, and its quality depends heavily on how the sample was chosen. A larger, well-chosen sample generally gives a more reliable estimate than a small or careless one.
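The idea of a sample standing in for a population can be tried out with a small simulation. This is only a sketch: the population below is invented (10,000 made-up spending amounts between 20 and 180 dollars), and the seed value is arbitrary. In a real survey the population mean would not be known.

```python
import random
from statistics import mean

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Invented population: weekly spending (dollars) of 10,000 households.
population = [rng.randint(20, 180) for _ in range(10_000)]

# Simple random sample of 200: every household is equally likely to be picked.
sample = rng.sample(population, 200)

population_mean = mean(population)   # normally unknown in practice
sample_mean = mean(sample)           # the estimate a survey would report

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"difference:      {abs(sample_mean - population_mean):.2f}")
```

The sample mean should land close to the population mean without matching it exactly, which is the sense in which it is an estimate.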

A sample stands for a population
A sample is a smaller group drawn from the population, and its mean is calculated. That sample mean ≈ the population mean: it estimates the true value rather than equalling it, and a larger, well-chosen sample generally estimates it more closely.

How was the data obtained?

This is why reading how the data was obtained matters so much. A report that quotes an average without saying who was surveyed, how many people were surveyed, and how they were selected gives you no way to judge the figure. Key questions to ask are: how big was the sample, who was in it, how were they chosen, and could the method have favoured certain answers? A mean calculated from a biased sample, one that does not fairly represent the population, will mis-estimate the population mean no matter how carefully the arithmetic is done.
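A small worked example shows how a biased sample goes wrong even when the arithmetic is perfect. The scenario and all numbers below are hypothetical, invented for illustration: a town of 1,000 commuters where 600 travel for 20 minutes and 400 travel for 50 minutes.

```python
from statistics import mean

# Hypothetical population of 1,000 commuters (minutes per trip).
population = [20] * 600 + [50] * 400
population_mean = mean(population)   # (600*20 + 400*50) / 1000 = 32

# Biased sample: ten people asked at one train station, all of whom
# (in this made-up scenario) have the longer commute.
biased_sample = [50] * 10
biased_mean = mean(biased_sample)    # 50

# Representative sample: ten people in the same 60/40 mix as the population.
fair_sample = [20] * 6 + [50] * 4
fair_mean = mean(fair_sample)        # (6*20 + 4*50) / 10 = 32

print("population mean:", population_mean)
print("biased sample mean:", biased_mean)
print("representative sample mean:", fair_mean)
```

Both samples have ten people and both means are calculated correctly, yet only the representative one lands on the population mean. The difference comes entirely from who was asked.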

How was the data obtained?
A headline figure is only as good as its sampling. A headline that gives the sample size but not who was asked, how they were chosen, or whether the method was biased leaves the key questions unanswered. A figure from a biased sample mis-estimates the population however correct the arithmetic.

When outliers pull the mean

The choice between mean and median is itself revealing, because the two react differently to extreme values. When data is roughly symmetric, the mean and median are close and either describes the centre well. But when a data set contains outliers, values far from the rest, the mean is pulled towards them while the median is barely affected. Consider the values twenty, twenty-two, twenty-four, twenty-five and one thousand. The mean is dragged up above two hundred by the single large value, yet the median stays at twenty-four, much closer to the typical value. For skewed data or data with outliers, the median is often the more honest summary, which is why a careful reader checks which measure a report is using.
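The effect of the outlier can be checked directly. The sketch below uses Python's standard `statistics` module on the five values from the paragraph above, then repeats the calculation with the outlier left out for comparison.

```python
from statistics import mean, median

values = [20, 22, 24, 25, 1000]

with_outlier_mean = mean(values)       # 1091 / 5 = 218.2
with_outlier_median = median(values)   # middle of the sorted five: 24

# The same data with the outlier left out, for comparison only.
typical = [v for v in values if v != 1000]
typical_mean = mean(typical)           # 91 / 4 = 22.75
typical_median = median(typical)       # (22 + 24) / 2 = 23.0

print("with outlier:    mean", with_outlier_mean, "median", with_outlier_median)
print("without outlier: mean", typical_mean, "median", typical_median)
```

One extreme value moves the mean from 22.75 to 218.2, while the median only shifts from 23 to 24.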

An outlier pulls the mean
The four typical values 20, 22, 24 and 25 sit close together, so without the outlier the mean and median would nearly agree (mean ≈ median). The single value 1000 drags the mean up to 218.2, far from the rest, while the median stays at 24, the more representative centre.

Reading a statistical claim

Putting this together gives a clear approach to any statistical claim. Identify whether the figure quoted is a mean or a median, and remember how each is calculated. Ask how the sample was obtained, because the estimate is only as good as the sampling. Consider whether outliers might be distorting a mean, and whether the median would tell a different story. Reading statistics this way, as estimates from samples rather than exact facts, is the difference between being informed by data and being misled by it.

Quick self-check
1. What is the mean of 3, 7, 7, 2 and 9?
2. What is the median of 3, 7, 7, 2 and 9?
3. What is the median of the four values 10, 14, 18 and 20?
4. A sample of 200 people is surveyed to estimate a city's average spending. The sample mean is:
5. The values 20, 22, 24, 25, 1000 contain an outlier (1000). Which measure better represents the typical value?