AC9M9ST01 · YEAR 9 · STATISTICS

Reading Surveys: Estimating the Mean and Median

ACARA v9 CONTENT DESCRIPTION “analyse reports of surveys in digital media and elsewhere for information on how data was obtained to estimate population means and medians”

Builds on: the Statistics strand. This unit builds on calculating the mean and median and on reading data displays from earlier years. Analysing how survey data is obtained is the foundation for the sampling, comparison and investigation work that follows in this strand.

Statistics behind the headlines

Surveys and statistics fill the news: a poll reports the average household spends a certain amount, a study claims the typical commute has a particular length. Behind every such headline is a sample, a smaller group standing in for a much larger population, and a calculation that turns the sample data into a single representative number. This unit is about reading those reports critically, understanding how the average and the middle value are found, and judging how trustworthy a reported figure really is.

Mean: balance the values

Each value sits on the number line and the mean, the sum divided by the count, is the balance point. Switch data sets to watch the fulcrum move.

The mean is the sum of the values divided by how many there are: for the values 3, 7, 7, 2 and 9 that is 28 ÷ 5 = 5.6. It acts like a balance point, and a different data set shifts that point.

The mean and the median

Two summary numbers describe the centre of a data set. The mean, or average, is the sum of all the values divided by how many there are. For the five values three, seven, seven, two and nine, the sum is twenty-eight and there are five of them, so the mean is five point six. The median is the middle value once the data is sorted into order. Sorting those same five values gives two, three, seven, seven, nine, and the middle one, the third of five, is seven. The mean and median are both measures of centre, but they can differ, sometimes considerably.
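The same calculation can be checked with a few lines of Python. This is only a sketch to confirm the arithmetic above; the variable names are chosen for illustration, and `statistics.median` comes from Python's standard library.

```python
import statistics

# The five values from the paragraph above.
values = [3, 7, 7, 2, 9]

# Mean: the sum of the values divided by how many there are.
mean = sum(values) / len(values)

# Median: the middle value once the data is sorted.
median = statistics.median(values)

print(sorted(values))  # [2, 3, 7, 7, 9]
print(mean)            # 5.6
print(median)          # 7
```

Here the mean (5.6) and the median (7) are already different, even for a small and unremarkable data set.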

Median: the middle of sorted data

Once the values are sorted, the median is the one in the middle. Switch sets to see the middle position move with the count.

Sort first, then take the middle. With 5 values the median is the 3rd in sorted order, which is 7. Sorting is essential: the middle item of an unsorted list is generally not the median.

Finding the median: odd and even

Finding the median needs a small rule depending on how many values there are. With an odd number of values there is a single one in the middle, as with the five values above. With an even number there are two middle values, and the median is their average. For the four values ten, fourteen, eighteen and twenty, the two middle values are fourteen and eighteen, so the median is sixteen, the average of those two. Sorting the data first is essential; the middle item of an unsorted list is generally not the median.
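The odd and even rule can be written out step by step. The function below is a minimal sketch, not a library routine; the name `median_by_rule` is made up for this example.

```python
def median_by_rule(values):
    """Return the median using the odd/even rule described above."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")

    ordered = sorted(values)   # always sort first
    count = len(ordered)
    middle = count // 2        # index of the upper middle value

    if count % 2 == 1:
        # Odd count: a single value sits in the middle.
        return ordered[middle]

    # Even count: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_by_rule([3, 7, 7, 2, 9]))    # 7
print(median_by_rule([20, 10, 18, 14]))   # 16.0
```

The second call passes the four values out of order on purpose: because the function sorts first, it still finds 14 and 18 as the two middle values.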

Even number: average two middles

With an even count there is no single middle, so the median is the average of the two middle values. Switch sets to recompute it.

With an even count of four there are two middle values, 14 and 18. The median is their average: (14 + 18) ÷ 2 = 16, sitting halfway between them.

Samples estimate populations

The deeper purpose of these numbers is estimation. It is almost never possible to measure an entire population, so a sample is taken and its mean or median is used to estimate the corresponding value for the whole population. A survey of two hundred shoppers might estimate the average spending of an entire city. The sample statistic is an estimate, not the exact truth, and its quality depends heavily on how the sample was chosen. A larger, well-chosen sample generally gives a more reliable estimate than a small or careless one.
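A small simulation can make the idea of estimation concrete. Everything in this sketch is an assumption made for illustration: the population is 1,000 made-up spending amounts, and the seed is fixed only so that repeated runs give the same samples.

```python
import random

# Hypothetical population: 1,000 invented spending amounts between 10 and 90.
rng = random.Random(9)  # fixed seed so the example is repeatable
population = [rng.randint(10, 90) for _ in range(1000)]
population_mean = sum(population) / len(population)


def sample_mean(size):
    """Draw a random sample of the given size and return its mean."""
    sample = rng.sample(population, size)
    return sum(sample) / size


print("population mean:", round(population_mean, 1))
for size in (6, 50, 200):
    estimate = sample_mean(size)
    error = abs(estimate - population_mean)
    print(f"sample of {size}: mean {estimate:.1f}, off by {error:.1f}")
```

Each sample mean is an estimate that usually misses the population mean by a little. Larger random samples tend to land closer, although any single small sample can get lucky and any single large one can still be slightly off.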

A sample stands for a population

Each sample is a smaller group drawn from the same population. Try different samples: each mean estimates the population mean of 50 without hitting it exactly.

Sample A has 6 members and a mean of 47, estimating the population mean of 50 (off by 3). Each sample lands a little differently: the larger, well-chosen one estimates more closely.

How was the data obtained?

This is why reading how the data was obtained matters so much. A report that quotes an average without saying who was surveyed, how many people, and how they were selected, gives you no way to judge the figure. Key questions to ask are: how big was the sample, who was in it, how were they chosen, and might the method have favoured certain answers. A mean calculated from a biased sample, one that does not fairly represent the population, will mis-estimate the population mean no matter how carefully the arithmetic is done.
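The effect of a biased sample can be shown with invented numbers. In this sketch the population is a hypothetical list of 100 households whose weekly spending runs from 1 to 100 dollars; one sample is spread across the whole list, and the other surveys only the 20 biggest spenders.

```python
# Hypothetical population: weekly spending of 100 households, from 1 to 100.
population = list(range(1, 101))
population_mean = sum(population) / len(population)

# Spread-out sample: every fifth household across the whole list.
# (A real survey would normally choose people at random instead.)
spread_sample = population[2::5]

# Biased sample: only the 20 biggest spenders were asked.
biased_sample = population[-20:]

spread_mean = sum(spread_sample) / len(spread_sample)
biased_mean = sum(biased_sample) / len(biased_sample)

print(population_mean)  # 50.5
print(spread_mean)      # 50.5
print(biased_mean)      # 90.5
```

Both samples have 20 members and both means are calculated correctly, yet the biased one is off by 40. The arithmetic cannot repair a sample that does not fairly represent the population.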

How was the data obtained?

A headline figure is only as good as its sampling. Switch reports: green marks what is answered, gold what is missing or biased.

This report states too little about how its data was obtained to trust the figure. A figure from a biased or under-described sample can mis-estimate the population value however correct the arithmetic.

When outliers pull the mean

The choice between mean and median is itself revealing, because the two react differently to extreme values. When data is roughly symmetric, the mean and median are close and either describes the centre well. But when a data set contains outliers, values far from the rest, the mean is pulled towards them while the median is barely affected. Consider the values twenty, twenty-two, twenty-four, twenty-five and one thousand. The mean is dragged up above two hundred by the single large value, yet the median stays at twenty-four, much closer to the typical value. For skewed data or data with outliers, the median is often the more honest summary, which is why a careful reader checks which measure a report is using.
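The outlier example can be checked directly. This sketch computes both measures with and without the value 1000; the helper name `summarise` is made up for illustration.

```python
import statistics


def summarise(values):
    """Return (mean, median) for a list of numbers."""
    return sum(values) / len(values), statistics.median(values)


with_outlier = [20, 22, 24, 25, 1000]
without_outlier = [20, 22, 24, 25]

print(summarise(with_outlier))     # (218.2, 24)
print(summarise(without_outlier))  # (22.75, 23.0)
```

Removing the single value 1000 moves the mean from 218.2 to 22.75, while the median only shifts from 24 to 23. That difference is why the median is often the safer summary when outliers are present.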

An outlier pulls the mean

Add or remove an outlier and watch the mean swing far while the median holds at 24, the typical value.

The single value 1000 drags the mean to 218.2, far from the cluster, while the median stays at 24, the more representative centre. A bigger outlier pulls the mean further still.

Reading a statistical claim

Putting this together gives a clear approach to any statistical claim. Identify whether the figure quoted is a mean or a median, and remember how each is calculated. Ask how the sample was obtained, because the estimate is only as good as the sampling. Consider whether outliers might be distorting a mean, and whether the median would tell a different story. Reading statistics this way, as estimates from samples rather than exact facts, is the difference between being informed by data and being misled by it.

Quick self-check

1. What is the mean of 3, 7, 7, 2 and 9?

2. What is the median of 3, 7, 7, 2 and 9?

3. What is the median of the four values 10, 14, 18 and 20?

4. A sample of 200 people is surveyed to estimate a city's average spending. The sample mean is:

5. The values 20, 22, 24, 25, 1000 contain an outlier (1000). Which measure better represents the typical value?
