AC9M6ST01 · YEAR 6 · STATISTICS

Comparing Data Sets

ACARA v9 CONTENT DESCRIPTION: Interpret and compare data sets for ordinal and nominal categorical, discrete and continuous numerical variables using comparative displays or visualisations and digital tools; compare distributions in terms of mode, range and shape.
Builds on: Numbers Beyond 10 000 (AC9M3N01). Reading and comparing the numbers in a data set rests on the place-value and ordering skills from earlier years — now those numbers describe a whole group at once.

A data set tells a story

A data set is a collection of answers to one question: how each student travels to school, how many goals each player scored, how tall everyone in the class is. On its own a long list of values says little, so we display it — as a column graph for categories, or a dot plot for numbers — and read the story it tells. Some variables are categories, like sport or travel method, while others are numbers, like height or score. This unit reads a single data set, then learns to compare two using three plain ideas: the mode, the range and the shape.

Reading a data set
A column graph shows how often each category occurs. The tallest column stands out.
Favourite sport: the tallest column is AFL with 9 — the category that occurs most is the mode.
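
A tally like this can also be built with a digital tool. The short Python sketch below counts a list of answers and prints a sideways text version of a column graph. Only the AFL count of 9 comes from the graph described above; the other sports and their counts are invented for illustration.

```python
from collections import Counter

# Hypothetical survey answers: only "AFL: 9" comes from the lesson;
# the other sports and their counts are invented for this sketch.
answers = ["AFL"] * 9 + ["Soccer"] * 6 + ["Netball"] * 5 + ["Cricket"] * 3

counts = Counter(answers)  # how often each category occurs

# Print one row of blocks per category, like a column graph on its side.
for sport, count in counts.items():
    print(f"{sport:<8} {'#' * count} {count}")

# The category with the longest row is the mode.
mode_sport, mode_count = counts.most_common(1)[0]
print("Mode:", mode_sport)
```

The longest row plays the part of the tallest column, so the mode can be read straight from the printout.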

The mode is the most common value

The mode is the value that appears most often. In a column graph it is simply the tallest column; in a list of numbers it is the value that repeats the most. If five students walk, eight come by car and ten catch the bus, the mode is the bus, because it is the most common answer. The mode works for categories and for numbers alike, which makes it the first thing to read from almost any data set. It answers a natural question: what is the most usual result here?

The mode is the most common
Stack the repeats. The value that piles up highest is the mode.
The value 4 appears most often, so the mode is 4 — the most common value in the set.
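
Finding the mode can be sketched in a few lines of Python. The travel counts below come from the paragraph above; the list of numbers is a made-up example in which 4 repeats the most. The sketch uses `multimode` from Python's standard `statistics` module, which returns a list because a data set can have more than one most common value.

```python
from statistics import multimode

# Travel answers from the lesson: 5 walk, 8 come by car, 10 catch the bus.
travel = ["walk"] * 5 + ["car"] * 8 + ["bus"] * 10
travel_modes = multimode(travel)  # list of every most common value
print(travel_modes)  # ['bus']

# Hypothetical list of numbers in which 4 appears most often.
scores = [2, 3, 4, 4, 4, 5, 6]
score_modes = multimode(scores)
print(score_modes)  # [4]

# If two values tie for most common, both are reported.
tied_modes = multimode([1, 1, 2, 2, 3])
print(tied_modes)  # [1, 2]
```

The same function works for categories and for numbers, which matches the way the mode is used in this unit.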

The range measures spread

Where the mode points to the most common value, the range describes how spread out the numbers are. It is the largest value minus the smallest, a single number that captures the whole sweep of the data. Scores from four to twelve have a range of eight; scores from two to five have a range of three, and are clearly more tightly bunched. The range only makes sense for numerical data, where subtracting is meaningful, and it is the simplest way to say whether results are close together or far apart.

The range measures spread
The range is the gap between the smallest and largest values.
The data runs from 4 to 12, so the range is 12 − 4 = 8 — a measure of spread.
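
The subtraction can be written as a small Python function. The two score lists are hypothetical; they were chosen only so that their smallest and largest values match the examples in the paragraph above (4 to 12, and 2 to 5).

```python
def data_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)

# Hypothetical scores: only the endpoints come from the lesson.
wide_scores = [4, 6, 7, 9, 12]
tight_scores = [2, 3, 3, 4, 5]

wide_range = data_range(wide_scores)    # 12 - 4
tight_range = data_range(tight_scores)  # 5 - 2
print(wide_range, tight_range)  # 8 3
```

Only the two end values affect the answer, so the range says nothing about how the values in between are arranged.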

Comparing two data sets

The real power of these ideas appears when two groups are placed side by side. Two classes might share the same most common score yet spread very differently, one tightly clustered and one ranging widely. Comparing their modes says which result was most usual in each; comparing their ranges says which group was more variable. Reading the two displays together, rather than one at a time, is what lets you make a fair statement about how the groups differ, instead of guessing from a jumble of numbers.

Comparing two distributions
Two groups, two shapes. Compare where they cluster and how far they spread.
Class A has range 3 and Class B has range 8, so Class B, with the wider spread, is the more variable group.
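
A side-by-side comparison can be sketched in Python as follows. The two lists of scores are invented for illustration: they share the same mode, and their ranges are 3 and 8 to match the figures quoted above. The helper name `summarise` is also just a choice made for this example.

```python
from statistics import multimode

def summarise(values):
    """Return the mode(s) and the range of a list of numbers."""
    return {"modes": multimode(values), "range": max(values) - min(values)}

# Hypothetical scores: same mode (6), but different spread.
class_a = [5, 6, 6, 6, 7, 8]
class_b = [2, 4, 6, 6, 6, 8, 10]

summary_a = summarise(class_a)
summary_b = summarise(class_b)
print("Class A:", summary_a)
print("Class B:", summary_b)

# The group with the larger range is the more variable one.
if summary_a["range"] > summary_b["range"]:
    more_variable = "Class A"
elif summary_b["range"] > summary_a["range"]:
    more_variable = "Class B"
else:
    more_variable = "neither"
print("More variable:", more_variable)
```

Here the modes match, so the mode alone would suggest the classes are alike; the ranges show that they are not.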

The shape of a distribution

Beyond the mode and the range, a distribution has an overall shape. The values might pile up in the middle and tail off evenly on both sides, a symmetric shape; they might bunch at one end with a long tail stretching the other way, a skewed shape; or they might sit fairly level across the whole range, a uniform shape. Shape is read from the outline of the graph, and it adds what mode and range alone cannot: a picture of how the data is distributed, not just where it peaks or how far it reaches.

The shape of the data
Beyond mode and range, the overall shape describes how the data is spread.
This distribution is symmetric — shape tells you how the values are spread across the range.
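
Shape can be explored with a text dot plot. The Python sketch below is one possible approach, assuming the data are whole numbers; both data sets and the function names `dot_plot` and `is_symmetric` are made up for this example. The symmetry check is deliberately strict (the left half must mirror the right half exactly), whereas real data is usually judged by eye.

```python
from collections import Counter

def dot_plot(values):
    """Return a text dot plot with one row per whole number from min to max."""
    counts = Counter(values)
    rows = []
    for v in range(min(values), max(values) + 1):
        rows.append(f"{v:>2} | {'o' * counts[v]}")
    return "\n".join(rows)

def is_symmetric(values):
    """True if the stack heights read the same forwards and backwards."""
    counts = Counter(values)
    heights = [counts[v] for v in range(min(values), max(values) + 1)]
    return heights == heights[::-1]

# Hypothetical data sets, invented to show two different shapes.
symmetric_data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
skewed_data = [1, 1, 1, 1, 2, 2, 2, 3, 3, 6]

print(dot_plot(symmetric_data))
print("Symmetric?", is_symmetric(symmetric_data))
print(dot_plot(skewed_data))
print("Symmetric?", is_symmetric(skewed_data))
```

In the second plot the dots bunch at the low values and a single dot sits far away at 6, which is the long tail of a skewed shape.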

Reading a summary

Mode, range and shape are three different questions about the same data. The mode asks which value is most common; the range asks how far the data spreads; the shape asks how the values are arranged across that spread. Knowing which one answers a given question is the heart of interpreting data well: a question about the most usual result wants the mode, a question about consistency wants the range, and a question about symmetry or skew wants the shape. Keeping the three apart keeps your reading of a data set clear.

Mode, range or shape
Three ways to summarise a data set. Match each question to the right one.
Which value occurs most often? Choose mode, range or shape.

From one data set to many

With mode, range and shape in hand, a data set stops being a list and becomes something you can describe and compare. These three summaries let you say what was most common, how spread out the results were, and what the overall pattern looked like, for one group or for two placed side by side. From here the same ideas grow into the mean and median, into larger surveys, and into the statistical investigations where data is gathered, displayed and questioned to settle real arguments.

Quick self-check
1. In the data set 3, 5, 5, 5, 8, which value is the mode?
2. What is the range of the data set 4, 7, 9, 12?
3. A column graph shows 8 dogs, 5 cats and 3 birds. Which is the mode?
4. Class A scores spread from 2 to 9; Class B from 4 to 6. Which class is more spread out?
5. A distribution with a long tail of high values stretching to the right is described as...