ACARA v9 CONTENT DESCRIPTION “compare data distributions for continuous numerical variables using appropriate data displays including boxplots; discuss the shapes of these distributions in terms of centre, spread, shape and outliers in the context of the data”
Builds on: Reading Statistics in the Media. Reading statistics critically leads naturally to the tools that summarise data fairly. This unit introduces the boxplot, a compact picture built from quartiles that shows the centre, spread and extremes of a distribution, and makes comparing groups straightforward.
Summarising data with five numbers
A long list of data is hard to take in at a glance, so statisticians summarise it. One of the most useful summaries is the five-number summary: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum. The median is the middle value of the sorted data (or the average of the two middle values when the count is even), splitting the data in half; the quartiles are the medians of the lower and upper halves, splitting the data into quarters. Between them, these five numbers describe where the data starts and ends, where its centre lies, and how its middle half is spread, all without listing every value. They are the foundation of the boxplot, a diagram designed to show exactly this summary in a form the eye can read instantly.
The five-number summary
A dataset can be summarised by its minimum, lower quartile, median, upper quartile and maximum.
A boxplot is built from five numbers that summarise a dataset: the minimum, the first quartile, the median, the third quartile, and the maximum. Reveal them one at a time on this data. Together they capture where the data sits and how it spreads, without showing every single value.
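The recipe above can be sketched in a few lines of Python. This is a minimal sketch rather than the only method: the function name five_number_summary and the scores list are made up for illustration, and it assumes the common classroom convention that, when the number of values is odd, the median itself is left out of both halves. Other conventions, including those used by some statistics software, can give slightly different quartiles.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumes at least two values. Quartiles are the medians of the lower
    and upper halves; for an odd count the middle value is in neither half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]        # values before the middle position
    upper = data[n - half:]    # values after the middle position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical test scores, invented for this example.
scores = [12, 15, 11, 19, 14, 17, 13, 16, 20, 18, 15]
print(five_number_summary(scores))  # (11, 13, 15, 18, 20)
```

Sorting first matters: the median and quartiles are defined by position in the ordered list, not by the order in which the values were recorded.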
Drawing a boxplot
A boxplot turns the five-number summary into a picture drawn against a number line. A rectangular box is drawn from the first quartile to the third quartile, so the box contains the middle half of the data. A line is drawn inside the box at the median, showing the centre. Then two lines called whiskers extend from the box, one out to the minimum and one out to the maximum, marking the full extent of the data. The result is a clear visual: the box shows where the bulk of the data lies, the median line shows the centre, and the whiskers show the reach. Because every boxplot follows the same construction, once you can read one you can read any.
Building the boxplot
A boxplot shows the box from Q1 to Q3, a median line, and whiskers to the minimum and maximum.
The boxplot draws the five numbers as a picture. A box spans from the first quartile to the third, a line inside marks the median, and two whiskers reach out to the minimum and maximum. Step through to see each part appear. The box holds the middle half of the data.
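To see the construction step by step, here is a rough text-only sketch that draws a boxplot with keyboard characters. The function name text_boxplot, the symbols chosen and the example numbers are all assumptions for illustration; real boxplots are normally drawn by hand or with graphing software.

```python
def text_boxplot(minimum, q1, med, q3, maximum, lo, hi, width=41):
    """Draw a one-line boxplot on a number line running from lo to hi."""
    def col(x):
        # Map a data value to a character column on the number line.
        return round((x - lo) / (hi - lo) * (width - 1))

    row = [" "] * width
    for c in range(col(minimum), col(maximum) + 1):
        row[c] = "-"               # whiskers: full extent of the data
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="               # box: the middle half of the data
    row[col(minimum)] = "|"        # whisker ends at minimum and maximum
    row[col(maximum)] = "|"
    row[col(q1)] = "["             # box edges at Q1 and Q3
    row[col(q3)] = "]"
    row[col(med)] = "M"            # median line inside the box
    return "".join(row)

# Five-number summary 11, 13, 15, 18, 20 on a scale from 10 to 20.
print(text_boxplot(11, 13, 15, 18, 20, lo=10, hi=20))
```

The order of the steps mirrors the description: whiskers span minimum to maximum, the box overwrites the stretch from Q1 to Q3, and the median mark goes on last.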
What the quartiles mean
The key to reading a boxplot is understanding the quartiles. They divide the sorted data into four parts, each holding about a quarter (25 percent) of the values. So about a quarter of the data lies below Q1, a quarter between Q1 and the median, a quarter between the median and Q3, and a quarter above Q3. A crucial subtlety is that the quartiles are about counts of values, not equal distances: if the data is bunched in one region and sparse in another, the four parts will have very different widths on the number line even though each contains about the same number of values. A narrow section of the boxplot, whether part of the box or a whisker, means the data is densely packed there; a wide one means it is spread out.
Quartiles split the data in four
Quartiles divide sorted data into four parts each containing about a quarter of the values.
The three quartiles, Q1, the median and Q3, divide the sorted data into four parts. Reveal the four regions to see how the data splits into quarters by count.
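The difference between equal counts and equal widths can be checked directly. The sketch below uses an invented, deliberately bunched dataset and a hypothetical helper called quarter_counts_and_widths; it assumes the median-of-halves quartile convention and that no data value lands exactly on a quartile, so every value falls cleanly into one of the four parts.

```python
from statistics import median

def quarter_counts_and_widths(values):
    """Return how many values fall in each quarter, and how wide each quarter is."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    med = median(data)
    q3 = median(data[len(data) - half:])

    # Count values strictly between neighbouring cut points.
    # (A value exactly equal to a quartile would not be counted here.)
    edges = [float("-inf"), q1, med, q3, float("inf")]
    counts = [sum(1 for x in data if lo < x < hi)
              for lo, hi in zip(edges, edges[1:])]

    # Width of each section as it would appear on the number line.
    cuts = [data[0], q1, med, q3, data[-1]]
    widths = [hi - lo for lo, hi in zip(cuts, cuts[1:])]
    return counts, widths

# Hypothetical data: bunched at the low end, sparse at the high end.
data = [1, 2, 2, 3, 3, 4, 5, 6, 8, 11, 15, 22]
counts, widths = quarter_counts_and_widths(data)
print(counts)  # [3, 3, 3, 3]
print(widths)  # [1.5, 2.0, 5.0, 12.5]
```

Each quarter holds three values, yet the top quarter is more than eight times as wide as the bottom one, which is exactly what a long upper whisker would show.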
Spread and outliers
The width of the box has its own name and use: the interquartile range, or IQR, is Q3 minus Q1, and it measures the spread of the middle half of the data. Because it ignores the lowest and highest quarters, the IQR is not thrown off by a single unusual value in the way the full range (maximum minus minimum) can be. The IQR also gives an objective test for outliers, values that sit surprisingly far from the rest. A common rule flags any value that lies more than 1.5 times the IQR below Q1 or more than 1.5 times the IQR above Q3. For example, if Q1 is 13 and Q3 is 18, the IQR is 5, so any value below 5.5 or above 25.5 is flagged. This turns a vague sense that a value looks odd into a definite calculation. Boxplots are often drawn with such outliers marked as separate points, with each whisker then ending at the most extreme value that is not an outlier.
The IQR and outliers
The interquartile range measures the spread of the middle half, and a value more than 1.5 times the IQR below Q1 or above Q3 is flagged as an outlier.
The interquartile range, Q3 minus Q1, measures the spread of the middle half. Reveal the upper fence, Q3 plus 1.5 times the IQR, to test whether the largest value is an outlier.
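The 1.5 times IQR rule is a short calculation, sketched below. The function names outlier_fences and find_outliers and the dataset are invented for illustration, and the quartiles again assume the median-of-halves convention; a different quartile method could move the fences slightly.

```python
from statistics import median

def outlier_fences(values):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    iqr = q3 - q1                      # spread of the middle half
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values):
    """Return the values that lie outside the fences."""
    low, high = outlier_fences(values)
    return [x for x in values if x < low or x > high]

# Hypothetical data with one suspiciously large value.
data = [11, 12, 13, 14, 15, 15, 16, 17, 18, 19, 35]
print(outlier_fences(data))  # (5.5, 25.5)
print(find_outliers(data))   # [35]
```

Here Q1 is 13 and Q3 is 18, so the IQR is 5 and the fences sit 7.5 beyond each quartile. The value 35 is above the upper fence of 25.5, so it is flagged.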
Comparing distributions
Where boxplots truly shine is in comparing two or more groups. Drawn on the same scale, one above the other, two boxplots let you compare distributions at a glance in a way that raw lists never could. If one box sits further along the scale than the other, that group generally has larger values, seen most clearly in the position of its median. If one box is wider, that group has a greater spread in its middle half. You can compare centres through the median lines, spreads through the box widths, and overall range through the whiskers, all at once. This makes parallel boxplots a favourite tool for questions like whether one class scored higher than another, or whether one method gives more consistent results.
Comparing groups with boxplots
Boxplots drawn on the same scale make it easy to compare the centre and spread of two distributions.
Here is the boxplot for Group A. Add Group B, drawn on the same axis, to compare the two distributions directly.
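The same comparison can be made numerically by putting each group's median and IQR side by side. The sketch below is illustrative only: the helper centre_and_spread and both sets of scores are invented, and the quartiles assume the median-of-halves convention.

```python
from statistics import median

def centre_and_spread(values):
    """Return (median, IQR) for a list of numbers."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    return median(data), q3 - q1

# Hypothetical scores for two groups, invented for this example.
group_a = [52, 55, 58, 60, 61, 63, 66, 70]
group_b = [60, 64, 69, 72, 75, 79, 85, 92]

for name, group in (("Group A", group_a), ("Group B", group_b)):
    centre, spread = centre_and_spread(group)
    print(f"{name}: median = {centre}, IQR = {spread}")
# Group A: median = 60.5, IQR = 8.0
# Group B: median = 73.5, IQR = 15.5
```

On parallel boxplots, Group B's box would sit further right (higher median) and be wider (larger IQR), so Group B scored higher overall but less consistently.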
Quick self-check
1. A boxplot is built from the five-number summary, which is:
2. On a boxplot, the box itself spans from:
3. The quartiles divide a sorted dataset so that each quarter contains:
4. The interquartile range (IQR) is:
5. Two boxplots are drawn on the same scale and one sits noticeably further right. This tells you that group: