ACARA v9 CONTENT DESCRIPTION “construct scatterplots and comment on the association between the 2 numerical variables in terms of strength, direction and linearity”
Builds on: Boxplots and Quartiles. Having summarised a single variable with boxplots, this unit looks at two variables at once. The scatterplot shows how two quantities relate, and a line of best fit captures the trend, while a familiar caution returns: a relationship is not the same as a cause.
Looking at two variables together
So far data has mostly meant a single variable, summarised by a centre and a spread. Often, though, the interesting question is how two variables relate: do students who study longer score higher? Does a colder day mean a higher heating bill? To explore such bivariate data, where each item carries two measurements, the tool is the scatterplot. One variable is placed on the horizontal axis and the other on the vertical, and each item becomes a single point positioned by its two values. The resulting cloud of points reveals at a glance whether, and how, the two variables move together, something no summary of either variable alone could show.
A scatterplot of paired data
A scatterplot plots two variables together, one pair of values as one point, to reveal their relationship.
A scatterplot shows the relationship between two numerical variables measured on the same items, here hours studied and a test score. Each pair of values becomes a single point, its horizontal position from one variable and its vertical position from the other. As more points are plotted, the pattern emerges from the cloud.
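A minimal sketch of how paired values become points. The data values are made up for illustration, and `text_scatterplot` is a hypothetical helper that draws a rough character grid rather than a proper graph; it assumes the values in each variable are not all identical.

```python
# Hypothetical paired data: one entry per student (values invented for illustration).
hours_studied = [1, 2, 3, 4, 5, 6]
test_scores = [52, 55, 61, 70, 68, 80]

# Each student contributes exactly one point: (horizontal value, vertical value).
points = list(zip(hours_studied, test_scores))


def text_scatterplot(points, width=24, height=8):
    """Return a rough character-grid scatterplot, one '*' per point."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each value to a column or row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)


plot = text_scatterplot(points)
print(plot)
```

The horizontal position of each star comes only from hours studied and the vertical position only from the test score, which is the pairing the scatterplot relies on.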
The direction of an association
The first thing a scatterplot reveals is the direction of any association between the variables. If the points tend to rise as you move from left to right, so that larger values of one variable go with larger values of the other, the association is positive. If the points tend to fall, with larger values of one going with smaller values of the other, the association is negative. And if the points show no clear upward or downward trend, scattered without pattern, there is no association: the variables do not appear to be linearly related. Naming the direction is the first step in describing what a scatterplot shows, and it often matches an intuitive expectation: more study going with higher scores is a positive association, and more absences going with lower scores is a negative one.
Positive, negative, or none
An association can be positive (rising), negative (falling), or absent (no clear trend).
A positive association: as one variable increases, the other tends to increase too, so the cloud of points slopes upward. More hours studied tending to mean higher scores is a positive association.
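Direction is usually judged by eye, but one way to check it numerically is sketched below. The idea, which goes beyond what this unit requires, is to ask whether above-average values of one variable tend to pair with above-average values of the other. The function names and data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def co_movement(xs, ys):
    """Sum of products of deviations from the two means.

    Positive when above-average x tends to pair with above-average y,
    negative when above-average x tends to pair with below-average y.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))


def direction(xs, ys):
    """Name the direction from the sign of the co-movement."""
    s = co_movement(xs, ys)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "none"


# Invented data for illustration only.
hours_studied = [1, 2, 3, 4, 5, 6]
absences = [0, 1, 2, 4, 6, 8]
scores_a = [52, 55, 61, 70, 68, 80]
scores_b = [85, 80, 78, 70, 62, 55]

print(direction(hours_studied, scores_a))
print(direction(absences, scores_b))
```

Real data almost never gives a value of exactly zero, so deciding that there is no clear trend still takes judgement from the plot itself; the sign alone says nothing about how strong the association is.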
Summarising the trend with a line
When the points follow a roughly linear pattern, that trend can be captured by a line of best fit, a single straight line drawn to pass as close as possible to all the points at once. This line does the same job for bivariate data that an average does for a single variable: it summarises. Once drawn, the line of best fit serves two purposes. It describes the relationship, its slope telling you how much the second variable changes, on average, for each unit increase in the first, and it allows prediction, estimating a likely value of one variable from a given value of the other. Such predictions are most reliable within the range of the observed data and become increasingly uncertain if pushed far beyond it, where the pattern may not continue.
The line of best fit
A line of best fit summarises a linear trend and can be used to make predictions within the data range.
The points slope upward, but how can that trend be captured in one summary? The line of best fit is the standard tool for describing a linear association.
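A line of best fit is often drawn by eye. One common way to make "as close as possible" precise is the least-squares method, which is an assumption here rather than something this unit specifies. The sketch below computes that line and flags any prediction made outside the observed range. The function names and data are hypothetical.

```python
def line_of_best_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx                  # average change in y per one-unit increase in x
    intercept = y_bar - slope * x_bar  # forces the line through (x_bar, y_bar)
    return slope, intercept


def predict(x, slope, intercept, xs):
    """Return (estimate, within_range); within_range is False when extrapolating."""
    estimate = slope * x + intercept
    within_range = min(xs) <= x <= max(xs)
    return estimate, within_range


# Invented data for illustration only.
hours_studied = [1, 2, 3, 4, 5, 6]
test_scores = [52, 55, 61, 70, 68, 80]

slope, intercept = line_of_best_fit(hours_studied, test_scores)
inside = predict(3.5, slope, intercept, hours_studied)
outside = predict(20, slope, intercept, hours_studied)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"3.5 hours: {inside[0]:.1f} (within data range: {inside[1]})")
print(f"20 hours: {outside[0]:.1f} (within data range: {outside[1]})")
```

The estimate for 20 hours lies far beyond any score in the data, which is the kind of extrapolation the text warns against: the line keeps rising, but the real pattern may not.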
The strength of a correlation
Beyond direction, a scatterplot shows the strength of the relationship, which is how closely the points follow the trend. When the points cluster tightly around the line of best fit, the correlation is strong, and knowing one variable lets you predict the other quite accurately. When the points are loosely scattered around the line, the correlation is weak: a trend exists, but it is rough, and predictions carry more uncertainty. Strength and direction are separate ideas, so a correlation can be strongly positive, weakly positive, strongly negative, and so on. Describing a scatterplot well means stating both: the direction of the association and how strong it is, since together they capture how the two variables really relate.
Strength: how tightly points cluster
Correlation strength describes how closely the points follow the trend line, from strong to weak.
A strong correlation means the points lie close to the trend line, so knowing one variable predicts the other well. The tighter the cloud hugs a line, the stronger the linear correlation.
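Strength can be judged by eye from how tightly the points hug the line. A standard numerical measure, not named in this unit and introduced here as an assumption, is Pearson's correlation coefficient r. It runs from -1 to 1: its sign gives the direction, and values nearer to -1 or 1 mean a tighter linear cluster. A minimal sketch with invented data:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Two invented data sets with the same upward direction.
hours = [1, 2, 3, 4, 5, 6]
tight_scores = [50, 56, 60, 66, 70, 75]   # points close to a straight line
loose_scores = [60, 48, 72, 55, 80, 62]   # points widely scattered

r_tight = pearson_r(hours, tight_scores)
r_loose = pearson_r(hours, loose_scores)

print(f"tight cluster: r = {r_tight:.2f}")
print(f"loose cluster: r = {r_loose:.2f}")
```

Both values are positive, matching the shared upward direction, but the tight cluster gives a value much closer to 1. This is why direction and strength are reported separately.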
Correlation is not causation
A scatterplot is powerful, but it carries the same warning met when analysing statistics in the media: a correlation, however strong, does not prove that one variable causes the other. Ice cream sales and cases of sunburn are strongly correlated across the year, yet ice cream does not cause sunburn; both are driven by sunny weather, a lurking variable behind the scenes. A scatterplot can reveal that two variables move together, and a line of best fit can describe how, but neither can establish cause, which requires careful investigation and usually a controlled experiment. So the final discipline of reading a scatterplot is to describe the association honestly while resisting the leap from a pattern of points to a claim about cause and effect.
Correlation still is not causation
A scatterplot can show a correlation but cannot prove that one variable causes the other.
Ice cream sales and sunburn cases plot as a clear upward trend, a strong correlation. Does buying ice cream cause sunburn? No: both rise with sunny weather, and that is the caution every scatterplot demands.
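The lurking-variable idea can be illustrated with a small sketch. All the numbers below are invented: monthly temperature is assumed to drive both ice cream sales and sunburn cases, and neither series is computed from the other. Pearson's correlation coefficient r is used as the measure of strength, which is an assumption beyond what this unit names.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented monthly figures. Each series roughly tracks temperature,
# but neither was built from the other.
temperature = [14, 16, 19, 23, 27, 30, 31, 29, 25, 21, 17, 15]
ice_cream_sales = [40, 47, 58, 66, 83, 88, 95, 85, 77, 60, 52, 44]
sunburn_cases = [3, 7, 8, 14, 16, 21, 20, 20, 14, 12, 6, 6]

r_ice_sunburn = pearson_r(ice_cream_sales, sunburn_cases)
r_temp_ice = pearson_r(temperature, ice_cream_sales)
r_temp_sunburn = pearson_r(temperature, sunburn_cases)

print(f"ice cream vs sunburn:   r = {r_ice_sunburn:.2f}")
print(f"temperature vs ice cream: r = {r_temp_ice:.2f}")
print(f"temperature vs sunburn:   r = {r_temp_sunburn:.2f}")
```

Ice cream sales and sunburn cases come out strongly correlated even though the only link built into the data is temperature. The calculation cannot tell these situations apart, so the correlation alone says nothing about cause.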
Quick self-check
1. On a scatterplot, each plotted point represents:
2. If the points on a scatterplot tend to fall as you move right, the association is:
3. A line of best fit is used to:
4. Two scatterplots show the same upward trend, but one has points tightly hugging the line and the other widely scattered. The tight one has a:
5. A scatterplot shows ice cream sales and sunburn cases strongly correlated. The correct conclusion is: