ACARA v9 CONTENT DESCRIPTION: “analyse and report on the distribution of data from primary and secondary sources using random and non-random sampling techniques to select and study samples”
What a distribution shows
A distribution is the pattern of how the values of one variable are spread out. Instead of looking at a single number, you look at the whole shape that the data makes: which values appear often, which appear rarely, and where the values bunch together or leave gaps. A dot plot, a column graph, and a histogram all display this pattern at a glance by stacking the data over the values it takes. A simple example is the number of pets owned by each student in a class. Most students might have one or two pets, a few have none, and one or two have a houseful. When you plot every student over the value that fits them, that picture is the distribution. Reading it tells you far more than any single student's answer could.
[Figure: A distribution as a dot plot. Each dot is one student, stacked over the value. A distribution shows how the values of one variable are spread; here, the number of pets per student.]
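The pets example can be tried out in a few lines of Python. This is only a sketch: the list `pets` is made-up data for an imaginary class of 15 students, and the helper names `frequency_table` and `text_dot_plot` are chosen here for illustration. It counts how often each value appears and then stacks one mark per student beside that value, which is a dot plot turned on its side.

```python
from collections import Counter

# Made-up data (an assumption): pets owned by each of 15 students.
pets = [0, 1, 1, 2, 1, 0, 2, 3, 1, 2, 1, 6, 2, 1, 0]


def frequency_table(values):
    """Count how often each whole-number value appears, including gaps."""
    counts = Counter(values)
    return {v: counts.get(v, 0) for v in range(min(values), max(values) + 1)}


def text_dot_plot(values):
    """Build a sideways dot plot: one row per value, one 'o' per data point."""
    rows = []
    for value, count in frequency_table(values).items():
        rows.append(f"{value:>2} | {'o' * count}".rstrip())
    return "\n".join(rows)


print(text_dot_plot(pets))
```

The longest row marks the most common value, the empty rows show the gaps, and the lone mark far from the rest is the student with a houseful.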
Shape, centre and spread
Three plain ideas describe almost any distribution. The first is its shape: whether it is roughly symmetric, with the two sides mirroring each other, or skewed, with a longer tail stretching out to one side, and whether it rises to one clear peak or to several. The second is its centre, meaning roughly where the values cluster; you can locate it with the mean, which is the average, or with the median, which is the middle value once the data is put in order. The third is its spread, meaning how far the values reach; the simplest measure is the range, which is the highest value minus the lowest. Reporting all three together says far more about the data than quoting one number on its own, because two groups can share the same average yet look completely different once you see their shape and spread.
[Figure: Shape, centre and spread. One distribution, three things to read. Describe a distribution by its shape, its centre where values cluster, and its spread from lowest to highest.]
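A short Python sketch can show why the centre alone is not enough. The two score lists below are invented for illustration, and `describe` is a hypothetical helper name. It uses the standard `statistics` module to find the mean and median, and works out the range as the highest value minus the lowest.

```python
from statistics import mean, median


def describe(values):
    """Summarise the centre (mean, median) and the spread (range)."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
    }


# Invented data (an assumption): two groups of five test scores.
group_a = [48, 49, 50, 51, 52]
group_b = [10, 30, 50, 70, 90]

print("Group A:", describe(group_a))
print("Group B:", describe(group_b))
```

Both groups have the same mean and the same median, yet Group B is spread over a far wider range. Shape is not measured here; it is best read from a plot of the data.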
Random and non-random samples
A sample is the part of the population you actually measure, and how you choose it matters a great deal. In a random sample, every member of the population has an equal chance of being selected, and nothing about a member makes them more or less likely to be picked. That even-handed selection tends to make a random sample resemble the whole population. A non-random sample is chosen some other way. A convenience sample takes whoever is nearest or easiest to reach, and a self-selected sample takes whoever volunteers to take part. Because these methods favour particular kinds of people, a non-random sample can lean toward one group and leave others underrepresented.
[Figure: Random and non-random samples. Equal chance against whoever is nearest. In a random sample every member has an equal chance; a non-random sample picks by convenience or self-selection.]
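The two ways of choosing can be placed side by side in Python. This is a sketch under stated assumptions: the population is an imaginary list of 30 students in seating order, and the function names are made up for illustration. `random.Random.sample` from the standard library picks without repeats, giving every member the same chance, while the convenience version simply takes whoever comes first.

```python
import random

# Imaginary population (an assumption): 30 students in seating order.
population = [f"student_{i:02d}" for i in range(1, 31)]


def simple_random_sample(members, size, seed=None):
    """Pick `size` different members, each with an equal chance."""
    rng = random.Random(seed)  # a fixed seed only makes the pick repeatable
    return rng.sample(members, size)


def convenience_sample(members, size):
    """Pick the first `size` members: whoever is nearest."""
    return members[:size]


print("Random:     ", simple_random_sample(population, 5, seed=1))
print("Convenience:", convenience_sample(population, 5))
```

The convenience sample always returns the same front-row students, so anyone sitting further back can never be chosen.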
When a sample misleads
The danger of a non-random sample is that it can over-represent one part of the population, so the distribution it produces does not match the distribution of the whole group. When that happens, the centre can shift away from the true value and the shape can change, becoming skewed where the population was balanced. An online poll about a football match answered only by fans of one team will lean heavily toward that team, no matter how many people reply. This is why the method of choosing a sample matters at least as much as how many members were chosen; a large convenience sample can still be badly misleading.
[Figure: When a sample misleads. A skewed sample against the population. A non-random sample can give a distribution that does not match the population, shifting the centre or shape.]
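The football poll can be imitated with a small simulation. Everything here is assumed for illustration: a population of 1000 fans split evenly between two teams, and the helper names `share_for`, `random_poll` and `one_sided_poll`. The sketch compares the share backing Team A in a random poll with the share in a poll answered only by Team A fans, at two different sample sizes.

```python
import random

# Assumed population: 1000 fans, half supporting each team.
population = ["Team A"] * 500 + ["Team B"] * 500


def share_for(team, responses):
    """Fraction of the responses that name `team`."""
    return responses.count(team) / len(responses)


def random_poll(members, size, seed=0):
    """Every fan has an equal chance of being asked."""
    return random.Random(seed).sample(members, size)


def one_sided_poll(members, size, team="Team A"):
    """Self-selected poll: only fans of `team` reply."""
    fans = [m for m in members if m == team]
    return fans[:size]


for size in (20, 200):
    r = share_for("Team A", random_poll(population, size))
    s = share_for("Team A", one_sided_poll(population, size))
    print(f"n={size}: random poll {r:.2f}, one-sided poll {s:.2f}")
```

Making the one-sided poll ten times larger does not move it any closer to the true share of one half, because the way the replies were gathered has not changed.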
Comparing two distributions
To compare two samples fairly, look at their shape, their centre, and their spread together rather than at a single figure. The distribution of a random sample will often resemble the distribution of the population, while the distribution of a convenience sample may be visibly skewed or shifted to one side. Reporting the comparison means describing those differences in words: saying that one is symmetric while the other is skewed, with a long tail to one side, that their centres sit close together or far apart, and that one spreads wider than the other. A comparison built only on the averages can hide a real difference in shape or spread, so the fuller description is the honest one.
[Figure: Comparing two distributions. Report shape, centre and spread, not one number. Compare two distributions by reporting their shape, centre and spread, not just a single number.]
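A fuller comparison can be sketched in Python as well. The two samples below are invented so that their means match, and the helper names `summarise` and `report` are hypothetical. The sketch prints the mean, the median, and the lowest and highest values for each sample, so that differences the mean hides become visible.

```python
from statistics import mean, median


def summarise(values):
    """Collect the figures needed to describe centre and spread."""
    return {
        "mean": mean(values),
        "median": median(values),
        "lowest": min(values),
        "highest": max(values),
        "range": max(values) - min(values),
    }


def report(name, values):
    """Write one plain sentence describing a sample."""
    s = summarise(values)
    return (f"{name}: mean {s['mean']}, median {s['median']}, "
            f"values from {s['lowest']} to {s['highest']} (range {s['range']})")


# Invented data (an assumption): two samples with the same mean.
sample_one = [3, 4, 4, 5, 5, 5, 6, 6, 7]
sample_two = [1, 2, 2, 3, 3, 4, 8, 10, 12]

print(report("Sample one", sample_one))
print(report("Sample two", sample_two))
```

The means agree, but the second sample has a lower median and a much wider range. Its values bunch at the low end with a long tail toward the high end, while the first sample is symmetric about its centre.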
Why this matters
Distributions are the everyday language of real data. Test scores, heights, waiting times, and survey answers are all summarised and compared by their distributions, and being able to read shape, centre, and spread lets you describe what is really going on. Recognising the difference between a random and a non-random sample protects you from being misled by a biased one, because it prompts the simple question of how the data was gathered before you trust what it seems to say. The most common slip is to compare only the averages while ignoring the shape and the spread, or to trust a convenience sample just because it was easy to collect. We stay within these ideas here: the effect of sample size on variation, and full statistical investigations, come in the units that follow.
Quick self-check
1. What does the distribution of a one-variable data set show?
2. Which three features best describe a distribution?
3. What defines a random sample?
4. Why can a non-random (convenience) sample mislead?
5. When comparing two distributions, what should you report?