AC9M8ST02 · YEAR 8 · STATISTICS

Sampling and Distributions

ACARA v9 CONTENT DESCRIPTION: “analyse and report on the distribution of data from primary and secondary sources using random and non-random sampling techniques to select and study samples”

What a distribution shows

A distribution is the pattern of how the values of one variable are spread out. Instead of looking at a single number, you look at the whole shape that the data makes: which values appear often, which appear rarely, and where the values cluster. A dot plot, a column graph, and a histogram all display this pattern at a glance by stacking the data over the values it takes. A simple example is the number of pets owned by each student in a class. Most students might have one or two pets, a few have none, and one or two have a houseful. When you plot every student over the value that fits them, that picture is the distribution. Reading it tells you far more than any single student's answer could.

[Figure: A distribution as a dot plot. A distribution shows how the values of one variable are spread; here, the number of pets per student. Different data makes a different shape.]
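The pets example can be sketched in a few lines of Python. This is only an illustration: the list `pets` is made-up data for 15 students, and `dot_plot` is a hypothetical helper written for this sketch, not part of any library.

```python
from collections import Counter

# Hypothetical data: number of pets owned by each of 15 students.
pets = [1, 2, 0, 1, 3, 1, 2, 2, 0, 1, 6, 1, 2, 0, 1]

# Count how many students sit on each value.
counts = Counter(pets)

def dot_plot(counts):
    """Return one text row per value, with one 'o' per student."""
    rows = []
    # Walk every value from lowest to highest so gaps show up as empty rows.
    for value in range(min(counts), max(counts) + 1):
        rows.append(f"{value} | " + "o" * counts[value])
    return rows

for row in dot_plot(counts):
    print(row)
```

With this made-up data, the row for 1 pet is the longest (six students), the rows for 4 and 5 pets are empty, and a single dot sits alone at 6 pets: the student with a houseful.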

Shape, centre and spread

Three ideas describe almost any distribution. The first is its shape: whether it is roughly symmetric, with the two sides mirroring each other, or skewed, with a longer tail stretching out to one side, and whether it rises to one clear peak or to several. The second is its centre, meaning roughly where the values cluster; you can locate it with the mean, which is the average, or with the median, which is the middle value once the data is put in order. The third is its spread, meaning how far the values reach; the simplest measure is the range, which is the highest value minus the lowest. Reporting all three together says far more about the data than quoting one number on its own, because two groups can share the same average yet look completely different once you see their shape and spread.

[Figure: Shape, centre and spread. Describe a distribution by its shape, its centre where values cluster, and its spread from lowest to highest.]
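Centre and spread are quick to calculate. The sketch below uses Python's built-in `statistics` module on a made-up list of pet counts (the data is an assumption for illustration only).

```python
import statistics

# Hypothetical data: number of pets owned by each of 15 students.
pets = [1, 2, 0, 1, 3, 1, 2, 2, 0, 1, 6, 1, 2, 0, 1]

mean = statistics.mean(pets)        # the average: total divided by how many
median = statistics.median(pets)    # the middle value once the data is sorted
data_range = max(pets) - min(pets)  # spread: highest value minus lowest

print(f"mean = {mean:.2f}, median = {median}, range = {data_range}")
```

Here the mean is about 1.53, the median is 1 and the range is 6. The mean sits above the median because the one student with 6 pets pulls the average up, which hints that the distribution has a tail toward the higher values.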

Random and non-random samples

A sample is the part of the population you actually measure, and how you choose it matters a great deal. In a random sample, every member of the population has an equal chance of being selected, and nothing about a member makes them more or less likely to be picked. That even-handed selection tends to make a random sample resemble the whole population. A non-random sample is chosen some other way. A convenience sample takes whoever is nearest or easiest to reach, and a self-selected sample takes whoever volunteers to take part. Because these methods favour particular kinds of people, a non-random sample can lean toward one group and leave others underrepresented.

[Figure: Random and non-random samples. In a random sample every member has an equal chance; a non-random sample picks by convenience or self-selection.]
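The two ways of choosing can be sketched in Python. Everything here is an assumption for illustration: the population is just the ID numbers 1 to 200, and "the first 20 on the roll" stands in for a convenience sample.

```python
import random

# Hypothetical population: ID numbers for 200 students.
population = list(range(1, 201))

# Random sample: every student has the same chance of being picked.
rng = random.Random(8)  # fixed seed so the same sample comes out each run
random_sample = rng.sample(population, 20)

# Convenience sample: whoever is easiest to reach, here the first 20 on the roll.
convenience_sample = population[:20]

print("random:     ", sorted(random_sample))
print("convenience:", convenience_sample)
```

The random sample is scattered across the whole roll, while the convenience sample contains only IDs 1 to 20. If the start of the roll differs from the rest of the school in some way, the convenience sample will carry that difference with it.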

When a sample misleads

The danger of a non-random sample is that it can over-represent one part of the population, so the distribution it produces does not match the distribution of the whole group. When that happens, the centre can shift away from the true value and the shape can change, becoming skewed where the population was balanced. An online poll about a football match answered only by fans of one team will lean heavily toward that team, no matter how many people reply. This is why the method of choosing a sample matters at least as much as how many members were chosen; a large convenience sample can still be badly misleading.

[Figure: When a sample misleads. A random sample matches the population; a non-random sample can give a distribution that does not match it, shifting the centre or shape.]
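A small simulation can show the centre shifting. The numbers below are invented for this sketch: 200 students report hours of sport per week, and 40 of them belong to a sports club and play more. The convenience samples are taken only at the club.

```python
import random
import statistics

# Hypothetical population (hours of sport per week).
others = [1 + i % 4 for i in range(160)]  # 160 students with 1, 2, 3 or 4 hours
club = [6 + i % 4 for i in range(40)]     # 40 club members with 6, 7, 8 or 9 hours
population = others + club

population_mean = statistics.mean(population)

# Convenience samples: ask only at the sports club.
small_convenience_mean = statistics.mean(club[:20])  # 20 club members
large_convenience_mean = statistics.mean(club)       # all 40 club members

# Random sample of the same larger size, drawn from everyone.
rng = random.Random(8)  # fixed seed so the run is repeatable
random_mean = statistics.mean(rng.sample(population, 40))

print("population mean:       ", population_mean)
print("convenience mean (20): ", small_convenience_mean)
print("convenience mean (40): ", large_convenience_mean)
print("random sample mean (40):", random_mean)
```

In this made-up population the true mean is 3.5 hours, yet both convenience samples give 7.5 hours. Doubling the size of the convenience sample does not fix it, because the problem is who was asked, not how many.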

Comparing two distributions

To compare two samples fairly, look at their shape, their centre, and their spread together rather than at a single figure. The distribution of a random sample will often resemble the distribution of the population, while the distribution of a convenience sample may be visibly skewed or shifted to one side. Reporting the comparison means describing those differences in words: saying that one is symmetric while the other leans left, that their centres sit close together or far apart, and that one spreads wider than the other. A comparison built only on the averages can hide a real difference in shape or spread, so the fuller description is the honest one.

[Figure: Comparing two distributions. Compare two distributions by reporting their shape, centre and spread, not just a single number.]
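To see how averages alone can hide a difference, here is a short Python sketch. The two data sets and the helper `summarise` are hypothetical, chosen so that the centres match while the spreads do not.

```python
import statistics

def summarise(values):
    """Return the centre and spread of one data set as a dictionary."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
    }

# Hypothetical quiz scores for two classes of nine students each.
class_a = [4, 5, 5, 6, 6, 6, 7, 7, 8]
class_b = [1, 2, 3, 6, 6, 6, 9, 10, 11]

summary_a = summarise(class_a)
summary_b = summarise(class_b)

print("Class A:", summary_a)
print("Class B:", summary_b)
```

Both classes have a mean of 6 and a median of 6, so a report based only on the centre would call them the same. The ranges are 4 and 10, which shows that Class B is far more spread out.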

Where the data comes from: primary and secondary sources

Data reaches you in two ways. A primary source is data you collect yourself for your own question: a survey you hand out, an experiment you run, or measurements you take by direct observation. A secondary source is data gathered by someone else that you reuse, such as census figures, published research, a government report, or an online database. Both kinds can be analysed for their distribution in the same way, but knowing the source matters: with a primary source you control how the sample was chosen, while with a secondary source you must check how it was collected before you trust it, because those sampling decisions were made by someone else.

Why this matters

Distributions are the everyday language of real data. Test scores, heights, waiting times, and survey answers are all summarised and compared by their distributions, and being able to read shape, centre, and spread lets you describe what is really going on. Recognising the difference between a random and a non-random sample protects you from being misled by a skewed one, because it prompts the simple question of how the data was gathered before you trust what it seems to say. The most common slip is to compare only the averages while ignoring the shape and the spread, or to trust a convenient sample just because it was easy to collect. We stay within these ideas here: the effect of sample size on variation, and full statistical investigations, come in the units that follow.

Quick self-check

1. What does the distribution of a one-variable data set show?

2. Which three features best describe a distribution?

3. What defines a random sample?

4. Why can a non-random (convenience) sample mislead?

5. When comparing two distributions, what should you report?

6. You reuse census figures published by the government. That data is a...
