AC9M9ST05 · YEAR 9 · STATISTICS

Planning a Statistical Investigation

ACARA v9 CONTENT DESCRIPTION: Plan and conduct statistical investigations involving the collection and analysis of different kinds of data; report findings and discuss the strength of evidence to support any conclusions.
Builds on: Choosing the Right Data Display. This unit brings together the whole statistics strand: estimating from samples, fair sampling, comparing distributions and choosing displays. Planning investigations and judging the strength of evidence is the foundation for senior statistics and for evidence-based reasoning everywhere.

Bringing the whole strand together

The previous units each handled one piece of statistics: calculating averages, sampling fairly, comparing distributions, choosing a display. This final unit puts them together into a complete statistical investigation, the full journey from a question to a conclusion. Just as importantly, it asks you to be honest about how much that conclusion can really claim, because every investigation has limits, and a good report says so.

The investigation cycle
One loop: pose a question, collect, analyse, then conclude and report.
A statistical investigation is one cycle: pose a clear question, collect the data, analyse it, then conclude and report. A conclusion can feed straight into the next question.

Posing a clear question

A statistical investigation follows a cycle, and it begins with a clear question. The question must be specific and answerable with data: not the vague question of whether students sleep enough, but something measurable like whether Year 9 students at our school sleep fewer than eight hours on school nights. Posing the question also means identifying the population you care about and the variable you will measure, and planning how you will gather the data. A well-framed question shapes everything that follows.

Collecting data: primary and secondary

Next comes collecting the data, and there is a basic choice in where it comes from. Primary data is data you gather yourself, through a survey, an experiment or direct measurement, giving you control over exactly what is recorded. Secondary data is data already collected by someone else, such as a national census or a published dataset, which is convenient but was gathered for someone else's purpose. Either way, the data may be categorical or numerical, and the sample must be chosen fairly, because a biased sample will undermine the whole investigation no matter how careful the later steps are.

Primary or secondary data?
Primary data you collect yourself; secondary data is already collected by others.
Primary data is collected by you (a survey, experiment or measurement). Secondary data is already collected by someone else (a census, a published dataset, official statistics).
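
Choosing a sample fairly can be sketched in a few lines of Python. This is only an illustration: the class roll, the student IDs and the sample size of 30 are made up, and a simple random sample is just one fair method among several.

```python
import random

# Hypothetical roll of 120 Year 9 students (IDs invented for illustration).
population = [f"student_{i:03d}" for i in range(1, 121)]

# A fixed seed means the same draw can be repeated and checked by someone else.
rng = random.Random(2024)

# Simple random sample: every student has the same chance of being chosen,
# and nobody can be chosen twice.
sample = rng.sample(population, k=30)

print(len(sample), "students selected, for example:", sorted(sample)[:5])
```

Picking only your friends, or only the students who arrive early, would not give every student an equal chance, which is how bias creeps in.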

Analysing the data

The third stage is analysis, where the earlier units do their work. You summarise the data using a measure of centre (the mean or median) and a measure of spread (the range or interquartile range). You choose a display suited to the data type, such as a bar chart, histogram, scatter plot or line graph, and if you are comparing groups you might place box plots side by side to contrast their centre, spread and shape. Analysis turns raw numbers into a picture clear enough to answer the question.

From data to summary
Analysis pulls together centre, spread and a display from the earlier units.
Analysis draws the earlier units together: a measure of centre (the median), a measure of spread (the range or IQR, shown as a box plot), and a display suited to the data.
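
The summaries named above can be calculated with a short Python sketch. The sleep times are invented for illustration, and the `quartiles` helper is a hypothetical function that uses one common school convention (the median of the lower half and of the upper half); calculators and software sometimes use slightly different rules and give slightly different quartiles.

```python
import statistics

# Hypothetical sleep times in hours for 12 students (invented for illustration).
sleep_hours = [6.5, 7.0, 7.5, 8.0, 6.0, 7.0, 9.0, 7.5, 6.5, 8.5, 7.0, 5.5]

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]   # values below the middle
    upper = ordered[-half:]  # values above the middle (middle value skipped if the count is odd)
    return statistics.median(lower), statistics.median(upper)

# Measures of centre
mean = statistics.mean(sleep_hours)
median = statistics.median(sleep_hours)

# Measures of spread
data_range = max(sleep_hours) - min(sleep_hours)
q1, q3 = quartiles(sleep_hours)
iqr = q3 - q1

print("mean:", round(mean, 2), "median:", median)
print("range:", data_range, "IQR:", iqr)
```

The five values needed for a box plot (minimum, Q1, median, Q3, maximum) all come out of the same calculation.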

Concluding and reporting

Then you reach a conclusion and report it. A conclusion should answer the original question directly, supported by the evidence you have gathered, and a clear report sets out the question, the method, the results and the conclusion so that someone else could follow and check your reasoning. Reporting is not an afterthought; it is how an investigation becomes something others can trust or challenge.

Reporting honestly
A full report states the question, method, results, conclusion and the limitations.
A complete report states the question, method, results and conclusion so others can check it, and the limitations so readers can judge the strength of the evidence.

The limitations of a conclusion

The heart of this unit, though, is recognising the limitations of any conclusion. A small sample gives a less reliable estimate than a large one, and another sample might well give somewhat different results, so a single investigation rarely settles a question for good. A conclusion strictly applies only to the population you actually sampled; surveying one school says little about the whole country. And if your sampling was biased, the conclusion may not generalise at all. Stating these limits honestly is what lets a reader judge the strength of the evidence rather than taking a headline figure at face value.

Limitations of a conclusion
A biased sample from one corner cannot speak for the whole population.
A sample drawn from one corner of the population is biased, so its conclusion applies to the group sampled, not beyond. Another sample could differ, which is why limits are stated.
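
The idea that another sample could give a different answer, and that small samples vary more than large ones, can be explored with a simulation. This is a sketch under stated assumptions: the population of 1000 sleep times is simulated rather than real, and the sample sizes of 10 and 200 are chosen only to make the contrast visible.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the simulation can be repeated

# Hypothetical population: simulated sleep times (hours) for 1000 students.
population = [rng.gauss(7.2, 1.0) for _ in range(1000)]

def spread_of_sample_means(sample_size, repeats=500):
    """Draw many random samples and return the range of their means."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return max(means) - min(means)

small = spread_of_sample_means(10)   # many small samples
large = spread_of_sample_means(200)  # many large samples

print("range of sample means, n = 10: ", round(small, 2))
print("range of sample means, n = 200:", round(large, 2))
```

The means of the small samples are scattered much more widely than the means of the large samples, which is why a conclusion based on a small sample should be stated more cautiously.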

Correlation is not causation

One limitation deserves special attention because it is so often ignored: correlation is not causation. Two variables can rise and fall together without one causing the other. Ice cream sales and drowning incidents both increase in summer, but eating ice cream does not cause drowning; a third factor, hot weather, drives both. Finding an association in data is a genuine result, but saying that one variable causes the other is a much stronger claim, one that observation alone cannot justify.

A sound investigation poses a precise question, collects suitable data from a fair sample, analyses it with appropriate summaries and displays, reports a conclusion that answers the question, and frames that conclusion with its limitations. Doing all of this, and being candid about the strength of the evidence, is what it means to think statistically rather than simply to quote a number.

Correlation is not causation
Two things rising together can share a hidden common cause.
Ice cream sales and drownings rise together, but neither causes the other: hot weather drives both. An association is real, yet on its own it never proves cause.
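
A hidden common cause can be imitated in a simulation. In the sketch below every number is made up: daily temperature is generated first, then ice cream sales and water incidents are each calculated from temperature plus random noise. Neither one appears in the other's formula, yet they still end up associated. The `pearson_r` helper is a hypothetical function that calculates the correlation coefficient, a number between -1 and 1 that measures how closely two variables move together.

```python
import random
import statistics

rng = random.Random(11)  # fixed seed so the simulation can be repeated

def pearson_r(xs, ys):
    """Correlation coefficient between two equal-length lists of numbers."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hidden common cause: simulated daily maximum temperature for 200 days.
temperature = [rng.uniform(15, 40) for _ in range(200)]

# Each variable depends on temperature only, never on the other variable.
ice_cream_sales = [20 * t + rng.gauss(0, 60) for t in temperature]
water_incidents = [max(0.0, 0.1 * t + rng.gauss(0, 0.8)) for t in temperature]

r = pearson_r(ice_cream_sales, water_incidents)
print("correlation between sales and incidents:", round(r, 2))
```

The correlation comes out clearly positive even though the simulation contains no link from ice cream to incidents, so the association on its own says nothing about cause.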

Quick self-check
1. What is the FIRST step of a statistical investigation?
2. You run your own survey of classmates. This data is:
3. Ice cream sales and drowning numbers both rise in summer. What can you conclude?
4. An investigation uses a small, biased sample. Its conclusion:
5. A good statistical report should include: