AC9M10P02 · YEAR 10 · PROBABILITY

Estimating Conditional Probability by Simulation

ACARA v9 CONTENT DESCRIPTION: design and conduct repeated chance experiments and simulations using digital tools to model conditional probability and interpret results
Builds on: Conditional Probability: Narrowing the World with Given (AC9M10P01). That unit worked out conditional probabilities exactly, by narrowing to a group and reading the fraction. Here we estimate the same kind of probability a different way, by running the experiment many times and counting, which is how real questions get answered when the exact reasoning is too tangled to do by hand.

When you cannot calculate, simulate

Some probabilities are easy to reason out, but many are not. Once a situation has several stages, depends on conditions, or mixes uneven chances, working out the exact answer by hand becomes fiddly and error-prone. Simulation offers a way around this. Instead of calculating, you set up a model of the chance experiment, run it a great many times using a digital tool, and count what happens. The proportion of trials that turn out a certain way becomes an estimate of the probability. This is exactly how engineers, scientists, and analysts handle questions that are too tangled for a neat formula: they let the computer play out the situation thousands of times and read the answer off the results.

A stream of simulated trials
A simulation repeats a chance experiment many times so the outcomes can be counted instead of calculated.
Each dot is one simulated morning. A navy ring marks the rainy trials, and a filled dot marks the late ones. To estimate the chance of being late given rain, we will look only inside the ringed dots.
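
To make the stream of trials concrete, here is a minimal sketch of how such mornings could be generated with a digital tool. It assumes Python and its standard random module. The three chances (P_RAIN, P_LATE_IF_RAIN, P_LATE_IF_DRY) and the function names are hypothetical choices for illustration, not values given in this unit.

```python
import random

# Hypothetical chances chosen only for illustration; the lesson does not fix them.
P_RAIN = 0.4          # assumed chance that a morning is rainy
P_LATE_IF_RAIN = 0.5  # assumed chance the bus is late on a rainy morning
P_LATE_IF_DRY = 0.1   # assumed chance the bus is late on a dry morning


def simulate_morning(rng):
    """Simulate one morning and return the pair (rainy, late)."""
    rainy = rng.random() < P_RAIN
    p_late = P_LATE_IF_RAIN if rainy else P_LATE_IF_DRY
    late = rng.random() < p_late
    return rainy, late


def simulate_mornings(n_trials, seed=None):
    """Return a list of n_trials simulated mornings."""
    rng = random.Random(seed)
    return [simulate_morning(rng) for _ in range(n_trials)]


def draw(trial):
    """Text version of one dot: brackets mean rainy, '*' means late."""
    rainy, late = trial
    mark = "*" if late else "o"
    return f"({mark})" if rainy else f" {mark} "


if __name__ == "__main__":
    trials = simulate_mornings(40, seed=1)
    print("".join(draw(t) for t in trials))
```

Each printed symbol stands for one morning, so the bracketed symbols play the role of the ringed dots. Passing a seed makes a run repeatable, which helps when checking work; leaving it out gives a fresh sample each time.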

Estimating a conditional means counting inside the condition

The conditional twist lies in which trials you count. Suppose you want the chance the bus is late given that it is raining. You simulate many mornings, each with its own weather and bus outcome. To estimate the conditional probability, you do not look at every morning. You keep only the rainy ones, and among those you find the fraction that were late. The denominator is the number of rainy trials, not the total number of trials, because the condition "given that it is raining" tells you to set the dry mornings aside entirely. They simply are not part of the question. Getting this denominator right is the single most important step, and it is where most mistakes happen.

Count inside the condition only
A conditional estimate divides by the number of trials where the condition held, ignoring the rest.
Of 40 trials, 19 were rainy and 11 of those were late, giving an estimate of 11/19 ≈ 0.58. The denominator is the rainy count, not the total: trials where it was dry tell us nothing about the chance given rain.
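
The counting rule can be sketched in a few lines of Python. This is one possible way to write it, not a prescribed method: the function names and the default chances are assumptions made for the example, and each trial is stored as a (rainy, late) pair.

```python
import random


def simulate_mornings(n_trials, p_rain=0.4, p_late_if_rain=0.5,
                      p_late_if_dry=0.1, seed=None):
    """Return a list of (rainy, late) pairs. All chances are assumed values."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        rainy = rng.random() < p_rain
        late = rng.random() < (p_late_if_rain if rainy else p_late_if_dry)
        trials.append((rainy, late))
    return trials


def estimate_late_given_rain(trials):
    """Estimate P(late | rain): count only inside the rainy trials."""
    rainy_outcomes = [late for rainy, late in trials if rainy]
    if not rainy_outcomes:
        return None  # no rainy trials, so there is nothing to divide by
    late_and_rainy = sum(rainy_outcomes)  # each True counts as 1
    return late_and_rainy / len(rainy_outcomes)


if __name__ == "__main__":
    # The counts from the figure: 40 trials, 19 rainy, 11 of those late.
    # How many dry mornings were late is not stated; 2 is a made-up filler
    # that cannot affect the conditional estimate.
    figure_trials = ([(True, True)] * 11 + [(True, False)] * 8
                     + [(False, True)] * 2 + [(False, False)] * 19)
    print(round(estimate_late_given_rain(figure_trials), 2))
    print(estimate_late_given_rain(simulate_mornings(2000, seed=1)))
```

The dry mornings are filtered out before anything is counted, so they never reach the denominator. Returning None when no rainy trial occurred is a design choice for this sketch: with nothing inside the condition, there is no estimate to report.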

More trials, a steadier answer

A fair worry is whether a simulated estimate can be trusted, since a handful of trials might fall unluckily. The reassuring fact is that estimates improve as trials accumulate. With ten rainy mornings the estimate can swing high or low on chance alone, but with five hundred it settles down and hugs the true value closely. This steadying is not luck; it is the law of large numbers at work, the same principle that makes a coin land heads close to half the time over many tosses. So the practical advice is plain: run plenty of trials. A few dozen gives a rough feel; many hundreds gives an estimate you can rely on. And because each run is a fresh sample, running the whole simulation again will give a slightly different number, which is expected rather than a sign of error.

More trials, steadier estimate
As the number of trials grows, the simulated estimate converges toward the true probability.
When the running estimate is plotted after each rainy trial, the early values jump around because a handful of trials is noisy. As more rainy trials accumulate, the estimate settles toward the true value of 0.5. More trials means a more trustworthy estimate.
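
The running estimate in the figure can be reproduced with a short sketch like the one below. It assumes a model in which the true chance of late given rain is 0.5, matching the figure; the chance of rain (0.4), the chance of late when dry (0.1), and the function names are hypothetical.

```python
import random


def simulate_mornings(n_trials, p_rain=0.4, p_late_if_rain=0.5,
                      p_late_if_dry=0.1, seed=None):
    """Return a list of (rainy, late) pairs. All chances are assumed values."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        rainy = rng.random() < p_rain
        late = rng.random() < (p_late_if_rain if rainy else p_late_if_dry)
        trials.append((rainy, late))
    return trials


def running_estimates(trials):
    """Return the estimate of P(late | rain) after each rainy trial."""
    rainy_count = 0
    late_and_rainy = 0
    estimates = []
    for rainy, late in trials:
        if not rainy:
            continue  # dry mornings are outside the condition
        rainy_count += 1
        if late:
            late_and_rainy += 1
        estimates.append(late_and_rainy / rainy_count)
    return estimates


if __name__ == "__main__":
    estimates = running_estimates(simulate_mornings(5000))
    for n in (10, 100, 1000):
        if n <= len(estimates):
            print(f"after {n} rainy trials: {estimates[n - 1]:.3f}")
```

Running this several times should show the early figures varying noticeably between runs while the later ones stay near 0.5. The exact numbers differ from run to run because no seed is set.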

The wrong denominator answers the wrong question

It is worth seeing clearly what goes wrong if the denominator slips. Take the late-and-rainy mornings and divide instead by every morning, rainy or dry, and you are no longer estimating the chance of late given rainy. You are estimating the chance of late and rainy happening together, which is a smaller and quite different quantity. The number on top has not changed, but the meaning has, because the group you are measuring against has changed. This mirrors exactly the difference between a conditional and a joint probability from the previous unit. Designing a good simulation, then, is really about being clear which trials count: set up the model, run it many times, keep the trials the condition allows, and read the proportion among those. Interpret the result as an estimate, report roughly how many trials it rests on, and remember that more trials would sharpen it.

The denominator is the whole game
Dividing by the total trials rather than the conditioning trials estimates a joint probability, not a conditional one.
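Dividing by the 19 rainy trials gives about 0.58, the estimate of late given rainy. Dividing the same 11 late-and-rainy trials by all 40 trials gives 0.275, which estimates late and rainy happening together. That is the common mistake.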
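
A small sketch makes the two denominators easy to compare side by side. The function name and the (rainy, late) pair representation are assumptions for this example, and the number of late dry mornings is invented because the figure does not state it.

```python
def conditional_and_joint(trials):
    """Return (estimate of late given rainy, estimate of late and rainy)."""
    total = len(trials)
    rainy_count = sum(1 for rainy, _ in trials if rainy)
    late_and_rainy = sum(1 for rainy, late in trials if rainy and late)
    # Same numerator in both cases; only the denominator differs.
    conditional = late_and_rainy / rainy_count if rainy_count else None
    joint = late_and_rainy / total if total else None
    return conditional, joint


if __name__ == "__main__":
    # Figure counts: 40 trials, 19 rainy, 11 of those late.
    # The 2 late dry mornings are a made-up filler value.
    figure_trials = ([(True, True)] * 11 + [(True, False)] * 8
                     + [(False, True)] * 2 + [(False, False)] * 19)
    conditional, joint = conditional_and_joint(figure_trials)
    print(f"late given rainy: {conditional:.3f}")
    print(f"late and rainy:   {joint:.3f}")
```

The joint estimate can never exceed the conditional one, since the total number of trials is at least as large as the number of rainy trials.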

Quick self-check
1. You simulate 200 mornings to estimate the chance a bus is late given it is raining. Which trials do you divide by?
2. After 10 simulated trials your estimate is 0.30; after 500 it is 0.49, close to the true value. Why trust the later estimate more?
3. In a simulation, 80 of 200 trials were rainy, and 36 of those rainy trials had a late bus. The estimate of the chance of late given rainy is closest to:
4. A simulation that uses a random spinner gives a slightly different estimate each time you run it. This is:
5. Why simulate a conditional probability instead of just reasoning it out?