AC9M10ST04 · YEAR 10 · STATISTICS

Two-Way Tables

ACARA v9 CONTENT DESCRIPTION: construct two-way tables and discuss possible relationship between categorical variables
Builds on: Scatterplots and Correlation. Scatterplots handle two numerical variables; this unit handles two categorical ones. The two-way table organises counts across categories, and comparing proportions within it reveals whether the two variables are associated. This is the categorical counterpart of correlation.

Organising two categories at once

Not all data is numerical. Often the variables are categories, such as whether a student owns a pet (yes or no) and whether they live in a house or a flat. To study two such categorical variables together, the right tool is a two-way table, which cross-classifies the data by both at once. One variable labels the rows and the other the columns, and each inner cell holds the count of people falling into that particular combination. In a survey of 100 students, for instance, the table might show 40 who own a pet and live in a house, 10 who own a pet and live in a flat, 20 who have no pet and live in a house, and 30 who have no pet and live in a flat. A two-way table turns a jumble of paired categories into an organised set of counts that can be read at a glance and is ready to analyse.
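
As a rough illustration of cross-classifying, the sketch below tallies a list of paired answers into the four cells. The list `responses` is hypothetical raw data built to match the survey of 100 students; the two no-pet cells (20 and 30) are chosen to agree with the totals this lesson uses. The names `responses`, `table`, `rows` and `cols` are assumptions made for the example.

```python
from collections import Counter

# Hypothetical raw survey data: one (pet, home) pair per student.
# 40 and 10 come from the lesson; 20 and 30 are the remaining cells
# implied by a sample of 100 with 60 house and 40 flat dwellers.
responses = (
    [("pet", "house")] * 40
    + [("pet", "flat")] * 10
    + [("no pet", "house")] * 20
    + [("no pet", "flat")] * 30
)

# Cross-classify: count how many students fall in each combination.
table = Counter(responses)

rows = ["pet", "no pet"]   # pet ownership labels the rows
cols = ["house", "flat"]   # home type labels the columns

# Print the inner cells as a simple grid.
print(f"{'':<8}" + "".join(f"{c:>8}" for c in cols))
for r in rows:
    print(f"{r:<8}" + "".join(f"{table[(r, c)]:>8}" for c in cols))
```

Each student contributes to exactly one cell, which is why the four counts together account for all 100 responses.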

A two-way table
A two-way table cross-classifies data by two categorical variables, counting each combination.
Here the same 100 students are classified by pet ownership (rows) and home type (columns). The four inner cells hold the counts for each combination: 40 and 10 in the pet row, 20 and 30 in the no-pet row.

The margins: row and column totals

Around the edge of a two-way table sit the totals, known as the marginal totals or simply the margins. Each row total counts everyone in that row regardless of column, so 50 students own a pet and 50 do not. Each column total counts everyone in that column, so 60 live in houses and 40 in flats. In the corner sits the grand total, the whole sample, here 100. A useful check is that the row totals and the column totals each add up to the same grand total, since both count everyone, just grouped differently. The margins describe each variable on its own, ignoring the other, and they are the starting point for the proportions that follow.
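
The margins can be computed directly from the four cell counts. The following is a minimal sketch, assuming the counts are stored in a nested dictionary named `counts` (a layout chosen for this example, not one the lesson prescribes).

```python
# Cell counts from the lesson's example, stored as counts[row][column].
counts = {
    "pet":    {"house": 40, "flat": 10},
    "no pet": {"house": 20, "flat": 30},
}
columns = ["house", "flat"]

# Row totals: add across each row, ignoring the column.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}

# Column totals: add down each column, ignoring the row.
col_totals = {col: sum(counts[row][col] for row in counts) for col in columns}

# The check described above: both sets of margins give the same grand total.
grand_total = sum(row_totals.values())
assert grand_total == sum(col_totals.values())

print(row_totals)    # {'pet': 50, 'no pet': 50}
print(col_totals)    # {'house': 60, 'flat': 40}
print(grand_total)   # 100
```

If the two sums ever disagree, a cell has been miscounted or mis-entered, so the check is worth running before any proportions are calculated.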

Row and column totals: the margins
Row and column totals, called marginal totals, count everyone in each category and sum to the grand total.
Adding across each row and down each column gives the totals at the margins: 50 and 50 for the rows, 60 and 40 for the columns, and 100 in the corner.

Joint proportions

Raw counts are hard to compare, so the next step is to turn them into proportions. The first kind is the joint proportion: a single cell divided by the grand total. It answers how common one particular combination is among everyone. With 40 of the 100 students both owning a pet and living in a house, the joint proportion of that combination is 40 out of 100, or 40 percent. Joint proportions for every cell together add up to 100 percent, because every person sits in exactly one combination. They give a sense of which combinations are common and which are rare across the whole group, but to compare groups fairly a different kind of proportion is needed.
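
A short sketch of the joint proportions for all four cells, assuming the same example counts keyed by (pet, home) pairs. The dictionary names `counts` and `joint` are illustrative only.

```python
import math

# Cell counts from the lesson's example, keyed by (pet, home) combination.
counts = {
    ("pet", "house"): 40,
    ("pet", "flat"): 10,
    ("no pet", "house"): 20,
    ("no pet", "flat"): 30,
}

grand_total = sum(counts.values())

# Joint proportion: each cell divided by the grand total.
joint = {cell: n / grand_total for cell, n in counts.items()}

for cell, p in joint.items():
    print(cell, f"{p:.0%}")

# Every student sits in exactly one cell, so the proportions sum to 1
# (compared with isclose to allow for floating-point rounding).
assert math.isclose(sum(joint.values()), 1.0)
```

The denominator is the same for every cell, which is what distinguishes a joint proportion from the within-group rates.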

Joint proportion: a cell over the whole
A joint proportion divides one cell by the grand total, the share of everyone in that combination.
Of all 100 students, 40 fall in the pet-and-house cell, so the joint proportion for that cell is 40 out of 100, or 40 percent of everyone.

Conditional proportions

The most revealing figure from a two-way table is the conditional proportion, which is taken within a single group rather than across everyone. To find the proportion of house dwellers who own a pet, divide the pet-and-house cell by the house total, not by the grand total: 40 out of 60, which is about 67 percent. The denominator is the row or column total for the group you are conditioning on. Conditional proportions answer questions such as the rate of pet ownership among house dwellers, as opposed to among everyone. Getting the denominator right, the group total rather than the grand total, is the single most important skill in reading these tables.
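
The difference between the two denominators can be made concrete with a small sketch. The helper `column_conditional` is a hypothetical function written for this example; it conditions on a column (home type) because that is the group used in the text.

```python
# Cell counts from the lesson's example, stored as counts[row][column].
counts = {
    "pet":    {"house": 40, "flat": 10},
    "no pet": {"house": 20, "flat": 30},
}


def column_conditional(counts, row, col):
    """Share of the `col` group that falls in `row`: cell / column total."""
    col_total = sum(counts[r][col] for r in counts)
    return counts[row][col] / col_total


grand_total = sum(sum(cells.values()) for cells in counts.values())

conditional = column_conditional(counts, "pet", "house")  # 40 / 60
joint = counts["pet"]["house"] / grand_total              # 40 / 100

print(f"Pet owners among house dwellers: {conditional:.0%}")  # 67%
print(f"Pet-and-house among everyone:    {joint:.0%}")        # 40%
```

The same cell count of 40 gives two different percentages, depending only on whether it is divided by the group total of 60 or the grand total of 100.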

Conditional proportion: within a group
A conditional proportion divides a cell by its row or column total, the rate within that group.
A conditional proportion is taken within one group. Among the 60 house dwellers, 40 own a pet, so about 67 percent of house dwellers own a pet. The denominator is the row or column total, not the grand total. For comparison, among the 40 flat dwellers, 10 own a pet, so 25 percent of flat dwellers own a pet.

Detecting association

The point of all this is to judge whether the two categorical variables are associated, that is, whether knowing one tells you something about the other. The test is to compare the conditional proportions across the groups. Among house dwellers, about 67 percent own a pet; among flat dwellers, only 25 percent do. Because these rates differ substantially, pet ownership and home type are associated in this sample: home type is informative about the chance of owning a pet. Had the two conditional rates been roughly equal, the table would show little or no association, with the variables behaving as if independent. This comparison of conditional proportions is the categorical counterpart of correlation. As with correlation, an association found this way does not by itself prove that one variable causes the other.
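
The comparison described above can be sketched as follows, using the example counts split by home type. The function `pet_rate` and the variable names are assumptions for illustration. The lesson gives no fixed cut-off for a "substantial" difference, so the code reports the gap and leaves the judgement to the reader.

```python
# Counts from the lesson's example, grouped by home type.
house = {"pet": 40, "no pet": 20}
flat = {"pet": 10, "no pet": 30}


def pet_rate(group):
    """Conditional proportion of pet owners within one home-type group."""
    return group["pet"] / sum(group.values())


house_rate = pet_rate(house)   # 40 / 60
flat_rate = pet_rate(flat)     # 10 / 40

# Gap between the two conditional rates, in percentage points.
gap = (house_rate - flat_rate) * 100

print(f"House: {house_rate:.0%}, flat: {flat_rate:.0%}, gap: {gap:.0f} points")
# prints "House: 67%, flat: 25%, gap: 42 points"
```

A gap near zero would mean home type says little about pet ownership in the sample; a gap of about 42 percentage points points to an association, though not to a cause.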

Reading association from the table
Two categorical variables are associated when conditional proportions differ markedly across groups.
The way to detect a relationship in a two-way table is to compare the conditional rates. Here, about 67 percent of house dwellers own a pet, against 25 percent of flat dwellers.

Quick self-check

1. A two-way table is used to:
2. In a two-way table, the row and column totals are called the:
3. Of 100 students, 40 both own a pet and live in a house. The joint proportion for that cell is:
4. Among 60 house dwellers, 40 own a pet. The conditional proportion of pet owners among house dwellers is:
5. House dwellers own pets at 67% but flat dwellers at 25%. This tells you the two variables are: