Test hypotheses about frequency distributions

There are two types of Pearson’s chi-square tests, but they both test whether the observed frequency distribution of a categorical variable is significantly different from its expected frequency distribution. A frequency distribution describes how observations are distributed between different groups.

Frequency distributions are often displayed using frequency distribution tables. A frequency distribution table shows the number of observations in each group. When there are two categorical variables, you can use a specific type of frequency distribution table called a contingency table to show the number of observations in each combination of groups.

Example: Bird species at a bird feeder
Frequency of visits by bird species at a bird feeder during a 24-hour period

| Bird species | Frequency |
|---|---|
| House sparrow | 15 |
| House finch | 12 |
| Black-capped chickadee | 9 |
| Common grackle | 8 |
| European starling | 8 |
| Mourning dove | 6 |

A chi-square test (a chi-square goodness of fit test) can test whether these observed frequencies are significantly different from what was expected, such as equal frequencies.

Example: Handedness and nationality
Contingency table of the handedness of a sample of Americans and Canadians

| | Right-handed | Left-handed |
|---|---|---|
| American | 236 | 19 |
| Canadian | 157 | 16 |

A chi-square test (a test of independence) can test whether these observed frequencies are significantly different from the frequencies expected if handedness is unrelated to nationality.

The chi-square formula

Both of Pearson’s chi-square tests use the same formula to calculate the test statistic, chi-square (Χ²):

  \begin{equation*} X^2=\sum{\frac{(O-E)^2}{E}} \end{equation*}

Where:

  • Χ² is the chi-square test statistic
  • Σ is the summation operator (it means “take the sum of”)
  • O is the observed frequency
  • E is the expected frequency

The larger the difference between the observations and the expectations (O − E in the equation), the larger the chi-square value will be. To decide whether the difference is big enough to be statistically significant, you compare the chi-square value to a critical value.
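As a minimal sketch of the formula, the Python snippet below computes the statistic for the bird feeder counts, assuming equal expected frequencies. The function name `chi_square_statistic` is hypothetical. The degrees of freedom (number of groups minus one) and the 0.05 significance level are assumptions not stated above, and the critical value is taken from a standard chi-square table.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all groups."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed visits per species, from the bird feeder example.
observed = [15, 12, 9, 8, 8, 6]

# Assumption: every species is expected to visit equally often.
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed, expected)

# Assumption: df = groups - 1 = 5 and alpha = 0.05,
# which gives a tabulated critical value of 11.070.
CRITICAL_VALUE = 11.070

print(round(chi2, 3))         # 5.517
print(chi2 > CRITICAL_VALUE)  # False
```

Under these assumptions the statistic falls below the critical value, so the observed frequencies would not be judged significantly different from equal frequencies.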

When to use a chi-square test

A Pearson’s chi-square test may be an appropriate option for your data if all of the following are true:

  1. You want to test a hypothesis about one or more categorical variables. If one or more of your variables is quantitative, you should use a different statistical test. Alternatively, you could convert the quantitative variable into a categorical variable by separating the observations into intervals.
  2. The sample was randomly selected from the population.
  3. There are a minimum of five observations expected in each group or combination of groups.
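The third condition can be checked before running a test. The sketch below does this for the handedness contingency table. It assumes the usual expected count under independence (row total × column total ÷ grand total), which is not spelled out above. The helper name `expected_counts` is hypothetical.

```python
def expected_counts(table):
    """Expected cell counts if the row and column variables are unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: American, Canadian. Columns: right-handed, left-handed.
handedness = [[236, 19], [157, 16]]

expected = expected_counts(handedness)
smallest = min(min(row) for row in expected)

# Condition 3: at least five expected observations in every cell.
meets_minimum = smallest >= 5

print(round(smallest, 3))  # 14.147
print(meets_minimum)       # True
```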

Types of chi-square tests

The two types of Pearson’s chi-square tests are:

  • Chi-square goodness of fit test
  • Chi-square test of independence

Mathematically, these are actually the same test. However, we often think of them as different tests because they’re used for different purposes.

Chi-square goodness of fit test

You can use a chi-square goodness of fit test when you have one categorical variable. It allows you to test whether the frequency distribution of the categorical variable is significantly different from your expectations. Often, but not always, the expectation is that the categories will have equal proportions.

Example: Hypotheses for chi-square goodness of fit test
Expectation of equal proportions 

  • Null hypothesis (H0): The bird species visit the bird feeder in equal proportions.
  • Alternative hypothesis (HA): The bird species visit the bird feeder in different proportions.

Expectation of different proportions

  • Null hypothesis (H0): The bird species visit the bird feeder in the same proportions as the average over the past five years.
  • Alternative hypothesis (HA): The bird species visit the bird feeder in different proportions from the average over the past five years.

Chi-square test of independence

You can use a chi-square test of independence when you have two categorical variables. It allows you to test whether the two variables are related to each other. If two variables are independent (unrelated), the probability of belonging to a certain group of one variable isn’t affected by the other variable.

Example: Chi-square test of independence
  • Null hypothesis (H0): The proportion of people who are left-handed is the same for Americans and Canadians.
  • Alternative hypothesis (HA): The proportion of people who are left-handed differs between nationalities.
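As a worked illustration, the handedness contingency table (American: 236 right-handed, 19 left-handed; Canadian: 157 right-handed, 16 left-handed) can be run through the chi-square formula. This sketch assumes the usual expected frequency under independence, which the text above does not state explicitly. Values are rounded to two decimals.

```latex
\begin{equation*} E = \frac{\text{row total} \times \text{column total}}{N} \end{equation*}

\begin{align*}
E_{\text{American, right}} &= \frac{255 \times 393}{428} \approx 234.15 &
E_{\text{American, left}} &= \frac{255 \times 35}{428} \approx 20.85 \\
E_{\text{Canadian, right}} &= \frac{173 \times 393}{428} \approx 158.85 &
E_{\text{Canadian, left}} &= \frac{173 \times 35}{428} \approx 14.15
\end{align*}

\begin{equation*}
X^2 = \frac{(236-234.15)^2}{234.15} + \frac{(19-20.85)^2}{20.85} + \frac{(157-158.85)^2}{158.85} + \frac{(16-14.15)^2}{14.15} \approx 0.44
\end{equation*}
```

A 2 × 2 table has (2 − 1)(2 − 1) = 1 degree of freedom. Assuming a significance level of 0.05, the tabulated critical value is about 3.84, so a statistic of roughly 0.44 would not lead to rejecting the null hypothesis.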

Other types of chi-square tests

Some consider the chi-square test of homogeneity to be another variety of Pearson’s chi-square test. It tests whether two populations come from the same distribution by determining whether the two populations have the same proportions as each other. You can consider it simply a different way of thinking about the chi-square test of independence.

McNemar’s test is a test that uses the chi-square test statistic. It isn’t a variety of Pearson’s chi-square test, but it’s closely related. You can conduct this test when you have a related pair of categorical variables that each have two groups. It allows you to determine whether the proportions of the variables are equal.

Example: McNemar’s test
Suppose that a sample of 100 people is offered two flavors of ice cream and asked whether they like the taste of each. 

Contingency table of ice cream flavor preference

| | Like chocolate | Dislike chocolate |
|---|---|---|
| Like vanilla | 47 | 32 |
| Dislike vanilla | 8 | 13 |

  • Null hypothesis (H0): The proportion of people who like chocolate is the same as the proportion of people who like vanilla.
  • Alternative hypothesis (HA): The proportion of people who like chocolate is different from the proportion of people who like vanilla.
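The text does not give the formula for McNemar’s test. A minimal sketch, assuming the standard form without continuity correction, (b − c)² / (b + c), where b and c are the two cells in which a person likes exactly one flavor, might look like this in Python. The 0.05 significance level is also an assumption.

```python
# Ice cream example.
# Rows: like vanilla, dislike vanilla.
# Columns: like chocolate, dislike chocolate.
table = [[47, 32], [8, 13]]

# Only the discordant cells (people who like exactly one flavor) enter the statistic.
b = table[0][1]  # like vanilla, dislike chocolate
c = table[1][0]  # dislike vanilla, like chocolate

# Assumption: standard McNemar statistic without continuity correction.
mcnemar_chi2 = (b - c) ** 2 / (b + c)

# Assumption: df = 1 and alpha = 0.05, giving a tabulated critical value of 3.841.
CRITICAL_VALUE = 3.841

print(mcnemar_chi2)                   # 14.4
print(mcnemar_chi2 > CRITICAL_VALUE)  # True
```

Under these assumptions the statistic exceeds the critical value, which would point toward a difference between the proportion of people who like chocolate and the proportion who like vanilla.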

There are several other types of chi-square tests that are not Pearson’s chi-square tests, including the test of a single variance and the likelihood ratio chi-square test.
