What is the chi-square (χ2) statistic?
– The chi-square statistic measures how much observed categorical data deviate from what a model (or null hypothesis) expects. It is commonly used to test whether differences between observed and expected counts can be explained by random variation, or whether they are unlikely under the assumed model.
Core formula
– χ2 = sum over categories of (Observed − Expected)^2 / Expected
– Written: χ2 = Σ (Oi − Ei)^2 / Ei
– Oi = observed count in category i
– Ei = expected count in category i
– Degrees of freedom (df) are used to interpret χ2 via the chi-square distribution:
– Goodness-of-fit test (no parameters estimated): df = k − 1 (k = number of categories)
– Test of independence for a contingency table: df = (r − 1) × (c − 1) (r = rows, c = columns)
– If m parameters of the expected distribution are estimated from the data, subtract them as well: for a goodness-of-fit test, df = k − 1 − m.
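The formula and the degrees-of-freedom rules above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation; the function names (`chi_square_statistic`, `goodness_of_fit_df`, `independence_df`) are made up for this example.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over categories of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def goodness_of_fit_df(num_categories, num_estimated_params=0):
    """df = k - 1, minus one for each parameter estimated from the data."""
    df = num_categories - 1 - num_estimated_params
    if df < 1:
        raise ValueError("not enough categories for the number of estimated parameters")
    return df


def independence_df(num_rows, num_cols):
    """df = (r - 1) * (c - 1) for an r-by-c contingency table."""
    return (num_rows - 1) * (num_cols - 1)


# Two categories with observed counts 60 and 40 against expected 50 and 50.
print(chi_square_statistic([60, 40], [50, 50]))  # 4.0
print(goodness_of_fit_df(2))                     # 1
print(independence_df(2, 3))                     # 2
```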
Two common uses
1. Goodness-of-fit test
– Asks whether observed counts follow a specified distribution for a single categorical variable (e.g., are observed age-group counts consistent with the company’s target-market proportions?).
2. Test of independence (contingency table)
– Tests whether two categorical variables are statistically independent (e.g., does course choice depend on student gender?). Expected counts for each cell are computed as:
– Eij = (row i total × column j total) / grand total, where Eij is the expected count for the cell in row i and column j
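As a sketch of the expected-count rule for a contingency table, the snippet below computes every cell's expected count from the row totals, column totals, and grand total. The 2 × 3 table of counts is hypothetical and exists only to show the arithmetic; `expected_counts` is an illustrative name.

```python
def expected_counts(table):
    """Expected count per cell under independence:
    (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical observed counts: 2 rows (groups) by 3 columns (choices).
observed = [
    [20, 30, 50],
    [30, 30, 40],
]
expected = expected_counts(observed)
for row in expected:
    print(row)
```

Both rows here have the same total (100), so they share the same expected counts; the expected table always preserves the observed row and column totals.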
Step-by-step procedure (how to perform a chi-square test)
1. Define hypotheses:
– H0 (null): Observed counts follow the expected model (or variables are independent).
– H1 (alternative): Observed counts do not follow the model (or variables are associated).
2. Collect data and confirm that they meet the test's assumptions (see the checklist below).
3. Compute expected counts (Ei) for each category or cell.
4. Calculate χ2 = Σ (Oi − Ei)^2 / Ei.
5. Determine degrees of freedom.
6. Compare the χ2 statistic to the chi-square distribution with the computed df to obtain a p-value:
– Use a chi-square table or software (R, Python, Excel, statistical calculators).
7. Interpret:
– If p ≤ chosen significance level (e.g., 0.05), reject H0; otherwise do not reject H0.
8. Report χ2, df, p-value, and practical interpretation.
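The procedure above can be sketched end to end using only the Python standard library. The p-value helper `chi2_sf` uses the closed-form finite series for the chi-square upper tail at integer degrees of freedom; in practice a statistics package would normally do this step. All function names and the die-roll counts are hypothetical and serve only as an illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(half)) + total


def goodness_of_fit_test(observed, expected_proportions, alpha=0.05):
    """Steps 3 to 7: expected counts, statistic, df, p-value, decision."""
    n = sum(observed)
    expected = [n * p for p in expected_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # no parameters estimated from the data
    p_value = chi2_sf(statistic, df)
    return {
        "chi2": statistic,
        "df": df,
        "p_value": p_value,
        "reject_h0": p_value <= alpha,
    }


# Hypothetical data: 60 rolls of a die, H0 = all six faces equally likely.
rolls = [8, 9, 12, 11, 6, 14]
result = goodness_of_fit_test(rolls, [1 / 6] * 6)
print(result)
```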
Worked numeric example — coin toss (goodness-of-fit)
– Scenario: You toss a coin 100 times. Observed: Heads = 60, Tails = 40. Test whether the coin is fair (expected 50/50).
1. Expected counts: E_heads = 100 × 0.5 = 50; E_tails = 50.
2. Compute χ2:
– For heads: (60 − 50)^2 / 50 = 100 / 50 = 2
– For tails: (40 − 50)^2 / 50 = 100 / 50 = 2
– Total χ2 = 2 + 2 = 4
3. Degrees of freedom: df = k − 1 = 2 − 1 = 1
4. p-value: For χ2 = 4 with df = 1, p ≈ 0.0455
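5. Interpretation: At a 5% significance level, p < 0.05, so you would reject the null hypothesis that the coin is fair. (Note: the result is borderline. Because p is only just below 0.05, the conclusion is sensitive to the chosen significance level, to whether the test's assumptions hold, and to any correction for multiple testing.)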
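The coin-toss arithmetic can be checked with a short script. It relies on the fact that, for df = 1 only, the chi-square upper-tail probability equals erfc(√(χ2/2)), so the standard library is enough; the variable names are illustrative.

```python
import math

# Observed counts from the scenario: 100 tosses, 60 heads and 40 tails.
observed = {"heads": 60, "tails": 40}
n = sum(observed.values())

# A fair coin expects half of the tosses on each side.
expected = {side: n * 0.5 for side in observed}

chi2 = sum((observed[s] - expected[s]) ** 2 / expected[s] for s in observed)
df = len(observed) - 1

# Valid for df = 1 only: P(X >= chi2) = erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.1f}, df = {df}, p = {p_value:.4f}")
```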
Checklist — when a chi-square test is appropriate
– Data are categorical (nominal or ordinal; note that the basic χ2 test ignores the ordering of ordinal categories).
– Observations are independent (no repeated measures on the same unit).
– Categories are mutually exclusive (each observation fits one category).
– Expected counts are sufficiently large:
– Common rule: expected count ≥ 5 for each cell; if not, use exact tests (e.g., Fisher’s exact test) or combine categories.
– Data come from a (preferably) random sample.
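The expected-count item in this checklist is easy to automate. The sketch below flags cells whose expected count falls under the common threshold of 5 and shows one possible remedy, merging adjacent categories. The helper name `small_expected_cells` and the counts are hypothetical.

```python
def small_expected_cells(expected, minimum=5.0):
    """Return (index, expected count) for every cell below the threshold."""
    return [(i, e) for i, e in enumerate(expected) if e < minimum]


# Hypothetical expected counts for three categories.
expected = [20.0, 16.0, 4.0]
flagged = small_expected_cells(expected)
print(flagged)  # the third category is below 5

# One remedy: merge the small category into its neighbour, then re-check.
# The observed counts must be merged in the same way before testing.
merged = [expected[0], expected[1] + expected[2]]
print(small_expected_cells(merged))  # no cells flagged after merging
```

Merging reduces the number of categories, so the degrees of freedom must be recomputed from the merged table.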
Limitations and caveats
– Not for continuous variables unless they are binned into categories (binning discards information, and the result can depend on where the bin boundaries are placed).
– Sensitive to sample size: very large samples can give significant χ2 for trivial differences; very small samples may lack power.
– Expected-cell rule: small expected counts make the χ2 approximation inaccurate.
– A significant χ2 does not indicate the strength or direction of an effect, only that the observed differences are unlikely under H0. For association strength, use measures such as Cramér’s V or odds ratios.
– If parameters of the expected distribution were estimated from the same data, adjust degrees of freedom accordingly.
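The sample-size caveat and the role of an effect-size measure can be illustrated together. The sketch below uses the standard definition of Cramér's V, V = √(χ2 / (n × min(r − 1, c − 1))), on a hypothetical 2 × 2 table and on the same table with every count multiplied by 10. The function names and counts are made up for illustration.

```python
import math


def chi_square_independence(table):
    """Chi-square statistic for an r-by-c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


def cramers_v(table):
    """Cramer's V: sqrt(chi2 / (n * min(r - 1, c - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi_square_independence(table) / (n * k))


# Hypothetical table, and the same proportions with 10 times the sample size.
small = [[30, 20], [20, 30]]
large = [[10 * cell for cell in row] for row in small]

for name, table in (("n = 100", small), ("n = 1000", large)):
    print(name, chi_square_independence(table), round(cramers_v(table), 3))
```

The statistic grows tenfold with the sample size while Cramér's V stays the same, which is why an effect-size measure is worth reporting next to the p-value.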
Who uses chi-square analysis?
– Researchers in social sciences, biology, marketing, quality control, epidemiology, and many applied fields — anywhere categorical data and hypotheses about frequencies are analyzed.
When to use nominal vs. ordinal
– The standard χ2 test treats categories as nominal (no order). If categories are truly ordinal and you want to use ordering information, consider alternative tests (e.g., Cochran–Armitage trend test).
Quick interpretation checklist after running a test
– Are assumptions met?