What is the median?
The median is the middle value in an ordered list of numbers. It is the point that divides a dataset so that half the observations are less than or equal to it and half are greater than or equal to it. Because it depends only on order, not magnitude, the median is less sensitive to extreme values (outliers) than the mean (arithmetic average) and is often used to describe a “typical” value in skewed distributions (for example, household income).
Key properties (fast facts)
– The median is the 50th percentile (P50) of a dataset.
– For a symmetric, unimodal distribution (such as the normal distribution), mean = median = mode.
– The median is robust: one or a few extreme values move the mean a lot but change the median little or not at all.
– The median can be defined for unweighted and weighted data, for raw lists and for grouped (binned) frequency data.
When to use the median (practical rule of thumb)
– Use the median when your data are skewed or contain outliers (e.g., income, house prices).
– Use the mean when data are roughly symmetric and you need a measure that reflects total magnitude (e.g., for further algebraic calculations, variance, or inferential methods built on means).
(See Investopedia; NCI Dietary Assessment Primer for normal distribution context.)
How to calculate the median — step‑by‑step (raw, ungrouped data)
1. Put all observations in numerical order (ascending or descending).
2. Count the number of observations, N.
3. If N is odd: the median is the value at position (N + 1)/2 in the ordered list.
– Example: Data 3, 13, 2, 34, 11, 26, 47. Ordered: 2, 3, 11, 13, 26, 34, 47. N = 7 (odd). Position = (7+1)/2 = 4 ⇒ median = 13.
4. If N is even: the median is the arithmetic mean of the two central values at positions N/2 and N/2 + 1.
– Example: Data 3, 13, 2, 34, 11, 17, 27, 47. Ordered: 2, 3, 11, 13, 17, 27, 34, 47. N = 8 (even). Central values = positions 4 and 5 = 13 and 17 ⇒ median = (13 + 17) / 2 = 15.
5. Interpret: the median is the “middle” observation by rank, not the arithmetic average.
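The steps above can be sketched in Python as follows. This is a minimal illustration, not a replacement for a library routine; the function name median_of is made up for this example.

```python
def median_of(values):
    """Return the median of a non-empty collection of numbers."""
    ordered = sorted(values)      # step 1: put the observations in order
    n = len(ordered)              # step 2: count them
    if n == 0:
        raise ValueError("the median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # step 3: odd N, 1-based position (N + 1) / 2 is 0-based index N // 2
        return ordered[mid]
    # step 4: even N, average the values at 1-based positions N/2 and N/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median_of([3, 13, 2, 34, 11, 26, 47])
even_result = median_of([3, 13, 2, 34, 11, 17, 27, 47])
print(odd_result)   # 13
print(even_result)  # 15.0
```

Both calls reuse the worked examples above, so the results can be checked by hand.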
Median vs mean — an illustrative example
– Data: 0, 0, 0, 1, 1, 2, 10, 10
– Mean = (0+0+0+1+1+2+10+10) / 8 = 24 / 8 = 3.
– Median = average of 4th and 5th values = (1 + 1) / 2 = 1.
Because of the high values (10s), the mean is pulled upward; the median better reflects the center of most observations. (See Investopedia.)
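The same comparison can be reproduced with Python's standard statistics module. The second list is a hypothetical variation (one 10 replaced by 1000) added only to show how differently the two measures react to a single extreme value.

```python
from statistics import mean, median

data = [0, 0, 0, 1, 1, 2, 10, 10]
data_mean = mean(data)        # 24 / 8 = 3
data_median = median(data)    # average of the 4th and 5th ordered values = 1
print(data_mean, data_median)

# Hypothetical variation: replace one 10 with 1000.
with_outlier = [0, 0, 0, 1, 1, 2, 10, 1000]
outlier_mean = mean(with_outlier)      # 1014 / 8 = 126.75
outlier_median = median(with_outlier)  # still 1
print(outlier_mean, outlier_median)
```

The mean jumps from 3 to 126.75 while the median does not move, because the middle two ranked values are unchanged.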
Median in a normal distribution
– For a normal (bell‑shaped, symmetric) distribution, the mean, median, and mode coincide at the center. In skewed distributions they differ: the mean is pulled toward the tail, the median stays nearer the bulk of observations. (NCI Dietary Assessment Primer; Investopedia.)
Median for grouped (binned) data — practical formula
When data are provided as frequency by class (e.g., 0–9, 10–19, …), you can estimate the median by interpolation:
1. Compute cumulative frequencies and find the median class: the first class whose cumulative frequency is ≥ N/2.
2. Use:
Median ≈ L + [(N/2 − CFB) / f] × h
where:
– L = lower boundary of median class,
– CFB = cumulative frequency before the median class,
– f = frequency of median class,
– h = class width (interval length),
– N = total frequency.
3. This gives an estimated median for continuous intervals. (Standard statistical method; used in practice for binned survey or histogram data.)
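As a rough sketch, the interpolation formula can be coded as below. The function name grouped_median and the class counts are invented for illustration, and the sketch assumes equal-width classes with continuous boundaries (0–10, 10–20, and so on).

```python
def grouped_median(lower_bounds, frequencies, width):
    """Estimate the median of binned data by linear interpolation.

    lower_bounds: lower boundary L of each class, in ascending order
    frequencies:  frequency f of each class
    width:        common class width h (assumed equal for all classes)
    """
    n = sum(frequencies)                      # N = total frequency
    if n <= 0:
        raise ValueError("no observations")
    half = n / 2
    cumulative_before = 0                     # CFB
    for lower, f in zip(lower_bounds, frequencies):
        if f > 0 and cumulative_before + f >= half:
            # Median class found: L + ((N/2 - CFB) / f) * h
            return lower + (half - cumulative_before) * width / f
        cumulative_before += f
    raise ValueError("median class not found")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with made-up counts.
estimate = grouped_median([0, 10, 20, 30], [5, 8, 10, 7], width=10)
print(estimate)  # 22.0
```

Here N = 30, so N/2 = 15; the cumulative counts are 5, 13, 23, which makes 20–30 the median class with CFB = 13 and f = 10, giving 20 + (2 / 10) × 10 = 22.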
Weighted median (when observations have weights)
– The weighted median is the value at which the cumulative weight reaches 50% of total weight. It is useful when observations represent groups of unequal size (e.g., county incomes with differing populations).
– Practical steps:
1. Sort observations by value.
2. Accumulate their weights.
3. The weighted median is the first value where cumulative weight ≥ 50% of total weight.
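A minimal Python sketch of these three steps follows. The function name weighted_median and the county figures are hypothetical, and the sketch uses the "first value reaching 50%" rule exactly as stated above, without any special handling for ties at exactly 50%.

```python
def weighted_median(values, weights):
    """Return the first value whose cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))       # step 1: sort by value
    total = sum(weight for _, weight in pairs)
    if total <= 0:
        raise ValueError("total weight must be positive")
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight                   # step 2: accumulate weights
        if cumulative >= total / 2:            # step 3: reached 50% of total
            return value


# Hypothetical county median incomes weighted by population (in thousands).
incomes = [61000, 40000, 75000, 52000]
populations = [50, 100, 50, 300]
result = weighted_median(incomes, populations)
print(result)  # 52000
```

The total weight is 500, so the threshold is 250. After sorting, the cumulative weights are 100 (40000) and 400 (52000), so the weighted median is 52000, whereas the unweighted median of the four incomes would be 56500.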
How to compute medians quickly (tools)
– Excel / Google Sheets: =MEDIAN(range)
– Python: numpy.median(array) or statistics.median(list)
– R: median(x)
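Each of these handles both odd and even N. Missing-value handling differs by tool: MEDIAN in Excel and Google Sheets skips empty and text cells in a range; R's median(x) returns NA if x contains NA unless you pass na.rm = TRUE; numpy.median returns nan if the array contains NaN (numpy.nanmedian ignores NaN values); statistics.median does not filter missing values, so remove them first.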
Interpreting median in applied contexts (financial and real‑world examples)
– Median household income: preferred to mean income because a few very high incomes would inflate the mean and misrepresent the “typical” household.
– Real estate: median sale price is reported because a couple of ultra‑high priced sales would otherwise distort the average.
– Medical studies: median survival time is often used because survival times are skewed and censored.
Limitations of the median
– Does not use information about the size/magnitude of all values (so two datasets with very different totals can have the same median).
– Not algebraically convenient for many inferential statistics that rely on sums and variances (mean is required for many parametric tests).
– In small samples or data with many repeated values, the median can take only a few distinct values, so it may be less informative about differences between datasets.
– For grouped data the median is an estimate (interpolation assumes uniform distribution within the class).
Practical checklist for choosing and reporting the median
– Check distribution shape (histogram, boxplot): if skewed or outliers are present, report median (and IQR).
– If reporting median, also report sample size (N) and a measure of spread: interquartile range (IQR) or range.
– For weighted samples, compute and report weighted median.
– For binned data, explain that the median is estimated via interpolation (report class widths and cumulative counts).
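To illustrate reporting the median together with N and the IQR, here is a small sketch using Python's standard statistics module on the even-N example data from the calculation section. Quartile conventions differ between tools; the "inclusive" method is one assumption among several reasonable choices, so state which one you used.

```python
from statistics import median, quantiles

data = [3, 13, 2, 34, 11, 17, 27, 47]

# quantiles() sorts the data internally; n=4 returns the three quartile cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

report = f"N = {len(data)}, median = {q2}, IQR = {iqr} (Q1 = {q1}, Q3 = {q3})"
print(report)
```

With this method the quartiles are Q1 = 9.0, median = 15.0, and Q3 = 28.75, so the IQR is 19.75; another quartile convention would give slightly different Q1 and Q3 but the same median.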
Bottom line
The median is a simple, robust measure of central tendency that identifies the middle observation (or midpoint between two middle observations). It is especially useful for skewed data or when outliers are present. Use the median with measures of spread (IQR) and be explicit about how it was calculated (raw, grouped, or weighted).
Sources
– Investopedia, “Median,” Sydney Saporito.
– National Center for Biotechnology Information (NCBI), “Median.”
– Corporate Finance Institute (CFI), “Mean.”
– National Cancer Institute (NCI), Dietary Assessment Primer: “Learn More About Normal Distribution.”