What it is — short definition
– The correlation coefficient measures the strength and direction of a linear relationship between two variables.
– It ranges from −1 (perfect negative linear relationship) to +1 (perfect positive linear relationship). A value near 0 indicates little or no linear association.
Key terms (defined)
– Pearson correlation (Pearson’s r): the most common measure of linear correlation between two continuous variables.
– Covariance: a raw measure of how two variables move together (units depend on the variables).
– Standard deviation: a measure of dispersion around the mean for a single variable.
– R-squared (R²): the square of Pearson’s r in simple linear regression; interpreted as the proportion of variance in one variable explained by the other.
How the Pearson correlation is computed — formulas
1) Using covariance and standard deviations:
r = Cov(X, Y) / (σX · σY)
– Cov(X, Y) is the covariance between X and Y.
– σX and σY are the (population or sample) standard deviations for X and Y respectively.
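Note: use the same convention (population or sample) for the covariance and for both standard deviations. The n or n − 1 divisors then cancel, so r is identical either way.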
2) Equivalent formula using sums (useful for hand or spreadsheet calculation):
r = [ n·Σ(XY) − ΣX·ΣY ] / sqrt( [ n·Σ(X²) − (ΣX)² ] · [ n·Σ(Y²) − (ΣY)² ] )
– n = number of paired observations.
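As a minimal sketch of the first formula and the consistency note, the Python below computes r from the covariance and the two standard deviations, once with population divisors and once with sample divisors. The function name `pearson_from_cov`, the `ddof` parameter, and the four data points are illustrative assumptions, not part of the original text.

```python
import math

def pearson_from_cov(x, y, ddof=0):
    """Pearson r as Cov(X, Y) / (sd_X * sd_Y).

    ddof=0 uses population formulas (divide by n);
    ddof=1 uses sample formulas (divide by n - 1).
    The same ddof is applied to the covariance and both standard deviations.
    """
    n = len(x)
    if n != len(y) or n - ddof <= 0:
        raise ValueError("need equal-length inputs with more than ddof pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    cov = sum(a * b for a, b in zip(dx, dy)) / (n - ddof)
    sd_x = math.sqrt(sum(a * a for a in dx) / (n - ddof))
    sd_y = math.sqrt(sum(b * b for b in dy) / (n - ddof))
    # Note: a constant series has sd = 0, so r is undefined (division by zero).
    return cov / (sd_x * sd_y)

# Hypothetical illustrative data (not from the text).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
r_pop = pearson_from_cov(x, y, ddof=0)   # population convention
r_samp = pearson_from_cov(x, y, ddof=1)  # sample convention
print(round(r_pop, 4), round(r_samp, 4))  # 0.6 0.6
```

Both conventions give the same r because the divisor appears once in the numerator and, after the square roots are multiplied, once in the denominator.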
Worked numeric example (step‑by‑step)
Suppose we have five paired returns for Asset A (X) and Asset B (Y):
X = [2, 4, 6, 8, 10]
Y = [1, 3, 7, 9, 11]
Step 1 — compute sums:
– ΣX = 30 ; ΣY = 31
– Σ(XY) = 238
– Σ(X²) = 220 ; Σ(Y²) = 261
– n = 5
Step 2 — compute numerator:
– n·Σ(XY) − ΣX·ΣY = 5·238 − 30·31 = 1190 − 930 = 260
Step 3 — compute denominator parts:
– n·Σ(X²) − (ΣX)² = 5·220 − 30² = 1100 − 900 = 200
– n·Σ(Y²) − (ΣY)² = 5·261 − 31² = 1305 − 961 = 344
Step 4 — compute denominator (square root of the product)
– sqrt( [n·Σ(X²) − (ΣX)²] · [n·Σ(Y²) − (ΣY)²] ) = sqrt(200 · 344) = sqrt(68,800) ≈ 262.298
Step 5 — compute Pearson correlation coefficient r
– r = numerator / denominator = 260 / sqrt(68,800) ≈ 0.9912
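The five steps can be checked with a short Python sketch of the sums formula, applied to the same X and Y as above. The function name `pearson_from_sums` and its habit of returning the intermediate values are assumptions made for illustration.

```python
import math

def pearson_from_sums(x, y):
    """Pearson r via the raw-sums formula; also returns numerator and denominator."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator, numerator, denominator

# Data from the worked example.
x = [2, 4, 6, 8, 10]
y = [1, 3, 7, 9, 11]
r, numerator, denominator = pearson_from_sums(x, y)
print(numerator)              # 260
print(round(denominator, 3))  # 262.298
print(round(r, 4))            # 0.9912
```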
Interpretation
– r ≈ 0.9912 indicates a very strong positive linear relationship between X and Y in this sample: the points lie close to an upward-sloping straight line.
– Important caveats:
– Correlation measures linear association only. Nonlinear relationships can have low r even when variables are strongly related.
– Correlation does not imply causation.
– Pearson r is sensitive to outliers; a single unusual pair can materially change r.
– With small samples (here n = 5) estimates are less stable; treat extreme values cautiously.
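Two of these caveats are easy to see numerically. The sketch below uses made-up data: a perfectly symmetric quadratic relationship, and the worked example's series with its last Y value replaced by a hypothetical outlier of 0. The helper `pearson_r` is an illustrative name, not a library function.

```python
import math

def pearson_r(x, y):
    """Pearson r from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Nonlinear case: Y is fully determined by X (Y = X^2), yet the
# linear correlation is zero because the relationship is symmetric.
x_sym = [-2, -1, 0, 1, 2]
y_sq = [v * v for v in x_sym]
r_nonlinear = pearson_r(x_sym, y_sq)

# Outlier case: one changed pair out of five (hypothetical value 0).
x = [2, 4, 6, 8, 10]
y_clean = [1, 3, 7, 9, 11]
y_outlier = [1, 3, 7, 9, 0]
r_clean = pearson_r(x, y_clean)      # about 0.99
r_outlier = pearson_r(x, y_outlier)  # about 0.16

print(r_nonlinear, round(r_clean, 2), round(r_outlier, 2))
```

In this sketch a single altered observation moves r from about 0.99 to about 0.16, which is why a scatterplot is worth inspecting before trusting the number.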
Optional: quick significance check (Student's t)
– t = r · sqrt((n − 2) / (1 − r²))
– For n = 5: degrees of freedom = n − 2 = 3. r² = 260² / 68,800 ≈ 0.98256, so 1 − r² ≈ 0.01744 and
– t ≈ 0.9912 · sqrt(3 / 0.01744) ≈ 13.00
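– This t is very large: with 3 degrees of freedom, the two-sided p-value under the null hypothesis of no correlation is roughly 0.001. Keep in mind the small sample and the test's assumptions (independent pairs, approximately bivariate normal data).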
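The t statistic for the worked example can be reproduced with the sketch below. The p-value helper is an assumption-laden shortcut: it uses the closed-form Student's t distribution function that holds only for exactly 3 degrees of freedom, so it fits this example and nothing else. For other sample sizes a statistics library would be the usual route. Both function names are hypothetical.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)) for testing zero correlation."""
    if n <= 2 or not -1 < r < 1:
        raise ValueError("need n > 2 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1 - r * r))

def two_sided_p_df3(t):
    """Two-sided p-value for Student's t with exactly 3 degrees of freedom.

    Uses the closed-form CDF for df = 3; it is NOT valid for other df.
    """
    u = abs(t) / math.sqrt(3)
    cdf = 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi
    return 2 * (1 - cdf)

# r from the worked example: 260 / sqrt(200 * 344).
r = 260 / math.sqrt(200 * 344)
t = t_statistic(r, 5)
p = two_sided_p_df3(t)
print(round(t, 2))  # 13.0
print(round(p, 4))  # 0.001
```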
Practical checklist to compute r yourself
1. Gather paired data (X, Y) and count n.
2. Compute ΣX, ΣY, Σ(XY), Σ(X²), Σ(Y²).
3. Compute numerator = n·Σ(XY) − ΣX·ΣY.
4. Compute denominator = sqrt([n·Σ(X²) − (ΣX)²] · [n·Σ(Y²) − (ΣY)²]).
5. Compute r = numerator / denominator.
6. Inspect a scatterplot for linearity and outliers; consider a rank-based alternative (Spearman rank correlation) if the linearity assumption fails or outliers dominate.
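For the Spearman alternative mentioned in step 6, one common construction is to replace each value by its rank and then apply the Pearson formula to the ranks. The sketch below follows that approach with average ranks for ties. The helper names and the cubic test data are illustrative assumptions.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson r from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: Y = X^3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(round(pearson_r(x, y), 3))     # 0.943
print(round(spearman_rho(x, y), 3))  # 1.0
```

Because Spearman's measure depends only on ordering, it reports a perfect monotonic relationship here while Pearson's r falls short of 1.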
Sources
– Investopedia — Correlation Coefficient: https://www.investopedia.com/terms/c/correlationcoefficient.asp
– Pearson correlation coefficient — Wikipedia: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
– Khan Academy — Describing relationships (correlation): https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data
– NIST/SEMATECH Engineering Statistics Handbook: https://www.itl.nist.gov/div898/handbook/
Educational disclaimer: This explanation is for educational purposes only and is not individualized investment advice.