Correlation - DominionFX

What is correlation (short answer)
– Correlation measures the strength and direction of a linear relationship between two variables. In finance, those variables are often asset returns, yields, or economic indicators. The correlation coefficient (commonly denoted r) ranges from −1.0 to +1.0:
– +1.0: perfect positive linear relationship (move together).
– −1.0: perfect negative linear relationship (move exactly opposite).
– 0.0: no linear relationship.

Interpretation and quick rules
– Magnitude (ignores units): |r| near 1 indicates strong linear association; near 0 indicates weak linear association.
– Sign: positive means variables tend to move in the same direction; negative means they move in opposite directions.
– Remember: correlation = linear association only. Nonlinear relationships can produce low r even when variables are strongly related.
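A minimal sketch of these rules in plain Python, using the standard sum-of-products formula for r. The helper name `pearson_r` and the toy data are illustrative assumptions, not part of the original text. It shows that changing units leaves r unchanged, while a fully deterministic but symmetric nonlinear relationship (y = x²) can still give r = 0.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation for paired sequences (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]

# Linear relationship: rescaling x (a change of units) does not change r.
y_linear = [3 * v + 1 for v in x]
r_linear = pearson_r(x, y_linear)
r_rescaled = pearson_r([100 * v for v in x], y_linear)

# Nonlinear relationship: y is fully determined by x, yet r is zero
# because the dependence is symmetric around x = 0 rather than linear.
y_squared = [v ** 2 for v in x]
r_squared = pearson_r(x, y_squared)

print(r_linear, r_rescaled, r_squared)
```

A scatterplot would reveal the y = x² pattern immediately, which is why plotting is worth doing before trusting a low r.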

How to calculate (Pearson sample correlation)
– Notation:
– xi, yi: paired observations (i = 1..n).
– x̄, ȳ: sample means.
– r: sample Pearson correlation coefficient.
– Formula (computationally stable form):
r = [sum_{i=1}^n (xi − x̄)(yi − ȳ)] / sqrt{ [sum_{i=1}^n (xi − x̄)^2] · [sum_{i=1}^n (yi − ȳ)^2] }
– Relationship to covariance:
– Sample covariance: cov(X,Y) = [sum (xi − x̄)(yi − ȳ)] / (n − 1)
– Sample standard deviations: sX = sqrt[ sum (xi − x̄)^2 / (n − 1) ], sY similar.
– Therefore r = cov(X,Y) / (sX · sY). The (n − 1) factors in the numerator and denominator cancel, which recovers the formula above.

Worked numeric example (step-by-step)
– Data (five paired observations):
– X = [2, 4, 6, 8, 10]
– Y = [1, 4, 5, 7, 10]
– Step 1: compute means
– x̄ = (2+4+6+8+10)/5 = 6
– ȳ = (1+4+5+7+10)/5 = 5.4
– Step 2: compute deviations and products
– Deviations X: [-4, -2, 0, 2, 4]
– Deviations Y: [-4.4, -1.4, -0.4, 1.6, 4.6]
– Products (xi − x̄)(yi − ȳ): [17.6, 2.8, 0, 3.2, 18.4]
– Sum of products = 17.6 + 2.8 + 0 + 3.2 + 18.4 = 42.

Step 3: compute sample covariance
– cov(X,Y) = [sum (xi − x̄)(yi − ȳ)] / (n − 1) = 42 / 4 = 10.5.

Step 4: compute sample standard deviations
– Sum of squared deviations for X = 16 + 4 + 0 + 4 + 16 = 40 → sX = sqrt(40 / 4) = sqrt(10) ≈ 3.1623.
– Sum of squared deviations for Y = 19.36 + 1.96 + 0.16 + 2.56 + 21.16 = 45.20 → sY = sqrt(45.20 / 4) = sqrt(11.30) ≈ 3.3615.

Step 5: compute Pearson correlation coefficient r
– r = cov(X,Y) / (sX · sY) = 10.5 / (3.1623 × 3.3615) ≈ 10.5 / 10.630 ≈ 0.988.

Interpretation of the numeric example
– r ≈ 0.99 indicates a very strong positive linear relationship between X and Y in this sample: as X increases, Y tends to increase almost linearly.
– This is a descriptive statistic for the sample; it does not prove causation.
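The hand calculation can be cross-checked with a short script. This is a sketch using only Python's standard library; the variable names are illustrative. It follows the same route as the steps above: means, sample covariance, sample standard deviations, then r.

```python
import statistics

X = [2, 4, 6, 8, 10]
Y = [1, 4, 5, 7, 10]
n = len(X)

x_bar = statistics.mean(X)   # 6
y_bar = statistics.mean(Y)   # 5.4

# Sample covariance: sum of deviation products divided by (n - 1).
sum_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
cov_xy = sum_products / (n - 1)

# statistics.stdev uses the (n - 1) denominator, i.e. the sample SD.
s_x = statistics.stdev(X)
s_y = statistics.stdev(Y)

r = cov_xy / (s_x * s_y)
print(f"cov = {cov_xy:.4f}, sX = {s_x:.4f}, sY = {s_y:.4f}, r = {r:.4f}")
```

Running a check like this is a cheap way to catch rounding slips in intermediate values before reporting r.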

Quick reference: formulas
– Sample covariance: cov(X,Y) = Σ(xi − x̄)(yi − ȳ) / (n − 1).
– Sample standard deviation: sX = sqrt[ Σ(xi − x̄)^2 / (n − 1) ].
– Pearson correlation (sample): r = cov(X,Y) / (sX sY).
– Population correlation (ρ): ρ = Cov(X,Y) / (σX σY) where σ denotes population SD.

Optional: testing whether r differs from zero (small-sample t-test)
– t = r · sqrt((n − 2) / (1 − r^2)), with df = n − 2.
– For this example: n = 5 → df = 3.
– The null hypothesis is ρ = 0 (no linear relationship in the population); a large |t| is evidence against it.

Worked numeric example (continuing with n = 5 → df = 3)
– Formula: t = r · sqrt((n − 2) / (1 − r^2)), df = n − 2.
– Pick a demonstrative sample correlation: r = 0.90 (an illustrative value; substitute the r computed from your own data).
– Compute the pieces:
– n − 2 = 3.
– 1 − r^2 = 1 − 0.90^2 = 1 − 0.81 = 0.19.
– sqrt((n − 2)/(1 − r^2)) = sqrt(3 / 0.19) ≈ sqrt(15.789) ≈ 3.974.
– t = 0.90 × 3.974 ≈ 3.576.
– With df = 3, this t gives a two‑tailed p-value ≈ 0.037 (use t‑tables or software for the exact value). Interpretation: at α = 0.05 you would reject the null hypothesis of zero correlation, but the sample is tiny, so the inference is fragile.
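The t statistic and its two-tailed p-value can be reproduced with a few lines of code. The sketch below stays within Python's standard library, so the p-value is obtained by numerically integrating the Student t density with Simpson's rule. That integration is an assumption made for self-containment; in practice a statistics library routine (for example `scipy.stats.t.sf`) would normally be used. The function names are hypothetical.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)) for testing rho = 0."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """P(|T| > |t|) = 1 - 2 * integral of the density over [0, |t|].

    The integral uses Simpson's rule; `steps` must be even.
    """
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)

r, n = 0.90, 5          # illustrative values from the example above
t = t_statistic(r, n)
p = two_tailed_p(t, n - 2)
print(f"t = {t:.3f}, df = {n - 2}, p = {p:.3f}")
```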

Confidence interval for the population correlation (Fisher z transform)
– Fisher z transform (to get an approximate normalizing transform):
– z’ = 0.5 · ln((1 + r)/(1 − r)).
– SE(z’) = 1 / sqrt(n − 3).
– For r = 0.90 and n = 5:
– z’ = 0.5 · ln(1.9 / 0.1) = 0.5 · ln(19) = 0.5 · 2.944439 ≈ 1.47222.
– SE(z’) = 1 / sqrt(n − 3) = 1 / sqrt(2) ≈ 0.70711.
– For a 95% confidence interval on z’, use z’ ± 1.96·SE(z’):

– Margin = 1.96 × 0.70711 ≈ 1.38593
– Lower z’ = 1.47222 − 1.38593 ≈ 0.08629
– Upper z’ = 1.47222 + 1.38593 ≈ 2.85815

Invert the Fisher z transform to get the CI for r (r = (e^{2z’} − 1) / (e^{2z’} + 1)):

– Lower limit: z’ = 0.08629 → e^{2·0.08629} ≈ 1.1884 → r_low ≈ (1.1884 − 1)/(1.1884 + 1) ≈ 0.086
– Upper limit: z’ = 2.85815 → e^{2·2.85815} ≈ 303.8 → r_high ≈ (303.8 − 1)/(303.8 + 1) ≈ 0.993

So the approximate 95% confidence interval for the population Pearson correlation is (0.09, 0.99). Interpretation: although the sample correlation r = 0.90 is statistically significant at α = 0.05 by the t‑test (t ≈ 3.576, df = 3, two‑tailed p ≈ 0.037), the confidence interval is very wide because n = 5. An interval spanning weak to nearly perfect positive correlation shows that the estimate is imprecise: the data are consistent with much smaller true correlations as well as very large ones.
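The Fisher z interval is easy to script. The sketch below uses `math.atanh` and `math.tanh` from Python's standard library for the transform and its inverse. The function name `fisher_ci` and the default critical value of 1.96 are illustrative choices matching the calculation above.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate CI for the population correlation via Fisher's z transform."""
    if n <= 3:
        raise ValueError("Fisher z interval needs n > 3")
    z = math.atanh(r)                    # 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)          # standard error on the z scale
    lo_z, hi_z = z - z_crit * se, z + z_crit * se
    return math.tanh(lo_z), math.tanh(hi_z)   # back-transform to the r scale

low, high = fisher_ci(0.90, 5)
print(f"n = 5:  95% CI for rho = ({low:.3f}, {high:.3f})")

# Same r with a larger (hypothetical) sample: the interval narrows.
low50, high50 = fisher_ci(0.90, 50)
print(f"n = 50: 95% CI for rho = ({low50:.3f}, {high50:.3f})")
```

Comparing the two calls makes the role of sample size concrete: the same r = 0.90 is far more informative when n is larger.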

Key assumptions and limitations
– Paired observations are independent.
– The bivariate distribution is approximately normal (Fisher z relies on this to approximate normality).
– The relationship is approximately linear and not driven by outliers (outliers distort r strongly).
– With very small n, both p‑values and CIs are fragile; bootstrap or permutation methods can be more robust but still limited by sample size.
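As one illustration of the permutation idea mentioned above, the sketch below runs an exact two-sided permutation test on the five-point example data. It enumerates every reordering of Y and counts how often |r| is at least as large as the observed value. The function names are hypothetical, and full enumeration is only feasible for very small n; larger samples would need random shuffles instead.

```python
import math
from itertools import permutations

def pearson_r(xs, ys):
    """Sample Pearson correlation for paired sequences (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def exact_permutation_p(xs, ys):
    """Two-sided exact permutation p-value for r (small n only)."""
    r_obs = abs(pearson_r(xs, ys))
    count = total = 0
    for perm in permutations(ys):
        total += 1
        # Small tolerance guards against floating-point near-ties.
        if abs(pearson_r(xs, perm)) >= r_obs - 1e-12:
            count += 1
    return count / total

X = [2, 4, 6, 8, 10]
Y = [1, 4, 5, 7, 10]
p_perm = exact_permutation_p(X, Y)   # enumerates 5! = 120 orderings
print(f"exact permutation p-value = {p_perm:.4f}")
```

With n = 5 there are only 120 orderings, so the smallest two-sided p-value this test can ever produce is 2/120 ≈ 0.017. That floor is a direct example of how sample size limits even resampling-based inference.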

Practical checklist: how to report and act on a sample correlation
1. Plot the data (scatterplot) and look for nonlinearity or outliers.
2. Compute r and report sample size n.
3. Do a formal test and report uncertainty. Compute a significance test (two‑sided t-test) and a confidence interval (CI) for r; report the test statistic, degrees of freedom, p‑value, and a 95% CI (or another level you pre‑specified).

Step-by-step calculations (Pearson correlation, two‑tailed test and Fisher z CI)
1) Test statistic (t) for Pearson r
– Formula: t = r * sqrt((n − 2) / (1 − r^2))
– Degrees of freedom: df = n − 2
– Two‑tailed p‑value: use t distribution with df

2) 95% CI using Fisher z transformation
– Transform: z’ = atanh(r) = 0.5 * ln((1 + r)/(1 − r))
– Standard error: SE_z = 1 / sqrt(n − 3)
– CI in z: z’ ± z_alpha/2 * SE_z (for 95% use z_alpha/2 = 1.96)
– Back‑transform: r = tanh(z) = (e^{2z} − 1)/(e^{2z} + 1)

Worked numeric example
– Suppose n = 25 observations and sample correlation r