T Distribution - DominionFX

Key takeaways
– The t-distribution (Student’s t-distribution) is a bell‑shaped probability distribution with heavier tails than the normal distribution. It models extra uncertainty that arises when estimating a population standard deviation from a small sample.
– Use the t-distribution when the population standard deviation is unknown and the sample size is small (or when you prefer the more conservative inference).
– The shape (tail heaviness) is governed by degrees of freedom (df). Smaller df → heavier tails; as df → ∞ the t-distribution approaches the standard normal.
– The t-distribution underlies t-tests and confidence intervals for means. If population σ is known or n is very large, the normal (z) methods are appropriate.
(Sources: Investopedia; OpenStax Introductory Statistics)

1. What is the t‑distribution?
The t‑distribution (Student’s t) is a family of symmetric, bell‑shaped distributions parameterized by degrees of freedom (df). It was derived to account for extra variability caused by replacing an unknown population standard deviation σ with the sample standard deviation s in inference about a mean. Relative to the standard normal, the t‑distribution has heavier tails, reflecting greater probability of extreme sample outcomes when sample size is limited.

2. Why it matters — intuition
When σ is unknown, you estimate it with s. That extra estimation error widens the uncertainty about the standardized statistic. The t‑distribution’s heavier tails give larger critical values and wider confidence intervals than the normal, producing more conservative (safer) inference for small samples.
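To make the intuition concrete, the sketch below compares two-sided 95% critical values from the t-distribution with the corresponding z value. It assumes SciPy is available; the helper name critical_values and the chosen df values are illustrative only.

```python
from scipy import stats

def critical_values(df_values, alpha=0.05):
    """Return the two-sided z critical value and a list of (df, t critical value)."""
    z = float(stats.norm.ppf(1 - alpha / 2))
    t_list = [(df, float(stats.t.ppf(1 - alpha / 2, df))) for df in df_values]
    return z, t_list

# Illustrative degrees of freedom, from very small to fairly large samples.
z_crit, t_crits = critical_values([2, 5, 10, 30, 100])

print(f"z critical value: {z_crit:.3f}")
for df, t_crit in t_crits:
    # Each t critical value exceeds z, and the gap shrinks as df grows.
    print(f"df = {df:>3}: t critical value = {t_crit:.3f}")
```

The t critical values sit above the z value for every df and move toward it as df increases, which is why t-based intervals are wider for small samples.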

3. Mathematical core formulas
– t-statistic (one‑sample):
T = (x̄ − μ0) / (s / sqrt(n))
where x̄ = sample mean, μ0 = hypothesized population mean, s = sample standard deviation, n = sample size.
– Degrees of freedom (one‑sample):
df = n − 1
– (1 − α)100% Confidence interval for population mean μ:
x̄ ± t_{α/2, df} * (s / sqrt(n))
where t_{α/2, df} is the critical t-value that leaves an upper-tail area of α/2 under the t-distribution with df degrees of freedom.
– Two-sample cases:
• Equal variances assumed: pooled t with df = n1 + n2 − 2.
• Unequal variances (Welch’s t): use Welch–Satterthwaite approximation for df (non-integer df allowed).
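The formulas above can be sketched in a few lines of Python. The Welch–Satterthwaite approximation used here is df ≈ (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]. The function names and the summary statistics are hypothetical and chosen only for illustration.

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """One-sample t-statistic: (x̄ − μ0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))

def pooled_df(n1, n2):
    """Degrees of freedom for the pooled (equal-variance) two-sample t-test."""
    return n1 + n2 - 2

def welch_df(s1, n1, s2, n2):
    """Welch–Satterthwaite approximate degrees of freedom (may be non-integer)."""
    v1 = s1 ** 2 / n1  # squared standard error contributed by group 1
    v2 = s2 ** 2 / n2  # squared standard error contributed by group 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Hypothetical summary statistics, for illustration only.
print(one_sample_t(xbar=10.5, mu0=10.0, s=1.0, n=25))
print(pooled_df(10, 10))
print(welch_df(s1=1.0, n1=10, s2=5.0, n2=10))
```

With equal sample sizes and equal sample standard deviations, welch_df returns the pooled value n1 + n2 − 2; when the variances differ it is smaller.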

4. When to use the t‑distribution
– Population σ is unknown (most common real-world case).
– Sample size is small (common rule of thumb: n < 30), but t is also valid for larger n.
– Data are approximately normally distributed (for small n); t is robust to mild departures from normality for moderate n.
– Typical uses: one-sample t-test, paired t-test, two-sample t-test (equal or unequal variances), confidence intervals for a mean.

5. Step‑by‑step: Compute a t‑based confidence interval for a mean
1. Check assumptions: observations independent; sampling roughly from a normal population (especially important when n is small).
2. Compute sample mean x̄ and sample standard deviation s.
3. Choose confidence level (e.g., 95%) → α = 0.05.
4. Get df = n − 1. Look up t_{α/2, df} in a t-table or compute with software.
5. Compute standard error SE = s / sqrt(n).
6. Margin of error = t_{α/2, df} * SE.
7. CI = x̄ ± margin of error.
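Steps 2 to 7 above can be sketched as a small function. This is a minimal illustration assuming SciPy is installed; the function name t_confidence_interval and the sample values are hypothetical, and the assumption check in step 1 is left to the reader.

```python
import math
import statistics
from scipy import stats

def t_confidence_interval(data, confidence=0.95):
    """Return (lower, upper) bounds of a t-based confidence interval for the mean."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    xbar = statistics.mean(data)             # step 2: sample mean
    s = statistics.stdev(data)               # step 2: sample SD (n − 1 denominator)
    alpha = 1 - confidence                   # step 3
    df = n - 1                               # step 4
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # step 4: critical value
    se = s / math.sqrt(n)                    # step 5: standard error
    margin = t_crit * se                     # step 6: margin of error
    return xbar - margin, xbar + margin      # step 7

# Hypothetical measurements, for illustration only.
sample = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2]
lower, upper = t_confidence_interval(sample)
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```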

Example (Investopedia’s DJIA illustration)
– Data: n = 27 trading days, sample mean x̄ = −0.33% (−0.0033), s = 1.07% (0.0107).
– df = 26. For 95% CI, t_{0.025,26} ≈ 2.055.
– SE = 0.0107 / sqrt(27) ≈ 0.00206 (0.206%).
– Margin = 2.055 * 0.00206 ≈ 0.00423 (0.423%).
– 95% CI = −0.33% ± 0.423% → approximately (−0.75%, +0.09%).
(See Investopedia example and OpenStax for background.)
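The arithmetic in this example can be checked from the summary statistics alone. The sketch below assumes SciPy for the critical value; the variable names are illustrative.

```python
import math
from scipy import stats

# Summary statistics quoted in the example above.
n = 27
xbar = -0.0033   # sample mean (−0.33%)
s = 0.0107       # sample standard deviation (1.07%)

df = n - 1
t_crit = float(stats.t.ppf(0.975, df))  # two-sided 95% critical value
se = s / math.sqrt(n)                   # standard error of the mean
margin = t_crit * se                    # margin of error
ci = (xbar - margin, xbar + margin)

print(f"df = {df}, t critical = {t_crit:.3f}")
print(f"SE = {se:.5f}, margin = {margin:.5f}")
print(f"95% CI = ({ci[0]:.2%}, {ci[1]:.2%})")
```

Because the interval contains zero, these data alone do not rule out a true mean daily return of zero at the 5% level.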

6. Step‑by‑step: Perform a one‑sample t‑test
1. State H0 and H1 (e.g., H0: μ = μ0 vs H1: μ ≠ μ0).
2. Check assumptions: independence and approximate normality (for small n).
3. Compute t = (x̄ − μ0) / (s / sqrt(n)) and df = n − 1.
4. Determine the p-value from the t-distribution with df degrees of freedom, or (for a two-sided test) compare |t| to the critical value t_{α/2, df}.
5. Decide: if p ≤ α, reject H0; otherwise fail to reject H0.
6. Report t, df, the p-value, and a confidence interval for the mean (plus an effect size where relevant).
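The test procedure above can be sketched by hand and cross-checked against scipy.stats.ttest_1samp. The function name one_sample_t_test, the sample values, and μ0 = 5.0 are hypothetical; the sketch covers only the two-sided alternative.

```python
import math
import statistics
from scipy import stats

def one_sample_t_test(data, mu0, alpha=0.05):
    """Two-sided one-sample t-test; returns t, df, p-value, and the decision."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)                  # sample SD (n − 1 denominator)
    df = n - 1
    t_stat = (xbar - mu0) / (s / math.sqrt(n))  # step 3
    p_value = 2 * float(stats.t.sf(abs(t_stat), df))  # step 4: two-sided p-value
    return {"t": t_stat, "df": df, "p": p_value, "reject_h0": p_value <= alpha}

# Hypothetical measurements, for illustration only.
sample = [5.4, 5.1, 4.8, 5.6, 5.3, 5.0, 5.5, 5.2]
result = one_sample_t_test(sample, mu0=5.0)
print(result)

# Cross-check against SciPy's built-in test.
reference = stats.ttest_1samp(sample, 5.0)
print(reference.statistic, reference.pvalue)
```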

7. Types of t‑tests and when to use each
– One‑sample t-test: compare a sample mean to a hypothesized (reference) value.
– Paired t-test: compare means of dependent pairs (before/after).
– Two‑sample t-test (independent): compare means of two independent groups.
• Use the pooled t-test only when the equal-variance assumption is well supported (rarely justified without checking).
• Use Welch’s t-test (unequal variances) by default; it remains valid when the variances happen to be equal and is recommended when they differ.
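As a rough illustration of these test types, the sketch below runs a paired test, shows that it matches a one-sample test on the differences, and runs Welch and pooled tests on two independent groups. All data values are invented for the example.

```python
from scipy import stats

# Hypothetical before/after measurements on the same eight subjects.
before = [72, 75, 68, 80, 77, 74, 69, 73]
after = [70, 72, 67, 76, 75, 73, 66, 70]

# Paired t-test on the dependent pairs.
paired = stats.ttest_rel(before, after)

# The same test expressed as a one-sample t-test on the differences vs. 0.
diffs = [b - a for b, a in zip(before, after)]
on_diffs = stats.ttest_1samp(diffs, 0.0)

# Hypothetical independent groups with different sizes and spreads.
group_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
group_b = [13.4, 10.2, 15.1, 9.8, 14.7, 11.0, 16.2, 12.9]

welch = stats.ttest_ind(group_a, group_b, equal_var=False)   # Welch's t-test
pooled = stats.ttest_ind(group_a, group_b, equal_var=True)   # pooled t-test

print("paired:", paired.statistic, paired.pvalue)
print("one-sample on differences:", on_diffs.statistic, on_diffs.pvalue)
print("Welch:", welch.statistic, welch.pvalue)
print("pooled:", pooled.statistic, pooled.pvalue)
```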

8. Practical software commands (quick)
– R: t.test(x, mu=…, alternative=…) for one-sample; t.test(x,y, var.equal=FALSE) for Welch.
– Python/scipy: scipy.stats.ttest_1samp(x, popmean); scipy.stats.ttest_ind(a,b, equal_var=False).
– Excel: T.TEST(array1, array2, tails, type) or Data Analysis Toolpak t-Test functions.
– For confidence intervals in Python: use scipy.stats.t.ppf to get critical t-value and compute x̄ ± t*SE.

9. t‑distribution vs. normal distribution (summary)
– Both: symmetric, bell-shaped, centered at zero (when standardized).
– Difference: t has heavier tails (higher probability of extreme values) — effect strongest when df small.
– Practical implication: t yields larger critical values and wider CIs than the z/normal approach when σ is unknown and sample sizes are small.
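One way to see the heavier tails numerically is to compare the probability of landing more than 3 units from zero under each distribution. The helper two_sided_tail and the threshold of 3 are illustrative choices, not part of any standard API.

```python
from scipy import stats

def two_sided_tail(threshold, df=None):
    """P(|X| > threshold) for the standard normal (df=None) or a t-distribution."""
    dist = stats.norm if df is None else stats.t(df)
    return 2 * float(dist.sf(threshold))

normal_tail = two_sided_tail(3.0)
print(f"normal    : P(|Z| > 3) = {normal_tail:.5f}")
for df in (3, 10, 30, 100):
    # Tail probability shrinks toward the normal value as df grows.
    print(f"t, df={df:>3}: P(|T| > 3) = {two_sided_tail(3.0, df):.5f}")
```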

10. Limitations and cautions
– Assumption of approximate normality: for very small samples (n < ~15), severe non-normality (skewness, heavy tails) can invalidate t-based inference. Consider nonparametric methods or bootstrapping in such cases.
– Independence matters: correlated observations undermine t-test validity.
– Known σ: when the population standard deviation truly is known (rare), use z methods.
– Multiple comparisons: if conducting many t-tests, adjust for multiple testing (Bonferroni, FDR).
– For extreme non-normality or small n, resampling (bootstrap), permutation tests, or robust estimators may be preferable.
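As one possible alternative when normality is doubtful, the sketch below computes a percentile bootstrap confidence interval for the mean with NumPy. The percentile method is only one of several bootstrap variants, and the function name, resample count, seed, and data are assumptions made for illustration.

```python
import numpy as np

def bootstrap_mean_ci(data, confidence=0.95, n_resamples=5000, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)  # fixed seed for reproducibility
    values = np.asarray(data, dtype=float)
    n = len(values)
    # Draw n_resamples samples of size n, with replacement, via random indices.
    idx = rng.integers(0, n, size=(n_resamples, n))
    means = values[idx].mean(axis=1)
    alpha = 1 - confidence
    low, high = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(low), float(high)

# Hypothetical right-skewed sample, for illustration only.
skewed_sample = [1.2, 0.8, 1.5, 0.9, 7.4, 1.1, 1.3, 0.7, 2.9, 1.0]
ci_low, ci_high = bootstrap_mean_ci(skewed_sample)
print(f"bootstrap 95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```

With samples this small the bootstrap is itself approximate, so it is a complement to, not a guarantee beyond, the t-based interval.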

11. Practical diagnostics and robustness tips
– Visualize data (histogram, Q‑Q plot) to assess normality.
– For moderate-to-large n (roughly n ≥ 30), t procedures are robust to mild departures from normality, because the central limit theorem makes the sample mean approximately normal.
– If variances differ across groups, use Welch’s t-test rather than pooled variance.
– Report effect sizes and confidence intervals, not just p-values.
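A numerical companion to the visual checks above is the straightness of the normal Q‑Q plot, which scipy.stats.probplot reports as a correlation, alongside a Shapiro–Wilk test. The helper qq_correlation and both synthetic data sets are illustrative assumptions; neither number replaces looking at the plot.

```python
import numpy as np
from scipy import stats

def qq_correlation(data):
    """Correlation of ordered data with theoretical normal quantiles (Q-Q straightness)."""
    _, (_, _, r) = stats.probplot(data, dist="norm")
    return float(r)

# Synthetic, deterministic examples: evenly spaced quantiles of two distributions.
probs = (np.arange(1, 31) - 0.5) / 30
roughly_normal = stats.norm.ppf(probs)   # shaped like a normal sample
right_skewed = stats.expon.ppf(probs)    # shaped like an exponential sample

for name, data in [("roughly normal", roughly_normal), ("right skewed", right_skewed)]:
    shapiro = stats.shapiro(data)
    # r near 1 suggests a nearly straight Q-Q plot; a small Shapiro p-value flags non-normality.
    print(f"{name}: Q-Q r = {qq_correlation(data):.4f}, Shapiro p = {shapiro.pvalue:.4f}")
```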

12. Explain Like I’m 5
Imagine you are guessing the average number of marbles in cookie jars. If you peek inside only a few jars, you’re less sure about your guess — your estimate of how spread out jar counts are is shaky. The t-distribution is a math tool that makes your guess more cautious to account for that extra shakiness.

13. Bottom line
The t‑distribution is a fundamental tool for inference about means when the population standard deviation is unknown. It corrects for extra uncertainty from estimating variability and is the basis for t-tests and t‑based confidence intervals. Use it when σ is unknown and especially when sample sizes are small; check assumptions, prefer Welch’s t for unequal variances, and consider alternatives (bootstrap, nonparametric) when normality or independence are questionable.

References and further reading
– Investopedia. “T-Distribution.”
– Illowsky, B., & Dean, S. (2018). Introductory Statistics. OpenStax. (Sections on t-distribution and hypothesis testing.)
– Student (W.S. Gosset), original motivation: small-sample inference for means.
