Bayes’ Theorem

Updated: September 26, 2025

What Bayes’ theorem is (short definition)
– Bayes’ theorem is a formula that revises the probability of a hypothesis when new information (evidence) becomes available. In probability language: it converts a prior probability (what you believed before seeing the evidence) into a posterior probability (what you should believe after seeing the evidence) by using the likelihood of the evidence under the hypothesis.

Key terms (defined)
– Probability: a number from 0 to 1 that measures how likely an event is to occur.
– Hypothesis (A): the statement whose probability you want to update (e.g., “the person uses the drug”).
– Evidence (B): the new information you observe (e.g., “the test result is positive”).
– Prior probability P(A): the probability of the hypothesis before seeing the evidence.
– Likelihood P(B|A): the probability of observing the evidence assuming the hypothesis is true.
– Posterior probability P(A|B): the probability of the hypothesis after accounting for the evidence.
– Marginal probability P(B): the overall probability of observing the evidence (under all possible hypotheses).

The formula
– Bayes’ theorem (one common form):
P(A|B) = P(B|A) × P(A) / P(B)
– Where P(B) can be expanded (for a binary hypothesis A vs. not-A):
P(B) = P(B|A) × P(A) + P(B|¬A) × P(¬A)
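
A minimal Python sketch (the function name is illustrative) that applies the formula with the marginal expanded for a binary hypothesis:

    def posterior(prior, p_b_given_a, p_b_given_not_a):
        # P(B) = P(B|A)·P(A) + P(B|¬A)·P(¬A)
        p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
        # P(A|B) = P(B|A)·P(A) / P(B)
        return p_b_given_a * prior / p_b

For example, posterior(0.005, 0.99, 0.05) ≈ 0.0905, matching the drug-test example worked through below.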

Derivation (brief, from conditional probability)
– By the definition of conditional probability:
P(A and B) = P(A) × P(B|A) = P(B) × P(A|B).
– Rearranging gives Bayes’ theorem:
P(A|B) = P(B|A) P(A) / P(B).

Why it’s useful (intuition)
– Bayes’ theorem corrects for base rates (how common the hypothesis is) and combines prior knowledge with new data.
– It turns “What is the probability of evidence given the hypothesis?” into “What is the probability of the hypothesis given the evidence?” — a common need in diagnostics, finance, and decision-making.

When to use Bayes’ theorem
– You want to update the probability of a hypothesis after receiving new information.
– You can quantify (or estimate) the prior and the likelihoods.
– The events and probabilities are well-defined and interpretable.

Special considerations and common pitfalls
– Base-rate neglect: ignoring the prior P(A) can produce misleading conclusions (common in interpreting tests).
– Choice of prior: in some contexts the prior is subjective; sensitivity analysis (checking different priors) is important.
– Independence assumptions: if you combine multiple pieces of evidence, ensure you don’t assume independence incorrectly.
– Small probabilities: numerical stability can be an issue; compute with care when probabilities are tiny.

Checklist before applying Bayes’ theorem
1. Define events precisely: state hypothesis A and evidence B.
2. Estimate the prior P(A) (base rate) and P(¬A) = 1 − P(A).
3. Estimate the likelihoods: P(B|A) and P(B|¬A) (or sensitivity/specificity if dealing with tests).
4. Compute the marginal P(B) = P(B|A)P(A) + P(B|¬A)P(¬A).
5. Compute the posterior P(A|B) = P(B|A)P(A) / P(B).
6. Do a sensitivity check: vary the prior or likelihoods to see how the posterior changes.
7. Interpret the posterior in context; report assumptions and uncertainty.

Worked numeric examples

1) Drug-test example (classic)
– Setup:
– Prior (prevalence) P(User) = 0.5% = 0.005.
– Test sensitivity (true positive rate) P(Pos | User) = 99% = 0.99. Sensitivity: probability the test is positive when the person is a user.
– False positive rate P(Pos | ¬User) = 5% = 0.05. Specificity = 1 − false positive rate = 95%. Specificity: probability the test is negative when the person is not a user.

Step-by-step calculation:
1. Compute the marginal probability of a positive test (P(Pos)):
P(Pos) = P(Pos | User)·P(User) + P(Pos | ¬User)·P(¬User)
= 0.99·0.005 + 0.05·0.995
= 0.00495 + 0.04975
= 0.05470

2. Compute the posterior probability that the person is a user given a positive test (P(User | Pos)):
P(User | Pos) = P(Pos | User)·P(User) / P(Pos)
= 0.00495 / 0.05470
≈ 0.0905 = 9.05%

Interpretation: despite a highly sensitive test (99%), a positive result implies only about a 9% chance the person is an actual user because the condition is rare (0.5% prevalence) and the false-positive rate is non-negligible.

Sensitivity check (varying the prior):
– If prevalence = 5% (0.05), keep sensitivity = 0.99 and false positive = 0.05:
P(Pos) = 0.99·0.05 + 0.05·0.95 = 0.0495 + 0.0475 = 0.097
P(User | Pos) = 0.0495 / 0.097 ≈ 0.5103 = 51.0%

So increasing the prior (prevalence) from 0.5% to 5% changed the post-test probability from ~9% to ~51%. That illustrates how strongly the prior matters.
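
As a check, here is a small Python sketch (names are illustrative) that reproduces both the base calculation and the sensitivity check:

    def p_user_given_pos(prevalence, sensitivity=0.99, false_pos=0.05):
        # P(Pos) = P(Pos|User)·P(User) + P(Pos|¬User)·P(¬User)
        p_pos = sensitivity * prevalence + false_pos * (1 - prevalence)
        return sensitivity * prevalence / p_pos

    for prev in (0.005, 0.05):
        print(prev, p_user_given_pos(prev))   # ~0.0905 and ~0.5103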

Odds form (useful for numerical stability and sequential updates)
– Prior odds = P(User) / P(¬User) = 0.005 / 0.995 ≈ 0.005025.
– Likelihood ratio (LR) = P(Pos | User) / P(Pos | ¬User) = 0.99 / 0.05 = 19.8.
– Posterior odds = prior odds × LR = 0.005025 × 19.8 ≈ 0.0995.
– Convert back to probability: P = odds / (1 + odds) = 0.0995 / 1.0995 ≈ 0.0905 (≈9.05%), same as above.

Log-odds update (helps when probabilities are tiny):
– logit(P) = ln[P / (1 − P)]. Then logit(posterior) = logit(prior) + ln(LR). In words: after converting probabilities to log‑odds (the logit), you add the natural log of the likelihood ratio (LR) to the prior log-odds to update the belief. This is numerically stable when probabilities are very small or when you want to apply multiple updates sequentially.

Worked logit example (continuing the numeric example above)
– Prior P(User) = 0.005 → prior odds = 0.005025.
– logit(prior) = ln(odds) = ln(0.005025) ≈ −5.2933.
– LR = 19.8 → ln(LR) ≈ 2.9857.
– logit(posterior) = −5.2933 + 2.9857 = −2.3076.
– posterior odds = exp(−2.3076) ≈ 0.0995 → posterior P = 0.0995 / (1 + 0.0995) ≈ 0.0905 (≈9.05%).
This matches the direct Bayes calculation but avoids intermediate rounding issues.
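
Both forms are easy to verify in Python (a minimal sketch; the numbers are this example's):

    import math

    def update_odds(prior_p, lr):
        odds = prior_p / (1 - prior_p) * lr        # posterior odds = prior odds × LR
        return odds / (1 + odds)                   # back to a probability

    def update_logit(prior_p, lr):
        logit = math.log(prior_p / (1 - prior_p)) + math.log(lr)
        return 1 / (1 + math.exp(-logit))          # inverse logit

    print(update_odds(0.005, 19.8))    # ~0.0905
    print(update_logit(0.005, 19.8))   # ~0.0905 (same answer, stabler for tiny p)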

Sequential updates (multiple independent tests)
– If you run two independent identical tests and both are positive, the posterior odds = prior odds × LR^2.
– Using prior odds = 0.005025 and LR = 19.8: posterior odds = 0.005025 × 19.8^2 ≈ 0.005025 × 392.04 ≈ 1.970.
– Posterior probability = 1.970 / (1 + 1.970) ≈ 0.663 (≈66.3%).
Stepwise interpretation: after the first positive test the probability rises to ≈9.1%; after the second positive it rises further to ≈66.3%.

Handling negative (or mixed) test results
– For a negative result use the negative likelihood ratio LR− = P(Neg | User) / P(Neg | ¬User) = (1 − sensitivity) / specificity.
– Example with sensitivity = 0.99, specificity = 0.95 → LR− = 0.01 / 0.95 ≈ 0.010526.
– Prior odds 0.005025 × LR− ≈ 0.0000529 → posterior probability ≈ 0.00529% (very small).
– If you get one positive and one negative test, multiply prior odds by LRpos × LR− (order doesn’t matter if tests are independent).
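
A sketch of sequential and mixed updates in Python (it assumes conditional independence, as discussed in the pitfalls below):

    def bayes_odds(prior_p, lrs):
        odds = prior_p / (1 - prior_p)
        for lr in lrs:                 # one likelihood ratio per observation
            odds *= lr
        return odds / (1 + odds)

    lr_pos = 0.99 / 0.05               # 19.8
    lr_neg = (1 - 0.99) / 0.95         # ~0.010526
    print(bayes_odds(0.005, [lr_pos, lr_pos]))   # two positives: ~0.663
    print(bayes_odds(0.005, [lr_neg]))           # one negative: ~0.0000529
    print(bayes_odds(0.005, [lr_pos, lr_neg]))   # one of each: ~0.00105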

Common pitfalls and assumptions
– Base‑rate fallacy: ignoring the prior (base rate or prevalence) typically leads to overestimating the significance of a positive result.
– Independence assumption: multiplying LRs assumes test results are conditionally independent given the hypothesis. Correlated tests violate this and overstate confidence.
– Constancy of sensitivity/specificity: test performance can vary across populations, settings, or time. Use context‑appropriate values.
– Circular updates: don’t use the same data twice (e.g., build a prior from the same sample you then update with).
– Small‑probability arithmetic: use odds or log‑odds when probabilities are extreme to avoid underflow or rounding issues.

Practical checklist for applying Bayes in finance, trading, or research
1. Define hypotheses clearly (e.g., “market regime A” vs “not A”).
2. Estimate a realistic prior; document data and judgment used.
3. Specify the likelihoods: how probable is each possible observation under each hypothesis?
4. Convert to odds or log‑odds for sequential updating and numerical stability.
5. Apply LR = P(observation | hypothesis) / P(observation | not hypothesis); update prior odds by multiplying by LR.
6. Convert back to probability if needed and report uncertainty (confidence bounds, sensitivity analyses).
7. Check assumptions: are observations independent? Are likelihoods stationary?
8. Run sanity checks: backtest the update on historical periods, run sensitivity analyses on priors and likelihoods, and check whether results change qualitatively if you tweak assumptions.

9. Interpret and communicate uncertainty: report posterior probability, credible intervals or percentiles, and a short description of assumptions (independence, stationarity, sample size). Always show how sensitive the posterior is to different priors and to plausible changes in likelihoods.

10. Operationalize (if appropriate): embed the posterior into decision rules (position sizing, stop-loss, info filters) only after stress-testing and accounting for transaction costs and slippage.

Common practical pitfalls
– Dependent observations: treating correlated signals as independent inflates confidence.
– Over‑confident likelihoods: subjective or small-sample estimates for P(obs | H) can dominate the posterior.
– Data snooping: testing many hypotheses and reporting the one with the highest posterior, without correction, inflates false discoveries.
– Nonstationarity: regimes change; likelihoods estimated in one period may not apply later.
– Misreading priors: a very small prior cannot be overcome without very strong likelihood evidence; conversely, a tiny likelihood ratio cannot overturn a strong prior.

Worked numeric example — single Bayesian update (trading signal)
Scenario: you believe the probability that the market is in “regime A” today is 10% (prior = 0.10). You observe a trading signal. Historical analysis gives:
– P(signal | regime A) = 0.70
– P(signal | not A) = 0.20

Step 1 — compute likelihood ratio (LR):
LR = P(signal | A) / P(signal | not A) = 0.70 / 0.20 = 3.5

Step 2 — convert prior to odds:
prior odds = p / (1 − p) = 0.10 / 0.90 = 0.1111

Step 3 — update odds:
posterior odds = prior odds × LR = 0.1111 × 3.5 = 0.3889

Step 4 — convert back to probability:
posterior p = posterior odds / (1 + posterior odds) = 0.3889 / 1.3889 ≈ 0.28 → 28%

Interpretation: after seeing the signal, the probability of regime A rises from 10% to about 28%.

Sequential updating (same independent signal repeated)
If you see a second independent signal with the same characteristics, apply LR again:
– new posterior odds = 0.3889 × 3.5 = 1.361
– new posterior p = 1.361 / (1 + 1.361) ≈ 0.576 → 57.6%

Log‑odds (numerical stability)
– logit(p) = ln(p / (1 − p)). Update by addition: logit(posterior) = logit(prior) + ln(LR).
Numeric example for initial prior 0.10:
logit(prior) = ln(0.1/0.9) ≈ −2.197
ln(LR) = ln(3.5) ≈ 1.253
logit(posterior) = −2.197 + 1.253 = −0.944
posterior p = 1 / (1 + e^{0.944}) ≈ 0.28 (matches odds method)
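
The whole trading-signal example fits in a few lines of Python (a sketch using the numbers above):

    import math

    prior_p, lr = 0.10, 0.70 / 0.20          # prior 10%, LR = 3.5
    odds = prior_p / (1 - prior_p) * lr      # one signal
    print(odds / (1 + odds))                 # ~0.28

    odds *= lr                               # second independent signal
    print(odds / (1 + odds))                 # ~0.576

    logit = math.log(prior_p / (1 - prior_p)) + math.log(lr)
    print(1 / (1 + math.exp(-logit)))        # ~0.28, matching the odds method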

Conjugate example — Beta‑Binomial (binary signal with uncertain rate)
Use a Beta(a,b) prior for the success probability θ of a binary signal. Observing s successes in n trials gives posterior Beta(a + s, b + n − s).

Example:
– Prior Beta(2,8) → prior mean = 2/(2+8) = 0.20
– Observe n = 5 trials with s = 3 successes
– Posterior = Beta(2+3, 8+2) = Beta(5,10) → posterior mean = 5/(5+10) = 0.333

This gives a principled way to combine prior belief about a signal’s hit rate with observed outcomes and to compute credible intervals.
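
A minimal SciPy sketch of this Beta‑Binomial update (SciPy assumed available):

    from scipy.stats import beta

    a, b = 2, 8                               # prior Beta(2,8), mean 0.20
    s, n = 3, 5                               # 3 successes in 5 trials
    a_post, b_post = a + s, b + (n - s)       # posterior Beta(5,10)
    print(a_post / (a_post + b_post))         # posterior mean ~0.333
    print(beta.ppf([0.025, 0.975], a_post, b_post))   # 95% equal-tailed credible interval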

Computation and implementation tips
– Use log-odds for sequential updates and when LR is extreme to avoid underflow/overflow.
– If likelihoods are uncertain, treat them as distributions and use Monte Carlo / simulation.

Practical next steps and checks

1) Sequential updating with log-odds (numeric worked example)
– Why use log-odds: multiplying many likelihood ratios can underflow/overflow; using logarithms converts products to sums and is numerically stable. Log-odds = log(p / (1 − p)), where p is a probability.
– Formula (one observation): posterior log-odds = prior log-odds + log(LR), where LR = P(data | H) / P(data | not H).
– Numeric example:
– Prior p = 0.20 → prior log-odds = ln(0.20 / 0.80) = ln(0.25) ≈ −1.3863.
– Model: a “success” has P(success | H) = 0.60 and P(success | not H) = 0.10 → LR_success = 0.60 / 0.10 = 6 → log-LR ≈ 1.7918.
– After observing one success: posterior log-odds = −1.3863 + 1.7918 = 0.4055 → posterior p = exp(0.4055) / (1+exp(0.4055)) ≈ 0.60.
– If instead you observed one failure: LR_failure = (1−0.60) / (1−0.10) = 0.40 / 0.90 ≈ 0.4444 → log-LR ≈ −0.8109. Posterior log-odds = −1.3863 − 0.8109 = −2.1972 → posterior p ≈ 0.10.
– How to implement in code: maintain a running scalar for log-odds and add log-LR for each observation.
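
For example (a minimal sketch; the probabilities are this example's assumptions):

    import math

    log_odds = math.log(0.20 / 0.80)          # prior p = 0.20 -> ~-1.3863

    def observe(log_odds, success, p_h=0.60, p_not_h=0.10):
        lr = p_h / p_not_h if success else (1 - p_h) / (1 - p_not_h)
        return log_odds + math.log(lr)        # add the log-LR

    log_odds = observe(log_odds, success=True)
    print(1 / (1 + math.exp(-log_odds)))      # ~0.60 after one success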

2) Computing credible intervals for Beta posteriors
– Credible interval: interval [L, U] such that P(θ ∈ [L,U] | data) = desired probability (e.g., 0.95). This is a Bayesian interval, not a frequentist confidence interval.
– Exact method (preferred): use the inverse CDF (quantiles) of the Beta posterior:
– If posterior = Beta(a’, b’), then a 95% equal-tailed credible interval is [qbeta(0.025, a’, b’), qbeta(0.975, a’, b’)]. qbeta is available in many statistical packages (R, Python wrappers, etc.). Practical options and steps:

3) Software examples — exact credible interval (equal-tailed)
– R (base): use qbeta(probability, shape1, shape2). If prior = Beta(a,b) and you observe k successes in n Bernoulli trials, posterior = Beta(a+k, b+n−k). Example:
– prior Beta(1,1) (uniform), n = 10, k = 7 → posterior Beta(1+7, 1+3) = Beta(8,4).
– 95% equal-tailed credible interval = [qbeta(0.025, 8, 4), qbeta(0.975, 8, 4)].
– Python (SciPy): use scipy.stats.beta.ppf. Example code (conceptual, not individualized advice):

    from scipy.stats import beta
    a_post, b_post = 8, 4   # posterior Beta(8,4) from the example above
    lower, upper = beta.ppf(0.025, a_post, b_post), beta.ppf(0.975, a_post, b_post)
    print(lower, upper)     # 95% equal-tailed credible interval

Note: for Beta(8,4) the posterior mean = 8/(8+4) ≈ 0.6667. A normal approximation gives an approximate 95% interval ≈ 0.41 to 0.92; the exact Beta quantiles are similar but slightly different because the Beta is skewed.
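
A short sketch comparing the two (SciPy assumed):

    from scipy.stats import beta, norm

    a_post, b_post = 8, 4
    mean = a_post / (a_post + b_post)         # ~0.6667
    var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    print(norm.interval(0.95, loc=mean, scale=var ** 0.5))   # normal approx, ~(0.41, 0.92)
    print(beta.ppf([0.025, 0.975], a_post, b_post))          # exact Beta quantiles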

4) Highest posterior density (HPD) interval — shortest credible interval
– Definition: the HPD interval of level 1−α is the interval [L,U] that contains 1−α of the posterior probability and such that every point inside has posterior density at least as large as any point outside. For skewed posteriors, HPD is often shorter and more informative than equal-tailed intervals.
– How to compute (practical algorithms):
1. If closed-form quantiles are available, numeric optimization can locate L and U such that U > L, ∫_L^U p(θ|data)dθ = 1−α, and p(L) = p(U). Many packages include HPD functions (e.g., some Bayesian libraries).
2. Monte Carlo method (simple and robust):
– Draw M samples from the posterior distribution (for Beta, directly sample from Beta(a’, b’)).
– Sort the samples.
– For each i from 1 to M − floor((1−α)M), consider interval [sample_i, sample_{i + floor((1−α)M)}].
– Choose the shortest such interval; that’s an empirical HPD.
– This Monte Carlo approach generalizes to any posterior you can sample from.
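
A minimal NumPy implementation of that Monte Carlo recipe (an illustrative sketch):

    import numpy as np

    def hpd_interval(samples, alpha=0.05):
        s = np.sort(samples)
        k = int(np.floor((1 - alpha) * len(s)))   # samples each interval must span
        widths = s[k:] - s[:len(s) - k]           # width of every candidate interval
        i = int(np.argmin(widths))                # shortest one wins
        return s[i], s[i + k]

    rng = np.random.default_rng(0)
    draws = rng.beta(8, 4, size=100_000)          # sample the Beta(8,4) posterior
    print(hpd_interval(draws))                    # empirical 95% HPD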

5) When equal-tailed vs HPD matters
– Equal-tailed: symmetrical in probability but not in density; easier to compute and interpret.
– HPD: gives the shortest interval and better reflects the highest-density region for skewed posteriors.
– Choose equal-tailed for routine reporting and HPD for decision problems where minimizing interval length matters.

6) Combining sequential updates with credible intervals
– Maintain two running representations:
1. Log-odds scalar for fast sequential decision updates (useful for likelihood-ratio based decisions).
2. Conjugate-parameter pair (a_post, b_post) for the Beta posterior to compute means/credible intervals exactly.
– Update rules for Bernoulli/Beta model:
– After observing success (1): a_post ← a_post + 1.
– After observing failure (0): b_post ← b_post + 1.
– Posterior mean = a_post / (a_post + b_post).
– Posterior credible interval = [qbeta(α/2, a_post, b_post), qbeta(1−α/2, a_post, b_post)] or HPD via sampling.
– Example mixing log-odds and Beta updates:
– Prior odds for event with prior p0: odds0 = p0 / (1 − p0); log-odds0 = ln(odds0). See the combined sketch below.
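
A hedged sketch of maintaining both representations at once (the class and parameter names are illustrative, and the likelihood ratios are assumptions you must supply):

    import math
    from scipy.stats import beta

    class BernoulliTracker:
        def __init__(self, a, b, p0, lr_pos, lr_neg):
            self.a, self.b = a, b                        # conjugate Beta(a,b) parameters
            self.log_odds = math.log(p0 / (1 - p0))      # log-odds0 = ln(p0 / (1 - p0))
            self.lr_pos, self.lr_neg = lr_pos, lr_neg    # likelihood ratios per outcome

        def observe(self, success):
            if success:
                self.a += 1                              # Beta update for a success
                self.log_odds += math.log(self.lr_pos)
            else:
                self.b += 1                              # Beta update for a failure
                self.log_odds += math.log(self.lr_neg)

        def posterior_mean(self):
            return self.a / (self.a + self.b)

        def credible_interval(self, alpha=0.05):
            return beta.ppf([alpha / 2, 1 - alpha / 2], self.a, self.b)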