sampling errors - DominionFX

– A sampling error is the difference between a value estimated from a sample and the true value in the full population. It arises because a sample, even a randomly drawn one, is only an approximation of the population.
– Sampling error is distinct from systematic bias (sampling bias) and non-sampling errors (measurement, processing, coverage mistakes).
– You can quantify sampling uncertainty with a margin of error: for means, MOE ≈ Z × (σ / √n); for proportions, MOE ≈ Z × √[p(1 − p)/n]. When σ is unknown, substitute the sample standard deviation s, and for small n use a t critical value in place of Z.
– Practical ways to reduce sampling error include increasing sample size, using appropriate sampling designs (random, stratified, cluster as appropriate), improving response rates, and ensuring a correct sample frame.

Understanding sampling errors
A sampling error is the random fluctuation between a statistic computed from a sample and the corresponding parameter in the whole population. It is inevitable whenever you study only part of the population. Two important points:
– Sampling error is random variability — it does not imply careless work or bias by itself.
– A well-designed sample (random, sufficiently large, correctly framed) minimizes sampling error but cannot eliminate it entirely.

Types of sampling-related problems (definitions and examples)
1. Population specification error
– What it is: The researcher defines the target population incorrectly (who should be in or out).
– Example: Surveying ages 15–25 to estimate buying decisions, when many in that group aren’t primary purchasers.
– Practical fix: Write precise inclusion/exclusion criteria and validate them in a pilot.

2. Selection error (self-selection or voluntary response)
– What it is: The chosen sampling process allows participants to opt in or unevenly selects certain types of respondents.
– Example: Publishing an online poll and only hearing from highly motivated respondents.
– Practical fix: Use probability-based sampling, actively recruit hard-to-reach groups, and offer incentives or follow-ups.

3. Sample-frame error (coverage error)
– What it is: The list or frame used to select the sample does not match the target population.
– Example: Using a phone directory that omits cell-only households for a general-population survey.
– Practical fix: Build or buy a representative frame, combine multiple frames, or apply post-survey weighting (carefully).

4. Nonresponse error
– What it is: Selected respondents cannot be contacted or refuse to take part, and their nonresponse is correlated with the outcome.
– Example: A satisfaction survey where dissatisfied customers are less likely to reply.
– Practical fix: Follow-up contacts, incentives, mixed-mode data collection (phone + web + mail), and nonresponse analysis with weighting adjustments.

Calculating sampling error — formulas and procedure
– For a sample mean (continuous variable):
• Standard error (SE) = σ / √n (σ = population standard deviation). When σ is unknown, use s (the sample SD) in its place.
• Margin of error (MOE) at a chosen confidence level (e.g., 95%): MOE = Z × SE. For 95% confidence, Z ≈ 1.96. For small samples where s replaces σ, use a t critical value with n − 1 degrees of freedom instead of Z.
• Example: σ = 15, n = 100 → SE = 15/10 = 1.5; MOE (95%) = 1.96 × 1.5 ≈ 2.94 units.
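
The mean example above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names z_critical and mean_margin_of_error are illustrative, and it uses a normal critical value, so it is not appropriate for small samples where a t value is needed.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided standard-normal critical value (about 1.96 for 0.95)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_margin_of_error(sd: float, n: int, confidence: float = 0.95) -> float:
    """MOE = Z * (sd / sqrt(n)), using a normal rather than a t critical value."""
    standard_error = sd / sqrt(n)
    return z_critical(confidence) * standard_error


# Values from the example in the text: sigma = 15, n = 100.
moe = mean_margin_of_error(sd=15, n=100)
print(f"SE = {15 / sqrt(100):.2f}, MOE (95%) = {moe:.2f}")
# prints: SE = 1.50, MOE (95%) = 2.94
```

Passing a different confidence value (for example 0.99) changes only the critical value, not the standard error.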

– For a proportion p (binary outcome):
• SE = √[p(1 − p) / n]
• MOE = Z × SE
• A conservative default for sample planning is p = 0.5, because p(1 − p), and therefore SE, is largest there.
• Example: p = 0.40, n = 1,000 → SE = √[0.4×0.6/1000] ≈ 0.0155; MOE (95%) ≈ 1.96×0.0155 ≈ 0.030 ≈ ±3.0 percentage points.
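
As a rough check of the proportion formulas, the sketch below reproduces the p = 0.40 example and confirms that p = 0.5 gives the largest standard error. The helper names and the constant Z_95 are illustrative choices, and Z is fixed at the rounded value 1.96 used in the text.

```python
from math import sqrt

Z_95 = 1.96  # rounded two-sided 95% critical value, as used in the text


def proportion_se(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return sqrt(p * (1 - p) / n)


def proportion_moe(p: float, n: int, z: float = Z_95) -> float:
    """Margin of error for a proportion: z * SE."""
    return z * proportion_se(p, n)


moe_40 = proportion_moe(0.40, 1000)
moe_50 = proportion_moe(0.50, 1000)
print(f"p = 0.40: MOE = {moe_40:.4f}")  # about 0.0304, i.e. roughly 3.0 points
print(f"p = 0.50: MOE = {moe_50:.4f}")  # about 0.0310, the worst case

# p = 0.5 gives the largest SE of any proportion on a 1% grid.
worst_p = max((k / 100 for k in range(1, 100)), key=lambda p: proportion_se(p, 1000))
print(f"SE is largest at p = {worst_p}")
```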

– Finite population correction (FPC): if the sample is a substantial fraction (more than about 5%) of the population, multiply SE by √[(N − n)/(N − 1)], where N is the population size and n is the sample size.
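
To see how much the correction matters, here is a small sketch that applies the FPC to a proportion's standard error. The population of 2,000 and sample of 400 are made-up numbers chosen so that the sample is 20% of the population; the function name fpc is illustrative.

```python
from math import sqrt


def fpc(population_size: int, sample_size: int) -> float:
    """Finite population correction factor: sqrt((N - n) / (N - 1))."""
    return sqrt((population_size - sample_size) / (population_size - 1))


# Hypothetical survey: n = 400 drawn from a population of N = 2,000, p = 0.5.
N, n, p = 2000, 400, 0.5
se_uncorrected = sqrt(p * (1 - p) / n)
se_corrected = se_uncorrected * fpc(N, n)

print(f"FPC factor      = {fpc(N, n):.4f}")
print(f"SE uncorrected  = {se_uncorrected:.4f}")
print(f"SE with FPC     = {se_corrected:.4f}")
```

The factor is always at most 1, approaches 1 when n is tiny relative to N, and reaches 0 when the whole population is sampled, at which point there is no sampling error left.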

Sampling error vs. standard error vs. margin of error
– Standard error (SE) is the estimated standard deviation of a sample statistic across repeated samples; it measures sampling variability.
– Margin of error (MOE) is a multiple of the SE (Z or t) that gives the radius of a confidence interval.
– People sometimes use “sampling error” to mean MOE, but strictly it’s the difference between sample estimate and true population parameter (an unobserved quantity).
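
The three terms can be told apart in a small simulation. The sketch below builds a hypothetical population (normal, mean 100, SD 15, all invented for illustration), draws many samples of size 100, and compares the spread of the individual sampling errors with the formula σ / √n. In a real study the sampling error of any one sample is unobserved; it is visible here only because the simulated population is fully known.

```python
import random
from math import sqrt
from statistics import mean, pstdev, stdev

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population with a known mean and SD.
population = [random.gauss(100, 15) for _ in range(50_000)]
mu = mean(population)
sigma = pstdev(population)

n = 100
sample_means = [mean(random.sample(population, n)) for _ in range(2_000)]

# Sampling error: the gap between each sample's estimate and the true mean.
sampling_errors = [m - mu for m in sample_means]

# Standard error: the SD of the estimate across repeated samples.
empirical_se = stdev(sample_means)
theoretical_se = sigma / sqrt(n)

# Margin of error: a multiple of the SE that bounds most sampling errors.
moe = 1.96 * theoretical_se
coverage = sum(abs(e) <= moe for e in sampling_errors) / len(sampling_errors)

print(f"theoretical SE = {theoretical_se:.3f}, empirical SE = {empirical_se:.3f}")
print(f"share of samples with |error| <= MOE: {coverage:.3f}")
```

The empirical SE should land close to σ / √n, and roughly 95% of the simulated sampling errors should fall within the margin of error.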

Sampling error vs. sampling bias (and vs. non-sampling error)
– Sampling error: random differences due to limited sample size (reduced by larger n).
– Sampling bias (systematic error): consistent deviation in one direction due to faulty design (e.g., wrong frame, self-selection). Not fixed by increasing n.
– Non-sampling errors: other errors not due to sampling — measurement error (bad questions), data-entry mistakes, interviewer effects, etc.

Practical steps to reduce sampling error (actionable checklist)
1. Increase the sample size
• Doubling n shrinks SE by a factor of √2 (about 29%), so halving the MOE takes roughly four times the sample. Determine n from the desired MOE and confidence level:
• For proportions: n ≈ (Z^2 * p(1 − p)) / MOE^2 (use p = 0.5 if unknown).
• For means: n ≈ (Z^2 * σ^2) / MOE^2.
2. Use probability-based selection
• Simple random, stratified, or multi-stage cluster sampling gives every unit a known, nonzero chance of selection, which supports representativeness and allows correct SE estimation.
3. Stratify intelligently
• Stratify by known subgroups (age, region, income) to reduce within-stratum variability and obtain more precise subgroup estimates.
4. Improve the sample frame
• Ensure the sampling frame covers the target population; combine frames where necessary.
5. Boost response rates
• Multiple contact attempts, reminders, mixed modes (phone + web + mail), incentives, and short, clear questionnaires reduce nonresponse bias.
6. Weight and adjust
• Apply post-stratification or raking weights to align sample marginals with known population margins; evaluate increased variance due to weighting.
7. Pilot and replicate
• Pretest instruments and replicate studies or repeat sampling at different times to check stability.
8. Monitor and document nonresponse
• Compare respondents and nonrespondents on known characteristics; adjust weights or use imputation if nonresponse is systematic.
9. Use appropriate statistical methods for inference
• Use t-distribution for small samples, account for design effects when using complex samples, and apply finite-population correction when appropriate.
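
Two items on this checklist reduce to simple arithmetic, sketched below. The sample-size functions implement the planning formulas from step 1, rounding up to a whole respondent. The last function uses Kish's approximation for effective sample size, n_eff = (Σw)² / Σw², which the text does not name; it is included here as one common way to gauge the variance cost of weighting mentioned in step 6. All function names are illustrative.

```python
from math import ceil

Z_95 = 1.96  # rounded two-sided 95% critical value


def n_for_proportion(moe: float, p: float = 0.5, z: float = Z_95) -> int:
    """Sample size for a proportion: n = z^2 * p * (1 - p) / moe^2, rounded up."""
    return ceil(z**2 * p * (1 - p) / moe**2)


def n_for_mean(moe: float, sigma: float, z: float = Z_95) -> int:
    """Sample size for a mean: n = z^2 * sigma^2 / moe^2, rounded up."""
    return ceil(z**2 * sigma**2 / moe**2)


def kish_effective_n(weights: list[float]) -> float:
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


print(n_for_proportion(moe=0.03))        # 1068 respondents for +/-3 points
print(n_for_proportion(moe=0.015))       # 4269: half the MOE, about 4x the n
print(n_for_mean(moe=3, sigma=15))       # 97
print(kish_effective_n([1, 1, 1, 3]))    # 3.0: four responses behave like three
```

These figures assume simple random sampling; for a clustered or otherwise complex design, the required n is typically inflated by an estimated design effect.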

How sampling errors apply to real life — examples and consequences
– Public opinion polls: Small n or poor frame (e.g., landline-only lists) can produce misleading estimates of election support.
– Market research: Surveying only current customers produces biased forecasts about wider market demand.
– Audits: Transaction sampling must be well-designed to detect material misstatement; small samples risk missing fraud.
– Economic indicators: Large-scale government surveys (e.g., the U.S. Bureau of Labor Statistics’ employment surveys) use very large samples so sampling error is small relative to the estimates — but the remaining uncertainty is still reported.

Concrete examples
1. Polling example
• A poll of 1,000 likely voters with p̂ = 0.52 favoring Candidate A:
• SE = √[0.52×0.48/1000] ≈ 0.0158; MOE (95%) ≈ 1.96×0.0158 ≈ 0.031 = ±3.1 percentage points.
• Interpretation: Candidate A’s share is 52% ± 3.1 percentage points (95% CI: 48.9% to 55.1%), ignoring bias and non-sampling errors. Because the interval includes 50%, the poll alone does not establish a majority.
2. Mean spending example
• Sample mean monthly spending = $200, sample SD s = $60, n = 150:
• SE = 60/√150 ≈ 4.90; MOE (95%) ≈ 1.96×4.90 ≈ 9.6; CI ≈ $200 ± $9.6 (about $190.40 to $209.60).
3. Sampling frame error scenario
• If an online service surveys only its email list, it misses non-email users — coverage error that increasing n won’t fix.

When increasing sample size isn’t enough
– If the sample is biased (e.g., frame misses a subgroup or response patterns differ systematically), larger n still yields a precise but wrong answer. Address design and coverage first.
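
A short simulation makes the "precise but wrong" point concrete. Everything in it is hypothetical: a population in which 70% of people are on an email list and spend about $220 on average, while the other 30% spend about $150. Sampling only from the email list stands in for a frame that misses a subgroup.

```python
import random
from statistics import mean

random.seed(7)  # fixed seed so the run is repeatable

# Hypothetical population of (on_email_list, monthly_spend) pairs.
population = []
for _ in range(100_000):
    on_list = random.random() < 0.70
    spend = random.gauss(220 if on_list else 150, 40)
    population.append((on_list, spend))

true_mean = mean(spend for _, spend in population)
full_frame = [spend for _, spend in population]
email_frame = [spend for on_list, spend in population if on_list]

results = {}
for n in (100, 1_000, 10_000):
    results[n] = {
        "full_frame": mean(random.sample(full_frame, n)),
        "email_only": mean(random.sample(email_frame, n)),
    }
    print(
        f"n = {n:>6}: full frame = {results[n]['full_frame']:.1f}, "
        f"email only = {results[n]['email_only']:.1f}, true = {true_mean:.1f}"
    )
```

As n grows, the full-frame estimate settles near the true mean, while the email-only estimate settles just as tightly around a value roughly $20 too high. The gap is coverage bias, and no sample size removes it.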

Fast fact
– Large government surveys reduce sampling error by using very large samples; for example, some labor surveys sample tens or hundreds of thousands of units to shrink sampling variability (see national statistical agency publications for current figures).

How to report sampling uncertainty correctly
– Always report:
• The point estimate (mean or proportion).
• The margin of error and confidence level (e.g., ±3 percentage points at 95% confidence).
• The sample size and sampling design (simple random, stratified, cluster).
• Any weighting or adjustments used and limitations (possible sources of bias).

Bottom line
Sampling error quantifies the unavoidable uncertainty from studying a subset of a population. It can be reduced (but not eliminated) by larger, better-designed samples, higher response rates, correct frames, and appropriate analysis. However, systematic errors (sampling bias and non-sampling errors) must be addressed by improving design and data-collection procedures — otherwise a precise estimate can still be inaccurate.

Sources and further reading
– Investopedia: “Sampling Error” (Paige McLaughlin).
– For methodology and official survey practice, see your national statistical office (e.g., Bureau of Labor Statistics) publications and sampling textbooks (e.g., Cochran, Sampling Techniques).