What the Delphi method is (short definition)
– The Delphi method is a structured forecasting and decision‑making process that seeks agreement (consensus) from a panel of subject‑matter experts. It uses a sequence of anonymous questionnaires. After each round the facilitator shares a summary of the group’s replies so experts can revise their answers in light of the aggregated feedback.
Origins and purpose
– Developed at RAND in the early 1950s by Olaf Helmer and Norman Dalkey, the technique was named after the Oracle of Delphi to emphasize its role in anticipating future developments. It is used when empirical data are limited and expert judgment is the best available input.
Key elements (what makes a Delphi study)
– Facilitator: a neutral coordinator who designs questionnaires, aggregates responses, and controls rounds.
– Panel of experts: selected for knowledge relevant to the question.
– Anonymity: experts respond without identifying themselves to the group, reducing dominance or status bias.
– Iteration: multiple rounds of questionnaires with controlled feedback.
– Aggregation and feedback: summaries (e.g., averages, common rationales) are shared after each round.
– Predefined stopping rule: the study defines in advance how and when consensus or sufficient stability has been reached.
Step‑by‑step process (practical)
1. Define the question or scope precisely (forecast, guideline, priorities, etc.).
2. Choose and recruit an expert panel with relevant, diverse expertise.
3. Design the first questionnaire with clear items and response formats (ratings, ranks, open comments).
4. Send round 1; collect responses and anonymized rationales.
5. Aggregate results (quantitative summaries, common themes) and prepare feedback.
6. Send round 2 with the group summary and opportunity to revise responses.
7. Repeat aggregation and further rounds until the stopping rule is met (consensus, stability, or resource limit).
8. Report results: quantitative summary, range of opinion, and any remaining disagreements.
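Expressed as pseudocode, the whole process is a loop: collect anonymous responses, aggregate, feed the summary back, and check the stopping rule. A minimal Python sketch of that loop follows; collect_round, summarize, and stop_rule are placeholders for the facilitator's own tooling, not a standard API.

    # Minimal sketch of the Delphi round loop (illustrative only).
    def run_delphi(collect_round, summarize, stop_rule, max_rounds=4):
        feedback = None                                    # no feedback before round 1
        for round_no in range(1, max_rounds + 1):
            responses = collect_round(round_no, feedback)  # anonymous replies
            feedback = summarize(responses)                # medians, IQR, rationales
            if stop_rule(feedback, round_no):              # predefined stopping rule
                break
        return feedback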
How consensus is treated
– Consensus is defined in the study protocol before the survey begins. The facilitator may use statistical summaries (means, medians, interquartile range) or a percentage agreement rule to judge whether views have converged. The facilitator also decides whether additional rounds are necessary.
Typical length and rounds
– There is no fixed number of rounds. In practice many studies use two to four rounds; some problems reach useful convergence after two rounds, while more complex topics may require more.
Advantages
– Combines expert judgment and group wisdom without face‑to‑face pressure.
– Preserves anonymity, reducing dominance, “halo,” or status effects.
– Allows experts to reflect and revise their views based on others’ reasoning.
– Can be run remotely, making it practical for geographically dispersed panels and time‑constrained projects.
Disadvantages
– Selection bias — Outcomes depend heavily on which experts are chosen and how “expert” is defined.
– Facilitator influence — The person or team running the process can shape results via question wording, presentation of feedback, or how summaries are framed.
– False consensus — Statistical convergence can hide substantive disagreement if minority viewpoints are marginalized or if questions are leading.
– Time and attrition — Multiple rounds take time; some experts drop out between rounds (attrition), which can skew results.
– Limited empirical validation — Delphi yields structured expert opinion, not empirical proof; results should be treated as informed judgment, not fact.
When to use Delphi
– Long‑range forecasting (technology adoption, market demand).
– Policy or guideline development where evidence is incomplete.
– Prioritization exercises (risk ranking, research agendas).
– Complex problems needing multidisciplinary input and anonymity to reduce group pressure.
Step‑by‑step checklist for running a Delphi study
1. Define the problem and scope clearly. State objectives, timeline, and how consensus will be judged.
2. Choose a facilitator or facilitation team. They design instruments, summarize responses, and maintain anonymity.
3. Select experts. Document selection criteria (experience, domain coverage). Aim for diversity and a sample size that balances practicality and robustness (commonly 10–50).
4. Design the instrument and response formats.
– Decide question types. Start with open-ended prompts in Round 1 to capture ideas, then convert responses into structured items (statements, scenarios, or quantitative forecasts) for subsequent rounds.
– Choose response scales. Common options: numerical probabilities (0–100%), Likert‑type agreement ratings (1–5), or ranked lists. Define anchors (what “1” and “5” mean).
– Specify what counts as “consensus” (see Step 8). Common operational rules: median within a pre‑specified band, interquartile range (IQR) below a threshold for ordinal items, or standard deviation below a threshold for continuous estimates. Define these before data collection.
– Prepare instructions and definitions. Include concise definitions for technical terms so experts rate the same thing. Provide examples.
– Plan anonymity and communication. State how you will keep responses anonymous (e.g., unique ID numbers), how feedback will be presented (aggregated statistics + anonymized comments), and rules for follow‑up.
– Plan data recording and security. Choose a survey tool that supports rounds and preserves timestamps and versioning.
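To make the design choices above concrete, a structured Round 2 item might be recorded like this. A sketch only; the field names are illustrative, not a standard schema.

    # One structured item with scale anchors and its pre-registered
    # consensus rule attached (illustrative field names).
    item = {
        "id": "Q07",
        "statement": "Technology X reaches 50% market adoption by 2030.",
        "response_type": "probability",                    # answered as 0-100%
        "anchors": {0: "impossible", 100: "certain"},
        "consensus_rule": {"metric": "IQR", "threshold": 20},  # IQR <= 20 pp
    }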
5. Pilot the instrument.
– Test with 3–5 people similar to your panel (not the panel itself). Check clarity, length, and whether answers can be aggregated.
– Revise items that produce ambiguous or widely divergent interpretations. Record pilot changes so readers can evaluate possible bias.
6. Run iterative rounds (typical: 2–4 rounds).
– Round 1: collect initial judgments and rationales. Aim for qualitative breadth plus any quantitative estimates. Deadline: 7–14 days, depending on panel availability.
– Synthesize Round 1: facilitator converts open responses into structured items and extracts representative anonymized rationales.
– Round 2: present aggregated statistics (e.g., median, range, IQR) and selected anonymized arguments. Ask experts to revise or explain persistence of their view. Deadline: 7–14 days.
– Optional Round 3+: repeat synthesis and revision. Each round typically narrows dispersion; additional rounds bring diminishing returns.
– Keep response-rate targets explicit (e.g., ≥70% per round). Send polite reminders, and track attrition.
7. Aggregate and measure consensus.
– Use robust summary statistics. For ordinal or skewed quantitative judgments prefer the median and interquartile range (IQR) over the mean and standard deviation; for probability estimates report the median and a credible interval (e.g., 25th–75th percentile). For ranked choices report the modal rank and dispersion. Avoid single‑number summaries without a dispersion measure.
– Report convergence metrics. Common practical measures:
– IQR (interquartile range): width between 75th and 25th percentiles. Narrowing IQR across rounds signals convergence.
– Percent within tolerance: share of panel whose estimate lies within a prespecified band around the median (e.g., ±10 percentage points).
– Change-in-median or change-in-IQR between rounds: absolute or relative change to show movement.
– Kendall’s coefficient of concordance (W) for ranked items: quantifies overall agreement across raters (0 = no agreement, 1 = complete agreement). Formula:
W = (12 * S) / (m^2 * (n^3 − n))
where m = number of raters, n = number of items ranked, and S = sum over items of (R_j − R̄)^2, with R_j the sum of ranks for item j and R̄ the mean rank-sum.
– Note: choose measures that match your response type (ranked vs numeric vs probabilistic).
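These measures are easy to compute directly. A Python sketch using only the standard library; note that the quartile convention used here (“lower nearest”) is one of several, and different conventions give slightly different IQRs.

    import statistics

    def quartiles(xs):
        # 'Lower nearest' convention; interpolating conventions differ slightly.
        s = sorted(xs)
        n = len(s)
        return s[int(0.25 * (n - 1))], s[int(0.75 * (n - 1))]

    def iqr(xs):
        q1, q3 = quartiles(xs)
        return q3 - q1

    def pct_within(xs, tol):
        # Share of the panel whose estimate lies within +/- tol of the median.
        med = statistics.median(xs)
        return sum(abs(x - med) <= tol for x in xs) / len(xs)

    def kendalls_w(rank_matrix):
        # rank_matrix[i][j] = rank that rater i assigned to item j.
        m, n = len(rank_matrix), len(rank_matrix[0])
        col_sums = [sum(row[j] for row in rank_matrix) for j in range(n)]
        mean_sum = sum(col_sums) / n                      # R-bar in the formula above
        s = sum((rj - mean_sum) ** 2 for rj in col_sums)  # S in the formula above
        return 12 * s / (m ** 2 * (n ** 3 - n))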
Worked numeric example (point estimates)
– Seven experts give probability estimates for Event A: [10, 15, 20, 20, 30, 35, 50] (%)
– Median = 20%.
– Q1 (25th percentile) = 15%, Q3 (75th) = 30% → IQR = 15 percentage points.
– Percent within ±10 points of median (10–30%): five of seven experts → 71.4% convergence.
– If the next round yields median 22% and IQR 10 pp, change‑in‑median = +2 pp and the IQR has narrowed by 33%.
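The same numbers can be checked with the helpers sketched above:

    est = [10, 15, 20, 20, 30, 35, 50]          # the seven estimates above
    print(statistics.median(est))               # 20
    print(quartiles(est))                       # (15, 30) -> IQR = 15
    print(round(pct_within(est, tol=10), 3))    # 0.714 (5 of 7 within 10-30)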
8. Stopping rules (when to end rounds)
– Predefine stop criteria before Round 1. Typical rules:
– Fixed max rounds (commonly 2–4).
– Minimal change thresholds: e.g., change‑in‑median < 2 percentage points and IQR reduction < 30% between rounds (little further convergence expected).
– Consensus threshold reached: e.g., at least 80% of participants inside the tolerance band around the median.
– Stability: the median unchanged over two consecutive rounds.
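These rules combine naturally into a single check. A sketch reusing statistics and pct_within from above; the thresholds are illustrative examples, not recommendations.

    def should_stop(prev, curr, round_no, max_rounds=4):
        # prev/curr: the panel's numeric estimates from consecutive rounds.
        if round_no >= max_rounds:
            return True                                   # fixed round cap
        if prev is None:
            return False                                  # need two rounds to compare
        d_median = abs(statistics.median(curr) - statistics.median(prev))
        stable = d_median < 2                             # change-in-median < 2 pp
        converged = pct_within(curr, tol=10) >= 0.80      # >=80% within +/-10 pp
        return stable and converged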
9. Report and document results.
– Report central tendency (median and mean), dispersion (IQR, SD), and any trimmed or weighted estimates.
– Document methods: panel composition, exact questions, aggregation formulas, stopping rule, number of rounds, and any weights applied.
Aggregation metrics — formulas and when to use them
– Median: the middle value when responses are sorted. Use by default for skewed distributions and small samples.
– Arithmetic mean: sum(xi)/n. Use when the distribution is roughly symmetric and outliers carry real information.
– Trimmed mean: remove top and bottom p% then average remaining values. Reduces outlier influence; specify trimming fraction.
– Weighted mean: sum(wi * xi) / sum(wi), where wi are nonnegative weights. Use only when weights are transparent (e.g., inverse calibration error); avoid ad-hoc weighting.
– IQR: Q3 − Q1, the width between the 75th and 25th percentiles. Report alongside the median as the default dispersion measure; a narrowing IQR across rounds signals convergence.
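The less common estimators are short enough to state in code (standard library only; the trimming fraction and any weights should be pre‑registered, as noted above):

    def trimmed_mean(xs, p=0.10):
        # Drop the top and bottom fraction p, then average what remains.
        s = sorted(xs)
        k = int(len(s) * p)
        kept = s[k:len(s) - k] if k > 0 else s
        return sum(kept) / len(kept)

    def weighted_mean(xs, ws):
        # Weights must be nonnegative and transparently justified.
        return sum(w * x for w, x in zip(xs, ws)) / sum(ws)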