Healthcare decisions should be based on all relevant evidence.

Network meta-analysis (NMA), also termed multiple treatment meta-analysis or mixed treatment comparisons, was developed as an extension of pairwise meta-analysis to allow comparisons of more than two interventions in a single, coherent analysis of all the relevant randomised controlled trials (RCTs).

The underlying idea is very simple: consider three friends, Anne, Ben and Charles. If we know that Ben is 7 cm taller than Anne, and that Charles is 10 cm taller than Anne, then we know that Charles is 3 cm taller than Ben, and is therefore the tallest. We can also rank the friends in terms of who is tallest as 1=Charles, 2=Ben, 3=Anne. So, by taking Anne’s height as reference and measuring the heights of the others compared with hers, we know how everyone’s height compares to each other and how to order the friends by height. The only assumption being made is that the heights we measured are an accurate reflection of the true heights of the three friends (in other words, we used a sufficiently accurate measuring tool). It is easy to see that the same relative heights and ranks would be obtained if one of the male friends had been the reference, and how the height relationships would extend if more than three friends had been measured. This is exactly how NMA works, although we also take the uncertainty (ie, the sampling error) in the relative effect estimates into account, as is standard in meta-analysis.

Suppose we are interested in comparing treatments B and C. We find one trial comparing B with A, giving a mean difference (MD) of −2.3 with a standard error (SE) of 0.45, and one trial comparing C with A, giving an MD of −4 with an SE of 0.5. This suggests that both treatments B and C are better than A, with 95% CIs that exclude no effect: (−3.18 to −1.42) for B compared with A and (−4.98 to −3.02) for C compared with A (assuming a reduction in the mean is desirable, eg, for pain). The network formed by these comparisons is given in

An example of a network of three treatments compared in two trials (solid black lines), where an indirect comparison can be made (dashed grey line). MD, mean difference.
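Following the same logic as the heights example, the indirect comparison of C with B (the dashed line) can be sketched in a few lines. This is a minimal illustration, assuming the two trials are independent and the estimates are approximately normally distributed; the function name `indirect_comparison` is hypothetical and not taken from any NMA package.

```python
import math

def indirect_comparison(md_ba, se_ba, md_ca, se_ca, z=1.96):
    """Indirect estimate of C vs B through the common comparator A.

    Assumes independent trials and approximately normal estimates.
    """
    md_cb = md_ca - md_ba                       # difference of the two relative effects
    se_cb = math.sqrt(se_ba**2 + se_ca**2)      # variances of independent estimates add
    ci = (md_cb - z * se_cb, md_cb + z * se_cb)
    return md_cb, se_cb, ci

# Values from the two trials described in the text
md_cb, se_cb, ci_cb = indirect_comparison(md_ba=-2.3, se_ba=0.45, md_ca=-4.0, se_ca=0.5)
print(f"Indirect MD (C vs B): {md_cb:.2f}, SE {se_cb:.2f}, "
      f"95% CI ({ci_cb[0]:.2f} to {ci_cb[1]:.2f})")
```

With the numbers above, the indirect MD for C versus B is about −1.7 (95% CI −3.02 to −0.38). The interval is wider than either of the two trial intervals because the uncertainty from both trials is carried into the indirect estimate.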

Suppose now that we also had evidence from a new study on the same patient population which compared treatment C with B directly, giving an MD of −1.8 (95% CI −3.66 to 0.06). Traditional hierarchies of evidence state that estimates from direct head-to-head RCTs provide the ‘best available’ evidence of intervention effects. Should we now discard the indirect evidence on C versus B, which is obtained through the common comparator A? Or perhaps we should prefer the indirect evidence since it suggests a statistically significant effect? To do either is contrary to the principle of using all relevant evidence for decision making.
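One way to use both sources is to pool them. The sketch below combines the direct and indirect estimates of C versus B by inverse-variance weighting, which is an illustrative fixed effect calculation rather than a full NMA. It assumes that the two estimates are independent and consistent, and that the SE of the direct estimate can be back-calculated from its reported 95% CI under normality; the helper names are hypothetical.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Back-calculate an SE from a 95% CI, assuming a normal approximation."""
    return (upper - lower) / (2 * z)

def inverse_variance_pool(estimates):
    """Pool (md, se) pairs with inverse-variance (fixed effect) weights."""
    weights = [1 / se**2 for _, se in estimates]
    pooled_md = sum(w * md for w, (md, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled_md, pooled_se

# Indirect estimate of C vs B from the B vs A and C vs A trials
indirect = (-4.0 - (-2.3), math.sqrt(0.45**2 + 0.5**2))
# Direct estimate of C vs B from the head-to-head trial
direct = (-1.8, se_from_ci(-3.66, 0.06))

pooled_md, pooled_se = inverse_variance_pool([direct, indirect])
pooled_ci = (pooled_md - 1.96 * pooled_se, pooled_md + 1.96 * pooled_se)
print(f"Pooled MD (C vs B): {pooled_md:.2f}, "
      f"95% CI ({pooled_ci[0]:.2f} to {pooled_ci[1]:.2f})")
```

Under these assumptions the pooled MD is about −1.73 (95% CI −2.81 to −0.66), which is more precise than either the direct or the indirect estimate alone.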

NMA relies on the same assumptions underlying pairwise meta-analysis, that is, the included studies are sufficiently homogeneous in terms of the condition being studied, the included participants and the definition of active and control interventions. In other words, we are assuming that the effects of B versus A and C versus B that would have been observed if the C versus A RCT had included all three treatments are the same as those observed in the RCTs that actually made these comparisons (apart from sampling variability). This assumption is the basis for coherent decisions whether they involve two or more treatments. One way to empirically check this is to ask: ‘given the known study and participant characteristics, if all these studies compared the same two treatments, would it be suitable to combine them in a meta-analysis?’ If the answer to this is yes, and the only distinction is that instead the studies compare different sets of interventions, then the assumption of ‘sufficient homogeneity’ is, in principle, satisfied.

Because NMA pools the relative treatment effects estimated across RCTs, within-trial randomisation is preserved. As long as the interventions of interest form a connected network of comparisons, then relative effects of each intervention compared with every other can be obtained, along with estimates of their uncertainty (eg, 95% CIs).
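Whether a set of trials forms a connected network can be checked mechanically. The following is a minimal sketch using a breadth-first search over the comparisons; the function name and the way trials are encoded as pairs of treatment labels are assumptions made for illustration.

```python
from collections import deque

def is_connected(treatments, comparisons):
    """Return True if every treatment can be reached from every other
    through a chain of trial comparisons."""
    treatments = list(treatments)
    if not treatments:
        return True
    adjacency = {t: set() for t in treatments}
    for a, b in comparisons:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {treatments[0]}
    queue = deque([treatments[0]])
    while queue:
        for neighbour in adjacency[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == set(treatments)

# The three-treatment example: B vs A and C vs A share comparator A
print(is_connected(["A", "B", "C"], [("A", "B"), ("A", "C")]))              # True
# Two separate pairs of treatments with no common comparator
print(is_connected(["A", "B", "C", "D"], [("A", "B"), ("C", "D")]))         # False
```

In the second case no relative effect can be estimated between, say, A and C without further assumptions, because no chain of trials links them.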

Tocolytic therapies of interest

No. | Intervention
1 | Placebo/control
2 | Prostaglandin inhibitors
3 | Magnesium sulfate
4 | Betamimetics
5 | Calcium channel blockers
6 | Nitrates
7 | Oxytocin receptor blockers
8 | Alcohol/ethanol
9 | Other treatments

Network plots for (A) perinatal death and (B) estimated gestational age at delivery. The size of the circles is proportional to the number of patients randomised to each intervention and width of the lines is proportional to the number of studies making each comparison. Data from the National Institute for Health and Care Excellence guideline.

The first network (

NMA simultaneously combines the relative treatment effects estimated within each study while accounting for the individual treatments being compared and correctly incorporating studies with more than two arms. Fixed or random effects models can be fitted; the latter allow for between-study heterogeneity. NMA random effects models usually assume that the between-study heterogeneity is the same across all comparisons, that is, a single measure of heterogeneity is calculated across the whole network, although models allowing for different heterogeneity for each comparison can also be fitted.
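For readers who prefer notation, a random effects NMA with a common heterogeneity parameter is often written along the following lines. This is a sketch for two-arm studies only, with treatment 1 as the reference; the symbols ($y_i$, $s_i$, $\delta_i$, $d_{jk}$, $\tau$) are introduced here for illustration and are not taken from the article. Studies with more than two arms additionally require the correlation between their relative effects to be modelled.

```latex
% Study i compares treatments j and k (two-arm studies only)
\begin{align*}
  y_i      &\sim \mathcal{N}\!\left(\delta_i,\; s_i^{2}\right)
            && \text{observed relative effect with standard error } s_i \\
  \delta_i &\sim \mathcal{N}\!\left(d_{jk},\; \tau^{2}\right)
            && \text{study-specific effect, common heterogeneity } \tau \\
  d_{jk}   &= d_{1k} - d_{1j}
            && \text{every comparison expressed through reference treatment 1}
\end{align*}
% A fixed effect model is the special case \tau = 0, so that \delta_i = d_{jk}.
```

The last line is what ties the network together: all relative effects are functions of the effects of each treatment compared with the reference.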

As a statistical model, NMA can be fitted using a frequentist or Bayesian approach.

When data are sparse, for example, for adverse or rare events, Bayesian methods have additional advantages such as the ability to better handle studies with zero cells and the potential for including any relevant prior information. However, for most NMAs only simple models are required and no prior information is used, with Bayesian approaches typically defining non-informative prior distributions for all treatment effect parameters, making results from frequentist or Bayesian analyses very similar. The main difference between these approaches is how results are presented. Results from a frequentist NMA will be presented as estimated relative effects (eg, MD, OR, etc) and a 95% CI, whereas results from a Bayesian analysis will be presented as summaries from the posterior distribution of the MD (or OR), which can be the mean or median MD (OR, etc) and their 95% credible interval (CrI).
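The difference in presentation can be illustrated with the B versus A trial from the three-treatment example (MD −2.3, SE 0.45). The sketch below assumes a flat prior and a known SE, so that the posterior for the MD is simply a normal distribution centred on the estimate; real Bayesian NMAs obtain posterior samples by simulation (eg, Markov chain Monte Carlo), and the helper name `posterior_summary` is hypothetical.

```python
import random
import statistics

def posterior_summary(samples):
    """Summarise posterior samples by mean, median and 95% credible interval."""
    ordered = sorted(samples)
    n = len(ordered)
    lower = ordered[int(0.025 * n)]          # 2.5th percentile
    upper = ordered[int(0.975 * n) - 1]      # 97.5th percentile
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "crI": (lower, upper),
    }

# Assumption: flat prior and known SE, so the posterior is Normal(-2.3, 0.45^2)
rng = random.Random(2024)
samples = [rng.gauss(-2.3, 0.45) for _ in range(100_000)]
summary = posterior_summary(samples)

# Frequentist 95% CI for the same trial, for comparison
frequentist_ci = (-2.3 - 1.96 * 0.45, -2.3 + 1.96 * 0.45)
print(summary)
print(frequentist_ci)
```

Under these simplifying assumptions the credible interval and the confidence interval agree closely, which mirrors the point that results are very similar when no prior information is used.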

Regardless of the framework used, the fit of the model to the data should be assessed, and in networks with both direct and indirect evidence contributing to estimates, the assumption of consistency should also be statistically assessed. This can be done by comparing the results obtained using the direct evidence alone with the results obtained using the indirect evidence alone.
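For a single comparison, one simple version of this check is a z-test on the difference between the direct and indirect estimates (sometimes called the Bucher method). The sketch below applies it to the earlier three-treatment example, where C versus B has a direct estimate from the head-to-head trial and an indirect estimate through A. It assumes the two estimates are independent and approximately normal; the function name is hypothetical, and larger networks generally need model-based approaches.

```python
import math
from statistics import NormalDist

def inconsistency_test(direct_md, direct_se, indirect_md, indirect_se):
    """z-test for the difference between direct and indirect estimates."""
    diff = direct_md - indirect_md
    se_diff = math.sqrt(direct_se**2 + indirect_se**2)
    z = diff / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p value
    return diff, se_diff, z, p_value

# Direct C vs B: MD -1.8, SE back-calculated from the 95% CI (-3.66 to 0.06)
direct_md, direct_se = -1.8, (0.06 - (-3.66)) / (2 * 1.96)
# Indirect C vs B through A: difference of the C vs A and B vs A estimates
indirect_md, indirect_se = -4.0 - (-2.3), math.sqrt(0.45**2 + 0.5**2)

diff, se_diff, z, p_value = inconsistency_test(direct_md, direct_se,
                                               indirect_md, indirect_se)
print(f"Difference {diff:.2f} (SE {se_diff:.2f}), z = {z:.2f}, p = {p_value:.2f}")
```

Here the difference between the two sources is about −0.1 with a large p value, so there is no evidence of inconsistency in this example. A non-significant result should still be read with care, since such tests often have low power.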

Mean differences and 95% CrI for EGA at delivery (in weeks) from the pairwise and network meta-analyses

Intervention | Placebo/control | Prostaglandin inhibitors | Magnesium sulfate | Betamimetics | Calcium channel blockers | Nitrates | Oxytocin receptor blockers
Placebo/control | . | 2.32 (1.27 to 3.35) | 1.29 (0.29 to 2.27) | 1.25 (0.40 to 2.07) | 1.69 (0.69 to 2.66) | 1.65 (0.52 to 2.78) | 0.68 (−1.32 to 2.67)
Prostaglandin inhibitors | 3.27 (1.68 to 4.78) | . | −1.04 (−2.01 to −0.04) | −1.08 (−2.08 to −0.05) | −0.64 (−1.68 to 0.42) | −0.67 (−1.97 to 0.67) | −1.65 (−3.76 to 0.52)
Magnesium sulfate | −0.14 (−1.60 to 1.28) | −0.23 (−1.45 to 0.97) | . | −0.04 (−0.99 to 0.91) | 0.40 (−0.51 to 1.31) | 0.36 (−0.88 to 1.63) | −0.61 (−2.69 to 1.50)
Betamimetics | 1.91 (0.90 to 2.90) | −1.56 (−3.42 to 0.28) | −0.19 (−2.78 to 2.45) | . | 0.44 (−0.32 to 1.20) | 0.40 (−0.54 to 1.37) | −0.57 (−2.58 to 1.47)
Calcium channel blockers | na | −0.53 (−2.32 to 1.25) | −0.02 (−1.25 to 1.22) | 0.80 (−0.08 to 1.67) | . | −0.03 (−1.16 to 1.10) | −1.01 (−2.98 to 0.99)
Nitrates | 0.17 (−1.72 to 2.06) | na | na | −0.58 (−0.47 to 1.67) | 1.30 (−1.07 to 3.68) | . | −0.98 (−3.15 to 1.21)
Oxytocin receptor blockers | 0.90 (−1.74 to 3.53) | na | na | na | −1.21 (−3.66 to 1.23) | na | .

The upper diagonal displays the mean differences for the column intervention vs the row intervention, derived from the NMA. Values >0 favour the column-defining intervention. The lower diagonal displays the mean differences for the row intervention vs the column intervention, derived from direct comparisons only. Values >0 favour the row-defining intervention.

Adapted from the National Institute for Health and Care Excellence guideline.

CrI, credible interval; EGA, estimated gestational age; na, not available; NMA, network meta-analysis.

The cells marked ‘na’ in the lower diagonal denote that no direct evidence was available for that comparison (eg, calcium channel blockers vs placebo), whereas the NMA can make all the comparisons and shows that there is evidence of an increase in EGA at delivery for all interventions compared with placebo, except oxytocin receptor blockers (MD 0.68, 95% CrI −1.32 to 2.67).
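This reading of the table can be reproduced by checking which 95% CrIs for the NMA estimates against placebo/control exclude zero. The snippet below is only an illustration of that reading; the values are copied from the placebo/control row of the table and the variable names are hypothetical.

```python
# NMA mean differences in EGA at delivery (weeks) vs placebo/control:
# (estimate, lower 95% CrI, upper 95% CrI), copied from the table
nma_vs_placebo = {
    "Prostaglandin inhibitors": (2.32, 1.27, 3.35),
    "Magnesium sulfate": (1.29, 0.29, 2.27),
    "Betamimetics": (1.25, 0.40, 2.07),
    "Calcium channel blockers": (1.69, 0.69, 2.66),
    "Nitrates": (1.65, 0.52, 2.78),
    "Oxytocin receptor blockers": (0.68, -1.32, 2.67),
}

def excludes_zero(lower, upper):
    """True if the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0

supported = [name for name, (_, lo, hi) in nma_vs_placebo.items()
             if excludes_zero(lo, hi)]
uncertain = [name for name, (_, lo, hi) in nma_vs_placebo.items()
             if not excludes_zero(lo, hi)]

print("CrI excludes zero:", supported)
print("CrI includes zero:", uncertain)
```

An interval that excludes zero is a convenient summary, but the width of each interval matters as much as which side of zero it falls on.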

Triangle tables can also be used to report two different outcomes, with one in the top half and the other in the bottom half. This can be a good way to display results from two important outcomes, for example, effectiveness and acceptability.

Posterior rank statistics and probabilities for the outcome EGA at delivery

Intervention | Mean rank | Median rank | 95% CrI of rank | Probability best | Probability top 3
Placebo/control | 6.74 | 7 | (6 to 7) | 0.00 | 0.00
Prostaglandin inhibitors | 1.38 | 1 | (1 to 4) | 0.74 | 0.97
Magnesium sulfate | 4.26 | 4 | (2 to 6) | 0.01 | 0.28
Betamimetics | 4.48 | 5 | (2 to 6) | 0.00 | 0.16
Calcium channel blockers | 2.84 | 3 | (1 to 5) | 0.07 | 0.76
Nitrates | 3.04 | 3 | (1 to 6) | 0.13 | 0.65
Oxytocin receptor blockers | 5.27 | 6 | (1 to 7) | 0.05 | 0.19

An alternative way of reporting ranks is to consider the cumulative probabilities using the surface under the cumulative ranking curve (SUCRA).
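SUCRA values are not reported in the rank table, but they can be recovered from the mean ranks using the identity SUCRA = (a − mean rank)/(a − 1), where a is the number of interventions. The identity is a standard result that is not stated in the article, so treat the sketch below as an illustration under that assumption; the mean ranks are copied from the table and the function name is hypothetical.

```python
# Posterior mean ranks for EGA at delivery, copied from the rank table
mean_ranks = {
    "Placebo/control": 6.74,
    "Prostaglandin inhibitors": 1.38,
    "Magnesium sulfate": 4.26,
    "Betamimetics": 4.48,
    "Calcium channel blockers": 2.84,
    "Nitrates": 3.04,
    "Oxytocin receptor blockers": 5.27,
}

def sucra_from_mean_rank(mean_rank, n_interventions):
    """SUCRA from the mean rank: 1 means always best, 0 means always worst."""
    return (n_interventions - mean_rank) / (n_interventions - 1)

sucra = {name: sucra_from_mean_rank(rank, len(mean_ranks))
         for name, rank in mean_ranks.items()}

# Print interventions from highest to lowest SUCRA
for name, value in sorted(sucra.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.2f}")
```

Because SUCRA is a rescaling of the mean rank, it carries the same information and the same limitation: a single number per intervention says nothing about how uncertain the ranking is.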

All probabilities, SUCRA values and rankings should be interpreted with caution as they are very sensitive to the uncertainty in the relative treatment effects used to produce them. Measures or displays which capture this uncertainty such as a table of rank statistics with 95% CrI (

Importantly, all results should be interpreted taking into account the uncertainty in the estimates (conveyed by the 95% CrI) as well as the risk of bias in the included evidence. Tools that allow an examination of the impact of studies at risk of bias,

When more than two interventions are being considered, synthesis of RCTs using an NMA will ensure that all the relevant evidence, whether direct or indirect, is used to produce coherent estimates of the relative effects of every intervention compared with every other. This allows for more efficient use of the relevant evidence, which can increase the precision of the estimates. In addition, because multiple sources of evidence are used, the final estimates are more robust than if only direct sources of evidence were included, that is, they are less likely to be influenced by the inclusion or exclusion of a single trial. The underlying assumption is that there are no participant or study characteristics that would modify the relative treatment effect of each treatment compared with every other.

Relying on multiple pairwise meta-analyses, each including a different set of trials, may lead to incoherent decisions and does not make the best use of the available evidence.

It is important to display NMA results carefully to aid interpretation and to clinically and statistically assess the plausibility of the assumptions made.

SD drafted the manuscript, DMC contributed to the manuscript. Both authors approved the final version.

SD was funded by the Medical Research Council, UK (MR/M005232/1). DC was supported by The Centre for the Development and Evaluation of Complex Interventions for Public Health Improvement (DECIPHer), a UKCRC Public Health Research Centre of Excellence. Joint funding (MR/KO232331/1) from the British Heart Foundation, Cancer Research UK, Economic and Social Research Council, Medical Research Council, the Welsh Government and the Wellcome Trust, under the auspices of the UK Clinical Research Collaboration, is gratefully acknowledged.

Competing interests: None declared.

Patient consent for publication: Not required.

Provenance and peer review: Commissioned; externally peer reviewed.