Predictors of extubation readiness in preterm infants: a systematic review and meta-analysis
  1. Wissam Shalish1,
  2. Samantha Latremouille1,
  3. Jesse Papenburg2,
  4. Guilherme Mendes Sant’Anna1
  1. 1 Department of Pediatrics, Neonatal Division, McGill University Health Center, Montreal, Quebec, Canada
  2. 2 Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada
  1. Correspondence to Dr Guilherme Mendes Sant’Anna, Department of Pediatrics, Division of Neonatology, Montreal Children’s Hospital, McGill University, Montreal, Quebec H4A 3J1, Canada; guilherme.santanna{at}

Abstract

Context A variety of extubation readiness tests have already been incorporated into clinical practice in preterm infants.

Objective To identify predictor tests of successful extubation and determine their accuracy compared with clinical judgement alone.

Methods MEDLINE, Embase, PubMed, Cochrane Library and Web of Science were searched between 1984 and June 2016. Studies evaluating predictors of extubation success during a period free of mechanical inflations in infants less than 37 weeks’ gestation were included. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. After identifying and describing all predictor tests, pooled sensitivity and specificity estimates for the different test categories were generated using a bivariate random-effects model.

Results Thirty-five studies were included, showing wide heterogeneities in population characteristics, methodologies and definitions of extubation success. Assessments ranged from a few seconds to 24 hours, provided 0–6 cmH2O positive end-expiratory pressure and measured several clinical and/or physiological parameters. Thirty-one predictor tests were identified, showing good sensitivities but low and variable specificities. Given the high variation in test definitions across studies, pooling could only be performed on a subset. The commonly performed spontaneous breathing trials had pooled sensitivity of 95% (95% CI 87% to 99%) and specificity of 62% (95% CI 38% to 82%), while composite tests offered the best performance characteristics.

Conclusions There is a lack of strong evidence to support the use of extubation readiness tests in preterm infants. Although spontaneous breathing trials are attractive assessment tools, higher quality studies are needed for determining the optimal strategies for improving their accuracy.

  • intensive care
  • neonatology
  • respiratory

What is already known on this topic?

  • The decision to extubate preterm infants is currently subjective and relies primarily on clinical judgement.

  • There is wide variability in periextubation practices and high reintubation rates across neonatal intensive care units.

  • A variety of objective extubation readiness tests, such as the spontaneous breathing trial, are increasingly being incorporated into clinical practice.

What this study adds?

  • There is a lack of strong evidence to support the use of any predictor of extubation readiness in preterm infants over clinical judgement alone.

  • Although currently studied spontaneous breathing trials are highly sensitive, they add little benefit in the identification of extubation failures.

  • Higher quality studies are needed to determine the best strategies for improving the accuracy of such predictors.

Introduction

Preterm infants commonly require intubation and mechanical ventilation (MV) after birth.1 Due to complications associated with MV, early extubation is generally recommended.2 3 However, premature extubation increases the risk of respiratory failure and reintubation, which also carries hazards.4 Therefore, both an early and successful extubation are desirable.

Currently, the decision to extubate relies primarily on clinical judgement, that is, the physician’s experience and interpretation of infants’ overall clinical stability.5 This subjective assessment has resulted in widely variable periextubation practices across neonatal intensive care units.5 For those reasons, clinicians have attempted to identify objective predictors of extubation readiness. Assessments done while patients receive invasive ventilatory support have been rather disappointing; mechanical inflations likely mask the infant’s ability to sustain breathing once disconnected from the ventilator.6 Instead, investigators have turned towards assessments of clinical and physiological parameters during a predetermined period free of mechanical inflations, either via endotracheal continuous positive airway pressure (ETT-CPAP) or through temporary disconnection from the ventilator. A variety of extubation readiness tests, particularly spontaneous breathing trials (SBT), have already been incorporated in clinical practice worldwide,5 but the evidence supporting their use has not been established. Thus, we performed a systematic review of the literature to identify predictor tests of successful extubation in preterm infants and determine their accuracy compared with clinical judgement alone.

Methods

A protocol was developed in conformity with standard guidelines on systematic reviews of diagnostic studies7 and reported using recommended Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines.8

Search strategy

A prespecified written protocol was designed with the help of medical librarians (online supplementary appendix 1). Articles in all languages between 1984 and June 2016 were searched within Ovid MEDLINE, Ovid Embase, PubMed, Cochrane Library and Web of Science. References of articles assessed for eligibility were hand-searched for additional relevant studies.

Study selection

After removing duplicates, the title and abstract of all articles were screened by one investigator (WS). Studies were eligible for full-text review if they met the following predetermined criteria: (1) study population included preterm infants <37 weeks’ gestation; (2) topic was about extubation readiness and/or extubation success/failure; and (3) full text was available. Animal or in vitro studies, review articles, conference proceedings, case reports and commentaries were excluded. Once abstracts were identified, two independent investigators (WS and SL) reviewed the articles for eligibility. Only studies that specifically evaluated potential predictors or tests of extubation readiness during a period free of mechanical inflations were included. Any discrepancies regarding final inclusions were resolved through discussion with a third reviewer (GMS).

Data extraction

Two investigators (WS and SL) independently extracted all information using a standardised piloted data collection form.

Population characteristics: the study inclusion criteria and the cohort’s birth weight and gestational age (GA) were recorded. In cases where weight and GA were reported in subgroups, weighted averages were calculated to deduce the cohort’s mean values.
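
The subgroup calculation described above amounts to a sample-size-weighted mean. A minimal sketch, assuming each subgroup reports its size and its mean; the function name and the example values are hypothetical and are not taken from any included study:

```python
def weighted_mean(subgroups):
    """Return the sample-size-weighted mean of subgroup means.

    subgroups: list of (n, mean) pairs, one pair per reported subgroup.
    """
    total_n = sum(n for n, _ in subgroups)
    if total_n <= 0:
        raise ValueError("total sample size must be positive")
    return sum(n * mean for n, mean in subgroups) / total_n


# Hypothetical example: gestational age (weeks) reported separately for
# infants who succeeded (n=30) and failed (n=10) extubation.
cohort_ga = weighted_mean([(30, 28.0), (10, 26.0)])
print(round(cohort_ga, 2))  # 27.5
```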

Reference standard: extubation was defined as the reference standard and was based on the treating physician’s clinical judgement, routine institutional practices or study-specific criteria. A note was made of the ventilator mode, settings and blood gas ranges when infants were deemed ‘ready’ for extubation, as well as type of postextubation respiratory support provided.

Index test: the index test referred to the extubation readiness assessment under evaluation. The duration, level of endotracheal positive end-expiratory pressure (PEEP) and types of physiological measurements and/or clinical observations performed during the assessment were recorded. The results of all predictors of extubation success evaluated during that assessment were also abstracted; some studies reported the means or medians in patients who were successfully and unsuccessfully extubated. Others defined a diagnostic test (using thresholds or composite definitions) and reported its sensitivity, specificity, predictive values, accuracy or area under the receiver operating characteristic (ROC) curve.

Target condition: the primary definition and time frame used to classify infants into extubation success or failure were recorded. The proportion of infants that were successfully extubated was also noted for all definitions and time frames provided.

Assessment of risk of bias

Two reviewers (WS and SL) assessed the methodological quality of included studies using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool.9 A narrative summary was produced outlining whether the studies had low, high or unclear risk of bias and any applicability concerns.

Data synthesis and analysis

A descriptive analysis was first conducted on all identified predictors of extubation readiness. A distinction was made between predictor tests that were incorporated into clinical practice (ie, extubation on the premise of passing the test) and those evaluated by cross-sectional design (ie, tests were performed but did not guide extubation). Meta-analysis was only possible for studies in which at least one predictor test was defined and evaluated by cross-sectional design. From the available data, 2×2 tables were constructed to derive sensitivity/specificity and generate coupled forest plots (Review Manager 5.3). A ‘cross-hairs’ plot was also produced (R V.3.1.0) to better display the variability in ROC space between sensitivity/specificity estimates.10 Wherever appropriate, pooled estimates of sensitivity and specificity were computed for the different types of predictor tests. Subgroups with ≥5 evaluations of the test were analysed using the bivariate random-effects model (‘metandi’ module, Stata, V.10), while those with 2–4 evaluations could only be analysed using a univariate model (Meta-DiSc software, V.1.4). A hierarchical summary ROC curve was planned to be constructed whenever more than five evaluations of the predictor test could be pooled.
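
For readers less familiar with diagnostic accuracy calculations, the step from a 2×2 table to sensitivity and specificity can be sketched as follows. This is an illustration only: the counts are hypothetical, the function names are invented for this example, and the Wilson score interval is an assumption chosen for simplicity rather than the interval method used by Review Manager. A ‘positive’ result here means the infant passed the test, and the target condition is successful extubation.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity and specificity, each with an interval, from 2x2 counts.

    tp: passed the test and extubation succeeded
    fp: passed the test but extubation failed
    fn: failed the test but extubation succeeded
    tn: failed the test and extubation failed
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": (sensitivity, wilson_interval(tp, tp + fn)),
        "specificity": (specificity, wilson_interval(tn, tn + fp)),
    }


# Hypothetical single-study counts, not data from the review.
result = accuracy_from_2x2(tp=38, fp=4, fn=2, tn=6)
for name, (estimate, (low, high)) in result.items():
    print(f"{name}: {estimate:.2f} ({low:.2f} to {high:.2f})")
```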

Results

Our search strategy yielded 3052 abstracts, of which 207 full-text articles were reviewed and 35 were included in the analysis (figure 1).11–45 A detailed outline of the quality of each included study is available in online supplementary appendix 2. Of note, all but three studies had two or more domains from the QUADAS-2 evaluation with unclear or high risk of bias and applicability concerns.

Figure 1

PRISMA flow diagram. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

The overall characteristics of the included articles are shown in table 1 and expanded in online supplementary appendix 3. There were 12 randomised controlled trials (RCTs), 22 prospective observational studies and 1 retrospective study. Studies were small (median 49 patients, IQR 35–59) and mostly single centre. All assessments of extubation readiness were performed once the patient was deemed ‘ready’ for extubation, lasted anywhere from a few seconds to 24 hours and used PEEP levels between 0 cmH2O and 6 cmH2O. Infants were exposed to different periextubation practices, weaning strategies and postextubation respiratory support modalities. Extubation success was described using varying definitions and time frames ranging from 24 hours to 120 hours of observation after extubation.

Table 1

Overall characteristics of included studies

Eighteen studies evaluated at least one index test by cross-sectional design (table 2). The most commonly investigated parameters related to tidal volume, spontaneous minute ventilation and respiratory muscle function. The majority of variables failed to classify infants into their respective extubation outcomes, except for some measures of minute ventilation and diaphragmatic function. From these variables, a large number of predictor tests were derived (online supplementary appendix 4). Test definitions were highly variable across studies and were divided into three categories: physiological, clinical and composite tests. Clinical tests defined extubation success/failure based on a combination of clinical events (apnoeas, bradycardias and desaturations) and/or blood gases. The assessment periods were either short (≤30 min), intermediate (1 hour) or prolonged (4–24 hours). Composite tests combined two or more predictors instead of evaluating each component separately. These included tests combining SBT with variability indices of breathing, assessments of the load/capacity ratio of inspiratory muscles or cardiorespiratory signal analysis.

Table 2

Predictors of extubation readiness

Thirteen studies had at least one diagnostic test for which 2×2 tables could be constructed, resulting in 31 predictor tests included in the meta-analysis. As illustrated in the forest plots (figure 2) and ‘cross-hairs’ plot (online supplementary appendix 5), predictor tests had high sensitivity but low and variable specificity. Pooled sensitivities and specificities of the different tests are shown in table 3. Minute ventilation-related tests had pooled sensitivity and specificity of 84% (95% CI 77% to 90%) and 71% (95% CI 57% to 83%), respectively, while SBTs had pooled sensitivity and specificity of 95% (95% CI 87% to 99%) and 62% (95% CI 38% to 82%), respectively. Compared with individual tests, composite tests had higher sensitivities and specificities, with more balanced tradeoffs between the two values. Given the limited number of studies evaluating each type of predictor test, no hierarchical summary ROC could be generated.

Figure 2

Forest plots of sensitivity and specificity of predictor tests of extubation readiness performed during a period free of mechanical inflations. Data are presented in order of increasing sensitivity for each test subgroup. CRS, compliance of the respiratory system; FN, false negative; FP, false positive; MIP, maximum inspiratory pressure; MVm, mechanical minute ventilation; MVs, spontaneous minute ventilation; RR, respiratory rate; SBT, spontaneous breathing trial; Te, expiratory time; Ti, inspiratory time; TN, true negative; TP, true positive; TTIdi, diaphragmatic pressure-time index; TTmus, tension time index of respiratory muscles; Ttot, total breath time; VI, variability index; Vt/Ti, mean inspiratory flow; VT, tidal volume.

Table 3

Pooled results for different predictors of successful extubation

Finally, 20 studies extubated infants on the basis of passing a predictor test; only five were evaluated using an RCT.11 12 21 23 29 Four RCTs examined the usefulness of prolonged ETT-CPAP trials (4–24 hours) compared with direct extubation from low ventilatory settings, showing no added benefits and possible harm when ETT-CPAP was used for several hours. In the most recent RCT, outcomes were compared between infants extubated after passing a minute ventilation test and those extubated on clinical judgement alone.29 Although infants receiving the test were extubated significantly sooner, there were no statistically significant differences in extubation success rates between the two groups. As for SBTs, four studies have reported using the test as part of routine practice, with extubation success rates between 67% and 78%.33 35 39 42 In the largest study, the performance of daily 3 min SBTs was compared with a historical cohort of infants extubated based on clinical judgement alone.33 Although infants in the SBT group were extubated from significantly higher ventilator settings, they had similar weaning durations and extubation success rates compared with controls.

Discussion

To our knowledge, this is the first systematic review appraising the evidence for using extubation readiness tests in preterm infants. The majority of identified studies were small, single centre and with significant risks of bias and applicability concerns. Assessments were done using heterogeneous methodologies and different definitions of extubation success, making it very difficult to infer any strong recommendation. From the meta-analysis, predictor tests had high sensitivity but low and variable specificity. For clinicians, this means that at the time a patient is deemed ‘ready’ for extubation, passing a test correctly identifies almost all patients who will have a successful extubation, but a significant proportion of infants who go on to fail extubation would also pass the test and thus be misclassified. In other words, predictors are great at reinforcing the clinician’s intent to extubate but add little to no value in detecting failures.
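
To make this interpretation concrete, the sketch below applies the pooled SBT estimates reported in this review (sensitivity 95%, specificity 62%) to a hypothetical cohort. The cohort size of 100 and the 75% extubation success rate are assumptions chosen purely for illustration, and the function name is invented for this example.

```python
def expected_outcomes(n_infants, success_rate, sensitivity, specificity):
    """Expected counts when a readiness test is applied to a cohort.

    A 'positive' test means the infant passed; the target condition is
    successful extubation.
    """
    successes = n_infants * success_rate
    failures = n_infants - successes
    tp = sensitivity * successes   # passed the test, extubation succeeded
    fn = successes - tp            # failed the test, would have succeeded
    tn = specificity * failures    # failed the test, extubation would fail
    fp = failures - tn             # passed the test, extubation failed
    return {"tp": tp, "fn": fn, "tn": tn, "fp": fp}


# Hypothetical cohort: 100 infants deemed 'ready', 75% assumed to succeed.
cohort = expected_outcomes(100, 0.75, sensitivity=0.95, specificity=0.62)
for cell, count in cohort.items():
    print(f"{cell}: {count:.2f}")
```

Under these assumptions, fewer than 4 of the 75 infants who would succeed are held back by a failed test, whereas between 9 and 10 of the 25 infants who go on to fail extubation would still have passed it.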

The fact that infants were only evaluated when deemed ‘ready’ for extubation by the clinicians introduces test-referral bias, whereby physicians’ own judgements of extubation readiness influenced which patients actually underwent the test. As such, systematically fewer patients with negative results and relatively more patients with positive results were tested, thereby overestimating sensitivity and underestimating specificity.46 47 Moreover, given that perceptions of extubation readiness can vary widely within and between studies, there is considerable heterogeneity in the pretest probability of extubation success, which in turn affects the results of diagnostic tests. Both phenomena can potentially impair internal validity and compromise generalisability of the predictor tests, as previously demonstrated.48
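
One way to see why pretest probability matters is to compute, via Bayes’ theorem, the probability of successful extubation after a passed or failed test for several assumed pretest probabilities. The sketch below holds sensitivity and specificity fixed at the pooled SBT estimates, which is itself a simplification, since the referral bias described above would also shift those values; the pretest probabilities and the function name are hypothetical.

```python
def post_test_probability(pretest, sensitivity, specificity, passed=True):
    """Probability of extubation success given the test result.

    pretest: assumed probability of success before testing (0 < pretest < 1).
    passed:  True if the infant passed the test, False otherwise.
    """
    if not 0 < pretest < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    if passed:
        numerator = sensitivity * pretest
        denominator = numerator + (1 - specificity) * (1 - pretest)
    else:
        numerator = (1 - sensitivity) * pretest
        denominator = numerator + specificity * (1 - pretest)
    return numerator / denominator


# Hypothetical pretest probabilities of extubation success.
for pretest in (0.5, 0.7, 0.9):
    after_pass = post_test_probability(pretest, 0.95, 0.62, passed=True)
    after_fail = post_test_probability(pretest, 0.95, 0.62, passed=False)
    print(f"pretest {pretest:.0%}: pass -> {after_pass:.0%}, fail -> {after_fail:.0%}")
```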

Despite the aforementioned limitations, some units have already incorporated predictors (especially SBTs) into daily practice as a way to promptly recognise an infant’s potential for extubation.5 28 33 Unfortunately, these tests are often interpreted differently and applied outside unit-specific guidelines.5 Although SBTs are attractive, their diagnostic accuracy has only been evaluated in two small single-centre studies, showing pooled sensitivity and specificity of 95% and 62%, respectively.31 40 Moreover, evidence from only two studies demonstrated that serial readiness tests did not affect extubation success rates.29 33 In contrast to neonates, the incorporation of extubation readiness tests into MV weaning protocols in adult and paediatric patients has been extensively studied,49 50 showing improved outcomes, reduced costs and decreased MV duration.51 52 Such a level of evidence is still lacking in preterm infants, but with the rising number of neonatal units developing weaning protocols,53 understanding the role of those tests during that process is critical.

As demonstrated by our review, designing a predictor of extubation readiness in preterm infants is challenging. These infants are highly vulnerable and can fail extubation for many reasons, including underdeveloped lungs, low lung compliance, high airway resistance and immature central respiratory drive. Ideally, the perfect test would accurately predict an infant’s ability to tolerate extubation by integrating all these factors and mimicking their postextubation physiological conditions. As such, the choice of duration, level of support used and measurements performed during the test could considerably influence its accuracy.

Duration

A wide range of durations (a few seconds to 24 hours) was noted across studies. Original investigations performed ETT-CPAP trials of 6–24 hours, but this practice fell out of favour after mounting concerns of high airway resistance when breathing through an endotracheal tube.54 For this reason, more recent studies curtailed the time frame to 3–5 min. Nonetheless, short trials could potentially be misleading, as they may not provide sufficient time to ensure that the highest risk patients can sustain spontaneous breathing.

Level of support

An interesting change has occurred in the amount of PEEP provided during the assessment, from 0 cmH2O to 3 cmH2O to the currently adopted 5–6 cmH2O. This stems from observations that infants subjected to low ETT-CPAP levels were at higher risk of derecruitment and extubation failure.54 However, these same infants were kept at 2–3 cmH2O for 12–24 hours, a significantly prolonged duration that may have potentiated the loss of functional residual capacity. Evidence from adult and paediatric critical care patients suggests that PEEP can reduce patient efforts by 30%–60%, significantly decreasing the respiratory load in comparison with the expected work of breathing after extubation.55 56 Evidence for this is limited in neonates, but if the postextubation period is truly characterised by relatively high upper airway resistance, then the addition of PEEP may underestimate the true failure risk.

Clinical and physiological measurements

Researchers have mostly been interested in studying tests that rely on simple physiological measurements or bedside clinical observations because of their ease of use and convenience. Unfortunately, studies investigating these predictors individually have shown suboptimal results. This is not surprising, as it is unlikely that any single predictor would accurately encompass the entire spectrum of reasons for failing extubation. Consequently, studies have begun exploring more complex assessments, such as diaphragmatic function and automated biological signal analyses, to better describe the integrity and maturity of individuals’ intrinsic cardiorespiratory behaviour.43 In fact, the combination of multiple predictors resulted in the most favourable performance characteristics. Although promising, such tests are presently impractical for clinical use and deserve further investigation.

The review had some limitations. There is no established method for formally assessing publication bias in diagnostic studies,57 and it was not possible to fit a bivariate random-effects model for most predictors because of the small number of studies evaluating them (this is the preferred method for meta-analysis of diagnostic studies, since it takes into consideration the tradeoff between sensitivity and specificity within individual studies).58 Nevertheless, the review had several strengths, including its permissive inclusion criteria, rigorous design and comprehensive data synthesis. Additionally, the review highlights some major gaps in the methodological quality of diagnostic studies of extubation readiness, emphasising the need to standardise the reporting process and achieve consensus on important outcomes of interest (eg, extubation success).59
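
To illustrate what a univariate approach does and does not capture, the sketch below pools a set of proportions on the logit scale with a DerSimonian-Laird random-effects estimator. This is an assumed, generic univariate method shown for illustration; it is not claimed to reproduce the algorithms implemented in Meta-DiSc or ‘metandi’, and the study counts are hypothetical. Because sensitivity and specificity are pooled in two separate calls, any correlation between them across studies is ignored, which is the limitation the bivariate model addresses.

```python
from math import exp, log, sqrt


def pool_logit_dl(events_totals, z=1.96):
    """Pool proportions with a DerSimonian-Laird random-effects model.

    events_totals: list of (events, total) pairs, one per study.
    Returns (pooled proportion, (lower, upper), tau squared).
    """
    ys, vs = [], []
    for events, total in events_totals:
        a = events + 0.5           # 0.5 continuity correction avoids log(0)
        b = total - events + 0.5
        ys.append(log(a / b))      # logit of the corrected proportion
        vs.append(1 / a + 1 / b)   # approximate variance of the logit

    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0

    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = sqrt(1 / sum(w_re))

    def expit(x):
        return 1 / (1 + exp(-x))

    return expit(y_re), (expit(y_re - z * se), expit(y_re + z * se)), tau2


# Hypothetical studies: (passed the test, total) among successful extubations,
# then (failed the test, total) among failed extubations.
pooled_sens = pool_logit_dl([(38, 40), (45, 50), (27, 30)])
pooled_spec = pool_logit_dl([(6, 10), (9, 20), (8, 10)])
print("sensitivity:", pooled_sens)
print("specificity:", pooled_spec)
```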

In conclusion, there is a lack of strong evidence to support using extubation readiness tests in preterm infants. Current predictors have low overall accuracy and add little benefit in the identification of extubation failures. Although SBTs are attractive assessment tools, higher quality studies are needed to determine the best duration, level of PEEP and definition of test pass/failure to guide their use in the most vulnerable infants. Moreover, a combination of clinical and physiological measurements during such assessments may further improve their accuracy.

Acknowledgements

The authors would like to thank the librarians at the McGill University Health Center, Ibtisam Mahmoud, Elena Guadagno and Alex Amar for their assistance in performing the search strategy.



  • Contributors WS had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. WS and GMS participated in the concept and design of the study. WS, SL and GMS performed the study selection and data collection. WS and SL conducted the risk of bias assessment. WS, JP and GMS performed the analysis and contributed to interpretation of the data for the work. WS drafted the manuscript, and all authors critically revised the manuscript and approved the version to be published. All authors agree to be accountable for all aspects of the work.

  • Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.