Abstract
Objective To compare the association of the severity categories of the 2001-National Institutes of Health (NIH), the 2018-NIH and the 2019-Jensen bronchopulmonary dysplasia (BPD) definitions with neurodevelopmental and respiratory outcomes at 2 and 5 years’ corrected age (CA), and several BPD risk factors.
Design Single-centre historical cohort study with retrospective data collection.
Setting Infants born between 2009 and 2015 at the Amsterdam University Medical Centers, location Amsterdam Medical Center.
Patients Preterm infants born at gestational age (GA) <30 weeks and surviving up to 36 weeks’ postmenstrual age.
Interventions Perinatal characteristics, (social) demographics and comorbidities were collected from the electronic patient records.
Main outcome measures The primary outcomes were neurodevelopmental impairment (NDI) or late death, and respiratory morbidity at 2 and 5 years’ CA. Using logistic regression and Brier scores, we investigated if the ordinal grade severity is associated with incremental increase of adverse long-term outcomes.
Results 584 preterm infants (median GA: 28.1 weeks) were included and classified according to the three BPD definitions. None of the definitions showed a clear ordinal incremental increase of risk for any of the outcomes with increasing severity classification. No significant differences were found between the three BPD definitions (Brier scores 0.169–0.230). Respiratory interventions, but not GA, birth weight or small for GA, showed an ordinal relationship with BPD severity in all three BPD definitions.
Conclusion The severity classification of three BPD definitions showed low accuracy of the probability forecast on NDI or late death and respiratory morbidity at 2 and 5 years’ CA, with no differences between the definitions.
- Respiratory
- Follow-Up Studies
- Intensive Care Units, Neonatal
- Neonatology
Data availability statement
Data are available upon reasonable request.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
WHAT IS ALREADY KNOWN ON THIS TOPIC
Bronchopulmonary dysplasia (BPD) remains the most common complication of prematurity. Studies have shown that different BPD definitions yield considerably different reported incidences, but show no evident differences in discriminating performance for long-term neurodevelopmental and respiratory outcomes.
WHAT THIS STUDY ADDS
This historical cohort study with retrospective data collection shows that no current BPD definition is superior in classifying BPD severity, and that all definitions lack good calibration, meaning that the risk of long-term respiratory and neurological outcomes does not rise correspondingly with every incremental increase in BPD severity.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
If clinicians and researchers consider it important that BPD is defined on an ordinal severity scale, future initiatives need to focus on the calibration as well as the discriminative predictive performance of these definitions.
Introduction
The validity of the current bronchopulmonary dysplasia (BPD) definitions in preterm infants is under debate.1 2 Improved neonatal care has led to increased survival of extremely preterm infants at earlier stages of lung development, triggering the development of various new BPD definitions.3–5 Studies have shown that these new and different definitions have considerable differences in the reported incidences of BPD, which may hamper valid comparison of BPD rates in benchmarking projects and interpretation of neonatal trials.1 6 7 To avoid this unwanted variation, it is imperative that the neonatal community adopts one uniform definition of BPD.
The optimal BPD definition should identify those infants at risk of long-term adverse neurodevelopmental and respiratory outcomes. As most current BPD definitions classify infants based on BPD severity using descriptive terminology (mild, moderate, severe) or grades (I–II–III), the level of BPD severity should ideally also correlate with the degree of risk of adverse outcomes. Furthermore, the level of BPD severity should also correlate with the level of exposure to known risk factors for BPD such as lower gestational age (GA), lower birth weight, duration of invasive mechanical ventilation (IMV) and supplemental oxygen. Some studies have shown this ordinal association between disease severity and its short-term risk factors as well as long-term adverse outcomes.8 9 However, to date, it is unclear if this association differs between the most recently published grade-based BPD definitions.3–5 9 This information could help select the currently best-performing BPD definition.
Therefore, the first aim of this study was to compare the association between BPD severity and long-term neurodevelopmental and respiratory outcomes at 2 and 5 years’ corrected age (CA) of three severity-based BPD definitions. The second aim was to investigate the association between BPD severity and perinatal characteristics and respiratory interventions.
Methods
Study population
In this single-centre historical cohort study with retrospective data collection, preterm infants with a GA less than 30 weeks, admitted within 24 hours after birth to the neonatal intensive care unit of the Emma Children’s Hospital Amsterdam University Medical Centers, between January 2009 and December 2015, and surviving up to 36 weeks’ postmenstrual age (PMA), were included.10 Severe congenital malformations and parental refusal to reuse clinical data were exclusion criteria.
Study outcomes
Infants were categorised by BPD severity at 36 weeks’ PMA following the diagnostic criteria of the 2001-National Institutes of Health (NIH), 2018-NIH and the 2019-Jensen definition.3–5 11 High-flow nasal cannula was implemented in daily practice in our unit in 2013. Our primary outcomes were the composite outcome of neurodevelopmental impairment (NDI) or late death (after 36 weeks’ PMA up until follow-up), and respiratory morbidity at 2 and 5 years’ CA. The specific definitions of the primary and secondary outcomes and how these were assessed can be found in the online supplemental material.
Statistical analysis
We investigated the accuracy of the probability forecast (calibration) of the primary composite outcome NDI or late death and respiratory morbidity at 2 and 5 years’ CA.12 Patient characteristics of the cohort were described and compared with those of infants lost to follow-up, using tests appropriate to the distribution of each variable. Multiple imputation was performed 10 times based on clinical characteristics and socioeconomic status. Analyses were performed in each imputation set separately and combined using Rubin’s rules.13
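As an illustration of how per-imputation results are combined under Rubin’s rules, the following is a minimal sketch rather than the analysis code used in this study (which was run in R). The function name `pool_rubin` and the input numbers are hypothetical; only three imputations are shown to keep the example short.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (n - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)      # pooled estimate and pooled standard error


# Hypothetical log-odds ratios and squared standard errors from three imputed data sets
log_or, se = pool_rubin([0.9, 1.1, 1.0], [0.04, 0.05, 0.045])
```

The pooled standard error is larger than the average within-imputation standard error because the between-imputation spread reflects the uncertainty introduced by the missing data.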
For each separate BPD definition, a logistic regression with Firth’s correction was performed because of expected imbalances between the incidence of BPD per severity category.14 We calculated ORs with 95% CIs for each severity category of the three BPD definitions, using the no BPD group of each definition as the reference category. Next, we assessed the accuracy of the probability forecast (calibration) of the three BPD definition models with the Brier score, which is the mean squared deviation between the predicted probabilities and their respective outcomes. The Brier score can range from 0 for a perfect model to 0.25 for a non-informative model,15 and was calculated for each model. We compared the two recently published definitions with the 2001-NIH definition as reference category using a two-sided t-test with a null hypothesis of no difference in Brier score. For our secondary objective, we calculated the median (IQR) or mean (SD) of the perinatal characteristics and respiratory interventions per BPD severity, and created three different logistic regression models showing the association between these parameters as predictors and the BPD severity of the three definitions as outcome. A p value of <0.05 was considered statistically significant. Statistical analyses were performed with R statistical software (V.3.6.3 for Windows) and RStudio (integrated development for R, Boston, 2020; RStudio Desktop 1.3.1093, packages mice and logistf).
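The Brier score described above can be sketched in a few lines. This is an illustrative example with made-up outcomes, not the study’s R code; the function name `brier_score` is hypothetical. It shows the two reference points mentioned in the text: 0 for perfect predictions and 0.25 when every infant is assigned a probability of 0.5.

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)


# Hypothetical outcomes for four infants (1 = adverse outcome, 0 = no adverse outcome)
outcomes = [1, 0, 0, 1]

perfect = brier_score([1.0, 0.0, 0.0, 1.0], outcomes)  # every prediction matches the outcome
coin_flip = brier_score([0.5] * 4, outcomes)           # constant 0.5 prediction for everyone
```

Lower values indicate predicted probabilities that lie closer to the observed outcomes.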
Results
Patient characteristics
During the study period, 777 infants were eligible. Of these infants, 162 infants (20.8%) died before 36 weeks’ PMA and 31 infants (4%) were not included because of either congenital abnormalities, no parental approval for use of data or admission to the research centre after 24 hours of life (online supplemental figure 1). The remaining 584 infants were classified according to the three BPD definitions. Long-term outcomes were assessed in 513 infants (87.8%) and in 380 children (65.1%) at 2 and 5 years’ CA, respectively.
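The cohort flow above can be checked with simple arithmetic. The sketch below only re-derives the counts and percentages quoted in the text; the variable and function names are hypothetical.

```python
eligible = 777
died_before_36_weeks = 162
not_included = 31

# Infants remaining after deaths before 36 weeks' PMA and exclusions
included = eligible - died_before_36_weeks - not_included


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


died_pct = pct(died_before_36_weeks, eligible)
followed_2y_pct = pct(513, included)   # assessed at 2 years' CA
followed_5y_pct = pct(380, included)   # assessed at 5 years' CA
```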
The median GA of the study cohort was 28.1 weeks (IQR 26.7–29.0), the median birth weight was 1040 g (IQR 850–1240) and 55.3% were males (table 1). The cumulative incidence of any BPD classification was 38.9% for the 2001-NIH definition, 21.1% for the 2018-NIH definition and 38.9% for the 2019-Jensen definition.12 The incidence of NDI or late death was 18.0% and 37.3% at 2 and 5 years’ CA, respectively. Respiratory morbidity was present in 22.4% and 18.4% at 2 and 5 years’ CA, respectively. More details of the composite outcome of NDI and respiratory morbidity have been described in a previous publication.12 Differences in antenatal corticosteroids, caesarean section, persistent ductus arteriosus, surfactant and doxapram between infants who were lost to follow-up and those who were present at follow-up are described in online supplemental table 1.
Neurodevelopmental and respiratory outcomes at 2 years’ CA
Logistic regression analysis of the 2-year data for both the 2001-NIH and the 2018-NIH definitions showed that the infants with severe or grade 3 BPD, but not less severe forms of BPD, had a significantly increased risk of NDI or late death, compared with the no BPD group (table 2 and online supplemental figure 2). The 2019-Jensen definition showed that the infants classified as grade 2 and grade 3 BPD had a significantly increased risk of NDI or late death at 2 years’ CA, compared with infants without BPD (table 2).
Regarding respiratory morbidity at 2 years’ CA, the analysis showed that only infants classified with severe BPD according to the 2001-NIH definition had a significantly increased risk compared with infants without BPD (table 2 and online supplemental figure 2). The 2018-NIH and 2019-Jensen definitions showed a significantly increased risk of respiratory morbidity at 2 years’ CA for grade 2 and grade 3 BPD, compared with those without a BPD diagnosis (table 2).
All three logistic regression models resulted in comparable Brier scores for NDI or late death (0.169–0.171) and respiratory morbidity (0.198–0.199) at 2 years’ CA (table 3). Analyses showed limited Brier score differences (−0.0005 to 0.0018), resulting in no difference between the accuracy of the Firth’s logistic regression models for NDI/late death and respiratory morbidity at 2 years’ CA for 2018-NIH and the 2019-Jensen compared with the 2001-NIH definition (reference) (table 3).
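A difference in Brier score between two models fitted to the same infants can be sketched as follows. The text reports a two-sided t-test but does not state whether it was paired; the sketch assumes a paired comparison of per-infant squared errors, and the function name `brier_difference` and all numbers are hypothetical. Only the mean difference and the t statistic are computed; a p value would additionally require the t distribution with n − 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def brier_difference(pred_new, pred_ref, observed):
    """Mean difference in squared error (new minus reference) and paired t statistic."""
    diffs = [
        (p_new - y) ** 2 - (p_ref - y) ** 2
        for p_new, p_ref, y in zip(pred_new, pred_ref, observed)
    ]
    n = len(diffs)
    d_mean = mean(diffs)   # equals the difference between the two Brier scores
    sd = stdev(diffs)      # requires at least two observations
    t_stat = d_mean / (sd / sqrt(n)) if sd > 0 else float("nan")
    return d_mean, t_stat


# Hypothetical outcomes and predicted probabilities for four infants
observed = [1, 0, 1, 0]
pred_ref = [0.6, 0.4, 0.6, 0.4]   # reference model
pred_new = [0.7, 0.3, 0.6, 0.4]   # comparison model

diff, t_stat = brier_difference(pred_new, pred_ref, observed)
```

A negative difference means the comparison model has the lower (better) Brier score in this toy example.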
Neurodevelopmental and respiratory outcomes at 5 years’ CA
Analysis at 5 years’ CA showed an increased risk of NDI/late death for infants with mild BPD and severe BPD following the 2001-NIH definition, compared with those without BPD (table 2 and online supplemental figure 3). The 2018-NIH definition showed an increased risk of NDI or late death at 5 years’ CA for infants with grade 2 and grade 3 BPD. In contrast to the 2-year data, the 2019-Jensen definition showed an increased risk for infants with grade 1 BPD, but not for grades 2 and 3.
Regarding respiratory morbidity at 5 years’ CA, infants with severe or grade 3 BPD following the 2001-NIH and 2018-NIH definitions showed an increased risk for this outcome, compared with infants without a BPD diagnosis. Under the 2019-Jensen definition, infants with BPD showed no increased risk of respiratory morbidity at this time point compared with their peers without a BPD diagnosis (table 2 and online supplemental figure 3).
In line with the Brier score results at 2 years’ CA, the three BPD definitions showed similar mediocre Brier scores at 5 years’ CA for NDI or late death (0.227 to 0.230) and respiratory morbidity (0.179), and limited differences (−0.0024 to −0.0002), all of them testing non-significant (table 3).
Perinatal characteristics and respiratory interventions
The median GA had no stepwise decrease with increasing BPD severity in any of the BPD definitions, whereas the median birth weight did show a slight decrease with increasing BPD severity (online supplemental table 2). Only the 2001-NIH and 2019-Jensen definition showed an increase in small for GA percentage with increasing BPD severity. The median days of supplemental oxygen and days of IMV increased with increasing BPD severity in all BPD definitions. Ordinal regression analysis showed that GA and birth weight were significant risk factors for all BPD severity categories for all BPD definitions, with some exceptions presumably due to lack of power. The analyses showed that the ORs of GA, birth weight and small for GA between severity categories were very similar and without any signs of an ordinal decrease with increasing BPD severity. This is in contrast to the median days on supplemental oxygen and IMV showing an ordinal stepwise increase as BPD severity increases (online supplemental table 2).
Discussion
This is the first study investigating the ordinal relationship between three different severity-based BPD definitions and long-term outcomes and risk factors. In this large single-centre cohort, we showed that the 2001-NIH, the 2018-NIH and the 2019-Jensen definitions have similarly mediocre accuracy of the probability forecast (calibration) for neurological and respiratory outcomes at 2 and 5 years’ CA. Likewise, perinatal characteristics and respiratory interventions have similar associations with the three BPD definitions.
An ideal grade-based or severity-based definition predicts an adverse outcome with higher odds with every increasing grade or disease severity, and therefore several findings need to be discussed. The logistic regression analyses showed no clear ordinal stepwise risk increment with each more severe BPD category in the three separate BPD definitions for neurodevelopmental and respiratory outcomes. Only the infants with the most severe grade of BPD in all three definitions had a significantly increased risk of NDI or respiratory morbidity, compared with the infants with no BPD. However, there are three exceptions to this statement. First, infants with grade 2 BPD following the 2019-Jensen definition also showed a significantly increased risk of NDI and respiratory morbidity, as did infants with grade 2 BPD following the 2018-NIH definition for respiratory morbidity at 2 years’ CA. The clinical relevance of this finding is unknown, since the comparisons between the definitions using the Brier scores showed similar accuracy in the probability forecast (calibration) for these long-term outcomes. Second, at 5 years’ CA, infants with grade 3 BPD following the 2019-Jensen definition had no significantly increased risk of either outcome compared with infants without a BPD diagnosis. We speculate that this might be due to the small number of infants in our cohort (4%) with this severe grade of BPD, defined as needing IMV at 36 weeks’ PMA.
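The notion of an ordinal stepwise risk increment can be made concrete with a small check on the point estimates. This is a minimal sketch with hypothetical odds ratios, not values from table 2; the function name `is_ordinal_increase` is an assumption, and the check ignores confidence intervals.

```python
def is_ordinal_increase(odds_ratios):
    """True when each grade's OR exceeds the previous one, starting from the reference OR of 1."""
    sequence = [1.0] + list(odds_ratios)
    return all(later > earlier for earlier, later in zip(sequence, sequence[1:]))


# Hypothetical ORs for grades 1, 2 and 3 versus no BPD
stepwise = is_ordinal_increase([1.4, 2.1, 3.5])      # risk rises with every grade
not_stepwise = is_ordinal_increase([1.8, 1.2, 3.5])  # grade 2 falls below grade 1
```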
Finally, our analyses at 5 years show some conflicting results compared with the 2-year data (table 2). A possible reason might be that the current neurodevelopmental tests at 2 years’ CA only detect the most severe cases of NDI, resulting in a strong association between BPD severity and NDI.16 However, as previously shown, the motor and cognitive functions are increasingly being challenged (‘growing into deficit’) over time, so an NDI can become more apparent, resulting in a higher prevalence of NDI and less power to detect an association at 5 years’ CA.17 In our study, we employed Bayley-III scores for the classification of NDI. It is widely recognised that the Bayley-III may lead to underestimation of developmental delay. Consequently, we have opted to use a cut-off threshold of −1 SD instead of −2 SD to better capture and address these delays.
All three BPD definitions had comparable Brier scores for both long-term outcomes at both time points. After formal statistical testing, none of the definitions appeared to be superior in probability forecasting (calibration) of the long-term outcomes. Even more striking were the high Brier scores, ranging from 0.169 to 0.230, which classify the definitions as non-informative and poorly calibrated.18 The Brier score, like accuracy and precision, is an essential part of prediction research and shows the calibration performance of a prediction model. A possible explanation for this result might be that none of the definitions under investigation were derived using calibration, but only accuracy.5 6 Our study showed that classifying preterm infants into three BPD severity categories, as is currently done, might add only limited value to risk stratification for NDI or respiratory morbidity. Risk stratification is important for long-term prediction to inform parents on the possible outcomes their children might encounter.
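For orientation, the Brier score of a model that assigns every infant the overall outcome incidence p equals p(1 − p), which is 0.25 only when p is 0.5. The sketch below applies this formula to the incidences quoted in the Results. It is an illustration, not a re-analysis: the Brier scores in table 3 may rest on imputed data with different denominators, so these figures are not directly comparable with them. The function and dictionary names are hypothetical.

```python
def prevalence_only_brier(incidence):
    """Brier score obtained when every infant is assigned the overall incidence."""
    return incidence * (1 - incidence)


# Outcome incidences as quoted in the Results section
reported_incidence = {
    "NDI or late death, 2 years": 0.180,
    "NDI or late death, 5 years": 0.373,
    "Respiratory morbidity, 2 years": 0.224,
    "Respiratory morbidity, 5 years": 0.184,
}

benchmarks = {
    outcome: round(prevalence_only_brier(p), 3)
    for outcome, p in reported_incidence.items()
}
```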
In the current study, we show that only the most severe forms of BPD have a significantly increased risk of poor outcome, which is in line with another retrospective cohort study.19 However, in contrast to our study, an ordinal increase in the risk of respiratory morbidity was shown, which might be explained by the high BPD incidence (96.8%) and the lower incidence of long-term respiratory morbidity in that study.20
Our study also showed a lack of incremental association between severity classification of three BPD definitions and well-established perinatal risk factors for BPD, such as GA, birth weight and small for GA.21 However, in line with other studies investigating the Jensen definition, an ordinal increased risk of BPD severity with increasing total days of supplemental oxygen and days of IMV was found.9 22 Unfortunately, infants diagnosed with grade 1 and grade 2 BPD were combined into one category in the study by Jensen et al, hampering further risk stratification and comparison with our results.
Some limitations of the current study need to be discussed. First, retrospective data from a single-centre cohort were used. Further nationwide or global studies are needed to confirm our results. Second, respiratory morbidity was determined at follow-up during outpatient visits and therefore recall bias may have occurred. Ideally, respiratory morbidity needs to also be assessed using lung function testing.23 Third, due to the COVID-19-related lockdown, some children were assessed by telephone interviews with their parents and therefore did not undergo neurodevelopmental testing at 5 years’ CA, which led to a higher loss to follow-up percentage. To correct for these missing data, we performed multiple imputation analyses to improve statistical power and minimise selection bias. Finally, some baseline differences were found between the infants lost to follow-up and those included, which may indicate selection bias. However, the impact on the comparative results is probably limited as the same groups were used to compare the three BPD definitions.
Despite these limitations, our study shows that none of these three BPD definitions is superior in classifying infants into severity categories based on the associations with long-term outcomes until 5 years’ CA. If clinicians and researchers consider it important that BPD is defined on an ordinal severity scale, future initiatives need to focus on the calibration as well as the discriminative predictive performance of BPD definitions.
In conclusion, this historical cohort study with retrospective data collection shows that the 2001-NIH, the 2018-NIH and the 2019-Jensen definitions show similar but low accuracy of probability forecast for neurodevelopmental and respiratory outcomes at 2 and 5 years’ CA. In addition, the three BPD definitions show similar associations with perinatal characteristics and respiratory interventions. These results need to be confirmed in a large multicentre setting and, if supported, the current BPD classifications need to be updated.
Ethics statements
Patient consent for publication
Ethics approval
This study was approved by the Institutional Review Board at the Amsterdam UMC (IRB Amsterdam UMC W22_403 # 22.478).
Acknowledgments
This article is part of the PhD thesis of the first author.
Footnotes
Contributors TAK contributed to the conception and design of the study, acquisition of data, analysis and interpretation of data, and drafting of the manuscript. SB, GJB, AAMWvK, HvL, CAML, MR, IAS, NCR, FV and EvS contributed to the acquisition of data and critical revision of the manuscript. CSHA-M and AGvW-L contributed to the acquisition of data, analysis and interpretation of data and critical revision of the manuscript. WO contributed to the conception and design of the study, acquisition of data, analysis and interpretation of data, and critical revision of the manuscript. WO serves as guarantor of this paper. All authors approved the final version of the manuscript to be published and agree to be accountable for all aspects of the work.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.