Validation of a machine learning algorithm for identifying infants at risk of hypoxic ischaemic encephalopathy in a large unseen data set

Anne L Murray (1,2), Daragh S O’Boyle (2), Brian H Walsh (1,2,3), Deirdre M Murray (2,3)

  1. Cork University Maternity Hospital, Wilton, Cork, Ireland
  2. INFANT Centre, Paediatric Academic Unit, Cork University Hospital, Wilton, Cork, Ireland
  3. Department of Paediatrics and Child Health, University College Cork, Cork, Ireland

Correspondence to Dr Deirdre M Murray; d.murray{at}ucc.ie

Abstract

Objective To validate a hypoxic ischaemic encephalopathy (HIE) prediction algorithm to identify infants at risk of HIE immediately after birth using readily available clinical data.

Design Secondary review of electronic health record data of term deliveries from January 2017 to December 2021.

Setting A tertiary maternity hospital.

Patients Infants ≥36 weeks’ gestation with the following clinical variables available: Apgar Score at 1 min and 5 min, and postnatal pH, base deficit and lactate values taken within 1 hour of birth.

Interventions Previously trained open-source logistic regression and random forest (RF) prediction algorithms were used to calculate a probability index (PI) for each infant for the occurrence of HIE.

Main outcome Validation of a machine learning algorithm to identify infants at risk of HIE in the immediate postnatal period.

Results 1081 infants had a complete data set available within 1 hour of birth: 76 (7.0%) with HIE and 1005 non-HIE. Of the 76 infants with HIE, 37 were classified as mild, 29 moderate and 10 severe. The highest sensitivity was seen with the RF model. Median (IQR) PI in the HIE group was 0.70 (0.53–0.86) vs 0.05 (0.02–0.15) in the non-HIE group (p<0.001). The area under the receiver operating characteristics curve for prediction of HIE was 0.926 (0.893–0.959, p<0.001). Using a PI cut-off of 0.30 to optimise sensitivity, 936 of the 1081 (86.5%) infants were correctly classified.

Conclusion In a large unseen data set an open-source algorithm could identify infants at risk of HIE in the immediate postnatal period. This may aid focused clinical examination, transfer to tertiary care (if necessary) and timely intervention.

  • Neonatology
  • Neurology
  • Intensive Care Units, Neonatal

Data availability statement

Data may be obtained from a third party and are not publicly available.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • Early diagnosis of hypoxic ischaemic encephalopathy is essential for timely initiation of therapeutic hypothermia.

WHAT THIS STUDY ADDS

  • This study validates a previously developed machine learning algorithm using real-world data.

  • This algorithm may aid early diagnosis and can be used to quickly identify those who should have a detailed neurological examination or neurophysiological monitoring to detect and grade encephalopathy.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • This algorithm may be useful to aid decision-making for timely transfer of infants to cooling centres for assessment and possible treatment.

Background

Hypoxic ischaemic encephalopathy (HIE) is one of the leading causes of morbidity and mortality in infants worldwide.1 It has an incidence of 1.5 cases per 1000 in high-income countries and 10–20 cases per 1000 in low-income and middle-income countries.1–3 Globally, 23% of infant mortality can be attributed to HIE and those who survive are at risk of adverse neurological outcomes, such as cerebral palsy and epilepsy, as well as behavioural and intellectual disabilities.4–6

Therapeutic hypothermia (TH) is currently the only treatment available to mitigate the damage caused by HIE, and it has been repeatedly confirmed to reduce cerebral injury and adverse neurological outcomes in cases of moderate and severe encephalopathy.1 7 8 Evidence suggests that TH is most beneficial when initiated within 6 hours of birth. Earlier initiation before 3 hours of life has been suggested to further enhance neurodevelopmental outcomes in affected infants.9–11 Therefore, the timely identification of HIE and accurate classification of its severity is of vital importance.

Diagnosis of HIE is difficult, as is the choice to initiate TH. Decision-making is often based on the neurological examination of the infant, which is known to change and evolve over time.12–14 Furthermore, the examination used may influence eligibility for TH, and variation between even experienced clinicians may impact the interpretation of the exam findings.15 16 According to the American College of Obstetricians and Gynecologists consensus statement on neonatal encephalopathy (NE), knowledge gaps still preclude a definitive test or set of markers that accurately identify an infant in whom NE is due to an acute intrapartum event.17

Machine learning (ML) can be defined as a branch of artificial intelligence in which computer software learns to perform a task by being exposed to representative data.18 Using training data, the ML algorithm is able to create a set of rules which can be used as predictors for new data with similar characteristics. These algorithms should then be tested with independent data not used in the development of the model.19 For our model, the training and testing phases have been previously published.20 21 The use of ML in clinical decision-making has the potential to positively affect the lives of patients and enhance our understanding of human disease.22

Logistic regression (LR) and random forest (RF) are two types of ML algorithms. LR is commonly used when examining a binary outcome and can be used to study the effect of multiple independent variables on an outcome of interest.23–25 RF is a form of ML that is becoming a common artificial intelligence (AI) tool in the medical field.24 26 Using multiple models when examining data sets is commonplace; an algorithm that outperforms another on one metric may underperform on a different metric.27

The aim of this study was to validate the previously reported ML algorithms to identify infants at risk of HIE immediately after birth using real-world, readily available clinical data.

Methods

This was a retrospective review of electronic health record data of all term deliveries in Cork University Maternity Hospital during a 5-year period from 1 January 2017 to 31 December 2021. This hospital is a tertiary unit with approximately 7500 deliveries annually. Eligible infants were identified using the hospital’s electronic health record. Inclusion criteria were all infants ≥36 weeks’ gestation who had a blood gas drawn within 1 hour of birth. Infants were excluded if they were <36 weeks’ gestation or were missing key data points for the decision support algorithm: Apgar Score at 1 min or 5 min, postnatal pH, base deficit (BD), or lactate. Infants <36 weeks’ gestation were not included given that current consensus guidelines on the use of TH extend only to infants ≥36 weeks’ gestation.17
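
The inclusion and exclusion rules above can be expressed as a simple record filter. The following is a minimal sketch only: the record layout and field names (`InfantRecord`, `gas_age_minutes` and so on) are hypothetical, since the structure of the hospital's electronic health record is not described here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record layout; the study's actual schema is not described.
@dataclass
class InfantRecord:
    gestation_weeks: float
    gas_age_minutes: Optional[float]  # age when the blood gas was drawn
    apgar_1min: Optional[int] = None
    apgar_5min: Optional[int] = None
    ph: Optional[float] = None
    base_deficit: Optional[float] = None
    lactate: Optional[float] = None

# The five variables the decision support algorithm needs.
REQUIRED = ("apgar_1min", "apgar_5min", "ph", "base_deficit", "lactate")

def is_eligible(rec: InfantRecord) -> bool:
    """Apply the stated criteria: >=36 weeks, gas within 1 hour, complete data."""
    if rec.gestation_weeks < 36:
        return False
    if rec.gas_age_minutes is None or rec.gas_age_minutes > 60:
        return False
    return all(getattr(rec, name) is not None for name in REQUIRED)
```

A record missing any one of the five variables is excluded, which corresponds to the "missing key data points" exclusion described above.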

Health records of all infants were examined for a diagnosis of HIE. Initial diagnosis was assigned following a contemporaneous prospective clinical exam based on a modified Sarnat Score.13 28 This exam consists of six domains, each assigned a stage of 1–3. Severe NE was defined as having ≥3 moderate/severe domains with more severe than moderate domains; moderate NE was defined as having ≥3 moderate/severe domains with more moderate than severe domains; and mild NE was defined as having one or more abnormal domains, but not meeting definition for moderate or severe NE. This was carried out by a trained registrar (resident or fellow) or consultant and the examination template did not change during the study period. All staff are provided training in this modified Sarnat Score by an experienced consultant when starting their rotation and it is local policy to assess for NE severity after 1 hour of age. Annually all encephalopathy cases are reviewed by two experienced senior clinicians, who were blinded to the outcomes of the infants, to ensure that fidelity of grading is maintained.29 All infants who did not have a diagnosis of HIE were assigned to the non-HIE group and their diagnoses were also noted.
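
The grading rule described above can be sketched as a small function. This is an illustration of the definitions as worded in this paragraph, not the clinical examination template itself. Two assumptions are ours: a normal domain is coded as 0 (the text only gives stages 1 to 3), and a tie between the number of moderate and severe domains is returned as "undetermined" because the text does not say how ties are resolved.

```python
from typing import Sequence

def grade_encephalopathy(stages: Sequence[int]) -> str:
    """Grade NE from six domain stages.

    Coding assumed here: 0 = normal, 1 = mild, 2 = moderate, 3 = severe.
    """
    if len(stages) != 6 or any(s not in (0, 1, 2, 3) for s in stages):
        raise ValueError("expected six domain stages, each 0-3")

    moderate = sum(1 for s in stages if s == 2)
    severe = sum(1 for s in stages if s == 3)

    # Moderate or severe NE requires at least three moderate/severe domains.
    if moderate + severe >= 3:
        if severe > moderate:
            return "severe"
        if moderate > severe:
            return "moderate"
        # Equal counts: not defined in the text, so left unresolved here.
        return "undetermined"

    # One or more abnormal domains, but not meeting the criteria above.
    if any(s > 0 for s in stages):
        return "mild"
    return "normal"
```

For example, two moderate domains plus one mild domain grade as mild under this reading, because fewer than three domains are moderate or severe.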

Previously trained open-source LR and RF prediction algorithms (https://www.infantcentre.ie/predictionapp.html) were used to calculate a Probability Index (PI) for each infant for the occurrence of HIE. These models were trained and assessed on separate data sets using the BiHIVE2 cohort of infants with signs of perinatal asphyxia with and without HIE matched against healthy controls.20 21 During development, multiple variables were examined to find those most predictive of HIE. The algorithms require the five variables listed above (Apgar Score at 1 min and 5 min, postnatal pH, BD and lactate) as inputs to calculate the PI for HIE. Each model provided a predicted probability for the development of HIE from 0 to 1, with higher values indicating a higher risk of HIE. The development data sets and this validation data set are from the same or similar healthcare settings, with consistent criteria for the clinical diagnosis of NE.

Statistical analysis

Categorical data were described using frequencies and percentages. Continuous data were described using medians and IQRs. Differences between groups were evaluated with the Mann-Whitney U test for continuous non-parametric data, or the χ2 test for categorical data. The area under the receiver operating characteristics (AUROC) curve was used to investigate the predictive performance of the models. SPSS V.28 was used and a value of p<0.05 was considered statistically significant. Calibration curves (online supplemental figure S1) were created using RStudio V.24.04.02 following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD)+AI guidelines (research checklist).
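
For readers less familiar with the AUROC, it can be read as the probability that a randomly chosen HIE infant has a higher PI than a randomly chosen non-HIE infant, which is the Mann-Whitney U statistic divided by the number of HIE and non-HIE pairs. The sketch below illustrates that relationship in plain Python. It is not the SPSS procedure used in the study, and the scores are made-up toy values, not study data.

```python
from typing import Sequence

def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correctly ranked pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy probability indices, invented for illustration only.
toy_hie = [0.70, 0.53, 0.86, 0.25]
toy_non_hie = [0.05, 0.02, 0.15, 0.30]
toy_auc = auroc(toy_hie, toy_non_hie)  # 15 of 16 pairs ranked correctly
```

The pairwise loop is quadratic in the number of infants, which is acceptable for a data set of about 1000 records; a rank-based implementation would be the usual choice for larger data.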

Results

Over the 5-year study period, 1191 infants ≥36 weeks’ gestation had a blood gas measurement in the first hour of life. Of these, 1081 had a complete data set available. In total, 110 infants were excluded, all due to missing data. The HIE and non-HIE groups differed by birth weight (p=0.04), by mode of delivery, with a higher proportion of prelabour caesarean sections recorded in the non-HIE group (p=0.005), and by type of blood gas, with more arterial gases in the HIE population (p<0.001) (table 1).

Table 1

Demographic details of the study population

Of the 1081 infants, 76 (7%) had a diagnosis of HIE and 1005 (93%) did not have HIE. Of the 76 infants with HIE, 37 were clinically classified as mild, 29 as moderate and 10 as severe. Forty-six (60.5%) of the HIE group received TH (7 (19%) mild, 29 (100%) moderate and 10 (100%) severe) (table 2).

Table 2

HIE group by severity

The primary diagnoses for the non-HIE infants were respiratory presentations, such as transient tachypnoea of the newborn and respiratory distress syndrome (44%), perinatal asphyxia without encephalopathy (23%), and suspected infection (16.5%). Twenty-six infants’ diagnoses were classed as miscellaneous, and these included infants with rashes and neonatal admissions due to maternal health (table 1). The excluded infants had a similar range of diagnoses (online supplemental table S1).

For the RF model, the AUROC (CI) for prediction of HIE was 0.926 (0.893–0.959, p<0.001) (figure 1). The median (IQR) PI in the HIE group was 0.70 (0.53–0.86) vs 0.05 (0.02–0.15) in the non-HIE group (p<0.001) (figure 2). With the RF model, using a PI cut-off of 0.30 to optimise sensitivity, 936 of the 1081 (86.5%) infants were correctly classified by the algorithm (sensitivity=86.8%, specificity=86.6%). Of the infants with HIE, 66/76 (86.8%) were correctly identified. In addition, 135 of the 1005 non-HIE infants were incorrectly classified as HIE (13.4%) (online supplemental table S2).
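
The sensitivity and specificity reported for the RF model follow directly from the counts in this paragraph. A minimal sketch of that arithmetic is given below; the function name is ours, and the true-negative count is derived here as 1005 minus the 135 false positives rather than quoted from the text.

```python
from typing import Tuple

def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # HIE infants flagged / all HIE infants
    specificity = tn / (tn + fp)          # non-HIE not flagged / all non-HIE
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy

# RF model at a PI cut-off of 0.30, using counts from the text.
rf_tp, rf_fn = 66, 76 - 66        # 66 of 76 HIE infants identified
rf_fp, rf_tn = 135, 1005 - 135    # 135 of 1005 non-HIE infants flagged
rf_sens, rf_spec, rf_acc = classification_metrics(rf_tp, rf_fn, rf_fp, rf_tn)
```

The correctly classified total (true positives plus true negatives) reproduces the 936 infants quoted above.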

Figure 1

The area under the curve (AUC) of all study samples using the random forest model and the logistic regression model.

Figure 2

Median Probability Index (PI) in the non-HIE versus HIE groups. HIE, hypoxic ischaemic encephalopathy.

Using the LR model, the PI was calculated in the HIE versus non-HIE cohorts. The AUROC (CI) was 0.928 (0.892–0.964, p<0.001) (figure 1). The median (IQR) PI in the HIE group was 0.71 (0.49–0.93) vs 0.05 (0.03–0.12) in the non-HIE group (p<0.001). Using a PI cut-off of 0.30, 955 of the 1081 (88.3%) infants were correctly classified by the algorithm (sensitivity=84.2%, specificity=88.7%). Of the infants with HIE, 64/76 (84.2%) were correctly identified. In addition, 114 of the 1005 non-HIE infants were incorrectly classified as HIE (11.3%).

The RF model, using the cut-off threshold of 0.30, identified 78.4% of mild HIE, 93% of moderate HIE and 100% of severe HIE cases (figure 3). Of the eight cases of mild HIE that were incorrectly classified as non-HIE, none had seizures and none received TH. Of the two cases of moderate HIE that were incorrectly classified, one had clinical and electrographic seizures and also had abnormal brain magnetic resonance imaging (MRI) in the newborn period (single punctate area of high signal within deep white matter on conventional and diffusion imaging). The other moderate case raised concern for clinical seizures but did not have electrographic seizures and had a normal brain MRI prior to discharge. The LR model had a slightly worse ability to identify cases, identifying 73% of mild HIE, 93% of moderate HIE and 100% of severe HIE cases using the same PI cut-off of 0.30. Of the two extra cases of mild HIE that were incorrectly classified, neither had seizures and neither was cooled, but one had an abnormal brain MRI (single punctate area of diffusion restriction).
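
The per-grade detection rates can be checked against the case counts given earlier (37 mild, 29 moderate, 10 severe). The sketch below is our own reconciliation: the missed-case counts for the RF model are taken from this paragraph, while those for the LR model are derived from the statement that it missed two additional mild cases.

```python
from typing import Dict

# Cases per HIE grade, as reported in the Results.
TOTALS: Dict[str, int] = {"mild": 37, "moderate": 29, "severe": 10}

# Cases classified as non-HIE at a PI cut-off of 0.30.
RF_MISSED: Dict[str, int] = {"mild": 8, "moderate": 2, "severe": 0}
# Derived: the LR model missed two more mild cases than the RF model.
LR_MISSED: Dict[str, int] = {"mild": 10, "moderate": 2, "severe": 0}

def detection_rates(totals: Dict[str, int], missed: Dict[str, int]) -> Dict[str, float]:
    """Fraction of cases in each grade that the model flagged as HIE."""
    return {grade: (totals[grade] - missed[grade]) / totals[grade] for grade in totals}

def total_detected(totals: Dict[str, int], missed: Dict[str, int]) -> int:
    """Number of HIE cases flagged across all grades."""
    return sum(totals[g] - missed[g] for g in totals)

rf_rates = detection_rates(TOTALS, RF_MISSED)
lr_rates = detection_rates(TOTALS, LR_MISSED)
```

Summing across grades gives 66 detected cases for the RF model and 64 for the LR model, consistent with the sensitivities reported for each.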

Figure 3

HIE group median Probability Index by severity of HIE. HIE, hypoxic ischaemic encephalopathy; PI, Probability Index; RF, random forest.

Discussion

In this study, we have validated the ability of our ML algorithm to correctly identify infants with HIE within the first hour of life from a large sample of over 1000 infants with a range of diagnoses requiring admission to a tertiary neonatal unit. This algorithm has the potential to provide a user-friendly and readily accessible support tool in early decision-making in HIE. Using standard clinical variables available in the majority of high-income and middle-income settings, 86.5% of infants could have been correctly identified as HIE or non-HIE within minutes of birth. Overall, the RF model performed better, with a higher sensitivity of 86.8% compared with 84.2% with the LR model (both at the PI cut-off of 0.30). The area under the curve of both models was similar and a statistically significant difference between the groups was seen with both models.

When developing our algorithms, we employed two different ML models to explore their capabilities in the early prediction of HIE. First, we used LR, a parametric statistical model used for binary classification that predicts categorical outcomes based on the logistic function.23 30 Second, we used an RF model, an algorithm that can learn non-linear interactions by averaging the prediction of multiple decision trees and creating a single prediction output.26 30 Both models can generate a continuous probability score.31–33 While the LR model had fewer false positives, it missed two more cases of HIE compared with the RF model. As the priority in clinical practice is to identify all possible cases at risk of HIE, our preference for future work with this ML algorithm is to use the RF model.
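
To make the difference between the two model types concrete, the toy sketch below shows how each produces a continuous probability: LR passes a weighted sum of the inputs through the logistic function, while an RF averages the outputs of several decision trees. Every coefficient, threshold and leaf value here is invented for illustration; none is taken from the published models, and the sketch must not be used for clinical prediction.

```python
import math
from typing import Callable, Dict, List

Features = Dict[str, float]

def logistic_pi(x: Features, intercept: float, coefs: Dict[str, float]) -> float:
    """Logistic regression: probability = 1 / (1 + exp(-linear predictor))."""
    z = intercept + sum(coefs[name] * x[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))

def forest_pi(x: Features, trees: List[Callable[[Features], float]]) -> float:
    """Random forest: average the probability output by each tree."""
    return sum(tree(x) for tree in trees) / len(trees)

# Hypothetical parameters, NOT the published model.
TOY_INTERCEPT = 0.0
TOY_COEFS = {"apgar_5min": -0.5, "lactate": 0.3}
TOY_TREES: List[Callable[[Features], float]] = [
    lambda x: 0.9 if x["lactate"] > 10 else 0.1,
    lambda x: 0.8 if x["apgar_5min"] < 5 else 0.2,
    lambda x: 0.7 if x["ph"] < 7.0 else 0.05,
]

# Hypothetical infant, invented for illustration.
toy_infant = {"apgar_5min": 4, "lactate": 12.0, "ph": 6.95}
toy_lr_pi = logistic_pi(toy_infant, TOY_INTERCEPT, TOY_COEFS)
toy_rf_pi = forest_pi(toy_infant, TOY_TREES)
```

Because each toy tree splits on a threshold, the forest output changes in steps as an input crosses a split, whereas the logistic output changes smoothly; this is one way an RF can capture non-linear interactions that a single linear predictor cannot.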

Our algorithms have been developed through the prior study of a large cohort of carefully characterised infants, examining multiple factors, with detailed maternal, perinatal and postnatal data.20 21 The most predictive model was found to include the condition of the infant at birth (Apgar Score at 1 min), and the clinical and biochemical response to resuscitation (Apgar Score at 5 min, and postresuscitation metabolic pH, BD and lactate). During development, the algorithms were trained using all blood gas types. Blood gas measurements and Apgar Scores have been used in the past to predict HIE, but many studies treat them as dichotomous variables. In reality, each variable provides a proportionate risk. Although exposures are commonly dichotomised to aid interpretation of results, doing so increases the likelihood of false positives and reduces the variability captured within each group.34 35 ML techniques allow the incorporation of the strength of each marker to aid more accurate prediction.20 While both the LR and RF models had good sensitivity for the prediction of HIE, both misclassified infants. Thus, we do not expect that clinicians would use the model’s output in isolation, but rather as a support tool that considers condition at birth and response to resuscitation when examining risk of HIE.

Our results show a statistically significant difference between HIE grades with both the LR and RF models. Although this highlights that our model could be used to aid with differentiation of the grade of HIE, ongoing use of serial clinical assessment and neuromonitoring is required for accurate assignment of HIE grade.

Both the LR and RF models described have the potential to be strong support tools for clinical decision-making, helping to identify infants who may benefit from specialist care and possibly neuroprotective strategies. Up to 60% of infants with HIE are born in non-tertiary facilities, and the challenges associated with clinical examination and its role in identification of infants with encephalopathy may be even more pronounced in centres with fewer cases.36–38 The tool is designed to support decisions by identifying infants at risk of HIE and triggering clinical assessment and senior review. This may aid appropriate risk stratification for transfer to tertiary and quaternary units. Unnecessary transfers represent a significant financial cost to health systems, with high costs also incurred by families, including financial, emotional and psychological costs.39 Using our models could improve communication between referral and receiving sites and support transport stratification, with the PI offering an objective measure of risk of HIE. We hope that in the future, this model will be further validated in varied populations, to allow the tool to be used across many different settings. Our hope is that the model will be used to identify infants early who are at risk of HIE and prompt clinicians to perform a neurological examination and consider further action, to allow infants to receive appropriate referral and timely treatment if required.

The strengths of this study include the large cohort of non-HIE infants with a wide variety of aetiologies with which to validate the use of the algorithm. We have shown that the prediction remains high across all grades of HIE. The clinical variables used in this model are routine clinical measurements that are familiar and attainable. Our model, which can be used with little training or expertise, offers a valuable objective support tool for decision-making. Unlike using these clinical variables in a binary fashion (eg, pH<7.0 or ≥7.0), the algorithm allows for the consideration of the weight of each variable itself, the weight of its value and the unique interaction between variables. By combining multiple clinical variables in this way, the algorithm can offer a more complete picture to aid identification of infants at risk of HIE. This may be especially helpful in smaller centres with less exposure to at-risk encephalopathic infants and may optimise early and targeted transfers, and ultimately, timely initiation of neuroprotective treatments, if indicated.

There were several limitations to this study. The recruiting centre was a tertiary facility in a high-income country. The model has not yet been validated in non-tertiary or international settings, or in middle-income or low-income settings. As this algorithm was designed to be used as a decision aid in the immediate postnatal period, we did not incorporate electroencephalographic (EEG) and MRI data into the diagnosis of HIE. Although this may have provided a more robust assessment of HIE, these data are not often available within the first hour of life, so only a clinical diagnosis of HIE was used in validating this prediction model. Our algorithm is open-source and available online. We encourage others to use it and validate its utility in their own local populations.

Conclusion

In a large unseen data set an open-source algorithm could identify infants at risk of HIE in the immediate postnatal period. When an infant has a PI Score ≥0.3, this should prompt a focused clinical examination and may aid in identification of infants who require transfer to tertiary care and timely neuroprotective intervention.

Ethics statements

Patient consent for publication

Ethics approval

Ethical approval was granted prior to starting the study by the Clinical Research Ethics Committee of The Cork Teaching Hospitals (CREC Review Reference Number: ECM 4 (h) 13/04/2021).

References

  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.
  15. 15.
  16. 16.
  17. 17.
  18. 18.
  19. 19.
  20. 20.
  21. 21.
  22. 22.
  23. 23.
  24. 24.
  25. 25.
  26. 26.
  27. 27.
  28. 28.
  29. 29.
  30. 30.
  31. 31.
  32. 32.
  33. 33.
  34. 34.
  35. 35.
  36. 36.
  37. 37.
  38. 38.
  39. 39.

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Correction notice This paper has been corrected since it was first published. The authors noticed two number errors in the main text of the document. In the Results, paragraph 1, line 3, ‘122’ has been corrected to 110. In Results, paragraph 2, line 1, ‘1008’ has been changed to 1005.

  • Contributors Guarantor: DMM. DMM had full access to the data in the study and takes responsibility for its integrity and the accuracy of the data analysis. Concept and design: all authors. Acquisition, analysis or interpretation of data: all authors. Drafting of the manuscript: ALM. Critical revision of the manuscript for important intellectual content: BHW, DMM; Statistical analysis: ALM, DSO’B, DMM. Administrative, technical or material support: ALM, BHW, DMM. Supervision: BHW, DMM.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.