Objectives To investigate the predictive value of the Clinical Risk Index for Babies (CRIB) score in current practice and the predictive value of blood lactate concentrations ([L]), and to develop a new clinical scoring system for very low birthweight (VLBW) babies.
Methods The assessment of CRIB and [L] and the development of the new score were based on retrospective data collected from all inborn VLBW babies born between March 2001 and February 2004 in a tertiary neonatal unit. Predictive ability was determined from the area under the receiver operator characteristic (ROC) curve (AUC). A new score was developed and validated with a second cohort of VLBW babies.
Results 408 babies were studied in the development cohort and 275 in the validation cohort. AUC for CRIB was 0.933 (95% CI 0.897–0.969). Initial [L] was significantly higher in babies who died than in those who survived (median (range) 9.2 (1.26–21.1) vs 3.64 (0.67–17.9) mmol/l, p<0.0001) as was the highest [L] in the first 12 h (10.2 (3.37–26) vs 3.84 (1.05–20.7) mmol/l, p<0.0001). A new score was developed using highest [L], gestation and the presence of life-threatening malformation. AUC for the new score was 0.918 (95% CI 0.876–0.961) in the development cohort and 0.859 (95% CI 0.805–0.913) in the validation cohort.
Conclusions CRIB score retains its predictive ability for mortality in VLBW babies. Early hyperlactataemia is a predictor of death in VLBW babies. The new score appears to perform as well as CRIB but requires fewer data items.
The neonatal intensive care unit (NICU) provides continually improving specialised care to an increasingly vulnerable group of infants. Birth weight, especially for very low birthweight (VLBW) infants, accounts for a substantial amount of the variation in neonatal mortality within and between countries.1 However, increasing evidence over previous decades2 has indicated that birth weight is insufficient to explain large variations in neonatal mortality among NICUs. Survival of VLBW infants depends on birth weight and gestational age but also on other perinatal factors and disease severity in the first hours of life.2 Illness severity scores were thus developed with the aim of quantifying the clinically obvious fact that infants of the same gestational age and birth weight differ in their risk of dying.3 4
What is already known on this subject?
▶ Prediction of the risk of mortality in a population of very low birthweight (VLBW) babies is necessary in order to make comparisons between neonatal units or across time.
▶ The Clinical Risk Index for Babies (CRIB) score, developed 20 years ago, is the most widely used clinical scoring system in the UK.
▶ Blood lactate concentrations ([L]) have been shown to be associated with adverse outcomes in other patient groups.
What this study adds
▶ Blood lactate concentration measurements in the first hours of life of VLBW babies have prognostic significance.
▶ The CRIB score retains its predictive ability for death in these babies despite being developed in a neonatal population almost 20 years ago.
▶ The Neonatal Illness Prognosis Indicator score, incorporating the highest blood lactate concentration in the first 12 h of life, gestational age and the presence of life-threatening malformation, has a predictive ability similar to that of CRIB.
Information about the risk status of a population is necessary to allow meaningful comparisons of outcomes for babies cared for in different places or at different times. A unit tending to treat only those infants with good prognoses would be expected to have a high rate of ‘good outcome’. Conversely, those treating infants with poor prognoses would expect a higher rate of ‘poor outcome’. As put by Poloniecki,5 risk adjustment tries to help answer the question ‘Is it you, Doc, or your patients, who are below average?’
A properly validated, risk-based, prognostic scoring system can be used to detect variations in the outcomes seen in different units and within the same unit across time to support quality improvement initiatives. A scoring system that identifies babies at high risk of adverse outcome could also be used to identify babies who may be suitable for enrolment into clinical trials of new therapies.
Several clinical scoring systems have been developed to identify babies with a high risk of death. The most widely used of these in the UK are the Clinical Risk Index for Babies (CRIB)6 and CRIB-II.7 Other scoring systems include the score for neonatal acute physiology (SNAP),8 SNAP-PE (SNAP's perinatal extension)9 and the premature risk evaluation score.10 These clinical scoring systems assign each baby a score depending on a number of physiological or demographic variables that have prognostic significance. The CRIB, SNAP and SNAP-PE scores were developed over a decade ago. Neonatal survival has improved over that period and the predictive value of the CRIB score may not have been maintained.
Measurement of blood lactate concentration ([L]) has become readily available since the development of these scoring systems. [L] is a potentially useful marker of oxygenation and circulatory sufficiency that may have prognostic significance in sick babies and its inclusion in a scoring system may be helpful.
The aims of this study were to investigate the prognostic value of CRIB score in a population of VLBW babies in the current era, to evaluate the prognostic ability of [L] measurements in VLBW babies, to investigate whether a clinical scoring system for VLBW babies using [L] could be developed and to compare its performance with that of the CRIB score.
The study setting was Liverpool Women's Hospital, a tertiary NICU. All babies admitted to the neonatal unit with a birth weight of less than 1501 g during the study period were included. This was a retrospective cohort study using analysis of routinely collected data. Only inborn babies were included. There were no exclusions within this group.
Accurate estimation of a sample size was not possible. We based our estimate on the ability to detect a reduction of the area under the receiver operator characteristic (ROC) curve (AUC) for the CRIB score from 90%, when first developed, to 80%, which we judged would be a clinically significant deterioration in performance of the test. The sample size estimation was based on the work of Hanley and McNeil on the meaning and use of an ROC curve.11 To detect this magnitude of deterioration in CRIB score performance with 95% CI would require a sample including 50 deaths. Based on admission and mortality rates at Liverpool Women's Hospital, an adequate sample was expected from a 3-year cohort of approximately 400 babies.
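The dependence of AUC precision on the number of deaths can be illustrated with the standard error formula published by Hanley and McNeil. The following is a minimal sketch, not the authors' actual calculation: the function name is hypothetical, and the split of 50 deaths and 350 survivors is an illustrative assumption drawn from the approximate figures quoted above.

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUC (Hanley and McNeil formula).

    auc   : area under the ROC curve, between 0 and 1
    n_pos : number of subjects with the outcome (here, deaths)
    n_neg : number of subjects without the outcome (here, survivors)
    """
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)

# Illustrative inputs only: 50 deaths among roughly 400 babies.
for auc in (0.90, 0.80):
    se = hanley_mcneil_se(auc, n_pos=50, n_neg=350)
    low, high = auc - 1.96 * se, auc + 1.96 * se
    print(f"AUC {auc:.2f}: SE {se:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

With these assumed numbers the two confidence intervals can be compared directly to see whether an AUC of 0.80 would be distinguishable from 0.90.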
The score was developed on a cohort born between March 2001 and February 2004 and validated on a cohort born between January 2005 and December 2007.
The study was approved by the local medical research ethics committee.
Blood gas measurements and [L] measurements were made using our blood gas analyser (Siemens Rapidlab 1200, Siemens, UK). The results of all blood gas and [L] measurements in the first 12 h of life were collected from the unit computer system for each individual. From these we derived several parameters: first [L] (first [L] obtained within the first 2 h of life), highest [L] (highest [L] obtained within the first 12 h of life), first [H+] (pH obtained within the first 2 h of life converted to hydrogen ion concentration, 10^(-pH)) and first base excess (first BE obtained within the first 2 h of life). Other data items required to calculate CRIB and CRIB-II scores were collected and these scores were calculated from these data as per their design.
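The conversion from pH to hydrogen ion concentration can be sketched as follows. The source does not state the units used for [H+]; reporting in nmol/l is an assumption made here because it is the usual clinical convention, and the function name is hypothetical.

```python
def ph_to_hydrogen_ion_nmol_per_l(ph):
    """Convert pH to hydrogen ion concentration in nmol/l.

    [H+] in mol/l is 10 ** -pH. Multiplying by 1e9 to express the
    result in nmol/l is the same as computing 10 ** (9 - pH).
    """
    return 10 ** (9 - ph)

# A pH of 7.40 corresponds to roughly 40 nmol/l.
print(round(ph_to_hydrogen_ion_nmol_per_l(7.40), 1))
```

Because the scale is logarithmic, a fall of 0.3 pH units approximately doubles [H+], which is one reason for analysing [H+] rather than pH directly.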
Other data items that may potentially contribute to a new score were also collected. Items chosen were those that were routinely available from retrospective data sources and considered to be of potential prognostic significance. Data collected included multiple birth, type of delivery, antenatal steroid use, the presence or absence of obstetric complications (antepartum haemorrhage, pre-eclampsia, pregnancy-induced hypertension, prolonged rupture of the membranes, maternal infection requiring treatment with antibiotics), presence of acutely life-threatening congenital malformations (requiring medical or surgical intervention in the immediate neonatal period, without which the infant would die), maximum resuscitation required at birth (including the need for intubation, external cardiac massage and the use of drugs), admission temperature and the need for blood pressure or respiratory support in the first 12 h of life.
The primary outcome measure was death before discharge. Secondary outcomes included the adverse outcomes of abnormal cranial ultrasound appearance (intraventricular haemorrhage with ventricular dilatation needing treatment, parenchymal haemorrhage or cystic periventricular leucomalacia), bronchopulmonary dysplasia (BPD) (oxygen requirement at 36 weeks corrected gestational age), retinopathy of prematurity (ROP) needing treatment, necrotising enterocolitis (NEC) (clinically suspected and treated—nil by mouth and antibiotics for more than 5 days or proven at surgery or postmortem).
All the variables for which data was collected were then considered individually and analysed for significance with relation to the outcome of death. Univariate analysis was performed using Mann–Whitney U test for continuous variables and χ2 test with Yates' correction (or two-tailed Fisher's exact test if an expected value was below 5) for categorical variables with death as the outcome variable.
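For a single categorical variable, the Yates-corrected χ2 test on a 2×2 table can be sketched as below. This is an illustration rather than the SPSS procedure used in the study: the function name and the counts are hypothetical, and Fisher's exact test, which would replace it when an expected value falls below 5, is not implemented here.

```python
import math

def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square test for a 2x2 table of counts.

    Table layout:
                   died   survived
      exposed        a        b
      unexposed      c        d

    Returns (statistic, p_value, min_expected). If min_expected is
    below 5, Fisher's exact test should be used instead (not shown).
    Assumes every row and column total is greater than zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # The smallest expected count comes from the smallest margins.
    min_expected = min(row1, row2) * min(col1, col2) / n
    # Continuity correction: shrink |ad - bc| by n/2, not below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0)
    statistic = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, min_expected

# Hypothetical counts, for illustration only.
stat, p, min_exp = yates_chi_square(12, 18, 8, 62)
print(f"chi-square {stat:.2f}, p {p:.4f}, smallest expected count {min_exp:.1f}")
```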
A multivariate analysis of the data was then performed using only those variables that had been shown to be significantly associated with mortality on univariate testing in order to establish which variables were independently associated. This was performed using multiple logistic regression analysis and the inclusion of variables into the model was done in a stepwise fashion.
Variables which remained significantly related to death in the multivariate model were then used to construct a new score. A heuristic approach was used to develop the score guided by exploration of the raw data, the multiple regression model and its coefficients.
Performance of the new scoring system, [L] measurements and the CRIB and CRIB-II scores to predict mortality was assessed by constructing a ROC curve for each and calculating the AUC. The Hosmer–Lemeshow (HL) test12 was used to measure the goodness of fit of the models.
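The AUC has a direct interpretation: it is the probability that a randomly chosen baby who died has a higher score than a randomly chosen survivor, with ties counted as one half. A minimal sketch of that calculation follows; the function name and the scores are made up for illustration and are not study data.

```python
def roc_auc(scores_died, scores_survived):
    """AUC computed by comparing every (died, survived) pair.

    A pair counts 1 if the baby who died has the higher score,
    0.5 if the scores are tied and 0 otherwise.
    """
    wins = 0.0
    for d in scores_died:
        for s in scores_survived:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(scores_died) * len(scores_survived))

# Made-up scores for illustration only.
died = [9, 7, 12, 5]
survived = [1, 3, 5, 2, 4, 6]
print(roc_auc(died, survived))  # 0.9375
```

The pairwise loop is quadratic in cohort size, which is acceptable for a few hundred babies; rank-based formulas give the same value more efficiently.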
ROC curves were also constructed for the prediction of the combined outcome of death or any adverse outcome (ROP, NEC, BPD, abnormal cranial ultrasound).
SPSS version 11.5 for windows (SPSS Inc., Chicago, Illinois, USA) was used for all statistical calculations. Throughout all analyses the level of statistical significance that was accepted as a true difference was 5% (p<0.05).
Four hundred and eight inborn babies were admitted between March 2001 and February 2004. Of the 408 infants, 337 had enough data to calculate a CRIB score, although it was only possible to calculate a CRIB-II score in 123 cases. This was normally due to missing admission temperature data.
Of the 408 infants, 381 had at least one [L] measurement recorded in the first 12 h of life. Three hundred and twenty of these had a first [L] measurement recorded within the first 2 h of life. The only statistically significant difference between the 27 babies in whom no [L] measurements were made and the 381 in whom they were made was the need for respiratory support, and consequently for blood gas analysis (table 1).
Seven infants had a life-threatening malformation; three died (pulmonary hypoplasia, osteogenesis imperfecta and bilateral choanal atresia) and four survived (atrioventricular septal defect, gastroschisis, congenital diaphragmatic hernia and pulmonary artery atresia).
There were 47 deaths in the cohort. The median age at death was 3 (range 1–210) days, with a median corrected gestational age (CGA) of 26 weeks (range 23–57). CRIB score, CRIB-II score, initial [L] and highest [L] were all significantly higher in babies who died than in babies who survived (table 2, figures 1 and 2).
The most informative value for first [L] was 7.7 mmol/l with a sensitivity of 65% and a specificity of 84%. The most informative value for highest [L] was 6.9 mmol/l with a sensitivity of 77% and a specificity of 78%.
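Sensitivity and specificity at a given [L] cutoff can be computed as sketched below. The source does not state how the 'most informative' value was chosen; maximising Youden's index (sensitivity + specificity − 1) is used here purely as an assumption, and the function names and lactate values are hypothetical.

```python
def sensitivity_specificity(values_died, values_survived, cutoff):
    """Treat a value at or above the cutoff as a positive test."""
    true_pos = sum(v >= cutoff for v in values_died)
    true_neg = sum(v < cutoff for v in values_survived)
    return true_pos / len(values_died), true_neg / len(values_survived)

def best_cutoff_by_youden(values_died, values_survived):
    """Return the observed value that maximises Youden's index."""
    candidates = sorted(set(values_died) | set(values_survived))

    def youden(cutoff):
        sens, spec = sensitivity_specificity(values_died, values_survived, cutoff)
        return sens + spec - 1

    return max(candidates, key=youden)

# Made-up lactate values (mmol/l), for illustration only.
died = [9.2, 10.5, 6.9, 3.1]
survived = [2.0, 3.6, 4.1, 7.5, 1.8, 2.9]
cutoff = best_cutoff_by_youden(died, survived)
sens, spec = sensitivity_specificity(died, survived, cutoff)
print(cutoff, round(sens, 2), round(spec, 2))
```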
Univariate analysis revealed that the following variables were associated with an increased risk of death: gender, gestation, birth weight, need for blood pressure support in the first 12 h, first [L], highest [L], antenatal steroids, first BE, presence of acutely life-threatening malformation. Multiple logistic regression analysis was then performed with stepwise entry of these variables in order to identify those that were independently associated with death. Variables that remained significant and thus were independently associated with mortality were: gestation (p<0.0001), highest [L] (p<0.0001), presence of acutely life-threatening malformations (p=0.011) and the need for blood pressure support in the first 12 h (p=0.043).
Developing a NIPI score
The need for blood pressure support was the least powerful component of the multiple logistic regression model. The threshold for starting inotropic support will vary between neonatal units and inclusion of this variable could introduce a subjective component into the new score. For these reasons, the new score was developed using only the other three variables from the regression model (gestation, highest [L] and presence of acutely life-threatening malformation).
The data was explored using chart representation of the mortality across different gestations and [L] ranges. The regression model was also used to guide the categorisation and subsequent allocation of scores to the various gestation, [L] and malformation categories. The relationship between gestation and mortality was not linear. Lower gestations had a greater influence on the prediction of mortality and this is reflected in the scoring system with infants born at 24 weeks or less receiving proportionally much higher scores than those born at later gestations. Highest [L] is a more linear predictor of mortality and this is reflected in a more proportional distribution of scores. The value of the scores across all categories is based on the relative importance each variable is given in the multiple logistic regression model and the corresponding regression coefficients. The name chosen for the new scoring system was the Neonatal Illness Prognosis Indicator (NIPI score). The NIPI score is calculated from the sum of three subscores assigned to the baby on the basis of the three component variables as described in table 4.
Performance of the new score
NIPI scores were then calculated for the cohort of 381 infants that had all of the required data and an ROC curve was constructed for its ability to predict death. The area under the ROC curve was 0.918 (95% CI 0.876–0.961) with a HL test of p=0.980.
During the development of the CRIB score if fractional inspired oxygen concentration or blood gases were not recorded in infants who received no respiratory support, oxygenation and BE were assumed to be normal.6 [L] data is missing from some subjects in this study, mostly from well babies who did not require ventilation. If a similar assumption is made and we assign the lowest [L] value (<3 mmol/l) to all cases with missing [L] values then a ROC curve can be produced based on all 408 infants. The predictive ability of the new score remained excellent with AUC 0.892 (95% CI 0.838 to 0.947) and HL test p=0.847. Three of the infants who had missing [L] measurements died. We believe it is likely that their [L] measurements would have been elevated if they had been measured so the AUC in this model is more likely to be an underestimate of predictive ability.
Of the 408 infants, a total of 164 had an adverse outcome, defined as death (47), abnormal cranial ultrasound scan (33), BPD (108), ROP (5) or NEC (24), or a combination of any of the above. The ability of CRIB, CRIB-II and NIPI to predict these outcomes was also assessed from ROC curves (table 5).
There were 275 babies included in the validation cohort. They were similar to the development cohort with a median (range) gestation of 28 (23–35) weeks and a median (range) birth weight of 1060 (460–1500) g. The median (range) NIPI score for this cohort was 3 (0–18). The cohort included 50 deaths. NIPI score continued to perform well as a predictor of death. The AUC was 0.859 (95% CI 0.805–0.913).
The CRIB score is a simple scoring system which predicts the risk of death from six parameters available in the first 12 h of life. It was developed retrospectively in a cohort of 812 infants of birth weight 1500 g or less or gestational age less than 31 weeks treated in four UK tertiary hospitals between 1988 and 1990. The scoring system was then validated by comparing its value as a predictor of hospital death with that of birth weight in a separate cohort of 488 similar infants. The AUC for predicting death in this validation cohort was significantly greater for CRIB than birth weight alone (0.9 (SE 0.05) vs 0.78 (0.03), p=0.03). Other studies have produced similar values for the AUC using CRIB of 0.87 and 0.90.13 14 Manktelow et al have recently demonstrated that by recalibrating the CRIB-II score it provides excellent predictive characteristics for infants less than 32 weeks gestational age at birth.15 The predictive ability of the CRIB score in our study was similar to that described when the score was initially developed. Despite advances in neonatal medicine and improvements in neonatal survival, the predictive ability of the CRIB score and CRIB-II score seem to have been maintained.
Cotside measurement of [L] has become much easier in the years since the development of the CRIB scores with the incorporation of this parameter into many blood gas analysers. Hyperlactataemia has been shown to be of prognostic significance in both adults16 17 18 19 20 21 22 23 and children undergoing intensive care.24 25 Deshpande26 assessed [L] in 75 ventilated babies of all gestational ages and found an increased mortality in groups with higher [L]. Hyperlactataemia has also been shown to be of prognostic significance in babies with NEC27 28 and in babies receiving extracorporeal membrane oxygenation.29 30 31 32
We have shown in this study that hyperlactataemia in the first hours of life carries a poor prognosis for survival in a large cohort of VLBW infants. This was not perimortem hyperlactataemia. Most of the babies who died did so several days or weeks later and their [L] measurements returned to normal in the intervening period. Early hyperlactataemia was also independently associated with other adverse outcomes in these babies. While highest [L] in the first 12 h may reflect the effects of early treatment, we believe that this is unlikely given that both initial and highest [L] were significantly elevated in the infants who died. We hypothesise that this initial hyperlactataemia is a consequence of insults occurring in the antepartum, intrapartum or immediate postnatal period that restrict oxygen availability in these babies, increasing anaerobic respiration and lactate production. The processes that commonly lead to preterm delivery (bacterial infection, placental insufficiency, pre-eclampsia, antepartum haemorrhage etc) are all associated with foetal circulatory disturbances and this is the likely mechanism through which this tissue hypoxia occurs. This period of tissue hypoxia may cause organ injury which causes or contributes to later death or morbidity. [L] measurements were not considered as candidate data items in the development of the CRIB score. Elevation of [L] occurs when oxygen availability is decreased at a cellular level, whether due to hypoxia, profound anaemia or reduced perfusion. Conventional analysis of blood gas measurements makes the assumption that similar information can be provided by measurement of metabolic acidosis. This does not appear to be the case. Chanrachakul33 found no linear correlation between [L], pH, pCO2, or base deficit in a study of over 500 babies. Thus, the incorporation of [L] into a clinical scoring system may be expected to be more informative than the measurements of acidosis incorporated into the CRIB score.
The multivariate analysis that we performed found that [L] was independently associated with death and acidosis was not. The NIPI score developed in this study incorporates [L] and has a good ability to predict death. The overall performance of the score appears to be similar to that of the CRIB score.
We recognise the limitations in the development of the NIPI score, chiefly the validation cohort being a retrospective one and that it has only been validated in a single centre. The score assigned to the individual components of the NIPI score may differ between units and we would encourage further research to assess its accuracy before encouraging its widespread use. We believe, however, that the NIPI score has several advantages. First, the number of data items required for the NIPI score is smaller (three compared to six for CRIB and five in CRIB-II). In this study, 285/408 infants could not be retrospectively ascribed a CRIB-II score due to missing data items. In addition, some of the data items required to calculate CRIB may not be comparable between units. For example, while the highest and lowest ‘appropriate’ FiO2 were originally defined within clear PaO2 ranges, in clinical practice this variable is dependent on the frequency and method of blood gas sampling.
Other scoring systems which have been developed include SNAP and SNAP-PE (SNAP's perinatal extension),8 9 which was developed using data from three units in Boston, USA, in 1990. Although the SNAP derivation cohort contained 1643 infants, only 154 weighed less than 1500 g at birth, the group in which the most deaths occur. SNAP-PE scores are more difficult to calculate than CRIB or NIPI, being based on the results of 28 items collected over the first 24 h of life.
In order for the new score to be accepted and used, not only must it be demonstrated to be accurate and validated but it must also be user-friendly. A useful method to evaluate the score is to consider how well it fulfils the specifications defined by the developers of the SNAP score.8 It should:
(1) Rely on physiology rather than diagnosis;
(2) Provide information beyond that possible from basic data such as birth weight and gestation;
(3) Have established validity and reliability;
(4) Be able to be used by physicians and researchers;
(5) Use readily available information;
(6) Have sufficient range to cover the spectrum of cases seen on a neonatal unit;
(7) Reflect status on admission.
The NIPI score fulfils all of these criteria. While the decision as to whether or not a congenital malformation is acutely life threatening could introduce some subjectivity, we believe that, with the definition given in this study, this decision can be made with satisfactory certainty.
This study only allowed the consideration of short-term morbidity, but these are likely to be markers of more long-term problems. The performance of the scoring systems to predict any individual adverse outcome other than death cannot be assessed from this study due to the small numbers of babies with these individual diagnoses.
In summary, we have demonstrated that [L] measurements in the first hours of life of VLBW babies have prognostic significance. The CRIB score retains its predictive ability for death in these babies despite being developed in a neonatal population almost 20 years ago. The NIPI score developed in this study has a predictive ability similar to that of CRIB, but we believe it is preferable as it depends on fewer data items with no components that may be affected by variation in clinical practice between units.
The authors wish to acknowledge the contribution of Dr Anna Hart of the University of Central Lancashire for her advice on the statistical aspects of developing the new scoring system.
Competing interests None.
Ethics approval This study was conducted with the approval of the Liverpool Paediatric Research Ethics Committee.
Provenance and peer review Not commissioned; externally peer reviewed.