
Limited comparability of classifications of levels of neonatal care in UK units
  1. The ECSURF (Economic Evaluation of Surfactant) Collaborative Study Group
  1. Dr William Tarnow-Mordi, Department of Child Health, University of Dundee, Ninewells Hospital and Medical School, Dundee DD1 9SY. w.o.tarnowmordi{at}


AIM To assess whether different classifications of neonatal care or dependency scales are comparable when used in multicentre studies of cost effectiveness.

METHODS A survey of classifications was used in a nationally representative group of 57 units in 1990–1, with a retrospective study of 10 354 cot days using patient records from a 5% random sample of 1042 admissions. Local and national classifications were correlated with medical and nursing procedures recorded for up to 26 days after each admission.

RESULTS Classifications varied substantially. Of the 57 units in our sample, 26 used one of two national classifications, sometimes modified; 17 used the Northern Neonatal Network dependency scale; and the other 14 did not record daily levels of care. In each classification, the highest level was having respiratory support by ventilation or continuous distending pressure through an endotracheal tube, nasal prongs, facemask or negative pressure device. This level of care was consistently comparable between classifications; lower levels were not.

CONCLUSIONS Retrospective comparisons between units with different classifications can only reliably differentiate between days with and without respiratory support. There is a pressing need to develop and validate more appropriate scales for prospective multicentre studies. These should relate activity to costs and outcome.

  • classifications of neonatal care
  • dependency scales
  • prospective multicentre studies


Assessing the cost effectiveness of neonatal care is increasingly important. However, the work of neonatal units varies from highly invasive support to basic nursing care, and this cannot be ignored when comparing costs within or between units. Many studies in single UK centres1-6 have estimated the relative daily costs of different levels of care by detailed observation of individual infants. Estimates of average daily costs per cot were made by dividing up the total costs of each centre in proportion to the total days at each level of care. The overall costs of different groups of infants were then calculated according to their length of stay at each level of care.
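The apportioning arithmetic described above can be illustrated with a minimal sketch. It assumes that the relative daily cost of each level of care has already been estimated by observation; the function names, level names, weights, and monetary figures are all hypothetical and are not taken from any of the cited studies.

```python
def daily_costs(total_cost, days, relative_weight):
    """Split a unit's total cost into a daily cost for each level of care.

    days            -- total cot days at each level (hypothetical figures)
    relative_weight -- relative daily cost of each level, from observation
    """
    weighted_days = sum(days[level] * relative_weight[level] for level in days)
    if weighted_days <= 0:
        raise ValueError("no weighted cot days to apportion costs over")
    cost_per_weighted_day = total_cost / weighted_days
    return {level: cost_per_weighted_day * relative_weight[level] for level in days}


def infant_cost(stay_days, cost_per_day):
    """Cost of one infant (or group) from length of stay at each level."""
    return sum(stay_days[level] * cost_per_day[level] for level in stay_days)


# Hypothetical unit: intensive care assumed three times as costly per day.
days = {"intensive": 1000, "special": 3000}
weights = {"intensive": 3.0, "special": 1.0}
costs = daily_costs(600_000, days, weights)
example_infant = infant_cost({"intensive": 5, "special": 10}, costs)
```

By construction the daily costs multiplied by the cot days at each level add back to the unit's total cost, which is the property this bottom up method relies on.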

This approach has been useful. An observational study showed that the daily costs of caring for very low birthweight infants who died compared with those who survived were six times higher in one unit than another.3 Variation in medical policy therefore seems to be a crucial determinant of costs. Other research has combined estimates of daily costs with the results of randomised controlled trials, to calculate the cost effectiveness of confirmed treatments, such as surfactant4 5 or antenatal steroids,4 thus enhancing their implementation.

Detailed observation of individual infants by independent observers may be too expensive in large multicentre studies. Fordham and colleagues7 therefore estimated the relative daily costs of two broad levels of care—intensive and non-intensive—in all neonatal units in Trent using another approach. They apportioned costs entirely from the top down, using routine data. They documented the total costs of each unit and its total cot days at each level of care over a defined period. They then derived the relative daily costs of each level of care across the region from a multiple regression equation.7-9 This was an important methodological development, as it showed that variation in total costs between units in a multicentre study could be explained using broad definitions of levels of care. However, this approach requires that each unit uses a comparable classification.
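A top down estimate of this kind can be sketched as an ordinary least squares fit. The sketch below is only an illustration under stated assumptions: two levels of care, no intercept term, and invented unit totals. The exact specification used in the Trent study is not given here, and the function and field names are hypothetical.

```python
def fit_daily_costs(units):
    """Least squares fit of total_cost = b_int * intensive_days + b_non * other_days.

    Solves the 2 x 2 normal equations directly (no intercept is assumed).
    Returns (b_int, b_non), the estimated daily cost at each level.
    """
    sxx = sum(u["intensive_days"] ** 2 for u in units)
    syy = sum(u["other_days"] ** 2 for u in units)
    sxy = sum(u["intensive_days"] * u["other_days"] for u in units)
    sxc = sum(u["intensive_days"] * u["total_cost"] for u in units)
    syc = sum(u["other_days"] * u["total_cost"] for u in units)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("cot-day mix is identical across units; costs not identifiable")
    b_int = (sxc * syy - syc * sxy) / det
    b_non = (syc * sxx - sxc * sxy) / det
    return b_int, b_non


# Hypothetical routine data for three units (total cost, cot days by level).
units = [
    {"intensive_days": 1000, "other_days": 3000, "total_cost": 600_000},
    {"intensive_days": 500, "other_days": 4000, "total_cost": 550_000},
    {"intensive_days": 2000, "other_days": 2000, "total_cost": 800_000},
]
b_int, b_non = fit_daily_costs(units)
```

The fit is only meaningful if "intensive_days" means the same thing in every unit, which is the comparability requirement noted above.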

Several classifications of care have been adopted in the UK. In 1984 the then British Paediatric Association (BPA) and the British Association for Perinatal Paediatrics (BAPP) recommended a simple classification to audit workload, denoted as BPA84 in this report.10 In 1992 a more comprehensive system,11 denoted here as BAPM91, was recommended by the British Association of Perinatal Medicine. In 1993 simple dependency scales were published, supported by detailed observations of nursing activity in the Merseyside regional neonatal intensive care unit and throughout the Northern Region.12-13 Many neonatal units have modified one of these classifications or developed their own, and some collect no workload data. At present, purchasers cannot tell whether differences in daily costs between units represent true differences in cost or cost effectiveness, or just differences in definition.

The Medical Research Council funded a three year project from November 1991 for the economic evaluation of surfactant (ECSURF), using data from UK units participating in two international trials of surfactant therapy.14 15 The main aim of the ECSURF study16 (Mugford et al, unpublished data presented at the 1st annual meeting of the Royal College of Paediatrics and Child Health, 1997) was to cost different levels of intensity of neonatal care in the UK. The relative daily costs of each level of care were estimated by a top down approach,7 using multiple regression analysis to relate the total costs of each unit to the proportions of cot days it provided at different levels of care. These estimates were to be combined with the results of the two trials, to measure the incremental short term cost effectiveness of early vs delayed selective surfactant,14 and of more vs fewer doses.14 15 The estimates of the daily costs of different levels of neonatal care derived from ECSURF have also been used for measuring the incremental short term cost effectiveness of ECMO (Roberts et al, unpublished data presented at the First Annual Meeting of the Royal College of Paediatrics and Child Health) and could be applied to any effective neonatal intervention for the UK.

As a first step in the ECSURF project, we assessed how classifications of care used by different units should be adjusted to permit appropriate comparisons of the daily costs of different levels of care.


For each of the 61 units participating in the ECSURF study, we collected details of the local classification and the daily care provided to a 5% sample of babies admitted in the year beginning April 1 1990. In each neonatal unit the records of one infant out of the first 10 admissions were selected using a random number table, then of every subsequent twentieth infant. If the records were missing the next admission to the unit was substituted.
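The selection rule can be sketched as follows. This is a minimal illustration, not the procedure actually run in the study: a pseudo-random generator stands in for the random number table, admissions are numbered from zero, and the substitution rule is assumed to keep moving forward until an admission with records is found. The function and parameter names are hypothetical.

```python
import random


def sample_admissions(n_admissions, has_records=lambda index: True, rng=None):
    """Pick one of the first 10 admissions at random, then every 20th after it.

    has_records -- hypothetical callback: True if the admission's notes exist
    Returns 0-based positions in the unit's admission book.
    """
    if n_admissions <= 0:
        return []
    rng = rng or random.Random()
    start = rng.randrange(min(10, n_admissions))
    selected = []
    for index in range(start, n_admissions, 20):
        pick = index
        # Assumed rule: substitute the next admission whose records are present.
        while pick < n_admissions and not has_records(pick):
            pick += 1
        if pick < n_admissions and pick not in selected:
            selected.append(pick)
    return selected


picks = sample_admissions(200, rng=random.Random(1))
```

For a unit with 200 admissions this yields 10 infants, that is, the 5% sampling fraction stated above.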

Data were collected from patient records during visits to each unit by one of four research nurses. During their training, methods for extracting data had been standardised between them, using common sets of medical and nursing records. They then compiled a comprehensive list of every medical and nursing procedure used as a criterion in determining any level of care in any of the neonatal units. This involved abstracting from each infant’s daily record for up to 26 days after admission all procedures or medications and descriptive variables, including day of death, birthweight, gestational age at birth, and details about transport. The level of care which had been recorded each day by the staff of the unit using its local classification system was also noted for each baby. The highest level of care in every classification was having respiratory support, defined as intermittent positive pressure ventilation (IPPV), intermittent mandatory ventilation (IMV), or continuous positive airway pressure (CPAP) through an endotracheal tube, nasal prongs or face mask, or ventilation or continuous pressure through a negative pressure device. Because all of these babies would be classified similarly in different units, no other procedures were recorded on days when respiratory support was given.

Neonatal units were divided into three groups according to written definitions of their local classifications of care:

(i) units using a local classification modified from the BPA84 or BAPM91 classifications;
(ii) units using the Northern Neonatal Network dependency scale, a system derived from detailed observations of nursing and medical work; and
(iii) other units, with no daily record of level of care.

For units in groups (i) and (ii) the data collected were entered in a computer algorithm to reclassify each cot day according to the original versions of two national classifications, BPA84 and BAPM91. The distribution of cot days at each level of care using the local classification was compared with the distribution of cot days using the two national standards. We used a statistical method described by Fleiss,17 which estimates the degree of inter-rater agreement, beyond what might be expected by chance alone, expressed as the κ statistic. We postulated that κ would exceed 0.9, where a value of 1 indicates perfect correspondence between classification systems. As this method can only compare classifications with identical numbers of categories, we amalgamated local classifications with four or more categories in two ways to produce only three levels of care. First, we merged the two highest categories below respiratory support in each local classification as intensive care (local a). Second, we kept the highest category below respiratory support in each local classification separate and merged the lower categories (local b). In each comparison the proportion of cot days at each of the three levels constructed using the local classification was correlated with the corresponding proportion of cot days using the original BPA84 and BAPM91 classifications.
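The two ways of collapsing a local classification, and the agreement statistic, can be sketched as below. This is an illustration under assumptions: local levels are coded by rank (1 = respiratory support), the three merged level names are hypothetical, and the unweighted two-classification form of κ is used, which may differ in detail from the variant taken from Fleiss. The cot-day data are invented.

```python
from collections import Counter


def merge_to_three(rank, variant):
    """Collapse a local level (rank 1 = respiratory support) into three levels."""
    if rank == 1:
        return "respiratory"
    if variant == "a":
        # local a: the two highest categories below respiratory support are merged
        return "middle" if rank <= 3 else "lowest"
    if variant == "b":
        # local b: the highest category below respiratory support stays separate
        return "middle" if rank == 2 else "lowest"
    raise ValueError("variant must be 'a' or 'b'")


def kappa(labels_a, labels_b):
    """Unweighted kappa: agreement between two classifications beyond chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical cot days: local four-level ranks and a national three-level label.
local = [1, 1, 2, 3, 3, 4, 4, 4]
national = ["respiratory", "respiratory", "middle", "middle",
            "lowest", "lowest", "lowest", "lowest"]
kappa_a = kappa([merge_to_three(r, "a") for r in local], national)
kappa_b = kappa([merge_to_three(r, "b") for r in local], national)
```

In this invented example both merges disagree with the national label on a single cot day, yet κ differs between them because the chance-expected agreement depends on how the days are distributed across categories.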

Finally, we used methods for probit and ordered probit regression analysis which do not require the same number of categories when comparing each classification.18 Probit analysis can estimate the relation between a categorical outcome, such as the level of care a baby was assigned in any given classification, and the specific procedures conducted on that day. This approach was used to identify criteria associated with particular levels of care which were shared between groups of neonatal units using different classifications. The aim was to show how consistently different classifications allocated babies to similar levels of care.
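The structure of an ordered probit model can be sketched as follows. The sketch shows only how fitted coefficients and cutpoints would translate a day's procedures into probabilities for each level of care; it does not estimate anything. The procedure names, coefficient values, and cutpoints are hypothetical, and in practice they would be estimated by maximum likelihood in a statistical package.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(procedures, coefficients, cutpoints):
    """Probability of each ordered level of care, lowest level first.

    procedures   -- {procedure name: True if recorded that day}
    coefficients -- {procedure name: assumed probit coefficient}
    cutpoints    -- thresholds separating adjacent levels
    """
    score = sum(coefficients[name] for name, done in procedures.items() if done)
    bounds = [-math.inf] + sorted(cutpoints) + [math.inf]
    return [normal_cdf(upper - score) - normal_cdf(lower - score)
            for lower, upper in zip(bounds, bounds[1:])]


# Hypothetical values for illustration only.
COEFFICIENTS = {"incubator": 0.8, "constant_monitoring": 1.2}
CUTPOINTS = [0.5, 1.5]  # two cutpoints give three ordered levels

quiet = ordered_probit_probs(
    {"incubator": False, "constant_monitoring": False}, COEFFICIENTS, CUTPOINTS)
busy = ordered_probit_probs(
    {"incubator": True, "constant_monitoring": True}, COEFFICIENTS, CUTPOINTS)
```

Because the number of cutpoints is free to differ between models, classifications with different numbers of categories can each be fitted in this way and their coefficients compared procedure by procedure.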

The data were processed and checked using double data entry, and were subsequently analysed using SAS (Version 6.08), SPSS for Windows (Version 6.1.1) and the LIMDEP econometric package (Version 6.0).


Of over 100 UK neonatal units taking part in either of the two surfactant trials,14 15 61 were invited to take part in ECSURF because of reasonable proximity to the study coordinators. Four units were unable to provide data about annual costs of care, so 57 UK neonatal units were surveyed. Of these, 26 units used variants of the BPA84 and BAPM91 classifications and 17 used the Northern Region nursing dependency scale which classified care into four groups, consisting of two levels of dependency, high and low, with two subdivisions of each level. The other 14 did not record daily levels of care (table 1).

Table 1

Level of care definitions

We compiled a list of 51 criteria used to classify care in different levels. Table 2 shows that only one of these criteria was shared by all classifications. This was having respiratory support, which always constituted intensive care. There were no consistent criteria for normal care, which includes well babies being prepared for home.

Table 2

Examples of criteria included in all or only some definitions

We recorded details of care given to 1042 infants for 10 354 cot days in the 57 neonatal units. The birthweight profile of these infants was 54 (5%) less than 1000 g; 103 (10%) between 1000 and 1499 g; 317 (30%) between 1500 and 2499 g; and 563 (54%) 2500 g or more. Table 3 shows how frequently each of the 51 criteria occurred in our sample of cot days. Only 20 cot days of care (less than 0.2 per cent) included dialysis or exchange transfusion. Although care before and after surgery denoted high dependency and intensive care in some classifications, it accounted for less than 50 cot days in over 10 000.

Table 3

Occurrence of criteria as percentage of baby days

Forty-four per cent of cot days included incubator care and 24 per cent included antibiotic treatment, but less than 0.5 per cent involved barrier nursing. Thirty-seven per cent of cot days included constant monitoring but less than 1 per cent included any record of unstable cardio-respiratory disease. On only about 3 per cent of non-ventilated days could the baby be classified as having had recurrent apnoea requiring at least five stimulations in 24 hours. Nearly 14 per cent of non-ventilated baby days in our sample were for wholly breast or bottle fed babies weighing over 1750 g.

Table 4 shows our estimates of the proportions of days at different levels of care according to the BPA84 and BAPM91 classifications, among neonatal units using local classifications a and b. Using the BPA84 classification, we overpredicted the numbers of non-ventilated intensive care days, and found similar numbers of special care days in both groups. Using the BAPM91 definition, non-ventilated intensive days closely matched the intensive care and high dependency categories merged in local classification a. In both groups of neonatal units we estimated a lower proportion of nursery care than was actually recorded locally.

Table 4

Comparison of local classification with BPA84 and BAPM91 national standards

There was a significant correspondence between the predicted workload based on different classifications (table 5). However, in every case, the pre-specified hypothesis that the classifications would be highly concordant was rejected, as the underlying κ statistic never exceeded 0.9.

Table 5

Kappa statistics and Spearman rank correlation coefficients for comparisons of days of care with different classifications

According to probit analysis, no single criterion was a consistent predictor for levels of care other than respiratory support. It was impossible to derive a common model that would predict levels of care in different groups of neonatal units.


Although local and national classifications uniformly identified respiratory support as the highest level of care, they were otherwise not closely comparable, which confirms previous findings.13 In retrospective comparisons between units using different classifications it seems appropriate to differentiate only between days with and without respiratory support.

There are several potential problems with the statistical data and methods we have used. Firstly, we may have missed certain procedures because they were not in the notes or were overlooked during data collection. This would have exaggerated the influence of items which were recorded. Secondly, neonatal units might deviate from their adopted classification to record higher categories of care to justify already stretched staffing levels.19 However, we found the opposite: local classifications overestimated the numbers of days at the lowest level of care. Thirdly, our 5 per cent sample of admissions may have created selection bias. Censoring the data after the 26th day and substituting the next admission when the records of randomly selected infants were missing may have caused infants with long stays and chronic problems to be under-represented. However, as these infants usually received long term respiratory support, this bias is unlikely to alter the main conclusion that this level of care was consistent between classifications.

Routinely recording daily occupancy using detailed classifications of levels of care has an established role in allocating staff and documenting trends within units. However, in multicentre studies there are strong grounds for developing a simple common denominator, to minimise interobserver variation and the expense of coordinating comprehensive data collection.13 An alternative would be to use diagnostic related groups for neonatal care, as suggested by the Clinical Standards Advisory Group.20 However, this approach has been criticised when applied in the USA for reimbursement of costs.21

Clearly, there is a pressing need to develop appropriate classifications of care or dependency scales for prospective multicentre studies and these should be carefully validated before they are widely accepted. How should this be done?

The most rigorous validation of a dependency scale would require studies which relate its levels of care to costs, activity, and outcome. In the original top down costing study by Fordham and colleagues,7 the broad categories of intensive vs non-intensive care accounted for 76% of the variation in costs between neonatal units in Trent. Williams et al12 and the Northern Neonatal Network13 reported simple, practical dependency scales which were supported by detailed, prospective analyses of nursing activity. These suggested that, on average, ventilated infants who were stable required only slightly more nursing time than highly dependent infants who were not ventilated. Following these studies, the British Association of Perinatal Medicine has recommended that a nurse should not be responsible for more than two infants receiving neonatal intensive care or more than four infants receiving special care,22 and has acknowledged that its previous more stringent standards11 for levels of nursing staff lacked empirical evidence and were rarely met.22 Another important test of the validity of a classification of care would be to use it to show whether outcomes deteriorate when workload is excessive. This would allow us in turn to test the validity of current recommendations for minimum safe levels of staffing.22

Figure 1 illustrates an example of a practical dependency scale consistent with previously validated analyses of nursing activity,12 13 which will be used in a random sample of units in the UK Neonatal Staffing Study.23 This twice daily log of unit workload is designed to provide a common currency using uniform definitions across about 50 units in the prospective phase of that study, which aims to relate patient volume, levels of staffing provision, and workload to costs and outcome.

Figure 1

A practical dependency scale, based on previous work,13 used in the UK Neonatal Staffing Study.27

We conclude that simple and uniform scales for comparative studies of the costs of neonatal care are needed. Without this, research on the costs and cost effectiveness of neonatal care in the NHS will continue to be more expensive and less reliable than it should be.


We thank Susan Fritz, Mary McCulloch, Beth McDonagh, Hazel Ashurst, Lisa Wood and Lindsay Bramley for data collection and processing. We thank all of the neonatal units who participated in this study and colleagues at the National Perinatal Epidemiology Unit and members of the UK Neonatal Staffing Study for their comments.

We are grateful to Douglas Richardson for advice with the analysis. Miranda Mugford, Sarah Howard, and Alistair Dunn were funded by the Department of Health. The Project for the Economic Evaluation of Surfactant was funded by the Medical Research Council, which supported Morag Zelisko.



  • Members of the study group: M Mugford, S Howard, C O’Neill, A Dunn, M Zelisko, C Normand, M Malek, E Hey, H Halliday, W Tarnow-Mordi