
Routinely collected English birth data sets: comparisons and recommendations for reproductive epidemiology
Rebecca E Ghosh1, Danielle C Ashworth1, Anna L Hansell1,2, Kevin Garwood1, Paul Elliott1,2, Mireille B Toledano1

1 UK Small Area Health Statistics Unit, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
2 Imperial College Healthcare NHS Trust, London, UK

Correspondence to Dr Mireille B Toledano, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W2 1PG, UK; m.toledano{at}

Abstract

Background In England there are four national routinely collected data sets on births: Office for National Statistics (ONS) births based on birth registrations; Hospital Episode Statistics (HES) deliveries (mothers’ information); HES births (babies’ information); and NHS Numbers for Babies (NN4B) based on ONS births plus gestational age and ethnicity information. This study describes and compares these data, with the aim of recommending the most appropriate data set(s) for use in epidemiological research and surveillance.

Methods We assessed the completeness and quality of the data sets in relation to use in epidemiological research and surveillance and produced detailed descriptive statistics on common reproductive outcomes for each data set including temporal and spatial trends.

Results ONS births is a high quality complete data set but lacks interpretive and clinical information. HES deliveries showed good agreement with ONS births but HES births showed larger amounts of missing or unavailable data. Both HES data sets had improved quality from 2003 onwards, but showed some local spatial variability. NN4B showed excellent agreement with ONS and HES deliveries for the years available (2006–2010). Annual number of births increased by 17.6% comparing 2002 with 2010 (ONS births). Approximately 6% of births were of low birth weight (2.6% term low birth weight) and 0.5% were stillbirths.

Conclusions Routinely collected data on births provide a valuable resource for researchers. ONS and NN4B offer the most complete and accurate record of births. Where more detailed clinical information is required, HES deliveries offers a high quality data set that captures the majority of English births.

  • Data Collection
  • Epidemiology
  • Statistics


What is already known on this topic

  • Routinely collected birth data sets provide an important resource for epidemiological studies and for surveillance of reproductive health.

  • Of the four national sources of birth data, ONS births and NN4B offer the most complete and accurate record of all births in England.

  • However, ONS births and NN4B do not provide the detailed clinical information held in HES.

What this study adds

  • HES deliveries are recommended for use over HES births.

  • Researchers should undertake a descriptive analysis of the data to identify any temporal or spatial trends.

  • Policies are required to reduce the high burden of permissions and information governance required to obtain and link birth data sets.

Introduction

Routinely collected data on births are a valuable resource for epidemiological studies of reproductive outcomes and for surveillance.1 ,2 In England there are four national births data sets: Office for National Statistics (ONS) births, based on birth registrations; Hospital Episode Statistics (HES) deliveries; HES births; and National Health Service (NHS) Numbers for Babies (NN4B), which is based on ONS births plus gestational age and ethnicity information.

ONS births consists of information on births (live or still) in England and Wales registered within 42 days of birth, as required by statute. ONS births is a complete, high quality data set that holds some sociodemographic information but lacks key information such as gestational age. ONS provides detailed metadata and produces annual publications on a range of summary birth statistics and trends.3 This data set has also been used in studies of reproductive health and environmental exposures,1 ,2 ,4–6 sociodemographic effects,7–9 temporal trends in birth weight10 ,11 and survival.12

HES, provided by the Health and Social Care Information Centre (HSCIC), documents all admissions to English NHS hospitals and NHS-funded facilities; HSCIC routinely publishes descriptive statistics and data quality summaries for the maternity data.13 The HES maternity records are a subset of HES and comprise two data sets: HES births (babies’ records) and HES deliveries, relating to the birth process (mothers’ records with information on each baby). HES deliveries contains detailed clinical information and has been used to investigate obstetric surgery outcomes and practice.14–17 HES births has mainly been used in methodological papers creating linked birth cohorts18 and in linkage with other routine data sets.19 ,20

NN4B was created to allocate NHS numbers to babies, who are notified to a Central Issuing System within a few days of birth; from 2006 onwards, NN4B data are available from ONS. NN4B is a high quality record of births with additional key variables, including gestational age,21 and has been linked by ONS to birth and death registrations to produce gestation-specific infant mortality statistics.22 In 2015, NN4B functions were replaced by the Personal Demographics Service on the NHS Spine, but the information currently being collected will remain comparable to NN4B.

The specific aims of this study were to: (1) assess, for the first time, the quality of all four national data sources on birth outcomes; (2) produce comparative statistics for each data set for several common outcomes in reproductive epidemiology; and (3) make recommendations to researchers on the most appropriate data set(s) for use in epidemiological studies.

Methods

Data were extracted from ONS birth registrations and HES maternity records for all English births for calendar years 2002–2010 (NN4B from 2006, the earliest year available from ONS). For ONS births and NN4B, each record relates to one birth. In HES maternity, records relate to an episode of care during pregnancy rather than to a birth, and contain variables that are also held in standard HES records. HES deliveries and HES births hold space for up to nine additional fields, known as the ‘baby tail’, in which variables relate to the delivery and the babies.

Online supplementary table S1 and figure S1 document an exploration of the different criteria, filtering conditions and deduplication that can be used to define a birth within HES. This was conducted in accordance with previously published papers,18 ,19 reports23 and personal communication with the HES team.
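Because HES maternity records describe episodes rather than births, a ‘one row per baby’ data set has to be derived. The Python sketch below illustrates only the general idea of unpacking the baby tail and removing repeated babies; it is not the Dr Foster method used here, and the field names (`mother_id`, `delivery_date`, `baby_tail`, `birth_order`, `birth_weight_g`) and the deduplication key are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MAX_BABY_TAILS = 9  # HES maternity records have space for up to nine baby tail entries


@dataclass(frozen=True)
class Birth:
    mother_id: str                 # hypothetical pseudonymised mother identifier
    delivery_date: str             # ISO date, eg "2008-03-14"
    birth_order: int               # 1 = first baby of the delivery, 2 = second, ...
    birth_weight_g: Optional[int]  # None when missing


def explode_baby_tails(episodes: Iterable[dict]) -> list[Birth]:
    """Turn episode-level records into one row per baby listed in the baby tail."""
    births = []
    for ep in episodes:
        for tail in ep.get("baby_tail", [])[:MAX_BABY_TAILS]:
            if tail is None:  # unused baby tail slot
                continue
            births.append(Birth(ep["mother_id"], ep["delivery_date"],
                                tail["birth_order"], tail.get("birth_weight_g")))
    return births


def deduplicate(births: Iterable[Birth]) -> list[Birth]:
    """Keep one record per (mother, delivery date, birth order).

    The same baby may be listed on more than one episode; a copy with a
    non-missing birth weight is preferred over one without.
    """
    kept: dict[tuple, Birth] = {}
    for b in births:
        key = (b.mother_id, b.delivery_date, b.birth_order)
        current = kept.get(key)
        if current is None or (current.birth_weight_g is None
                               and b.birth_weight_g is not None):
            kept[key] = b
    return list(kept.values())
```

For example, two episodes for the same mother and delivery date that both list the first baby collapse to a single birth, keeping the copy that has a recorded birth weight.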

Data analysis

Variables available, total numbers of births and missing data in each data set were compared for the whole period (2002–2010) and by year. Descriptive statistics were produced for the following four common adverse birth outcomes:24 ,25

  1. Low birth weight (LBW): Live singleton births with a birth weight between 200 g and 2500 g

  2. Stillbirths: Births coded as stillbirths occurring at ≥24 weeks of pregnancy

  3. Term LBW: Live singleton births with a birth weight between 200 g and 2500 g born at ≥37 weeks of pregnancy

  4. Preterm delivery (PTD): Live singleton births occurring at <37 weeks but >10 weeks of gestation
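The four definitions can be expressed as record-level flags. This is a minimal Python sketch, assuming hypothetical fields for live/still status, plurality, birth weight in grams and completed weeks of gestation; treating 200 g as inclusive and 2500 g as exclusive is our assumption, since the definitions say only ‘between 200 g and 2500 g’.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BirthRecord:
    live: bool                      # True for a live birth, False for a stillbirth
    singleton: bool                 # False for twins, triplets, etc
    birth_weight_g: Optional[int]   # None if missing
    gestation_weeks: Optional[int]  # completed weeks; None if missing


def is_lbw(r: BirthRecord) -> bool:
    """LBW: live singleton with 200 g <= weight < 2500 g (boundaries assumed)."""
    return (r.live and r.singleton and r.birth_weight_g is not None
            and 200 <= r.birth_weight_g < 2500)


def is_stillbirth(r: BirthRecord) -> bool:
    """Coded as stillborn at >= 24 weeks of gestation."""
    return (not r.live) and r.gestation_weeks is not None and r.gestation_weeks >= 24


def is_term_lbw(r: BirthRecord) -> bool:
    """LBW live singleton born at >= 37 weeks; requires gestational age."""
    return is_lbw(r) and r.gestation_weeks is not None and r.gestation_weeks >= 37


def is_preterm(r: BirthRecord) -> bool:
    """PTD: live singleton born at > 10 and < 37 weeks of gestation."""
    return (r.live and r.singleton and r.gestation_weeks is not None
            and 10 < r.gestation_weeks < 37)
```

Records with missing birth weight or gestational age are not flagged by these functions, so they drop out of the numerator but not automatically out of any denominator.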

Variability in rates and counts was investigated by maternal age, plurality, region, deprivation (quintiles of the Carstairs index26 for 2001 Census Output Areas (COAs), assigned by residential postcode), birth weight, sex, parity (number of previous children), ethnicity, previous pregnancies, delivery method, delivery place and gestational age.
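The Carstairs index is conventionally the unweighted sum of z-scores of four census percentages: overcrowding, male unemployment, low social class and no car ownership. The sketch below computes such a score and rank-based quintiles for a set of areas; standardising across only the supplied areas and the variable names used are simplifying assumptions, and published versions of the index may standardise differently.

```python
from statistics import mean, pstdev

# The four census percentages conventionally combined in the Carstairs index.
CARSTAIRS_VARS = ("overcrowding", "male_unemployment", "low_social_class", "no_car")


def carstairs_scores(areas: dict[str, dict[str, float]]) -> dict[str, float]:
    """Unweighted sum of z-scores of the four Carstairs variables for each area."""
    scores = {code: 0.0 for code in areas}
    for var in CARSTAIRS_VARS:
        values = [areas[code][var] for code in areas]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # a constant variable cannot discriminate between areas
        for code in areas:
            scores[code] += (areas[code][var] - mu) / sd
    return scores


def to_quintiles(scores: dict[str, float]) -> dict[str, int]:
    """Rank areas by score: quintile 1 = least deprived, 5 = most deprived."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {code: i * 5 // n + 1 for i, code in enumerate(ranked)}
```

Each birth would then take the quintile of the 2001 COA containing the mother’s residential postcode, via a postcode-to-COA lookup (not shown).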

Geography: COAs are the smallest geographical areas for which census data are published. They are created from census data using clusters of adjacent postcodes and are designed to have similar characteristics and population sizes (on average around 300 people, with a minimum of 100). Lower Layer Super Output Areas (LSOAs) are geographical areas built up from groups of adjacent COAs with similar characteristics; for the 2001 census, LSOAs had a mean population of 1514 people.27

For the two most complete data sets (ONS and HES deliveries), the numbers of live births and stillbirths and the average birth weight were calculated at small-area level (2001 LSOAs) for (A) England and (B) two regions, the North-East and Greater London. Live birth rates per 10 000 population (using ONS mid-year population estimates) for the two regions were then mapped at LSOA level.
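The small-area rates are births divided by population, scaled to 10 000. A sketch, assuming births and populations have already been aggregated to per-LSOA totals keyed by hypothetical LSOA codes; using summed annual mid-year estimates as the denominator for 2002–2010, and comparing HES with ONS as a ratio, are illustrative choices rather than the paper’s stated method.

```python
def rate_per_10000(births: dict[str, int], person_years: dict[str, float]) -> dict[str, float]:
    """Live births per 10 000 population for each LSOA with a positive denominator.

    For a multi-year period, `person_years` is assumed to be the sum of the
    annual mid-year population estimates for that LSOA.
    """
    return {code: 10_000 * births.get(code, 0) / pop
            for code, pop in person_years.items() if pop > 0}


def hes_to_ons_ratio(hes_rates: dict[str, float], ons_rates: dict[str, float]) -> dict[str, float]:
    """HES rate as a fraction of the ONS rate; values well below 1 suggest under-capture."""
    return {code: hes_rates.get(code, 0.0) / ons
            for code, ons in ons_rates.items() if ons > 0}
```

LSOAs with no HES records get a rate of zero rather than being dropped, which is what makes gaps in hospital reporting visible on a map.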

All data handling and analysis were performed in R V.2.14.2 and Stata V.13 (StataCorp, College Station, Texas, USA); maps were produced using ArcGIS 10.1 (Environmental Systems Research Institute, California, USA).

Results

All data sets contain information on birth status and weight, sex, mothers’ date of birth and residential postcode (table 1). ONS and NN4B hold the full residential address, whereas the HES maternity data sets hold information only at postcode level. The HES data sets and NN4B include information on gestational age and ethnicity, with HES providing the mother’s ethnic group and NN4B providing the baby’s ethnic category as defined by the mother. HES maternity also provides additional clinical information.

Table 1

Availability of data by birth data set

Total births and time trends

There were clear differences in capture across the study period (see online supplementary figure S2): in 2001, HES deliveries captured only 73.0% of ONS births. Capture was much higher thereafter, so 2001 was excluded from subsequent comparisons. From 2002 to 2010, ONS recorded 5 727 407 births, HES deliveries 5 545 905 and HES births 5 534 194, while NN4B recorded 3 333 154 in 2006–2010 (table 2). Over 2002–2010, HES deliveries captured 96.8% of all ONS births and HES births 96.6%; for 2006–2010, NN4B captured 99.8%.
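The capture percentages are each data set’s total divided by the ONS total. Using the 2002–2010 counts reported above, a quick check:

```python
ONS_BIRTHS_2002_2010 = 5_727_407
OTHER_TOTALS = {"HES deliveries": 5_545_905, "HES births": 5_534_194}


def capture_pct(count: int, reference: int) -> float:
    """Percentage of the reference (ONS) total captured by another data set."""
    return 100 * count / reference


for name, total in OTHER_TOTALS.items():
    print(f"{name}: {capture_pct(total, ONS_BIRTHS_2002_2010):.1f}% of ONS births")
```

This prints 96.8% for HES deliveries and 96.6% for HES births, matching the figures in the text. The NN4B figure cannot be reproduced in the same way because the ONS total for 2006–2010 alone is not given here.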

Table 2

Missing data in selected birth outcome variables (2002–2010)

ONS had few missing or unavailable data except for parity, which was recorded only for married mothers (51.6% of births), and gestational age (recorded for stillbirths only); NN4B also had few missing data (table 2). The HES data sets had more variables with larger proportions of missing data than ONS, and HES births was worse than HES deliveries. Missing data varied little by year in ONS and NN4B, whereas both HES data sets showed a decrease in missing data over time, with a spike in 2007 (see online supplementary table S2). For HES births, the sex of the baby was not collected from 2003, and for key variables missing data increased over time.
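Summaries of missingness by variable and year, such as online supplementary table S2, reduce to simple counting. A sketch, assuming each record is a dict with a hypothetical `year` key and that `None` (or an absent key) marks a missing or unavailable value; any source-specific ‘not known’ codes would first need to be mapped to `None`.

```python
from collections import defaultdict
from typing import Iterable


def missing_by_year(records: Iterable[dict], variables: list[str]) -> dict[int, dict[str, float]]:
    """Percentage of records with a missing value for each variable, by year."""
    totals: dict[int, int] = defaultdict(int)
    missing: dict[int, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for rec in records:
        year = rec["year"]
        totals[year] += 1
        for var in variables:
            if rec.get(var) is None:  # absent key or explicit None counts as missing
                missing[year][var] += 1
    return {
        year: {var: 100 * missing[year][var] / n for var in variables}
        for year, n in sorted(totals.items())
    }
```

Running this separately on each data set and plotting the yearly percentages is a quick way to spot artefacts such as the 2007 spike described above.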

A comparison of selected variables for each data set is presented in online supplementary table S3. For total births the HES data sets were broadly consistent with ONS and NN4B data, with fewer multiple births (HES deliveries 2%; ONS 3.1%) and more female births (HES deliveries 49.6%; ONS 48.7%).

LBW births and stillbirths

Similar proportions of births were recorded as LBW in all data sets (ONS=5.9%; HES deliveries=6%; HES births=6.3%; NN4B=5.5%) (see online supplementary table S4), and all showed a decreasing trend in LBW rates (figure 1A). When specific variables were compared for LBW births only, HES deliveries was similar to ONS but HES births was not, especially for sex and region. Compared with all births (see online supplementary table S3), LBW births were more likely to be Asian (NN4B Asian LBW births=16.9%; NN4B Asian births=10.4%), to be delivered by Caesarean section and to be from the most deprived Carstairs quintile.

Figure 1

Annual rates of (A) low birthweight births per 10 000 live singleton births and (B) stillbirths per 10 000 total births (live and still) in each birth data set from 2002 to 2010. *HES deliveries and HES births are presented with and without one NHS trust with known stillbirth reporting issues.27 HES, Hospital Episode Statistics; LBW, low birth weight; ONS, Office for National Statistics; NN4B, NHS Numbers for Babies.

Similar proportions of all births were stillborn in all data sets (ONS 0.5%; HES deliveries 0.6%; HES births 0.5%; NN4B 0.5%) (see online supplementary table S4). Between 2002 and 2010 there was a slight decrease in the annual stillbirth rate, with a large peak in HES in 2007 (figure 1B). This peak was due to reporting issues at one NHS trust (previously reported in HES data quality notes); after excluding this trust, stillbirth rates in HES were lower than in ONS. When specific variables were compared for stillbirths only, HES deliveries was generally similar to ONS, but HES births showed larger discrepancies. Compared with all births (see online supplementary table S3), stillbirths were more often multiple births (ONS multiple births=3.1%; ONS multiple stillbirths=7.7%), LBW, from deprived areas and from non-white ethnic groups.
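Rates such as those in figure 1B are stillbirths per 10 000 total (live and still) births by year, and an artefact like the 2007 peak can be checked by recomputing without the affected trust. A sketch, assuming hypothetical `year`, `trust` and boolean `stillbirth` fields on each record; the trust codes in any example are invented.

```python
from collections import Counter
from typing import Iterable, Optional


def stillbirth_rate_by_year(records: Iterable[dict],
                            exclude_trust: Optional[str] = None) -> dict[int, float]:
    """Stillbirths per 10 000 total births (live + still) for each year."""
    births: Counter = Counter()
    stills: Counter = Counter()
    for rec in records:
        if exclude_trust is not None and rec["trust"] == exclude_trust:
            continue  # drop the provider with known recording problems
        births[rec["year"]] += 1
        stills[rec["year"]] += int(rec["stillbirth"])
    return {year: 10_000 * stills[year] / births[year] for year in sorted(births)}
```

Comparing the two outputs year by year shows whether a peak is driven by a single provider rather than by a genuine change in stillbirth rates.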

Term LBW deliveries and PTDs

Analysis of term LBW deliveries and PTDs requires information on gestational age which is held in HES data sets and NN4B only (see online supplementary table S5). Using data for 2006–2010, the proportion of live singleton births that were term LBW varied from 2.5% in NN4B to 2.7% in HES births. Compared with all births (see online supplementary table S3) term LBW babies were more likely to be female, Asian (HES deliveries Asian births=11.5%, HES deliveries Asian term LBW births=23.1%) and from deprived areas.

The two HES data sets recorded similar proportions of PTDs, both higher than NN4B (HES deliveries=7.4%; NN4B=5.9%) (see online supplementary table S5), and HES deliveries was closer to NN4B than HES births was. Compared with all births, PTDs were more likely to be LBW (HES deliveries LBW births=5.8%; HES deliveries preterm LBW births=45.9%) and from the most deprived areas.

Regional spatial analysis

The national HES deliveries data set had fewer live births per LSOA than ONS (mean 122.8 live births per LSOA in HES; 175.1 in ONS), with the difference most marked in London (table 3). Stillbirth counts per LSOA were similar in both data sets nationally (mean 0.9 stillbirths per LSOA) but lower in HES deliveries in the North-East and London. Average birth weight at LSOA level agreed well between HES deliveries and ONS, although its variability differed. A similar pattern was seen for the North-East and London regions.
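Per-LSOA summaries of this kind depend on how areas with no recorded births are treated. The sketch below assumes each record has already been assigned a 2001 LSOA code and that LSOAs with no records count as zero (an assumption, as the handling of empty areas is not stated); it returns the mean and standard deviation of counts per area so that two data sets can be compared on both level and spread.

```python
from collections import Counter
from statistics import mean, stdev
from typing import Iterable


def lsoa_count_summary(lsoa_codes: Iterable[str], all_lsoas: list[str]) -> tuple[float, float]:
    """Mean and sample standard deviation of births per LSOA, zeros included."""
    counts = Counter(lsoa_codes)
    per_area = [counts.get(code, 0) for code in all_lsoas]
    return mean(per_area), stdev(per_area)
```

Called once with the LSOA codes of ONS live births and once with those of HES deliveries, over the same list of LSOAs, this gives figures comparable in form to a per-LSOA mean.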

Table 3

LSOA level comparison of live births, stillbirths and average birth weight by selected government office regions 2002–2010

The spatial distribution of the rates of live births in London and the North-East at LSOA level (2002–2010) is shown in figure 2. In London there were clear spatial differences in the live birth rates, with the South-East and North-West of London showing particularly low rates of HES deliveries compared with ONS births (figure 2A). In the North-East the distribution of births by LSOA was broadly similar in both data sets (figure 2B).

Figure 2

Super Output Area level live birth rates in ONS births and HES deliveries in (A) London and (B) the North-East (2002–2010). *Actual live birth rates are not provided to prevent any potential identifiability of the data. HES, Hospital Episode Statistics; ONS, Office for National Statistics.

Discussion

This is the first study to provide a detailed assessment of the quality of reproductive health data from all four national routine births data sets in England. Overall, the ONS births data set is the most complete and accurate record of all births in England (2002–2010), and NN4B is a valuable enhancement to it. HES deliveries is more complete than HES births and captures the majority of English births (96.8%), with good comparability to ONS; however, missing data still cause inaccuracies that appear as temporal and spatial anomalies. Nevertheless, HES deliveries offers detailed clinical information that cannot be obtained from the ONS data sets.

Descriptive statistics and trends for the birth outcomes were broadly similar for ONS, NN4B and HES deliveries, but less so for HES births because of missing data. The prevalence of LBW in ONS (5.9%) was similar to a WHO estimate for comparable European countries (6.6%),28 and the prevalence of stillbirths in ONS (0.5%) was consistent with other European countries (<1%).29 The prevalence of PTDs in NN4B (5.9%) was similar to recent 2010 estimates for other northern European countries (5%).30 Known risk factors for LBW, PTD and stillbirth include deprivation and non-white ethnicity;31 this is consistent with our findings that term LBW, LBW, PTD and stillbirth were more likely in non-white ethnic groups and in the most deprived Carstairs quintiles.

Our recommendations for those considering using the four national routine births data sets in England for epidemiological studies of birth outcomes are:

  • For studies where clinical and lifestyle data are not required (eg, studies of birth rates or prevalence, or of time trends), ONS birth registrations are preferred.

  • For studies that require information on gestational age and potential confounders such as ethnicity, NN4B is preferred but is currently only available from ONS from 2006 onwards.

  • If clinical or pre-2006 information is needed, HES deliveries is preferred over HES births, unless, for example, the child’s rather than the mother’s ethnicity is required.

  • Temporal and spatial trends in HES data should be thoroughly explored before use especially if HES data prior to 2002 are to be used.

  • Any spatial and temporal trends identified should be interpreted in the light of changes in reporting.

  • Despite information governance and technical challenges, linkage between data sets has the greatest potential to provide the richest and best quality data sets for use in research.

Previous studies of birth outcomes in England have primarily used ONS data,1 ,2 ,7–12 and HES data have seldom been used in peer reviewed research papers.14–17 ,32 It is unclear why HES data sets have been underused; possible reasons include concerns over data quality and the greater complexity of working with them. Differences in how the data are collected may also influence the choice of data set.

While ONS birth registrations have remained consistently high quality, the HES data sets had poorer capture and more missing data in earlier years, particularly before 2002. Completeness has improved considerably, and HES now captures almost all English births in hospitals, although it does not record births outside NHS hospitals (eg, births at home, 2.8% of the total,33 or in private hospitals). Moreover, geographical identifiers in all the data sets are based on the mother’s residential postcode. ONS and NN4B capture English resident mothers who give birth in Welsh, Scottish or Northern Irish hospitals, whereas HES will not capture English mothers who give birth outside England, although the numbers are likely to be small.

HES data sets remain susceptible to data artefacts because of the way data are collected and recorded across many different hospitals. HES data quality is investigated and reported in HES data quality notes. There are various methods for selecting and deduplicating HES maternity records, and the choice of method may influence the final data set; the Dr Foster method used in this paper34 is not the only available method.35 The apparent large peak in stillbirths in 2007 resulted from one NHS trust recording 99% of all its delivery episodes as stillborn, as reported in the annual HES data quality note.36 The increase in missing HES data for 2007 relates to the cessation of intensive manual data cleaning for 2007–2008.36

While HES deliveries data on a national scale were similar to ONS, we found spatial variations at small-area level. Low rates in the South-East of London were caused by under-reporting or non-reporting of births by several hospitals. One way to deal with variations in the quality of HES data is to restrict research or surveillance to hospitals with high completeness of recording;18 another is to link birth data sets. Linking ONS to HES deliveries would combine the completeness of ONS with the additional information in HES. Pilot studies testing linkage between HES and ONS records have found that a high rate of linkage can be achieved.37 However, the linkage rate will depend on the years of data investigated; the most recent pilot studies (2005–2007) linked 91–93% of HES deliveries to ONS births.19 ,20 ONS routinely links infant mortality records with births to produce statistics on infant and perinatal mortality,38 and also links NN4B with births to produce gestation-specific infant mortality statistics.22
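The linkage methods used in the pilot studies are not described here, so the following is only an illustrative sketch of exact (deterministic) linkage. It assumes hypothetical matching fields (mother’s date of birth, baby’s date of birth, birth order and residential postcode) and leaves ambiguous keys unlinked; practical HES–ONS linkage would also need to cope with recording errors, for example through further matching passes or probabilistic methods, followed by validation.

```python
from typing import Iterable


def link_key(rec: dict) -> tuple:
    """Composite matching key; postcodes are upper-cased with spaces removed."""
    return (rec["mother_dob"], rec["baby_dob"], rec["birth_order"],
            rec["postcode"].replace(" ", "").upper())


def deterministic_link(hes: Iterable[dict],
                       ons: Iterable[dict]) -> tuple[list[tuple[dict, dict]], float]:
    """Exact-match HES delivery records to ONS birth records on a composite key.

    Keys occurring more than once in ONS are treated as ambiguous and left
    unlinked. Returns the linked pairs and the percentage of HES records linked.
    """
    index: dict[tuple, list[dict]] = {}
    for rec in ons:
        index.setdefault(link_key(rec), []).append(rec)
    hes = list(hes)
    pairs = []
    for rec in hes:
        candidates = index.get(link_key(rec), [])
        if len(candidates) == 1:  # link only unambiguous matches
            pairs.append((rec, candidates[0]))
    rate = 100 * len(pairs) / len(hes) if hes else 0.0
    return pairs, rate
```

The returned percentage is the analogue of the linkage rates quoted above; unlinked HES records are the ones a validation step would examine.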

While administrative data sets are a rich source for epidemiological studies, gaining access can be a slow process taking many months. Access to routinely collected births data sets that are not publicly available (ie, those with sensitive and/or personal information) is possible only with appropriate ethical approval, Health Research Authority Confidentiality Advisory Group governance approval and data provider approval in place. Researchers also need to use suitably secure, approved facilities, either at their own institution or those provided by ONS or the Administrative Data Service. Changes to legislation and/or to data providers may introduce further delays in obtaining data; HSCIC’s updating of data access processes under the Health and Social Care Act 2012, and issues relating to the introduction of the project, have recently resulted in substantial delays. Linkage between HES and ONS data sets is not available routinely, and there are additional technical challenges related to record matching and validation. Because of these constraints, it is currently more common for researchers to use only one birth data set, which reduces the possible data coverage or depth of clinical information.

Conclusions

Routine birth data sets in England provide a valuable resource for epidemiological research on birth outcomes, surveillance of reproductive trends and provision of maternity services. The NN4B data set appears to be a promising addition for years from 2006, as it has the quality and coverage of ONS births but includes gestational age and ethnicity. The HES deliveries data set, currently underused, contains rich clinical information unavailable elsewhere, but researchers need to appreciate its potential data anomalies. Streamlining data access procedures and routinely linking these data sets would make the best possible use of resources and improve use of these data by the research community.

Acknowledgements

Hospital Episode Statistics data 2014 are reused with the permission of the Health and Social Care Information Centre. The ONS births data used were supplied by the Office for National Statistics (ONS), derived from the national birth registrations.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Contributors MBT conceived the study. KG was responsible for data extraction and KG and DCA were responsible for data cleaning and preparation. DCA and REG carried out the statistical analyses and drafted the initial report. REG carried out the mapping. The analyses were interpreted by REG, DCA, ALH, PE and MBT. All coauthors revised the report and approved the final version. MBT is the guarantor of this paper.

  • Funding The work of the UK Small Area Health Statistics Unit is funded by Public Health England as part of the MRC-PHE Centre for Environment and Health, funded also by the UK Medical Research Council. Grant number: MR/L01341X/1. PE is an NIHR Senior Investigator and is supported by the MRC-PHE Centre for Environment and Health, the Imperial College Healthcare NHS Trust, the NIHR Imperial College Biomedical Research Centre, and the NIHR Health Protection Research Unit on Health Impact of Environmental Hazards.

  • Competing interests PE and ALH report grants from the Medical Research Council & Public Health England, during the conduct of the study.

  • Ethics approval Governance statement SAHSU holds approvals from the National Research Ethics Service—reference 12/LO/0566 and 12/LO/0567—and from the Health Research Authority Confidentiality Advisory Group (HRA-CAG) for Section 251 support (HRA—14/CAG/1039).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No identifiable information will be shared with any other organisation. SAHSU does not have permission to supply data to third parties.