Early cortical maturation predicts neurodevelopment in very preterm infants
  1. Julia E Kline1,
  2. Venkata Sita Priyanka Illapani1,
  3. Lili He1,2,
  4. Mekibib Altaye2,3,
  5. John Wells Logan4,
  6. Nehal A Parikh1,2
  1. 1 Perinatal Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
  2. 2 Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
  3. 3 Division of Biostatistics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
  4. 4 Department of Pediatrics, Nationwide Children's Hospital, Columbus, Ohio, USA
  1. Correspondence to Dr Nehal A Parikh, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; nehal.parikh{at}


Objective To evaluate the ability of four objectively defined, cortical maturation features—surface area, gyrification index, sulcal depth and curvature—from structural MRI at term-equivalent age (TEA) to independently predict cognitive and language development at 2 years corrected age in very preterm (VPT) infants.

Design Population-based, prospective cohort study. Structural brain MRI was performed at term, between 40 and 44 weeks postmenstrual age and processed using the developing Human Connectome Project pipeline.

Setting Multicentre study comprising four regional level III neonatal intensive care units in the Columbus, Ohio region.

Patients 110 VPT infants (gestational age (GA) ≤ 31 weeks).

Main outcome measures Cognitive and language scores at 2 years corrected age on the Bayley Scales of Infant and Toddler Development, Third Edition.

Results Of the 94 VPT infants with high-quality T2-weighted MRI scans, 75 infants (80%) returned for Bayley-III testing. Cortical surface area was positively correlated with cognitive and language scores in nearly every brain region. Curvature of the inner cortex was negatively correlated with Bayley scores in the frontal, parietal and temporal lobes. In multivariable regression models, adjusting for GA, sex, socioeconomic status, and injury score on MRI, regional measures of surface area and curvature independently explained more than one-third of the variance in cognitive and language scores at 2 years corrected age in our cohort.

Conclusions We identified increased cortical curvature at TEA as a new prognostic biomarker of adverse neurodevelopment in very premature infants. When combined with cortical surface area, it enhanced prediction of cognitive and language development. Larger studies are needed to externally validate our findings.

  • neonatology
  • neurodisability
  • neurodevelopment
  • neurology
  • outcomes research

What is already known on this topic?

  • Preterm infants are at risk for neurodevelopmental impairments, including cognitive and language deficits; however, biomarkers of domain-specific impairment around the time of birth are limited.

  • Cortical maturational features including surface area, sulcal depth, gyrification index and curvature are altered in preterm infants and may predict neurodevelopmental outcomes.

  • Surface area and sulcal depth are promising predictors of cognitive ability, but no studies have examined the predictive power of curvature or these features combined.

What this study adds?

  • We identified several regional maturational measures in very preterm infants to be significantly correlated with Bayley-III cognitive and language scores at 2 years corrected age.

  • Using combinations of these metrics, regional curvature and surface area measures remained independently predictive and explained more than one-third of the variance in Bayley-III scores.

  • These cortical metrics are promising biomarkers of later impairment and may help facilitate accurate early risk stratification to design targeted neuroprotective trials when neuroplasticity is maximal.

Introduction

Across the globe, approximately 11% of pregnancies end in preterm birth.1 Today, with advances in neonatology, most of the roughly 15 million preterm infants born each year survive. Very preterm (VPT) birth ( ≤ 31 weeks gestation) is a critical risk factor for neurodevelopmental impairments (NDI), including cognitive and language deficits. It currently takes 2–5 years after birth to diagnose such deficits, and no accurate models exist that predict domain-specific NDI using biomarkers around the time of birth. Cranial ultrasound remains the clinical standard with structural MRI being increasingly utilised, but both exhibit low sensitivity and positive predictive value for later cognitive and language deficits in VPT infants.2–5 Brain volumes and diffusion MRI measures are promising prognostic biomarkers at term-equivalent age (TEA); however, these require independent validation.6 Early prognostic biomarkers of NDI are urgently needed to allow clinicians and researchers to provide domain-specific interventions when neuroplasticity is at its peak.

On structural MRI at TEA, VPT infants present a constellation of cerebral cortical maturational abnormalities compared with term infants. They exhibit decreased cortical surface area,7–13 decreased cortical folding measures—gyrification index 11 14 15 and sulcal depth 11 12 14—and increased curvature of the white matter surface.13 Cortical surface area7 8 10 and sulcal depth16 measured around TEA have shown promise in predicting cognitive outcomes. However, no studies have examined these biomarkers in combination or examined the prognostic value of white matter/inner cortical surface curvature. The main aim of this study was to evaluate the ability of objectively defined, cortical features on brain MRI at TEA to independently predict cognitive and language development at 2 years corrected age in VPT infants.


Methods

We prospectively enrolled a consecutive sample of 110 VPT infants from four regional level III neonatal intensive care units (NICUs)—Nationwide Children’s Hospital, Ohio State University Medical Centre, Riverside Hospital, and Mount Carmel St. Ann’s Hospital—between December 2014 and April 2016. Inclusion criteria included a gestational age (GA) of 31 weeks or less. Exclusion criteria included congenital or chromosomal anomalies of the brain, spine or heart. Infants hospitalised at Ohio State, Riverside, or St Ann’s NICUs at more than 44 weeks postmenstrual age (PMA) were also excluded. More than 95% of infants that required hospitalisation beyond 40 weeks PMA at these birthing hospitals were transferred to Nationwide Children’s Hospital, which allowed us to recruit them before 44 weeks PMA. The Nationwide Children’s Hospital Institutional Review Board approved the study. We obtained written informed consent from a parent or guardian of each infant, after they were given sufficient time to elect to participate.

Imaging methods

We acquired MRI data using a 3T Siemens Skyra scanner at Nationwide Children’s Hospital. The VPT subjects were imaged at a mean (SD) PMA of 40.4 (0.6) weeks under the supervision of a skilled neonatal nurse and neonatologist. The imaging occurred during the inpatient period for most subjects from Nationwide Children’s and during the outpatient period for subjects from the other three NICUs. Imaging was performed as follows: each infant was fed 30 min before the scan, and silicone earplugs (Instaputty, E.A.R., Boulder, CO, USA), a blanket and a vacuum immobilisation device (MedVac, CFI Medical Solutions, Fenton, MI, USA) promoted natural sleep. The following acquisition parameters were used: axial T2-weighted image: echo time 147 ms, repetition time 9500 ms, flip angle 150°, voxel dimensions 0.93×0.93×1.0 mm³, scan time 4:09 min; three-dimensional magnetisation-prepared rapid gradient echo: echo time 2.9 ms, repetition time 2270 ms, echo spacing time 8.5 ms, flip angle 13°, voxel dimensions 1.0×1.0×1.0 mm³, scan time 3:32 min.

Cognitive and language testing

The Bayley Scales of Infant and Toddler Development, Third Edition (Bayley-III)17 cognitive and language subscale scores were our two main outcomes. The cognitive subscale examines global mental function based on memory, object manipulation and problem solving. The language subscale assesses language understanding and vocabulary development. Both scales have a mean (SD) score of 100 (15) and range from 40 to 160. All subjects were scheduled for testing between 22 and 26 months corrected age. Four infants did not return during their appointment window but were tested between 33 and 36 months corrected age. For them, we used these later age-corrected Bayley scores. Bayley testers were blinded to the cortical surface metric results but not the structural neuroimaging results.

MRI data processing

We used the developing Human Connectome Project (dHCP) pipeline18 to post-process the T2-weighted MRI scans. The pipeline performs cortical and subcortical segmentation automatically and calculates the following cortical metrics for every Gousias atlas19 region and for the entire brain: surface area, sulcal depth, gyrification index, inner cortical curvature and thickness. We rejected segmented scans with poor tissue identification, improper segmentation or artefacts. Subjects with moderate or severe ventriculomegaly were also excluded, as large ventricles interfered with tissue classification.

Statistical analysis

The dHCP pipeline generates 244 total cortical surface metrics. Hence, substantial variable reduction was necessary to create parsimonious models that predicted each Bayley-III outcome. We performed a Pearson correlation analysis between all regional cortical metrics and 2-year Bayley scores. Any cortical metric correlated with Bayley scores at p<0.05 in bivariable analyses was retained for multivariable linear regression analysis. Cortical thickness was not correlated with cognitive score in any region and was correlated with language score in only two regions, so we elected not to examine it further. All variables (surface area, gyrification index, sulcal depth and inner cortical curvature) that were significant in bivariable analysis were entered into a multivariable regression model using backward stepwise selection and were retained only if significance was maintained at p<0.05. We took care to avoid multicollinearity by examining the variance inflation factor of our models and ensured that the models met all other assumptions of linear regression, including normality, additivity, absence of autocorrelation and homoscedasticity.20
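The bivariable screening step described above can be sketched in a few lines of Python. This is only an illustration and not the authors' code (the correlation analysis was run in Stata). The function names and the toy data are hypothetical, and the p value uses a Fisher z-transform approximation rather than the exact t-based test so that the example needs only the standard library.

```python
from math import atanh, sqrt
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p value via the Fisher z-transform (an approximation)."""
    if abs(r) >= 1.0:
        return 0.0
    z = atanh(r) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def screen_metrics(metrics, outcome, alpha=0.05):
    """Return the names of metrics whose correlation with the outcome has p < alpha."""
    kept = []
    for name, values in metrics.items():
        r = pearson_r(values, outcome)
        if approx_p_value(r, len(outcome)) < alpha:
            kept.append(name)
    return kept


# Hypothetical toy data: eight subjects, two candidate metrics.
toy_outcome = [1, 2, 3, 4, 5, 6, 7, 8]
toy_metrics = {
    "surface_area": [10, 12, 13, 15, 17, 18, 20, 22],  # tracks the outcome closely
    "sulcal_depth": [1, -1, -1, 1, 1, -1, -1, 1],      # unrelated to the outcome
}
kept = screen_metrics(toy_metrics, toy_outcome)
print(kept)
```

Only the metrics that pass this screen would go forward to the multivariable model. With 244 candidate metrics, a p<0.05 screen is expected to let some chance correlations through, which is why the later selection and validation steps matter.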

We internally validated our final models with leave-one-out cross-validation, in which we trained the model on n–1 data points and tested it on the single held-out data point. We repeated this process n times (n=75) and used the residuals to calculate our final adjusted R². To assess the independent predictive ability of our cortical biomarkers, we also tested the inclusion of known predictors of cognitive and language outcomes: sex, GA, global injury score on structural MRI15 21 and socioeconomic status (SES). Additionally, we tested the inclusion of two potential confounders, birth hospital and PMA at MRI, but retained them only if significant.
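A minimal sketch of leave-one-out cross-validation for a linear model follows, written in plain Python so that it runs without extra packages. It is not the authors' implementation. The helper names, the toy data and the specific formulas are assumptions: a cross-validated R² computed as 1 − PRESS/SST, followed by the usual adjustment for the number of predictors. The text does not state exactly how the adjusted R² was derived from the residuals.

```python
def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(beta, x):
    """Predicted outcome for one subject's predictor values."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def loocv_r2(X, y):
    """Cross-validated R^2: each subject is predicted by a model fitted without it."""
    n = len(y)
    press = 0.0  # sum of squared held-out prediction errors
    for i in range(n):
        beta = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(beta, X[i])) ** 2
    mean_y = sum(y) / n
    sst = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / sst


def adjusted_r2(r2, n, k):
    """Penalise R^2 for the number of predictors k (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical toy data: outcome is an exact linear function of two predictors.
toy_X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 2]]
toy_y = [3 + 2 * a - b for a, b in toy_X]
cv_r2 = loocv_r2(toy_X, toy_y)
print(cv_r2, adjusted_r2(cv_r2, n=len(toy_y), k=2))
```

Because every prediction is made for a subject the model has not seen, the cross-validated value is typically lower than the in-sample R², which is the pattern to expect when comparing apparent and validated performance.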

To compare our final cohort to the 20% lost to follow-up, we used Fisher’s exact test for binary variables and Student’s t-test or the Shapiro-Wilk test for continuous variables. We examined the p values to identify between-group differences. All analyses were performed in Stata 15.1 (StataCorp, College Station, TX, USA), except the leave-one-out analysis and the lost to follow-up analysis, which were performed in Python.
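For readers unfamiliar with Fisher's exact test, the sketch below shows how a two-sided p value can be computed for a 2×2 table, such as one comparing a binary characteristic between infants with follow-up and those lost to follow-up. It is a standard-library illustration under stated assumptions, not the authors' script. The function name is hypothetical and the counts in the example are invented, not taken from table 1.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {
        x: comb(row1, x) * comb(row2, col1 - x) / denom
        for x in range(lo, hi + 1)
    }
    p_obs = probs[a]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9)))


# Invented counts for illustration only: rows are followed up (n=75) and
# lost to follow-up (n=19); columns are characteristic present and absent.
p_value = fisher_exact_two_sided(40, 35, 9, 10)
print(p_value)
```

A large p value here would be read, as in the text, as no detectable between-group difference for that characteristic.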

Results

Of the 110 eligible VPT infants, 94 had high-quality T2-weighted MRI scans that were successfully processed with the dHCP pipeline. Of the excluded cases, 10 were due to severe ventriculomegaly or severe brain injury, four were due to severe artefact, one was missing a T2 image altogether and one was missing cortical boundaries. Of these 94 infants, 75 (80%) returned for follow-up testing and were included in our analyses. Baseline characteristics for the mothers and infants with and without follow-up were comparable (table 1).
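The cohort flow reported above can be checked with simple arithmetic. The short Python sketch below uses only the counts given in the text; the variable names are ours.

```python
# Counts taken from the text above.
eligible = 110
excluded = {
    "severe ventriculomegaly or severe brain injury": 10,
    "severe artefact": 4,
    "missing T2 image": 1,
    "missing cortical boundaries": 1,
}

processed = eligible - sum(excluded.values())  # scans successfully processed
followed_up = 75
lost = processed - followed_up

follow_up_rate = followed_up / processed
attrition_rate = lost / processed

print(processed, lost)
print(f"follow-up {follow_up_rate:.1%}, attrition {attrition_rate:.1%}")
```

The exclusions sum to 16, which reconciles the 110 eligible infants with the 94 processed scans, and 75 of 94 rounds to the 80% follow-up rate quoted.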

Table 1

Baseline characteristics of the final very preterm cohort with follow-up data and also those infants lost to follow-up

Our preterm cohort had a mean (SD) Bayley cognitive score of 97.5 (14.8) and Bayley language score of 94.6 (17.0) at a mean (SD) follow-up age of 24.7 (2.8) months. In bivariable analyses, increased surface area in nearly every brain region was positively correlated with Bayley cognitive (figure 1) and language scores (figure 2). Increased inner cortical curvature was negatively correlated with both outcomes in the temporal and parietal lobes. Increased inner cortical curvature was also negatively correlated with cognitive scores in the frontal lobe. Although gyrification index and sulcal depth did not follow consistent trends, a few regions were correlated with each Bayley outcome.
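To put the cohort means in context, they can be expressed in SD units relative to the Bayley-III normative scale (mean 100, SD 15, as stated in the methods). The sketch below is a simple illustration; the function and constant names are hypothetical.

```python
NORM_MEAN = 100.0  # Bayley-III normative mean
NORM_SD = 15.0     # Bayley-III normative SD


def z_score(score):
    """Distance of a Bayley-III composite score from the normative mean, in SD units."""
    return (score - NORM_MEAN) / NORM_SD


cohort_means = {"cognitive": 97.5, "language": 94.6}
for domain, mean_score in cohort_means.items():
    print(f"{domain}: {z_score(mean_score):+.2f} SD")
```

On this scale the cohort's mean cognitive score lies about 0.17 SD below the normative mean and its mean language score about 0.36 SD below it.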

Figure 1

Pearson correlations with Bayley-III cognitive score are displayed for surface area (top left), curvature (top right), sulcal depth (bottom left) and gyrification index (bottom right) on a representative subject brain. The magnitude of correlation is displayed only for regions that were significant at p<0.05. The colour bar represents the magnitude of the Pearson correlation (R value).

Figure 2

Pearson correlations with Bayley-III language score are displayed for surface area (top left), curvature (top right), sulcal depth (bottom left) and gyrification index (bottom right) on a representative subject brain. The magnitude of correlation is displayed only for regions that were significant at p<0.05. The colour bar represents the magnitude of the Pearson correlation (R value).

The most robust predictive model for cognitive score included surface area of the parietal lobe (R), curvature of the temporal lobe (L) and curvature of the posterior cingulate gyrus (L) (table 2). The most robust predictive model for language score included surface area of the parietal lobe (L), curvature of the insula (L) and curvature of the superior temporal gyrus (L). These models explained approximately one-third of the variance in Bayley-III cognitive and language scores in our VPT cohort. Leave-one-out cross-validation of these cortical-metric-only models resulted in a cross-validated adjusted R² of 0.26 for the cognitive model and 0.28 for the language model. All cortical metrics retained their significance in the final models after the known predictors (sex, GA, SES and global injury score on structural MRI) were added. Birth hospital and PMA at MRI were not significant and were removed from the final models. Figure 3 shows the regression scatter plots of the measured Bayley-III scores versus the scores predicted by our final models.
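Written out, the two cortical-metric-only models described above have the following general form. The notation is introduced here for illustration and is not the authors': SA denotes regional surface area, Curv denotes regional inner cortical curvature, i indexes infants, and the coefficients are left symbolic because the fitted values are reported in table 2. The last line gives the standard definition of adjusted R² for n subjects and k predictors, which is assumed to be the quantity reported.

```latex
\begin{align}
\text{Cognitive}_i &= \beta_0
  + \beta_1\,\mathrm{SA}_i^{\text{parietal (R)}}
  + \beta_2\,\mathrm{Curv}_i^{\text{temporal (L)}}
  + \beta_3\,\mathrm{Curv}_i^{\text{posterior cingulate (L)}}
  + \varepsilon_i \\
\text{Language}_i &= \gamma_0
  + \gamma_1\,\mathrm{SA}_i^{\text{parietal (L)}}
  + \gamma_2\,\mathrm{Curv}_i^{\text{insula (L)}}
  + \gamma_3\,\mathrm{Curv}_i^{\text{superior temporal (L)}}
  + \eta_i \\
R^2_{\text{adj}} &= 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}
\end{align}
```

Given the reported directions of association, the surface area coefficients would be positive and the curvature coefficients negative. The adjusted models add sex, GA, SES and global injury score as further terms on the right-hand side.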

Figure 3

Measured Bayley-III scores versus predicted Bayley-III scores (left: cognitive model, right: language model). The predicted scores were calculated using our final fitted models, which included sex, gestational age, global injury score on structural MRI and maternal socioeconomic status as covariates.

Table 2

Cortical surface maturation metrics at term-equivalent age from structural MRI with and without adjustment for known clinical predictors and prediction of Bayley-III cognitive and language scores at 2 years corrected age in very preterm infants

Discussion

In our multicentre cohort of VPT infants, cortical maturation metrics at TEA, especially selected metrics of surface area and inner cortical curvature, were independent biomarkers of future intellectual and linguistic ability. Cortical surface area in almost every region of the VPT cortex was positively correlated with cognitive and language scores, which validates and extends previously published findings.7 8 10 Curvature was negatively correlated with cognitive and language scores for large swaths of the brain, and this metric explained a greater amount of variability in Bayley scores than cortical surface area. This is a novel finding, as curvature of the inner cortical surface and outer white matter has not previously been evaluated as a prognostic biomarker. Surface area and inner cortical curvature metrics together explained approximately one-third of the variance in Bayley-III scores at 2 years corrected age in our VPT infants. These metrics retained their significance in our models after internal validation and adjustment for known clinical predictors, illustrating their independent predictive power.

Decreased cortical surface area is a well-established consequence of prematurity.7–13 In this cohort, surface area of nearly every region of the preterm brain correlated positively with cognitive ability, corroborating prior studies.7–10 Rathbone and colleagues assessed overall cortical surface area at TEA in a preterm cohort8 and found that a 5%–11% reduction equated to a one SD drop in cognitive ability assessed at 2 and 6 years of age. In our cognitive model, a 29% decrease in right parietal lobe surface area equated to a one SD drop in cognitive score at age two. Kapellou and colleagues developed a ‘scaling exponent’ that describes the amount of cortical surface area relative to cerebral volume in extremely preterm infants around TEA. They found that this exponent was correlated with cognitive ability around age 2 years.7 In our cohort, surface area was also positively correlated with language ability in most brain regions, which is a new finding. Surface area of the parietal lobe was an independent predictor in both models, with the right being most important for the cognitive model and the left being most important for the language model. Sripada et al also found that reduced left parieto-occipital surface area was associated with lower cognitive scores at early school age in very low birthweight children.10

Increased curvature of the inner cortical surface and outer white matter surface has been previously documented in preterm infants13; however, it has not been assessed as a prognostic biomarker at TEA. We found that inner cortical curvature of regions of the frontal, parietal and temporal lobes was negatively correlated with Bayley outcomes. Curvature may be a highly predictive metric because it is related to the nature of the abnormality in preterm infants, as has been suggested by Shimony and colleagues.14 Preterm infants tend to have shallower sulci11 12 14 than term infants, and these sulci retain high-curvature troughs but lack low-curvature sulcal wall area, leading to higher overall curvature. Therefore, inner cortical curvature of entire lobes may generalise the predictive power of altered cortical folding in preterm infants. Our two final models each included inner cortical curvature of one whole lobe (left temporal for cognitive and left insula for language) and inner cortical curvature of one gyral region (left posterior cingulate gyrus for cognitive and left superior temporal gyrus for language). For the cognitive model, the significance of the left temporal lobe is unsurprising given that this lobe, particularly the medial portion, is prominently involved in working memory.22 23 Likewise, the posterior cingulate is involved in myriad cognitive functions, including episodic memory retrieval and spatial memory.24 25 For the language model, inclusion of left insular curvature likely reflects the insula’s role in the motor aspects of speech production.26 The superior temporal gyrus includes Wernicke’s area, a major nexus of language comprehension.27 Owing to the inherent collinearity between regional maturation metrics, other comparably accurate models could have been created using inner cortical curvature values from many combinations of regions. The same is true for surface area, as this metric is decreased in prematurity7–13 and positively correlated with Bayley-III outcomes in almost every region tested.
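The geometric argument above, that shallower sulci raise overall curvature because they keep their high-curvature troughs but lose low-curvature wall area, can be made concrete with a toy calculation. The Python sketch below treats a sulcus as two patches and computes an area-weighted mean curvature. Every number and name in it is hypothetical and chosen only to show the direction of the effect; it is not how the dHCP pipeline computes curvature.

```python
def mean_curvature(trough_area, wall_area, trough_curv=1.0, wall_curv=0.1):
    """Area-weighted mean curvature of a toy sulcus made of a trough and its walls.

    All inputs are in arbitrary units; the default curvature values are
    invented for illustration.
    """
    total_area = trough_area + wall_area
    return (trough_area * trough_curv + wall_area * wall_curv) / total_area


# Same trough in both cases; the shallow sulcus simply has less wall area.
deep_sulcus = mean_curvature(trough_area=1.0, wall_area=4.0)
shallow_sulcus = mean_curvature(trough_area=1.0, wall_area=1.0)

print(deep_sulcus, shallow_sulcus)
```

In this toy setting the shallow sulcus has the higher mean curvature because the flat walls contribute less to the average, which matches the direction of the association described in the text.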

Sulcal depth and gyrification index of a few regions were correlated with Bayley outcomes. However, unlike the correlations for surface area and inner cortical curvature, these were less widespread and their direction was not consistent. Given the large number of areas studied, some regional correlations were likely spurious. Morphology of sulci and gyri is complex and highly variable between individuals,16 28 29 which may contribute to its lack of predictive power for NDI. Only a few published studies have shown any degree of correlation between gyrification15 or sulcification16 and neurodevelopmental outcomes. Dubois et al16 measured both sulcification and neurodevelopment at TEA, which does not capture the range of abilities that the Bayley-III measures in early childhood. Severe delay in visually diagnosed gyrification development was predictive of NDI in a high-risk cohort of extremely low birthweight infants.15 However, milder delays in objectively diagnosed gyrification development were not highly predictive in our cohort of lower risk VPT infants.

Significant limitations of this study were the modest sample size and the 20% loss to follow-up. However, our sample was drawn from a geographically defined region, making it more generalisable, and infants lost to follow-up did not differ significantly from those who returned for testing, thus minimising any bias secondary to the 20% attrition. Nevertheless, it will be important to validate our findings in a larger cohort. Finally, while the dHCP pipeline represents a substantial improvement over manual segmentation, the authors report a 2% rate of significant error in the results.18 Strengths of our study include the objective assessments of cortical maturation metrics, adjustment for known predictors of NDI, internal validation of our models and the high degree of variability in outcomes explained by our models despite the lack of infants with severe injury. Milder perinatal brain injuries/abnormalities are far more common, and improvements in prediction of NDI are most needed for these infants.

In conclusion, increased inner cortical curvature at TEA is a new biomarker of NDI in VPT infants. Curvature of many regions of the VPT brain correlates negatively with cognitive and language scores at 2 years corrected age. Combining this new potential prognostic biomarker with a known biomarker, cortical surface area, we created models that predict a substantial amount of the variance in cognitive and language ability at 2 years corrected age in VPT infants. We plan to repeat this analysis with longer-term follow-up data to validate our biomarkers over time. Additionally, in further work we will combine these cortical biomarkers with other promising predictors, including brain volumes and diffusion MRI measures, to develop a more complete model of domain-specific impairment, which may facilitate accurate risk stratification during the optimal developmental window for neuroplasticity.


National Institutes of Health grants R01-NS094200 and R01-NS096037 supported this study. We thank Jennifer Notestine and Valerie Marburger for study coordination, Josh Goldberg for recruitment assistance, Jonathan Dudley for coding help, Mark Smith for serving as our lead MR technologist, and the families and NICU personnel that made this study possible.



  • Contributors NP, LH, MA and JWL designed this study. NP, LH and JWL collected the data. JK, MA, VSPI and NP analysed the data. JK wrote the manuscript. JK, VSPI, LH, MA, JWL and NAP reviewed and edited the manuscript. All authors contributed to the interpretation of the data, the drafting and revision of this manuscript and the final approval of the version published. All authors agree to be accountable for all aspects of the work.

  • Funding This work was supported by National Institutes of Health grants R01-NS094200 and R01-NS096037.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available upon reasonable request.
