External validation is necessary in prediction research: A clinical example

https://doi.org/10.1016/S0895-4356(03)00207-5

Abstract

Background and objective

Prediction models tend to perform better on the data on which they were constructed than on new data. This difference in performance indicates optimism in the apparent performance estimated in the derivation set. For internal model validation, bootstrapping methods are recommended to provide bias-corrected estimates of model performance. Results are often accepted without sufficient regard for the importance of external validation. This report illustrates the limitations of internal validation in determining the generalizability of a diagnostic prediction model to future settings.

Methods

A prediction model for the presence of serious bacterial infections in children with fever without source was derived and validated internally using bootstrap resampling techniques. Subsequently, the model was validated externally.

Results

In the derivation set (n = 376), nine predictors were identified. The apparent area under the receiver operating characteristic curve (95% confidence interval) of the model was 0.83 (0.78–0.87), which decreased to 0.76 (0.67–0.85) after bootstrap correction. In the validation set (n = 179), the performance was 0.57 (0.47–0.67).

Conclusion

For relatively small data sets, internal validation of prediction models by bootstrap techniques may not suffice to indicate the model's performance in future patients. External validation is essential before implementing prediction models in clinical practice.

Introduction

The performance of regression models used in diagnostic and prognostic prediction research is generally better on the data set on which the model was constructed (derivation set) than on new data (validation set) [1], [2], [3], [4], [5], [6], [7], [8], [9], especially in small data sets [10], [11]. To address this, several approaches have been suggested to estimate a model's optimism [3], [12], [13], [14], [15], in particular bootstrap resampling techniques. Bootstrapping, cross-validation, and split-sampling are internal validation techniques, because performance is estimated using patients from the model's derivation set only [3], [12], [13]. Bootstrapping involves drawing a large number of samples with replacement from the original sample. In contrast to cross-validation or split-sample approaches, bootstrap methods are very efficient: the entire data set is used for model development, and no new data have to be collected for validation. Moreover, bootstrapping has been shown to provide nearly unbiased estimates of predictive accuracy with relatively low variance [2], [16]. However, bootstrap techniques account only for pure sampling variability; changes in the patient population are not considered [5].
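As a rough illustration of this internal validation procedure, the sketch below estimates a bias-corrected area under the ROC curve by refitting the model in each bootstrap sample and averaging the optimism (performance in the bootstrap sample minus performance of the refitted model back on the original data). It assumes a logistic regression model and placeholder arrays X and y for the derivation data; the 200 replicates and the scikit-learn calls are illustrative choices, not the authors' exact implementation.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.utils import resample

    def bootstrap_corrected_auc(X, y, n_boot=200, seed=0):
        rng = np.random.RandomState(seed)
        # Apparent performance: the model evaluated on its own derivation data.
        model = LogisticRegression(max_iter=1000).fit(X, y)
        apparent = roc_auc_score(y, model.predict_proba(X)[:, 1])

        optimism = []
        for _ in range(n_boot):
            # Draw a sample of patients with replacement from the derivation set.
            Xb, yb = resample(X, y, random_state=rng)
            if len(np.unique(yb)) < 2:
                continue  # skip degenerate resamples with a single outcome class
            mb = LogisticRegression(max_iter=1000).fit(Xb, yb)
            # Optimism: performance in the bootstrap sample minus performance
            # of the same refitted model on the original data.
            auc_boot = roc_auc_score(yb, mb.predict_proba(Xb)[:, 1])
            auc_orig = roc_auc_score(y, mb.predict_proba(X)[:, 1])
            optimism.append(auc_boot - auc_orig)

        return apparent, apparent - float(np.mean(optimism))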

External validation aims to assess the accuracy of a model in patients from a different but plausibly related population, which may be defined as a selected study population representing the underlying disease domain [5], [9], [17]. Most reports evaluating prediction models focus on internal validity while neglecting the important issue of external validity. We will illustrate the limitations of internal validation in determining the generalizability of a prediction model. To this end, we use a clinical example from a diagnostic study on predicting the presence of a serious bacterial infection in children presenting with fever without apparent source at pediatric Emergency Departments.
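By contrast, a minimal sketch of external validation under the same assumptions as above: the model is fitted once on the derivation set and then applied, unchanged, to patients from a new setting. X_der, y_der, X_ext, and y_ext are hypothetical placeholder arrays.

    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    def external_auc(X_der, y_der, X_ext, y_ext):
        # Fit on the derivation set only; the external patients play no role here.
        model = LogisticRegression(max_iter=1000).fit(X_der, y_der)
        # Apply the frozen model to the external set: predictions only, no refitting.
        return roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])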


Methods

Fever without apparent source is a common diagnostic and therapeutic dilemma in pediatrics. Approximately 10 to 35% of all visits to pediatric Emergency Departments concern febrile children [18], [19], [20], [21], and in 14 to 40% no apparent source is found after history taking and physical examination [19], [22]. The underlying cause of fever ranges from mild viral to serious bacterial infections, such as sepsis or meningitis [19]. Bacterial infections are reported in 3 to 15% of febrile

Results

The derivation set comprised 376 children with fever without apparent source, and the validation set consisted of 179 children referred for the same reason (three and zero patients, respectively, were excluded because of isolation of Haemophilus influenzae). Except for the variable pale skin, no material differences were found in the distribution of the general characteristics and the predictors between the two sets (Table 1). A serious bacterial infection was present in 20% of

Discussion

The aim of this study was to construct and validate a diagnostic prediction model to distinguish between the presence and absence of serious bacterial infection in children referred with fever without apparent source. The predictors selected in the derivation set were age above 1 year, duration of fever, a changed crying pattern, nasal discharge or earache in the history, ill clinical appearance, pale skin, chest-wall retractions, crepitations, and signs of pharyngitis or tonsillitis. These results agree with
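For illustration only, the sketch below shows how a nine-predictor logistic rule of this form converts a child's findings into a predicted probability of serious bacterial infection. The coefficients and intercept are hypothetical placeholders, not the published model's estimates.

    import math

    # Hypothetical placeholder coefficients; the published estimates differ.
    COEFS = {
        "age_above_1y": 0.5,
        "fever_duration_days": 0.2,
        "changed_crying_pattern": 0.4,
        "nasal_discharge_or_earache": -0.3,
        "ill_clinical_appearance": 0.9,
        "pale_skin": 0.6,
        "chest_wall_retractions": 0.7,
        "crepitations": 0.8,
        "pharyngitis_or_tonsillitis": -0.4,
    }
    INTERCEPT = -2.0  # hypothetical

    def predicted_risk(findings):
        # Logistic model: P(infection) = 1 / (1 + exp(-linear_predictor)).
        lp = INTERCEPT + sum(c * findings.get(name, 0) for name, c in COEFS.items())
        return 1.0 / (1.0 + math.exp(-lp))

Under these placeholder values, for example, predicted_risk({"ill_clinical_appearance": 1, "crepitations": 1}) returns roughly 0.43.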

Acknowledgements

We gratefully acknowledge Wilfried de Jong and Femke Mineur, medical students, for their support in data collection. The Health Care Insurance Council of The Netherlands financially supported this project.

References (43)

  • J.H. Wasson et al. Clinical prediction rules: applications and methodological standards. N Engl J Med (1985)
  • K.G. Moons et al. Redundancy of single diagnostic test evaluation. Epidemiology (1999)
  • A. Laupacis et al. Clinical prediction rules: a review and suggested modifications of methodological standards. JAMA (1997)
  • T.G. McGinn et al. Users' guides to the medical literature: XXII: how to use articles about clinical decision rules. Evidence-Based Medicine Working Group. JAMA (2000)
  • E.W. Steyerberg et al. Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Stat Med (2000)
  • B. Efron et al. An introduction to the bootstrap. Monographs on statistics and applied probability (1993)
  • B. Efron et al. Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc (1997)
  • R.R. Picard et al. Data splitting. Am Stat (1990)
  • W. Sauerbrei. The use of resampling methods to simplify regression models in medical statistics. J R Stat Soc Ser C Appl Stat (1999)
  • J.A. Knottnerus. Prediction rules: statistical reproducibility and clinical similarity. Med Decis Making (1992)
  • P.L. McCarthy. Fever. Pediatr Rev (1998)