External validation is necessary in prediction research: A clinical example
Introduction
The performance of regression models used in diagnostic and prognostic prediction research is generally better on the data set on which the model was constructed (derivation set) than on new data (validation set) [1], [2], [3], [4], [5], [6], [7], [8], [9], especially in small data sets [10], [11]. To address this, several approaches have been suggested to estimate a model's optimism [3], [12], [13], [14], [15], in particular bootstrap resampling techniques. Bootstrapping, cross-validation, and split-sampling are internal validation techniques, because performance is estimated using patients from the model's derivation set only [3], [12], [13]. Bootstrapping involves taking a large number of samples with replacement from the original sample. In contrast to cross-validation or split-sample approaches, bootstrap methods are very efficient, as the entire data set is used for model development, and no new data have to be collected for validation. Moreover, it has been shown that bootstrapping provides nearly unbiased estimates of predictive accuracy that are of relatively low variance [2], [16]. However, bootstrap techniques account only for pure sampling variability, not for changes in the patient population [5].
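As an illustration of the bootstrap idea described above, the following minimal sketch estimates optimism in the area under the ROC curve (AUC) for a logistic regression model. It is not the procedure used in this study: the synthetic data generator `make_data`, the plain gradient-descent fitter `fit_logistic`, and all parameter values are assumptions introduced for illustration only.

```python
import math
import random


def predict_one(w, x):
    """Predicted probability from intercept w[0] and coefficients w[1:]."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=150):
    """Fit logistic regression by batch gradient descent (illustrative only)."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = predict_one(w, xi) - yi
            grad[0] += err
            for j in range(p):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def auc(scores, y):
    """Probability that a random case scores higher than a random non-case."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def optimism_corrected_auc(X, y, n_boot=200, seed=0):
    """Return (apparent AUC, optimism-corrected AUC) via bootstrap resampling."""
    rng = random.Random(seed)
    n = len(y)
    w = fit_logistic(X, y)
    apparent = auc([predict_one(w, x) for x in X], y)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # AUC is undefined when only one outcome class is drawn
        wb = fit_logistic(Xb, yb)
        auc_boot = auc([predict_one(wb, x) for x in Xb], yb)  # on bootstrap sample
        auc_orig = auc([predict_one(wb, x) for x in X], y)    # on original sample
        optimism.append(auc_boot - auc_orig)
    if not optimism:
        raise ValueError("no usable bootstrap samples")
    return apparent, apparent - sum(optimism) / len(optimism)


def make_data(n, seed):
    """Hypothetical data: one informative predictor and three noise predictors."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(4)]
        p = 1.0 / (1.0 + math.exp(-(-1.0 + 1.5 * x[0])))
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_data(60, seed=1)
    apparent, corrected = optimism_corrected_auc(X, y, n_boot=20)
    print(f"apparent AUC {apparent:.3f}, optimism-corrected AUC {corrected:.3f}")
```

Every bootstrap sample is drawn from the derivation data itself, so a correction of this kind reflects sampling variability only and says nothing about how the model behaves in a different patient population.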
External validation aims to address the accuracy of a model in patients from a different but plausibly related population, which may be defined as a selected study population representing the underlying disease domain [5], [9], [17]. Most reports evaluating prediction models focus on internal validity and neglect the important issue of external validity. We will illustrate the limitations of internal validation in determining the generalizability of a prediction model. To this end, we use a clinical example from a diagnostic study on the prediction of the presence of a serious bacterial infection in children presenting with fever without apparent source in pediatric Emergency Departments.
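In code, external validation amounts to freezing the model obtained in the derivation set and evaluating it, without refitting, on patients from the new population. The sketch below shows that step for discrimination (c-statistic) and calibration-in-the-large. The intercept, the coefficients, and the six validation patients are invented for illustration and are not taken from this study; only the predictor names echo the clinical example.

```python
import math

# Hypothetical frozen model: these values are invented for illustration
# and are NOT the coefficients estimated in the study.
INTERCEPT = -1.5
COEFS = {"ill_appearance": 1.2, "chest_wall_retractions": 0.9}


def predicted_risk(patient):
    """Apply the frozen model to one patient; nothing is re-estimated."""
    z = INTERCEPT + sum(COEFS[name] * patient[name] for name in COEFS)
    return 1.0 / (1.0 + math.exp(-z))


def c_statistic(risks, outcomes):
    """Probability that a random case gets a higher risk than a random non-case."""
    pos = [r for r, o in zip(risks, outcomes) if o == 1]
    neg = [r for r, o in zip(risks, outcomes) if o == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(risks, outcomes):
    """Observed event rate minus mean predicted risk (0 means no systematic bias)."""
    return sum(outcomes) / len(outcomes) - sum(risks) / len(risks)


# Invented validation patients: (predictor values, observed outcome).
validation = [
    ({"ill_appearance": 1, "chest_wall_retractions": 1}, 1),
    ({"ill_appearance": 1, "chest_wall_retractions": 0}, 1),
    ({"ill_appearance": 0, "chest_wall_retractions": 1}, 0),
    ({"ill_appearance": 0, "chest_wall_retractions": 0}, 0),
    ({"ill_appearance": 0, "chest_wall_retractions": 0}, 0),
    ({"ill_appearance": 1, "chest_wall_retractions": 0}, 0),
]

risks = [predicted_risk(patient) for patient, _ in validation]
outcomes = [outcome for _, outcome in validation]

print("c-statistic:", c_statistic(risks, outcomes))
print("calibration-in-the-large:", calibration_in_the_large(risks, outcomes))
```

A negative calibration-in-the-large value means the model predicts more events on average than were observed in the new patients, which is one way a change in population can show up even when discrimination looks acceptable.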
Methods
Fever without apparent source is a common diagnostic and therapeutic dilemma in pediatrics. Approximately 10 to 35% of all visits to pediatric Emergency Departments concern febrile children [18], [19], [20], [21], and in 14 to 40% of these children no apparent source is found after history taking and physical examination [19], [22]. The underlying cause of fever varies from mild viral to serious bacterial infections, such as sepsis or meningitis [19]. Bacterial infections are reported in 3 to 15% of febrile
Results
The derivation set comprised 376 children with fever without apparent source, and the validation set consisted of 179 children who had been referred for the same reason (three and zero patients, respectively, were excluded because of isolation of Haemophilus influenzae). Except for the variable pale skin, no material differences were found in the distribution of the general characteristics and the predictors between the two sets (Table 1). A serious bacterial infection was present in 20% of
Discussion
The aim of this study was to construct and validate a diagnostic prediction model to distinguish between the presence and absence of a serious bacterial infection in children referred with fever without apparent source. Selected predictors in the derivation set were age above 1 year, duration of fever, changed crying pattern, nasal discharge or earache in history, ill clinical appearance, pale skin, chest-wall retractions, crepitations, and signs of pharyngitis or tonsillitis. These results agree with
Acknowledgements
We gratefully acknowledge Wilfried de Jong and Femke Mineur, medical students, for support in data collection. The Health Care Insurance Counsel of The Netherlands financially supported this project.
References (43)
- et al. Stepwise selection in small data sets: a simulation study of bias in logistic regression analysis. J Clin Epidemiol (1999)
- et al. Internal validation of predictive models. Efficiency of some procedures for logistic regression analysis. J Clin Epidemiol (2001)
- et al. Predictors of occult pneumococcal bacteremia in young febrile children. Ann Emerg Med (1998)
- et al. Observation, history, and physical examination in diagnosis of serious illnesses in febrile children less than or equal to 24 months. J Pediatr (1987)
- et al. Identification of infants unlikely to have serious bacterial infection although hospitalized for suspected sepsis. J Pediatr (1985)
- et al. Regression models for prognostic prediction: advantages, problems, and suggested solutions. Cancer Treat Rep (1985)
- et al. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med (1996)
- Estimating the error rate of a prediction rule: improvement on cross-validation. J Am Stat Assoc (1983)
- et al. Predictive value of statistical models. Stat Med (1990)
- et al. Assessing the generalizability of prognostic information. Ann Intern Med (1999)
- Clinical prediction rules. Applications and methodological standards. N Engl J Med
- Redundancy of single diagnostic test evaluation. Epidemiology
- Clinical prediction rules. A review and suggested modifications of methodological standards. JAMA
- Users' guides to the medical literature: XXII: how to use articles about clinical decision rules. Evidence-Based Medicine Working Group. JAMA
- Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Stat Med
- An introduction to the bootstrap. Monographs on statistics and applied probability
- Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc
- Data splitting. Am Stat
- The use of resampling methods to simplify regression models in medical statistics. Journal of the Royal Statistical Society Series C: Applied Statistics
- Prediction rules: statistical reproducibility and clinical similarity. Med Decis Making
- Fever. Pediatr Rev