Disagreements exist between research evidence and clinical practice,1–3 and disagreement may lead to inappropriate use of interventions not supported by research evidence or limited use of interventions that may be beneficial. The Cochrane Database of Systematic Reviews of mainly randomised trials is an initiative that tries to facilitate implementation of research evidence into clinical practice.4 Clinicians need to decide how applicable the evidence is to their setting and which evidence to implement in local guidelines. The Cochrane Library 2004, Issue 2, included 1999 reviews from 51 collaborative review groups.5 The Cochrane Neonatal Review Group had conducted 170 (9%) of these reviews.6
In Denmark, and many other countries, national neonatal guidelines cover only a few interventions. Accordingly, the majority of neonatal departments have developed local guidelines for several interventions. The extent to which Cochrane reviews are used in, and agree with, clinical practice guidelines is largely unknown, and some studies have found the agreement to be disappointingly low.3 7–11
In Denmark there are currently three neonatal intensive care units (NICUs) and 14 neonatal subunits in paediatric departments, with approximately 300 cots in total (table 1). Every year, about 7000 (10%) newborns are hospitalised, and about half of these hospitalisations are due to premature delivery.12 The rate of preterm delivery has been increasing and is currently 5–6% of all births.13 The survival rate of preterm infants in Denmark is about 30% at a gestational age of 24 weeks, increasing to 80% at 28 weeks.14
The objective of our study was to assess the agreement between Cochrane Neonatal Group reviews and guidelines of all Danish neonatal departments. We also aimed to assess the reasons for any disagreements, whether reviews were considered during guideline development and heterogeneity between the guidelines covered by the Cochrane neonatal reviews. Danish doctors have had free access to the Cochrane Library since 1999.
We included only guidelines on interventions that had been evaluated in a Cochrane Neonatal Group review (Cochrane Library, Issue 2, 2004). All the eligible Cochrane reviews were published or updated before May 2004. We used a six-point scale to classify the treatment recommendations in the reviews (table 2).7 We compared the recommendations in the reviews with the clinical guidelines of the included departments. Reviews and guidelines were classified as being in:
Agreement:
review and guideline recommend intervention;
review lacks evidence (scores 3–4, table 2) or has evidence to refute intervention (scores 1–2) and guideline does not recommend or address intervention.
Partial agreement:
intervention with borderline evidence to use (score 5) and guideline either recommends or does not recommend intervention.
Disagreement:
review recommends intervention (score 6), but guideline does not;
review lacks evidence (scores 3–4) or has evidence to refute intervention (scores 1–2) and guideline recommends intervention.
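The classification rules above can be sketched in code. This is a minimal illustration, not the authors' instrument: the function name `classify` and the boolean `guideline_recommends` (False when the guideline either does not recommend or does not address the intervention) are assumptions made here for clarity.

```python
def classify(review_score: int, guideline_recommends: bool) -> str:
    """Classify one intervention from the review score (1-6, table 2)
    and whether the local guideline recommends the intervention.

    Hypothetical helper: names and signature are illustrative only.
    """
    if not 1 <= review_score <= 6:
        raise ValueError("review_score must be between 1 and 6")

    # Score 5: borderline evidence, counted as partial agreement
    # whatever the guideline says.
    if review_score == 5:
        return "partial agreement"

    # Score 6: review recommends the intervention.
    # Scores 1-4: review lacks evidence (3-4) or refutes it (1-2).
    review_recommends = review_score == 6

    if review_recommends == guideline_recommends:
        return "agreement"
    return "disagreement"


if __name__ == "__main__":
    for score, recommends in [(6, True), (3, False), (5, True), (6, False), (2, True)]:
        print(score, recommends, "->", classify(score, recommends))
```

Treating "does not address" the same as "does not recommend" follows the wording of the rules; a finer-grained audit might keep the two apart.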
To adjust for misinterpretation of guidelines, all classifications made as agreement, partial agreement and disagreement were reviewed and confirmed by a specialist from the local neonatal department. We audited guidelines from November 2004 to April 2006. This gave a time period of at least 6 months to locally implement the review evidence.
We searched the guidelines for references to the pertinent Cochrane review. If no references were identified, we sent a questionnaire to the author of the guideline asking whether the pertinent Cochrane review had been considered during the development of the guideline.7 In cases of disagreement between a review and a guideline, the guideline authors were asked what they had considered as evidence base of the guideline.
We explored for significant heterogeneity between guideline recommendations in the different departments.
For each department we determined the number of agreements, partial agreements and disagreements.7 The overall levels of these three categories are presented as medians with upper and lower ranges. We calculated weighted κ (3×2 tables, partial agreement weighted 0.5) to assess the agreement beyond chance.15 As a sensitivity analysis we used an unweighted κ with only agreement and disagreement in a 2×2 table. We calculated κ for medians, upper ranges, and lower ranges. κ above 0.80 represents excellent agreement, 0.61–0.80 represents substantial agreement, 0.41–0.60 represents good agreement, 0.21–0.40 represents slight agreement, and below 0.20 represents poor agreement.15
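A weighted κ of this kind can be sketched as follows. The table layout is an assumption made for illustration (rows: review recommends, borderline, does not recommend; columns: guideline recommends, does not recommend), and the counts are invented toy numbers, not study data. The function name `weighted_kappa` is likewise hypothetical. The source confirms only the 3×2 shape and the 0.5 weight for partial agreement.

```python
def weighted_kappa(table, weights):
    """Weighted kappa: (observed - expected) / (1 - expected), where both
    terms are weighted sums over the cells of a contingency table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Weighted proportion of agreement actually observed.
    observed = sum(
        weights[i][j] * table[i][j]
        for i in range(len(table))
        for j in range(len(table[0]))
    ) / n

    # Weighted agreement expected by chance from the marginal totals.
    expected = sum(
        weights[i][j] * row_totals[i] * col_totals[j]
        for i in range(len(table))
        for j in range(len(table[0]))
    ) / n ** 2

    if expected == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1 - expected)


# Assumed layout: rows = review (recommends, borderline, does not recommend),
# columns = guideline (recommends, does not recommend).
WEIGHTS_3X2 = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
# Sensitivity analysis: drop the borderline row, no partial credit.
WEIGHTS_2X2 = [[1.0, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    toy_3x2 = [[8, 2], [3, 3], [1, 23]]  # invented counts, illustration only
    toy_2x2 = [toy_3x2[0], toy_3x2[2]]
    print("weighted kappa:", round(weighted_kappa(toy_3x2, WEIGHTS_3X2), 2))
    print("unweighted kappa:", round(weighted_kappa(toy_2x2, WEIGHTS_2X2), 2))
```

Using the same function with an identity weight matrix gives the ordinary unweighted κ, which is why the 2×2 sensitivity analysis needs no separate code path.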
In 2004, there were 17 neonatal departments in Denmark, all admitting preterm and term infants with diseases or other conditions requiring hospitalisation. All departments agreed to participate in this study, but three departments (all neonatal subunits of paediatric departments) failed to provide guideline information and were excluded. The remaining 14 departments (3 neonatal intensive care units and 11 neonatal subunits of paediatric departments) represented all regions of Denmark and were all involved in undergraduate and postgraduate training (table 1). These departments covered 253 of a total of 295 cots (86%) and 95% of all severely ill infants who needed long-term hospitalisation.
Agreement between Cochrane reviews and clinical guidelines
In all, 173 interventions were assessed in the Cochrane reviews (table 2). A median number of 22 (range 11–69) clinical guidelines from the 14 neonatal departments were available on the internet, intranet sites or as written guidelines. Of these, we included a median of 12 (range 5–36) guidelines from each department, which contained one or more interventions assessed by Cochrane reviews.
The median number of agreements between treatment recommendations in reviews and each department’s guidelines was 132/173 (76%) interventions (range 129–134). Of these, a median of 19/132 (14%) interventions were recommended as “treatment of choice” in both the reviews and guidelines. A median of 113/132 (86%) interventions were recommended neither in the reviews nor in the guidelines (table 3). The median number of partial agreements was 31/173 (18%) interventions (range 29–33).
The median number of disagreements between reviews and each department’s guidelines was 10/173 (6%) interventions (range 8–13) (table 3). Of these, most interventions lacked evidence in the reviews (not recommended), but were recommended in the guidelines (table 4). A few interventions were recommended in the reviews, but not in the guidelines (table 4). All reviews in disagreement with guidelines were published at least 1 year (2003) before the present study.
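The reported proportions can be re-derived from the median counts stated above. The following is a minimal arithmetic check using only numbers given in the text; the helper name `pct` is hypothetical.

```python
def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


TOTAL_INTERVENTIONS = 173

# (count, denominator, percentage stated in the text)
REPORTED = {
    "agreement": (132, TOTAL_INTERVENTIONS, 76),
    "partial agreement": (31, TOTAL_INTERVENTIONS, 18),
    "disagreement": (10, TOTAL_INTERVENTIONS, 6),
    "agreement, both recommend": (19, 132, 14),
    "agreement, neither recommends": (113, 132, 86),
}

if __name__ == "__main__":
    for label, (count, denom, stated) in REPORTED.items():
        print(f"{label}: {count}/{denom} = {pct(count, denom)}% (stated {stated}%)")
```

Medians of separate categories need not add up to the total in general, although here they do (132 + 31 + 10 = 173).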
The weighted κ was 0.56 (range 0.53–0.59). The unweighted κ was 0.75 (range 0.67–0.80).
Reasons for disagreements between Cochrane reviews and clinical guidelines
The reported reasons for recommending interventions that lacked evidence according to the Cochrane reviews (score 3–4, table 2) were use of other evidence sources than reviews: single studies (both non-randomised and randomised); textbook recommendations; expert opinion; clinical experience; consensus statements; basic immunology and pathophysiological knowledge; and evidence based on intervention effects on surrogate markers or without risk of adverse event (“nothing to lose”).
The reasons for not recommending interventions that were evidence based according to the Cochrane reviews (score 6, table 2) were: unawareness of the review; local consensus (bad habit); reservations about the external validity of the review (ie, locally the basic treatment or the infants’ risks differed substantially from the infants in the reviews); use of evidence from single studies; disagreement with the interpretation of review; easier to administer alternative intervention; and economic constraints.
Use of Cochrane reviews for guideline development
In general one or two neonatologists were developing and updating clinical guidelines in each department. One department reported use of a standardised method to construct or update guidelines.
We searched the guidelines for references to Cochrane reviews and asked the guideline authors whether the review was considered for the guideline development. The search and the feedback showed that the pertinent Cochrane reviews were used in 10% (median, range 0–36%) of the guidelines.
Heterogeneity among department guidelines
We identified nine interventions (or conditions), assessed in Cochrane reviews, with heterogeneity among the guideline recommendations (ie, at least two departments were using different treatment regimens) (table 5).
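The working definition of heterogeneity used here (at least two departments with different regimens for the same intervention) can be sketched as below. The function name, the data layout and the example department labels are hypothetical. Ignoring departments that have no guideline for the intervention is an assumption made here, not something stated in the study.

```python
from typing import Dict, Optional


def is_heterogeneous(regimen_by_department: Dict[str, Optional[str]]) -> bool:
    """Return True if at least two departments use different regimens.

    Departments mapped to None (no guideline on the intervention) are
    ignored; this handling is an assumption for illustration.
    """
    distinct = {r for r in regimen_by_department.values() if r is not None}
    return len(distinct) >= 2


if __name__ == "__main__":
    # Invented example labels, not data from table 5.
    example = {"dept A": "regimen 1", "dept B": "regimen 2", "dept C": None}
    print(is_heterogeneous(example))
```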
What is already known on this topic
Cochrane reviews with meta-analysis of randomised trials are the gold standard for intervention comparisons.
There are gaps between research evidence and clinical practice.
What this study adds
Evidence for good agreement between Cochrane neonatal reviews and neonatal guidelines in Denmark.
Cochrane reviews are rarely used when developing local guidelines.
Our study had several findings. We found agreement between three-quarters of Cochrane neonatal reviews and clinical guidelines from neonatal departments in Denmark. However, Cochrane reviews were used for guideline development in only 1 of 10 guidelines. The reasons for the few disagreements between reviews and guidelines were diverse. Heterogeneity among the department guidelines was observed in 5% of the guidelines. Only one department had standard procedures for developing clinical guidelines.
Strengths and limitations of the study
We included all the 170 reviews from the Cochrane Neonatal Review Group in 2004.5 6 These reviews cover commonly and less commonly used interventions within neonatology. We included 14/17 neonatal departments in Denmark, which covered more than 80% of all cots and more than 90% of all severely ill infants who needed long-term hospitalisation.12 This ensured a reasonable national overview and we have not identified any similar extensive national evaluation of guidelines. We received feedback from the included departments, which reconfirmed our assessment of guidelines. Hence, misinterpretations of local guidelines seem minimal.
Guidelines are only surrogate measures of clinical practice, and guidelines and clinical practice may differ. To assess potential discrepancies, evaluation of actual treatment of patients or evaluation of records is required. Furthermore, three departments did not provide data. This may have introduced some bias. It seems unlikely that these departments differ substantially from those that participated in the study because Denmark is a relatively small country, neonatologists circulate between the departments, and minor departments often follow major departments’ guidelines. However, we cannot rule out that the excluded departments could have significantly influenced the κ obtained in this study. Finally, we allowed a gap of at least 6 months to implement review evidence in guidelines. Some departments were in the process of updating guidelines. If such amendments were to be in concordance with the review evidence, our study might have overestimated the number of disagreements.
We found good agreement between evidence in Cochrane neonatal reviews and clinical guidelines for newborns in Denmark. This contrasts with findings from other fields of medicine that show larger gaps between research evidence and practice.1–3 The observed agreement may reflect the extensive evidence-based practice within neonatal care due to the high number of and the early focus on systematic neonatal reviews.6 36 37 Surprisingly, we found that authors of guidelines rarely directly used Cochrane reviews but often used other sources of evidence or evidence from lower levels of the “evidence hierarchy”. Such evidence is more likely to be biased (eg, single randomised trials, observational cohort or case–control studies, textbooks and expert opinion). However, the finding that only 10% of guidelines built directly on evidence from Cochrane reviews may be an underestimate. We cannot estimate the indirect influence of reviews on guidelines via secondary sources.
There are probably diverse reasons why clinicians do not consider reviews more directly during guideline development. It may be that they are not using systematic and explicit methods to develop guidelines. Other reasons may include lack of time (and skills) to critically assess the sometimes overwhelming amount of information in Cochrane reviews.38 Reasons for clinicians not adhering to guidelines include: unawareness or lack of familiarity with the intervention, disagreement, lack of self-efficacy (the belief that one cannot master an intervention procedure), inertia of previous practice, time constraints and other external barriers.39 Similar barriers may exist for implementation of Cochrane reviews in guidelines.
Most disagreements were because of guidelines recommending interventions that lacked evidence in reviews rather than a failure to recommend interventions supported by reviews. This may illustrate the current societal opinion: a preference for overuse of potentially beneficial treatment, although it is sometimes more appropriate to withhold treatment.40
Despite reviews providing valid evidence, doctors must decide how applicable the evidence is to their setting (external validity). Some disagreements (eg, not using beneficial interventions) were due to lack of external validity—that is, basic care of preterms in Denmark differs substantially from that in the studies in the reviews.30 31 Other reasons for disagreements were that the beneficial interventions were difficult to administer or too expensive (eg, prophylactic surfactant).31 In some cases the guideline author disagreed with the classification of the intervention being beneficial (eg, indometacin32 and vitamin A33) or not being beneficial (eg, opiates for abstinence25) after having read the reviews. These reasons could have caused differences between departmental guidelines.
Only one department reported use of standard procedures for developing or updating clinical guidelines. Many organisations have recognised the need to follow validated guidelines (eg, Appraisal of Guidelines Research and Evaluation (AGREE)) to ensure adequate clinical guidelines.41 Several methods have been proposed on how to develop guidelines and incorporate systematic reviews.41 42 Additional research on how to make authors of guidelines implement such methods seems warranted.
Limitations of Cochrane reviews
Cochrane reviews are considered the gold standard for intervention comparisons,43 but have limitations and the internal validity varies.44 Limitations include pooling of heterogeneous and biased trials in meta-analyses, exclusion of non-randomised studies that may contain important information, inadequate update and the risk that reviews could be data driven as they are retrospectively conducted.45–47 The internal validity depends on how the Cochrane authors handle random errors (eg, through trial sequential analysis),45 46 heterogeneity, missing data and statistical tests that evaluate or adjust for systematic errors (bias) (eg, funnel plot, sensitivity, subgroup and meta-regression analyses). Lack of internal validity may lead to wrong conclusions in reviews. However, no guideline author in this study questioned the internal validity of the reviews.
Use of interventions that lacked evidence in reviews
Occasionally interventions studied in non-randomised trials have such a convincing effect (eg, insulin for diabetic coma47) that they can be implemented without assessment in randomised trials. The decision when an intervention effect is clearly convincing is not clear-cut and many interventions in this grey area are labelled “promising”. The appropriate trade-off between a delay in implementing a “promising” treatment due to further research in randomised trials and the risk of implementing a treatment without benefit is difficult and depends on the conditions. When effective treatments are available, the condition is less serious, or the new intervention has appreciable toxicity or costs, it is advisable to wait for stronger evidence. When an intervention is the only treatment for a serious condition and has minimal adverse effects and cost, early recommendation may be appropriate. The latter may be the reason for neonatal guidelines to recommend interventions that lack evidence according to Cochrane reviews, for example, epinephrine for cardiac arrest or extremely bradycardic infants.20
Implications of this study
Our study presents a method to critically audit guidelines, which may be applicable to all specialties. By considering all published reviews, it is subsequently easy to cope with updated and new reviews appearing in the Cochrane Library, and clinicians would avoid important disagreements between practice and research evidence in reviews. Such initiatives might minimise the gaps between research evidence and clinical practice.
We thank N C Christensen, L H Rasmussen, R Monrad, K H Johansen and C Grytter for providing data from their neonatal department. We appreciate the helpful comments and suggestions on the manuscript from D Nikolova, C B Christensen, L L Gluud and J Wetterslev.
Competing interests: None.