Abstract
This paper critically examines ‘kitchen sink regression’, a practice characterised by the manual or automated selection of variables for a multivariable regression model based on p values or model-based information criteria. We illustrate the pitfalls of this method with examples from perinatal/neonatal medicine and propose more robust alternatives. Directed acyclic graphs (DAGs) are introduced as a tool for describing and analysing causal relationships. We identify five key issues with ‘kitchen sink regression’: (1) the disregard for the directionality of variable relationships, (2) the lack of a meaningful causal interpretation of effect estimates from these models, (3) the inflated type I (alpha) error rate due to multiple testing, (4) the risk of overfitting and model instability and (5) the disregard for content expertise in model building. We advocate for the use of DAGs to guide variable selection in models that aim to examine the association between a putative risk factor and an outcome, and we emphasise the need for a more thoughtful and informed use of regression models in medical research.
- Statistics
- Epidemiology
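Issue (3) in the abstract, the inflation of the alpha error rate under multiple testing, can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes that each candidate variable is tested independently at the same significance level and that every null hypothesis is true, in which case the probability of at least one false-positive finding among k tests is 1 - (1 - alpha)^k. The function name `familywise_error_rate` and the chosen numbers of candidate variables are hypothetical and serve only as illustration.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among n_tests tests.

    Assumption (not from the paper): the tests are independent and
    all null hypotheses are true.
    """
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Hypothetical numbers of candidate variables screened by p value.
    for k in (1, 5, 10, 20):
        rate = familywise_error_rate(k)
        print(f"{k:>2} candidate variables: {rate:.3f}")
```

Under these assumptions, screening 10 candidate variables at alpha = 0.05 gives a chance of roughly 40% of at least one spurious 'significant' association, and screening 20 gives roughly 64%. Real candidate variables are usually correlated, so the exact figure will differ, but the direction of the problem is the same.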
Footnotes
Contributors SK, MMB and SS conceived the idea for the manuscript. SK wrote the initial manuscript draft, reviewed and revised the manuscript and approved the final manuscript as submitted. MMB and SS reviewed and revised the manuscript, and approved the final manuscript as submitted.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; internally peer reviewed.