Article

Count data in biology: Data transformation or model reformation?

Journal

ECOLOGY AND EVOLUTION
Volume 8, Issue 6, Pages 3077-3085

Publisher

WILEY
DOI: 10.1002/ece3.3807

Keywords

coefficients estimates; count data; generalized linear model; model comparison; non-normal error structure; Poisson distribution; residuals; transformation

Funding

  1. NSERC Postgraduate Scholarship (PGS)
  2. NSERC Canada Graduate Scholarship (CGS)
  3. Memorial University of Newfoundland's School of Graduate Studies


Statistical analyses are an integral component of scientific research, and for decades, biologists have applied transformations to data to meet the normal error assumptions for F and t tests. Over the years, there has been a movement from data transformation toward model reformation: the use of non-normal error structures within the framework of the generalized linear model (GLM). The principal advantage of model reformation is that parameters are estimated on the original, rather than the transformed, scale. However, data transformation has been shown to give better control over type I error for simulated data with known error structures. We conducted a literature review of statistical textbooks directed toward biologists and of journal articles published in the primary literature to determine temporal trends in both the text recommendations and the practice in the refereed literature over the past 35 years. In this review, a trend of increasing use of reformation in the primary literature was evident, moving from no use of reformation before 1996 to >50% of the articles reviewed applying GLM after 2006. However, no such trend was observed in the recommendations in statistical textbooks. We then undertook 12 analyses based on published datasets in which we compared the type I error estimates, residual plot diagnostics, and coefficients yielded by analyses using square root transformations, log transformations, and the GLM. All analyses yielded acceptable residual versus fit plots and had similar p-values within each analysis, but as expected, the coefficient estimates differed substantially. Furthermore, no consensus could be found in the literature regarding a procedure to back-transform the coefficient estimates obtained from linear models performed on transformed datasets. This lack of consistency among coefficient estimates constitutes a major argument for model reformation over data transformation in biology.
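The contrast between the three approaches can be illustrated with a minimal sketch. It does not reproduce the authors' analyses or datasets: the counts are simulated, the true intercept (1.0) and slope (0.5), the sample size, the random seed, and the `+ 1` offset in the log transformation are all assumptions chosen for illustration, and the helper names (`rpois`, `ols`, `poisson_glm`) are hypothetical. The sketch fits a linear model to square-root-transformed counts, a linear model to log-transformed counts, and a Poisson GLM with a log link, then prints the three sets of coefficients.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson variate with mean lam (Knuth's multiplication method)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def ols(x, y):
    """Least-squares fit of y = a + b*x for a single predictor; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def poisson_glm(x, y, max_iter=50, tol=1e-10):
    """Poisson GLM with log link, fit by Newton-Raphson; returns (a, b)."""
    a, b = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only model
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Simulated counts (assumed values, not data from the paper).
rng = random.Random(42)
TRUE_A, TRUE_B = 1.0, 0.5
x = [rng.uniform(0.0, 4.0) for _ in range(500)]
y = [rpois(math.exp(TRUE_A + TRUE_B * xi), rng) for xi in x]

sqrt_a, sqrt_b = ols(x, [math.sqrt(yi) for yi in y])      # sqrt(y) ~ x
log_a, log_b = ols(x, [math.log(yi + 1.0) for yi in y])   # log(y + 1) ~ x
glm_a, glm_b = poisson_glm(x, y)                          # log E[y] ~ x

print(f"sqrt transform : intercept={sqrt_a:.3f} slope={sqrt_b:.3f}")
print(f"log transform  : intercept={log_a:.3f} slope={log_b:.3f}")
print(f"Poisson GLM    : intercept={glm_a:.3f} slope={glm_b:.3f}")
```

Each fit reports its coefficients on a different scale: the square root of the count, the log of the count plus one, and the log of the expected count. Only the GLM coefficients map directly back to the mean of the untransformed counts, by exponentiating the linear predictor, which is the back-transformation problem the abstract raises.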
