Regression for citation data: An evaluation of different methods

Mike Thelwall and Paul Wilson

Journal of Informetrics, 2014, vol. 8, issue 4, 963-971

Abstract: Citations are increasingly used for research evaluations. It is therefore important to identify factors affecting citation scores that are unrelated to scholarly quality or usefulness so that these can be taken into account. Regression is the most powerful statistical technique to identify these factors and hence it is important to identify the best regression strategy for citation data. Citation counts tend to follow a discrete lognormal distribution and, in the absence of alternatives, have been investigated with negative binomial regression. Using simulated discrete lognormal data (continuous lognormal data rounded to the nearest integer) this article shows that a better strategy is to add one to the citations, take their log and then use the general linear (ordinary least squares) model for regression (e.g., multiple linear regression, ANOVA), or to use the generalised linear model without the log. Reasonable results can also be obtained if all the zero citations are discarded, the log is taken of the remaining citation counts and then the general linear model is used, or if the generalised linear model is used with the continuous lognormal distribution. Similar approaches are recommended for altmetric data, if it proves to be lognormally distributed.
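The strategy the abstract recommends first (add one to the citations, take the log, then fit an ordinary least squares model) can be sketched in a few lines. This is a minimal illustration and not the authors' simulation code: the single binary predictor, the parameter values, the sample size and the helper names `simulate_discrete_lognormal` and `ols_log1p` are all assumptions chosen for the example.

```python
import numpy as np


def simulate_discrete_lognormal(x, intercept, slope, sigma, rng):
    """Draw continuous lognormal values and round them to the nearest integer.

    The log-scale mean depends linearly on x (an assumed model for this sketch).
    """
    mu = intercept + slope * x
    continuous = rng.lognormal(mean=mu, sigma=sigma)
    return np.rint(continuous).astype(int)


def ols_log1p(x, citations):
    """Fit log(1 + citations) on x by ordinary least squares.

    Returns the estimated (intercept, slope) on the log(1 + c) scale.
    """
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, np.log1p(citations), rcond=None)
    return coef[0], coef[1]


# Hypothetical setup: one binary factor and arbitrary illustrative parameters.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=5000).astype(float)
citations = simulate_discrete_lognormal(
    x, intercept=1.0, slope=0.5, sigma=1.0, rng=rng
)

intercept_hat, slope_hat = ols_log1p(x, citations)
print(f"intercept: {intercept_hat:.3f}, slope: {slope_hat:.3f}")
```

The fitted coefficients are on the log(1 + c) scale, so they are not expected to match the parameters of the underlying continuous lognormal exactly. The alternative mentioned in the abstract, discarding zero citations and taking the plain log, would replace `np.log1p(citations)` with `np.log` applied to the non-zero counts only.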

Keywords: Informetrics; Altmetrics; Citation distributions; Lognormal; Powerlaw; Regression
Date: 2014
Citations: 15 (in EconPapers)

Journal of Informetrics is currently edited by Leo Egghe

More articles in Journal of Informetrics from Elsevier
Series data maintained by Dana Niculescu.

Page updated 2017-09-29
Handle: RePEc:eee:infome:v:8:y:2014:i:4:p:963-971