Functional analysis of generalized linear models under non-linear constraints with applications to identifying highly-cited papers
K.P. Chowdhury
Journal of Informetrics, 2021, vol. 15, issue 1
Abstract:
This article introduces a versatile functional form for Generalized Linear Models (GLMs) through a simple yet effective transformation of the current framework. The models are applied, through a new hierarchical Bayesian estimation procedure for logistic regression, to highly-cited papers in the Management Information Systems (MIS) field. The results are uniformly better with regard to model fit and inference, for in-sample and out-of-sample data and for both simulation studies and real-world data applications, and the method needs very little time to converge to the true population parameters. In simulation studies, I show that the method's intervals contain the true parameters nearly three times as often as those of widely used existing GLMs, with confidence intervals that are 54.50% smaller and around two-thirds the number of MCMC iterations required by existing Bayesian methods. In scientometric applications, the methodology is shown to be highly robust, with predictive/classification accuracy equaling or exceeding that of existing methods for identifying highly-cited articles, including Artificial Neural Networks (ANN). The method is thus shown to be robust to the amount of asymmetry (or symmetry) in the probability of success (or failure), to unbalanced samples, and to varying Data Generating Processes. Further, the methodology is equivalent to current methods if the data support them and is therefore complementary to existing methods, without loss of interpretability of model parameters. For the MIS field, it finds that the Popularity Parameter (PP) of an article's keywords can predict whether a paper will be highly cited (top 25% of highly-cited articles) from two to three years after publication onward. Furthermore, given the small number of iterations needed for convergence, the methodology can also be used as a baseline method in Big Data (BD) settings for both Artificial Intelligence (AI) and Machine Learning (ML) contexts.
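For orientation, the kind of conventional baseline the abstract compares against can be sketched as follows. This is a minimal illustration and not the article's transformed functional form or its hierarchical procedure: it fits an ordinary symmetric-logit Bayesian logistic regression with a random-walk Metropolis sampler. The data are simulated, the `popularity` score is only a hypothetical stand-in for a keyword popularity measure, and "highly cited" is taken to mean the top quarter of papers by citation count, which is one reading of the abstract's top-25% definition. All function names, priors, and tuning values are assumptions made for the example.

```python
import math
import random


def label_top_quartile(citations):
    """Return 1 for papers at or above the 75th-percentile citation count, else 0."""
    ranked = sorted(citations)
    cutoff = ranked[int(0.75 * len(ranked))]
    return [1 if c >= cutoff else 0 for c in citations]


def log_posterior(beta, xs, ys, prior_sd=10.0):
    """Bernoulli log-likelihood with a logit link plus independent normal priors."""
    b0, b1 = beta
    total = -(b0 * b0 + b1 * b1) / (2.0 * prior_sd ** 2)
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        # Numerically stable log(1 + exp(eta)).
        if eta > 0:
            log1pexp = eta + math.log1p(math.exp(-eta))
        else:
            log1pexp = math.log1p(math.exp(eta))
        total += y * eta - log1pexp
    return total


def metropolis(xs, ys, n_iter=4000, step=0.3, seed=0):
    """Random-walk Metropolis over (intercept, slope); returns draws and acceptance rate."""
    rng = random.Random(seed)
    beta = (0.0, 0.0)
    current = log_posterior(beta, xs, ys)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = (beta[0] + rng.gauss(0, step), beta[1] + rng.gauss(0, step))
        candidate = log_posterior(proposal, xs, ys)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
            accepted += 1
        draws.append(beta)
    return draws, accepted / n_iter


def credible_interval(values, level=0.95):
    """Equal-tailed interval taken from the sorted posterior draws."""
    ordered = sorted(values)
    lo = ordered[int((1 - level) / 2 * len(ordered))]
    hi = ordered[int((1 + level) / 2 * len(ordered)) - 1]
    return lo, hi


# Simulated data (an assumption for illustration, not the article's MIS sample).
rng = random.Random(42)
popularity = [rng.gauss(0, 1) for _ in range(200)]
citations = [max(0, round(10 + 6 * x + rng.gauss(0, 5))) for x in popularity]
highly_cited = label_top_quartile(citations)

draws, acceptance = metropolis(popularity, highly_cited)
slope = [b[1] for b in draws[1000:]]  # discard the first 1000 draws as burn-in
slope_mean = sum(slope) / len(slope)
slope_lo, slope_hi = credible_interval(slope)
print(f"slope mean {slope_mean:.2f}, 95% interval ({slope_lo:.2f}, {slope_hi:.2f}), "
      f"acceptance {acceptance:.2f}")
```

Labelling only the top quarter as successes makes the sample unbalanced by construction, which is the setting in which the abstract reports gains over a symmetric link of this kind.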
Keywords: Unbalanced data; MCMC; Neural Networks; Artificial Intelligence; Machine Learning; Logistic regression; Categorical data analysis; Bayesian estimation; Model fit; Classification; Inference
Date: 2021
Citations: 3 (in EconPapers)
Downloads:
http://www.sciencedirect.com/science/article/pii/S1751157720306295
Full text for ScienceDirect subscribers only
Persistent link: https://EconPapers.repec.org/RePEc:eee:infome:v:15:y:2021:i:1:s1751157720306295
DOI: 10.1016/j.joi.2020.101112
Journal of Informetrics is currently edited by Leo Egghe
Bibliographic data for series maintained by Catherine Liu.