A Comparison Study of Goodness of Fit Tests of Logistic Regression in R: Simulation and Application to Breast Cancer Data
El-Housainy A. Rady,
Mohamed Abonazel and
Mariam H. Metawe’e
Additional contact information
El-Housainy A. Rady: Department of Applied Statistics and Econometrics, Faculty of Graduate Studies for Statistical Research, Cairo University, Egypt
Mariam H. Metawe’e: Department of Applied Statistics and Econometrics, Faculty of Graduate Studies for Statistical Research, Cairo University, Egypt
Academic Journal of Applied Mathematical Sciences, 2021, vol. 7, issue 1, 50-59
Goodness-of-fit (GOF) tests for logistic regression assess how well the model suits the data. The null hypothesis of every GOF test is that the model fits. R, as free software, offers many GOF tests across different packages. A Monte Carlo simulation was conducted to study two situations: first, the ability of each test, under its default settings, to accept the null hypothesis when the model truly fits; second, the power of these tests when the assumption of a sufficient linear combination of the explanatory variables is violated (by omitting a linear covariate term, a quadratic term, or an interaction term). We also checked whether the same test gives the same results in different R packages. Because sample size is expected to affect the simulation results, the pattern of change in GOF test results across different sample sizes and different model settings was estimated. All tests accepted the null hypothesis (in more than 95% of simulation trials) when the model truly fitted, with two exceptions: the modified Hosmer-Lemeshow test in the "LogisticDx" package under all model settings, and Osius and Rojek’s (OsRo) test when the true model had an interaction term between binary and categorical covariates. In addition, the le Cessie-van Houwelingen-Copas-Hosmer unweighted sum of squares (CHCH) test gave unexpectedly different results in different packages. In the power study, all tests had very low power when the departure was a missing covariate. Generally, Stukel’s test (package "LogisticDx") and the CHCH test (package "rms") reached a power greater than 80% in detecting a missing quadratic term at lower sample sizes, while the OsRo test (package "LogisticDx") was better at detecting a missing interaction term. Besides the simulation study, we evaluated the performance of the GOF tests using the breast cancer dataset.
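The two-part simulation design described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the study used R packages, whereas this is a from-scratch Python illustration using only the standard library. It assumes a single uniform covariate, the classic Hosmer-Lemeshow test with 10 equal-size groups (not the modified version from "LogisticDx"), and hypothetical coefficients, sample size, and trial count. The function names (`fit_logistic`, `hosmer_lemeshow`, `rejection_rate`) are invented for this example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, iters=15):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson (one covariate)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    assert df % 2 == 0 and df > 0
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def hosmer_lemeshow(y, p, groups=10):
    """Classic Hosmer-Lemeshow statistic and p-value (df = groups - 2)."""
    pairs = sorted(zip(p, y))       # order observations by fitted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)
        expected = sum(pi for pi, _ in chunk)
        var = expected * (1.0 - expected / n_g)
        if var > 0:                 # skip degenerate groups (guard for this sketch)
            stat += (observed - expected) ** 2 / var
    return stat, chi2_sf_even(stat, groups - 2)


def rejection_rate(n, quad_coef, trials, seed, alpha=0.05):
    """Share of trials in which HL rejects a fitted model that is linear in x.

    quad_coef = 0 means the fitted model is correct (rejection rate = size);
    quad_coef != 0 means a quadratic term is omitted (rejection rate = power).
    The intercept -1 and slope 1 are hypothetical values.
    """
    rng = random.Random(seed)
    rejects = 0
    for _ in range(trials):
        x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
        y = [1 if rng.random() < sigmoid(-1.0 + xi + quad_coef * xi * xi) else 0
             for xi in x]
        b0, b1 = fit_logistic(x, y)
        fitted = [sigmoid(b0 + b1 * xi) for xi in x]
        _, p_value = hosmer_lemeshow(y, fitted)
        rejects += p_value < alpha
    return rejects / trials


if __name__ == "__main__":
    print("rejection rate, correct model:", rejection_rate(300, 0.0, 100, seed=1))
    print("rejection rate, missing x^2 :", rejection_rate(300, 1.0, 100, seed=1))
```

Under the correct model the rejection rate estimates the test's size, which should sit near the nominal 5% if the test behaves as intended; with the quadratic term omitted it estimates power. Repeating the calls over a grid of `n` values would trace the sample-size pattern the abstract refers to.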
Keywords: Binary logistic regression model; Hosmer-Lemeshow test; Misspecification; Power of goodness of fit tests; Pseudo R squared; R packages.
Persistent link: https://EconPapers.repec.org/RePEc:arp:ajoams:2021:p:50-59
Academic Journal of Applied Mathematical Sciences is currently edited by Dr. Diana Bílková
More articles in Academic Journal of Applied Mathematical Sciences from Academic Research Publishing Group Rahim Yar Khan 64200, Punjab, Pakistan.
Bibliographic data for series maintained by Managing Editor.