Testing Goodness-of-Fit with the Kernel Density Estimator: GoFKernel

Pavia, Jose M.

Journal of Statistical Software, 2015, vol. 066, issue c01

Abstract: To assess the goodness-of-fit of a sample to a continuous random distribution, the most popular approach has been to measure, using either the L∞- or the L2-norm, the distance between the null hypothesis cumulative distribution function and the empirical cumulative distribution function. Indeed, as far as I know, almost all the tests currently available in R for this purpose (ks.test in package stats, ad.test in package ADGofTest, and ad.test, ad2.test, ks.test, v.test and w2.test in package truncgof) use one of these two distances on cumulative distribution functions. This paper (i) proposes dgeometric.test, a new implementation of the test that measures, in the L1-norm, the discrepancy between a sample kernel estimate of the density function and the null hypothesis density function, (ii) introduces the GoFKernel package, and (iii) performs a large simulation exercise to assess the calibration and sensitivity of the tests listed above as well as Fan's test (Fan 1994), fan.test, also implemented in the GoFKernel package. In addition to dgeometric.test and fan.test, the GoFKernel package adds two functions that R users might also find of interest: density.reflected extends density, allowing the computation of consistent kernel density estimates for bounded random variables, and random.function offers an ad hoc and universal (although computationally expensive and potentially inaccurate for long-tailed distributions) sampling method.
In light of the simulation results, we can conclude that (i) the tests implemented in the truncgof package should not be used to assess goodness-of-fit (at least for non-truncated distributions), (ii) the test fan.test shows an excessive tendency not to reject the null hypothesis, being visibly miscalibrated (at least in its default option, where the bandwidth parameter is estimated using dpik from package KernSmooth), (iii) the tests ks.test and ad.test show similar power, with ad.test being slightly preferable in large samples, and (iv) dgeometric.test represents a good alternative given its satisfactory calibration and its generally superior power in samples of medium and large sizes. On the other hand, it entails a heavier computational burden when the random generator of the null hypothesis density function is not available in R and random.function must be used.
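
The idea behind an L1-norm test on densities can be illustrated with a minimal sketch. The code below is not the GoFKernel implementation and does not reproduce the interface of dgeometric.test. It is a standalone Python illustration under stated assumptions: a Gaussian kernel, a normal-reference bandwidth, midpoint-rule integration on a fixed interval, and a Monte Carlo p-value obtained by resampling from the null distribution. All function names (gaussian_kde, l1_distance, l1_gof_test) are hypothetical and were chosen for this example only.

```python
import math
import random


def normal_reference_bandwidth(sample):
    """Normal-reference bandwidth 1.06 * sd * n^(-1/5) (an assumed choice)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


def l1_distance(sample, null_pdf, lower, upper, grid_size=200):
    """Approximate the integral of |kde(x) - null_pdf(x)| on [lower, upper]
    with the midpoint rule."""
    kde = gaussian_kde(sample, normal_reference_bandwidth(sample))
    step = (upper - lower) / grid_size
    total = 0.0
    for i in range(grid_size):
        x = lower + (i + 0.5) * step
        total += abs(kde(x) - null_pdf(x))
    return total * step


def l1_gof_test(sample, null_pdf, null_sampler, lower, upper, n_sim=99, rng=None):
    """Return (statistic, Monte Carlo p-value).

    null_sampler(rng) must draw one value from the null distribution; the
    p-value is the share of simulated null statistics at least as large as
    the observed one, with the usual +1 correction.
    """
    rng = rng or random.Random(0)
    observed = l1_distance(sample, null_pdf, lower, upper)
    exceed = 0
    for _ in range(n_sim):
        simulated = [null_sampler(rng) for _ in range(len(sample))]
        if l1_distance(simulated, null_pdf, lower, upper) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# Usage: a sample drawn from N(1.5, 1) tested against a standard normal null.
data_rng = random.Random(1)
shifted_sample = [data_rng.gauss(1.5, 1.0) for _ in range(40)]
stat, p_value = l1_gof_test(
    shifted_sample,
    std_normal_pdf,
    lambda r: r.gauss(0.0, 1.0),
    lower=-8.0,
    upper=8.0,
    n_sim=49,
)
print(stat, p_value)
```

Calibrating by simulation needs a way to draw from the null distribution, which is where the computational burden noted in the abstract arises when no ready-made generator exists. In this sketch the sampler is simply passed in as an argument.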

Date: 2015-08-26

Downloads: (external link)
https://www.jstatsoft.org/index.php/jss/article/view/v066c01/v66c01.pdf
https://www.jstatsoft.org/index.php/jss/article/do ... FKernel_2.0-6.tar.gz
https://www.jstatsoft.org/index.php/jss/article/do ... ile/v066c01/v66c01.R

Persistent link: https://EconPapers.repec.org/RePEc:jss:jstsof:v:066:c01

DOI: 10.18637/jss.v066.c01

Journal of Statistical Software is currently edited by Bettina Grün, Edzer Pebesma and Achim Zeileis
