Semiparametric regression in Stata
Vincenzo Verardi
United Kingdom Stata Users' Group Meetings 2014 from Stata Users Group
Abstract:
The boxplot is probably the most commonly used tool to represent the distribution of the data and identify atypical observations in a univariate dataset. The problem with the standard boxplot is that as soon as asymmetry or tail heaviness appears, the percentage of values identified as atypical becomes excessive. To cope with this issue, Hubert and Vandervieren (2008) proposed an adjusted boxplot for skewed data. Their idea is to move the whiskers of the boxplot according to the degree of asymmetry of the data. The rule to set the whiskers of the adjusted boxplot was found by running a large number of simulations using a wide range of (moderately) skewed distributions. The idea was to find a rule that guaranteed that 0.7% of the observations would lie outside the interval delimited by the whiskers. Even if their rule works satisfactorily for most commonly used distributions, it suffers from some limitations: (i) the adjusted boxplot is not appropriate for severely skewed distributions or for distributions with heavy tails; (ii) it is tied to a theoretical rejection rate of 0.7%; (iii) it is extremely sensitive to the estimated value of the asymmetry parameter; and (iv) its computational complexity, O(n log n), is substantial. To tackle these drawbacks, we propose a much simpler method to find the whiskers of the boxplot for (possibly) skewed and heavy-tailed data. We apply a simple rank-preserving transformation to the original data so that the transformed data can be fitted by a so-called Tukey g-and-h distribution. Using the quantiles of this distribution, we can easily recover the whiskers of the boxplot for the original data. The computational complexity of the proposed method is O(n), the same as the standard boxplot.
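As an illustration of the quantities the abstract refers to, the following Python sketch shows (a) why the standard 1.5 x IQR whisker rule corresponds to a rejection rate of roughly 0.7% for normal data, and (b) how whiskers for a chosen rejection rate could be read off the quantile function of a Tukey g-and-h distribution. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the parameters g and h are assumed to be already estimated, the rejection rate is assumed to be split equally between the two tails, and the rank-preserving transformation mentioned in the abstract is not reproduced because the abstract does not specify it.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def normal_outside_rate(k=1.5):
    """Share of a normal population outside the standard whiskers
    Q1 - k*IQR and Q3 + k*IQR (symmetric, so twice the upper tail)."""
    q3 = _STD_NORMAL.inv_cdf(0.75)
    iqr = 2.0 * q3  # Q1 = -Q3 for the standard normal
    upper_whisker = q3 + k * iqr
    return 2.0 * (1.0 - _STD_NORMAL.cdf(upper_whisker))


def tukey_gh_quantile(p, g=0.0, h=0.0, location=0.0, scale=1.0):
    """Quantile function of the Tukey g-and-h distribution.
    g controls asymmetry, h >= 0 controls tail heaviness;
    g = h = 0 gives the normal distribution."""
    z = _STD_NORMAL.inv_cdf(p)
    skew_part = z if g == 0 else (math.exp(g * z) - 1.0) / g
    return location + scale * skew_part * math.exp(h * z * z / 2.0)


def gh_whiskers(g, h, rejection_rate=0.007, location=0.0, scale=1.0):
    """Whiskers leaving rejection_rate / 2 in each tail.
    ASSUMPTION: equal split between tails; g and h given, not estimated here."""
    lower = tukey_gh_quantile(rejection_rate / 2.0, g, h, location, scale)
    upper = tukey_gh_quantile(1.0 - rejection_rate / 2.0, g, h, location, scale)
    return lower, upper


if __name__ == "__main__":
    print("normal outside rate:", normal_outside_rate())
    print("symmetric whiskers:", gh_whiskers(g=0.0, h=0.0))
    print("right-skewed, heavy-tailed whiskers:", gh_whiskers(g=0.5, h=0.1))
```

With g = h = 0 the whiskers are symmetric around the location, and a positive g pushes the upper whisker further out than the lower one, which is the behaviour a skewness-aware boxplot needs. Because the rejection rate is an argument rather than a constant, the sketch is not tied to 0.7%.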
Date: 2014-09-28
Downloads: (external link)
http://repec.org/usug2014/verardi_uksug14.pdf (application/pdf)
Related works:
Working Paper: Semiparametric regression in Stata (2013) 
Persistent link: https://EconPapers.repec.org/RePEc:boc:usug14:09
Bibliographic data for series maintained by Christopher F Baum.