EconPapers    
Economics at your fingertips  
 

Ordered quantile normalization: a semiparametric transformation built for the cross-validation era

Ryan A. Peterson and Joseph E. Cavanaugh

Journal of Applied Statistics, 2020, vol. 47, issue 13-15, 2312-2327

Abstract: Normalization transformations have recently experienced a resurgence in popularity in the era of machine learning, particularly in data preprocessing. However, the classical methods that can be adapted to cross-validation are not always effective. We introduce Ordered Quantile (ORQ) normalization, a one-to-one transformation that is designed to consistently and effectively transform a vector of arbitrary distribution into a vector that follows a normal (Gaussian) distribution. In the absence of ties, ORQ normalization is guaranteed to produce normally distributed transformed data. Once trained, an ORQ transformation can be readily and effectively applied to new data. We compare the effectiveness of the ORQ technique with other popular normalization methods in a simulation study where the true data generating distributions are known. We find that ORQ normalization is the only method that works consistently and effectively, regardless of the underlying distribution. We also explore the use of repeated cross-validation to identify the best normalizing transformation when the true underlying distribution is unknown. We apply our technique and other normalization methods via the bestNormalize R package on a car pricing data set. We built bestNormalize to evaluate the normalization efficacy of many candidate transformations; the package is freely available via the Comprehensive R Archive Network.

Date: 2020
References: Add references at CitEc
Citations: View citations in EconPapers (8)

Downloads: (external link)
http://hdl.handle.net/10.1080/02664763.2019.1630372 (text/html)
Access to full text is restricted to subscribers.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:taf:japsta:v:47:y:2020:i:13-15:p:2312-2327

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/CJAS20

DOI: 10.1080/02664763.2019.1630372

Access Statistics for this article

Journal of Applied Statistics is currently edited by Robert Aykroyd

More articles in Journal of Applied Statistics from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst ().

 
Page updated 2025-03-20
Handle: RePEc:taf:japsta:v:47:y:2020:i:13-15:p:2312-2327