A framework of irregularity enlightenment for data pre-processing in data mining
Siu-Tong Au (),
Rong Duan,
Siamak Hesar and
Wei Jiang ()
Annals of Operations Research, 2010, vol. 174, issue 1, 47-66
Abstract:
Irregularities are widespread in large databases and often lead to erroneous conclusions with respect to data mining and statistical analysis. For example, considerable bias is often resulted from many parameter estimation procedures without properly handling significant irregularities. Most data cleaning tools assume one known type of irregularity. This paper proposes a generic Irregularity Enlightenment (IE) framework for dealing with the situation when multiple irregularities are hidden in large volumes of data in general and cross sectional time series in particular. It develops an automatic data mining platform to capture key irregularities and classify them based on their importance in a database. By decomposing time series data into basic components, we propose to optimize a penalized least square loss function to aid the selection of key irregularities in consecutive steps and cluster time series into different groups until an acceptable level of variation reduction is achieved. Finally visualization tools are developed to help analysts interpret and understand the nature of data better and faster before further data modeling and analysis. Copyright Springer Science+Business Media, LLC 2010
Keywords: Activity monitoring; Change point; Feature selection; LASSO; Outliers; Regression models (search for similar items in EconPapers)
Date: 2010
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (12)
Downloads: (external link)
http://hdl.handle.net/10.1007/s10479-008-0494-z (text/html)
Access to full text is restricted to subscribers.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:annopr:v:174:y:2010:i:1:p:47-66:10.1007/s10479-008-0494-z
Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10479
DOI: 10.1007/s10479-008-0494-z
Access Statistics for this article
Annals of Operations Research is currently edited by Endre Boros
More articles in Annals of Operations Research from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().