Detecting and classifying outliers in big functional data

Oluwasegun Taiwo Ojo, Antonio Fernández Anta, Rosa E. Lillo and Carlo Sguera
Additional contact information
Oluwasegun Taiwo Ojo: IMDEA Networks Institute
Antonio Fernández Anta: IMDEA Networks Institute
Rosa E. Lillo: Universidad Carlos III de Madrid
Carlo Sguera: Universidad Carlos III de Madrid

Advances in Data Analysis and Classification, 2022, vol. 16, issue 3, No 9, 725-760

Abstract: We propose two new outlier detection methods for identifying and classifying different types of outliers in (big) functional data sets. The proposed methods are based on an existing method called Massive Unsupervised Outlier Detection (MUOD). MUOD detects and classifies outliers by computing, for each curve, three indices, all based on the concept of linear regression and correlation, which measure outlyingness in terms of shape, magnitude and amplitude relative to the other curves in the data. ‘Semifast-MUOD’, the first method, uses a sample of the observations in computing the indices, while ‘Fast-MUOD’, the second method, uses the point-wise or $$L_1$$ median in computing the indices. The classical boxplot is used to separate the indices of the outliers from those of the typical observations. Performance evaluation of the proposed methods using simulated data shows significant improvements compared to MUOD, both in outlier detection and computational time. We show that Fast-MUOD is especially well suited to handling big and dense functional datasets, with very small computational time compared to other methods. Further comparisons with some recent outlier detection methods for functional data also show superior or comparable outlier detection accuracy of the proposed methods. We apply the proposed methods to weather, population growth, and video data.
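
Illustrative sketch: the abstract describes Fast-MUOD only at a high level (three regression- and correlation-based indices computed against a median curve, then a boxplot rule). The Python below is a minimal sketch of that idea, not the authors' implementation. The formulas are assumptions not stated in the abstract: the shape index is taken as |correlation - 1|, the amplitude index as |slope - 1|, and the magnitude index as |intercept| from regressing each curve on the point-wise median, with the usual Q3 + 1.5 IQR upper fence as the boxplot cutoff. The function names `fast_muod_sketch` and `boxplot_flags` are hypothetical.

```python
import numpy as np


def fast_muod_sketch(curves):
    """Assumed index definitions; curves has shape (n_curves, n_points)."""
    curves = np.asarray(curves, dtype=float)
    # Reference curve: point-wise median across all observations.
    ref = np.median(curves, axis=0)
    ref_centered = ref - ref.mean()
    cur_centered = curves - curves.mean(axis=1, keepdims=True)

    # Covariance of each curve with the reference, and their variances.
    cov = cur_centered @ ref_centered / ref.size
    var_ref = ref_centered @ ref_centered / ref.size
    std_cur = np.sqrt((cur_centered ** 2).mean(axis=1))

    # Correlation, slope and intercept of each curve regressed on the reference.
    # Note: a constant curve or constant reference would divide by zero here.
    corr = cov / (std_cur * np.sqrt(var_ref))
    slope = cov / var_ref
    intercept = curves.mean(axis=1) - slope * ref.mean()

    return {
        "shape": np.abs(corr - 1.0),       # assumed shape index
        "amplitude": np.abs(slope - 1.0),  # assumed amplitude index
        "magnitude": np.abs(intercept),    # assumed magnitude index
    }


def boxplot_flags(index):
    """Flag values above the classical boxplot upper fence (Q3 + 1.5 * IQR)."""
    q1, q3 = np.percentile(index, [25, 75])
    return index > q3 + 1.5 * (q3 - q1)
```

Under these assumptions, a curve is classified by whichever index (shape, amplitude or magnitude) exceeds its fence, so a single curve can be flagged as more than one type of outlier.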

Keywords: Outlier detection; Functional data analysis; MUOD; Semifast-MUOD; Fast-MUOD; 62R10 Functional data analysis; 62R07 Statistical aspects of big data and data science; 62-08 Computational methods for problems pertaining to statistics; 62P99 Application of Statistics
Date: 2022
Citations: View citations in EconPapers (1)

Downloads: (external link)
http://link.springer.com/10.1007/s11634-021-00460-9 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:advdac:v:16:y:2022:i:3:d:10.1007_s11634-021-00460-9

Ordering information: This journal article can be ordered from
http://www.springer. ... ds/journal/11634/PS2

DOI: 10.1007/s11634-021-00460-9

Advances in Data Analysis and Classification is currently edited by H.-H. Bock, W. Gaul, A. Okada, M. Vichi and C. Weihs

More articles in Advances in Data Analysis and Classification from Springer, German Classification Society - Gesellschaft für Klassifikation (GfKl), Japanese Classification Society (JCS), Classification and Data Analysis Group of the Italian Statistical Society (CLADAG), International Federation of Classification Societies (IFCS)
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:advdac:v:16:y:2022:i:3:d:10.1007_s11634-021-00460-9