Compactness score: a fast filter method for unsupervised feature selection
Peican Zhu,
Xin Hou,
Keke Tang,
Zhen Wang and
Feiping Nie
Affiliations:
Peican Zhu: Northwestern Polytechnical University (NWPU)
Xin Hou: Northwestern Polytechnical University (NWPU)
Keke Tang: Cyberspace Institute of Advanced Technology, Guangzhou University
Zhen Wang: Northwestern Polytechnical University (NWPU)
Feiping Nie: Northwestern Polytechnical University (NWPU)
Annals of Operations Research, 2025, vol. 348, issue 1, No 13, 299-315
Abstract:
The rapid development of the big data era generates huge amounts of data every day in various fields. Because these data are large-scale and high-dimensional, it is often difficult to make good decisions from them in practical applications, so efficient big data analytical methods are urgently needed. In feature engineering, feature selection is an important research topic that aims to select “excellent” features from the candidate ones. Feature selection can not only reduce dimensionality but also improve the computational efficiency and performance of the model. In many classification tasks, researchers have observed that samples from the same class usually lie close to each other; thus, local compactness is of great importance when evaluating a feature. Based on this observation, we propose a fast unsupervised feature selection algorithm, named Compactness Score (CSUFS), to select desired features. To demonstrate the superiority of the proposed algorithm, extensive experiments are performed on several public data sets. In these experiments, feature subsets selected by several different algorithms are applied to a clustering task. Clustering performance is measured by two well-known evaluation metrics, while efficiency is reflected by the corresponding running time. The results show that the proposed algorithm is more accurate and efficient than existing ones.
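To make the local-compactness idea concrete, here is a minimal Python sketch of one way a per-feature, k-nearest-neighbor compactness score could be computed. It is a hypothetical illustration under stated assumptions, not the CSUFS formula, which the abstract does not give. Each feature is z-scored (an assumption, so that low-variance features are not trivially favored), its score is the mean distance from each sample to its k nearest neighbors along that single feature, and features with smaller scores are ranked first. The function names knn_compactness and rank_features are illustrative only.

```python
import math
from typing import List, Sequence


def _standardize(column: Sequence[float]) -> List[float]:
    """Z-score one feature; returns [] for a constant feature (assumed normalization)."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    if var == 0.0:
        return []
    std = math.sqrt(var)
    return [(x - mean) / std for x in column]


def knn_compactness(column: Sequence[float], k: int) -> float:
    """Hypothetical score: mean distance from each sample to its k nearest
    neighbours along one standardized feature.

    Smaller values mean samples sit in tighter local groups. Constant features
    get +inf so they are ranked last.
    """
    z = _standardize(column)
    if not z:
        return math.inf
    vals = sorted(z)
    n = len(vals)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")
    total = 0.0
    for i, v in enumerate(vals):
        # In 1-D, the k nearest neighbours of vals[i] lie within k positions
        # on either side of i in sorted order, so only that window is scanned.
        lo, hi = max(0, i - k), min(n, i + k + 1)
        dists = sorted(abs(vals[j] - v) for j in range(lo, hi) if j != i)
        total += sum(dists[:k])
    return total / (n * k)


def rank_features(data: Sequence[Sequence[float]], k: int = 3) -> List[int]:
    """Return feature indices ordered from most to least locally compact."""
    n_features = len(data[0])
    scores = [knn_compactness([row[f] for row in data], k) for f in range(n_features)]
    return sorted(range(n_features), key=lambda f: scores[f])


if __name__ == "__main__":
    # Toy data: feature 0 forms two tight groups, feature 1 is evenly spread,
    # feature 2 is constant.
    data = [
        [0.0, 0.0, 5.0],
        [0.1, 1.0, 5.0],
        [0.2, 2.0, 5.0],
        [10.0, 3.0, 5.0],
        [10.1, 4.0, 5.0],
        [10.2, 5.0, 5.0],
    ]
    print(rank_features(data, k=2))  # expected: [0, 1, 2]
```

Searching neighbors along one sorted feature at a time keeps the cost close to one sort per feature when k is small, which fits the spirit of a fast filter method; whether CSUFS itself scores features this way is not stated in the abstract.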
Keywords: Big data analytics; Unsupervised feature selection; Dimensionality reduction; k nearest neighbor distances
Date: 2025
Downloads: http://link.springer.com/10.1007/s10479-023-05271-z (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:annopr:v:348:y:2025:i:1:d:10.1007_s10479-023-05271-z
Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10479
DOI: 10.1007/s10479-023-05271-z
Annals of Operations Research is currently edited by Endre Boros
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.