Economics at your fingertips  

A clustering-based feature selection method for automatically generated relational attributes

Mostafa Rezaei (), Ivor Cribben () and Michele Samorani ()
Additional contact information
Mostafa Rezaei: University of Alberta
Ivor Cribben: University of Alberta
Michele Samorani: Santa Clara University

Annals of Operations Research, 2021, vol. 303, issue 1, No 12, 233-263

Abstract: Abstract Although data mining problems require a flat mining table as input, in many real-world applications analysts are interested in finding patterns in a relational database. To this end, new methods and software have been recently developed that automatically add attributes (or features) to a target table of a relational database which summarize information from all other tables. When attributes are automatically constructed by these methods, selecting the important attributes is particularly difficult, because a large number of the attributes are highly correlated. In this setting, attribute selection techniques such as the Least Absolute Shrinkage and Selection Operator (lasso), elastic net, and other machine learning methods tend to under-perform. In this paper, we introduce a novel attribute selection procedure, where after an initial screening step, we cluster the attributes into different groups and apply the group lasso to select both the true attributes groups and then the true attributes. The procedure is particularly suited to high dimensional data sets where the attributes are highly correlated. We test our procedure on several simulated data sets and a real-world data set from a marketing database. The results show that our proposed procedure obtains a higher predictive performance while selecting a much smaller set of attributes when compared to other state-of-the-art methods.

Keywords: Relational attribute generation; Feature selection; Lasso; Elastic net; Clustering (search for similar items in EconPapers)
Date: 2021
References: View references in EconPapers View complete reference list from CitEc
Citations: Track citations by RSS feed

Downloads: (external link) Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link:

Ordering information: This journal article can be ordered from

DOI: 10.1007/s10479-018-2830-2

Access Statistics for this article

Annals of Operations Research is currently edited by Endre Boros

More articles in Annals of Operations Research from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

Page updated 2022-05-12
Handle: RePEc:spr:annopr:v:303:y:2021:i:1:d:10.1007_s10479-018-2830-2