A supervised weeding method to cluster high dimensional predictors with application to job market analysis

Yuyang Li, Jianxin Bi, Jingyuan Liu and Ying Yang

Journal of Applied Statistics, 2024, vol. 51, issue 16, 3350-3365

Abstract: The clustering of high-dimensional predictors is drawing increasing attention in various scientific areas, such as text mining and biological data analysis. Standard clustering procedures group predictors only by the inherent patterns within the predictor set, so the resulting clusters lack the capacity to predict the response variable. To address this limitation, a new supervised weeding algorithm is proposed to meet the dual requirement of detecting sparse clusters and capturing the prediction effects. The proposed algorithm is based on an iterative feature screening and coherence evaluation procedure. It iteratively weeds out the unimportant predictors in a backward fashion, forming sequences of nested sets to determine data-driven optimal cut-offs. Monte Carlo simulation is used to assess the finite-sample performance of the proposed method. The findings demonstrate that both the clustering and prediction performance of the proposed method are comparable to those of existing methods that concentrate solely on one aspect of the dual targets. A job description dataset is analyzed to explore significant groups of keywords that affect employees' salaries.

Date: 2024

Downloads: (external link)
http://hdl.handle.net/10.1080/02664763.2024.2348634 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:taf:japsta:v:51:y:2024:i:16:p:3350-3365

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/CJAS20

DOI: 10.1080/02664763.2024.2348634

Journal of Applied Statistics is currently edited by Robert Aykroyd

More articles in Journal of Applied Statistics from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst.

 
Page updated 2025-03-20
Handle: RePEc:taf:japsta:v:51:y:2024:i:16:p:3350-3365