Clustering and Representative Selection for High-Dimensional Data with Human-in-the-Loop

Sheng-Tao Yang, Jye-Chyi Lu and Yu-Chung Tsao
Additional contact information
Sheng-Tao Yang: Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30339
Jye-Chyi Lu: Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30339
Yu-Chung Tsao: Department of Industrial Management, National Taiwan University of Science and Technology, Taipei City 106, Taiwan

INFORMS Journal on Data Science, 2025, vol. 4, issue 2, 154-172

Abstract: This article proposes a novel decision-making procedure called human-in-the-loop clustering and representative selection (HITL-CARS) that incorporates users’ domain knowledge into the analysis of high-dimensional data sets. The proposed method simultaneously clusters strongly correlated variables and estimates a linear regression model with only a few variables selected from cluster representatives and independent variables. In this work, we model the CARS procedure as a mixed-integer programming problem based on penalized likelihood and partition around medoids clustering. After users obtain analysis results from CARS and provide advice based on their domain knowledge, HITL-CARS refines the analysis to account for their input. Simulation studies show that the one-stage CARS performs better than the two-stage group Lasso and clustering representative Lasso on metrics including true positives, false positives, and exchangeable representative selection. Additionally, sensitivity and parameter misspecification studies demonstrate the robustness of CARS to different preset parameters and provide guidance on how to start and adjust the HITL-CARS procedure. A real-life example of brain mapping data shows that HITL-CARS could aid in discovering important brain regions associated with depression symptoms and provide predictive analytics on cluster representatives.

Keywords: interactive machine learning; Lasso; mixed-integer programming; partition around medoids; large p small n variable selection
Date: 2025

Downloads: (external link)
http://dx.doi.org/10.1287/ijds.2022.9014 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:orijds:v:4:y:2025:i:2:p:154-172

More articles in INFORMS Journal on Data Science from INFORMS
Bibliographic data for series maintained by Chris Asher.

 
Page updated 2025-06-11
Handle: RePEc:inm:orijds:v:4:y:2025:i:2:p:154-172