An Optimization-Based Order-and-Cut Approach for Fair Clustering of Data Sets

Li, Su; Aprahamian, Hrayer; Nouiehed, Maher; El-Amine, Hadi

An Optimization-Based Order-and-Cut Approach for Fair Clustering of Data Sets

Su Li (), Hrayer Aprahamian (), Maher Nouiehed () and Hadi El-Amine ()
Additional contact information
Su Li: Department of Industrial and Systems Engineering, Texas A&M University, College Station, Texas 77843
Hrayer Aprahamian: Department of Industrial and Systems Engineering, Texas A&M University, College Station, Texas 77843
Maher Nouiehed: Department of Industrial Engineering and Management, American University of Beirut, Beirut 1107 2020, Lebanon
Hadi El-Amine: Department of Systems Engineering and Operations Research, George Mason University, Fairfax, Virginia 22030

INFORMS Joural on Data Science, 2024, vol. 3, issue 2, 124-144

Abstract: Machine learning algorithms have been increasingly integrated into applications that significantly affect human lives. This surged an interest in designing algorithms that train machine learning models to minimize training error and imposing a certain level of fairness. In this paper, we consider the problem of fair clustering of data sets. In particular, given a set of items each associated with a vector of nonsensitive attribute values and a categorical sensitive attribute (e.g., gender, race, etc.), our goal is to find a clustering of the items that minimizes the loss (i.e., clustering objective) function and imposes fairness measured by Rényi correlation. We propose an efficient and scalable in-processing algorithm, driven by findings from the field of combinatorial optimization, that heuristically solves the underlying optimization problem and allows for regulating the trade-off between clustering quality and fairness. The approach does not restrict the analysis to a specific loss function, but instead considers a more general form that satisfies certain desirable properties. This broadens the scope of the algorithm’s applicability. We demonstrate the effectiveness of the algorithm for the specific case of k -means clustering as it is one of the most extensively studied and widely adopted clustering schemes. Our numerical experiments reveal the proposed algorithm significantly outperforms existing methods by providing a more effective mechanism to regulate the trade-off between loss and fairness.

Keywords: clustering; fairness; R’enyi correlation; optimal ordering; combinatorial optimization; k -means (search for similar items in EconPapers)
Date: 2024
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://dx.doi.org/10.1287/ijds.2022.0005 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:inm:orijds:v:3:y:2024:i:2:p:124-144

Access Statistics for this article

More articles in INFORMS Joural on Data Science from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().