Algorithms for Generalized Clusterwise Linear Regression

Park, Young Woong; Jiang, Yan; Klabjan, Diego; Williams, Loren

Young Woong Park, Yan Jiang, Diego Klabjan and Loren Williams
Additional contact information
Young Woong Park: Cox School of Business, Southern Methodist University, Dallas, Texas 75275
Yan Jiang: Sears Holdings Corporation, Hoffman Estates, Illinois 60179
Diego Klabjan: Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208
Loren Williams: Ernst & Young LLP, Atlanta, Georgia 30308

INFORMS Journal on Computing, 2017, vol. 29, issue 2, 301-317

Abstract: Clusterwise linear regression (CLR), a clustering problem intertwined with regression, finds clusters of entities such that the overall sum of squared errors from regressions performed over these clusters is minimized, where each cluster may have a different variance. We generalize the CLR problem by allowing each entity to have more than one observation and refer to this as generalized CLR. To solve generalized CLR, we propose an exact mathematical programming-based approach relying on column generation, a column generation–based heuristic algorithm that clusters predefined groups of entities, a metaheuristic genetic algorithm with an adapted Lloyd’s algorithm for K-means clustering, a two-stage approach, and a modified algorithm of Späth [Späth (1979) Algorithm 39 clusterwise linear regression. Comput. 22(4):367–373]. We examine the performance of our algorithms on a stock-keeping unit (SKU) clustering problem employed in forecasting halo and cannibalization effects in promotions, using real-world retail data from a large supermarket chain. In the SKU clustering problem, the retailer needs to cluster SKUs based on their seasonal effects in response to promotions. The seasonal effects result from regressions, performed over the generated clusters, whose predictors are promotion mechanisms and seasonal dummies. We compare the performance of all proposed algorithms for the SKU problem with real-world and synthetic data.
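
To make the generalized CLR objective concrete, the following is a minimal sketch of a generic alternating (Lloyd-style) heuristic. It is not one of the paper's algorithms, and all function and variable names are hypothetical. It assumes a single predictor per observation, ordinary least squares within each cluster, and that all observations of an entity must move together to the same cluster. Like any such heuristic, it may stop at a local optimum.

```python
import random


def fit_line(points):
    """Least-squares fit of y = a + b*x on (x, y) pairs; returns (a, b)."""
    n = len(points)
    if n == 0:
        return 0.0, 0.0  # empty cluster: placeholder coefficients (assumption)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        return mean_y, 0.0  # all x equal: fall back to a constant fit
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def sse(points, coef):
    """Sum of squared errors of the line coef = (a, b) on the given points."""
    a, b = coef
    return sum((y - a - b * x) ** 2 for x, y in points)


def fit_clusters(entities, assign, k):
    """Fit one regression per cluster on the pooled observations of its entities."""
    coefs = []
    for c in range(k):
        pooled = [p for e, obs in entities.items() if assign[e] == c for p in obs]
        coefs.append(fit_line(pooled))
    return coefs


def generalized_clr(entities, k, max_iters=50, seed=0):
    """entities maps an entity id to its list of (x, y) observations.

    Returns (assignment, per-cluster coefficients, total SSE).
    """
    rng = random.Random(seed)
    assign = {e: rng.randrange(k) for e in entities}  # random initial partition
    coefs = fit_clusters(entities, assign, k)
    for _ in range(max_iters):
        # Assignment step: each entity, with all of its observations, moves to
        # the cluster whose current regression gives it the smallest SSE.
        new_assign = {
            e: min(range(k), key=lambda c: sse(obs, coefs[c]))
            for e, obs in entities.items()
        }
        if new_assign == assign:
            break  # no entity changed cluster
        assign = new_assign
        coefs = fit_clusters(entities, assign, k)  # regression step
    total = sum(sse(obs, coefs[assign[e]]) for e, obs in entities.items())
    return assign, coefs, total
```

Calling `generalized_clr(entities, k=2)` on a dictionary such as `{"sku1": [(0.0, 1.0), (1.0, 3.0)], ...}` returns a cluster index per entity together with the fitted coefficients. Because neither step can increase the objective, the resulting total SSE is never worse than that of a single regression over all observations.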

Keywords: regression; optimization; data mining; clustering
Date: 2017
Downloads:
https://doi.org/10.1287/ijoc.2016.0729

Persistent link: https://EconPapers.repec.org/RePEc:inm:orijoc:v:29:y:2017:i:2:p:301-317