Data Exploration by Representative Region Selection: Axioms and Convergence
Alexander S. Estes,
Michael O. Ball and
David J. Lovell
Additional contact information
Alexander S. Estes: Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, Minnesota 55455
Michael O. Ball: Robert H. Smith School of Business and Institute of Systems Research, University of Maryland, College Park, Maryland 20742
David J. Lovell: Department of Civil and Environmental Engineering and Institute of Systems Research, University of Maryland, College Park, Maryland 20742
Mathematics of Operations Research, 2021, vol. 46, issue 3, 970-1007
Abstract:
We present a new type of unsupervised learning problem in which we find a small set of representative regions that approximates a larger data set. These regions may be presented to a practitioner along with additional information in order to help the practitioner explore the data set. An advantage of this approach is that it does not rely on cluster structure of the data. We formally define this problem, and we present axioms that should be satisfied by functions that measure the quality of representatives. We provide a quality function that satisfies all of these axioms. Using this quality function, we formulate two optimization problems for finding representatives. We provide convergence results for a general class of methods, and we show that these results apply to several specific methods, including methods derived from the solution of the optimization problems formulated in this paper. We provide an example of how representative regions may be used to explore a data set.
Keywords: Primary: 62G05; secondary: 62G07; 62G20; Primary: Statistics/data analysis; secondary: statistics/nonparametric; representative region selection; unsupervised learning; density estimation; data analysis; data summarization
Date: 2021
Downloads: http://dx.doi.org/10.1287/moor.2020.1115 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:46:y:2021:i:3:p:970-1007