Conjecturing-Based Discovery of Patterns in Data
J. Paul Brooks,
David J. Edwards,
Craig E. Larson and
Nico Van Cleemput
Additional contact information
J. Paul Brooks: Department of Information Systems, Virginia Commonwealth University, Richmond, Virginia 23284
David J. Edwards: Department of Statistical Sciences and Operations Research, Virginia Commonwealth University, Richmond, Virginia 23284
Craig E. Larson: Department of Mathematics and Applied Mathematics, Virginia Commonwealth University, Richmond, Virginia 23284
Nico Van Cleemput: Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
INFORMS Journal on Data Science, 2024, vol. 3, issue 2, 179-202
Abstract:
We propose the use of a conjecturing machine that suggests feature relationships in the form of bounds involving nonlinear terms for numerical features and Boolean expressions for categorical features. The proposed Conjecturing framework recovers known nonlinear and Boolean relationships among features from data. In both settings, true underlying relationships are revealed. We then compare the method to a previously proposed framework for symbolic regression on the ability to recover equations that are satisfied among features in a data set. The framework is then applied to patient-level data regarding COVID-19 outcomes to suggest possible risk factors that are confirmed in the medical literature. Discovering patterns in data is a first step toward establishing causal relationships, which can be the basis for effective decision making.
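To make the idea of a conjectured bound concrete, the following is a toy sketch, not the authors' implementation. It assumes a small hand-picked set of candidate nonlinear terms (each feature, its square, its square root, and pairwise products) and keeps an upper bound only if it holds on every row and is met with equality on at least one row. The function name `conjecture_upper_bounds`, the candidate terms, and the sample data are all hypothetical and chosen purely for illustration.

```python
import itertools
import math


def conjecture_upper_bounds(rows, target, features, tol=1e-9):
    """Return strings 'target <= expr' for candidate expressions that
    bound the target on every row and are tight on at least one row."""
    # Hypothetical candidate terms; a real system would search a far larger space.
    candidates = {}
    for f in features:
        candidates[f] = lambda r, f=f: r[f]
        candidates[f"{f}^2"] = lambda r, f=f: r[f] ** 2
        candidates[f"sqrt({f})"] = lambda r, f=f: math.sqrt(r[f])
    for f, g in itertools.combinations(features, 2):
        candidates[f"{f}*{g}"] = lambda r, f=f, g=g: r[f] * r[g]

    kept = []
    for name, expr in candidates.items():
        values = [expr(r) for r in rows]
        # The bound must hold on every observation ...
        holds = all(r[target] <= v + tol for r, v in zip(rows, values))
        # ... and be attained on at least one, so it is not trivially loose.
        tight = any(abs(r[target] - v) <= tol for r, v in zip(rows, values))
        if holds and tight:
            kept.append(f"{target} <= {name}")
    return kept


# Made-up data with non-negative features (required by the sqrt term).
rows = [
    {"x": 1.0, "y": 2.0, "z": 2.0},
    {"x": 2.0, "y": 3.0, "z": 5.0},
    {"x": 3.0, "y": 1.0, "z": 3.0},
]
conjectures = conjecture_upper_bounds(rows, "z", ["x", "y"])
print(conjectures)
```

On this made-up data the only surviving candidate is the bound of `z` by the product `x*y`. A bound that survives in this way is only a conjecture about the data, which matches the abstract's framing of pattern discovery as a first step rather than a proof of a relationship.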
Keywords: automated conjecturing; computational scientific discovery; interpretable artificial intelligence; nonlinear pattern discovery; Boolean pattern discovery
Date: 2024
Downloads: http://dx.doi.org/10.1287/ijds.2021.0043 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:orijds:v:3:y:2024:i:2:p:179-202