Simple Pattern Minimality Problems: Integer Linear Programming Formulations and Covering-Based Heuristic Solving Approaches
Maurizio Boccia (),
Antonio Sforza () and
Claudio Sterle ()
Additional contact information
Maurizio Boccia: Department of Electrical Engineering and Information Technology, University “Federico II” of Naples, 80125 Napoli, Italy;
Antonio Sforza: Department of Electrical Engineering and Information Technology, University “Federico II” of Naples, 80125 Napoli, Italy
Claudio Sterle: Department of Electrical Engineering and Information Technology, University “Federico II” of Naples, 80125 Napoli, Italy; Istituto di Analisi dei Sistemi ed Informatica Antonio Ruberti, Consiglio Nazionale delle Ricerche, 00185 Rome, Italy
INFORMS Journal on Computing, 2020, vol. 32, issue 4, 1049-1060
Abstract:
The simple pattern minimality problem ( SPMP ) represents a central problem in the logical analysis of data and association rules mining, and it finds applications in several fields as logic synthesis, reliability analysis, and automated reasoning. It consists of determining the minimum number of patterns explaining all the observations of a data set, that is, a Boolean logic formula that is true for all the elements of the data set and false for all the unseen observations. We refer to this problem as covering SPMP ( C-SPMP ), because each observation can be explained (covered) by more than one pattern. Starting from a real industrial application, we also define a new version of the problem, and we refer to it as partitioning SPMP ( P-SPMP ), because each observation has to be covered just once. Given a propositional formula or a truth table, C-SPMP and P-SPMP coincide exactly with the problem of determining the minimum disjunctive and minimum exclusive disjunctive normal form, respectively. Both problems are known to be NP-hard and have been generally tackled by heuristic methods. In this context, the contribution of this work is twofold. On one side, it provides two original integer linear programming formulations for the two variants of the SPMP . These formulations exploit the concept of Boolean hypercube to build a graph representation of the problems and allow to exactly solve instances with more than 1,000 observations by using an MIP solver. On the other side, two effective and fast heuristics are proposed to solve relevant size instances taken from literature ( SeattleSNPs ) and from the industrial database. The proposed methods do not suffer from the same dimensional drawbacks of the methods present in the literature and outperform either existing commercial and freeware logic tools or the available industrial solutions in the number of generated patterns and/or in the computational burden.
Keywords: data mining and machine learning; logical analysis of data; simple pattern minimality problem; covering/partitioning approach (search for similar items in EconPapers)
Date: 2020
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://doi.org/10.1287/ijoc.2019.0940 (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:inm:orijoc:v:32:y:4:i:2020:p:1049-1060
Access Statistics for this article
More articles in INFORMS Journal on Computing from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().