Introduction to pattern mining
ULB Institutional Repository from ULB -- Universite Libre de Bruxelles
We present an overview of data mining techniques for extracting knowledge from large databases, with a special emphasis on the unsupervised technique of pattern mining. Pattern mining is often defined as the automatic search for interesting patterns and regularities in large databases. In practice, this definition most often comes down to listing all patterns that exceed a user-defined threshold for a fixed interestingness measure. The simplest such problem is that of listing all frequent itemsets: given a database of sets, called transactions, list all sets of items that are subsets of at least a given number of the transactions. We revisit the two main strategies for mining all frequent itemsets, the breadth-first Apriori algorithm and the depth-first FPGrowth, after which we show the main issues that arise when extending to more complex patterns, such as listing all frequent subsequences or subgraphs. In the second part of the paper we look into the pattern explosion problem. Due to redundancy among patterns, the list of all patterns satisfying the frequency thresholds is most often so large that post-processing is required to extract useful information from it. We give an overview of some recent techniques to reduce the redundancy in pattern collections, using statistical methods to model the expectation of a user given background knowledge on the one hand, and the minimal description length principle on the other. © Springer International Publishing Switzerland 2014.
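To make the frequent-itemset problem stated in the abstract concrete, here is a minimal Python sketch of a breadth-first, Apriori-style search. It is an illustration under stated assumptions, not the paper's own implementation: the function name `apriori`, the parameter `min_support` (an absolute transaction count), and the toy transactions are all hypothetical, and support is counted by a naive scan of the database.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset that is a subset of
    at least `min_support` transactions (hypothetical helper)."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Naive support count: scan every transaction.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {}
    for item in items:
        single = frozenset([item])
        count = support(single)
        if count >= min_support:
            level[single] = count

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-candidates.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Prune step: every k-subset of a candidate must be frequent,
            # because support can only shrink as an itemset grows.
            if all(frozenset(sub) in level for sub in combinations(union, k)):
                candidates.add(union)
        # Count step: keep only the candidates that reach the threshold.
        level = {}
        for candidate in candidates:
            count = support(candidate)
            if count >= min_support:
                level[candidate] = count
        k += 1
    return frequent


# Toy database (assumed for illustration only).
demo = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for itemset, count in sorted(apriori(demo, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The pruning step relies on the fact that a set cannot be frequent unless all of its subsets are; this is what lets a level-wise search skip most candidates. A depth-first method such as FPGrowth explores the same search space in a different order and is not shown here.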
Note: SCOPUS: cp.k
Published in: Lecture Notes in Business Information Processing (LNBIP), vol. 172 (2014), pp. 1-32
Persistent link: https://EconPapers.repec.org/RePEc:ulb:ulbeco:2013/187686
Ordering information: This working paper can be ordered from
http://hdl.handle.ne ... lb.ac.be:2013/187686
Bibliographic data for series maintained by Benoit Pauwels.