# Introduction to pattern mining

*Toon Calders*

ULB Institutional Repository from ULB -- Universite Libre de Bruxelles

**Abstract:**
We present an overview of data mining techniques for extracting knowledge from large databases, with a special emphasis on the unsupervised technique of pattern mining. Pattern mining is often defined as the automatic search for interesting patterns and regularities in large databases. In practice, this definition most often comes down to listing all patterns that exceed a user-defined threshold for a fixed interestingness measure. The simplest such problem is that of listing all frequent itemsets: given a database of sets, called transactions, list all sets of items that are subsets of at least a given number of the transactions. We revisit the two main strategies for mining all frequent itemsets, the breadth-first Apriori algorithm and the depth-first FPGrowth, and then show the main issues that arise when extending to more complex patterns, such as listing all frequent subsequences or subgraphs. In the second part of the paper we look into the pattern explosion problem. Due to redundancy among patterns, the list of all patterns satisfying the frequency threshold is most often so large that post-processing is required to extract useful information from it. We give an overview of some recent techniques to reduce the redundancy in pattern collections, using statistical methods to model the expectation of a user given background knowledge on the one hand, and the minimal description length principle on the other. © Springer International Publishing Switzerland 2014.
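As an illustration of the frequent itemset problem stated in the abstract, the following is a minimal sketch of a level-wise (breadth-first, Apriori-style) search. It is not taken from the paper: the function name `frequent_itemsets`, the parameter `min_support`, and the toy transactions are assumptions chosen for the example, and the candidate generation is deliberately naive.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all itemsets contained in
    at least `min_support` transactions (illustrative sketch only)."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    singletons = {frozenset([item]) for t in transactions for item in t}
    level = {s: support(s) for s in singletons}
    level = {s: n for s, n in level.items() if n >= min_support}

    result = dict(level)
    k = 1
    while level:
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: a candidate can only be frequent if every one of its
        # (k-1)-item subsets is frequent (monotonicity of support).
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # Count step: keep the candidates that reach the threshold.
        level = {c: support(c) for c in candidates}
        level = {c: n for c, n in level.items() if n >= min_support}
        result.update(level)
    return result


# Hypothetical toy database of four transactions.
demo = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
found = frequent_itemsets(demo, min_support=2)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a threshold of 2, the pair {milk, butter} occurs in only one transaction, so the three-item set is pruned without ever being counted against the database. This pruning is what makes the breadth-first strategy practical.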

**Date:** 2014

**Note:** SCOPUS: cp.k

**Published in:** Lecture Notes in Business Information Processing (2014), vol. 172 LNBIP, pp. 1-32

**Persistent link:** https://EconPapers.repec.org/RePEc:ulb:ulbeco:2013/187686
