Pattern Discovery and Detection: A Unified Statistical Methodology
David Hand and Richard Bolton
Journal of Applied Statistics, 2004, vol. 31, issue 8, 885-924
Abstract:
Modern statistical data analysis is predominantly model-driven, seeking to decompose an observed data distribution in terms of major underlying descriptive features modified by some stochastic variation. A large part of data mining is also concerned with this exercise. However, another fundamental part of data mining is concerned with detecting anomalies amongst the vast mass of the data: the small deviations, unusual observations, unexpected clusters of observations, or surprising blips in the data, which the model does not explain. We call such anomalies patterns. For sound reasons, which are outlined in the paper, the data mining community has tended to focus on the algorithmic aspects of pattern discovery, and has not developed any general underlying theoretical base. However, such a base is important for any technology: it helps to steer the direction in which the technology develops, as well as serving to provide a basis from which algorithms can be compared, and to indicate which problems are the important ones waiting to be solved. This paper attempts to provide such a theoretical base, linking the ideas to statistical work in spatial epidemiology, scan statistics, outlier detection, and other areas. One of the striking characteristics of work on pattern discovery is that the ideas have been developed in several theoretical arenas, and also in several application domains, with little apparent awareness of the fundamentally common nature of the problem. Like model building, pattern discovery is fundamentally an inferential activity, and is an area in which statisticians can make very significant contributions.
Keywords: Patterns; pattern discovery; data mining; association analysis; bioinformatics; technical analysis; market basket analysis; configural frequency analysis; scan statistics; spatial epidemiology
Date: 2004
Downloads: (external link)
http://www.tandfonline.com/doi/abs/10.1080/0266476042000270518 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:japsta:v:31:y:2004:i:8:p:885-924
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/CJAS20
DOI: 10.1080/0266476042000270518
Journal of Applied Statistics is currently edited by Robert Aykroyd
Bibliographic data for series maintained by Chris Longhurst.