EconPapers    

Pattern Discovery and Detection: A Unified Statistical Methodology

David Hand and Richard Bolton

Journal of Applied Statistics, 2004, vol. 31, issue 8, 885-924

Abstract: Modern statistical data analysis is predominantly model-driven, seeking to decompose an observed data distribution in terms of major underlying descriptive features modified by some stochastic variation. A large part of data mining is also concerned with this exercise. However, another fundamental part of data mining is concerned with detecting anomalies amongst the vast mass of the data: the small deviations, unusual observations, unexpected clusters of observations, or surprising blips in the data, which the model does not explain. We call such anomalies patterns. For sound reasons, which are outlined in the paper, the data mining community has tended to focus on the algorithmic aspects of pattern discovery, and has not developed any general underlying theoretical base. However, such a base is important for any technology: it helps to steer the direction in which the technology develops, as well as serving to provide a basis from which algorithms can be compared, and to indicate which problems are the important ones waiting to be solved. This paper attempts to provide such a theoretical base, linking the ideas to statistical work in spatial epidemiology, scan statistics, outlier detection, and other areas. One of the striking characteristics of work on pattern discovery is that the ideas have been developed in several theoretical arenas, and also in several application domains, with little apparent awareness of the fundamentally common nature of the problem. Like model building, pattern discovery is fundamentally an inferential activity, and is an area in which statisticians can make very significant contributions.

Keywords: Patterns; pattern discovery; data mining; association analysis; bioinformatics; technical analysis; market basket analysis; configural frequency analysis; scan statistics; spatial epidemiology
Date: 2004
Citations: 1 (in EconPapers)

Downloads:
http://www.tandfonline.com/doi/abs/10.1080/0266476042000270518
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:taf:japsta:v:31:y:2004:i:8:p:885-924

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/CJAS20

DOI: 10.1080/0266476042000270518

Journal of Applied Statistics is currently edited by Robert Aykroyd

Journal of Applied Statistics is published by Taylor & Francis Journals.
Bibliographic data for this series are maintained by Chris Longhurst.

 
Page updated 2025-03-20
Handle: RePEc:taf:japsta:v:31:y:2004:i:8:p:885-924