EconPapers    
Economics at your fingertips  
 

New upper bounds for tight and fast approximation of Fisher’s exact test in dependency rule mining

Wilhelmiina Hämäläinen

Computational Statistics & Data Analysis, 2016, vol. 93, issue C, 469-482

Abstract: In the dependency rule mining, the goal is to discover the most significant statistical dependencies among all possible collapsed 2×2 contingency tables. Fisher’s exact test is a robust method to estimate the significance and it enables efficient pruning of the search space. The problem is that evaluating the required p-value can be very laborious and the worst case time complexity is O(n), where n is the data size. The traditional solution is to approximate the significance with the χ2-measure, which can be estimated in a constant time. However, the χ2-measure can produce unreliable results (discover spurious dependencies but miss the most significant dependencies). Furthermore, it does not support efficient pruning of the search space. As a solution, a family of tight upper bounds for Fisher’s p is introduced. The new upper bounds are fast to calculate and approximate Fisher’s p-value accurately. In addition, the new approximations are not sensitive to the data size, distribution, or smallest expected counts like the χ2-based approximation. In practice, the execution time depends on the desired accuracy level. According to experimental evaluation, the simplest upper bounds are already sufficiently accurate for dependency rule mining purposes and they can be estimated in 0.004–0.1% of the time needed for exact calculation. For other purposes (testing very weak dependencies), one may need more accurate approximations, but even they can be calculated in less than 1% of the exact calculation time.

Keywords: Fisher’s exact test; p-value; Hypergeometric distribution; Upper bound; Approximation; Dependency rule; Data mining (search for similar items in EconPapers)
Date: 2016
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167947315001747
Full text for ScienceDirect subscribers only.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:93:y:2016:i:c:p:469-482

DOI: 10.1016/j.csda.2015.08.002

Access Statistics for this article

Computational Statistics & Data Analysis is currently edited by S.P. Azen

More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu ().

 
Page updated 2025-03-19
Handle: RePEc:eee:csdana:v:93:y:2016:i:c:p:469-482