Mining Weighted Frequent Itemsets without Candidate Generation in Uncertain Databases

Lin, Jerry Chun-Wei; Gan, Wensheng; Fournier-Viger, Philippe; Hong, Tzung-Pei; Chao, Han-Chieh

Mining Weighted Frequent Itemsets without Candidate Generation in Uncertain Databases

Jerry Chun-Wei Lin (), Wensheng Gan (), Philippe Fournier-Viger (), Tzung-Pei Hong and Han-Chieh Chao ()
Additional contact information
Jerry Chun-Wei Lin: School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, P. R. China
Wensheng Gan: School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, P. R. China
Philippe Fournier-Viger: #x2020;School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, P. R. China
Tzung-Pei Hong: #x2021;Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan§Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
Han-Chieh Chao: #xB6;Department of Computer Science and Information Engineering, National Dong Hwa University, Hualien, Taiwan

International Journal of Information Technology & Decision Making (IJITDM), 2017, vol. 16, issue 06, 1549-1579

Abstract: Frequent itemset mining (FIM) is a fundamental set of techniques used to discover useful and meaningful relationships between items in transaction databases. In recent decades, extensions of FIM such as weighted frequent itemset mining (WFIM) and frequent itemset mining in uncertain databases (UFIM) have been proposed. WFIM considers that items may have different weight/importance. It can thus discover itemsets that are more useful and meaningful by ignoring irrelevant itemsets with lower weights. UFIM takes into account that data collected in a real-life environment may often be inaccurate, imprecise, or incomplete. Recently, these two ideas have been combined in the HEWI-Uapriori algorithm. This latter considers both item weights and transaction uncertainty to mine the high expected weighted itemsets (HEWIs) using a two-phase Apriori-based approach. Although the upper-bound proposed in HEWI-Uapriori can reduce the size of the search space, it still generates a large amount of candidates and uses a level-wise search. In this paper, a more efficient algorithm named HEWI-Utree is developed to efficiently mine HEWIs without performing multiple database scans and without generating candidates. This algorithm relies on three novel structures named element (E)-table, weighted-probability (WP)-table and WP-tree to maintain the information required for identifying and pruning unpromising itemsets early. Experimental results show that the proposed algorithm is generally much more efficient than traditional methods for WFIM and UFIM, as well as the state-of-the-art HEWI-Uapriori algorithm, in terms of runtime, memory consumption, and scalability.

Keywords: Frequent itemsets; uncertain databases; weighted frequent itemsets; WP-table; candidate generation (search for similar items in EconPapers)
Date: 2017
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S0219622017500341
Access to full text is restricted to subscribers

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:wsi:ijitdm:v:16:y:2017:i:06:n:s0219622017500341

Ordering information: This journal article can be ordered from

DOI: 10.1142/S0219622017500341

Access Statistics for this article

International Journal of Information Technology & Decision Making (IJITDM) is currently edited by Yong Shi

More articles in International Journal of Information Technology & Decision Making (IJITDM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim ().