DIAFM: An Improved and Novel Approach for Incremental Frequent Itemset Mining
Mohsin Shaikh,
Sabina Akram,
Jawad Khan,
Shah Khalid and
Youngmoon Lee
Additional contact information
Mohsin Shaikh: Department of Computer Science, The University of Larkano, Larkana 77062, Pakistan
Sabina Akram: Department of Computer Science and Engineering, Fast National University, Islamabad 44000, Pakistan
Jawad Khan: School of Computing, Gachon University, Seongnam 13120, Republic of Korea
Shah Khalid: School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad 44000, Pakistan
Youngmoon Lee: Department of Robotics, Hanyang University, Ansan 15588, Republic of Korea
Mathematics, 2024, vol. 12, issue 24, 1-29
Abstract:
Traditional data mining approaches are generally designed for small, centralized, and static datasets. When a dataset grows rapidly, these algorithms become infeasible because of their heavy consumption of computational and I/O resources. Frequent itemset mining (FIM) is one of the key tasks in data mining and finds applications in a variety of domains; however, traditional FIM algorithms struggle to process large and dynamic datasets efficiently. This research introduces a distributed incremental approximation frequent itemset mining (DIAFM) algorithm that tackles these challenges using shard-based approximation within the MapReduce framework. DIAFM minimizes computational overhead by reducing dataset scans, bypassing exact support checks, and incorporating shard-level error thresholds to balance efficiency against accuracy. Extensive experiments demonstrate that DIAFM reduces runtime by 40–60% compared to traditional methods, with accuracy losses within 1–5%, even for datasets of over 500,000 transactions. Its incremental nature means that new data increments are handled without reprocessing the entire dataset, making it particularly suitable for real-time, large-scale applications such as transaction analysis and IoT data streams. These results demonstrate the scalability, robustness, and practical applicability of DIAFM and establish it as a competitive and efficient solution for mining frequent itemsets in distributed, dynamic environments.
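The shard-based, incremental idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `mine_shard` and `ApproxIncrementalMiner`, the `error` parameter, the `max_size` cap, and the rule of summing shard-local counts without an exact verification pass are all assumptions made for illustration, and the sketch runs in a single process rather than on MapReduce.

```python
from collections import Counter
from itertools import combinations


def mine_shard(transactions, min_support, error, max_size=2):
    """Count itemsets in one shard and keep the locally frequent ones.

    Assumption: the local threshold is relaxed by `error`, so itemsets that
    are slightly below `min_support` in this shard are still retained.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    threshold = max(min_support - error, 0.0) * len(transactions)
    return {itemset: n for itemset, n in counts.items() if n >= threshold}


class ApproxIncrementalMiner:
    """Hypothetical incremental miner: each data increment is one shard."""

    def __init__(self, min_support, error, max_size=2):
        self.min_support = min_support
        self.error = error
        self.max_size = max_size
        self.counts = Counter()  # approximate global counts
        self.total = 0           # transactions seen so far

    def add_increment(self, transactions):
        # Only the new shard is scanned; earlier shards are never reread.
        local = mine_shard(
            transactions, self.min_support, self.error, self.max_size
        )
        self.counts.update(local)  # adds the shard-local counts
        self.total += len(transactions)

    def frequent_itemsets(self):
        # Counts can be underestimates (an itemset pruned in some shard
        # contributes nothing from that shard), so the relaxed threshold
        # is applied here as well instead of an exact support check.
        threshold = max(self.min_support - self.error, 0.0) * self.total
        return {s: n for s, n in self.counts.items() if n >= threshold}
```

Calling `add_increment` once per arriving batch and then `frequent_itemsets()` returns the itemsets whose accumulated approximate count clears the relaxed global threshold. The returned counts are lower bounds on the true counts, which is where the accuracy loss in this sketch comes from.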
Keywords: distributed data mining; MapReduce; large-scale data processing; big data analytics
JEL-codes: C
Date: 2024
Downloads:
https://www.mdpi.com/2227-7390/12/24/3930/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/24/3930/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:24:p:3930-:d:1543309
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.