DIAFM: An Improved and Novel Approach for Incremental Frequent Itemset Mining

Shaikh, Mohsin; Akram, Sabina; Khan, Jawad; Khalid, Shah; Lee, Youngmoon

Mohsin Shaikh, Sabina Akram, Jawad Khan, Shah Khalid and Youngmoon Lee
Additional contact information
Mohsin Shaikh: Department of Computer Science, The University of Larkano, Larkana 77062, Pakistan
Sabina Akram: Department of Computer Science and Engineering, Fast National University, Islamabad 44000, Pakistan
Jawad Khan: School of Computing, Gachon University, Seongnam 13120, Republic of Korea
Shah Khalid: School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad 44000, Pakistan
Youngmoon Lee: Department of Robotics, Hanyang University, Ansan 15588, Republic of Korea

Mathematics, 2024, vol. 12, issue 24, 1-29

Abstract: Traditional approaches to data mining are generally designed for small, centralized, static datasets. When a dataset grows at an enormous rate, these algorithms become infeasible because of their heavy consumption of computational and I/O resources. Frequent itemset mining (FIM) is one of the key tasks in data mining and finds applications in a variety of domains; however, traditional FIM algorithms struggle to process large, dynamic datasets efficiently. This research introduces a distributed incremental approximation frequent itemset mining (DIAFM) algorithm that addresses these challenges using shard-based approximation within the MapReduce framework. DIAFM minimizes computational overhead by reducing dataset scans, bypassing exact support checks, and incorporating shard-level error thresholds that balance efficiency against accuracy. Extensive experiments show that DIAFM reduces runtime by 40–60% compared to traditional methods, with accuracy losses within 1–5%, even for datasets of over 500,000 transactions. Because the algorithm is incremental, new data increments are handled without reprocessing the entire dataset, which makes it particularly suitable for real-time, large-scale applications such as transaction analysis and IoT data streams. These results demonstrate the scalability, robustness, and practical applicability of DIAFM and establish it as a competitive and efficient solution for mining frequent itemsets in distributed, dynamic environments.
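
The general idea the abstract describes (per-shard counting, a relaxed shard-level threshold, and merging new increments without rescanning old data) can be illustrated with a small single-process sketch. This is not the DIAFM algorithm from the paper: the names `mine_shard`, `IncrementalMiner`, `epsilon`, and `max_size` are hypothetical, plain Python stands in for MapReduce, and the pruning rule is an assumption chosen only to show the efficiency/accuracy trade-off.

```python
from collections import Counter
from itertools import combinations


def mine_shard(transactions, min_support, epsilon, max_size=2):
    """Map-like step: count itemsets in one shard, keep the locally frequent ones.

    `epsilon` is a hypothetical shard-level error tolerance that relaxes the
    local threshold so borderline itemsets are not pruned too early.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, duplicates removed
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    threshold = (min_support - epsilon) * len(transactions)
    return {itemset: c for itemset, c in counts.items() if c >= threshold}


class IncrementalMiner:
    """Reduce-like step: accumulates pruned shard counts across increments."""

    def __init__(self, min_support, epsilon, max_size=2):
        self.min_support = min_support
        self.epsilon = epsilon
        self.max_size = max_size
        self.counts = Counter()  # summed counts of locally frequent itemsets
        self.total = 0           # transactions seen so far

    def add_shard(self, transactions):
        # Only the new shard is scanned; earlier shards are never revisited.
        local = mine_shard(transactions, self.min_support,
                           self.epsilon, self.max_size)
        self.counts.update(local)
        self.total += len(transactions)

    def frequent_itemsets(self):
        # Counts are lower bounds: an itemset pruned in a shard contributes 0
        # there, so the reported supports are approximate, not exact.
        threshold = (self.min_support - self.epsilon) * self.total
        return {itemset: c / self.total
                for itemset, c in self.counts.items() if c >= threshold}


miner = IncrementalMiner(min_support=0.5, epsilon=0.1)
miner.add_shard([["a", "b"], ["a", "b", "c"], ["a"], ["b", "c"]])
miner.add_shard([["a", "b"], ["a", "c"], ["a", "b"], ["d"]])  # later increment
print(miner.frequent_itemsets())
```

In this sketch, a larger `epsilon` prunes fewer itemsets per shard, which keeps more state but brings the merged counts closer to the exact ones; a smaller value does the opposite.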

Keywords: distributed data mining; MapReduce; large-scale data processing; big data analytics
JEL-codes: C
Date: 2024

Downloads: (external link)
https://www.mdpi.com/2227-7390/12/24/3930/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/24/3930/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:24:p:3930-:d:1543309

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.