
DIAFM: An Improved and Novel Approach for Incremental Frequent Itemset Mining

Mohsin Shaikh, Sabina Akram, Jawad Khan, Shah Khalid and Youngmoon Lee
Additional contact information
Mohsin Shaikh: Department of Computer Science, The University of Larkano, Larkana 77062, Pakistan
Sabina Akram: Department of Computer Science and Engineering, Fast National University, Islamabad 44000, Pakistan
Jawad Khan: School of Computing, Gachon University, Seongnam 13120, Republic of Korea
Shah Khalid: School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad 44000, Pakistan
Youngmoon Lee: Department of Robotics, Hanyang University, Ansan 15588, Republic of Korea

Mathematics, 2024, vol. 12, issue 24, 1-29

Abstract: Traditional data mining approaches are generally designed for small, centralized, static datasets. When a dataset grows rapidly, these algorithms become infeasible because they consume excessive computational and I/O resources. Frequent itemset mining (FIM) is a key data mining task with applications in a variety of domains, yet traditional FIM algorithms struggle to process large, dynamic datasets efficiently. This research introduces a distributed incremental approximation frequent itemset mining (DIAFM) algorithm that addresses these challenges using shard-based approximation within the MapReduce framework. DIAFM reduces computational overhead by cutting the number of dataset scans, bypassing exact support checks, and applying shard-level error thresholds to balance efficiency against accuracy. In extensive experiments, DIAFM reduced runtime by 40–60% compared to traditional methods while keeping accuracy losses within 1–5%, even for datasets of more than 500,000 transactions. Because the algorithm is incremental, new data increments are handled without reprocessing the entire dataset, which makes it particularly suitable for real-time, large-scale applications such as transaction analysis and IoT data streams. These results demonstrate the scalability, robustness, and practical applicability of DIAFM and position it as a competitive, efficient solution for mining frequent itemsets in distributed, dynamic environments.
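
The shard-based, incremental idea described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' DIAFM implementation: the names `mine_shard`, `IncrementalMiner`, `epsilon`, and `max_size` are hypothetical, the MapReduce framework is only imitated by a per-shard "map" function and a merging "reduce" step, and the pruning rule (keep an itemset in a shard when its local support reaches `min_support - epsilon`) is an assumed stand-in for the paper's shard-level error threshold.

```python
from collections import Counter
from itertools import combinations


def mine_shard(transactions, min_support, epsilon, max_size=2):
    """Map step (hypothetical): count itemsets in one shard and keep only
    those whose local count reaches the relaxed threshold."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, no duplicates
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    # Assumed shard-level error threshold: relax min_support by epsilon.
    threshold = (min_support - epsilon) * len(transactions)
    return Counter({s: c for s, c in counts.items() if c >= threshold})


class IncrementalMiner:
    """Reduce step (hypothetical): merge per-shard counts as shards arrive."""

    def __init__(self, min_support, epsilon, max_size=2):
        self.min_support = min_support
        self.epsilon = epsilon
        self.max_size = max_size
        self.counts = Counter()  # approximate global counts (lower bounds)
        self.total = 0           # transactions seen so far

    def add_shard(self, transactions):
        # Only the new shard is scanned; earlier shards are never re-read.
        local = mine_shard(transactions, self.min_support,
                           self.epsilon, self.max_size)
        self.counts.update(local)
        self.total += len(transactions)

    def frequent(self):
        # Report itemsets whose approximate support clears the relaxed bound.
        threshold = (self.min_support - self.epsilon) * self.total
        return {s: c / self.total
                for s, c in self.counts.items() if c >= threshold}
```

Calling `add_shard` once per arriving increment and then `frequent()` returns approximate supports without rescanning earlier data. Because counts pruned inside a shard are discarded, the merged counts are lower bounds on the true counts, so some borderline itemsets can be missed. That is the efficiency-versus-accuracy trade-off this sketch is meant to make visible, under the assumptions stated above.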

Keywords: distributed data mining; MapReduce; large-scale data processing; big data analytics
JEL-codes: C
Date: 2024

Downloads: (external link)
https://www.mdpi.com/2227-7390/12/24/3930/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/24/3930/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:24:p:3930-:d:1543309

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-22
Handle: RePEc:gam:jmathe:v:12:y:2024:i:24:p:3930-:d:1543309