A case study for performance analysis of big data stream classification using spark architecture
B. Srivani,
N. Sandhya and
B. Padmaja Rani
Additional contact information
B. Srivani: JNTUH
N. Sandhya: VNRVJIET
B. Padmaja Rani: JNTUCEH
International Journal of System Assurance Engineering and Management, 2024, vol. 15, issue 1, No 23, 253-266
Abstract:
Huge volumes of varied data are being produced at very high speed across many sectors. Owing to the widespread deployment of computing devices, the volume of information has grown rapidly in recent decades. A key benefit of big data is that large datasets enable machine learning techniques to obtain more accurate results. As the amount of data explodes, it raises new challenges and opportunities for data analytics research in the data mining domain. Massively parallel databases provide not only storage mechanisms but also compute platforms, and this extra capacity allows algorithms to run inside the database and data to be moved in-memory to solve problems. However, big data streams have characteristics such as high dimensionality, sparsity, volume and velocity, which pose serious problems for classification when traditional data stream classification methods are used. For large data collections, effectively selecting features before classifying the data is important for discovering patterns. Recent feature selection strategies use optimization methods to pick a subset of important features that yields good classification results. Therefore, in this case study, feature selection is performed with Dragonfly Moth Search (DMS) optimization. Classification is carried out in two phases, an offline phase and an online phase, using master and slave nodes with a stacked autoencoder (SAE) in the Spark architecture. The performance of the resulting DMS-SAE method is evaluated with accuracy, sensitivity and specificity metrics.
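The evaluation metrics named in the abstract come from a binary confusion matrix, and optimization-based feature selectors typically score each candidate feature subset with a wrapper fitness. The Python sketch below illustrates both under stated assumptions: `binary_metrics` and `wrapper_fitness` are hypothetical helper names, and the fitness form alpha * (1 - accuracy) + (1 - alpha) * |S| / |F| with alpha = 0.99 is a weighting commonly used in the metaheuristic feature-selection literature, not necessarily the objective that DMS-SAE optimizes.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # true positive rate (recall on the positive class)
    specificity: float  # true negative rate


def binary_metrics(y_true, y_pred, positive=1):
    """Count the confusion-matrix cells and derive the three metrics."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    total = tp + tn + fp + fn
    # Guard every ratio so empty classes yield 0.0 instead of dividing by zero.
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return BinaryMetrics(accuracy, sensitivity, specificity)


def wrapper_fitness(accuracy, n_selected, n_total, alpha=0.99):
    """Hypothetical wrapper fitness (lower is better).

    Trades classification error against the fraction of features kept;
    alpha = 0.99 is an assumed weighting, not taken from the paper.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_selected <= n_total:
        raise ValueError("n_selected must lie in [0, n_total]")
    return alpha * (1.0 - accuracy) + (1.0 - alpha) * (n_selected / n_total)


if __name__ == "__main__":
    # Imbalanced stream: one positive among ten samples, classifier predicts all negative.
    m = binary_metrics([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0] * 10)
    print(m)
    print(wrapper_fitness(m.accuracy, n_selected=10, n_total=100))
```

The usage example shows why sensitivity and specificity are reported alongside accuracy on imbalanced data: a classifier that always predicts the majority class reaches 0.9 accuracy while its sensitivity is 0.0.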
Keywords: Stream data; Spark framework; Imbalance data; Classification; Stacked auto encoder (SAE)
Date: 2024
Downloads: http://link.springer.com/10.1007/s13198-022-01703-4 (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:ijsaem:v:15:y:2024:i:1:d:10.1007_s13198-022-01703-4
Ordering information: This journal article can be ordered from
http://www.springer.com/engineering/journal/13198
DOI: 10.1007/s13198-022-01703-4
International Journal of System Assurance Engineering and Management is currently edited by P.K. Kapur, A.K. Verma and U. Kumar
More articles in International Journal of System Assurance Engineering and Management from Springer, The Society for Reliability, Engineering Quality and Operations Management (SREQOM), India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.