Straggler identification approach in large data processing frameworks using ensembled gradient boosting in smart-cities cloud services

Shyam Deshmukh and Komati Thirupathi Rao
Shyam Deshmukh: Koneru Lakshmaiah Education Foundation
Komati Thirupathi Rao: Koneru Lakshmaiah Education Foundation

International Journal of System Assurance Engineering and Management, 2022, vol. 13, issue 1, No 15, 146-155

Abstract: Improving a smart city's efficiency requires mining the large volumes of data generated by cyber-physical systems and electronic platforms with large-scale data processing frameworks in the cloud. Many cloud services rely on data-parallel computing frameworks that run on hundreds of interconnected nodes. These frameworks divide computation-intensive and data-intensive jobs into smaller tasks and run them concurrently on different nodes to improve performance. Sustaining that performance is difficult, however, because of runtime variability: owing to various internal and external factors, the nodes running these tasks can perform poorly, which delays the execution of the jobs. Because runtime variability is inherently complex, preventive measures against stragglers have proved inadequate, and stragglers continue to affect compute workloads even after such measures are applied. Several researchers have therefore proposed dynamic straggler identification approaches based on historical log analysis. This paper analyzes the relationships among several parameters collected during job execution that help in formulating and detecting stragglers. Based on this data analysis, we developed a straggler identification approach and used it to label the generated dataset. Using statistical features of historical resource usage, the proposed approach trains a distributed XGBoost classifier, which showed the highest accuracy, 88.57%. We also show empirically that blacklisting the predicted stragglers significantly reduces the execution times of CPU-intensive, I/O-intensive, and mixed applications.
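The abstract does not give the exact labeling rule, feature set, or blacklisting policy, so the following is only a minimal sketch under stated assumptions. It assumes one common heuristic from the straggler literature: a task is labeled a straggler when its duration exceeds 1.5 times the median duration of its stage, the same factor as the default of Spark's `spark.speculation.multiplier`. It also assumes a hypothetical `TaskRecord` log layout and simple mean, standard deviation, and maximum summaries of CPU usage. `label_stragglers`, `resource_features`, and `nodes_to_blacklist` are illustrative names, not the authors' code. In the paper's pipeline, feature rows like these would feed a distributed XGBoost classifier, which is not shown here.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean, median, pstdev

# Assumed threshold (not from the paper): a task is a straggler if it runs longer
# than STRAGGLER_FACTOR times the median duration of the tasks in its stage.
STRAGGLER_FACTOR = 1.5


@dataclass
class TaskRecord:
    """Hypothetical per-task record parsed from a job execution log."""
    task_id: str
    stage_id: int
    node: str
    duration_s: float
    cpu_samples: list = field(default_factory=list)  # CPU utilisation samples (%)


def label_stragglers(tasks):
    """Map each task_id to 1 (straggler) or 0, relative to its stage's median duration."""
    durations_by_stage = {}
    for t in tasks:
        durations_by_stage.setdefault(t.stage_id, []).append(t.duration_s)
    stage_median = {s: median(d) for s, d in durations_by_stage.items()}
    return {
        t.task_id: int(t.duration_s > STRAGGLER_FACTOR * stage_median[t.stage_id])
        for t in tasks
    }


def resource_features(task):
    """Summarise a task's CPU samples (at least one sample required) as features."""
    s = task.cpu_samples
    return {"cpu_mean": mean(s), "cpu_std": pstdev(s), "cpu_max": max(s)}


def nodes_to_blacklist(tasks, predictions, min_count=1):
    """Hypothetical policy: exclude nodes hosting >= min_count predicted stragglers."""
    counts = Counter(t.node for t in tasks if predictions[t.task_id])
    return {node for node, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    tasks = [
        TaskRecord("t1", 0, "node-a", 10.0, [40, 50, 60]),
        TaskRecord("t2", 0, "node-b", 11.0, [45, 55]),
        TaskRecord("t3", 0, "node-c", 30.0, [90, 95, 100]),
    ]
    labels = label_stragglers(tasks)          # {'t1': 0, 't2': 0, 't3': 1}
    print(labels)
    print(resource_features(tasks[2]))
    print(nodes_to_blacklist(tasks, labels))  # {'node-c'}
```

In the demo the stage median is 11 s, so the threshold is 16.5 s and only `t3` (30 s) is labeled a straggler. Here the labels themselves stand in for classifier predictions. In practice the blacklist would be driven by the trained model's output on unseen tasks.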

Keywords: Data parallel computing; Smart cities; Spark; Straggler identification; Ubiquitous computing; XGBoost Classifier
Date: 2022

Downloads:
http://link.springer.com/10.1007/s13198-021-01311-8 Abstract (text/html)
Access to the full text of the articles in this series is restricted.


Persistent link: https://EconPapers.repec.org/RePEc:spr:ijsaem:v:13:y:2022:i:1:d:10.1007_s13198-021-01311-8

Ordering information: This journal article can be ordered from
http://www.springer.com/engineering/journal/13198

DOI: 10.1007/s13198-021-01311-8

International Journal of System Assurance Engineering and Management is currently edited by P.K. Kapur, A.K. Verma and U. Kumar

More articles in International Journal of System Assurance Engineering and Management from Springer, The Society for Reliability, Engineering Quality and Operations Management (SREQOM), India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

Page updated 2025-03-20
Handle: RePEc:spr:ijsaem:v:13:y:2022:i:1:d:10.1007_s13198-021-01311-8