EconPapers    
 

Analysis of real-time data with spark streaming

Nikitha Johnsirani Venkatesan, Choon Sung Nam, Earl Kim and Dong Ryeol Shin
Additional contact information
Nikitha Johnsirani Venkatesan: School of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea
Choon Sung Nam: School of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea
Earl Kim: School of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea
Dong Ryeol Shin: School of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea

Journal of Advances in Technology and Engineering Research, 2017, vol. 3, issue 4, 108-116

Abstract: Data analysis in real-world application domains is a very challenging problem. For example, thousands of gigabytes of multimedia data are poured into social media every minute. Since social media platforms and most organizations deal with Big Data, tools such as Hadoop and Spark are more appropriate for handling such data. However, Hadoop and MapReduce analyze data only in batch mode, which increases latency and makes real-time analysis difficult. To solve this problem, we used Spark Streaming for real-time data analysis. Spark Streaming iterates through the data much faster because of its in-memory processing. This paper presents an online machine learning system for real-time data. Using Spark Streaming, data from an online messaging system is streamed into the local system. The streaming K-means algorithm is applied to cluster the different languages of people from various countries. Results show that predictions on the incoming data are accurate and fast when Apache Spark is used. Our results and methods are compared with those of other articles that have used Spark Streaming for real-time data processing. Queries such as total word count and keyword-based segregation are performed, and the results are presented. The data are then stored on the local disk for future querying.
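The abstract names streaming K-means as the clustering step but does not give its update rule. As a hedged illustration only (not the authors' code), the pure-Python sketch below implements the mini-batch update described in the Spark MLlib documentation for `StreamingKMeans` with the default "batches" time unit: every cluster weight is discounted by a decay factor, and each center moves toward the points newly assigned to it. The function names (`squared_distance`, `nearest_center`, `update_centers`) and the plain-list vector representation are hypothetical; in an actual Spark job one would use `pyspark.mllib.clustering.StreamingKMeans` with its `trainOn`/`predictOn` methods on a DStream. The paper's feature encoding for language data is not stated here, so the example uses generic numeric vectors.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point: Sequence[float], centers: List[List[float]]) -> int:
    """Index of the current center closest to `point` (first one wins on ties)."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def update_centers(
    centers: List[List[float]],
    weights: List[float],
    batch: List[Sequence[float]],
    decay: float = 1.0,
) -> Tuple[List[List[float]], List[float]]:
    """One mini-batch step of streaming k-means (hypothetical helper).

    For a cluster with old center c_t, old weight n_t, and m_t new points
    whose coordinate sum is s_t:
        n_{t+1} = n_t * decay + m_t
        c_{t+1} = (c_t * n_t * decay + s_t) / n_{t+1}
    Every weight is multiplied by `decay`; clusters that receive no points
    keep their center. Spark's handling of "dying" clusters is omitted.
    """
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)

    # Assign each point in the batch to its nearest *current* center.
    for p in batch:
        k = nearest_center(p, centers)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]

    new_centers: List[List[float]] = []
    new_weights: List[float] = []
    for k, (c, n) in enumerate(zip(centers, weights)):
        old = n * decay          # discounted weight of past data
        total = old + counts[k]  # new cluster weight
        if counts[k] == 0:
            new_centers.append(list(c))
        else:
            # Weighted average of the old center and the newly assigned points.
            new_centers.append([(c[d] * old + sums[k][d]) / total for d in range(dim)])
        new_weights.append(total)
    return new_centers, new_weights
```

For instance, `update_centers([[0.0], [10.0]], [1.0, 1.0], [[1.0], [2.0], [9.0]])` assigns the first two points to the first center and the last point to the second, returning centers `[[1.0], [9.5]]` and weights `[3.0, 2.0]`. A `decay` of 1.0 weighs all past data equally, while values closer to 0 make the clusters follow recent batches more closely.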

Keywords: Spark Streaming; RDD; Streaming K-means; Slack
Date: 2017
Citations: 2 (in EconPapers)

Downloads: (external link)
https://tafpublications.com/platform/Articles/full-jater3.4.1.php (text/html)
https://tafpublications.com/gip_content/paper/jater-3.4.1.pdf (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:apb:jaterr:2017:p:108-116

DOI: 10.20474/jater-3.4.1

Journal of Advances in Technology and Engineering Research is currently edited by A/Professor Akbar A. Khatibi

More articles in Journal of Advances in Technology and Engineering Research from A/Professor Akbar A. Khatibi, Calle Alarcon 66, Sant Adrian De Besos 08930, Barcelona, Spain.
Bibliographic data for series maintained by A/Professor Akbar A. Khatibi.

 
Page updated 2025-03-19
Handle: RePEc:apb:jaterr:2017:p:108-116