Integrating R and Hadoop for Big Data Analysis

Bogdan Oancea and Raluca Mariana Dragoescu
Additional contact information
Raluca Mariana Dragoescu: The Bucharest University of Economic Studies

Romanian Statistical Review, 2014, vol. 62, issue 2, 83-94

Abstract: Analyzing and working with big data can be very difficult using classical means such as relational database management systems or desktop software packages for statistics and visualization. Instead, big data requires large clusters with hundreds or even thousands of computing nodes. Official statistics is increasingly considering big data for deriving new statistics because big data sources could produce more relevant and timely statistics than traditional sources. One of the software tools widely and successfully used for storing and processing big data sets on clusters of commodity hardware is Hadoop. The Hadoop framework contains libraries, a distributed file system (HDFS), and a resource-management platform, and it implements a version of the MapReduce programming model for large-scale data processing. In this paper we investigate the possibilities of integrating Hadoop with R, a popular software environment for statistical computing and data visualization. We present three ways of integrating them (R with Streaming, Rhipe, and RHadoop) and emphasize the advantages and disadvantages of each solution.

Keywords: big data; Hadoop; R; RHadoop; Rhipe; Streaming
JEL-codes: C88 L8
Date: 2014

Downloads: (external link)
http://www.revistadestatistica.ro/wp-content/uploads/2014/07/RRS_2_2014_a08.pdf (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:rsr:journl:v:62:y:2014:i:2:p:83-94

Bibliographic data for series maintained by Adrian Visoiu.

 
Page updated 2025-03-19
Handle: RePEc:rsr:journl:v:62:y:2014:i:2:p:83-94