Improving the Performance of Data Mining by Using Big Data in Cloud Environment

Djilali Dahmani, Sid Ahmed Rahal and Ghalem Belalem
Additional contact information
Djilali Dahmani: Department of Mathematics and Computer Science, University of Sciences and Technology-Mohammed Boudiaf USTO, Oran, Algeria
Sid Ahmed Rahal: Department of Mathematics and Computer Science, University of Sciences and Technology-Mohammed Boudiaf USTO, Oran, Algeria
Ghalem Belalem: Department of Computer Science, Faculty of Exact and Applied Sciences, University of Oran 1, Ahmed Ben Bella, Oran, Algeria

Journal of Information & Knowledge Management (JIKM), 2016, vol. 15, issue 04, 1-18

Abstract: The volume of business data is growing very quickly, and most of these data are relational. Extracting knowledge with data mining requires keeping all historical data. This makes processing and storage increasingly difficult and demands more power and capacity than any single machine can provide. Distributed environments such as cloud computing are therefore very useful for sharing storage and processing across multiple nodes. Unfortunately, data based on the relational model cannot easily be used in the cloud because of the model's rigidity and limited elasticity in such environments. To address this issue, new big data systems such as NoSQL have appeared that make data easier to share and distribute in cloud environments. In theory, this benefits the data mining use case. In practice, however, it must be demonstrated by evaluating the performance of both multi-node NoSQL and single-node relational systems. In the cloud case, it is also of interest to know whether performance keeps increasing in proportion to the number of nodes, and whether there is an optimum number of nodes at which performance becomes nearly steady or starts to drop off. Motivated by these questions, we propose in this paper an approach for migrating relational data to an appropriate NoSQL system in a cloud environment, and then evaluate performance to obtain results relevant to data mining. For the experiments, we use industrial data deployed in a data mining process at an oil and gas company. After migrating these data, we run experiments to compare and evaluate storage, processing and execution time. Our objectives are to verify data elasticity, measure run-time performance, and find the optimum number of nodes.

Keywords: Big data; data mining; NoSQL; cloud computing; relational data
Date: 2016

Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S0219649216500386
Access to full text is restricted to subscribers

Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:15:y:2016:i:04:n:s0219649216500386

DOI: 10.1142/S0219649216500386

Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh

Bibliographic data for series maintained by Tai Tone Lim.

 
Page updated 2025-03-20
Handle: RePEc:wsi:jikmxx:v:15:y:2016:i:04:n:s0219649216500386