Design of data scoring model for big data
Ranjan Kumar Dash
International Journal of Intelligent Enterprise, 2020, vol. 7, issue 1/2/3, 356-371
Abstract:
The huge volume and variety of data stored in big data provide a more accurate predictive platform for users. However, decision-making becomes tedious because accessing the data requires considerable computational time and memory. One solution is data scoring, which selects only those variables or features that affect the decision-making process to a greater extent. To meet the need for an efficient data scoring model, this paper proposes a new data scoring model for big data. The proposed model uses adaptive LASSO as the statistical method. The steps involved in designing the model are outlined and explained. The model is trained and tested using k-fold cross-validation, and its performance is measured using the ROC curve. The model is simulated in R and applied to three distinct datasets. For comparison, LASSO is also applied to the same datasets. The simulation results show that adaptive LASSO performs better than LASSO on large datasets.
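To illustrate the adaptive LASSO idea named in the abstract, the following is a minimal sketch and not the paper's implementation (the paper reports simulations in R, and the abstract does not state the loss function or tuning choices). It assumes squared-error loss, centred feature columns, no intercept, an unpenalised least-squares pilot fit, and weights of the form 1 / |pilot coefficient|^gamma. The function names, the toy design matrix, and the values of lam, gamma, and eps are all hypothetical and chosen only for illustration.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j * |b_j|.

    Assumes centred columns and no intercept (illustrative simplification).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            # Correlation of feature j with the partial residual.
            rho = sum(v * r for v, r in zip(col, resid)) / n + z * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / z
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - v * delta for r, v in zip(resid, col)]
                beta[j] = new
    return beta


def adaptive_lasso(X, y, lam, gamma=1.0, eps=1e-8):
    """Two-stage fit: pilot least squares, then a reweighted L1 penalty."""
    p = len(X[0])
    # Stage 1: lam = 0 turns the solver into plain least squares.
    pilot = weighted_lasso(X, y, 0.0, [0.0] * p)
    # Stage 2: small pilot coefficients receive large penalties.
    # eps is an assumption that guards against division by zero.
    weights = [1.0 / (abs(b) + eps) ** gamma for b in pilot]
    return weighted_lasso(X, y, lam, weights)


# Toy orthogonal design: one strong feature and two weak ones (made-up data).
X = [
    [1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1],
    [-1, 1, 1], [-1, 1, -1], [-1, -1, 1], [-1, -1, -1],
]
y = [2.0 * a + 0.05 * b + 0.2 * c for a, b, c in X]

beta_adaptive = adaptive_lasso(X, y, lam=0.1)
beta_lasso = weighted_lasso(X, y, lam=0.1, weights=[1.0, 1.0, 1.0])
print("adaptive LASSO:", [round(b, 3) for b in beta_adaptive])
print("plain LASSO:   ", [round(b, 3) for b in beta_lasso])
```

On this toy design the adaptive fit keeps only the first feature and shrinks it less than plain LASSO does, while plain LASSO also retains the weak third feature with a shrunken coefficient. This shows the feature-selection behaviour that a data scoring model relies on. It says nothing about the paper's reported results, which come from three datasets evaluated with k-fold cross-validation and ROC curves.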
Keywords: big data; regression analysis; data scoring; receiver operating characteristic curves; discriminant analysis; decision tree; support vector machine; random forest; intelligent system
Date: 2020
Downloads: (external link)
http://www.inderscience.com/link.php?id=104666 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijient:v:7:y:2020:i:1/2/3:p:356-371
More articles in International Journal of Intelligent Enterprise from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker.