Composite Quantile Regression Neural Network for Massive Datasets
Jun Jin and
Zhongxun Zhao
Mathematical Problems in Engineering, 2021, vol. 2021, 1-10
Abstract:
Traditional statistical methods and machine learning on massive datasets are challenging owing to limitations of computer primary memory. Composite quantile regression neural network (CQRNN) is an efficient and robust estimation method. But most of existing computational algorithms cannot solve CQRNN for massive datasets reliably and efficiently. In this end, we propose a divide and conquer CQRNN (DC-CQRNN) method to extend CQRNN on massive datasets. The major idea is to divide the overall dataset into some subsets, applying the CQRNN for data within each subsets, and final results through combining these training results via weighted average. It is obvious that the demand for the amount of primary memory can be significantly reduced through our approach, and at the same time, the computational time is also significantly reduced. The Monte Carlo simulation studies and an environmental dataset application verify and illustrate that our proposed approach performs well for CQRNN on massive datasets. The environmental dataset has millions of observations. The proposed DC-CQRNN method has been implemented by Python on Spark system, and it takes 8 minutes to complete the model training, whereas a full dataset CQRNN takes 5.27 hours to get a result.
Date: 2021
References: Add references at CitEc
Citations:
Downloads: (external link)
http://downloads.hindawi.com/journals/MPE/2021/6682793.pdf (application/pdf)
http://downloads.hindawi.com/journals/MPE/2021/6682793.xml (text/xml)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:hin:jnlmpe:6682793
DOI: 10.1155/2021/6682793
Access Statistics for this article
More articles in Mathematical Problems in Engineering from Hindawi
Bibliographic data for series maintained by Mohamed Abdelhakeem ().