Improved software defect prediction using Pruned Histogram-based isolation forest
Zhiguo Ding and Liudong Xing
Reliability Engineering and System Safety, 2020, vol. 204, issue C
Abstract:
Software defect prediction (SDP) is an active topic in the software engineering research community. It is used to evaluate software quality and reliability and to allocate limited testing resources effectively. Many data mining and machine learning methods have been applied to SDP, using critical metrics extracted from analysis of the software source code and development process. However, these existing learning methods have difficulty handling the imbalanced data distribution of the accumulated training dataset. Isolation forest, an anomaly detection method based on ensemble learning, has been studied as a way to deal with the imbalanced data distribution and obtain high prediction performance. However, the isolation forest method suffers from slow convergence, which is caused by selecting the feature value at random while building isolation trees. To overcome this problem, this paper constructs a histogram over the value set of the selected isolation feature, which helps identify feature values that are preferable for building isolation trees. Motivated by the "many could be better than all" principle in ensemble learning, an ensemble pruning strategy is further employed to optimize the obtained isolation forest, leading to a novel SDP method named PHIForest (Pruned Histogram-based Isolation Forest). The proposed method provides fast convergence through histogram-based selection of the splitting feature value, and it reduces the ensemble size and improves prediction performance through ensemble pruning. Comprehensive experiments on ten real datasets demonstrate the effectiveness of the proposed SDP method.
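The abstract names two ingredients, histogram-guided split values and ensemble pruning, without giving their details. The Python sketch below is one possible reading, not the authors' PHIForest. It makes two assumptions: the "preferable" split value is the centre of the sparsest histogram bin, and pruning keeps the trees whose path lengths best separate labelled defective rows from clean ones. All function names (histogram_split, build_forest, prune_forest, and so on) and the toy data are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def histogram_split(values, bins=10):
    """Return the centre of the sparsest histogram bin, or None if all values are equal.

    Assumed heuristic (not from the paper): a sparse bin marks a gap, and
    cutting inside a gap tends to isolate a small group of points quickly.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return None
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    sparsest = counts.index(min(counts))  # ties go to the lowest bin
    return lo + (sparsest + 0.5) * width


def build_tree(rows, rng, max_depth, depth=0):
    """Grow one isolation tree: random feature, histogram-guided split value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    split = histogram_split([r[feature] for r in rows])
    if split is None:
        return {"size": len(rows)}
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, rng, max_depth, depth + 1),
            "right": build_tree(right, rng, max_depth, depth + 1)}


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def path_length(tree, row, depth=0):
    """Depth at which `row` lands in a leaf, plus the usual leaf-size correction."""
    if "size" in tree:
        return depth + c(tree["size"])
    child = tree["left"] if row[tree["feature"]] < tree["split"] else tree["right"]
    return path_length(child, row, depth + 1)


def build_forest(rows, n_trees, sample_size, rng):
    """Build each tree on its own random subsample of the training rows."""
    max_depth = math.ceil(math.log2(sample_size))
    return [build_tree(rng.sample(rows, sample_size), rng, max_depth)
            for _ in range(n_trees)]


def prune_forest(trees, val_rows, val_labels, keep):
    """Keep the `keep` trees with the largest clean-minus-defective path-length gap.

    Assumed criterion; requires both classes (0 = clean, 1 = defective) to be present.
    """
    def gap(tree):
        defect = [path_length(tree, r) for r, y in zip(val_rows, val_labels) if y == 1]
        clean = [path_length(tree, r) for r, y in zip(val_rows, val_labels) if y == 0]
        return sum(clean) / len(clean) - sum(defect) / len(defect)
    return sorted(trees, key=gap, reverse=True)[:keep]


def anomaly_score(trees, row, sample_size):
    """Standard isolation-forest score: closer to 1 means more anomalous."""
    mean_depth = sum(path_length(t, row) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c(sample_size))


# Synthetic, imbalanced toy data (5% "defective"); not taken from the paper.
rng = random.Random(0)
clean_rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(190)]
defect_rows = [(rng.gauss(8, 0.5), rng.gauss(8, 0.5)) for _ in range(10)]
rows = clean_rows + defect_rows
labels = [0] * 190 + [1] * 10

forest = build_forest(rows, n_trees=20, sample_size=64, rng=rng)
pruned = prune_forest(forest, rows, labels, keep=10)
print(anomaly_score(pruned, (8.0, 8.0), 64), anomaly_score(pruned, (0.0, 0.0), 64))
```

In this toy setting the point near the defective cluster should receive a higher score than the point at the centre of the clean cluster. The sketch says nothing about the convergence or accuracy results reported in the paper.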
Keywords: Software defect prediction; Software quality and reliability; Isolation forest; Imbalanced data distribution; Ensemble pruning; Histogram
Date: 2020
Downloads:
http://www.sciencedirect.com/science/article/pii/S0951832020306712
Full text for ScienceDirect subscribers only
Persistent link: https://EconPapers.repec.org/RePEc:eee:reensy:v:204:y:2020:i:c:s0951832020306712
DOI: 10.1016/j.ress.2020.107170
Reliability Engineering and System Safety is currently edited by Carlos Guedes Soares
More articles in Reliability Engineering and System Safety from Elsevier
Bibliographic data for series maintained by Catherine Liu.