
Improved software defect prediction using Pruned Histogram-based isolation forest

Zhiguo Ding and Liudong Xing

Reliability Engineering and System Safety, 2020, vol. 204, issue C

Abstract: Software defect prediction (SDP) is a hot topic in the modern software engineering research community. It has been used for evaluating software quality and reliability and allocating limited testing resources effectively. Based on analyzing the software source code and development process and extracting critical metrics, many data mining and machine learning methods have been used for SDP. However, these existing learning methods have difficulty with handling the imbalanced data distribution of accumulated training dataset. Isolation forest, an anomaly detection method based on the ensemble learning, has been studied to deal with the imbalanced data distribution issue for obtaining high prediction performance. However, the isolation forest method suffers from a main drawback of slow convergence, which is caused by selecting the feature value at random during the process of building isolation trees. To conquer this problem, in this paper histogram is constructed for the value set of selected isolation feature helping identify feature values preferable to build isolation trees. Motivated by the “many could be better than all” principle in the ensemble learning, the ensemble pruning strategy is further employed to optimize the obtained isolation forest, leading to a novel SDP method named PHIForest (Pruned Histogram-based Isolation Forest) in this work. The proposed method can provide fast convergence through the histogram-based splitting feature value selection, and decrease the ensemble scale and improve prediction performance through the ensemble pruning. Comprehensive experiments conducted on ten real datasets are performed to demonstrate effectiveness of the proposed SDP method.
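
Illustrative sketch (not from the paper): the abstract names two ideas, choosing the split value from a histogram of the selected feature instead of at random, and pruning the resulting ensemble. It does not say which histogram bins are "preferable" or how trees are ranked for pruning, so the Python below fills both gaps with assumptions: the split is placed at the centre of the sparsest bin, and pruning keeps the trees that isolate labelled defective rows soonest relative to clean rows. All function names are hypothetical, and this should not be read as the authors' PHIForest implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def avg_path(n):
    """Average unsuccessful-search path length in a BST of n points
    (the usual isolation forest normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def histogram_split(values, bins, rng):
    """ASSUMPTION: split at the centre of the sparsest histogram bin."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return None  # constant feature, nothing to split on
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    fewest = min(counts)
    idx = rng.choice([i for i, c in enumerate(counts) if c == fewest])
    return lo + (idx + 0.5) * width

def build_tree(rows, depth, max_depth, bins, rng):
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))  # the feature is still chosen at random
    split = histogram_split([r[feature] for r in rows], bins, rng)
    if split is None:
        return {"size": len(rows)}  # simplification: stop on a constant feature
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, bins, rng),
            "right": build_tree(right, depth + 1, max_depth, bins, rng)}

def path_length(tree, row):
    depth = 0
    while "feature" in tree:
        side = "left" if row[tree["feature"]] < tree["split"] else "right"
        tree = tree[side]
        depth += 1
    return depth + avg_path(tree["size"])

def build_forest(rows, n_trees=50, sample_size=64, bins=10, seed=0):
    """rows: list of equal-length numeric tuples; needs at least 2 rows."""
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(psi))
    forest = [build_tree(rng.sample(rows, psi), 0, max_depth, bins, rng)
              for _ in range(n_trees)]
    return forest, psi

def anomaly_score(forest, psi, row):
    """Standard isolation forest score: closer to 1 means more anomalous."""
    mean_depth = sum(path_length(t, row) for t in forest) / len(forest)
    return 2.0 ** (-mean_depth / avg_path(psi))

def prune_forest(forest, rows, labels, keep):
    """ASSUMPTION: rank trees by (mean path of clean rows) minus
    (mean path of defective rows) on labelled data; keep the best."""
    def margin(tree):
        bad = [path_length(tree, r) for r, y in zip(rows, labels) if y == 1]
        good = [path_length(tree, r) for r, y in zip(rows, labels) if y == 0]
        return sum(good) / len(good) - sum(bad) / len(bad)
    return sorted(forest, key=margin, reverse=True)[:keep]

if __name__ == "__main__":
    rng = random.Random(1)
    clean = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    defective = [(rng.gauss(8, 0.5), rng.gauss(8, 0.5)) for _ in range(10)]
    rows, labels = clean + defective, [0] * 200 + [1] * 10
    forest, psi = build_forest(rows)
    small = prune_forest(forest, rows, labels, keep=10)
    print(anomaly_score(small, psi, defective[0]), anomaly_score(small, psi, clean[0]))
```

Under these assumptions, a module whose metrics fall far from the bulk of the data tends to be cut off by an early split and so receives a shorter average path and a higher score. The synthetic data in the demo is for illustration only and is unrelated to the ten datasets mentioned in the abstract.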

Keywords: Software defect prediction; Software quality and reliability; Isolation forest; Imbalanced data distribution; Ensemble pruning; Histogram
Date: 2020

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0951832020306712
Full text for ScienceDirect subscribers only

Persistent link: https://EconPapers.repec.org/RePEc:eee:reensy:v:204:y:2020:i:c:s0951832020306712

DOI: 10.1016/j.ress.2020.107170

Reliability Engineering and System Safety is currently edited by Carlos Guedes Soares

Bibliographic data for series maintained by Catherine Liu.

 
Page updated 2025-03-19
Handle: RePEc:eee:reensy:v:204:y:2020:i:c:s0951832020306712