Abstract: Timely identification of defective modules improves both software quality and testing efficiency. This paper proposes a software-metrics-based ensemble k-NN algorithm for software defect prediction. First, a set of base k-NN predictors is constructed iteratively from different bootstrap samples of the training data. Next, each base k-NN predictor evaluates a software module independently, and their individual outputs are combined into a composite result. Then, an adaptive threshold training approach is designed so that the ensemble can classify new software modules: if the composite result is greater than the threshold, the module is classified as defective; otherwise, it is classified as normal. Finally, experiments are conducted on the NASA MDP and PROMISE AR datasets. Compared with a widely referenced defect prediction approach, the ensemble k-NN achieves considerable improvements, which confirms the effectiveness of software metrics in defect prediction.
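The pipeline summarized above (bootstrap-sampled base k-NN predictors, a combined composite result, and a trained decision threshold) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Euclidean distance, the fraction of defective neighbours as each base predictor's output, simple averaging as the combination rule, and choosing the threshold that maximizes balanced accuracy on a held-out validation set are all assumptions made here for illustration, and the names `BaggedKNN`, `knn_score`, `balanced_accuracy`, and `tune_threshold` are hypothetical.

```python
import math
import random


def knn_score(train_X, train_y, x, k):
    """Base k-NN output: fraction of the k nearest training modules labelled defective (1)."""
    # Euclidean distance on raw metric vectors (assumption; real metrics are usually normalized first).
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / k


def balanced_accuracy(y_true, y_pred):
    """Mean of true-positive and true-negative rates (hypothetical tuning criterion)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    tpr = sum(pos) / len(pos) if pos else 0.0
    tnr = sum(1 - p for p in neg) / len(neg) if neg else 0.0
    return (tpr + tnr) / 2


class BaggedKNN:
    """Hypothetical ensemble of k-NN predictors, each built on its own bootstrap sample."""

    def __init__(self, n_estimators=10, k=3, seed=0):
        self.n_estimators = n_estimators
        self.k = k
        self.seed = seed
        self.bags = []
        self.threshold = 0.5  # placeholder until tune_threshold() is called

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n = len(X)
        self.bags = []
        for _ in range(self.n_estimators):
            # Bootstrap sample: n draws with replacement from the training modules.
            idx = [rng.randrange(n) for _ in range(n)]
            self.bags.append(([X[i] for i in idx], [y[i] for i in idx]))
        return self

    def composite_score(self, x):
        # Combine the base predictors' outputs by simple averaging (assumption).
        return sum(knn_score(bx, by, x, self.k) for bx, by in self.bags) / len(self.bags)

    def tune_threshold(self, X_val, y_val):
        # Try each distinct validation score as the threshold and keep the one with the
        # best balanced accuracy (a stand-in for the paper's adaptive threshold training).
        scores = [self.composite_score(x) for x in X_val]
        best_t, best_ba = self.threshold, -1.0
        for t in sorted(set(scores)):
            preds = [1 if s > t else 0 for s in scores]
            ba = balanced_accuracy(y_val, preds)
            if ba > best_ba:
                best_t, best_ba = t, ba
        self.threshold = best_t
        return self

    def predict(self, x):
        # Defective (1) if the composite result exceeds the threshold, otherwise normal (0).
        return 1 if self.composite_score(x) > self.threshold else 0


if __name__ == "__main__":
    # Toy two-metric data: normal modules near (1.5, 1.5), defective ones near (5.5, 5.5).
    train_X = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (1.5, 1.5), (1.0, 1.5),
               (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0), (5.5, 5.5)]
    train_y = [0] * 6 + [1] * 5
    val_X = [(1.2, 1.8), (1.8, 1.2), (5.2, 5.8), (5.8, 5.2)]
    val_y = [0, 0, 1, 1]
    model = BaggedKNN(n_estimators=10, k=3, seed=0).fit(train_X, train_y)
    model.tune_threshold(val_X, val_y)
    print(model.threshold, model.predict((5.6, 5.4)), model.predict((1.5, 1.2)))
```

Tuning the threshold on data held out from the bootstrap samples avoids scoring a module against a copy of itself, which would inflate the composite results used for tuning. With real software metrics, features such as lines of code and cyclomatic complexity differ widely in scale, so normalizing them before computing distances would normally be advisable; the sketch skips that step.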