Semi-supervised Ensemble Learning Based Software Defect Prediction
WANG Tiejian1, WU Fei2, JING Xiaoyuan1
1. State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan 430072
2. School of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210023
Abstract Software defect prediction is often hampered by the scarcity of labeled modules and by the class imbalance of software defect data. To address both problems, a semi-supervised ensemble learning approach to software defect prediction is proposed. Semi-supervised ensemble learning exploits a large number of unlabeled modules to build high-performance classifiers, and combining a series of weak classifiers reduces the bias introduced by the majority class, which improves prediction on class-imbalanced data. To account for the cost of risk in software defect prediction, a sample weight vector updating strategy is employed to reduce the cost incurred by misclassifying defective modules as non-defective ones. Experimental results on the NASA MDP datasets show that the proposed approach achieves better software defect prediction capability.
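The abstract does not give the weight update formula, so the following is only a minimal sketch of what a cost-sensitive sample weight update could look like. It assumes a standard AdaBoost-style update and adds a hypothetical parameter `cost_fn` that inflates the weight of defective modules misclassified as non-defective. The function name, the parameter, and the exact formula are illustrative assumptions, not the authors' method.

```python
import math


def update_weights(weights, y_true, y_pred, cost_fn=2.0):
    """One AdaBoost-style weight update with extra cost on missed defects.

    Labels use 1 for defective and 0 for non-defective modules.
    cost_fn is a hypothetical parameter (not from the paper): it multiplies
    the weight of false negatives, i.e. defective modules that the weak
    classifier predicted as non-defective.
    """
    # Weighted error of the current weak classifier.
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - err) / err)

    new_weights = []
    for w, t, p in zip(weights, y_true, y_pred):
        if t == p:
            w *= math.exp(-alpha)           # correct: weight shrinks
        elif t == 1 and p == 0:
            w *= cost_fn * math.exp(alpha)  # missed defect: extra penalty
        else:
            w *= math.exp(alpha)            # false alarm: usual increase
        new_weights.append(w)

    # Renormalize so the weights again sum to 1.
    norm = sum(new_weights)
    return [w / norm for w in new_weights], alpha


if __name__ == "__main__":
    w, alpha = update_weights(
        [0.2] * 5,
        y_true=[1, 1, 0, 0, 0],
        y_pred=[0, 1, 1, 0, 0],
    )
    print(alpha, w)
```

In this sketch the missed defective module (index 0) ends up with a larger weight than the false alarm (index 2), so the next weak classifier is pushed harder toward the costlier kind of error.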
Received: 20 October 2016
Fund: Supported by the National Natural Science Foundation of China (No. 61272273)
Corresponding Author: JING Xiaoyuan, born in 1971, Ph.D., professor. His research interests include pattern recognition, machine learning and software engineering.
About authors:
WANG Tiejian, born in 1982, Ph.D. candidate. His research interests include pattern recognition, machine learning and software engineering.
WU Fei, born in 1989, Ph.D., lecturer. His research interests include pattern recognition, machine learning and software engineering.