Twice Regression Learning and Its Application on Software Effort Estimation
YANG Zi-Xu, LI Ming
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023
Abstract: Regression learning is a form of supervised learning that builds models from examples with real-valued labels. It usually needs a large number of training samples to achieve good performance, yet in real applications often only a few training samples can be collected. To address this problem, the neural network ensemble to regression tree (NERT) algorithm is proposed, based on the twice learning framework. Using a virtual sample generation technique, the method exploits two sequential learning stages to alleviate the shortage of training samples and thereby improve performance. By choosing a method with high generalization ability for one stage and a method with strong comprehensibility for the other, a model with both characteristics is obtained. Results on software effort estimation with few training samples show that NERT achieves better performance on these small data sets than existing methods and, owing to its inherent comprehensibility, effectively reveals the key factors in effort estimation.
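The abstract describes NERT only at a high level, so the following is a minimal Python sketch of the general twice-learning idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the tiny hand-written networks trained by plain SGD on bootstrap samples, virtual samples drawn uniformly within the observed feature ranges, original samples keeping their true labels in the second stage, the simple variance-reduction regression tree, and the toy "size/complexity" data. All function names (`train_net`, `build_tree`, `tree_predict`, `twice_learning`) are hypothetical.

```python
import math
import random


def train_net(X, y, hidden=4, epochs=200, lr=0.05, rng=random):
    """Fit a one-hidden-layer tanh network by plain SGD and return a predictor."""
    d = len(X[0])
    # Each hidden unit has d input weights plus a bias stored in the last slot.
    W = [[rng.uniform(-0.5, 0.5) for _ in range(d + 1)] for _ in range(hidden)]
    v = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]  # output weights + bias

    def forward(x):
        h = [math.tanh(sum(w[j] * x[j] for j in range(d)) + w[d]) for w in W]
        return h, sum(v[k] * h[k] for k in range(hidden)) + v[hidden]

    for _ in range(epochs):
        for x, t in zip(X, y):
            h, out = forward(x)
            err = out - t  # gradient of 0.5 * (out - t)^2 w.r.t. out
            for k in range(hidden):
                grad_h = err * v[k] * (1.0 - h[k] ** 2)  # uses v[k] before its update
                v[k] -= lr * err * h[k]
                for j in range(d):
                    W[k][j] -= lr * grad_h * x[j]
                W[k][d] -= lr * grad_h
            v[hidden] -= lr * err
    return lambda x: forward(x)[1]


def sse(vals):
    """Sum of squared deviations from the mean."""
    m = sum(vals) / len(vals)
    return sum((t - m) ** 2 for t in vals)


def build_tree(X, y, depth=0, max_depth=3, min_leaf=5):
    """Grow a regression tree; a leaf is a float, a node is (feature, threshold, left, right)."""
    mean = sum(y) / len(y)
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean
    best = None
    for j in range(len(X[0])):
        for thr in sorted(set(x[j] for x in X)):
            left = [t for x, t in zip(X, y) if x[j] <= thr]
            right = [t for x, t in zip(X, y) if x[j] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, thr)
    if best is None:
        return mean
    _, j, thr = best
    L = [(x, t) for x, t in zip(X, y) if x[j] <= thr]
    R = [(x, t) for x, t in zip(X, y) if x[j] > thr]
    return (j, thr,
            build_tree([x for x, _ in L], [t for _, t in L], depth + 1, max_depth, min_leaf),
            build_tree([x for x, _ in R], [t for _, t in R], depth + 1, max_depth, min_leaf))


def tree_predict(node, x):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node


def twice_learning(X, y, n_nets=5, n_virtual=200, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Stage 1: ensemble of small networks, each trained on a bootstrap sample.
    nets = []
    for _ in range(n_nets):
        idx = [rng.randrange(n) for _ in range(n)]
        nets.append(train_net([X[i] for i in idx], [y[i] for i in idx], rng=rng))
    # Virtual samples: uniform in the observed feature ranges (an assumption),
    # labeled with the ensemble's mean prediction.
    lo = [min(x[j] for x in X) for j in range(d)]
    hi = [max(x[j] for x in X) for j in range(d)]
    Xv = [[rng.uniform(lo[j], hi[j]) for j in range(d)] for _ in range(n_virtual)]
    yv = [sum(net(x) for net in nets) / n_nets for x in Xv]
    # Stage 2: comprehensible tree on original (true labels, an assumption) + virtual data.
    X2, y2 = X + Xv, y + yv
    return build_tree(X2, y2), X2, y2


# Hypothetical toy data: 12 "projects" with two features scaled to [0, 1].
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(12)]
y = [2.0 * s + s * c + data_rng.gauss(0, 0.05) for s, c in X]
tree, X2, y2 = twice_learning(X, y)
```

A prediction for a new project is then `tree_predict(tree, [0.5, 0.5])`, and the nested `(feature, threshold, left, right)` tuples can be read directly as if-then rules, which is the kind of comprehensibility the second stage is meant to provide.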
Received: 13 May 2013