Abstract: An improved random forest algorithm (IRFA) is proposed to handle imbalanced classification and to improve the prediction accuracy for high-value customers in telecom customer churn prediction. The node partition method used to grow each tree is improved: nodes are split according to customer lifetime value. This alleviates the imbalanced data distribution and raises the accuracy of churn prediction for high-value customers. IRFA is applied to customer churn prediction for a telecom company. Experimental results show that, compared with other methods, the proposed algorithm achieves better classification performance and improves the churn prediction accuracy for high-value customers.
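The abstract describes the improvement only at a high level (the split criterion takes customer lifetime value into account). The following minimal Python sketch illustrates one way such a value-weighted node partition could look, assuming the criterion is a Gini impurity weighted by each sample's lifetime value; the function names, the weighting scheme, and the toy data are illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch of a value-weighted split criterion. The abstract does not
# give IRFA's exact formula; weighting the Gini impurity by customer lifetime
# value is an assumption made here for illustration.
import numpy as np

def weighted_gini(labels, weights):
    """Gini impurity where each sample counts by its lifetime-value weight."""
    total = weights.sum()
    if total == 0:
        return 0.0
    impurity = 1.0
    for c in np.unique(labels):
        p = weights[labels == c].sum() / total
        impurity -= p * p
    return impurity

def best_split(feature, labels, values):
    """Choose the threshold on one feature minimizing value-weighted impurity.

    `values` are customer lifetime values used as sample weights, so splits
    that misclassify high-value customers are penalized more heavily.
    """
    best_thr, best_score = None, np.inf
    for thr in np.unique(feature):
        left, right = feature <= thr, feature > thr
        if left.all() or right.all():
            continue
        score = (values[left].sum() * weighted_gini(labels[left], values[left])
                 + values[right].sum() * weighted_gini(labels[right], values[right]))
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score

if __name__ == "__main__":
    # Toy usage: churn labels (1 = churn) with hypothetical lifetime values.
    rng = np.random.default_rng(0)
    monthly_fee = rng.normal(50, 15, size=200)            # one illustrative feature
    churn = (monthly_fee + rng.normal(0, 10, 200) > 60).astype(int)
    lifetime_value = rng.gamma(2.0, 50.0, size=200)       # hypothetical customer value
    print(best_split(monthly_fee, churn, lifetime_value))
```

In a full forest, a criterion of this kind would replace the standard impurity at every node of every tree, leaving the bagging and random feature selection of random forests unchanged.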
DING Jun-Mei, LIU Gui-Quan, LI Hui. The Application of Improved Random Forest in the Telecom Customer Churn Prediction. Pattern Recognition and Artificial Intelligence, 2015, 28(11): 1041-1049.