Abstract: An improved random forest algorithm (IRFA) is proposed to handle imbalanced classification and to improve the prediction accuracy for high-value customers in telecom customer churn prediction. The node partition method used to grow each tree is improved: nodes are split according to customer lifetime value. This alleviates the imbalanced data distribution and raises the accuracy of churn prediction for high-value customers. IRFA is applied to customer churn prediction for a telecom company. Experimental results show that, compared with other methods, the proposed algorithm achieves better classification performance and improves the churn prediction accuracy for high-value customers.
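The abstract describes the improvement only at a high level (the split criterion takes customer lifetime value into account). The following minimal Python sketch illustrates one way such a value-weighted node partition could look, assuming the criterion is a Gini impurity weighted by each sample's lifetime value; the function names, the weighting scheme, and the toy data are illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch of a value-weighted split criterion. The abstract does not
# give IRFA's exact formula; weighting the Gini impurity by customer lifetime
# value is an assumption made here for illustration.
import numpy as np

def weighted_gini(labels, weights):
    """Gini impurity where each sample counts by its lifetime-value weight."""
    total = weights.sum()
    if total == 0:
        return 0.0
    impurity = 1.0
    for c in np.unique(labels):
        p = weights[labels == c].sum() / total
        impurity -= p * p
    return impurity

def best_split(feature, labels, values):
    """Choose the threshold on one feature minimizing value-weighted impurity.

    `values` are customer lifetime values used as sample weights, so splits
    that misclassify high-value customers are penalized more heavily.
    """
    best_thr, best_score = None, np.inf
    for thr in np.unique(feature):
        left, right = feature <= thr, feature > thr
        if left.all() or right.all():
            continue
        score = (values[left].sum() * weighted_gini(labels[left], values[left])
                 + values[right].sum() * weighted_gini(labels[right], values[right]))
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score

if __name__ == "__main__":
    # Toy usage: churn labels (1 = churn) with hypothetical lifetime values.
    rng = np.random.default_rng(0)
    monthly_fee = rng.normal(50, 15, size=200)            # one illustrative feature
    churn = (monthly_fee + rng.normal(0, 10, 200) > 60).astype(int)
    lifetime_value = rng.gamma(2.0, 50.0, size=200)       # hypothetical customer value
    print(best_split(monthly_fee, churn, lifetime_value))
```

In a full forest, a criterion of this kind would replace the standard impurity at every node of every tree, leaving the bagging and random feature selection of random forests unchanged.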
DING Jun-Mei, LIU Gui-Quan, LI Hui. The Application of Improved Random Forest in the Telecom Customer Churn Prediction. Pattern Recognition and Artificial Intelligence, 2015, 28(11): 1041-1049.