Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
  2020, Vol. 33 Issue (3): 249-257    DOI: 10.16451/j.cnki.issn1003-6059.202003006
Researches and Applications
Cost Sensitive Random Forest Classification Algorithm for Highly Unbalanced Data
PING Rui1, ZHOU Shuisheng1, LI Dong1
1.School of Mathematics and Statistics, Xidian University, Xi'an 710126

Abstract  For highly unbalanced data, the self-sampling (bootstrap) method of the traditional cost sensitive random forest algorithm leads to insufficient learning of minority class samples, and the cost sensitive mechanism of the algorithm is easily weakened by the large proportion of majority class samples. To address both problems, a clustering-based weak balance cost sensitive random forest algorithm is proposed. The majority class samples are first clustered, and the weak balance criterion is then applied repeatedly to draw a reduced set of samples from each cluster. Each reduced set of majority class samples is merged with the minority class samples of the original training set, yielding a number of new, still unbalanced datasets, each of which is used to train a cost sensitive decision tree. The proposed algorithm enables the minority class samples to be fully learned, and because the majority class is reduced, the cost sensitive mechanism is less affected by it. Experiments indicate that the proposed algorithm performs better on highly unbalanced datasets.
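The sampling stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the weak balance criterion, so the sketch assumes it means keeping the majority class about `ratio` times the size of the minority class. The names `kmeans`, `weakly_balanced_datasets`, `ratio`, `k`, and `n_datasets` are hypothetical, the choice of k-means as the clustering method is an assumption, and training the cost sensitive decision trees is left out.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means on tuples of floats; returns the non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return [cl for cl in clusters if cl]


def weakly_balanced_datasets(majority, minority, k=3, ratio=2.0,
                             n_datasets=5, seed=0):
    """Build several reduced, still unbalanced datasets (label 0 = majority, 1 = minority).

    Assumption: "weak balance" is modelled as majority ~= ratio * minority.
    """
    rng = random.Random(seed)
    clusters = kmeans(majority, k, seed=seed)
    target = min(len(majority), int(ratio * len(minority)))
    datasets = []
    for _ in range(n_datasets):
        picked = []
        for cl in clusters:
            # Each cluster contributes in proportion to its size, at least one sample.
            n = max(1, round(target * len(cl) / len(majority)))
            picked.extend(rng.sample(cl, min(n, len(cl))))
        # Every dataset keeps all minority samples from the original training set.
        data = [(x, 0) for x in picked] + [(x, 1) for x in minority]
        rng.shuffle(data)
        datasets.append(data)
    return datasets
```

Each returned dataset would then be used to train one cost sensitive decision tree, with the trees combined into a forest. Because every dataset contains all minority samples, none of them is lost to resampling, while the reduced majority share keeps the misclassification costs from being swamped.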
Key words: Imbalanced Data; Cluster Sampling; Cost Sensitive Learning; Random Forest
Received: 19 August 2019     
ZTFLH: TP 181  
Fund: Supported by National Natural Science Foundation of China (No. 61772020)
Corresponding Authors: ZHOU Shuisheng, Ph.D., professor. His research interests include data mining and machine learning.   
About authors: PING Rui, master student. Her research interests include data mining and machine learning. LI Dong, master student. His research interests include data mining and machine learning.
Cite this article:   
PING Rui, ZHOU Shuisheng, LI Dong. Cost Sensitive Random Forest Classification Algorithm for Highly Unbalanced Data[J]. Pattern Recognition and Artificial Intelligence, 2020, 33(3): 249-257.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202003006      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2020/V33/I3/249
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence