A Semi-Supervised Rough Set Model for Classification Based on Active Learning and Co-Training
GAO Can, MIAO Duo-Qian, ZHANG Zhi-Fei, LIU Cai-Hui
Department of Computer Science and Technology, College of Electronics and Information Engineering, Tongji University, Shanghai 201804
Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 201804
Abstract: Rough set theory, as an effective supervised learning model, usually relies on a sufficient amount of labeled data to train the classifier. However, in many practical problems, large amounts of unlabeled data are readily available, whereas labeled data are expensive to obtain. In this paper, a semi-supervised rough set model is proposed to deal with partially labeled data. The proposed model first employs two diverse semi-supervised reducts to train its base classifiers on the labeled data. The unlabeled samples on which the two base classifiers disagree are then selected for labeling according to the principle of active learning, after which the updated classifiers learn from each other, each labeling its confident unlabeled samples for its partner. Experimental results on selected UCI datasets show that the proposed model substantially improves classification performance on partially labeled data, and on some datasets even achieves the best performance.
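The training procedure summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the two reducts are assumed to be given as tuples of attribute indices (computing semi-supervised reducts is not shown), each base classifier simply predicts the majority label of the matching equivalence class on its reduct attributes, "disagreement" means both classifiers return different labels, and `oracle` is a hypothetical stand-in for the human expert queried in the active-learning step. The names `train`, `predict`, `active_co_train` and the `threshold` parameter are illustrative only.

```python
from collections import Counter, defaultdict

def train(pairs, reduct):
    """Build a rough-set-style classifier: group labeled samples into
    equivalence classes on the reduct attributes and count labels per class."""
    classes = defaultdict(Counter)
    for x, y in pairs:
        classes[tuple(x[a] for a in reduct)][y] += 1
    return reduct, classes

def predict(model, x):
    """Return (label, confidence), where confidence is the share of the majority
    label in the matching equivalence class; (None, 0.0) if no class matches."""
    reduct, classes = model
    counts = classes.get(tuple(x[a] for a in reduct))  # .get does not insert
    if not counts:
        return None, 0.0
    label, n = counts.most_common(1)[0]
    return label, n / sum(counts.values())

def active_co_train(labeled, unlabeled, reduct_a, reduct_b, oracle,
                    rounds=5, threshold=1.0):
    """Hypothetical active co-training loop over two reduct "views".

    labeled: list of (x, y) pairs; unlabeled: list of attribute tuples x;
    oracle: hypothetical callable standing in for a human expert.
    Returns the two final classifiers and the number of oracle queries.
    """
    pool_a, pool_b = list(labeled), list(labeled)  # each view's training set
    unlabeled = list(unlabeled)
    queries = 0
    for _ in range(rounds):
        if not unlabeled:
            break
        h_a, h_b = train(pool_a, reduct_a), train(pool_b, reduct_b)
        remaining = []
        for x in unlabeled:
            ya, ca = predict(h_a, x)
            yb, cb = predict(h_b, x)
            if ya is not None and yb is not None and ya != yb:
                # Active learning: the classifiers disagree, so query the
                # expert and give the true label to both views.
                y = oracle(x)
                pool_a.append((x, y))
                pool_b.append((x, y))
                queries += 1
                continue
            taught = False
            # Co-training: each classifier labels its confident samples
            # for the other view's training set.
            if ya is not None and ca >= threshold:
                pool_b.append((x, ya))
                taught = True
            if yb is not None and cb >= threshold:
                pool_a.append((x, yb))
                taught = True
            if not taught:
                remaining.append(x)
        if len(remaining) == len(unlabeled):
            break  # nothing new was labeled this round
        unlabeled = remaining
    return train(pool_a, reduct_a), train(pool_b, reduct_b), queries
```

With the default `threshold=1.0`, a classifier only passes on samples whose equivalence class is pure in its current training pool (that is, classes lying in the positive region of its reduct with respect to that pool), which keeps the exchanged pseudo-labels conservative. Lowering the threshold trades label quality for more exchanged samples.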
GAO Can, MIAO Duo-Qian, ZHANG Zhi-Fei, LIU Cai-Hui. A Semi-Supervised Rough Set Model for Classification Based on Active Learning and Co-Training. Pattern Recognition and Artificial Intelligence, 2012, 25(5): 745-754.