Rough Co-training Model for Incomplete Weakly Labeled Data
GAO Can (1, 2), ZHOU Jie (1, 2), GAO Tianyu (3), LAI Zhihui (1, 2)
1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060
2. Faculty of Applied Science and Textiles, Hong Kong Polytechnic University, Hong Kong
3. School of Minerals Processing and Bioengineering, Central South University, Changsha 410083
Abstract: To address the problem of learning from incomplete weakly labeled data, a semi-supervised co-training model based on rough set theory is proposed. A semi-supervised discernibility matrix is first defined and then used to generate two sufficient and diverse semi-supervised reducts. Two base classifiers are trained on the labeled data, one on each reduct. The two classifiers then learn from each other on the unlabeled data: each classifier labels the unlabeled examples it is confident about and passes them to its counterpart, until no eligible unlabeled example remains. Experimental results on selected UCI datasets show that the proposed model outperforms the compared models on incomplete weakly labeled data, which verifies its effectiveness.
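The abstract gives no formal definitions, so the following Python sketch only illustrates the general pipeline under stated assumptions; it is not the authors' method. It uses the classic discernibility matrix over labeled data rather than the paper's semi-supervised variant, assumes a missing value (written `?`) is compatible with any value (a tolerance-relation treatment of incomplete data), takes the two reducts as given instead of generating them, and uses a simple majority vote over compatible training rows as the base classifier. The names `tolerant`, `discernibility_entries`, `preserves_discernibility`, `predict`, `co_train` and the `threshold` parameter are hypothetical.

```python
# Illustrative sketch only: classic discernibility matrix, tolerance treatment
# of missing values, given reducts, and a toy voting classifier (all assumed).
from collections import Counter
from itertools import combinations

MISSING = "?"  # assumed marker for a missing (incomplete) attribute value


def tolerant(x, y, attrs):
    """True if x and y agree on every attribute in attrs, where a missing
    value is assumed to be compatible with any value (tolerance relation)."""
    return all(x[a] == y[a] or MISSING in (x[a], y[a]) for a in attrs)


def discernibility_entries(rows, labels, attrs):
    """Classic discernibility entries over labeled rows: for every pair with
    different labels, the set of attributes that certainly separate them."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] != labels[j]:
            entries.append({a for a in attrs
                            if not tolerant(rows[i], rows[j], [a])})
    return entries


def preserves_discernibility(subset, entries):
    """A subset keeps every distinction if it meets each non-empty entry."""
    return all(set(subset) & e for e in entries if e)


def predict(x, train, view):
    """Vote among training rows tolerant to x on the view's attributes.
    Returns (label, share of the majority vote), or (None, 0.0) if no row
    in train is compatible with x."""
    votes = Counter(lab for row, lab in train if tolerant(x, row, view))
    if not votes:
        return None, 0.0
    label, count = votes.most_common(1)[0]
    return label, count / sum(votes.values())


def co_train(labeled, unlabeled, view1, view2, threshold=1.0):
    """Each view labels the unlabeled rows it is confident about and hands
    them to the other view; stops when a round labels nothing new."""
    train = {1: list(labeled), 2: list(labeled)}
    views = {1: view1, 2: view2}
    pool = list(unlabeled)
    while True:
        newly = []  # (row, predicted label, view that receives it)
        for src, dst in ((1, 2), (2, 1)):
            for row in pool:
                label, conf = predict(row, train[src], views[src])
                if label is not None and conf >= threshold:
                    newly.append((row, label, dst))
        if not newly:
            return train
        for row, label, dst in newly:
            train[dst].append((row, label))
        done = {id(row) for row, _, _ in newly}
        pool = [row for row in pool if id(row) not in done]


# Toy usage: two labeled rows, the second missing its value on "c".
rows = [{"a": 1, "b": 0, "c": 1}, {"a": 0, "b": 1, "c": MISSING}]
labels = ["yes", "no"]
entries = discernibility_entries(rows, labels, ["a", "b", "c"])
# "c" is excluded because the missing value cannot separate the two rows,
# so {"a"} and {"b"} each preserve discernibility while {"c"} does not.
unlabeled = [{"a": 1, "b": 2, "c": MISSING}, {"a": 2, "b": 2, "c": 1}]
train = co_train(list(zip(rows, labels)), unlabeled, ["a"], ["b"])
```

In the toy run, the view on attribute `a` labels the first unlabeled row `yes` and passes it to the view on `b`; in the next round the `b` view uses that row to label the second unlabeled row `yes` for the `a` view, after which the pool is empty and the loop stops. Requiring a unanimous vote (`threshold=1.0`) is one simple reading of "confident"; the paper's actual criterion is not stated in the abstract.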