Semi-Supervised Learning Based on One-Class Classification<sup>*</sup>


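Abstract: An Ensemble one-class semi-supervised learning algorithm is proposed that combines the strengths of one-class learners and ensemble learning. The algorithm first builds two one-class classifiers, one for each of the two classes in a small labeled dataset. The two classifiers then jointly identify unlabeled samples, and the identified samples are used to adjust and optimize the two classification surfaces already built. Finally, the identified unlabeled data and the labeled data are combined to train a base classifier, and multiple base classifiers are combined into an ensemble that votes on the label of each test sample. Experiments on 5 UCI datasets show that the algorithm improves average recognition accuracy by 4.5% over the tri-training algorithm and by 8.9% over one-class classifiers trained on labeled data alone. The experimental results indicate that the algorithm is effective for semi-supervised problems.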
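The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which one-class classifier, base classifier, or ensemble-diversity mechanism is used, so the centroid-plus-radius one-class model (`SphereOneClass`), the nearest-centroid base classifier, the random-subset sampling, and all function names below are assumptions chosen to keep the example dependency-free.

```python
import math
import random

def centroid(points):
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

class SphereOneClass:
    """Toy one-class model (assumption): accept points near the class centroid."""
    def __init__(self, points, quantile=0.9):
        self.points = list(points)
        self.quantile = quantile
        self._fit()

    def _fit(self):
        self.center = centroid(self.points)
        dists = sorted(math.dist(p, self.center) for p in self.points)
        self.radius = dists[min(len(dists) - 1, int(self.quantile * len(dists)))]

    def accepts(self, x):
        return math.dist(x, self.center) <= self.radius

    def absorb(self, x):
        # Adjust the classification surface with a newly identified sample.
        self.points.append(x)
        self._fit()

def recognize(pos, neg, unlabeled, max_rounds=5):
    """Label unlabeled points that exactly one of the two one-class models accepts."""
    occ = {1: SphereOneClass(pos), 0: SphereOneClass(neg)}
    remaining, recognized = list(unlabeled), []
    for _ in range(max_rounds):
        still_unknown = []
        for x in remaining:
            votes = [label for label, clf in occ.items() if clf.accepts(x)]
            if len(votes) == 1:
                occ[votes[0]].absorb(x)
                recognized.append((x, votes[0]))
            else:
                still_unknown.append(x)
        if len(still_unknown) == len(remaining):
            break  # no new sample was identified in this round
        remaining = still_unknown
    return recognized

def train_base(pos, neg, unlabeled):
    """Base classifier (assumption): nearest centroid on labeled + identified data."""
    extra = recognize(pos, neg, unlabeled)
    c_pos = centroid(pos + [x for x, y in extra if y == 1])
    c_neg = centroid(neg + [x for x, y in extra if y == 0])
    return lambda x: 1 if math.dist(x, c_pos) <= math.dist(x, c_neg) else 0

def train_ensemble(pos, neg, unlabeled, n_base=5, seed=0):
    rng = random.Random(seed)
    members = []
    for _ in range(n_base):
        # Assumption: diversity comes from random subsets of the labeled data.
        sub_pos = rng.sample(pos, max(2, len(pos) * 2 // 3))
        sub_neg = rng.sample(neg, max(2, len(neg) * 2 // 3))
        members.append(train_base(sub_pos, sub_neg, unlabeled))
    # Majority vote over the base classifiers.
    return lambda x: 1 if 2 * sum(m(x) for m in members) > len(members) else 0

# Synthetic two-class data (hypothetical, not the UCI datasets from the paper).
_rng = random.Random(1)
def blob(cx, cy, n):
    return [(_rng.gauss(cx, 0.5), _rng.gauss(cy, 0.5)) for _ in range(n)]

pos, neg = blob(3, 3, 6), blob(-3, -3, 6)
unlabeled = blob(3, 3, 30) + blob(-3, -3, 30)
predict = train_ensemble(pos, neg, unlabeled)
```

Calling `predict((3, 3))` or `predict((-3, -3))` returns the majority-vote label for a test point. Samples accepted by both or neither one-class model are left unlabeled in this sketch, which is one plausible reading of "identified" unlabeled samples rather than a detail confirmed by the abstract.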


Keywords: one-class classifier, semi-supervised learning, ensemble learning, co-training

Abstract: A semi-supervised learning algorithm based on one-class classification is proposed. First, a one-class classifier is built for each class of data in the labeled dataset. Then, unlabeled data are tested by these one-class classifiers, and the classification results are used to adjust and optimize the two classification surfaces. All labeled data and the recognized unlabeled data are used to train a base classifier. The label of a test sample is determined from the classification results of multiple base classifiers. Experimental results on UCI datasets show that the average detection precision of the proposed algorithm is 4.5% higher than that of the tri-training algorithm and 8.9% higher than that of the classifier trained on labeled data alone.

Key words: One-Class Classification, Semi-Supervised Learning, Ensemble Learning, Co-Training

Received: 2008-12-22

Chinese Library Classification (ZTFLH): TP181

Funding: Supported by the National Natural Science Foundation of China (No. 60603029) and the China Postdoctoral Science Foundation (No. 20080441320)

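About the authors: MIAO Zhi-Min, female, born in 1978, Ph.D., engineer. Her main research interests are network security and pattern recognition. E-mail: olivermiao@126.com. ZHAO Lu-Wen, male, born in 1977, Ph.D. candidate. His main research interests are cognitive radio and communication signal processing. HU Gu-Yu, male, born in 1963, professor and doctoral supervisor. His main research interests are network management and network security. WANG Qiong, female, born in 1979, Ph.D. candidate. Her main research interests are network security and pattern recognition.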

Cite this article:

MIAO Zhi-Min, ZHAO Lu-Wen, HU Gu-Yu, WANG Qiong. Semi-Supervised Learning Based on One-Class Classification [J]. 模式识别与人工智能 (Pattern Recognition and Artificial Intelligence), 2009, 22(6): 924-930.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2009/V22/I6/924
