Study of Stability of Text Classification Evaluation<sup>*</sup>


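Abstract: Text classification algorithms are generally evaluated with macro-averaged precision, macro-averaged recall, and macro-averaged F₁. However, the scores that the same classifier obtains on different datasets often differ greatly, so a score is meaningful only on the specific dataset where it was measured and says little about other datasets. To address this problem, this paper proposes three factors that characterize the influence of a dataset on classification results and uses them to construct an evaluation measure, newmacroF₁. This measure separates the dataset factors from the evaluation process, so that newmacroF₁ reflects only the classification algorithm itself. Experimental results show that, under this measure, the scores of the same classifier fluctuate little across different datasets. From a classifier's performance on one dataset, its classification quality on another dataset can be approximately computed.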



Abstract: Macro-averaged precision, macro-averaged recall, and macro-averaged F₁ are commonly used to evaluate classification techniques. However, these measures are sensitive to the dataset, so a score is valid only for the specific dataset on which it was obtained and not for others. To solve this problem, three factors are proposed to describe how datasets affect the classification result. A new evaluation measure for categorization, newmacroF₁, is then constructed from these three factors. Experimental results show that the new measure remains stable across different datasets, and that the performance of an algorithm on one dataset can be used, with the help of the new measure, to estimate its precision on other datasets.
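To make the dataset sensitivity described in the abstract concrete, the following is a minimal sketch of the standard macro-averaged measures. It does not reproduce the paper's three factors or newmacroF₁, which the abstract does not define. The helper names (`macro_scores`, `apply_fixed_classifier`), the two-class setup, and the 10% error rate are illustrative assumptions, and macro F₁ is taken here as the mean of the per-class F₁ values, which is one common convention.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) over the classes in y_true.

    Assumption: macro F1 is the unweighted mean of the per-class F1 values.
    """
    classes = sorted(set(y_true))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def apply_fixed_classifier(counts, every=10):
    """Hypothetical two-class classifier with fixed per-class behaviour.

    It mislabels every `every`-th document of each class as the other class,
    so its per-class recall is the same on any dataset.
    """
    other = {"a": "b", "b": "a"}
    y_true, y_pred = [], []
    for label, n in counts.items():
        for i in range(n):
            y_true.append(label)
            y_pred.append(other[label] if i % every == 0 else label)
    return y_true, y_pred


# Same classifier behaviour, two datasets that differ only in class balance.
balanced = macro_scores(*apply_fixed_classifier({"a": 100, "b": 100}))
skewed = macro_scores(*apply_fixed_classifier({"a": 190, "b": 10}))

print("balanced: P=%.3f R=%.3f F1=%.3f" % balanced)
print("skewed:   P=%.3f R=%.3f F1=%.3f" % skewed)
```

In this toy setup the per-class recall is 0.9 on both datasets, yet macro F₁ falls from about 0.90 on the balanced dataset to about 0.71 on the skewed one. The drop comes entirely from the lower precision on the rare class, which is the kind of dataset effect the abstract says the new measure is meant to factor out.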

Key words: Classification Technique, Evaluation Method, Data Mining

Received: 2007-03-06

CLC number: TP391

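Funding: Supported by the Key Program of the National Natural Science Foundation of China (No. 60435020) and the National Natural Science Foundation of China (No. 60573166, 60603056)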

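About the authors: GONG BiHong (龚笔宏), female, born in 1979, Ph.D. candidate. Her main research interests are web page classification and personalized search. E-mail: bihong.gong@yahoo.com.cn. PENG Bo (彭波), male, born in 1976, professor, Ph.D. His main research interest is search engines.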

Cite this article:

GONG BiHong, PENG Bo. Study of Stability of Text Classification Evaluation[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2008, 21(1): 12-17.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2008/V21/I1/12
