Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
Pattern Recognition and Artificial Intelligence, 2018, Vol. 31, Issue 12: 1074-1084    DOI: 10.16451/j.cnki.issn1003-6059.201812002
Papers and Reports
Semi-supervised Self-training for Multiple Standpoint Analysis in Social Events
LIN Junjie (林俊杰)1,2, WANG Lei (王磊)1, MAO Wenji (毛文吉)1,2
1. State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049

Abstract: Existing methods for standpoint analysis mainly train standpoint classification models in a supervised or unsupervised manner. Supervised models usually require a large amount of labeled data for training, whereas unsupervised models perform considerably worse than supervised ones. To reduce the need for labeled training data while maintaining model performance, this paper proposes a semi-supervised self-training method for multiple standpoint analysis on social media texts related to social events. In self-training, selecting high-quality samples to add to the training set during iterative training plays a key role in improving model performance. The proposed method therefore first measures the classification confidence of each text based on user-level standpoint consistency. It then uses topic information to further filter high-quality samples for expanding the training set, so that model performance keeps improving across iterations. Experimental results show that the proposed method achieves better standpoint classification performance than representative methods from related work and than other semi-supervised training approaches. In addition, both user-level standpoint consistency and topic information contribute to improving standpoint classification performance.
Key words: Multiple Standpoint Analysis; Semi-supervised; Self-training; User-Level Standpoint Consistency; Topic Information
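The selection procedure described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's formulas, so everything concrete below is an assumption made for illustration: the base classifier (a tiny Naive Bayes), the confidence score (the share of a user's texts whose label agrees with the predicted label), the topic filter (at most `per_topic` texts per topic and round), and all names and thresholds (`self_train`, `min_conf`, `per_topic`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """Fit a tiny multinomial Naive Bayes model on (tokens, label) pairs."""
    label_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    for tokens, label in samples:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the most likely label, using add-one smoothing."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(label_counts[label] / total)
        for w in tokens:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def self_train(labeled, unlabeled, rounds=3, min_conf=0.75, per_topic=2):
    """Self-training loop; samples are dicts with tokens, user, topic (and label)."""
    train, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = train_nb([(s["tokens"], s["label"]) for s in train])
        preds = [predict_nb(model, s["tokens"]) for s in pool]
        # Collect each user's labels: known labels plus current predictions.
        votes = defaultdict(Counter)
        for s in train:
            votes[s["user"]][s["label"]] += 1
        for s, p in zip(pool, preds):
            votes[s["user"]][p] += 1
        # Assumed confidence: share of the user's texts agreeing with the prediction.
        scored = []
        for i, (s, p) in enumerate(zip(pool, preds)):
            v = votes[s["user"]]
            conf = v[p] / sum(v.values())
            if conf >= min_conf:
                scored.append((conf, i, p))
        # Assumed topic filter: keep the most confident texts, capped per topic.
        by_topic = defaultdict(list)
        for conf, i, p in sorted(scored, key=lambda t: (-t[0], t[1])):
            if len(by_topic[pool[i]["topic"]]) < per_topic:
                by_topic[pool[i]["topic"]].append((i, p))
        chosen = {i: p for items in by_topic.values() for i, p in items}
        if not chosen:
            break
        # Move the selected texts into the training set with their pseudo-labels.
        train += [dict(pool[i], label=p) for i, p in chosen.items()]
        pool = [s for i, s in enumerate(pool) if i not in chosen]
    final_model = train_nb([(s["tokens"], s["label"]) for s in train])
    return final_model, train
```

In this sketch a pseudo-labeled text is rejected when its predicted label conflicts with the other labels attributed to the same user, which is one simple reading of "user-level standpoint consistency"; the per-topic cap is likewise only one possible way to use topic information when expanding the training set.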
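Received: 2018-02-12
CLC number (ZTFLH): TP 24
Funding: Supported by the National Natural Science Foundation of China (No. 71702181, 11832001)
About the authors: LIN Junjie, Ph.D. candidate. Research interests: social media analysis and text mining. E-mail: linjunjie2013@ia.ac.cn.
WANG Lei (corresponding author), Ph.D., associate researcher. Research interests: social media information processing. E-mail: lei.wang@ia.ac.cn.
MAO Wenji, Ph.D., researcher. Research interests: artificial intelligence and social computing. E-mail: wenji.mao@ia.ac.cn.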
Cite this article:
LIN Junjie, WANG Lei, MAO Wenji. Semi-supervised Self-training for Multiple Standpoint Analysis in Social Events[J]. Pattern Recognition and Artificial Intelligence, 2018, 31(12): 1074-1084.
Link to this article:
http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.201812002      or     http://manu46.magtech.com.cn/Jweb_prai/CN/Y2018/V31/I12/1074