SU Lixin¹,², GUO Jiafeng¹,², FAN Yixing¹, LAN Yanyan¹,², CHENG Xueqi³
1. Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; 2. School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100190; 3. Institute of Network Technology, Institute of Computing Technology (YANTAI), Chinese Academy of Sciences, Yantai 264005
Abstract: In existing extractive reading comprehension models, only the answer boundary is utilized as the supervision signal, while the process by which humans label answers is ignored. Consequently, the learned models are prone to capturing superficial features, and their generalization performance is degraded. In this paper, a label-enhanced reading comprehension model (LE-Reader) is proposed to imitate human labeling behavior. The answer-bearing sentence, the answer content, and the answer boundary are learned simultaneously. Since the answer-bearing sentence and the answer content can be derived from the answer boundary, all three types of labels are regarded as supervision signals, and the model is trained via multi-task learning. During prediction, the probabilities from the three predictions are merged to determine the final answer, and thus the generalization performance is improved. Experiments on the SQuAD dataset demonstrate the effectiveness of the LE-Reader model.
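The sketch below illustrates the two ideas summarized in the abstract: deriving the answer-bearing-sentence and answer-content labels from the gold answer boundary, and combining the three supervision signals during training and prediction. It is a minimal illustration, not the paper's exact formulation; the function names, the equal loss weighting, and the multiplicative span-scoring rule are all assumptions.

```python
import torch
import torch.nn.functional as F

def derive_labels(start_idx, end_idx, sentence_spans, seq_len):
    """Derive the two auxiliary label types from the gold answer boundary.

    start_idx, end_idx -- inclusive token positions of the gold answer span
    sentence_spans     -- list of (first_token, last_token) per passage sentence
    """
    content = torch.zeros(seq_len)          # 1.0 for tokens inside the answer
    content[start_idx:end_idx + 1] = 1.0
    # index of the sentence that contains the answer span
    sentence_idx = next(i for i, (s, e) in enumerate(sentence_spans)
                        if s <= start_idx and end_idx <= e)
    return content, sentence_idx

def multitask_loss(start_logits, end_logits, content_logits, sent_logits,
                   start_idx, end_idx, content_labels, sentence_idx):
    """Sum the boundary, content, and sentence losses (equal weights assumed)."""
    boundary = (F.cross_entropy(start_logits.unsqueeze(0), torch.tensor([start_idx]))
                + F.cross_entropy(end_logits.unsqueeze(0), torch.tensor([end_idx])))
    content = F.binary_cross_entropy_with_logits(content_logits, content_labels)
    sentence = F.cross_entropy(sent_logits.unsqueeze(0), torch.tensor([sentence_idx]))
    return boundary + content + sentence

def score_span(i, j, p_start, p_end, p_content, p_sent, token_to_sentence):
    """Merge the three probability sources to score candidate span [i, j]."""
    return (p_start[i] * p_end[j]
            * p_content[i:j + 1].mean()
            * p_sent[token_to_sentence[i]])
```

At prediction time, score_span would be evaluated over all candidate spans up to a maximum length, and the highest-scoring span returned as the answer.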