Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
Pattern Recognition and Artificial Intelligence, 2024, Vol. 37, Issue 12: 1094-1106    DOI: 10.16451/j.cnki.issn1003-6059.202412005
Research and Applications
Named Entity Recognition Method for Metallurgical Literature Based on Domain Knowledge Fusion and Phrase Structure Constraints
(基于领域知识融合和短语结构约束的冶金文献命名实体识别方法)
CHEN Wei (陈玮)1,2, YU Zhengtao (余正涛)1,2, WANG Zhenhan (王振晗)1,2
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504
2. Key Laboratory of Artificial Intelligence in Yunnan Province, Kunming University of Science and Technology, Kunming 650500

Abstract: Metallurgical named entity recognition (NER) aims to identify entities such as metallurgical techniques, processes, terminology, metallic elements, and institutions in metallurgical texts. It is the foundation for knowledge acquisition and organization, hotspot detection, and information retrieval in this field. However, annotated data are scarce, entity types differ significantly from those of general domains, and long entities are common, all of which make it difficult to transfer general-domain NER models to the metallurgical domain. This paper therefore proposes an NER method for metallurgical literature based on domain knowledge fusion and phrase structure constraints. The model is fine-tuned on a small amount of annotated metallurgical entity data to strengthen the transferred model's understanding of entity structures and related knowledge in the metallurgical domain. During fine-tuning, two mechanisms are applied. First, a metallurgical domain dictionary is used: characters and words are matched against dictionary entries, and the associated domain knowledge is incorporated into the representation layer to improve transferability. Second, to address the matching of long entities, a phrase structure constraint module matches character-level input sequences against metallurgy-specific entity rules, so that entities conforming to the characteristic structures of metallurgical named entities are recognized. Experiments on metallurgical datasets show that the proposed method improves accuracy.
Key words: Chinese Named Entity Recognition; Metallurgical Literature Mining; Transfer Learning; Domain Knowledge Integration; Phrase Structure Constraint
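The two mechanisms named in the abstract, dictionary matching at the character level and rule-based matching of long entities, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy lexicon `LEXICON`, the head-word rule `HEAD_WORDS`, the B/M/E/S position roles, and all function names are assumptions made for illustration, and the paper's actual dictionary, rules, and neural representation layer are not described on this page.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, matched word)

# Hypothetical toy lexicon; the paper's real domain dictionary is not shown here.
LEXICON: Set[str] = {"高炉", "炼铁", "高炉炼铁", "工艺", "铜"}
# Hypothetical structure rule: one or more lexicon words followed by a head word.
HEAD_WORDS: Set[str] = {"工艺", "技术", "法"}


def match_lexicon(text: str, lexicon: Set[str], max_len: int = 6) -> List[Span]:
    """Find every lexicon word that occurs in the character sequence."""
    spans: List[Span] = []
    for start in range(len(text)):
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in lexicon:
                spans.append((start, end, word))
    return spans


def char_lexicon_features(text: str, spans: List[Span]) -> List[Dict[str, List[str]]]:
    """Group matched words per character by role: Begin, Middle, End, Single."""
    feats: List[Dict[str, List[str]]] = [
        {"B": [], "M": [], "E": [], "S": []} for _ in text
    ]
    for start, end, word in spans:
        if end - start == 1:
            feats[start]["S"].append(word)
            continue
        feats[start]["B"].append(word)
        feats[end - 1]["E"].append(word)
        for i in range(start + 1, end - 1):
            feats[i]["M"].append(word)
    return feats


def long_entity_candidates(text: str, spans: List[Span], head_words: Set[str]) -> List[Span]:
    """Chain adjacent lexicon matches; keep chains of 2+ words ending in a head word."""
    by_start: Dict[int, List[Tuple[int, str]]] = {}
    for start, end, word in spans:
        by_start.setdefault(start, []).append((end, word))
    found: Set[Span] = set()
    for begin in by_start:
        stack = [(begin, 0)]  # (current position, number of words consumed)
        while stack:
            pos, n_words = stack.pop()
            for end, word in by_start.get(pos, []):
                if n_words >= 1 and word in head_words:
                    found.add((begin, end, text[begin:end]))
                stack.append((end, n_words + 1))
    return sorted(found)


if __name__ == "__main__":
    sentence = "采用高炉炼铁工艺生产"
    matches = match_lexicon(sentence, LEXICON)
    print(char_lexicon_features(sentence, matches)[2])
    print(long_entity_candidates(sentence, matches, HEAD_WORDS))
```

In a full model, the per-character word sets would typically be embedded and combined with the character representation, and the rule-matched spans would act as constraints or extra features for the tagger rather than as final predictions. Both of those steps are outside this sketch.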
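Received: 2024-10-11
Chinese Library Classification (ZTFLH): TP 39
Funding: Supported by the National Natural Science Foundation of China (No. U21B2027), the Yunnan Fundamental Research Projects (No. 202401AT070361), and the Open Fund of the Yunnan Key Laboratory of Computer Technology Application (No. 140520200151)
Corresponding author: YU Zhengtao, Ph.D., professor. His research interests include natural language processing and machine translation. E-mail: ztyu@hotmail.com
About the authors: CHEN Wei, Ph.D., lecturer. Her research interests include natural language processing, text mining, and information retrieval. E-mail: chenwei1983@kust.edu.cn
WANG Zhenhan, Ph.D., lecturer. His research interests include natural language processing and machine translation. E-mail: wangzhenhan93@gmail.com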
Cite this article:
陈玮, 余正涛, 王振晗. 基于领域知识融合和短语结构约束的冶金文献命名实体识别方法[J]. 模式识别与人工智能, 2024, 37(12): 1094-1106.
CHEN Wei, YU Zhengtao, WANG Zhenhan. Named Entity Recognition Method for Metallurgical Literature Based on Domain Knowledge Fusion and Phrase Structure Constraints. Pattern Recognition and Artificial Intelligence, 2024, 37(12): 1094-1106.
Link to this article:
http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202412005
or
http://manu46.magtech.com.cn/Jweb_prai/CN/Y2024/V37/I12/1094
Copyright © Editorial Office of Pattern Recognition and Artificial Intelligence
Address: 350 Shushanhu Road, Hefei, Anhui Province    Tel: 0551-65591176    Fax: 0551-65591176    Email: bjb@iim.ac.cn