Named Entity Recognition Method for Metallurgical Literature Based on Domain Knowledge Fusion and Phrase Structure Constraints
CHEN Wei1,2, YU Zhengtao1,2, WANG Zhenhan1,2
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504
2. Key Laboratory of Artificial Intelligence in Yunnan Province, Kunming University of Science and Technology, Kunming 650500
Abstract: Metallurgical named entity recognition (NER) aims to identify entities such as metallurgical techniques, processes, terminology, metallic elements, and institutions in metallurgical-domain text. It serves as the foundation for knowledge extraction and organization, hotspot detection, and information retrieval in this field. However, transferring general-domain NER models to the metallurgical field is challenging because annotated data are scarce, entity types differ significantly from those in general domains, and many entities are long. To address these problems, a named entity recognition method for metallurgical literature based on domain knowledge fusion and phrase structure constraints is proposed. The model is fine-tuned on a small amount of annotated metallurgical data to strengthen its understanding of entity structures and related knowledge in the metallurgical domain. During fine-tuning, a metallurgical domain dictionary is leveraged: through character-word matching, domain-specific knowledge is incorporated into the representation layer to improve the transferability of the model. A phrase structure constraint module is designed to address the challenge of recognizing long entities: character-level input sequences are matched against metallurgical-specific entity rules, so that entities conforming to the characteristic structures of metallurgical named entities are recognized. Experiments on metallurgical datasets indicate that the proposed method improves recognition accuracy.
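The character-word matching step can be illustrated with a minimal sketch. The abstract does not specify the matching scheme, so the code below assumes a common lexicon-matching convention in which each character collects the dictionary words it begins (B), sits inside (M), ends (E), or forms on its own (S). The function name `match_lexicon`, the toy dictionary, and the example sentence are hypothetical and are not taken from the paper.

```python
def match_lexicon(sentence, lexicon, max_len=None):
    """Assumed scheme (not confirmed by the paper): for every character,
    collect the dictionary words in which it is the Beginning, Middle,
    or End character, or which it forms as a Single-character word."""
    if max_len is None:
        # Longest dictionary entry bounds the span length we need to try.
        max_len = max((len(w) for w in lexicon), default=0)
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    for start in range(len(sentence)):
        last = min(start + max_len, len(sentence))
        for end in range(start + 1, last + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                sets[start]["S"].add(word)
            else:
                sets[start]["B"].add(word)
                sets[end - 1]["E"].add(word)
                for mid in range(start + 1, end - 1):
                    sets[mid]["M"].add(word)
    return sets


# Hypothetical toy dictionary and sentence ("blast furnace ironmaking process").
toy_lexicon = {"高炉", "炼铁", "高炉炼铁", "铁"}
toy_sentence = "高炉炼铁工艺"
matches = match_lexicon(toy_sentence, toy_lexicon)
for char, groups in zip(toy_sentence, matches):
    print(char, {tag: sorted(words) for tag, words in groups.items() if words})
```

In a model of this kind, the word sets attached to each character would typically be embedded and concatenated with the character representation, which is one way domain dictionary knowledge can reach the representation layer. How the paper performs this fusion is not stated in the abstract.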
CHEN Wei, YU Zhengtao, WANG Zhenhan. Named Entity Recognition Method for Metallurgical Literature Based on Domain Knowledge Fusion and Phrase Structure Constraints. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2024, 37(12): 1094-1106.