Biomedical Named Entity Recognition Based on Deep Conditional Random Fields
SUN Xiao¹, SUN Chongyuan¹, REN Fuji¹,²
1.Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, School of Computer and Information, Hefei University of Technology, Hefei 230009 2.Faculty of Engineering, University of Tokushima, Tokushima, Japan 770-8506
Abstract: Biomedical named entity recognition is a fundamental and key step in bioinformatics. In this paper, a biomedical named entity recognition method based on deep conditional random fields is proposed. A multi-layer deep conditional random field is constructed by stacking linear-chain conditional random fields, and the optimal feature set is built with an incremental learning strategy. Finally, two error-correction algorithms, one based on full-name/abbreviation pairs and one based on domain knowledge, are applied to further refine the recognition results. Experiments on the JNLPBA biomedical named entity recognition corpus demonstrate the effectiveness of the proposed method.
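The stacking idea can be illustrated with a small, inference-only sketch. It assumes, in the spirit of deep-structured CRFs, that each layer is an ordinary linear-chain CRF whose input is the original token features concatenated with the previous layer's per-token label marginals. The weights, the feature encoding, and all helper names (`crf_marginals`, `emissions`, `deep_crf_forward`, `posterior_decode`) are hypothetical; training and the incremental feature selection are omitted, and the paper's exact inter-layer features may differ.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def crf_marginals(emit: Matrix, trans: Matrix) -> Matrix:
    """Forward-backward inference for one linear-chain CRF layer.

    emit[t][y]  : score of label y at token t
    trans[a][b] : score of label a followed by label b
    Returns p(y_t = y | x) for every token t and label y.
    """
    n, k = len(emit), len(emit[0])
    # Forward pass: alpha[t][b] = log-sum over all label prefixes ending in b at t.
    alpha = [list(emit[0])]
    for t in range(1, n):
        alpha.append([
            emit[t][b] + logsumexp([alpha[t - 1][a] + trans[a][b] for a in range(k)])
            for b in range(k)
        ])
    # Backward pass: beta[t][a] = log-sum over all label suffixes after a at t.
    beta = [[0.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [
            logsumexp([trans[a][b] + emit[t + 1][b] + beta[t + 1][b] for b in range(k)])
            for a in range(k)
        ]
    log_z = logsumexp(alpha[-1])  # log partition function
    return [[math.exp(alpha[t][y] + beta[t][y] - log_z) for y in range(k)]
            for t in range(n)]


def emissions(inputs: Matrix, weights: Matrix) -> Matrix:
    """Linear emission scores: emit[t][y] = sum_j inputs[t][j] * weights[j][y]."""
    if any(len(row) != len(weights) for row in inputs):
        raise ValueError("each input row needs one weight row per feature")
    k = len(weights[0])
    return [[sum(x * w[y] for x, w in zip(row, weights)) for y in range(k)]
            for row in inputs]


def deep_crf_forward(features: Matrix, layers: List[Dict[str, Matrix]]) -> Matrix:
    """Run stacked CRF layers bottom-up and return the top layer's marginals.

    Assumed stacking rule (hypothetical): layer i+1 sees the raw token
    features concatenated with layer i's label marginals.
    """
    inputs = features
    marginals: Matrix = []
    for layer in layers:
        marginals = crf_marginals(emissions(inputs, layer["weights"]), layer["trans"])
        inputs = [f + m for f, m in zip(features, marginals)]
    return marginals


def posterior_decode(marginals: Matrix) -> List[int]:
    """Pick the most probable label index at each token."""
    return [max(range(len(row)), key=row.__getitem__) for row in marginals]
```

With two input features and two labels, for example, the second layer's `weights` needs 2 + 2 rows, because its input is the raw features plus the first layer's two marginals per token. Feeding marginals rather than hard labels lets the upper layer see how confident the lower layer was; a Viterbi decoder could replace `posterior_decode` when a single globally consistent label sequence is required.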
SUN Xiao, SUN Chongyuan, REN Fuji. Biomedical Named Entity Recognition Based on Deep Conditional Random Fields. Pattern Recognition and Artificial Intelligence, 2016, 29(11): 997-1008.