Abstract: Existing open information extraction methods for extending attribute knowledge bases rely heavily on deep syntactic analysis or effective dictionary rules. As a result, they perform poorly on short texts and achieve low recall. To address this, an iterative bootstrapping algorithm for attribute knowledge base extension based on a word co-occurrence graph is proposed. The co-occurrence relationship between attributes and attribute values is used to extend the knowledge base, and a graph-based community discovery algorithm is designed to identify the core nodes of the communities. Finally, a model based on a convolutional neural network is constructed to denoise the extraction results. Experiments on two real-world datasets show that the proposed method outperforms existing methods.
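The abstract describes the pipeline only at a high level. The following is a minimal sketch of its first two ideas, under simplifying assumptions that are not taken from the paper: texts are already tokenized; the co-occurrence weight of two words is the number of texts containing both; a candidate word is accepted as a value of an attribute when it co-occurs with the attribute name and its summed co-occurrence weight with the already-known values reaches a hypothetical threshold `min_support`; and "core nodes" are approximated by weighted degree inside the induced subgraph, a simple stand-in rather than the paper's community discovery algorithm. All function names and parameters are illustrative, and the CNN denoising stage is not shown.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(texts):
    """Build a weighted, undirected word co-occurrence graph.

    Edge weight = number of texts in which both words appear
    (each distinct word pair is counted at most once per text)."""
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in texts:
        for u, v in combinations(sorted(set(tokens)), 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph


def bootstrap_values(graph, attribute, seed_values, min_support=2, max_iter=10):
    """Iteratively extend the value set of `attribute` (hypothetical rule).

    A word is accepted if it co-occurs with the attribute name and its summed
    co-occurrence weight with the values known so far is >= min_support.
    Stops when an iteration adds nothing or after max_iter rounds."""
    values = set(seed_values)
    for _ in range(max_iter):
        new = set()
        for cand in graph.get(attribute, {}):
            if cand in values:
                continue
            support = sum(graph[cand].get(v, 0) for v in values)
            if support >= min_support:
                new.add(cand)
        if not new:
            break
        values |= new
    return values


def core_nodes(graph, nodes, top_k=2):
    """Stand-in for community core detection: rank `nodes` by weighted degree
    within the subgraph they induce (ties broken alphabetically)."""
    def inner_degree(n):
        return sum(w for m, w in graph.get(n, {}).items() if m in nodes)
    return sorted(nodes, key=lambda n: (-inner_degree(n), n))[:top_k]


if __name__ == "__main__":
    # Toy short texts; "color" is the attribute name, colors are its values.
    texts = [
        ["color", "red", "blue"],
        ["color", "red", "green"],
        ["color", "blue", "green"],
        ["color", "green", "price"],
        ["price", "cheap"],
        ["red", "green"],
    ]
    g = build_cooccurrence_graph(texts)
    colors = bootstrap_values(g, "color", {"red", "blue"}, min_support=2)
    print(sorted(colors))         # ['blue', 'green', 'red']
    print(core_nodes(g, colors))  # ['green', 'red']
```

With `min_support=2`, `green` is accepted in the first round and `price` is rejected. Lowering `min_support` to 1 lets `price` enter on the second round through its single co-occurrence with `green`. This kind of semantic drift is the sort of noisy extraction that a later denoising stage, such as the CNN-based model mentioned in the abstract, would need to filter.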