Solution to Large Scale Extraction of Social Relations of Persons Based on Web
YAO CongLei, DI Nan
Computer Networks and Distributed Systems Laboratory, Peking University, Beijing 100871
Abstract Information about social relations between persons is an important type of information on the Web. A lightweight method for extracting such information at large scale is proposed. The minimal descriptive patterns that express social relations in web pages are mined from the Web with the help of simulated annealing. These patterns are then used to extract further social relations of persons, exploiting the redundancy of the Web. Six types of social relations are defined to test the proposed method, and each type is extracted for a specified list of person names, which is itself created from the Web. The experimental results show that the average precision and recall of the proposed method are 84.79% and 81.69%, respectively.
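As a rough illustration of how descriptive patterns can be applied to text and how precision and recall are computed, the following is a minimal sketch. It does not reproduce the paper's method: the simulated-annealing pattern mining step is omitted, and the patterns, relation names ("spouse", "colleague"), sample text, and function names are all hypothetical.

```python
import re

# Hypothetical person-name shape (two capitalized words); illustration only.
NAME = r"[A-Z][a-z]+ [A-Z][a-z]+"

# Hypothetical descriptive patterns, one regex per relation type.
# In the paper such patterns are mined from the Web; here they are hand-written.
PATTERNS = {
    "spouse": re.compile(
        rf"(?P<a>{NAME}) and (?:his|her) (?:wife|husband) (?P<b>{NAME})"
    ),
    "colleague": re.compile(
        rf"(?P<a>{NAME}) and (?:his|her) colleague (?P<b>{NAME})"
    ),
}


def extract_relations(text):
    """Return the set of (person_a, relation, person_b) triples matched in text."""
    found = set()
    for relation, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.add((match.group("a"), relation, match.group("b")))
    return found


def precision_recall(extracted, gold):
    """Precision = correct / extracted; recall = correct / gold."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    text = (
        "Alice Smith and her husband Bob Smith attended. "
        "Carol Jones and her colleague Dan Brown spoke."
    )
    # Hypothetical gold standard; the third triple is not expressed in the text.
    gold = {
        ("Alice Smith", "spouse", "Bob Smith"),
        ("Carol Jones", "colleague", "Dan Brown"),
        ("Eve Adams", "friend", "Frank Moore"),
    }
    extracted = extract_relations(text)
    print(precision_recall(extracted, gold))
```

In this toy setting every extracted triple is correct, but one gold triple is missed, so recall is lower than precision. Averaging such per-relation scores over the relation types would give figures of the kind reported in the abstract.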
Received: 08 December 2006