A Semantic Similarity Weighted Query Term Proximity Framework for Information Retrieval
QIAO Ya-Nan1,2, LIU Yue-Hu1, QI Yong1
1. School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049
2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093
Abstract: Traditional proximity retrieval models treat all query terms equally and do not distinguish between the proximities of different query term pairs. This causes the parallel concept effect, which degrades the performance of many information retrieval models based on query term proximity. A semantic similarity weighted query term proximity framework is proposed. In this framework, the query term proximity statistics are weighted by the semantic similarities between query terms, so that the deeper information needs behind a query can be inferred and mined. Experimental results show that the proposed framework greatly improves on traditional proximity retrieval models and effectively avoids the parallel concept effect for short queries.
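The abstract does not state the weighting formula, so the following is only an illustrative sketch of the general idea: a pairwise proximity statistic (here, the minimum positional distance between two query terms in a document) is averaged over query term pairs, with each pair weighted by a semantic similarity score. The function names, the choice of minimum pair distance as the statistic, and the `similarity` lookup table are all assumptions made for illustration. The sketch also uses the similarity directly as the weight, which the abstract neither confirms nor rules out.

```python
from itertools import combinations


def min_pair_distance(doc_tokens, term_a, term_b):
    """Smallest positional gap between any occurrence of term_a and term_b.

    Returns None if either term does not occur in the document.
    """
    pos_a = [i for i, tok in enumerate(doc_tokens) if tok == term_a]
    pos_b = [i for i, tok in enumerate(doc_tokens) if tok == term_b]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)


def weighted_mean_distance(doc_tokens, query_terms, similarity):
    """Similarity-weighted mean of pairwise minimum distances.

    `similarity` is a hypothetical dict mapping a pair of terms to a score
    (assumed here to lie in [0, 1]); missing pairs get weight 0.
    Returns None if no pair contributes any weight.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for a, b in combinations(sorted(set(query_terms)), 2):
        dist = min_pair_distance(doc_tokens, a, b)
        if dist is None:
            continue  # pair not fully present in this document
        # Accept either key order in the lookup table.
        w = similarity.get((a, b), similarity.get((b, a), 0.0))
        total_weight += w
        weighted_sum += w * dist
    if total_weight == 0.0:
        return None
    return weighted_sum / total_weight
```

With uniform weights this reduces to an unweighted average pair distance, which corresponds to treating all query term pairs equally. Non-uniform weights let some pairs count more than others.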