HUA Gui-Chun, ZHANG Min, LIU Yi-Qun, MA Shao-Ping, RU Li-Yun
State Key Laboratory of Intelligent Technology and Systems, Beijing 100084; Tsinghua National Laboratory for Information Science and Technology, Beijing 100084; Department of Computer Science and Technology, Tsinghua University, Beijing 100084
Abstract: Learning to rank, an interdisciplinary field of information retrieval and machine learning, is drawing increasing attention, and many models have been designed to optimize ranking functions. However, few methods take the differences among queries into account. In this paper, queries are modeled as multivariate Gaussian distributions, and the Kullback-Leibler divergence is adopted as the distance measure. Spectral clustering is then applied to group the queries into several clusters, and a separate ranking function is learned for each cluster. Experimental results show that the ranking functions learned with clustering are trained on less data, yet perform comparably to, or even better than, those learned without clustering.
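The abstract names the pipeline (Gaussian per query, KL divergence, spectral clustering) but not its details. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each query is represented by the feature vectors of its candidate documents, adds a small ridge term to each covariance, symmetrizes the (asymmetric) KL divergence by summing both directions, turns distances into affinities with a hypothetical bandwidth `sigma`, and uses Ng-Jordan-Weiss-style normalized spectral clustering with a simple k-means. All function names and parameters are hypothetical.

```python
import numpy as np


def fit_query_gaussian(features, ridge=1e-3):
    """Fit a Gaussian to one query's document features, shape (n_docs, n_features).

    The ridge term (an assumption) keeps the covariance invertible.
    """
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + ridge * np.eye(features.shape[1])
    return mu, cov


def kl_gaussian(mu0, cov0, mu1, cov1):
    """Closed-form KL(N(mu0, cov0) || N(mu1, cov1)):
    0.5 * [tr(S1^-1 S0) + (mu1-mu0)^T S1^-1 (mu1-mu0) - k + ln det S1 - ln det S0]
    """
    k = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - k + logdet1 - logdet0)


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_queries(query_features, n_clusters, sigma=1.0):
    """Cluster queries by symmetrized KL distance between their Gaussians."""
    gaussians = [fit_query_gaussian(f) for f in query_features]
    n = len(gaussians)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = kl_gaussian(*gaussians[i], *gaussians[j]) + kl_gaussian(*gaussians[j], *gaussians[i])
            dist[i, j] = dist[j, i] = d
    # Affinity matrix with zero diagonal, as in Ng-Jordan-Weiss.
    affinity = np.exp(-dist / sigma)
    np.fill_diagonal(affinity, 0.0)
    # Symmetric normalization D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1) + 1e-12)
    norm_affinity = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order: keep the top n_clusters eigenvectors.
    _, vecs = np.linalg.eigh(norm_affinity)
    x = vecs[:, -n_clusters:]
    # Normalize each row to unit length, then cluster the rows.
    y = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    return kmeans(y, n_clusters)
```

Each returned label would then select the training queries for one cluster-specific ranker. Summing both KL directions is one common symmetrization; the choice of `sigma` strongly affects the affinity graph and would need tuning on real data.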