Query Expansion Combining Copulas Theory and Association Rules Mining
HUANG Mingxuan1,2, HU Xiaochun2
1. Guangxi Key Laboratory of Cross-Border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning 530003; 2. School of Information and Statistics, Guangxi University of Finance and Economics, Nanning 530003
Abstract: Copulas theory is introduced into the mining of association patterns among text feature terms, and a query expansion algorithm combining Copulas theory with association rules mining is proposed. First, the top n documents returned for the initial query are extracted to construct a pseudo-relevance feedback document set (PRFDS) or a user relevance feedback document set (URFDS). Then, support and confidence measures based on Copulas theory are used to mine, from the PRFDS or URFDS, frequent itemsets of feature terms and association rule patterns that contain the original query terms, and expansion terms are drawn from these patterns to expand the query. Experimental results on the NTCIR-5 CLIR Chinese and English corpora show that the proposed algorithm effectively mitigates query topic drift and word mismatch, and improves retrieval performance by raising the quality of expansion terms and reducing the number of invalid ones.
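The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the Gumbel-Hougaard copula as the copula family, the two marginals (the share of feedback documents containing an itemset, and the share of total term weight carried by those documents), the threshold values, and the function names `gumbel_copula`, `copula_support` and `expand_query`. It is not the authors' exact definition of support or confidence.

```python
import math


def gumbel_copula(u, v, theta=2.0):
    """Gumbel-Hougaard copula C(u, v) for theta >= 1 (theta = 1 gives u * v)."""
    u, v = min(u, 1.0), min(v, 1.0)  # guard against rounding slightly above 1
    if u <= 0.0 or v <= 0.0:
        return 0.0
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def copula_support(itemset, docs, theta=2.0):
    """Assumed support: copula of document share and weight share of an itemset."""
    total_mass = sum(sum(d.values()) for d in docs)
    if not docs or total_mass <= 0.0:
        return 0.0
    covered = [d for d in docs if all(t in d for t in itemset)]
    u = len(covered) / len(docs)                            # frequency marginal
    v = sum(sum(d.values()) for d in covered) / total_mass  # weight marginal
    return gumbel_copula(u, v, theta)


def expand_query(query_terms, docs, min_sup=0.3, min_conf=0.5, theta=2.0):
    """Return (term, confidence) pairs for rules q -> t passing both thresholds."""
    vocabulary = {t for d in docs for t in d}
    best = {}
    for q in query_terms:
        sup_q = copula_support([q], docs, theta)
        if sup_q <= 0.0:
            continue
        for t in vocabulary - set(query_terms):
            sup_qt = copula_support([q, t], docs, theta)
            if sup_qt < min_sup:
                continue  # {q, t} is not a frequent itemset
            conf = sup_qt / sup_q  # confidence of the rule q -> t
            if conf >= min_conf:
                best[t] = max(conf, best.get(t, 0.0))
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy feedback set: each document maps a feature term to a made-up weight.
feedback_docs = [
    {"copula": 0.9, "dependence": 0.8, "model": 0.3},
    {"copula": 0.7, "dependence": 0.6},
    {"copula": 0.5, "football": 0.9},
]
print(expand_query(["copula"], feedback_docs))
```

In this toy feedback set, "dependence" co-occurs with the query term in two of the three documents and is the only candidate that passes both thresholds. The weight marginal is computed per document so that the support of {q, t} can never exceed the support of {q}, which keeps the confidence ratio at or below 1.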