Abstract: Sequential recommendation can be formalized as a Markov decision process and then cast as a deep reinforcement learning problem. A key step is mining critical information from user sequences, such as preference drift and dependencies between sequences. Most current deep reinforcement learning recommender systems, however, take a sequence of fixed length as input. Inspired by knowledge graphs, a knowledge-guided adaptive sequence reinforcement learning model is proposed. First, using the entity relationships in the knowledge graph, a partial sequence is extracted from the complete user feedback sequence as a drift sequence. The item set in the drift sequence represents the user's current preference, and the sequence length reflects how quickly that preference changes. Then, a gated recurrent unit extracts the user's preference changes and the dependencies between items, while a self-attention mechanism selectively focuses on key item information. Finally, a compound reward function, combining discounted sequence rewards and knowledge graph rewards, is designed to alleviate the sparse reward problem. Experiments on four real-world datasets demonstrate that the proposed model achieves superior recommendation accuracy.
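The abstract does not give the exact rules for extracting the drift sequence or computing the compound reward, so the following Python sketch is only an illustration under stated assumptions, not the paper's method. It assumes that (a) the drift sequence is the longest recent suffix of the history whose items each share at least one knowledge graph entity with the most recent item, and (b) the compound reward is a discounted hit reward over the user's upcoming items plus a weighted cosine similarity between knowledge graph embeddings. The names `drift_sequence`, `compound_reward`, `kg_neighbors`, `embeddings`, `gamma`, and `weight` are all hypothetical.

```python
from math import sqrt


def drift_sequence(history, kg_neighbors):
    """Return a variable-length suffix of `history` (assumed rule).

    Walk backward from the most recent item and keep extending the suffix
    while each earlier item shares at least one knowledge-graph entity
    with the most recent item. Stop at the first item that shares none.
    """
    if not history:
        return []
    anchor = kg_neighbors.get(history[-1], set())
    start = len(history) - 1
    for i in range(len(history) - 2, -1, -1):
        if kg_neighbors.get(history[i], set()) & anchor:
            start = i
        else:
            break
    return history[start:]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compound_reward(recommended, future_items, embeddings, gamma=0.9, weight=0.5):
    """Discounted sequence reward plus a knowledge-graph reward (assumed form).

    Sequence part: gamma**k for every position k at which the recommended
    item appears in the user's upcoming items.
    Knowledge-graph part: cosine similarity between the embeddings of the
    recommended item and the true next item, scaled by `weight`.
    """
    seq_reward = sum(
        gamma ** k for k, item in enumerate(future_items) if item == recommended
    )
    kg_reward = 0.0
    if future_items:
        kg_reward = cosine(embeddings[recommended], embeddings[future_items[0]])
    return seq_reward + weight * kg_reward


# Toy data: items "c" and "d" share entity "z"; "b" does not share it.
toy_history = ["a", "b", "c", "d"]
toy_neighbors = {"a": {"x"}, "b": {"y"}, "c": {"y", "z"}, "d": {"z"}}
toy_embeddings = {"c": [1.0, 0.0], "d": [1.0, 0.0]}

toy_drift = drift_sequence(toy_history, toy_neighbors)
toy_reward = compound_reward("c", ["d", "c"], toy_embeddings, gamma=0.5, weight=0.5)
```

In this sketch the drift sequence length varies with the history itself: a user whose recent items stay within related entities yields a long suffix, while an abrupt change of topic yields a short one. The knowledge graph term gives a non-zero reward even when the recommended item is not an exact hit, which is one way a sparse reward signal can be densified.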