Tag Clustering Method of Joint Topic Model
HU Xuegang1, LI Huizong1,2, PAN Jianhan3, HE Wei1, YANG Hengyu1
1. School of Computer and Information, Hefei University of Technology, Hefei 230009
2. School of Economics and Management, Anhui University of Science and Technology, Huainan 232001
3. School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116
Abstract Improving the clustering quality of social tags is a key problem in recognizing tag semantics. A resource-based joint topic model is proposed for tag clustering. First, the reference relations among resources are used to compute an authority score for each resource with a random walk method. Second, the resource authority scores are used to weight the two binary relations, resource-tag and resource-word. On this basis, a resource-weighted joint latent Dirichlet allocation (LDA) model of words and tags is constructed. The latent topics of the tags are learned iteratively, and each tag is assigned to the cluster in which it has the maximum membership degree. The results show that the proposed method achieves better clustering performance than other resource-based tag clustering methods.
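The first and last steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the exact random walk, so a PageRank-style walk with a damping factor of 0.85 is assumed here, and the function names `authority_scores` and `assign_clusters` are hypothetical. The weighted joint LDA step is not shown; the tag-topic matrix is taken as given.

```python
import numpy as np


def authority_scores(links, n, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style random walk over resource reference relations.

    links: iterable of (src, dst) pairs meaning resource src references dst.
    n: number of resources. damping=0.85 is an assumed value.
    """
    # Column j holds the probabilities of moving from resource j
    # to each resource it references.
    P = np.zeros((n, n))
    for src, dst in links:
        P[dst, src] += 1.0
    out_degree = P.sum(axis=0)
    for j in range(n):
        if out_degree[j] == 0:
            P[:, j] = 1.0 / n  # a resource with no references jumps uniformly
        else:
            P[:, j] /= out_degree[j]

    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = damping * (P @ r) + (1.0 - damping) / n
        converged = np.abs(r_next - r).sum() < tol
        r = r_next
        if converged:
            break
    return r / r.sum()


def assign_clusters(tag_topic):
    """Assign each tag (row) to the topic with the maximum membership degree."""
    return np.argmax(np.asarray(tag_topic), axis=1)


# Toy data (made up): resources 0 and 1 reference 2, and 2 references 0.
scores = authority_scores([(0, 2), (1, 2), (2, 0)], n=3)
clusters = assign_clusters([[0.7, 0.3], [0.2, 0.8]])
```

In this toy graph, resource 2 receives the most references and so ends up with the highest score. Scores of this kind could then serve as the resource weights applied to the resource-tag and resource-word relations.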
Received: 05 May 2016
About the authors:
HU Xuegang, born in 1961, Ph.D., professor. His research interests include data mining and information processing.
LI Huizong (corresponding author), born in 1979, Ph.D., associate professor. His research interests include intelligent information processing.
PAN Jianhan, born in 1983, Ph.D., lecturer. His research interests include transfer learning.
HE Wei, born in 1986, Ph.D. candidate. His research interests include social networks.
YANG Hengyu, born in 1973, Ph.D. candidate. His research interests include information processing.