Abstract: Cross-domain sentiment classification has recently attracted increasing attention in the field of natural language processing. It aims to predict the sentiment polarity of texts in a target domain with the help of labeled texts from a source domain. Traditional supervised classification approaches usually perform poorly in this setting because the data distributions of the two domains differ. In this paper, a weighted SimRank algorithm is proposed to address this problem. The weighted SimRank algorithm is used to construct a Latent Feature Space (LFS) based on feature similarity. Each sample is then reweighted by a mapping function learned from the LFS. By reducing the mismatch between the data distributions of the domains, the algorithm performs well on cross-domain sentiment classification. Experimental results verify the effectiveness of the proposed algorithm.
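The abstract only names the steps, so the following Python code is a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) a bipartite graph whose nodes are words and documents, with term frequencies as edge weights; (2) a weighted SimRank update in which each node's neighbor weights are normalized to sum to 1, s(a, b) = C Σ_i Σ_j p(i|a) p(j|b) s(i, j), with s(a, a) = 1 and decay factor C = 0.8; (3) a linear mapping of a document vector x into the LFS as z = S x, where S is the word-word similarity matrix; and (4) a source-sample weight equal to the cosine similarity between its mapped vector and the centroid of the mapped target documents. The function names `weighted_simrank`, `feature_similarity`, `project`, and `source_weights` are hypothetical.

```python
from math import sqrt


def weighted_simrank(w, c=0.8, iterations=10):
    """Weighted SimRank on an undirected graph given as a symmetric weight matrix.

    Assumed update (one common weighted variant, not necessarily the paper's):
    s(a, b) = c * sum_i sum_j p[i][a] * p[j][b] * s(i, j),
    where p[i][a] = w[i][a] / sum_k w[k][a]; s(a, a) stays fixed at 1.
    """
    n = len(w)
    col_sums = [sum(w[k][a] for k in range(n)) for a in range(n)]
    # Normalize each node's neighbor weights; isolated nodes get an all-zero column.
    p = [[w[i][a] / col_sums[a] if col_sums[a] else 0.0 for a in range(n)]
         for i in range(n)]
    s = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for _ in range(iterations):
        # t = s * p, then new = c * p^T * t, i.e. c * p^T s p.
        t = [[sum(s[i][j] * p[j][b] for j in range(n)) for b in range(n)]
             for i in range(n)]
        new = [[c * sum(p[i][a] * t[i][b] for i in range(n)) for b in range(n)]
               for a in range(n)]
        for a in range(n):
            new[a][a] = 1.0
        s = new
    return s


def feature_similarity(tf, c=0.8, iterations=10):
    """Word-word similarity from a docs x words term-frequency matrix (assumed input).

    Nodes 0..V-1 are words and nodes V..V+D-1 are documents; an edge weight is
    the frequency of a word in a document (hypothetical graph construction).
    """
    d, v = len(tf), len(tf[0])
    n = v + d
    w = [[0.0] * n for _ in range(n)]
    for doc in range(d):
        for word in range(v):
            w[word][v + doc] = w[v + doc][word] = float(tf[doc][word])
    s = weighted_simrank(w, c, iterations)
    return [row[:v] for row in s[:v]]


def project(x, sim):
    """Hypothetical linear mapping of a document vector into the LFS: z = sim @ x."""
    return [sum(sim[i][j] * x[j] for j in range(len(x))) for i in range(len(sim))]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def source_weights(source_tf, target_tf, c=0.8, iterations=10):
    """Weight each source document by its cosine similarity to the target centroid in the LFS."""
    # Feature similarity is learned from source and target documents together.
    sim = feature_similarity(source_tf + target_tf, c, iterations)
    target_z = [project(x, sim) for x in target_tf]
    centroid = [sum(col) / len(target_z) for col in zip(*target_z)]
    return [cosine(project(x, sim), centroid) for x in source_tf]
```

For example, `source_weights([[2, 1, 0, 0], [0, 0, 1, 0]], [[0, 1, 1, 1], [0, 0, 2, 1]])` gives the second source document, whose vocabulary overlaps the target documents, a larger weight than the first. The resulting weights could then be passed as per-sample weights to any classifier that accepts them; the pure-Python loops keep the sketch dependency-free but are only practical for toy vocabularies.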