Abstract: In practical applications, data are usually unlabeled, which makes cross-domain adaptation valuable. However, sentiment classification is domain-dependent: a feature space obtained by feature selection on the source domain alone does not represent the characteristics common to both domains and is therefore poorly suited to classifying the target domain. To address this, a feature selection approach for cross-domain sentiment classification, Log-Likelihood Ratio-Term Frequency (LLRTF), is proposed. First, the log-likelihood ratio (LLR) of each feature is computed in the source domain to obtain a discriminative feature space. Then, term-frequency statistics from both domains are incorporated into the LLR so that features that are more important in the target domain are selected. Constructing the feature space with LLRTF reduces the difference between the source domain and the target domain. Experimental results show that LLRTF outperforms the baselines.
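The abstract does not state how LLR and term frequency are combined, so the following is only a minimal sketch of the two-step idea. It computes the LLR statistic in the source domain from a 2x2 term/label contingency table, then reweights it with term frequencies from both domains. The weighting `LLR * tf_tgt / (tf_src + tf_tgt)`, the function names, and the toy documents are all assumptions made for illustration and should not be read as the paper's actual LLRTF formula.

```python
import math
from collections import Counter


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total


def source_llr(src_docs, src_labels):
    """LLR of each term against the sentiment label, from source document counts."""
    pos_df, neg_df = Counter(), Counter()
    n_pos = n_neg = 0
    for tokens, label in zip(src_docs, src_labels):
        if label == 1:
            n_pos += 1
            pos_df.update(set(tokens))
        else:
            n_neg += 1
            neg_df.update(set(tokens))
    return {
        t: llr(pos_df[t], neg_df[t], n_pos - pos_df[t], n_neg - neg_df[t])
        for t in set(pos_df) | set(neg_df)
    }


def relative_tf(docs):
    """Relative term frequency over all tokens of a domain."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def llrtf_scores(src_docs, src_labels, tgt_docs):
    """ASSUMED weighting: LLR * tf_tgt / (tf_src + tf_tgt); not the paper's formula."""
    src_tf, tgt_tf = relative_tf(src_docs), relative_tf(tgt_docs)
    scores = {}
    for term, g2 in source_llr(src_docs, src_labels).items():
        s, t = src_tf[term], tgt_tf.get(term, 0.0)
        scores[term] = g2 * t / (s + t)  # terms absent from the target score 0
    return scores


def select_features(scores, k):
    """Return the k highest-scoring terms."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: labeled source reviews, unlabeled target reviews.
src_docs = [["great", "battery"], ["great", "plot"], ["bad", "plot"], ["bad", "battery"]]
src_labels = [1, 1, 0, 0]
tgt_docs = [["great", "screen"], ["bad", "screen"], ["battery"]]

scores = llrtf_scores(src_docs, src_labels, tgt_docs)
top_terms = select_features(scores, 2)
```

In this toy setting, terms that occur equally often in positive and negative source documents receive an LLR of zero, and terms that never appear in the target domain are suppressed by the assumed frequency weight. Only features that are both discriminative in the source domain and present in the target domain are kept.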