Abstract: To classify the sentiment of Chinese consumer reviews, a method that combines dictionary-based semantic concepts with contextual semantics is proposed. First, a method for extracting domain-specific benchmark word sets is put forward. Then, sentiment words are extracted using HowNet similarity under a unigram model. Finally, the HowNet and Google similarity distance (HGSD), which combines HowNet similarity with the Google similarity distance, is presented to classify sentences; it reflects both the original meaning of a word and its meaning in context. Experiments on consumer reviews of books, computers and hotels show that the proposed method achieves a higher F-measure, and a contrast experiment confirms the effectiveness of the proposed algorithm.
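As an illustration of how a dictionary score and a context score might be combined, the sketch below computes the normalized Google distance of Cilibrasi and Vitanyi from page-hit counts and blends it with a HowNet similarity value. The abstract does not give the HGSD formula, so the blending rule, the weight `alpha`, the `exp(-ngd)` mapping from distance to similarity, and all function names are assumptions made for illustration only. The HowNet similarity is assumed to be supplied as a number in [0, 1].

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Normalized Google distance from hit counts.

    fx, fy: number of pages containing each term
    fxy:    number of pages containing both terms
    n:      total number of indexed pages (assumed known)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # No occurrences or no co-occurrence: treat as maximally distant.
        return float("inf")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


def combined_similarity(hownet_sim, ngd, alpha=0.5):
    """Hypothetical blend of dictionary and context evidence.

    alpha weights the HowNet score; exp(-ngd) is an assumed mapping
    that sends distance 0 to similarity 1 and infinite distance to 0.
    """
    context_sim = math.exp(-ngd)
    return alpha * hownet_sim + (1 - alpha) * context_sim
```

With this sketch, a candidate word could be scored against each benchmark word and assigned the polarity of the benchmark set with the higher total score; that decision rule is likewise an assumption, not a detail stated in the abstract.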