Product Attribute Extraction Based on Feature Selection and Pointwise Mutual Information Pruning
GAO Lei, DAI Xin-Yu, HUANG Shu-Jian, CHEN Jia-Jun
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023

Abstract Product attribute extraction is a key task in sentiment analysis. In this paper, a product attribute extraction method based on feature selection and pointwise mutual information (PMI) pruning is proposed. Firstly, the extraction task is cast as a feature selection task in a classifier: a classification model with l1-norm regularization, such as Lasso, yields a sparse model in which only a small number of important features are selected. Secondly, the selected features are further filtered by a frequency threshold. Finally, the remaining features are pruned with PMI, and the surviving features are output as product attributes. Experiments on Chinese product reviews demonstrate the effectiveness of the proposed method.
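The three steps in the abstract can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation. Each review is treated as a set of tokens with binary presence features. The classifier is plain Lasso (squared loss, solved by cyclic coordinate descent) fitted to a hypothetical 0/1 label per review, because the abstract does not say what the classifier predicts. The frequency threshold is taken to be a document-frequency cutoff. PMI is computed at the document level between each candidate and a hypothetical anchor word (for example the product name). The function names `lasso_cd`, `doc_freq`, `pmi`, `extract_attributes`, the anchor word, and the thresholds `lam`, `min_df`, `min_pmi` are all illustrative.

```python
import math


def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent (fixed number of sweeps, no convergence test).

    Minimises (1/2n) * ||y_c - X_c w||^2 + lam * ||w||_1, where X_c and y_c are
    mean-centred copies of X and y; centring stands in for an unpenalised intercept.
    """
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]
    residual = [yi - y_mean for yi in y]  # r = y_c - X_c w, with w = 0 at the start
    w = [0.0] * d
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # constant column (e.g. a word in every review): weight stays 0
            # (1/n) * x_j^T (r + x_j w_j): correlation of feature j with the partial residual
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            # soft-thresholding: the l1 penalty sets weakly correlated features exactly to 0
            if rho > lam:
                new_w = (rho - lam) / col_sq[j]
            elif rho < -lam:
                new_w = (rho + lam) / col_sq[j]
            else:
                new_w = 0.0
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                w[j] = new_w
    return w


def doc_freq(docs, term):
    """Number of reviews (token sets) containing term."""
    return sum(1 for doc in docs if term in doc)


def pmi(docs, a, b):
    """Document-level PMI: log(P(a, b) / (P(a) * P(b))); -inf if a and b never co-occur."""
    n = len(docs)
    joint = sum(1 for doc in docs if a in doc and b in doc) / n
    if joint == 0.0:
        return float("-inf")
    return math.log(joint / ((doc_freq(docs, a) / n) * (doc_freq(docs, b) / n)))


def extract_attributes(docs, labels, anchor, lam=0.05, min_df=2, min_pmi=0.0):
    """Hypothetical three-step pipeline: l1 selection -> frequency threshold -> PMI pruning."""
    docs = [set(doc) for doc in docs]
    # Step 1: binary bag-of-words features (anchor word excluded); keep nonzero Lasso weights
    vocab = sorted(set().union(*docs) - {anchor})
    X = [[1.0 if t in doc else 0.0 for t in vocab] for doc in docs]
    w = lasso_cd(X, labels, lam)
    selected = [t for t, wt in zip(vocab, w) if wt != 0.0]
    # Step 2: drop candidates that occur in fewer than min_df reviews
    frequent = [t for t in selected if doc_freq(docs, t) >= min_df]
    # Step 3: keep candidates whose PMI with the (assumed) anchor word reaches min_pmi
    return [t for t in frequent if pmi(docs, t, anchor) >= min_pmi]


# Toy, made-up reviews (already tokenised) with a hypothetical 0/1 label each
reviews = [
    ["phone", "screen", "price", "the"],
    ["phone", "screen", "the"],
    ["price", "the"],
    ["the"],
]
labels = [1, 1, 0, 0]
print(extract_attributes(reviews, labels, anchor="phone"))  # ['screen']
```

On this toy input, "price" is uncorrelated with the label, so the l1 penalty sets its weight exactly to zero. "the" occurs in every review, so its centred column is constant and it is never selected. Only "screen" survives all three filters. Raising `lam` (for example to 10) or `min_df` (for example to 3) empties the result, which shows how each threshold tightens the candidate set.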
Received: 30 August 2013