Automatic Text Summarization Approach Based on Textual Unit Association Networks
TAO Yu-Hui (1), ZHOU Shui-Geng (1, 2), GUAN Ji-Hong (3)
(1) School of Computer Science, Fudan University, Shanghai 200433
(2) Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433
(3) Department of Computer Science and Technology, Tongji University, Shanghai 200092
Abstract
An automatic text summarization approach based on textual unit association networks is proposed. Word-based and sentence-based association networks are constructed separately. In the word-based network, a new approach is used to compute word weights, and the weight of each sentence is then evaluated from the weights of the words it contains. In the sentence-based network, a new approach is presented that weights the salience of each sentence according to its co-occurrence information. Finally, the most salient sentences are extracted into the output summary until the desired summary length is reached. Experimental results show that the proposed approach achieves better summarization performance than the existing methods it is compared against. Moreover, the proposed term-weighting scheme can also be applied to keyword extraction, text classification, text clustering, and other information retrieval tasks.
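The abstract does not give the actual weighting formulas, so the following Python snippet is only a minimal sketch of the general pipeline under stated assumptions. It assumes that words are linked when they co-occur in the same sentence, that sentences are linked by the number of distinct words they share, and that node weights come from TextRank-style weighted PageRank, which is swapped in here as a stand-in for the paper's own "new" weighting approaches. It further assumes that a sentence's word-based weight is the mean weight of its words and that extraction is greedy under a word budget. All function names (`summarize`, `word_based_scores`, `sentence_based_scores`, `pagerank`), the stopword list, and parameters such as `damping=0.85` are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical minimal stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in", "on", "for", "it", "its"}


def split_sentences(text):
    """Naive splitter on ., ! and ? followed by whitespace (assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS]


def pagerank(nodes, edges, damping=0.85, iters=50):
    """Weighted PageRank by power iteration on an undirected graph.

    `edges` maps frozenset({u, v}) -> weight. This is a TextRank-style
    stand-in centrality, NOT the paper's own weighting scheme.
    """
    if not nodes:
        return {}
    nbrs = defaultdict(dict)
    for pair, w in edges.items():
        u, v = tuple(pair)
        nbrs[u][v] = w
        nbrs[v][u] = w
    out_w = {node: sum(nbrs[node].values()) for node in nodes}
    total = len(nodes)
    score = {node: 1.0 / total for node in nodes}
    for _ in range(iters):
        score = {
            v: (1 - damping) / total
            + damping * sum(score[u] * w / out_w[u] for u, w in nbrs[v].items())
            for v in nodes
        }
    return score


def word_based_scores(token_lists):
    """Word network: edge weight = number of sentences in which two words co-occur."""
    edges = defaultdict(float)
    vocab = set()
    for toks in token_lists:
        uniq = sorted(set(toks))
        vocab.update(uniq)
        for a, b in combinations(uniq, 2):
            edges[frozenset((a, b))] += 1.0
    word_weight = pagerank(vocab, edges)
    # Assumed aggregation: a sentence's weight is the mean weight of its words.
    return [sum(word_weight[w] for w in toks) / len(toks) if toks else 0.0
            for toks in token_lists]


def sentence_based_scores(token_lists):
    """Sentence network: edge weight = number of distinct words two sentences share."""
    sets = [set(t) for t in token_lists]
    edges = {}
    for i, j in combinations(range(len(sets)), 2):
        shared = len(sets[i] & sets[j])
        if shared:
            edges[frozenset((i, j))] = float(shared)
    ranks = pagerank(list(range(len(sets))), edges)
    return [ranks[i] for i in range(len(sets))]


def summarize(text, max_words, mode="word"):
    """Greedily pick the highest-scoring sentences within a word budget."""
    if mode not in ("word", "sentence"):
        raise ValueError("mode must be 'word' or 'sentence'")
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    scorer = word_based_scores if mode == "word" else sentence_based_scores
    scores = scorer(tokens)
    order = sorted(range(len(sentences)), key=lambda i: -scores[i])
    chosen, used = [], 0
    for i in order:
        length = len(sentences[i].split())
        if used + length > max_words:
            continue  # assumption: skip sentences that would overflow the budget
        chosen.append(i)
        used += length
    # Emit selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

For example, `summarize(text, max_words=100, mode="sentence")` would return the selected sentences of `text` in document order. Restoring the original order after greedy selection is a common choice in extractive summarizers because it keeps the summary readable; the paper may order its output differently.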
Received: 18 April 2008
|
|
|
|
|
|
|