Text Summary Generation Model Based on Sentence Fusion and Self-Supervised Training
ZOU Ao1, HAO Wenning1, JIN Dawei1, CHEN Gang1
1. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007
Abstract To improve the sentence fusion capability of deep neural network text generation techniques, a text summary generation model based on sentence fusion and self-supervised training is proposed. Before training, the data are pre-processed according to the concept of points of correspondence from sentence fusion theory, so that they meet the requirements of model training. Training of the proposed model proceeds in two stages. In the first stage, guided by the distribution of the sentence fusion phenomenon in the dataset, a permutation language model training task is designed with points of correspondence as the minimum semantic unit, strengthening the model's ability to capture the context of fused sentences. In the second stage, an attention masking strategy based on the fusion information controls the information intake of the model during generation, enhancing its fusion ability at the text generation stage. Experiments on an open dataset show that the proposed model is superior on several evaluation metrics, including those based on statistics, deep semantics, and sentence fusion ratio.
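The page does not reproduce the paper's implementation, but the two mechanisms named in the abstract can be sketched briefly. The Python/NumPy sketch below is illustrative only and rests on assumptions: the span representation (start, end), the function names, and the specific masking rule are hypothetical, not the authors' code. The first function samples a permutation-language-model factorization order that keeps each point-of-correspondence span intact; the second builds a source-side attention mask that lets tokens inside correspondence spans attend to one another across spans while other positions see only themselves.

    import numpy as np

    def span_permutation_order(spans, seed=None):
        """Sample a factorization order for a permutation-LM objective in which
        each point-of-correspondence span (start, end) is an indivisible unit:
        spans are shuffled, tokens inside a span keep their original order."""
        rng = np.random.default_rng(seed)
        order = rng.permutation(len(spans))          # shuffle span order
        return np.concatenate([np.arange(*spans[i]) for i in order])

    def fusion_attention_mask(src_len, fused_spans):
        """Build a (src_len x src_len) 0/1 matrix where 1 means "may attend".
        Every token sees itself; tokens inside point-of-correspondence spans
        may also attend across spans, which is one way an attention mask could
        steer the model toward merging the sentences being fused."""
        mask = np.eye(src_len, dtype=np.int8)
        for s, e in fused_spans:
            mask[s:e, s:e] = 1                       # within-span attention
        if fused_spans:
            idx = np.concatenate([np.arange(s, e) for s, e in fused_spans])
            mask[np.ix_(idx, idx)] = 1               # cross-span attention
        return mask

    # Toy usage: three spans covering a 10-token segment, two of them fused.
    print(span_permutation_order([(0, 3), (3, 6), (6, 10)], seed=0))
    print(fusion_attention_mask(10, [(0, 3), (6, 9)]))

Treating whole spans rather than single tokens as the permutation unit reflects the abstract's use of points of correspondence as the minimum semantic unit: the factorization order never splits a correspondence span, so each prediction step conditions on complete units of fused-sentence context.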
Received: 25 February 2022
Fund: National Natural Science Foundation of China (No. 61806221)
Corresponding Author:
HAO Wenning, Ph.D., professor. His research interests include data mining and machine learning.
About the authors: ZOU Ao, Ph.D. candidate. His research interests include natural language processing and deep learning. JIN Dawei, master, associate professor. His research interests include big data and text data mining. CHEN Gang, master, professor. His research interests include data simulation and deep learning.