1. Research Center for Language Intelligence, Dalian University of Foreign Languages, Dalian 116044;
2. School of Software, Dalian University of Foreign Languages, Dalian 116044
Automatic citation intent classification is an active research topic in bibliometrics. Existing citation intent classification models are limited in how they extract textual features and in how they fuse citation context features with external citation features. To address this, a citation intent classification method based on MPNet pretraining and multi-head attention feature fusion is proposed. A position compensation structure is introduced to improve on the masked language model and the permuted language model. A feature extraction method for the citation intent classification task is proposed that combines the syntactic word-frequency features and the structural features of citations. A multi-head attention mechanism is then introduced for feature fusion to improve classification accuracy. Experimental results on the ACL-ARC dataset demonstrate that the proposed method achieves better performance on the citation intent classification task and remains robust on imbalanced data.
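The fusion step can be illustrated with a minimal sketch of multi-head scaled dot-product attention over a small set of feature vectors, for example one vector from the text encoder and one from the hand-crafted citation features. This is not the implementation used in the paper: the function names are hypothetical, the learned query, key, and value projections of a real multi-head attention layer are omitted (identity projections are assumed), and mean pooling over the attended vectors is an assumed choice.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_heads(vec, num_heads):
    """Cut one feature vector into num_heads equal-sized slices."""
    size = len(vec) // num_heads
    return [vec[i * size:(i + 1) * size] for i in range(num_heads)]


def multi_head_fuse(features, num_heads):
    """Fuse equal-length feature vectors with multi-head self-attention.

    Assumption: identity Q/K/V projections stand in for the learned
    projection matrices of a trained attention layer.
    """
    dim = len(features[0])
    if any(len(f) != dim for f in features) or dim % num_heads != 0:
        raise ValueError("vectors must share a length divisible by num_heads")
    head_dim = dim // num_heads
    heads = [split_heads(f, num_heads) for f in features]

    attended = []
    for query_idx in range(len(features)):
        row = []
        for h in range(num_heads):
            query = heads[query_idx][h]
            # Scaled dot-product score of this source against every source.
            scores = [
                sum(a * b for a, b in zip(query, heads[k][h])) / math.sqrt(head_dim)
                for k in range(len(features))
            ]
            weights = softmax(scores)
            # Weighted sum of the value slices, appended head by head.
            for j in range(head_dim):
                row.append(sum(w * heads[k][h][j] for k, w in enumerate(weights)))
        attended.append(row)

    # Assumed pooling choice: average the attended vectors into one.
    return [sum(row[j] for row in attended) / len(attended) for j in range(dim)]


if __name__ == "__main__":
    text_features = [0.2, 0.9, 0.1, 0.4]      # hypothetical encoder output
    external_features = [0.7, 0.1, 0.5, 0.3]  # hypothetical hand-crafted features
    print(multi_head_fuse([text_features, external_features], num_heads=2))
```

In a trained model the fused vector would typically be passed to a classification layer over the intent labels; that step is left out of the sketch.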