Document-Level Neural Machine Translation with Target-Side Historical Information Fusion
WANG Xiaocong¹·², YU Zhengtao¹·², ZHANG Yuan¹·², GAO Shengxiang¹·², LAI Hua¹·², LI Ying¹·²
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504; 2. Key Laboratory of Artificial Intelligence in Yunnan Province, Kunming University of Science and Technology, Kunming 650500
Abstract: Existing document-level neural machine translation methods struggle to capture long-distance contextual information on the target side, which often results in incoherent translations. To address this issue, a document-level neural machine translation method with target-side historical information fusion is proposed. First, contextual representations of the source language are derived via a multi-head self-attention mechanism. Second, preceding-context representations of the target language are obtained using another multi-head self-attention mechanism. Next, attention with linear biases (ALiBi) is employed to dynamically inject the historical information into the current target-language representation. Finally, a higher-quality translation is produced by integrating the source-language representation with the enhanced preceding-context representation of the target language. Experimental results on multiple datasets demonstrate the superior performance of the proposed method. Moreover, the method effectively improves the coherence and completeness of document-level translations by incorporating long-sequence information modeled through a recurrence mechanism during decoding.
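The fusion step is the core of the method as described above: current target-side states attend over cached states of previously translated text, with an ALiBi penalty that grows linearly with distance into the past, so recent history weighs more than distant history. The following PyTorch sketch illustrates one plausible form of that step; the module name HistoryFusionAttention, the dimensions, and the history-caching scheme are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch: ALiBi-biased attention that fuses cached target-side
# history into the current target representation. Names and shapes are
# illustrative assumptions, not the paper's code.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric per-head slopes from Press et al. (2021); exact when
    # n_heads is a power of two.
    start = 2.0 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])


class HistoryFusionAttention(nn.Module):
    """Attend from current target states to [history; current] keys/values,
    with an ALiBi penalty that increases linearly with distance."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.d_k = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)
        self.register_buffer("slopes", alibi_slopes(n_heads))

    def forward(self, current: torch.Tensor, history: torch.Tensor) -> torch.Tensor:
        # current: (B, T, d_model) states of the sentence being decoded
        # history: (B, H, d_model) cached states of previously decoded text
        B, T, _ = current.shape
        H = history.size(1)
        kv = torch.cat([history, current], dim=1)               # (B, H+T, d)

        q = self.q_proj(current).view(B, T, self.h, self.d_k).transpose(1, 2)
        k = self.k_proj(kv).view(B, H + T, self.h, self.d_k).transpose(1, 2)
        v = self.v_proj(kv).view(B, H + T, self.h, self.d_k).transpose(1, 2)

        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)  # (B,h,T,H+T)

        # ALiBi bias: subtract slope * distance from query to key, so older
        # history positions receive a larger linear penalty.
        q_pos = torch.arange(H, H + T, device=current.device)
        k_pos = torch.arange(H + T, device=current.device)
        dist = (q_pos[:, None] - k_pos[None, :]).clamp(min=0).float()
        scores = scores - self.slopes.view(1, self.h, 1, 1) * dist

        # Causal mask: a query may not attend to future positions.
        scores = scores.masked_fill(k_pos[None, :] > q_pos[:, None], float("-inf"))

        ctx = F.softmax(scores, dim=-1) @ v                      # (B,h,T,d_k)
        ctx = ctx.transpose(1, 2).reshape(B, T, self.h * self.d_k)
        return self.out(ctx)


# Usage: fuse 20 cached history positions into 5 current positions.
attn = HistoryFusionAttention()
hist = torch.randn(2, 20, 512)
cur = torch.randn(2, 5, 512)
print(attn(cur, hist).shape)  # torch.Size([2, 5, 512])
```

Because the per-head slopes form a geometric sequence, different heads discount distance at different rates: heads with small slopes can still reach far back into the cached history, while heads with large slopes focus on the immediately preceding context. This is what allows the bias to inject history "dynamically" without any learned positional parameters, which also makes the mechanism extrapolate to histories longer than those seen in training.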