Document-Level Neural Machine Translation with Target-Side Historical Information Fusion
WANG Xiaocong¹·², YU Zhengtao¹·², ZHANG Yuan¹·², GAO Shengxiang¹·², LAI Hua¹·², LI Ying¹·²
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504; 2. Key Laboratory of Artificial Intelligence in Yunnan Province, Kunming University of Science and Technology, Kunming 650500
Abstract: Existing document-level neural machine translation methods struggle to capture long-distance contextual information on the target side, which often results in incoherent translations. To address this issue, a document-level neural machine translation method with target-side historical information fusion is proposed. First, contextual representations of the source language are derived via a multi-head self-attention mechanism. Second, preceding-context representations of the target language are obtained using another multi-head self-attention mechanism. Next, attention with linear biases (ALiBi) is employed to dynamically inject the historical information into the current target-language representation. Finally, a higher-quality translation is produced by integrating the source-language representation with the enhanced preceding-context representation of the target language. Experimental results on multiple datasets demonstrate the superior performance of the proposed method. Moreover, the method effectively improves the coherence and completeness of document-level translations by incorporating long-sequence information modeled through a recurrence mechanism during decoding.
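The fusion step is the core of the method as described above: current target-side states attend over cached states of previously translated text, with an ALiBi penalty that grows linearly with distance into the past, so recent history weighs more than distant history. The following PyTorch sketch illustrates one plausible form of that step; the module name HistoryFusionAttention, the dimensions, and the history-caching scheme are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch: ALiBi-biased attention that fuses cached target-side
# history into the current target representation. Names and shapes are
# illustrative assumptions, not the paper's code.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric per-head slopes from Press et al. (2021); exact when
    # n_heads is a power of two.
    start = 2.0 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])


class HistoryFusionAttention(nn.Module):
    """Attend from current target states to [history; current] keys/values,
    with an ALiBi penalty that increases linearly with distance."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.d_k = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)
        self.register_buffer("slopes", alibi_slopes(n_heads))

    def forward(self, current: torch.Tensor, history: torch.Tensor) -> torch.Tensor:
        # current: (B, T, d_model) states of the sentence being decoded
        # history: (B, H, d_model) cached states of previously decoded text
        B, T, _ = current.shape
        H = history.size(1)
        kv = torch.cat([history, current], dim=1)               # (B, H+T, d)

        q = self.q_proj(current).view(B, T, self.h, self.d_k).transpose(1, 2)
        k = self.k_proj(kv).view(B, H + T, self.h, self.d_k).transpose(1, 2)
        v = self.v_proj(kv).view(B, H + T, self.h, self.d_k).transpose(1, 2)

        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)  # (B,h,T,H+T)

        # ALiBi bias: subtract slope * distance from query to key, so older
        # history positions receive a larger linear penalty.
        q_pos = torch.arange(H, H + T, device=current.device)
        k_pos = torch.arange(H + T, device=current.device)
        dist = (q_pos[:, None] - k_pos[None, :]).clamp(min=0).float()
        scores = scores - self.slopes.view(1, self.h, 1, 1) * dist

        # Causal mask: a query may not attend to future positions.
        scores = scores.masked_fill(k_pos[None, :] > q_pos[:, None], float("-inf"))

        ctx = F.softmax(scores, dim=-1) @ v                      # (B,h,T,d_k)
        ctx = ctx.transpose(1, 2).reshape(B, T, self.h * self.d_k)
        return self.out(ctx)


# Usage: fuse 20 cached history positions into 5 current positions.
attn = HistoryFusionAttention()
hist = torch.randn(2, 20, 512)
cur = torch.randn(2, 5, 512)
print(attn(cur, hist).shape)  # torch.Size([2, 5, 512])
```

Because the per-head slopes form a geometric sequence, different heads discount distance at different rates: heads with small slopes can still reach far back into the cached history, while heads with large slopes focus on the immediately preceding context. This is what allows the bias to inject history "dynamically" without any learned positional parameters, which also makes the mechanism extrapolate to histories longer than those seen in training.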