Chinese-Burmese Neural Machine Translation Model Based on Attention-Optimized Adversarial Training
LAI Hua¹,², LI Yanduo¹,², ZHANG Siqi¹,², LI Ying¹,², YU Zhengtao¹,², MAO Cunli¹,², HUANG Yuxin¹,²
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; 2. Key Laboratory of Artificial Intelligence in Yunnan Province, Kunming University of Science and Technology, Kunming 650500, China
Abstract: Introducing excessive noise during adversarial training can degrade the robustness of translation models. To address this issue, a Chinese-Burmese neural machine translation method based on attention-optimized adversarial training is proposed. In the training phase, white-box adversarial attacks generate perturbed samples along the gradient direction, and a mixed attention weight filtering strategy is introduced to concentrate the perturbations on the words with the greatest impact on translation quality, thereby improving the specificity of the perturbations without raising the overall noise ratio. In the inference phase, a relative entropy (KL divergence) loss is employed to narrow the gap between the noisy and clean distributions, balancing the model's robustness to noise against its fitting ability on clean data. Experiments on the Burmese-Chinese translation task demonstrate that the proposed method significantly outperforms multiple baseline models.
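The abstract outlines three coupled steps: a gradient-direction perturbation of the source embeddings, attention-weight filtering that restricts the perturbation to the most translation-critical source tokens, and a relative entropy (KL) term that pulls the noisy output distribution toward the clean one. The following is a minimal PyTorch sketch of one such training step, not the authors' implementation: the forward_from_embeddings hook, the attention shape, and the hyperparameters epsilon, top_k_ratio and kl_weight are all illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def adversarial_step(model, embed, src_ids, tgt_ids, gold_ids,
                         epsilon=1.0, top_k_ratio=0.2, kl_weight=1.0):
        # Clean forward pass; src_emb keeps its graph so the embedding
        # table still receives gradients from the final backward pass.
        src_emb = embed(src_ids)                     # (batch, src_len, d_model)
        # Hypothetical hook: returns decoder logits (batch, tgt_len, vocab) and
        # cross-attention averaged over heads (batch, tgt_len, src_len).
        clean_logits, attn = model.forward_from_embeddings(src_emb, tgt_ids)
        clean_loss = F.cross_entropy(clean_logits.transpose(1, 2), gold_ids)

        # White-box attack: perturb embeddings along the loss gradient direction.
        grad, = torch.autograd.grad(clean_loss, src_emb, retain_graph=True)
        delta = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-9)

        # Attention-weight filtering: keep the perturbation only on the top-k
        # source tokens the decoder attends to most; the rest stay clean.
        importance = attn.detach().mean(dim=1)       # (batch, src_len)
        k = max(1, int(top_k_ratio * src_ids.size(1)))
        topk = importance.topk(k, dim=-1).indices
        mask = torch.zeros_like(importance).scatter_(1, topk, 1.0)
        adv_emb = src_emb + delta * mask.unsqueeze(-1)

        # Noisy forward pass plus a relative entropy (KL) consistency term
        # that narrows the gap between noisy and clean output distributions.
        adv_logits, _ = model.forward_from_embeddings(adv_emb, tgt_ids)
        adv_loss = F.cross_entropy(adv_logits.transpose(1, 2), gold_ids)
        kl = F.kl_div(F.log_softmax(adv_logits, dim=-1),
                      F.softmax(clean_logits.detach(), dim=-1),
                      reduction='batchmean')
        return clean_loss + adv_loss + kl_weight * kl

In practice this step would sit inside the usual optimizer loop (loss = adversarial_step(...); loss.backward(); optimizer.step()), with epsilon and top_k_ratio tuned on a development set. The one-directional KL with a detached clean target is a common simplification; the paper's relative entropy loss may be formulated symmetrically.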