Journal of Tsinghua University (Science and Technology)  2021, Vol. 61 Issue (9): 927-935    DOI: 10.16511/j.cnki.qhdxxb.2021.22.003
满志博, 毛存礼, 余正涛, 李训宇, 高盛祥, 朱俊国
昆明理工大学 信息工程与自动化学院, 云南省人工智能重点实验室, 昆明 650500
Chinese-English-Burmese neural machine translation based on multilingual joint training
MAN Zhibo, MAO Cunli, YU Zhengtao, LI Xunyu, GAO Shengxiang, ZHU Junguo
Yunnan Key Laboratory of Artificial Intelligence, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
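Abstract Multilingual neural machine translation is an effective approach to low-resource language translation. Existing methods usually rely on a shared vocabulary to handle multilingual translation among similar languages such as English, French, and German. Burmese is a typical low-resource language, and the linguistic structures of Chinese, English, and Burmese differ considerably. To alleviate the limit that these differences place on the size of the shared vocabulary, this paper proposes a Chinese-English-Burmese neural machine translation method based on multilingual joint training. Under the Transformer framework, the abundant Chinese-English parallel corpus is trained jointly with the much smaller Chinese-Burmese and English-Burmese corpora. During training, Chinese, English, and Burmese are mapped into the same semantic space on both the encoder side and the decoder side, which reduces the effect of their structural differences on the shared vocabulary, and the parameters trained on the Chinese-English corpus are shared to compensate for the scarcity of Chinese-Burmese and English-Burmese data. Experimental results show that, in one-to-many and many-to-many translation scenarios, the proposed method achieves clearly higher BLEU scores than the baseline models on Chinese-English, English-Burmese, and Chinese-Burmese translation.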
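Keywords: Chinese-English-Burmese; low-resource language; multilingual neural machine translation; joint training; semantic space mapping; shared parameters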
Abstract: Multilingual neural machine translation is an effective method for translating low-resource languages, which have relatively little data available for training translation models. Existing methods usually rely on a shared vocabulary for multilingual translation between similar languages such as English, French, and German. Burmese, however, is a typical low-resource language, and the language structures of Chinese, English, and Burmese differ considerably. This paper presents a multilingual joint training method for Chinese-English-Burmese neural machine translation that alleviates the limit these differences place on the size of the shared vocabulary. The rich Chinese-English parallel corpus and the much smaller Chinese-Burmese and English-Burmese corpora are trained jointly using the Transformer framework. On both the encoding and decoding sides, the model maps Chinese, English, and Burmese into the same semantic space to reduce the influence of their structural differences on the shared vocabulary. Sharing the parameters trained on the Chinese-English corpus compensates for the lack of Chinese-Burmese and English-Burmese data. Experiments show that, in one-to-many and many-to-many translation scenarios, the method achieves clearly higher BLEU scores than the baseline models for Chinese-English, English-Burmese, and Chinese-Burmese translation.
Key words: Chinese-English-Burmese; low-resource language; multilingual neural machine translation; joint training; semantic space mapping; shared parameters
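As an illustration of the joint training idea in the abstract, the following is a minimal sketch of how a large Chinese-English corpus and small Chinese-Burmese and English-Burmese corpora might be mixed into one training stream. The abstract does not say how the corpora are combined, so the target-language tags (`<2my>` and similar) and the temperature-based sampling used here are common multilingual NMT conventions assumed for illustration, not the authors' confirmed procedure. All function names and the toy placeholder sentences are hypothetical.

```python
import random
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def sampling_weights(sizes: Dict[str, int], temperature: float = 5.0) -> Dict[str, float]:
    """Temperature-smoothed sampling probabilities per corpus.

    temperature=1 keeps the raw size proportions; larger values flatten the
    distribution so that small corpora are sampled more often (assumption).
    """
    scaled = {name: n ** (1.0 / temperature) for name, n in sizes.items()}
    total = sum(scaled.values())
    return {name: s / total for name, s in scaled.items()}


def tag_pair(direction: str, pair: Pair) -> Pair:
    """Prefix the source with a target-language tag, e.g. '<2my>' for 'zh-my'."""
    target_lang = direction.split("-")[1]
    return (f"<2{target_lang}> {pair[0]}", pair[1])


def joint_batches(
    corpora: Dict[str, List[Pair]],
    batch_size: int,
    steps: int,
    temperature: float = 5.0,
    seed: int = 0,
) -> Iterator[List[Pair]]:
    """Yield `steps` batches, each drawn from one translation direction."""
    rng = random.Random(seed)
    weights = sampling_weights({k: len(v) for k, v in corpora.items()}, temperature)
    names = list(corpora)
    probs = [weights[n] for n in names]
    for _ in range(steps):
        direction = rng.choices(names, weights=probs, k=1)[0]
        pairs = rng.choices(corpora[direction], k=batch_size)
        yield [tag_pair(direction, p) for p in pairs]


if __name__ == "__main__":
    # Toy placeholder corpora; real data would be parallel sentences.
    toy = {
        "zh-en": [(f"zh sentence {i}", f"en sentence {i}") for i in range(1000)],
        "zh-my": [(f"zh sentence {i}", f"my sentence {i}") for i in range(10)],
        "en-my": [(f"en sentence {i}", f"my sentence {i}") for i in range(10)],
    }
    print(sampling_weights({k: len(v) for k, v in toy.items()}))
    for batch in joint_batches(toy, batch_size=2, steps=3):
        print(batch)
```

Every batch produced this way could be fed to a single shared encoder-decoder, so the parameters updated on Chinese-English batches are the same ones used for the Burmese directions. How the paper itself maps the three languages into one semantic space is not described in the abstract and is not modeled here.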
Received: 2020-11-30      Published: 2021-08-21
Corresponding author: MAO Cunli (毛存礼), Associate Professor,     E-mail:
满志博, 毛存礼, 余正涛, 李训宇, 高盛祥, 朱俊国. 基于多语言联合训练的汉-英-缅神经机器翻译方法[J]. 清华大学学报(自然科学版), 2021, 61(9): 927-935.
MAN Zhibo, MAO Cunli, YU Zhengtao, LI Xunyu, GAO Shengxiang, ZHU Junguo. Chinese-English-Burmese neural machine translation based on multilingual joint training. Journal of Tsinghua University (Science and Technology), 2021, 61(9): 927-935.
