MAN Zhibo, MAO Cunli, YU Zhengtao, LI Xunyu, GAO Shengxiang, ZHU Junguo
Multilingual neural machine translation is an effective approach for translating low-resource languages, which have relatively little data available for training machine translation models. Existing methods usually rely on a shared vocabulary for multilingual translation between similar languages such as English, French, and German. Burmese, however, is a typical low-resource language, and the language structures of Chinese, English and Burmese differ considerably. A multilingual joint training method is presented here for Chinese-English-Burmese neural machine translation that alleviates the problem of the limited shared vocabulary among these languages. The rich Chinese-English parallel corpus and the scarce Chinese-Burmese and English-Burmese corpora are jointly trained within the Transformer framework. The model maps the Chinese-Burmese, Chinese-English and English-Burmese vocabularies into the same semantic space on both the encoder and decoder sides to reduce the structural differences among Chinese, English and Burmese, and the shared vocabulary compensates for the scarcity of Chinese-Burmese and English-Burmese data by sharing the Chinese-English training parameters. Tests show that in one-to-many and many-to-many translation scenarios, this method achieves significantly better BLEU scores than the baseline models for Chinese-English, English-Burmese and Chinese-Burmese translations.
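To make the joint-training idea concrete, the data preparation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common target-language-tag technique for many-to-many translation (prepending a token such as `<2my>` to the source) and toy whitespace-tokenized sentence pairs, and it builds one vocabulary over all three languages so that every direction shares the same embedding index space.

```python
# Minimal sketch (illustrative assumption, not the paper's exact pipeline):
# mix all parallel corpora into one joint training set, prepend a
# target-language tag so a single model handles every direction, and
# build ONE shared vocabulary so Chinese, English and Burmese tokens
# live in the same embedding space on the encoder and decoder sides.
from collections import Counter

def tag_pairs(pairs, tgt_lang):
    """Prepend a target-language token to each source sentence."""
    return [(f"<2{tgt_lang}> {src}", tgt) for src, tgt in pairs]

def build_shared_vocab(pairs):
    """One token-to-id map over all languages -> shared embeddings."""
    counts = Counter()
    for src, tgt in pairs:
        counts.update(src.split())
        counts.update(tgt.split())
    return {tok: i for i, tok in enumerate(sorted(counts))}

# Toy corpora: rich zh-en, scarce zh-my and en-my (hypothetical data).
zh_en = tag_pairs([("我 爱 你", "i love you")], "en")
zh_my = tag_pairs([("我 爱 你", "ချစ်တယ်")], "my")
en_my = tag_pairs([("i love you", "ချစ်တယ်")], "my")

joint = zh_en + zh_my + en_my        # joint multilingual training set
vocab = build_shared_vocab(joint)    # single id space for all languages
```

In this setup the abundant Chinese-English pairs dominate the joint set, so the shared parameters learned from them can transfer to the data-poor Burmese directions, which is the effect the abstract describes.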