Research on Tibetan-Chinese bidirectional neural machine translation with multi-strategy segmentation granularity

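(1. School of Computer Science & Technology, Beijing Institute of Technology, Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, Beijing 100081; 2. School of Foreign Languages, Beijing Institute of Technology, Key Laboratory of Language Engineering and Cognitive Computing, Ministry of Industry and Information Technology, Beijing 100081)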

Keywords: syllable-word fusion; Tibetan-Chinese bidirectional; neural machine translation

Multi-strategic granularity of segmentation on Tibetan-Chinese bidirectional neural machine translation
SHA Jiu¹, FENG Chong¹*, ZHANG Tianfu¹, GUO Yuhang¹, LIU Fang²

(1. Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China; 2. Key Laboratory of Language Engineeri

DOI: 10.6043/j.issn.0438-0479.201908030


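Existing machine translation models are usually trained on datasets segmented at word granularity. However, different segmentation granularities carry different grammatical and semantic features and information, so considering word granularity alone limits the efficient training of neural machine translation (NMT) systems. The problem is especially pronounced for Tibetan-related translation because of the linguistic characteristics of Tibetan. We therefore propose a multi-granularity training method for Tibetan-Chinese bidirectional machine translation that uses syllable, word, and syllable-word fusion granularities. In addition, building on an existing attention-based NMT framework, we propose a new NMT model that incorporates a self-attention mechanism into the decoder to capture more target-side information. Experimental results on the CWMT2018 Tibetan-Chinese bilingual dataset show that the multi-granularity training method clearly outperforms baseline systems trained at the other segmentation granularities, and that the NMT model with self-attention in the decoder significantly improves translation quality. Experiments on the WMT2017 German-English bilingual dataset further demonstrate that the method is applicable to other language pairs.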
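The abstract contrasts three segmentation granularities for Tibetan: syllables, words, and a syllable-word fusion. Tibetan syllables are delimited by the tsheg mark (U+0F0B), so syllable segmentation needs no dictionary, whereas word segmentation needs a segmenter. The Python sketch below produces all three views of one sentence. It assumes the input has already been word-segmented with spaces, and its fusion rule (keep words found in a given vocabulary, split the rest into syllables) is a hypothetical illustration, not necessarily the strategy used in the paper; all function names are likewise hypothetical.

```python
# Hypothetical sketch of three segmentation granularities for Tibetan text.
# Assumptions (not taken from the paper): the input is already word-segmented
# with spaces, syllables are delimited by the tsheg mark U+0F0B, and the
# "fusion" rule below is an illustrative back-off, not the authors' method.

TSHEG = "\u0f0b"  # ་ intersyllabic delimiter
SHAD = "\u0f0d"   # ། clause/sentence delimiter


def syllables(word: str) -> list[str]:
    """Split one Tibetan word into syllables on the tsheg, dropping the shad."""
    return [s for s in word.replace(SHAD, "").split(TSHEG) if s]


def word_granularity(sentence: str) -> list[str]:
    """Word tokens: assume the sentence is pre-segmented by spaces."""
    return sentence.split()


def syllable_granularity(sentence: str) -> list[str]:
    """Syllable tokens: split every word further on the tsheg."""
    return [syl for w in word_granularity(sentence) for syl in syllables(w)]


def fused_granularity(sentence: str, vocab: set[str]) -> list[str]:
    """Hypothetical fusion: keep in-vocabulary words whole, back off to syllables."""
    out: list[str] = []
    for w in word_granularity(sentence):
        out.extend([w] if w in vocab else syllables(w))
    return out


# Example with "tashi delek", pre-segmented into two words:
#   syllable_granularity("བཀྲ་ཤིས་ བདེ་ལེགས།")
#     -> ['བཀྲ', 'ཤིས', 'བདེ', 'ལེགས']
#   fused_granularity("བཀྲ་ཤིས་ བདེ་ལེགས།", {"བཀྲ་ཤིས་"})
#     -> ['བཀྲ་ཤིས་', 'བདེ', 'ལེགས']
```

Under this hypothetical rule, frequent words keep their full lexical form while rare words fall back to syllables, which is one way to retain word-level information without letting the vocabulary grow unboundedly.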

Existing machine translation models are usually trained on datasets segmented at word granularity. However, different segmentation granularities carry different grammatical and semantic features, so relying on word granularity alone hinders the efficient training of neural machine translation (NMT) models; this problem is particularly pronounced for Tibetan-related translation because of the linguistic characteristics of Tibetan. For bidirectional Tibetan-Chinese NMT, we therefore propose a multi-granularity training method based on syllables, words, and syllable-word fusion. We also propose a novel NMT model within the attention-based NMT framework, in which a self-attention mechanism is incorporated into the decoder to capture more target-side information. Experimental results on the CWMT2018 Tibetan-Chinese bilingual dataset show that the syllable-word fusion segmentation granularity significantly outperforms the other segmentation granularities, and that integrating self-attention into the decoder substantially improves translation quality. We also use the WMT2017 German-English bilingual dataset to demonstrate that the proposed method is applicable to other language pairs.
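The abstract adds a self-attention mechanism to the decoder of an attention-based NMT model so that each target position can draw on the target words generated before it. The abstract does not specify the layer's details, so the pure-Python sketch below shows only generic single-head scaled dot-product self-attention with a causal mask. Assumptions: identity query/key/value projections (a trained layer learns these matrices), a single head, and no residual connection or layer normalization; the function name is hypothetical.

```python
import math


def causal_self_attention(x: list[list[float]]) -> list[list[float]]:
    """Single-head scaled dot-product self-attention with a causal mask.

    x holds one d-dimensional decoder state per target position.
    Assumption: query, key and value projections are the identity.
    """
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        # Position i attends only to positions 0..i (no future target words).
        scores = [sum(a * b for a, b in zip(q, x[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        m = max(scores)  # subtract the max before exp for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]  # softmax over visible positions
        # Output is the attention-weighted sum of the visible value vectors.
        out.append([sum(w * x[j][k] for j, w in enumerate(weights))
                    for k in range(d)])
    return out


if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in causal_self_attention(states):
        print([round(v, 3) for v in row])
```

The causal mask is what makes this usable during decoding: the output for position i depends only on positions 0..i, so it can be computed before any later target word exists.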