Hierarchical multi-features combination model for Uyghur-Chinese machine translation
PAN Yirong1,2,3, LI Xiao1,2,3*, YANG Yating1,2,3, DONG Rui1,2,3

(1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China)

Keywords: Uyghur-Chinese machine translation; morphological complexity; hierarchical fusion; grammatical features; additional information

DOI: 10.6043/j.issn.0438-0479.201909003

To address the morphological complexity and data sparseness of Uyghur in Uyghur-Chinese machine translation, we propose a neural machine translation model that hierarchically combines multiple Uyghur linguistic features. The model uses four features (lemma, part-of-speech tag, affix, and affix morphology) as additional source-side information to enrich Uyghur sentences that would otherwise be represented only by their surface word forms. It also introduces a hierarchical multi-feature combination network that processes the lemma-level and affix-level features hierarchically, strengthening the translation system's ability to learn Uyghur syntactic structure and semantic knowledge and thereby improving Uyghur-Chinese translation quality. Experiments on a public Uyghur-Chinese dataset show that the proposed model effectively improves the performance of the translation system, with clear gains in both BLEU (bilingual evaluation understudy) and ChrF3 (character n-gram F-score) scores.
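The idea of combining the four features on two levels can be illustrated with a small sketch. The abstract does not specify the network's layers, so everything below is an assumption made for illustration: the grouping (lemma with part-of-speech tag, affix with affix morphology), the concatenate-then-project combination, the toy dimension `EMB_DIM`, the random weights, and all function names are hypothetical, as are the example annotation and its tag names. It is not the authors' implementation.

```python
import math
import random

EMB_DIM = 4  # toy embedding size (assumption, chosen only for readability)


def make_embedder(seed):
    """Return a lookup that assigns each new symbol a fixed random toy vector."""
    rng = random.Random(seed)
    table = {}

    def embed(symbol):
        if symbol not in table:
            table[symbol] = [rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)]
        return table[symbol]

    return embed


def make_projection(seed, in_dim, out_dim):
    """Return a function computing tanh(W x) for a fixed random matrix W."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def project(vec):
        assert len(vec) == in_dim
        return [math.tanh(sum(w * x for w, x in zip(row, vec)))
                for row in weights]

    return project


# One embedding table per feature (hypothetical names).
embed_lemma = make_embedder(1)
embed_pos = make_embedder(2)
embed_affix = make_embedder(3)
embed_affix_morph = make_embedder(4)

# Level 1: one projection per feature group. Level 2: merge the two groups.
lemma_level = make_projection(10, 2 * EMB_DIM, EMB_DIM)
affix_level = make_projection(11, 2 * EMB_DIM, EMB_DIM)
word_level = make_projection(12, 2 * EMB_DIM, EMB_DIM)


def encode_token(lemma, pos, affix, affix_morph):
    """Combine four features of one source token in two stages."""
    # Stage 1a: lemma-level vector from lemma and POS tag (list concatenation).
    lemma_vec = lemma_level(embed_lemma(lemma) + embed_pos(pos))
    # Stage 1b: affix-level vector from affix and affix morphology.
    affix_vec = affix_level(embed_affix(affix) + embed_affix_morph(affix_morph))
    # Stage 2: merge the two level vectors into one token representation.
    return word_level(lemma_vec + affix_vec)


# Illustrative annotation: "kitablar" (books) = lemma "kitab" + plural affix "lar".
token_vec = encode_token("kitab", "N", "lar", "PL")
print(len(token_vec))
```

In a trained system the embedding tables and projection weights would be learned jointly with the translation model, and the resulting token vectors would feed the encoder in place of plain word embeddings. A flat alternative would concatenate all four embeddings in a single step; the two-stage form keeps the lemma-level and affix-level information separate until the final merge.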