Sentence-level translation quality estimation based on multi-feature fusion

YE Na*, WANG Yuanyuan, CAI Dongfeng

(Human-Computer Intelligence Research Center, Shenyang Aerospace University, Shenyang 110136, China)

Keywords: translation quality estimation; feature fusion; contextualized word embedding; language representation; syntax

DOI: 10.6043/j.issn.0438-0479.201908032

Unlike traditional machine translation evaluation methods, translation quality estimation aims to assess the quality of machine translation output without reference translations. Currently popular deep learning-based quality estimation methods extract insufficient deep features because of data scarcity and model limitations. To address this problem, we propose a multi-feature fusion method. Word prediction features, contextualized word embedding features, dependency syntax features, and baseline features, each extracted from a different model, are fed separately into downstream models based on recurrent neural networks. After further learning, the resulting representations are combined using different fusion strategies to improve the accuracy of quality estimation. Comparative experiments show that the proposed multi-feature fusion strategy represents bilingual information better than any single feature, and that it further improves the Pearson correlation coefficient and other evaluation metrics for sentence-level translation quality estimation.
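To make the two ideas in the abstract concrete, the following is a minimal sketch of feature fusion and of the Pearson correlation used for evaluation. It is not the implementation used in this work: the abstract does not specify the fusion strategies, so concatenation and score averaging are assumed here as two common choices, and the function names `fuse_concat`, `fuse_average`, and `pearson`, as well as all numeric values, are hypothetical.

```python
import math
from typing import List, Sequence


def fuse_concat(feature_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Assumed fusion strategy 1: concatenate the per-sentence vectors
    produced for each feature type into one longer vector."""
    return [value for vector in feature_vectors for value in vector]


def fuse_average(scores: Sequence[float]) -> float:
    """Assumed fusion strategy 2: average the quality scores predicted
    separately from each feature type."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and gold scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical representations of one sentence pair, one per feature type
    # (word prediction, contextualized embedding, dependency syntax, baseline).
    fused = fuse_concat([[0.1, 0.3], [0.7, 0.2], [0.5], [0.9, 0.4]])
    print(len(fused))

    # Hypothetical predicted and gold quality scores for four sentences.
    predicted = [0.20, 0.35, 0.50, 0.80]
    gold = [0.25, 0.30, 0.55, 0.75]
    print(pearson(predicted, gold))
```

In this sketch, concatenation combines the features before a final prediction is made, whereas averaging combines predictions that were made independently; which of these, if either, corresponds to the strategies compared in the experiments is left open here.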