Multi-layer information fusion for neural machine translation
ZHOU Xiaoqing, DUAN Xiangyu*, YU Hongfei, ZHANG Min

(School of Computer Science and Technology, Soochow University, Suzhou 215006, China)

Keywords: neural machine translation; residual network; fusion

DOI: 10.6043/j.issn.0438-0479.201811012

Abstract

Most state-of-the-art neural machine translation models rely on multi-layer neural network structures, which are prone to information degradation. To address this problem, this paper proposes fusing the output information of different layers to improve the residual connections between them, so that the layers are more closely connected. Compared with the original residual connections, the method further optimizes the information flow structure of the deep network, allowing effective information to flow more fully through the whole structure. Experiments are conducted on the Transformer model and the convolutional sequence to sequence (Conv S2S) model. Results on a large-scale Chinese-English translation task show that the method improves the translation performance of both Transformer and Conv S2S.
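
To make the contrast with a plain residual connection concrete, the following is a minimal sketch and not the paper's actual formulation. The abstract does not specify the fusion function, so averaging all earlier layer outputs is an assumption chosen purely for illustration. The names `residual_stack`, `fused_stack`, `add`, and `mean` are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def residual_stack(x: Vector, layers: List[Layer]) -> Vector:
    """Standard residual stack: each layer's output is added to its own input."""
    h = x
    for layer in layers:
        h = add(h, layer(h))
    return h


def fused_stack(x: Vector, layers: List[Layer]) -> Vector:
    """Illustrative fusion variant (assumed, not taken from the paper):
    the skip path carries the mean of ALL earlier outputs, including the
    input, instead of only the immediately preceding one."""
    outputs = [x]
    for layer in layers:
        fused = mean(outputs)            # fuse information from every earlier layer
        transformed = layer(outputs[-1])  # the layer still reads the latest output
        outputs.append(add(fused, transformed))
    return outputs[-1]
```

With a single layer the two stacks coincide, because the only earlier output is the input itself. They diverge from the second layer onward, where the fused skip path mixes in outputs from lower layers.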