Journal of Xiamen University (Natural Science)

Most state-of-the-art neural machine translation models rely on multi-layer network structures, and such deep structures are prone to information degradation. To address this problem, this paper fuses the output information of different layers to improve the residual connections between them, so that the layers are more closely connected. Compared with the original residual connection, the proposed approach further optimizes the information flow structure of the deep network, allowing useful information to flow more fully through the whole structure. Experiments are carried out on the Transformer model and on the convolutional sequence to sequence (Conv S2S) model. Results on a large-scale Chinese-English translation task show that the method improves the translation performance of both Transformer and Conv S2S.
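The abstract does not spell out the fusion rule, so the following is only a minimal sketch of the general idea under a stated assumption: the skip path of each layer carries the average of all earlier layer outputs instead of just the previous one. The names `residual_stack`, `fused_stack`, and `scale`, and the averaging rule itself, are hypothetical illustrations and are not taken from the paper.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean over a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def residual_stack(x: Vector, layers: List[Layer]) -> Vector:
    """Plain residual connections: h_{l+1} = h_l + F_l(h_l)."""
    h = x
    for layer in layers:
        h = add(h, layer(h))
    return h


def fused_stack(x: Vector, layers: List[Layer]) -> Vector:
    """Assumed fusion rule: h_{l+1} = mean(h_0 .. h_l) + F_l(h_l)."""
    outputs = [x]
    for layer in layers:
        skip = mean(outputs)  # fuse every earlier output (assumption: average)
        outputs.append(add(skip, layer(outputs[-1])))
    return outputs[-1]


def scale(factor: float) -> Layer:
    """Toy stand-in for a sub-layer: multiplies its input by a constant."""
    return lambda h: [factor * v for v in h]


layers = [scale(0.5), scale(0.5)]
plain = residual_stack([2.0, 4.0], layers)
fused = fused_stack([2.0, 4.0], layers)
```

In this toy setting the two stacks give different outputs because the fused skip path mixes the input back into the second layer. A real system would likely use a learned fusion (for example a gate or a linear projection) rather than a plain average.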

Introduction
1 Baseline systems
2 Multi-layer information fusion model
3 Experiments
4 Conclusion

Fig.1 Transformer baseline system and its improvement

Fig.2 Conv S2S baseline system and its improvement

Tab.1 BLEU scores (%) of Transformer with different fusion methods

Tab.2 Speed of Transformer with different fusion methods

Tab.3 1-gram BLEU scores (%) of Transformer with different fusion methods

Tab.4 German-English translation results (%) of Transformer systems with different fusion methods

Tab.5 BLEU scores (%) of Conv S2S with different fusion methods

Tab.6 Speed of Conv S2S with different fusion methods

Tab.7 1-gram BLEU scores (%) of Conv S2S with different fusion methods

Tab.8 German-English BLEU scores of Conv S2S with different fusion methods

Tab.9 Translation examples

