Table of Contents

[1] ZHOU Xiaoqing, DUAN Xiangyu*, YU Hongfei, et al. Multi-layer information fusion for neural machine translation [J]. Journal of Xiamen University (Natural Science), 2019, 58(02): 149-157. [doi:10.6043/j.issn.0438-0479.201811012]


Journal of Xiamen University (Natural Science) [ISSN: 0438-0479 / CN: 35-1070/N]

Volume: 58
Issue: No. 2, 2019
Pages: 149-157
Column: Machine translation models
Publication date: 2019-03-27

Article Info

Title:
Multi-layer information fusion for neural machine translation
Article ID:
0438-0479(2019)02-0149-09
Author(s):
ZHOU Xiaoqing, DUAN Xiangyu*, YU Hongfei, ZHANG Min
School of Computer Science and Technology, Soochow University, Suzhou 215006, China
Keywords:
neural machine translation; residual network; fusion
CLC number:
TP 391.2
DOI:
10.6043/j.issn.0438-0479.201811012
Document code:
A
Abstract:
Most state-of-the-art neural machine translation models rely on multi-layer network structures, which are prone to information degradation. To address this problem, we propose improving the residual connections between layers by fusing the output information of the layers, so that the layers are more tightly connected. Compared with the original residual connections, the proposed method further optimizes the information-flow structure of the deep network, allowing useful information to flow more fully through the whole model. Experiments are conducted on the Transformer model and the convolutional sequence-to-sequence (Conv S2S) model; results on a large-scale Chinese-English translation task show that the method improves the translation performance of both Transformer and Conv S2S.
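The abstract only sketches the fusion idea, so a short illustration may help. The following PyTorch code is a minimal sketch under stated assumptions: the class name FusedResidualStack, the learned linear projection used as the fusion operator, and the toy feed-forward sub-layers are all hypothetical, since this page does not specify the paper's exact fusion function. The sketch shows the general pattern the abstract describes: each layer receives a fusion of the outputs of all preceding layers rather than only the previous layer's output, while the residual connection itself is kept.

# A minimal sketch (not the paper's exact method): each layer's input is a
# learned fusion of the outputs of all preceding layers, with the residual
# connection applied on top of that fused input.
import torch
import torch.nn as nn

class FusedResidualStack(nn.Module):
    def __init__(self, d_model: int, num_layers: int):
        super().__init__()
        # Toy feed-forward sub-layers standing in for Transformer/Conv S2S layers.
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU())
            for _ in range(num_layers)
        )
        # Hypothetical fusion operator: one linear projection per layer that maps
        # the concatenation of all earlier outputs back to d_model dimensions.
        self.fusions = nn.ModuleList(
            nn.Linear(d_model * (i + 1), d_model) for i in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = [x]  # the embedding acts as the "layer 0" output
        for layer, fusion in zip(self.layers, self.fusions):
            # Fuse every previous output, not just the last one, so shallow-layer
            # information flows directly into deep layers.
            fused = fusion(torch.cat(outputs, dim=-1))
            outputs.append(fused + layer(fused))  # residual connection kept
        return outputs[-1]

if __name__ == "__main__":
    stack = FusedResidualStack(d_model=8, num_layers=4)
    out = stack(torch.randn(2, 5, 8))
    print(out.shape)  # torch.Size([2, 5, 8])

Compared with a plain residual stack, where layer i sees only the output of layer i-1, this dense fusion shortens the path from shallow to deep layers, which is the fuller information flow the abstract claims for the proposed connections.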


Memo

Received: 2018-11-09; Accepted: 2019-01-12
Funding: National Key R&D Program of China (2016YFE0132100); National Natural Science Foundation of China (61673289)
*Corresponding author: xiangyuduan@suda.edu.cn