Journal of Xiamen University (Natural Science)

The attention mechanism in existing neural machine translation (NMT) models considers only the correlation between the target side and the source side, and ignores the correlations among source words. This paper models correlations on the source side and incorporates dependency-based guidance, which strengthens the associations among source words and improves translation performance. We first construct the correlations between the source hidden layers, and then build a dependency correlation loss function, thereby integrating dependency guidance into the baseline NMT systems. Experiments on large-scale Chinese-English test data, using a recurrent neural network (RNN) baseline and a Transformer baseline, show that incorporating dependency guidance effectively improves translation quality compared with the baseline NMT systems.
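The abstract does not give the formulas, so the following is only a minimal sketch of one way such guidance could be set up, not the paper's actual method. It assumes that the source-side correlation is a softmax over scaled dot products between source hidden vectors (one vector per source word), and that the dependency loss is the mean negative log-probability each word assigns to its dependency head. The names `source_correlation`, `dependency_loss`, and the toy inputs are hypothetical.

```python
import math


def source_correlation(hidden):
    """Row-wise softmax over scaled dot products between source hidden vectors.

    Assumed formulation (not from the paper): row i is a distribution over
    the other source words j != i.
    """
    n, d = len(hidden), len(hidden[0])
    if n < 2:
        raise ValueError("need at least two source words")
    corr = []
    for i in range(n):
        # Score word i against every other word; the word itself is masked out.
        scores = [
            sum(a * b for a, b in zip(hidden[i], hidden[j])) / math.sqrt(d)
            if j != i else float("-inf")
            for j in range(n)
        ]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        corr.append([e / z for e in exps])
    return corr


def dependency_loss(corr, heads):
    """Mean negative log-probability placed on each word's dependency head.

    heads[i] is the index of word i's head, or -1 for the root (skipped).
    """
    terms = [-math.log(corr[i][h] + 1e-12) for i, h in enumerate(heads) if h >= 0]
    return sum(terms) / len(terms) if terms else 0.0


# Toy example: three source words with 2-dimensional hidden vectors.
hidden = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
heads = [1, -1, 1]  # word 1 is the root; words 0 and 2 depend on it

corr = source_correlation(hidden)
loss = dependency_loss(corr, heads)
print(corr)
print(loss)
```

Under these assumptions, the guidance term would be added to the usual translation loss with some weight during training; how the paper actually combines the two is not stated in this abstract.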

Introduction
1 Baseline models
2 NMT with dependency-based correlation guidance
3 Experimental results and analysis
4 Conclusion

Fig.1 Example of dependencies parsed by Stanford parser

Fig.2 Example of source dependency-based guidance

Tab.1 BLEU values of dependency-based correlation guidance on Lamtram (%)

Tab.2 BLEU values of dependency-based correlation guidance on Transformer (%)

Tab.3 BLEU values of translations for different source sentence lengths

Fig.3 BLEU values for different source sentence lengths on Lamtram

Fig.4 BLEU values for different source sentence lengths on Transformer

