Journal of Xiamen University (Natural Science)

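Neural machine translation, which depends on large-scale parallel corpora, has achieved great success on some language pairs. However, obtaining high-quality parallel corpora remains one of the main difficulties in machine translation research. One feasible solution is unsupervised neural machine translation (UNMT), which can be trained using only two unrelated monolingual corpora and still achieves reasonable translation quality. Inspired by the good results of multi-task learning in supervised neural machine translation, this paper explores the application of UNMT to multilingual, multi-task learning. The experiments use three mutually unrelated monolingual corpora and set up bidirectional translation tasks between each pair. The results show that, compared with single-task UNMT, the method improves the bilingual evaluation understudy (BLEU) score by up to 2 to 3 points on some language pairs.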

Relying on large-scale parallel corpora, neural machine translation has achieved great success on some language pairs. Unfortunately, for the vast majority of language pairs, acquiring high-quality parallel corpora remains one of the main difficulties in machine translation research. To address this problem, we adopt unsupervised neural machine translation (UNMT), which trains a neural machine translation system on two unrelated monolingual corpora and obtains good translation results. Inspired by the strong results of multi-task learning in supervised neural machine translation, we explore the application of UNMT to multilingual, multi-task learning. In our experiments, we use three unrelated monolingual corpora and build bidirectional translation tasks between each pair of languages. The experimental results show that, compared with single-task UNMT, this method improves BLEU by up to 2 to 3 points on some language pairs.
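The experimental setup described in the abstract, three monolingual corpora with a bidirectional translation task between every pair, can be enumerated with a short sketch. It is an illustration only, not the authors' training code: the function name `translation_directions` is hypothetical, and of the language codes only `de` and `fr` appear in this document (Fig.4), while `en` is an assumed placeholder for the third language.

```python
from itertools import permutations


def translation_directions(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


# Hypothetical language codes: "de" and "fr" appear in Fig.4,
# "en" is an assumption standing in for the third language.
languages = ["en", "de", "fr"]
directions = translation_directions(languages)

for src, tgt in directions:
    print(f"{src} -> {tgt}")
```

With n languages there are n(n-1) ordered directions, so three languages yield six translation tasks, two per language pair.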

Introduction
1 Architecture and Model
2 Experiments
3 Conclusion

Fig.1 Transformer architecture

Fig.2 Triangular architecture for translation among three languages

Fig.3 Model architecture of the multilingual UNMT system

Tab.1 Performance comparison between multi-task UNMT and single-task UNMT

Tab.2 Experimental comparison of different vocabulary sizes

Fig.4 Comparison of the two translation models on the defr task

