Journal of Xiamen University (Natural Science)

A simple dynamic data expansion method for neural machine translation

LIU Zhidong, LI Junhui^*, GONG Zhengxian

(School of Computer Science and Technology, Soochow University, Suzhou 215006, China)

Keywords: neural machine translation; data expansion; word coverage

DOI: 10.6043/j.issn.0438-0479.202011027

Abstract

Back-translation, a data expansion method for neural machine translation, is widely used to train with monolingual data. However, such methods typically require large-scale source-side or target-side monolingual data, bilingual dictionaries, and similar resources. This paper therefore proposes a simple data expansion method that introduces no external resources. Each time a target sentence is loaded, the words in the sentence are randomly noised according to a given strategy. This dynamically expands the target side of the original parallel data and thereby improves the ability of the target-side language model to represent sentences. Unlike back-translation, which requires a large amount of monolingual data, this method uses only the parallel corpus, so no additional reverse translation model needs to be trained. Experimental results on English-German and Chinese-English translation tasks show that the method improves the bilingual evaluation understudy (BLEU) scores of a standard Transformer system by 0.69 and 0.66 points, respectively.
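The idea of re-noising the target side on every load can be illustrated with a minimal Python sketch. The abstract does not specify the noising strategy, so everything concrete here is an assumption made for illustration only: the function names `noise_tokens` and `load_epoch`, the probability `p`, the `<unk>` placeholder, and the choice between overwriting a word with the placeholder and swapping it for a random vocabulary word. It is not the authors' implementation.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Pair = Tuple[List[str], List[str]]  # (source tokens, target tokens)


def noise_tokens(tokens: Sequence[str], vocab: Sequence[str],
                 rng: random.Random, p: float = 0.15,
                 placeholder: str = "<unk>") -> List[str]:
    """Return a copy of `tokens` in which each token is noised with probability p.

    Assumed strategy (not taken from the paper): a selected token is either
    overwritten with `placeholder` or swapped for a random vocabulary word,
    with equal chance. Sentence length is preserved.
    """
    noised = []
    for tok in tokens:
        if rng.random() < p:
            if rng.random() < 0.5:
                noised.append(placeholder)
            else:
                noised.append(rng.choice(vocab))
        else:
            noised.append(tok)
    return noised


def load_epoch(corpus: Sequence[Pair], vocab: Sequence[str],
               rng: random.Random, p: float = 0.15) -> Iterator[Pair]:
    """Yield (source, noised target) pairs; the noise is redrawn on every pass."""
    for src, tgt in corpus:
        # The source side is copied unchanged; only the target side is noised.
        yield list(src), noise_tokens(tgt, vocab, rng, p)


if __name__ == "__main__":
    # Toy English-German pair and target vocabulary, invented for this example.
    corpus = [(["a", "small", "house"], ["ein", "kleines", "Haus"])]
    vocab = ["ein", "kleines", "Haus", "Baum", "Hund"]
    rng = random.Random(0)
    for epoch in range(3):
        for src, tgt in load_epoch(corpus, vocab, rng, p=0.3):
            print(epoch, src, tgt)
```

Because the noise is drawn at load time rather than written to disk, each epoch can present a different variant of the same target sentence while the stored parallel corpus stays unchanged. How the noised target is fed to the model during training is not stated in the abstract and is left open here.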

Introduction
1 Background
2 Target-side dynamic data expansion method
3 Experimental results and analysis
4 Conclusion

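About the journal

Journal of Xiamen University (Natural Science), founded in 1931, is a comprehensive academic journal (bimonthly) supervised by the Ministry of Education, sponsored by Xiamen University, and publicly distributed in China and abroad. It is a core natural science journal in China and is published simultaneously in print and online. It mainly publishes the latest research results across the natural sciences, including academic papers on basic theory, applied basic research, and high and new technology. Published papers fall into three types: (1) "Letters", which report the latest breakthrough results in a frontier field; (2) "Research Articles", which present basic theoretical and experimental research in science and engineering; (3) "Research Notes", which present novel and practical (or interim) results.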
