A multi-source domain adaptation approach for named entity recognition

LI Jiarui, LIU Jian, CHEN Yufeng*, XU Jin'an, ZHANG Yujie

(School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)

Keywords: named entity recognition; domain adaptation; importance weighting; multi-source

DOI: 10.6043/j.issn.0438-0479.202109033


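Abstract: Domain adaptation is a general approach to low-resource problems and can be applied to a wide range of natural language processing tasks. Current domain adaptation research for named entity recognition (NER) usually transfers from a single source domain to the target domain. This works well when the target and source domains are close, but single-domain transfer is severely limited when the target domain is only weakly related to the source domain. To address this problem, a multi-domain adaptation NER model based on importance weighting (MDAIW) is proposed. 1) Knowledge is transferred from multiple domains to improve entity recognition in the target domain; 2) domain contributions are computed according to the importance of each domain, and of the samples within it, to the target domain; 3) the domain contributions are introduced into the NER model to achieve better domain adaptability. Experiments on multiple target domains show that the model outperforms the current best-performing methods, which verifies its effectiveness.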

Objective: With the development of artificial intelligence, the demand for named entity recognition (NER) in specific domains is increasing. However, these domains often lack sufficient labeled corpora; that is, they are low-resource domains. As a solution to low-resource NER, cross-domain named entity recognition has become an active research direction. Although existing cross-domain NER methods have made progress, they still transfer from a single source domain and rely on large-scale unlabeled text from the target domain.
Methods: A multi-source domain adaptation named entity recognition model based on importance weighting (MDAIW) is proposed, which improves target-domain performance by transferring knowledge from multiple source domains. MDAIW uses the knowledge of multiple domains to improve performance in the target domain, models the importance of each source domain and of its samples to the target domain, and introduces these importance weights into the named entity recognition model to achieve better domain adaptability.
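The importance weighting described above can be illustrated with a small sketch. The abstract does not state how the domain-level and sample-level importances are computed or combined, so everything here is an assumption made for illustration: the raw relevance scores, the softmax normalization, and the names `softmax` and `weighted_loss` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Normalize raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_loss(losses_by_domain, domain_scores, sample_scores_by_domain):
    """Combine per-sample losses with domain-level and sample-level weights.

    losses_by_domain:        {domain: [loss of each sample]}
    domain_scores:           {domain: raw relevance to the target domain}
    sample_scores_by_domain: {domain: [raw relevance of each sample]}
    The score inputs are hypothetical; the paper's own importance
    estimates would be plugged in here.
    """
    domains = sorted(losses_by_domain)
    # Domain-level weights: one weight per source domain.
    domain_weights = dict(
        zip(domains, softmax([domain_scores[d] for d in domains]))
    )
    total = 0.0
    for d in domains:
        # Sample-level weights: one weight per sample inside the domain.
        sample_weights = softmax(sample_scores_by_domain[d])
        domain_loss = sum(
            w * loss for w, loss in zip(sample_weights, losses_by_domain[d])
        )
        total += domain_weights[d] * domain_loss
    return total
```

With equal scores everywhere, the sketch reduces to an unweighted average over domains and samples; raising one domain's score shifts the training signal toward that domain.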
Results: CoNLL and CrossNER are used for the experiments. CoNLL is a news-domain dataset with four entity tags: person (PER), location (LOC), organization (ORG) and miscellaneous (MISC). CrossNER is a dataset built specifically for the cross-domain NER task and covers five domains: politics, science, music, literature and artificial intelligence. In addition to the four general entity types, it adds domain-specific entity types. The F1 score is used as the evaluation metric. In each experiment, one CrossNER domain is taken as the target domain and the other domains are taken as source domains. Compared with the best previously reported results, the F1 scores of the proposed MDAIW model in the five domains increase by 2.44, 0.75, 1.67, 1.82 and 5.23 percentage points, respectively. The results show that directly adding datasets from other domains to the target-domain data without any processing sometimes brings a large improvement but sometimes causes negative transfer. Because the domains are distributed differently, simply mixing multiple domains can degrade performance, and the proposed method effectively alleviates this problem. This shows that the model is effective and applicable to many different domains. Ablation experiments were carried out to verify the contribution of each component of MDAIW. The ablation of the two-stage domain-adaptive pre-training shows that removing the target-domain pre-training causes the larger drop in performance, indicating that target-domain pre-training contributes more to the model, while source-domain pre-training also improves it. The experiment that exchanges the two pre-training stages shows that performance decreases when their order is changed, which demonstrates that the order of the two pre-training steps is necessary.
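The evaluation protocol above can be sketched in a few lines. The abstract says only that F1 is the metric; the sketch assumes the common entity-level, exact-match, micro-averaged F1 over BIO-tagged sequences, which may differ from the scoring actually used. The helper names `leave_one_out`, `extract_spans` and `f1_score` are hypothetical.

```python
# The five CrossNER domains named in the text.
DOMAINS = ["politics", "science", "music", "literature", "artificial intelligence"]


def leave_one_out(domains):
    """Pair each target domain with all remaining domains as sources."""
    return [(t, [d for d in domains if d != t]) for t in domains]


def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in a BIO sequence."""
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))  # end index is exclusive
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def f1_score(gold_sentences, pred_sentences):
    """Micro-averaged F1 over exactly matching entity spans."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
print(f1_score(gold, pred))  # one of two spans matches: P = R = 0.5, so F1 = 0.5
```

Calling `leave_one_out(DOMAINS)` yields five settings, one per target domain, each with the other four domains as sources.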
Conclusions: An MDAIW model and a two-stage domain-adaptive pre-training method are proposed. The two-stage domain-adaptive pre-training removes the dependence on large-scale target-domain data and achieves the same domain adaptation effect at a small resource cost. In the adaptive model that integrates multi-source domain contribution weighting, target-domain performance is improved by drawing on multiple source domains. Domain adaptation is further improved by computing importance parameters at the domain level and the sample level and introducing them into the NER model. The MDAIW model outperforms the comparison methods in experiments on multiple target domains and can be applied to upstream tasks in low-resource domains to improve performance. In future work, we will study domain selection, that is, how to choose the source domains best suited to a given target domain, in order to further improve domain adaptation.