Gated Convolutional Neural Networks and Word Embedding for Chinese Word Segmentation

ZHANG Jing1, LIU Tundong1, CHEN Meiqian2*

(1. School of Aerospace Engineering, Xiamen University, Xiamen 361102, China; 2. School of Marine Engineering, Jimei University, Xiamen 361021, China)

Keywords: natural language processing; deep learning; convolutional neural networks; Chinese word segmentation; word embedding

DOI: 10.6043/j.issn.0438-0479.201804008

Abstract: Many researchers have applied neural network models to Chinese word segmentation. Although these models outperform traditional machine-learning segmenters, they fail to take full advantage of neural networks' ability to learn features automatically, and they do not exploit word-level (word embedding) information. To address this, we propose a Chinese word segmentation method based on gated convolutional neural networks (GCNNs) and integrate word embeddings into the model via a word embedding method, enabling it to learn bigram features automatically without extensive feature engineering. Experimental results on simplified Chinese datasets (PKU, MSRA and CTB6) show that, without relying on feature engineering, the proposed model still achieves good segmentation performance compared with previous neural network models.
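
To make the gating idea concrete, the sketch below is an illustration only, not the authors' exact architecture: the class name GatedConvBlock, the layer dimensions, and the use of PyTorch are all assumptions. It shows a GLU-style gated convolution over a character embedding sequence, conv(x) * sigmoid(gate(x)); with a kernel width of 2, each output position mixes two adjacent characters, which is how such a layer can pick up bigram-like features without hand-crafted feature templates.

    import torch
    import torch.nn as nn

    class GatedConvBlock(nn.Module):
        """Gated convolution over a character sequence (GLU-style gating:
        output = conv(x) * sigmoid(gate(x)))."""
        def __init__(self, embed_dim: int, hidden_dim: int, kernel_size: int = 2):
            super().__init__()
            # kernel_size = 2 lets one convolution span adjacent characters,
            # i.e. it can capture bigram-like features automatically.
            padding = kernel_size - 1
            self.conv = nn.Conv1d(embed_dim, hidden_dim, kernel_size, padding=padding)
            self.gate = nn.Conv1d(embed_dim, hidden_dim, kernel_size, padding=padding)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, embed_dim); Conv1d expects (batch, embed_dim, seq_len)
            x = x.transpose(1, 2)
            h = self.conv(x) * torch.sigmoid(self.gate(x))
            # Trim the extra positions introduced by padding so the output
            # length matches the input length (one label per character).
            h = h[:, :, : x.size(2)]
            return h.transpose(1, 2)  # (batch, seq_len, hidden_dim)

    # Toy usage: a batch of 4 sequences of 20 characters with 50-dimensional
    # character embeddings (all sizes here are arbitrary for illustration).
    if __name__ == "__main__":
        chars = torch.randn(4, 20, 50)
        block = GatedConvBlock(embed_dim=50, hidden_dim=100, kernel_size=2)
        print(block(chars).shape)  # torch.Size([4, 20, 100])

In a full segmenter, stacked blocks of this kind would typically feed a per-character tag classifier (e.g. B/M/E/S labels), but that pipeline is outside the scope of this sketch.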