Journal of Xiamen University (Natural Science)

Gated Convolutional Neural Networks and Word Embedding for Chinese Word Segmentation

ZHANG Jing¹, LIU Tundong¹, CHEN Meiqian²*

(1. School of Aerospace Engineering, Xiamen University, Xiamen 361102, China; 2. School of Marine Engineering, Jimei University, Xiamen 361021, China)

Keywords: natural language processing; deep learning; convolutional neural networks; Chinese word segmentation; word embedding

DOI: 10.6043/j.issn.0438-0479.201804008

Many researchers have applied neural network models to Chinese word segmentation. Although these models outperform traditional machine learning methods, they do not take full advantage of the ability of neural networks to learn features automatically, and they do not use word embedding information. To address this problem, we propose a Chinese word segmentation method based on gated convolutional neural networks (GCNNs) and use word embedding to integrate word vectors into the model, so that it learns bigram features automatically without extensive feature engineering. Experiments on simplified Chinese datasets (PKU, MSRA and CTB6) show that, compared with previous neural network models, the proposed model still achieves good segmentation performance without relying on feature engineering.
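As a rough illustration of the gating idea named in the abstract, the sketch below applies one gated convolution to a short sequence of character vectors. This page does not give the paper's equations, so the gating form used here, (X*W + b) multiplied elementwise by sigmoid(X*V + c), is an assumption borrowed from the common gated linear unit formulation, and the helper names (`conv1d`, `gated_conv`, `random_kernel`), the dimensions, and the random "embeddings" are all hypothetical stand-ins rather than the authors' configuration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d(seq, weights, bias):
    """Width-k convolution over a sequence of vectors, zero-padded to keep length.

    seq: list of T vectors of length d_in
    weights: k matrices of shape d_out x d_in, one per window offset
    bias: vector of length d_out
    """
    k = len(weights)
    d_in = len(seq[0])
    d_out = len(bias)
    pad = k // 2
    zero = [0.0] * d_in
    padded = [zero] * pad + list(seq) + [zero] * (k - 1 - pad)
    out = []
    for t in range(len(seq)):
        row = list(bias)
        for j in range(k):
            x = padded[t + j]
            for o in range(d_out):
                row[o] += sum(w * xi for w, xi in zip(weights[j][o], x))
        out.append(row)
    return out


def gated_conv(seq, w_lin, b_lin, w_gate, b_gate):
    """Assumed gating form: linear convolution scaled by a sigmoid gate."""
    lin = conv1d(seq, w_lin, b_lin)
    gate = conv1d(seq, w_gate, b_gate)
    return [[a * sigmoid(g) for a, g in zip(ra, rg)] for ra, rg in zip(lin, gate)]


def random_kernel(rng, k, d_out, d_in):
    return [[[rng.uniform(-0.5, 0.5) for _ in range(d_in)]
             for _ in range(d_out)] for _ in range(k)]


rng = random.Random(0)
d_in, d_out, k = 4, 3, 3
# Toy vectors for a 5-character sentence (random stand-ins, not trained embeddings).
sentence = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(5)]
hidden = gated_conv(
    sentence,
    random_kernel(rng, k, d_out, d_in), [0.0] * d_out,
    random_kernel(rng, k, d_out, d_in), [0.0] * d_out,
)
print(len(hidden), len(hidden[0]))  # 5 3
```

With a window of width 3, each output position sees a character together with its neighbours, which is one plausible way a convolutional model could pick up adjacent-character (bigram-like) patterns without hand-built features. A full segmenter would add a tagging layer on top, which is not sketched here.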

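Introduction
1 General architecture of neural network Chinese word segmentation
2 Chinese word segmentation based on the GCNNs model
3 Experiments and result analysis
4 Conclusion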

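About the Journal

Journal of Xiamen University (Natural Science), founded in 1931, is a comprehensive academic journal (bimonthly) supervised by the Ministry of Education, sponsored by Xiamen University, and distributed publicly in China and abroad. It is a core journal of natural science in China. The journal is published simultaneously in print and online. It mainly publishes the latest research results across the natural sciences, including academic papers on basic theoretical research, applied basic research, and high and new technology. Papers fall into three types: (1) "Letters": reports of breakthrough results in a frontier field. (2) "Research Articles": papers on basic theoretical and experimental research in science and engineering. (3) "Research Notes": novel, practical (or interim) results.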
