
ZHANG Jing, LIU Tundong, CHEN Meiqian*. Gated Convolutional Neural Networks and Word Embedding for Chinese Word Segmentation[J]. Journal of Xiamen University (Natural Science), 2018, 57(06): 890-895. [doi:10.6043/j.issn.0438-0479.201804008]

Gated Convolutional Neural Networks and Word Embedding for Chinese Word Segmentation

Journal of Xiamen University (Natural Science) [ISSN: 0438-0479 / CN: 35-1070/N]

Volume:
57
Issue:
2018, No. 06
Pages:
890-895
Section:
Natural Language Processing
Publication date:
2018-11-28

Info

Title:
Gated Convolutional Neural Networks and Word Embedding for Chinese Word Segmentation
Article ID:
0438-0479(2018)06-0890-06
Author(s):
ZHANG Jing¹, LIU Tundong¹, CHEN Meiqian²*
1. School of Aerospace Engineering, Xiamen University, Xiamen 361102, China; 2. School of Marine Engineering, Jimei University, Xiamen 361021, China
Keywords:
natural language processing; deep learning; convolutional neural networks; Chinese word segmentation; word embedding
CLC number:
TP 391.1
DOI:
10.6043/j.issn.0438-0479.201804008
Document code:
A
Abstract:
Many researchers have applied neural network models to Chinese word segmentation. Although these models outperform traditional machine-learning segmenters, they do not fully exploit the ability of neural networks to learn features automatically, and they make no use of word-vector information. To address this, we propose a Chinese word segmentation method based on gated convolutional neural networks (GCNNs) and integrate word vectors into the model through word embedding, so that the model learns bigram features automatically without extensive feature engineering. Experiments on simplified Chinese datasets (PKU, MSRA and CTB6) show that, compared with previous neural network models, the proposed model still achieves good segmentation performance without relying on feature engineering.
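The abstract combines two ideas: gated convolution over a sentence, and character inputs enriched with bigram (word) embeddings. The pure-Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes the gate takes the gated-linear-unit form of Dauphin et al., h = conv(x, W) + b multiplied element-wise by sigmoid(conv(x, V) + c); that each character's input is its character vector concatenated with the vector of the bigram it starts; and that the random lookup tables stand in for pretrained vectors. All function names (`char_bigrams`, `embed_sentence`, `conv1d`, `gated_conv`, `init_gated_layer`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_table(keys, dim, rng):
    # Hypothetical random lookup table standing in for pretrained
    # character / bigram vectors (e.g. vectors trained with word2vec).
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}


def char_bigrams(sent):
    # Assumption: bigram i is sent[i] + sent[i+1]; the last character
    # is paired with an end-of-sentence marker "</s>".
    return [sent[i] + (sent[i + 1] if i + 1 < len(sent) else "</s>")
            for i in range(len(sent))]


def embed_sentence(sent, char_table, bigram_table, dim):
    # Each position gets [char vector ; bigram vector] (length 2*dim);
    # unknown keys fall back to a zero vector.
    zero = [0.0] * dim
    return [char_table.get(ch, zero) + bigram_table.get(bg, zero)
            for ch, bg in zip(sent, char_bigrams(sent))]


def conv1d(x, weights, bias, width):
    # Same-padded 1-D convolution over time. x: T vectors of size d_in;
    # weights: one flat row of length width*d_in per output channel.
    assert width % 2 == 1, "an odd width keeps the output length equal to T"
    T, d_in, half = len(x), len(x[0]), width // 2
    zero = [0.0] * d_in
    out = []
    for t in range(T):
        window = []
        for j in range(t - half, t + half + 1):
            window.extend(x[j] if 0 <= j < T else zero)  # zero-pad at the edges
        out.append([sum(w * v for w, v in zip(row, window)) + b
                    for row, b in zip(weights, bias)])
    return out


def gated_conv(x, W, b, V, c, width):
    # Gated linear unit: h = (conv(x, W) + b) * sigmoid(conv(x, V) + c).
    a = conv1d(x, W, b, width)
    g = conv1d(x, V, c, width)
    return [[ai * sigmoid(gi) for ai, gi in zip(ra, rg)]
            for ra, rg in zip(a, g)]


def init_gated_layer(d_in, d_out, width, rng):
    # Small random weights for both the linear path (W, b) and the gate (V, c).
    def rand_matrix():
        return [[rng.gauss(0.0, 0.1) for _ in range(width * d_in)]
                for _ in range(d_out)]
    return rand_matrix(), [0.0] * d_out, rand_matrix(), [0.0] * d_out


if __name__ == "__main__":
    rng = random.Random(0)
    sent, dim = "厦门大学学报", 8
    char_table = make_table(sorted(set(sent)), dim, rng)
    bigram_table = make_table(sorted(set(char_bigrams(sent))), dim, rng)
    x = embed_sentence(sent, char_table, bigram_table, dim)
    W, b, V, c = init_gated_layer(2 * dim, 12, 3, rng)
    h = gated_conv(x, W, b, V, c, width=3)
    print(len(h), len(h[0]))  # 6 12
```

The output keeps one hidden vector per character, so a per-character output layer can be stacked on top. The sigmoid gate lets each channel scale its linear response between 0 and 1, which is the role the gating plays in this sketch.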

Memo:
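Received: 2018-04-19; Accepted: 2018-08-19
Funding: Natural Science Foundation of Fujian Province (2018J01488); University-Industry Cooperation Project of Fujian Province (2018H6018); Xiamen University President's Fund (20720160085)
*Corresponding author: chycmqccr@163.com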
Citation: ZHANG J, LIU T D, CHEN M Q. Gated convolutional neural networks and word embedding for Chinese word segmentation[J]. J Xiamen Univ Nat Sci, 2018, 57(6): 890-895. (in Chinese)