
[1] FAN Yachao, LUO Tianjian, ZHOU Changle*. Author Identification Based on Denoising Autoencoder and Its Application in "Journey to the West"[J]. Journal of Xiamen University (Natural Science), 2018, 57(06): 884-889. [doi:10.6043/j.issn.0438-0479.201804012]

Author Identification Based on Denoising Autoencoder Feature Learning and Its Application to the Poems in "Journey to the West"

Journal of Xiamen University (Natural Science) [ISSN: 0438-0479 / CN: 35-1070/N]

Volume:
57
Issue:
No. 06, 2018
Pages:
884-889
Section:
Natural Language Processing
Publication date:
2018-11-28

Article Info

Title:
Author Identification Based on Denoising Autoencoder and Its Application in "Journey to the West"
Article ID:
0438-0479(2018)06-0884-06
Author(s):
FAN Yachao, LUO Tianjian, ZHOU Changle*
Fujian Keylab of the Brain-like Computing and Applications, School of Information Science and Engineering, Xiamen University, Xiamen 361005, China
Keywords:
denoising autoencoder; code feature; authorship identification
CLC number:
TP 391.1
DOI:
10.6043/j.issn.0438-0479.201804012
Document code:
A
Abstract:
Authorship attribution is a complex problem that traditional natural language processing models struggle to solve. To investigate it more deeply, we first use a denoising autoencoder, a deep model, to extract structural features from the text, and then use a support vector machine (SVM) classifier to identify the author. The advantage of the model is that it accounts for the diversity and complexity of noise in the features of unseen text and can reconstruct the original text input from a noise-corrupted version of it. Applied to identifying the authors of poems by Wu Chengen, Wang Tingchen, Xue Hui, and others, the method reaches a highest recognition accuracy of 78.2%, which verifies its effectiveness. The method is then further applied to identifying the authors of the poems in "Journey to the West".
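
The two-stage pipeline described in the abstract (denoising-autoencoder feature learning followed by SVM classification) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the data are synthetic vectors standing in for per-poem text features, the autoencoder has a single hidden layer with masking noise, and a small one-vs-rest linear SVM trained by subgradient descent stands in for whatever SVM configuration the paper used. All function names, layer sizes, and hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(X, n_hidden=8, noise=0.3, lr=0.5, epochs=300, seed=0):
    """One-hidden-layer denoising autoencoder (assumed architecture)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        mask = rng.random(X.shape) > noise    # masking noise: zero ~30% of inputs
        Xn = X * mask
        H = sigmoid(Xn @ W1 + b1)             # encode the corrupted input
        R = sigmoid(H @ W2 + b2)              # decode
        err = R - X                           # compare with the CLEAN input
        losses.append(float(np.mean(err ** 2)))
        dR = err * R * (1 - R) / n            # backprop through output sigmoid
        dH = (dR @ W2.T) * H * (1 - H)        # uses W2 before it is updated
        W2 -= lr * H.T @ dR;  b2 -= lr * dR.sum(0)
        W1 -= lr * Xn.T @ dH; b1 -= lr * dH.sum(0)
    return W1, b1, losses

def encode(X, W1, b1):
    """Hidden-layer activations used as learned features."""
    return sigmoid(X @ W1 + b1)

def train_linear_svm(F, y, n_classes, lam=0.01, lr=0.1, epochs=300):
    """One-vs-rest linear SVM: subgradient descent on L2-regularized hinge loss."""
    n, d = F.shape
    W = np.zeros((n_classes, d)); b = np.zeros(n_classes)
    for k in range(n_classes):
        t = np.where(y == k, 1.0, -1.0)       # +1 for class k, -1 for the rest
        for _ in range(epochs):
            viol = t * (F @ W[k] + b[k]) < 1  # samples inside the margin
            gW = lam * W[k] - F[viol].T @ t[viol] / n
            gb = -t[viol].sum() / n
            W[k] -= lr * gW; b[k] -= lr * gb
    return W, b

def predict(F, W, b):
    return np.argmax(F @ W.T + b, axis=1)

# Synthetic stand-in data (assumption): 3 "authors", 20 features, 40 samples each.
rng = np.random.default_rng(42)
n_authors, n_feats, n_per = 3, 20, 40
prototypes = rng.random((n_authors, n_feats))
X = np.clip(np.repeat(prototypes, n_per, axis=0)
            + rng.normal(0, 0.1, (n_authors * n_per, n_feats)), 0, 1)
y = np.repeat(np.arange(n_authors), n_per)
idx = rng.permutation(len(y))
train, test = idx[:90], idx[90:]

# Stage 1: learn features. Stage 2: classify standardized features.
W1, b1, losses = train_dae(X[train])
F_train, F_test = encode(X[train], W1, b1), encode(X[test], W1, b1)
mu, sd = F_train.mean(0), F_train.std(0) + 1e-8
W, b = train_linear_svm((F_train - mu) / sd, y[train], n_authors)
acc = float(np.mean(predict((F_test - mu) / sd, W, b) == y[test]))
print(f"reconstruction MSE: {losses[0]:.4f} -> {losses[-1]:.4f}; test accuracy: {acc:.2f}")
```

The key design point is that the reconstruction error is measured against the clean input while the encoder only ever sees the corrupted one, which is what pushes the hidden layer toward noise-robust features. In a real setting the synthetic matrix `X` would be replaced by features extracted from the poems, and a library SVM could be substituted for the hand-written one.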


Memo
Received: 2018-04-23; Accepted: 2018-10-04
Funding: National Natural Science Foundation of China (61673322, 61573294)
*Corresponding author: dozero@xmu.edu.cn
Citation: FAN Y C, LUO T J, ZHOU C L. Author identification based on denoising autoencoder and its application in "Journey to the West"[J]. J Xiamen Univ Nat Sci, 2018, 57(6): 884-889. (in Chinese)