Author Identification Based on Denoising Autoencoder Feature Learning and Its Application to the Poems in "Journey to the West"

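(School of Information Science and Engineering, Xiamen University; Fujian Key Laboratory of Brain-like Computing Technology and Applications, Xiamen 361005, Fujian, China)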

denoising autoencoder; encoded features; author identification

Author Identification Based on Denoising Autoencoder and Its Application in "Journey to the West"
FAN Yachao, LUO Tianjian, ZHOU Changle*

(Fujian Key Laboratory of Brain-like Computing and Applications, School of Information Science and Engineering, Xiamen University, Xiamen 361005, China)

denoising autoencoder; encoded features; authorship identification

DOI: 10.6043/j.issn.0438-0479.201804012

Abstract

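Authorship attribution is a complex problem, and traditional natural language processing models struggle to perform author identification. To investigate the problem in depth, we first use a denoising autoencoder, a deep model, to extract structural features from the text, and then use a support vector machine (SVM) classifier to identify the author. The advantage of the model is that it accounts for the diversity and complexity of noise in the features of unseen text, and that it can reconstruct the original text input from a noise-corrupted version. Applied to identifying the authors of poems by Wu Cheng'en, Wang Tingchen, Xue Hui, and others, the method reaches a maximum recognition accuracy of 78.2%, which verifies its effectiveness. The method is then applied to identifying the authors of the poems in "Journey to the West".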

Because authorship attribution is complex, it is difficult to accomplish author identification with traditional natural language processing models. To explore the problem, we use a denoising autoencoder, a deep model, to analyze text structure and capture each author's writing style, and then use an SVM classifier to identify the author. The advantage of the model is that it takes into account the diversity and complexity of noise in unknown text features, and that it can reconstruct the original text input from its noisy version. The method is applied to identifying the authors of poems by Wu Cheng'en, Wang Tingchen, Xue Hui, and others. The highest recognition accuracy is 78.2%, which verifies the validity of the method. Furthermore, the method is applied to identifying the authors of the poems in "Journey to the West".
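The two-stage pipeline described in the abstract (denoising autoencoder for feature extraction, then an SVM for author identification) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, since the abstract does not give these details: the masking noise, the single hidden layer with tied weights, the cross-entropy loss, the linear SVM trained by hinge-loss subgradient descent, the hyperparameters, and the hypothetical 8-dimensional toy "style" vectors standing in for real poem features. It is not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def corrupt(x, rate, rng):
    """Masking noise (an assumed noise type): zero each component with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]


class DenoisingAutoencoder:
    """One hidden layer, tied weights, sigmoid units, cross-entropy loss (all assumptions)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden  # encoder bias
        self.c = [0.0] * n_in      # decoder bias

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        return [sigmoid(sum(row[i] * hj for row, hj in zip(self.W, h)) + ci)
                for i, ci in enumerate(self.c)]

    def train_step(self, x, rate, lr, rng):
        """One SGD step: encode the corrupted input, reconstruct the clean input."""
        x_noisy = corrupt(x, rate, rng)
        h = self.encode(x_noisy)
        r = self.decode(h)
        d_out = [ri - xi for ri, xi in zip(r, x)]  # gradient at decoder pre-activation
        d_hid = [sum(wi * di for wi, di in zip(row, d_out)) * hj * (1.0 - hj)
                 for row, hj in zip(self.W, h)]
        for j, row in enumerate(self.W):
            for i in range(len(row)):
                # Tied weights: decoder term plus encoder term.
                row[i] -= lr * (d_out[i] * h[j] + d_hid[j] * x_noisy[i])
            self.b[j] -= lr * d_hid[j]
        for i in range(len(self.c)):
            self.c[i] -= lr * d_out[i]


def train_linear_svm(X, y, epochs, lr, lam, rng):
    """Linear SVM by subgradient descent on the L2-regularised hinge loss; labels are +1/-1."""
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for k in order:
            margin = y[k] * (sum(wi * xi for wi, xi in zip(w, X[k])) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # weight decay from the L2 term
            if margin < 1.0:                         # hinge loss is active
                w = [wi + lr * y[k] * xi for wi, xi in zip(w, X[k])]
                b += lr * y[k]
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0 else -1


rng = random.Random(0)


def toy_sample(author):
    """Hypothetical 8-dim binary style vector with 10% of the bits flipped."""
    active = range(0, 4) if author == 0 else range(4, 8)
    x = [1.0 if i in active else 0.0 for i in range(8)]
    return [1.0 - v if rng.random() < 0.1 else v for v in x]


authors = [0, 1] * 20
X = [toy_sample(a) for a in authors]
y = [1 if a == 0 else -1 for a in authors]

# Stage 1: unsupervised training of the denoising autoencoder.
dae = DenoisingAutoencoder(n_in=8, n_hidden=3, rng=rng)
for _ in range(50):
    for x in X:
        dae.train_step(x, rate=0.3, lr=0.1, rng=rng)

# Stage 2: encode the uncorrupted inputs and train the classifier on the codes.
features = [dae.encode(x) for x in X]
w, b = train_linear_svm(features, y, epochs=50, lr=0.1, lam=0.01, rng=rng)
preds = [predict(w, b, f) for f in features]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

The sketch keeps both stages in plain Python so that the data flow is visible: noise is applied only while training the autoencoder, and the classifier sees the hidden-layer codes of clean inputs. In practice one would more likely use an established SVM implementation such as scikit-learn's `SVC`, and the toy accuracy printed here says nothing about the 78.2% figure reported for real poems.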