Journal of Xiamen University (Natural Science)

The large number of advertisements on micro-blogs hampers the use of micro-blog data analysis models. To address micro-blog advertisement text recognition, this paper applies a graph-based semi-supervised learning algorithm, label propagation, which guides the computer to automatically identify advertisements among a large number of unstructured micro-blog texts. Evaluation on large-scale experimental data shows that when only a few labeled examples are available, graph-based semi-supervised label propagation achieves better performance than the supervised support vector machine (SVM) and naive Bayes algorithms.
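The abstract describes the approach only at a high level. What follows is a minimal Python sketch of graph-based label propagation for ad/non-ad classification, not the authors' implementation. It rests on several assumptions that this section does not specify:

- Posts become whitespace-tokenized bag-of-words vectors. Real Chinese micro-blog text would first need word segmentation.
- The graph is fully connected, with cosine-similarity edge weights.
- Propagation uses the harmonic-function iteration associated with Zhu and Ghahramani: labeled nodes are clamped, and each unlabeled node repeatedly takes the weighted average of its neighbours' scores.

The helper names `cosine` and `label_propagation` and the toy posts are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[t] for t, count in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def label_propagation(docs, labels, max_iter=1000, tol=1e-6):
    """Hypothetical helper: labels are 1 (ad), 0 (not ad) or None (unlabeled).

    Builds a fully connected graph with cosine-similarity edge weights (an
    assumption), clamps labeled nodes, and iterates the harmonic-function
    update until scores stop changing. Returns one score in [0, 1] per doc.
    """
    # Assumption: whitespace tokenization; Chinese posts need word segmentation.
    vecs = [Counter(d.lower().split()) for d in docs]
    n = len(docs)
    # Symmetric weight matrix with no self-loops.
    w = [[0.0 if i == j else cosine(vecs[i], vecs[j]) for j in range(n)]
         for i in range(n)]
    # Labeled nodes start at their label; unlabeled nodes start neutral (0.5).
    f = [0.5 if y is None else float(y) for y in labels]
    for _ in range(max_iter):
        new_f = f[:]
        for i in range(n):
            if labels[i] is not None:
                continue  # clamp: labeled scores never change
            total = sum(w[i])
            if total > 0:  # isolated nodes keep their current score
                new_f[i] = sum(w[i][j] * f[j] for j in range(n)) / total
        delta = max(abs(a - b) for a, b in zip(new_f, f))
        f = new_f
        if delta < tol:
            break
    return f


# Hypothetical toy posts: two labeled, three unlabeled.
docs = [
    "buy cheap shoes discount click link",   # labeled ad
    "had a great lunch with friends today",  # labeled non-ad
    "discount shoes click link now",         # unlabeled
    "lunch with friends was great fun",      # unlabeled
    "free gift now",                         # unlabeled, shares a word only with docs[2]
]
labels = [1, 0, None, None, None]
scores = label_propagation(docs, labels)
predicted = [s > 0.5 for s in scores]  # True means "predicted ad"
print([round(s, 3) for s in scores])  # expected: [1.0, 0.0, 1.0, 0.0, 1.0]
```

The last toy post shares no words with any labeled post. It is still scored as an ad because the score reaches it through the unlabeled post it resembles. This reliance on unlabeled data is what makes semi-supervised methods attractive when labels are scarce. A post with no similar neighbours keeps its neutral initial score of 0.5, so the `> 0.5` rule leaves it undecided.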

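Introduction
1 Basic theory of LPA
2 Building the graph model for micro-blog advertisement text recognition
3 Graph-based automatic recognition of micro-blog advertisement text
4 Experiments and result analysis
5 Conclusion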

Table 1 Experimental data

Table 2 Accuracy of the four algorithms

Table 3 Accuracy of LPA for micro-blog advertisement text recognition

