Graph-based Micro-blog Advertisement Text Recognition
LUO Bin1*, TANG Hongyan1,2, WANG Zhihao3, QIN Yue1, SU Jinsong1

(1. Software School, Xiamen University, Xiamen 361005, China; 2. School of Software and Microelectronics, Peking University, Beijing 102600, China; 3. School of Aerospace Engineering, Xiamen University, Xiamen 361005, China)

micro-blog advertisement text; recognition; semi-supervised; label propagation algorithm

DOI: 10.6043/j.issn.0438-0479.201612030


The large number of advertisements on micro-blogs interferes with the use of micro-blog data analysis models. To address micro-blog advertisement text recognition, this paper applies label propagation, a graph-based semi-supervised learning algorithm, to automatically identify advertisements in large volumes of unstructured micro-blog text. Experimental results on large-scale data show that, when only a few labeled examples are available, label propagation outperforms supervised learning algorithms such as support vector machine and naive Bayes.
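As a rough illustration of the graph-based label propagation idea named in the abstract, the following minimal sketch spreads labels from a few labeled texts to unlabeled ones over a similarity graph. The bag-of-words features, the cosine-similarity edge weights, the toy English texts, and the names `cosine` and `label_propagation` are assumptions made for this example only; they are not taken from the paper, whose feature and graph construction are not described in this section.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_propagation(texts, labels, n_classes=2, iters=100):
    """labels[i] is a class index, or None for an unlabeled text."""
    n = len(texts)
    vecs = [Counter(t.lower().split()) for t in texts]
    # Graph edges: pairwise cosine similarity, no self-loops (assumed weighting).
    W = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    # Row-normalise the weights into transition probabilities.
    T = []
    for row in W:
        total = sum(row)
        T.append([w / total for w in row] if total > 0 else row)

    def one_hot(c):
        return [1.0 if k == c else 0.0 for k in range(n_classes)]

    # Labeled nodes start as one-hot; unlabeled nodes start uniform.
    Y = [one_hot(l) if l is not None else [1.0 / n_classes] * n_classes
         for l in labels]
    for _ in range(iters):
        new_Y = []
        for i in range(n):
            if labels[i] is not None:
                new_Y.append(one_hot(labels[i]))   # clamp known labels
            elif sum(T[i]) == 0:
                new_Y.append(Y[i])                 # isolated node: unchanged
            else:
                new_Y.append([sum(T[i][j] * Y[j][c] for j in range(n))
                              for c in range(n_classes)])
        Y = new_Y
    return [max(range(n_classes), key=lambda c: Y[i][c]) for i in range(n)]


# Toy data (hypothetical): class 1 = advertisement, class 0 = ordinary post.
texts = [
    "buy cheap shoes discount sale now",
    "lovely weather walk in the park today",
    "big discount sale cheap bags buy now",
    "went for a walk in the park with friends",
]
labels = [1, 0, None, None]
predicted = label_propagation(texts, labels)
print(predicted)
```

In this toy setup each unlabeled text shares words with only one labeled text, so it inherits that text's label. On real micro-blog data the graph would be much denser, and the choice of features and similarity measure would matter considerably.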