
[1] RAO Lili, LIU Xionghui, ZHANG Dongzhan*. An Improved Weighted Naive Bayes Classification Algorithm Using Feature Correlation [J]. Journal of Xiamen University (Natural Science), 2012, 51(4): 682.

An Improved Weighted Naive Bayes Classification Algorithm Using Feature Correlation (PDF)

Journal of Xiamen University (Natural Science) [ISSN: 0438-0479 / CN: 35-1070/N]

Volume:
51
Issue:
No. 4, 2012
Page:
682
Column:
Research Articles
Publication date:
2012-07-15

Article Info

Title:
An Improved Weighted Naive Bayes Classification Algorithm Using Feature Correlation
Author(s):
RAO Lili1, LIU Xionghui2, ZHANG Dongzhan1*
1. School of Information Science and Technology, Xiamen University, Xiamen 361005, China; 2. Department of Information Technology, Longyan Tobacco Industrial Co., Ltd., Longyan 364021, China
Keywords:
naive Bayes text classification; weighted naive Bayes text classification; TF-IDF weight; feature correlation
CLC number:
TP 391.1
Document code:
-
Abstract:
The naive Bayes classification algorithm assumes strong independence between features, an assumption that is rarely satisfied in practice. To relax this assumption to some extent, this paper proposes an improved weighted naive Bayes classification algorithm based on feature correlation. The algorithm adopts a new weighting method, called TFIDFFC weight calculation. Building on the traditional term frequency-inverse document frequency (TFIDF) weight, it takes into account how a feature is distributed within and between classes and additionally uses the correlation between features to adjust the weight, so that the features most representative of their class receive larger weights. Compared with weighted naive Bayes classification based on the traditional TFIDF weight and with other commonly used weighted naive Bayes classification algorithms, such as attribute-weighted naive Bayes classification, the proposed algorithm improves classification performance to a certain extent.
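To make the baseline named in the abstract concrete, the following is a minimal sketch of a naive Bayes text classifier whose per-term contributions are scaled by a traditional TF-IDF weight. It is not the paper's TFIDFFC method: the class-distribution and feature-correlation adjustments are not specified in this abstract, so they are left out. The function names `train` and `classify`, the `log(N/df) + 1` form of the IDF, the Laplace smoothing, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train(docs, labels):
    """Estimate log priors, Laplace-smoothed log likelihoods, and IDF weights."""
    vocab = sorted({t for d in docs for t in d})
    n_docs = len(docs)
    class_docs = Counter(labels)          # class -> number of training docs
    term_counts = defaultdict(Counter)    # class -> term -> occurrence count
    doc_freq = Counter()                  # term -> number of docs containing it
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        doc_freq.update(set(tokens))
    log_prior = {c: math.log(n / n_docs) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        total = sum(term_counts[c].values()) + len(vocab)
        log_like[c] = {t: math.log((term_counts[c][t] + 1) / total) for t in vocab}
    # Assumed IDF form; the "+ 1" keeps every weight strictly positive.
    idf = {t: math.log(n_docs / doc_freq[t]) + 1.0 for t in vocab}
    return log_prior, log_like, idf

def classify(tokens, model):
    """Return the class maximizing log P(c) + sum_t tf(t) * idf(t) * log P(t|c)."""
    log_prior, log_like, idf = model
    tf = Counter(t for t in tokens if t in idf)   # out-of-vocabulary terms are skipped
    scores = {
        c: log_prior[c] + sum(tf[t] * idf[t] * log_like[c][t] for t in tf)
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Hypothetical toy corpus, not data from the paper.
train_docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "price"],
    ["price", "bank", "market"],
]
train_labels = ["sports", "sports", "finance", "finance"]
model = train(train_docs, train_labels)
print(classify(["goal", "team"], model))
print(classify(["market", "price", "bank"], model))
```

In this formulation the weight acts as an exponent on each term's conditional probability, so a larger weight lets a term pull the decision more strongly toward the classes in which it is likely. A TFIDFFC-style method would replace `idf[t]` with its adjusted weight while leaving the rest of the scoring unchanged.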


Memo

Received: 2011-10-20
*Corresponding author: zdz@xmu.edu.cn
Last Update: 2012-07-15